* [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
@ 2015-10-12 12:27 Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
This series aims at avoiding a hanging main-loop if a vserver has a
CDROM image mounted from a NFS share and that NFS share goes down.
Typical situation is that users mount an CDROM ISO to install something
and then forget to eject that CDROM afterwards.
As a consequence this mounted CD is able to bring down the
whole vserver if the backend NFS share is unreachable. This is bad
especially if the CDROM itself is not needed anymore at this point.
This series aims at fixing 2 blocking I/O operations that would
hang if the NFS server is unavailable:
- ATAPI PIO read requests used sync calls to blk_read, convert
them to an async variant where possible.
- If a busmaster DMA request is cancelled all requests are drained.
Convert the drain to an async request canceling.
v1->v2: - fix offset for 2352 byte sector size [Kevin]
- use a sync request if we continue an elementary transfer.
As John pointed out we enter a race condition between next
IDE command and async transfer otherwise. This is sill not
optimal, but it fixes the NFS down problems for all cases where
the NFS server goes down while there is no PIO CD activity.
Of course, it could still happen during a PIO transfer, but I
expect this to be the unlikelier case.
I spent some effort trying to read more sectors at once and
avoiding continuation of elementary transfers, but with
whatever I came up it was destroying migration between different
Qemu versions. I have a quite hackish patch that works and
should survive migration, but I am not happy with it. So I
would like to start with this version as it is a big improvement
already.
- Dropped Patch 5 because it is upstream meanwhile.
Peter Lieven (4):
ide/atapi: make PIO read requests async
ide/atapi: blk_aio_readv may return NULL
ide: add support for cancelable read requests
ide/atapi: enable cancelable requests
hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
hw/ide/internal.h | 16 +++++++++
hw/ide/pci.c | 42 +++++++++++++++--------
4 files changed, 188 insertions(+), 24 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-22 16:17 ` Stefan Hajnoczi
2015-11-03 0:48 ` John Snow
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
` (3 subsequent siblings)
4 siblings, 2 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First the main loop hangs util an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 84 insertions(+), 9 deletions(-)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..2271ea2 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
memset(buf, 0, 288);
}
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static int
+cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
{
int ret;
- switch(sector_size) {
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector_sync: lba=%d\n", lba);
+#endif
+
+ switch (s->cd_sector_size) {
case 2048:
block_acct_start(blk_get_stats(s->blk), &s->acct,
4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
@@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
ret = -EIO;
break;
}
+
+ if (!ret) {
+ s->lba++;
+ s->io_buffer_index = 0;
+ }
+
return ret;
}
+static void cd_read_sector_cb(void *opaque, int ret)
+{
+ IDEState *s = opaque;
+
+ block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
+#endif
+
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ return;
+ }
+
+ if (s->cd_sector_size == 2352) {
+ cd_data_to_raw(s->io_buffer, s->lba);
+ }
+
+ s->lba++;
+ s->io_buffer_index = 0;
+ s->status &= ~BUSY_STAT;
+
+ ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf)
+{
+ if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
+ return -EINVAL;
+ }
+
+ s->iov.iov_base = buf;
+ if (s->cd_sector_size == 2352) {
+ buf += 16;
+ }
+
+ s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+ qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector: lba=%d\n", lba);
+#endif
+
+ if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+ cd_read_sector_cb, s) == NULL) {
+ return -EIO;
+ }
+
+ block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+
+ s->status |= BUSY_STAT;
+ return 0;
+}
+
void ide_atapi_cmd_ok(IDEState *s)
{
s->error = 0;
@@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
ide_atapi_cmd_ok(s);
ide_set_irq(s->bus);
#ifdef DEBUG_IDE_ATAPI
- printf("status=0x%x\n", s->status);
+ printf("end of transfer, status=0x%x\n", s->status);
#endif
} else {
/* see if a new sector must be read */
if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
- ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
- if (ret < 0) {
- ide_atapi_io_error(s, ret);
+ if (!s->elementary_transfer_size) {
+ ret = cd_read_sector(s, s->lba, s->io_buffer);
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ }
return;
+ } else {
+ /* rebuffering within an elementary transfer is
+ * only possible with a sync request because we
+ * end up with a race condition otherwise */
+ ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ return;
+ }
}
- s->lba++;
- s->io_buffer_index = 0;
}
if (s->elementary_transfer_size > 0) {
/* there are some data left to transmit in this elementary
@@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
s->io_buffer_index = sector_size;
s->cd_sector_size = sector_size;
- s->status = READY_STAT | SEEK_STAT;
ide_atapi_cmd_reply_end(s);
}
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-22 16:20 ` Stefan Hajnoczi
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2271ea2..e0cf066 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
+ if (s->bus->dma->aiocb == NULL) {
+ ide_atapi_io_error(s, -EIO);
+ goto eot;
+ }
return;
eot:
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-26 10:39 ` Stefan Hajnoczi
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
4 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
this patch adds a new aio readv compatible function which copies
all data through a bounce buffer. The benefit is that these requests
can be flagged as canceled to avoid guest memory corruption when
a canceled request is completed by the backend at a later stage.
If an IDE protocol wants to use this function it has to pipe
all read requests through ide_readv_cancelable and it may then
enable requests_cancelable in the IDEState.
If this state is enable we can avoid the blocking blk_drain_all
in case of a BMDMA reset.
Currently only read operations are cancelable thus we can only
use this logic for read-only devices.
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
hw/ide/internal.h | 16 ++++++++++++++++
hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
3 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 317406d..24547ce 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
return true;
}
+static void ide_readv_cancelable_cb(void *opaque, int ret)
+{
+ IDECancelableRequest *req = opaque;
+ if (!req->canceled) {
+ if (!ret) {
+ qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
+ }
+ req->org_cb(req->org_opaque, ret);
+ }
+ QLIST_REMOVE(req, list);
+ qemu_vfree(req->buf);
+ qemu_iovec_destroy(&req->qiov);
+ g_free(req);
+}
+
+#define MAX_CANCELABLE_REQS 16
+
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque)
+{
+ BlockAIOCB *aioreq;
+ IDECancelableRequest *req;
+ int c = 0;
+
+ QLIST_FOREACH(req, &s->cancelable_requests, list) {
+ c++;
+ }
+ if (c > MAX_CANCELABLE_REQS) {
+ return NULL;
+ }
+
+ req = g_new0(IDECancelableRequest, 1);
+ qemu_iovec_init(&req->qiov, 1);
+ req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
+ qemu_iovec_add(&req->qiov, req->buf, iov->size);
+ req->org_qiov = iov;
+ req->org_cb = cb;
+ req->org_opaque = opaque;
+
+ aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
+ ide_readv_cancelable_cb, req);
+ if (aioreq == NULL) {
+ qemu_vfree(req->buf);
+ qemu_iovec_destroy(&req->qiov);
+ g_free(req);
+ } else {
+ QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
+ }
+
+ return aioreq;
+}
+
static void ide_sector_read(IDEState *s);
static void ide_sector_read_cb(void *opaque, int ret)
@@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
s->bus->retry_unit = s->unit;
s->bus->retry_sector_num = ide_get_sector(s);
s->bus->retry_nsector = s->nsector;
+ s->bus->s = s;
if (s->bus->dma->ops->start_dma) {
s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
}
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 05e93ff..ad188c2 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -343,6 +343,16 @@ enum ide_dma_cmd {
#define ide_cmd_is_read(s) \
((s)->dma_cmd == IDE_DMA_READ)
+typedef struct IDECancelableRequest {
+ QLIST_ENTRY(IDECancelableRequest) list;
+ QEMUIOVector qiov;
+ uint8_t *buf;
+ QEMUIOVector *org_qiov;
+ BlockCompletionFunc *org_cb;
+ void *org_opaque;
+ bool canceled;
+} IDECancelableRequest;
+
/* NOTE: IDEState represents in fact one drive */
struct IDEState {
IDEBus *bus;
@@ -396,6 +406,8 @@ struct IDEState {
BlockAIOCB *pio_aiocb;
struct iovec iov;
QEMUIOVector qiov;
+ QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
+ bool requests_cancelable;
/* ATA DMA state */
int32_t io_buffer_offset;
int32_t io_buffer_size;
@@ -468,6 +480,7 @@ struct IDEBus {
uint8_t retry_unit;
int64_t retry_sector_num;
uint32_t retry_nsector;
+ IDEState *s;
};
#define TYPE_IDE_DEVICE "ide-device"
@@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
BlockAIOCB *ide_issue_trim(BlockBackend *blk,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque);
/* hw/ide/atapi.c */
void ide_atapi_cmd(IDEState *s);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..5587183 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
/* Ignore writes to SSBM if it keeps the old value */
if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
if (!(val & BM_CMD_START)) {
- /*
- * We can't cancel Scatter Gather DMA in the middle of the
- * operation or a partial (not full) DMA transfer would reach
- * the storage so we wait for completion instead (we beahve
- * like if the DMA was completed by the time the guest trying
- * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
- * set).
- *
- * In the future we'll be able to safely cancel the I/O if the
- * whole DMA operation will be submitted to disk with a single
- * aio operation with preadv/pwritev.
- */
if (bm->bus->dma->aiocb) {
- blk_drain_all();
- assert(bm->bus->dma->aiocb == NULL);
+ if (bm->bus->s && bm->bus->s->requests_cancelable) {
+ /*
+ * If the used IDE protocol supports request cancelation we
+ * can flag requests as canceled here and disable DMA.
+ * The IDE protocol used MUST use ide_readv_cancelable for all
+ * read operations and then subsequently can enable this code
+ * path. Currently this is only supported for read-only
+ * devices.
+ */
+ IDECancelableRequest *req;
+ QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
+ if (!req->canceled) {
+ req->org_cb(req->org_opaque, -ECANCELED);
+ }
+ req->canceled = true;
+ }
+ } else {
+ /*
+ * We can't cancel Scatter Gather DMA in the middle of the
+ * operation or a partial (not full) DMA transfer would reach
+ * the storage so we wait for completion instead (we beahve
+ * like if the DMA was completed by the time the guest trying
+ * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
+ * set).
+ */
+ blk_drain_all();
+ assert(bm->bus->dma->aiocb == NULL);
+ }
}
bm->status &= ~BM_STATUS_DMAING;
} else {
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
` (2 preceding siblings ...)
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
4 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 4 ++--
hw/ide/core.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index e0cf066..8d38b1d 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -187,7 +187,7 @@ static int cd_read_sector(IDEState *s, int lba, void *buf)
printf("cd_read_sector: lba=%d\n", lba);
#endif
- if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+ if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
cd_read_sector_cb, s) == NULL) {
return -EIO;
}
@@ -426,7 +426,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
s->bus->dma->iov.iov_len = n * 4 * 512;
qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
- s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
+ s->bus->dma->aiocb = ide_readv_cancelable(s, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
if (s->bus->dma->aiocb == NULL) {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 24547ce..5c7a346 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2330,6 +2330,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind,
if (kind == IDE_CD) {
blk_set_dev_ops(blk, &ide_cd_block_ops, s);
blk_set_guest_block_size(blk, 2048);
+ s->requests_cancelable = true;
} else {
if (!blk_is_inserted(s->blk)) {
error_report("Device needs media, but drive is empty");
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-10-22 16:17 ` Stefan Hajnoczi
2015-10-23 15:17 ` Peter Lieven
2015-11-03 0:48 ` John Snow
1 sibling, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-22 16:17 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:22PM +0200, Peter Lieven wrote:
> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> ret = -EIO;
> break;
> }
> +
> + if (!ret) {
> + s->lba++;
This function probably shouldn't take the lba argument if it modifies
s->lba. You dropped the sector_size argument and I think the lba
argument should be dropped for the same reason.
> +static int cd_read_sector(IDEState *s, int lba, void *buf)
> +{
> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
> + return -EINVAL;
> + }
> +
> + s->iov.iov_base = buf;
> + if (s->cd_sector_size == 2352) {
> + buf += 16;
> + }
> +
> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector: lba=%d\n", lba);
> +#endif
> +
> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> + cd_read_sector_cb, s) == NULL) {
This function never returns NULL, the if statement can be removed.
Doesn't the aiocb need to be stored for cancellation (e.g. device reset)?
> + return -EIO;
> + }
> +
> + block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
Why does accounting start *after* the read request has been submitted?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
@ 2015-10-22 16:20 ` Stefan Hajnoczi
2015-10-23 15:18 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-22 16:20 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:23PM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/atapi.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 2271ea2..e0cf066 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
> s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
> &s->bus->dma->qiov, n * 4,
> ide_atapi_cmd_read_dma_cb, s);
> + if (s->bus->dma->aiocb == NULL) {
> + ide_atapi_io_error(s, -EIO);
> + goto eot;
> + }
Where does blk_aio_readv() return NULL?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-22 16:17 ` Stefan Hajnoczi
@ 2015-10-23 15:17 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-23 15:17 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
Am 22.10.2015 um 18:17 schrieb Stefan Hajnoczi:
> On Mon, Oct 12, 2015 at 02:27:22PM +0200, Peter Lieven wrote:
>> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> ret = -EIO;
>> break;
>> }
>> +
>> + if (!ret) {
>> + s->lba++;
> This function probably shouldn't take the lba argument if it modifies
> s->lba. You dropped the sector_size argument and I think the lba
> argument should be dropped for the same reason.
>
>> +static int cd_read_sector(IDEState *s, int lba, void *buf)
>> +{
>> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
>> + return -EINVAL;
>> + }
>> +
>> + s->iov.iov_base = buf;
>> + if (s->cd_sector_size == 2352) {
>> + buf += 16;
>> + }
>> +
>> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector: lba=%d\n", lba);
>> +#endif
>> +
>> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> + cd_read_sector_cb, s) == NULL) {
> This function never returns NULL, the if statement can be removed.
Okay, that was my fault. I was believing it could return NULL.
>
> Doesn't the aiocb need to be stored for cancellation (e.g. device reset)?
The ide_readv_cancelable function introduced in Patch 3 will store
the aiocb. At this point there is blk_drain_all called when the device is reset.
>
>> + return -EIO;
>> + }
>> +
>> + block_acct_start(blk_get_stats(s->blk), &s->acct,
>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> Why does accounting start *after* the read request has been submitted?
You are right it has to start before the request.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-22 16:20 ` Stefan Hajnoczi
@ 2015-10-23 15:18 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-23 15:18 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
Am 22.10.2015 um 18:20 schrieb Stefan Hajnoczi:
> On Mon, Oct 12, 2015 at 02:27:23PM +0200, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/atapi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 2271ea2..e0cf066 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
>> s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
>> &s->bus->dma->qiov, n * 4,
>> ide_atapi_cmd_read_dma_cb, s);
>> + if (s->bus->dma->aiocb == NULL) {
>> + ide_atapi_io_error(s, -EIO);
>> + goto eot;
>> + }
> Where does blk_aio_readv() return NULL?
Never. My fault.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
@ 2015-10-26 10:39 ` Stefan Hajnoczi
2015-10-27 10:58 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-26 10:39 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
> this patch adds a new aio readv compatible function which copies
> all data through a bounce buffer. The benefit is that these requests
> can be flagged as canceled to avoid guest memory corruption when
> a canceled request is completed by the backend at a later stage.
>
> If an IDE protocol wants to use this function it has to pipe
> all read requests through ide_readv_cancelable and it may then
> enable requests_cancelable in the IDEState.
>
> If this state is enable we can avoid the blocking blk_drain_all
> in case of a BMDMA reset.
>
> Currently only read operations are cancelable thus we can only
> use this logic for read-only devices.
Naming is confusing here. Requests are already "cancelable" using
bdv_aio_cancel().
Please use a different name, for example "orphan" requests. These are
requests that QEMU still knows about but the guest believes are
complete. Or maybe "IDEBufferedRequest" since data is transferred
through a bounce buffer.
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/ide/internal.h | 16 ++++++++++++++++
> hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
> 3 files changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 317406d..24547ce 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
> return true;
> }
>
> +static void ide_readv_cancelable_cb(void *opaque, int ret)
> +{
> + IDECancelableRequest *req = opaque;
> + if (!req->canceled) {
> + if (!ret) {
> + qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
> + }
> + req->org_cb(req->org_opaque, ret);
> + }
> + QLIST_REMOVE(req, list);
> + qemu_vfree(req->buf);
> + qemu_iovec_destroy(&req->qiov);
> + g_free(req);
> +}
> +
> +#define MAX_CANCELABLE_REQS 16
> +
> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> + QEMUIOVector *iov, int nb_sectors,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> + BlockAIOCB *aioreq;
> + IDECancelableRequest *req;
> + int c = 0;
> +
> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
> + c++;
> + }
> + if (c > MAX_CANCELABLE_REQS) {
> + return NULL;
> + }
A BH is probably needed here to schedule an cb(-EIO) call since this
function isn't supposed to return NULL if it's a direct replacement for
blk_aio_readv().
> +
> + req = g_new0(IDECancelableRequest, 1);
> + qemu_iovec_init(&req->qiov, 1);
It saves a g_new() call if you add a struct iovec field to
IDECancelableRequest and use qemu_iovec_init_external() instead of
qemu_iovec_init().
The qemu_iovec_destroy() calls must be dropped when an external struct
iovec is used.
The qemu_iovec_init_external() call must be moved after the
qemu_blockalign() and struct iovec setup below.
> + req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
> + qemu_iovec_add(&req->qiov, req->buf, iov->size);
> + req->org_qiov = iov;
> + req->org_cb = cb;
> + req->org_opaque = opaque;
> +
> + aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
> + ide_readv_cancelable_cb, req);
> + if (aioreq == NULL) {
> + qemu_vfree(req->buf);
> + qemu_iovec_destroy(&req->qiov);
> + g_free(req);
> + } else {
> + QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
> + }
> +
> + return aioreq;
> +}
> +
> static void ide_sector_read(IDEState *s);
>
> static void ide_sector_read_cb(void *opaque, int ret)
> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
> s->bus->retry_unit = s->unit;
> s->bus->retry_sector_num = ide_get_sector(s);
> s->bus->retry_nsector = s->nsector;
> + s->bus->s = s;
How is 's' different from 'unit' and 'retry_unit'?
The logic for switching between units is already a little tricky since
the guest can write to the hardware registers while requests are
in-flight.
Please don't duplicate "active unit" state, that increases the risk of
inconsistencies.
Can you use idebus_active_if() to get an equivalent IDEState pointer
without storing s?
> if (s->bus->dma->ops->start_dma) {
> s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
> }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 05e93ff..ad188c2 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
> #define ide_cmd_is_read(s) \
> ((s)->dma_cmd == IDE_DMA_READ)
>
> +typedef struct IDECancelableRequest {
> + QLIST_ENTRY(IDECancelableRequest) list;
> + QEMUIOVector qiov;
> + uint8_t *buf;
> + QEMUIOVector *org_qiov;
> + BlockCompletionFunc *org_cb;
> + void *org_opaque;
Please don't shorten names, original_* is clearer than org_*.
> + bool canceled;
> +} IDECancelableRequest;
> +
> /* NOTE: IDEState represents in fact one drive */
> struct IDEState {
> IDEBus *bus;
> @@ -396,6 +406,8 @@ struct IDEState {
> BlockAIOCB *pio_aiocb;
> struct iovec iov;
> QEMUIOVector qiov;
> + QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
> + bool requests_cancelable;
> /* ATA DMA state */
> int32_t io_buffer_offset;
> int32_t io_buffer_size;
> @@ -468,6 +480,7 @@ struct IDEBus {
> uint8_t retry_unit;
> int64_t retry_sector_num;
> uint32_t retry_nsector;
> + IDEState *s;
> };
>
> #define TYPE_IDE_DEVICE "ide-device"
> @@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
> BlockAIOCB *ide_issue_trim(BlockBackend *blk,
> int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> + QEMUIOVector *iov, int nb_sectors,
> + BlockCompletionFunc *cb, void *opaque);
>
> /* hw/ide/atapi.c */
> void ide_atapi_cmd(IDEState *s);
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index d31ff88..5587183 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
> /* Ignore writes to SSBM if it keeps the old value */
> if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
> if (!(val & BM_CMD_START)) {
> - /*
> - * We can't cancel Scatter Gather DMA in the middle of the
> - * operation or a partial (not full) DMA transfer would reach
> - * the storage so we wait for completion instead (we beahve
> - * like if the DMA was completed by the time the guest trying
> - * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> - * set).
> - *
> - * In the future we'll be able to safely cancel the I/O if the
> - * whole DMA operation will be submitted to disk with a single
> - * aio operation with preadv/pwritev.
> - */
> if (bm->bus->dma->aiocb) {
> - blk_drain_all();
> - assert(bm->bus->dma->aiocb == NULL);
> + if (bm->bus->s && bm->bus->s->requests_cancelable) {
> + /*
> + * If the used IDE protocol supports request cancelation we
> + * can flag requests as canceled here and disable DMA.
> + * The IDE protocol used MUST use ide_readv_cancelable for all
> + * read operations and then subsequently can enable this code
> + * path. Currently this is only supported for read-only
> + * devices.
> + */
> + IDECancelableRequest *req;
> + QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
> + if (!req->canceled) {
> + req->org_cb(req->org_opaque, -ECANCELED);
> + }
> + req->canceled = true;
> + }
> + } else {
> + /*
> + * We can't cancel Scatter Gather DMA in the middle of the
> + * operation or a partial (not full) DMA transfer would reach
> + * the storage so we wait for completion instead (we beahve
> + * like if the DMA was completed by the time the guest trying
> + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> + * set).
> + */
> + blk_drain_all();
> + assert(bm->bus->dma->aiocb == NULL);
This assertion applies is both branches of the if statement, it could be
moved after the if statement.
> + }
> }
> bm->status &= ~BM_STATUS_DMAING;
> } else {
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
` (3 preceding siblings ...)
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
@ 2015-10-26 10:42 ` Stefan Hajnoczi
2015-10-26 10:56 ` Peter Lieven
4 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-26 10:42 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
> This series aims at avoiding a hanging main-loop if a vserver has a
> CDROM image mounted from a NFS share and that NFS share goes down.
> Typical situation is that users mount an CDROM ISO to install something
> and then forget to eject that CDROM afterwards.
> As a consequence this mounted CD is able to bring down the
> whole vserver if the backend NFS share is unreachable. This is bad
> especially if the CDROM itself is not needed anymore at this point.
>
> This series aims at fixing 2 blocking I/O operations that would
> hang if the NFS server is unavailable:
> - ATAPI PIO read requests used sync calls to blk_read, convert
> them to an async variant where possible.
> - If a busmaster DMA request is cancelled all requests are drained.
> Convert the drain to an async request canceling.
>
> v1->v2: - fix offset for 2352 byte sector size [Kevin]
> - use a sync request if we continue an elementary transfer.
> As John pointed out we enter a race condition between next
> IDE command and async transfer otherwise. This is sill not
> optimal, but it fixes the NFS down problems for all cases where
> the NFS server goes down while there is no PIO CD activity.
> Of course, it could still happen during a PIO transfer, but I
> expect this to be the unlikelier case.
> I spent some effort trying to read more sectors at once and
> avoiding continuation of elementary transfers, but with
> whatever I came up it was destroying migration between different
> Qemu versions. I have a quite hackish patch that works and
> should survive migration, but I am not happy with it. So I
> would like to start with this version as it is a big improvement
> already.
> - Dropped Patch 5 because it is upstream meanwhile.
>
> Peter Lieven (4):
> ide/atapi: make PIO read requests async
> ide/atapi: blk_aio_readv may return NULL
> ide: add support for cancelable read requests
> ide/atapi: enable cancelable requests
>
> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
> hw/ide/internal.h | 16 +++++++++
> hw/ide/pci.c | 42 +++++++++++++++--------
> 4 files changed, 188 insertions(+), 24 deletions(-)
Any reason why write and discard requests aren't covered in this series?
If this is a good idea for CD-ROM it should be a good idea for all PCI
IDE devices.
Having a specialized code path is often a sign that it hasn't been
tested enough. Can we get confident enough to enable this everywhere?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
@ 2015-10-26 10:56 ` Peter Lieven
2015-10-28 11:27 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-26 10:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
Am 26.10.2015 um 11:42 schrieb Stefan Hajnoczi:
> On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
>> This series aims at avoiding a hanging main-loop if a vserver has a
>> CDROM image mounted from a NFS share and that NFS share goes down.
>> Typical situation is that users mount an CDROM ISO to install something
>> and then forget to eject that CDROM afterwards.
>> As a consequence this mounted CD is able to bring down the
>> whole vserver if the backend NFS share is unreachable. This is bad
>> especially if the CDROM itself is not needed anymore at this point.
>>
>> This series aims at fixing 2 blocking I/O operations that would
>> hang if the NFS server is unavailable:
>> - ATAPI PIO read requests used sync calls to blk_read, convert
>> them to an async variant where possible.
>> - If a busmaster DMA request is cancelled all requests are drained.
>> Convert the drain to an async request canceling.
>>
>> v1->v2: - fix offset for 2352 byte sector size [Kevin]
>> - use a sync request if we continue an elementary transfer.
>> As John pointed out we enter a race condition between next
>> IDE command and async transfer otherwise. This is sill not
>> optimal, but it fixes the NFS down problems for all cases where
>> the NFS server goes down while there is no PIO CD activity.
>> Of course, it could still happen during a PIO transfer, but I
>> expect this to be the unlikelier case.
>> I spent some effort trying to read more sectors at once and
>> avoiding continuation of elementary transfers, but with
>> whatever I came up it was destroying migration between different
>> Qemu versions. I have a quite hackish patch that works and
>> should survive migration, but I am not happy with it. So I
>> would like to start with this version as it is a big improvement
>> already.
>> - Dropped Patch 5 because it is upstream meanwhile.
>>
>> Peter Lieven (4):
>> ide/atapi: make PIO read requests async
>> ide/atapi: blk_aio_readv may return NULL
>> ide: add support for cancelable read requests
>> ide/atapi: enable cancelable requests
>>
>> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
>> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
>> hw/ide/internal.h | 16 +++++++++
>> hw/ide/pci.c | 42 +++++++++++++++--------
>> 4 files changed, 188 insertions(+), 24 deletions(-)
> Any reason why write and discard requests aren't covered in this series?
>
> If this is a good idea for CD-ROM it should be a good idea for all PCI
> IDE devices.
>
> Having a specialized code path is often a sign that it hasn't been
> tested enough. Can we get confident enough to enable this everywhere?
The reason is that the buffered request trick does only work for
read-only devices (like a CDROM). A write request that is completed
on the backend storage at a later point (after the OS thinks the request
is canceled) can cause damage to the filesystem.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-26 10:39 ` Stefan Hajnoczi
@ 2015-10-27 10:58 ` Peter Lieven
2015-10-28 11:26 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-27 10:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
Am 26.10.2015 um 11:39 schrieb Stefan Hajnoczi:
> On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
>> this patch adds a new aio readv compatible function which copies
>> all data through a bounce buffer. The benefit is that these requests
>> can be flagged as canceled to avoid guest memory corruption when
>> a canceled request is completed by the backend at a later stage.
>>
>> If an IDE protocol wants to use this function it has to pipe
>> all read requests through ide_readv_cancelable and it may then
>> enable requests_cancelable in the IDEState.
>>
>> If this state is enable we can avoid the blocking blk_drain_all
>> in case of a BMDMA reset.
>>
>> Currently only read operations are cancelable thus we can only
>> use this logic for read-only devices.
> Naming is confusing here. Requests are already "cancelable" using
> bdv_aio_cancel().
>
> Please use a different name, for example "orphan" requests. These are
> requests that QEMU still knows about but the guest believes are
> complete. Or maybe "IDEBufferedRequest" since data is transferred
> through a bounce buffer.
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> hw/ide/internal.h | 16 ++++++++++++++++
>> hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
>> 3 files changed, 98 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 317406d..24547ce 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
>> return true;
>> }
>>
>> +static void ide_readv_cancelable_cb(void *opaque, int ret)
>> +{
>> + IDECancelableRequest *req = opaque;
>> + if (!req->canceled) {
>> + if (!ret) {
>> + qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
>> + }
>> + req->org_cb(req->org_opaque, ret);
>> + }
>> + QLIST_REMOVE(req, list);
>> + qemu_vfree(req->buf);
>> + qemu_iovec_destroy(&req->qiov);
>> + g_free(req);
>> +}
>> +
>> +#define MAX_CANCELABLE_REQS 16
>> +
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> + QEMUIOVector *iov, int nb_sectors,
>> + BlockCompletionFunc *cb, void *opaque)
>> +{
>> + BlockAIOCB *aioreq;
>> + IDECancelableRequest *req;
>> + int c = 0;
>> +
>> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
>> + c++;
>> + }
>> + if (c > MAX_CANCELABLE_REQS) {
>> + return NULL;
>> + }
> A BH is probably needed here to schedule an cb(-EIO) call since this
> function isn't supposed to return NULL if it's a direct replacement for
> blk_aio_readv().
You mean sth like:
acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
acb->ret = -EIO;
qemu_bh_schedule(acb->bh);
return &acb->common;
>
>> +
>> + req = g_new0(IDECancelableRequest, 1);
>> + qemu_iovec_init(&req->qiov, 1);
> It saves a g_new() call if you add a struct iovec field to
> IDECancelableRequest and use qemu_iovec_init_external() instead of
> qemu_iovec_init().
>
> The qemu_iovec_destroy() calls must be dropped when an external struct
> iovec is used.
>
> The qemu_iovec_init_external() call must be moved after the
> qemu_blockalign() and struct iovec setup below.
okay
>
>> + req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
>> + qemu_iovec_add(&req->qiov, req->buf, iov->size);
>> + req->org_qiov = iov;
>> + req->org_cb = cb;
>> + req->org_opaque = opaque;
>> +
>> + aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
>> + ide_readv_cancelable_cb, req);
>> + if (aioreq == NULL) {
>> + qemu_vfree(req->buf);
>> + qemu_iovec_destroy(&req->qiov);
>> + g_free(req);
>> + } else {
>> + QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
>> + }
>> +
>> + return aioreq;
>> +}
>> +
>> static void ide_sector_read(IDEState *s);
>>
>> static void ide_sector_read_cb(void *opaque, int ret)
>> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
>> s->bus->retry_unit = s->unit;
>> s->bus->retry_sector_num = ide_get_sector(s);
>> s->bus->retry_nsector = s->nsector;
>> + s->bus->s = s;
> How is 's' different from 'unit' and 'retry_unit'?
>
> The logic for switching between units is already a little tricky since
> the guest can write to the hardware registers while requests are
> in-flight.
>
> Please don't duplicate "active unit" state, that increases the risk of
> inconsistencies.
>
> Can you use idebus_active_if() to get an equivalent IDEState pointer
> without storing s?
That should be possible.
>
>> if (s->bus->dma->ops->start_dma) {
>> s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
>> }
>> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
>> index 05e93ff..ad188c2 100644
>> --- a/hw/ide/internal.h
>> +++ b/hw/ide/internal.h
>> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
>> #define ide_cmd_is_read(s) \
>> ((s)->dma_cmd == IDE_DMA_READ)
>>
>> +typedef struct IDECancelableRequest {
>> + QLIST_ENTRY(IDECancelableRequest) list;
>> + QEMUIOVector qiov;
>> + uint8_t *buf;
>> + QEMUIOVector *org_qiov;
>> + BlockCompletionFunc *org_cb;
>> + void *org_opaque;
> Please don't shorten names, original_* is clearer than org_*.
Ok.
>
>> + bool canceled;
>> +} IDECancelableRequest;
>> +
>> /* NOTE: IDEState represents in fact one drive */
>> struct IDEState {
>> IDEBus *bus;
>> @@ -396,6 +406,8 @@ struct IDEState {
>> BlockAIOCB *pio_aiocb;
>> struct iovec iov;
>> QEMUIOVector qiov;
>> + QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
>> + bool requests_cancelable;
>> /* ATA DMA state */
>> int32_t io_buffer_offset;
>> int32_t io_buffer_size;
>> @@ -468,6 +480,7 @@ struct IDEBus {
>> uint8_t retry_unit;
>> int64_t retry_sector_num;
>> uint32_t retry_nsector;
>> + IDEState *s;
>> };
>>
>> #define TYPE_IDE_DEVICE "ide-device"
>> @@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
>> BlockAIOCB *ide_issue_trim(BlockBackend *blk,
>> int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>> BlockCompletionFunc *cb, void *opaque);
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> + QEMUIOVector *iov, int nb_sectors,
>> + BlockCompletionFunc *cb, void *opaque);
>>
>> /* hw/ide/atapi.c */
>> void ide_atapi_cmd(IDEState *s);
>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>> index d31ff88..5587183 100644
>> --- a/hw/ide/pci.c
>> +++ b/hw/ide/pci.c
>> @@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>> /* Ignore writes to SSBM if it keeps the old value */
>> if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>> if (!(val & BM_CMD_START)) {
>> - /*
>> - * We can't cancel Scatter Gather DMA in the middle of the
>> - * operation or a partial (not full) DMA transfer would reach
>> - * the storage so we wait for completion instead (we beahve
>> - * like if the DMA was completed by the time the guest trying
>> - * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> - * set).
>> - *
>> - * In the future we'll be able to safely cancel the I/O if the
>> - * whole DMA operation will be submitted to disk with a single
>> - * aio operation with preadv/pwritev.
>> - */
>> if (bm->bus->dma->aiocb) {
>> - blk_drain_all();
>> - assert(bm->bus->dma->aiocb == NULL);
>> + if (bm->bus->s && bm->bus->s->requests_cancelable) {
>> + /*
>> + * If the used IDE protocol supports request cancelation we
>> + * can flag requests as canceled here and disable DMA.
>> + * The IDE protocol used MUST use ide_readv_cancelable for all
>> + * read operations and then subsequently can enable this code
>> + * path. Currently this is only supported for read-only
>> + * devices.
>> + */
>> + IDECancelableRequest *req;
>> + QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
>> + if (!req->canceled) {
>> + req->org_cb(req->org_opaque, -ECANCELED);
>> + }
>> + req->canceled = true;
>> + }
>> + } else {
>> + /*
>> + * We can't cancel Scatter Gather DMA in the middle of the
>> + * operation or a partial (not full) DMA transfer would reach
>> + * the storage so we wait for completion instead (we beahve
>> + * like if the DMA was completed by the time the guest trying
>> + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> + * set).
>> + */
>> + blk_drain_all();
>> + assert(bm->bus->dma->aiocb == NULL);
> This assertion applies is both branches of the if statement, it could be
> moved after the if statement.
Right.
As pointed out in my comment to your requestion about write/discard I think it should
be feasible to use buffered readv requests for all read-only IDE devices.
Only thing I'm unsure about is reopening. A reopen seems to only flush the device not
drain all requests.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-27 10:58 ` Peter Lieven
@ 2015-10-28 11:26 ` Stefan Hajnoczi
2015-10-28 19:56 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-28 11:26 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]
On Tue, Oct 27, 2015 at 11:58:55AM +0100, Peter Lieven wrote:
> Am 26.10.2015 um 11:39 schrieb Stefan Hajnoczi:
> >On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
> >>+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> >>+ QEMUIOVector *iov, int nb_sectors,
> >>+ BlockCompletionFunc *cb, void *opaque)
> >>+{
> >>+ BlockAIOCB *aioreq;
> >>+ IDECancelableRequest *req;
> >>+ int c = 0;
> >>+
> >>+ QLIST_FOREACH(req, &s->cancelable_requests, list) {
> >>+ c++;
> >>+ }
> >>+ if (c > MAX_CANCELABLE_REQS) {
> >>+ return NULL;
> >>+ }
> >A BH is probably needed here to schedule an cb(-EIO) call since this
> >function isn't supposed to return NULL if it's a direct replacement for
> >blk_aio_readv().
>
> You mean sth like:
>
> acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
> acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
> acb->ret = -EIO;
> qemu_bh_schedule(acb->bh);
>
> return &acb->common;
Yes.
> As pointed out in my comment to your requestion about write/discard I think it should
> be feasible to use buffered readv requests for all read-only IDE devices.
> Only thing I'm unsure about is reopening. A reopen seems to only flush the device not
> drain all requests.
bdrv_reopen_prepare() callers should drain requests. For example,
bdrv_reopen_multiple() (and indirectly bdrv_reopen()) call
bdrv_drain_all(). Is this what you mean?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-26 10:56 ` Peter Lieven
@ 2015-10-28 11:27 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-28 11:27 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
[-- Attachment #1: Type: text/plain, Size: 3426 bytes --]
On Mon, Oct 26, 2015 at 11:56:26AM +0100, Peter Lieven wrote:
> Am 26.10.2015 um 11:42 schrieb Stefan Hajnoczi:
> >On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
> >>This series aims at avoiding a hanging main-loop if a vserver has a
> >>CDROM image mounted from a NFS share and that NFS share goes down.
> >>Typical situation is that users mount an CDROM ISO to install something
> >>and then forget to eject that CDROM afterwards.
> >>As a consequence this mounted CD is able to bring down the
> >>whole vserver if the backend NFS share is unreachable. This is bad
> >>especially if the CDROM itself is not needed anymore at this point.
> >>
> >>This series aims at fixing 2 blocking I/O operations that would
> >>hang if the NFS server is unavailable:
> >> - ATAPI PIO read requests used sync calls to blk_read, convert
> >> them to an async variant where possible.
> >> - If a busmaster DMA request is cancelled all requests are drained.
> >> Convert the drain to an async request canceling.
> >>
> >>v1->v2: - fix offset for 2352 byte sector size [Kevin]
> >> - use a sync request if we continue an elementary transfer.
> >> As John pointed out we enter a race condition between next
> >> IDE command and async transfer otherwise. This is sill not
> >> optimal, but it fixes the NFS down problems for all cases where
> >> the NFS server goes down while there is no PIO CD activity.
> >> Of course, it could still happen during a PIO transfer, but I
> >> expect this to be the unlikelier case.
> >> I spent some effort trying to read more sectors at once and
> >> avoiding continuation of elementary transfers, but with
> >> whatever I came up it was destroying migration between different
> >> Qemu versions. I have a quite hackish patch that works and
> >> should survive migration, but I am not happy with it. So I
> >> would like to start with this version as it is a big improvement
> >> already.
> >> - Dropped Patch 5 because it is upstream meanwhile.
> >>
> >>Peter Lieven (4):
> >> ide/atapi: make PIO read requests async
> >> ide/atapi: blk_aio_readv may return NULL
> >> ide: add support for cancelable read requests
> >> ide/atapi: enable cancelable requests
> >>
> >> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
> >> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
> >> hw/ide/internal.h | 16 +++++++++
> >> hw/ide/pci.c | 42 +++++++++++++++--------
> >> 4 files changed, 188 insertions(+), 24 deletions(-)
> >Any reason why write and discard requests aren't covered in this series?
> >
> >If this is a good idea for CD-ROM it should be a good idea for all PCI
> >IDE devices.
> >
> >Having a specialized code path is often a sign that it hasn't been
> >tested enough. Can we get confident enough to enable this everywhere?
>
> The reason is that the buffered request trick does only work for
> read-only devices (like a CDROM). A write request that is completed
> on the backend storage at a later point (after the OS thinks the request
> is canceled) can cause damage to the filesystem.
Of course, you are right.
This is really annoying because it means a guest cannot reboot if writes
are pending...
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-28 11:26 ` Stefan Hajnoczi
@ 2015-10-28 19:56 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-28 19:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
Am 28.10.2015 um 12:26 schrieb Stefan Hajnoczi:
> On Tue, Oct 27, 2015 at 11:58:55AM +0100, Peter Lieven wrote:
>> Am 26.10.2015 um 11:39 schrieb Stefan Hajnoczi:
>>> On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
>>>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>>>> + QEMUIOVector *iov, int nb_sectors,
>>>> + BlockCompletionFunc *cb, void *opaque)
>>>> +{
>>>> + BlockAIOCB *aioreq;
>>>> + IDECancelableRequest *req;
>>>> + int c = 0;
>>>> +
>>>> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
>>>> + c++;
>>>> + }
>>>> + if (c > MAX_CANCELABLE_REQS) {
>>>> + return NULL;
>>>> + }
>>> A BH is probably needed here to schedule an cb(-EIO) call since this
>>> function isn't supposed to return NULL if it's a direct replacement for
>>> blk_aio_readv().
>> You mean sth like:
>>
>> acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
>> acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
>> acb->ret = -EIO;
>> qemu_bh_schedule(acb->bh);
>>
>> return &acb->common;
> Yes.
>
>> As pointed out in my comment to your requestion about write/discard I think it should
>> be feasible to use buffered readv requests for all read-only IDE devices.
>> Only thing I'm unsure about is reopening. A reopen seems to only flush the device not
>> drain all requests.
> bdrv_reopen_prepare() callers should drain requests. For example,
> bdrv_reopen_multiple() (and indirectly bdrv_reopen()) call
> bdrv_drain_all(). Is this what you mean?
Yes, I have only found a flush in bdrv_reopen_prepare, but if you say they need to drain before
I think it is safe to use the buffered_ide_readv for all read-only IDE devices not only CDROMs.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-22 16:17 ` Stefan Hajnoczi
@ 2015-11-03 0:48 ` John Snow
2015-11-03 7:03 ` Peter Lieven
1 sibling, 1 reply; 18+ messages in thread
From: John Snow @ 2015-11-03 0:48 UTC (permalink / raw)
To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody
On 10/12/2015 08:27 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has two significant drawbacks. First the main loop hangs util an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 84 insertions(+), 9 deletions(-)
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..2271ea2 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> memset(buf, 0, 288);
> }
>
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static int
> +cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
> {
> int ret;
>
> - switch(sector_size) {
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector_sync: lba=%d\n", lba);
> +#endif
> +
> + switch (s->cd_sector_size) {
> case 2048:
> block_acct_start(blk_get_stats(s->blk), &s->acct,
> 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> ret = -EIO;
> break;
> }
> +
> + if (!ret) {
> + s->lba++;
> + s->io_buffer_index = 0;
> + }
> +
> return ret;
> }
>
> +static void cd_read_sector_cb(void *opaque, int ret)
> +{
> + IDEState *s = opaque;
> +
> + block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
> +#endif
> +
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + return;
> + }
> +
> + if (s->cd_sector_size == 2352) {
> + cd_data_to_raw(s->io_buffer, s->lba);
> + }
> +
> + s->lba++;
> + s->io_buffer_index = 0;
> + s->status &= ~BUSY_STAT;
> +
> + ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf)
> +{
> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
> + return -EINVAL;
> + }
> +
> + s->iov.iov_base = buf;
> + if (s->cd_sector_size == 2352) {
> + buf += 16;
> + }
> +
> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector: lba=%d\n", lba);
> +#endif
> +
> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> + cd_read_sector_cb, s) == NULL) {
> + return -EIO;
> + }
> +
> + block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +
> + s->status |= BUSY_STAT;
> + return 0;
> +}
> +
> void ide_atapi_cmd_ok(IDEState *s)
> {
> s->error = 0;
> @@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
> ide_atapi_cmd_ok(s);
> ide_set_irq(s->bus);
> #ifdef DEBUG_IDE_ATAPI
> - printf("status=0x%x\n", s->status);
> + printf("end of transfer, status=0x%x\n", s->status);
> #endif
> } else {
> /* see if a new sector must be read */
> if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
> - ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
> - if (ret < 0) {
> - ide_atapi_io_error(s, ret);
> + if (!s->elementary_transfer_size) {
> + ret = cd_read_sector(s, s->lba, s->io_buffer);
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + }
> return;
> + } else {
> + /* rebuffering within an elementary transfer is
> + * only possible with a sync request because we
> + * end up with a race condition otherwise */
> + ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + return;
> + }
> }
> - s->lba++;
> - s->io_buffer_index = 0;
> }
> if (s->elementary_transfer_size > 0) {
> /* there are some data left to transmit in this elementary
> @@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
> s->io_buffer_index = sector_size;
> s->cd_sector_size = sector_size;
>
> - s->status = READY_STAT | SEEK_STAT;
> ide_atapi_cmd_reply_end(s);
> }
>
>
This patch looks good to me, apart from Stefan's other comments. Will
you be sending a V3? I can try to merge it for this week before the hard
freeze hits.
--js
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-11-03 0:48 ` John Snow
@ 2015-11-03 7:03 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-11-03 7:03 UTC (permalink / raw)
To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody
Am 03.11.2015 um 01:48 schrieb John Snow:
>
> On 10/12/2015 08:27 AM, Peter Lieven wrote:
>> PIO read requests on the ATAPI interface used to be sync blk requests.
>> This has two significant drawbacks. First the main loop hangs util an
>> I/O request is completed and secondly if the I/O request does not
>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 84 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 747f466..2271ea2 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>> memset(buf, 0, 288);
>> }
>>
>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> +static int
>> +cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
>> {
>> int ret;
>>
>> - switch(sector_size) {
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector_sync: lba=%d\n", lba);
>> +#endif
>> +
>> + switch (s->cd_sector_size) {
>> case 2048:
>> block_acct_start(blk_get_stats(s->blk), &s->acct,
>> 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> ret = -EIO;
>> break;
>> }
>> +
>> + if (!ret) {
>> + s->lba++;
>> + s->io_buffer_index = 0;
>> + }
>> +
>> return ret;
>> }
>>
>> +static void cd_read_sector_cb(void *opaque, int ret)
>> +{
>> + IDEState *s = opaque;
>> +
>> + block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
>> +#endif
>> +
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + return;
>> + }
>> +
>> + if (s->cd_sector_size == 2352) {
>> + cd_data_to_raw(s->io_buffer, s->lba);
>> + }
>> +
>> + s->lba++;
>> + s->io_buffer_index = 0;
>> + s->status &= ~BUSY_STAT;
>> +
>> + ide_atapi_cmd_reply_end(s);
>> +}
>> +
>> +static int cd_read_sector(IDEState *s, int lba, void *buf)
>> +{
>> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
>> + return -EINVAL;
>> + }
>> +
>> + s->iov.iov_base = buf;
>> + if (s->cd_sector_size == 2352) {
>> + buf += 16;
>> + }
>> +
>> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector: lba=%d\n", lba);
>> +#endif
>> +
>> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> + cd_read_sector_cb, s) == NULL) {
>> + return -EIO;
>> + }
>> +
>> + block_acct_start(blk_get_stats(s->blk), &s->acct,
>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> +
>> + s->status |= BUSY_STAT;
>> + return 0;
>> +}
>> +
>> void ide_atapi_cmd_ok(IDEState *s)
>> {
>> s->error = 0;
>> @@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>> ide_atapi_cmd_ok(s);
>> ide_set_irq(s->bus);
>> #ifdef DEBUG_IDE_ATAPI
>> - printf("status=0x%x\n", s->status);
>> + printf("end of transfer, status=0x%x\n", s->status);
>> #endif
>> } else {
>> /* see if a new sector must be read */
>> if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
>> - ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>> - if (ret < 0) {
>> - ide_atapi_io_error(s, ret);
>> + if (!s->elementary_transfer_size) {
>> + ret = cd_read_sector(s, s->lba, s->io_buffer);
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + }
>> return;
>> + } else {
>> + /* rebuffering within an elementary transfer is
>> + * only possible with a sync request because we
>> + * end up with a race condition otherwise */
>> + ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + return;
>> + }
>> }
>> - s->lba++;
>> - s->io_buffer_index = 0;
>> }
>> if (s->elementary_transfer_size > 0) {
>> /* there are some data left to transmit in this elementary
>> @@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>> s->io_buffer_index = sector_size;
>> s->cd_sector_size = sector_size;
>>
>> - s->status = READY_STAT | SEEK_STAT;
>> ide_atapi_cmd_reply_end(s);
>> }
>>
>>
> This patch looks good to me, apart from Stefan's other comments. Will
> you be sending a V3? I can try to merge it for this week before the hard
> freeze hits.
I will look at this today.
Peter
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-11-03 7:03 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-22 16:17 ` Stefan Hajnoczi
2015-10-23 15:17 ` Peter Lieven
2015-11-03 0:48 ` John Snow
2015-11-03 7:03 ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
2015-10-22 16:20 ` Stefan Hajnoczi
2015-10-23 15:18 ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
2015-10-26 10:39 ` Stefan Hajnoczi
2015-10-27 10:58 ` Peter Lieven
2015-10-28 11:26 ` Stefan Hajnoczi
2015-10-28 19:56 ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
2015-10-26 10:56 ` Peter Lieven
2015-10-28 11:27 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).