* [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-22 16:17 ` Stefan Hajnoczi
2015-11-03 0:48 ` John Snow
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
` (3 subsequent siblings)
4 siblings, 2 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First, the main loop hangs until an
I/O request is completed and, second, if the I/O request does not
complete (e.g. due to unresponsive storage) QEMU hangs completely.
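For readers following along, the structural change can be sketched with stub types (all names here are illustrative stand-ins, not QEMU's real API): the synchronous variant only returns once the data is in place, while the asynchronous variant submits the request, marks the drive busy, and finishes later through a completion callback, so the main loop keeps running in between.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-ins for the device state and completion callback. */
typedef struct FakeIDEState {
    int lba;
    int busy;            /* models BUSY_STAT */
    char buf[16];
} FakeIDEState;

typedef void (*CompletionFunc)(FakeIDEState *s, int ret);

/* Sync flavour: the caller (and thus the main loop) blocks until done. */
static int read_sector_sync(FakeIDEState *s)
{
    memset(s->buf, 'x', sizeof(s->buf));   /* pretend I/O completes here */
    s->lba++;
    return 0;
}

/* Async flavour: submit, mark the device busy, complete via callback. */
static CompletionFunc pending_cb;
static FakeIDEState *pending_s;

static int read_sector_async(FakeIDEState *s, CompletionFunc cb)
{
    s->busy = 1;
    pending_cb = cb;     /* a real backend would complete this later */
    pending_s = s;
    return 0;
}

static void read_done(FakeIDEState *s, int ret)
{
    if (ret == 0) {
        s->lba++;
    }
    s->busy = 0;
}

/* Drives the pending completion, like one main-loop iteration. */
static void run_pending(void)
{
    if (pending_cb) {
        memset(pending_s->buf, 'x', sizeof(pending_s->buf));
        pending_cb(pending_s, 0);
        pending_cb = NULL;
    }
}
```

If the backend never completes the request, the sync variant wedges the caller, while the async variant merely leaves the toy device busy.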
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 84 insertions(+), 9 deletions(-)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..2271ea2 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
memset(buf, 0, 288);
}
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static int
+cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
{
int ret;
- switch(sector_size) {
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector_sync: lba=%d\n", lba);
+#endif
+
+ switch (s->cd_sector_size) {
case 2048:
block_acct_start(blk_get_stats(s->blk), &s->acct,
4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
@@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
ret = -EIO;
break;
}
+
+ if (!ret) {
+ s->lba++;
+ s->io_buffer_index = 0;
+ }
+
return ret;
}
+static void cd_read_sector_cb(void *opaque, int ret)
+{
+ IDEState *s = opaque;
+
+ block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
+#endif
+
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ return;
+ }
+
+ if (s->cd_sector_size == 2352) {
+ cd_data_to_raw(s->io_buffer, s->lba);
+ }
+
+ s->lba++;
+ s->io_buffer_index = 0;
+ s->status &= ~BUSY_STAT;
+
+ ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf)
+{
+ if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
+ return -EINVAL;
+ }
+
+ s->iov.iov_base = buf;
+ if (s->cd_sector_size == 2352) {
+ buf += 16;
+ }
+
+ s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+ qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+#ifdef DEBUG_IDE_ATAPI
+ printf("cd_read_sector: lba=%d\n", lba);
+#endif
+
+ if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+ cd_read_sector_cb, s) == NULL) {
+ return -EIO;
+ }
+
+ block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+
+ s->status |= BUSY_STAT;
+ return 0;
+}
+
void ide_atapi_cmd_ok(IDEState *s)
{
s->error = 0;
@@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
ide_atapi_cmd_ok(s);
ide_set_irq(s->bus);
#ifdef DEBUG_IDE_ATAPI
- printf("status=0x%x\n", s->status);
+ printf("end of transfer, status=0x%x\n", s->status);
#endif
} else {
/* see if a new sector must be read */
if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
- ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
- if (ret < 0) {
- ide_atapi_io_error(s, ret);
+ if (!s->elementary_transfer_size) {
+ ret = cd_read_sector(s, s->lba, s->io_buffer);
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ }
return;
+ } else {
+ /* rebuffering within an elementary transfer is
+ * only possible with a sync request because we
+ * end up with a race condition otherwise */
+ ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
+ if (ret < 0) {
+ ide_atapi_io_error(s, ret);
+ return;
+ }
}
- s->lba++;
- s->io_buffer_index = 0;
}
if (s->elementary_transfer_size > 0) {
/* there are some data left to transmit in this elementary
@@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
s->io_buffer_index = sector_size;
s->cd_sector_size = sector_size;
- s->status = READY_STAT | SEEK_STAT;
ide_atapi_cmd_reply_end(s);
}
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-10-22 16:17 ` Stefan Hajnoczi
2015-10-23 15:17 ` Peter Lieven
2015-11-03 0:48 ` John Snow
1 sibling, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-22 16:17 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:22PM +0200, Peter Lieven wrote:
> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> ret = -EIO;
> break;
> }
> +
> + if (!ret) {
> + s->lba++;
This function probably shouldn't take the lba argument if it modifies
s->lba. You dropped the sector_size argument and I think the lba
argument should be dropped for the same reason.
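The suggested cleanup can be sketched with a toy state struct (hypothetical, not the real IDEState): once the function reads and advances s->lba itself, a separate lba parameter is just a second copy of the same value waiting to diverge.

```c
#include <assert.h>

typedef struct MiniState {
    int lba;
    int io_buffer_index;
} MiniState;

/* After the suggested change: no lba parameter, the function works on
 * s->lba directly, mirroring how it already advances it on success. */
static int cd_read_sector_sync_sketch(MiniState *s)
{
    int ret = 0;                 /* pretend the blocking read succeeded */
    if (!ret) {
        s->lba++;                /* single source of truth for the LBA */
        s->io_buffer_index = 0;
    }
    return ret;
}
```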
> +static int cd_read_sector(IDEState *s, int lba, void *buf)
> +{
> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
> + return -EINVAL;
> + }
> +
> + s->iov.iov_base = buf;
> + if (s->cd_sector_size == 2352) {
> + buf += 16;
> + }
> +
> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector: lba=%d\n", lba);
> +#endif
> +
> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> + cd_read_sector_cb, s) == NULL) {
This function never returns NULL, so the if statement can be removed.
Doesn't the aiocb need to be stored for cancellation (e.g. device reset)?
> + return -EIO;
> + }
> +
> + block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
Why does accounting start *after* the read request has been submitted?
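The ordering concern can be illustrated with stubs that record call order (all names invented for this sketch): starting accounting before submission guarantees the start event precedes the done event even when the completion path fires during submission itself.

```c
#include <assert.h>

enum { EV_NONE, EV_ACCT_START, EV_SUBMIT, EV_ACCT_DONE };
static int events[4];
static int n_events;

static void acct_start(void) { events[n_events++] = EV_ACCT_START; }
static void acct_done(void)  { events[n_events++] = EV_ACCT_DONE; }

/* A backend fast enough to complete the request while it is being
 * submitted, e.g. when the data is already cached. */
static void submit_read(void (*cb)(void))
{
    events[n_events++] = EV_SUBMIT;
    cb();                       /* completion fires immediately */
}

static void read_cb(void) { acct_done(); }

/* Correct ordering: start accounting, then submit. */
static void do_read(void)
{
    acct_start();
    submit_read(read_cb);
}
```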
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-22 16:17 ` Stefan Hajnoczi
@ 2015-10-23 15:17 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-23 15:17 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On 22.10.2015 at 18:17, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 02:27:22PM +0200, Peter Lieven wrote:
>> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> ret = -EIO;
>> break;
>> }
>> +
>> + if (!ret) {
>> + s->lba++;
> This function probably shouldn't take the lba argument if it modifies
> s->lba. You dropped the sector_size argument and I think the lba
> argument should be dropped for the same reason.
>
>> +static int cd_read_sector(IDEState *s, int lba, void *buf)
>> +{
>> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
>> + return -EINVAL;
>> + }
>> +
>> + s->iov.iov_base = buf;
>> + if (s->cd_sector_size == 2352) {
>> + buf += 16;
>> + }
>> +
>> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector: lba=%d\n", lba);
>> +#endif
>> +
>> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> + cd_read_sector_cb, s) == NULL) {
> This function never returns NULL, so the if statement can be removed.
Okay, that was my fault. I believed it could return NULL.
>
> Doesn't the aiocb need to be stored for cancellation (e.g. device reset)?
The ide_readv_cancelable function introduced in patch 3 will store
the aiocb. For now, blk_drain_all is called when the device is reset.
>
>> + return -EIO;
>> + }
>> +
>> + block_acct_start(blk_get_stats(s->blk), &s->acct,
>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> Why does accounting start *after* the read request has been submitted?
You are right, it has to start before the request.
Peter
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-22 16:17 ` Stefan Hajnoczi
@ 2015-11-03 0:48 ` John Snow
2015-11-03 7:03 ` Peter Lieven
1 sibling, 1 reply; 18+ messages in thread
From: John Snow @ 2015-11-03 0:48 UTC (permalink / raw)
To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody
On 10/12/2015 08:27 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has two significant drawbacks. First, the main loop hangs until an
> I/O request is completed and, second, if the I/O request does not
> complete (e.g. due to unresponsive storage) QEMU hangs completely.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 84 insertions(+), 9 deletions(-)
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..2271ea2 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> memset(buf, 0, 288);
> }
>
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static int
> +cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
> {
> int ret;
>
> - switch(sector_size) {
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector_sync: lba=%d\n", lba);
> +#endif
> +
> + switch (s->cd_sector_size) {
> case 2048:
> block_acct_start(blk_get_stats(s->blk), &s->acct,
> 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> ret = -EIO;
> break;
> }
> +
> + if (!ret) {
> + s->lba++;
> + s->io_buffer_index = 0;
> + }
> +
> return ret;
> }
>
> +static void cd_read_sector_cb(void *opaque, int ret)
> +{
> + IDEState *s = opaque;
> +
> + block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
> +#endif
> +
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + return;
> + }
> +
> + if (s->cd_sector_size == 2352) {
> + cd_data_to_raw(s->io_buffer, s->lba);
> + }
> +
> + s->lba++;
> + s->io_buffer_index = 0;
> + s->status &= ~BUSY_STAT;
> +
> + ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf)
> +{
> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
> + return -EINVAL;
> + }
> +
> + s->iov.iov_base = buf;
> + if (s->cd_sector_size == 2352) {
> + buf += 16;
> + }
> +
> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +#ifdef DEBUG_IDE_ATAPI
> + printf("cd_read_sector: lba=%d\n", lba);
> +#endif
> +
> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> + cd_read_sector_cb, s) == NULL) {
> + return -EIO;
> + }
> +
> + block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +
> + s->status |= BUSY_STAT;
> + return 0;
> +}
> +
> void ide_atapi_cmd_ok(IDEState *s)
> {
> s->error = 0;
> @@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
> ide_atapi_cmd_ok(s);
> ide_set_irq(s->bus);
> #ifdef DEBUG_IDE_ATAPI
> - printf("status=0x%x\n", s->status);
> + printf("end of transfer, status=0x%x\n", s->status);
> #endif
> } else {
> /* see if a new sector must be read */
> if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
> - ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
> - if (ret < 0) {
> - ide_atapi_io_error(s, ret);
> + if (!s->elementary_transfer_size) {
> + ret = cd_read_sector(s, s->lba, s->io_buffer);
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + }
> return;
> + } else {
> + /* rebuffering within an elementary transfer is
> + * only possible with a sync request because we
> + * end up with a race condition otherwise */
> + ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
> + if (ret < 0) {
> + ide_atapi_io_error(s, ret);
> + return;
> + }
> }
> - s->lba++;
> - s->io_buffer_index = 0;
> }
> if (s->elementary_transfer_size > 0) {
> /* there are some data left to transmit in this elementary
> @@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
> s->io_buffer_index = sector_size;
> s->cd_sector_size = sector_size;
>
> - s->status = READY_STAT | SEEK_STAT;
> ide_atapi_cmd_reply_end(s);
> }
>
>
This patch looks good to me, apart from Stefan's other comments. Will
you be sending a V3? I can try to merge it this week before the hard
freeze hits.
--js
* Re: [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async
2015-11-03 0:48 ` John Snow
@ 2015-11-03 7:03 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-11-03 7:03 UTC (permalink / raw)
To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody
On 03.11.2015 at 01:48, John Snow wrote:
>
> On 10/12/2015 08:27 AM, Peter Lieven wrote:
>> PIO read requests on the ATAPI interface used to be sync blk requests.
>> This has two significant drawbacks. First, the main loop hangs until an
>> I/O request is completed and, second, if the I/O request does not
>> complete (e.g. due to unresponsive storage) QEMU hangs completely.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/atapi.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 84 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 747f466..2271ea2 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -105,11 +105,16 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>> memset(buf, 0, 288);
>> }
>>
>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> +static int
>> +cd_read_sector_sync(IDEState *s, int lba, uint8_t *buf)
>> {
>> int ret;
>>
>> - switch(sector_size) {
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector_sync: lba=%d\n", lba);
>> +#endif
>> +
>> + switch (s->cd_sector_size) {
>> case 2048:
>> block_acct_start(blk_get_stats(s->blk), &s->acct,
>> 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> @@ -129,9 +134,71 @@ static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> ret = -EIO;
>> break;
>> }
>> +
>> + if (!ret) {
>> + s->lba++;
>> + s->io_buffer_index = 0;
>> + }
>> +
>> return ret;
>> }
>>
>> +static void cd_read_sector_cb(void *opaque, int ret)
>> +{
>> + IDEState *s = opaque;
>> +
>> + block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
>> +#endif
>> +
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + return;
>> + }
>> +
>> + if (s->cd_sector_size == 2352) {
>> + cd_data_to_raw(s->io_buffer, s->lba);
>> + }
>> +
>> + s->lba++;
>> + s->io_buffer_index = 0;
>> + s->status &= ~BUSY_STAT;
>> +
>> + ide_atapi_cmd_reply_end(s);
>> +}
>> +
>> +static int cd_read_sector(IDEState *s, int lba, void *buf)
>> +{
>> + if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
>> + return -EINVAL;
>> + }
>> +
>> + s->iov.iov_base = buf;
>> + if (s->cd_sector_size == 2352) {
>> + buf += 16;
>> + }
>> +
>> + s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> + qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +#ifdef DEBUG_IDE_ATAPI
>> + printf("cd_read_sector: lba=%d\n", lba);
>> +#endif
>> +
>> + if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> + cd_read_sector_cb, s) == NULL) {
>> + return -EIO;
>> + }
>> +
>> + block_acct_start(blk_get_stats(s->blk), &s->acct,
>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> +
>> + s->status |= BUSY_STAT;
>> + return 0;
>> +}
>> +
>> void ide_atapi_cmd_ok(IDEState *s)
>> {
>> s->error = 0;
>> @@ -182,18 +249,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>> ide_atapi_cmd_ok(s);
>> ide_set_irq(s->bus);
>> #ifdef DEBUG_IDE_ATAPI
>> - printf("status=0x%x\n", s->status);
>> + printf("end of transfer, status=0x%x\n", s->status);
>> #endif
>> } else {
>> /* see if a new sector must be read */
>> if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
>> - ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>> - if (ret < 0) {
>> - ide_atapi_io_error(s, ret);
>> + if (!s->elementary_transfer_size) {
>> + ret = cd_read_sector(s, s->lba, s->io_buffer);
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + }
>> return;
>> + } else {
>> + /* rebuffering within an elementary transfer is
>> + * only possible with a sync request because we
>> + * end up with a race condition otherwise */
>> + ret = cd_read_sector_sync(s, s->lba, s->io_buffer);
>> + if (ret < 0) {
>> + ide_atapi_io_error(s, ret);
>> + return;
>> + }
>> }
>> - s->lba++;
>> - s->io_buffer_index = 0;
>> }
>> if (s->elementary_transfer_size > 0) {
>> /* there are some data left to transmit in this elementary
>> @@ -275,7 +351,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>> s->io_buffer_index = sector_size;
>> s->cd_sector_size = sector_size;
>>
>> - s->status = READY_STAT | SEEK_STAT;
>> ide_atapi_cmd_reply_end(s);
>> }
>>
>>
> This patch looks good to me, apart from Stefan's other comments. Will
> you be sending a V3? I can try to merge it this week before the hard
> freeze hits.
I will look at this today.
Peter
* [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-22 16:20 ` Stefan Hajnoczi
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2271ea2..e0cf066 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
+ if (s->bus->dma->aiocb == NULL) {
+ ide_atapi_io_error(s, -EIO);
+ goto eot;
+ }
return;
eot:
--
1.9.1
* Re: [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
@ 2015-10-22 16:20 ` Stefan Hajnoczi
2015-10-23 15:18 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-22 16:20 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:23PM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/atapi.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 2271ea2..e0cf066 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
> s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
> &s->bus->dma->qiov, n * 4,
> ide_atapi_cmd_read_dma_cb, s);
> + if (s->bus->dma->aiocb == NULL) {
> + ide_atapi_io_error(s, -EIO);
> + goto eot;
> + }
Where does blk_aio_readv() return NULL?
* Re: [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL
2015-10-22 16:20 ` Stefan Hajnoczi
@ 2015-10-23 15:18 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-23 15:18 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On 22.10.2015 at 18:20, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 02:27:23PM +0200, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/atapi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 2271ea2..e0cf066 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -429,6 +429,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
>> s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
>> &s->bus->dma->qiov, n * 4,
>> ide_atapi_cmd_read_dma_cb, s);
>> + if (s->bus->dma->aiocb == NULL) {
>> + ide_atapi_io_error(s, -EIO);
>> + goto eot;
>> + }
> Where does blk_aio_readv() return NULL?
Never. My fault.
Peter
* [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-26 10:39 ` Stefan Hajnoczi
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
4 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
This patch adds a new aio readv compatible function which copies
all data through a bounce buffer. The benefit is that these requests
can be flagged as canceled to avoid guest memory corruption when
a canceled request is completed by the backend at a later stage.
If an IDE protocol wants to use this function it has to pipe
all read requests through ide_readv_cancelable and may then
enable requests_cancelable in the IDEState.
If this state is enabled we can avoid the blocking blk_drain_all
in case of a BMDMA reset.
Currently only read operations are cancelable, thus we can only
use this logic for read-only devices.
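The mechanism described above can be modeled in a few self-contained lines (toy stand-ins, not the actual QEMU structures): the backend writes into a private bounce buffer, and both the copy into guest memory and the completion callback are skipped once the request has been flagged as canceled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct BufferedReq {
    char bounce[8];      /* private buffer the backend writes into */
    char *guest_buf;     /* where the data would go if not canceled */
    bool canceled;
    int  cb_ret;         /* records the completion status delivered */
    bool cb_called;
} BufferedReq;

/* Completion path: only touch guest memory for live requests. */
static void buffered_read_cb(BufferedReq *req, int ret)
{
    if (!req->canceled) {
        if (ret == 0) {
            memcpy(req->guest_buf, req->bounce, sizeof(req->bounce));
        }
        req->cb_ret = ret;
        req->cb_called = true;
    }
    /* a real implementation would now unlink and free the request */
}
```

The key property is in the second half of the test: a late completion of a canceled request leaves guest memory untouched, which is exactly what makes skipping blk_drain_all safe.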
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
hw/ide/internal.h | 16 ++++++++++++++++
hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
3 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 317406d..24547ce 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
return true;
}
+static void ide_readv_cancelable_cb(void *opaque, int ret)
+{
+ IDECancelableRequest *req = opaque;
+ if (!req->canceled) {
+ if (!ret) {
+ qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
+ }
+ req->org_cb(req->org_opaque, ret);
+ }
+ QLIST_REMOVE(req, list);
+ qemu_vfree(req->buf);
+ qemu_iovec_destroy(&req->qiov);
+ g_free(req);
+}
+
+#define MAX_CANCELABLE_REQS 16
+
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque)
+{
+ BlockAIOCB *aioreq;
+ IDECancelableRequest *req;
+ int c = 0;
+
+ QLIST_FOREACH(req, &s->cancelable_requests, list) {
+ c++;
+ }
+ if (c > MAX_CANCELABLE_REQS) {
+ return NULL;
+ }
+
+ req = g_new0(IDECancelableRequest, 1);
+ qemu_iovec_init(&req->qiov, 1);
+ req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
+ qemu_iovec_add(&req->qiov, req->buf, iov->size);
+ req->org_qiov = iov;
+ req->org_cb = cb;
+ req->org_opaque = opaque;
+
+ aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
+ ide_readv_cancelable_cb, req);
+ if (aioreq == NULL) {
+ qemu_vfree(req->buf);
+ qemu_iovec_destroy(&req->qiov);
+ g_free(req);
+ } else {
+ QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
+ }
+
+ return aioreq;
+}
+
static void ide_sector_read(IDEState *s);
static void ide_sector_read_cb(void *opaque, int ret)
@@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
s->bus->retry_unit = s->unit;
s->bus->retry_sector_num = ide_get_sector(s);
s->bus->retry_nsector = s->nsector;
+ s->bus->s = s;
if (s->bus->dma->ops->start_dma) {
s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
}
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 05e93ff..ad188c2 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -343,6 +343,16 @@ enum ide_dma_cmd {
#define ide_cmd_is_read(s) \
((s)->dma_cmd == IDE_DMA_READ)
+typedef struct IDECancelableRequest {
+ QLIST_ENTRY(IDECancelableRequest) list;
+ QEMUIOVector qiov;
+ uint8_t *buf;
+ QEMUIOVector *org_qiov;
+ BlockCompletionFunc *org_cb;
+ void *org_opaque;
+ bool canceled;
+} IDECancelableRequest;
+
/* NOTE: IDEState represents in fact one drive */
struct IDEState {
IDEBus *bus;
@@ -396,6 +406,8 @@ struct IDEState {
BlockAIOCB *pio_aiocb;
struct iovec iov;
QEMUIOVector qiov;
+ QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
+ bool requests_cancelable;
/* ATA DMA state */
int32_t io_buffer_offset;
int32_t io_buffer_size;
@@ -468,6 +480,7 @@ struct IDEBus {
uint8_t retry_unit;
int64_t retry_sector_num;
uint32_t retry_nsector;
+ IDEState *s;
};
#define TYPE_IDE_DEVICE "ide-device"
@@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
BlockAIOCB *ide_issue_trim(BlockBackend *blk,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockCompletionFunc *cb, void *opaque);
/* hw/ide/atapi.c */
void ide_atapi_cmd(IDEState *s);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..5587183 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
/* Ignore writes to SSBM if it keeps the old value */
if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
if (!(val & BM_CMD_START)) {
- /*
- * We can't cancel Scatter Gather DMA in the middle of the
- * operation or a partial (not full) DMA transfer would reach
- * the storage so we wait for completion instead (we beahve
- * like if the DMA was completed by the time the guest trying
- * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
- * set).
- *
- * In the future we'll be able to safely cancel the I/O if the
- * whole DMA operation will be submitted to disk with a single
- * aio operation with preadv/pwritev.
- */
if (bm->bus->dma->aiocb) {
- blk_drain_all();
- assert(bm->bus->dma->aiocb == NULL);
+ if (bm->bus->s && bm->bus->s->requests_cancelable) {
+ /*
+ * If the used IDE protocol supports request cancelation we
+ * can flag requests as canceled here and disable DMA.
+ * The IDE protocol used MUST use ide_readv_cancelable for all
+ * read operations and then subsequently can enable this code
+ * path. Currently this is only supported for read-only
+ * devices.
+ */
+ IDECancelableRequest *req;
+ QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
+ if (!req->canceled) {
+ req->org_cb(req->org_opaque, -ECANCELED);
+ }
+ req->canceled = true;
+ }
+ } else {
+ /*
+ * We can't cancel Scatter Gather DMA in the middle of the
+ * operation or a partial (not full) DMA transfer would reach
+ * the storage so we wait for completion instead (we behave
+ * as if the DMA was completed by the time the guest tries
+ * to cancel DMA with bmdma_cmd_writeb with BM_CMD_START not
+ * set).
+ */
+ blk_drain_all();
+ assert(bm->bus->dma->aiocb == NULL);
+ }
}
bm->status &= ~BM_STATUS_DMAING;
} else {
--
1.9.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
@ 2015-10-26 10:39 ` Stefan Hajnoczi
2015-10-27 10:58 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-26 10:39 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
> This patch adds a new aio readv compatible function which copies
> all data through a bounce buffer. The benefit is that these requests
> can be flagged as canceled to avoid guest memory corruption when
> a canceled request is completed by the backend at a later stage.
>
> If an IDE protocol wants to use this function it has to pipe
> all read requests through ide_readv_cancelable and may then
> enable requests_cancelable in the IDEState.
>
> If this state is enabled we can avoid the blocking blk_drain_all
> in case of a BMDMA reset.
>
> Currently only read operations are cancelable, thus we can only
> use this logic for read-only devices.
Naming is confusing here. Requests are already "cancelable" using
bdrv_aio_cancel().
Please use a different name, for example "orphan" requests. These are
requests that QEMU still knows about but the guest believes are
complete. Or maybe "IDEBufferedRequest" since data is transferred
through a bounce buffer.
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/ide/internal.h | 16 ++++++++++++++++
> hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
> 3 files changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 317406d..24547ce 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
> return true;
> }
>
> +static void ide_readv_cancelable_cb(void *opaque, int ret)
> +{
> + IDECancelableRequest *req = opaque;
> + if (!req->canceled) {
> + if (!ret) {
> + qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
> + }
> + req->org_cb(req->org_opaque, ret);
> + }
> + QLIST_REMOVE(req, list);
> + qemu_vfree(req->buf);
> + qemu_iovec_destroy(&req->qiov);
> + g_free(req);
> +}
> +
> +#define MAX_CANCELABLE_REQS 16
> +
> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> + QEMUIOVector *iov, int nb_sectors,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> + BlockAIOCB *aioreq;
> + IDECancelableRequest *req;
> + int c = 0;
> +
> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
> + c++;
> + }
> + if (c > MAX_CANCELABLE_REQS) {
> + return NULL;
> + }
A BH is probably needed here to schedule a cb(-EIO) call since this
function isn't supposed to return NULL if it's a direct replacement for
blk_aio_readv().
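The point about the in-flight limit (MAX_CANCELABLE_REQS in the patch) can be sketched generically, with a toy one-shot deferred-call queue standing in for QEMU's bottom halves: instead of returning NULL when the limit is exceeded, the submission path schedules the caller's callback with -EIO, so the caller sees the same completion path as any other failed request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*ToyCb)(void *opaque, int ret);

/* One pending deferred call, standing in for a scheduled bottom half. */
static ToyCb deferred_cb;
static void *deferred_opaque;
static int deferred_ret;

static void defer_call(ToyCb cb, void *opaque, int ret)
{
    deferred_cb = cb;
    deferred_opaque = opaque;
    deferred_ret = ret;
}

/* Runs pending deferred work, like one main-loop iteration. */
static void run_deferred(void)
{
    if (deferred_cb) {
        ToyCb cb = deferred_cb;
        deferred_cb = NULL;
        cb(deferred_opaque, deferred_ret);
    }
}

/* Submission path when the in-flight limit is exceeded: don't return
 * NULL, report -EIO through the normal completion callback instead. */
static int submit_limited(int in_flight, int max, ToyCb cb, void *opaque)
{
    if (in_flight > max) {
        defer_call(cb, opaque, -EIO);
        return 0;            /* caller treats this as "submitted" */
    }
    /* ... actually submit the read here ... */
    return 0;
}

static int seen_ret;
static void test_cb(void *opaque, int ret) { (void)opaque; seen_ret = ret; }
```

Deferring matters because invoking the callback directly from inside the submission function would re-enter caller code that has not finished setting up its own state yet.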
> +
> + req = g_new0(IDECancelableRequest, 1);
> + qemu_iovec_init(&req->qiov, 1);
It saves a g_new() call if you add a struct iovec field to
IDECancelableRequest and use qemu_iovec_init_external() instead of
qemu_iovec_init().
The qemu_iovec_destroy() calls must be dropped when an external struct
iovec is used.
The qemu_iovec_init_external() call must be moved after the
qemu_blockalign() and struct iovec setup below.
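The allocation-saving suggestion amounts to embedding the struct iovec in the request and pointing the vector at it; a stripped-down model (toy types, not the real qemu_iovec_* API) looks like this:

```c
#include <assert.h>
#include <stddef.h>

/* Toy equivalents of struct iovec / QEMUIOVector for the sketch. */
struct toy_iovec { void *iov_base; size_t iov_len; };
typedef struct { struct toy_iovec *iov; int niov; } ToyQIOV;

typedef struct ToyReq {
    struct toy_iovec iov;   /* embedded: no separate allocation */
    ToyQIOV qiov;
    char buf[512];
} ToyReq;

/* Mirrors the idea of qemu_iovec_init_external(): the vector merely
 * references the caller-owned iovec, so there is nothing extra to
 * allocate up front or destroy on completion. */
static void toy_qiov_init_external(ToyQIOV *q, struct toy_iovec *iov, int n)
{
    q->iov = iov;
    q->niov = n;
}

static void req_setup(ToyReq *req)
{
    req->iov.iov_base = req->buf;      /* buffer and iovec first... */
    req->iov.iov_len = sizeof(req->buf);
    toy_qiov_init_external(&req->qiov, &req->iov, 1);  /* ...then init */
}
```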
> + req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
> + qemu_iovec_add(&req->qiov, req->buf, iov->size);
> + req->org_qiov = iov;
> + req->org_cb = cb;
> + req->org_opaque = opaque;
> +
> + aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
> + ide_readv_cancelable_cb, req);
> + if (aioreq == NULL) {
> + qemu_vfree(req->buf);
> + qemu_iovec_destroy(&req->qiov);
> + g_free(req);
> + } else {
> + QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
> + }
> +
> + return aioreq;
> +}
> +
> static void ide_sector_read(IDEState *s);
>
> static void ide_sector_read_cb(void *opaque, int ret)
> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
> s->bus->retry_unit = s->unit;
> s->bus->retry_sector_num = ide_get_sector(s);
> s->bus->retry_nsector = s->nsector;
> + s->bus->s = s;
How is 's' different from 'unit' and 'retry_unit'?
The logic for switching between units is already a little tricky since
the guest can write to the hardware registers while requests are
in-flight.
Please don't duplicate "active unit" state, that increases the risk of
inconsistencies.
Can you use idebus_active_if() to get an equivalent IDEState pointer
without storing s?
> if (s->bus->dma->ops->start_dma) {
> s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
> }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 05e93ff..ad188c2 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
> #define ide_cmd_is_read(s) \
> ((s)->dma_cmd == IDE_DMA_READ)
>
> +typedef struct IDECancelableRequest {
> + QLIST_ENTRY(IDECancelableRequest) list;
> + QEMUIOVector qiov;
> + uint8_t *buf;
> + QEMUIOVector *org_qiov;
> + BlockCompletionFunc *org_cb;
> + void *org_opaque;
Please don't shorten names, original_* is clearer than org_*.
> + bool canceled;
> +} IDECancelableRequest;
> +
> /* NOTE: IDEState represents in fact one drive */
> struct IDEState {
> IDEBus *bus;
> @@ -396,6 +406,8 @@ struct IDEState {
> BlockAIOCB *pio_aiocb;
> struct iovec iov;
> QEMUIOVector qiov;
> + QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
> + bool requests_cancelable;
> /* ATA DMA state */
> int32_t io_buffer_offset;
> int32_t io_buffer_size;
> @@ -468,6 +480,7 @@ struct IDEBus {
> uint8_t retry_unit;
> int64_t retry_sector_num;
> uint32_t retry_nsector;
> + IDEState *s;
> };
>
> #define TYPE_IDE_DEVICE "ide-device"
> @@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
> BlockAIOCB *ide_issue_trim(BlockBackend *blk,
> int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> BlockCompletionFunc *cb, void *opaque);
> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> + QEMUIOVector *iov, int nb_sectors,
> + BlockCompletionFunc *cb, void *opaque);
>
> /* hw/ide/atapi.c */
> void ide_atapi_cmd(IDEState *s);
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index d31ff88..5587183 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
> /* Ignore writes to SSBM if it keeps the old value */
> if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
> if (!(val & BM_CMD_START)) {
> - /*
> - * We can't cancel Scatter Gather DMA in the middle of the
> - * operation or a partial (not full) DMA transfer would reach
> - * the storage so we wait for completion instead (we beahve
> - * like if the DMA was completed by the time the guest trying
> - * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> - * set).
> - *
> - * In the future we'll be able to safely cancel the I/O if the
> - * whole DMA operation will be submitted to disk with a single
> - * aio operation with preadv/pwritev.
> - */
> if (bm->bus->dma->aiocb) {
> - blk_drain_all();
> - assert(bm->bus->dma->aiocb == NULL);
> + if (bm->bus->s && bm->bus->s->requests_cancelable) {
> + /*
> + * If the used IDE protocol supports request cancelation we
> + * can flag requests as canceled here and disable DMA.
> + * The IDE protocol used MUST use ide_readv_cancelable for all
> + * read operations and then subsequently can enable this code
> + * path. Currently this is only supported for read-only
> + * devices.
> + */
> + IDECancelableRequest *req;
> + QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
> + if (!req->canceled) {
> + req->org_cb(req->org_opaque, -ECANCELED);
> + }
> + req->canceled = true;
> + }
> + } else {
> + /*
> + * We can't cancel Scatter Gather DMA in the middle of the
> + * operation or a partial (not full) DMA transfer would reach
> + * the storage so we wait for completion instead (we beahve
> + * like if the DMA was completed by the time the guest trying
> + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> + * set).
> + */
> + blk_drain_all();
> + assert(bm->bus->dma->aiocb == NULL);
This assertion applies in both branches of the if statement, so it could be
moved after the if statement.
> + }
> }
> bm->status &= ~BM_STATUS_DMAING;
> } else {
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-26 10:39 ` Stefan Hajnoczi
@ 2015-10-27 10:58 ` Peter Lieven
2015-10-28 11:26 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-27 10:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On 26.10.2015 at 11:39, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
>> this patch adds a new aio readv compatible function which copies
>> all data through a bounce buffer. The benefit is that these requests
>> can be flagged as canceled to avoid guest memory corruption when
>> a canceled request is completed by the backend at a later stage.
>>
>> If an IDE protocol wants to use this function it has to pipe
>> all read requests through ide_readv_cancelable and it may then
>> enable requests_cancelable in the IDEState.
>>
>> If this state is enable we can avoid the blocking blk_drain_all
>> in case of a BMDMA reset.
>>
>> Currently only read operations are cancelable thus we can only
>> use this logic for read-only devices.
> Naming is confusing here. Requests are already "cancelable" using
> bdv_aio_cancel().
>
> Please use a different name, for example "orphan" requests. These are
> requests that QEMU still knows about but the guest believes are
> complete. Or maybe "IDEBufferedRequest" since data is transferred
> through a bounce buffer.
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> hw/ide/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> hw/ide/internal.h | 16 ++++++++++++++++
>> hw/ide/pci.c | 42 ++++++++++++++++++++++++++++--------------
>> 3 files changed, 98 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 317406d..24547ce 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
>> return true;
>> }
>>
>> +static void ide_readv_cancelable_cb(void *opaque, int ret)
>> +{
>> + IDECancelableRequest *req = opaque;
>> + if (!req->canceled) {
>> + if (!ret) {
>> + qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
>> + }
>> + req->org_cb(req->org_opaque, ret);
>> + }
>> + QLIST_REMOVE(req, list);
>> + qemu_vfree(req->buf);
>> + qemu_iovec_destroy(&req->qiov);
>> + g_free(req);
>> +}
>> +
>> +#define MAX_CANCELABLE_REQS 16
>> +
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> + QEMUIOVector *iov, int nb_sectors,
>> + BlockCompletionFunc *cb, void *opaque)
>> +{
>> + BlockAIOCB *aioreq;
>> + IDECancelableRequest *req;
>> + int c = 0;
>> +
>> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
>> + c++;
>> + }
>> + if (c > MAX_CANCELABLE_REQS) {
>> + return NULL;
>> + }
> A BH is probably needed here to schedule a cb(-EIO) call since this
> function isn't supposed to return NULL if it's a direct replacement for
> blk_aio_readv().
You mean something like:
acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
acb->ret = -EIO;
qemu_bh_schedule(acb->bh);
return &acb->common;
>
>> +
>> + req = g_new0(IDECancelableRequest, 1);
>> + qemu_iovec_init(&req->qiov, 1);
> It saves a g_new() call if you add a struct iovec field to
> IDECancelableRequest and use qemu_iovec_init_external() instead of
> qemu_iovec_init().
>
> The qemu_iovec_destroy() calls must be dropped when an external struct
> iovec is used.
>
> The qemu_iovec_init_external() call must be moved after the
> qemu_blockalign() and struct iovec setup below.
okay
>
>> + req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
>> + qemu_iovec_add(&req->qiov, req->buf, iov->size);
>> + req->org_qiov = iov;
>> + req->org_cb = cb;
>> + req->org_opaque = opaque;
>> +
>> + aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
>> + ide_readv_cancelable_cb, req);
>> + if (aioreq == NULL) {
>> + qemu_vfree(req->buf);
>> + qemu_iovec_destroy(&req->qiov);
>> + g_free(req);
>> + } else {
>> + QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
>> + }
>> +
>> + return aioreq;
>> +}
>> +
>> static void ide_sector_read(IDEState *s);
>>
>> static void ide_sector_read_cb(void *opaque, int ret)
>> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
>> s->bus->retry_unit = s->unit;
>> s->bus->retry_sector_num = ide_get_sector(s);
>> s->bus->retry_nsector = s->nsector;
>> + s->bus->s = s;
> How is 's' different from 'unit' and 'retry_unit'?
>
> The logic for switching between units is already a little tricky since
> the guest can write to the hardware registers while requests are
> in-flight.
>
> Please don't duplicate "active unit" state, that increases the risk of
> inconsistencies.
>
> Can you use idebus_active_if() to get an equivalent IDEState pointer
> without storing s?
That should be possible.
>
>> if (s->bus->dma->ops->start_dma) {
>> s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
>> }
>> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
>> index 05e93ff..ad188c2 100644
>> --- a/hw/ide/internal.h
>> +++ b/hw/ide/internal.h
>> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
>> #define ide_cmd_is_read(s) \
>> ((s)->dma_cmd == IDE_DMA_READ)
>>
>> +typedef struct IDECancelableRequest {
>> + QLIST_ENTRY(IDECancelableRequest) list;
>> + QEMUIOVector qiov;
>> + uint8_t *buf;
>> + QEMUIOVector *org_qiov;
>> + BlockCompletionFunc *org_cb;
>> + void *org_opaque;
> Please don't shorten names, original_* is clearer than org_*.
Ok.
>
>> + bool canceled;
>> +} IDECancelableRequest;
>> +
>> /* NOTE: IDEState represents in fact one drive */
>> struct IDEState {
>> IDEBus *bus;
>> @@ -396,6 +406,8 @@ struct IDEState {
>> BlockAIOCB *pio_aiocb;
>> struct iovec iov;
>> QEMUIOVector qiov;
>> + QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
>> + bool requests_cancelable;
>> /* ATA DMA state */
>> int32_t io_buffer_offset;
>> int32_t io_buffer_size;
>> @@ -468,6 +480,7 @@ struct IDEBus {
>> uint8_t retry_unit;
>> int64_t retry_sector_num;
>> uint32_t retry_nsector;
>> + IDEState *s;
>> };
>>
>> #define TYPE_IDE_DEVICE "ide-device"
>> @@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
>> BlockAIOCB *ide_issue_trim(BlockBackend *blk,
>> int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>> BlockCompletionFunc *cb, void *opaque);
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> + QEMUIOVector *iov, int nb_sectors,
>> + BlockCompletionFunc *cb, void *opaque);
>>
>> /* hw/ide/atapi.c */
>> void ide_atapi_cmd(IDEState *s);
>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>> index d31ff88..5587183 100644
>> --- a/hw/ide/pci.c
>> +++ b/hw/ide/pci.c
>> @@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>> /* Ignore writes to SSBM if it keeps the old value */
>> if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>> if (!(val & BM_CMD_START)) {
>> - /*
>> - * We can't cancel Scatter Gather DMA in the middle of the
>> - * operation or a partial (not full) DMA transfer would reach
>> - * the storage so we wait for completion instead (we beahve
>> - * like if the DMA was completed by the time the guest trying
>> - * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> - * set).
>> - *
>> - * In the future we'll be able to safely cancel the I/O if the
>> - * whole DMA operation will be submitted to disk with a single
>> - * aio operation with preadv/pwritev.
>> - */
>> if (bm->bus->dma->aiocb) {
>> - blk_drain_all();
>> - assert(bm->bus->dma->aiocb == NULL);
>> + if (bm->bus->s && bm->bus->s->requests_cancelable) {
>> + /*
>> + * If the used IDE protocol supports request cancelation we
>> + * can flag requests as canceled here and disable DMA.
>> + * The IDE protocol used MUST use ide_readv_cancelable for all
>> + * read operations and then subsequently can enable this code
>> + * path. Currently this is only supported for read-only
>> + * devices.
>> + */
>> + IDECancelableRequest *req;
>> + QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
>> + if (!req->canceled) {
>> + req->org_cb(req->org_opaque, -ECANCELED);
>> + }
>> + req->canceled = true;
>> + }
>> + } else {
>> + /*
>> + * We can't cancel Scatter Gather DMA in the middle of the
>> + * operation or a partial (not full) DMA transfer would reach
>> + * the storage so we wait for completion instead (we beahve
>> + * like if the DMA was completed by the time the guest trying
>> + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> + * set).
>> + */
>> + blk_drain_all();
>> + assert(bm->bus->dma->aiocb == NULL);
> This assertion applies in both branches of the if statement, so it could be
> moved after the if statement.
Right.
As pointed out in my comment to your question about write/discard, I think it should
be feasible to use buffered readv requests for all read-only IDE devices.
The only thing I'm unsure about is reopening. A reopen seems to only flush the device,
not drain all requests.
Peter
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-27 10:58 ` Peter Lieven
@ 2015-10-28 11:26 ` Stefan Hajnoczi
2015-10-28 19:56 ` Peter Lieven
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-28 11:26 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Tue, Oct 27, 2015 at 11:58:55AM +0100, Peter Lieven wrote:
> On 26.10.2015 at 11:39, Stefan Hajnoczi wrote:
> >On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
> >>+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> >>+ QEMUIOVector *iov, int nb_sectors,
> >>+ BlockCompletionFunc *cb, void *opaque)
> >>+{
> >>+ BlockAIOCB *aioreq;
> >>+ IDECancelableRequest *req;
> >>+ int c = 0;
> >>+
> >>+ QLIST_FOREACH(req, &s->cancelable_requests, list) {
> >>+ c++;
> >>+ }
> >>+ if (c > MAX_CANCELABLE_REQS) {
> >>+ return NULL;
> >>+ }
> >A BH is probably needed here to schedule an cb(-EIO) call since this
> >function isn't supposed to return NULL if it's a direct replacement for
> >blk_aio_readv().
>
> You mean something like:
>
> acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
> acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
> acb->ret = -EIO;
> qemu_bh_schedule(acb->bh);
>
> return &acb->common;
Yes.
> As pointed out in my comment to your question about write/discard, I think it should
> be feasible to use buffered readv requests for all read-only IDE devices.
> The only thing I'm unsure about is reopening. A reopen seems to only flush the device,
> not drain all requests.
bdrv_reopen_prepare() callers should drain requests. For example,
bdrv_reopen_multiple() (and indirectly bdrv_reopen()) call
bdrv_drain_all(). Is this what you mean?
* Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
2015-10-28 11:26 ` Stefan Hajnoczi
@ 2015-10-28 19:56 ` Peter Lieven
0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-28 19:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On 28.10.2015 at 12:26, Stefan Hajnoczi wrote:
> On Tue, Oct 27, 2015 at 11:58:55AM +0100, Peter Lieven wrote:
>> On 26.10.2015 at 11:39, Stefan Hajnoczi wrote:
>>> On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
>>>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>>>> + QEMUIOVector *iov, int nb_sectors,
>>>> + BlockCompletionFunc *cb, void *opaque)
>>>> +{
>>>> + BlockAIOCB *aioreq;
>>>> + IDECancelableRequest *req;
>>>> + int c = 0;
>>>> +
>>>> + QLIST_FOREACH(req, &s->cancelable_requests, list) {
>>>> + c++;
>>>> + }
>>>> + if (c > MAX_CANCELABLE_REQS) {
>>>> + return NULL;
>>>> + }
>>> A BH is probably needed here to schedule a cb(-EIO) call since this
>>> function isn't supposed to return NULL if it's a direct replacement for
>>> blk_aio_readv().
>> You mean something like:
>>
>> acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
>> acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
>> acb->ret = -EIO;
>> qemu_bh_schedule(acb->bh);
>>
>> return &acb->common;
> Yes.
>
>> As pointed out in my comment to your question about write/discard, I think it should
>> be feasible to use buffered readv requests for all read-only IDE devices.
>> The only thing I'm unsure about is reopening. A reopen seems to only flush the device,
>> not drain all requests.
> bdrv_reopen_prepare() callers should drain requests. For example,
> bdrv_reopen_multiple() (and indirectly bdrv_reopen()) call
> bdrv_drain_all(). Is this what you mean?
Yes, I have only found a flush in bdrv_reopen_prepare, but if you say callers need to drain first,
I think it is safe to use buffered_ide_readv for all read-only IDE devices, not only CDROMs.
Peter
* [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
` (2 preceding siblings ...)
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
@ 2015-10-12 12:27 ` Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
4 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2015-10-12 12:27 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de>
---
hw/ide/atapi.c | 4 ++--
hw/ide/core.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index e0cf066..8d38b1d 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -187,7 +187,7 @@ static int cd_read_sector(IDEState *s, int lba, void *buf)
printf("cd_read_sector: lba=%d\n", lba);
#endif
- if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+ if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
cd_read_sector_cb, s) == NULL) {
return -EIO;
}
@@ -426,7 +426,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
s->bus->dma->iov.iov_len = n * 4 * 512;
qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
- s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
+ s->bus->dma->aiocb = ide_readv_cancelable(s, (int64_t)s->lba << 2,
&s->bus->dma->qiov, n * 4,
ide_atapi_cmd_read_dma_cb, s);
if (s->bus->dma->aiocb == NULL) {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 24547ce..5c7a346 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2330,6 +2330,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind,
if (kind == IDE_CD) {
blk_set_dev_ops(blk, &ide_cd_block_ops, s);
blk_set_guest_block_size(blk, 2048);
+ s->requests_cancelable = true;
} else {
if (!blk_is_inserted(s->blk)) {
error_report("Device needs media, but drive is empty");
--
1.9.1
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
` (3 preceding siblings ...)
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
@ 2015-10-26 10:42 ` Stefan Hajnoczi
2015-10-26 10:56 ` Peter Lieven
4 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-26 10:42 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
> This series aims at avoiding a hanging main-loop if a vserver has a
> CDROM image mounted from a NFS share and that NFS share goes down.
> Typical situation is that users mount a CDROM ISO to install something
> and then forget to eject that CDROM afterwards.
> As a consequence this mounted CD is able to bring down the
> whole vserver if the backend NFS share is unreachable. This is bad
> especially if the CDROM itself is not needed anymore at this point.
>
> This series aims at fixing 2 blocking I/O operations that would
> hang if the NFS server is unavailable:
> - ATAPI PIO read requests used sync calls to blk_read, convert
> them to an async variant where possible.
> - If a busmaster DMA request is cancelled all requests are drained.
> Convert the drain to an async request canceling.
>
> v1->v2: - fix offset for 2352 byte sector size [Kevin]
> - use a sync request if we continue an elementary transfer.
> As John pointed out we enter a race condition between next
> IDE command and async transfer otherwise. This is still not
> optimal, but it fixes the NFS down problems for all cases where
> the NFS server goes down while there is no PIO CD activity.
> Of course, it could still happen during a PIO transfer, but I
> expect this to be the unlikelier case.
> I spent some effort trying to read more sectors at once and
> avoiding continuation of elementary transfers, but with
> whatever I came up it was destroying migration between different
> Qemu versions. I have a quite hackish patch that works and
> should survive migration, but I am not happy with it. So I
> would like to start with this version as it is a big improvement
> already.
> - Dropped Patch 5 because it is upstream meanwhile.
>
> Peter Lieven (4):
> ide/atapi: make PIO read requests async
> ide/atapi: blk_aio_readv may return NULL
> ide: add support for cancelable read requests
> ide/atapi: enable cancelable requests
>
> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
> hw/ide/internal.h | 16 +++++++++
> hw/ide/pci.c | 42 +++++++++++++++--------
> 4 files changed, 188 insertions(+), 24 deletions(-)
Any reason why write and discard requests aren't covered in this series?
If this is a good idea for CD-ROM it should be a good idea for all PCI
IDE devices.
Having a specialized code path is often a sign that it hasn't been
tested enough. Can we get confident enough to enable this everywhere?
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
@ 2015-10-26 10:56 ` Peter Lieven
2015-10-28 11:27 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2015-10-26 10:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On 26.10.2015 at 11:42, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
>> This series aims at avoiding a hanging main-loop if a vserver has a
>> CDROM image mounted from a NFS share and that NFS share goes down.
>> Typical situation is that users mount a CDROM ISO to install something
>> and then forget to eject that CDROM afterwards.
>> As a consequence this mounted CD is able to bring down the
>> whole vserver if the backend NFS share is unreachable. This is bad
>> especially if the CDROM itself is not needed anymore at this point.
>>
>> This series aims at fixing 2 blocking I/O operations that would
>> hang if the NFS server is unavailable:
>> - ATAPI PIO read requests used sync calls to blk_read, convert
>> them to an async variant where possible.
>> - If a busmaster DMA request is cancelled all requests are drained.
>> Convert the drain to an async request canceling.
>>
>> v1->v2: - fix offset for 2352 byte sector size [Kevin]
>> - use a sync request if we continue an elementary transfer.
>> As John pointed out we enter a race condition between next
>> IDE command and async transfer otherwise. This is still not
>> optimal, but it fixes the NFS down problems for all cases where
>> the NFS server goes down while there is no PIO CD activity.
>> Of course, it could still happen during a PIO transfer, but I
>> expect this to be the unlikelier case.
>> I spent some effort trying to read more sectors at once and
>> avoiding continuation of elementary transfers, but with
>> whatever I came up it was destroying migration between different
>> Qemu versions. I have a quite hackish patch that works and
>> should survive migration, but I am not happy with it. So I
>> would like to start with this version as it is a big improvement
>> already.
>> - Dropped Patch 5 because it is upstream meanwhile.
>>
>> Peter Lieven (4):
>> ide/atapi: make PIO read requests async
>> ide/atapi: blk_aio_readv may return NULL
>> ide: add support for cancelable read requests
>> ide/atapi: enable cancelable requests
>>
>> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
>> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
>> hw/ide/internal.h | 16 +++++++++
>> hw/ide/pci.c | 42 +++++++++++++++--------
>> 4 files changed, 188 insertions(+), 24 deletions(-)
> Any reason why write and discard requests aren't covered in this series?
>
> If this is a good idea for CD-ROM it should be a good idea for all PCI
> IDE devices.
>
> Having a specialized code path is often a sign that it hasn't been
> tested enough. Can we get confident enough to enable this everywhere?
The reason is that the buffered request trick only works for
read-only devices (like a CDROM). A write request that is completed
on the backend storage at a later point (after the OS thinks the request
is canceled) can cause damage to the filesystem.
Peter
* Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure
2015-10-26 10:56 ` Peter Lieven
@ 2015-10-28 11:27 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2015-10-28 11:27 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, jcody, jsnow, qemu-devel, qemu-block
On Mon, Oct 26, 2015 at 11:56:26AM +0100, Peter Lieven wrote:
> On 26.10.2015 at 11:42, Stefan Hajnoczi wrote:
> >On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
> >>This series aims at avoiding a hanging main-loop if a vserver has a
> >>CDROM image mounted from a NFS share and that NFS share goes down.
> >>Typical situation is that users mount a CDROM ISO to install something
> >>and then forget to eject that CDROM afterwards.
> >>As a consequence this mounted CD is able to bring down the
> >>whole vserver if the backend NFS share is unreachable. This is bad
> >>especially if the CDROM itself is not needed anymore at this point.
> >>
> >>This series aims at fixing 2 blocking I/O operations that would
> >>hang if the NFS server is unavailable:
> >> - ATAPI PIO read requests used sync calls to blk_read, convert
> >> them to an async variant where possible.
> >> - If a busmaster DMA request is cancelled all requests are drained.
> >> Convert the drain to an async request canceling.
> >>
> >>v1->v2: - fix offset for 2352 byte sector size [Kevin]
> >> - use a sync request if we continue an elementary transfer.
> >> As John pointed out we enter a race condition between next
> >> IDE command and async transfer otherwise. This is still not
> >> optimal, but it fixes the NFS down problems for all cases where
> >> the NFS server goes down while there is no PIO CD activity.
> >> Of course, it could still happen during a PIO transfer, but I
> >> expect this to be the unlikelier case.
> >> I spent some effort trying to read more sectors at once and
> >> avoiding continuation of elementary transfers, but with
> >> whatever I came up it was destroying migration between different
> >> Qemu versions. I have a quite hackish patch that works and
> >> should survive migration, but I am not happy with it. So I
> >> would like to start with this version as it is a big improvement
> >> already.
> >> - Dropped Patch 5 because it is upstream meanwhile.
> >>
> >>Peter Lieven (4):
> >> ide/atapi: make PIO read requests async
> >> ide/atapi: blk_aio_readv may return NULL
> >> ide: add support for cancelable read requests
> >> ide/atapi: enable cancelable requests
> >>
> >> hw/ide/atapi.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++------
> >> hw/ide/core.c | 55 +++++++++++++++++++++++++++++++
> >> hw/ide/internal.h | 16 +++++++++
> >> hw/ide/pci.c | 42 +++++++++++++++--------
> >> 4 files changed, 188 insertions(+), 24 deletions(-)
> >Any reason why write and discard requests aren't covered in this series?
> >
> >If this is a good idea for CD-ROM it should be a good idea for all PCI
> >IDE devices.
> >
> >Having a specialized code path is often a sign that it hasn't been
> >tested enough. Can we get confident enough to enable this everywhere?
>
> The reason is that the buffered request trick only works for
> read-only devices (like a CDROM). A write request that is completed
> on the backend storage at a later point (after the OS thinks the request
> is canceled) can cause damage to the filesystem.
Of course, you are right.
This is really annoying because it means a guest cannot reboot if writes
are pending...
Stefan