[Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure
@ 2015-09-21 12:25 Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

This series aims at avoiding a hanging main-loop if a vserver has a
CDROM image mounted from a NFS share and that NFS share goes down.
Typical situation is that users mount an CDROM ISO to install something
and then forget to eject that CDROM afterwards.
As a consequence this mounted CD is able to bring down the
whole vserver if the backend NFS share is unreachable. This is bad
especially if the CDROM itself is not needed anymore at this point.

This series aims at fixing 3 blocking I/O operations that would
hang if the NFS server is unavailable:
 - ATAPI PIO read requests used sync calls to blk_read, convert
   them to an async variant.
 - If a busmaster DMA request is cancelled all requests are drained.
   Convert the drain to an async request canceling.
 - query-block in the HMP or QMP hangs because it indirectly calls
   bdrv_get_allocated_file_size.

Note that Patch 5 is only included for completeness.

Peter

Peter Lieven (5):
  ide/atapi: make PIO read requests async
  ide/atapi: blk_aio_readv may return NULL
  ide: add support for cancelable read requests
  ide/atapi: enable cancelable requests
  block/nfs: cache allocated filesize for read-only files

 block/nfs.c       | 36 ++++++++++++++++++++++++++
 hw/ide/atapi.c    | 75 +++++++++++++++++++++++++++++++++++--------------------
 hw/ide/core.c     | 55 ++++++++++++++++++++++++++++++++++++++++
 hw/ide/internal.h | 16 ++++++++++++
 hw/ide/pci.c      | 42 ++++++++++++++++++++-----------
 5 files changed, 183 insertions(+), 41 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
@ 2015-09-21 12:25 ` Peter Lieven
  2015-10-02 21:02   ` John Snow
                     ` (2 more replies)
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 2/5] ide/atapi: blk_aio_readv may return NULL Peter Lieven
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

PIO read requests on the ATAPI interface used to be sync blk requests.
This has to siginificant drawbacks. First the main loop hangs util an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..9257e1c 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
     memset(buf, 0, 288);
 }
 
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static void cd_read_sector_cb(void *opaque, int ret)
 {
-    int ret;
+    IDEState *s = opaque;
 
-    switch(sector_size) {
-    case 2048:
-        block_acct_start(blk_get_stats(s->blk), &s->acct,
-                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-        block_acct_done(blk_get_stats(s->blk), &s->acct);
-        break;
-    case 2352:
-        block_acct_start(blk_get_stats(s->blk), &s->acct,
-                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-        block_acct_done(blk_get_stats(s->blk), &s->acct);
-        if (ret < 0)
-            return ret;
-        cd_data_to_raw(buf, lba);
-        break;
-    default:
-        ret = -EIO;
-        break;
+    block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+    if (ret < 0) {
+        ide_atapi_io_error(s, ret);
+        return;
+    }
+
+    if (s->cd_sector_size == 2352) {
+        cd_data_to_raw(s->io_buffer, s->lba);
     }
-    return ret;
+
+    s->lba++;
+    s->io_buffer_index = 0;
+    s->status &= ~BUSY_STAT;
+
+    ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+{
+    if (sector_size != 2048 && sector_size != 2352) {
+        return -EINVAL;
+    }
+
+    s->iov.iov_base = buf;
+    if (sector_size == 2352) {
+        buf += 4;
+    }
+
+    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+                      cd_read_sector_cb, s) == NULL) {
+        return -EIO;
+    }
+
+    block_acct_start(blk_get_stats(s->blk), &s->acct,
+                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+    s->status |= BUSY_STAT;
+    return 0;
 }
 
 void ide_atapi_cmd_ok(IDEState *s)
@@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
             ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
             if (ret < 0) {
                 ide_atapi_io_error(s, ret);
-                return;
             }
-            s->lba++;
-            s->io_buffer_index = 0;
+            return;
         }
         if (s->elementary_transfer_size > 0) {
             /* there are some data left to transmit in this elementary
@@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
     s->io_buffer_index = sector_size;
     s->cd_sector_size = sector_size;
 
-    s->status = READY_STAT | SEEK_STAT;
     ide_atapi_cmd_reply_end(s);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 2/5] ide/atapi: blk_aio_readv may return NULL
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-09-21 12:25 ` Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 3/5] ide: add support for cancelable read requests Peter Lieven
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/ide/atapi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 9257e1c..b209ed9 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -371,6 +371,10 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
     s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
                                        &s->bus->dma->qiov, n * 4,
                                        ide_atapi_cmd_read_dma_cb, s);
+    if (s->bus->dma->aiocb == NULL) {
+        ide_atapi_io_error(s, -EIO);
+        goto eot;
+    }
     return;
 
 eot:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 3/5] ide: add support for cancelable read requests
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 2/5] ide/atapi: blk_aio_readv may return NULL Peter Lieven
@ 2015-09-21 12:25 ` Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 4/5] ide/atapi: enable cancelable requests Peter Lieven
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

this patch adds a new aio readv compatible function which copies
all data through a bounce buffer. The benefit is that these requests
can be flagged as canceled to avoid guest memory corruption when
a canceled request is completed by the backend at a later stage.

If an IDE protocol wants to use this function it has to pipe
all read requests through ide_readv_cancelable and it may then
enable requests_cancelable in the IDEState.

If this state is enable we can avoid the blocking blk_drain_all
in case of a BMDMA reset.

Currently only read operations are cancelable thus we can only
use this logic for read-only devices.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/ide/core.c     | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ide/internal.h | 16 ++++++++++++++++
 hw/ide/pci.c      | 42 ++++++++++++++++++++++++++++--------------
 3 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 317406d..24547ce 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
     return true;
 }
 
+static void ide_readv_cancelable_cb(void *opaque, int ret)
+{
+    IDECancelableRequest *req = opaque;
+    if (!req->canceled) {
+        if (!ret) {
+            qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
+        }
+        req->org_cb(req->org_opaque, ret);
+    }
+    QLIST_REMOVE(req, list);
+    qemu_vfree(req->buf);
+    qemu_iovec_destroy(&req->qiov);
+    g_free(req);
+}
+
+#define MAX_CANCELABLE_REQS 16
+
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockCompletionFunc *cb, void *opaque)
+{
+    BlockAIOCB *aioreq;
+    IDECancelableRequest *req;
+    int c = 0;
+
+    QLIST_FOREACH(req, &s->cancelable_requests, list) {
+        c++;
+    }
+    if (c > MAX_CANCELABLE_REQS) {
+        return NULL;
+    }
+
+    req = g_new0(IDECancelableRequest, 1);
+    qemu_iovec_init(&req->qiov, 1);
+    req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
+    qemu_iovec_add(&req->qiov, req->buf, iov->size);
+    req->org_qiov = iov;
+    req->org_cb = cb;
+    req->org_opaque = opaque;
+
+    aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
+                           ide_readv_cancelable_cb, req);
+    if (aioreq == NULL) {
+        qemu_vfree(req->buf);
+        qemu_iovec_destroy(&req->qiov);
+        g_free(req);
+    } else {
+        QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
+    }
+
+    return aioreq;
+}
+
 static void ide_sector_read(IDEState *s);
 
 static void ide_sector_read_cb(void *opaque, int ret)
@@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
     s->bus->retry_unit = s->unit;
     s->bus->retry_sector_num = ide_get_sector(s);
     s->bus->retry_nsector = s->nsector;
+    s->bus->s = s;
     if (s->bus->dma->ops->start_dma) {
         s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
     }
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 05e93ff..ad188c2 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -343,6 +343,16 @@ enum ide_dma_cmd {
 #define ide_cmd_is_read(s) \
 	((s)->dma_cmd == IDE_DMA_READ)
 
+typedef struct IDECancelableRequest {
+    QLIST_ENTRY(IDECancelableRequest) list;
+    QEMUIOVector qiov;
+    uint8_t *buf;
+    QEMUIOVector *org_qiov;
+    BlockCompletionFunc *org_cb;
+    void *org_opaque;
+    bool canceled;
+} IDECancelableRequest;
+
 /* NOTE: IDEState represents in fact one drive */
 struct IDEState {
     IDEBus *bus;
@@ -396,6 +406,8 @@ struct IDEState {
     BlockAIOCB *pio_aiocb;
     struct iovec iov;
     QEMUIOVector qiov;
+    QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
+    bool requests_cancelable;
     /* ATA DMA state */
     int32_t io_buffer_offset;
     int32_t io_buffer_size;
@@ -468,6 +480,7 @@ struct IDEBus {
     uint8_t retry_unit;
     int64_t retry_sector_num;
     uint32_t retry_nsector;
+    IDEState *s;
 };
 
 #define TYPE_IDE_DEVICE "ide-device"
@@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
 BlockAIOCB *ide_issue_trim(BlockBackend *blk,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockCompletionFunc *cb, void *opaque);
 
 /* hw/ide/atapi.c */
 void ide_atapi_cmd(IDEState *s);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..5587183 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
     /* Ignore writes to SSBM if it keeps the old value */
     if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
         if (!(val & BM_CMD_START)) {
-            /*
-             * We can't cancel Scatter Gather DMA in the middle of the
-             * operation or a partial (not full) DMA transfer would reach
-             * the storage so we wait for completion instead (we beahve
-             * like if the DMA was completed by the time the guest trying
-             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
-             * set).
-             *
-             * In the future we'll be able to safely cancel the I/O if the
-             * whole DMA operation will be submitted to disk with a single
-             * aio operation with preadv/pwritev.
-             */
             if (bm->bus->dma->aiocb) {
-                blk_drain_all();
-                assert(bm->bus->dma->aiocb == NULL);
+                if (bm->bus->s && bm->bus->s->requests_cancelable) {
+                    /*
+                     * If the used IDE protocol supports request cancelation we
+                     * can flag requests as canceled here and disable DMA.
+                     * The IDE protocol used MUST use ide_readv_cancelable for all
+                     * read operations and then subsequently can enable this code
+                     * path. Currently this is only supported for read-only
+                     * devices.
+                    */
+                    IDECancelableRequest *req;
+                    QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
+                        if (!req->canceled) {
+                            req->org_cb(req->org_opaque, -ECANCELED);
+                        }
+                        req->canceled = true;
+                    }
+                } else {
+                    /*
+                     * We can't cancel Scatter Gather DMA in the middle of the
+                     * operation or a partial (not full) DMA transfer would reach
+                     * the storage so we wait for completion instead (we beahve
+                     * like if the DMA was completed by the time the guest trying
+                     * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
+                     * set).
+                     */
+                    blk_drain_all();
+                    assert(bm->bus->dma->aiocb == NULL);
+                }
             }
             bm->status &= ~BM_STATUS_DMAING;
         } else {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 4/5] ide/atapi: enable cancelable requests
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
                   ` (2 preceding siblings ...)
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 3/5] ide: add support for cancelable read requests Peter Lieven
@ 2015-09-21 12:25 ` Peter Lieven
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 5/5] block/nfs: cache allocated filesize for read-only files Peter Lieven
  2015-09-21 20:58 ` [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure John Snow
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/ide/atapi.c | 4 ++--
 hw/ide/core.c  | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index b209ed9..ab45495 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -141,7 +141,7 @@ static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
     s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
     qemu_iovec_init_external(&s->qiov, &s->iov, 1);
 
-    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+    if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
                       cd_read_sector_cb, s) == NULL) {
         return -EIO;
     }
@@ -368,7 +368,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
     s->bus->dma->iov.iov_len = n * 4 * 512;
     qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
 
-    s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
+    s->bus->dma->aiocb = ide_readv_cancelable(s, (int64_t)s->lba << 2,
                                        &s->bus->dma->qiov, n * 4,
                                        ide_atapi_cmd_read_dma_cb, s);
     if (s->bus->dma->aiocb == NULL) {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 24547ce..5c7a346 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2330,6 +2330,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, IDEDriveKind kind,
     if (kind == IDE_CD) {
         blk_set_dev_ops(blk, &ide_cd_block_ops, s);
         blk_set_guest_block_size(blk, 2048);
+        s->requests_cancelable = true;
     } else {
         if (!blk_is_inserted(s->blk)) {
             error_report("Device needs media, but drive is empty");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH 5/5] block/nfs: cache allocated filesize for read-only files
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
                   ` (3 preceding siblings ...)
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 4/5] ide/atapi: enable cancelable requests Peter Lieven
@ 2015-09-21 12:25 ` Peter Lieven
  2015-09-21 20:58 ` [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure John Snow
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow, Peter Lieven

If the file is readonly its not expected to grow so
save the blocking call to nfs_fstat_async and use
the value saved at connection time. Also important
the monitor (and thus the main loop) will not hang
if block device info is queried and the NFS share
is unresponsive.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/nfs.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index c026ff6..5ffd19f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -43,6 +43,7 @@ typedef struct NFSClient {
     int events;
     bool has_zero_init;
     AioContext *aio_context;
+    blkcnt_t st_blocks;
 } NFSClient;
 
 typedef struct NFSRPC {
@@ -374,6 +375,7 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
     }
 
     ret = DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE);
+    client->st_blocks = st.st_blocks;
     client->has_zero_init = S_ISREG(st.st_mode);
     goto out;
 fail:
@@ -464,6 +466,11 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
     NFSRPC task = {0};
     struct stat st;
 
+    if (bdrv_is_read_only(bs) &&
+        !(bs->open_flags & BDRV_O_NOCACHE)) {
+        return client->st_blocks * 512;
+    }
+
     task.st = &st;
     if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
@@ -484,6 +491,34 @@ static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
     return nfs_ftruncate(client->context, client->fh, offset);
 }
 
+/* Note that this will not re-establish a connection with the NFS server
+ * - it is effectively a NOP.  */
+static int nfs_reopen_prepare(BDRVReopenState *state,
+                              BlockReopenQueue *queue, Error **errp)
+{
+    NFSClient *client = state->bs->opaque;
+    struct stat st;
+    int ret = 0;
+
+    if (state->flags & BDRV_O_RDWR && bdrv_is_read_only(state->bs)) {
+        error_setg(errp, "Cannot open a read-only mount as read-write");
+        return -EACCES;
+    }
+
+    /* Update cache for read-only reopens */
+    if (!(state->flags & BDRV_O_RDWR)) {
+        ret = nfs_fstat(client->context, client->fh, &st);
+        if (ret < 0) {
+            error_setg(errp, "Failed to fstat file: %s",
+                       nfs_get_error(client->context));
+            return ret;
+        }
+        client->st_blocks = st.st_blocks;
+    }
+
+    return 0;
+}
+
 static BlockDriver bdrv_nfs = {
     .format_name                    = "nfs",
     .protocol_name                  = "nfs",
@@ -499,6 +534,7 @@ static BlockDriver bdrv_nfs = {
     .bdrv_file_open                 = nfs_file_open,
     .bdrv_close                     = nfs_file_close,
     .bdrv_create                    = nfs_file_create,
+    .bdrv_reopen_prepare            = nfs_reopen_prepare,
 
     .bdrv_co_readv                  = nfs_co_readv,
     .bdrv_co_writev                 = nfs_co_writev,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure
  2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
                   ` (4 preceding siblings ...)
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 5/5] block/nfs: cache allocated filesize for read-only files Peter Lieven
@ 2015-09-21 20:58 ` John Snow
  2015-09-21 21:22   ` Peter Lieven
  5 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-09-21 20:58 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 09/21/2015 08:25 AM, Peter Lieven wrote:
> This series aims at avoiding a hanging main-loop if a vserver has a
> CDROM image mounted from a NFS share and that NFS share goes down.
> Typical situation is that users mount an CDROM ISO to install something
> and then forget to eject that CDROM afterwards.
> As a consequence this mounted CD is able to bring down the
> whole vserver if the backend NFS share is unreachable. This is bad
> especially if the CDROM itself is not needed anymore at this point.
> 
> This series aims at fixing 3 blocking I/O operations that would
> hang if the NFS server is unavailable:
>  - ATAPI PIO read requests used sync calls to blk_read, convert
>    them to an async variant.
>  - If a busmaster DMA request is cancelled all requests are drained.
>    Convert the drain to an async request canceling.
>  - query-block in the HMP or QMP hangs because it indirectly calls
>    bdrv_get_allocated_file_size.
> 
> Note that Patch 5 is only included for completeness.
> 
> Peter
> 
> Peter Lieven (5):
>   ide/atapi: make PIO read requests async
>   ide/atapi: blk_aio_readv may return NULL
>   ide: add support for cancelable read requests
>   ide/atapi: enable cancelable requests
>   block/nfs: cache allocated filesize for read-only files
> 
>  block/nfs.c       | 36 ++++++++++++++++++++++++++
>  hw/ide/atapi.c    | 75 +++++++++++++++++++++++++++++++++++--------------------
>  hw/ide/core.c     | 55 ++++++++++++++++++++++++++++++++++++++++
>  hw/ide/internal.h | 16 ++++++++++++
>  hw/ide/pci.c      | 42 ++++++++++++++++++++-----------
>  5 files changed, 183 insertions(+), 41 deletions(-)
> 

I assume this supersedes both:

[Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the
storage backend is dead

and

[Qemu-devel] [PATCH] ide/atapi: make PIO read requests async

right?

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure
  2015-09-21 20:58 ` [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure John Snow
@ 2015-09-21 21:22   ` Peter Lieven
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-09-21 21:22 UTC (permalink / raw)
  To: John Snow; +Cc: kwolf, stefanha, jcody, qemu-devel, qemu-block



> Am 21.09.2015 um 22:58 schrieb John Snow <jsnow@redhat.com>:
> 
> 
> 
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>> This series aims at avoiding a hanging main-loop if a vserver has a
>> CDROM image mounted from a NFS share and that NFS share goes down.
>> Typical situation is that users mount an CDROM ISO to install something
>> and then forget to eject that CDROM afterwards.
>> As a consequence this mounted CD is able to bring down the
>> whole vserver if the backend NFS share is unreachable. This is bad
>> especially if the CDROM itself is not needed anymore at this point.
>> 
>> This series aims at fixing 3 blocking I/O operations that would
>> hang if the NFS server is unavailable:
>> - ATAPI PIO read requests used sync calls to blk_read, convert
>>   them to an async variant.
>> - If a busmaster DMA request is cancelled all requests are drained.
>>   Convert the drain to an async request canceling.
>> - query-block in the HMP or QMP hangs because it indirectly calls
>>   bdrv_get_allocated_file_size.
>> 
>> Note that Patch 5 is only included for completeness.
>> 
>> Peter
>> 
>> Peter Lieven (5):
>>  ide/atapi: make PIO read requests async
>>  ide/atapi: blk_aio_readv may return NULL
>>  ide: add support for cancelable read requests
>>  ide/atapi: enable cancelable requests
>>  block/nfs: cache allocated filesize for read-only files
>> 
>> block/nfs.c       | 36 ++++++++++++++++++++++++++
>> hw/ide/atapi.c    | 75 +++++++++++++++++++++++++++++++++++--------------------
>> hw/ide/core.c     | 55 ++++++++++++++++++++++++++++++++++++++++
>> hw/ide/internal.h | 16 ++++++++++++
>> hw/ide/pci.c      | 42 ++++++++++++++++++++-----------
>> 5 files changed, 183 insertions(+), 41 deletions(-)
> 
> I assume this supersedes both:
> 
> [Qemu-devel] [PATCH 0/2] ide/atapi: partially avoid deadlock if the
> storage backend is dead
> 
> and
> 
> [Qemu-devel] [PATCH] ide/atapi: make PIO read requests async
> 
> right?

yes, the first patch was wrong as Stefan pointed out and the second is the same version as previously on the list.

Peter 

> 
> --js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
@ 2015-10-02 21:02   ` John Snow
  2015-10-05 21:15   ` John Snow
  2015-10-06 13:05   ` Laszlo Ersek
  2 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-10-02 21:02 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has to siginificant drawbacks. First the main loop hangs util an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>      memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -    int ret;
> +    IDEState *s = opaque;
>  
> -    switch(sector_size) {
> -    case 2048:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        break;
> -    case 2352:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        if (ret < 0)
> -            return ret;
> -        cd_data_to_raw(buf, lba);
> -        break;
> -    default:
> -        ret = -EIO;
> -        break;
> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +        return;
> +    }
> +
> +    if (s->cd_sector_size == 2352) {
> +        cd_data_to_raw(s->io_buffer, s->lba);
>      }
> -    return ret;
> +
> +    s->lba++;
> +    s->io_buffer_index = 0;
> +    s->status &= ~BUSY_STAT;
> +
> +    ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +    if (sector_size != 2048 && sector_size != 2352) {
> +        return -EINVAL;
> +    }
> +
> +    s->iov.iov_base = buf;
> +    if (sector_size == 2352) {
> +        buf += 4;
> +    }
> +
> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +                      cd_read_sector_cb, s) == NULL) {
> +        return -EIO;
> +    }
> +
> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +    s->status |= BUSY_STAT;
> +    return 0;
>  }
>  
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>              ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>              if (ret < 0) {
>                  ide_atapi_io_error(s, ret);
> -                return;
>              }
> -            s->lba++;
> -            s->io_buffer_index = 0;
> +            return;
>          }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>      s->io_buffer_index = sector_size;
>      s->cd_sector_size = sector_size;
>  
> -    s->status = READY_STAT | SEEK_STAT;
>      ide_atapi_cmd_reply_end(s);
>  }
>  
> 

For me, this hangs the pio_large test in tests/ide-test, path:
/x86_64/ide/cdrom/pio_large

...I'll debug this over the weekend.

I think I checked this test in after you wrote this patch series, with a
mind of being able to test ATAPI for Alexander's ATAPI-SCSI bridge, but
it seems useful here :)

-- 
—js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
  2015-10-02 21:02   ` John Snow
@ 2015-10-05 21:15   ` John Snow
  2015-10-06  8:46     ` Peter Lieven
  2015-10-06  8:57     ` Kevin Wolf
  2015-10-06 13:05   ` Laszlo Ersek
  2 siblings, 2 replies; 32+ messages in thread
From: John Snow @ 2015-10-05 21:15 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has to siginificant drawbacks. First the main loop hangs util an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>      memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -    int ret;
> +    IDEState *s = opaque;
>  
> -    switch(sector_size) {
> -    case 2048:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        break;
> -    case 2352:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        if (ret < 0)
> -            return ret;
> -        cd_data_to_raw(buf, lba);
> -        break;
> -    default:
> -        ret = -EIO;
> -        break;
> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +        return;
> +    }
> +
> +    if (s->cd_sector_size == 2352) {
> +        cd_data_to_raw(s->io_buffer, s->lba);
>      }
> -    return ret;
> +
> +    s->lba++;
> +    s->io_buffer_index = 0;
> +    s->status &= ~BUSY_STAT;
> +
> +    ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +    if (sector_size != 2048 && sector_size != 2352) {
> +        return -EINVAL;
> +    }
> +
> +    s->iov.iov_base = buf;
> +    if (sector_size == 2352) {
> +        buf += 4;
> +    }
> +
> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +                      cd_read_sector_cb, s) == NULL) {
> +        return -EIO;
> +    }
> +
> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +    s->status |= BUSY_STAT;
> +    return 0;
>  }
>  

We discussed this off-list a bit, but for upstream synchronization:

Unfortunately, I believe making cd_read_sector here non-blocking makes
ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
s->end_transfer_func() nonblocking, which functions like ide_data_readw
are not prepared to cope with.

My suggestion is to buffer an entire DRQ block of data at once
(byte_count_limit) to avoid the problem.

Thanks,
--js

>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>              ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>              if (ret < 0) {
>                  ide_atapi_io_error(s, ret);
> -                return;
>              }
> -            s->lba++;
> -            s->io_buffer_index = 0;
> +            return;
>          }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>      s->io_buffer_index = sector_size;
>      s->cd_sector_size = sector_size;
>  
> -    s->status = READY_STAT | SEEK_STAT;
>      ide_atapi_cmd_reply_end(s);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-05 21:15   ` John Snow
@ 2015-10-06  8:46     ` Peter Lieven
  2015-10-06 12:08       ` Peter Lieven
  2015-10-07 16:42       ` John Snow
  2015-10-06  8:57     ` Kevin Wolf
  1 sibling, 2 replies; 32+ messages in thread
From: Peter Lieven @ 2015-10-06  8:46 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody

Am 05.10.2015 um 23:15 schrieb John Snow:
>
> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>> PIO read requests on the ATAPI interface used to be sync blk requests.
>> This has to siginificant drawbacks. First the main loop hangs util an
>> I/O request is completed and secondly if the I/O request does not
>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 747f466..9257e1c 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>       memset(buf, 0, 288);
>>   }
>>   
>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>> +static void cd_read_sector_cb(void *opaque, int ret)
>>   {
>> -    int ret;
>> +    IDEState *s = opaque;
>>   
>> -    switch(sector_size) {
>> -    case 2048:
>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>> -        break;
>> -    case 2352:
>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>> -        if (ret < 0)
>> -            return ret;
>> -        cd_data_to_raw(buf, lba);
>> -        break;
>> -    default:
>> -        ret = -EIO;
>> -        break;
>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +
>> +    if (ret < 0) {
>> +        ide_atapi_io_error(s, ret);
>> +        return;
>> +    }
>> +
>> +    if (s->cd_sector_size == 2352) {
>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>       }
>> -    return ret;
>> +
>> +    s->lba++;
>> +    s->io_buffer_index = 0;
>> +    s->status &= ~BUSY_STAT;
>> +
>> +    ide_atapi_cmd_reply_end(s);
>> +}
>> +
>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>> +{
>> +    if (sector_size != 2048 && sector_size != 2352) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    s->iov.iov_base = buf;
>> +    if (sector_size == 2352) {
>> +        buf += 4;
>> +    }
>> +
>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> +                      cd_read_sector_cb, s) == NULL) {
>> +        return -EIO;
>> +    }
>> +
>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> +    s->status |= BUSY_STAT;
>> +    return 0;
>>   }
>>   
> We discussed this off-list a bit, but for upstream synchronization:
>
> Unfortunately, I believe making cd_read_sector here non-blocking makes
> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> are not prepared to cope with.
>
> My suggestion is to buffer an entire DRQ block of data at once
> (byte_count_limit) to avoid the problem.

Hi John,

first of all thank you for the detailed analysis.

Is the following what you have i mind. For PIO reads > 1 sector
it is a great improvement for the NFS backend:

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index ab45495..ec2ba89 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
      }

      if (s->cd_sector_size == 2352) {
-        cd_data_to_raw(s->io_buffer, s->lba);
+        int nb_sectors = s->packet_transfer_size / 2352;
+        while (nb_sectors--) {
+            memmove(s->io_buffer + nb_sectors * 2352 + 4,
+                    s->io_buffer + nb_sectors * 2048, 2048);
+            cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
+                           s->lba + nb_sectors);
+        }
      }

-    s->lba++;
+    s->lba = -1;
      s->io_buffer_index = 0;
      s->status &= ~BUSY_STAT;

      ide_atapi_cmd_reply_end(s);
  }

-static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size, int nb_sectors)
  {
      if (sector_size != 2048 && sector_size != 2352) {
          return -EINVAL;
      }

      s->iov.iov_base = buf;
-    if (sector_size == 2352) {
-        buf += 4;
-    }
-
-    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+    s->iov.iov_len = nb_sectors * 2048;
      qemu_iovec_init_external(&s->qiov, &s->iov, 1);

-    if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
-                      cd_read_sector_cb, s) == NULL) {
+    if (ide_readv_cancelable(s, (int64_t)lba << 2,
+                             &s->qiov, nb_sectors * 4,
+                             cd_read_sector_cb, s) == NULL) {
          return -EIO;
      }

      block_acct_start(blk_get_stats(s->blk), &s->acct,
-                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+                     nb_sectors * 2048, BLOCK_ACCT_READ);
      s->status |= BUSY_STAT;
      return 0;
  }
@@ -190,7 +193,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
  /* The whole ATAPI transfer logic is handled in this function */
  void ide_atapi_cmd_reply_end(IDEState *s)
  {
-    int byte_count_limit, size, ret;
+    int byte_count_limit, size;
  #ifdef DEBUG_IDE_ATAPI
      printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
             s->packet_transfer_size,
@@ -205,14 +208,6 @@ void ide_atapi_cmd_reply_end(IDEState *s)
          printf("status=0x%x\n", s->status);
  #endif
      } else {
-        /* see if a new sector must be read */
-        if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
-            ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
-            if (ret < 0) {
-                ide_atapi_io_error(s, ret);
-            }
-            return;
-        }
          if (s->elementary_transfer_size > 0) {
              /* there are some data left to transmit in this elementary
                 transfer */
@@ -287,13 +282,18 @@ static void ide_atapi_cmd_reply(IDEState *s, int size, int max_size)
  static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
                                     int sector_size)
  {
+    int ret;
      s->lba = lba;
      s->packet_transfer_size = nb_sectors * sector_size;
+    assert(s->packet_transfer_size <=
+           IDE_DMA_BUF_SECTORS * BDRV_SECTOR_SIZE + 4);
      s->elementary_transfer_size = 0;
-    s->io_buffer_index = sector_size;
      s->cd_sector_size = sector_size;
-
-    ide_atapi_cmd_reply_end(s);
+    ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size,
+                         nb_sectors);
+    if (ret < 0) {
+        ide_atapi_io_error(s, ret);
+    }
  }

  static void ide_atapi_cmd_check_status(IDEState *s)


Did you also have a look at the other patches?

Thanks,
Peter

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-05 21:15   ` John Snow
  2015-10-06  8:46     ` Peter Lieven
@ 2015-10-06  8:57     ` Kevin Wolf
  2015-10-06  9:20       ` Peter Lieven
  2015-10-06 15:54       ` John Snow
  1 sibling, 2 replies; 32+ messages in thread
From: Kevin Wolf @ 2015-10-06  8:57 UTC (permalink / raw)
  To: John Snow; +Cc: stefanha, jcody, Peter Lieven, qemu-devel, qemu-block

Am 05.10.2015 um 23:15 hat John Snow geschrieben:
> 
> 
> On 09/21/2015 08:25 AM, Peter Lieven wrote:
> > PIO read requests on the ATAPI interface used to be sync blk requests.
> > This has to siginificant drawbacks. First the main loop hangs util an
> > I/O request is completed and secondly if the I/O request does not
> > complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> > 
> > Signed-off-by: Peter Lieven <pl@kamp.de>
> > ---
> >  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
> >  1 file changed, 43 insertions(+), 26 deletions(-)
> > 
> > diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> > index 747f466..9257e1c 100644
> > --- a/hw/ide/atapi.c
> > +++ b/hw/ide/atapi.c
> > @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> >      memset(buf, 0, 288);
> >  }
> >  
> > -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> > +static void cd_read_sector_cb(void *opaque, int ret)
> >  {
> > -    int ret;
> > +    IDEState *s = opaque;
> >  
> > -    switch(sector_size) {
> > -    case 2048:
> > -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> > -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> > -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> > -        break;
> > -    case 2352:
> > -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> > -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> > -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> > -        if (ret < 0)
> > -            return ret;
> > -        cd_data_to_raw(buf, lba);
> > -        break;
> > -    default:
> > -        ret = -EIO;
> > -        break;
> > +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> > +
> > +    if (ret < 0) {
> > +        ide_atapi_io_error(s, ret);
> > +        return;
> > +    }
> > +
> > +    if (s->cd_sector_size == 2352) {
> > +        cd_data_to_raw(s->io_buffer, s->lba);
> >      }
> > -    return ret;
> > +
> > +    s->lba++;
> > +    s->io_buffer_index = 0;
> > +    s->status &= ~BUSY_STAT;
> > +
> > +    ide_atapi_cmd_reply_end(s);
> > +}
> > +
> > +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> > +{
> > +    if (sector_size != 2048 && sector_size != 2352) {
> > +        return -EINVAL;
> > +    }
> > +
> > +    s->iov.iov_base = buf;
> > +    if (sector_size == 2352) {
> > +        buf += 4;
> > +    }

This doesn't look quite right, buf is never read after this.

Also, why +=4 when it was originally buf + 16?

> > +
> > +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> > +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> > +
> > +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> > +                      cd_read_sector_cb, s) == NULL) {
> > +        return -EIO;
> > +    }
> > +
> > +    block_acct_start(blk_get_stats(s->blk), &s->acct,
> > +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > +    s->status |= BUSY_STAT;
> > +    return 0;
> >  }
> >  
> 
> We discussed this off-list a bit, but for upstream synchronization:
> 
> Unfortunately, I believe making cd_read_sector here non-blocking makes
> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> are not prepared to cope with.

I don't think that's a problem as long as BSY is set while the
asynchronous command is running and DRQ is cleared. The latter will
protect ide_data_readw(). ide_sector_read() does essentially the same
thing.

Or maybe I'm just missing what you're trying to say.

> My suggestion is to buffer an entire DRQ block of data at once
> (byte_count_limit) to avoid the problem.

No matter whether there is a problem or not, buffering more data at once
(and therefore doing less requests) is better for performance anyway.

Kevin

> >  void ide_atapi_cmd_ok(IDEState *s)
> > @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
> >              ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
> >              if (ret < 0) {
> >                  ide_atapi_io_error(s, ret);
> > -                return;
> >              }
> > -            s->lba++;
> > -            s->io_buffer_index = 0;
> > +            return;
> >          }
> >          if (s->elementary_transfer_size > 0) {
> >              /* there are some data left to transmit in this elementary
> > @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
> >      s->io_buffer_index = sector_size;
> >      s->cd_sector_size = sector_size;
> >  
> > -    s->status = READY_STAT | SEEK_STAT;
> >      ide_atapi_cmd_reply_end(s);
> >  }
> >  
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06  8:57     ` Kevin Wolf
@ 2015-10-06  9:20       ` Peter Lieven
  2015-10-06 17:07         ` John Snow
  2015-10-06 15:54       ` John Snow
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Lieven @ 2015-10-06  9:20 UTC (permalink / raw)
  To: Kevin Wolf, John Snow; +Cc: stefanha, jcody, qemu-devel, qemu-block

Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has to siginificant drawbacks. First the main loop hangs util an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>>   hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>       memset(buf, 0, 288);
>>>   }
>>>   
>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>   {
>>> -    int ret;
>>> +    IDEState *s = opaque;
>>>   
>>> -    switch(sector_size) {
>>> -    case 2048:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        break;
>>> -    case 2352:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        if (ret < 0)
>>> -            return ret;
>>> -        cd_data_to_raw(buf, lba);
>>> -        break;
>>> -    default:
>>> -        ret = -EIO;
>>> -        break;
>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +    if (ret < 0) {
>>> +        ide_atapi_io_error(s, ret);
>>> +        return;
>>> +    }
>>> +
>>> +    if (s->cd_sector_size == 2352) {
>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>       }
>>> -    return ret;
>>> +
>>> +    s->lba++;
>>> +    s->io_buffer_index = 0;
>>> +    s->status &= ~BUSY_STAT;
>>> +
>>> +    ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>> +{
>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->iov.iov_base = buf;
>>> +    if (sector_size == 2352) {
>>> +        buf += 4;
>>> +    }
> This doesn't look quite right, buf is never read after this.
>
> Also, why +=4 when it was originally buf + 16?

You are right. I mixed that up.

>
>>> +
>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +                      cd_read_sector_cb, s) == NULL) {
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +    s->status |= BUSY_STAT;
>>> +    return 0;
>>>   }
>>>   
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
> I don't think that's a problem as long as BSY is set while the
> asynchronous command is running and DRQ is cleared. The latter will
> protect ide_data_readw(). ide_sector_read() does essentially the same
> thing.

  I was thinking the same. Without the BSY its not working at all.

>
> Or maybe I'm just missing what you're trying to say.
>
>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> No matter whether there is a problem or not, buffering more data at once
> (and therefore doing less requests) is better for performance anyway.

Its possible to do only one read in the backend and read the whole
request into the IO buffer. I send a follow-up.

Maybe do you have a pointer to the test tool that John mentioned?

Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06  8:46     ` Peter Lieven
@ 2015-10-06 12:08       ` Peter Lieven
  2015-10-07 16:42       ` John Snow
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-10-06 12:08 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: Kevin Wolf, stefanha, jcody

Am 06.10.2015 um 10:46 schrieb Peter Lieven:
> Am 05.10.2015 um 23:15 schrieb John Snow:
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has to siginificant drawbacks. First the main loop hangs util an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>>   hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>       memset(buf, 0, 288);
>>>   }
>>>   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>   {
>>> -    int ret;
>>> +    IDEState *s = opaque;
>>>   -    switch(sector_size) {
>>> -    case 2048:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        break;
>>> -    case 2352:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        if (ret < 0)
>>> -            return ret;
>>> -        cd_data_to_raw(buf, lba);
>>> -        break;
>>> -    default:
>>> -        ret = -EIO;
>>> -        break;
>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +    if (ret < 0) {
>>> +        ide_atapi_io_error(s, ret);
>>> +        return;
>>> +    }
>>> +
>>> +    if (s->cd_sector_size == 2352) {
>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>       }
>>> -    return ret;
>>> +
>>> +    s->lba++;
>>> +    s->io_buffer_index = 0;
>>> +    s->status &= ~BUSY_STAT;
>>> +
>>> +    ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>> +{
>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->iov.iov_base = buf;
>>> +    if (sector_size == 2352) {
>>> +        buf += 4;
>>> +    }
>>> +
>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +                      cd_read_sector_cb, s) == NULL) {
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +    s->status |= BUSY_STAT;
>>> +    return 0;
>>>   }
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
>>
>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
>
> Hi John,
>
> first of all thank you for the detailed analysis.
>
> Is the following what you have i mind. For PIO reads > 1 sector
> it is a great improvement for the NFS backend:
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index ab45495..ec2ba89 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
>      }
>
>      if (s->cd_sector_size == 2352) {
> -        cd_data_to_raw(s->io_buffer, s->lba);
> +        int nb_sectors = s->packet_transfer_size / 2352;
> +        while (nb_sectors--) {
> +            memmove(s->io_buffer + nb_sectors * 2352 + 4,
> +                    s->io_buffer + nb_sectors * 2048, 2048);
> +            cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
> +                           s->lba + nb_sectors);
> +        }
>      }
>
> -    s->lba++;
> +    s->lba = -1;
>      s->io_buffer_index = 0;
>      s->status &= ~BUSY_STAT;
>
>      ide_atapi_cmd_reply_end(s);
>  }
>
> -static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size, int nb_sectors)
>  {
>      if (sector_size != 2048 && sector_size != 2352) {
>          return -EINVAL;
>      }
>
>      s->iov.iov_base = buf;
> -    if (sector_size == 2352) {
> -        buf += 4;
> -    }
> -
> -    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    s->iov.iov_len = nb_sectors * 2048;
>      qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>
> -    if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
> -                      cd_read_sector_cb, s) == NULL) {
> +    if (ide_readv_cancelable(s, (int64_t)lba << 2,
> +                             &s->qiov, nb_sectors * 4,
> +                             cd_read_sector_cb, s) == NULL) {
>          return -EIO;
>      }
>
>      block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +                     nb_sectors * 2048, BLOCK_ACCT_READ);
>      s->status |= BUSY_STAT;
>      return 0;
>  }
> @@ -190,7 +193,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
>  /* The whole ATAPI transfer logic is handled in this function */
>  void ide_atapi_cmd_reply_end(IDEState *s)
>  {
> -    int byte_count_limit, size, ret;
> +    int byte_count_limit, size;
>  #ifdef DEBUG_IDE_ATAPI
>      printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
>             s->packet_transfer_size,
> @@ -205,14 +208,6 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>          printf("status=0x%x\n", s->status);
>  #endif
>      } else {
> -        /* see if a new sector must be read */
> -        if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
> -            ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
> -            if (ret < 0) {
> -                ide_atapi_io_error(s, ret);
> -            }
> -            return;
> -        }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
>                 transfer */
> @@ -287,13 +282,18 @@ static void ide_atapi_cmd_reply(IDEState *s, int size, int max_size)
>  static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>                                     int sector_size)
>  {
> +    int ret;
>      s->lba = lba;
>      s->packet_transfer_size = nb_sectors * sector_size;
> +    assert(s->packet_transfer_size <=
> +           IDE_DMA_BUF_SECTORS * BDRV_SECTOR_SIZE + 4);
>      s->elementary_transfer_size = 0;
> -    s->io_buffer_index = sector_size;
>      s->cd_sector_size = sector_size;
> -
> -    ide_atapi_cmd_reply_end(s);
> +    ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size,
> +                         nb_sectors);
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +    }
>  }
>
>  static void ide_atapi_cmd_check_status(IDEState *s)
>
>

Forgot to mention with this patch I pass tests/ide-test now.

Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
  2015-10-02 21:02   ` John Snow
  2015-10-05 21:15   ` John Snow
@ 2015-10-06 13:05   ` Laszlo Ersek
  2 siblings, 0 replies; 32+ messages in thread
From: Laszlo Ersek @ 2015-10-06 13:05 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody, jsnow

On 09/21/15 14:25, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has to siginificant drawbacks. First the main loop hangs util an

Trivial comments:
- s/to/two/
- s/siginificant/significant/

Thanks
Laszlo

> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>      memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -    int ret;
> +    IDEState *s = opaque;
>  
> -    switch(sector_size) {
> -    case 2048:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        break;
> -    case 2352:
> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> -        if (ret < 0)
> -            return ret;
> -        cd_data_to_raw(buf, lba);
> -        break;
> -    default:
> -        ret = -EIO;
> -        break;
> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +        return;
> +    }
> +
> +    if (s->cd_sector_size == 2352) {
> +        cd_data_to_raw(s->io_buffer, s->lba);
>      }
> -    return ret;
> +
> +    s->lba++;
> +    s->io_buffer_index = 0;
> +    s->status &= ~BUSY_STAT;
> +
> +    ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +    if (sector_size != 2048 && sector_size != 2352) {
> +        return -EINVAL;
> +    }
> +
> +    s->iov.iov_base = buf;
> +    if (sector_size == 2352) {
> +        buf += 4;
> +    }
> +
> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +                      cd_read_sector_cb, s) == NULL) {
> +        return -EIO;
> +    }
> +
> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +    s->status |= BUSY_STAT;
> +    return 0;
>  }
>  
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>              ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>              if (ret < 0) {
>                  ide_atapi_io_error(s, ret);
> -                return;
>              }
> -            s->lba++;
> -            s->io_buffer_index = 0;
> +            return;
>          }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>      s->io_buffer_index = sector_size;
>      s->cd_sector_size = sector_size;
>  
> -    s->status = READY_STAT | SEEK_STAT;
>      ide_atapi_cmd_reply_end(s);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06  8:57     ` Kevin Wolf
  2015-10-06  9:20       ` Peter Lieven
@ 2015-10-06 15:54       ` John Snow
  2015-10-07  7:28         ` Kevin Wolf
  1 sibling, 1 reply; 32+ messages in thread
From: John Snow @ 2015-10-06 15:54 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, jcody, Peter Lieven, qemu-devel, qemu-block



On 10/06/2015 04:57 AM, Kevin Wolf wrote:
> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has to siginificant drawbacks. First the main loop hangs util an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>      memset(buf, 0, 288);
>>>  }
>>>  
>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>  {
>>> -    int ret;
>>> +    IDEState *s = opaque;
>>>  
>>> -    switch(sector_size) {
>>> -    case 2048:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        break;
>>> -    case 2352:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        if (ret < 0)
>>> -            return ret;
>>> -        cd_data_to_raw(buf, lba);
>>> -        break;
>>> -    default:
>>> -        ret = -EIO;
>>> -        break;
>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +    if (ret < 0) {
>>> +        ide_atapi_io_error(s, ret);
>>> +        return;
>>> +    }
>>> +
>>> +    if (s->cd_sector_size == 2352) {
>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>      }
>>> -    return ret;
>>> +
>>> +    s->lba++;
>>> +    s->io_buffer_index = 0;
>>> +    s->status &= ~BUSY_STAT;
>>> +
>>> +    ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>> +{
>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->iov.iov_base = buf;
>>> +    if (sector_size == 2352) {
>>> +        buf += 4;
>>> +    }
> 
> This doesn't look quite right, buf is never read after this.
> 
> Also, why +=4 when it was originally buf + 16?
> 
>>> +
>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +                      cd_read_sector_cb, s) == NULL) {
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +    s->status |= BUSY_STAT;
>>> +    return 0;
>>>  }
>>>  
>>
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
> 
> I don't think that's a problem as long as BSY is set while the
> asynchronous command is running and DRQ is cleared. The latter will
> protect ide_data_readw(). ide_sector_read() does essentially the same
> thing.
> 
> Or maybe I'm just missing what you're trying to say.
> 

It will be correct from a code standpoint, but I don't think the guest
*expects* DRQ to become re-set before byte_count_limit is exhausted.

In the synchronous version of the code, DRQ flickers while we rebuffer
s->io_buffer, but since it's synchronous, the guest *never sees this*.

The guest does not necessarily have any reason or motivation to check if
DRQ is still set after 2048 bytes -- is that recommended in the spec?

("Warning! The drive may decide to rebuffer arbitrarily while you read,
so check if DRQ is still set before each read to the data register!" ??)

>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> 
> No matter whether there is a problem or not, buffering more data at once
> (and therefore doing less requests) is better for performance anyway.
> 
> Kevin
> 
>>>  void ide_atapi_cmd_ok(IDEState *s)
>>> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>>>              ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>>>              if (ret < 0) {
>>>                  ide_atapi_io_error(s, ret);
>>> -                return;
>>>              }
>>> -            s->lba++;
>>> -            s->io_buffer_index = 0;
>>> +            return;
>>>          }
>>>          if (s->elementary_transfer_size > 0) {
>>>              /* there are some data left to transmit in this elementary
>>> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>>>      s->io_buffer_index = sector_size;
>>>      s->cd_sector_size = sector_size;
>>>  
>>> -    s->status = READY_STAT | SEEK_STAT;
>>>      ide_atapi_cmd_reply_end(s);
>>>  }
>>>  
>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06  9:20       ` Peter Lieven
@ 2015-10-06 17:07         ` John Snow
  2015-10-06 17:12           ` Peter Lieven
  0 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-10-06 17:07 UTC (permalink / raw)
  To: Peter Lieven, Kevin Wolf; +Cc: stefanha, jcody, qemu-devel, qemu-block



On 10/06/2015 05:20 AM, Peter Lieven wrote:
> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>>
>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>> I/O request is completed and secondly if the I/O request does not
>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>>   hw/ide/atapi.c | 69
>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>> index 747f466..9257e1c 100644
>>>> --- a/hw/ide/atapi.c
>>>> +++ b/hw/ide/atapi.c
>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>       memset(buf, 0, 288);
>>>>   }
>>>>   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>> sector_size)
>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>   {
>>>> -    int ret;
>>>> +    IDEState *s = opaque;
>>>>   -    switch(sector_size) {
>>>> -    case 2048:
>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> -        break;
>>>> -    case 2352:
>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> -        if (ret < 0)
>>>> -            return ret;
>>>> -        cd_data_to_raw(buf, lba);
>>>> -        break;
>>>> -    default:
>>>> -        ret = -EIO;
>>>> -        break;
>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> +
>>>> +    if (ret < 0) {
>>>> +        ide_atapi_io_error(s, ret);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (s->cd_sector_size == 2352) {
>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>       }
>>>> -    return ret;
>>>> +
>>>> +    s->lba++;
>>>> +    s->io_buffer_index = 0;
>>>> +    s->status &= ~BUSY_STAT;
>>>> +
>>>> +    ide_atapi_cmd_reply_end(s);
>>>> +}
>>>> +
>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>> sector_size)
>>>> +{
>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    s->iov.iov_base = buf;
>>>> +    if (sector_size == 2352) {
>>>> +        buf += 4;
>>>> +    }
>> This doesn't look quite right, buf is never read after this.
>>
>> Also, why +=4 when it was originally buf + 16?
> 
> You are right. I mixed that up.
> 
>>
>>>> +
>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>> +
>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>> +        return -EIO;
>>>> +    }
>>>> +
>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> +    s->status |= BUSY_STAT;
>>>> +    return 0;
>>>>   }
>>>>   
>>> We discussed this off-list a bit, but for upstream synchronization:
>>>
>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>> are not prepared to cope with.
>> I don't think that's a problem as long as BSY is set while the
>> asynchronous command is running and DRQ is cleared. The latter will
>> protect ide_data_readw(). ide_sector_read() does essentially the same
>> thing.
> 
>  I was thinking the same. Without the BSY its not working at all.
> 
>>
>> Or maybe I'm just missing what you're trying to say.
>>
>>> My suggestion is to buffer an entire DRQ block of data at once
>>> (byte_count_limit) to avoid the problem.
>> No matter whether there is a problem or not, buffering more data at once
>> (and therefore doing less requests) is better for performance anyway.
> 
> Its possible to do only one read in the backend and read the whole
> request into the IO buffer. I send a follow-up.
> 

Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
and the READ10 cdb can request up to 128MiB! For performance, it might
be nice to always buffer something like:

MIN(128K, nb_sectors * sector_size)

and then as the guest drains the DRQ block of size byte_count_limit
which can only be at largest 0xFFFE (we can fit in at least two of these
per io_buffer refill) we can just shift the data_ptr and data_end
pointers to utilize io_buffer like a ring buffer.

Because the guest can at most fetch 0xfffe bytes at a time, it will tend
to leave at least 4 bytes left over from a 64 block read. Luckily, we've
got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
rebuffer *at least* two full DRQ blocks of data at a time.

The routine would basically look like this:

- No DRQ blocks buffered, so read up to 64 blocks or however many are
left for our transfer
- If we have at least one full DRQ block allocated, start the transfer
and send an interrupt
- If we ran out of DRQ blocks, go back to the top and buffer them.

This would eliminate the need for code stanza #3 in
ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
the guest. We'd only have:

ide_atapi_cmd_reply_end(...) {
  if (packet_transfer_size == 0) { end(...); return; }
  if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
  ide_transfer_start(...)
  ide_set_irq(s->bus);
}


which is a good deal simpler than what we have now, though I need to
look into the formatting of raw CD data a little more to make sure my
numbers make sense... it may not be quite so easy to buffer multiple DRQ
blocks in some cases, but so it goes -- we should always be able to
buffer at least one.

> Maybe do you have a pointer to the test tool that John mentioned?
> 
> Peter
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06 17:07         ` John Snow
@ 2015-10-06 17:12           ` Peter Lieven
  2015-10-06 17:56             ` John Snow
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Lieven @ 2015-10-06 17:12 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, stefanha, jcody, qemu-devel, qemu-block


> Am 06.10.2015 um 19:07 schrieb John Snow <jsnow@redhat.com>:
> 
> 
> 
>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>>> 
>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>>> I/O request is completed and secondly if the I/O request does not
>>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>>> 
>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>> ---
>>>>>  hw/ide/atapi.c | 69
>>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>> 
>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>> index 747f466..9257e1c 100644
>>>>> --- a/hw/ide/atapi.c
>>>>> +++ b/hw/ide/atapi.c
>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>      memset(buf, 0, 288);
>>>>>  }
>>>>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>>> sector_size)
>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>  {
>>>>> -    int ret;
>>>>> +    IDEState *s = opaque;
>>>>>  -    switch(sector_size) {
>>>>> -    case 2048:
>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> -        break;
>>>>> -    case 2352:
>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> -        if (ret < 0)
>>>>> -            return ret;
>>>>> -        cd_data_to_raw(buf, lba);
>>>>> -        break;
>>>>> -    default:
>>>>> -        ret = -EIO;
>>>>> -        break;
>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> +
>>>>> +    if (ret < 0) {
>>>>> +        ide_atapi_io_error(s, ret);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    if (s->cd_sector_size == 2352) {
>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>      }
>>>>> -    return ret;
>>>>> +
>>>>> +    s->lba++;
>>>>> +    s->io_buffer_index = 0;
>>>>> +    s->status &= ~BUSY_STAT;
>>>>> +
>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>> +}
>>>>> +
>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>>> sector_size)
>>>>> +{
>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    s->iov.iov_base = buf;
>>>>> +    if (sector_size == 2352) {
>>>>> +        buf += 4;
>>>>> +    }
>>> This doesn't look quite right, buf is never read after this.
>>> 
>>> Also, why +=4 when it was originally buf + 16?
>> 
>> You are right. I mixed that up.
>> 
>>> 
>>>>> +
>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>> +
>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>> +        return -EIO;
>>>>> +    }
>>>>> +
>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> +    s->status |= BUSY_STAT;
>>>>> +    return 0;
>>>>>  }
>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>> 
>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>>> are not prepared to cope with.
>>> I don't think that's a problem as long as BSY is set while the
>>> asynchronous command is running and DRQ is cleared. The latter will
>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>> thing.
>> 
>> I was thinking the same. Without the BSY its not working at all.
>> 
>>> 
>>> Or maybe I'm just missing what you're trying to say.
>>> 
>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>> (byte_count_limit) to avoid the problem.
>>> No matter whether there is a problem or not, buffering more data at once
>>> (and therefore doing less requests) is better for performance anyway.
>> 
>> Its possible to do only one read in the backend and read the whole
>> request into the IO buffer. I send a follow-up.
> 
> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
> and the READ10 cdb can request up to 128MiB! For performance, it might
> be nice to always buffer something like:
> 
> MIN(128K, nb_sectors * sector_size)

isnt nb_sectors limited to CD_MAX_SECTORS (32)?

Peter


> 
> and then as the guest drains the DRQ block of size byte_count_limit
> which can only be at largest 0xFFFE (we can fit in at least two of these
> per io_buffer refill) we can just shift the data_ptr and data_end
> pointers to utilize io_buffer like a ring buffer.
> 
> Because the guest can at most fetch 0xfffe bytes at a time, it will tend
> to leave at least 4 bytes left over from a 64 block read. Luckily, we've
> got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
> rebuffer *at least* two full DRQ blocks of data at a time.
> 
> The routine would basically look like this:
> 
> - No DRQ blocks buffered, so read up to 64 blocks or however many are
> left for our transfer
> - If we have at least one full DRQ block allocated, start the transfer
> and send an interrupt
> - If we ran out of DRQ blocks, go back to the top and buffer them.
> 
> This would eliminate the need for code stanza #3 in
> ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
> the guest. We'd only have:
> 
> ide_atapi_cmd_reply_end(...) {
>  if (packet_transfer_size == 0) { end(...); return; }
>  if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
>  ide_transfer_start(...)
>  ide_set_irq(s->bus);
> }
> 
> 
> which is a good deal simpler than what we have now, though I need to
> look into the formatting of raw CD data a little more to make sure my
> numbers make sense... it may not be quite so easy to buffer multiple DRQ
> blocks in some cases, but so it goes -- we should always be able to
> buffer at least one.
> 
>> Maybe do you have a pointer to the test tool that John mentioned?
>> 
>> Peter
>> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06 17:12           ` Peter Lieven
@ 2015-10-06 17:56             ` John Snow
  2015-10-06 18:31               ` Peter Lieven
  0 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-10-06 17:56 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, stefanha, jcody, qemu-devel, qemu-block



On 10/06/2015 01:12 PM, Peter Lieven wrote:
> 
>> Am 06.10.2015 um 19:07 schrieb John Snow <jsnow@redhat.com>:
>>
>>
>>
>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>>> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>>>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>>>>
>>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>>>> I/O request is completed and secondly if the I/O request does not
>>>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>>>>
>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>>> ---
>>>>>>  hw/ide/atapi.c | 69
>>>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>>> index 747f466..9257e1c 100644
>>>>>> --- a/hw/ide/atapi.c
>>>>>> +++ b/hw/ide/atapi.c
>>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>>      memset(buf, 0, 288);
>>>>>>  }
>>>>>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>>>> sector_size)
>>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>>  {
>>>>>> -    int ret;
>>>>>> +    IDEState *s = opaque;
>>>>>>  -    switch(sector_size) {
>>>>>> -    case 2048:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        break;
>>>>>> -    case 2352:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        if (ret < 0)
>>>>>> -            return ret;
>>>>>> -        cd_data_to_raw(buf, lba);
>>>>>> -        break;
>>>>>> -    default:
>>>>>> -        ret = -EIO;
>>>>>> -        break;
>>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> +
>>>>>> +    if (ret < 0) {
>>>>>> +        ide_atapi_io_error(s, ret);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (s->cd_sector_size == 2352) {
>>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>>      }
>>>>>> -    return ret;
>>>>>> +
>>>>>> +    s->lba++;
>>>>>> +    s->io_buffer_index = 0;
>>>>>> +    s->status &= ~BUSY_STAT;
>>>>>> +
>>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>>> +}
>>>>>> +
>>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>>>> sector_size)
>>>>>> +{
>>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>>> +    s->iov.iov_base = buf;
>>>>>> +    if (sector_size == 2352) {
>>>>>> +        buf += 4;
>>>>>> +    }
>>>> This doesn't look quite right, buf is never read after this.
>>>>
>>>> Also, why +=4 when it was originally buf + 16?
>>>
>>> You are right. I mixed that up.
>>>
>>>>
>>>>>> +
>>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>>> +
>>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>>> +        return -EIO;
>>>>>> +    }
>>>>>> +
>>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> +    s->status |= BUSY_STAT;
>>>>>> +    return 0;
>>>>>>  }
>>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>>
>>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>>>> are not prepared to cope with.
>>>> I don't think that's a problem as long as BSY is set while the
>>>> asynchronous command is running and DRQ is cleared. The latter will
>>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>>> thing.
>>>
>>> I was thinking the same. Without the BSY its not working at all.
>>>
>>>>
>>>> Or maybe I'm just missing what you're trying to say.
>>>>
>>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>>> (byte_count_limit) to avoid the problem.
>>>> No matter whether there is a problem or not, buffering more data at once
>>>> (and therefore doing less requests) is better for performance anyway.
>>>
>>> Its possible to do only one read in the backend and read the whole
>>> request into the IO buffer. I send a follow-up.
>>
>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>> and the READ10 cdb can request up to 128MiB! For performance, it might
>> be nice to always buffer something like:
>>
>> MIN(128K, nb_sectors * sector_size)
> 
> isnt nb_sectors limited to CD_MAX_SECTORS (32)?
> 
> Peter
> 

CD_MAX_SECTORS is... (80 * 60 * 75 * 2048) / 512 --> 1440000, and
describes the maximum sector size for a CD medium, not the request size.

Where'd you get the 32 number?

> 
>>
>> and then as the guest drains the DRQ block of size byte_count_limit
>> which can only be at largest 0xFFFE (we can fit in at least two of these
>> per io_buffer refill) we can just shift the data_ptr and data_end
>> pointers to utilize io_buffer like a ring buffer.
>>
>> Because the guest can at most fetch 0xfffe bytes at a time, it will tend
>> to leave at least 4 bytes left over from a 64 block read. Luckily, we've
>> got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
>> rebuffer *at least* two full DRQ blocks of data at a time.
>>
>> The routine would basically look like this:
>>
>> - No DRQ blocks buffered, so read up to 64 blocks or however many are
>> left for our transfer
>> - If we have at least one full DRQ block allocated, start the transfer
>> and send an interrupt
>> - If we ran out of DRQ blocks, go back to the top and buffer them.
>>
>> This would eliminate the need for code stanza #3 in
>> ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
>> the guest. We'd only have:
>>
>> ide_atapi_cmd_reply_end(...) {
>>  if (packet_transfer_size == 0) { end(...); return; }
>>  if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
>>  ide_transfer_start(...)
>>  ide_set_irq(s->bus);
>> }
>>
>>
>> which is a good deal simpler than what we have now, though I need to
>> look into the formatting of raw CD data a little more to make sure my
>> numbers make sense... it may not be quite so easy to buffer multiple DRQ
>> blocks in some cases, but so it goes -- we should always be able to
>> buffer at least one.
>>
>>> Maybe do you have a pointer to the test tool that John mentioned?
>>>
>>> Peter
>>>

-- 
—js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06 17:56             ` John Snow
@ 2015-10-06 18:31               ` Peter Lieven
  2015-10-06 18:34                 ` John Snow
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Lieven @ 2015-10-06 18:31 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, stefanha, jcody, qemu-devel, qemu-block

Am 06.10.2015 um 19:56 schrieb John Snow:
>
> On 10/06/2015 01:12 PM, Peter Lieven wrote:
>>> Am 06.10.2015 um 19:07 schrieb John Snow <jsnow@redhat.com>:
>>>
>>>
>>>
>>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>>>> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>>>>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>>>>> I/O request is completed and secondly if the I/O request does not
>>>>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>>>>>
>>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>>>> ---
>>>>>>>  hw/ide/atapi.c | 69
>>>>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>>>> index 747f466..9257e1c 100644
>>>>>>> --- a/hw/ide/atapi.c
>>>>>>> +++ b/hw/ide/atapi.c
>>>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>>>      memset(buf, 0, 288);
>>>>>>>  }
>>>>>>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>>>>> sector_size)
>>>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>>>  {
>>>>>>> -    int ret;
>>>>>>> +    IDEState *s = opaque;
>>>>>>>  -    switch(sector_size) {
>>>>>>> -    case 2048:
>>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>> -        break;
>>>>>>> -    case 2352:
>>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>> -        if (ret < 0)
>>>>>>> -            return ret;
>>>>>>> -        cd_data_to_raw(buf, lba);
>>>>>>> -        break;
>>>>>>> -    default:
>>>>>>> -        ret = -EIO;
>>>>>>> -        break;
>>>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>> +
>>>>>>> +    if (ret < 0) {
>>>>>>> +        ide_atapi_io_error(s, ret);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (s->cd_sector_size == 2352) {
>>>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>>>      }
>>>>>>> -    return ret;
>>>>>>> +
>>>>>>> +    s->lba++;
>>>>>>> +    s->io_buffer_index = 0;
>>>>>>> +    s->status &= ~BUSY_STAT;
>>>>>>> +
>>>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>>>>> sector_size)
>>>>>>> +{
>>>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>>>> +        return -EINVAL;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    s->iov.iov_base = buf;
>>>>>>> +    if (sector_size == 2352) {
>>>>>>> +        buf += 4;
>>>>>>> +    }
>>>>> This doesn't look quite right, buf is never read after this.
>>>>>
>>>>> Also, why +=4 when it was originally buf + 16?
>>>> You are right. I mixed that up.
>>>>
>>>>>>> +
>>>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>>>> +
>>>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>>>> +        return -EIO;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>> +    s->status |= BUSY_STAT;
>>>>>>> +    return 0;
>>>>>>>  }
>>>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>>>
>>>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>>>>> are not prepared to cope with.
>>>>> I don't think that's a problem as long as BSY is set while the
>>>>> asynchronous command is running and DRQ is cleared. The latter will
>>>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>>>> thing.
>>>> I was thinking the same. Without the BSY its not working at all.
>>>>
>>>>> Or maybe I'm just missing what you're trying to say.
>>>>>
>>>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>>>> (byte_count_limit) to avoid the problem.
>>>>> No matter whether there is a problem or not, buffering more data at once
>>>>> (and therefore doing less requests) is better for performance anyway.
>>>> Its possible to do only one read in the backend and read the whole
>>>> request into the IO buffer. I send a follow-up.
>>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>>> and the READ10 cdb can request up to 128MiB! For performance, it might
>>> be nice to always buffer something like:
>>>
>>> MIN(128K, nb_sectors * sector_size)
>> isnt nb_sectors limited to CD_MAX_SECTORS (32)?
>>
>> Peter
>>
> CD_MAX_SECTORS is... (80 * 60 * 75 * 2048) / 512 --> 1440000, and
> describes the maximum sector size for a CD medium, not the request size.
>
> Where'd you get the 32 number?

You are right. I mixed this up. You where talking of a maximum transfer
size of close to 32 sectors. But you where referring to an ide transfer not
the maximum request size in terms of SCSI block limits.

I will rework that patch on Thursday.

Maybe you can have a look at the other patches of this series as well? Then I can
respin the whole series.

Thanks for your help,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06 18:31               ` Peter Lieven
@ 2015-10-06 18:34                 ` John Snow
  0 siblings, 0 replies; 32+ messages in thread
From: John Snow @ 2015-10-06 18:34 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, stefanha, jcody, qemu-devel, qemu-block



On 10/06/2015 02:31 PM, Peter Lieven wrote:
> Am 06.10.2015 um 19:56 schrieb John Snow:
>>
>> On 10/06/2015 01:12 PM, Peter Lieven wrote:
>>>> Am 06.10.2015 um 19:07 schrieb John Snow <jsnow@redhat.com>:
>>>>
>>>>
>>>>
>>>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>>>>> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>>>>>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>>>>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>>>>>> I/O request is completed and secondly if the I/O request does not
>>>>>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
Maybe you can have a look at the other patches of this series as well?
Then I can
respin the whole series.


>>>>>>>>
>>>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>>>>> ---
>>>>>>>>  hw/ide/atapi.c | 69
>>>>>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>>>>> index 747f466..9257e1c 100644
>>>>>>>> --- a/hw/ide/atapi.c
>>>>>>>> +++ b/hw/ide/atapi.c
>>>>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>>>>      memset(buf, 0, 288);
>>>>>>>>  }
>>>>>>>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>>>>>> sector_size)
>>>>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>>>>  {
>>>>>>>> -    int ret;
>>>>>>>> +    IDEState *s = opaque;
>>>>>>>>  -    switch(sector_size) {
>>>>>>>> -    case 2048:
>>>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> -        break;
>>>>>>>> -    case 2352:
>>>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> -        if (ret < 0)
>>>>>>>> -            return ret;
>>>>>>>> -        cd_data_to_raw(buf, lba);
>>>>>>>> -        break;
>>>>>>>> -    default:
>>>>>>>> -        ret = -EIO;
>>>>>>>> -        break;
>>>>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> +
>>>>>>>> +    if (ret < 0) {
>>>>>>>> +        ide_atapi_io_error(s, ret);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (s->cd_sector_size == 2352) {
>>>>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>>>>      }
>>>>>>>> -    return ret;
>>>>>>>> +
>>>>>>>> +    s->lba++;
>>>>>>>> +    s->io_buffer_index = 0;
>>>>>>>> +    s->status &= ~BUSY_STAT;
>>>>>>>> +
>>>>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>>>>>> sector_size)
>>>>>>>> +{
>>>>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    s->iov.iov_base = buf;
>>>>>>>> +    if (sector_size == 2352) {
>>>>>>>> +        buf += 4;
>>>>>>>> +    }
>>>>>> This doesn't look quite right, buf is never read after this.
>>>>>>
>>>>>> Also, why +=4 when it was originally buf + 16?
>>>>> You are right. I mixed that up.
>>>>>
>>>>>>>> +
>>>>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>>>>> +
>>>>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>>>>> +        return -EIO;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>>>> +    s->status |= BUSY_STAT;
>>>>>>>> +    return 0;
>>>>>>>>  }
>>>>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>>>>
>>>>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>>>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>>>>>> are not prepared to cope with.
>>>>>> I don't think that's a problem as long as BSY is set while the
>>>>>> asynchronous command is running and DRQ is cleared. The latter will
>>>>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>>>>> thing.
>>>>> I was thinking the same. Without the BSY its not working at all.
>>>>>
>>>>>> Or maybe I'm just missing what you're trying to say.
>>>>>>
>>>>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>>>>> (byte_count_limit) to avoid the problem.
>>>>>> No matter whether there is a problem or not, buffering more data at once
>>>>>> (and therefore doing less requests) is better for performance anyway.
>>>>> Its possible to do only one read in the backend and read the whole
>>>>> request into the IO buffer. I send a follow-up.
>>>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>>>> and the READ10 cdb can request up to 128MiB! For performance, it might
>>>> be nice to always buffer something like:
>>>>
>>>> MIN(128K, nb_sectors * sector_size)
>>> isnt nb_sectors limited to CD_MAX_SECTORS (32)?
>>>
>>> Peter
>>>
>> CD_MAX_SECTORS is... (80 * 60 * 75 * 2048) / 512 --> 1440000, and
>> describes the maximum sector size for a CD medium, not the request size.
>>
>> Where'd you get the 32 number?
> 
> You are right. I mixed this up. You where talking of a maximum transfer
> size of close to 32 sectors. But you where referring to an ide transfer not
> the maximum request size in terms of SCSI block limits.
> 

Ah, yes. You can request up to 0xfffe bytes per DRQ cycle, which is a
hair shy of 32 sectors, but the overall transaction can be quite large.

> I will rework that patch on Thursday.
> 
> Maybe you can have a look at the other patches of this series as well? Then I can
> respin the whole series.
> 
> Thanks for your help,
> Peter
> 

Sure thing. Sorry to drag you into one of the darkest corners of hw/ide/*.

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06 15:54       ` John Snow
@ 2015-10-07  7:28         ` Kevin Wolf
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2015-10-07  7:28 UTC (permalink / raw)
  To: John Snow; +Cc: stefanha, jcody, Peter Lieven, qemu-devel, qemu-block

Am 06.10.2015 um 17:54 hat John Snow geschrieben:
> 
> 
> On 10/06/2015 04:57 AM, Kevin Wolf wrote:
> > Am 05.10.2015 um 23:15 hat John Snow geschrieben:
> >>
> >>
> >> On 09/21/2015 08:25 AM, Peter Lieven wrote:
> >>> PIO read requests on the ATAPI interface used to be sync blk requests.
> >>> This has to siginificant drawbacks. First the main loop hangs util an
> >>> I/O request is completed and secondly if the I/O request does not
> >>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> >>>
> >>> Signed-off-by: Peter Lieven <pl@kamp.de>
> >>> ---
> >>>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
> >>>  1 file changed, 43 insertions(+), 26 deletions(-)
> >>>
> >>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> >>> index 747f466..9257e1c 100644
> >>> --- a/hw/ide/atapi.c
> >>> +++ b/hw/ide/atapi.c
> >>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> >>>      memset(buf, 0, 288);
> >>>  }
> >>>  
> >>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
> >>> +static void cd_read_sector_cb(void *opaque, int ret)
> >>>  {
> >>> -    int ret;
> >>> +    IDEState *s = opaque;
> >>>  
> >>> -    switch(sector_size) {
> >>> -    case 2048:
> >>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> >>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> -        break;
> >>> -    case 2352:
> >>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> >>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> -        if (ret < 0)
> >>> -            return ret;
> >>> -        cd_data_to_raw(buf, lba);
> >>> -        break;
> >>> -    default:
> >>> -        ret = -EIO;
> >>> -        break;
> >>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> +
> >>> +    if (ret < 0) {
> >>> +        ide_atapi_io_error(s, ret);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    if (s->cd_sector_size == 2352) {
> >>> +        cd_data_to_raw(s->io_buffer, s->lba);
> >>>      }
> >>> -    return ret;
> >>> +
> >>> +    s->lba++;
> >>> +    s->io_buffer_index = 0;
> >>> +    s->status &= ~BUSY_STAT;
> >>> +
> >>> +    ide_atapi_cmd_reply_end(s);
> >>> +}
> >>> +
> >>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> >>> +{
> >>> +    if (sector_size != 2048 && sector_size != 2352) {
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +    s->iov.iov_base = buf;
> >>> +    if (sector_size == 2352) {
> >>> +        buf += 4;
> >>> +    }
> > 
> > This doesn't look quite right, buf is never read after this.
> > 
> > Also, why +=4 when it was originally buf + 16?
> > 
> >>> +
> >>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> >>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> >>> +
> >>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> >>> +                      cd_read_sector_cb, s) == NULL) {
> >>> +        return -EIO;
> >>> +    }
> >>> +
> >>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> +    s->status |= BUSY_STAT;
> >>> +    return 0;
> >>>  }
> >>>  
> >>
> >> We discussed this off-list a bit, but for upstream synchronization:
> >>
> >> Unfortunately, I believe making cd_read_sector here non-blocking makes
> >> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> >> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> >> are not prepared to cope with.
> > 
> > I don't think that's a problem as long as BSY is set while the
> > asynchronous command is running and DRQ is cleared. The latter will
> > protect ide_data_readw(). ide_sector_read() does essentially the same
> > thing.
> > 
> > Or maybe I'm just missing what you're trying to say.
> > 
> 
> It will be correct from a code standpoint, but I don't think the guest
> *expects* DRQ to become re-set before byte_count_limit is exhausted.

Oh, I misunderstood what you're after. Yes, I think you're right. The
guest most probably uses string I/O instructions like 'rep insw' in
order to transfer the whole block, i.e. it doesn't even check the status
register in between and will simply transfer invalid data (zeros) while
DRQ isn't set.

> In the synchronous version of the code, DRQ flickers while we rebuffer
> s->io_buffer, but since it's synchronous, the guest *never sees this*.

Thanks, that the current code would be wrong if it weren't synchronous
is the part I missed.

> The guest does not necessarily have any reason or motivation to check if
> DRQ is still set after 2048 bytes -- is that recommended in the spec?
> 
> ("Warning! The drive may decide to rebuffer arbitrarily while you read,
> so check if DRQ is still set before each read to the data register!" ??)

No, it's not recommended. PIO performance is already bad enough without
it. :-)

If you want to check this, have a look at the state machines described in
APT. In my (probably outdated) version it's chapter 9.8 "PACKET command
protocol".

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-06  8:46     ` Peter Lieven
  2015-10-06 12:08       ` Peter Lieven
@ 2015-10-07 16:42       ` John Snow
  2015-10-07 18:53         ` Peter Lieven
  2015-10-08 12:06         ` Peter Lieven
  1 sibling, 2 replies; 32+ messages in thread
From: John Snow @ 2015-10-07 16:42 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 10/06/2015 04:46 AM, Peter Lieven wrote:
> Am 05.10.2015 um 23:15 schrieb John Snow:
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has to siginificant drawbacks. First the main loop hangs util an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>>   hw/ide/atapi.c | 69
>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>       memset(buf, 0, 288);
>>>   }
>>>   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>> sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>   {
>>> -    int ret;
>>> +    IDEState *s = opaque;
>>>   -    switch(sector_size) {
>>> -    case 2048:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        break;
>>> -    case 2352:
>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -        if (ret < 0)
>>> -            return ret;
>>> -        cd_data_to_raw(buf, lba);
>>> -        break;
>>> -    default:
>>> -        ret = -EIO;
>>> -        break;
>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +    if (ret < 0) {
>>> +        ide_atapi_io_error(s, ret);
>>> +        return;
>>> +    }
>>> +
>>> +    if (s->cd_sector_size == 2352) {
>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>       }
>>> -    return ret;
>>> +
>>> +    s->lba++;
>>> +    s->io_buffer_index = 0;
>>> +    s->status &= ~BUSY_STAT;
>>> +
>>> +    ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>> sector_size)
>>> +{
>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    s->iov.iov_base = buf;
>>> +    if (sector_size == 2352) {
>>> +        buf += 4;
>>> +    }
>>> +
>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +                      cd_read_sector_cb, s) == NULL) {
>>> +        return -EIO;
>>> +    }
>>> +
>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +    s->status |= BUSY_STAT;
>>> +    return 0;
>>>   }
>>>   
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
>>
>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> 
> Hi John,
> 
> first of all thank you for the detailed analysis.
> 
> Is the following what you have i mind. For PIO reads > 1 sector
> it is a great improvement for the NFS backend:
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index ab45495..ec2ba89 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
>      }
> 
>      if (s->cd_sector_size == 2352) {
> -        cd_data_to_raw(s->io_buffer, s->lba);
> +        int nb_sectors = s->packet_transfer_size / 2352;
> +        while (nb_sectors--) {
> +            memmove(s->io_buffer + nb_sectors * 2352 + 4,
> +                    s->io_buffer + nb_sectors * 2048, 2048);
> +            cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
> +                           s->lba + nb_sectors);
> +        }
>      }

Is this going to be correct in cases where the number of sectors we are
copying is less than the total request size? We might need to bookmark
how many sectors/bytes we're copying this go-around. Perhaps by looking
at lcyl/hcyl.

> 
> -    s->lba++;
> +    s->lba = -1;
>      s->io_buffer_index = 0;
>      s->status &= ~BUSY_STAT;
> 
>      ide_atapi_cmd_reply_end(s);
>  }
> 

Well, I might not name it cd_read_sector and cd_read_sector_cb anymore.
Perhaps cd_read_sectors[_cb].

> -static int cd_read_sector(IDEState *s, int lba, void *buf, int
> sector_size)
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
> sector_size, int nb_sectors)
>  {
>      if (sector_size != 2048 && sector_size != 2352) {
>          return -EINVAL;
>      }
> 
>      s->iov.iov_base = buf;
> -    if (sector_size == 2352) {
> -        buf += 4;
> -    }
> -
> -    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +    s->iov.iov_len = nb_sectors * 2048;
>      qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> 
> -    if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
> -                      cd_read_sector_cb, s) == NULL) {
> +    if (ide_readv_cancelable(s, (int64_t)lba << 2,
> +                             &s->qiov, nb_sectors * 4,
> +                             cd_read_sector_cb, s) == NULL) {
>          return -EIO;
>      }
> 
>      block_acct_start(blk_get_stats(s->blk), &s->acct,
> -                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +                     nb_sectors * 2048, BLOCK_ACCT_READ);
>      s->status |= BUSY_STAT;
>      return 0;
>  }
> @@ -190,7 +193,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
>  /* The whole ATAPI transfer logic is handled in this function */
>  void ide_atapi_cmd_reply_end(IDEState *s)
>  {
> -    int byte_count_limit, size, ret;
> +    int byte_count_limit, size;
>  #ifdef DEBUG_IDE_ATAPI
>      printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
>             s->packet_transfer_size,
> @@ -205,14 +208,6 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>          printf("status=0x%x\n", s->status);
>  #endif
>      } else {
> -        /* see if a new sector must be read */
> -        if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
> -            ret = cd_read_sector(s, s->lba, s->io_buffer,
> s->cd_sector_size);
> -            if (ret < 0) {
> -                ide_atapi_io_error(s, ret);
> -            }
> -            return;
> -        }
>          if (s->elementary_transfer_size > 0) {
>              /* there are some data left to transmit in this elementary
>                 transfer */
> @@ -287,13 +282,18 @@ static void ide_atapi_cmd_reply(IDEState *s, int
> size, int max_size)
>  static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>                                     int sector_size)
>  {
> +    int ret;
>      s->lba = lba;
>      s->packet_transfer_size = nb_sectors * sector_size;
> +    assert(s->packet_transfer_size <=
> +           IDE_DMA_BUF_SECTORS * BDRV_SECTOR_SIZE + 4);
>      s->elementary_transfer_size = 0;
> -    s->io_buffer_index = sector_size;
>      s->cd_sector_size = sector_size;
> -
> -    ide_atapi_cmd_reply_end(s);
> +    ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size,
> +                         nb_sectors);
> +    if (ret < 0) {
> +        ide_atapi_io_error(s, ret);
> +    }
>  }
> 
>  static void ide_atapi_cmd_check_status(IDEState *s)
> 
> 
> Did you also have a look at the other patches?
> 
> Thanks,
> Peter

On my queue; hopefully Stefan can take a peek too, but I'll try to
review the IDE-specific bits. I imagine you want to wait to respin until
we've looked at all the patches, that's fine -- I'll try not to keep you
waiting for much longer.

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-07 16:42       ` John Snow
@ 2015-10-07 18:53         ` Peter Lieven
  2015-10-08 12:06         ` Peter Lieven
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-10-07 18:53 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody

Am 07.10.2015 um 18:42 schrieb John Snow:
>
> On 10/06/2015 04:46 AM, Peter Lieven wrote:
>> Am 05.10.2015 um 23:15 schrieb John Snow:
>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>> This has to siginificant drawbacks. First the main loop hangs util an
>>>> I/O request is completed and secondly if the I/O request does not
>>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>>   hw/ide/atapi.c | 69
>>>> ++++++++++++++++++++++++++++++++++++----------------------
>>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>> index 747f466..9257e1c 100644
>>>> --- a/hw/ide/atapi.c
>>>> +++ b/hw/ide/atapi.c
>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>       memset(buf, 0, 288);
>>>>   }
>>>>   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>>> sector_size)
>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>   {
>>>> -    int ret;
>>>> +    IDEState *s = opaque;
>>>>   -    switch(sector_size) {
>>>> -    case 2048:
>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> -        break;
>>>> -    case 2352:
>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> -        if (ret < 0)
>>>> -            return ret;
>>>> -        cd_data_to_raw(buf, lba);
>>>> -        break;
>>>> -    default:
>>>> -        ret = -EIO;
>>>> -        break;
>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> +
>>>> +    if (ret < 0) {
>>>> +        ide_atapi_io_error(s, ret);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (s->cd_sector_size == 2352) {
>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>       }
>>>> -    return ret;
>>>> +
>>>> +    s->lba++;
>>>> +    s->io_buffer_index = 0;
>>>> +    s->status &= ~BUSY_STAT;
>>>> +
>>>> +    ide_atapi_cmd_reply_end(s);
>>>> +}
>>>> +
>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>>> sector_size)
>>>> +{
>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    s->iov.iov_base = buf;
>>>> +    if (sector_size == 2352) {
>>>> +        buf += 4;
>>>> +    }
>>>> +
>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>> +
>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>> +        return -EIO;
>>>> +    }
>>>> +
>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>> +    s->status |= BUSY_STAT;
>>>> +    return 0;
>>>>   }
>>>>   
>>> We discussed this off-list a bit, but for upstream synchronization:
>>>
>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>> are not prepared to cope with.
>>>
>>> My suggestion is to buffer an entire DRQ block of data at once
>>> (byte_count_limit) to avoid the problem.
>> Hi John,
>>
>> first of all thank you for the detailed analysis.
>>
>> Is the following what you have i mind. For PIO reads > 1 sector
>> it is a great improvement for the NFS backend:
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index ab45495..ec2ba89 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
>>      }
>>
>>      if (s->cd_sector_size == 2352) {
>> -        cd_data_to_raw(s->io_buffer, s->lba);
>> +        int nb_sectors = s->packet_transfer_size / 2352;
>> +        while (nb_sectors--) {
>> +            memmove(s->io_buffer + nb_sectors * 2352 + 4,
>> +                    s->io_buffer + nb_sectors * 2048, 2048);
>> +            cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
>> +                           s->lba + nb_sectors);
>> +        }
>>      }
> Is this going to be correct in cases where the number of sectors we are
> copying is less than the total request size? We might need to bookmark
> how many sectors/bytes we're copying this go-around. Perhaps by looking
> at lcyl/hcyl.

It needs some adjustments, like the whole copying logic. We need
a read and a write offset.

And, of course, it should read +16 and not +4 in the memmove
line as Kevin pointed out.

My current plan is to rebuffer if not all data has been read from
the block layer and we have less than 0xfffe unsend bytes in the buffer.

I would then move the unsend bytes to the front of the io_buffer and
append more data.

In any case it would be good to have a test for such a big transfer
in ide_test. With byte limits that devide the sector size and also not.

>
>> -    s->lba++;
>> +    s->lba = -1;
>>      s->io_buffer_index = 0;
>>      s->status &= ~BUSY_STAT;
>>
>>      ide_atapi_cmd_reply_end(s);
>>  }
>>
> Well, I might not name it cd_read_sector and cd_read_sector_cb anymore.
> Perhaps cd_read_sectors[_cb].

Sure, we could also add a pio_ präfix. Since DMA uses its
own functions.

>
>> -static int cd_read_sector(IDEState *s, int lba, void *buf, int
>> sector_size)
>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>> sector_size, int nb_sectors)
>>  {
>>      if (sector_size != 2048 && sector_size != 2352) {
>>          return -EINVAL;
>>      }
>>
>>      s->iov.iov_base = buf;
>> -    if (sector_size == 2352) {
>> -        buf += 4;
>> -    }
>> -
>> -    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> +    s->iov.iov_len = nb_sectors * 2048;
>>      qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>
>> -    if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
>> -                      cd_read_sector_cb, s) == NULL) {
>> +    if (ide_readv_cancelable(s, (int64_t)lba << 2,
>> +                             &s->qiov, nb_sectors * 4,
>> +                             cd_read_sector_cb, s) == NULL) {
>>          return -EIO;
>>      }
>>
>>      block_acct_start(blk_get_stats(s->blk), &s->acct,
>> -                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> +                     nb_sectors * 2048, BLOCK_ACCT_READ);
>>      s->status |= BUSY_STAT;
>>      return 0;
>>  }
>> @@ -190,7 +193,7 @@ void ide_atapi_io_error(IDEState *s, int ret)
>>  /* The whole ATAPI transfer logic is handled in this function */
>>  void ide_atapi_cmd_reply_end(IDEState *s)
>>  {
>> -    int byte_count_limit, size, ret;
>> +    int byte_count_limit, size;
>>  #ifdef DEBUG_IDE_ATAPI
>>      printf("reply: tx_size=%d elem_tx_size=%d index=%d\n",
>>             s->packet_transfer_size,
>> @@ -205,14 +208,6 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>>          printf("status=0x%x\n", s->status);
>>  #endif
>>      } else {
>> -        /* see if a new sector must be read */
>> -        if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
>> -            ret = cd_read_sector(s, s->lba, s->io_buffer,
>> s->cd_sector_size);
>> -            if (ret < 0) {
>> -                ide_atapi_io_error(s, ret);
>> -            }
>> -            return;
>> -        }
>>          if (s->elementary_transfer_size > 0) {
>>              /* there are some data left to transmit in this elementary
>>                 transfer */
>> @@ -287,13 +282,18 @@ static void ide_atapi_cmd_reply(IDEState *s, int
>> size, int max_size)
>>  static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
>>                                     int sector_size)
>>  {
>> +    int ret;
>>      s->lba = lba;
>>      s->packet_transfer_size = nb_sectors * sector_size;
>> +    assert(s->packet_transfer_size <=
>> +           IDE_DMA_BUF_SECTORS * BDRV_SECTOR_SIZE + 4);
>>      s->elementary_transfer_size = 0;
>> -    s->io_buffer_index = sector_size;
>>      s->cd_sector_size = sector_size;
>> -
>> -    ide_atapi_cmd_reply_end(s);
>> +    ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size,
>> +                         nb_sectors);
>> +    if (ret < 0) {
>> +        ide_atapi_io_error(s, ret);
>> +    }
>>  }
>>
>>  static void ide_atapi_cmd_check_status(IDEState *s)
>>
>>
>> Did you also have a look at the other patches?
>>
>> Thanks,
>> Peter
> On my queue; hopefully Stefan can take a peek too, but I'll try to
> review the IDE-specific bits. I imagine you want to wait to respin until
> we've looked at all the patches, that's fine -- I'll try not to keep you
> waiting for much longer.

Thanks,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-07 16:42       ` John Snow
  2015-10-07 18:53         ` Peter Lieven
@ 2015-10-08 12:06         ` Peter Lieven
  2015-10-08 16:44           ` John Snow
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Lieven @ 2015-10-08 12:06 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody

Hi all,

short summary from my side. The whole thing seems to get complicated, let me explain why:

1) During review I found that the code in ide_atapi_cmd_reply_end can't work correctly if the
byte_count_limit is not a divider or a multiple of cd_sector_size. The reason is that as soon
as we load the next sector we start at io_buffer offset 0 overwriting whatever is left in there
for transfer. We also reset the io_buffer_index to 0 which means if we continue with the
elementary transfer we always transfer a whole sector (of corrupt data) regardless if we
are allowed to transfer that much data. Before we consider fixing this I wonder if it
is legal at all to have an unaligned byte_count_limit. It obviously has never caused trouble in
practice so maybe its not happening in real life.

2) I found that whatever cool optimization I put in to buffer multiple sectors at once I end
up with code that breaks migration because older versions would either not fill the io_buffer
as expected or we introduce variables that older versions do not understand. This will
lead to problems if we migrate in the middle of a transfer.

3) My current plan to get this patch to a useful state would be to use my initial patch and just
change the code to use a sync request if we need to buffer additional sectors in an elementary
transfer. I found that in real world operating systems the byte_count_limit seems to be equal to
the cd_sector_size. After all its just a PIO transfer an operating system will likely switch to DMA
as soon as the kernel ist loaded.

Thanks,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-08 12:06         ` Peter Lieven
@ 2015-10-08 16:44           ` John Snow
  2015-10-09  8:21             ` Kevin Wolf
  2015-10-14 18:19             ` Peter Lieven
  0 siblings, 2 replies; 32+ messages in thread
From: John Snow @ 2015-10-08 16:44 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 10/08/2015 08:06 AM, Peter Lieven wrote:
> Hi all,
> 
> short summary from my side. The whole thing seems to get complicated,
> let me explain why:
> 
> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
> work correctly if the
> byte_count_limit is not a divider or a multiple of cd_sector_size. The
> reason is that as soon
> as we load the next sector we start at io_buffer offset 0 overwriting
> whatever is left in there
> for transfer. We also reset the io_buffer_index to 0 which means if we
> continue with the
> elementary transfer we always transfer a whole sector (of corrupt data)
> regardless if we
> are allowed to transfer that much data. Before we consider fixing this I
> wonder if it
> is legal at all to have an unaligned byte_count_limit. It obviously has
> never caused trouble in
> practice so maybe its not happening in real life.
> 

I had overlooked that part. Good catch. I do suspect that in practice
nobody will be asking for bizarre values.

There's no rule against an unaligned byte_count_limit as far as I have
read, but suspect nobody would have a reason to use it in practice.

> 2) I found that whatever cool optimization I put in to buffer multiple
> sectors at once I end
> up with code that breaks migration because older versions would either
> not fill the io_buffer
> as expected or we introduce variables that older versions do not
> understand. This will
> lead to problems if we migrate in the middle of a transfer.
> 

Ech. This sounds like a bit of a problem. I'll need to think about this
one...

> 3) My current plan to get this patch to a useful state would be to use
> my initial patch and just
> change the code to use a sync request if we need to buffer additional
> sectors in an elementary
> transfer. I found that in real world operating systems the
> byte_count_limit seems to be equal to
> the cd_sector_size. After all its just a PIO transfer an operating
> system will likely switch to DMA
> as soon as the kernel ist loaded.
> 
> Thanks,
> Peter
> 

It sounds like that might be "good enough" for now, and won't make
behavior *worse* than it currently is. You can adjust the test I had
checked in to not use a "tricky" value and we can amend support for this
later if desired.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-08 16:44           ` John Snow
@ 2015-10-09  8:21             ` Kevin Wolf
  2015-10-09 11:18               ` Peter Lieven
  2015-10-09 16:32               ` John Snow
  2015-10-14 18:19             ` Peter Lieven
  1 sibling, 2 replies; 32+ messages in thread
From: Kevin Wolf @ 2015-10-09  8:21 UTC (permalink / raw)
  To: John Snow; +Cc: stefanha, jcody, Peter Lieven, qemu-devel, qemu-block

Am 08.10.2015 um 18:44 hat John Snow geschrieben:
> On 10/08/2015 08:06 AM, Peter Lieven wrote:
> > Hi all,
> > 
> > short summary from my side. The whole thing seems to get complicated,
> > let me explain why:
> > 
> > 1) During review I found that the code in ide_atapi_cmd_reply_end can't
> > work correctly if the
> > byte_count_limit is not a divider or a multiple of cd_sector_size. The
> > reason is that as soon
> > as we load the next sector we start at io_buffer offset 0 overwriting
> > whatever is left in there
> > for transfer. We also reset the io_buffer_index to 0 which means if we
> > continue with the
> > elementary transfer we always transfer a whole sector (of corrupt data)
> > regardless if we
> > are allowed to transfer that much data. Before we consider fixing this I
> > wonder if it
> > is legal at all to have an unaligned byte_count_limit. It obviously has
> > never caused trouble in
> > practice so maybe its not happening in real life.
> > 
> 
> I had overlooked that part. Good catch. I do suspect that in practice
> nobody will be asking for bizarre values.
> 
> There's no rule against an unaligned byte_count_limit as far as I have
> read, but suspect nobody would have a reason to use it in practice.

If we know that our code is technically wrong, "nobody uses it" is not a
good reason not to fix it. Because sooner or later someone will use it.

> > 2) I found that whatever cool optimization I put in to buffer multiple
> > sectors at once I end
> > up with code that breaks migration because older versions would either
> > not fill the io_buffer
> > as expected or we introduce variables that older versions do not
> > understand. This will
> > lead to problems if we migrate in the middle of a transfer.
> > 
> 
> Ech. This sounds like a bit of a problem. I'll need to think about this
> one...

Sounds like a textbook example for subsections to me?

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-09  8:21             ` Kevin Wolf
@ 2015-10-09 11:18               ` Peter Lieven
  2015-10-09 16:32               ` John Snow
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-10-09 11:18 UTC (permalink / raw)
  To: Kevin Wolf, John Snow; +Cc: stefanha, jcody, qemu-devel, qemu-block

Am 09.10.2015 um 10:21 schrieb Kevin Wolf:
> Am 08.10.2015 um 18:44 hat John Snow geschrieben:
>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>>> Hi all,
>>>
>>> short summary from my side. The whole thing seems to get complicated,
>>> let me explain why:
>>>
>>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>>> work correctly if the
>>> byte_count_limit is not a divider or a multiple of cd_sector_size. The
>>> reason is that as soon
>>> as we load the next sector we start at io_buffer offset 0 overwriting
>>> whatever is left in there
>>> for transfer. We also reset the io_buffer_index to 0 which means if we
>>> continue with the
>>> elementary transfer we always transfer a whole sector (of corrupt data)
>>> regardless if we
>>> are allowed to transfer that much data. Before we consider fixing this I
>>> wonder if it
>>> is legal at all to have an unaligned byte_count_limit. It obviously has
>>> never caused trouble in
>>> practice so maybe its not happening in real life.
>>>
>> I had overlooked that part. Good catch. I do suspect that in practice
>> nobody will be asking for bizarre values.
>>
>> There's no rule against an unaligned byte_count_limit as far as I have
>> read, but suspect nobody would have a reason to use it in practice.
> If we know that our code is technically wrong, "nobody uses it" is not a
> good reason not to fix it. Because sooner or later someone will use it.

I found that I just misinterpreted the code. I think its correct altough
its not nice.

>
>>> 2) I found that whatever cool optimization I put in to buffer multiple
>>> sectors at once I end
>>> up with code that breaks migration because older versions would either
>>> not fill the io_buffer
>>> as expected or we introduce variables that older versions do not
>>> understand. This will
>>> lead to problems if we migrate in the middle of a transfer.
>>>
>> Ech. This sounds like a bit of a problem. I'll need to think about this
>> one...
> Sounds like a textbook example for subsections to me?

I wonder if we need this at all. Its just PIO. However, Windows and Linux
fall back to PIO if I disconnect the NFS Server for some time. With the
latest version of my patch series this works fine now and even works
fine after restoring NFS connectivy. This was not properly working
because of the glitch that John found.

I have an optimization in mind that might work and will disable the need
for the sync requests while keeping migration possible.
I hope to be able to complete this by Monday and send out a V2 then.

If someone wants to have a look at what I currently have (especially
the cancelable requests part) its here:
https://github.com/plieven/qemu/commits/atapi_async_V2

Thanks,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-09  8:21             ` Kevin Wolf
  2015-10-09 11:18               ` Peter Lieven
@ 2015-10-09 16:32               ` John Snow
  1 sibling, 0 replies; 32+ messages in thread
From: John Snow @ 2015-10-09 16:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, jcody, Peter Lieven, qemu-devel, qemu-block



On 10/09/2015 04:21 AM, Kevin Wolf wrote:
> Am 08.10.2015 um 18:44 hat John Snow geschrieben:
>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>>> Hi all,
>>>
>>> short summary from my side. The whole thing seems to get complicated,
>>> let me explain why:
>>>
>>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>>> work correctly if the
>>> byte_count_limit is not a divider or a multiple of cd_sector_size. The
>>> reason is that as soon
>>> as we load the next sector we start at io_buffer offset 0 overwriting
>>> whatever is left in there
>>> for transfer. We also reset the io_buffer_index to 0 which means if we
>>> continue with the
>>> elementary transfer we always transfer a whole sector (of corrupt data)
>>> regardless if we
>>> are allowed to transfer that much data. Before we consider fixing this I
>>> wonder if it
>>> is legal at all to have an unaligned byte_count_limit. It obviously has
>>> never caused trouble in
>>> practice so maybe its not happening in real life.
>>>
>>
>> I had overlooked that part. Good catch. I do suspect that in practice
>> nobody will be asking for bizarre values.
>>
>> There's no rule against an unaligned byte_count_limit as far as I have
>> read, but suspect nobody would have a reason to use it in practice.
> 
> If we know that our code is technically wrong, "nobody uses it" is not a
> good reason not to fix it. Because sooner or later someone will use it.
> 

Not arguing against fixing it, just speaking to priorities and not
making it worse.

If it were up to me I'd spent the next three months obsessively making
the AHCI emulation spec-perfect, but suspect I'd fix close to zero bugs
actually being observed in the real world.

You can always trust that I want to fix every bug no matter how trivial
or untriggerable in the field. :)

>>> 2) I found that whatever cool optimization I put in to buffer multiple
>>> sectors at once I end
>>> up with code that breaks migration because older versions would either
>>> not fill the io_buffer
>>> as expected or we introduce variables that older versions do not
>>> understand. This will
>>> lead to problems if we migrate in the middle of a transfer.
>>>
>>
>> Ech. This sounds like a bit of a problem. I'll need to think about this
>> one...
> 
> Sounds like a textbook example for subsections to me?
> 
> Kevin
> 

Haven't used them before, personally -- I'll take a look.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-08 16:44           ` John Snow
  2015-10-09  8:21             ` Kevin Wolf
@ 2015-10-14 18:19             ` Peter Lieven
  2015-10-14 18:21               ` John Snow
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Lieven @ 2015-10-14 18:19 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody

Am 08.10.2015 um 18:44 schrieb John Snow:
>
> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>> Hi all,
>>
>> short summary from my side. The whole thing seems to get complicated,
>> let me explain why:
>>
>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>> work correctly if the
>> byte_count_limit is not a divider or a multiple of cd_sector_size. The
>> reason is that as soon
>> as we load the next sector we start at io_buffer offset 0 overwriting
>> whatever is left in there
>> for transfer. We also reset the io_buffer_index to 0 which means if we
>> continue with the
>> elementary transfer we always transfer a whole sector (of corrupt data)
>> regardless if we
>> are allowed to transfer that much data. Before we consider fixing this I
>> wonder if it
>> is legal at all to have an unaligned byte_count_limit. It obviously has
>> never caused trouble in
>> practice so maybe its not happening in real life.
>>
> I had overlooked that part. Good catch. I do suspect that in practice
> nobody will be asking for bizarre values.
>
> There's no rule against an unaligned byte_count_limit as far as I have
> read, but suspect nobody would have a reason to use it in practice.
>
>> 2) I found that whatever cool optimization I put in to buffer multiple
>> sectors at once I end
>> up with code that breaks migration because older versions would either
>> not fill the io_buffer
>> as expected or we introduce variables that older versions do not
>> understand. This will
>> lead to problems if we migrate in the middle of a transfer.
>>
> Ech. This sounds like a bit of a problem. I'll need to think about this
> one...
>
>> 3) My current plan to get this patch to a useful state would be to use
>> my initial patch and just
>> change the code to use a sync request if we need to buffer additional
>> sectors in an elementary
>> transfer. I found that in real world operating systems the
>> byte_count_limit seems to be equal to
>> the cd_sector_size. After all its just a PIO transfer an operating
>> system will likely switch to DMA
>> as soon as the kernel ist loaded.
>>
>> Thanks,
>> Peter
>>
> It sounds like that might be "good enough" for now, and won't make
> behavior *worse* than it currently is. You can adjust the test I had
> checked in to not use a "tricky" value and we can amend support for this
> later if desired.

Have you had a chance to look at the series with the "good enough" fix?

Thanks,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-14 18:19             ` Peter Lieven
@ 2015-10-14 18:21               ` John Snow
  2015-10-16 10:56                 ` Peter Lieven
  0 siblings, 1 reply; 32+ messages in thread
From: John Snow @ 2015-10-14 18:21 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody



On 10/14/2015 02:19 PM, Peter Lieven wrote:
> Am 08.10.2015 um 18:44 schrieb John Snow:
>>
>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>>> Hi all,
>>>
>>> short summary from my side. The whole thing seems to get complicated,
>>> let me explain why:
>>>
>>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>>> work correctly if the
>>> byte_count_limit is not a divider or a multiple of cd_sector_size. The
>>> reason is that as soon
>>> as we load the next sector we start at io_buffer offset 0 overwriting
>>> whatever is left in there
>>> for transfer. We also reset the io_buffer_index to 0 which means if we
>>> continue with the
>>> elementary transfer we always transfer a whole sector (of corrupt data)
>>> regardless if we
>>> are allowed to transfer that much data. Before we consider fixing this I
>>> wonder if it
>>> is legal at all to have an unaligned byte_count_limit. It obviously has
>>> never caused trouble in
>>> practice so maybe its not happening in real life.
>>>
>> I had overlooked that part. Good catch. I do suspect that in practice
>> nobody will be asking for bizarre values.
>>
>> There's no rule against an unaligned byte_count_limit as far as I have
>> read, but suspect nobody would have a reason to use it in practice.
>>
>>> 2) I found that whatever cool optimization I put in to buffer multiple
>>> sectors at once I end
>>> up with code that breaks migration because older versions would either
>>> not fill the io_buffer
>>> as expected or we introduce variables that older versions do not
>>> understand. This will
>>> lead to problems if we migrate in the middle of a transfer.
>>>
>> Ech. This sounds like a bit of a problem. I'll need to think about this
>> one...
>>
>>> 3) My current plan to get this patch to a useful state would be to use
>>> my initial patch and just
>>> change the code to use a sync request if we need to buffer additional
>>> sectors in an elementary
>>> transfer. I found that in real world operating systems the
>>> byte_count_limit seems to be equal to
>>> the cd_sector_size. After all its just a PIO transfer an operating
>>> system will likely switch to DMA
>>> as soon as the kernel ist loaded.
>>>
>>> Thanks,
>>> Peter
>>>
>> It sounds like that might be "good enough" for now, and won't make
>> behavior *worse* than it currently is. You can adjust the test I had
>> checked in to not use a "tricky" value and we can amend support for this
>> later if desired.
> 
> Have you had a chance to look at the series with the "good enough" fix?
> 
> Thanks,
> Peter
> 

Will do so Friday, thanks!

--js

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
  2015-10-14 18:21               ` John Snow
@ 2015-10-16 10:56                 ` Peter Lieven
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Lieven @ 2015-10-16 10:56 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, stefanha, jcody

Am 14.10.2015 um 20:21 schrieb John Snow:
>
> On 10/14/2015 02:19 PM, Peter Lieven wrote:
>> Am 08.10.2015 um 18:44 schrieb John Snow:
>>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>>>> Hi all,
>>>>
>>>> short summary from my side. The whole thing seems to get complicated,
>>>> let me explain why:
>>>>
>>>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>>>> work correctly if the
>>>> byte_count_limit is not a divider or a multiple of cd_sector_size. The
>>>> reason is that as soon
>>>> as we load the next sector we start at io_buffer offset 0 overwriting
>>>> whatever is left in there
>>>> for transfer. We also reset the io_buffer_index to 0 which means if we
>>>> continue with the
>>>> elementary transfer we always transfer a whole sector (of corrupt data)
>>>> regardless if we
>>>> are allowed to transfer that much data. Before we consider fixing this I
>>>> wonder if it
>>>> is legal at all to have an unaligned byte_count_limit. It obviously has
>>>> never caused trouble in
>>>> practice so maybe its not happening in real life.
>>>>
>>> I had overlooked that part. Good catch. I do suspect that in practice
>>> nobody will be asking for bizarre values.
>>>
>>> There's no rule against an unaligned byte_count_limit as far as I have
>>> read, but suspect nobody would have a reason to use it in practice.
>>>
>>>> 2) I found that whatever cool optimization I put in to buffer multiple
>>>> sectors at once I end
>>>> up with code that breaks migration because older versions would either
>>>> not fill the io_buffer
>>>> as expected or we introduce variables that older versions do not
>>>> understand. This will
>>>> lead to problems if we migrate in the middle of a transfer.
>>>>
>>> Ech. This sounds like a bit of a problem. I'll need to think about this
>>> one...
>>>
>>>> 3) My current plan to get this patch to a useful state would be to use
>>>> my initial patch and just
>>>> change the code to use a sync request if we need to buffer additional
>>>> sectors in an elementary
>>>> transfer. I found that in real world operating systems the
>>>> byte_count_limit seems to be equal to
>>>> the cd_sector_size. After all its just a PIO transfer an operating
>>>> system will likely switch to DMA
>>>> as soon as the kernel ist loaded.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>> It sounds like that might be "good enough" for now, and won't make
>>> behavior *worse* than it currently is. You can adjust the test I had
>>> checked in to not use a "tricky" value and we can amend support for this
>>> later if desired.
>> Have you had a chance to look at the series with the "good enough" fix?
>>
>> Thanks,
>> Peter
>>
> Will do so Friday, thanks!

Thank you,
Peter

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-10-16 10:56 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-21 12:25 [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-09-21 12:25 ` [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async Peter Lieven
2015-10-02 21:02   ` John Snow
2015-10-05 21:15   ` John Snow
2015-10-06  8:46     ` Peter Lieven
2015-10-06 12:08       ` Peter Lieven
2015-10-07 16:42       ` John Snow
2015-10-07 18:53         ` Peter Lieven
2015-10-08 12:06         ` Peter Lieven
2015-10-08 16:44           ` John Snow
2015-10-09  8:21             ` Kevin Wolf
2015-10-09 11:18               ` Peter Lieven
2015-10-09 16:32               ` John Snow
2015-10-14 18:19             ` Peter Lieven
2015-10-14 18:21               ` John Snow
2015-10-16 10:56                 ` Peter Lieven
2015-10-06  8:57     ` Kevin Wolf
2015-10-06  9:20       ` Peter Lieven
2015-10-06 17:07         ` John Snow
2015-10-06 17:12           ` Peter Lieven
2015-10-06 17:56             ` John Snow
2015-10-06 18:31               ` Peter Lieven
2015-10-06 18:34                 ` John Snow
2015-10-06 15:54       ` John Snow
2015-10-07  7:28         ` Kevin Wolf
2015-10-06 13:05   ` Laszlo Ersek
2015-09-21 12:25 ` [Qemu-devel] [PATCH 2/5] ide/atapi: blk_aio_readv may return NULL Peter Lieven
2015-09-21 12:25 ` [Qemu-devel] [PATCH 3/5] ide: add support for cancelable read requests Peter Lieven
2015-09-21 12:25 ` [Qemu-devel] [PATCH 4/5] ide/atapi: enable cancelable requests Peter Lieven
2015-09-21 12:25 ` [Qemu-devel] [PATCH 5/5] block/nfs: cache allocated filesize for read-only files Peter Lieven
2015-09-21 20:58 ` [Qemu-devel] [PATCH 0/5] ide: avoid main-loop hang on CDROM/NFS failure John Snow
2015-09-21 21:22   ` Peter Lieven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).