qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/3] better I/O accounting
@ 2011-08-11 23:44 Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 1/3] block: include flush requests in info blockstats Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-11 23:44 UTC (permalink / raw)
  To: qemu-devel




* [Qemu-devel] [PATCH 1/3] block: include flush requests in info blockstats
  2011-08-11 23:44 [Qemu-devel] [PATCH 0/3] better I/O accounting Christoph Hellwig
@ 2011-08-11 23:44 ` Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 2/3] block: explicit I/O accounting Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-11 23:44 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-08-11 12:32:35.842658196 +0200
+++ qemu/block.c	2011-08-11 16:04:48.302061893 +0200
@@ -1888,11 +1888,13 @@ static void bdrv_stats_iter(QObject *dat
                         " wr_bytes=%" PRId64
                         " rd_operations=%" PRId64
                         " wr_operations=%" PRId64
+                        " flush_operations=%" PRId64
                         "\n",
                         qdict_get_int(qdict, "rd_bytes"),
                         qdict_get_int(qdict, "wr_bytes"),
                         qdict_get_int(qdict, "rd_operations"),
-                        qdict_get_int(qdict, "wr_operations"));
+                        qdict_get_int(qdict, "wr_operations"),
+                        qdict_get_int(qdict, "flush_operations"));
 }
 
 void bdrv_stats_print(Monitor *mon, const QObject *data)
@@ -1910,12 +1912,16 @@ static QObject* bdrv_info_stats_bs(Block
                              "'wr_bytes': %" PRId64 ","
                              "'rd_operations': %" PRId64 ","
                              "'wr_operations': %" PRId64 ","
-                             "'wr_highest_offset': %" PRId64
+                             "'wr_highest_offset': %" PRId64 ","
+                             "'flush_operations': %" PRId64
                              "} }",
-                             bs->rd_bytes, bs->wr_bytes,
-                             bs->rd_ops, bs->wr_ops,
+                             bs->rd_bytes,
+                             bs->wr_bytes,
+                             bs->rd_ops,
+                             bs->wr_ops,
                              bs->wr_highest_sector *
-                             (uint64_t)BDRV_SECTOR_SIZE);
+                             (uint64_t)BDRV_SECTOR_SIZE,
+                             bs->flush_ops);
     dict  = qobject_to_qdict(res);
 
     if (*bs->device_name) {
@@ -2579,6 +2585,8 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDr
 
     trace_bdrv_aio_flush(bs, opaque);
 
+    bs->flush_ops++;
+
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
         return bdrv_aio_noop_em(bs, cb, opaque);
     }
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2011-08-11 12:32:35.852658142 +0200
+++ qemu/block_int.h	2011-08-11 16:04:48.302061893 +0200
@@ -188,6 +188,7 @@ struct BlockDriverState {
     uint64_t wr_bytes;
     uint64_t rd_ops;
     uint64_t wr_ops;
+    uint64_t flush_ops;
     uint64_t wr_highest_sector;
 
     /* Whether the disk can expand beyond total_sectors */
Index: qemu/qmp-commands.hx
===================================================================
--- qemu.orig/qmp-commands.hx	2011-08-11 12:32:36.062657004 +0200
+++ qemu/qmp-commands.hx	2011-08-11 16:04:48.305395208 +0200
@@ -1201,6 +1201,7 @@ Each json-object contain the following:
     - "wr_bytes": bytes written (json-int)
     - "rd_operations": read operations (json-int)
     - "wr_operations": write operations (json-int)
+    - "flush_operations": cache flush operations (json-int)
     - "wr_highest_offset": Highest offset of a sector written since the
                            BlockDriverState has been opened (json-int)
 - "parent": Contains recursively the statistics of the underlying
@@ -1222,6 +1223,7 @@ Example:
                   "wr_operations":751,
                   "rd_bytes":122567168,
                   "rd_operations":36772
+                  "flush_operations":61,
                }
             },
             "stats":{
@@ -1230,6 +1232,7 @@ Example:
                "wr_operations":692,
                "rd_bytes":122739200,
                "rd_operations":36604
+               "flush_operations":51,
             }
          },
          {
@@ -1240,6 +1243,7 @@ Example:
                "wr_operations":0,
                "rd_bytes":0,
                "rd_operations":0
+               "flush_operations":0,
             }
          },
          {
@@ -1250,6 +1254,7 @@ Example:
                "wr_operations":0,
                "rd_bytes":0,
                "rd_operations":0
+               "flush_operations":0,
             }
          },
          {
@@ -1260,6 +1265,7 @@ Example:
                "wr_operations":0,
                "rd_bytes":0,
                "rd_operations":0
+               "flush_operations":0,
             }
          }
       ]


* [Qemu-devel] [PATCH 2/3] block: explicit I/O accounting
  2011-08-11 23:44 [Qemu-devel] [PATCH 0/3] better I/O accounting Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 1/3] block: include flush requests in info blockstats Christoph Hellwig
@ 2011-08-11 23:44 ` Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 3/3] block: latency accounting Christoph Hellwig
  2011-08-12  5:14 ` [Qemu-devel] [PATCH 0/3] better I/O accounting Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-11 23:44 UTC (permalink / raw)
  To: qemu-devel

Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest-originating I/O
 - we do not double-count I/O ops if the device model handles a
   request chunk-wise
 - we only account I/O once it has actually completed
 - we can easily extend I/O accounting to synchronous or coroutine I/O
 - we can easily implement I/O latency tracking (see the next patch)

I've converted the existing device model callers to the new model;
device models that use synchronous I/O and weren't accounted for
before haven't been updated yet.  Also, scsi hasn't been converted
to end-to-end accounting, as I want to defer that until after the
pending scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-08-11 16:04:48.302061893 +0200
+++ qemu/block.c	2011-08-11 16:05:24.868530461 +0200
@@ -1915,13 +1915,13 @@ static QObject* bdrv_info_stats_bs(Block
                              "'wr_highest_offset': %" PRId64 ","
                              "'flush_operations': %" PRId64
                              "} }",
-                             bs->rd_bytes,
-                             bs->wr_bytes,
-                             bs->rd_ops,
-                             bs->wr_ops,
+                             bs->nr_bytes[BDRV_ACCT_READ],
+                             bs->nr_bytes[BDRV_ACCT_WRITE],
+                             bs->nr_ops[BDRV_ACCT_READ],
+                             bs->nr_ops[BDRV_ACCT_WRITE],
                              bs->wr_highest_sector *
                              (uint64_t)BDRV_SECTOR_SIZE,
-                             bs->flush_ops);
+                             bs->nr_ops[BDRV_ACCT_FLUSH]);
     dict  = qobject_to_qdict(res);
 
     if (*bs->device_name) {
@@ -2235,7 +2235,6 @@ char *bdrv_snapshot_dump(char *buf, int
     return buf;
 }
 
-
 /**************************************************************/
 /* async I/Os */
 
@@ -2244,7 +2243,6 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDr
                                  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
-    BlockDriverAIOCB *ret;
 
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
@@ -2253,16 +2251,7 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDr
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return NULL;
 
-    ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
-                              cb, opaque);
-
-    if (ret) {
-	/* Update stats even though technically transfer has not happened. */
-	bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-	bs->rd_ops ++;
-    }
-
-    return ret;
+    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, cb, opaque);
 }
 
 typedef struct BlockCompleteData {
@@ -2329,9 +2318,6 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockD
                                cb, opaque);
 
     if (ret) {
-        /* Update stats even though technically transfer has not happened. */
-        bs->wr_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-        bs->wr_ops ++;
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
@@ -2585,8 +2571,6 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDr
 
     trace_bdrv_aio_flush(bs, opaque);
 
-    bs->flush_ops++;
-
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
         return bdrv_aio_noop_em(bs, cb, opaque);
     }
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2011-08-11 16:04:48.302061893 +0200
+++ qemu/block_int.h	2011-08-11 16:05:24.868530461 +0200
@@ -148,6 +148,13 @@ struct BlockDriver {
     QLIST_ENTRY(BlockDriver) list;
 };
 
+enum BlockIOType {
+    BDRV_ACCT_READ,
+    BDRV_ACCT_WRITE,
+    BDRV_ACCT_FLUSH,
+    BDRV_MAX_IOTYPE,
+};
+
 struct BlockDriverState {
     int64_t total_sectors; /* if we are reading a disk image, give its
                               size in sectors */
@@ -184,11 +191,8 @@ struct BlockDriverState {
     void *sync_aiocb;
 
     /* I/O stats (display with "info blockstats"). */
-    uint64_t rd_bytes;
-    uint64_t wr_bytes;
-    uint64_t rd_ops;
-    uint64_t wr_ops;
-    uint64_t flush_ops;
+    uint64_t nr_bytes[BDRV_MAX_IOTYPE];
+    uint64_t nr_ops[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
     /* Whether the disk can expand beyond total_sectors */
@@ -212,6 +216,27 @@ struct BlockDriverState {
     void *private;
 };
 
+typedef struct BlockAcctCookie {
+    int64_t bytes;
+    enum BlockIOType type;
+} BlockAcctCookie;
+
+static inline void
+bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie, int64_t bytes,
+        enum BlockIOType type)
+{
+    cookie->bytes = bytes;
+    cookie->type = type;
+}
+
+static inline void
+bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie)
+{
+    bs->nr_bytes[cookie->type] += cookie->bytes;
+    bs->nr_ops[cookie->type]++;
+}
+
+
 #define CHANGE_MEDIA    0x01
 #define CHANGE_SIZE     0x02
 
Index: qemu/hw/ide/ahci.c
===================================================================
--- qemu.orig/hw/ide/ahci.c	2011-08-11 12:32:35.925991078 +0200
+++ qemu/hw/ide/ahci.c	2011-08-11 16:36:25.051786316 +0200
@@ -710,6 +710,7 @@ static void ncq_cb(void *opaque, int ret
     DPRINTF(ncq_tfs->drive->port_no, "NCQ transfer tag %d finished\n",
             ncq_tfs->tag);
 
+    bdrv_acct_done(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct);
     qemu_sglist_destroy(&ncq_tfs->sglist);
     ncq_tfs->used = 0;
 }
@@ -755,7 +756,11 @@ static void process_ncq_command(AHCIStat
                     ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
             ncq_tfs->is_read = 1;
 
-            DPRINTF(port, "tag %d aio read %ld\n", ncq_tfs->tag, ncq_tfs->lba);
+	    DPRINTF(port, "tag %d aio read %ld\n", ncq_tfs->tag, ncq_tfs->lba);
+
+            bdrv_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
+                            (ncq_tfs->sector_count-1) * BDRV_SECTOR_SIZE,
+                            BDRV_ACCT_READ);
             ncq_tfs->aiocb = dma_bdrv_read(ncq_tfs->drive->port.ifs[0].bs,
                                            &ncq_tfs->sglist, ncq_tfs->lba,
                                            ncq_cb, ncq_tfs);
@@ -766,6 +771,10 @@ static void process_ncq_command(AHCIStat
             ncq_tfs->is_read = 0;
 
             DPRINTF(port, "tag %d aio write %ld\n", ncq_tfs->tag, ncq_tfs->lba);
+
+            bdrv_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
+                            (ncq_tfs->sector_count-1) * BDRV_SECTOR_SIZE,
+                            BDRV_ACCT_WRITE);
             ncq_tfs->aiocb = dma_bdrv_write(ncq_tfs->drive->port.ifs[0].bs,
                                             &ncq_tfs->sglist, ncq_tfs->lba,
                                             ncq_cb, ncq_tfs);
Index: qemu/hw/ide/atapi.c
===================================================================
--- qemu.orig/hw/ide/atapi.c	2011-08-11 12:32:35.925991078 +0200
+++ qemu/hw/ide/atapi.c	2011-08-11 16:24:16.302400957 +0200
@@ -250,6 +250,7 @@ static void ide_atapi_cmd_reply(IDEState
     s->io_buffer_index = 0;
 
     if (s->atapi_dma) {
+        bdrv_acct_start(s->bs, &s->acct, size, BDRV_ACCT_READ);
         s->status = READY_STAT | SEEK_STAT | DRQ_STAT;
         s->bus->dma->ops->start_dma(s->bus->dma, s,
                                    ide_atapi_cmd_read_dma_cb);
@@ -322,10 +323,7 @@ static void ide_atapi_cmd_read_dma_cb(vo
         s->status = READY_STAT | SEEK_STAT;
         s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO | ATAPI_INT_REASON_CD;
         ide_set_irq(s->bus);
-    eot:
-        s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
-        ide_set_inactive(s);
-        return;
+        goto eot;
     }
 
     s->io_buffer_index = 0;
@@ -343,9 +341,11 @@ static void ide_atapi_cmd_read_dma_cb(vo
 #ifdef DEBUG_AIO
     printf("aio_read_cd: lba=%u n=%d\n", s->lba, n);
 #endif
+
     s->bus->dma->iov.iov_base = (void *)(s->io_buffer + data_offset);
     s->bus->dma->iov.iov_len = n * 4 * 512;
     qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
+
     s->bus->dma->aiocb = bdrv_aio_readv(s->bs, (int64_t)s->lba << 2,
                                        &s->bus->dma->qiov, n * 4,
                                        ide_atapi_cmd_read_dma_cb, s);
@@ -355,6 +355,12 @@ static void ide_atapi_cmd_read_dma_cb(vo
                             ASC_MEDIUM_NOT_PRESENT);
         goto eot;
     }
+
+    return;
+eot:
+    bdrv_acct_done(s->bs, &s->acct);
+    s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
+    ide_set_inactive(s);
 }
 
 /* start a CD-CDROM read command with DMA */
@@ -368,6 +374,8 @@ static void ide_atapi_cmd_read_dma(IDESt
     s->io_buffer_size = 0;
     s->cd_sector_size = sector_size;
 
+    bdrv_acct_start(s->bs, &s->acct, s->packet_transfer_size, BDRV_ACCT_READ);
+
     /* XXX: check if BUSY_STAT should be set */
     s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
     s->bus->dma->ops->start_dma(s->bus->dma, s,
Index: qemu/hw/ide/core.c
===================================================================
--- qemu.orig/hw/ide/core.c	2011-08-11 12:32:35.925991078 +0200
+++ qemu/hw/ide/core.c	2011-08-11 16:32:54.282928150 +0200
@@ -473,7 +473,10 @@ void ide_sector_read(IDEState *s)
 #endif
         if (n > s->req_nb_sectors)
             n = s->req_nb_sectors;
+
+        bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
         ret = bdrv_read(s->bs, sector_num, s->io_buffer, n);
+        bdrv_acct_done(s->bs, &s->acct);
         if (ret != 0) {
             if (ide_handle_rw_error(s, -ret,
                 BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ))
@@ -610,7 +613,8 @@ handle_rw_error:
     return;
 
 eot:
-   ide_set_inactive(s);
+    bdrv_acct_done(s->bs, &s->acct);
+    ide_set_inactive(s);
 }
 
 static void ide_sector_start_dma(IDEState *s, enum ide_dma_cmd dma_cmd)
@@ -619,6 +623,20 @@ static void ide_sector_start_dma(IDEStat
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     s->dma_cmd = dma_cmd;
+
+    switch (dma_cmd) {
+    case IDE_DMA_READ:
+        bdrv_acct_start(s->bs, &s->acct, s->nsector * BDRV_SECTOR_SIZE,
+                        BDRV_ACCT_READ);
+        break;
+    case IDE_DMA_WRITE:
+        bdrv_acct_start(s->bs, &s->acct, s->nsector * BDRV_SECTOR_SIZE,
+                        BDRV_ACCT_WRITE);
+	break;
+    default:
+        break;
+    }
+
     s->bus->dma->ops->start_dma(s->bus->dma, s, ide_dma_cb);
 }
 
@@ -641,7 +659,10 @@ void ide_sector_write(IDEState *s)
     n = s->nsector;
     if (n > s->req_nb_sectors)
         n = s->req_nb_sectors;
+
+    bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE);
     ret = bdrv_write(s->bs, sector_num, s->io_buffer, n);
+    bdrv_acct_done(s->bs, &s->acct);
 
     if (ret != 0) {
         if (ide_handle_rw_error(s, -ret, BM_STATUS_PIO_RETRY))
@@ -685,6 +706,7 @@ static void ide_flush_cb(void *opaque, i
         }
     }
 
+    bdrv_acct_done(s->bs, &s->acct);
     s->status = READY_STAT | SEEK_STAT;
     ide_set_irq(s->bus);
 }
@@ -698,6 +720,7 @@ void ide_flush_cache(IDEState *s)
         return;
     }
 
+    bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     acb = bdrv_aio_flush(s->bs, ide_flush_cb, s);
     if (acb == NULL) {
         ide_flush_cb(s, -EIO);
Index: qemu/hw/ide/macio.c
===================================================================
--- qemu.orig/hw/ide/macio.c	2011-08-11 12:32:35.925991078 +0200
+++ qemu/hw/ide/macio.c	2011-08-11 16:33:07.629522512 +0200
@@ -52,8 +52,7 @@ static void pmac_ide_atapi_transfer_cb(v
         m->aiocb = NULL;
         qemu_sglist_destroy(&s->sg);
         ide_atapi_io_error(s, ret);
-        io->dma_end(opaque);
-        return;
+        goto done;
     }
 
     if (s->io_buffer_size > 0) {
@@ -71,8 +70,7 @@ static void pmac_ide_atapi_transfer_cb(v
         ide_atapi_cmd_ok(s);
 
     if (io->len == 0) {
-        io->dma_end(opaque);
-        return;
+        goto done;
     }
 
     /* launch next transfer */
@@ -92,9 +90,14 @@ static void pmac_ide_atapi_transfer_cb(v
         /* Note: media not present is the most likely case */
         ide_atapi_cmd_error(s, SENSE_NOT_READY,
                             ASC_MEDIUM_NOT_PRESENT);
-        io->dma_end(opaque);
-        return;
+        goto done;
     }
+    return;
+
+done:
+    bdrv_acct_done(s->bs, &s->acct);
+    io->dma_end(opaque);
+    return;
 }
 
 static void pmac_ide_transfer_cb(void *opaque, int ret)
@@ -109,8 +112,7 @@ static void pmac_ide_transfer_cb(void *o
         m->aiocb = NULL;
         qemu_sglist_destroy(&s->sg);
 	ide_dma_error(s);
-        io->dma_end(io);
-        return;
+        goto done;
     }
 
     sector_num = ide_get_sector(s);
@@ -130,10 +132,8 @@ static void pmac_ide_transfer_cb(void *o
     }
 
     /* end of DMA ? */
-
     if (io->len == 0) {
-        io->dma_end(io);
-	return;
+        goto done;
     }
 
     /* launch next transfer */
@@ -163,6 +163,10 @@ static void pmac_ide_transfer_cb(void *o
 
     if (!m->aiocb)
         pmac_ide_transfer_cb(io, -1);
+    return;
+done:
+    bdrv_acct_done(s->bs, &s->acct);
+    io->dma_end(io);
 }
 
 static void pmac_ide_transfer(DBDMA_io *io)
@@ -172,10 +176,22 @@ static void pmac_ide_transfer(DBDMA_io *
 
     s->io_buffer_size = 0;
     if (s->drive_kind == IDE_CD) {
+        bdrv_acct_start(s->bs, &s->acct, io->len, BDRV_ACCT_READ);
         pmac_ide_atapi_transfer_cb(io, 0);
         return;
     }
 
+    switch (s->dma_cmd) {
+    case IDE_DMA_READ:
+        bdrv_acct_start(s->bs, &s->acct, io->len, BDRV_ACCT_READ);
+        break;
+    case IDE_DMA_WRITE:
+        bdrv_acct_start(s->bs, &s->acct, io->len, BDRV_ACCT_WRITE);
+        break;
+    default:
+        break;
+    }
+
     pmac_ide_transfer_cb(io, 0);
 }
 
Index: qemu/hw/scsi-disk.c
===================================================================
--- qemu.orig/hw/scsi-disk.c	2011-08-11 12:32:35.969324178 +0200
+++ qemu/hw/scsi-disk.c	2011-08-11 16:05:24.875197093 +0200
@@ -57,6 +57,7 @@ typedef struct SCSIDiskReq {
     struct iovec iov;
     QEMUIOVector qiov;
     uint32_t status;
+    BlockAcctCookie acct;
 } SCSIDiskReq;
 
 struct SCSIDiskState
@@ -134,6 +135,7 @@ static void scsi_cancel_io(SCSIRequest *
 static void scsi_read_complete(void * opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     int n;
 
     r->req.aiocb = NULL;
@@ -144,6 +146,8 @@ static void scsi_read_complete(void * op
         }
     }
 
+    bdrv_acct_done(s->bs, &r->acct);
+
     DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->iov.iov_len);
 
     n = r->iov.iov_len / 512;
@@ -187,6 +191,8 @@ static void scsi_read_data(SCSIRequest *
 
     r->iov.iov_len = n * 512;
     qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+
+    bdrv_acct_start(s->bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
     r->req.aiocb = bdrv_aio_readv(s->bs, r->sector, &r->qiov, n,
                               scsi_read_complete, r);
     if (r->req.aiocb == NULL) {
@@ -239,6 +245,7 @@ static int scsi_handle_rw_error(SCSIDisk
 static void scsi_write_complete(void * opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     uint32_t len;
     uint32_t n;
 
@@ -250,6 +257,8 @@ static void scsi_write_complete(void * o
         }
     }
 
+    bdrv_acct_done(s->bs, &r->acct);
+
     n = r->iov.iov_len / 512;
     r->sector += n;
     r->sector_count -= n;
@@ -284,6 +293,8 @@ static void scsi_write_data(SCSIRequest
     n = r->iov.iov_len / 512;
     if (n) {
         qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+
+        bdrv_acct_start(s->bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE);
         r->req.aiocb = bdrv_aio_writev(s->bs, r->sector, &r->qiov, n,
                                    scsi_write_complete, r);
         if (r->req.aiocb == NULL) {
@@ -906,13 +917,19 @@ static int scsi_disk_emulate_command(SCS
         buflen = 8;
         break;
     case SYNCHRONIZE_CACHE:
+{
+        BlockAcctCookie acct;
+
+        bdrv_acct_start(s->bs, &acct, 0, BDRV_ACCT_FLUSH);
         ret = bdrv_flush(s->bs);
         if (ret < 0) {
             if (scsi_handle_rw_error(r, -ret, SCSI_REQ_STATUS_RETRY_FLUSH)) {
                 return -1;
             }
         }
+        bdrv_acct_done(s->bs, &acct);
         break;
+}
     case GET_CONFIGURATION:
         memset(outbuf, 0, 8);
         /* ??? This should probably return much more information.  For now
Index: qemu/hw/virtio-blk.c
===================================================================
--- qemu.orig/hw/virtio-blk.c	2011-08-11 12:32:35.989324068 +0200
+++ qemu/hw/virtio-blk.c	2011-08-11 16:07:58.384365461 +0200
@@ -47,6 +47,7 @@ typedef struct VirtIOBlockReq
     struct virtio_scsi_inhdr *scsi;
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
+    BlockAcctCookie acct;
 } VirtIOBlockReq;
 
 static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
@@ -59,6 +60,7 @@ static void virtio_blk_req_complete(Virt
     virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
     virtio_notify(&s->vdev, s->vq);
 
+    bdrv_acct_done(s->bs, &req->acct);
     qemu_free(req);
 }
 
@@ -266,6 +268,8 @@ static void virtio_blk_handle_flush(Virt
 {
     BlockDriverAIOCB *acb;
 
+    bdrv_acct_start(req->dev->bs, &req->acct, 0, BDRV_ACCT_FLUSH);
+
     /*
      * Make sure all outstanding writes are posted to the backing device.
      */
@@ -284,6 +288,8 @@ static void virtio_blk_handle_write(Virt
 
     sector = ldq_p(&req->out->sector);
 
+    bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_WRITE);
+
     trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512);
 
     if (sector & req->dev->sector_mask) {
@@ -317,6 +323,8 @@ static void virtio_blk_handle_read(VirtI
 
     sector = ldq_p(&req->out->sector);
 
+    bdrv_acct_start(req->dev->bs, &req->acct, req->qiov.size, BDRV_ACCT_READ);
+
     if (sector & req->dev->sector_mask) {
         virtio_blk_rw_complete(req, -EIO);
         return;
Index: qemu/hw/xen_disk.c
===================================================================
--- qemu.orig/hw/xen_disk.c	2011-08-11 12:32:35.995990698 +0200
+++ qemu/hw/xen_disk.c	2011-08-11 16:05:24.878530408 +0200
@@ -79,6 +79,7 @@ struct ioreq {
 
     struct XenBlkDev    *blkdev;
     QLIST_ENTRY(ioreq)   list;
+    BlockAcctCookie     acct;
 };
 
 struct XenBlkDev {
@@ -401,6 +402,7 @@ static void qemu_aio_complete(void *opaq
     ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
     ioreq_unmap(ioreq);
     ioreq_finish(ioreq);
+    bdrv_acct_done(ioreq->blkdev->bs, &ioreq->acct);
     qemu_bh_schedule(ioreq->blkdev->bh);
 }
 
@@ -419,6 +421,7 @@ static int ioreq_runio_qemu_aio(struct i
 
     switch (ioreq->req.operation) {
     case BLKIF_OP_READ:
+        bdrv_acct_start(blkdev->bs, &ioreq->acct, ioreq->v.size, BDRV_ACCT_READ);
         ioreq->aio_inflight++;
         bdrv_aio_readv(blkdev->bs, ioreq->start / BLOCK_SIZE,
                        &ioreq->v, ioreq->v.size / BLOCK_SIZE,
@@ -429,6 +432,8 @@ static int ioreq_runio_qemu_aio(struct i
         if (!ioreq->req.nr_segments) {
             break;
         }
+
+        bdrv_acct_start(blkdev->bs, &ioreq->acct, ioreq->v.size, BDRV_ACCT_WRITE);
         ioreq->aio_inflight++;
         bdrv_aio_writev(blkdev->bs, ioreq->start / BLOCK_SIZE,
                         &ioreq->v, ioreq->v.size / BLOCK_SIZE,
Index: qemu/hw/ide/internal.h
===================================================================
--- qemu.orig/hw/ide/internal.h	2011-08-11 12:32:35.925991078 +0200
+++ qemu/hw/ide/internal.h	2011-08-11 16:05:24.878530408 +0200
@@ -440,6 +440,7 @@ struct IDEState {
     int lba;
     int cd_sector_size;
     int atapi_dma; /* true if dma is requested for the packet cmd */
+    BlockAcctCookie acct;
     /* ATA DMA state */
     int io_buffer_size;
     QEMUSGList sg;
Index: qemu/hw/ide/ahci.h
===================================================================
--- qemu.orig/hw/ide/ahci.h	2011-08-11 16:35:07.095541974 +0200
+++ qemu/hw/ide/ahci.h	2011-08-11 16:35:24.818779294 +0200
@@ -258,6 +258,7 @@ typedef struct NCQTransferState {
     AHCIDevice *drive;
     BlockDriverAIOCB *aiocb;
     QEMUSGList sglist;
+    BlockAcctCookie acct;
     int is_read;
     uint16_t sector_count;
     uint64_t lba;


* [Qemu-devel] [PATCH 3/3] block: latency accounting
  2011-08-11 23:44 [Qemu-devel] [PATCH 0/3] better I/O accounting Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 1/3] block: include flush requests in info blockstats Christoph Hellwig
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 2/3] block: explicit I/O accounting Christoph Hellwig
@ 2011-08-11 23:44 ` Christoph Hellwig
  2011-08-12  5:14 ` [Qemu-devel] [PATCH 0/3] better I/O accounting Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-11 23:44 UTC (permalink / raw)
  To: qemu-devel

Account the total latency for read/write/flush requests.  This allows
management tools to derive average latencies from snapshots of the
nr_ops counters, e.g. to check SLAs or provide statistics.
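
As an illustration (not part of the patch), a management tool could turn
two blockstats snapshots taken some interval apart into an average write
latency like this:

    #include <stdint.h>

    /* Average write latency in ns over the interval between two snapshots
     * of wr_total_time_ns and wr_operations. */
    static double avg_wr_latency_ns(uint64_t time_ns0, uint64_t ops0,
                                    uint64_t time_ns1, uint64_t ops1)
    {
        uint64_t ops = ops1 - ops0;

        return ops ? (double)(time_ns1 - time_ns0) / ops : 0.0;
    }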

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-08-04 16:09:22.656913900 +0200
+++ qemu/block.c	2011-08-04 16:11:14.746306661 +0200
@@ -1790,12 +1790,18 @@ static void bdrv_stats_iter(QObject *dat
                         " rd_operations=%" PRId64
                         " wr_operations=%" PRId64
                         " flush_operations=%" PRId64
+                        " wr_total_time_ns=%" PRId64
+                        " rd_total_time_ns=%" PRId64
+                        " flush_total_time_ns=%" PRId64
                         "\n",
                         qdict_get_int(qdict, "rd_bytes"),
                         qdict_get_int(qdict, "wr_bytes"),
                         qdict_get_int(qdict, "rd_operations"),
                         qdict_get_int(qdict, "wr_operations"),
-                        qdict_get_int(qdict, "flush_operations"));
+                        qdict_get_int(qdict, "flush_operations"),
+                        qdict_get_int(qdict, "wr_total_time_ns"),
+                        qdict_get_int(qdict, "rd_total_time_ns"),
+                        qdict_get_int(qdict, "flush_total_time_ns"));
 }
 
 void bdrv_stats_print(Monitor *mon, const QObject *data)
@@ -1814,7 +1820,10 @@ static QObject* bdrv_info_stats_bs(Block
                              "'rd_operations': %" PRId64 ","
                              "'wr_operations': %" PRId64 ","
                              "'wr_highest_offset': %" PRId64 ","
-                             "'flush_operations': %" PRId64
+                             "'flush_operations': %" PRId64 ","
+                             "'wr_total_time_ns': %" PRId64 ","
+                             "'rd_total_time_ns': %" PRId64 ","
+                             "'flush_total_time_ns': %" PRId64
                              "} }",
                              bs->nr_bytes[BDRV_ACCT_READ],
                              bs->nr_bytes[BDRV_ACCT_WRITE],
@@ -1822,7 +1831,10 @@ static QObject* bdrv_info_stats_bs(Block
                              bs->nr_ops[BDRV_ACCT_WRITE],
                              bs->wr_highest_sector *
                              (uint64_t)BDRV_SECTOR_SIZE,
-                             bs->nr_ops[BDRV_ACCT_FLUSH]);
+                             bs->nr_ops[BDRV_ACCT_FLUSH],
+                             bs->total_time_ns[BDRV_ACCT_READ],
+                             bs->total_time_ns[BDRV_ACCT_WRITE],
+                             bs->total_time_ns[BDRV_ACCT_FLUSH]);
     dict  = qobject_to_qdict(res);
 
     if (*bs->device_name) {
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2011-08-04 16:07:42.614122545 +0200
+++ qemu/block_int.h	2011-08-04 16:11:42.752821602 +0200
@@ -27,6 +27,7 @@
 #include "block.h"
 #include "qemu-option.h"
 #include "qemu-queue.h"
+#include "qemu-timer.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -187,6 +188,7 @@ struct BlockDriverState {
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
+    uint64_t total_time_ns[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
     /* Whether the disk can expand beyond total_sectors */
@@ -212,6 +214,7 @@ struct BlockDriverState {
 
 typedef struct BlockAcctCookie {
     int64_t bytes;
+    int64_t start_time_ns;
     enum BlockIOType type;
 } BlockAcctCookie;
 
@@ -221,6 +224,7 @@ bdrv_acct_start(BlockDriverState *bs, Bl
 {
     cookie->bytes = bytes;
     cookie->type = type;
+    cookie->start_time_ns = get_clock();
 }
 
 static inline void
@@ -228,6 +232,7 @@ bdrv_acct_done(BlockDriverState *bs, Blo
 {
     bs->nr_bytes[cookie->type] += cookie->bytes;
     bs->nr_ops[cookie->type]++;
+    bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
 }
 
 
Index: qemu/qmp-commands.hx
===================================================================
--- qemu.orig/qmp-commands.hx	2011-08-04 16:13:37.912197731 +0200
+++ qemu/qmp-commands.hx	2011-08-04 16:16:06.034728615 +0200
@@ -1202,6 +1202,9 @@ Each json-object contain the following:
     - "rd_operations": read operations (json-int)
     - "wr_operations": write operations (json-int)
     - "flush_operations": cache flush operations (json-int)
+    - "wr_total_time_ns": total time spend on writes in nano-seconds (json-int)
+    - "rd_total_time_ns": total time spend on reads in nano-seconds (json-int)
+    - "flush_total_time_ns": total time spend on cache flushes in nano-seconds (json-int)
     - "wr_highest_offset": Highest offset of a sector written since the
                            BlockDriverState has been opened (json-int)
 - "parent": Contains recursively the statistics of the underlying
@@ -1223,6 +1226,9 @@ Example:
                   "wr_operations":751,
                   "rd_bytes":122567168,
                   "rd_operations":36772
+                  "wr_total_times_ns":313253456
+                  "rd_total_times_ns":3465673657
+                  "flush_total_times_ns":49653
                   "flush_operations":61,
                }
             },
@@ -1233,6 +1239,9 @@ Example:
                "rd_bytes":122739200,
                "rd_operations":36604
                "flush_operations":51,
+               "wr_total_times_ns":313253456
+               "rd_total_times_ns":3465673657
+               "flush_total_times_ns":49653
             }
          },
          {
@@ -1244,6 +1253,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1255,6 +1267,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1266,6 +1281,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          }
       ]


* Re: [Qemu-devel] [PATCH 0/3] better I/O accounting
  2011-08-11 23:44 [Qemu-devel] [PATCH 0/3] better I/O accounting Christoph Hellwig
                   ` (2 preceding siblings ...)
  2011-08-11 23:44 ` [Qemu-devel] [PATCH 3/3] block: latency accounting Christoph Hellwig
@ 2011-08-12  5:14 ` Stefan Hajnoczi
  2011-08-12  5:50   ` Stefan Hajnoczi
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-08-12  5:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

This change requires that callers manage the accounting cookie.  I
think this belongs in block.c to make the life of callers easier.

Today block.c doesn't track requests and pretty much lets the
BlockDriver control the BlockDriverAIOCB.  But the request metadata
(timestamp, operation type, number of bytes) could all be part of the
BlockDriverAIOCB.  block.c also needs to interpose a cb function so
that it can account when the request completes.  Doing this all once
seems like the right way to go.
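
One possible shape for that interposition, as a rough sketch only (the
wrapper struct and helper names below are made up, not code from this
series):

    /* Hypothetical per-request wrapper block.c could allocate. */
    typedef struct BlockAcctWrapper {
        BlockDriverState *bs;
        BlockAcctCookie acct;
        BlockDriverCompletionFunc *cb;
        void *opaque;
    } BlockAcctWrapper;

    static void bdrv_acct_wrapper_cb(void *opaque, int ret)
    {
        BlockAcctWrapper *w = opaque;

        /* Account on completion, then hand the result to the real caller. */
        bdrv_acct_done(w->bs, &w->acct);
        w->cb(w->opaque, ret);
        qemu_free(w);
    }

    /* bdrv_aio_readv() and friends would allocate a wrapper, call
     * bdrv_acct_start(), and pass bdrv_acct_wrapper_cb/w to the driver
     * instead of the caller's cb/opaque. */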

Stefan


* Re: [Qemu-devel] [PATCH 0/3] better I/O accounting
  2011-08-12  5:14 ` [Qemu-devel] [PATCH 0/3] better I/O accounting Stefan Hajnoczi
@ 2011-08-12  5:50   ` Stefan Hajnoczi
  2011-08-12 13:40     ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2011-08-12  5:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On Fri, Aug 12, 2011 at 6:14 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> This change requires that callers manage the accounting cookie.  I
> think this belongs in block.c to make the life of callers easier.
>
> Today block.c doesn't track requests and pretty much lets the
> BlockDriver control the BlockDriverAIOCB.  But the request metadata
> (timestamp, operation type, number of bytes) could all be part of the
> BlockDriverAIOCB.  block.c also needs to interpose a cb function so
> that it can account when the request completes.  Doing this all once
> seems like the right way to go.

I do see the drawbacks of leaving accounting in block.c on every
request.  For example, we'll fetch the timestamp for internal requests
as well as external requests from hardware emulation.  Due to memory
mapping and bounce buffers it is also possible that requests get
split.  Multiwrite also changes the I/O pattern.

So I guess this external accounting cookie approach is okay.  We just
need to be strict about reviewing any code that uses bdrv_aio_*() to
make sure it has accounting in place where appropriate.

Stefan


* Re: [Qemu-devel] [PATCH 0/3] better I/O accounting
  2011-08-12  5:50   ` Stefan Hajnoczi
@ 2011-08-12 13:40     ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-12 13:40 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Christoph Hellwig, qemu-devel

On Fri, Aug 12, 2011 at 06:50:57AM +0100, Stefan Hajnoczi wrote:
> I do see the drawbacks of leaving accounting in block.c on every
> request.  For example, we'll fetch the timestamp for internal requests
> as well as external requests from hardware emulation.  Due to memory
> mapping and bounce buffers it is also possible that requests get
> split.  Multiwrite also changes the I/O pattern.
> 
> So I guess this external accounting cookie approach is okay.  We just
> need to be strict about reviewing any code that uses bdrv_aio_*() to
> make sure it has accounting in place where appropriate.

The most important point is that we potentially have a lot of
bdrv_aio_read/write calls for a single guest request.  That's the case
for both scsi and ide; only virtio does one call per guest request
(modulo multiwrite).


* [Qemu-devel] [PATCH 3/3] block: latency accounting
  2011-08-21 22:25 [Qemu-devel] [PATCH 0/3] better I/O accounting V2 Christoph Hellwig
@ 2011-08-21 22:26 ` Christoph Hellwig
  2011-08-25  6:25 ` [Qemu-devel] [PATCH 0/3] better I/O accounting V3 Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-21 22:26 UTC (permalink / raw)
  To: qemu-devel

Account the total latency for read/write/flush requests.  This allows
management tools to derive average latencies from snapshots of the
nr_ops counters, e.g. to check SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-08-11 07:05:24.000000000 -0700
+++ qemu/block.c	2011-08-11 07:37:01.114924279 -0700
@@ -1889,12 +1889,18 @@ static void bdrv_stats_iter(QObject *dat
                         " rd_operations=%" PRId64
                         " wr_operations=%" PRId64
                         " flush_operations=%" PRId64
+                        " wr_total_time_ns=%" PRId64
+                        " rd_total_time_ns=%" PRId64
+                        " flush_total_time_ns=%" PRId64
                         "\n",
                         qdict_get_int(qdict, "rd_bytes"),
                         qdict_get_int(qdict, "wr_bytes"),
                         qdict_get_int(qdict, "rd_operations"),
                         qdict_get_int(qdict, "wr_operations"),
-                        qdict_get_int(qdict, "flush_operations"));
+                        qdict_get_int(qdict, "flush_operations"),
+                        qdict_get_int(qdict, "wr_total_time_ns"),
+                        qdict_get_int(qdict, "rd_total_time_ns"),
+                        qdict_get_int(qdict, "flush_total_time_ns"));
 }
 
 void bdrv_stats_print(Monitor *mon, const QObject *data)
@@ -1913,7 +1919,10 @@ static QObject* bdrv_info_stats_bs(Block
                              "'rd_operations': %" PRId64 ","
                              "'wr_operations': %" PRId64 ","
                              "'wr_highest_offset': %" PRId64 ","
-                             "'flush_operations': %" PRId64
+                             "'flush_operations': %" PRId64 ","
+                             "'wr_total_time_ns': %" PRId64 ","
+                             "'rd_total_time_ns': %" PRId64 ","
+                             "'flush_total_time_ns': %" PRId64
                              "} }",
                              bs->nr_bytes[BDRV_ACCT_READ],
                              bs->nr_bytes[BDRV_ACCT_WRITE],
@@ -1921,7 +1930,10 @@ static QObject* bdrv_info_stats_bs(Block
                              bs->nr_ops[BDRV_ACCT_WRITE],
                              bs->wr_highest_sector *
                              (uint64_t)BDRV_SECTOR_SIZE,
-                             bs->nr_ops[BDRV_ACCT_FLUSH]);
+                             bs->nr_ops[BDRV_ACCT_FLUSH],
+                             bs->total_time_ns[BDRV_ACCT_READ],
+                             bs->total_time_ns[BDRV_ACCT_WRITE],
+                             bs->total_time_ns[BDRV_ACCT_FLUSH]);
     dict  = qobject_to_qdict(res);
 
     if (*bs->device_name) {
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2011-08-11 07:05:24.000000000 -0700
+++ qemu/block_int.h	2011-08-11 07:37:17.964832995 -0700
@@ -28,6 +28,7 @@
 #include "qemu-option.h"
 #include "qemu-queue.h"
 #include "qemu-coroutine.h"
+#include "qemu-timer.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -193,6 +194,7 @@ struct BlockDriverState {
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
+    uint64_t total_time_ns[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
     /* Whether the disk can expand beyond total_sectors */
@@ -218,6 +220,7 @@ struct BlockDriverState {
 
 typedef struct BlockAcctCookie {
     int64_t bytes;
+    int64_t start_time_ns;
     enum BlockIOType type;
 } BlockAcctCookie;
 
@@ -227,6 +230,7 @@ bdrv_acct_start(BlockDriverState *bs, Bl
 {
     cookie->bytes = bytes;
     cookie->type = type;
+    cookie->start_time_ns = get_clock();
 }
 
 static inline void
@@ -234,6 +238,7 @@ bdrv_acct_done(BlockDriverState *bs, Blo
 {
     bs->nr_bytes[cookie->type] += cookie->bytes;
     bs->nr_ops[cookie->type]++;
+    bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
 }
 
 
Index: qemu/qmp-commands.hx
===================================================================
--- qemu.orig/qmp-commands.hx	2011-08-11 07:04:48.000000000 -0700
+++ qemu/qmp-commands.hx	2011-08-11 07:37:01.121590911 -0700
@@ -1202,6 +1202,9 @@ Each json-object contain the following:
     - "rd_operations": read operations (json-int)
     - "wr_operations": write operations (json-int)
     - "flush_operations": cache flush operations (json-int)
+    - "wr_total_time_ns": total time spend on writes in nano-seconds (json-int)
+    - "rd_total_time_ns": total time spend on reads in nano-seconds (json-int)
+    - "flush_total_time_ns": total time spend on cache flushes in nano-seconds (json-int)
     - "wr_highest_offset": Highest offset of a sector written since the
                            BlockDriverState has been opened (json-int)
 - "parent": Contains recursively the statistics of the underlying
@@ -1223,6 +1226,9 @@ Example:
                   "wr_operations":751,
                   "rd_bytes":122567168,
                   "rd_operations":36772
+                  "wr_total_times_ns":313253456
+                  "rd_total_times_ns":3465673657
+                  "flush_total_times_ns":49653
                   "flush_operations":61,
                }
             },
@@ -1233,6 +1239,9 @@ Example:
                "rd_bytes":122739200,
                "rd_operations":36604
                "flush_operations":51,
+               "wr_total_times_ns":313253456
+               "rd_total_times_ns":3465673657
+               "flush_total_times_ns":49653
             }
          },
          {
@@ -1244,6 +1253,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1255,6 +1267,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1266,6 +1281,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          }
       ]


* [Qemu-devel] [PATCH 3/3] block: latency accounting
  2011-08-25  6:25 ` [Qemu-devel] [PATCH 0/3] better I/O accounting V3 Christoph Hellwig
@ 2011-08-25  6:26   ` Christoph Hellwig
  2011-08-26 13:05     ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-25  6:26 UTC (permalink / raw)
  To: qemu-devel

Account the total latency for read/write/flush requests.  This allows
management tools to derive average latencies from snapshots of the
nr_ops counters, e.g. to check SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-08-25 08:01:12.000000000 +0200
+++ qemu/block.c	2011-08-25 08:03:02.908005646 +0200
@@ -1889,12 +1889,18 @@ static void bdrv_stats_iter(QObject *dat
                         " rd_operations=%" PRId64
                         " wr_operations=%" PRId64
                         " flush_operations=%" PRId64
+                        " wr_total_time_ns=%" PRId64
+                        " rd_total_time_ns=%" PRId64
+                        " flush_total_time_ns=%" PRId64
                         "\n",
                         qdict_get_int(qdict, "rd_bytes"),
                         qdict_get_int(qdict, "wr_bytes"),
                         qdict_get_int(qdict, "rd_operations"),
                         qdict_get_int(qdict, "wr_operations"),
-                        qdict_get_int(qdict, "flush_operations"));
+                        qdict_get_int(qdict, "flush_operations"),
+                        qdict_get_int(qdict, "wr_total_time_ns"),
+                        qdict_get_int(qdict, "rd_total_time_ns"),
+                        qdict_get_int(qdict, "flush_total_time_ns"));
 }
 
 void bdrv_stats_print(Monitor *mon, const QObject *data)
@@ -1913,7 +1919,10 @@ static QObject* bdrv_info_stats_bs(Block
                              "'rd_operations': %" PRId64 ","
                              "'wr_operations': %" PRId64 ","
                              "'wr_highest_offset': %" PRId64 ","
-                             "'flush_operations': %" PRId64
+                             "'flush_operations': %" PRId64 ","
+                             "'wr_total_time_ns': %" PRId64 ","
+                             "'rd_total_time_ns': %" PRId64 ","
+                             "'flush_total_time_ns': %" PRId64
                              "} }",
                              bs->nr_bytes[BDRV_ACCT_READ],
                              bs->nr_bytes[BDRV_ACCT_WRITE],
@@ -1921,7 +1930,10 @@ static QObject* bdrv_info_stats_bs(Block
                              bs->nr_ops[BDRV_ACCT_WRITE],
                              bs->wr_highest_sector *
                              (uint64_t)BDRV_SECTOR_SIZE,
-                             bs->nr_ops[BDRV_ACCT_FLUSH]);
+                             bs->nr_ops[BDRV_ACCT_FLUSH],
+                             bs->total_time_ns[BDRV_ACCT_READ],
+                             bs->total_time_ns[BDRV_ACCT_WRITE],
+                             bs->total_time_ns[BDRV_ACCT_FLUSH]);
     dict  = qobject_to_qdict(res);
 
     if (*bs->device_name) {
@@ -3133,6 +3145,7 @@ bdrv_acct_start(BlockDriverState *bs, Bl
     assert(type < BDRV_MAX_IOTYPE);
 
     cookie->bytes = bytes;
+    cookie->start_time_ns = get_clock();
     cookie->type = type;
 }
 
@@ -3143,6 +3156,7 @@ bdrv_acct_done(BlockDriverState *bs, Blo
 
     bs->nr_bytes[cookie->type] += cookie->bytes;
     bs->nr_ops[cookie->type]++;
+    bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
 }
 
 int bdrv_img_create(const char *filename, const char *fmt,
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2011-08-25 08:01:12.000000000 +0200
+++ qemu/block_int.h	2011-08-25 08:02:20.981566114 +0200
@@ -28,6 +28,7 @@
 #include "qemu-option.h"
 #include "qemu-queue.h"
 #include "qemu-coroutine.h"
+#include "qemu-timer.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -186,6 +187,7 @@ struct BlockDriverState {
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
+    uint64_t total_time_ns[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
     /* Whether the disk can expand beyond total_sectors */
Index: qemu/qmp-commands.hx
===================================================================
--- qemu.orig/qmp-commands.hx	2011-08-25 08:01:04.000000000 +0200
+++ qemu/qmp-commands.hx	2011-08-25 08:02:20.984899429 +0200
@@ -1202,6 +1202,9 @@ Each json-object contain the following:
     - "rd_operations": read operations (json-int)
     - "wr_operations": write operations (json-int)
     - "flush_operations": cache flush operations (json-int)
+    - "wr_total_time_ns": total time spend on writes in nano-seconds (json-int)
+    - "rd_total_time_ns": total time spend on reads in nano-seconds (json-int)
+    - "flush_total_time_ns": total time spend on cache flushes in nano-seconds (json-int)
     - "wr_highest_offset": Highest offset of a sector written since the
                            BlockDriverState has been opened (json-int)
 - "parent": Contains recursively the statistics of the underlying
@@ -1223,6 +1226,9 @@ Example:
                   "wr_operations":751,
                   "rd_bytes":122567168,
                   "rd_operations":36772
+                  "wr_total_times_ns":313253456
+                  "rd_total_times_ns":3465673657
+                  "flush_total_times_ns":49653
                   "flush_operations":61,
                }
             },
@@ -1233,6 +1239,9 @@ Example:
                "rd_bytes":122739200,
                "rd_operations":36604
                "flush_operations":51,
+               "wr_total_times_ns":313253456
+               "rd_total_times_ns":3465673657
+               "flush_total_times_ns":49653
             }
          },
          {
@@ -1244,6 +1253,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1255,6 +1267,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          },
          {
@@ -1266,6 +1281,9 @@ Example:
                "rd_bytes":0,
                "rd_operations":0
                "flush_operations":0,
+               "wr_total_times_ns":0
+               "rd_total_times_ns":0
+               "flush_total_times_ns":0
             }
          }
       ]
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2011-08-25 08:02:32.094839241 +0200
+++ qemu/block.h	2011-08-25 08:02:40.718125859 +0200
@@ -263,6 +263,7 @@ enum BlockAcctType {
 
 typedef struct BlockAcctCookie {
     int64_t bytes;
+    int64_t start_time_ns;
     enum BlockAcctType type;
 } BlockAcctCookie;
 


* Re: [Qemu-devel] [PATCH 3/3] block: latency accounting
  2011-08-25  6:26   ` [Qemu-devel] [PATCH 3/3] block: latency accounting Christoph Hellwig
@ 2011-08-26 13:05     ` Kevin Wolf
  2011-08-26 16:06       ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2011-08-26 13:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

On 25.08.2011 08:26, Christoph Hellwig wrote:
> Account the total latency for read/write/flush requests.  This allows
> management tools to derive average latencies from snapshots of the
> nr_ops counters, e.g. to check SLAs or provide statistics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: qemu/block.c
> ===================================================================
> --- qemu.orig/block.c	2011-08-25 08:01:12.000000000 +0200
> +++ qemu/block.c	2011-08-25 08:03:02.908005646 +0200
> @@ -1889,12 +1889,18 @@ static void bdrv_stats_iter(QObject *dat
>                          " rd_operations=%" PRId64
>                          " wr_operations=%" PRId64
>                          " flush_operations=%" PRId64
> +                        " wr_total_time_ns=%" PRId64
> +                        " rd_total_time_ns=%" PRId64
> +                        " flush_total_time_ns=%" PRId64
>                          "\n",
>                          qdict_get_int(qdict, "rd_bytes"),
>                          qdict_get_int(qdict, "wr_bytes"),
>                          qdict_get_int(qdict, "rd_operations"),
>                          qdict_get_int(qdict, "wr_operations"),
> -                        qdict_get_int(qdict, "flush_operations"));
> +                        qdict_get_int(qdict, "flush_operations"),
> +                        qdict_get_int(qdict, "wr_total_time_ns"),
> +                        qdict_get_int(qdict, "rd_total_time_ns"),
> +                        qdict_get_int(qdict, "flush_total_time_ns"));
>  }
>  
>  void bdrv_stats_print(Monitor *mon, const QObject *data)
> @@ -1913,7 +1919,10 @@ static QObject* bdrv_info_stats_bs(Block
>                               "'rd_operations': %" PRId64 ","
>                               "'wr_operations': %" PRId64 ","
>                               "'wr_highest_offset': %" PRId64 ","
> -                             "'flush_operations': %" PRId64
> +                             "'flush_operations': %" PRId64 ","
> +                             "'wr_total_time_ns': %" PRId64 ","
> +                             "'rd_total_time_ns': %" PRId64 ","
> +                             "'flush_total_time_ns': %" PRId64
>                               "} }",
>                               bs->nr_bytes[BDRV_ACCT_READ],
>                               bs->nr_bytes[BDRV_ACCT_WRITE],
> @@ -1921,7 +1930,10 @@ static QObject* bdrv_info_stats_bs(Block
>                               bs->nr_ops[BDRV_ACCT_WRITE],
>                               bs->wr_highest_sector *
>                               (uint64_t)BDRV_SECTOR_SIZE,
> -                             bs->nr_ops[BDRV_ACCT_FLUSH]);
> +                             bs->nr_ops[BDRV_ACCT_FLUSH],
> +                             bs->total_time_ns[BDRV_ACCT_READ],
> +                             bs->total_time_ns[BDRV_ACCT_WRITE],
> +                             bs->total_time_ns[BDRV_ACCT_FLUSH]);

The order of READ vs. WRITE is wrong here, so read latencies show up as
wr_total_time. I can fix it locally if you agree that swapping the two
lines is the right fix.
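
For reference, the suggested swap would leave the trailing arguments
roughly as follows:

                             bs->nr_ops[BDRV_ACCT_FLUSH],
                             bs->total_time_ns[BDRV_ACCT_WRITE],
                             bs->total_time_ns[BDRV_ACCT_READ],
                             bs->total_time_ns[BDRV_ACCT_FLUSH]);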

Kevin


* Re: [Qemu-devel] [PATCH 3/3] block: latency accounting
  2011-08-26 13:05     ` Kevin Wolf
@ 2011-08-26 16:06       ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2011-08-26 16:06 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel

On Fri, Aug 26, 2011 at 03:05:06PM +0200, Kevin Wolf wrote:
> The order of READ vs. WRITE is wrong here, so read latencies show up as
> wr_total_time. I can fix it locally if you agree that swapping the two
> lines is the right fix.

Yes, it got mixed up in one of the revamps.


