qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PULL 00/38] Block patches
@ 2010-12-17 17:44 Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 01/38] blockdev: check dinfo ptr before using Kevin Wolf
                   ` (38 more replies)
  0 siblings, 39 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

The following changes since commit 9d861fa595c93f22d1d55b723a691531c36c9672:

  Merge remote branch 'arm/for-anthony' into staging (2010-12-17 08:25:17 -0600)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git for-anthony

Alexander Graf (10):
      ide: split ide command interpretation off
      ide: fix whitespace gap in ide_exec_cmd
      ide: Split out BMDMA code from ATA core
      ide: move transfer_start after variable modification
      pci: add storage class for sata
      pci: add ich9 pci id
      ahci: add ahci emulation
      config: move ide core and pci to pci.mak
      config: add ahci for pci capable machines
      ide: honor ncq for atapi

Christoph Hellwig (3):
      block: add discard support
      scsi-disk: support WRITE SAME (16) with unmap bit
      raw-posix: add discard support

Jes Sorensen (9):
      Introduce strtosz_suffix()
      qemu-img.c: Clean up handling of image size in img_create()
      qemu-img.c: Re-factor img_create()
      Introduce do_snapshot_blkdev() and monitor command to handle it.
      Prevent creating an image with the same filename as backing file
      bdrv_img_create() use proper errno return values
      qemu.img.c: Use error_report() instead of own error() implementation
      block/qcow2.c: rename qcow_ functions to qcow2_
      Add proper -errno error return values to qcow2_open()

Kevin Wolf (3):
      qemu-img: Call error_set_progname
      Remove NULL checks for bdrv_new return value
      qemu-io: Fix typo in help texts

Roland Elek (1):
      ide: add ncq identify data for ahci sata drives

Ryan Harper (1):
      blockdev: check dinfo ptr before using

Sebastian Herbszt (1):
      ahci: set SATA Mode Select

Stefan Hajnoczi (10):
      block: Introduce path_has_protocol() function
      block: Fix the use of protocols in backing files
      ide: Register vm change state handler once only
      qemu-io: Add discard command
      docs: Add QED image format specification
      qed: Add QEMU Enhanced Disk image format
      qed: Table, L2 cache, and cluster functions
      qed: Read/write support
      qed: Consistency check support
      docs: Fix missing carets in QED specification

 Makefile.objs                        |    3 +
 block.c                              |  234 +++++-
 block.h                              |    5 +
 block/qcow2-cluster.c                |    6 +-
 block/qcow2-snapshot.c               |    6 +-
 block/qcow2.c                        |  248 +++---
 block/qed-check.c                    |  210 +++++
 block/qed-cluster.c                  |  154 ++++
 block/qed-gencb.c                    |   32 +
 block/qed-l2-cache.c                 |  173 ++++
 block/qed-table.c                    |  319 +++++++
 block/qed.c                          | 1349 ++++++++++++++++++++++++++++++
 block/qed.h                          |  301 +++++++
 block/raw-posix.c                    |   45 +
 block/raw.c                          |    6 +
 block_int.h                          |    8 +-
 blockdev.c                           |   68 ++-
 blockdev.h                           |    1 +
 configure                            |   26 +
 cutils.c                             |   17 +-
 default-configs/arm-softmmu.mak      |    1 -
 default-configs/i386-softmmu.mak     |    3 -
 default-configs/mips-softmmu.mak     |    3 -
 default-configs/mips64-softmmu.mak   |    3 -
 default-configs/mips64el-softmmu.mak |    3 -
 default-configs/mipsel-softmmu.mak   |    3 -
 default-configs/pci.mak              |    4 +
 default-configs/ppc-softmmu.mak      |    3 -
 default-configs/ppc64-softmmu.mak    |    3 -
 default-configs/ppcemb-softmmu.mak   |    3 -
 default-configs/sh4-softmmu.mak      |    1 -
 default-configs/sh4eb-softmmu.mak    |    1 -
 default-configs/sparc64-softmmu.mak  |    3 -
 default-configs/x86_64-softmmu.mak   |    3 -
 docs/specs/qed_spec.txt              |  130 +++
 hmp-commands.hx                      |   19 +
 hw/ide/ahci.c                        | 1527 ++++++++++++++++++++++++++++++++++
 hw/ide/cmd646.c                      |   18 +-
 hw/ide/core.c                        | 1116 +++++++++++--------------
 hw/ide/internal.h                    |   73 +-
 hw/ide/pci.c                         |  280 ++++++-
 hw/ide/pci.h                         |   30 +
 hw/ide/piix.c                        |   34 +-
 hw/ide/via.c                         |   34 +-
 hw/pci.h                             |    1 +
 hw/pci_ids.h                         |    1 +
 hw/scsi-defs.h                       |    1 +
 hw/scsi-disk.c                       |   53 ++-
 hw/xen_disk.c                        |   17 +-
 qemu-common.h                        |    7 +
 qemu-img.c                           |  250 ++----
 qemu-io.c                            |  102 +++-
 qemu-nbd.c                           |    2 -
 trace-events                         |   21 +
 54 files changed, 5892 insertions(+), 1072 deletions(-)
 create mode 100644 block/qed-check.c
 create mode 100644 block/qed-cluster.c
 create mode 100644 block/qed-gencb.c
 create mode 100644 block/qed-l2-cache.c
 create mode 100644 block/qed-table.c
 create mode 100644 block/qed.c
 create mode 100644 block/qed.h
 create mode 100644 docs/specs/qed_spec.txt
 create mode 100644 hw/ide/ahci.c


* [Qemu-devel] [PATCH 01/38] blockdev: check dinfo ptr before using
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 02/38] block: Introduce path_has_protocol() function Kevin Wolf
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Ryan Harper <ryanh@us.ibm.com>

If a user decides to punish a guest by revoking its block device via
drive_del and subsequently also attempts to remove the PCI device
backing it, and the device is using blockdev_auto_del(), then we get a
segfault when we attempt to access dinfo->auto_del.[1]

The fix is to check if drive_get_by_blockdev() actually returns a valid
dinfo pointer or not.

1. (qemu) pci_add auto storage file=images/test01.raw,if=virtio,id=block1,snapshot=on
   (qemu) drive_del block1
   (qemu) pci_del 5
   *segfault*

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index f6ac439..3b3b82d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -30,14 +30,16 @@ void blockdev_mark_auto_del(BlockDriverState *bs)
 {
     DriveInfo *dinfo = drive_get_by_blockdev(bs);
 
-    dinfo->auto_del = 1;
+    if (dinfo) {
+        dinfo->auto_del = 1;
+    }
 }
 
 void blockdev_auto_del(BlockDriverState *bs)
 {
     DriveInfo *dinfo = drive_get_by_blockdev(bs);
 
-    if (dinfo->auto_del) {
+    if (dinfo && dinfo->auto_del) {
         drive_uninit(dinfo);
     }
 }
-- 
1.7.2.3
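
The guard added in this patch can be illustrated outside of QEMU. The
following is only a minimal sketch: DriveInfo and BlockDriverState are
reduced to hypothetical stand-ins, and lookup_drive() and mark_auto_del()
are illustrative names, not the functions in blockdev.c. It shows a lookup
that may return NULL once the drive has been deleted, and a caller that
checks the result before dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's BlockDriverState and DriveInfo. */
typedef struct BlockDriverState { int id; } BlockDriverState;
typedef struct DriveInfo {
    BlockDriverState *bdrv;
    int auto_del;
} DriveInfo;

/* One-slot "drive table"; a NULL bdrv models a drive removed by drive_del. */
DriveInfo drive_table[1];

/* Simplified lookup: returns NULL when no drive owns this block device. */
DriveInfo *lookup_drive(BlockDriverState *bs)
{
    if (bs != NULL && drive_table[0].bdrv == bs) {
        return &drive_table[0];
    }
    return NULL;
}

/* Returns 1 if the flag was set, 0 if there was no drive left to mark. */
int mark_auto_del(BlockDriverState *bs)
{
    DriveInfo *dinfo = lookup_drive(bs);

    if (dinfo) {                /* the check this patch introduces */
        dinfo->auto_del = 1;
        return 1;
    }
    return 0;
}

/* Usage: marking works while the drive exists and is a no-op afterwards. */
void example_usage(void)
{
    BlockDriverState bs = { 1 };

    drive_table[0].bdrv = &bs;
    drive_table[0].auto_del = 0;
    assert(mark_auto_del(&bs) == 1);

    drive_table[0].bdrv = NULL; /* drive deleted before device removal */
    assert(mark_auto_del(&bs) == 0);
}
```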


* [Qemu-devel] [PATCH 02/38] block: Introduce path_has_protocol() function
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 01/38] blockdev: check dinfo ptr before using Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 03/38] block: Fix the use of protocols in backing files Kevin Wolf
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

The bdrv_find_protocol() function returns NULL if an unknown protocol
name is given.  It returns the "file" protocol when the filename
contains no protocol at all.  This makes it difficult to distinguish
between paths which contain a protocol and those which do not.

Factor out a helper function that tests whether or not a filename has a
protocol.  The next patch makes use of this function.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c |   64 +++++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index e7a986c..65fce80 100644
--- a/block.c
+++ b/block.c
@@ -70,6 +70,39 @@ static BlockDriverState *bs_snapshots;
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
+#ifdef _WIN32
+static int is_windows_drive_prefix(const char *filename)
+{
+    return (((filename[0] >= 'a' && filename[0] <= 'z') ||
+             (filename[0] >= 'A' && filename[0] <= 'Z')) &&
+            filename[1] == ':');
+}
+
+int is_windows_drive(const char *filename)
+{
+    if (is_windows_drive_prefix(filename) &&
+        filename[2] == '\0')
+        return 1;
+    if (strstart(filename, "\\\\.\\", NULL) ||
+        strstart(filename, "//./", NULL))
+        return 1;
+    return 0;
+}
+#endif
+
+/* check if the path starts with "<protocol>:" */
+static int path_has_protocol(const char *path)
+{
+#ifdef _WIN32
+    if (is_windows_drive(path) ||
+        is_windows_drive_prefix(path)) {
+        return 0;
+    }
+#endif
+
+    return strchr(path, ':') != NULL;
+}
+
 int path_is_absolute(const char *path)
 {
     const char *p;
@@ -244,26 +277,6 @@ void get_tmp_filename(char *filename, int size)
 }
 #endif
 
-#ifdef _WIN32
-static int is_windows_drive_prefix(const char *filename)
-{
-    return (((filename[0] >= 'a' && filename[0] <= 'z') ||
-             (filename[0] >= 'A' && filename[0] <= 'Z')) &&
-            filename[1] == ':');
-}
-
-int is_windows_drive(const char *filename)
-{
-    if (is_windows_drive_prefix(filename) &&
-        filename[2] == '\0')
-        return 1;
-    if (strstart(filename, "\\\\.\\", NULL) ||
-        strstart(filename, "//./", NULL))
-        return 1;
-    return 0;
-}
-#endif
-
 /*
  * Detect host devices. By convention, /dev/cdrom[N] is always
  * recognized as a host CDROM.
@@ -307,16 +320,11 @@ BlockDriver *bdrv_find_protocol(const char *filename)
         return drv1;
     }
 
-#ifdef _WIN32
-     if (is_windows_drive(filename) ||
-         is_windows_drive_prefix(filename))
-         return bdrv_find_format("file");
-#endif
-
-    p = strchr(filename, ':');
-    if (!p) {
+    if (!path_has_protocol(filename)) {
         return bdrv_find_format("file");
     }
+    p = strchr(filename, ':');
+    assert(p != NULL);
     len = p - filename;
     if (len > sizeof(protocol) - 1)
         len = sizeof(protocol) - 1;
-- 
1.7.2.3
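
For readers who want to try the check in isolation, here is a rough
sketch of the non-Windows path only. has_protocol_prefix() and
protocol_name() are hypothetical names chosen for this example; the
Windows drive-letter handling from the patch is deliberately left out,
and the fallback to "file" mirrors what the commit message describes for
bdrv_find_protocol() without reproducing that function.

```c
#include <assert.h>
#include <string.h>

/* Portable subset of the check: a path has a protocol if it contains ':'.
 * Windows drive prefixes such as "c:" are NOT handled in this sketch. */
int has_protocol_prefix(const char *path)
{
    return strchr(path, ':') != NULL;
}

/* Copy the text before the first ':' into buf, or "file" if there is no
 * protocol. The result is truncated to fit and always NUL-terminated. */
void protocol_name(const char *path, char *buf, size_t size)
{
    const char *p;
    size_t len;

    assert(size > 0);
    if (!has_protocol_prefix(path)) {
        strncpy(buf, "file", size - 1);
        buf[size - 1] = '\0';
        return;
    }
    p = strchr(path, ':');
    assert(p != NULL);          /* guaranteed by has_protocol_prefix() */
    len = (size_t)(p - path);
    if (len > size - 1) {
        len = size - 1;
    }
    memcpy(buf, path, len);
    buf[len] = '\0';
}
```

With these definitions, "nbd:localhost:10809" yields the protocol name
"nbd", while a plain path such as "disk.img" falls back to "file".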


* [Qemu-devel] [PATCH 03/38] block: Fix the use of protocols in backing files
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 01/38] blockdev: check dinfo ptr before using Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 02/38] block: Introduce path_has_protocol() function Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 04/38] Introduce strtosz_suffix() Kevin Wolf
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

Backing filenames may contain a protocol.  The code currently doesn't
consider this case and produces filenames that embed "<protocol>:".
Don't combine filenames if the backing filename contains a protocol.

Based on an earlier patch by Anthony Liguori <aliguori@us.ibm.com>.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 65fce80..b4aaf41 100644
--- a/block.c
+++ b/block.c
@@ -611,10 +611,18 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         BlockDriver *back_drv = NULL;
 
         bs->backing_hd = bdrv_new("");
-        path_combine(backing_filename, sizeof(backing_filename),
-                     filename, bs->backing_file);
-        if (bs->backing_format[0] != '\0')
+
+        if (path_has_protocol(bs->backing_file)) {
+            pstrcpy(backing_filename, sizeof(backing_filename),
+                    bs->backing_file);
+        } else {
+            path_combine(backing_filename, sizeof(backing_filename),
+                         filename, bs->backing_file);
+        }
+
+        if (bs->backing_format[0] != '\0') {
             back_drv = bdrv_find_format(bs->backing_format);
+        }
 
         /* backing files always opened read-only */
         back_flags =
-- 
1.7.2.3
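
The effect of this change on the resulting backing filename can be
sketched as follows. This is an illustration under stated assumptions,
not QEMU's code: resolve_backing() is a hypothetical helper, its
directory handling is a simplified stand-in for path_combine(), and the
protocol test is reduced to a search for ':'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decide which filename to open for a backing file.
 * - If the backing name carries a protocol ("nbd:..."), use it verbatim.
 * - If it is an absolute path, use it verbatim.
 * - Otherwise interpret it relative to the directory of the image file. */
void resolve_backing(char *dest, size_t size,
                     const char *image, const char *backing)
{
    const char *slash;

    assert(size > 0);
    if (strchr(backing, ':') != NULL || backing[0] == '/') {
        snprintf(dest, size, "%s", backing);
        return;
    }
    slash = strrchr(image, '/');
    if (slash == NULL) {
        /* Image is in the current directory; nothing to prepend. */
        snprintf(dest, size, "%s", backing);
    } else {
        /* Keep the image's directory part, including the trailing '/'. */
        snprintf(dest, size, "%.*s%s",
                 (int)(slash - image + 1), image, backing);
    }
}
```

Before the fix, the protocol case would have gone through the combining
branch as well, which is how filenames with an embedded "<protocol>:"
came about.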


* [Qemu-devel] [PATCH 04/38] Introduce strtosz_suffix()
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (2 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 03/38] block: Fix the use of protocols in backing files Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 05/38] qemu-img.c: Clean up handling of image size in img_create() Kevin Wolf
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

This introduces strtosz_suffix(), which allows the caller to specify a
default suffix in case a default other than MB is wanted.

strtosz() is kept as a wrapper for strtosz_suffix() which keeps its
current default of MB.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 cutils.c      |   17 ++++++++++++++---
 qemu-common.h |    7 +++++++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/cutils.c b/cutils.c
index 28089aa..7984bc1 100644
--- a/cutils.c
+++ b/cutils.c
@@ -291,10 +291,10 @@ int fcntl_setfl(int fd, int flag)
  * value must be terminated by whitespace, ',' or '\0'. Return -1 on
  * error.
  */
-ssize_t strtosz(const char *nptr, char **end)
+ssize_t strtosz_suffix(const char *nptr, char **end, const char default_suffix)
 {
     ssize_t retval = -1;
-    char *endptr, c;
+    char *endptr, c, d;
     int mul_required = 0;
     double val, mul, integral, fraction;
 
@@ -313,10 +313,16 @@ ssize_t strtosz(const char *nptr, char **end)
      * part of a multi token argument.
      */
     c = *endptr;
+    d = c;
     if (isspace(c) || c == '\0' || c == ',') {
         c = 0;
+        if (default_suffix) {
+            d = default_suffix;
+        } else {
+            d = c;
+        }
     }
-    switch (c) {
+    switch (d) {
     case 'B':
     case 'b':
         mul = 1;
@@ -371,3 +377,8 @@ fail:
 
     return retval;
 }
+
+ssize_t strtosz(const char *nptr, char **end)
+{
+    return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
+}
diff --git a/qemu-common.h b/qemu-common.h
index de82c2e..1ed32e5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -149,7 +149,14 @@ time_t mktimegm(struct tm *tm);
 int qemu_fls(int i);
 int qemu_fdatasync(int fd);
 int fcntl_setfl(int fd, int flag);
+
+#define STRTOSZ_DEFSUFFIX_TB	'T'
+#define STRTOSZ_DEFSUFFIX_GB	'G'
+#define STRTOSZ_DEFSUFFIX_MB	'M'
+#define STRTOSZ_DEFSUFFIX_KB	'K'
+#define STRTOSZ_DEFSUFFIX_B	'B'
 ssize_t strtosz(const char *nptr, char **end);
+ssize_t strtosz_suffix(const char *nptr, char **end, const char default_suffix);
 
 /* path.c */
 void init_paths(const char *prefix);
-- 
1.7.2.3
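
The default-suffix idea can be tried with a much smaller parser. The
sketch below is an assumption-laden simplification and not the cutils.c
implementation: parse_size() is a hypothetical name, it returns int64_t
rather than ssize_t, it has no end pointer, and it treats a missing or
unknown default suffix as an error. It only demonstrates that the
caller's default applies when the string itself carries no suffix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[suffix]" into bytes; returns -1 on error.
 * default_suffix is used when the number is followed by end of string,
 * whitespace or ','. */
int64_t parse_size(const char *s, char default_suffix)
{
    char *end;
    double val;
    int64_t mul;
    char c;

    val = strtod(s, &end);
    if (end == s || val < 0) {
        return -1;              /* no digits, or negative size */
    }
    c = *end;
    if (c == '\0' || c == ',' || isspace((unsigned char)c)) {
        c = default_suffix;     /* no explicit suffix given */
    }
    switch (toupper((unsigned char)c)) {
    case 'B': mul = 1; break;
    case 'K': mul = (int64_t)1 << 10; break;
    case 'M': mul = (int64_t)1 << 20; break;
    case 'G': mul = (int64_t)1 << 30; break;
    case 'T': mul = (int64_t)1 << 40; break;
    default:  return -1;        /* unknown suffix */
    }
    return (int64_t)(val * mul);
}

/* Usage: the same input string gives different sizes per default. */
void example_usage(void)
{
    assert(parse_size("10", 'M') == 10485760);  /* 10 MB */
    assert(parse_size("10", 'B') == 10);        /* 10 bytes */
    assert(parse_size("10k", 'M') == 10240);    /* explicit suffix wins */
}
```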


* [Qemu-devel] [PATCH 05/38] qemu-img.c: Clean up handling of image size in img_create()
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (3 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 04/38] Introduce strtosz_suffix() Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 06/38] ide: split ide command interpretation off Kevin Wolf
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

This cleans up the handling of image size in img_create() by parsing
the value early, and then only setting it once if a value has been
added as the last argument to the command line.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 52282e3..1d936ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -282,6 +282,7 @@ static int add_old_style_options(const char *fmt, QEMUOptionParameter *list,
 static int img_create(int argc, char **argv)
 {
     int c, ret = 0;
+    uint64_t img_size = -1;
     const char *fmt = "raw";
     const char *base_fmt = NULL;
     const char *filename;
@@ -330,6 +331,20 @@ static int img_create(int argc, char **argv)
     }
     filename = argv[optind++];
 
+    /* Get image size, if specified */
+    if (optind < argc) {
+        ssize_t sval;
+        sval = strtosz_suffix(argv[optind++], NULL, STRTOSZ_DEFSUFFIX_B);
+        if (sval < 0) {
+            error("Invalid image size specified! You may use k, M, G or "
+                  "T suffixes for ");
+            error("kilobytes, megabytes, gigabytes and terabytes.");
+            ret = -1;
+            goto out;
+        }
+        img_size = (uint64_t)sval;
+    }
+
     if (options && !strcmp(options, "?")) {
         ret = print_block_option_help(filename, fmt);
         goto out;
@@ -357,7 +372,8 @@ static int img_create(int argc, char **argv)
 
     /* Create parameter list with default values */
     param = parse_option_parameters("", create_options, param);
-    set_option_parameter_int(param, BLOCK_OPT_SIZE, -1);
+
+    set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size);
 
     /* Parse -o options */
     if (options) {
@@ -369,11 +385,6 @@ static int img_create(int argc, char **argv)
         }
     }
 
-    /* Add size to parameters */
-    if (optind < argc) {
-        set_option_parameter(param, BLOCK_OPT_SIZE, argv[optind++]);
-    }
-
     /* Add old-style options to parameters */
     ret = add_old_style_options(fmt, param, base_filename, base_fmt);
     if (ret < 0) {
-- 
1.7.2.3


* [Qemu-devel] [PATCH 06/38] ide: split ide command interpretation off
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (4 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 05/38] qemu-img.c: Clean up handling of image size in img_create() Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 07/38] ide: fix whitespace gap in ide_exec_cmd Kevin Wolf
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

The ATA command interpretation code can be used for PATA and SATA
interfaces alike. So let's split it out into a separate function.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/core.c     |   20 ++++++++++++++------
 hw/ide/internal.h |    2 ++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 430350f..ac4ee71 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1791,9 +1791,6 @@ static void ide_clear_hob(IDEBus *bus)
 void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     IDEBus *bus = opaque;
-    IDEState *s;
-    int n;
-    int lba48 = 0;
 
 #ifdef DEBUG_IDE
     printf("IDE: write addr=0x%x val=0x%02x\n", addr, val);
@@ -1854,17 +1851,29 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     default:
     case 7:
         /* command */
+        ide_exec_cmd(bus, val);
+        break;
+    }
+}
+
+
+void ide_exec_cmd(IDEBus *bus, uint32_t val)
+{
+    IDEState *s;
+    int n;
+    int lba48 = 0;
+
 #if defined(DEBUG_IDE)
         printf("ide: CMD=%02x\n", val);
 #endif
         s = idebus_active_if(bus);
         /* ignore commands to non existant slave */
         if (s != bus->ifs && !s->bs)
-            break;
+            return;
 
         /* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
         if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-            break;
+            return;
 
         switch(val) {
         case WIN_IDENTIFY:
@@ -2355,7 +2364,6 @@ void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             ide_set_irq(s->bus);
             break;
         }
-    }
 }
 
 uint32_t ide_ioport_read(void *opaque, uint32_t addr1)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 71af66f..029c76c 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -567,6 +567,8 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
                                     DriveInfo *hd1, qemu_irq irq);
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
+void ide_exec_cmd(IDEBus *bus, uint32_t val);
+
 /* hw/ide/qdev.c */
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
-- 
1.7.2.3
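
The shape of this refactoring, with the register-write handler reduced
to a thin dispatcher and the command logic callable on its own, might
look like the sketch below. The Bus type and the names exec_cmd() and
ioport_write() are hypothetical simplifications for illustration; only
the BSY/DRQ guard and the "register 7 is the command register" dispatch
are taken from the patch, and the numeric constants are assumed to match
the usual ATA values.

```c
#include <assert.h>

/* Assumed ATA values: BSY and DRQ status bits, DEVICE RESET opcode. */
enum { BUSY = 0x80, DRQ = 0x08, CMD_DEVICE_RESET = 0x08 };

/* Hypothetical, heavily reduced bus state. */
typedef struct Bus {
    unsigned status;
    unsigned last_cmd;
    int executed;               /* number of commands accepted */
} Bus;

/* Command interpretation, callable from any front end (PATA or SATA). */
void exec_cmd(Bus *bus, unsigned val)
{
    /* Only DEVICE RESET is allowed while BSY and/or DRQ are set. */
    if ((bus->status & (BUSY | DRQ)) && val != CMD_DEVICE_RESET) {
        return;
    }
    bus->last_cmd = val;
    bus->executed++;
}

/* Port-I/O front end: only decodes the register and delegates. */
void ioport_write(Bus *bus, unsigned addr, unsigned val)
{
    switch (addr & 7) {
    case 7:                     /* command register */
        exec_cmd(bus, val);
        break;
    default:                    /* other registers omitted in this sketch */
        break;
    }
}

/* Usage: a second front end can call exec_cmd() without port I/O. */
void example_usage(void)
{
    Bus bus = { 0, 0, 0 };

    exec_cmd(&bus, 0xec);
    assert(bus.executed == 1);
}
```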


* [Qemu-devel] [PATCH 07/38] ide: fix whitespace gap in ide_exec_cmd
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (5 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 06/38] ide: split ide command interpretation off Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 08/38] ide: Split out BMDMA code from ATA core Kevin Wolf
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

Now that we have the function split out, we have to reindent it.
To keep the actual functional change in the previous patch readable,
the reindentation is done separately in this patch.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/core.c |  734 ++++++++++++++++++++++++++++----------------------------
 1 files changed, 367 insertions(+), 367 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index ac4ee71..ed6854d 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1864,423 +1864,423 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
     int lba48 = 0;
 
 #if defined(DEBUG_IDE)
-        printf("ide: CMD=%02x\n", val);
+    printf("ide: CMD=%02x\n", val);
 #endif
-        s = idebus_active_if(bus);
-        /* ignore commands to non existant slave */
-        if (s != bus->ifs && !s->bs)
-            return;
+    s = idebus_active_if(bus);
+    /* ignore commands to non existant slave */
+    if (s != bus->ifs && !s->bs)
+        return;
 
-        /* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
-        if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
-            return;
+    /* Only DEVICE RESET is allowed while BSY or/and DRQ are set */
+    if ((s->status & (BUSY_STAT|DRQ_STAT)) && val != WIN_DEVICE_RESET)
+        return;
 
-        switch(val) {
-        case WIN_IDENTIFY:
-            if (s->bs && s->drive_kind != IDE_CD) {
-                if (s->drive_kind != IDE_CFATA)
-                    ide_identify(s);
-                else
-                    ide_cfata_identify(s);
-                s->status = READY_STAT | SEEK_STAT;
-                ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
-            } else {
-                if (s->drive_kind == IDE_CD) {
-                    ide_set_signature(s);
-                }
-                ide_abort_command(s);
-            }
-            ide_set_irq(s->bus);
-            break;
-        case WIN_SPECIFY:
-        case WIN_RECAL:
-            s->error = 0;
+    switch(val) {
+    case WIN_IDENTIFY:
+        if (s->bs && s->drive_kind != IDE_CD) {
+            if (s->drive_kind != IDE_CFATA)
+                ide_identify(s);
+            else
+                ide_cfata_identify(s);
             s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
-            break;
-        case WIN_SETMULT:
-            if (s->drive_kind == IDE_CFATA && s->nsector == 0) {
-                /* Disable Read and Write Multiple */
-                s->mult_sectors = 0;
-                s->status = READY_STAT | SEEK_STAT;
-            } else if ((s->nsector & 0xff) != 0 &&
-                ((s->nsector & 0xff) > MAX_MULT_SECTORS ||
-                 (s->nsector & (s->nsector - 1)) != 0)) {
-                ide_abort_command(s);
-            } else {
-                s->mult_sectors = s->nsector & 0xff;
-                s->status = READY_STAT | SEEK_STAT;
+            ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
+        } else {
+            if (s->drive_kind == IDE_CD) {
+                ide_set_signature(s);
             }
-            ide_set_irq(s->bus);
-            break;
-        case WIN_VERIFY_EXT:
-	    lba48 = 1;
-        case WIN_VERIFY:
-        case WIN_VERIFY_ONCE:
-            /* do sector number check ? */
-	    ide_cmd_lba48_transform(s, lba48);
+            ide_abort_command(s);
+        }
+        ide_set_irq(s->bus);
+        break;
+    case WIN_SPECIFY:
+    case WIN_RECAL:
+        s->error = 0;
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case WIN_SETMULT:
+        if (s->drive_kind == IDE_CFATA && s->nsector == 0) {
+            /* Disable Read and Write Multiple */
+            s->mult_sectors = 0;
             s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
-            break;
+        } else if ((s->nsector & 0xff) != 0 &&
+            ((s->nsector & 0xff) > MAX_MULT_SECTORS ||
+             (s->nsector & (s->nsector - 1)) != 0)) {
+            ide_abort_command(s);
+        } else {
+            s->mult_sectors = s->nsector & 0xff;
+            s->status = READY_STAT | SEEK_STAT;
+        }
+        ide_set_irq(s->bus);
+        break;
+    case WIN_VERIFY_EXT:
+	lba48 = 1;
+    case WIN_VERIFY:
+    case WIN_VERIFY_ONCE:
+        /* do sector number check ? */
+	ide_cmd_lba48_transform(s, lba48);
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
 	case WIN_READ_EXT:
-	    lba48 = 1;
-        case WIN_READ:
-        case WIN_READ_ONCE:
-            if (!s->bs)
-                goto abort_cmd;
-	    ide_cmd_lba48_transform(s, lba48);
-            s->req_nb_sectors = 1;
-            ide_sector_read(s);
-            break;
+	lba48 = 1;
+    case WIN_READ:
+    case WIN_READ_ONCE:
+        if (!s->bs)
+            goto abort_cmd;
+	ide_cmd_lba48_transform(s, lba48);
+        s->req_nb_sectors = 1;
+        ide_sector_read(s);
+        break;
 	case WIN_WRITE_EXT:
-	    lba48 = 1;
-        case WIN_WRITE:
-        case WIN_WRITE_ONCE:
-        case CFA_WRITE_SECT_WO_ERASE:
-        case WIN_WRITE_VERIFY:
-	    ide_cmd_lba48_transform(s, lba48);
-            s->error = 0;
-            s->status = SEEK_STAT | READY_STAT;
-            s->req_nb_sectors = 1;
-            ide_transfer_start(s, s->io_buffer, 512, ide_sector_write);
-            s->media_changed = 1;
-            break;
+	lba48 = 1;
+    case WIN_WRITE:
+    case WIN_WRITE_ONCE:
+    case CFA_WRITE_SECT_WO_ERASE:
+    case WIN_WRITE_VERIFY:
+	ide_cmd_lba48_transform(s, lba48);
+        s->error = 0;
+        s->status = SEEK_STAT | READY_STAT;
+        s->req_nb_sectors = 1;
+        ide_transfer_start(s, s->io_buffer, 512, ide_sector_write);
+        s->media_changed = 1;
+        break;
 	case WIN_MULTREAD_EXT:
-	    lba48 = 1;
-        case WIN_MULTREAD:
-            if (!s->mult_sectors)
-                goto abort_cmd;
-	    ide_cmd_lba48_transform(s, lba48);
-            s->req_nb_sectors = s->mult_sectors;
-            ide_sector_read(s);
-            break;
-        case WIN_MULTWRITE_EXT:
-	    lba48 = 1;
-        case WIN_MULTWRITE:
-        case CFA_WRITE_MULTI_WO_ERASE:
-            if (!s->mult_sectors)
-                goto abort_cmd;
-	    ide_cmd_lba48_transform(s, lba48);
-            s->error = 0;
-            s->status = SEEK_STAT | READY_STAT;
-            s->req_nb_sectors = s->mult_sectors;
-            n = s->nsector;
-            if (n > s->req_nb_sectors)
-                n = s->req_nb_sectors;
-            ide_transfer_start(s, s->io_buffer, 512 * n, ide_sector_write);
-            s->media_changed = 1;
-            break;
+	lba48 = 1;
+    case WIN_MULTREAD:
+        if (!s->mult_sectors)
+            goto abort_cmd;
+	ide_cmd_lba48_transform(s, lba48);
+        s->req_nb_sectors = s->mult_sectors;
+        ide_sector_read(s);
+        break;
+    case WIN_MULTWRITE_EXT:
+	lba48 = 1;
+    case WIN_MULTWRITE:
+    case CFA_WRITE_MULTI_WO_ERASE:
+        if (!s->mult_sectors)
+            goto abort_cmd;
+	ide_cmd_lba48_transform(s, lba48);
+        s->error = 0;
+        s->status = SEEK_STAT | READY_STAT;
+        s->req_nb_sectors = s->mult_sectors;
+        n = s->nsector;
+        if (n > s->req_nb_sectors)
+            n = s->req_nb_sectors;
+        ide_transfer_start(s, s->io_buffer, 512 * n, ide_sector_write);
+        s->media_changed = 1;
+        break;
 	case WIN_READDMA_EXT:
-	    lba48 = 1;
-        case WIN_READDMA:
-        case WIN_READDMA_ONCE:
-            if (!s->bs)
-                goto abort_cmd;
-	    ide_cmd_lba48_transform(s, lba48);
-            ide_sector_read_dma(s);
-            break;
+	lba48 = 1;
+    case WIN_READDMA:
+    case WIN_READDMA_ONCE:
+        if (!s->bs)
+            goto abort_cmd;
+	ide_cmd_lba48_transform(s, lba48);
+        ide_sector_read_dma(s);
+        break;
 	case WIN_WRITEDMA_EXT:
-	    lba48 = 1;
-        case WIN_WRITEDMA:
-        case WIN_WRITEDMA_ONCE:
-            if (!s->bs)
-                goto abort_cmd;
-	    ide_cmd_lba48_transform(s, lba48);
-            ide_sector_write_dma(s);
-            s->media_changed = 1;
-            break;
-        case WIN_READ_NATIVE_MAX_EXT:
-	    lba48 = 1;
-        case WIN_READ_NATIVE_MAX:
-	    ide_cmd_lba48_transform(s, lba48);
-            ide_set_sector(s, s->nb_sectors - 1);
-            s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
-            break;
-        case WIN_CHECKPOWERMODE1:
-        case WIN_CHECKPOWERMODE2:
-            s->nsector = 0xff; /* device active or idle */
+	lba48 = 1;
+    case WIN_WRITEDMA:
+    case WIN_WRITEDMA_ONCE:
+        if (!s->bs)
+            goto abort_cmd;
+	ide_cmd_lba48_transform(s, lba48);
+        ide_sector_write_dma(s);
+        s->media_changed = 1;
+        break;
+    case WIN_READ_NATIVE_MAX_EXT:
+	lba48 = 1;
+    case WIN_READ_NATIVE_MAX:
+	ide_cmd_lba48_transform(s, lba48);
+        ide_set_sector(s, s->nb_sectors - 1);
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case WIN_CHECKPOWERMODE1:
+    case WIN_CHECKPOWERMODE2:
+        s->nsector = 0xff; /* device active or idle */
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case WIN_SETFEATURES:
+        if (!s->bs)
+            goto abort_cmd;
+        /* XXX: valid for CDROM ? */
+        switch(s->feature) {
+        case 0xcc: /* reverting to power-on defaults enable */
+        case 0x66: /* reverting to power-on defaults disable */
+        case 0x02: /* write cache enable */
+        case 0x82: /* write cache disable */
+        case 0xaa: /* read look-ahead enable */
+        case 0x55: /* read look-ahead disable */
+        case 0x05: /* set advanced power management mode */
+        case 0x85: /* disable advanced power management mode */
+        case 0x69: /* NOP */
+        case 0x67: /* NOP */
+        case 0x96: /* NOP */
+        case 0x9a: /* NOP */
+        case 0x42: /* enable Automatic Acoustic Mode */
+        case 0xc2: /* disable Automatic Acoustic Mode */
             s->status = READY_STAT | SEEK_STAT;
             ide_set_irq(s->bus);
             break;
-        case WIN_SETFEATURES:
-            if (!s->bs)
-                goto abort_cmd;
-            /* XXX: valid for CDROM ? */
-            switch(s->feature) {
-            case 0xcc: /* reverting to power-on defaults enable */
-            case 0x66: /* reverting to power-on defaults disable */
-            case 0x02: /* write cache enable */
-            case 0x82: /* write cache disable */
-            case 0xaa: /* read look-ahead enable */
-            case 0x55: /* read look-ahead disable */
-            case 0x05: /* set advanced power management mode */
-            case 0x85: /* disable advanced power management mode */
-            case 0x69: /* NOP */
-            case 0x67: /* NOP */
-            case 0x96: /* NOP */
-            case 0x9a: /* NOP */
-            case 0x42: /* enable Automatic Acoustic Mode */
-            case 0xc2: /* disable Automatic Acoustic Mode */
-                s->status = READY_STAT | SEEK_STAT;
-                ide_set_irq(s->bus);
-                break;
-            case 0x03: { /* set transfer mode */
+        case 0x03: { /* set transfer mode */
 		uint8_t val = s->nsector & 0x07;
-                uint16_t *identify_data = (uint16_t *)s->identify_data;
+            uint16_t *identify_data = (uint16_t *)s->identify_data;
 
 		switch (s->nsector >> 3) {
-		    case 0x00: /* pio default */
-		    case 0x01: /* pio mode */
+		case 0x00: /* pio default */
+		case 0x01: /* pio mode */
 			put_le16(identify_data + 62,0x07);
 			put_le16(identify_data + 63,0x07);
 			put_le16(identify_data + 88,0x3f);
 			break;
-                    case 0x02: /* sigle word dma mode*/
+                case 0x02: /* sigle word dma mode*/
 			put_le16(identify_data + 62,0x07 | (1 << (val + 8)));
 			put_le16(identify_data + 63,0x07);
 			put_le16(identify_data + 88,0x3f);
 			break;
-		    case 0x04: /* mdma mode */
+		case 0x04: /* mdma mode */
 			put_le16(identify_data + 62,0x07);
 			put_le16(identify_data + 63,0x07 | (1 << (val + 8)));
 			put_le16(identify_data + 88,0x3f);
 			break;
-		    case 0x08: /* udma mode */
+		case 0x08: /* udma mode */
 			put_le16(identify_data + 62,0x07);
 			put_le16(identify_data + 63,0x07);
 			put_le16(identify_data + 88,0x3f | (1 << (val + 8)));
 			break;
-		    default:
+		default:
 			goto abort_cmd;
 		}
-                s->status = READY_STAT | SEEK_STAT;
-                ide_set_irq(s->bus);
-                break;
-	    }
-            default:
-                goto abort_cmd;
-            }
-            break;
-        case WIN_FLUSH_CACHE:
-        case WIN_FLUSH_CACHE_EXT:
-            ide_flush_cache(s);
-            break;
-        case WIN_STANDBY:
-        case WIN_STANDBY2:
-        case WIN_STANDBYNOW1:
-        case WIN_STANDBYNOW2:
-        case WIN_IDLEIMMEDIATE:
-        case CFA_IDLEIMMEDIATE:
-        case WIN_SETIDLE1:
-        case WIN_SETIDLE2:
-        case WIN_SLEEPNOW1:
-        case WIN_SLEEPNOW2:
-            s->status = READY_STAT;
-            ide_set_irq(s->bus);
-            break;
-        case WIN_SEEK:
-            if(s->drive_kind == IDE_CD)
-                goto abort_cmd;
-            /* XXX: Check that seek is within bounds */
             s->status = READY_STAT | SEEK_STAT;
             ide_set_irq(s->bus);
             break;
-            /* ATAPI commands */
-        case WIN_PIDENTIFY:
-            if (s->drive_kind == IDE_CD) {
-                ide_atapi_identify(s);
-                s->status = READY_STAT | SEEK_STAT;
-                ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
-            } else {
-                ide_abort_command(s);
-            }
-            ide_set_irq(s->bus);
-            break;
-        case WIN_DIAGNOSE:
-            ide_set_signature(s);
-            if (s->drive_kind == IDE_CD)
-                s->status = 0; /* ATAPI spec (v6) section 9.10 defines packet
-                                * devices to return a clear status register
-                                * with READY_STAT *not* set. */
-            else
-                s->status = READY_STAT | SEEK_STAT;
-            s->error = 0x01; /* Device 0 passed, Device 1 passed or not
-                              * present. 
-                              */
-            ide_set_irq(s->bus);
-            break;
-        case WIN_SRST:
-            if (s->drive_kind != IDE_CD)
-                goto abort_cmd;
-            ide_set_signature(s);
-            s->status = 0x00; /* NOTE: READY is _not_ set */
-            s->error = 0x01;
-            break;
-        case WIN_PACKETCMD:
-            if (s->drive_kind != IDE_CD)
-                goto abort_cmd;
-            /* overlapping commands not supported */
-            if (s->feature & 0x02)
-                goto abort_cmd;
+	}
+        default:
+            goto abort_cmd;
+        }
+        break;
+    case WIN_FLUSH_CACHE:
+    case WIN_FLUSH_CACHE_EXT:
+        ide_flush_cache(s);
+        break;
+    case WIN_STANDBY:
+    case WIN_STANDBY2:
+    case WIN_STANDBYNOW1:
+    case WIN_STANDBYNOW2:
+    case WIN_IDLEIMMEDIATE:
+    case CFA_IDLEIMMEDIATE:
+    case WIN_SETIDLE1:
+    case WIN_SETIDLE2:
+    case WIN_SLEEPNOW1:
+    case WIN_SLEEPNOW2:
+        s->status = READY_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case WIN_SEEK:
+        if(s->drive_kind == IDE_CD)
+            goto abort_cmd;
+        /* XXX: Check that seek is within bounds */
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+        /* ATAPI commands */
+    case WIN_PIDENTIFY:
+        if (s->drive_kind == IDE_CD) {
+            ide_atapi_identify(s);
             s->status = READY_STAT | SEEK_STAT;
-            s->atapi_dma = s->feature & 1;
-            s->nsector = 1;
-            ide_transfer_start(s, s->io_buffer, ATAPI_PACKET_SIZE,
-                               ide_atapi_cmd);
-            break;
-        /* CF-ATA commands */
-        case CFA_REQ_EXT_ERROR_CODE:
-            if (s->drive_kind != IDE_CFATA)
-                goto abort_cmd;
-            s->error = 0x09;    /* miscellaneous error */
+            ide_transfer_start(s, s->io_buffer, 512, ide_transfer_stop);
+        } else {
+            ide_abort_command(s);
+        }
+        ide_set_irq(s->bus);
+        break;
+    case WIN_DIAGNOSE:
+        ide_set_signature(s);
+        if (s->drive_kind == IDE_CD)
+            s->status = 0; /* ATAPI spec (v6) section 9.10 defines packet
+                            * devices to return a clear status register
+                            * with READY_STAT *not* set. */
+        else
             s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
+        s->error = 0x01; /* Device 0 passed, Device 1 passed or not
+                          * present.
+                          */
+        ide_set_irq(s->bus);
+        break;
+    case WIN_SRST:
+        if (s->drive_kind != IDE_CD)
+            goto abort_cmd;
+        ide_set_signature(s);
+        s->status = 0x00; /* NOTE: READY is _not_ set */
+        s->error = 0x01;
+        break;
+    case WIN_PACKETCMD:
+        if (s->drive_kind != IDE_CD)
+            goto abort_cmd;
+        /* overlapping commands not supported */
+        if (s->feature & 0x02)
+            goto abort_cmd;
+        s->status = READY_STAT | SEEK_STAT;
+        s->atapi_dma = s->feature & 1;
+        s->nsector = 1;
+        ide_transfer_start(s, s->io_buffer, ATAPI_PACKET_SIZE,
+                           ide_atapi_cmd);
+        break;
+    /* CF-ATA commands */
+    case CFA_REQ_EXT_ERROR_CODE:
+        if (s->drive_kind != IDE_CFATA)
+            goto abort_cmd;
+        s->error = 0x09;    /* miscellaneous error */
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case CFA_ERASE_SECTORS:
+    case CFA_WEAR_LEVEL:
+        if (s->drive_kind != IDE_CFATA)
+            goto abort_cmd;
+        if (val == CFA_WEAR_LEVEL)
+            s->nsector = 0;
+        if (val == CFA_ERASE_SECTORS)
+            s->media_changed = 1;
+        s->error = 0x00;
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
+    case CFA_TRANSLATE_SECTOR:
+        if (s->drive_kind != IDE_CFATA)
+            goto abort_cmd;
+        s->error = 0x00;
+        s->status = READY_STAT | SEEK_STAT;
+        memset(s->io_buffer, 0, 0x200);
+        s->io_buffer[0x00] = s->hcyl;			/* Cyl MSB */
+        s->io_buffer[0x01] = s->lcyl;			/* Cyl LSB */
+        s->io_buffer[0x02] = s->select;			/* Head */
+        s->io_buffer[0x03] = s->sector;			/* Sector */
+        s->io_buffer[0x04] = ide_get_sector(s) >> 16;	/* LBA MSB */
+        s->io_buffer[0x05] = ide_get_sector(s) >> 8;	/* LBA */
+        s->io_buffer[0x06] = ide_get_sector(s) >> 0;	/* LBA LSB */
+        s->io_buffer[0x13] = 0x00;				/* Erase flag */
+        s->io_buffer[0x18] = 0x00;				/* Hot count */
+        s->io_buffer[0x19] = 0x00;				/* Hot count */
+        s->io_buffer[0x1a] = 0x01;				/* Hot count */
+        ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
+        ide_set_irq(s->bus);
+        break;
+    case CFA_ACCESS_METADATA_STORAGE:
+        if (s->drive_kind != IDE_CFATA)
+            goto abort_cmd;
+        switch (s->feature) {
+        case 0x02:	/* Inquiry Metadata Storage */
+            ide_cfata_metadata_inquiry(s);
             break;
-        case CFA_ERASE_SECTORS:
-        case CFA_WEAR_LEVEL:
-            if (s->drive_kind != IDE_CFATA)
-                goto abort_cmd;
-            if (val == CFA_WEAR_LEVEL)
-                s->nsector = 0;
-            if (val == CFA_ERASE_SECTORS)
-                s->media_changed = 1;
-            s->error = 0x00;
-            s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
+        case 0x03:	/* Read Metadata Storage */
+            ide_cfata_metadata_read(s);
             break;
-        case CFA_TRANSLATE_SECTOR:
-            if (s->drive_kind != IDE_CFATA)
-                goto abort_cmd;
-            s->error = 0x00;
-            s->status = READY_STAT | SEEK_STAT;
-            memset(s->io_buffer, 0, 0x200);
-            s->io_buffer[0x00] = s->hcyl;			/* Cyl MSB */
-            s->io_buffer[0x01] = s->lcyl;			/* Cyl LSB */
-            s->io_buffer[0x02] = s->select;			/* Head */
-            s->io_buffer[0x03] = s->sector;			/* Sector */
-            s->io_buffer[0x04] = ide_get_sector(s) >> 16;	/* LBA MSB */
-            s->io_buffer[0x05] = ide_get_sector(s) >> 8;	/* LBA */
-            s->io_buffer[0x06] = ide_get_sector(s) >> 0;	/* LBA LSB */
-            s->io_buffer[0x13] = 0x00;				/* Erase flag */
-            s->io_buffer[0x18] = 0x00;				/* Hot count */
-            s->io_buffer[0x19] = 0x00;				/* Hot count */
-            s->io_buffer[0x1a] = 0x01;				/* Hot count */
-            ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
-            ide_set_irq(s->bus);
+        case 0x04:	/* Write Metadata Storage */
+            ide_cfata_metadata_write(s);
             break;
-        case CFA_ACCESS_METADATA_STORAGE:
-            if (s->drive_kind != IDE_CFATA)
-                goto abort_cmd;
-            switch (s->feature) {
-            case 0x02:	/* Inquiry Metadata Storage */
-                ide_cfata_metadata_inquiry(s);
-                break;
-            case 0x03:	/* Read Metadata Storage */
-                ide_cfata_metadata_read(s);
-                break;
-            case 0x04:	/* Write Metadata Storage */
-                ide_cfata_metadata_write(s);
-                break;
-            default:
-                goto abort_cmd;
-            }
-            ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
-            s->status = 0x00; /* NOTE: READY is _not_ set */
-            ide_set_irq(s->bus);
-            break;
-        case IBM_SENSE_CONDITION:
-            if (s->drive_kind != IDE_CFATA)
-                goto abort_cmd;
-            switch (s->feature) {
-            case 0x01:  /* sense temperature in device */
-                s->nsector = 0x50;      /* +20 C */
-                break;
-            default:
-                goto abort_cmd;
-            }
-            s->status = READY_STAT | SEEK_STAT;
-            ide_set_irq(s->bus);
+        default:
+            goto abort_cmd;
+        }
+        ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
+        s->status = 0x00; /* NOTE: READY is _not_ set */
+        ide_set_irq(s->bus);
+        break;
+    case IBM_SENSE_CONDITION:
+        if (s->drive_kind != IDE_CFATA)
+            goto abort_cmd;
+        switch (s->feature) {
+        case 0x01:  /* sense temperature in device */
+            s->nsector = 0x50;      /* +20 C */
             break;
+        default:
+            goto abort_cmd;
+        }
+        s->status = READY_STAT | SEEK_STAT;
+        ide_set_irq(s->bus);
+        break;
 
 	case WIN_SMART:
-	    if (s->drive_kind == IDE_CD)
+	if (s->drive_kind == IDE_CD)
 		goto abort_cmd;
-	    if (s->hcyl != 0xc2 || s->lcyl != 0x4f)
+	if (s->hcyl != 0xc2 || s->lcyl != 0x4f)
 		goto abort_cmd;
-	    if (!s->smart_enabled && s->feature != SMART_ENABLE)
+	if (!s->smart_enabled && s->feature != SMART_ENABLE)
 		goto abort_cmd;
-	    switch (s->feature) {
-	    case SMART_DISABLE:
+	switch (s->feature) {
+	case SMART_DISABLE:
 		s->smart_enabled = 0;
 		s->status = READY_STAT | SEEK_STAT;
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_ENABLE:
+	case SMART_ENABLE:
 		s->smart_enabled = 1;
 		s->status = READY_STAT | SEEK_STAT;
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_ATTR_AUTOSAVE:
+	case SMART_ATTR_AUTOSAVE:
 		switch (s->sector) {
 		case 0x00:
-		    s->smart_autosave = 0;
-		    break;
+		s->smart_autosave = 0;
+		break;
 		case 0xf1:
-		    s->smart_autosave = 1;
-		    break;
+		s->smart_autosave = 1;
+		break;
 		default:
-		    goto abort_cmd;
+		goto abort_cmd;
 		}
 		s->status = READY_STAT | SEEK_STAT;
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_STATUS:
+	case SMART_STATUS:
 		if (!s->smart_errors) {
-		    s->hcyl = 0xc2;
-		    s->lcyl = 0x4f;
+		s->hcyl = 0xc2;
+		s->lcyl = 0x4f;
 		} else {
-		    s->hcyl = 0x2c;
-		    s->lcyl = 0xf4;
+		s->hcyl = 0x2c;
+		s->lcyl = 0xf4;
 		}
 		s->status = READY_STAT | SEEK_STAT;
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_READ_THRESH:
+	case SMART_READ_THRESH:
 		memset(s->io_buffer, 0, 0x200);
 		s->io_buffer[0] = 0x01; /* smart struct version */
 		for (n=0; n<30; n++) {
-		    if (smart_attributes[n][0] == 0)
+		if (smart_attributes[n][0] == 0)
 			break;
-		    s->io_buffer[2+0+(n*12)] = smart_attributes[n][0];
-		    s->io_buffer[2+1+(n*12)] = smart_attributes[n][4];
+		s->io_buffer[2+0+(n*12)] = smart_attributes[n][0];
+		s->io_buffer[2+1+(n*12)] = smart_attributes[n][4];
 		}
 		for (n=0; n<511; n++) /* checksum */
-		    s->io_buffer[511] += s->io_buffer[n];
+		s->io_buffer[511] += s->io_buffer[n];
 		s->io_buffer[511] = 0x100 - s->io_buffer[511];
 		s->status = READY_STAT | SEEK_STAT;
 		ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_READ_DATA:
+	case SMART_READ_DATA:
 		memset(s->io_buffer, 0, 0x200);
 		s->io_buffer[0] = 0x01; /* smart struct version */
 		for (n=0; n<30; n++) {
-		    if (smart_attributes[n][0] == 0)
+		if (smart_attributes[n][0] == 0)
 			break;
-		    s->io_buffer[2+0+(n*12)] = smart_attributes[n][0];
-		    s->io_buffer[2+1+(n*12)] = smart_attributes[n][1];
-		    s->io_buffer[2+3+(n*12)] = smart_attributes[n][2];
-		    s->io_buffer[2+4+(n*12)] = smart_attributes[n][3];
+		s->io_buffer[2+0+(n*12)] = smart_attributes[n][0];
+		s->io_buffer[2+1+(n*12)] = smart_attributes[n][1];
+		s->io_buffer[2+3+(n*12)] = smart_attributes[n][2];
+		s->io_buffer[2+4+(n*12)] = smart_attributes[n][3];
 		}
 		s->io_buffer[362] = 0x02 | (s->smart_autosave?0x80:0x00);
 		if (s->smart_selftest_count == 0) {
-		    s->io_buffer[363] = 0;
+		s->io_buffer[363] = 0;
 		} else {
-		    s->io_buffer[363] = 
+		s->io_buffer[363] =
 			s->smart_selftest_data[3 + 
-					       (s->smart_selftest_count - 1) * 
-					       24];
+					   (s->smart_selftest_count - 1) *
+					   24];
 		}
 		s->io_buffer[364] = 0x20; 
 		s->io_buffer[365] = 0x01; 
@@ -2294,76 +2294,76 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 		s->io_buffer[374] = 0x01; /* minutes for poll conveyance */
 
 		for (n=0; n<511; n++) 
-		    s->io_buffer[511] += s->io_buffer[n];
+		s->io_buffer[511] += s->io_buffer[n];
 		s->io_buffer[511] = 0x100 - s->io_buffer[511];
 		s->status = READY_STAT | SEEK_STAT;
 		ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_READ_LOG:
+	case SMART_READ_LOG:
 		switch (s->sector) {
 		case 0x01: /* summary smart error log */
-		    memset(s->io_buffer, 0, 0x200);
-		    s->io_buffer[0] = 0x01;
-		    s->io_buffer[1] = 0x00; /* no error entries */
-		    s->io_buffer[452] = s->smart_errors & 0xff;
-		    s->io_buffer[453] = (s->smart_errors & 0xff00) >> 8;
+		memset(s->io_buffer, 0, 0x200);
+		s->io_buffer[0] = 0x01;
+		s->io_buffer[1] = 0x00; /* no error entries */
+		s->io_buffer[452] = s->smart_errors & 0xff;
+		s->io_buffer[453] = (s->smart_errors & 0xff00) >> 8;
 
-		    for (n=0; n<511; n++)
+		for (n=0; n<511; n++)
 			s->io_buffer[511] += s->io_buffer[n];
-		    s->io_buffer[511] = 0x100 - s->io_buffer[511];
-		    break;
+		s->io_buffer[511] = 0x100 - s->io_buffer[511];
+		break;
 		case 0x06: /* smart self test log */
-		    memset(s->io_buffer, 0, 0x200);
-		    s->io_buffer[0] = 0x01; 
-		    if (s->smart_selftest_count == 0) {
+		memset(s->io_buffer, 0, 0x200);
+		s->io_buffer[0] = 0x01;
+		if (s->smart_selftest_count == 0) {
 			s->io_buffer[508] = 0;
-		    } else {
+		} else {
 			s->io_buffer[508] = s->smart_selftest_count;
 			for (n=2; n<506; n++) 
-			    s->io_buffer[n] = s->smart_selftest_data[n];
-		    }		    
-		    for (n=0; n<511; n++)
+			s->io_buffer[n] = s->smart_selftest_data[n];
+		}
+		for (n=0; n<511; n++)
 			s->io_buffer[511] += s->io_buffer[n];
-		    s->io_buffer[511] = 0x100 - s->io_buffer[511];
-		    break;
+		s->io_buffer[511] = 0x100 - s->io_buffer[511];
+		break;
 		default:
-		    goto abort_cmd;
+		goto abort_cmd;
 		}
 		s->status = READY_STAT | SEEK_STAT;
 		ide_transfer_start(s, s->io_buffer, 0x200, ide_transfer_stop);
 		ide_set_irq(s->bus);
 		break;
-	    case SMART_EXECUTE_OFFLINE:
+	case SMART_EXECUTE_OFFLINE:
 		switch (s->sector) {
 		case 0: /* off-line routine */
 		case 1: /* short self test */
 		case 2: /* extended self test */
-		    s->smart_selftest_count++;
-		    if(s->smart_selftest_count > 21)
+		s->smart_selftest_count++;
+		if(s->smart_selftest_count > 21)
 			s->smart_selftest_count = 0;
-		    n = 2 + (s->smart_selftest_count - 1) * 24;
-		    s->smart_selftest_data[n] = s->sector;
-		    s->smart_selftest_data[n+1] = 0x00; /* OK and finished */
-		    s->smart_selftest_data[n+2] = 0x34; /* hour count lsb */
-		    s->smart_selftest_data[n+3] = 0x12; /* hour count msb */
-		    s->status = READY_STAT | SEEK_STAT;
-		    ide_set_irq(s->bus);
-		    break;
+		n = 2 + (s->smart_selftest_count - 1) * 24;
+		s->smart_selftest_data[n] = s->sector;
+		s->smart_selftest_data[n+1] = 0x00; /* OK and finished */
+		s->smart_selftest_data[n+2] = 0x34; /* hour count lsb */
+		s->smart_selftest_data[n+3] = 0x12; /* hour count msb */
+		s->status = READY_STAT | SEEK_STAT;
+		ide_set_irq(s->bus);
+		break;
 		default:
-		    goto abort_cmd;
+		goto abort_cmd;
 		}
 		break;
-	    default:
+	default:
 		goto abort_cmd;
-	    }
-	    break;
-        default:
-        abort_cmd:
-            ide_abort_command(s);
-            ide_set_irq(s->bus);
-            break;
-        }
+	}
+	break;
+    default:
+    abort_cmd:
+        ide_abort_command(s);
+        ide_set_irq(s->bus);
+        break;
+    }
 }
 
 uint32_t ide_ioport_read(void *opaque, uint32_t addr1)
-- 
1.7.2.3

* [Qemu-devel] [PATCH 08/38] ide: Split out BMDMA code from ATA core
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (6 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 07/38] ide: fix whitespace gap in ide_exec_cmd Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 09/38] ide: move transfer_start after variable modification Kevin Wolf
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

The ATA core is currently heavily intertwined with BMDMA code. Let's loosen
that a bit, so we can happily replace the DMA backend with different
implementations.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/cmd646.c   |    6 +-
 hw/ide/core.c     |  347 +++++++++++++++--------------------------------------
 hw/ide/internal.h |   69 +++++------
 hw/ide/pci.c      |  280 ++++++++++++++++++++++++++++++++++++++++++-
 hw/ide/pci.h      |   30 +++++
 hw/ide/piix.c     |    6 +-
 hw/ide/via.c      |    6 +-
 7 files changed, 446 insertions(+), 298 deletions(-)
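
Reader's note (not part of the commit): the split described in the commit
message amounts to the ATA core calling the DMA backend only through a table
of function pointers, as in the s->bus->dma->ops->add_status() and
->set_inactive() calls in the diff. Below is a minimal standalone sketch of
that pattern. All Demo* and demo_* names and the DEMO_STATUS_* values are
hypothetical stand-ins for illustration; they are not the definitions this
patch adds to hw/ide/internal.h or hw/ide/pci.h.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoDMA DemoDMA;

/* Hooks the core may call; each backend supplies its own table. */
typedef struct DemoDMAOps {
    int (*add_status)(DemoDMA *dma, int status);
    int (*set_inactive)(DemoDMA *dma);
} DemoDMAOps;

/* Generic part: the only thing the core ever sees. */
struct DemoDMA {
    const DemoDMAOps *ops;
};

/* Illustrative status bits (assumed values, not QEMU's BM_STATUS_*). */
#define DEMO_STATUS_DMAING 0x01
#define DEMO_STATUS_INT    0x04

/* One possible backend: bus-master state embedding the generic part. */
typedef struct DemoBMDMA {
    DemoDMA dma;
    int status;
} DemoBMDMA;

/* Recover the backend state from a pointer to its embedded DemoDMA. */
static DemoBMDMA *demo_bmdma_from_dma(DemoDMA *dma)
{
    return (DemoBMDMA *)((char *)dma - offsetof(DemoBMDMA, dma));
}

static int demo_bmdma_add_status(DemoDMA *dma, int status)
{
    demo_bmdma_from_dma(dma)->status |= status;
    return 0;
}

static int demo_bmdma_set_inactive(DemoDMA *dma)
{
    demo_bmdma_from_dma(dma)->status &= ~DEMO_STATUS_DMAING;
    return 0;
}

static const DemoDMAOps demo_bmdma_ops = {
    .add_status   = demo_bmdma_add_status,
    .set_inactive = demo_bmdma_set_inactive,
};

/* Core-side code: no knowledge of DemoBMDMA, only of the ops table. */
void demo_core_end_of_transfer(DemoDMA *dma)
{
    dma->ops->add_status(dma, DEMO_STATUS_INT);
    dma->ops->set_inactive(dma);
}

/* Small driver: start in the "DMA running" state, finish a transfer,
 * and return the resulting backend status. */
int demo_run(void)
{
    DemoBMDMA bm = { { &demo_bmdma_ops }, DEMO_STATUS_DMAING };

    demo_core_end_of_transfer(&bm.dma);
    return bm.status;
}
```

Under these assumptions, a different backend would only need to provide its
own ops table and state struct; demo_core_end_of_transfer() stays unchanged.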

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index ea5d2dc..e191ee6 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -167,9 +167,10 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        d->bus[i].bmdma = bm;
+        bmdma_init(&d->bus[i], bm);
         bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &bm->dma);
 
         if (i == 0) {
             register_ioport_write(addr, 4, 1, bmdma_writeb_0, d);
@@ -218,7 +219,6 @@ static void cmd646_reset(void *opaque)
 
     for (i = 0; i < 2; i++) {
         ide_bus_reset(&d->bus[i]);
-        ide_dma_reset(&d->bmdma[i]);
     }
 }
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index ed6854d..2032e20 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -34,8 +34,6 @@
 
 #include <hw/ide/internal.h>
 
-#define IDE_PAGE_SIZE 4096
-
 static const int smart_attributes[][5] = {
     /* id,  flags, val, wrst, thrsh */
     { 0x01, 0x03, 0x64, 0x64, 0x06}, /* raw read */
@@ -61,11 +59,8 @@ static inline int media_is_cd(IDEState *s)
     return (media_present(s) && s->nb_sectors <= CD_MAX_SECTORS);
 }
 
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb);
-static void ide_dma_restart(IDEState *s, int is_read);
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
 static int ide_handle_rw_error(IDEState *s, int error, int op);
-static void ide_flush_cache(IDEState *s);
 
 static void padstr(char *str, const char *src, int len)
 {
@@ -314,11 +309,11 @@ static inline void ide_abort_command(IDEState *s)
 }
 
 static inline void ide_dma_submit_check(IDEState *s,
-          BlockDriverCompletionFunc *dma_cb, BMDMAState *bm)
+          BlockDriverCompletionFunc *dma_cb)
 {
-    if (bm->aiocb)
+    if (s->bus->dma->aiocb)
 	return;
-    dma_cb(bm, -1);
+    dma_cb(s, -1);
 }
 
 /* prepare data transfer and tell what to do after */
@@ -328,8 +323,10 @@ static void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
     s->end_transfer_func = end_transfer_func;
     s->data_ptr = buf;
     s->data_end = buf + size;
-    if (!(s->status & ERR_STAT))
+    if (!(s->status & ERR_STAT)) {
         s->status |= DRQ_STAT;
+    }
+    s->bus->dma->ops->start_transfer(s->bus->dma);
 }
 
 static void ide_transfer_stop(IDEState *s)
@@ -394,7 +391,7 @@ static void ide_rw_error(IDEState *s) {
     ide_set_irq(s->bus);
 }
 
-static void ide_sector_read(IDEState *s)
+void ide_sector_read(IDEState *s)
 {
     int64_t sector_num;
     int ret, n;
@@ -427,58 +424,15 @@ static void ide_sector_read(IDEState *s)
     }
 }
 
-
-/* return 0 if buffer completed */
-static int dma_buf_prepare(BMDMAState *bm, int is_write)
-{
-    IDEState *s = bmdma_active_if(bm);
-    struct {
-        uint32_t addr;
-        uint32_t size;
-    } prd;
-    int l, len;
-
-    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
-    s->io_buffer_size = 0;
-    for(;;) {
-        if (bm->cur_prd_len == 0) {
-            /* end of table (with a fail safe of one page) */
-            if (bm->cur_prd_last ||
-                (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
-                return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
-            bm->cur_addr += 8;
-            prd.addr = le32_to_cpu(prd.addr);
-            prd.size = le32_to_cpu(prd.size);
-            len = prd.size & 0xfffe;
-            if (len == 0)
-                len = 0x10000;
-            bm->cur_prd_len = len;
-            bm->cur_prd_addr = prd.addr;
-            bm->cur_prd_last = (prd.size & 0x80000000);
-        }
-        l = bm->cur_prd_len;
-        if (l > 0) {
-            qemu_sglist_add(&s->sg, bm->cur_prd_addr, l);
-            bm->cur_prd_addr += l;
-            bm->cur_prd_len -= l;
-            s->io_buffer_size += l;
-        }
-    }
-    return 1;
-}
-
 static void dma_buf_commit(IDEState *s, int is_write)
 {
     qemu_sglist_destroy(&s->sg);
 }
 
-static void ide_dma_set_inactive(BMDMAState *bm)
+static void ide_set_inactive(IDEState *s)
 {
-    bm->status &= ~BM_STATUS_DMAING;
-    bm->dma_cb = NULL;
-    bm->unit = -1;
-    bm->aiocb = NULL;
+    s->bus->dma->aiocb = NULL;
+    s->bus->dma->ops->set_inactive(s->bus->dma);
 }
 
 void ide_dma_error(IDEState *s)
@@ -486,8 +440,8 @@ void ide_dma_error(IDEState *s)
     ide_transfer_stop(s);
     s->error = ABRT_ERR;
     s->status = READY_STAT | ERR_STAT;
-    ide_dma_set_inactive(s->bus->bmdma);
-    s->bus->bmdma->status |= BM_STATUS_INT;
+    ide_set_inactive(s);
+    s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
     ide_set_irq(s->bus);
 }
 
@@ -503,8 +457,8 @@ static int ide_handle_rw_error(IDEState *s, int error, int op)
 
     if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
             || action == BLOCK_ERR_STOP_ANY) {
-        s->bus->bmdma->unit = s->unit;
-        s->bus->bmdma->status |= op;
+        s->bus->dma->ops->set_unit(s->bus->dma, s->unit);
+        s->bus->dma->ops->add_status(s->bus->dma, op);
         bdrv_mon_event(s->bs, BDRV_ACTION_STOP, is_read);
         vm_stop(0);
     } else {
@@ -520,58 +474,9 @@ static int ide_handle_rw_error(IDEState *s, int error, int op)
     return 1;
 }
 
-/* return 0 if buffer completed */
-static int dma_buf_rw(BMDMAState *bm, int is_write)
+void ide_read_dma_cb(void *opaque, int ret)
 {
-    IDEState *s = bmdma_active_if(bm);
-    struct {
-        uint32_t addr;
-        uint32_t size;
-    } prd;
-    int l, len;
-
-    for(;;) {
-        l = s->io_buffer_size - s->io_buffer_index;
-        if (l <= 0)
-            break;
-        if (bm->cur_prd_len == 0) {
-            /* end of table (with a fail safe of one page) */
-            if (bm->cur_prd_last ||
-                (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
-                return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
-            bm->cur_addr += 8;
-            prd.addr = le32_to_cpu(prd.addr);
-            prd.size = le32_to_cpu(prd.size);
-            len = prd.size & 0xfffe;
-            if (len == 0)
-                len = 0x10000;
-            bm->cur_prd_len = len;
-            bm->cur_prd_addr = prd.addr;
-            bm->cur_prd_last = (prd.size & 0x80000000);
-        }
-        if (l > bm->cur_prd_len)
-            l = bm->cur_prd_len;
-        if (l > 0) {
-            if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            }
-            bm->cur_prd_addr += l;
-            bm->cur_prd_len -= l;
-            s->io_buffer_index += l;
-        }
-    }
-    return 1;
-}
-
-static void ide_read_dma_cb(void *opaque, int ret)
-{
-    BMDMAState *bm = opaque;
-    IDEState *s = bmdma_active_if(bm);
+    IDEState *s = opaque;
     int n;
     int64_t sector_num;
 
@@ -597,8 +502,8 @@ static void ide_read_dma_cb(void *opaque, int ret)
         s->status = READY_STAT | SEEK_STAT;
         ide_set_irq(s->bus);
     eot:
-        bm->status |= BM_STATUS_INT;
-        ide_dma_set_inactive(bm);
+        s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
+        ide_set_inactive(s);
         return;
     }
 
@@ -606,13 +511,13 @@ static void ide_read_dma_cb(void *opaque, int ret)
     n = s->nsector;
     s->io_buffer_index = 0;
     s->io_buffer_size = n * 512;
-    if (dma_buf_prepare(bm, 1) == 0)
+    if (s->bus->dma->ops->prepare_buf(s->bus->dma, 1) == 0)
         goto eot;
 #ifdef DEBUG_AIO
     printf("aio_read: sector_num=%" PRId64 " n=%d\n", sector_num, n);
 #endif
-    bm->aiocb = dma_bdrv_read(s->bs, &s->sg, sector_num, ide_read_dma_cb, bm);
-    ide_dma_submit_check(s, ide_read_dma_cb, bm);
+    s->bus->dma->aiocb = dma_bdrv_read(s->bs, &s->sg, sector_num, ide_read_dma_cb, s);
+    ide_dma_submit_check(s, ide_read_dma_cb);
 }
 
 static void ide_sector_read_dma(IDEState *s)
@@ -621,7 +526,7 @@ static void ide_sector_read_dma(IDEState *s)
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     s->is_read = 1;
-    ide_dma_start(s, ide_read_dma_cb);
+    s->bus->dma->ops->start_dma(s->bus->dma, s, ide_read_dma_cb);
 }
 
 static void ide_sector_write_timer_cb(void *opaque)
@@ -630,7 +535,7 @@ static void ide_sector_write_timer_cb(void *opaque)
     ide_set_irq(s->bus);
 }
 
-static void ide_sector_write(IDEState *s)
+void ide_sector_write(IDEState *s)
 {
     int64_t sector_num;
     int ret, n, n1;
@@ -676,48 +581,9 @@ static void ide_sector_write(IDEState *s)
     }
 }
 
-static void ide_dma_restart_bh(void *opaque)
+void ide_write_dma_cb(void *opaque, int ret)
 {
-    BMDMAState *bm = opaque;
-    int is_read;
-
-    qemu_bh_delete(bm->bh);
-    bm->bh = NULL;
-
-    is_read = !!(bm->status & BM_STATUS_RETRY_READ);
-
-    if (bm->status & BM_STATUS_DMA_RETRY) {
-        bm->status &= ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
-        ide_dma_restart(bmdma_active_if(bm), is_read);
-    } else if (bm->status & BM_STATUS_PIO_RETRY) {
-        bm->status &= ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
-        if (is_read) {
-            ide_sector_read(bmdma_active_if(bm));
-        } else {
-            ide_sector_write(bmdma_active_if(bm));
-        }
-    } else if (bm->status & BM_STATUS_RETRY_FLUSH) {
-        ide_flush_cache(bmdma_active_if(bm));
-    }
-}
-
-void ide_dma_restart_cb(void *opaque, int running, int reason)
-{
-    BMDMAState *bm = opaque;
-
-    if (!running)
-        return;
-
-    if (!bm->bh) {
-        bm->bh = qemu_bh_new(ide_dma_restart_bh, bm);
-        qemu_bh_schedule(bm->bh);
-    }
-}
-
-static void ide_write_dma_cb(void *opaque, int ret)
-{
-    BMDMAState *bm = opaque;
-    IDEState *s = bmdma_active_if(bm);
+    IDEState *s = opaque;
     int n;
     int64_t sector_num;
 
@@ -740,21 +606,21 @@ static void ide_write_dma_cb(void *opaque, int ret)
         s->status = READY_STAT | SEEK_STAT;
         ide_set_irq(s->bus);
     eot:
-        bm->status |= BM_STATUS_INT;
-        ide_dma_set_inactive(bm);
+        s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
+        ide_set_inactive(s);
         return;
     }
 
     n = s->nsector;
     s->io_buffer_size = n * 512;
     /* launch next transfer */
-    if (dma_buf_prepare(bm, 0) == 0)
+    if (s->bus->dma->ops->prepare_buf(s->bus->dma, 0) == 0)
         goto eot;
 #ifdef DEBUG_AIO
     printf("aio_write: sector_num=%" PRId64 " n=%d\n", sector_num, n);
 #endif
-    bm->aiocb = dma_bdrv_write(s->bs, &s->sg, sector_num, ide_write_dma_cb, bm);
-    ide_dma_submit_check(s, ide_write_dma_cb, bm);
+    s->bus->dma->aiocb = dma_bdrv_write(s->bs, &s->sg, sector_num, ide_write_dma_cb, s);
+    ide_dma_submit_check(s, ide_write_dma_cb);
 }
 
 static void ide_sector_write_dma(IDEState *s)
@@ -763,7 +629,7 @@ static void ide_sector_write_dma(IDEState *s)
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     s->is_read = 0;
-    ide_dma_start(s, ide_write_dma_cb);
+    s->bus->dma->ops->start_dma(s->bus->dma, s, ide_write_dma_cb);
 }
 
 void ide_atapi_cmd_ok(IDEState *s)
@@ -813,7 +679,7 @@ static void ide_flush_cb(void *opaque, int ret)
     ide_set_irq(s->bus);
 }
 
-static void ide_flush_cache(IDEState *s)
+void ide_flush_cache(IDEState *s)
 {
     BlockDriverAIOCB *acb;
 
@@ -1003,7 +869,8 @@ static void ide_atapi_cmd_reply(IDEState *s, int size, int max_size)
 
     if (s->atapi_dma) {
     	s->status = READY_STAT | SEEK_STAT | DRQ_STAT;
-	ide_dma_start(s, ide_atapi_cmd_read_dma_cb);
+        s->bus->dma->ops->start_dma(s->bus->dma, s,
+                                   ide_atapi_cmd_read_dma_cb);
     } else {
     	s->status = READY_STAT | SEEK_STAT;
     	ide_atapi_cmd_reply_end(s);
@@ -1029,8 +896,7 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
 /* XXX: handle read errors */
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 {
-    BMDMAState *bm = opaque;
-    IDEState *s = bmdma_active_if(bm);
+    IDEState *s = opaque;
     int data_offset, n;
 
     if (ret < 0) {
@@ -1056,7 +922,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 	    s->lba += n;
 	}
         s->packet_transfer_size -= s->io_buffer_size;
-        if (dma_buf_rw(bm, 1) == 0)
+        if (s->bus->dma->ops->rw_buf(s->bus->dma, 1) == 0)
             goto eot;
     }
 
@@ -1065,8 +931,8 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
         s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO | ATAPI_INT_REASON_CD;
         ide_set_irq(s->bus);
     eot:
-        bm->status |= BM_STATUS_INT;
-        ide_dma_set_inactive(bm);
+        s->bus->dma->ops->add_status(s->bus->dma, BM_STATUS_INT);
+        ide_set_inactive(s);
         return;
     }
 
@@ -1085,12 +951,13 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 #ifdef DEBUG_AIO
     printf("aio_read_cd: lba=%u n=%d\n", s->lba, n);
 #endif
-    bm->iov.iov_base = (void *)(s->io_buffer + data_offset);
-    bm->iov.iov_len = n * 4 * 512;
-    qemu_iovec_init_external(&bm->qiov, &bm->iov, 1);
-    bm->aiocb = bdrv_aio_readv(s->bs, (int64_t)s->lba << 2, &bm->qiov,
-                               n * 4, ide_atapi_cmd_read_dma_cb, bm);
-    if (!bm->aiocb) {
+    s->bus->dma->iov.iov_base = (void *)(s->io_buffer + data_offset);
+    s->bus->dma->iov.iov_len = n * 4 * 512;
+    qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
+    s->bus->dma->aiocb = bdrv_aio_readv(s->bs, (int64_t)s->lba << 2,
+                                       &s->bus->dma->qiov, n * 4,
+                                       ide_atapi_cmd_read_dma_cb, s);
+    if (!s->bus->dma->aiocb) {
         /* Note: media not present is the most likely case */
         ide_atapi_cmd_error(s, SENSE_NOT_READY,
                             ASC_MEDIUM_NOT_PRESENT);
@@ -1111,7 +978,8 @@ static void ide_atapi_cmd_read_dma(IDEState *s, int lba, int nb_sectors,
 
     /* XXX: check if BUSY_STAT should be set */
     s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
-    ide_dma_start(s, ide_atapi_cmd_read_dma_cb);
+    s->bus->dma->ops->start_dma(s->bus->dma, s,
+                               ide_atapi_cmd_read_dma_cb);
 }
 
 static void ide_atapi_cmd_read(IDEState *s, int lba, int nb_sectors,
@@ -2638,6 +2506,18 @@ void ide_bus_reset(IDEBus *bus)
     ide_reset(&bus->ifs[0]);
     ide_reset(&bus->ifs[1]);
     ide_clear_hob(bus);
+
+    /* pending async DMA */
+    if (bus->dma->aiocb) {
+#ifdef DEBUG_AIO
+        printf("aio_cancel\n");
+#endif
+        bdrv_aio_cancel(bus->dma->aiocb);
+        bus->dma->aiocb = NULL;
+    }
+
+    /* reset dma provider too */
+    bus->dma->ops->reset(bus->dma);
 }
 
 int ide_init_drive(IDEState *s, BlockDriverState *bs,
@@ -2696,6 +2576,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs,
     } else {
         pstrcpy(s->version, sizeof(s->version), QEMU_VERSION);
     }
+
     ide_reset(s);
     bdrv_set_removable(bs, s->drive_kind == IDE_CD);
     return 0;
@@ -2717,6 +2598,42 @@ static void ide_init1(IDEBus *bus, int unit)
                                            ide_sector_write_timer_cb, s);
 }
 
+static void ide_nop_start(IDEDMA *dma, IDEState *s,
+                          BlockDriverCompletionFunc *cb)
+{
+}
+
+static int ide_nop(IDEDMA *dma)
+{
+    return 0;
+}
+
+static int ide_nop_int(IDEDMA *dma, int x)
+{
+    return 0;
+}
+
+static void ide_nop_restart(void *opaque, int x, int y)
+{
+}
+
+static const IDEDMAOps ide_dma_nop_ops = {
+    .start_dma      = ide_nop_start,
+    .start_transfer = ide_nop,
+    .prepare_buf    = ide_nop_int,
+    .rw_buf         = ide_nop_int,
+    .set_unit       = ide_nop_int,
+    .add_status     = ide_nop_int,
+    .set_inactive   = ide_nop,
+    .restart_cb     = ide_nop_restart,
+    .reset          = ide_nop,
+};
+
+static IDEDMA ide_dma_nop = {
+    .ops = &ide_dma_nop_ops,
+    .aiocb = NULL,
+};
+
 void ide_init2(IDEBus *bus, qemu_irq irq)
 {
     int i;
@@ -2726,6 +2643,7 @@ void ide_init2(IDEBus *bus, qemu_irq irq)
         ide_reset(&bus->ifs[i]);
     }
     bus->irq = irq;
+    bus->dma = &ide_dma_nop;
 }
 
 /* TODO convert users to qdev and remove */
@@ -2749,6 +2667,7 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
         }
     }
     bus->irq = irq;
+    bus->dma = &ide_dma_nop;
 }
 
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2)
@@ -2916,73 +2835,3 @@ const VMStateDescription vmstate_ide_bus = {
         VMSTATE_END_OF_LIST()
     }
 };
-
-/***********************************************************/
-/* PCI IDE definitions */
-
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb)
-{
-    BMDMAState *bm = s->bus->bmdma;
-    if(!bm)
-        return;
-    bm->unit = s->unit;
-    bm->dma_cb = dma_cb;
-    bm->cur_prd_last = 0;
-    bm->cur_prd_addr = 0;
-    bm->cur_prd_len = 0;
-    bm->sector_num = ide_get_sector(s);
-    bm->nsector = s->nsector;
-    if (bm->status & BM_STATUS_DMAING) {
-        bm->dma_cb(bm, 0);
-    }
-}
-
-static void ide_dma_restart(IDEState *s, int is_read)
-{
-    BMDMAState *bm = s->bus->bmdma;
-    ide_set_sector(s, bm->sector_num);
-    s->io_buffer_index = 0;
-    s->io_buffer_size = 0;
-    s->nsector = bm->nsector;
-    bm->cur_addr = bm->addr;
-
-    if (is_read) {
-        bm->dma_cb = ide_read_dma_cb;
-    } else {
-        bm->dma_cb = ide_write_dma_cb;
-    }
-
-    ide_dma_start(s, bm->dma_cb);
-}
-
-void ide_dma_cancel(BMDMAState *bm)
-{
-    if (bm->status & BM_STATUS_DMAING) {
-        if (bm->aiocb) {
-#ifdef DEBUG_AIO
-            printf("aio_cancel\n");
-#endif
-            bdrv_aio_cancel(bm->aiocb);
-        }
-
-        /* cancel DMA request */
-        ide_dma_set_inactive(bm);
-    }
-}
-
-void ide_dma_reset(BMDMAState *bm)
-{
-#ifdef DEBUG_IDE
-    printf("ide: dma_reset\n");
-#endif
-    ide_dma_cancel(bm);
-    bm->cmd = 0;
-    bm->status = 0;
-    bm->addr = 0;
-    bm->cur_addr = 0;
-    bm->cur_prd_last = 0;
-    bm->cur_prd_addr = 0;
-    bm->cur_prd_len = 0;
-    bm->sector_num = 0;
-    bm->nsector = 0;
-}
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 029c76c..aadb505 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -20,7 +20,8 @@ typedef struct IDEBus IDEBus;
 typedef struct IDEDevice IDEDevice;
 typedef struct IDEDeviceInfo IDEDeviceInfo;
 typedef struct IDEState IDEState;
-typedef struct BMDMAState BMDMAState;
+typedef struct IDEDMA IDEDMA;
+typedef struct IDEDMAOps IDEDMAOps;
 
 /* Bits of HD_STATUS */
 #define ERR_STAT		0x01
@@ -367,6 +368,11 @@ typedef enum { IDE_HD, IDE_CD, IDE_CFATA } IDEDriveKind;
 
 typedef void EndTransferFunc(IDEState *);
 
+typedef void DMAStartFunc(IDEDMA *, IDEState *, BlockDriverCompletionFunc *);
+typedef int DMAFunc(IDEDMA *);
+typedef int DMAIntFunc(IDEDMA *, int);
+typedef void DMARestartFunc(void *, int, int);
+
 /* NOTE: IDEState represents in fact one drive */
 struct IDEState {
     IDEBus *bus;
@@ -443,13 +449,32 @@ struct IDEState {
     uint8_t *smart_selftest_data;
 };
 
+struct IDEDMAOps {
+    DMAStartFunc *start_dma;
+    DMAFunc *start_transfer;
+    DMAIntFunc *prepare_buf;
+    DMAIntFunc *rw_buf;
+    DMAIntFunc *set_unit;
+    DMAIntFunc *add_status;
+    DMAFunc *set_inactive;
+    DMARestartFunc *restart_cb;
+    DMAFunc *reset;
+};
+
+struct IDEDMA {
+    const struct IDEDMAOps *ops;
+    struct iovec iov;
+    QEMUIOVector qiov;
+    BlockDriverAIOCB *aiocb;
+};
+
 struct IDEBus {
     BusState qbus;
     IDEDevice *master;
     IDEDevice *slave;
-    BMDMAState *bmdma;
     IDEState ifs[2];
     int bus_id;
+    IDEDMA *dma;
     uint8_t unit;
     uint8_t cmd;
     qemu_irq irq;
@@ -480,46 +505,14 @@ struct IDEDeviceInfo {
 #define BM_CMD_START     0x01
 #define BM_CMD_READ      0x08
 
-struct BMDMAState {
-    uint8_t cmd;
-    uint8_t status;
-    uint32_t addr;
-
-    IDEBus *bus;
-    /* current transfer state */
-    uint32_t cur_addr;
-    uint32_t cur_prd_last;
-    uint32_t cur_prd_addr;
-    uint32_t cur_prd_len;
-    uint8_t unit;
-    BlockDriverCompletionFunc *dma_cb;
-    BlockDriverAIOCB *aiocb;
-    struct iovec iov;
-    QEMUIOVector qiov;
-    int64_t sector_num;
-    uint32_t nsector;
-    IORange addr_ioport;
-    QEMUBH *bh;
-};
-
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
     return bus->ifs + bus->unit;
 }
 
-static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
-{
-    assert(bmdma->unit != (uint8_t)-1);
-    return bmdma->bus->ifs + bmdma->unit;
-}
-
 static inline void ide_set_irq(IDEBus *bus)
 {
-    BMDMAState *bm = bus->bmdma;
     if (!(bus->cmd & IDE_CMD_DISABLE_IRQ)) {
-        if (bm) {
-            bm->status |= BM_STATUS_INT;
-        }
         qemu_irq_raise(bus->irq);
     }
 }
@@ -542,10 +535,7 @@ void ide_bus_reset(IDEBus *bus);
 int64_t ide_get_sector(IDEState *s);
 void ide_set_sector(IDEState *s, int64_t sector_num);
 
-void ide_dma_cancel(BMDMAState *bm);
-void ide_dma_restart_cb(void *opaque, int running, int reason);
 void ide_dma_error(IDEState *s);
-void ide_dma_reset(BMDMAState *bm);
 
 void ide_atapi_cmd_ok(IDEState *s);
 void ide_atapi_cmd_error(IDEState *s, int sense_key, int asc);
@@ -568,6 +558,11 @@ void ide_init2_with_non_qdev_drives(IDEBus *bus, DriveInfo *hd0,
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
 void ide_exec_cmd(IDEBus *bus, uint32_t val);
+void ide_read_dma_cb(void *opaque, int ret);
+void ide_write_dma_cb(void *opaque, int ret);
+void ide_sector_write(IDEState *s);
+void ide_sector_read(IDEState *s);
+void ide_flush_cache(IDEState *s);
 
 /* hw/ide/qdev.c */
 void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index ad406ee..510b2de 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -33,6 +33,253 @@
 
 #include <hw/ide/pci.h>
 
+#define BMDMA_PAGE_SIZE 4096
+
+static void bmdma_start_dma(IDEDMA *dma, IDEState *s,
+                            BlockDriverCompletionFunc *dma_cb)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+
+    bm->unit = s->unit;
+    bm->dma_cb = dma_cb;
+    bm->cur_prd_last = 0;
+    bm->cur_prd_addr = 0;
+    bm->cur_prd_len = 0;
+    bm->sector_num = ide_get_sector(s);
+    bm->nsector = s->nsector;
+
+    if (bm->status & BM_STATUS_DMAING) {
+        bm->dma_cb(bmdma_active_if(bm), 0);
+    }
+}
+
+/* return 0 if buffer completed */
+static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+    IDEState *s = bmdma_active_if(bm);
+    struct {
+        uint32_t addr;
+        uint32_t size;
+    } prd;
+    int l, len;
+
+    qemu_sglist_init(&s->sg, s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
+    s->io_buffer_size = 0;
+    for(;;) {
+        if (bm->cur_prd_len == 0) {
+            /* end of table (with a fail safe of one page) */
+            if (bm->cur_prd_last ||
+                (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
+                return s->io_buffer_size != 0;
+            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bm->cur_addr += 8;
+            prd.addr = le32_to_cpu(prd.addr);
+            prd.size = le32_to_cpu(prd.size);
+            len = prd.size & 0xfffe;
+            if (len == 0)
+                len = 0x10000;
+            bm->cur_prd_len = len;
+            bm->cur_prd_addr = prd.addr;
+            bm->cur_prd_last = (prd.size & 0x80000000);
+        }
+        l = bm->cur_prd_len;
+        if (l > 0) {
+            qemu_sglist_add(&s->sg, bm->cur_prd_addr, l);
+            bm->cur_prd_addr += l;
+            bm->cur_prd_len -= l;
+            s->io_buffer_size += l;
+        }
+    }
+    return 1;
+}
+
+/* return 0 if buffer completed */
+static int bmdma_rw_buf(IDEDMA *dma, int is_write)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+    IDEState *s = bmdma_active_if(bm);
+    struct {
+        uint32_t addr;
+        uint32_t size;
+    } prd;
+    int l, len;
+
+    for(;;) {
+        l = s->io_buffer_size - s->io_buffer_index;
+        if (l <= 0)
+            break;
+        if (bm->cur_prd_len == 0) {
+            /* end of table (with a fail safe of one page) */
+            if (bm->cur_prd_last ||
+                (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
+                return 0;
+            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bm->cur_addr += 8;
+            prd.addr = le32_to_cpu(prd.addr);
+            prd.size = le32_to_cpu(prd.size);
+            len = prd.size & 0xfffe;
+            if (len == 0)
+                len = 0x10000;
+            bm->cur_prd_len = len;
+            bm->cur_prd_addr = prd.addr;
+            bm->cur_prd_last = (prd.size & 0x80000000);
+        }
+        if (l > bm->cur_prd_len)
+            l = bm->cur_prd_len;
+        if (l > 0) {
+            if (is_write) {
+                cpu_physical_memory_write(bm->cur_prd_addr,
+                                          s->io_buffer + s->io_buffer_index, l);
+            } else {
+                cpu_physical_memory_read(bm->cur_prd_addr,
+                                          s->io_buffer + s->io_buffer_index, l);
+            }
+            bm->cur_prd_addr += l;
+            bm->cur_prd_len -= l;
+            s->io_buffer_index += l;
+        }
+    }
+    return 1;
+}
+
+static int bmdma_set_unit(IDEDMA *dma, int unit)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+    bm->unit = unit;
+
+    return 0;
+}
+
+static int bmdma_add_status(IDEDMA *dma, int status)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+    bm->status |= status;
+
+    return 0;
+}
+
+static int bmdma_set_inactive(IDEDMA *dma)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+
+    bm->status &= ~BM_STATUS_DMAING;
+    bm->dma_cb = NULL;
+    bm->unit = -1;
+
+    return 0;
+}
+
+static void bmdma_restart_dma(BMDMAState *bm, int is_read)
+{
+    IDEState *s = bmdma_active_if(bm);
+
+    ide_set_sector(s, bm->sector_num);
+    s->io_buffer_index = 0;
+    s->io_buffer_size = 0;
+    s->nsector = bm->nsector;
+    bm->cur_addr = bm->addr;
+
+    if (is_read) {
+        bm->dma_cb = ide_read_dma_cb;
+    } else {
+        bm->dma_cb = ide_write_dma_cb;
+    }
+
+    bmdma_start_dma(&bm->dma, s, bm->dma_cb);
+}
+
+static void bmdma_restart_bh(void *opaque)
+{
+    BMDMAState *bm = opaque;
+    int is_read;
+
+    qemu_bh_delete(bm->bh);
+    bm->bh = NULL;
+
+    is_read = !!(bm->status & BM_STATUS_RETRY_READ);
+
+    if (bm->status & BM_STATUS_DMA_RETRY) {
+        bm->status &= ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
+        bmdma_restart_dma(bm, is_read);
+    } else if (bm->status & BM_STATUS_PIO_RETRY) {
+        bm->status &= ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
+        if (is_read) {
+            ide_sector_read(bmdma_active_if(bm));
+        } else {
+            ide_sector_write(bmdma_active_if(bm));
+        }
+    } else if (bm->status & BM_STATUS_RETRY_FLUSH) {
+        ide_flush_cache(bmdma_active_if(bm));
+    }
+}
+
+static void bmdma_restart_cb(void *opaque, int running, int reason)
+{
+    IDEDMA *dma = opaque;
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+
+    if (!running)
+        return;
+
+    if (!bm->bh) {
+        bm->bh = qemu_bh_new(bmdma_restart_bh, &bm->dma);
+        qemu_bh_schedule(bm->bh);
+    }
+}
+
+static void bmdma_cancel(BMDMAState *bm)
+{
+    if (bm->status & BM_STATUS_DMAING) {
+        /* cancel DMA request */
+        bmdma_set_inactive(&bm->dma);
+    }
+}
+
+static int bmdma_reset(IDEDMA *dma)
+{
+    BMDMAState *bm = DO_UPCAST(BMDMAState, dma, dma);
+
+#ifdef DEBUG_IDE
+    printf("ide: dma_reset\n");
+#endif
+    bmdma_cancel(bm);
+    bm->cmd = 0;
+    bm->status = 0;
+    bm->addr = 0;
+    bm->cur_addr = 0;
+    bm->cur_prd_last = 0;
+    bm->cur_prd_addr = 0;
+    bm->cur_prd_len = 0;
+    bm->sector_num = 0;
+    bm->nsector = 0;
+
+    return 0;
+}
+
+static int bmdma_start_transfer(IDEDMA *dma)
+{
+    return 0;
+}
+
+static void bmdma_irq(void *opaque, int n, int level)
+{
+    BMDMAState *bm = opaque;
+
+    if (!level) {
+        /* pass through lower */
+        qemu_set_irq(bm->irq, level);
+        return;
+    }
+
+    if (bm) {
+        bm->status |= BM_STATUS_INT;
+    }
+
+    /* trigger the real irq */
+    qemu_set_irq(bm->irq, level);
+}
+
 void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     BMDMAState *bm = opaque;
@@ -55,10 +302,10 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
              * whole DMA operation will be submitted to disk with a single
              * aio operation with preadv/pwritev.
              */
-            if (bm->aiocb) {
+            if (bm->bus->dma->aiocb) {
                 qemu_aio_flush();
 #ifdef DEBUG_IDE
-                if (bm->aiocb)
+                if (bm->bus->dma->aiocb)
                     printf("ide_dma_cancel: aiocb still pending");
                 if (bm->status & BM_STATUS_DMAING)
                     printf("ide_dma_cancel: BM_STATUS_DMAING still pending");
@@ -70,7 +317,7 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
                 bm->status |= BM_STATUS_DMAING;
                 /* start dma transfer if possible */
                 if (bm->dma_cb)
-                    bm->dma_cb(bm, 0);
+                    bm->dma_cb(bmdma_active_if(bm), 0);
             }
         }
     }
@@ -198,3 +445,30 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
         ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
     }
 }
+
+static const struct IDEDMAOps bmdma_ops = {
+    .start_dma = bmdma_start_dma,
+    .start_transfer = bmdma_start_transfer,
+    .prepare_buf = bmdma_prepare_buf,
+    .rw_buf = bmdma_rw_buf,
+    .set_unit = bmdma_set_unit,
+    .add_status = bmdma_add_status,
+    .set_inactive = bmdma_set_inactive,
+    .restart_cb = bmdma_restart_cb,
+    .reset = bmdma_reset,
+};
+
+void bmdma_init(IDEBus *bus, BMDMAState *bm)
+{
+    qemu_irq *irq;
+
+    if (bus->dma == &bm->dma) {
+        return;
+    }
+
+    bm->dma.ops = &bmdma_ops;
+    bus->dma = &bm->dma;
+    bm->irq = bus->irq;
+    irq = qemu_allocate_irqs(bmdma_irq, bm, 1);
+    bus->irq = *irq;
+}
diff --git a/hw/ide/pci.h b/hw/ide/pci.h
index b81b26c..cd72cba 100644
--- a/hw/ide/pci.h
+++ b/hw/ide/pci.h
@@ -3,6 +3,27 @@
 
 #include <hw/ide/internal.h>
 
+typedef struct BMDMAState {
+    IDEDMA dma;
+    uint8_t cmd;
+    uint8_t status;
+    uint32_t addr;
+
+    IDEBus *bus;
+    /* current transfer state */
+    uint32_t cur_addr;
+    uint32_t cur_prd_last;
+    uint32_t cur_prd_addr;
+    uint32_t cur_prd_len;
+    uint8_t unit;
+    BlockDriverCompletionFunc *dma_cb;
+    int64_t sector_num;
+    uint32_t nsector;
+    IORange addr_ioport;
+    QEMUBH *bh;
+    qemu_irq irq;
+} BMDMAState;
+
 typedef struct PCIIDEState {
     PCIDevice dev;
     IDEBus bus[2];
@@ -10,6 +31,15 @@ typedef struct PCIIDEState {
     uint32_t secondary; /* used only for cmd646 */
 } PCIIDEState;
 
+
+static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
+{
+    assert(bmdma->unit != (uint8_t)-1);
+    return bmdma->bus->ifs + bmdma->unit;
+}
+
+
+void bmdma_init(IDEBus *bus, BMDMAState *bm);
 void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val);
 extern const IORangeOps bmdma_addr_ioport_ops;
 void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 1c0cb0c..a6b5d92 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,9 +76,10 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        d->bus[i].bmdma = bm;
+        bmdma_init(&d->bus[i], bm);
         bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &bm->dma);
 
         register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -99,7 +100,6 @@ static void piix3_reset(void *opaque)
 
     for (i = 0; i < 2; i++) {
         ide_bus_reset(&d->bus[i]);
-        ide_dma_reset(&d->bmdma[i]);
     }
 
     /* TODO: this is the default. do not override. */
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 78857e8..2603110 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -78,9 +78,10 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        d->bus[i].bmdma = bm;
+        bmdma_init(&d->bus[i], bm);
         bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(ide_dma_restart_cb, bm);
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &bm->dma);
 
         register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -101,7 +102,6 @@ static void via_reset(void *opaque)
 
     for (i = 0; i < 2; i++) {
         ide_bus_reset(&d->bus[i]);
-        ide_dma_reset(&d->bmdma[i]);
     }
 
     pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_WAIT);
-- 
1.7.2.3
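
The shape of the IDEDMA split in the patch above can be sketched in isolation. This is a minimal illustration, not QEMU code: the names DMA, DMAOps, BusMaster, Bus, CONTAINER_OF and STATUS_INT are hypothetical stand-ins for IDEDMA, IDEDMAOps, BMDMAState, IDEBus, DO_UPCAST and BM_STATUS_INT, and only two of the nine ops are modelled. It assumes the provider embeds the generic struct and recovers itself from the generic pointer, and that the bus starts out with a no-op provider so callers never need a NULL check.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DMA DMA;

/* Hypothetical two-entry ops table; the patch's IDEDMAOps has nine. */
typedef struct DMAOps {
    int (*add_status)(DMA *dma, int status);
    int (*reset)(DMA *dma);
} DMAOps;

struct DMA {
    const DMAOps *ops;
};

/* No-op provider: every bus gets this until a real one is attached. */
static int nop_int(DMA *dma, int x) { (void)dma; (void)x; return 0; }
static int nop(DMA *dma) { (void)dma; return 0; }
static const DMAOps nop_ops = { .add_status = nop_int, .reset = nop };
static DMA dma_nop = { .ops = &nop_ops };

/* Provider state embeds the generic DMA struct. */
typedef struct BusMaster {
    DMA dma;
    int status;
} BusMaster;

/* Recover the enclosing struct from a pointer to its member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define STATUS_INT 0x04 /* illustrative value */

static int bm_add_status(DMA *dma, int status)
{
    BusMaster *bm = CONTAINER_OF(dma, BusMaster, dma);
    bm->status |= status;
    return 0;
}

static int bm_reset(DMA *dma)
{
    BusMaster *bm = CONTAINER_OF(dma, BusMaster, dma);
    bm->status = 0;
    return 0;
}

static const DMAOps bm_ops = { .add_status = bm_add_status, .reset = bm_reset };

typedef struct Bus {
    DMA *dma;
} Bus;

static void bus_init(Bus *bus)
{
    bus->dma = &dma_nop;
}

static void bus_attach_bm(Bus *bus, BusMaster *bm)
{
    bm->dma.ops = &bm_ops;
    bus->dma = &bm->dma;
}

/* Core code only ever talks to the ops table. */
static void bus_signal(Bus *bus, int status)
{
    assert(bus->dma && bus->dma->ops);
    bus->dma->ops->add_status(bus->dma, status);
}
```

With the no-op provider installed, bus_signal() is harmless on a bus that has no bus-master engine; once bus_attach_bm() has run, the same call lands in the provider's own state.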

* [Qemu-devel] [PATCH 09/38] ide: move transfer_start after variable modification
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (7 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 08/38] ide: Split out BMDMA code from ATA core Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 10/38] ide: add ncq identify data for ahci sata drives Kevin Wolf
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

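For ahci, we hook into transfer_start and immediately call the end
function. This means that all state must be in place for the end
function before the transfer starts, so move the ide_transfer_start()
call down below the variable updates. Because io_buffer_index has
already been advanced by then, the buffer pointer passed to
ide_transfer_start() is rewound by size.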
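
The ordering problem can be reproduced with a small model. This is only a sketch under stated assumptions: Drive, record_end(), transfer_start_sync() and the two reply_*() helpers are hypothetical names, not QEMU functions, and the provider is assumed to invoke the end function synchronously from inside the start call, as described for ahci.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for IDEState. */
typedef struct Drive Drive;
typedef void EndFunc(Drive *);

struct Drive {
    int io_buffer_index;      /* bytes already handed out */
    int packet_transfer_size; /* bytes still owed to the guest */
    int seen_index;           /* state observed by the end function */
    int seen_remaining;
};

/* Records the state visible at the moment the end function runs. */
static void record_end(Drive *d)
{
    d->seen_index = d->io_buffer_index;
    d->seen_remaining = d->packet_transfer_size;
}

/* Models a provider that finishes the transfer synchronously. */
static void transfer_start_sync(Drive *d, int offset, int size, EndFunc *end)
{
    assert(offset >= 0 && size >= 0);
    end(d);
}

/* Old order: start first, update afterwards. The end function sees
 * the state from before this chunk was accounted for. */
static void reply_old_order(Drive *d, int size)
{
    transfer_start_sync(d, d->io_buffer_index, size, record_end);
    d->packet_transfer_size -= size;
    d->io_buffer_index += size;
}

/* New order: update first, then start with the offset rewound by size. */
static void reply_new_order(Drive *d, int size)
{
    d->packet_transfer_size -= size;
    d->io_buffer_index += size;
    transfer_start_sync(d, d->io_buffer_index - size, size, record_end);
}
```

In both variants the transfer covers the same buffer range; only the state the end function observes differs.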

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/core.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2032e20..228911d 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -814,11 +814,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
             size = s->cd_sector_size - s->io_buffer_index;
             if (size > s->elementary_transfer_size)
                 size = s->elementary_transfer_size;
-            ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-                               size, ide_atapi_cmd_reply_end);
             s->packet_transfer_size -= size;
             s->elementary_transfer_size -= size;
             s->io_buffer_index += size;
+            ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+                               size, ide_atapi_cmd_reply_end);
         } else {
             /* a new transfer is needed */
             s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
@@ -843,11 +843,11 @@ static void ide_atapi_cmd_reply_end(IDEState *s)
                 if (size > (s->cd_sector_size - s->io_buffer_index))
                     size = (s->cd_sector_size - s->io_buffer_index);
             }
-            ide_transfer_start(s, s->io_buffer + s->io_buffer_index,
-                               size, ide_atapi_cmd_reply_end);
             s->packet_transfer_size -= size;
             s->elementary_transfer_size -= size;
             s->io_buffer_index += size;
+            ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
+                               size, ide_atapi_cmd_reply_end);
             ide_set_irq(s->bus);
 #ifdef DEBUG_IDE_ATAPI
             printf("status=0x%x\n", s->status);
-- 
1.7.2.3

* [Qemu-devel] [PATCH 10/38] ide: add ncq identify data for ahci sata drives
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (8 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 09/38] ide: move transfer_start after variable modification Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 11/38] pci: add storage class for sata Kevin Wolf
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Roland Elek <elek.roland@gmail.com>

I modified ide_identify() to include the zero-based queue length
value in word 75, and set bit 8 in word 76 to signal NCQ support
in the identify data for AHCI SATA drives.
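
As a rough illustration of the encoding, the following sketch fills and decodes the two words in a byte-addressed 512-byte identify buffer. The helper names are hypothetical and not part of the patch; the patch itself indexes a 16-bit pointer, so word 75 is byte offset 150 here. The 32-entry upper bound is an assumption taken from the 5-bit queue depth field in the ATA identify layout.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for little-endian 16-bit accessors. */
static void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

#define ID_WORD(n)   (2 * (n))  /* byte offset of identify word n */
#define ID_NCQ_BIT   (1 << 8)   /* word 76, bit 8: NCQ supported */

/* Device side: advertise NCQ only when ncq_queues is non-zero. */
static void identify_set_ncq(uint8_t *id, int ncq_queues)
{
    assert(ncq_queues >= 0 && ncq_queues <= 32);
    if (ncq_queues) {
        /* word 75 holds the queue depth minus one */
        store_le16(id + ID_WORD(75), (uint16_t)(ncq_queues - 1));
        store_le16(id + ID_WORD(76), ID_NCQ_BIT);
    }
}

/* Guest side: returns the queue depth, or 0 if NCQ is not advertised. */
static int identify_get_ncq_depth(const uint8_t *id)
{
    if (!(load_le16(id + ID_WORD(76)) & ID_NCQ_BIT)) {
        return 0;
    }
    return (load_le16(id + ID_WORD(75)) & 0x1f) + 1;
}
```

A drive with ncq_queues left at 0 keeps both words zero, so a guest reading the identify data sees no NCQ support.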

Signed-off-by: Roland Elek <elek.roland@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/core.c     |    7 +++++++
 hw/ide/internal.h |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 228911d..9e1d4e6 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -140,6 +140,13 @@ static void ide_identify(IDEState *s)
     put_le16(p + 66, 120);
     put_le16(p + 67, 120);
     put_le16(p + 68, 120);
+
+    if (s->ncq_queues) {
+        put_le16(p + 75, s->ncq_queues - 1);
+        /* NCQ supported */
+        put_le16(p + 76, (1 << 8));
+    }
+
     put_le16(p + 80, 0xf0); /* ata3 -> ata6 supported */
     put_le16(p + 81, 0x16); /* conforms to ata5 */
     /* 14=NOP supported, 5=WCACHE supported, 0=SMART supported */
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index aadb505..697c3b4 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -447,6 +447,8 @@ struct IDEState {
     int smart_errors;
     uint8_t smart_selftest_count;
     uint8_t *smart_selftest_data;
+    /* AHCI */
+    int ncq_queues;
 };
 
 struct IDEDMAOps {
-- 
1.7.2.3

* [Qemu-devel] [PATCH 11/38] pci: add storage class for sata
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (9 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 10/38] ide: add ncq identify data for ahci sata drives Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 12/38] pci: add ich9 pci id Kevin Wolf
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

This patch adds the PCI class ID for SATA storage controllers
(PCI_CLASS_STORAGE_SATA, 0x0106).

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/pci_ids.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 82cba7e..ea3418c 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -15,6 +15,7 @@
 
 #define PCI_CLASS_STORAGE_SCSI           0x0100
 #define PCI_CLASS_STORAGE_IDE            0x0101
+#define PCI_CLASS_STORAGE_SATA           0x0106
 #define PCI_CLASS_STORAGE_OTHER          0x0180
 
 #define PCI_CLASS_NETWORK_ETHERNET       0x0200
-- 
1.7.2.3

* [Qemu-devel] [PATCH 12/38] pci: add ich9 pci id
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (10 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 11/38] pci: add storage class for sata Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 13/38] ahci: add ahci emulation Kevin Wolf
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

We need a PCI ID for our new AHCI adapter. I picked the ICH-9
because that's the one in the Q35 chipset.

This patch adds the define PCI_DEVICE_ID_INTEL_82801IR (0x2922) for the
ICH-9 AHCI adapter.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/pci.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index aa3afe9..17744dc 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -62,6 +62,7 @@
 /* Intel (0x8086) */
 #define PCI_DEVICE_ID_INTEL_82551IT      0x1209
 #define PCI_DEVICE_ID_INTEL_82557        0x1229
+#define PCI_DEVICE_ID_INTEL_82801IR      0x2922
 
 /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */
 #define PCI_VENDOR_ID_REDHAT_QUMRANET    0x1af4
-- 
1.7.2.3

* [Qemu-devel] [PATCH 13/38] ahci: add ahci emulation
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

This patch adds an emulation layer for an ICH-9 AHCI controller. For now
this controller does not do IDE legacy emulation. It is a pure AHCI controller.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
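Not part of the patch: a rough sketch of how an MMIO offset inside the AHCI
BAR is split into a port index and a per-port register offset. Port
register banks start at 0x100 and are 0x80 bytes each; everything below
that belongs to the global host control registers. The constants mirror the
ones defined in hw/ide/ahci.c, while ahci_decode_port() is a hypothetical
helper used only for illustration (the patch does this inline in its
read/write handlers).

```c
#include <stdint.h>

#define SATA_PORTS                  4
#define AHCI_PORT_REGS_START_ADDR   0x100
#define AHCI_PORT_REGS_END_ADDR     (AHCI_PORT_REGS_START_ADDR + SATA_PORTS * 0x80)
#define AHCI_PORT_ADDR_OFFSET_MASK  0x7f

/*
 * Hypothetical helper: returns the port index for addr and stores the
 * register offset within that port in *offset. Returns -1 when addr does
 * not fall into any port register bank.
 */
static int ahci_decode_port(uint32_t addr, uint32_t *offset)
{
    addr &= 0xfff;
    if (addr < AHCI_PORT_REGS_START_ADDR || addr >= AHCI_PORT_REGS_END_ADDR) {
        return -1;
    }
    *offset = addr & AHCI_PORT_ADDR_OFFSET_MASK;
    return (addr - AHCI_PORT_REGS_START_ADDR) >> 7;
}
```

For example, 0x1b8 decodes to port 1, offset 0x38 (PORT_CMD_ISSUE), and
0x300 is already past the last of the four emulated ports.
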
 Makefile.objs |    1 +
 hw/ide/ahci.c | 1524 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1525 insertions(+), 0 deletions(-)
 create mode 100644 hw/ide/ahci.c

diff --git a/Makefile.objs b/Makefile.objs
index 24b2f99..72c07dd 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -243,6 +243,7 @@ hw-obj-$(CONFIG_IDE_PIIX) += ide/piix.o
 hw-obj-$(CONFIG_IDE_CMD646) += ide/cmd646.o
 hw-obj-$(CONFIG_IDE_MACIO) += ide/macio.o
 hw-obj-$(CONFIG_IDE_VIA) += ide/via.o
+hw-obj-$(CONFIG_AHCI) += ide/ahci.o
 
 # SCSI layer
 hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
new file mode 100644
index 0000000..f937a92
--- /dev/null
+++ b/hw/ide/ahci.c
@@ -0,0 +1,1524 @@
+/*
+ * QEMU AHCI Emulation
+ *
+ * Copyright (c) 2010 qiaochong@loongson.cn
+ * Copyright (c) 2010 Roland Elek <elek.roland@gmail.com>
+ * Copyright (c) 2010 Sebastian Herbszt <herbszt@gmx.de>
+ * Copyright (c) 2010 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * lspci dump of a real ICH-9 device in IDE mode (hopefully close enough):
+ *
+ * 00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02) (prog-if 01 [AHCI 1.0])
+ *         Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922]
+ *         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+ *         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+ *         Latency: 0
+ *         Interrupt: pin B routed to IRQ 222
+ *         Region 0: I/O ports at d000 [size=8]
+ *         Region 1: I/O ports at cc00 [size=4]
+ *         Region 2: I/O ports at c880 [size=8]
+ *         Region 3: I/O ports at c800 [size=4]
+ *         Region 4: I/O ports at c480 [size=32]
+ *         Region 5: Memory at febf9000 (32-bit, non-prefetchable) [size=2K]
+ *         Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Count=1/16 Enable+
+ *                 Address: fee0f00c  Data: 41d9
+ *         Capabilities: [70] Power Management version 3
+ *                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
+ *                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
+ *         Capabilities: [a8] SATA HBA <?>
+ *         Capabilities: [b0] Vendor Specific Information <?>
+ *         Kernel driver in use: ahci
+ *         Kernel modules: ahci
+ * 00: 86 80 22 29 07 04 b0 02 02 01 06 01 00 00 00 00
+ * 10: 01 d0 00 00 01 cc 00 00 81 c8 00 00 01 c8 00 00
+ * 20: 81 c4 00 00 00 90 bf fe 00 00 00 00 86 80 22 29
+ * 30: 00 00 00 00 80 00 00 00 00 00 00 00 0f 02 00 00
+ * 40: 00 80 00 80 00 00 00 00 00 00 00 00 00 00 00 00
+ * 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ * 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ * 70: 01 a8 03 40 08 00 00 00 00 00 00 00 00 00 00 00
+ * 80: 05 70 09 00 0c f0 e0 fe d9 41 00 00 00 00 00 00
+ * 90: 40 00 0f 82 93 01 00 00 00 00 00 00 00 00 00 00
+ * a0: ac 00 00 00 0a 00 12 00 12 b0 10 00 48 00 00 00
+ * b0: 09 00 06 20 00 00 00 00 00 00 00 00 00 00 00 00
+ * c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ * d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ * e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ * f0: 00 00 00 00 00 00 00 00 86 0f 02 00 00 00 00 00
+ *
+ */
+
+#include <hw/hw.h>
+#include <hw/msi.h>
+#include <hw/pc.h>
+#include <hw/pci.h>
+
+#include "monitor.h"
+#include "dma.h"
+#include "cpu-common.h"
+#include "blockdev.h"
+#include "internal.h"
+#include <hw/ide/pci.h>
+
+/* #define DEBUG_AHCI */
+
+#ifdef DEBUG_AHCI
+#define DPRINTF(port, fmt, ...) \
+do { fprintf(stderr, "ahci: %s: [%d] ", __FUNCTION__, port); \
+     fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(port, fmt, ...) do {} while (0)
+#endif
+
+#define AHCI_PCI_BAR              5
+#define AHCI_MAX_PORTS            32
+#define AHCI_MAX_SG               168 /* hardware max is 64K */
+#define AHCI_DMA_BOUNDARY         0xffffffff
+#define AHCI_USE_CLUSTERING       0
+#define AHCI_MAX_CMDS             32
+#define AHCI_CMD_SZ               32
+#define AHCI_CMD_SLOT_SZ          (AHCI_MAX_CMDS * AHCI_CMD_SZ)
+#define AHCI_RX_FIS_SZ            256
+#define AHCI_CMD_TBL_CDB          0x40
+#define AHCI_CMD_TBL_HDR_SZ       0x80
+#define AHCI_CMD_TBL_SZ           (AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16))
+#define AHCI_CMD_TBL_AR_SZ        (AHCI_CMD_TBL_SZ * AHCI_MAX_CMDS)
+#define AHCI_PORT_PRIV_DMA_SZ     (AHCI_CMD_SLOT_SZ + AHCI_CMD_TBL_AR_SZ + \
+                                   AHCI_RX_FIS_SZ)
+
+#define AHCI_IRQ_ON_SG            (1 << 31)
+#define AHCI_CMD_ATAPI            (1 << 5)
+#define AHCI_CMD_WRITE            (1 << 6)
+#define AHCI_CMD_PREFETCH         (1 << 7)
+#define AHCI_CMD_RESET            (1 << 8)
+#define AHCI_CMD_CLR_BUSY         (1 << 10)
+
+#define RX_FIS_D2H_REG            0x40 /* offset of D2H Register FIS data */
+#define RX_FIS_SDB                0x58 /* offset of SDB FIS data */
+#define RX_FIS_UNK                0x60 /* offset of Unknown FIS data */
+
+/* global controller registers */
+#define HOST_CAP                  0x00 /* host capabilities */
+#define HOST_CTL                  0x04 /* global host control */
+#define HOST_IRQ_STAT             0x08 /* interrupt status */
+#define HOST_PORTS_IMPL           0x0c /* bitmap of implemented ports */
+#define HOST_VERSION              0x10 /* AHCI spec. version compliancy */
+
+/* HOST_CTL bits */
+#define HOST_CTL_RESET            (1 << 0)  /* reset controller; self-clear */
+#define HOST_CTL_IRQ_EN           (1 << 1)  /* global IRQ enable */
+#define HOST_CTL_AHCI_EN          (1 << 31) /* AHCI enabled */
+
+/* HOST_CAP bits */
+#define HOST_CAP_SSC              (1 << 14) /* Slumber capable */
+#define HOST_CAP_AHCI             (1 << 18) /* AHCI only */
+#define HOST_CAP_CLO              (1 << 24) /* Command List Override support */
+#define HOST_CAP_SSS              (1 << 27) /* Staggered Spin-up */
+#define HOST_CAP_NCQ              (1 << 30) /* Native Command Queueing */
+#define HOST_CAP_64               (1 << 31) /* PCI DAC (64-bit DMA) support */
+
+/* registers for each SATA port */
+#define PORT_LST_ADDR             0x00 /* command list DMA addr */
+#define PORT_LST_ADDR_HI          0x04 /* command list DMA addr hi */
+#define PORT_FIS_ADDR             0x08 /* FIS rx buf addr */
+#define PORT_FIS_ADDR_HI          0x0c /* FIS rx buf addr hi */
+#define PORT_IRQ_STAT             0x10 /* interrupt status */
+#define PORT_IRQ_MASK             0x14 /* interrupt enable/disable mask */
+#define PORT_CMD                  0x18 /* port command */
+#define PORT_TFDATA               0x20 /* taskfile data */
+#define PORT_SIG                  0x24 /* device TF signature */
+#define PORT_SCR_STAT             0x28 /* SATA phy register: SStatus */
+#define PORT_SCR_CTL              0x2c /* SATA phy register: SControl */
+#define PORT_SCR_ERR              0x30 /* SATA phy register: SError */
+#define PORT_SCR_ACT              0x34 /* SATA phy register: SActive */
+#define PORT_CMD_ISSUE            0x38 /* command issue */
+#define PORT_RESERVED             0x3c /* reserved */
+
+/* PORT_IRQ_{STAT,MASK} bits */
+#define PORT_IRQ_COLD_PRES        (1 << 31) /* cold presence detect */
+#define PORT_IRQ_TF_ERR           (1 << 30) /* task file error */
+#define PORT_IRQ_HBUS_ERR         (1 << 29) /* host bus fatal error */
+#define PORT_IRQ_HBUS_DATA_ERR    (1 << 28) /* host bus data error */
+#define PORT_IRQ_IF_ERR           (1 << 27) /* interface fatal error */
+#define PORT_IRQ_IF_NONFATAL      (1 << 26) /* interface non-fatal error */
+#define PORT_IRQ_OVERFLOW         (1 << 24) /* xfer exhausted available S/G */
+#define PORT_IRQ_BAD_PMP          (1 << 23) /* incorrect port multiplier */
+
+#define PORT_IRQ_PHYRDY           (1 << 22) /* PhyRdy changed */
+#define PORT_IRQ_DEV_ILCK         (1 << 7) /* device interlock */
+#define PORT_IRQ_CONNECT          (1 << 6) /* port connect change status */
+#define PORT_IRQ_SG_DONE          (1 << 5) /* descriptor processed */
+#define PORT_IRQ_UNK_FIS          (1 << 4) /* unknown FIS rx'd */
+#define PORT_IRQ_SDB_FIS          (1 << 3) /* Set Device Bits FIS rx'd */
+#define PORT_IRQ_DMAS_FIS         (1 << 2) /* DMA Setup FIS rx'd */
+#define PORT_IRQ_PIOS_FIS         (1 << 1) /* PIO Setup FIS rx'd */
+#define PORT_IRQ_D2H_REG_FIS      (1 << 0) /* D2H Register FIS rx'd */
+
+#define PORT_IRQ_FREEZE           (PORT_IRQ_HBUS_ERR | PORT_IRQ_IF_ERR |   \
+                                   PORT_IRQ_CONNECT | PORT_IRQ_PHYRDY |    \
+                                   PORT_IRQ_UNK_FIS)
+#define PORT_IRQ_ERROR            (PORT_IRQ_FREEZE | PORT_IRQ_TF_ERR |     \
+                                   PORT_IRQ_HBUS_DATA_ERR)
+#define DEF_PORT_IRQ              (PORT_IRQ_ERROR | PORT_IRQ_SG_DONE |     \
+                                   PORT_IRQ_SDB_FIS | PORT_IRQ_DMAS_FIS |  \
+                                   PORT_IRQ_PIOS_FIS | PORT_IRQ_D2H_REG_FIS)
+
+/* PORT_CMD bits */
+#define PORT_CMD_ATAPI            (1 << 24) /* Device is ATAPI */
+#define PORT_CMD_LIST_ON          (1 << 15) /* cmd list DMA engine running */
+#define PORT_CMD_FIS_ON           (1 << 14) /* FIS DMA engine running */
+#define PORT_CMD_FIS_RX           (1 << 4) /* Enable FIS receive DMA engine */
+#define PORT_CMD_CLO              (1 << 3) /* Command list override */
+#define PORT_CMD_POWER_ON         (1 << 2) /* Power up device */
+#define PORT_CMD_SPIN_UP          (1 << 1) /* Spin up device */
+#define PORT_CMD_START            (1 << 0) /* Enable port DMA engine */
+
+#define PORT_CMD_ICC_MASK         (0xf << 28) /* i/f ICC state mask */
+#define PORT_CMD_ICC_ACTIVE       (0x1 << 28) /* Put i/f in active state */
+#define PORT_CMD_ICC_PARTIAL      (0x2 << 28) /* Put i/f in partial state */
+#define PORT_CMD_ICC_SLUMBER      (0x6 << 28) /* Put i/f in slumber state */
+
+#define PORT_IRQ_STAT_DHRS        (1 << 0) /* Device to Host Register FIS */
+#define PORT_IRQ_STAT_PSS         (1 << 1) /* PIO Setup FIS */
+#define PORT_IRQ_STAT_DSS         (1 << 2) /* DMA Setup FIS */
+#define PORT_IRQ_STAT_SDBS        (1 << 3) /* Set Device Bits */
+#define PORT_IRQ_STAT_UFS         (1 << 4) /* Unknown FIS */
+#define PORT_IRQ_STAT_DPS         (1 << 5) /* Descriptor Processed */
+#define PORT_IRQ_STAT_PCS         (1 << 6) /* Port Connect Change Status */
+#define PORT_IRQ_STAT_DMPS        (1 << 7) /* Device Mechanical Presence
+                                              Status */
+#define PORT_IRQ_STAT_PRCS        (1 << 22) /* PhyRdy Change Status */
+#define PORT_IRQ_STAT_IPMS        (1 << 23) /* Incorrect Port Multiplier
+                                               Status */
+#define PORT_IRQ_STAT_OFS         (1 << 24) /* Overflow Status */
+#define PORT_IRQ_STAT_INFS        (1 << 26) /* Interface Non-Fatal Error
+                                               Status */
+#define PORT_IRQ_STAT_IFS         (1 << 27) /* Interface Fatal Error */
+#define PORT_IRQ_STAT_HBDS        (1 << 28) /* Host Bus Data Error Status */
+#define PORT_IRQ_STAT_HBFS        (1 << 29) /* Host Bus Fatal Error Status */
+#define PORT_IRQ_STAT_TFES        (1 << 30) /* Task File Error Status */
+#define PORT_IRQ_STAT_CPDS        (1 << 31) /* Cold Port Detect Status */
+
+/* ap->flags bits */
+#define AHCI_FLAG_NO_NCQ                  (1 << 24)
+#define AHCI_FLAG_IGN_IRQ_IF_ERR          (1 << 25) /* ignore IRQ_IF_ERR */
+#define AHCI_FLAG_HONOR_PI                (1 << 26) /* honor PORTS_IMPL */
+#define AHCI_FLAG_IGN_SERR_INTERNAL       (1 << 27) /* ignore SERR_INTERNAL */
+#define AHCI_FLAG_32BIT_ONLY              (1 << 28) /* force 32bit */
+
+#define ATA_SRST                          (1 << 2)  /* software reset */
+
+#define STATE_RUN                         0
+#define STATE_RESET                       1
+
+#define SATA_SCR_SSTATUS_DET_NODEV        0x0
+#define SATA_SCR_SSTATUS_DET_DEV_PRESENT_PHY_UP 0x3
+
+#define SATA_SCR_SSTATUS_SPD_NODEV        0x00
+#define SATA_SCR_SSTATUS_SPD_GEN1         0x10
+
+#define SATA_SCR_SSTATUS_IPM_NODEV        0x000
+#define SATA_SCR_SSTATUS_IPM_ACTIVE       0x100
+
+#define AHCI_SCR_SCTL_DET                 0xf
+
+#define SATA_FIS_TYPE_REGISTER_H2D        0x27
+#define SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER 0x80
+
+#define AHCI_CMD_HDR_CMD_FIS_LEN           0x1f
+#define AHCI_CMD_HDR_PRDT_LEN              16
+
+#define SATA_SIGNATURE_CDROM               0xeb140000
+#define SATA_SIGNATURE_DISK                0x00000101
+
+#define AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR 0x20
+                                            /* Shouldn't this be 0x2c? */
+
+#define SATA_PORTS                         4
+
+#define AHCI_PORT_REGS_START_ADDR          0x100
+#define AHCI_PORT_REGS_END_ADDR (AHCI_PORT_REGS_START_ADDR + SATA_PORTS * 0x80)
+#define AHCI_PORT_ADDR_OFFSET_MASK         0x7f
+
+#define AHCI_NUM_COMMAND_SLOTS             31
+#define AHCI_SUPPORTED_SPEED               20
+#define AHCI_SUPPORTED_SPEED_GEN1          1
+#define AHCI_VERSION_1_0                   0x10000
+
+#define AHCI_PROGMODE_MAJOR_REV_1          1
+
+#define AHCI_COMMAND_TABLE_ACMD            0x40
+
+#define IDE_FEATURE_DMA                    1
+
+#define READ_FPDMA_QUEUED                  0x60
+#define WRITE_FPDMA_QUEUED                 0x61
+
+#define RES_FIS_DSFIS                      0x00
+#define RES_FIS_PSFIS                      0x20
+#define RES_FIS_RFIS                       0x40
+#define RES_FIS_SDBFIS                     0x58
+#define RES_FIS_UFIS                       0x60
+
+typedef struct AHCIControlRegs {
+    uint32_t    cap;
+    uint32_t    ghc;
+    uint32_t    irqstatus;
+    uint32_t    impl;
+    uint32_t    version;
+} AHCIControlRegs;
+
+typedef struct AHCIPortRegs {
+    uint32_t    lst_addr;
+    uint32_t    lst_addr_hi;
+    uint32_t    fis_addr;
+    uint32_t    fis_addr_hi;
+    uint32_t    irq_stat;
+    uint32_t    irq_mask;
+    uint32_t    cmd;
+    uint32_t    unused0;
+    uint32_t    tfdata;
+    uint32_t    sig;
+    uint32_t    scr_stat;
+    uint32_t    scr_ctl;
+    uint32_t    scr_err;
+    uint32_t    scr_act;
+    uint32_t    cmd_issue;
+    uint32_t    reserved;
+} AHCIPortRegs;
+
+typedef struct AHCICmdHdr {
+    uint32_t    opts;
+    uint32_t    status;
+    uint64_t    tbl_addr;
+    uint32_t    reserved[4];
+} __attribute__ ((packed)) AHCICmdHdr;
+
+typedef struct AHCI_SG {
+    uint64_t    addr;
+    uint32_t    reserved;
+    uint32_t    flags_size;
+} __attribute__ ((packed)) AHCI_SG;
+
+typedef struct AHCIDevice AHCIDevice;
+
+typedef struct NCQTransferState {
+    AHCIDevice *drive;
+    BlockDriverAIOCB *aiocb;
+    QEMUSGList sglist;
+    int is_read;
+    uint16_t sector_count;
+    uint64_t lba;
+    uint8_t tag;
+    int slot;
+    int used;
+} NCQTransferState;
+
+struct AHCIDevice {
+    IDEDMA dma;
+    IDEBus port;
+    int port_no;
+    uint32_t port_state;
+    uint32_t finished;
+    AHCIPortRegs port_regs;
+    struct AHCIState *hba;
+    QEMUBH *check_bh;
+    uint8_t *lst;
+    uint8_t *res_fis;
+    int dma_status;
+    int done_atapi_packet;
+    int busy_slot;
+    BlockDriverCompletionFunc *dma_cb;
+    AHCICmdHdr *cur_cmd;
+    NCQTransferState ncq_tfs[AHCI_MAX_CMDS];
+};
+
+typedef struct AHCIState {
+    AHCIDevice dev[SATA_PORTS];
+    AHCIControlRegs control_regs;
+    int mem;
+    qemu_irq irq;
+} AHCIState;
+
+typedef struct AHCIPCIState {
+    PCIDevice card;
+    AHCIState ahci;
+} AHCIPCIState;
+
+typedef struct NCQFrame {
+    uint8_t fis_type;
+    uint8_t c;
+    uint8_t command;
+    uint8_t sector_count_low;
+    uint8_t lba0;
+    uint8_t lba1;
+    uint8_t lba2;
+    uint8_t fua;
+    uint8_t lba3;
+    uint8_t lba4;
+    uint8_t lba5;
+    uint8_t sector_count_high;
+    uint8_t tag;
+    uint8_t reserved5;
+    uint8_t reserved6;
+    uint8_t control;
+    uint8_t reserved7;
+    uint8_t reserved8;
+    uint8_t reserved9;
+    uint8_t reserved10;
+} __attribute__ ((packed)) NCQFrame;
+
+static void check_cmd(AHCIState *s, int port);
+static int handle_cmd(AHCIState *s, int port, int slot);
+static void ahci_reset_port(AHCIState *s, int port);
+static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis);
+
+static uint32_t ahci_port_read(AHCIState *s, int port, int offset)
+{
+    uint32_t val;
+    AHCIPortRegs *pr;
+    pr = &s->dev[port].port_regs;
+
+    switch (offset) {
+    case PORT_LST_ADDR:
+        val = pr->lst_addr;
+        break;
+    case PORT_LST_ADDR_HI:
+        val = pr->lst_addr_hi;
+        break;
+    case PORT_FIS_ADDR:
+        val = pr->fis_addr;
+        break;
+    case PORT_FIS_ADDR_HI:
+        val = pr->fis_addr_hi;
+        break;
+    case PORT_IRQ_STAT:
+        val = pr->irq_stat;
+        break;
+    case PORT_IRQ_MASK:
+        val = pr->irq_mask;
+        break;
+    case PORT_CMD:
+        val = pr->cmd;
+        break;
+    case PORT_TFDATA:
+        val = ((uint16_t)s->dev[port].port.ifs[0].error << 8) |
+              s->dev[port].port.ifs[0].status;
+        break;
+    case PORT_SIG:
+        val = pr->sig;
+        break;
+    case PORT_SCR_STAT:
+        if (s->dev[port].port.ifs[0].bs) {
+            val = SATA_SCR_SSTATUS_DET_DEV_PRESENT_PHY_UP |
+                  SATA_SCR_SSTATUS_SPD_GEN1 | SATA_SCR_SSTATUS_IPM_ACTIVE;
+        } else {
+            val = SATA_SCR_SSTATUS_DET_NODEV;
+        }
+        break;
+    case PORT_SCR_CTL:
+        val = pr->scr_ctl;
+        break;
+    case PORT_SCR_ERR:
+        val = pr->scr_err;
+        break;
+    case PORT_SCR_ACT:
+        pr->scr_act &= ~s->dev[port].finished;
+        s->dev[port].finished = 0;
+        val = pr->scr_act;
+        break;
+    case PORT_CMD_ISSUE:
+        val = pr->cmd_issue;
+        break;
+    case PORT_RESERVED:
+    default:
+        val = 0;
+    }
+    DPRINTF(port, "offset: 0x%x val: 0x%x\n", offset, val);
+    return val;
+
+}
+
+static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
+{
+    struct AHCIPCIState *d = container_of(s, AHCIPCIState, ahci);
+
+    DPRINTF(0, "raise irq\n");
+
+    if (msi_enabled(&d->card)) {
+        msi_notify(&d->card, 0);
+    } else {
+        qemu_irq_raise(s->irq);
+    }
+}
+
+static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev)
+{
+    struct AHCIPCIState *d = container_of(s, AHCIPCIState, ahci);
+
+    DPRINTF(0, "lower irq\n");
+
+    if (!msi_enabled(&d->card)) {
+        qemu_irq_lower(s->irq);
+    }
+}
+
+static void ahci_check_irq(AHCIState *s)
+{
+    int i;
+
+    DPRINTF(-1, "check irq %#x\n", s->control_regs.irqstatus);
+
+    for (i = 0; i < SATA_PORTS; i++) {
+        AHCIPortRegs *pr = &s->dev[i].port_regs;
+        if (pr->irq_stat & pr->irq_mask) {
+            s->control_regs.irqstatus |= (1 << i);
+        }
+    }
+
+    if (s->control_regs.irqstatus &&
+        (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
+        ahci_irq_raise(s, NULL);
+    } else {
+        ahci_irq_lower(s, NULL);
+    }
+}
+
+static void ahci_trigger_irq(AHCIState *s, AHCIDevice *d,
+                             int irq_type)
+{
+    DPRINTF(d->port_no, "trigger irq %#x -> %x\n",
+            irq_type, d->port_regs.irq_mask & irq_type);
+
+    d->port_regs.irq_stat |= irq_type;
+    ahci_check_irq(s);
+}
+
+static void map_page(uint8_t **ptr, uint64_t addr, uint32_t wanted)
+{
+    target_phys_addr_t len = wanted;
+
+    if (*ptr) {
+        cpu_physical_memory_unmap(*ptr, len, 1, len);
+    }
+
+    *ptr = cpu_physical_memory_map(addr, &len, 1);
+    if (len < wanted) {
+        cpu_physical_memory_unmap(*ptr, len, 1, len);
+        *ptr = NULL;
+    }
+}
+
+static void ahci_port_write(AHCIState *s, int port, int offset, uint32_t val)
+{
+    AHCIPortRegs *pr = &s->dev[port].port_regs;
+
+    DPRINTF(port, "offset: 0x%x val: 0x%x\n", offset, val);
+    switch (offset) {
+        case PORT_LST_ADDR:
+            pr->lst_addr = val;
+            map_page(&s->dev[port].lst,
+                     ((uint64_t)pr->lst_addr_hi << 32) | pr->lst_addr, 1024);
+            s->dev[port].cur_cmd = NULL;
+            break;
+        case PORT_LST_ADDR_HI:
+            pr->lst_addr_hi = val;
+            map_page(&s->dev[port].lst,
+                     ((uint64_t)pr->lst_addr_hi << 32) | pr->lst_addr, 1024);
+            s->dev[port].cur_cmd = NULL;
+            break;
+        case PORT_FIS_ADDR:
+            pr->fis_addr = val;
+            map_page(&s->dev[port].res_fis,
+                     ((uint64_t)pr->fis_addr_hi << 32) | pr->fis_addr, 256);
+            break;
+        case PORT_FIS_ADDR_HI:
+            pr->fis_addr_hi = val;
+            map_page(&s->dev[port].res_fis,
+                     ((uint64_t)pr->fis_addr_hi << 32) | pr->fis_addr, 256);
+            break;
+        case PORT_IRQ_STAT:
+            pr->irq_stat &= ~val;
+            break;
+        case PORT_IRQ_MASK:
+            pr->irq_mask = val & 0xfdc000ff;
+            ahci_check_irq(s);
+            break;
+        case PORT_CMD:
+            pr->cmd = val & ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON);
+
+            if (pr->cmd & PORT_CMD_START) {
+                pr->cmd |= PORT_CMD_LIST_ON;
+            }
+
+            if (pr->cmd & PORT_CMD_FIS_RX) {
+                pr->cmd |= PORT_CMD_FIS_ON;
+            }
+
+            check_cmd(s, port);
+            break;
+        case PORT_TFDATA:
+            s->dev[port].port.ifs[0].error = (val >> 8) & 0xff;
+            s->dev[port].port.ifs[0].status = val & 0xff;
+            break;
+        case PORT_SIG:
+            pr->sig = val;
+            break;
+        case PORT_SCR_STAT:
+            pr->scr_stat = val;
+            break;
+        case PORT_SCR_CTL:
+            if (((pr->scr_ctl & AHCI_SCR_SCTL_DET) == 1) &&
+                ((val & AHCI_SCR_SCTL_DET) == 0)) {
+                ahci_reset_port(s, port);
+            }
+            pr->scr_ctl = val;
+            break;
+        case PORT_SCR_ERR:
+            pr->scr_err &= ~val;
+            break;
+        case PORT_SCR_ACT:
+            /* RW1 */
+            pr->scr_act |= val;
+            break;
+        case PORT_CMD_ISSUE:
+            pr->cmd_issue |= val;
+            check_cmd(s, port);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t ahci_mem_readl(void *ptr, target_phys_addr_t addr)
+{
+    AHCIState *s = ptr;
+    uint32_t val = 0;
+
+    addr = addr & 0xfff;
+    if (addr < AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR) {
+        switch (addr) {
+        case HOST_CAP:
+            val = s->control_regs.cap;
+            break;
+        case HOST_CTL:
+            val = s->control_regs.ghc;
+            break;
+        case HOST_IRQ_STAT:
+            val = s->control_regs.irqstatus;
+            break;
+        case HOST_PORTS_IMPL:
+            val = s->control_regs.impl;
+            break;
+        case HOST_VERSION:
+            val = s->control_regs.version;
+            break;
+        }
+
+        DPRINTF(-1, "(addr 0x%08X), val 0x%08X\n", (unsigned) addr, val);
+    } else if ((addr >= AHCI_PORT_REGS_START_ADDR) &&
+               (addr < AHCI_PORT_REGS_END_ADDR)) {
+        val = ahci_port_read(s, (addr - AHCI_PORT_REGS_START_ADDR) >> 7,
+                             addr & AHCI_PORT_ADDR_OFFSET_MASK);
+    }
+
+    return val;
+}
+
+
+
+static void ahci_mem_writel(void *ptr, target_phys_addr_t addr, uint32_t val)
+{
+    AHCIState *s = ptr;
+    addr = addr & 0xfff;
+
+    /* Only aligned accesses are allowed on AHCI */
+    if (addr & 3) {
+        fprintf(stderr, "ahci: Mis-aligned write to addr 0x"
+                TARGET_FMT_plx "\n", addr);
+        return;
+    }
+
+    if (addr < AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR) {
+        DPRINTF(-1, "(addr 0x%08X), val 0x%08X\n", (unsigned) addr, val);
+
+        switch (addr) {
+            case HOST_CAP: /* R/WO, RO */
+                /* FIXME handle R/WO */
+                break;
+            case HOST_CTL: /* R/W */
+                if (val & HOST_CTL_RESET) {
+                    DPRINTF(-1, "HBA Reset\n");
+                    /* FIXME reset? */
+                } else {
+                    s->control_regs.ghc = (val & 0x3) | HOST_CTL_AHCI_EN;
+                    ahci_check_irq(s);
+                }
+                break;
+            case HOST_IRQ_STAT: /* R/WC, RO */
+                s->control_regs.irqstatus &= ~val;
+                ahci_check_irq(s);
+                break;
+            case HOST_PORTS_IMPL: /* R/WO, RO */
+                /* FIXME handle R/WO */
+                break;
+            case HOST_VERSION: /* RO */
+                /* FIXME report write? */
+                break;
+            default:
+                DPRINTF(-1, "write to unknown register 0x%x\n", (unsigned)addr);
+        }
+    } else if ((addr >= AHCI_PORT_REGS_START_ADDR) &&
+               (addr < AHCI_PORT_REGS_END_ADDR)) {
+        ahci_port_write(s, (addr - AHCI_PORT_REGS_START_ADDR) >> 7,
+                        addr & AHCI_PORT_ADDR_OFFSET_MASK, val);
+    }
+
+}
+
+static CPUReadMemoryFunc * const ahci_readfn[3] = {
+    ahci_mem_readl,
+    ahci_mem_readl,
+    ahci_mem_readl
+};
+
+static CPUWriteMemoryFunc * const ahci_writefn[3] = {
+    ahci_mem_writel,
+    ahci_mem_writel,
+    ahci_mem_writel
+};
+
+static void ahci_reg_init(AHCIState *s)
+{
+    int i;
+
+    s->control_regs.cap = (SATA_PORTS - 1) |
+                          (AHCI_NUM_COMMAND_SLOTS << 8) |
+                          (AHCI_SUPPORTED_SPEED_GEN1 << AHCI_SUPPORTED_SPEED) |
+                          HOST_CAP_NCQ | HOST_CAP_AHCI;
+
+    s->control_regs.impl = (1 << SATA_PORTS) - 1;
+
+    s->control_regs.version = AHCI_VERSION_1_0;
+
+    for (i = 0; i < SATA_PORTS; i++) {
+        s->dev[i].port_state = STATE_RUN;
+    }
+}
+
+static uint32_t read_from_sglist(uint8_t *buffer, uint32_t len,
+                                 QEMUSGList *sglist)
+{
+    uint32_t i = 0;
+    uint32_t total = 0, once;
+    ScatterGatherEntry *cur_prd;
+    uint32_t sgcount;
+
+    cur_prd = sglist->sg;
+    sgcount = sglist->nsg;
+    for (i = 0; len && sgcount; i++) {
+        once = MIN(cur_prd->len, len);
+        cpu_physical_memory_read(cur_prd->base, buffer, once);
+        cur_prd++;
+        sgcount--;
+        len -= once;
+        buffer += once;
+        total += once;
+    }
+
+    return total;
+}
+
+static uint32_t write_to_sglist(uint8_t *buffer, uint32_t len,
+                                QEMUSGList *sglist)
+{
+    uint32_t i = 0;
+    uint32_t total = 0, once;
+    ScatterGatherEntry *cur_prd;
+    uint32_t sgcount;
+
+    DPRINTF(-1, "total: 0x%x bytes\n", len);
+
+    cur_prd = sglist->sg;
+    sgcount = sglist->nsg;
+    for (i = 0; len && sgcount; i++) {
+        once = MIN(cur_prd->len, len);
+        DPRINTF(-1, "write 0x%x bytes to 0x%lx\n", once, (long)cur_prd->base);
+        cpu_physical_memory_write(cur_prd->base, buffer, once);
+        cur_prd++;
+        sgcount--;
+        len -= once;
+        buffer += once;
+        total += once;
+    }
+
+    return total;
+}
+
+static void check_cmd(AHCIState *s, int port)
+{
+    AHCIPortRegs *pr = &s->dev[port].port_regs;
+    int slot;
+
+    if ((pr->cmd & PORT_CMD_START) && pr->cmd_issue) {
+        for (slot = 0; (slot < 32) && pr->cmd_issue; slot++) {
+            if ((pr->cmd_issue & (1 << slot)) &&
+                !handle_cmd(s, port, slot)) {
+                pr->cmd_issue &= ~(1 << slot);
+            }
+        }
+    }
+}
+
+static void ahci_check_cmd_bh(void *opaque)
+{
+    AHCIDevice *ad = opaque;
+
+    qemu_bh_delete(ad->check_bh);
+    ad->check_bh = NULL;
+
+    if ((ad->busy_slot != -1) &&
+        !(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
+        /* no longer busy */
+        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
+        ad->busy_slot = -1;
+    }
+
+    check_cmd(ad->hba, ad->port_no);
+}
+
+static void ahci_reset_port(AHCIState *s, int port)
+{
+    AHCIDevice *d = &s->dev[port];
+    AHCIPortRegs *pr = &d->port_regs;
+    IDEState *ide_state = &d->port.ifs[0];
+    uint8_t init_fis[0x20];
+    uint32_t tfd;
+    int i;
+
+    DPRINTF(port, "reset port\n");
+
+    ide_bus_reset(&d->port);
+    ide_state->ncq_queues = AHCI_MAX_CMDS;
+
+    pr->irq_stat = 0;
+    pr->irq_mask = 0;
+    pr->scr_stat = 0;
+    pr->scr_ctl = 0;
+    pr->scr_err = 0;
+    pr->scr_act = 0;
+    d->busy_slot = -1;
+
+    ide_state = &s->dev[port].port.ifs[0];
+    if (!ide_state->bs) {
+        return;
+    }
+
+    /* reset ncq queue */
+    for (i = 0; i < AHCI_MAX_CMDS; i++) {
+        NCQTransferState *ncq_tfs = &s->dev[port].ncq_tfs[i];
+        if (!ncq_tfs->used) {
+            continue;
+        }
+
+        if (ncq_tfs->aiocb) {
+            bdrv_aio_cancel(ncq_tfs->aiocb);
+            ncq_tfs->aiocb = NULL;
+        }
+
+        qemu_sglist_destroy(&ncq_tfs->sglist);
+        ncq_tfs->used = 0;
+    }
+
+    memset(init_fis, 0, sizeof(init_fis));
+    s->dev[port].port_state = STATE_RUN;
+    if (!ide_state->bs) {
+        s->dev[port].port_regs.sig = 0;
+        tfd = (1 << 8) | SEEK_STAT | WRERR_STAT;
+    } else if (ide_state->drive_kind == IDE_CD) {
+        s->dev[port].port_regs.sig = SATA_SIGNATURE_CDROM;
+        ide_state->lcyl = 0x14;
+        ide_state->hcyl = 0xeb;
+        DPRINTF(port, "set lcyl = %d\n", ide_state->lcyl);
+        init_fis[5] = ide_state->lcyl;
+        init_fis[6] = ide_state->hcyl;
+        ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
+    } else {
+        s->dev[port].port_regs.sig = SATA_SIGNATURE_DISK;
+        ide_state->status = SEEK_STAT | WRERR_STAT;
+    }
+
+    ide_state->error = 1;
+    init_fis[4] = 1;
+    init_fis[12] = 1;
+    ahci_write_fis_d2h(d, init_fis);
+}
+
+static void debug_print_fis(uint8_t *fis, int cmd_len)
+{
+#ifdef DEBUG_AHCI
+    int i;
+
+    fprintf(stderr, "fis:");
+    for (i = 0; i < cmd_len; i++) {
+        if ((i & 0xf) == 0) {
+            fprintf(stderr, "\n%02x:", i);
+        }
+        fprintf(stderr, "%02x ", fis[i]);
+    }
+    fprintf(stderr, "\n");
+#endif
+}
+
+static void ahci_write_fis_sdb(AHCIState *s, int port, uint32_t finished)
+{
+    AHCIPortRegs *pr = &s->dev[port].port_regs;
+    IDEState *ide_state;
+    uint8_t *sdb_fis;
+
+    if (!s->dev[port].res_fis ||
+        !(pr->cmd & PORT_CMD_FIS_RX)) {
+        return;
+    }
+
+    sdb_fis = &s->dev[port].res_fis[RES_FIS_SDBFIS];
+    ide_state = &s->dev[port].port.ifs[0];
+
+    /* clear memory */
+    *(uint32_t*)sdb_fis = 0;
+
+    /* write values */
+    sdb_fis[0] = ide_state->error;
+    sdb_fis[2] = ide_state->status & 0x77;
+    s->dev[port].finished |= finished;
+    *(uint32_t*)(sdb_fis + 4) = cpu_to_le32(s->dev[port].finished);
+
+    ahci_trigger_irq(s, &s->dev[port], PORT_IRQ_STAT_SDBS);
+}
+
+static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis)
+{
+    AHCIPortRegs *pr = &ad->port_regs;
+    uint8_t *d2h_fis;
+    int i;
+    target_phys_addr_t cmd_len = 0x80;
+    int cmd_mapped = 0;
+
+    if (!ad->res_fis || !(pr->cmd & PORT_CMD_FIS_RX)) {
+        return;
+    }
+
+    if (!cmd_fis) {
+        /* map cmd_fis */
+        uint64_t tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);
+        cmd_fis = cpu_physical_memory_map(tbl_addr, &cmd_len, 0);
+        cmd_mapped = 1;
+    }
+
+    d2h_fis = &ad->res_fis[RES_FIS_RFIS];
+
+    d2h_fis[0] = 0x34;
+    d2h_fis[1] = (ad->hba->control_regs.irqstatus ? (1 << 6) : 0);
+    d2h_fis[2] = ad->port.ifs[0].status;
+    d2h_fis[3] = ad->port.ifs[0].error;
+
+    d2h_fis[4] = cmd_fis[4];
+    d2h_fis[5] = cmd_fis[5];
+    d2h_fis[6] = cmd_fis[6];
+    d2h_fis[7] = cmd_fis[7];
+    d2h_fis[8] = cmd_fis[8];
+    d2h_fis[9] = cmd_fis[9];
+    d2h_fis[10] = cmd_fis[10];
+    d2h_fis[11] = cmd_fis[11];
+    d2h_fis[12] = cmd_fis[12];
+    d2h_fis[13] = cmd_fis[13];
+    for (i = 14; i < 0x20; i++) {
+        d2h_fis[i] = 0;
+    }
+
+    if (d2h_fis[2] & ERR_STAT) {
+        ahci_trigger_irq(ad->hba, ad, PORT_IRQ_STAT_TFES);
+    }
+
+    ahci_trigger_irq(ad->hba, ad, PORT_IRQ_D2H_REG_FIS);
+
+    if (cmd_mapped) {
+        cpu_physical_memory_unmap(cmd_fis, 0, cmd_len, cmd_len);
+    }
+}
+
+static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
+{
+    AHCICmdHdr *cmd = ad->cur_cmd;
+    uint32_t opts = le32_to_cpu(cmd->opts);
+    uint64_t prdt_addr = le64_to_cpu(cmd->tbl_addr) + 0x80;
+    int sglist_alloc_hint = opts >> AHCI_CMD_HDR_PRDT_LEN;
+    target_phys_addr_t prdt_len = (sglist_alloc_hint * sizeof(AHCI_SG));
+    target_phys_addr_t real_prdt_len = prdt_len;
+    uint8_t *prdt;
+    int i;
+    int r = 0;
+
+    if (!sglist_alloc_hint) {
+        DPRINTF(ad->port_no, "no sg list given by guest: 0x%08x\n", opts);
+        return -1;
+    }
+
+    /* map PRDT */
+    if (!(prdt = cpu_physical_memory_map(prdt_addr, &prdt_len, 0))){
+        DPRINTF(ad->port_no, "map failed\n");
+        return -1;
+    }
+
+    if (prdt_len < real_prdt_len) {
+        DPRINTF(ad->port_no, "mapped less than expected\n");
+        r = -1;
+        goto out;
+    }
+
+    /* Get entries in the PRDT, init a qemu sglist accordingly */
+    if (sglist_alloc_hint > 0) {
+        AHCI_SG *tbl = (AHCI_SG *)prdt;
+
+        qemu_sglist_init(sglist, sglist_alloc_hint);
+        for (i = 0; i < sglist_alloc_hint; i++) {
+            /* flags_size is zero-based */
+            qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
+                            le32_to_cpu(tbl[i].flags_size) + 1);
+        }
+    }
+
+out:
+    cpu_physical_memory_unmap(prdt, 0, prdt_len, prdt_len);
+    return r;
+}
+
+static void ncq_cb(void *opaque, int ret)
+{
+    NCQTransferState *ncq_tfs = (NCQTransferState *)opaque;
+    IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
+
+    /* Clear bit for this tag in SActive */
+    ncq_tfs->drive->port_regs.scr_act &= ~(1 << ncq_tfs->tag);
+
+    if (ret < 0) {
+        /* error */
+        ide_state->error = ABRT_ERR;
+        ide_state->status = READY_STAT | ERR_STAT;
+        ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
+    } else {
+        ide_state->status = READY_STAT | SEEK_STAT;
+    }
+
+    ahci_write_fis_sdb(ncq_tfs->drive->hba, ncq_tfs->drive->port_no,
+                       (1 << ncq_tfs->tag));
+
+    DPRINTF(ncq_tfs->drive->port_no, "NCQ transfer tag %d finished\n",
+            ncq_tfs->tag);
+
+    qemu_sglist_destroy(&ncq_tfs->sglist);
+    ncq_tfs->used = 0;
+}
+
+static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
+                                int slot)
+{
+    NCQFrame *ncq_fis = (NCQFrame*)cmd_fis;
+    uint8_t tag = ncq_fis->tag >> 3;
+    NCQTransferState *ncq_tfs = &s->dev[port].ncq_tfs[tag];
+
+    if (ncq_tfs->used) {
+        /* error - already in use */
+        fprintf(stderr, "%s: tag %d already used\n", __FUNCTION__, tag);
+        return;
+    }
+
+    ncq_tfs->used = 1;
+    ncq_tfs->drive = &s->dev[port];
+    ncq_tfs->slot = slot;
+    ncq_tfs->lba = ((uint64_t)ncq_fis->lba5 << 40) |
+                   ((uint64_t)ncq_fis->lba4 << 32) |
+                   ((uint64_t)ncq_fis->lba3 << 24) |
+                   ((uint64_t)ncq_fis->lba2 << 16) |
+                   ((uint64_t)ncq_fis->lba1 << 8) |
+                   (uint64_t)ncq_fis->lba0;
+
+    /* Note: We calculate the sector count, but don't currently rely on it.
+     * The total size of the DMA buffer tells us the transfer size instead. */
+    ncq_tfs->sector_count = ((uint16_t)ncq_fis->sector_count_high << 8) |
+                                ncq_fis->sector_count_low;
+
+    DPRINTF(port, "NCQ transfer LBA from %ld to %ld, drive max %ld\n",
+            ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 2,
+            s->dev[port].port.ifs[0].nb_sectors - 1);
+
+    ahci_populate_sglist(&s->dev[port], &ncq_tfs->sglist);
+    ncq_tfs->tag = tag;
+
+    switch(ncq_fis->command) {
+        case READ_FPDMA_QUEUED:
+            DPRINTF(port, "NCQ reading %d sectors from LBA %ld, tag %d\n",
+                    ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
+            ncq_tfs->is_read = 1;
+
+            DPRINTF(port, "tag %d aio read %ld\n", ncq_tfs->tag, ncq_tfs->lba);
+            ncq_tfs->aiocb = dma_bdrv_read(ncq_tfs->drive->port.ifs[0].bs,
+                                           &ncq_tfs->sglist, ncq_tfs->lba,
+                                           ncq_cb, ncq_tfs);
+            break;
+        case WRITE_FPDMA_QUEUED:
+            DPRINTF(port, "NCQ writing %d sectors to LBA %ld, tag %d\n",
+                    ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
+            ncq_tfs->is_read = 0;
+
+            DPRINTF(port, "tag %d aio write %ld\n", ncq_tfs->tag, ncq_tfs->lba);
+            ncq_tfs->aiocb = dma_bdrv_write(ncq_tfs->drive->port.ifs[0].bs,
+                                            &ncq_tfs->sglist, ncq_tfs->lba,
+                                            ncq_cb, ncq_tfs);
+            break;
+        default:
+            DPRINTF(port, "error: tried to process non-NCQ command as NCQ\n");
+            qemu_sglist_destroy(&ncq_tfs->sglist);
+            break;
+    }
+}
+
+static int handle_cmd(AHCIState *s, int port, int slot)
+{
+    IDEState *ide_state;
+    AHCIPortRegs *pr;
+    uint32_t opts;
+    uint64_t tbl_addr;
+    AHCICmdHdr *cmd;
+    uint8_t *cmd_fis;
+    target_phys_addr_t cmd_len;
+
+    if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
+        /* Engine currently busy, try again later */
+        DPRINTF(port, "engine busy\n");
+        return -1;
+    }
+
+    pr = &s->dev[port].port_regs;
+    cmd = &((AHCICmdHdr *)s->dev[port].lst)[slot];
+
+    if (!s->dev[port].lst) {
+        DPRINTF(port, "error: lst not given but cmd handled\n");
+        return -1;
+    }
+
+    /* remember current slot handle for later */
+    s->dev[port].cur_cmd = cmd;
+
+    opts = le32_to_cpu(cmd->opts);
+    tbl_addr = le64_to_cpu(cmd->tbl_addr);
+
+    cmd_len = 0x80;
+    cmd_fis = cpu_physical_memory_map(tbl_addr, &cmd_len, 1);
+
+    if (!cmd_fis) {
+        DPRINTF(port, "error: guest passed us an invalid cmd fis\n");
+        return -1;
+    }
+
+    /* The device we are working for */
+    ide_state = &s->dev[port].port.ifs[0];
+
+    if (!ide_state->bs) {
+        DPRINTF(port, "error: guest accessed unused port\n");
+        goto out;
+    }
+
+    debug_print_fis(cmd_fis, 0x90);
+    //debug_print_fis(cmd_fis, (opts & AHCI_CMD_HDR_CMD_FIS_LEN) * 4);
+
+    switch (cmd_fis[0]) {
+        case SATA_FIS_TYPE_REGISTER_H2D:
+            break;
+        default:
+            DPRINTF(port, "unknown command cmd_fis[0]=%02x cmd_fis[1]=%02x "
+                          "cmd_fis[2]=%02x\n", cmd_fis[0], cmd_fis[1],
+                          cmd_fis[2]);
+            goto out;
+            break;
+    }
+
+    switch (cmd_fis[1]) {
+        case SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER:
+            break;
+        case 0:
+            break;
+        default:
+            DPRINTF(port, "unknown command cmd_fis[0]=%02x cmd_fis[1]=%02x "
+                          "cmd_fis[2]=%02x\n", cmd_fis[0], cmd_fis[1],
+                          cmd_fis[2]);
+            goto out;
+            break;
+    }
+
+    switch (s->dev[port].port_state) {
+        case STATE_RUN:
+            if (cmd_fis[15] & ATA_SRST) {
+                s->dev[port].port_state = STATE_RESET;
+            }
+            break;
+        case STATE_RESET:
+            if (!(cmd_fis[15] & ATA_SRST)) {
+                ahci_reset_port(s, port);
+            }
+            break;
+    }
+
+    if (cmd_fis[1] == SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER) {
+
+        /* Check for NCQ command */
+        if ((cmd_fis[2] == READ_FPDMA_QUEUED) ||
+            (cmd_fis[2] == WRITE_FPDMA_QUEUED)) {
+            process_ncq_command(s, port, cmd_fis, slot);
+            goto out;
+        }
+
+        /* Decompose the FIS  */
+        ide_state->nsector = (int64_t)((cmd_fis[13] << 8) | cmd_fis[12]);
+        ide_state->feature = cmd_fis[3];
+        if (!ide_state->nsector) {
+            ide_state->nsector = 256;
+        }
+
+        if (ide_state->drive_kind != IDE_CD) {
+            ide_set_sector(ide_state, (cmd_fis[6] << 16) | (cmd_fis[5] << 8) |
+                           cmd_fis[4]);
+        }
+
+        /* Copy the ACMD field (ATAPI packet, if any) from the AHCI command
+         * table to ide_state->io_buffer
+         */
+        if (opts & AHCI_CMD_ATAPI) {
+            memcpy(ide_state->io_buffer, &cmd_fis[AHCI_COMMAND_TABLE_ACMD], 0x10);
+            ide_state->lcyl = 0x14;
+            ide_state->hcyl = 0xeb;
+            debug_print_fis(ide_state->io_buffer, 0x10);
+            ide_state->feature = IDE_FEATURE_DMA;
+            s->dev[port].done_atapi_packet = 0;
+            /* XXX send PIO setup FIS */
+        }
+
+        ide_state->error = 0;
+
+        /* Reset transferred byte counter */
+        cmd->status = 0;
+
+        /* We're ready to process the command in FIS byte 2. */
+        ide_exec_cmd(&s->dev[port].port, cmd_fis[2]);
+
+        if (s->dev[port].port.ifs[0].status & READY_STAT) {
+            ahci_write_fis_d2h(&s->dev[port], cmd_fis);
+        }
+    }
+
+out:
+    cpu_physical_memory_unmap(cmd_fis, 1, cmd_len, cmd_len);
+
+    if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
+        /* async command, complete later */
+        s->dev[port].busy_slot = slot;
+        return -1;
+    }
+
+    /* done handling the command */
+    return 0;
+}
+
+/* DMA dev <-> ram */
+static int ahci_start_transfer(IDEDMA *dma)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+    IDEState *s = &ad->port.ifs[0];
+    uint32_t size = (uint32_t)(s->data_end - s->data_ptr);
+    /* write == ram -> device */
+    uint32_t opts = le32_to_cpu(ad->cur_cmd->opts);
+    int is_write = opts & AHCI_CMD_WRITE;
+    int is_atapi = opts & AHCI_CMD_ATAPI;
+    int has_sglist = 0;
+
+    if (is_atapi && !ad->done_atapi_packet) {
+        /* already prepopulated iobuffer */
+        ad->done_atapi_packet = 1;
+        goto out;
+    }
+
+    if (!ahci_populate_sglist(ad, &s->sg)) {
+        has_sglist = 1;
+    }
+
+    DPRINTF(ad->port_no, "%sing %d bytes on %s w/%s sglist\n",
+            is_write ? "writ" : "read", size, is_atapi ? "atapi" : "ata",
+            has_sglist ? "" : "o");
+
+    if (is_write && has_sglist && (s->data_ptr < s->data_end)) {
+        read_from_sglist(s->data_ptr, size, &s->sg);
+    }
+
+    if (!is_write && has_sglist && (s->data_ptr < s->data_end)) {
+        write_to_sglist(s->data_ptr, size, &s->sg);
+    }
+
+    /* update number of transferred bytes */
+    ad->cur_cmd->status = cpu_to_le32(le32_to_cpu(ad->cur_cmd->status) + size);
+
+out:
+    /* declare that we processed everything */
+    s->data_ptr = s->data_end;
+
+    if (has_sglist) {
+        qemu_sglist_destroy(&s->sg);
+    }
+
+    s->end_transfer_func(s);
+
+    if (!(s->status & DRQ_STAT)) {
+        /* done with DMA */
+        ahci_trigger_irq(ad->hba, ad, PORT_IRQ_STAT_DSS);
+    }
+
+    return 0;
+}
+
+static void ahci_start_dma(IDEDMA *dma, IDEState *s,
+                           BlockDriverCompletionFunc *dma_cb)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+
+    DPRINTF(ad->port_no, "\n");
+    ad->dma_cb = dma_cb;
+    ad->dma_status |= BM_STATUS_DMAING;
+    dma_cb(s, 0);
+}
+
+static int ahci_dma_prepare_buf(IDEDMA *dma, int is_write)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+    IDEState *s = &ad->port.ifs[0];
+    int i;
+
+    ahci_populate_sglist(ad, &s->sg);
+
+    s->io_buffer_size = 0;
+    for (i = 0; i < s->sg.nsg; i++) {
+        s->io_buffer_size += s->sg.sg[i].len;
+    }
+
+    DPRINTF(ad->port_no, "len=%#x\n", s->io_buffer_size);
+    return s->io_buffer_size != 0;
+}
+
+static int ahci_dma_rw_buf(IDEDMA *dma, int is_write)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+    IDEState *s = &ad->port.ifs[0];
+    uint8_t *p = s->io_buffer + s->io_buffer_index;
+    int l = s->io_buffer_size - s->io_buffer_index;
+
+    if (ahci_populate_sglist(ad, &s->sg)) {
+        return 0;
+    }
+
+    if (is_write) {
+        write_to_sglist(p, l, &s->sg);
+    } else {
+        read_from_sglist(p, l, &s->sg);
+    }
+
+    /* update number of transferred bytes */
+    ad->cur_cmd->status = cpu_to_le32(le32_to_cpu(ad->cur_cmd->status) + l);
+    s->io_buffer_index += l;
+
+    DPRINTF(ad->port_no, "len=%#x\n", l);
+
+    return 1;
+}
+
+static int ahci_dma_set_unit(IDEDMA *dma, int unit)
+{
+    /* only a single unit per link */
+    return 0;
+}
+
+static int ahci_dma_add_status(IDEDMA *dma, int status)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+    ad->dma_status |= status;
+    DPRINTF(ad->port_no, "set status: %x\n", status);
+
+    if (status & BM_STATUS_INT) {
+        ahci_trigger_irq(ad->hba, ad, PORT_IRQ_STAT_DSS);
+    }
+
+    return 0;
+}
+
+static int ahci_dma_set_inactive(IDEDMA *dma)
+{
+    AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+
+    DPRINTF(ad->port_no, "dma done\n");
+
+    /* update d2h status */
+    ahci_write_fis_d2h(ad, NULL);
+
+    ad->dma_cb = NULL;
+
+    /* maybe we still have something to process, check later */
+    ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
+    qemu_bh_schedule(ad->check_bh);
+
+    return 0;
+}
+
+static void ahci_irq_set(void *opaque, int n, int level)
+{
+}
+
+static void ahci_dma_restart_cb(void *opaque, int running, int reason)
+{
+}
+
+static int ahci_dma_reset(IDEDMA *dma)
+{
+    return 0;
+}
+
+static const IDEDMAOps ahci_dma_ops = {
+    .start_dma = ahci_start_dma,
+    .start_transfer = ahci_start_transfer,
+    .prepare_buf = ahci_dma_prepare_buf,
+    .rw_buf = ahci_dma_rw_buf,
+    .set_unit = ahci_dma_set_unit,
+    .add_status = ahci_dma_add_status,
+    .set_inactive = ahci_dma_set_inactive,
+    .restart_cb = ahci_dma_restart_cb,
+    .reset = ahci_dma_reset,
+};
+
+static void ahci_init(AHCIState *s, DeviceState *qdev)
+{
+    qemu_irq *irqs;
+    int i;
+
+    ahci_reg_init(s);
+    s->mem = cpu_register_io_memory(ahci_readfn, ahci_writefn, s,
+                                    DEVICE_LITTLE_ENDIAN);
+    irqs = qemu_allocate_irqs(ahci_irq_set, s, SATA_PORTS);
+
+    for (i = 0; i < SATA_PORTS; i++) {
+        AHCIDevice *ad = &s->dev[i];
+
+        ide_bus_new(&ad->port, qdev, i);
+        ide_init2(&ad->port, irqs[i]);
+
+        ad->hba = s;
+        ad->port_no = i;
+        ad->port.dma = &ad->dma;
+        ad->port.dma->ops = &ahci_dma_ops;
+        ad->port_regs.cmd = PORT_CMD_SPIN_UP | PORT_CMD_POWER_ON;
+    }
+}
+
+static void ahci_pci_map(PCIDevice *pci_dev, int region_num,
+        pcibus_t addr, pcibus_t size, int type)
+{
+    struct AHCIPCIState *d = (struct AHCIPCIState *)pci_dev;
+    AHCIState *s = &d->ahci;
+
+    cpu_register_physical_memory(addr, size, s->mem);
+}
+
+static void ahci_reset(void *opaque)
+{
+    struct AHCIPCIState *d = opaque;
+    int i;
+
+    for (i = 0; i < SATA_PORTS; i++) {
+        ahci_reset_port(&d->ahci, i);
+    }
+}
+
+static int pci_ahci_init(PCIDevice *dev)
+{
+    struct AHCIPCIState *d;
+    d = DO_UPCAST(struct AHCIPCIState, card, dev);
+
+    pci_config_set_vendor_id(d->card.config, PCI_VENDOR_ID_INTEL);
+    pci_config_set_device_id(d->card.config, PCI_DEVICE_ID_INTEL_82801IR);
+
+    pci_config_set_class(d->card.config, PCI_CLASS_STORAGE_SATA);
+    pci_config_set_revision(d->card.config, 0x02);
+    pci_config_set_prog_interface(d->card.config, AHCI_PROGMODE_MAJOR_REV_1);
+
+    d->card.config[PCI_CACHE_LINE_SIZE] = 0x08;  /* Cache line size */
+    d->card.config[PCI_LATENCY_TIMER]   = 0x00;  /* Latency timer */
+    pci_config_set_interrupt_pin(d->card.config, 1);
+
+    qemu_register_reset(ahci_reset, d);
+
+    /* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
+    pci_register_bar(&d->card, 5, 0x1000, PCI_BASE_ADDRESS_SPACE_MEMORY,
+                     ahci_pci_map);
+
+    msi_init(dev, 0x50, 1, true, false);
+
+    ahci_init(&d->ahci, &dev->qdev);
+    d->ahci.irq = d->card.irq[0];
+
+    return 0;
+}
+
+static int pci_ahci_uninit(PCIDevice *dev)
+{
+    struct AHCIPCIState *d;
+    d = DO_UPCAST(struct AHCIPCIState, card, dev);
+
+    if (msi_enabled(dev)) {
+        msi_uninit(dev);
+    }
+
+    qemu_unregister_reset(ahci_reset, d);
+
+    return 0;
+}
+
+static void pci_ahci_write_config(PCIDevice *pci, uint32_t addr,
+                                  uint32_t val, int len)
+{
+    pci_default_write_config(pci, addr, val, len);
+    msi_write_config(pci, addr, val, len);
+}
+
+static PCIDeviceInfo ahci_info = {
+    .qdev.name  = "ahci",
+    .qdev.size  = sizeof(AHCIPCIState),
+    .init       = pci_ahci_init,
+    .exit       = pci_ahci_uninit,
+    .config_write = pci_ahci_write_config,
+};
+
+static void ahci_pci_register_devices(void)
+{
+    pci_qdev_register(&ahci_info);
+}
+
+device_init(ahci_pci_register_devices)
-- 
1.7.2.3

* [Qemu-devel] [PATCH 14/38] config: move ide core and pci to pci.mak
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (12 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 13/38] ahci: add ahci emulation Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 15/38] config: add ahci for pci capable machines Kevin Wolf
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

Every machine that supports PCI should also be able to use IDE, so let's move
the IDE core, qdev and PCI definitions over to pci.mak.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 default-configs/arm-softmmu.mak      |    1 -
 default-configs/i386-softmmu.mak     |    3 ---
 default-configs/mips-softmmu.mak     |    3 ---
 default-configs/mips64-softmmu.mak   |    3 ---
 default-configs/mips64el-softmmu.mak |    3 ---
 default-configs/mipsel-softmmu.mak   |    3 ---
 default-configs/pci.mak              |    3 +++
 default-configs/ppc-softmmu.mak      |    3 ---
 default-configs/ppc64-softmmu.mak    |    3 ---
 default-configs/ppcemb-softmmu.mak   |    3 ---
 default-configs/sh4-softmmu.mak      |    1 -
 default-configs/sh4eb-softmmu.mak    |    1 -
 default-configs/sparc64-softmmu.mak  |    3 ---
 default-configs/x86_64-softmmu.mak   |    3 ---
 14 files changed, 3 insertions(+), 33 deletions(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index ac48dc1..8d1174f 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -8,7 +8,6 @@ CONFIG_ECC=y
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
 CONFIG_SD=y
-CONFIG_IDE_CORE=y
 CONFIG_MAX7310=y
 CONFIG_WM8750=y
 CONFIG_TWL92230=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index ce905d2..323fafb 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -13,9 +13,6 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 565e611..f524971 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak
index 03bd8eb..aeab6b2 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak
index 4661617..8e6511c 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_IDE_VIA=y
diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak
index 92fc473..a05ac25 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,9 +17,6 @@ CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
 CONFIG_PIIX4=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index c74a99f..d700b3c 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -10,3 +10,6 @@ CONFIG_PCNET_COMMON=y
 CONFIG_LSI_SCSI_PCI=y
 CONFIG_RTL8139_PCI=y
 CONFIG_E1000_PCI=y
+CONFIG_IDE_CORE=y
+CONFIG_IDE_QDEV=y
+CONFIG_IDE_PCI=y
diff --git a/default-configs/ppc-softmmu.mak b/default-configs/ppc-softmmu.mak
index f1cb99e..4563742 100644
--- a/default-configs/ppc-softmmu.mak
+++ b/default-configs/ppc-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 83cbe97..d5073b3 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/ppcemb-softmmu.mak b/default-configs/ppcemb-softmmu.mak
index 2b52d4a..9f0730c 100644
--- a/default-configs/ppcemb-softmmu.mak
+++ b/default-configs/ppcemb-softmmu.mak
@@ -23,9 +23,6 @@ CONFIG_GRACKLE_PCI=y
 CONFIG_UNIN_PCI=y
 CONFIG_DEC_PCI=y
 CONFIG_PPCE500_PCI=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_IDE_MACIO=y
diff --git a/default-configs/sh4-softmmu.mak b/default-configs/sh4-softmmu.mak
index 87247a4..5c69acc 100644
--- a/default-configs/sh4-softmmu.mak
+++ b/default-configs/sh4-softmmu.mak
@@ -3,6 +3,5 @@
 include pci.mak
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
-CONFIG_IDE_CORE=y
 CONFIG_PFLASH_CFI02=y
 CONFIG_ISA_MMIO=y
diff --git a/default-configs/sh4eb-softmmu.mak b/default-configs/sh4eb-softmmu.mak
index 5b8a16e..7cdc122 100644
--- a/default-configs/sh4eb-softmmu.mak
+++ b/default-configs/sh4eb-softmmu.mak
@@ -3,6 +3,5 @@
 include pci.mak
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
-CONFIG_IDE_CORE=y
 CONFIG_PFLASH_CFI02=y
 CONFIG_ISA_MMIO=y
diff --git a/default-configs/sparc64-softmmu.mak b/default-configs/sparc64-softmmu.mak
index ecc3122..d8f17e7 100644
--- a/default-configs/sparc64-softmmu.mak
+++ b/default-configs/sparc64-softmmu.mak
@@ -9,8 +9,5 @@ CONFIG_SERIAL=y
 CONFIG_PARALLEL=y
 CONFIG_PCKBD=y
 CONFIG_FDC=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 7f22599..eff26d2 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -13,9 +13,6 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_APM=y
 CONFIG_DMA=y
-CONFIG_IDE_CORE=y
-CONFIG_IDE_QDEV=y
-CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
-- 
1.7.2.3

* [Qemu-devel] [PATCH 15/38] config: add ahci for pci capable machines
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (13 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 14/38] config: move ide core and pci to pci.mak Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 16/38] ahci: set SATA Mode Select Kevin Wolf
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

This patch enables AHCI for all machines supporting PCI.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 default-configs/pci.mak |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index d700b3c..0471efb 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -13,3 +13,4 @@ CONFIG_E1000_PCI=y
 CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
+CONFIG_AHCI=y
-- 
1.7.2.3

* [Qemu-devel] [PATCH 16/38] ahci: set SATA Mode Select
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (14 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 15/38] config: add ahci for pci capable machines Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 17/38] ide: honor ncq for atapi Kevin Wolf
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Sebastian Herbszt <herbszt@gmx.de>

Set SATA Mode Select to AHCI in the Address Map Register.
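
As a rough, standalone illustration of what this amounts to (not the QEMU
code itself): the helper names and the macro names below are made up for
the example, and treating SATA Mode Select as a two-bit field in bits 7:6
of the register at config offset 0x90 is an assumption taken from the
value the patch writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the patch itself writes the raw values. */
#define AMR_OFFSET      0x90  /* Address Map Register in PCI config space */
#define AMR_SMS_SHIFT   6     /* assumed position of SATA Mode Select */
#define AMR_SMS_MASK    0x3   /* assumed two-bit field */
#define AMR_SMS_AHCI    0x1   /* field value that selects AHCI mode */

/* Program the Address Map Register so that SATA Mode Select reads AHCI. */
static void amr_select_ahci(uint8_t *config, unsigned config_len)
{
    assert(config_len > AMR_OFFSET);
    config[AMR_OFFSET] = AMR_SMS_AHCI << AMR_SMS_SHIFT;  /* same as 1 << 6 */
}

/* Return nonzero if the mode select field currently reads AHCI. */
static int amr_is_ahci(const uint8_t *config)
{
    return ((config[AMR_OFFSET] >> AMR_SMS_SHIFT) & AMR_SMS_MASK)
           == AMR_SMS_AHCI;
}
```

A guest driver that checks the mode before touching the controller would
see AHCI here without having programmed the register first, which is why
the patch marks the assignment with an XXX comment.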

Signed-off-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/ahci.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f937a92..8ae236a 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1473,6 +1473,9 @@ static int pci_ahci_init(PCIDevice *dev)
     d->card.config[PCI_LATENCY_TIMER]   = 0x00;  /* Latency timer */
     pci_config_set_interrupt_pin(d->card.config, 1);
 
+    /* XXX Software should program this register */
+    d->card.config[0x90]   = 1 << 6; /* Address Map Register - AHCI mode */
+
     qemu_register_reset(ahci_reset, d);
 
     /* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
-- 
1.7.2.3

* [Qemu-devel] [PATCH 17/38] ide: honor ncq for atapi
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (15 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 16/38] ahci: set SATA Mode Select Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 18/38] qemu-img: Call error_set_progname Kevin Wolf
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Alexander Graf <agraf@suse.de>

ATAPI devices can also do NCQ, so let's expose the capability in the
identify data.

This patch makes CD-ROM support work on Windows 7 for me.
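
For readers unfamiliar with the identify layout, here is a small
standalone sketch of the two words the patch fills in. It is not the QEMU
code: id_put_word() and id_set_ncq() are hypothetical helpers, and the
identify block is modelled as a plain 512-byte buffer of little-endian
16-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit identify word in little-endian byte order. */
static void id_put_word(uint8_t *id, unsigned word, uint16_t v)
{
    assert(word < 256);            /* 256 words in a 512-byte block */
    id[2 * word]     = v & 0xff;
    id[2 * word + 1] = v >> 8;
}

/* Advertise NCQ only when the controller provides queues. */
static void id_set_ncq(uint8_t *id, unsigned ncq_queues)
{
    if (!ncq_queues) {
        return;                    /* no queues: leave both words untouched */
    }
    assert(ncq_queues <= 32);
    id_put_word(id, 75, ncq_queues - 1);  /* queue depth, zero-based */
    id_put_word(id, 76, 1u << 8);         /* NCQ supported */
}
```

With 32 queues, word 75 holds 31 and word 76 has only bit 8 set. A device
on a controller without NCQ keeps both words at zero, so plain IDE setups
are unaffected.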

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/core.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 9e1d4e6..9496e99 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -217,6 +217,12 @@ static void ide_atapi_identify(IDEState *s)
     put_le16(p + 71, 30); /* in ns */
     put_le16(p + 72, 30); /* in ns */
 
+    if (s->ncq_queues) {
+        put_le16(p + 75, s->ncq_queues - 1);
+        /* NCQ supported */
+        put_le16(p + 76, (1 << 8));
+    }
+
     put_le16(p + 80, 0x1e); /* support up to ATA/ATAPI-4 */
 #ifdef USE_DMA_CDROM
     put_le16(p + 88, 0x3f | (1 << 13)); /* udma5 set and supported */
-- 
1.7.2.3

* [Qemu-devel] [PATCH 18/38] qemu-img: Call error_set_progname
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (16 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 17/38] ide: honor ncq for atapi Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 19/38] qemu-img.c: Re-factor img_create() Kevin Wolf
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

Call error_set_progname() during qemu-img initialization, so that error
messages printed with error_report() use the right prefix.
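
The effect can be sketched in isolation as follows. This is only an
illustration of the idea, not QEMU's qemu-error.c: set_progname() and
format_error() are hypothetical stand-ins, and stripping the directory
part of argv[0] is an assumption about the desired prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Program name used as the prefix of every error message. */
static const char *progname;

/* Remember the basename of argv[0] for later messages. */
static void set_progname(const char *argv0)
{
    const char *p;

    assert(argv0 != NULL);
    p = strrchr(argv0, '/');
    progname = p ? p + 1 : argv0;
}

/* Format "<progname>: <msg>\n" into buf; returns what snprintf returns. */
static int format_error(char *buf, size_t len, const char *msg)
{
    return snprintf(buf, len, "%s: %s\n", progname ? progname : "?", msg);
}
```

Calling set_progname() before any other initialization matters because
the first error may already be reported while parsing the command line.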

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1d936ed..f576cfb 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -23,6 +23,7 @@
  */
 #include "qemu-common.h"
 #include "qemu-option.h"
+#include "qemu-error.h"
 #include "osdep.h"
 #include "sysemu.h"
 #include "block_int.h"
@@ -1612,6 +1613,8 @@ int main(int argc, char **argv)
     const img_cmd_t *cmd;
     const char *cmdname;
 
+    error_set_progname(argv[0]);
+
     bdrv_init();
     if (argc < 2)
         help();
-- 
1.7.2.3

* [Qemu-devel] [PATCH 19/38] qemu-img.c: Re-factor img_create()
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (17 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 18/38] qemu-img: Call error_set_progname Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 20/38] Introduce do_snapshot_blkdev() and monitor command to handle it Kevin Wolf
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

This patch refactors img_create(), moving the code that does the actual
work into block.c, where it can be shared with QEMU. This is needed so
that QEMU can create images to be used for live snapshots.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c    |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h    |    4 ++
 qemu-img.c |  108 +---------------------------------------------
 3 files changed, 147 insertions(+), 106 deletions(-)

diff --git a/block.c b/block.c
index b4aaf41..a48b30c 100644
--- a/block.c
+++ b/block.c
@@ -2758,3 +2758,144 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
     return bs->dirty_count;
 }
+
+int bdrv_img_create(const char *filename, const char *fmt,
+                    const char *base_filename, const char *base_fmt,
+                    char *options, uint64_t img_size, int flags)
+{
+    QEMUOptionParameter *param = NULL, *create_options = NULL;
+    QEMUOptionParameter *backing_fmt;
+    BlockDriverState *bs = NULL;
+    BlockDriver *drv, *proto_drv;
+    int ret = 0;
+
+    /* Find driver and parse its options */
+    drv = bdrv_find_format(fmt);
+    if (!drv) {
+        error_report("Unknown file format '%s'", fmt);
+        ret = -1;
+        goto out;
+    }
+
+    proto_drv = bdrv_find_protocol(filename);
+    if (!proto_drv) {
+        error_report("Unknown protocol '%s'", filename);
+        ret = -1;
+        goto out;
+    }
+
+    create_options = append_option_parameters(create_options,
+                                              drv->create_options);
+    create_options = append_option_parameters(create_options,
+                                              proto_drv->create_options);
+
+    /* Create parameter list with default values */
+    param = parse_option_parameters("", create_options, param);
+
+    set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size);
+
+    /* Parse -o options */
+    if (options) {
+        param = parse_option_parameters(options, create_options, param);
+        if (param == NULL) {
+            error_report("Invalid options for file format '%s'.", fmt);
+            ret = -1;
+            goto out;
+        }
+    }
+
+    if (base_filename) {
+        if (set_option_parameter(param, BLOCK_OPT_BACKING_FILE,
+                                 base_filename)) {
+            error_report("Backing file not supported for file format '%s'",
+                         fmt);
+            ret = -1;
+            goto out;
+        }
+    }
+
+    if (base_fmt) {
+        if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) {
+            error_report("Backing file format not supported for file "
+                         "format '%s'", fmt);
+            ret = -1;
+            goto out;
+        }
+    }
+
+    backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT);
+    if (backing_fmt && backing_fmt->value.s) {
+        if (!bdrv_find_format(backing_fmt->value.s)) {
+            error_report("Unknown backing file format '%s'",
+                         backing_fmt->value.s);
+            ret = -1;
+            goto out;
+        }
+    }
+
+    // The size for the image must always be specified, with one exception:
+    // If we are using a backing file, we can obtain the size from there
+    if (get_option_parameter(param, BLOCK_OPT_SIZE)->value.n == -1) {
+        QEMUOptionParameter *backing_file =
+            get_option_parameter(param, BLOCK_OPT_BACKING_FILE);
+
+        if (backing_file && backing_file->value.s) {
+            uint64_t size;
+            const char *fmt = NULL;
+            char buf[32];
+
+            if (backing_fmt && backing_fmt->value.s) {
+                fmt = backing_fmt->value.s;
+            }
+
+            bs = bdrv_new("");
+
+            ret = bdrv_open(bs, backing_file->value.s, flags, drv);
+            if (ret < 0) {
+                error_report("Could not open '%s'", filename);
+                ret = -1;
+                goto out;
+            }
+            bdrv_get_geometry(bs, &size);
+            size *= 512;
+
+            snprintf(buf, sizeof(buf), "%" PRId64, size);
+            set_option_parameter(param, BLOCK_OPT_SIZE, buf);
+        } else {
+            error_report("Image creation needs a size parameter");
+            ret = -1;
+            goto out;
+        }
+    }
+
+    printf("Formatting '%s', fmt=%s ", filename, fmt);
+    print_option_parameters(param);
+    puts("");
+
+    ret = bdrv_create(drv, filename, param);
+
+    if (ret < 0) {
+        if (ret == -ENOTSUP) {
+            error_report("Formatting or formatting option not supported for "
+                         "file format '%s'", fmt);
+        } else if (ret == -EFBIG) {
+            error_report("The image size is too large for file format '%s'",
+                         fmt);
+        } else {
+            error_report("%s: error while creating %s: %s", filename, fmt,
+                         strerror(-ret));
+        }
+    }
+
+out:
+    free_option_parameters(create_options);
+    free_option_parameters(param);
+
+    if (bs) {
+        bdrv_delete(bs);
+    }
+    if (ret) {
+        return 1;
+    }
+    return 0;
+}
diff --git a/block.h b/block.h
index 78ecfac..b812172 100644
--- a/block.h
+++ b/block.h
@@ -227,6 +227,10 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
                       int64_t pos, int size);
 
+int bdrv_img_create(const char *filename, const char *fmt,
+                    const char *base_filename, const char *base_fmt,
+                    char *options, uint64_t img_size, int flags);
+
 #define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
diff --git a/qemu-img.c b/qemu-img.c
index f576cfb..0ff179f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -288,9 +288,6 @@ static int img_create(int argc, char **argv)
     const char *base_fmt = NULL;
     const char *filename;
     const char *base_filename = NULL;
-    BlockDriver *drv, *proto_drv;
-    QEMUOptionParameter *param = NULL, *create_options = NULL;
-    QEMUOptionParameter *backing_fmt = NULL;
     char *options = NULL;
 
     for(;;) {
@@ -351,110 +348,9 @@ static int img_create(int argc, char **argv)
         goto out;
     }
 
-    /* Find driver and parse its options */
-    drv = bdrv_find_format(fmt);
-    if (!drv) {
-        error("Unknown file format '%s'", fmt);
-        ret = -1;
-        goto out;
-    }
-
-    proto_drv = bdrv_find_protocol(filename);
-    if (!proto_drv) {
-        error("Unknown protocol '%s'", filename);
-        ret = -1;
-        goto out;
-    }
-
-    create_options = append_option_parameters(create_options,
-                                              drv->create_options);
-    create_options = append_option_parameters(create_options,
-                                              proto_drv->create_options);
-
-    /* Create parameter list with default values */
-    param = parse_option_parameters("", create_options, param);
-
-    set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size);
-
-    /* Parse -o options */
-    if (options) {
-        param = parse_option_parameters(options, create_options, param);
-        if (param == NULL) {
-            error("Invalid options for file format '%s'.", fmt);
-            ret = -1;
-            goto out;
-        }
-    }
-
-    /* Add old-style options to parameters */
-    ret = add_old_style_options(fmt, param, base_filename, base_fmt);
-    if (ret < 0) {
-        goto out;
-    }
-
-    backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT);
-    if (backing_fmt && backing_fmt->value.s) {
-        if (!bdrv_find_format(backing_fmt->value.s)) {
-            error("Unknown backing file format '%s'",
-                  backing_fmt->value.s);
-            ret = -1;
-            goto out;
-        }
-    }
-
-    // The size for the image must always be specified, with one exception:
-    // If we are using a backing file, we can obtain the size from there
-    if (get_option_parameter(param, BLOCK_OPT_SIZE)->value.n == -1) {
-
-        QEMUOptionParameter *backing_file =
-            get_option_parameter(param, BLOCK_OPT_BACKING_FILE);
-
-        if (backing_file && backing_file->value.s) {
-            BlockDriverState *bs;
-            uint64_t size;
-            const char *fmt = NULL;
-            char buf[32];
-
-            if (backing_fmt && backing_fmt->value.s) {
-                fmt = backing_fmt->value.s;
-            }
-
-            bs = bdrv_new_open(backing_file->value.s, fmt, BDRV_O_FLAGS);
-            if (!bs) {
-                ret = -1;
-                goto out;
-            }
-            bdrv_get_geometry(bs, &size);
-            size *= 512;
-            bdrv_delete(bs);
-
-            snprintf(buf, sizeof(buf), "%" PRId64, size);
-            set_option_parameter(param, BLOCK_OPT_SIZE, buf);
-        } else {
-            error("Image creation needs a size parameter");
-            ret = -1;
-            goto out;
-        }
-    }
-
-    printf("Formatting '%s', fmt=%s ", filename, fmt);
-    print_option_parameters(param);
-    puts("");
-
-    ret = bdrv_create(drv, filename, param);
-
-    if (ret < 0) {
-        if (ret == -ENOTSUP) {
-            error("Formatting or formatting option not supported for file format '%s'", fmt);
-        } else if (ret == -EFBIG) {
-            error("The image size is too large for file format '%s'", fmt);
-        } else {
-            error("%s: error while creating %s: %s", filename, fmt, strerror(-ret));
-        }
-    }
+    ret = bdrv_img_create(filename, fmt, base_filename, base_fmt,
+                          options, img_size, BDRV_O_FLAGS);
 out:
-    free_option_parameters(create_options);
-    free_option_parameters(param);
     if (ret) {
         return 1;
     }
-- 
1.7.2.3


* [Qemu-devel] [PATCH 20/38] Introduce do_snapshot_blkdev() and monitor command to handle it.
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (18 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 19/38] qemu-img.c: Re-factor img_create() Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 21/38] Prevent creating an image with the same filename as backing file Kevin Wolf
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

The monitor command is:
snapshot_blkdev <device> [snapshot-file] [format]

The default format is qcow2. For now, snapshots without a snapshot-file,
i.e. internal snapshots, are not supported.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
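The argument defaults described above can be sketched standalone as
follows. struct demo_snapshot_args and demo_resolve_snapshot_args() are
hypothetical names used only for illustration. The explicit rejection
of a missing snapshot file and the error codes are choices of this
sketch, not a statement about how the patch itself handles that case.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical container for the resolved command arguments. */
struct demo_snapshot_args {
    const char *device;         /* mandatory */
    const char *snapshot_file;  /* optional on the command line */
    const char *format;         /* optional, defaults to "qcow2" */
};

/*
 * Apply the defaults: a missing format becomes "qcow2"; a missing
 * snapshot file (an internal snapshot) is rejected as unsupported.
 * Returns 0 on success or a negative errno value.
 */
int demo_resolve_snapshot_args(struct demo_snapshot_args *args)
{
    if (!args->device) {
        return -EINVAL;
    }
    if (!args->snapshot_file) {
        return -ENOTSUP;    /* internal snapshots: not supported for now */
    }
    if (!args->format) {
        args->format = "qcow2";
    }
    return 0;
}
```
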
 blockdev.c      |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    1 +
 hmp-commands.hx |   19 ++++++++++++++++
 3 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 3b3b82d..d7add36 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -516,6 +516,68 @@ void do_commit(Monitor *mon, const QDict *qdict)
     }
 }
 
+int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_try_str(qdict, "snapshot_file");
+    const char *format = qdict_get_try_str(qdict, "format");
+    BlockDriverState *bs;
+    BlockDriver *drv, *proto_drv;
+    int ret = 0;
+    int flags;
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, device);
+        ret = -1;
+        goto out;
+    }
+
+    if (!format) {
+        format = "qcow2";
+    }
+
+    drv = bdrv_find_format(format);
+    if (!drv) {
+        qerror_report(QERR_INVALID_BLOCK_FORMAT, format);
+        ret = -1;
+        goto out;
+    }
+
+    proto_drv = bdrv_find_protocol(filename);
+    if (!proto_drv) {
+        qerror_report(QERR_INVALID_BLOCK_FORMAT, format);
+        ret = -1;
+        goto out;
+    }
+
+    ret = bdrv_img_create(filename, format, bs->filename,
+                          bs->drv->format_name, NULL, -1, bs->open_flags);
+    if (ret) {
+        goto out;
+    }
+
+    qemu_aio_flush();
+    bdrv_flush(bs);
+
+    flags = bs->open_flags;
+    bdrv_close(bs);
+    ret = bdrv_open(bs, filename, flags, drv);
+    /*
+     * If reopening the image file we just created fails, we really
+     * are in trouble :(
+     */
+    if (ret != 0) {
+        abort();
+    }
+out:
+    if (ret) {
+        ret = -1;
+    }
+
+    return ret;
+}
+
 static int eject_device(Monitor *mon, BlockDriverState *bs, int force)
 {
     if (!force) {
diff --git a/blockdev.h b/blockdev.h
index 4cb8ca9..4536b5c 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 23024ba..dd3db36 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -801,6 +801,25 @@ STEXI
 Set maximum tolerated downtime (in seconds) for migration.
 ETEXI
 
+    {
+        .name       = "snapshot_blkdev",
+        .args_type  = "device:s,snapshot_file:s?,format:s?",
+        .params     = "device [new-image-file] [format]",
+        .help       = "initiates a live snapshot\n\t\t\t"
+                      "of device. If a new image file is specified, the\n\t\t\t"
+                      "new image file will become the new root image.\n\t\t\t"
+                      "If format is specified, the snapshot file will\n\t\t\t"
+                      "be created in that format. Otherwise the\n\t\t\t"
+                      "snapshot will be internal! (currently unsupported)",
+        .mhandler.cmd_new = do_snapshot_blkdev,
+    },
+
+STEXI
+@item snapshot_blkdev
+@findex snapshot_blkdev
+Snapshot device, using snapshot file as target if provided
+ETEXI
+
 #if defined(TARGET_I386)
     {
         .name       = "drive_add",
-- 
1.7.2.3


* [Qemu-devel] [PATCH 21/38] Prevent creating an image with the same filename as backing file
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (19 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 20/38] Introduce do_snapshot_blkdev() and monitor command to handle it Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 22/38] bdrv_img_create() use proper errno return values Kevin Wolf
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
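As a standalone illustration of the check this patch adds, the sketch
below compares the two names with strcmp(). demo_check_backing_name()
is a hypothetical helper and the -EINVAL return value is this sketch's
choice. Because the comparison is purely textual, two different
spellings of the same path (for example "a.img" and "./a.img") are not
detected.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: refuse to create an image whose filename is
 * textually identical to its backing file. backing_file may be NULL
 * when no backing file is used.
 * Returns 0 if the names differ, -EINVAL if they are the same string.
 */
int demo_check_backing_name(const char *filename, const char *backing_file)
{
    if (backing_file && strcmp(filename, backing_file) == 0) {
        return -EINVAL;
    }
    return 0;
}
```
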
 block.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index a48b30c..0c14eee 100644
--- a/block.c
+++ b/block.c
@@ -2764,7 +2764,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
                     char *options, uint64_t img_size, int flags)
 {
     QEMUOptionParameter *param = NULL, *create_options = NULL;
-    QEMUOptionParameter *backing_fmt;
+    QEMUOptionParameter *backing_fmt, *backing_file;
     BlockDriverState *bs = NULL;
     BlockDriver *drv, *proto_drv;
     int ret = 0;
@@ -2823,6 +2823,16 @@ int bdrv_img_create(const char *filename, const char *fmt,
         }
     }
 
+    backing_file = get_option_parameter(param, BLOCK_OPT_BACKING_FILE);
+    if (backing_file && backing_file->value.s) {
+        if (!strcmp(filename, backing_file->value.s)) {
+            error_report("Error: Trying to create an image with the "
+                         "same filename as the backing file");
+            ret = -1;
+            goto out;
+        }
+    }
+
     backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT);
     if (backing_fmt && backing_fmt->value.s) {
         if (!bdrv_find_format(backing_fmt->value.s)) {
@@ -2836,9 +2846,6 @@ int bdrv_img_create(const char *filename, const char *fmt,
     // The size for the image must always be specified, with one exception:
     // If we are using a backing file, we can obtain the size from there
     if (get_option_parameter(param, BLOCK_OPT_SIZE)->value.n == -1) {
-        QEMUOptionParameter *backing_file =
-            get_option_parameter(param, BLOCK_OPT_BACKING_FILE);
-
         if (backing_file && backing_file->value.s) {
             uint64_t size;
             const char *fmt = NULL;
-- 
1.7.2.3


* [Qemu-devel] [PATCH 22/38] bdrv_img_create() use proper errno return values
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (20 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 21/38] Prevent creating an image with the same filename as backing file Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 23/38] qemu.img.c: Use error_report() instead of own error() implementation Kevin Wolf
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

Kevin suggested having bdrv_img_create() return proper -errno values
on error.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
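A minimal standalone sketch of the -errno convention follows, with a
caller that turns the result into a process exit status the way
img_create() does. demo_create() and demo_exit_status() are
hypothetical names, and the "only qcow2 is known" rule exists purely
to give the sketch a failure path.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical creator: returns 0 on success or a negative errno
 * value describing the failure.
 */
int demo_create(const char *fmt)
{
    if (!fmt || strcmp(fmt, "qcow2") != 0) {
        return -EINVAL;     /* unknown format */
    }
    return 0;
}

/*
 * Caller side: negate the value again for strerror(), and map any
 * failure to exit status 1.
 */
int demo_exit_status(int ret)
{
    if (ret < 0) {
        fprintf(stderr, "demo: creation failed: %s\n", strerror(-ret));
        return 1;
    }
    return 0;
}
```

Returning the negative errno value lets callers tell failure causes
apart, which a plain -1 or 1 cannot express.
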
 block.c |   23 ++++++++++-------------
 1 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 0c14eee..fe07d0b 100644
--- a/block.c
+++ b/block.c
@@ -2773,14 +2773,14 @@ int bdrv_img_create(const char *filename, const char *fmt,
     drv = bdrv_find_format(fmt);
     if (!drv) {
         error_report("Unknown file format '%s'", fmt);
-        ret = -1;
+        ret = -EINVAL;
         goto out;
     }
 
     proto_drv = bdrv_find_protocol(filename);
     if (!proto_drv) {
         error_report("Unknown protocol '%s'", filename);
-        ret = -1;
+        ret = -EINVAL;
         goto out;
     }
 
@@ -2799,7 +2799,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
         param = parse_option_parameters(options, create_options, param);
         if (param == NULL) {
             error_report("Invalid options for file format '%s'.", fmt);
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2809,7 +2809,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
                                  base_filename)) {
             error_report("Backing file not supported for file format '%s'",
                          fmt);
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2818,7 +2818,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
         if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) {
             error_report("Backing file format not supported for file "
                          "format '%s'", fmt);
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2828,7 +2828,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
         if (!strcmp(filename, backing_file->value.s)) {
             error_report("Error: Trying to create an image with the "
                          "same filename as the backing file");
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2838,7 +2838,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
         if (!bdrv_find_format(backing_fmt->value.s)) {
             error_report("Unknown backing file format '%s'",
                          backing_fmt->value.s);
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2860,7 +2860,6 @@ int bdrv_img_create(const char *filename, const char *fmt,
             ret = bdrv_open(bs, backing_file->value.s, flags, drv);
             if (ret < 0) {
                 error_report("Could not open '%s'", filename);
-                ret = -1;
                 goto out;
             }
             bdrv_get_geometry(bs, &size);
@@ -2870,7 +2869,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
             set_option_parameter(param, BLOCK_OPT_SIZE, buf);
         } else {
             error_report("Image creation needs a size parameter");
-            ret = -1;
+            ret = -EINVAL;
             goto out;
         }
     }
@@ -2901,8 +2900,6 @@ out:
     if (bs) {
         bdrv_delete(bs);
     }
-    if (ret) {
-        return 1;
-    }
-    return 0;
+
+    return ret;
 }
-- 
1.7.2.3


* [Qemu-devel] [PATCH 23/38] qemu.img.c: Use error_report() instead of own error() implementation
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (21 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 22/38] bdrv_img_create() use proper errno return values Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 24/38] Remove NULL checks for bdrv_new return value Kevin Wolf
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c |  127 +++++++++++++++++++++++++++++------------------------------
 1 files changed, 62 insertions(+), 65 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 0ff179f..0b871d8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -41,16 +41,6 @@ typedef struct img_cmd_t {
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 
-static void GCC_FMT_ATTR(1, 2) error(const char *fmt, ...)
-{
-    va_list ap;
-    va_start(ap, fmt);
-    fprintf(stderr, "qemu-img: ");
-    vfprintf(stderr, fmt, ap);
-    fprintf(stderr, "\n");
-    va_end(ap);
-}
-
 static void format_print(void *opaque, const char *name)
 {
     printf(" %s", name);
@@ -197,13 +187,13 @@ static int print_block_option_help(const char *filename, const char *fmt)
     /* Find driver and parse its options */
     drv = bdrv_find_format(fmt);
     if (!drv) {
-        error("Unknown file format '%s'", fmt);
+        error_report("Unknown file format '%s'", fmt);
         return 1;
     }
 
     proto_drv = bdrv_find_protocol(filename);
     if (!proto_drv) {
-        error("Unknown protocol '%s'", filename);
+        error_report("Unknown protocol '%s'", filename);
         return 1;
     }
 
@@ -226,30 +216,30 @@ static BlockDriverState *bdrv_new_open(const char *filename,
 
     bs = bdrv_new("");
     if (!bs) {
-        error("Not enough memory");
+        error_report("Not enough memory");
         goto fail;
     }
     if (fmt) {
         drv = bdrv_find_format(fmt);
         if (!drv) {
-            error("Unknown file format '%s'", fmt);
+            error_report("Unknown file format '%s'", fmt);
             goto fail;
         }
     } else {
         drv = NULL;
     }
     if (bdrv_open(bs, filename, flags, drv) < 0) {
-        error("Could not open '%s'", filename);
+        error_report("Could not open '%s'", filename);
         goto fail;
     }
     if (bdrv_is_encrypted(bs)) {
         printf("Disk image '%s' is encrypted.\n", filename);
         if (read_password(password, sizeof(password)) < 0) {
-            error("No password given");
+            error_report("No password given");
             goto fail;
         }
         if (bdrv_set_key(bs, password) < 0) {
-            error("invalid password");
+            error_report("invalid password");
             goto fail;
         }
     }
@@ -267,13 +257,15 @@ static int add_old_style_options(const char *fmt, QEMUOptionParameter *list,
 {
     if (base_filename) {
         if (set_option_parameter(list, BLOCK_OPT_BACKING_FILE, base_filename)) {
-            error("Backing file not supported for file format '%s'", fmt);
+            error_report("Backing file not supported for file format '%s'",
+                         fmt);
             return -1;
         }
     }
     if (base_fmt) {
         if (set_option_parameter(list, BLOCK_OPT_BACKING_FMT, base_fmt)) {
-            error("Backing file format not supported for file format '%s'", fmt);
+            error_report("Backing file format not supported for file "
+                         "format '%s'", fmt);
             return -1;
         }
     }
@@ -310,11 +302,11 @@ static int img_create(int argc, char **argv)
             fmt = optarg;
             break;
         case 'e':
-            error("qemu-img: option -e is deprecated, please use \'-o "
+            error_report("qemu-img: option -e is deprecated, please use \'-o "
                   "encryption\' instead!");
             return 1;
         case '6':
-            error("qemu-img: option -6 is deprecated, please use \'-o "
+            error_report("qemu-img: option -6 is deprecated, please use \'-o "
                   "compat6\' instead!");
             return 1;
         case 'o':
@@ -334,9 +326,9 @@ static int img_create(int argc, char **argv)
         ssize_t sval;
         sval = strtosz_suffix(argv[optind++], NULL, STRTOSZ_DEFSUFFIX_B);
         if (sval < 0) {
-            error("Invalid image size specified! You may use k, M, G or "
+            error_report("Invalid image size specified! You may use k, M, G or "
                   "T suffixes for ");
-            error("kilobytes, megabytes, gigabytes and terabytes.");
+            error_report("kilobytes, megabytes, gigabytes and terabytes.");
             ret = -1;
             goto out;
         }
@@ -400,7 +392,7 @@ static int img_check(int argc, char **argv)
     ret = bdrv_check(bs, &result);
 
     if (ret == -ENOTSUP) {
-        error("This image format does not support checks");
+        error_report("This image format does not support checks");
         bdrv_delete(bs);
         return 1;
     }
@@ -482,16 +474,16 @@ static int img_commit(int argc, char **argv)
         printf("Image committed.\n");
         break;
     case -ENOENT:
-        error("No disk inserted");
+        error_report("No disk inserted");
         break;
     case -EACCES:
-        error("Image is read-only");
+        error_report("Image is read-only");
         break;
     case -ENOTSUP:
-        error("Image is already committed");
+        error_report("Image is already committed");
         break;
     default:
-        error("Error while committing image");
+        error_report("Error while committing image");
         break;
     }
 
@@ -614,11 +606,11 @@ static int img_convert(int argc, char **argv)
             compress = 1;
             break;
         case 'e':
-            error("qemu-img: option -e is deprecated, please use \'-o "
+            error_report("qemu-img: option -e is deprecated, please use \'-o "
                   "encryption\' instead!");
             return 1;
         case '6':
-            error("qemu-img: option -6 is deprecated, please use \'-o "
+            error_report("qemu-img: option -6 is deprecated, please use \'-o "
                   "compat6\' instead!");
             return 1;
         case 'o':
@@ -643,7 +635,8 @@ static int img_convert(int argc, char **argv)
     }
 
     if (bs_n > 1 && out_baseimg) {
-        error("-B makes no sense when concatenating multiple input images");
+        error_report("-B makes no sense when concatenating multiple input "
+                     "images");
         ret = -1;
         goto out;
     }
@@ -654,7 +647,7 @@ static int img_convert(int argc, char **argv)
     for (bs_i = 0; bs_i < bs_n; bs_i++) {
         bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BDRV_O_FLAGS);
         if (!bs[bs_i]) {
-            error("Could not open '%s'", argv[optind + bs_i]);
+            error_report("Could not open '%s'", argv[optind + bs_i]);
             ret = -1;
             goto out;
         }
@@ -664,12 +657,12 @@ static int img_convert(int argc, char **argv)
 
     if (snapshot_name != NULL) {
         if (bs_n > 1) {
-            error("No support for concatenating multiple snapshot\n");
+            error_report("No support for concatenating multiple snapshot\n");
             ret = -1;
             goto out;
         }
         if (bdrv_snapshot_load_tmp(bs[0], snapshot_name) < 0) {
-            error("Failed to load snapshot\n");
+            error_report("Failed to load snapshot\n");
             ret = -1;
             goto out;
         }
@@ -678,14 +671,14 @@ static int img_convert(int argc, char **argv)
     /* Find driver and parse its options */
     drv = bdrv_find_format(out_fmt);
     if (!drv) {
-        error("Unknown file format '%s'", out_fmt);
+        error_report("Unknown file format '%s'", out_fmt);
         ret = -1;
         goto out;
     }
 
     proto_drv = bdrv_find_protocol(out_filename);
     if (!proto_drv) {
-        error("Unknown protocol '%s'", out_filename);
+        error_report("Unknown protocol '%s'", out_filename);
         ret = -1;
         goto out;
     }
@@ -698,7 +691,7 @@ static int img_convert(int argc, char **argv)
     if (options) {
         param = parse_option_parameters(options, create_options, param);
         if (param == NULL) {
-            error("Invalid options for file format '%s'.", out_fmt);
+            error_report("Invalid options for file format '%s'.", out_fmt);
             ret = -1;
             goto out;
         }
@@ -724,13 +717,14 @@ static int img_convert(int argc, char **argv)
             get_option_parameter(param, BLOCK_OPT_ENCRYPT);
 
         if (!drv->bdrv_write_compressed) {
-            error("Compression not supported for this file format");
+            error_report("Compression not supported for this file format");
             ret = -1;
             goto out;
         }
 
         if (encryption && encryption->value.n) {
-            error("Compression and encryption not supported at the same time");
+            error_report("Compression and encryption not supported at "
+                         "the same time");
             ret = -1;
             goto out;
         }
@@ -740,11 +734,14 @@ static int img_convert(int argc, char **argv)
     ret = bdrv_create(drv, out_filename, param);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
-            error("Formatting not supported for file format '%s'", out_fmt);
+            error_report("Formatting not supported for file format '%s'",
+                         out_fmt);
         } else if (ret == -EFBIG) {
-            error("The image size is too large for file format '%s'", out_fmt);
+            error_report("The image size is too large for file format '%s'",
+                         out_fmt);
         } else {
-            error("%s: error while converting %s: %s", out_filename, out_fmt, strerror(-ret));
+            error_report("%s: error while converting %s: %s",
+                         out_filename, out_fmt, strerror(-ret));
         }
         goto out;
     }
@@ -764,12 +761,12 @@ static int img_convert(int argc, char **argv)
     if (compress) {
         ret = bdrv_get_info(out_bs, &bdi);
         if (ret < 0) {
-            error("could not get block driver info");
+            error_report("could not get block driver info");
             goto out;
         }
         cluster_size = bdi.cluster_size;
         if (cluster_size <= 0 || cluster_size > IO_BUF_SIZE) {
-            error("invalid cluster size");
+            error_report("invalid cluster size");
             ret = -1;
             goto out;
         }
@@ -810,7 +807,7 @@ static int img_convert(int argc, char **argv)
 
                 ret = bdrv_read(bs[bs_i], bs_num, buf2, nlow);
                 if (ret < 0) {
-                    error("error while reading");
+                    error_report("error while reading");
                     goto out;
                 }
 
@@ -828,7 +825,7 @@ static int img_convert(int argc, char **argv)
                 ret = bdrv_write_compressed(out_bs, sector_num, buf,
                                             cluster_sectors);
                 if (ret != 0) {
-                    error("error while compressing sector %" PRId64,
+                    error_report("error while compressing sector %" PRId64,
                           sector_num);
                     goto out;
                 }
@@ -887,7 +884,7 @@ static int img_convert(int argc, char **argv)
 
             ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
             if (ret < 0) {
-                error("error while reading");
+                error_report("error while reading");
                 goto out;
             }
             /* NOTE: at the same time we convert, we do not write zero
@@ -906,7 +903,7 @@ static int img_convert(int argc, char **argv)
                     is_allocated_sectors(buf1, n, &n1)) {
                     ret = bdrv_write(out_bs, sector_num, buf1, n1);
                     if (ret < 0) {
-                        error("error while writing");
+                        error_report("error while writing");
                         goto out;
                     }
                 }
@@ -1148,7 +1145,7 @@ static int img_snapshot(int argc, char **argv)
 
         ret = bdrv_snapshot_create(bs, &sn);
         if (ret) {
-            error("Could not create snapshot '%s': %d (%s)",
+            error_report("Could not create snapshot '%s': %d (%s)",
                 snapshot_name, ret, strerror(-ret));
         }
         break;
@@ -1156,7 +1153,7 @@ static int img_snapshot(int argc, char **argv)
     case SNAPSHOT_APPLY:
         ret = bdrv_snapshot_goto(bs, snapshot_name);
         if (ret) {
-            error("Could not apply snapshot '%s': %d (%s)",
+            error_report("Could not apply snapshot '%s': %d (%s)",
                 snapshot_name, ret, strerror(-ret));
         }
         break;
@@ -1164,7 +1161,7 @@ static int img_snapshot(int argc, char **argv)
     case SNAPSHOT_DELETE:
         ret = bdrv_snapshot_delete(bs, snapshot_name);
         if (ret) {
-            error("Could not delete snapshot '%s': %d (%s)",
+            error_report("Could not delete snapshot '%s': %d (%s)",
                 snapshot_name, ret, strerror(-ret));
         }
         break;
@@ -1241,7 +1238,7 @@ static int img_rebase(int argc, char **argv)
     if (!unsafe && bs->backing_format[0] != '\0') {
         old_backing_drv = bdrv_find_format(bs->backing_format);
         if (old_backing_drv == NULL) {
-            error("Invalid format name: '%s'", bs->backing_format);
+            error_report("Invalid format name: '%s'", bs->backing_format);
             ret = -1;
             goto out;
         }
@@ -1250,7 +1247,7 @@ static int img_rebase(int argc, char **argv)
     if (out_basefmt != NULL) {
         new_backing_drv = bdrv_find_format(out_basefmt);
         if (new_backing_drv == NULL) {
-            error("Invalid format name: '%s'", out_basefmt);
+            error_report("Invalid format name: '%s'", out_basefmt);
             ret = -1;
             goto out;
         }
@@ -1269,7 +1266,7 @@ static int img_rebase(int argc, char **argv)
         ret = bdrv_open(bs_old_backing, backing_name, BDRV_O_FLAGS,
                         old_backing_drv);
         if (ret) {
-            error("Could not open old backing file '%s'", backing_name);
+            error_report("Could not open old backing file '%s'", backing_name);
             goto out;
         }
 
@@ -1277,7 +1274,7 @@ static int img_rebase(int argc, char **argv)
         ret = bdrv_open(bs_new_backing, out_baseimg, BDRV_O_FLAGS,
                         new_backing_drv);
         if (ret) {
-            error("Could not open new backing file '%s'", out_baseimg);
+            error_report("Could not open new backing file '%s'", out_baseimg);
             goto out;
         }
     }
@@ -1321,12 +1318,12 @@ static int img_rebase(int argc, char **argv)
             /* Read old and new backing file */
             ret = bdrv_read(bs_old_backing, sector, buf_old, n);
             if (ret < 0) {
-                error("error while reading from old backing file");
+                error_report("error while reading from old backing file");
                 goto out;
             }
             ret = bdrv_read(bs_new_backing, sector, buf_new, n);
             if (ret < 0) {
-                error("error while reading from new backing file");
+                error_report("error while reading from new backing file");
                 goto out;
             }
 
@@ -1342,7 +1339,7 @@ static int img_rebase(int argc, char **argv)
                     ret = bdrv_write(bs, sector + written,
                         buf_old + written * 512, pnum);
                     if (ret < 0) {
-                        error("Error while writing to COW image: %s",
+                        error_report("Error while writing to COW image: %s",
                             strerror(-ret));
                         goto out;
                     }
@@ -1363,10 +1360,10 @@ static int img_rebase(int argc, char **argv)
      */
     ret = bdrv_change_backing_file(bs, out_baseimg, out_basefmt);
     if (ret == -ENOSPC) {
-        error("Could not change the backing file to '%s': No space left in "
-            "the file header", out_baseimg);
+        error_report("Could not change the backing file to '%s': No "
+                     "space left in the file header", out_baseimg);
     } else if (ret < 0) {
-        error("Could not change the backing file to '%s': %s",
+        error_report("Could not change the backing file to '%s': %s",
             out_baseimg, strerror(-ret));
     }
 
@@ -1465,7 +1462,7 @@ static int img_resize(int argc, char **argv)
         total_size = n;
     }
     if (total_size <= 0) {
-        error("New image size must be positive");
+        error_report("New image size must be positive");
         ret = -1;
         goto out;
     }
@@ -1476,13 +1473,13 @@ static int img_resize(int argc, char **argv)
         printf("Image resized.\n");
         break;
     case -ENOTSUP:
-        error("This image format does not support resize");
+        error_report("This image format does not support resize");
         break;
     case -EACCES:
-        error("Image is read-only");
+        error_report("Image is read-only");
         break;
     default:
-        error("Error resizing image (%d)", -ret);
+        error_report("Error resizing image (%d)", -ret);
         break;
     }
 out:
-- 
1.7.2.3


* [Qemu-devel] [PATCH 24/38] Remove NULL checks for bdrv_new return value
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (22 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 23/38] qemu.img.c: Use error_report() instead of own error() implementation Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 25/38] ide: Register vm change state handler once only Kevin Wolf
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

It's an indirect call to qemu_malloc, which never returns an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/xen_disk.c |   17 ++++++-----------
 qemu-img.c    |    5 +----
 qemu-io.c     |    2 --
 qemu-nbd.c    |    2 --
 4 files changed, 7 insertions(+), 19 deletions(-)
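
The reasoning in the commit message can be illustrated with a small
standalone sketch. It is not QEMU code: xmalloc_or_abort() and
demo_state_new() are hypothetical names that only mimic an allocator which
aborts on failure, so that a constructor built on it can never hand back
NULL and callers have nothing to check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: aborts instead of ever returning NULL. */
static void *xmalloc_or_abort(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (!p) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}

/* Hypothetical object; stands in for any state allocated this way. */
struct demo_state {
    char name[32];
};

/* The constructor cannot fail: every failure path ends in abort() above. */
static struct demo_state *demo_state_new(const char *name)
{
    struct demo_state *s = xmalloc_or_abort(sizeof(*s));

    memset(s, 0, sizeof(*s));
    snprintf(s->name, sizeof(s->name), "%s", name);
    return s;
}
```

With such a constructor, an "if (!s)" branch in the caller is dead code,
which is the kind of check this patch removes.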

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 85a1c85..ed9e5eb 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -634,17 +634,12 @@ static int blk_init(struct XenDevice *xendev)
     if (!blkdev->dinfo) {
         /* setup via xenbus -> create new block driver instance */
         xen_be_printf(&blkdev->xendev, 2, "create new bdrv (xenbus setup)\n");
-	blkdev->bs = bdrv_new(blkdev->dev);
-	if (blkdev->bs) {
-	    if (bdrv_open(blkdev->bs, blkdev->filename, qflags,
-                           bdrv_find_whitelisted_format(blkdev->fileproto))
-                != 0) {
-		bdrv_delete(blkdev->bs);
-		blkdev->bs = NULL;
-	    }
-	}
-	if (!blkdev->bs)
-	    return -1;
+        blkdev->bs = bdrv_new(blkdev->dev);
+        if (bdrv_open(blkdev->bs, blkdev->filename, qflags,
+                      bdrv_find_whitelisted_format(blkdev->fileproto)) != 0) {
+            bdrv_delete(blkdev->bs);
+            return -1;
+        }
     } else {
         /* setup via qemu cmdline -> already setup for us */
         xen_be_printf(&blkdev->xendev, 2, "get configured bdrv (cmdline setup)\n");
diff --git a/qemu-img.c b/qemu-img.c
index 0b871d8..afd9ed2 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -215,10 +215,7 @@ static BlockDriverState *bdrv_new_open(const char *filename,
     char password[256];
 
     bs = bdrv_new("");
-    if (!bs) {
-        error_report("Not enough memory");
-        goto fail;
-    }
+
     if (fmt) {
         drv = bdrv_find_format(fmt);
         if (!drv) {
diff --git a/qemu-io.c b/qemu-io.c
index ff353eb..0f6d1b6 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1509,8 +1509,6 @@ static int openfile(char *name, int flags, int growable)
 		}
 	} else {
 		bs = bdrv_new("hda");
-		if (!bs)
-			return 1;
 
 		if (bdrv_open(bs, name, flags, NULL) < 0) {
 			fprintf(stderr, "%s: can't open device %s\n", progname, name);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 99f1d22..e858033 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -336,8 +336,6 @@ int main(int argc, char **argv)
     bdrv_init();
 
     bs = bdrv_new("hda");
-    if (bs == NULL)
-        return 1;
 
     if ((ret = bdrv_open(bs, argv[optind], flags, NULL)) < 0) {
         errno = -ret;
-- 
1.7.2.3


* [Qemu-devel] [PATCH 25/38] ide: Register vm change state handler once only
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (23 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 24/38] Remove NULL checks for bdrv_new return value Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 26/38] block: add discard support Kevin Wolf
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

We register the vm change state handler in a PCI BAR map() function.
This function can be called multiple times throughout the lifetime of a
PCI IDE device.  This results in duplicate vm change state handlers
being registered, none of which are ever unregistered.

Instead, register the vm change state handler in the device's init
function once and for all.

piix tested, cmd646 and via not tested.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/ide/cmd646.c |   18 ++++++++++--------
 hw/ide/piix.c   |   34 ++++++++++++++++++++++++----------
 hw/ide/via.c    |   34 ++++++++++++++++++++++++----------
 3 files changed, 58 insertions(+), 28 deletions(-)
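
The problem described above can be reproduced with a toy handler list. This
is only a sketch under assumed names (add_handler, notify_all, demo_dev and
friends are all hypothetical, not QEMU API): a device that registers in its
map function gets one callback per map call, while a device that registers
in init gets exactly one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 16

typedef void (*state_cb)(void *opaque, int running);

struct handler_entry {
    state_cb cb;
    void *opaque;
};

static struct handler_entry handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Append a handler; nothing ever removes one, as in the commit message. */
static void add_handler(state_cb cb, void *opaque)
{
    assert(nr_handlers < MAX_HANDLERS);
    handlers[nr_handlers].cb = cb;
    handlers[nr_handlers].opaque = opaque;
    nr_handlers++;
}

/* Invoke every registered handler once. */
static void notify_all(int running)
{
    for (size_t i = 0; i < nr_handlers; i++) {
        handlers[i].cb(handlers[i].opaque, running);
    }
}

/* Callback that counts how often it fires for a given device. */
static void count_cb(void *opaque, int running)
{
    (void)running;
    (*(int *)opaque)++;
}

struct demo_dev {
    int restarts;
};

/* Old pattern: every map call adds another handler. */
static void demo_map_registering(struct demo_dev *d)
{
    add_handler(count_cb, &d->restarts);
}

/* New pattern: register once in init, map no longer registers anything. */
static void demo_init(struct demo_dev *d)
{
    d->restarts = 0;
    add_handler(count_cb, &d->restarts);
}

static void demo_map(struct demo_dev *d)
{
    (void)d;
}
```

Calling demo_map_registering() twice and then notify_all() fires the
callback twice for one state change; the init-based device sees it once no
matter how often demo_map() runs.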

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index e191ee6..89ba836 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -167,10 +167,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        bmdma_init(&d->bus[i], bm);
-        bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
-                                         &bm->dma);
 
         if (i == 0) {
             register_ioport_write(addr, 4, 1, bmdma_writeb_0, d);
@@ -228,6 +224,7 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
     PCIIDEState *d = DO_UPCAST(PCIIDEState, dev, dev);
     uint8_t *pci_conf = d->dev.config;
     qemu_irq *irq;
+    int i;
 
     pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_CMD);
     pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_CMD_646);
@@ -253,10 +250,15 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
     pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
     irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
-    ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
-    ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
-    ide_init2(&d->bus[0], irq[0]);
-    ide_init2(&d->bus[1], irq[1]);
+    for (i = 0; i < 2; i++) {
+        ide_bus_new(&d->bus[i], &d->dev.qdev, i);
+        ide_init2(&d->bus[i], irq[i]);
+
+        bmdma_init(&d->bus[i], &d->bmdma[i]);
+        d->bmdma[i].bus = &d->bus[i];
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &d->bmdma[i].dma);
+    }
 
     vmstate_register(&dev->qdev, 0, &vmstate_ide_pci, d);
     qemu_register_reset(cmd646_reset, d);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index a6b5d92..1cad906 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -76,10 +76,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        bmdma_init(&d->bus[i], bm);
-        bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
-                                         &bm->dma);
 
         register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -112,6 +108,29 @@ static void piix3_reset(void *opaque)
     pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */
 }
 
+static void pci_piix_init_ports(PCIIDEState *d) {
+    int i;
+    struct {
+        int iobase;
+        int iobase2;
+        int isairq;
+    } port_info[] = {
+        {0x1f0, 0x3f6, 14},
+        {0x170, 0x376, 15},
+    };
+
+    for (i = 0; i < 2; i++) {
+        ide_bus_new(&d->bus[i], &d->dev.qdev, i);
+        ide_init_ioport(&d->bus[i], port_info[i].iobase, port_info[i].iobase2);
+        ide_init2(&d->bus[i], isa_reserve_irq(port_info[i].isairq));
+
+        bmdma_init(&d->bus[i], &d->bmdma[i]);
+        d->bmdma[i].bus = &d->bus[i];
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &d->bmdma[i].dma);
+    }
+}
+
 static int pci_piix_ide_initfn(PCIIDEState *d)
 {
     uint8_t *pci_conf = d->dev.config;
@@ -125,13 +144,8 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
     vmstate_register(&d->dev.qdev, 0, &vmstate_ide_pci, d);
 
-    ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
-    ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
-    ide_init_ioport(&d->bus[0], 0x1f0, 0x3f6);
-    ide_init_ioport(&d->bus[1], 0x170, 0x376);
+    pci_piix_init_ports(d);
 
-    ide_init2(&d->bus[0], isa_reserve_irq(14));
-    ide_init2(&d->bus[1], isa_reserve_irq(15));
     return 0;
 }
 
diff --git a/hw/ide/via.c b/hw/ide/via.c
index 2603110..5b70bd2 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -78,10 +78,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num,
 
     for(i = 0;i < 2; i++) {
         BMDMAState *bm = &d->bmdma[i];
-        bmdma_init(&d->bus[i], bm);
-        bm->bus = d->bus+i;
-        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
-                                         &bm->dma);
 
         register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm);
 
@@ -135,6 +131,29 @@ static void via_reset(void *opaque)
     pci_set_long(pci_conf + 0xc0, 0x00020001);
 }
 
+static void vt82c686b_init_ports(PCIIDEState *d) {
+    int i;
+    struct {
+        int iobase;
+        int iobase2;
+        int isairq;
+    } port_info[] = {
+        {0x1f0, 0x3f6, 14},
+        {0x170, 0x376, 15},
+    };
+
+    for (i = 0; i < 2; i++) {
+        ide_bus_new(&d->bus[i], &d->dev.qdev, i);
+        ide_init_ioport(&d->bus[i], port_info[i].iobase, port_info[i].iobase2);
+        ide_init2(&d->bus[i], isa_reserve_irq(port_info[i].isairq));
+
+        bmdma_init(&d->bus[i], &d->bmdma[i]);
+        d->bmdma[i].bus = &d->bus[i];
+        qemu_add_vm_change_state_handler(d->bus[i].dma->ops->restart_cb,
+                                         &d->bmdma[i].dma);
+    }
+}
+
 /* via ide func */
 static int vt82c686b_ide_initfn(PCIDevice *dev)
 {
@@ -154,12 +173,7 @@ static int vt82c686b_ide_initfn(PCIDevice *dev)
 
     vmstate_register(&dev->qdev, 0, &vmstate_ide_pci, d);
 
-    ide_bus_new(&d->bus[0], &d->dev.qdev, 0);
-    ide_bus_new(&d->bus[1], &d->dev.qdev, 1);
-    ide_init2(&d->bus[0], isa_reserve_irq(14));
-    ide_init2(&d->bus[1], isa_reserve_irq(15));
-    ide_init_ioport(&d->bus[0], 0x1f0, 0x3f6);
-    ide_init_ioport(&d->bus[1], 0x170, 0x376);
+    vt82c686b_init_ports(d);
 
     return 0;
 }
-- 
1.7.2.3


* [Qemu-devel] [PATCH 26/38] block: add discard support
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (24 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 25/38] ide: Register vm change state handler once only Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 27/38] scsi-disk: support WRITE SAME (16) with unmap bit Kevin Wolf
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Christoph Hellwig <hch@lst.de>

Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discards.  If no discard
granularity is set, discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c     |   11 +++++++++++
 block.h     |    1 +
 block/raw.c |    6 ++++++
 block_int.h |    7 ++++++-
 4 files changed, 24 insertions(+), 1 deletions(-)
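
The dispatch pattern used by bdrv_discard() in the diff below (optional
driver callback, with a missing callback treated as a successful no-op) can
be tried out in isolation. The sketch uses hypothetical demo_* types rather
than the real BlockDriverState, and falls back to ENODEV where ENOMEDIUM is
not defined; both are assumptions made only so the example builds on its
own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV /* assumption: fallback for non-Linux hosts */
#endif

struct demo_bs;

/* Hypothetical driver table with an optional discard callback. */
struct demo_driver {
    int (*discard)(struct demo_bs *bs, int64_t sector_num, int nb_sectors);
};

struct demo_bs {
    const struct demo_driver *drv;
    int64_t discarded_sectors;
};

/* Same shape as the wrapper in the patch: no driver is an error,
 * no callback means "nothing to do", otherwise forward the request. */
static int demo_discard(struct demo_bs *bs, int64_t sector_num, int nb_sectors)
{
    if (!bs->drv) {
        return -ENOMEDIUM;
    }
    if (!bs->drv->discard) {
        return 0;
    }
    return bs->drv->discard(bs, sector_num, nb_sectors);
}

/* Example callback that only records how many sectors were discarded. */
static int counting_discard(struct demo_bs *bs, int64_t sector_num,
                            int nb_sectors)
{
    (void)sector_num;
    bs->discarded_sectors += nb_sectors;
    return 0;
}

static const struct demo_driver drv_with_discard = {
    .discard = counting_discard,
};
static const struct demo_driver drv_without_discard = {
    .discard = NULL,
};
```

Returning 0 for drivers without a callback keeps discard advisory: callers
do not need to know whether the format can actually free the blocks.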

diff --git a/block.c b/block.c
index fe07d0b..9b5e9e1 100644
--- a/block.c
+++ b/block.c
@@ -1515,6 +1515,17 @@ int bdrv_has_zero_init(BlockDriverState *bs)
     return 1;
 }
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+    if (!bs->drv->bdrv_discard) {
+        return 0;
+    }
+    return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+}
+
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
  * not implementing the functionality are assumed to not support backing files,
diff --git a/block.h b/block.h
index b812172..f923add 100644
--- a/block.h
+++ b/block.h
@@ -146,6 +146,7 @@ int bdrv_flush(BlockDriverState *bs);
 void bdrv_flush_all(void);
 void bdrv_close_all(void);
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 	int *pnum);
diff --git a/block/raw.c b/block/raw.c
index 1980deb..b0f72d6 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -65,6 +65,11 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
    return 1; /* everything can be opened as raw image */
 }
 
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    return bdrv_discard(bs->file, sector_num, nb_sectors);
+}
+
 static int raw_is_inserted(BlockDriverState *bs)
 {
     return bdrv_is_inserted(bs->file);
@@ -130,6 +135,7 @@ static BlockDriver bdrv_raw = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush     = raw_aio_flush,
+    .bdrv_discard       = raw_discard,
 
     .bdrv_is_inserted   = raw_is_inserted,
     .bdrv_eject         = raw_eject,
diff --git a/block_int.h b/block_int.h
index 6b3b098..eb5cd42 100644
--- a/block_int.h
+++ b/block_int.h
@@ -72,6 +72,8 @@ struct BlockDriver {
         BlockDriverCompletionFunc *cb, void *opaque);
     BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque);
+    int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors);
 
     int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
         int num_reqs);
@@ -227,6 +229,7 @@ typedef struct BlockConf {
     uint16_t min_io_size;
     uint32_t opt_io_size;
     int32_t bootindex;
+    uint32_t discard_granularity;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -250,6 +253,8 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
                        _conf.physical_block_size, 512),                 \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
     DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
-    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1)         \
+    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1),        \
+    DEFINE_PROP_UINT32("discard_granularity", _state, \
+                       _conf.discard_granularity, 0)
 
 #endif /* BLOCK_INT_H */
-- 
1.7.2.3


* [Qemu-devel] [PATCH 27/38] scsi-disk: support WRITE SAME (16) with unmap bit
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (25 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 26/38] block: add discard support Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 28/38] qemu-io: Add discard command Kevin Wolf
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Christoph Hellwig <hch@lst.de>

Support discards via the WRITE SAME command with the unmap bit set, and
tell the initiator about the support for it via the block limit and the
new thin provisioning EVPD pages.  Also fix the comment which incorrectly
described the block limits EVPD page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/scsi-defs.h |    1 +
 hw/scsi-disk.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 2 deletions(-)
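
Two small pieces of the diff below lend themselves to a standalone check:
the big-endian encoding of the unmap granularity into bytes 28..31 of the
block limits page, and the test of the unmap bit (mask 0x8 in CDB byte 1).
The helper names put_be32, unmap_granularity_blocks and cdb_has_unmap_bit
are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value most significant byte first, as the patch does
 * by hand for outbuf[28..31]. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (v >> 24) & 0xff;
    p[1] = (v >> 16) & 0xff;
    p[2] = (v >> 8) & 0xff;
    p[3] = v & 0xff;
}

/* Convert a granularity in bytes to a count of logical blocks. */
static uint32_t unmap_granularity_blocks(uint32_t granularity_bytes,
                                         uint32_t block_size)
{
    assert(block_size != 0);
    return granularity_bytes / block_size;
}

/* Nonzero if the unmap bit (0x08 in byte 1) is set in the CDB. */
static int cdb_has_unmap_bit(const uint8_t *cdb)
{
    return (cdb[1] & 0x08) != 0;
}
```

For example, a 64 KiB granularity on 512-byte blocks is 128 blocks, which
ends up as 00 00 00 80 in the four page bytes.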

diff --git a/hw/scsi-defs.h b/hw/scsi-defs.h
index 1473ecb..413cce0 100644
--- a/hw/scsi-defs.h
+++ b/hw/scsi-defs.h
@@ -84,6 +84,7 @@
 #define MODE_SENSE_10         0x5a
 #define PERSISTENT_RESERVE_IN 0x5e
 #define PERSISTENT_RESERVE_OUT 0x5f
+#define WRITE_SAME_16         0x93
 #define MAINTENANCE_IN        0xa3
 #define MAINTENANCE_OUT       0xa4
 #define MOVE_MEDIUM           0xa5
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 87f9e86..6cb317c 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
             outbuf[buflen++] = 0x80; // unit serial number
             outbuf[buflen++] = 0x83; // device identification
             if (bdrv_get_type_hint(s->bs) != BDRV_TYPE_CDROM) {
-                outbuf[buflen++] = 0xb0; // block device characteristics
+                outbuf[buflen++] = 0xb0; // block limits
+                outbuf[buflen++] = 0xb2; // thin provisioning
             }
             outbuf[pages] = buflen - pages - 1; // number of pages
             break;
@@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
             buflen += id_len;
             break;
         }
-        case 0xb0: /* block device characteristics */
+        case 0xb0: /* block limits */
         {
+            unsigned int unmap_sectors =
+                    s->qdev.conf.discard_granularity / s->qdev.blocksize;
             unsigned int min_io_size =
                     s->qdev.conf.min_io_size / s->qdev.blocksize;
             unsigned int opt_io_size =
@@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
             outbuf[13] = (opt_io_size >> 16) & 0xff;
             outbuf[14] = (opt_io_size >> 8) & 0xff;
             outbuf[15] = opt_io_size & 0xff;
+
+            /* optimal unmap granularity */
+            outbuf[28] = (unmap_sectors >> 24) & 0xff;
+            outbuf[29] = (unmap_sectors >> 16) & 0xff;
+            outbuf[30] = (unmap_sectors >> 8) & 0xff;
+            outbuf[31] = unmap_sectors & 0xff;
+            break;
+        }
+        case 0xb2: /* thin provisioning */
+        {
+            outbuf[3] = buflen = 8;
+            outbuf[4] = 0;
+            outbuf[5] = 0x40; /* write same with unmap supported */
+            outbuf[6] = 0;
+            outbuf[7] = 0;
             break;
         }
         default:
@@ -959,6 +977,12 @@ static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf)
             outbuf[11] = 0;
             outbuf[12] = 0;
             outbuf[13] = get_physical_block_exp(&s->qdev.conf);
+
+            /* set TPE bit if the format supports discard */
+            if (s->qdev.conf.discard_granularity) {
+                outbuf[14] = 0x80;
+            }
+
             /* Protection, exponent and lowest lba field left blank. */
             buflen = req->cmd.xfer;
             break;
@@ -1123,6 +1147,31 @@ static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
             goto illegal_lba;
         }
         break;
+    case WRITE_SAME_16:
+        len = r->req.cmd.xfer / d->blocksize;
+
+        DPRINTF("WRITE SAME(16) (sector %" PRId64 ", count %d)\n",
+                r->req.cmd.lba, len);
+
+        if (r->req.cmd.lba > s->max_lba) {
+            goto illegal_lba;
+        }
+
+        /*
+         * We only support WRITE SAME with the unmap bit set for now.
+         */
+        if (!(buf[1] & 0x8)) {
+            goto fail;
+        }
+
+        rc = bdrv_discard(s->bs, r->req.cmd.lba * s->cluster_size,
+                          len * s->cluster_size);
+        if (rc < 0) {
+            /* XXX: better error code ?*/
+            goto fail;
+        }
+
+        break;
     default:
         DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
     fail:
-- 
1.7.2.3


* [Qemu-devel] [PATCH 28/38] qemu-io: Add discard command
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (26 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 27/38] scsi-disk: support WRITE SAME (16) with unmap bit Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 29/38] raw-posix: add discard support Kevin Wolf
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

discard [-Cq] off len -- discards a number of bytes at a specified
offset

 discards a range of bytes from the given offset

 Example:
 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file

 Discards a segment of the currently open file.
 -C, -- report statistics in a machine parsable format
 -q, -- quite mode, do not show I/O statistics

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-io.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 88 insertions(+), 0 deletions(-)
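
The example 'discard 512 1k' relies on size arguments that accept a
suffix. How qemu-io's own cvtnum() parses them is not shown in this patch,
so the following is only a sketch of one plausible parser: parse_size() is
a hypothetical helper, and the choice of binary multiples (1k = 1024) is an
assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "512", "1k", "2M", "1g" into a byte count; return -1 on error.
 * Hypothetical helper, not the parser qemu-io actually uses. */
static int64_t parse_size(const char *s)
{
    char *end;
    long long v;
    int shift = 0;

    if (!isdigit((unsigned char)s[0])) {
        return -1; /* rejects empty strings, signs and leading spaces */
    }

    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno != 0 || end == s || v < 0) {
        return -1;
    }

    switch (tolower((unsigned char)*end)) {
    case 'k':
        shift = 10;
        end++;
        break;
    case 'm':
        shift = 20;
        end++;
        break;
    case 'g':
        shift = 30;
        end++;
        break;
    default:
        break;
    }

    if (*end != '\0') {
        return -1; /* trailing garbage after the number or suffix */
    }
    if (v > (INT64_MAX >> shift)) {
        return -1; /* would overflow when scaled */
    }
    return (int64_t)v << shift;
}
```

Like the command in the diff, the sketch reports failure with a negative
value, so a caller can reuse the "< 0" check for both arguments.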

diff --git a/qemu-io.c b/qemu-io.c
index 0f6d1b6..2318a28 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1394,6 +1394,93 @@ static const cmdinfo_t info_cmd = {
 	.oneline	= "prints information about the current file",
 };
 
+static void
+discard_help(void)
+{
+	printf(
+"\n"
+" discards a range of bytes from the given offset\n"
+"\n"
+" Example:\n"
+" 'discard 512 1k' - discards 1 kilobyte from 512 bytes into the file\n"
+"\n"
+" Discards a segment of the currently open file.\n"
+" -C, -- report statistics in a machine parsable format\n"
+" -q, -- quite mode, do not show I/O statistics\n"
+"\n");
+}
+
+static int discard_f(int argc, char **argv);
+
+static const cmdinfo_t discard_cmd = {
+	.name		= "discard",
+	.altname	= "d",
+	.cfunc		= discard_f,
+	.argmin		= 2,
+	.argmax		= -1,
+	.args		= "[-Cq] off len",
+	.oneline	= "discards a number of bytes at a specified offset",
+	.help		= discard_help,
+};
+
+static int
+discard_f(int argc, char **argv)
+{
+	struct timeval t1, t2;
+	int Cflag = 0, qflag = 0;
+	int c, ret;
+	int64_t offset;
+	int count;
+
+	while ((c = getopt(argc, argv, "Cq")) != EOF) {
+		switch (c) {
+		case 'C':
+			Cflag = 1;
+			break;
+		case 'q':
+			qflag = 1;
+			break;
+		default:
+			return command_usage(&discard_cmd);
+		}
+	}
+
+	if (optind != argc - 2) {
+		return command_usage(&discard_cmd);
+	}
+
+	offset = cvtnum(argv[optind]);
+	if (offset < 0) {
+		printf("non-numeric offset argument -- %s\n", argv[optind]);
+		return 0;
+	}
+
+	optind++;
+	count = cvtnum(argv[optind]);
+	if (count < 0) {
+		printf("non-numeric length argument -- %s\n", argv[optind]);
+		return 0;
+	}
+
+	gettimeofday(&t1, NULL);
+	ret = bdrv_discard(bs, offset, count);
+	gettimeofday(&t2, NULL);
+
+	if (ret < 0) {
+		printf("discard failed: %s\n", strerror(-ret));
+		goto out;
+	}
+
+	/* Finally, report back -- -C gives a parsable format */
+	if (!qflag) {
+		t2 = tsub(t2, t1);
+		print_report("discard", &t2, offset, count, count, 1, Cflag);
+	}
+
+out:
+	return 0;
+}
+
 static int
 alloc_f(int argc, char **argv)
 {
@@ -1715,6 +1802,7 @@ int main(int argc, char **argv)
 	add_command(&truncate_cmd);
 	add_command(&length_cmd);
 	add_command(&info_cmd);
+	add_command(&discard_cmd);
 	add_command(&alloc_cmd);
 	add_command(&map_cmd);
 
-- 
1.7.2.3


* [Qemu-devel] [PATCH 29/38] raw-posix: add discard support
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (27 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 28/38] qemu-io: Add discard command Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 30/38] qemu-io: Fix typo in help texts Kevin Wolf
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Christoph Hellwig <hch@lst.de>

Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/raw-posix.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 configure         |   26 ++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 0 deletions(-)
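
The hole-punching call itself needs an XFS filesystem, but the conversion
from sectors to the byte range passed in the flock structure can be checked
anywhere. The sketch below uses hypothetical names (byte_range,
sectors_to_bytes, DEMO_SECTOR_BITS) and assumes 512-byte sectors, matching
the "<< 9" in the diff; it mainly shows why nb_sectors is widened to 64
bits before scaling.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_BITS 9 /* assumption: 512-byte sectors, as in the diff */

struct byte_range {
    int64_t start;
    int64_t len;
};

/* Convert a sector range to a byte range. nb_sectors is an int, so it is
 * widened to int64_t first; scaling it as a 32-bit value could overflow
 * for requests of 4 GiB or more. */
static struct byte_range sectors_to_bytes(int64_t sector_num, int nb_sectors)
{
    struct byte_range r;

    assert(sector_num >= 0 && nb_sectors >= 0);
    r.start = sector_num * ((int64_t)1 << DEMO_SECTOR_BITS);
    r.len = (int64_t)nb_sectors * ((int64_t)1 << DEMO_SECTOR_BITS);
    return r;
}
```

A request for 8388608 sectors, for instance, maps to a length of exactly
4 GiB, which no longer fits in 32 bits.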

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 9286fb8..6b72470 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -69,6 +69,10 @@
 #include <sys/diskslice.h>
 #endif
 
+#ifdef CONFIG_XFS
+#include <xfs/xfs.h>
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -120,6 +124,9 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+#ifdef CONFIG_XFS
+    bool is_xfs : 1;
+#endif
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 #endif
     }
 
+#ifdef CONFIG_XFS
+    if (platform_test_xfs_fd(s->fd)) {
+        s->is_xfs = 1;
+    }
+#endif
+
     return 0;
 
 out_free_buf:
@@ -740,6 +753,37 @@ static int raw_flush(BlockDriverState *bs)
     return qemu_fdatasync(s->fd);
 }
 
+#ifdef CONFIG_XFS
+static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
+{
+    struct xfs_flock64 fl;
+
+    memset(&fl, 0, sizeof(fl));
+    fl.l_whence = SEEK_SET;
+    fl.l_start = sector_num << 9;
+    fl.l_len = (int64_t)nb_sectors << 9;
+
+    if (xfsctl(NULL, s->fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
+        DEBUG_BLOCK_PRINT("cannot punch hole (%s)\n", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+#endif
+
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+#ifdef CONFIG_XFS
+    BDRVRawState *s = bs->opaque;
+
+    if (s->is_xfs) {
+        return xfs_discard(s, sector_num, nb_sectors);
+    }
+#endif
+
+    return 0;
+}
 
 static QEMUOptionParameter raw_create_options[] = {
     {
@@ -761,6 +805,7 @@ static BlockDriver bdrv_file = {
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
diff --git a/configure b/configure
index 62defc4..47e4cf0 100755
--- a/configure
+++ b/configure
@@ -288,6 +288,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+xfs=""
 
 gprof="no"
 debug_tcg="no"
@@ -1399,6 +1400,27 @@ EOF
 fi
 
 ##########################################
+# xfsctl() probe, used for raw-posix
+if test "$xfs" != "no" ; then
+  cat > $TMPC << EOF
+#include <xfs/xfs.h>
+int main(void)
+{
+    xfsctl(NULL, 0, 0, NULL);
+    return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    xfs="yes"
+  else
+    if test "$xfs" = "yes" ; then
+      feature_not_found "xfs"
+    fi
+    xfs=no
+  fi
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" != "no" ; then
   vde_libs="-lvdeplug"
@@ -2403,6 +2425,7 @@ echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
 echo "rbd support       $rbd"
+echo "xfsctl support    $xfs"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2548,6 +2571,9 @@ fi
 if test "$uuid" = "yes" ; then
   echo "CONFIG_UUID=y" >> $config_host_mak
 fi
+if test "$xfs" = "yes" ; then
+  echo "CONFIG_XFS=y" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "PKGVERSION=$pkgversion" >>$config_host_mak
-- 
1.7.2.3


* [Qemu-devel] [PATCH 30/38] qemu-io: Fix typo in help texts
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (28 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 29/38] raw-posix: add discard support Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 31/38] docs: Add QED image format specification Kevin Wolf
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-io.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 2318a28..65dee13 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -326,7 +326,7 @@ read_help(void)
 " -l, -- length for pattern verification (only with -P)\n"
 " -p, -- use bdrv_pread to read the file\n"
 " -P, -- use a pattern to verify read data\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 " -s, -- start offset for pattern verification (only with -P)\n"
 " -v, -- dump buffer to standard output\n"
 "\n");
@@ -509,7 +509,7 @@ readv_help(void)
 " -C, -- report statistics in a machine parsable format\n"
 " -P, -- use a pattern to verify read data\n"
 " -v, -- dump buffer to standard output\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
@@ -633,7 +633,7 @@ write_help(void)
 " -p, -- use bdrv_pwrite to write the file\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
@@ -765,7 +765,7 @@ writev_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
@@ -1100,7 +1100,7 @@ aio_read_help(void)
 " -C, -- report statistics in a machine parsable format\n"
 " -P, -- use a pattern to verify read data\n"
 " -v, -- dump buffer to standard output\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
@@ -1198,7 +1198,7 @@ aio_write_help(void)
 " used to ensure all outstanding aio requests have been completed\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
@@ -1406,7 +1406,7 @@ discard_help(void)
 "\n"
 " Discards a segment of the currently open file.\n"
 " -C, -- report statistics in a machine parsable format\n"
-" -q, -- quite mode, do not show I/O statistics\n"
+" -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
 }
 
-- 
1.7.2.3


* [Qemu-devel] [PATCH 31/38] docs: Add QED image format specification
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (29 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 30/38] qemu-io: Fix typo in help texts Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 32/38] qed: Add QEMU Enhanced Disk image format Kevin Wolf
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
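Note on the offset arithmetic in the specification: a logical offset is
split into an L1 index, an L2 index, and a byte offset within the cluster,
where each table holds table_size * cluster_size / 8 entries. The sketch
below is one possible way to compute that split, assuming power-of-2 sizes
as the specification requires. The names qed_pos, table_noffsets, ilog2_u64
and split_logical_offset are hypothetical and do not appear in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the three fields of a logical offset */
struct qed_pos {
    uint64_t l1_index;
    uint64_t l2_index;
    uint64_t byte_offset;
};

/* Number of 64-bit offsets per table (TABLE_NOFFSETS in the specification) */
static uint64_t table_noffsets(uint32_t cluster_size, uint32_t table_size)
{
    return ((uint64_t)table_size * cluster_size) / sizeof(uint64_t);
}

/* Base-2 logarithm of a power of 2 */
static unsigned ilog2_u64(uint64_t v)
{
    unsigned bits = 0;

    while (v >>= 1) {
        bits++;
    }
    return bits;
}

static struct qed_pos split_logical_offset(uint64_t pos, uint32_t cluster_size,
                                           uint32_t table_size)
{
    uint64_t noffsets = table_noffsets(cluster_size, table_size);
    unsigned cluster_bits, table_bits;
    struct qed_pos p;

    /* Both sizes must be powers of 2 for the mask and shift tricks to work */
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    assert(noffsets != 0 && (noffsets & (noffsets - 1)) == 0);

    cluster_bits = ilog2_u64(cluster_size);
    table_bits = ilog2_u64(noffsets);

    p.byte_offset = pos & (cluster_size - 1);
    p.l2_index = (pos >> cluster_bits) & (noffsets - 1);
    /* No mask here: a valid image size keeps the L1 index within the table */
    p.l1_index = pos >> (cluster_bits + table_bits);
    return p;
}
```

With cluster_size=64 KB and table_size=4, each table has 32768 entries, so
the byte offset occupies 16 bits and each index occupies 15 bits.
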
 docs/specs/qed_spec.txt |  130 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 130 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/qed_spec.txt

diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
new file mode 100644
index 0000000..446b5a2
--- /dev/null
+++ b/docs/specs/qed_spec.txt
@@ -0,0 +1,130 @@
+=Specification=
+
+The file format looks like this:
+
+ +----------+----------+----------+-----+
+ | cluster0 | cluster1 | cluster2 | ... |
+ +----------+----------+----------+-----+
+
+The first cluster begins with the '''header'''.  The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file.  A regular cluster may be a '''data cluster''', an '''L2 table''', or an '''L1 table'''.  L1 and L2 tables are composed of one or more contiguous clusters.
+
+Normally the file size will be a multiple of the cluster size.  If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written.  Legitimate extra information should use space between the header and the first regular cluster.
+
+All fields are little-endian.
+
+==Header==
+ Header {
+     uint32_t magic;               /* QED\0 */
+ 
+     uint32_t cluster_size;        /* in bytes */
+     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
+     uint32_t header_size;         /* in clusters */
+ 
+     uint64_t features;            /* format feature bits */
+     uint64_t compat_features;     /* compat feature bits */
+     uint64_t autoclear_features;  /* self-resetting feature bits */
+
+     uint64_t l1_table_offset;     /* in bytes */
+     uint64_t image_size;          /* total logical image size, in bytes */
+ 
+     /* if (features & QED_F_BACKING_FILE) */
+     uint32_t backing_filename_offset; /* in bytes from start of header */
+     uint32_t backing_filename_size;   /* in bytes */
+ }
+
+Field descriptions:
+* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
+* ''table_size'' must be a power of 2 in range [1, 16].
+* ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters.
+* ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps.  They work as follows:
+** An image with unknown ''features'' bits enabled must not be opened.  File format changes that are not backwards-compatible must use ''features'' bits.
+** An image with unknown ''compat_features'' bits enabled can be opened safely.  The unknown features are simply ignored and represent backwards-compatible changes to the file format.
+** An image with unknown ''autoclear_features'' bits enabled can be opened safely after clearing the unknown bits.  This allows for backwards-compatible changes to the file format which degrade gracefully and can be re-enabled by a new program later.
+* ''l1_table_offset'' is the offset of the first byte of the L1 table in the image file and must be a multiple of ''cluster_size''.
+* ''image_size'' is the block device size seen by the guest and must be a multiple of 512 bytes.
+* ''backing_filename_offset'' and ''backing_filename_size'' describe a string in (byte offset, byte size) form.  It is not NUL-terminated and has no alignment constraints.  The string must be stored within the first ''header_size'' clusters.  The backing filename may be an absolute path or relative to the image file.
+
+Feature bits:
+* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.
+* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
+* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk image and no file format autodetection should be attempted.  This should be used to ensure that raw backing files are never detected as an image format if they happen to contain magic constants.
+
+There are currently no defined ''compat_features'' or ''autoclear_features'' bits.
+
+Fields predicated on a feature bit are only used when that feature is set.  The fields always take up header space, regardless of whether or not the feature bit is set.
+
+==Tables==
+
+Tables provide the translation from logical offsets in the block device to cluster offsets in the file.
+
+ #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
+  
+ Table {
+     uint64_t offsets[TABLE_NOFFSETS];
+ }
+
+The tables are organized as follows:
+
+                    +----------+
+                    | L1 table |
+                    +----------+
+               ,------'  |  '------.
+          +----------+   |    +----------+
+          | L2 table |  ...   | L2 table |
+          +----------+        +----------+
+      ,------'  |  '------.
+ +----------+   |    +----------+
+ |   Data   |  ...   |   Data   |
+ +----------+        +----------+
+
+A table is made up of one or more contiguous clusters.  The table_size header field determines table size for an image file.  For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.
+
+The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:
+ header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
+
+L1, L2, and data cluster offsets must be aligned to header.cluster_size.  The following offsets have special meanings:
+
+===L2 table offsets===
+* 0 - unallocated.  The L2 table is not yet allocated.
+
+===Data cluster offsets===
+* 0 - unallocated.  The data cluster is not yet allocated.
+
+Future format extensions may wish to store per-offset information.  The least significant 12 bits of an offset are reserved for this purpose and must be set to zero.  Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.
+
+===Unallocated L2 tables and data clusters===
+Reads to an unallocated area of the image file access the backing file.  If there is no backing file, then zeroes are produced.  The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.
+
+Writes to an unallocated area cause a new data cluster to be allocated, and a new L2 table if that is also unallocated.  The new data cluster is populated with data from the backing file (or zeroes if no backing file) and the data being written.
+
+===Logical offset translation===
+Logical offsets are translated into cluster offsets as follows:
+
+  table_bits table_bits    cluster_bits
+  <--------> <--------> <--------------->
+ +----------+----------+-----------------+
+ | L1 index | L2 index |     byte offset |
+ +----------+----------+-----------------+
+ 
+       Structure of a logical offset
+
+ offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
+ 
+ def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
+   l2_offset = l1_table[l1_index]
+   l2_table = load_table(l2_offset)
+   cluster_offset = l2_table[l2_index] & offset_mask
+   return cluster_offset + byte_offset
+
+==Consistency checking==
+
+This section is informational and included to provide background on the use of the QED_F_NEED_CHECK ''features'' bit.
+
+The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an operation that could leave the image in an inconsistent state if interrupted by a crash or power failure.  A dirty image must be checked on open because its metadata may not be consistent.
+
+Consistency check includes the following invariants:
+# Each cluster is referenced once and only once.  It is an inconsistency to have a cluster referenced more than once by L1 or L2 tables.  A cluster has been leaked if it has no references.
+# Offsets must be within the image file size and must be ''cluster_size'' aligned.
+# Table offsets must be at least ''table_size'' * ''cluster_size'' bytes from the end of the image file so that there is space for the entire table.
+
+The consistency check process starts from ''l1_table_offset'' and scans all L2 tables.  After the check completes with no other errors besides leaks, the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
-- 
1.7.2.3


* [Qemu-devel] [PATCH 32/38] qed: Add QEMU Enhanced Disk image format
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (30 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 31/38] docs: Add QED image format specification Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 33/38] qed: Table, L2 cache, and cluster functions Kevin Wolf
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

This patch introduces the qed on-disk layout and implements image
creation.  Later patches add read/write and other functionality.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
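Note on the validation and feature-bit handling: bdrv_qed_open() rejects
images whose sizes are not powers of 2 within range, refuses unknown
''features'' bits, and clears unknown ''autoclear_features'' bits. The
sketch below restates those three checks in isolation. The function names
and the known_mask parameter are hypothetical; the driver itself uses
QED_FEATURE_MASK and QED_AUTOCLEAR_FEATURE_MASK, whose values are defined
in block/qed.h and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v lies in [min, max] and has exactly one bit set */
static bool is_pow2_in_range(uint32_t v, uint32_t min, uint32_t max)
{
    assert(min >= 1 && min <= max);

    if (v < min || v > max) {
        return false;
    }
    /* A power of 2 shares no bits with its predecessor */
    return (v & (v - 1)) == 0;
}

/* Unknown incompatible feature bits mean the image must not be opened */
static bool has_unknown_features(uint64_t features, uint64_t known_mask)
{
    return (features & ~known_mask) != 0;
}

/* Unknown autoclear bits are dropped; known ones are kept */
static uint64_t clear_unknown_autoclear(uint64_t bits, uint64_t known_mask)
{
    return bits & known_mask;
}
```

For instance, is_pow2_in_range(65536, 4096, 1u << 26) accepts a 64 KB
cluster size against the range given in the specification, while 6000 is
rejected because it is not a power of 2.
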
 Makefile.objs |    1 +
 block/qed.c   |  554 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qed.h   |  148 +++++++++++++++
 block_int.h   |    1 +
 4 files changed, 704 insertions(+), 0 deletions(-)
 create mode 100644 block/qed.c
 create mode 100644 block/qed.h

diff --git a/Makefile.objs b/Makefile.objs
index 72c07dd..50b91e8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -20,6 +20,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-nested-y += qed.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/qed.c b/block/qed.c
new file mode 100644
index 0000000..1436ac4
--- /dev/null
+++ b/block/qed.c
@@ -0,0 +1,554 @@
+/*
+ * QEMU Enhanced Disk Format
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qed.h"
+
+static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
+                          const char *filename)
+{
+    const QEDHeader *header = (const QEDHeader *)buf;
+
+    if (buf_size < sizeof(*header)) {
+        return 0;
+    }
+    if (le32_to_cpu(header->magic) != QED_MAGIC) {
+        return 0;
+    }
+    return 100;
+}
+
+/**
+ * Check whether an image format is raw
+ *
+ * @fmt:    Backing file format, may be NULL
+ */
+static bool qed_fmt_is_raw(const char *fmt)
+{
+    return fmt && strcmp(fmt, "raw") == 0;
+}
+
+static void qed_header_le_to_cpu(const QEDHeader *le, QEDHeader *cpu)
+{
+    cpu->magic = le32_to_cpu(le->magic);
+    cpu->cluster_size = le32_to_cpu(le->cluster_size);
+    cpu->table_size = le32_to_cpu(le->table_size);
+    cpu->header_size = le32_to_cpu(le->header_size);
+    cpu->features = le64_to_cpu(le->features);
+    cpu->compat_features = le64_to_cpu(le->compat_features);
+    cpu->autoclear_features = le64_to_cpu(le->autoclear_features);
+    cpu->l1_table_offset = le64_to_cpu(le->l1_table_offset);
+    cpu->image_size = le64_to_cpu(le->image_size);
+    cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
+    cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
+}
+
+static void qed_header_cpu_to_le(const QEDHeader *cpu, QEDHeader *le)
+{
+    le->magic = cpu_to_le32(cpu->magic);
+    le->cluster_size = cpu_to_le32(cpu->cluster_size);
+    le->table_size = cpu_to_le32(cpu->table_size);
+    le->header_size = cpu_to_le32(cpu->header_size);
+    le->features = cpu_to_le64(cpu->features);
+    le->compat_features = cpu_to_le64(cpu->compat_features);
+    le->autoclear_features = cpu_to_le64(cpu->autoclear_features);
+    le->l1_table_offset = cpu_to_le64(cpu->l1_table_offset);
+    le->image_size = cpu_to_le64(cpu->image_size);
+    le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
+    le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
+}
+
+static int qed_write_header_sync(BDRVQEDState *s)
+{
+    QEDHeader le;
+    int ret;
+
+    qed_header_cpu_to_le(&s->header, &le);
+    ret = bdrv_pwrite(s->bs->file, 0, &le, sizeof(le));
+    if (ret != sizeof(le)) {
+        return ret;
+    }
+    return 0;
+}
+
+static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t table_size)
+{
+    uint64_t table_entries;
+    uint64_t l2_size;
+
+    table_entries = (table_size * cluster_size) / sizeof(uint64_t);
+    l2_size = table_entries * cluster_size;
+
+    return l2_size * table_entries;
+}
+
+static bool qed_is_cluster_size_valid(uint32_t cluster_size)
+{
+    if (cluster_size < QED_MIN_CLUSTER_SIZE ||
+        cluster_size > QED_MAX_CLUSTER_SIZE) {
+        return false;
+    }
+    if (cluster_size & (cluster_size - 1)) {
+        return false; /* not power of 2 */
+    }
+    return true;
+}
+
+static bool qed_is_table_size_valid(uint32_t table_size)
+{
+    if (table_size < QED_MIN_TABLE_SIZE ||
+        table_size > QED_MAX_TABLE_SIZE) {
+        return false;
+    }
+    if (table_size & (table_size - 1)) {
+        return false; /* not power of 2 */
+    }
+    return true;
+}
+
+static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
+                                    uint32_t table_size)
+{
+    if (image_size % BDRV_SECTOR_SIZE != 0) {
+        return false; /* not multiple of sector size */
+    }
+    if (image_size > qed_max_image_size(cluster_size, table_size)) {
+        return false; /* image is too large */
+    }
+    return true;
+}
+
+/**
+ * Read a string of known length from the image file
+ *
+ * @file:       Image file
+ * @offset:     File offset to start of string, in bytes
+ * @n:          String length in bytes
+ * @buf:        Destination buffer
+ * @buflen:     Destination buffer length in bytes
+ * @ret:        0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
+                           char *buf, size_t buflen)
+{
+    int ret;
+    if (n >= buflen) {
+        return -EINVAL;
+    }
+    ret = bdrv_pread(file, offset, buf, n);
+    if (ret < 0) {
+        return ret;
+    }
+    buf[n] = '\0';
+    return 0;
+}
+
+static int bdrv_qed_open(BlockDriverState *bs, int flags)
+{
+    BDRVQEDState *s = bs->opaque;
+    QEDHeader le_header;
+    int64_t file_size;
+    int ret;
+
+    s->bs = bs;
+
+    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        return ret;
+    }
+    ret = 0; /* ret should always be 0 or -errno */
+    qed_header_le_to_cpu(&le_header, &s->header);
+
+    if (s->header.magic != QED_MAGIC) {
+        return -EINVAL;
+    }
+    if (s->header.features & ~QED_FEATURE_MASK) {
+        return -ENOTSUP; /* image uses unsupported feature bits */
+    }
+    if (!qed_is_cluster_size_valid(s->header.cluster_size)) {
+        return -EINVAL;
+    }
+
+    /* Round down file size to the last cluster */
+    file_size = bdrv_getlength(bs->file);
+    if (file_size < 0) {
+        return file_size;
+    }
+    s->file_size = qed_start_of_cluster(s, file_size);
+
+    if (!qed_is_table_size_valid(s->header.table_size)) {
+        return -EINVAL;
+    }
+    if (!qed_is_image_size_valid(s->header.image_size,
+                                 s->header.cluster_size,
+                                 s->header.table_size)) {
+        return -EINVAL;
+    }
+    if (!qed_check_table_offset(s, s->header.l1_table_offset)) {
+        return -EINVAL;
+    }
+
+    s->table_nelems = (s->header.cluster_size * s->header.table_size) /
+                      sizeof(uint64_t);
+    s->l2_shift = ffs(s->header.cluster_size) - 1;
+    s->l2_mask = s->table_nelems - 1;
+    s->l1_shift = s->l2_shift + ffs(s->table_nelems) - 1;
+
+    if ((s->header.features & QED_F_BACKING_FILE)) {
+        if ((uint64_t)s->header.backing_filename_offset +
+            s->header.backing_filename_size >
+            s->header.cluster_size * s->header.header_size) {
+            return -EINVAL;
+        }
+
+        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
+                              s->header.backing_filename_size, bs->backing_file,
+                              sizeof(bs->backing_file));
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
+            pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
+        }
+    }
+
+    /* Reset unknown autoclear feature bits.  This is a backwards
+     * compatibility mechanism that allows images to be opened by older
+     * programs, which "knock out" unknown feature bits.  When an image is
+     * opened by a newer program again it can detect that the autoclear
+     * feature is no longer valid.
+     */
+    if ((s->header.autoclear_features & ~QED_AUTOCLEAR_FEATURE_MASK) != 0 &&
+        !bdrv_is_read_only(bs->file)) {
+        s->header.autoclear_features &= QED_AUTOCLEAR_FEATURE_MASK;
+
+        ret = qed_write_header_sync(s);
+        if (ret) {
+            return ret;
+        }
+
+        /* From here on only known autoclear feature bits are valid */
+        bdrv_flush(bs->file);
+    }
+
+    return ret;
+}
+
+static void bdrv_qed_close(BlockDriverState *bs)
+{
+}
+
+static int bdrv_qed_flush(BlockDriverState *bs)
+{
+    return bdrv_flush(bs->file);
+}
+
+static int qed_create(const char *filename, uint32_t cluster_size,
+                      uint64_t image_size, uint32_t table_size,
+                      const char *backing_file, const char *backing_fmt)
+{
+    QEDHeader header = {
+        .magic = QED_MAGIC,
+        .cluster_size = cluster_size,
+        .table_size = table_size,
+        .header_size = 1,
+        .features = 0,
+        .compat_features = 0,
+        .l1_table_offset = cluster_size,
+        .image_size = image_size,
+    };
+    QEDHeader le_header;
+    uint8_t *l1_table = NULL;
+    size_t l1_size = header.cluster_size * header.table_size;
+    int ret = 0;
+    BlockDriverState *bs = NULL;
+
+    ret = bdrv_create_file(filename, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR | BDRV_O_CACHE_WB);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (backing_file) {
+        header.features |= QED_F_BACKING_FILE;
+        header.backing_filename_offset = sizeof(le_header);
+        header.backing_filename_size = strlen(backing_file);
+
+        if (qed_fmt_is_raw(backing_fmt)) {
+            header.features |= QED_F_BACKING_FORMAT_NO_PROBE;
+        }
+    }
+
+    qed_header_cpu_to_le(&header, &le_header);
+    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        goto out;
+    }
+    ret = bdrv_pwrite(bs, sizeof(le_header), backing_file,
+                      header.backing_filename_size);
+    if (ret < 0) {
+        goto out;
+    }
+
+    l1_table = qemu_mallocz(l1_size);
+    ret = bdrv_pwrite(bs, header.l1_table_offset, l1_table, l1_size);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = 0; /* success */
+out:
+    qemu_free(l1_table);
+    bdrv_delete(bs);
+    return ret;
+}
+
+static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
+{
+    uint64_t image_size = 0;
+    uint32_t cluster_size = QED_DEFAULT_CLUSTER_SIZE;
+    uint32_t table_size = QED_DEFAULT_TABLE_SIZE;
+    const char *backing_file = NULL;
+    const char *backing_fmt = NULL;
+
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            image_size = options->value.n;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+            backing_file = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+            backing_fmt = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
+            if (options->value.n) {
+                cluster_size = options->value.n;
+            }
+        } else if (!strcmp(options->name, BLOCK_OPT_TABLE_SIZE)) {
+            if (options->value.n) {
+                table_size = options->value.n;
+            }
+        }
+        options++;
+    }
+
+    if (!qed_is_cluster_size_valid(cluster_size)) {
+        fprintf(stderr, "QED cluster size must be within range [%u, %u] and power of 2\n",
+                QED_MIN_CLUSTER_SIZE, QED_MAX_CLUSTER_SIZE);
+        return -EINVAL;
+    }
+    if (!qed_is_table_size_valid(table_size)) {
+        fprintf(stderr, "QED table size must be within range [%u, %u] and power of 2\n",
+                QED_MIN_TABLE_SIZE, QED_MAX_TABLE_SIZE);
+        return -EINVAL;
+    }
+    if (!qed_is_image_size_valid(image_size, cluster_size, table_size)) {
+        fprintf(stderr, "QED image size must be a multiple of "
+                        "sector size and no larger than %" PRIu64 " bytes\n",
+                qed_max_image_size(cluster_size, table_size));
+        return -EINVAL;
+    }
+
+    return qed_create(filename, cluster_size, image_size, table_size,
+                      backing_file, backing_fmt);
+}
+
+static int bdrv_qed_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                                  int nb_sectors, int *pnum)
+{
+    return -ENOTSUP;
+}
+
+static int bdrv_qed_make_empty(BlockDriverState *bs)
+{
+    return -ENOTSUP;
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            QEMUIOVector *qiov, int nb_sectors,
+                                            BlockDriverCompletionFunc *cb,
+                                            void *opaque)
+{
+    return NULL;
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             QEMUIOVector *qiov, int nb_sectors,
+                                             BlockDriverCompletionFunc *cb,
+                                             void *opaque)
+{
+    return NULL;
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_flush(BlockDriverState *bs,
+                                            BlockDriverCompletionFunc *cb,
+                                            void *opaque)
+{
+    return bdrv_aio_flush(bs->file, cb, opaque);
+}
+
+static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset)
+{
+    return -ENOTSUP;
+}
+
+static int64_t bdrv_qed_getlength(BlockDriverState *bs)
+{
+    BDRVQEDState *s = bs->opaque;
+    return s->header.image_size;
+}
+
+static int bdrv_qed_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    memset(bdi, 0, sizeof(*bdi));
+    bdi->cluster_size = s->header.cluster_size;
+    return 0;
+}
+
+static int bdrv_qed_change_backing_file(BlockDriverState *bs,
+                                        const char *backing_file,
+                                        const char *backing_fmt)
+{
+    BDRVQEDState *s = bs->opaque;
+    QEDHeader new_header, le_header;
+    void *buffer;
+    size_t buffer_len, backing_file_len;
+    int ret;
+
+    /* Refuse to set backing filename if unknown compat feature bits are
+     * active.  If the image uses an unknown compat feature then we may not
+     * know the layout of data following the header structure and cannot safely
+     * add a new string.
+     */
+    if (backing_file && (s->header.compat_features &
+                         ~QED_COMPAT_FEATURE_MASK)) {
+        return -ENOTSUP;
+    }
+
+    memcpy(&new_header, &s->header, sizeof(new_header));
+
+    new_header.features &= ~(QED_F_BACKING_FILE |
+                             QED_F_BACKING_FORMAT_NO_PROBE);
+
+    /* Adjust feature flags */
+    if (backing_file) {
+        new_header.features |= QED_F_BACKING_FILE;
+
+        if (qed_fmt_is_raw(backing_fmt)) {
+            new_header.features |= QED_F_BACKING_FORMAT_NO_PROBE;
+        }
+    }
+
+    /* Calculate new header size */
+    backing_file_len = 0;
+
+    if (backing_file) {
+        backing_file_len = strlen(backing_file);
+    }
+
+    buffer_len = sizeof(new_header);
+    new_header.backing_filename_offset = buffer_len;
+    new_header.backing_filename_size = backing_file_len;
+    buffer_len += backing_file_len;
+
+    /* Make sure we can rewrite header without failing */
+    if (buffer_len > new_header.header_size * new_header.cluster_size) {
+        return -ENOSPC;
+    }
+
+    /* Prepare new header */
+    buffer = qemu_malloc(buffer_len);
+
+    qed_header_cpu_to_le(&new_header, &le_header);
+    memcpy(buffer, &le_header, sizeof(le_header));
+    buffer_len = sizeof(le_header);
+
+    memcpy(buffer + buffer_len, backing_file, backing_file_len);
+    buffer_len += backing_file_len;
+
+    /* Write new header */
+    ret = bdrv_pwrite_sync(bs->file, 0, buffer, buffer_len);
+    qemu_free(buffer);
+    if (ret == 0) {
+        memcpy(&s->header, &new_header, sizeof(new_header));
+    }
+    return ret;
+}
+
+static int bdrv_qed_check(BlockDriverState *bs, BdrvCheckResult *result)
+{
+    return -ENOTSUP;
+}
+
+static QEMUOptionParameter qed_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size (in bytes)"
+    }, {
+        .name = BLOCK_OPT_BACKING_FILE,
+        .type = OPT_STRING,
+        .help = "File name of a base image"
+    }, {
+        .name = BLOCK_OPT_BACKING_FMT,
+        .type = OPT_STRING,
+        .help = "Image format of the base image"
+    }, {
+        .name = BLOCK_OPT_CLUSTER_SIZE,
+        .type = OPT_SIZE,
+        .help = "Cluster size (in bytes)"
+    }, {
+        .name = BLOCK_OPT_TABLE_SIZE,
+        .type = OPT_SIZE,
+        .help = "L1/L2 table size (in clusters)"
+    },
+    { /* end of list */ }
+};
+
+static BlockDriver bdrv_qed = {
+    .format_name              = "qed",
+    .instance_size            = sizeof(BDRVQEDState),
+    .create_options           = qed_create_options,
+
+    .bdrv_probe               = bdrv_qed_probe,
+    .bdrv_open                = bdrv_qed_open,
+    .bdrv_close               = bdrv_qed_close,
+    .bdrv_create              = bdrv_qed_create,
+    .bdrv_flush               = bdrv_qed_flush,
+    .bdrv_is_allocated        = bdrv_qed_is_allocated,
+    .bdrv_make_empty          = bdrv_qed_make_empty,
+    .bdrv_aio_readv           = bdrv_qed_aio_readv,
+    .bdrv_aio_writev          = bdrv_qed_aio_writev,
+    .bdrv_aio_flush           = bdrv_qed_aio_flush,
+    .bdrv_truncate            = bdrv_qed_truncate,
+    .bdrv_getlength           = bdrv_qed_getlength,
+    .bdrv_get_info            = bdrv_qed_get_info,
+    .bdrv_change_backing_file = bdrv_qed_change_backing_file,
+    .bdrv_check               = bdrv_qed_check,
+};
+
+static void bdrv_qed_init(void)
+{
+    bdrv_register(&bdrv_qed);
+}
+
+block_init(bdrv_qed_init);
diff --git a/block/qed.h b/block/qed.h
new file mode 100644
index 0000000..1f8a125
--- /dev/null
+++ b/block/qed.h
@@ -0,0 +1,148 @@
+/*
+ * QEMU Enhanced Disk Format
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_QED_H
+#define BLOCK_QED_H
+
+#include "block_int.h"
+
+/* The layout of a QED file is as follows:
+ *
+ * +--------+----------+----------+----------+-----+
+ * | header | L1 table | cluster0 | cluster1 | ... |
+ * +--------+----------+----------+----------+-----+
+ *
+ * There is a 2-level pagetable for cluster allocation:
+ *
+ *                     +----------+
+ *                     | L1 table |
+ *                     +----------+
+ *                ,------'  |  '------.
+ *           +----------+   |    +----------+
+ *           | L2 table |  ...   | L2 table |
+ *           +----------+        +----------+
+ *       ,------'  |  '------.
+ *  +----------+   |    +----------+
+ *  |   Data   |  ...   |   Data   |
+ *  +----------+        +----------+
+ *
+ * The L1 table is fixed size and always present.  L2 tables are allocated on
+ * demand.  The L1 table size determines the maximum possible image size; it
+ * can be influenced using the cluster_size and table_size values.
+ *
+ * All fields are little-endian on disk.
+ */
+
+enum {
+    QED_MAGIC = 'Q' | 'E' << 8 | 'D' << 16 | '\0' << 24,
+
+    /* The image supports a backing file */
+    QED_F_BACKING_FILE = 0x01,
+
+    /* The backing file format must not be probed, treat as raw image */
+    QED_F_BACKING_FORMAT_NO_PROBE = 0x04,
+
+    /* Feature bits must be used when the on-disk format changes */
+    QED_FEATURE_MASK = QED_F_BACKING_FILE | /* supported feature bits */
+                       QED_F_BACKING_FORMAT_NO_PROBE,
+    QED_COMPAT_FEATURE_MASK = 0,            /* supported compat feature bits */
+    QED_AUTOCLEAR_FEATURE_MASK = 0,         /* supported autoclear feature bits */
+
+    /* Data is stored in groups of sectors called clusters.  Cluster size must
+     * be large to avoid keeping too much metadata.  I/O requests that have
+     * sub-cluster size will require read-modify-write.
+     */
+    QED_MIN_CLUSTER_SIZE = 4 * 1024, /* in bytes */
+    QED_MAX_CLUSTER_SIZE = 64 * 1024 * 1024,
+    QED_DEFAULT_CLUSTER_SIZE = 64 * 1024,
+
+    /* Allocated clusters are tracked using a 2-level pagetable.  Table size is
+     * a multiple of clusters so large maximum image sizes can be supported
+     * without jacking up the cluster size too much.
+     */
+    QED_MIN_TABLE_SIZE = 1,        /* in clusters */
+    QED_MAX_TABLE_SIZE = 16,
+    QED_DEFAULT_TABLE_SIZE = 4,
+};
+
+typedef struct {
+    uint32_t magic;                 /* QED\0 */
+
+    uint32_t cluster_size;          /* in bytes */
+    uint32_t table_size;            /* for L1 and L2 tables, in clusters */
+    uint32_t header_size;           /* in clusters */
+
+    uint64_t features;              /* format feature bits */
+    uint64_t compat_features;       /* compatible feature bits */
+    uint64_t autoclear_features;    /* self-resetting feature bits */
+
+    uint64_t l1_table_offset;       /* in bytes */
+    uint64_t image_size;            /* total logical image size, in bytes */
+
+    /* if (features & QED_F_BACKING_FILE) */
+    uint32_t backing_filename_offset; /* in bytes from start of header */
+    uint32_t backing_filename_size;   /* in bytes */
+} QEDHeader;
+
+typedef struct {
+    BlockDriverState *bs;           /* device */
+    uint64_t file_size;             /* length of image file, in bytes */
+
+    QEDHeader header;               /* always cpu-endian */
+    uint32_t table_nelems;
+    uint32_t l1_shift;
+    uint32_t l2_shift;
+    uint32_t l2_mask;
+} BDRVQEDState;
+
+/**
+ * Round down to the start of a cluster
+ */
+static inline uint64_t qed_start_of_cluster(BDRVQEDState *s, uint64_t offset)
+{
+    return offset & ~(uint64_t)(s->header.cluster_size - 1);
+}
+
+/**
+ * Test if a cluster offset is valid
+ */
+static inline bool qed_check_cluster_offset(BDRVQEDState *s, uint64_t offset)
+{
+    uint64_t header_size = (uint64_t)s->header.header_size *
+                           s->header.cluster_size;
+
+    if (offset & (s->header.cluster_size - 1)) {
+        return false;
+    }
+    return offset >= header_size && offset < s->file_size;
+}
+
+/**
+ * Test if a table offset is valid
+ */
+static inline bool qed_check_table_offset(BDRVQEDState *s, uint64_t offset)
+{
+    uint64_t end_offset = offset + (s->header.table_size - 1) *
+                          s->header.cluster_size;
+
+    /* Overflow check */
+    if (end_offset <= offset) {
+        return false;
+    }
+
+    return qed_check_cluster_offset(s, offset) &&
+           qed_check_cluster_offset(s, end_offset);
+}
+
+#endif /* BLOCK_QED_H */
diff --git a/block_int.h b/block_int.h
index eb5cd42..12663e8 100644
--- a/block_int.h
+++ b/block_int.h
@@ -37,6 +37,7 @@
 #define BLOCK_OPT_BACKING_FILE  "backing_file"
 #define BLOCK_OPT_BACKING_FMT   "backing_fmt"
 #define BLOCK_OPT_CLUSTER_SIZE  "cluster_size"
+#define BLOCK_OPT_TABLE_SIZE    "table_size"
 #define BLOCK_OPT_PREALLOC      "preallocation"
 
 typedef struct AIOPool {
-- 
1.7.2.3

* [Qemu-devel] [PATCH 33/38] qed: Table, L2 cache, and cluster functions
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (31 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 32/38] qed: Add QEMU Enhanced Disk image format Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 34/38] qed: Read/write support Kevin Wolf
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

This patch adds code to look up data cluster offsets in the image via
the L1/L2 tables.  L2 tables are kept in a write-through in-memory
cache for performance: every read/write requires a lookup, so caching
the tables is essential.
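
For illustration only (not part of this patch): a minimal sketch of how a
byte position in the device might be split into an L1 index and an L2
index, assuming power-of-two cluster and table sizes.  All sketch_* names
are hypothetical, and the shift/mask derivation is an assumption modeled
on the l1_shift, l2_shift and l2_mask fields of BDRVQEDState.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry holder; mirrors the shift/mask fields in
 * BDRVQEDState but is not the driver's own structure. */
typedef struct {
    uint32_t table_nelems;   /* 64-bit offsets per table */
    uint32_t l1_shift;       /* pos >> l1_shift selects the L1 entry */
    uint32_t l2_shift;       /* pos >> l2_shift selects the cluster */
    uint32_t l2_mask;        /* keeps the cluster index within one L2 */
} sketch_geom;

/* log2 of a power of two */
static uint32_t sketch_log2(uint64_t v)
{
    uint32_t n = 0;

    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* cluster_size in bytes, table_size in clusters (both powers of two) */
static void sketch_geom_init(sketch_geom *g, uint32_t cluster_size,
                             uint32_t table_size)
{
    g->table_nelems = (uint32_t)(((uint64_t)cluster_size * table_size) /
                                 sizeof(uint64_t));
    g->l2_shift = sketch_log2(cluster_size);
    g->l2_mask = g->table_nelems - 1;
    g->l1_shift = g->l2_shift + sketch_log2(g->table_nelems);
}

static unsigned int sketch_l1_index(const sketch_geom *g, uint64_t pos)
{
    return (unsigned int)(pos >> g->l1_shift);
}

static unsigned int sketch_l2_index(const sketch_geom *g, uint64_t pos)
{
    return (unsigned int)((pos >> g->l2_shift) & g->l2_mask);
}
```

With the defaults from qed.h (64 KB clusters, 4-cluster tables) this
gives 32768 entries per table, so one L2 table spans 2 GB of the device,
which agrees with the comment above MAX_L2_CACHE_SIZE.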

With cluster lookup code in place it is possible to implement
bdrv_is_allocated() to query the number of contiguous
allocated/unallocated clusters.
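
For illustration only (not part of this patch): a standalone sketch of
the run-counting idea behind that query, operating on a plain array of
offsets instead of a QEDTable.  The function name and parameter list are
hypothetical; a zero offset is taken to mean "unallocated", as in
qed_count_contiguous_clusters().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many entries starting at offsets[index] form one run:
 * either all zero (unallocated), or each exactly cluster_size bytes
 * after the previous one (allocated and contiguous in the image file).
 * At most n entries are examined and the scan never passes nelems.
 * *first is set to the offset stored at offsets[index].
 */
static unsigned int sketch_count_contiguous(const uint64_t *offsets,
                                            unsigned int nelems,
                                            uint32_t cluster_size,
                                            unsigned int index,
                                            unsigned int n,
                                            uint64_t *first)
{
    unsigned int end, i;
    uint64_t last;

    assert(index < nelems && n > 0);

    end = (n < nelems - index) ? index + n : nelems;
    last = offsets[index];
    *first = last;

    for (i = index + 1; i < end; i++) {
        if (last == 0) {
            /* Counting unallocated clusters */
            if (offsets[i] != 0) {
                break;
            }
        } else {
            /* Counting allocated clusters */
            if (offsets[i] != last + cluster_size) {
                break;
            }
            last = offsets[i];
        }
    }
    return i - index;
}
```

A caller would multiply the returned count by the cluster size and clamp
it to the requested length to obtain the number of bytes covered.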

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 Makefile.objs        |    2 +-
 block/qed-cluster.c  |  154 ++++++++++++++++++++++++
 block/qed-gencb.c    |   32 +++++
 block/qed-l2-cache.c |  173 +++++++++++++++++++++++++++
 block/qed-table.c    |  319 ++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qed.c          |   54 +++++++++-
 block/qed.h          |  123 +++++++++++++++++++
 trace-events         |   11 ++
 8 files changed, 866 insertions(+), 2 deletions(-)
 create mode 100644 block/qed-cluster.c
 create mode 100644 block/qed-gencb.c
 create mode 100644 block/qed-l2-cache.c
 create mode 100644 block/qed-table.c
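
Note for readers (illustration only, not part of the patch):
qed_write_table() in block/qed-table.c widens the range of table
elements to whole sectors before issuing the write.  A minimal sketch of
that rounding follows; the sketch_* names are hypothetical and the
512-byte sector size is an assumption standing in for BDRV_SECTOR_SIZE.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_SECTOR_SIZE = 512 };  /* assumed value of BDRV_SECTOR_SIZE */

typedef struct {
    unsigned int start;  /* first element to write (sector aligned) */
    unsigned int end;    /* one past the last element to write */
} sketch_range;

/* Expand [index, index + n) so that it covers whole sectors of
 * 64-bit table elements. */
static sketch_range sketch_align_to_sector(unsigned int index,
                                           unsigned int n)
{
    const unsigned int sector_mask =
        SKETCH_SECTOR_SIZE / sizeof(uint64_t) - 1;
    sketch_range r;

    assert(n > 0);

    r.start = index & ~sector_mask;
    r.end = (index + n + sector_mask) & ~sector_mask;
    return r;
}
```

Under these assumptions a sector holds 64 elements, so updating a single
element still writes one full sector, and an update that straddles a
sector boundary writes two.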

diff --git a/Makefile.objs b/Makefile.objs
index 50b91e8..1860152 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -20,7 +20,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
-block-nested-y += qed.o
+block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/qed-cluster.c b/block/qed-cluster.c
new file mode 100644
index 0000000..0ec864b
--- /dev/null
+++ b/block/qed-cluster.c
@@ -0,0 +1,154 @@
+/*
+ * QEMU Enhanced Disk Format Cluster functions
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qed.h"
+
+/**
+ * Count the number of contiguous data clusters
+ *
+ * @s:              QED state
+ * @table:          L2 table
+ * @index:          First cluster index
+ * @n:              Maximum number of clusters
+ * @offset:         Set to first cluster offset
+ *
+ * This function scans tables for contiguous allocated or free clusters.
+ */
+static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
+                                                  QEDTable *table,
+                                                  unsigned int index,
+                                                  unsigned int n,
+                                                  uint64_t *offset)
+{
+    unsigned int end = MIN(index + n, s->table_nelems);
+    uint64_t last = table->offsets[index];
+    unsigned int i;
+
+    *offset = last;
+
+    for (i = index + 1; i < end; i++) {
+        if (last == 0) {
+            /* Counting free clusters */
+            if (table->offsets[i] != 0) {
+                break;
+            }
+        } else {
+            /* Counting allocated clusters */
+            if (table->offsets[i] != last + s->header.cluster_size) {
+                break;
+            }
+            last = table->offsets[i];
+        }
+    }
+    return i - index;
+}
+
+typedef struct {
+    BDRVQEDState *s;
+    uint64_t pos;
+    size_t len;
+
+    QEDRequest *request;
+
+    /* User callback */
+    QEDFindClusterFunc *cb;
+    void *opaque;
+} QEDFindClusterCB;
+
+static void qed_find_cluster_cb(void *opaque, int ret)
+{
+    QEDFindClusterCB *find_cluster_cb = opaque;
+    BDRVQEDState *s = find_cluster_cb->s;
+    QEDRequest *request = find_cluster_cb->request;
+    uint64_t offset = 0;
+    size_t len = 0;
+    unsigned int index;
+    unsigned int n;
+
+    if (ret) {
+        goto out;
+    }
+
+    index = qed_l2_index(s, find_cluster_cb->pos);
+    n = qed_bytes_to_clusters(s,
+                              qed_offset_into_cluster(s, find_cluster_cb->pos) +
+                              find_cluster_cb->len);
+    n = qed_count_contiguous_clusters(s, request->l2_table->table,
+                                      index, n, &offset);
+
+    ret = offset ? QED_CLUSTER_FOUND : QED_CLUSTER_L2;
+    len = MIN(find_cluster_cb->len, n * s->header.cluster_size -
+              qed_offset_into_cluster(s, find_cluster_cb->pos));
+
+    if (offset && !qed_check_cluster_offset(s, offset)) {
+        ret = -EINVAL;
+    }
+
+out:
+    find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
+    qemu_free(find_cluster_cb);
+}
+
+/**
+ * Find the offset of a data cluster
+ *
+ * @s:          QED state
+ * @request:    Request holding the L2 table cache entry
+ * @pos:        Byte position in device
+ * @len:        Number of bytes
+ * @cb:         Completion function
+ * @opaque:     User data for completion function
+ *
+ * This function translates a position in the block device to an offset in the
+ * image file.  It invokes the cb completion callback to report back the
+ * translated offset or unallocated range in the image file.
+ *
+ * If the L2 table exists, request->l2_table points to the L2 table cache entry
+ * and the caller must free the reference when they are finished.  The cache
+ * entry is exposed in this way to avoid callers having to read the L2 table
+ * again later during request processing.  If request->l2_table is non-NULL it
+ * will be unreferenced before taking on the new cache entry.
+ */
+void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
+                      size_t len, QEDFindClusterFunc *cb, void *opaque)
+{
+    QEDFindClusterCB *find_cluster_cb;
+    uint64_t l2_offset;
+
+    /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
+     * so that a request acts on one L2 table at a time.
+     */
+    len = MIN(len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
+
+    l2_offset = s->l1_table->offsets[qed_l1_index(s, pos)];
+    if (!l2_offset) {
+        cb(opaque, QED_CLUSTER_L1, 0, len);
+        return;
+    }
+    if (!qed_check_table_offset(s, l2_offset)) {
+        cb(opaque, -EINVAL, 0, 0);
+        return;
+    }
+
+    find_cluster_cb = qemu_malloc(sizeof(*find_cluster_cb));
+    find_cluster_cb->s = s;
+    find_cluster_cb->pos = pos;
+    find_cluster_cb->len = len;
+    find_cluster_cb->cb = cb;
+    find_cluster_cb->opaque = opaque;
+    find_cluster_cb->request = request;
+
+    qed_read_l2_table(s, request, l2_offset,
+                      qed_find_cluster_cb, find_cluster_cb);
+}
diff --git a/block/qed-gencb.c b/block/qed-gencb.c
new file mode 100644
index 0000000..1513dc6
--- /dev/null
+++ b/block/qed-gencb.c
@@ -0,0 +1,32 @@
+/*
+ * QEMU Enhanced Disk Format
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qed.h"
+
+void *gencb_alloc(size_t len, BlockDriverCompletionFunc *cb, void *opaque)
+{
+    GenericCB *gencb = qemu_malloc(len);
+    gencb->cb = cb;
+    gencb->opaque = opaque;
+    return gencb;
+}
+
+void gencb_complete(void *opaque, int ret)
+{
+    GenericCB *gencb = opaque;
+    BlockDriverCompletionFunc *cb = gencb->cb;
+    void *user_opaque = gencb->opaque;
+
+    qemu_free(gencb);
+    cb(user_opaque, ret);
+}
diff --git a/block/qed-l2-cache.c b/block/qed-l2-cache.c
new file mode 100644
index 0000000..57518a4
--- /dev/null
+++ b/block/qed-l2-cache.c
@@ -0,0 +1,173 @@
+/*
+ * QEMU Enhanced Disk Format L2 Cache
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+/*
+ * L2 table cache usage is as follows:
+ *
+ * An open image has one L2 table cache that is used to avoid accessing the
+ * image file for recently referenced L2 tables.
+ *
+ * Cluster offset lookup translates the logical offset within the block device
+ * to a cluster offset within the image file.  This is done by indexing into
+ * the L1 and L2 tables which store cluster offsets.  It is here where the L2
+ * table cache serves up recently referenced L2 tables.
+ *
+ * If there is a cache miss, that L2 table is read from the image file and
+ * committed to the cache.  Subsequent accesses to that L2 table will be served
+ * from the cache until the table is evicted from the cache.
+ *
+ * L2 tables are also committed to the cache when new L2 tables are allocated
+ * in the image file.  Since the L2 table cache is write-through, the new L2
+ * table is first written out to the image file and then committed to the
+ * cache.
+ *
+ * Multiple I/O requests may be using an L2 table cache entry at any given
+ * time.  That means an entry may be in use across several requests and
+ * reference counting is needed to free the entry at the correct time.  In
+ * particular, an entry evicted from the cache will only be freed once all
+ * references are dropped.
+ *
+ * An in-flight I/O request will hold a reference to a L2 table cache entry for
+ * the period during which it needs to access the L2 table.  This includes
+ * cluster offset lookup, L2 table allocation, and L2 table update when a new
+ * data cluster has been allocated.
+ *
+ * An interesting case occurs when two requests need to access an L2 table that
+ * is not in the cache.  Since the operation to read the table from the image
+ * file takes some time to complete, both requests may see a cache miss and
+ * start reading the L2 table from the image file.  The first to finish will
+ * commit its L2 table into the cache.  When the second tries to commit, its
+ * table will be deleted in favor of the existing cache entry.
+ */
+
+#include "trace.h"
+#include "qed.h"
+
+/* Each L2 holds 2GB so this lets us fully cache a 100GB disk */
+#define MAX_L2_CACHE_SIZE 50
+
+/**
+ * Initialize the L2 cache
+ */
+void qed_init_l2_cache(L2TableCache *l2_cache)
+{
+    QTAILQ_INIT(&l2_cache->entries);
+    l2_cache->n_entries = 0;
+}
+
+/**
+ * Free the L2 cache
+ */
+void qed_free_l2_cache(L2TableCache *l2_cache)
+{
+    CachedL2Table *entry, *next_entry;
+
+    QTAILQ_FOREACH_SAFE(entry, &l2_cache->entries, node, next_entry) {
+        qemu_vfree(entry->table);
+        qemu_free(entry);
+    }
+}
+
+/**
+ * Allocate an uninitialized entry from the cache
+ *
+ * The returned entry has a reference count of 1 and is owned by the caller.
+ * The caller must allocate the actual table field for this entry and it must
+ * be freeable using qemu_vfree().
+ */
+CachedL2Table *qed_alloc_l2_cache_entry(L2TableCache *l2_cache)
+{
+    CachedL2Table *entry;
+
+    entry = qemu_mallocz(sizeof(*entry));
+    entry->ref++;
+
+    trace_qed_alloc_l2_cache_entry(l2_cache, entry);
+
+    return entry;
+}
+
+/**
+ * Decrease an entry's reference count and free the entry when the reference
+ * count drops to zero.
+ */
+void qed_unref_l2_cache_entry(CachedL2Table *entry)
+{
+    if (!entry) {
+        return;
+    }
+
+    entry->ref--;
+    trace_qed_unref_l2_cache_entry(entry, entry->ref);
+    if (entry->ref == 0) {
+        qemu_vfree(entry->table);
+        qemu_free(entry);
+    }
+}
+
+/**
+ * Find an entry in the L2 cache.  This may return NULL and it's up to the
+ * caller to satisfy the cache miss.
+ *
+ * For a cached entry, this function increases the reference count and returns
+ * the entry.
+ */
+CachedL2Table *qed_find_l2_cache_entry(L2TableCache *l2_cache, uint64_t offset)
+{
+    CachedL2Table *entry;
+
+    QTAILQ_FOREACH(entry, &l2_cache->entries, node) {
+        if (entry->offset == offset) {
+            trace_qed_find_l2_cache_entry(l2_cache, entry, offset, entry->ref);
+            entry->ref++;
+            return entry;
+        }
+    }
+    return NULL;
+}
+
+/**
+ * Commit an L2 cache entry into the cache.  This is meant to be used as part of
+ * the process to satisfy a cache miss.  A caller would allocate an entry which
+ * is not actually in the L2 cache and then once the entry was valid and
+ * present on disk, the entry can be committed into the cache.
+ *
+ * Since the cache is write-through, it's important that this function is not
+ * called until the entry is present on disk and the L1 has been updated to
+ * point to the entry.
+ *
+ * N.B. This function steals a reference to the l2_table from the caller so the
+ * caller must obtain a new reference by issuing a call to
+ * qed_find_l2_cache_entry().
+ */
+void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table)
+{
+    CachedL2Table *entry;
+
+    entry = qed_find_l2_cache_entry(l2_cache, l2_table->offset);
+    if (entry) {
+        qed_unref_l2_cache_entry(entry);
+        qed_unref_l2_cache_entry(l2_table);
+        return;
+    }
+
+    if (l2_cache->n_entries >= MAX_L2_CACHE_SIZE) {
+        entry = QTAILQ_FIRST(&l2_cache->entries);
+        QTAILQ_REMOVE(&l2_cache->entries, entry, node);
+        l2_cache->n_entries--;
+        qed_unref_l2_cache_entry(entry);
+    }
+
+    l2_cache->n_entries++;
+    QTAILQ_INSERT_TAIL(&l2_cache->entries, l2_table, node);
+}
diff --git a/block/qed-table.c b/block/qed-table.c
new file mode 100644
index 0000000..d38c673
--- /dev/null
+++ b/block/qed-table.c
@@ -0,0 +1,319 @@
+/*
+ * QEMU Enhanced Disk Format Table I/O
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "qemu_socket.h" /* for EINPROGRESS on Windows */
+#include "qed.h"
+
+typedef struct {
+    GenericCB gencb;
+    BDRVQEDState *s;
+    QEDTable *table;
+
+    struct iovec iov;
+    QEMUIOVector qiov;
+} QEDReadTableCB;
+
+static void qed_read_table_cb(void *opaque, int ret)
+{
+    QEDReadTableCB *read_table_cb = opaque;
+    QEDTable *table = read_table_cb->table;
+    int noffsets = read_table_cb->iov.iov_len / sizeof(uint64_t);
+    int i;
+
+    /* Handle I/O error */
+    if (ret) {
+        goto out;
+    }
+
+    /* Byteswap offsets */
+    for (i = 0; i < noffsets; i++) {
+        table->offsets[i] = le64_to_cpu(table->offsets[i]);
+    }
+
+out:
+    /* Completion */
+    trace_qed_read_table_cb(read_table_cb->s, read_table_cb->table, ret);
+    gencb_complete(&read_table_cb->gencb, ret);
+}
+
+static void qed_read_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
+                           BlockDriverCompletionFunc *cb, void *opaque)
+{
+    QEDReadTableCB *read_table_cb = gencb_alloc(sizeof(*read_table_cb),
+                                                cb, opaque);
+    QEMUIOVector *qiov = &read_table_cb->qiov;
+    BlockDriverAIOCB *aiocb;
+
+    trace_qed_read_table(s, offset, table);
+
+    read_table_cb->s = s;
+    read_table_cb->table = table;
+    read_table_cb->iov.iov_base = table->offsets;
+    read_table_cb->iov.iov_len = s->header.cluster_size * s->header.table_size;
+
+    qemu_iovec_init_external(qiov, &read_table_cb->iov, 1);
+    aiocb = bdrv_aio_readv(s->bs->file, offset / BDRV_SECTOR_SIZE, qiov,
+                           read_table_cb->iov.iov_len / BDRV_SECTOR_SIZE,
+                           qed_read_table_cb, read_table_cb);
+    if (!aiocb) {
+        qed_read_table_cb(read_table_cb, -EIO);
+    }
+}
+
+typedef struct {
+    GenericCB gencb;
+    BDRVQEDState *s;
+    QEDTable *orig_table;
+    QEDTable *table;
+    bool flush;             /* flush after write? */
+
+    struct iovec iov;
+    QEMUIOVector qiov;
+} QEDWriteTableCB;
+
+static void qed_write_table_cb(void *opaque, int ret)
+{
+    QEDWriteTableCB *write_table_cb = opaque;
+
+    trace_qed_write_table_cb(write_table_cb->s,
+                             write_table_cb->orig_table,
+                             write_table_cb->flush,
+                             ret);
+
+    if (ret) {
+        goto out;
+    }
+
+    if (write_table_cb->flush) {
+        /* We still need to flush first */
+        write_table_cb->flush = false;
+        bdrv_aio_flush(write_table_cb->s->bs, qed_write_table_cb,
+                       write_table_cb);
+        return;
+    }
+
+out:
+    qemu_vfree(write_table_cb->table);
+    gencb_complete(&write_table_cb->gencb, ret);
+    return;
+}
+
+/**
+ * Write out an updated part or all of a table
+ *
+ * @s:          QED state
+ * @offset:     Offset of table in image file, in bytes
+ * @table:      Table
+ * @index:      Index of first element
+ * @n:          Number of elements
+ * @flush:      Whether or not to sync to disk
+ * @cb:         Completion function
+ * @opaque:     Argument for completion function
+ */
+static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
+                            unsigned int index, unsigned int n, bool flush,
+                            BlockDriverCompletionFunc *cb, void *opaque)
+{
+    QEDWriteTableCB *write_table_cb;
+    BlockDriverAIOCB *aiocb;
+    unsigned int sector_mask = BDRV_SECTOR_SIZE / sizeof(uint64_t) - 1;
+    unsigned int start, end, i;
+    size_t len_bytes;
+
+    trace_qed_write_table(s, offset, table, index, n);
+
+    /* Calculate indices of the first and one after last elements */
+    start = index & ~sector_mask;
+    end = (index + n + sector_mask) & ~sector_mask;
+
+    len_bytes = (end - start) * sizeof(uint64_t);
+
+    write_table_cb = gencb_alloc(sizeof(*write_table_cb), cb, opaque);
+    write_table_cb->s = s;
+    write_table_cb->orig_table = table;
+    write_table_cb->flush = flush;
+    write_table_cb->table = qemu_blockalign(s->bs, len_bytes);
+    write_table_cb->iov.iov_base = write_table_cb->table->offsets;
+    write_table_cb->iov.iov_len = len_bytes;
+    qemu_iovec_init_external(&write_table_cb->qiov, &write_table_cb->iov, 1);
+
+    /* Byteswap table */
+    for (i = start; i < end; i++) {
+        uint64_t le_offset = cpu_to_le64(table->offsets[i]);
+        write_table_cb->table->offsets[i - start] = le_offset;
+    }
+
+    /* Adjust for offset into table */
+    offset += start * sizeof(uint64_t);
+
+    aiocb = bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
+                            &write_table_cb->qiov,
+                            write_table_cb->iov.iov_len / BDRV_SECTOR_SIZE,
+                            qed_write_table_cb, write_table_cb);
+    if (!aiocb) {
+        qed_write_table_cb(write_table_cb, -EIO);
+    }
+}
+
+/**
+ * Propagate return value from async callback
+ */
+static void qed_sync_cb(void *opaque, int ret)
+{
+    *(int *)opaque = ret;
+}
+
+int qed_read_l1_table_sync(BDRVQEDState *s)
+{
+    int ret = -EINPROGRESS;
+
+    async_context_push();
+
+    qed_read_table(s, s->header.l1_table_offset,
+                   s->l1_table, qed_sync_cb, &ret);
+    while (ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    async_context_pop();
+
+    return ret;
+}
+
+void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
+                        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BLKDBG_EVENT(s->bs->file, BLKDBG_L1_UPDATE);
+    qed_write_table(s, s->header.l1_table_offset,
+                    s->l1_table, index, n, false, cb, opaque);
+}
+
+int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
+                            unsigned int n)
+{
+    int ret = -EINPROGRESS;
+
+    async_context_push();
+
+    qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
+    while (ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    async_context_pop();
+
+    return ret;
+}
+
+typedef struct {
+    GenericCB gencb;
+    BDRVQEDState *s;
+    uint64_t l2_offset;
+    QEDRequest *request;
+} QEDReadL2TableCB;
+
+static void qed_read_l2_table_cb(void *opaque, int ret)
+{
+    QEDReadL2TableCB *read_l2_table_cb = opaque;
+    QEDRequest *request = read_l2_table_cb->request;
+    BDRVQEDState *s = read_l2_table_cb->s;
+    CachedL2Table *l2_table = request->l2_table;
+
+    if (ret) {
+        /* can't trust loaded L2 table anymore */
+        qed_unref_l2_cache_entry(l2_table);
+        request->l2_table = NULL;
+    } else {
+        l2_table->offset = read_l2_table_cb->l2_offset;
+
+        qed_commit_l2_cache_entry(&s->l2_cache, l2_table);
+
+        /* This is guaranteed to succeed because we just committed the entry
+         * to the cache.
+         */
+        request->l2_table = qed_find_l2_cache_entry(&s->l2_cache,
+                                                    l2_table->offset);
+        assert(request->l2_table != NULL);
+    }
+
+    gencb_complete(&read_l2_table_cb->gencb, ret);
+}
+
+void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
+                       BlockDriverCompletionFunc *cb, void *opaque)
+{
+    QEDReadL2TableCB *read_l2_table_cb;
+
+    qed_unref_l2_cache_entry(request->l2_table);
+
+    /* Check for cached L2 entry */
+    request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, offset);
+    if (request->l2_table) {
+        cb(opaque, 0);
+        return;
+    }
+
+    request->l2_table = qed_alloc_l2_cache_entry(&s->l2_cache);
+    request->l2_table->table = qed_alloc_table(s);
+
+    read_l2_table_cb = gencb_alloc(sizeof(*read_l2_table_cb), cb, opaque);
+    read_l2_table_cb->s = s;
+    read_l2_table_cb->l2_offset = offset;
+    read_l2_table_cb->request = request;
+
+    BLKDBG_EVENT(s->bs->file, BLKDBG_L2_LOAD);
+    qed_read_table(s, offset, request->l2_table->table,
+                   qed_read_l2_table_cb, read_l2_table_cb);
+}
+
+int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
+{
+    int ret = -EINPROGRESS;
+
+    async_context_push();
+
+    qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
+    while (ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    async_context_pop();
+    return ret;
+}
+
+void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
+                        unsigned int index, unsigned int n, bool flush,
+                        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BLKDBG_EVENT(s->bs->file, BLKDBG_L2_UPDATE);
+    qed_write_table(s, request->l2_table->offset,
+                    request->l2_table->table, index, n, flush, cb, opaque);
+}
+
+int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
+                            unsigned int index, unsigned int n, bool flush)
+{
+    int ret = -EINPROGRESS;
+
+    async_context_push();
+
+    qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
+    while (ret == -EINPROGRESS) {
+        qemu_aio_wait();
+    }
+
+    async_context_pop();
+    return ret;
+}
diff --git a/block/qed.c b/block/qed.c
index 1436ac4..cd1bead 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -155,6 +155,13 @@ static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
     return 0;
 }
 
+QEDTable *qed_alloc_table(BDRVQEDState *s)
+{
+    /* Honor O_DIRECT memory alignment requirements */
+    return qemu_blockalign(s->bs,
+                           s->header.cluster_size * s->header.table_size);
+}
+
 static int bdrv_qed_open(BlockDriverState *bs, int flags)
 {
     BDRVQEDState *s = bs->opaque;
@@ -244,11 +251,23 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
         bdrv_flush(bs->file);
     }
 
+    s->l1_table = qed_alloc_table(s);
+    qed_init_l2_cache(&s->l2_cache);
+
+    ret = qed_read_l1_table_sync(s);
+    if (ret) {
+        qed_free_l2_cache(&s->l2_cache);
+        qemu_vfree(s->l1_table);
+    }
     return ret;
 }
 
 static void bdrv_qed_close(BlockDriverState *bs)
 {
+    BDRVQEDState *s = bs->opaque;
+
+    qed_free_l2_cache(&s->l2_cache);
+    qemu_vfree(s->l1_table);
 }
 
 static int bdrv_qed_flush(BlockDriverState *bs)
@@ -368,10 +387,43 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
                       backing_file, backing_fmt);
 }
 
+typedef struct {
+    int is_allocated;
+    int *pnum;
+} QEDIsAllocatedCB;
+
+static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
+{
+    QEDIsAllocatedCB *cb = opaque;
+    *cb->pnum = len / BDRV_SECTOR_SIZE;
+    cb->is_allocated = ret == QED_CLUSTER_FOUND;
+}
+
 static int bdrv_qed_is_allocated(BlockDriverState *bs, int64_t sector_num,
                                   int nb_sectors, int *pnum)
 {
-    return -ENOTSUP;
+    BDRVQEDState *s = bs->opaque;
+    uint64_t pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
+    size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
+    QEDIsAllocatedCB cb = {
+        .is_allocated = -1,
+        .pnum = pnum,
+    };
+    QEDRequest request = { .l2_table = NULL };
+
+    async_context_push();
+
+    qed_find_cluster(s, &request, pos, len, qed_is_allocated_cb, &cb);
+
+    while (cb.is_allocated == -1) {
+        qemu_aio_wait();
+    }
+
+    async_context_pop();
+
+    qed_unref_l2_cache_entry(request.l2_table);
+
+    return cb.is_allocated;
 }
 
 static int bdrv_qed_make_empty(BlockDriverState *bs)
diff --git a/block/qed.h b/block/qed.h
index 1f8a125..6d49a4d 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -96,16 +96,118 @@ typedef struct {
 } QEDHeader;
 
 typedef struct {
+    uint64_t offsets[0];            /* in bytes */
+} QEDTable;
+
+/* The L2 cache is a simple write-through cache for L2 structures */
+typedef struct CachedL2Table {
+    QEDTable *table;
+    uint64_t offset;    /* offset=0 indicates an invalid entry */
+    QTAILQ_ENTRY(CachedL2Table) node;
+    int ref;
+} CachedL2Table;
+
+typedef struct {
+    QTAILQ_HEAD(, CachedL2Table) entries;
+    unsigned int n_entries;
+} L2TableCache;
+
+typedef struct QEDRequest {
+    CachedL2Table *l2_table;
+} QEDRequest;
+
+typedef struct {
     BlockDriverState *bs;           /* device */
     uint64_t file_size;             /* length of image file, in bytes */
 
     QEDHeader header;               /* always cpu-endian */
+    QEDTable *l1_table;
+    L2TableCache l2_cache;          /* l2 table cache */
     uint32_t table_nelems;
     uint32_t l1_shift;
     uint32_t l2_shift;
     uint32_t l2_mask;
 } BDRVQEDState;
 
+enum {
+    QED_CLUSTER_FOUND,         /* cluster found */
+    QED_CLUSTER_L2,            /* cluster missing in L2 */
+    QED_CLUSTER_L1,            /* cluster missing in L1 */
+};
+
+/**
+ * qed_find_cluster() completion callback
+ *
+ * @opaque:     User data for completion callback
+ * @ret:        QED_CLUSTER_FOUND   Success
+ *              QED_CLUSTER_L2      Data cluster unallocated in L2
+ *              QED_CLUSTER_L1      L2 unallocated in L1
+ *              -errno              POSIX error occurred
+ * @offset:     Data cluster offset
+ * @len:        Contiguous bytes starting from cluster offset
+ *
+ * This function is invoked when qed_find_cluster() completes.
+ *
+ * On success ret is QED_CLUSTER_FOUND and offset/len are a contiguous range
+ * in the image file.
+ *
+ * On failure ret is QED_CLUSTER_L2 or QED_CLUSTER_L1 for missing L2 or L1
+ * table offset, respectively.  len is number of contiguous unallocated bytes.
+ */
+typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, size_t len);
+
+/**
+ * Generic callback for chaining async callbacks
+ */
+typedef struct {
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+} GenericCB;
+
+void *gencb_alloc(size_t len, BlockDriverCompletionFunc *cb, void *opaque);
+void gencb_complete(void *opaque, int ret);
+
+/**
+ * L2 cache functions
+ */
+void qed_init_l2_cache(L2TableCache *l2_cache);
+void qed_free_l2_cache(L2TableCache *l2_cache);
+CachedL2Table *qed_alloc_l2_cache_entry(L2TableCache *l2_cache);
+void qed_unref_l2_cache_entry(CachedL2Table *entry);
+CachedL2Table *qed_find_l2_cache_entry(L2TableCache *l2_cache, uint64_t offset);
+void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table);
+
+/**
+ * Table I/O functions
+ */
+int qed_read_l1_table_sync(BDRVQEDState *s);
+void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
+                        BlockDriverCompletionFunc *cb, void *opaque);
+int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
+                            unsigned int n);
+int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
+                           uint64_t offset);
+void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
+                       BlockDriverCompletionFunc *cb, void *opaque);
+void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
+                        unsigned int index, unsigned int n, bool flush,
+                        BlockDriverCompletionFunc *cb, void *opaque);
+int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
+                            unsigned int index, unsigned int n, bool flush);
+
+/**
+ * Cluster functions
+ */
+void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
+                      size_t len, QEDFindClusterFunc *cb, void *opaque);
+
+/**
+ * Consistency check
+ */
+int qed_check(BDRVQEDState *s, BdrvCheckResult *result, bool fix);
+
+QEDTable *qed_alloc_table(BDRVQEDState *s);
+
 /**
  * Round down to the start of a cluster
  */
@@ -114,6 +216,27 @@ static inline uint64_t qed_start_of_cluster(BDRVQEDState *s, uint64_t offset)
     return offset & ~(uint64_t)(s->header.cluster_size - 1);
 }
 
+static inline uint64_t qed_offset_into_cluster(BDRVQEDState *s, uint64_t offset)
+{
+    return offset & (s->header.cluster_size - 1);
+}
+
+static inline unsigned int qed_bytes_to_clusters(BDRVQEDState *s, size_t bytes)
+{
+    return qed_start_of_cluster(s, bytes + (s->header.cluster_size - 1)) /
+           s->header.cluster_size;
+}
+
+static inline unsigned int qed_l1_index(BDRVQEDState *s, uint64_t pos)
+{
+    return pos >> s->l1_shift;
+}
+
+static inline unsigned int qed_l2_index(BDRVQEDState *s, uint64_t pos)
+{
+    return (pos >> s->l2_shift) & s->l2_mask;
+}
+
 /**
  * Test if a cluster offset is valid
  */
diff --git a/trace-events b/trace-events
index da03d4b..59f97a2 100644
--- a/trace-events
+++ b/trace-events
@@ -192,3 +192,14 @@ disable sun4m_iommu_bad_addr(uint64_t addr) "bad addr %"PRIx64""
 
 # vl.c
 disable vm_state_notify(int running, int reason) "running %d reason %d"
+
+# block/qed-l2-cache.c
+disable qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
+disable qed_unref_l2_cache_entry(void *entry, int ref) "entry %p ref %d"
+disable qed_find_l2_cache_entry(void *l2_cache, void *entry, uint64_t offset, int ref) "l2_cache %p entry %p offset %"PRIu64" ref %d"
+
+# block/qed-table.c
+disable qed_read_table(void *s, uint64_t offset, void *table) "s %p offset %"PRIu64" table %p"
+disable qed_read_table_cb(void *s, void *table, int ret) "s %p table %p ret %d"
+disable qed_write_table(void *s, uint64_t offset, void *table, unsigned int index, unsigned int n) "s %p offset %"PRIu64" table %p index %u n %u"
+disable qed_write_table_cb(void *s, void *table, int flush, int ret) "s %p table %p flush %d ret %d"
-- 
1.7.2.3
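
Reader's note on the inline helpers added to block/qed.h above: the sketch
below restates the cluster and table index arithmetic outside of QEMU so it
can be tried in isolation.  The Geometry struct and geometry_init() are
hypothetical stand-ins for the relevant BDRVQEDState fields, and the way the
shifts are derived from the cluster size and table element count is an
assumption, since that code is not part of this patch.  The sketch counts
clusters by dividing by the cluster size itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the geometry fields of BDRVQEDState */
typedef struct {
    uint32_t cluster_size;  /* bytes, power of two */
    uint32_t l1_shift;
    uint32_t l2_shift;
    uint32_t l2_mask;
} Geometry;

static uint32_t log2_u32(uint32_t v)
{
    uint32_t shift = 0;
    while (v > 1) {
        v >>= 1;
        shift++;
    }
    return shift;
}

/* Assumption: l2_shift = log2(cluster size), l1_shift adds log2(elements) */
static Geometry geometry_init(uint32_t cluster_size, uint32_t table_nelems)
{
    Geometry g;

    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    assert(table_nelems && (table_nelems & (table_nelems - 1)) == 0);

    g.cluster_size = cluster_size;
    g.l2_shift = log2_u32(cluster_size);
    g.l2_mask = table_nelems - 1;
    g.l1_shift = g.l2_shift + log2_u32(table_nelems);
    return g;
}

/* Round down to the start of a cluster */
static uint64_t start_of_cluster(const Geometry *g, uint64_t offset)
{
    return offset & ~(uint64_t)(g->cluster_size - 1);
}

/* Byte offset within the cluster */
static uint64_t offset_into_cluster(const Geometry *g, uint64_t offset)
{
    return offset & (g->cluster_size - 1);
}

/* Number of clusters needed to hold the given byte count (rounds up) */
static uint64_t bytes_to_clusters(const Geometry *g, uint64_t bytes)
{
    return start_of_cluster(g, bytes + (g->cluster_size - 1)) /
           g->cluster_size;
}

static unsigned int l1_index(const Geometry *g, uint64_t pos)
{
    return (unsigned int)(pos >> g->l1_shift);
}

static unsigned int l2_index(const Geometry *g, uint64_t pos)
{
    return (unsigned int)((pos >> g->l2_shift) & g->l2_mask);
}
```

With 64 KiB clusters and 32768 entries per table, a device position splits
into an L1 index (bits 31 and up), an L2 index (bits 16 to 30), and a byte
offset within the cluster (bits 0 to 15).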

* [Qemu-devel] [PATCH 34/38] qed: Read/write support
  2010-12-17 17:44 [Qemu-devel] [PULL 00/38] Block patches Kevin Wolf
                   ` (32 preceding siblings ...)
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 33/38] qed: Table, L2 cache, and cluster functions Kevin Wolf
@ 2010-12-17 17:44 ` Kevin Wolf
  2010-12-17 17:44 ` [Qemu-devel] [PATCH 35/38] qed: Consistency check support Kevin Wolf
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

This patch implements the read/write state machine.  Operations are
fully asynchronous and multiple operations may be active at any time.

Allocating writes lock tables to ensure metadata updates do not
interfere with each other.  If two allocating writes need to update the
same L2 table they will run sequentially.  If two allocating writes need
to update different L2 tables they will run in parallel.
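
The queueing behind this can be sketched outside of QEMU as follows.  In the
diff, the waiting requests live in a single FIFO per image
(allocating_write_reqs): a request that hits an unallocated cluster enqueues
itself, only the head of the queue proceeds, and the head wakes its successor
once the whole request has completed.  The Req and AllocQueue types and the
function names below are hypothetical; this is a minimal illustration of that
pattern, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; only the queue linkage matters here */
typedef struct Req {
    struct Req *next;
} Req;

typedef struct {
    Req *head;
    Req *tail;
} AllocQueue;

/*
 * Called when a request hits an unallocated cluster.  Returns true if the
 * request is at the head of the queue and may allocate now, false if it has
 * to wait.  A request that is already the head is not enqueued twice.
 */
static bool alloc_queue_enter(AllocQueue *q, Req *req)
{
    assert(q != NULL && req != NULL);

    if (q->head != req) {
        req->next = NULL;
        if (q->tail) {
            q->tail->next = req;
        } else {
            q->head = req;
        }
        q->tail = req;
    }
    return q->head == req;
}

/*
 * Called when a request has completed entirely.  If it was the head of the
 * queue it is removed and the next waiting request (or NULL) is returned so
 * the caller can resume it.  Requests that never allocated return NULL.
 */
static Req *alloc_queue_leave(AllocQueue *q, Req *req)
{
    assert(q != NULL && req != NULL);

    if (q->head != req) {
        return NULL;
    }
    q->head = req->next;
    if (!q->head) {
        q->tail = NULL;
    }
    req->next = NULL;
    return q->head;
}
```

Waking the successor only at completion, rather than after each cluster,
matches the comment in qed_aio_complete(): requests finish one at a time
instead of cycling through the queue repeatedly.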

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
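
Reader's note on the read/write state machine from the commit message: each
request walks from cur_pos to end_pos, and every completed I/O step calls
back into the same "next I/O" function, which either advances to the next
chunk or completes the request.  The sketch below shows only that control
flow.  The Request type and function names are hypothetical, and it assumes
for simplicity that one step never crosses a cluster boundary, whereas
qed_find_cluster() may return a longer contiguous range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t cur_pos;   /* next byte to process */
    uint64_t end_pos;   /* one past the last byte of the request */
    size_t last_len;    /* bytes handled by the step that just finished */
    unsigned int steps; /* number of I/O steps issued so far */
    int result;         /* 1 while in flight, 0 on success, <0 on error */
} Request;

/* Simplified lookup: a chunk ends at the next cluster boundary */
static size_t find_chunk(uint32_t cluster_size, uint64_t pos,
                         uint64_t remaining)
{
    uint64_t to_boundary;

    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    to_boundary = cluster_size - (pos & (cluster_size - 1));
    return (size_t)(remaining < to_boundary ? remaining : to_boundary);
}

/* One transition, shaped like qed_aio_next_io() */
static void request_next_io(Request *r, uint32_t cluster_size, int ret)
{
    /* Handle I/O error from the previous step */
    if (ret) {
        r->result = ret;
        return;
    }

    /* Account for the bytes the previous step transferred */
    r->cur_pos += r->last_len;
    r->last_len = 0;

    /* Complete request */
    if (r->cur_pos >= r->end_pos) {
        r->result = 0;
        return;
    }

    /* Find next chunk and "start" its I/O */
    r->last_len = find_chunk(cluster_size, r->cur_pos,
                             r->end_pos - r->cur_pos);
    r->steps++;
}

/* Drive the state machine as if every I/O step succeeded immediately */
static int request_run(Request *r, uint32_t cluster_size)
{
    r->result = 1;
    while (r->result == 1) {
        request_next_io(r, cluster_size, 0);
    }
    return r->result;
}
```

In the driver the transitions are invoked from AIO completion callbacks
rather than from a loop, which is what allows several requests to be in
flight at once.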
 Makefile.objs |    1 +
 block/qed.c   |  628 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/qed.h   |   26 +++
 trace-events  |   10 +
 4 files changed, 663 insertions(+), 2 deletions(-)
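
Reader's note on qed_read_backing_file() in the diff below: reads that reach
past the end of the backing file are zero-filled, and only the part that
overlaps the backing file is actually read.  The sketch below applies the
same clipping rule to plain memory buffers.  The function name and the use of
an in-memory "backing" array are assumptions for illustration; the driver
works on I/O vectors and issues an asynchronous read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy size bytes starting at pos from a backing buffer of backing_len
 * bytes into dst.  Bytes beyond the end of the backing buffer read as zero.
 * A missing backing file is modelled as backing_len == 0.
 *
 * Returns the number of bytes that came from the backing buffer.
 */
static size_t read_backing(const uint8_t *backing, uint64_t backing_len,
                           uint64_t pos, uint8_t *dst, size_t size)
{
    size_t n;

    assert(dst != NULL);

    /* Zero everything if the read touches the region past the end */
    if (pos >= backing_len || pos + size > backing_len) {
        memset(dst, 0, size);
    }

    /* Nothing to read from the backing buffer */
    if (pos >= backing_len) {
        return 0;
    }

    /* If the read straddles the end of the backing buffer, shorten it */
    n = backing_len - pos < size ? (size_t)(backing_len - pos) : size;
    memcpy(dst, backing + pos, n);
    return n;
}
```

Zeroing the whole destination first and then overwriting the valid prefix
keeps the straddling case simple at the cost of touching some bytes twice.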

diff --git a/Makefile.objs b/Makefile.objs
index 1860152..d6b3d60 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -21,6 +21,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
+block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/qed.c b/block/qed.c
index cd1bead..8e65d18 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -12,8 +12,26 @@
  *
  */
 
+#include "trace.h"
 #include "qed.h"
 
+static void qed_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    QEDAIOCB *acb = (QEDAIOCB *)blockacb;
+    bool finished = false;
+
+    /* Wait for the request to finish */
+    acb->finished = &finished;
+    while (!finished) {
+        qemu_aio_wait();
+    }
+}
+
+static AIOPool qed_aio_pool = {
+    .aiocb_size         = sizeof(QEDAIOCB),
+    .cancel             = qed_aio_cancel,
+};
+
 static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
                           const char *filename)
 {
@@ -155,6 +173,24 @@ static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
     return 0;
 }
 
+/**
+ * Allocate new clusters
+ *
+ * @s:          QED state
+ * @n:          Number of contiguous clusters to allocate
+ * @ret:        Offset of first allocated cluster
+ *
+ * This function only produces the offset where the new clusters should be
+ * written.  It updates BDRVQEDState but does not make any changes to the image
+ * file.
+ */
+static uint64_t qed_alloc_clusters(BDRVQEDState *s, unsigned int n)
+{
+    uint64_t offset = s->file_size;
+    s->file_size += n * s->header.cluster_size;
+    return offset;
+}
+
 QEDTable *qed_alloc_table(BDRVQEDState *s)
 {
     /* Honor O_DIRECT memory alignment requirements */
@@ -162,6 +198,23 @@ QEDTable *qed_alloc_table(BDRVQEDState *s)
                            s->header.cluster_size * s->header.table_size);
 }
 
+/**
+ * Allocate a new zeroed L2 table
+ */
+static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
+{
+    CachedL2Table *l2_table = qed_alloc_l2_cache_entry(&s->l2_cache);
+
+    l2_table->table = qed_alloc_table(s);
+    l2_table->offset = qed_alloc_clusters(s, s->header.table_size);
+
+    memset(l2_table->table->offsets, 0,
+           s->header.cluster_size * s->header.table_size);
+    return l2_table;
+}
+
+static void qed_aio_next_io(void *opaque, int ret);
+
 static int bdrv_qed_open(BlockDriverState *bs, int flags)
 {
     BDRVQEDState *s = bs->opaque;
@@ -170,6 +223,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
     int ret;
 
     s->bs = bs;
+    QSIMPLEQ_INIT(&s->allocating_write_reqs);
 
     ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
     if (ret < 0) {
@@ -431,13 +485,583 @@ static int bdrv_qed_make_empty(BlockDriverState *bs)
     return -ENOTSUP;
 }
 
+static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
+{
+    return acb->common.bs->opaque;
+}
+
+/**
+ * Read from the backing file or zero-fill if no backing file
+ *
+ * @s:          QED state
+ * @pos:        Byte position in device
+ * @qiov:       Destination I/O vector
+ * @cb:         Completion function
+ * @opaque:     User data for completion function
+ *
+ * This function reads qiov->size bytes starting at pos from the backing file.
+ * If there is no backing file then zeroes are read.
+ */
+static void qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
+                                  QEMUIOVector *qiov,
+                                  BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BlockDriverAIOCB *aiocb;
+    uint64_t backing_length = 0;
+    size_t size;
+
+    /* If there is a backing file, get its length.  Treat the absence of a
+     * backing file like a zero length backing file.
+     */
+    if (s->bs->backing_hd) {
+        int64_t l = bdrv_getlength(s->bs->backing_hd);
+        if (l < 0) {
+            cb(opaque, l);
+            return;
+        }
+        backing_length = l;
+    }
+
+    /* Zero all sectors if reading beyond the end of the backing file */
+    if (pos >= backing_length ||
+        pos + qiov->size > backing_length) {
+        qemu_iovec_memset(qiov, 0, qiov->size);
+    }
+
+    /* Complete now if there are no backing file sectors to read */
+    if (pos >= backing_length) {
+        cb(opaque, 0);
+        return;
+    }
+
+    /* If the read straddles the end of the backing file, shorten it */
+    size = MIN((uint64_t)backing_length - pos, qiov->size);
+
+    BLKDBG_EVENT(s->bs->file, BLKDBG_READ_BACKING);
+    aiocb = bdrv_aio_readv(s->bs->backing_hd, pos / BDRV_SECTOR_SIZE,
+                           qiov, size / BDRV_SECTOR_SIZE, cb, opaque);
+    if (!aiocb) {
+        cb(opaque, -EIO);
+    }
+}
+
+typedef struct {
+    GenericCB gencb;
+    BDRVQEDState *s;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    uint64_t offset;
+} CopyFromBackingFileCB;
+
+static void qed_copy_from_backing_file_cb(void *opaque, int ret)
+{
+    CopyFromBackingFileCB *copy_cb = opaque;
+    qemu_vfree(copy_cb->iov.iov_base);
+    gencb_complete(&copy_cb->gencb, ret);
+}
+
+static void qed_copy_from_backing_file_write(void *opaque, int ret)
+{
+    CopyFromBackingFileCB *copy_cb = opaque;
+    BDRVQEDState *s = copy_cb->s;
+    BlockDriverAIOCB *aiocb;
+
+    if (ret) {
+        qed_copy_from_backing_file_cb(copy_cb, ret);
+        return;
+    }
+
+    BLKDBG_EVENT(s->bs->file, BLKDBG_COW_WRITE);
+    aiocb = bdrv_aio_writev(s->bs->file, copy_cb->offset / BDRV_SECTOR_SIZE,
+                            &copy_cb->qiov,
+                            copy_cb->qiov.size / BDRV_SECTOR_SIZE,
+                            qed_copy_from_backing_file_cb, copy_cb);
+    if (!aiocb) {
+        qed_copy_from_backing_file_cb(copy_cb, -EIO);
+    }
+}
+
+/**
+ * Copy data from backing file into the image
+ *
+ * @s:          QED state
+ * @pos:        Byte position in device
+ * @len:        Number of bytes
+ * @offset:     Byte offset in image file
+ * @cb:         Completion function
+ * @opaque:     User data for completion function
+ */
+static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
+                                       uint64_t len, uint64_t offset,
+                                       BlockDriverCompletionFunc *cb,
+                                       void *opaque)
+{
+    CopyFromBackingFileCB *copy_cb;
+
+    /* Skip copy entirely if there is no work to do */
+    if (len == 0) {
+        cb(opaque, 0);
+        return;
+    }
+
+    copy_cb = gencb_alloc(sizeof(*copy_cb), cb, opaque);
+    copy_cb->s = s;
+    copy_cb->offset = offset;
+    copy_cb->iov.iov_base = qemu_blockalign(s->bs, len);
+    copy_cb->iov.iov_len = len;
+    qemu_iovec_init_external(&copy_cb->qiov, &copy_cb->iov, 1);
+
+    qed_read_backing_file(s, pos, &copy_cb->qiov,
+                          qed_copy_from_backing_file_write, copy_cb);
+}
+
+/**
+ * Link one or more contiguous clusters into a table
+ *
+ * @s:              QED state
+ * @table:          L2 table
+ * @index:          First cluster index
+ * @n:              Number of contiguous clusters
+ * @cluster:        First cluster byte offset in image file
+ */
+static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
+                                unsigned int n, uint64_t cluster)
+{
+    int i;
+    for (i = index; i < index + n; i++) {
+        table->offsets[i] = cluster;
+        cluster += s->header.cluster_size;
+    }
+}
+
+static void qed_aio_complete_bh(void *opaque)
+{
+    QEDAIOCB *acb = opaque;
+    BlockDriverCompletionFunc *cb = acb->common.cb;
+    void *user_opaque = acb->common.opaque;
+    int ret = acb->bh_ret;
+    bool *finished = acb->finished;
+
+    qemu_bh_delete(acb->bh);
+    qemu_aio_release(acb);
+
+    /* Invoke callback */
+    cb(user_opaque, ret);
+
+    /* Signal cancel completion */
+    if (finished) {
+        *finished = true;
+    }
+}
+
+static void qed_aio_complete(QEDAIOCB *acb, int ret)
+{
+    BDRVQEDState *s = acb_to_s(acb);
+
+    trace_qed_aio_complete(s, acb, ret);
+
+    /* Free resources */
+    qemu_iovec_destroy(&acb->cur_qiov);
+    qed_unref_l2_cache_entry(acb->request.l2_table);
+
+    /* Arrange for a bh to invoke the completion function */
+    acb->bh_ret = ret;
+    acb->bh = qemu_bh_new(qed_aio_complete_bh, acb);
+    qemu_bh_schedule(acb->bh);
+
+    /* Start next allocating write request waiting behind this one.  Note that
+     * requests enqueue themselves when they first hit an unallocated cluster
+     * but they wait until the entire request is finished before waking up the
+     * next request in the queue.  This ensures that we don't cycle through
+     * requests multiple times but rather finish one at a time completely.
+     */
+    if (acb == QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
+        QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
+        acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
+        if (acb) {
+            qed_aio_next_io(acb, 0);
+        }
+    }
+}
+
+/**
+ * Commit the current L2 table to the cache
+ */
+static void qed_commit_l2_update(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    CachedL2Table *l2_table = acb->request.l2_table;
+
+    qed_commit_l2_cache_entry(&s->l2_cache, l2_table);
+
+    /* This is guaranteed to succeed because we just committed the entry to the
+     * cache.
+     */
+    acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache,
+                                                    l2_table->offset);
+    assert(acb->request.l2_table != NULL);
+
+    qed_aio_next_io(opaque, ret);
+}
+
+/**
+ * Update L1 table with new L2 table offset and write it out
+ */
+static void qed_aio_write_l1_update(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    int index;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    index = qed_l1_index(s, acb->cur_pos);
+    s->l1_table->offsets[index] = acb->request.l2_table->offset;
+
+    qed_write_l1_table(s, index, 1, qed_commit_l2_update, acb);
+}
+
+/**
+ * Update L2 table with new cluster offsets and write them out
+ */
+static void qed_aio_write_l2_update(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
+    int index;
+
+    if (ret) {
+        goto err;
+    }
+
+    if (need_alloc) {
+        qed_unref_l2_cache_entry(acb->request.l2_table);
+        acb->request.l2_table = qed_new_l2_table(s);
+    }
+
+    index = qed_l2_index(s, acb->cur_pos);
+    qed_update_l2_table(s, acb->request.l2_table->table, index, acb->cur_nclusters,
+                         acb->cur_cluster);
+
+    if (need_alloc) {
+        /* Write out the whole new L2 table */
+        qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true,
+                            qed_aio_write_l1_update, acb);
+    } else {
+        /* Write out only the updated part of the L2 table */
+        qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false,
+                            qed_aio_next_io, acb);
+    }
+    return;
+
+err:
+    qed_aio_complete(acb, ret);
+}
+
+/**
+ * Flush new data clusters before updating the L2 table
+ *
+ * This flush is necessary when a backing file is in use.  A crash during an
+ * allocating write could result in empty clusters in the image.  If the write
+ * only touched a subregion of the cluster, then backing image sectors have
+ * been lost in the untouched region.  The solution is to flush after writing a
+ * new data cluster and before updating the L2 table.
+ */
+static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+
+    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+        qed_aio_complete(acb, -EIO);
+    }
+}
+
+/**
+ * Write data to the image file
+ */
+static void qed_aio_write_main(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    uint64_t offset = acb->cur_cluster +
+                      qed_offset_into_cluster(s, acb->cur_pos);
+    BlockDriverCompletionFunc *next_fn;
+    BlockDriverAIOCB *file_acb;
+
+    trace_qed_aio_write_main(s, acb, ret, offset, acb->cur_qiov.size);
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
+        next_fn = qed_aio_next_io;
+    } else {
+        if (s->bs->backing_hd) {
+            next_fn = qed_aio_write_flush_before_l2_update;
+        } else {
+            next_fn = qed_aio_write_l2_update;
+        }
+    }
+
+    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
+    file_acb = bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
+                               &acb->cur_qiov,
+                               acb->cur_qiov.size / BDRV_SECTOR_SIZE,
+                               next_fn, acb);
+    if (!file_acb) {
+        qed_aio_complete(acb, -EIO);
+    }
+}
+
+/**
+ * Populate back untouched region of new data cluster
+ */
+static void qed_aio_write_postfill(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    uint64_t start = acb->cur_pos + acb->cur_qiov.size;
+    uint64_t len =
+        qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
+    uint64_t offset = acb->cur_cluster +
+                      qed_offset_into_cluster(s, acb->cur_pos) +
+                      acb->cur_qiov.size;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    trace_qed_aio_write_postfill(s, acb, start, len, offset);
+    qed_copy_from_backing_file(s, start, len, offset,
+                                qed_aio_write_main, acb);
+}
+
+/**
+ * Populate front untouched region of new data cluster
+ */
+static void qed_aio_write_prefill(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    uint64_t start = qed_start_of_cluster(s, acb->cur_pos);
+    uint64_t len = qed_offset_into_cluster(s, acb->cur_pos);
+
+    trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
+    qed_copy_from_backing_file(s, start, len, acb->cur_cluster,
+                                qed_aio_write_postfill, acb);
+}
+
+/**
+ * Write new data cluster
+ *
+ * @acb:        Write request
+ * @len:        Length in bytes
+ *
+ * This path is taken when writing to previously unallocated clusters.
+ */
+static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+{
+    BDRVQEDState *s = acb_to_s(acb);
+
+    /* Freeze this request if another allocating write is in progress */
+    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
+        QSIMPLEQ_INSERT_TAIL(&s->allocating_write_reqs, acb, next);
+    }
+    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
+        return; /* wait for existing request to finish */
+    }
+
+    acb->cur_nclusters = qed_bytes_to_clusters(s,
+            qed_offset_into_cluster(s, acb->cur_pos) + len);
+    acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+    qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+
+    /* Write new cluster */
+    qed_aio_write_prefill(acb, 0);
+}
+
+/**
+ * Write data cluster in place
+ *
+ * @acb:        Write request
+ * @offset:     Cluster offset in bytes
+ * @len:        Length in bytes
+ *
+ * This path is taken when writing to already allocated clusters.
+ */
+static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
+{
+    /* Calculate the I/O vector */
+    acb->cur_cluster = offset;
+    qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+
+    /* Do the actual write */
+    qed_aio_write_main(acb, 0);
+}
+
+/**
+ * Write data cluster
+ *
+ * @opaque:     Write request
+ * @ret:        QED_CLUSTER_FOUND, QED_CLUSTER_L2, QED_CLUSTER_L1,
+ *              or -errno
+ * @offset:     Cluster offset in bytes
+ * @len:        Length in bytes
+ *
+ * Callback from qed_find_cluster().
+ */
+static void qed_aio_write_data(void *opaque, int ret,
+                               uint64_t offset, size_t len)
+{
+    QEDAIOCB *acb = opaque;
+
+    trace_qed_aio_write_data(acb_to_s(acb), acb, ret, offset, len);
+
+    acb->find_cluster_ret = ret;
+
+    switch (ret) {
+    case QED_CLUSTER_FOUND:
+        qed_aio_write_inplace(acb, offset, len);
+        break;
+
+    case QED_CLUSTER_L2:
+    case QED_CLUSTER_L1:
+        qed_aio_write_alloc(acb, len);
+        break;
+
+    default:
+        qed_aio_complete(acb, ret);
+        break;
+    }
+}
+
+/**
+ * Read data cluster
+ *
+ * @opaque:     Read request
+ * @ret:        QED_CLUSTER_FOUND, QED_CLUSTER_L2, QED_CLUSTER_L1,
+ *              or -errno
+ * @offset:     Cluster offset in bytes
+ * @len:        Length in bytes
+ *
+ * Callback from qed_find_cluster().
+ */
+static void qed_aio_read_data(void *opaque, int ret,
+                              uint64_t offset, size_t len)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    BlockDriverState *bs = acb->common.bs;
+    BlockDriverAIOCB *file_acb;
+
+    /* Adjust offset into cluster */
+    offset += qed_offset_into_cluster(s, acb->cur_pos);
+
+    trace_qed_aio_read_data(s, acb, ret, offset, len);
+
+    if (ret < 0) {
+        goto err;
+    }
+
+    qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+
+    /* Handle backing file and unallocated sparse hole reads */
+    if (ret != QED_CLUSTER_FOUND) {
+        qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
+                              qed_aio_next_io, acb);
+        return;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+    file_acb = bdrv_aio_readv(bs->file, offset / BDRV_SECTOR_SIZE,
+                              &acb->cur_qiov,
+                              acb->cur_qiov.size / BDRV_SECTOR_SIZE,
+                              qed_aio_next_io, acb);
+    if (!file_acb) {
+        ret = -EIO;
+        goto err;
+    }
+    return;
+
+err:
+    qed_aio_complete(acb, ret);
+}
+
+/**
+ * Begin next I/O or complete the request
+ */
+static void qed_aio_next_io(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
+    QEDFindClusterFunc *io_fn =
+        acb->is_write ? qed_aio_write_data : qed_aio_read_data;
+
+    trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
+
+    /* Handle I/O error */
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    acb->qiov_offset += acb->cur_qiov.size;
+    acb->cur_pos += acb->cur_qiov.size;
+    qemu_iovec_reset(&acb->cur_qiov);
+
+    /* Complete request */
+    if (acb->cur_pos >= acb->end_pos) {
+        qed_aio_complete(acb, 0);
+        return;
+    }
+
+    /* Find next cluster and start I/O */
+    qed_find_cluster(s, &acb->request,
+                      acb->cur_pos, acb->end_pos - acb->cur_pos,
+                      io_fn, acb);
+}
+
+static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
+                                       int64_t sector_num,
+                                       QEMUIOVector *qiov, int nb_sectors,
+                                       BlockDriverCompletionFunc *cb,
+                                       void *opaque, bool is_write)
+{
+    QEDAIOCB *acb = qemu_aio_get(&qed_aio_pool, bs, cb, opaque);
+
+    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors,
+                         opaque, is_write);
+
+    acb->is_write = is_write;
+    acb->finished = NULL;
+    acb->qiov = qiov;
+    acb->qiov_offset = 0;
+    acb->cur_pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
+    acb->end_pos = acb->cur_pos + nb_sectors * BDRV_SECTOR_SIZE;
+    acb->request.l2_table = NULL;
+    qemu_iovec_init(&acb->cur_qiov, qiov->niov);
+
+    /* Start request */
+    qed_aio_next_io(acb, 0);
+    return &acb->common;
+}
+
 static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
                                             int64_t sector_num,
                                             QEMUIOVector *qiov, int nb_sectors,
                                             BlockDriverCompletionFunc *cb,
                                             void *opaque)
 {
-    return NULL;
+    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, false);
 }
 
 static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
@@ -446,7 +1070,7 @@ static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
                                              BlockDriverCompletionFunc *cb,
                                              void *opaque)
 {
-    return NULL;
+    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, true);
 }
 
 static BlockDriverAIOCB *bdrv_qed_aio_flush(BlockDriverState *bs,
diff --git a/block/qed.h b/block/qed.h
index 6d49a4d..046a410 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -116,6 +116,29 @@ typedef struct QEDRequest {
     CachedL2Table *l2_table;
 } QEDRequest;
 
+typedef struct QEDAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    int bh_ret;                     /* final return status for completion bh */
+    QSIMPLEQ_ENTRY(QEDAIOCB) next;  /* next request */
+    bool is_write;                  /* false - read, true - write */
+    bool *finished;                 /* signal for cancel completion */
+    uint64_t end_pos;               /* request end on block device, in bytes */
+
+    /* User scatter-gather list */
+    QEMUIOVector *qiov;
+    size_t qiov_offset;             /* byte count already processed */
+
+    /* Current cluster scatter-gather list */
+    QEMUIOVector cur_qiov;
+    uint64_t cur_pos;               /* position on block device, in bytes */
+    uint64_t cur_cluster;           /* cluster offset in image file */
+    unsigned int cur_nclusters;     /* number of clusters being accessed */
+    int find_cluster_ret;           /* used for L1/L2 update */
+
+    QEDRequest request;
+} QEDAIOCB;
+
 typedef struct {
     BlockDriverState *bs;           /* device */
     uint64_t file_size;             /* length of image file, in bytes */
@@ -127,6 +150,9 @@ typedef struct {
     uint32_t l1_shift;
     uint32_t l2_shift;
     uint32_t l2_mask;
+
+    /* Allocating write request queue */
+    QSIMPLEQ_HEAD(, QEDAIOCB) allocating_write_reqs;
 } BDRVQEDState;
 
 enum {
diff --git a/trace-events b/trace-events
index 59f97a2..e8fed0f 100644
--- a/trace-events
+++ b/trace-events
@@ -203,3 +203,13 @@ disable qed_read_table(void *s, uint64_t offset, void *table) "s %p offset %"PRI
 disable qed_read_table_cb(void *s, void *table, int ret) "s %p table %p ret %d"
 disable qed_write_table(void *s, uint64_t offset, void *table, unsigned int index, unsigned int n) "s %p offset %"PRIu64" table %p index %u n %u"
 disable qed_write_table_cb(void *s, void *table, int flush, int ret) "s %p table %p flush %d ret %d"
+
+# block/qed.c
+disable qed_aio_complete(void *s, void *acb, int ret) "s %p acb %p ret %d"
+disable qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int is_write) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p is_write %d"
+disable qed_aio_next_io(void *s, void *acb, int ret, uint64_t cur_pos) "s %p acb %p ret %d cur_pos %"PRIu64""
+disable qed_aio_read_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
+disable qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
+disable qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64""
+disable qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64""
+disable qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
-- 
1.7.2.3


* [Qemu-devel] [PATCH 35/38] qed: Consistency check support
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

This patch adds support for the qemu-img check command.  It also
introduces a dirty bit in the qed header, QED_F_NEED_CHECK, to mark
modified images as needing a check.  The bit is set before the first
allocating write and cleared when the image file is closed cleanly.

If an image file is opened read-write with the dirty bit set, a
consistency check runs and tries to fix corrupted table offsets by
clearing them.  These corruptions may occur if there is power loss
while an allocating write is performed.  Once the image is fixed, it
opens as normal again.  Read-only opens skip the check, so a
potentially inconsistent image can still be read for data recovery.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
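Illustrative sketch, not part of the patch: the dirty-bit life cycle
described above can be modelled in a few lines of standalone C.
ToyImage and the toy_* helpers are hypothetical stand-ins for the
on-disk header and BDRVQEDState; only QED_F_NEED_CHECK and its value
come from block/qed.h.  The sketch assumes the check on open always
succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QED_F_NEED_CHECK 0x02   /* same value as in block/qed.h */

/* Hypothetical stand-in for the on-disk header plus driver state */
typedef struct {
    uint64_t disk_features;     /* feature bits currently stored in the image */
    uint64_t features;          /* in-memory copy, like s->header.features */
    unsigned header_writes;     /* number of header rewrites so far */
    unsigned checks_run;        /* number of consistency checks so far */
} ToyImage;

static void toy_write_header(ToyImage *img)
{
    img->disk_features = img->features;
    img->header_writes++;
}

/* Open: load the header; a dirty, writable image is checked first */
static void toy_open(ToyImage *img, bool read_only)
{
    img->features = img->disk_features;
    if ((img->features & QED_F_NEED_CHECK) && !read_only) {
        img->checks_run++;      /* assumed to succeed in this sketch */
        img->features &= ~(uint64_t)QED_F_NEED_CHECK;
        toy_write_header(img);
    }
}

/* Allocating write: the bit must reach the image before new clusters do */
static void toy_alloc_write(ToyImage *img)
{
    if (!(img->features & QED_F_NEED_CHECK)) {
        img->features |= QED_F_NEED_CHECK;
        toy_write_header(img);
    }
    /* data and table updates would follow here */
}

/* Clean close: no check required on next open */
static void toy_close(ToyImage *img)
{
    if (img->features & QED_F_NEED_CHECK) {
        img->features &= ~(uint64_t)QED_F_NEED_CHECK;
        toy_write_header(img);
    }
}

/* One crashed session followed by one clean session; returns checks run */
unsigned toy_demo(void)
{
    ToyImage img = {0};

    toy_open(&img, false);
    toy_alloc_write(&img);      /* header rewritten, bit now on disk */
    /* simulated power loss: toy_close() is never called */
    toy_open(&img, false);      /* bit found on disk, check runs */
    toy_alloc_write(&img);
    toy_close(&img);            /* clean shutdown clears the bit */
    toy_open(&img, false);      /* nothing to check */
    assert(!(img.disk_features & QED_F_NEED_CHECK));
    return img.checks_run;
}
```

toy_demo() returns 1: only the open that follows the simulated power
loss triggers a check.  Repeated allocating writes within one session
cost a single extra header write, which is why the patch tests the bit
before calling qed_write_header().
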
 block/qed-check.c |  210 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qed.c       |  125 +++++++++++++++++++++++++++++++-
 block/qed.h       |    4 +
 3 files changed, 336 insertions(+), 3 deletions(-)
 create mode 100644 block/qed-check.c
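
Illustrative sketch, not part of the patch: block/qed-check.c marks
every cluster reachable from the L1 and L2 tables in a bitmap, counts
a cluster that is referenced twice as a corruption, and counts
unmarked clusters past the header as leaks.  The toy_* names and
ToyResult below are hypothetical, and the cluster layout in
toy_example() is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result counters, loosely modelled on BdrvCheckResult */
typedef struct {
    unsigned corruptions;
    unsigned leaks;
} ToyResult;

static bool toy_test_bit(const uint32_t *bitmap, uint64_t n)
{
    return (bitmap[n / 32] >> (n % 32)) & 1U;
}

static void toy_set_bit(uint32_t *bitmap, uint64_t n)
{
    bitmap[n / 32] |= UINT32_C(1) << (n % 32);
}

/* Mark n clusters starting at 'cluster'; count those already marked */
static bool toy_mark_used(uint32_t *bitmap, ToyResult *res,
                          uint64_t cluster, unsigned n)
{
    unsigned dup = 0;

    while (n-- != 0) {
        if (toy_test_bit(bitmap, cluster)) {
            dup++;              /* clusters should be referenced once */
        }
        toy_set_bit(bitmap, cluster);
        cluster++;
    }
    res->corruptions += dup;
    return dup == 0;
}

/* Count clusters in [first, nclusters) that nothing references */
static void toy_count_leaks(const uint32_t *bitmap, ToyResult *res,
                            uint64_t first, uint64_t nclusters)
{
    for (uint64_t i = first; i < nclusters; i++) {
        if (!toy_test_bit(bitmap, i)) {
            res->leaks++;
        }
    }
}

/* 40-cluster image: cluster 0 is the header, the rest is made up */
ToyResult toy_example(void)
{
    enum { NCLUSTERS = 40 };
    ToyResult res = {0, 0};
    uint32_t *bitmap = calloc((NCLUSTERS + 31) / 32, sizeof(bitmap[0]));

    if (!bitmap) {
        abort();
    }
    toy_mark_used(bitmap, &res, 1, 4);   /* L1 table, clusters 1-4 */
    toy_mark_used(bitmap, &res, 5, 4);   /* one L2 table, clusters 5-8 */
    toy_mark_used(bitmap, &res, 9, 1);   /* data cluster 9 */
    toy_mark_used(bitmap, &res, 9, 1);   /* cluster 9 referenced again */
    toy_mark_used(bitmap, &res, 35, 1);  /* data cluster in second word */
    toy_count_leaks(bitmap, &res, 1, NCLUSTERS);
    free(bitmap);
    return res;
}
```

For this layout toy_example() reports one corruption (cluster 9 is
referenced twice) and 29 leaks (39 clusters follow the header and 10
of them are referenced).
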

diff --git a/block/qed-check.c b/block/qed-check.c
new file mode 100644
index 0000000..4600932
--- /dev/null
+++ b/block/qed-check.c
@@ -0,0 +1,210 @@
+/*
+ * QEMU Enhanced Disk Format Consistency Check
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qed.h"
+
+typedef struct {
+    BDRVQEDState *s;
+    BdrvCheckResult *result;
+    bool fix;                           /* whether to fix invalid offsets */
+
+    size_t nclusters;
+    uint32_t *used_clusters;            /* referenced cluster bitmap */
+
+    QEDRequest request;
+} QEDCheck;
+
+static bool qed_test_bit(uint32_t *bitmap, uint64_t n) {
+    return !!(bitmap[n / 32] & (1U << (n % 32)));
+}
+
+static void qed_set_bit(uint32_t *bitmap, uint64_t n) {
+    bitmap[n / 32] |= 1U << (n % 32);
+}
+
+/**
+ * Set bitmap bits for clusters
+ *
+ * @check:          Check structure
+ * @offset:         Starting offset in bytes
+ * @n:              Number of clusters
+ */
+static bool qed_set_used_clusters(QEDCheck *check, uint64_t offset,
+                                  unsigned int n)
+{
+    uint64_t cluster = qed_bytes_to_clusters(check->s, offset);
+    unsigned int corruptions = 0;
+
+    while (n-- != 0) {
+        /* Clusters should only be referenced once */
+        if (qed_test_bit(check->used_clusters, cluster)) {
+            corruptions++;
+        }
+
+        qed_set_bit(check->used_clusters, cluster);
+        cluster++;
+    }
+
+    check->result->corruptions += corruptions;
+    return corruptions == 0;
+}
+
+/**
+ * Check an L2 table
+ *
+ * @ret:            Number of invalid cluster offsets
+ */
+static unsigned int qed_check_l2_table(QEDCheck *check, QEDTable *table)
+{
+    BDRVQEDState *s = check->s;
+    unsigned int i, num_invalid = 0;
+
+    for (i = 0; i < s->table_nelems; i++) {
+        uint64_t offset = table->offsets[i];
+
+        if (!offset) {
+            continue;
+        }
+
+        /* Detect invalid cluster offset */
+        if (!qed_check_cluster_offset(s, offset)) {
+            if (check->fix) {
+                table->offsets[i] = 0;
+            } else {
+                check->result->corruptions++;
+            }
+
+            num_invalid++;
+            continue;
+        }
+
+        qed_set_used_clusters(check, offset, 1);
+    }
+
+    return num_invalid;
+}
+
+/**
+ * Descend tables and check each cluster is referenced once only
+ */
+static int qed_check_l1_table(QEDCheck *check, QEDTable *table)
+{
+    BDRVQEDState *s = check->s;
+    unsigned int i, num_invalid_l1 = 0;
+    int ret, last_error = 0;
+
+    /* Mark L1 table clusters used */
+    qed_set_used_clusters(check, s->header.l1_table_offset,
+                          s->header.table_size);
+
+    for (i = 0; i < s->table_nelems; i++) {
+        unsigned int num_invalid_l2;
+        uint64_t offset = table->offsets[i];
+
+        if (!offset) {
+            continue;
+        }
+
+        /* Detect invalid L2 offset */
+        if (!qed_check_table_offset(s, offset)) {
+            /* Clear invalid offset */
+            if (check->fix) {
+                table->offsets[i] = 0;
+            } else {
+                check->result->corruptions++;
+            }
+
+            num_invalid_l1++;
+            continue;
+        }
+
+        if (!qed_set_used_clusters(check, offset, s->header.table_size)) {
+            continue; /* skip an invalid table */
+        }
+
+        ret = qed_read_l2_table_sync(s, &check->request, offset);
+        if (ret) {
+            check->result->check_errors++;
+            last_error = ret;
+            continue;
+        }
+
+        num_invalid_l2 = qed_check_l2_table(check,
+                                            check->request.l2_table->table);
+
+        /* Write out fixed L2 table */
+        if (num_invalid_l2 > 0 && check->fix) {
+            ret = qed_write_l2_table_sync(s, &check->request, 0,
+                                          s->table_nelems, false);
+            if (ret) {
+                check->result->check_errors++;
+                last_error = ret;
+                continue;
+            }
+        }
+    }
+
+    /* Drop reference to final table */
+    qed_unref_l2_cache_entry(check->request.l2_table);
+    check->request.l2_table = NULL;
+
+    /* Write out fixed L1 table */
+    if (num_invalid_l1 > 0 && check->fix) {
+        ret = qed_write_l1_table_sync(s, 0, s->table_nelems);
+        if (ret) {
+            check->result->check_errors++;
+            last_error = ret;
+        }
+    }
+
+    return last_error;
+}
+
+/**
+ * Check for unreferenced (leaked) clusters
+ */
+static void qed_check_for_leaks(QEDCheck *check)
+{
+    BDRVQEDState *s = check->s;
+    size_t i;
+
+    for (i = s->header.header_size; i < check->nclusters; i++) {
+        if (!qed_test_bit(check->used_clusters, i)) {
+            check->result->leaks++;
+        }
+    }
+}
+
+int qed_check(BDRVQEDState *s, BdrvCheckResult *result, bool fix)
+{
+    QEDCheck check = {
+        .s = s,
+        .result = result,
+        .nclusters = qed_bytes_to_clusters(s, s->file_size),
+        .request = { .l2_table = NULL },
+        .fix = fix,
+    };
+    int ret;
+
+    check.used_clusters = qemu_mallocz(((check.nclusters + 31) / 32) *
+                                       sizeof(check.used_clusters[0]));
+
+    ret = qed_check_l1_table(&check, s->l1_table);
+    if (ret == 0) {
+        /* Only check for leaks if entire image was scanned successfully */
+        qed_check_for_leaks(&check);
+    }
+
+    qemu_free(check.used_clusters);
+    return ret;
+}
diff --git a/block/qed.c b/block/qed.c
index 8e65d18..085c4f2 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -99,6 +99,81 @@ static int qed_write_header_sync(BDRVQEDState *s)
     return 0;
 }
 
+typedef struct {
+    GenericCB gencb;
+    BDRVQEDState *s;
+    struct iovec iov;
+    QEMUIOVector qiov;
+    int nsectors;
+    uint8_t *buf;
+} QEDWriteHeaderCB;
+
+static void qed_write_header_cb(void *opaque, int ret)
+{
+    QEDWriteHeaderCB *write_header_cb = opaque;
+
+    qemu_vfree(write_header_cb->buf);
+    gencb_complete(write_header_cb, ret);
+}
+
+static void qed_write_header_read_cb(void *opaque, int ret)
+{
+    QEDWriteHeaderCB *write_header_cb = opaque;
+    BDRVQEDState *s = write_header_cb->s;
+    BlockDriverAIOCB *acb;
+
+    if (ret) {
+        qed_write_header_cb(write_header_cb, ret);
+        return;
+    }
+
+    /* Update header */
+    qed_header_cpu_to_le(&s->header, (QEDHeader *)write_header_cb->buf);
+
+    acb = bdrv_aio_writev(s->bs->file, 0, &write_header_cb->qiov,
+                          write_header_cb->nsectors, qed_write_header_cb,
+                          write_header_cb);
+    if (!acb) {
+        qed_write_header_cb(write_header_cb, -EIO);
+    }
+}
+
+/**
+ * Update header in-place (does not rewrite backing filename or other strings)
+ *
+ * This function only updates known header fields in-place and does not affect
+ * extra data after the QED header.
+ */
+static void qed_write_header(BDRVQEDState *s, BlockDriverCompletionFunc cb,
+                             void *opaque)
+{
+    /* We must write full sectors for O_DIRECT but cannot necessarily generate
+     * the data following the header if an unrecognized compat feature is
+     * active.  Therefore, first read the sectors containing the header, update
+     * them, and write back.
+     */
+
+    BlockDriverAIOCB *acb;
+    int nsectors = (sizeof(QEDHeader) + BDRV_SECTOR_SIZE - 1) /
+                   BDRV_SECTOR_SIZE;
+    size_t len = nsectors * BDRV_SECTOR_SIZE;
+    QEDWriteHeaderCB *write_header_cb = gencb_alloc(sizeof(*write_header_cb),
+                                                    cb, opaque);
+
+    write_header_cb->s = s;
+    write_header_cb->nsectors = nsectors;
+    write_header_cb->buf = qemu_blockalign(s->bs, len);
+    write_header_cb->iov.iov_base = write_header_cb->buf;
+    write_header_cb->iov.iov_len = len;
+    qemu_iovec_init_external(&write_header_cb->qiov, &write_header_cb->iov, 1);
+
+    acb = bdrv_aio_readv(s->bs->file, 0, &write_header_cb->qiov, nsectors,
+                         qed_write_header_read_cb, write_header_cb);
+    if (!acb) {
+        qed_write_header_cb(write_header_cb, -EIO);
+    }
+}
+
 static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t table_size)
 {
     uint64_t table_entries;
@@ -310,6 +385,32 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
 
     ret = qed_read_l1_table_sync(s);
     if (ret) {
+        goto out;
+    }
+
+    /* If image was not closed cleanly, check consistency */
+    if (s->header.features & QED_F_NEED_CHECK) {
+        /* Read-only images cannot be fixed.  There is no risk of corruption
+         * since write operations are not possible.  Therefore, allow
+         * potentially inconsistent images to be opened read-only.  This can
+         * aid data recovery from an otherwise inconsistent image.
+         */
+        if (!bdrv_is_read_only(bs->file)) {
+            BdrvCheckResult result = {0};
+
+            ret = qed_check(s, &result, true);
+            if (!ret && !result.corruptions && !result.check_errors) {
+                /* Ensure fixes reach storage before clearing check bit */
+                bdrv_flush(s->bs);
+
+                s->header.features &= ~QED_F_NEED_CHECK;
+                qed_write_header_sync(s);
+            }
+        }
+    }
+
+out:
+    if (ret) {
         qed_free_l2_cache(&s->l2_cache);
         qemu_vfree(s->l1_table);
     }
@@ -320,6 +421,15 @@ static void bdrv_qed_close(BlockDriverState *bs)
 {
     BDRVQEDState *s = bs->opaque;
 
+    /* Ensure writes reach stable storage */
+    bdrv_flush(bs->file);
+
+    /* Clean shutdown, no check required on next open */
+    if (s->header.features & QED_F_NEED_CHECK) {
+        s->header.features &= ~QED_F_NEED_CHECK;
+        qed_write_header_sync(s);
+    }
+
     qed_free_l2_cache(&s->l2_cache);
     qemu_vfree(s->l1_table);
 }
@@ -885,8 +995,15 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
-    /* Write new cluster */
-    qed_aio_write_prefill(acb, 0);
+    /* Write new cluster if the image is already marked dirty */
+    if (s->header.features & QED_F_NEED_CHECK) {
+        qed_aio_write_prefill(acb, 0);
+        return;
+    }
+
+    /* Mark the image dirty before writing the new cluster */
+    s->header.features |= QED_F_NEED_CHECK;
+    qed_write_header(s, qed_aio_write_prefill, acb);
 }
 
 /**
@@ -1172,7 +1289,9 @@ static int bdrv_qed_change_backing_file(BlockDriverState *bs,
 
 static int bdrv_qed_check(BlockDriverState *bs, BdrvCheckResult *result)
 {
-    return -ENOTSUP;
+    BDRVQEDState *s = bs->opaque;
+
+    return qed_check(s, result, false);
 }
 
 static QEMUOptionParameter qed_create_options[] = {
diff --git a/block/qed.h b/block/qed.h
index 046a410..2925e37 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -50,11 +50,15 @@ enum {
     /* The image supports a backing file */
     QED_F_BACKING_FILE = 0x01,
 
+    /* The image needs a consistency check before use */
+    QED_F_NEED_CHECK = 0x02,
+
     /* The backing file format must not be probed, treat as raw image */
     QED_F_BACKING_FORMAT_NO_PROBE = 0x04,
 
     /* Feature bits must be used when the on-disk format changes */
     QED_FEATURE_MASK = QED_F_BACKING_FILE | /* supported feature bits */
+                       QED_F_NEED_CHECK |
                        QED_F_BACKING_FORMAT_NO_PROBE,
     QED_COMPAT_FEATURE_MASK = 0,            /* supported compat feature bits */
     QED_AUTOCLEAR_FEATURE_MASK = 0,         /* supported autoclear feature bits */
-- 
1.7.2.3


* [Qemu-devel] [PATCH 36/38] block/qcow2.c: rename qcow_ functions to qcow2_
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

It doesn't really make sense for functions in qcow2.c to be named
qcow_, so rename them to use the qcow2_ prefix.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c  |    6 +-
 block/qcow2-snapshot.c |    6 +-
 block/qcow2.c          |  190 +++++++++++++++++++++++++-----------------------
 3 files changed, 104 insertions(+), 98 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b040208..6928c63 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -352,8 +352,8 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
 }
 
 
-static int qcow_read(BlockDriverState *bs, int64_t sector_num,
-                     uint8_t *buf, int nb_sectors)
+static int qcow2_read(BlockDriverState *bs, int64_t sector_num,
+                      uint8_t *buf, int nb_sectors)
 {
     BDRVQcowState *s = bs->opaque;
     int ret, index_in_cluster, n, n1;
@@ -419,7 +419,7 @@ static int copy_sectors(BlockDriverState *bs, uint64_t start_sect,
     if (n <= 0)
         return 0;
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
-    ret = qcow_read(bs, start_sect + n_start, s->cluster_data, n);
+    ret = qcow2_read(bs, start_sect + n_start, s->cluster_data, n);
     if (ret < 0)
         return ret;
     if (s->crypt_method) {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index aacf357..74823a5 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -116,7 +116,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 }
 
 /* add at the end of the file a new list of snapshots */
-static int qcow_write_snapshots(BlockDriverState *bs)
+static int qcow2_write_snapshots(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     QCowSnapshot *sn;
@@ -300,7 +300,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     s->snapshots = snapshots1;
     s->snapshots[s->nb_snapshots++] = *sn;
 
-    if (qcow_write_snapshots(bs) < 0)
+    if (qcow2_write_snapshots(bs) < 0)
         goto fail;
 #ifdef DEBUG_ALLOC
     qcow2_check_refcounts(bs);
@@ -378,7 +378,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
     qemu_free(sn->name);
     memmove(sn, sn + 1, (s->nb_snapshots - snapshot_index - 1) * sizeof(*sn));
     s->nb_snapshots--;
-    ret = qcow_write_snapshots(bs);
+    ret = qcow2_write_snapshots(bs);
     if (ret < 0) {
         /* XXX: restore snapshot if error ? */
         return ret;
diff --git a/block/qcow2.c b/block/qcow2.c
index 537c479..4b41190 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -50,10 +50,10 @@ typedef struct {
     uint32_t magic;
     uint32_t len;
 } QCowExtension;
-#define  QCOW_EXT_MAGIC_END 0
-#define  QCOW_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
+#define  QCOW2_EXT_MAGIC_END 0
+#define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 
-static int qcow_probe(const uint8_t *buf, int buf_size, const char *filename)
+static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     const QCowHeader *cow_header = (const void *)buf;
 
@@ -73,14 +73,14 @@ static int qcow_probe(const uint8_t *buf, int buf_size, const char *filename)
  * unknown magic is skipped (future extension this version knows nothing about)
  * return 0 upon success, non-0 otherwise
  */
-static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
-                                uint64_t end_offset)
+static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
+                                 uint64_t end_offset)
 {
     QCowExtension ext;
     uint64_t offset;
 
 #ifdef DEBUG_EXT
-    printf("qcow_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
+    printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
 #endif
     offset = start_offset;
     while (offset < end_offset) {
@@ -88,13 +88,13 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 #ifdef DEBUG_EXT
         /* Sanity check */
         if (offset > s->cluster_size)
-            printf("qcow_handle_extension: suspicious offset %lu\n", offset);
+            printf("qcow2_read_extension: suspicious offset %lu\n", offset);
 
         printf("attemting to read extended header in offset %lu\n", offset);
 #endif
 
         if (bdrv_pread(bs->file, offset, &ext, sizeof(ext)) != sizeof(ext)) {
-            fprintf(stderr, "qcow_handle_extension: ERROR: "
+            fprintf(stderr, "qcow2_read_extension: ERROR: "
                     "pread fail from offset %" PRIu64 "\n",
                     offset);
             return 1;
@@ -106,10 +106,10 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
         printf("ext.magic = 0x%x\n", ext.magic);
 #endif
         switch (ext.magic) {
-        case QCOW_EXT_MAGIC_END:
+        case QCOW2_EXT_MAGIC_END:
             return 0;
 
-        case QCOW_EXT_MAGIC_BACKING_FORMAT:
+        case QCOW2_EXT_MAGIC_BACKING_FORMAT:
             if (ext.len >= sizeof(bs->backing_format)) {
                 fprintf(stderr, "ERROR: ext_backing_format: len=%u too large"
                         " (>=%zu)\n",
@@ -137,7 +137,7 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 }
 
 
-static int qcow_open(BlockDriverState *bs, int flags)
+static int qcow2_open(BlockDriverState *bs, int flags)
 {
     BDRVQcowState *s = bs->opaque;
     int len, i;
@@ -222,7 +222,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
         ext_end = header.backing_file_offset;
     else
         ext_end = s->cluster_size;
-    if (qcow_read_extensions(bs, sizeof(header), ext_end))
+    if (qcow2_read_extensions(bs, sizeof(header), ext_end))
         goto fail;
 
     /* read the backing file name */
@@ -252,7 +252,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
     return -1;
 }
 
-static int qcow_set_key(BlockDriverState *bs, const char *key)
+static int qcow2_set_key(BlockDriverState *bs, const char *key)
 {
     BDRVQcowState *s = bs->opaque;
     uint8_t keybuf[16];
@@ -294,8 +294,8 @@ static int qcow_set_key(BlockDriverState *bs, const char *key)
     return 0;
 }
 
-static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                             int nb_sectors, int *pnum)
+static int qcow2_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                              int nb_sectors, int *pnum)
 {
     uint64_t cluster_offset;
     int ret;
@@ -344,7 +344,7 @@ typedef struct QCowAIOCB {
     QLIST_ENTRY(QCowAIOCB) next_depend;
 } QCowAIOCB;
 
-static void qcow_aio_cancel(BlockDriverAIOCB *blockacb)
+static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb)
 {
     QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common);
     if (acb->hd_aiocb)
@@ -352,21 +352,21 @@ static void qcow_aio_cancel(BlockDriverAIOCB *blockacb)
     qemu_aio_release(acb);
 }
 
-static AIOPool qcow_aio_pool = {
+static AIOPool qcow2_aio_pool = {
     .aiocb_size         = sizeof(QCowAIOCB),
-    .cancel             = qcow_aio_cancel,
+    .cancel             = qcow2_aio_cancel,
 };
 
-static void qcow_aio_read_cb(void *opaque, int ret);
-static void qcow_aio_read_bh(void *opaque)
+static void qcow2_aio_read_cb(void *opaque, int ret);
+static void qcow2_aio_read_bh(void *opaque)
 {
     QCowAIOCB *acb = opaque;
     qemu_bh_delete(acb->bh);
     acb->bh = NULL;
-    qcow_aio_read_cb(opaque, 0);
+    qcow2_aio_read_cb(opaque, 0);
 }
 
-static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
+static int qcow2_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
 {
     if (acb->bh)
         return -EIO;
@@ -380,7 +380,7 @@ static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
     return 0;
 }
 
-static void qcow_aio_read_cb(void *opaque, int ret)
+static void qcow2_aio_read_cb(void *opaque, int ret)
 {
     QCowAIOCB *acb = opaque;
     BlockDriverState *bs = acb->common.bs;
@@ -447,18 +447,18 @@ static void qcow_aio_read_cb(void *opaque, int ret)
                 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
                 acb->hd_aiocb = bdrv_aio_readv(bs->backing_hd, acb->sector_num,
                                     &acb->hd_qiov, acb->cur_nr_sectors,
-				    qcow_aio_read_cb, acb);
+				    qcow2_aio_read_cb, acb);
                 if (acb->hd_aiocb == NULL)
                     goto done;
             } else {
-                ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
+                ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
                 if (ret < 0)
                     goto done;
             }
         } else {
             /* Note: in this case, no need to wait */
             qemu_iovec_memset(&acb->hd_qiov, 0, 512 * acb->cur_nr_sectors);
-            ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
+            ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
             if (ret < 0)
                 goto done;
         }
@@ -471,7 +471,7 @@ static void qcow_aio_read_cb(void *opaque, int ret)
             s->cluster_cache + index_in_cluster * 512,
             512 * acb->cur_nr_sectors);
 
-        ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
+        ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
         if (ret < 0)
             goto done;
     } else {
@@ -501,7 +501,7 @@ static void qcow_aio_read_cb(void *opaque, int ret)
         acb->hd_aiocb = bdrv_aio_readv(bs->file,
                             (acb->cluster_offset >> 9) + index_in_cluster,
                             &acb->hd_qiov, acb->cur_nr_sectors,
-                            qcow_aio_read_cb, acb);
+                            qcow2_aio_read_cb, acb);
         if (acb->hd_aiocb == NULL) {
             ret = -EIO;
             goto done;
@@ -515,13 +515,14 @@ done:
     qemu_aio_release(acb);
 }
 
-static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs,
-        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        BlockDriverCompletionFunc *cb, void *opaque, int is_write)
+static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num,
+                                  QEMUIOVector *qiov, int nb_sectors,
+                                  BlockDriverCompletionFunc *cb,
+                                  void *opaque, int is_write)
 {
     QCowAIOCB *acb;
 
-    acb = qemu_aio_get(&qcow_aio_pool, bs, cb, opaque);
+    acb = qemu_aio_get(&qcow2_aio_pool, bs, cb, opaque);
     if (!acb)
         return NULL;
     acb->hd_aiocb = NULL;
@@ -539,21 +540,23 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs,
     return acb;
 }
 
-static BlockDriverAIOCB *qcow_aio_readv(BlockDriverState *bs,
-        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        BlockDriverCompletionFunc *cb, void *opaque)
+static BlockDriverAIOCB *qcow2_aio_readv(BlockDriverState *bs,
+                                         int64_t sector_num,
+                                         QEMUIOVector *qiov, int nb_sectors,
+                                         BlockDriverCompletionFunc *cb,
+                                         void *opaque)
 {
     QCowAIOCB *acb;
 
-    acb = qcow_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+    acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
     if (!acb)
         return NULL;
 
-    qcow_aio_read_cb(acb, 0);
+    qcow2_aio_read_cb(acb, 0);
     return &acb->common;
 }
 
-static void qcow_aio_write_cb(void *opaque, int ret);
+static void qcow2_aio_write_cb(void *opaque, int ret);
 
 static void run_dependent_requests(QCowL2Meta *m)
 {
@@ -567,14 +570,14 @@ static void run_dependent_requests(QCowL2Meta *m)
 
     /* Restart all dependent requests */
     QLIST_FOREACH_SAFE(req, &m->dependent_requests, next_depend, next) {
-        qcow_aio_write_cb(req, 0);
+        qcow2_aio_write_cb(req, 0);
     }
 
     /* Empty the list for the next part of the request */
     QLIST_INIT(&m->dependent_requests);
 }
 
-static void qcow_aio_write_cb(void *opaque, int ret)
+static void qcow2_aio_write_cb(void *opaque, int ret)
 {
     QCowAIOCB *acb = opaque;
     BlockDriverState *bs = acb->common.bs;
@@ -651,7 +654,7 @@ static void qcow_aio_write_cb(void *opaque, int ret)
     acb->hd_aiocb = bdrv_aio_writev(bs->file,
                                     (acb->cluster_offset >> 9) + index_in_cluster,
                                     &acb->hd_qiov, acb->cur_nr_sectors,
-                                    qcow_aio_write_cb, acb);
+                                    qcow2_aio_write_cb, acb);
     if (acb->hd_aiocb == NULL) {
         ret = -EIO;
         goto fail;
@@ -669,24 +672,26 @@ done:
     qemu_aio_release(acb);
 }
 
-static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
-        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-        BlockDriverCompletionFunc *cb, void *opaque)
+static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          QEMUIOVector *qiov, int nb_sectors,
+                                          BlockDriverCompletionFunc *cb,
+                                          void *opaque)
 {
     BDRVQcowState *s = bs->opaque;
     QCowAIOCB *acb;
 
     s->cluster_cache_offset = -1; /* disable compressed cache */
 
-    acb = qcow_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+    acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
     if (!acb)
         return NULL;
 
-    qcow_aio_write_cb(acb, 0);
+    qcow2_aio_write_cb(acb, 0);
     return &acb->common;
 }
 
-static void qcow_close(BlockDriverState *bs)
+static void qcow2_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     qemu_free(s->l1_table);
@@ -721,7 +726,7 @@ static int qcow2_update_ext_header(BlockDriverState *bs,
     /* Prepare the backing file format extension if needed */
     if (backing_fmt) {
         ext_backing_fmt.len = cpu_to_be32(strlen(backing_fmt));
-        ext_backing_fmt.magic = cpu_to_be32(QCOW_EXT_MAGIC_BACKING_FORMAT);
+        ext_backing_fmt.magic = cpu_to_be32(QCOW2_EXT_MAGIC_BACKING_FORMAT);
         backing_fmt_len = ((sizeof(ext_backing_fmt)
             + strlen(backing_fmt) + 7) & ~7);
     }
@@ -848,10 +853,10 @@ static int preallocate(BlockDriverState *bs)
     return 0;
 }
 
-static int qcow_create2(const char *filename, int64_t total_size,
-                        const char *backing_file, const char *backing_format,
-                        int flags, size_t cluster_size, int prealloc,
-                        QEMUOptionParameter *options)
+static int qcow2_create2(const char *filename, int64_t total_size,
+                         const char *backing_file, const char *backing_format,
+                         int flags, size_t cluster_size, int prealloc,
+                         QEMUOptionParameter *options)
 {
     /* Calulate cluster_bits */
     int cluster_bits;
@@ -974,7 +979,7 @@ out:
     return ret;
 }
 
-static int qcow_create(const char *filename, QEMUOptionParameter *options)
+static int qcow2_create(const char *filename, QEMUOptionParameter *options)
 {
     const char *backing_file = NULL;
     const char *backing_fmt = NULL;
@@ -1017,11 +1022,11 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
         return -EINVAL;
     }
 
-    return qcow_create2(filename, sectors, backing_file, backing_fmt, flags,
-        cluster_size, prealloc, options);
+    return qcow2_create2(filename, sectors, backing_file, backing_fmt, flags,
+                         cluster_size, prealloc, options);
 }
 
-static int qcow_make_empty(BlockDriverState *bs)
+static int qcow2_make_empty(BlockDriverState *bs)
 {
 #if 0
     /* XXX: not correct */
@@ -1080,8 +1085,8 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
 
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
-static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                 const uint8_t *buf, int nb_sectors)
+static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                                  const uint8_t *buf, int nb_sectors)
 {
     BDRVQcowState *s = bs->opaque;
     z_stream strm;
@@ -1148,32 +1153,33 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
     return 0;
 }
 
-static int qcow_flush(BlockDriverState *bs)
+static int qcow2_flush(BlockDriverState *bs)
 {
     return bdrv_flush(bs->file);
 }
 
-static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
-         BlockDriverCompletionFunc *cb, void *opaque)
+static BlockDriverAIOCB *qcow2_aio_flush(BlockDriverState *bs,
+                                         BlockDriverCompletionFunc *cb,
+                                         void *opaque)
 {
     return bdrv_aio_flush(bs->file, cb, opaque);
 }
 
-static int64_t qcow_vm_state_offset(BDRVQcowState *s)
+static int64_t qcow2_vm_state_offset(BDRVQcowState *s)
 {
 	return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
 }
 
-static int qcow_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
     BDRVQcowState *s = bs->opaque;
     bdi->cluster_size = s->cluster_size;
-    bdi->vm_state_offset = qcow_vm_state_offset(s);
+    bdi->vm_state_offset = qcow2_vm_state_offset(s);
     return 0;
 }
 
 
-static int qcow_check(BlockDriverState *bs, BdrvCheckResult *result)
+static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result)
 {
     return qcow2_check_refcounts(bs, result);
 }
@@ -1199,8 +1205,8 @@ static void dump_refcounts(BlockDriverState *bs)
 }
 #endif
 
-static int qcow_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
-                           int64_t pos, int size)
+static int qcow2_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
+                              int64_t pos, int size)
 {
     BDRVQcowState *s = bs->opaque;
     int growable = bs->growable;
@@ -1208,14 +1214,14 @@ static int qcow_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 
     BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_SAVE);
     bs->growable = 1;
-    ret = bdrv_pwrite(bs, qcow_vm_state_offset(s) + pos, buf, size);
+    ret = bdrv_pwrite(bs, qcow2_vm_state_offset(s) + pos, buf, size);
     bs->growable = growable;
 
     return ret;
 }
 
-static int qcow_load_vmstate(BlockDriverState *bs, uint8_t *buf,
-                           int64_t pos, int size)
+static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
+                              int64_t pos, int size)
 {
     BDRVQcowState *s = bs->opaque;
     int growable = bs->growable;
@@ -1223,13 +1229,13 @@ static int qcow_load_vmstate(BlockDriverState *bs, uint8_t *buf,
 
     BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD);
     bs->growable = 1;
-    ret = bdrv_pread(bs, qcow_vm_state_offset(s) + pos, buf, size);
+    ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size);
     bs->growable = growable;
 
     return ret;
 }
 
-static QEMUOptionParameter qcow_create_options[] = {
+static QEMUOptionParameter qcow2_create_options[] = {
     {
         .name = BLOCK_OPT_SIZE,
         .type = OPT_SIZE,
@@ -1264,38 +1270,38 @@ static QEMUOptionParameter qcow_create_options[] = {
 };
 
 static BlockDriver bdrv_qcow2 = {
-    .format_name	= "qcow2",
-    .instance_size	= sizeof(BDRVQcowState),
-    .bdrv_probe		= qcow_probe,
-    .bdrv_open		= qcow_open,
-    .bdrv_close		= qcow_close,
-    .bdrv_create	= qcow_create,
-    .bdrv_flush		= qcow_flush,
-    .bdrv_is_allocated	= qcow_is_allocated,
-    .bdrv_set_key	= qcow_set_key,
-    .bdrv_make_empty	= qcow_make_empty,
-
-    .bdrv_aio_readv	= qcow_aio_readv,
-    .bdrv_aio_writev	= qcow_aio_writev,
-    .bdrv_aio_flush	= qcow_aio_flush,
+    .format_name        = "qcow2",
+    .instance_size      = sizeof(BDRVQcowState),
+    .bdrv_probe         = qcow2_probe,
+    .bdrv_open          = qcow2_open,
+    .bdrv_close         = qcow2_close,
+    .bdrv_create        = qcow2_create,
+    .bdrv_flush         = qcow2_flush,
+    .bdrv_is_allocated  = qcow2_is_allocated,
+    .bdrv_set_key       = qcow2_set_key,
+    .bdrv_make_empty    = qcow2_make_empty,
+
+    .bdrv_aio_readv     = qcow2_aio_readv,
+    .bdrv_aio_writev    = qcow2_aio_writev,
+    .bdrv_aio_flush     = qcow2_aio_flush,
 
     .bdrv_truncate          = qcow2_truncate,
-    .bdrv_write_compressed  = qcow_write_compressed,
+    .bdrv_write_compressed  = qcow2_write_compressed,
 
     .bdrv_snapshot_create   = qcow2_snapshot_create,
     .bdrv_snapshot_goto     = qcow2_snapshot_goto,
     .bdrv_snapshot_delete   = qcow2_snapshot_delete,
     .bdrv_snapshot_list     = qcow2_snapshot_list,
     .bdrv_snapshot_load_tmp     = qcow2_snapshot_load_tmp,
-    .bdrv_get_info	= qcow_get_info,
+    .bdrv_get_info      = qcow2_get_info,
 
-    .bdrv_save_vmstate    = qcow_save_vmstate,
-    .bdrv_load_vmstate    = qcow_load_vmstate,
+    .bdrv_save_vmstate    = qcow2_save_vmstate,
+    .bdrv_load_vmstate    = qcow2_load_vmstate,
 
     .bdrv_change_backing_file   = qcow2_change_backing_file,
 
-    .create_options = qcow_create_options,
-    .bdrv_check = qcow_check,
+    .create_options = qcow2_create_options,
+    .bdrv_check = qcow2_check,
 };
 
 static void bdrv_qcow2_init(void)
-- 
1.7.2.3


* [Qemu-devel] [PATCH 37/38] Add proper -errno error return values to qcow2_open()
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Jes Sorensen <Jes.Sorensen@redhat.com>

Return proper -errno values from qcow2_open() instead of a bare -1.
In addition, this adds the missing braces to the function to be
consistent with the coding style.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
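
A minimal sketch of the return-value convention that qcow2_open() is
moved to here; it is not code from the patch. All demo_* names and
DEMO_* constants are hypothetical stand-ins. The only behaviour assumed
of the real read helper is what the old "!= sizeof(header)" comparison
implies: a successful read reports a byte count, a failure a negative
value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for illustration only, not the qcow2 values. */
#define DEMO_MAGIC   0x44454d4fu
#define DEMO_VERSION 2u

struct demo_header {
    uint32_t magic;
    uint32_t version;
};

/*
 * Hypothetical reader: copies len bytes from an in-memory "image".
 * Returns the number of bytes read on success, or -errno on failure.
 */
static int demo_pread(const void *src, size_t src_len, void *dst, size_t len)
{
    assert(dst != NULL);
    if (src == NULL || src_len < len) {
        return -EIO;
    }
    memcpy(dst, src, len);
    return (int)len;
}

/* Open routine that reports *why* it failed instead of a bare -1. */
static int demo_open(const void *image, size_t image_len)
{
    struct demo_header header;
    int ret;

    ret = demo_pread(image, image_len, &header, sizeof(header));
    if (ret < 0) {
        goto fail;              /* pass the reader's -errno through */
    }
    if (header.magic != DEMO_MAGIC || header.version != DEMO_VERSION) {
        ret = -EINVAL;          /* malformed header */
        goto fail;
    }
    return 0;                   /* success is 0, not the last byte count */

fail:
    /* resource cleanup would go here */
    return ret;
}
```

Callers can then tell an I/O failure (-EIO) from a malformed image
(-EINVAL). Because the reader reports a positive byte count on success,
the sketch returns a literal 0 on the success path rather than whatever
was last stored in ret.
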
 block/qcow2.c |   60 +++++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4b41190..b6b094c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -140,12 +140,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 static int qcow2_open(BlockDriverState *bs, int flags)
 {
     BDRVQcowState *s = bs->opaque;
-    int len, i;
+    int len, i, ret = 0;
     QCowHeader header;
     uint64_t ext_end;
 
-    if (bdrv_pread(bs->file, 0, &header, sizeof(header)) != sizeof(header))
+    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+    if (ret < 0) {
         goto fail;
+    }
     be32_to_cpus(&header.magic);
     be32_to_cpus(&header.version);
     be64_to_cpus(&header.backing_file_offset);
@@ -160,16 +162,23 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     be64_to_cpus(&header.snapshots_offset);
     be32_to_cpus(&header.nb_snapshots);
 
-    if (header.magic != QCOW_MAGIC || header.version != QCOW_VERSION)
+    if (header.magic != QCOW_MAGIC || header.version != QCOW_VERSION) {
+        ret = -EINVAL;
         goto fail;
+    }
     if (header.cluster_bits < MIN_CLUSTER_BITS ||
-        header.cluster_bits > MAX_CLUSTER_BITS)
+        header.cluster_bits > MAX_CLUSTER_BITS) {
+        ret = -EINVAL;
         goto fail;
-    if (header.crypt_method > QCOW_CRYPT_AES)
+    }
+    if (header.crypt_method > QCOW_CRYPT_AES) {
+        ret = -EINVAL;
         goto fail;
+    }
     s->crypt_method_header = header.crypt_method;
-    if (s->crypt_method_header)
+    if (s->crypt_method_header) {
         bs->encrypted = 1;
+    }
     s->cluster_bits = header.cluster_bits;
     s->cluster_size = 1 << s->cluster_bits;
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
@@ -191,15 +200,19 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->l1_vm_state_index = size_to_l1(s, header.size);
     /* the L1 table must contain at least enough entries to put
        header.size bytes */
-    if (s->l1_size < s->l1_vm_state_index)
+    if (s->l1_size < s->l1_vm_state_index) {
+        ret = -EINVAL;
         goto fail;
+    }
     s->l1_table_offset = header.l1_table_offset;
     if (s->l1_size > 0) {
         s->l1_table = qemu_mallocz(
             align_offset(s->l1_size * sizeof(uint64_t), 512));
-        if (bdrv_pread(bs->file, s->l1_table_offset, s->l1_table, s->l1_size * sizeof(uint64_t)) !=
-            s->l1_size * sizeof(uint64_t))
+        ret = bdrv_pread(bs->file, s->l1_table_offset, s->l1_table,
+                         s->l1_size * sizeof(uint64_t));
+        if (ret < 0) {
             goto fail;
+        }
         for(i = 0;i < s->l1_size; i++) {
             be64_to_cpus(&s->l1_table[i]);
         }
@@ -212,35 +225,46 @@ static int qcow2_open(BlockDriverState *bs, int flags)
                                   + 512);
     s->cluster_cache_offset = -1;
 
-    if (qcow2_refcount_init(bs) < 0)
+    ret = qcow2_refcount_init(bs);
+    if (ret != 0) {
         goto fail;
+    }
 
     QLIST_INIT(&s->cluster_allocs);
 
     /* read qcow2 extensions */
-    if (header.backing_file_offset)
+    if (header.backing_file_offset) {
         ext_end = header.backing_file_offset;
-    else
+    } else {
         ext_end = s->cluster_size;
-    if (qcow2_read_extensions(bs, sizeof(header), ext_end))
+    }
+    if (qcow2_read_extensions(bs, sizeof(header), ext_end)) {
+        ret = -EINVAL;
         goto fail;
+    }
 
     /* read the backing file name */
     if (header.backing_file_offset != 0) {
         len = header.backing_file_size;
-        if (len > 1023)
+        if (len > 1023) {
             len = 1023;
-        if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
+        }
+        ret = bdrv_pread(bs->file, header.backing_file_offset,
+                         bs->backing_file, len);
+        if (ret < 0) {
             goto fail;
+        }
         bs->backing_file[len] = '\0';
     }
-    if (qcow2_read_snapshots(bs) < 0)
+    if (qcow2_read_snapshots(bs) < 0) {
+        ret = -EINVAL;
         goto fail;
+    }
 
 #ifdef DEBUG_ALLOC
     qcow2_check_refcounts(bs);
 #endif
-    return 0;
+    return ret;
 
  fail:
     qcow2_free_snapshots(bs);
@@ -249,7 +273,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     qemu_free(s->l2_cache);
     qemu_free(s->cluster_cache);
     qemu_free(s->cluster_data);
-    return -1;
+    return ret;
 }
 
 static int qcow2_set_key(BlockDriverState *bs, const char *key)
-- 
1.7.2.3


* [Qemu-devel] [PATCH 38/38] docs: Fix missing carets in QED specification
From: Kevin Wolf @ 2010-12-17 17:44 UTC (permalink / raw)
  To: anthony; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

For some reason the carets ('^') in the QED specification disappeared.
This patch puts them back.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
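
To illustrate why the carets matter, here is a small sketch of the
header constraints as the corrected text states them. The demo_*
helpers are hypothetical and are not taken from the QED driver. With
the carets missing, the range would read [212, 226], and neither bound
is even a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool demo_is_power_of_2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* cluster_size must be a power of 2 in range [2^12, 2^26]. */
static bool demo_cluster_size_valid(uint32_t cluster_size)
{
    return demo_is_power_of_2(cluster_size) &&
           cluster_size >= (UINT32_C(1) << 12) &&
           cluster_size <= (UINT32_C(1) << 26);
}

/* table_size must be a power of 2 in range [1, 16]. */
static bool demo_table_size_valid(uint32_t table_size)
{
    return demo_is_power_of_2(table_size) && table_size <= 16;
}

/*
 * Offsets must be aligned to cluster_size, so at least the low 12 bits
 * are zero; larger clusters leave more low bits unused.
 */
static bool demo_offset_aligned(uint64_t offset, uint32_t cluster_size)
{
    assert(demo_cluster_size_valid(cluster_size));
    return (offset & ((uint64_t)cluster_size - 1)) == 0;
}
```

The alignment check relies on cluster_size being a power of 2, which is
why the sketch asserts validity before masking.
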
 docs/specs/qed_spec.txt |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
index 446b5a2..1d5fa87 100644
--- a/docs/specs/qed_spec.txt
+++ b/docs/specs/qed_spec.txt
@@ -33,7 +33,7 @@ All fields are little-endian.
  }
 
 Field descriptions:
-* ''cluster_size'' must be a power of 2 in range [212, 226].
+* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
 * ''table_size'' must be a power of 2 in range [1, 16].
 * ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters.
 * ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps.  They work as follows:
@@ -90,7 +90,7 @@ L1, L2, and data cluster offsets must be aligned to header.cluster_size.  The fo
 ===Data cluster offsets===
 * 0 - unallocated.  The data cluster is not yet allocated.
 
-Future format extensions may wish to store per-offset information.  The least significant 12 bits of an offset are reserved for this purpose and must be set to zero.  Image files with cluster_size > 212 will have more unused bits which should also be zeroed.
+Future format extensions may wish to store per-offset information.  The least significant 12 bits of an offset are reserved for this purpose and must be set to zero.  Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.
 
 ===Unallocated L2 tables and data clusters===
 Reads to an unallocated area of the image file access the backing file.  If there is no backing file, then zeroes are produced.  The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.
-- 
1.7.2.3


* [Qemu-devel] Re: [PULL 00/38] Block patches
From: Anthony Liguori @ 2010-12-17 17:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On 12/17/2010 11:44 AM, Kevin Wolf wrote:
> The following changes since commit 9d861fa595c93f22d1d55b723a691531c36c9672:
>
>    Merge remote branch 'arm/for-anthony' into staging (2010-12-17 08:25:17 -0600)
>
> are available in the git repository at:
>
>    git://repo.or.cz/qemu/kevin.git for-anthony
>    

Pulled.  Thanks.

Regards,

Anthony Liguori

> Alexander Graf (10):
>        ide: split ide command interpretation off
>        ide: fix whitespace gap in ide_exec_cmd
>        ide: Split out BMDMA code from ATA core
>        ide: move transfer_start after variable modification
>        pci: add storage class for sata
>        pci: add ich9 pci id
>        ahci: add ahci emulation
>        config: move ide core and pci to pci.mak
>        config: add ahci for pci capable machines
>        ide: honor ncq for atapi
>
> Christoph Hellwig (3):
>        block: add discard support
>        scsi-disk: support WRITE SAME (16) with unmap bit
>        raw-posix: add discard support
>
> Jes Sorensen (9):
>        Introduce strtosz_suffix()
>        qemu-img.c: Clean up handling of image size in img_create()
>        qemu-img.c: Re-factor img_create()
>        Introduce do_snapshot_blkdev() and monitor command to handle it.
>        Prevent creating an image with the same filename as backing file
>        bdrv_img_create() use proper errno return values
>        qemu.img.c: Use error_report() instead of own error() implementation
>        block/qcow2.c: rename qcow_ functions to qcow2_
>        Add proper -errno error return values to qcow2_open()
>
> Kevin Wolf (3):
>        qemu-img: Call error_set_progname
>        Remove NULL checks for bdrv_new return value
>        qemu-io: Fix typo in help texts
>
> Roland Elek (1):
>        ide: add ncq identify data for ahci sata drives
>
> Ryan Harper (1):
>        blockdev: check dinfo ptr before using
>
> Sebastian Herbszt (1):
>        ahci: set SATA Mode Select
>
> Stefan Hajnoczi (10):
>        block: Introduce path_has_protocol() function
>        block: Fix the use of protocols in backing files
>        ide: Register vm change state handler once only
>        qemu-io: Add discard command
>        docs: Add QED image format specification
>        qed: Add QEMU Enhanced Disk image format
>        qed: Table, L2 cache, and cluster functions
>        qed: Read/write support
>        qed: Consistency check support
>        docs: Fix missing carets in QED specification
>
>   Makefile.objs                        |    3 +
>   block.c                              |  234 +++++-
>   block.h                              |    5 +
>   block/qcow2-cluster.c                |    6 +-
>   block/qcow2-snapshot.c               |    6 +-
>   block/qcow2.c                        |  248 +++---
>   block/qed-check.c                    |  210 +++++
>   block/qed-cluster.c                  |  154 ++++
>   block/qed-gencb.c                    |   32 +
>   block/qed-l2-cache.c                 |  173 ++++
>   block/qed-table.c                    |  319 +++++++
>   block/qed.c                          | 1349 ++++++++++++++++++++++++++++++
>   block/qed.h                          |  301 +++++++
>   block/raw-posix.c                    |   45 +
>   block/raw.c                          |    6 +
>   block_int.h                          |    8 +-
>   blockdev.c                           |   68 ++-
>   blockdev.h                           |    1 +
>   configure                            |   26 +
>   cutils.c                             |   17 +-
>   default-configs/arm-softmmu.mak      |    1 -
>   default-configs/i386-softmmu.mak     |    3 -
>   default-configs/mips-softmmu.mak     |    3 -
>   default-configs/mips64-softmmu.mak   |    3 -
>   default-configs/mips64el-softmmu.mak |    3 -
>   default-configs/mipsel-softmmu.mak   |    3 -
>   default-configs/pci.mak              |    4 +
>   default-configs/ppc-softmmu.mak      |    3 -
>   default-configs/ppc64-softmmu.mak    |    3 -
>   default-configs/ppcemb-softmmu.mak   |    3 -
>   default-configs/sh4-softmmu.mak      |    1 -
>   default-configs/sh4eb-softmmu.mak    |    1 -
>   default-configs/sparc64-softmmu.mak  |    3 -
>   default-configs/x86_64-softmmu.mak   |    3 -
>   docs/specs/qed_spec.txt              |  130 +++
>   hmp-commands.hx                      |   19 +
>   hw/ide/ahci.c                        | 1527 ++++++++++++++++++++++++++++++++++
>   hw/ide/cmd646.c                      |   18 +-
>   hw/ide/core.c                        | 1116 +++++++++++--------------
>   hw/ide/internal.h                    |   73 +-
>   hw/ide/pci.c                         |  280 ++++++-
>   hw/ide/pci.h                         |   30 +
>   hw/ide/piix.c                        |   34 +-
>   hw/ide/via.c                         |   34 +-
>   hw/pci.h                             |    1 +
>   hw/pci_ids.h                         |    1 +
>   hw/scsi-defs.h                       |    1 +
>   hw/scsi-disk.c                       |   53 ++-
>   hw/xen_disk.c                        |   17 +-
>   qemu-common.h                        |    7 +
>   qemu-img.c                           |  250 ++----
>   qemu-io.c                            |  102 +++-
>   qemu-nbd.c                           |    2 -
>   trace-events                         |   21 +
>   54 files changed, 5892 insertions(+), 1072 deletions(-)
>   create mode 100644 block/qed-check.c
>   create mode 100644 block/qed-cluster.c
>   create mode 100644 block/qed-gencb.c
>   create mode 100644 block/qed-l2-cache.c
>   create mode 100644 block/qed-table.c
>   create mode 100644 block/qed.c
>   create mode 100644 block/qed.h
>   create mode 100644 docs/specs/qed_spec.txt
>   create mode 100644 hw/ide/ahci.c
>    


