qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	Khoa Huynh <khoa@us.ibm.com>,
	Badari Pulavarty <pbadari@us.ibm.com>,
	Christoph Hellwig <hch@lst.de>
Subject: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
Date: Fri, 15 Apr 2011 14:40:55 +0100	[thread overview]
Message-ID: <1302874855-14736-1-git-send-email-stefanha@linux.vnet.ibm.com> (raw)

The Linux NFS client issues separate NFS requests for vectored direct
I/O writes.  For example, a pwritev() with 8 elements results in 8 write
requests to the server.  This is very inefficient and a kernel-side fix
is not trivial or likely to be available soon.

This patch detects files on NFS and uses the QEMU_AIO_MISALIGNED flag to
force requests to bounce through a linear buffer.

Khoa Huynh <khoa@us.ibm.com> reports the following ffsb benchmark
results over 1 Gbit Ethernet:

Test (threads=8)               unpatched patched
                                  (MB/s)  (MB/s)
Large File Creates (bs=256 KB)      20.5   112.0
Sequential Reads (bs=256 KB)        58.7   112.0
Large File Creates (bs=8 KB)         5.2     5.8
Sequential Reads (bs=8 KB)          46.7    80.9
Random Reads (bs=8 KB)               8.7    23.4
Random Writes (bs=8 KB)             39.6    44.0
Mail Server (bs=8 KB)               10.2    23.6

Test (threads=1)               unpatched patched
                                  (MB/s)  (MB/s)
Large File Creates (bs=256 KB)      14.5    49.8
Sequential Reads (bs=256 KB)        87.9    83.9
Large File Creates (bs=8 KB)         4.8     4.8
Sequential Reads (bs=8 KB)          23.2    23.1
Random Reads (bs=8 KB)               4.8     4.7
Random Writes (bs=8 KB)              9.4    12.8
Mail Server (bs=8 KB)                5.4     7.3

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/raw-posix.c |   55 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..40b7c61 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -49,8 +49,10 @@
 #ifdef __linux__
 #include <sys/ioctl.h>
 #include <sys/param.h>
+#include <sys/vfs.h>
 #include <linux/cdrom.h>
 #include <linux/fd.h>
+#include <linux/magic.h>
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <signal.h>
@@ -124,6 +126,7 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+    bool force_linearize;
 #ifdef CONFIG_XFS
     bool is_xfs : 1;
 #endif
@@ -136,6 +139,32 @@ static int64_t raw_getlength(BlockDriverState *bs);
 static int cdrom_reopen(BlockDriverState *bs);
 #endif
 
+#if defined(__linux__)
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    struct statfs stfs;
+    int ret;
+
+    do {
+        ret = fstatfs(fd, &stfs);
+    } while (ret != 0 && errno == EINTR);
+
+    /*
+     * Linux NFS client splits vectored direct I/O requests into separate NFS
+     * requests so it is faster to submit a single buffer instead.
+     */
+    if (!ret && stfs.f_type == NFS_SUPER_MAGIC && (open_flags & O_DIRECT)) {
+        return true;
+    }
+    return false;
+}
+#else /* !defined(__linux__) */
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    return false;
+}
+#endif
+
 static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
@@ -167,6 +196,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     }
     s->fd = fd;
     s->aligned_buf = NULL;
+    s->force_linearize = is_vectored_io_slow(fd, s->open_flags);
 
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         /*
@@ -536,20 +566,27 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
         return NULL;
 
     /*
+     * Check if buffers need to be copied into a single linear buffer.
+     */
+    if (s->force_linearize && qiov->niov > 1) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
+    /*
      * If O_DIRECT is used the buffer needs to be aligned on a sector
-     * boundary.  Check if this is the case or telll the low-level
+     * boundary.  Check if this is the case or tell the low-level
      * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf) {
-        if (!qiov_is_aligned(bs, qiov)) {
-            type |= QEMU_AIO_MISALIGNED;
+    if (s->aligned_buf && !qiov_is_aligned(bs, qiov)) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
 #ifdef CONFIG_LINUX_AIO
-        } else if (s->use_aio) {
-            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-                               nb_sectors, cb, opaque, type);
-#endif
-        }
+    if (s->use_aio && (type & QEMU_AIO_MISALIGNED) == 0) {
+        return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+                           nb_sectors, cb, opaque, type);
     }
+#endif
 
     return paio_submit(bs, s->fd, sector_num, qiov, nb_sectors,
                        cb, opaque, type);
-- 
1.7.4.1

             reply	other threads:[~2011-04-15 13:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-15 13:40 Stefan Hajnoczi [this message]
2011-04-15 15:05 ` [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS Christoph Hellwig
2011-04-15 15:26   ` Stefan Hajnoczi
2011-04-15 15:34     ` Christoph Hellwig
2011-04-15 16:10       ` Anthony Liguori
2011-04-15 16:17         ` Stefan Hajnoczi
2011-04-15 17:27         ` Christoph Hellwig
2011-04-15 16:23       ` Badari Pulavarty
2011-04-15 17:29         ` Christoph Hellwig
2011-04-15 22:21           ` Badari Pulavarty
2011-04-15 23:00             ` Anthony Liguori
2011-04-15 23:33               ` Badari Pulavarty
2011-04-16  2:05                 ` Christoph Hellwig
2011-04-16  8:46               ` Stefan Hajnoczi
2011-04-16  2:03             ` Christoph Hellwig
2011-04-15 18:09         ` Anthony Liguori
2011-04-15 18:25           ` Badari Pulavarty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1302874855-14736-1-git-send-email-stefanha@linux.vnet.ibm.com \
    --to=stefanha@linux.vnet.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=hch@lst.de \
    --cc=khoa@us.ibm.com \
    --cc=kwolf@redhat.com \
    --cc=pbadari@us.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).