* [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices
@ 2019-07-04 12:43 Maxim Levitsky
  2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Maxim Levitsky @ 2019-07-04 12:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, John Ferlan,
	Max Reitz
Linux block devices, even in O_DIRECT mode, don't expose to userspace any
limit on transfer size / number of segments, whatever limits the underlying
kernel block device may have.
The kernel block layer takes care of enforcing these limits by splitting
the bios.
By imposing these limits in qemu anyway, we force qemu to do the splitting
itself, which introduces various overheads.
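(For illustration, not from the original mail: the kernel-side splitting is
easy to observe. The device name below is just an example; a single direct
read far larger than the device's 128k limit still completes as one
userspace request, because the block layer splits it into suitably sized
bios.)

 # one 4 MiB O_DIRECT read, issued as a single request from userspace
 dd if=/dev/nvme0n1 of=/dev/null bs=4M count=1 iflag=direct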
The overhead is especially visible in the NBD server, where a low max
transfer size on the underlying device forces us to advertise that limit
over NBD as well, increasing the traffic overhead for image conversion,
which benefits from large blocks.
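(Side note, my addition: assuming a qemu-nbd new enough to support --list,
the limits that a running export actually advertises, including the maximum
transfer size, can be checked against the socket used in the test below.)

 # connect as a client and print the export's advertised limits
 qemu-nbd --list -k /tmp/nbd.sock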
More information can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104
I tested this with qemu-img convert, both over NBD and natively, and to my
surprise even native IO performance improved a bit.
(The device on which it was tested is an Intel Optane DC P4800X, for which
the kernel reports a 128k max transfer size.)
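(For reference, not from the original mail: the limits the kernel reports
for a block device can be read from sysfs. The device name is an example;
max_sectors_kb is in KiB, so 128 corresponds to the 128k mentioned above.)

 cat /sys/block/nvme0n1/queue/max_sectors_kb   # current max transfer size, in KiB
 cat /sys/block/nvme0n1/queue/max_segments     # max number of segments per request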
The benchmark:
Images were created using:
Sparse image:  qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o preallocation=metadata  1G / 10G / 100G
The test was:
 echo "convert native:"
 rm -rf /dev/shm/disk.img
 time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img > /dev/zero
 echo "convert via nbd:"
 qemu-nbd -k /tmp/nbd.sock -v  -f qcow2 $FILE -x export --cache=none --aio=native --fork
 rm -rf /dev/shm/disk.img
 time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero
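A minimal wrapper for running the whole matrix, pieced together from the
commands above (the pkill cleanup between rounds is my own assumption and
was not part of the original test):

 DEV=/dev/nvme0n1p3
 for SIZE in 1G 10G 100G; do
     # sparse variant; add -o preallocation=metadata for the allocated one
     qemu-img create -f qcow2 $DEV $SIZE
     echo "convert native ($SIZE):"
     rm -f /dev/shm/disk.img
     time qemu-img convert -p -f qcow2 -O raw -T none $DEV /dev/shm/disk.img
     echo "convert via nbd ($SIZE):"
     qemu-nbd -k /tmp/nbd.sock -v -f qcow2 $DEV -x export --cache=none --aio=native --fork
     rm -f /dev/shm/disk.img
     time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img
     # stop the exporter before the next round
     pkill -f qemu-nbd
 done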
The results:
=========================================
1G sparse image:
 native:
	before: 0.027s
	after: 0.027s
 nbd:
	before: 0.287s
	after: 0.035s
=========================================
100G sparse image:
 native:
	before: 0.028s
	after: 0.028s
 nbd:
	before: 23.796s
	after: 0.109s
=========================================
1G preallocated image:
 native:
       before: 0.454s
       after: 0.427s
 nbd:
       before: 0.649s
       after: 0.546s
The max transfer size / max segment count limits are retained for SCSI
passthrough, because in that case the kernel passes the userspace request
directly to the kernel SCSI driver, bypassing the block layer, so there is
no code left to split such requests.
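(Illustration, not from the thread: in the passthrough case qemu talks to a
SCSI-generic character node, the same path that the sg3_utils tools use, so
the block layer never sees, and therefore cannot split, those requests.
Assumes sg3_utils is installed and /dev/sg0 exists.)

 ls -l /dev/sg0     # 'c': a character device, not a block device
 sg_inq /dev/sg0    # an INQUIRY handed directly to the SCSI driver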
Fam, since you were the original author of the code that added these
limits, could you share your opinion on this? Was there a reason besides
SCSI passthrough?
V2:
*  Manually tested that SCSI passthrough is not broken, using a nested VM
*  As Eric suggested, refactored the area around the fstat.
*  Spelling/grammar fixes
Best regards,
	Maxim Levitsky
Maxim Levitsky (1):
  raw-posix.c - use max transfer length / max segment count only for
    SCSI passthrough
 block/file-posix.c | 54 ++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 26 deletions(-)
-- 
2.17.2
* [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough
  2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
@ 2019-07-04 12:43 ` Maxim Levitsky
  2019-07-10 13:43   ` Maxim Levitsky
  2019-07-11 10:31 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Maxim Levitsky @ 2019-07-04 12:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Maxim Levitsky, John Ferlan,
	Max Reitz
Regular kernel block devices (/dev/sda*, /dev/nvme*, etc.) don't expose
their max transfer size / max segment count hardware limits to userspace;
instead, the kernel block layer takes care of splitting incoming requests
that would violate these limits.
Letting the kernel do the splitting allows qemu to avoid the various
overheads that would otherwise arise from doing it itself.
This is especially visible in the NBD server when it exposes a mostly
empty qcow2 image as a raw file over the network. In that case most of the
remote user's reads never even hit the underlying kernel block device, so
most of the overhead ends up in the NBD traffic, which increases
significantly with a lower max transfer size.
In addition, even for local block device access, performance improves a
bit due to less traffic between qemu and the kernel when large transfer
sizes are used (e.g. for image conversion).
More info can be found at:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/file-posix.c | 54 ++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 26 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index ab05b51a66..4479cc7ab4 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1038,15 +1038,13 @@ static void raw_reopen_abort(BDRVReopenState *state)
     s->reopen_state = NULL;
 }
 
-static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
+static int sg_get_max_transfer_length(int fd)
 {
 #ifdef BLKSECTGET
     int max_bytes = 0;
-    short max_sectors = 0;
-    if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
+
+    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
         return max_bytes;
-    } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
-        return max_sectors << BDRV_SECTOR_BITS;
     } else {
         return -errno;
     }
@@ -1055,25 +1053,31 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
 #endif
 }
 
-static int hdev_get_max_segments(const struct stat *st)
+static int sg_get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
     char buf[32];
     const char *end;
-    char *sysfspath;
+    char *sysfspath = NULL;
     int ret;
-    int fd = -1;
+    int sysfd = -1;
     long max_segments;
+    struct stat st;
+
+    if (fstat(fd, &st)) {
+        ret = -errno;
+        goto out;
+    }
 
     sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
-    fd = open(sysfspath, O_RDONLY);
-    if (fd == -1) {
+                                major(st.st_rdev), minor(st.st_rdev));
+    sysfd = open(sysfspath, O_RDONLY);
+    if (sysfd == -1) {
         ret = -errno;
         goto out;
     }
     do {
-        ret = read(fd, buf, sizeof(buf) - 1);
+        ret = read(sysfd, buf, sizeof(buf) - 1);
     } while (ret == -1 && errno == EINTR);
     if (ret < 0) {
         ret = -errno;
@@ -1090,8 +1094,8 @@ static int hdev_get_max_segments(const struct stat *st)
     }
 
 out:
-    if (fd != -1) {
-        close(fd);
+    if (sysfd != -1) {
+        close(sysfd);
     }
     g_free(sysfspath);
     return ret;
@@ -1103,19 +1107,17 @@ out:
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
-    struct stat st;
 
-    if (!fstat(s->fd, &st)) {
-        if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
-            int ret = hdev_get_max_transfer_length(bs, s->fd);
-            if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-                bs->bl.max_transfer = pow2floor(ret);
-            }
-            ret = hdev_get_max_segments(&st);
-            if (ret > 0) {
-                bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-                                          ret * getpagesize());
-            }
+    if (bs->sg) {
+        int ret = sg_get_max_transfer_length(s->fd);
+
+        if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+            bs->bl.max_transfer = pow2floor(ret);
+        }
+
+        ret = sg_get_max_segments(s->fd);
+        if (ret > 0) {
+            bs->bl.max_transfer = MIN(bs->bl.max_transfer, ret * getpagesize());
         }
     }
 
-- 
2.17.2
* Re: [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough
  2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
@ 2019-07-10 13:43   ` Maxim Levitsky
  0 siblings, 0 replies; 6+ messages in thread
From: Maxim Levitsky @ 2019-07-10 13:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Fam Zheng, Max Reitz, qemu-block, John Ferlan
On Thu, 2019-07-04 at 15:43 +0300, Maxim Levitsky wrote:
> [patch 1/1 quoted in full; snipped]
Ping.
Best regards,
	Maxim Levitsky
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices
  2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
  2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
@ 2019-07-11 10:31 ` Stefan Hajnoczi
  2019-07-12  8:32 ` [Qemu-devel] " Pankaj Gupta
  2019-07-12  9:20 ` Kevin Wolf
  3 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2019-07-11 10:31 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Max Reitz,
	John Ferlan
On Thu, Jul 04, 2019 at 03:43:41PM +0300, Maxim Levitsky wrote:
> [cover letter quoted in full; snipped]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices
  2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
  2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
  2019-07-11 10:31 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Stefan Hajnoczi
@ 2019-07-12  8:32 ` Pankaj Gupta
  2019-07-12  9:20 ` Kevin Wolf
  3 siblings, 0 replies; 6+ messages in thread
From: Pankaj Gupta @ 2019-07-12  8:32 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Kevin Wolf, Fam Zheng, qemu-block, qemu-devel, Max Reitz,
	John Ferlan
> [cover letter quoted in full; snipped]
I am not familiar with the SCSI passthrough special case, but overall this looks good to me.
Feel free to add:
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
* Re: [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices
  2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
                   ` (2 preceding siblings ...)
  2019-07-12  8:32 ` [Qemu-devel] " Pankaj Gupta
@ 2019-07-12  9:20 ` Kevin Wolf
  3 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2019-07-12  9:20 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Fam Zheng, John Ferlan, qemu-devel, qemu-block, Max Reitz
On 04.07.2019 at 14:43, Maxim Levitsky wrote:
> [cover letter quoted; snipped]
Thanks, applied to the block branch.
Kevin
end of thread, other threads:[~2019-07-12  9:20 UTC | newest]
Thread overview: 6+ messages:
2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
2019-07-10 13:43   ` Maxim Levitsky
2019-07-11 10:31 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Stefan Hajnoczi
2019-07-12  8:32 ` [Qemu-devel] " Pankaj Gupta
2019-07-12  9:20 ` Kevin Wolf