* [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough
2019-06-30 15:08 [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices Maxim Levitsky
@ 2019-06-30 15:08 ` Maxim Levitsky
2019-07-03 14:50 ` Eric Blake
2019-07-02 16:11 ` [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices Maxim Levitsky
2019-07-03 9:52 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2 siblings, 1 reply; 7+ messages in thread
From: Maxim Levitsky @ 2019-06-30 15:08 UTC (permalink / raw)
To: qemu-devel
Cc: Fam Zheng, Kevin Wolf, qemu-block, Maxim Levitsky, John Ferlan,
Max Reitz
The interface of regular block devices (/dev/sda*, /dev/nvme*, etc.) is not limited
by the underlying storage limits; rather, the kernel block layer
takes care of splitting requests that are too large or too fragmented.
Doing so allows us to have less overhead in qemu.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
block/file-posix.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index ab05b51a66..66dad34f8a 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1038,15 +1038,13 @@ static void raw_reopen_abort(BDRVReopenState *state)
s->reopen_state = NULL;
}
-static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
+static int sg_get_max_transfer_length(BlockDriverState *bs, int fd)
{
#ifdef BLKSECTGET
int max_bytes = 0;
- short max_sectors = 0;
- if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
+
+ if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
return max_bytes;
- } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
- return max_sectors << BDRV_SECTOR_BITS;
} else {
return -errno;
}
@@ -1055,7 +1053,7 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
#endif
}
-static int hdev_get_max_segments(const struct stat *st)
+static int sg_get_max_segments(const struct stat *st)
{
#ifdef CONFIG_LINUX
char buf[32];
@@ -1106,12 +1104,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
struct stat st;
if (!fstat(s->fd, &st)) {
- if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
- int ret = hdev_get_max_transfer_length(bs, s->fd);
+ if (bs->sg) {
+ int ret = sg_get_max_transfer_length(bs, s->fd);
if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
bs->bl.max_transfer = pow2floor(ret);
}
- ret = hdev_get_max_segments(&st);
+ ret = sg_get_max_segments(&st);
if (ret > 0) {
bs->bl.max_transfer = MIN(bs->bl.max_transfer,
ret * getpagesize());
--
2.17.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough
2019-06-30 15:08 ` [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
@ 2019-07-03 14:50 ` Eric Blake
2019-07-03 15:28 ` Maxim Levitsky
0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2019-07-03 14:50 UTC (permalink / raw)
To: Maxim Levitsky, qemu-devel
Cc: Fam Zheng, Kevin Wolf, Max Reitz, qemu-block, John Ferlan
On 6/30/19 10:08 AM, Maxim Levitsky wrote:
> Regular block devices (/dev/sda*, /dev/nvme*, etc) interface is not limited
The regular block device interface is
or
Regular block devices interfaces are
> by the underlying storage limits, but rather the kernel block layer
> takes care to split the requests that are too large/fragmented.
>
> Doing so allows us to have less overhead in qemu.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> block/file-posix.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ab05b51a66..66dad34f8a 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1038,15 +1038,13 @@ static void raw_reopen_abort(BDRVReopenState *state)
> s->reopen_state = NULL;
> }
>
> -static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> +static int sg_get_max_transfer_length(BlockDriverState *bs, int fd)
> {
> #ifdef BLKSECTGET
> int max_bytes = 0;
> - short max_sectors = 0;
> - if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> +
> + if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> return max_bytes;
> - } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
> - return max_sectors << BDRV_SECTOR_BITS;
> } else {
> return -errno;
> }
> @@ -1055,7 +1053,7 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> #endif
> }
>
> -static int hdev_get_max_segments(const struct stat *st)
> +static int sg_get_max_segments(const struct stat *st)
> {
> #ifdef CONFIG_LINUX
> char buf[32];
> @@ -1106,12 +1104,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> struct stat st;
>
> if (!fstat(s->fd, &st)) {
> - if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
> - int ret = hdev_get_max_transfer_length(bs, s->fd);
Is it worth delaying the fstat()...
> + if (bs->sg) {
> + int ret = sg_get_max_transfer_length(bs, s->fd);
> if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> bs->bl.max_transfer = pow2floor(ret);
> }
> - ret = hdev_get_max_segments(&st);
> + ret = sg_get_max_segments(&st);
...until inside the if (bs->sg) condition, to avoid wasted work for
other scenarios?
> if (ret > 0) {
> bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> ret * getpagesize());
>
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough
2019-07-03 14:50 ` Eric Blake
@ 2019-07-03 15:28 ` Maxim Levitsky
0 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2019-07-03 15:28 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: Fam Zheng, Kevin Wolf, Max Reitz, qemu-block, John Ferlan
On Wed, 2019-07-03 at 09:50 -0500, Eric Blake wrote:
> On 6/30/19 10:08 AM, Maxim Levitsky wrote:
> > Regular block devices (/dev/sda*, /dev/nvme*, etc) interface is not limited
>
> The regular block device interface is
>
> or
>
> Regular block devices interfaces are
>
> > by the underlying storage limits, but rather the kernel block layer
> > takes care to split the requests that are too large/fragmented.
> >
> > Doing so allows us to have less overhead in qemu.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> > block/file-posix.c | 16 +++++++---------
> > 1 file changed, 7 insertions(+), 9 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index ab05b51a66..66dad34f8a 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1038,15 +1038,13 @@ static void raw_reopen_abort(BDRVReopenState *state)
> > s->reopen_state = NULL;
> > }
> >
> > -static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> > +static int sg_get_max_transfer_length(BlockDriverState *bs, int fd)
> > {
> > #ifdef BLKSECTGET
> > int max_bytes = 0;
> > - short max_sectors = 0;
> > - if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> > +
> > + if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> > return max_bytes;
> > - } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
> > - return max_sectors << BDRV_SECTOR_BITS;
> > } else {
> > return -errno;
> > }
> > @@ -1055,7 +1053,7 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> > #endif
> > }
> >
> > -static int hdev_get_max_segments(const struct stat *st)
> > +static int sg_get_max_segments(const struct stat *st)
> > {
> > #ifdef CONFIG_LINUX
> > char buf[32];
> > @@ -1106,12 +1104,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> > struct stat st;
> >
> > if (!fstat(s->fd, &st)) {
> > - if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
> > - int ret = hdev_get_max_transfer_length(bs, s->fd);
>
> Is it worth delaying the fstat()...
>
> > + if (bs->sg) {
> > + int ret = sg_get_max_transfer_length(bs, s->fd);
> > if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> > bs->bl.max_transfer = pow2floor(ret);
> > }
> > - ret = hdev_get_max_segments(&st);
> > + ret = sg_get_max_segments(&st);
>
> ...until inside the if (bs->sg) condition, to avoid wasted work for
> other scenarios?
>
> > if (ret > 0) {
> > bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> > ret * getpagesize());
> >
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
Thank you very much for the review. I'll send a V2 soon.
Best regards,
Maxim Levitsky
* Re: [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices
2019-06-30 15:08 [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices Maxim Levitsky
2019-06-30 15:08 ` [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
@ 2019-07-02 16:11 ` Maxim Levitsky
2019-07-03 9:52 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2019-07-02 16:11 UTC (permalink / raw)
To: qemu-devel; +Cc: Fam Zheng, Kevin Wolf, Max Reitz, qemu-block, John Ferlan
On Sun, 2019-06-30 at 18:08 +0300, Maxim Levitsky wrote:
> It looks like Linux block devices, even in O_DIRECT mode, don't have any user-visible
> limit on transfer size / number of segments that the underlying block device can have.
> The block layer takes care of enforcing these limits by splitting the bios.
>
> By limiting the transfer sizes, we force qemu to do the splitting itself which
> introduces various overheads.
> It is especially visible in nbd server, where the low max transfer size of the
> underlying device forces us to advertise this over NBD, thus increasing the traffic overhead in case of
> image conversion which benefits from large blocks.
>
> More information can be found here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
>
> Tested this with qemu-img convert over nbd and natively and to my surprise, even native IO performance improved a bit.
> (The device on which it was tested is Intel Optane DC P4800X, which has 128k max transfer size)
>
> The benchmark:
>
> Images were created using:
>
> Sparse image: qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
> Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o preallocation=metadata 1G / 10G / 100G
>
> The test was:
>
> echo "convert native:"
> rm -rf /dev/shm/disk.img
> time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img > /dev/zero
>
> echo "convert via nbd:"
> qemu-nbd -k /tmp/nbd.sock -v -f qcow2 $FILE -x export --cache=none --aio=native --fork
> rm -rf /dev/shm/disk.img
> time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero
>
> The results:
>
> =========================================
> 1G sparse image:
> native:
> before: 0.027s
> after: 0.027s
> nbd:
> before: 0.287s
> after: 0.035s
>
> =========================================
> 100G sparse image:
> native:
> before: 0.028s
> after: 0.028s
> nbd:
> before: 23.796s
> after: 0.109s
>
> =========================================
> 1G preallocated image:
> native:
> before: 0.454s
> after: 0.427s
> nbd:
> before: 0.649s
> after: 0.546s
>
> The block limits of max transfer size/max segment size are retained
> for the SCSI passthrough because in this case the kernel passes the userspace request
> directly to the kernel scsi driver, bypassing the block layer, and thus there is no code to split
> such requests.
>
> What do you think?
>
> Fam, since you were the original author of the code that added
> these limits, could you share your opinion on that?
> What was the reason besides SCSI passthrough?
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (1):
> raw-posix.c - use max transfer length / max segment count only for
> SCSI passthrough
>
> block/file-posix.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
Ping
Best regards,
Maxim Levitsky
* Re: [Qemu-devel] [Qemu-block] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices
2019-06-30 15:08 [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices Maxim Levitsky
2019-06-30 15:08 ` [Qemu-devel] [PATCH 1/1] raw-posix.c - use max transfer length / max segment count only for SCSI passthrough Maxim Levitsky
2019-07-02 16:11 ` [Qemu-devel] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices Maxim Levitsky
@ 2019-07-03 9:52 ` Stefan Hajnoczi
2019-07-03 14:46 ` Eric Blake
2 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2019-07-03 9:52 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
John Ferlan
On Sun, Jun 30, 2019 at 06:08:54PM +0300, Maxim Levitsky wrote:
> It looks like Linux block devices, even in O_DIRECT mode, don't have any user-visible
> limit on transfer size / number of segments that the underlying block device can have.
> The block layer takes care of enforcing these limits by splitting the bios.
>
> By limiting the transfer sizes, we force qemu to do the splitting itself which
> introduces various overheads.
> It is especially visible in nbd server, where the low max transfer size of the
> underlying device forces us to advertise this over NBD, thus increasing the traffic overhead in case of
> image conversion which benefits from large blocks.
>
> More information can be found here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
>
> Tested this with qemu-img convert over nbd and natively and to my surprise, even native IO performance improved a bit.
> (The device on which it was tested is Intel Optane DC P4800X, which has 128k max transfer size)
>
> The benchmark:
>
> Images were created using:
>
> Sparse image: qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
> Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o preallocation=metadata 1G / 10G / 100G
>
> The test was:
>
> echo "convert native:"
> rm -rf /dev/shm/disk.img
> time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img > /dev/zero
>
> echo "convert via nbd:"
> qemu-nbd -k /tmp/nbd.sock -v -f qcow2 $FILE -x export --cache=none --aio=native --fork
> rm -rf /dev/shm/disk.img
> time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero
>
> The results:
>
> =========================================
> 1G sparse image:
> native:
> before: 0.027s
> after: 0.027s
> nbd:
> before: 0.287s
> after: 0.035s
>
> =========================================
> 100G sparse image:
> native:
> before: 0.028s
> after: 0.028s
> nbd:
> before: 23.796s
> after: 0.109s
>
> =========================================
> 1G preallocated image:
> native:
> before: 0.454s
> after: 0.427s
> nbd:
> before: 0.649s
> after: 0.546s
>
> The block limits of max transfer size/max segment size are retained
> for the SCSI passthrough because in this case the kernel passes the userspace request
> directly to the kernel scsi driver, bypassing the block layer, and thus there is no code to split
> such requests.
>
> What do you think?
>
> Fam, since you were the original author of the code that added
> these limits, could you share your opinion on that?
> What was the reason besides SCSI passthrough?
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (1):
> raw-posix.c - use max transfer length / max segment count only for
> SCSI passthrough
>
> block/file-posix.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
Adding Eric Blake, who implemented the generic request splitting in the
block layer and may know if there were any other reasons aside from SCSI
passthrough why file-posix.c enforces the host block device's maximum
transfer size.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [Qemu-devel] [Qemu-block] [PATCH 0/1] RFC: don't obey the block device max transfer len / max segments for block devices
2019-07-03 9:52 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2019-07-03 14:46 ` Eric Blake
0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2019-07-03 14:46 UTC (permalink / raw)
To: Stefan Hajnoczi, Maxim Levitsky
Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
John Ferlan
On 7/3/19 4:52 AM, Stefan Hajnoczi wrote:
> On Sun, Jun 30, 2019 at 06:08:54PM +0300, Maxim Levitsky wrote:
>> It looks like Linux block devices, even in O_DIRECT mode don't have any user visible
>> limit on transfer size / number of segments, which underlying block device can have.
>> The block layer takes care of enforcing these limits by splitting the bios.
s/The block layer/The kernel block layer/
>>
>> By limiting the transfer sizes, we force qemu to do the splitting itself which
double space
>> introduces various overheads.
>> It is especially visible in nbd server, where the low max transfer size of the
>> underlying device forces us to advertise this over NBD, thus increasing the traffic overhead in case of
Long line for a commit message.
>> image conversion which benefits from large blocks.
>>
>> More information can be found here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
>>
>> Tested this with qemu-img convert over nbd and natively and to my surprise, even native IO performance improved a bit.
>> (The device on which it was tested is Intel Optane DC P4800X, which has 128k max transfer size)
>>
>> The benchmark:
>>
I'm sorry I didn't see this before softfreeze, but as a performance
improvement, I think it still classes as a bug fix and is safe for
inclusion in rc0.
>> The block limits of max transfer size/max segment size are retained
>> for the SCSI passthrough because in this case the kernel passes the userspace request
>> directly to the kernel scsi driver, bypassing the block layer, and thus there is no code to split
>> such requests.
>>
>> What do you think?
Seems like a reasonable explanation.
>>
>> Fam, since you were the original author of the code that added
>> these limits, could you share your opinion on that?
>> What was the reason besides SCSI passthrough?
>>
>> Best regards,
>> Maxim Levitsky
>>
>> Maxim Levitsky (1):
>> raw-posix.c - use max transfer length / max segment count only for
>> SCSI passthrough
>>
>> block/file-posix.c | 16 +++++++---------
>> 1 file changed, 7 insertions(+), 9 deletions(-)
>
> Adding Eric Blake, who implemented the generic request splitting in the
> block layer and may know if there were any other reasons aside from SCSI
> passthrough why file-posix.c enforces the host block device's maximum
> transfer size.
No, I don't have any strong reasons for why file I/O must be capped to a
specific limit other than size_t (since the kernel does just fine at
splitting things up).
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org