qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: "Denis V. Lunev" <den@openvz.org>, Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Mon, 25 Apr 2016 13:35:12 -0600	[thread overview]
Message-ID: <571E7170.1090305@redhat.com> (raw)
In-Reply-To: <571DEF7D.1010304@openvz.org>

[-- Attachment #1: Type: text/plain, Size: 3048 bytes --]

On 04/25/2016 04:20 AM, Denis V. Lunev wrote:
> On 04/25/2016 12:05 PM, Kevin Wolf wrote:
>> Am 23.04.2016 um 14:05 hat Denis V. Lunev geschrieben:
>>> Unfortunately Linux kernel could send non-aligned requests to qemu-nbd
>>> if the caller is using O_DIRECT and does not align in-memory data to
>>> page. Thus qemu-nbd will call block layer with non-aligned requests.

At first glance, I'd argue that any caller using O_DIRECT without
obeying memory alignment restrictions is broken; why should qemu have to
work around such broken callers?


> The program is 100% reproducible. The following sequence
> is performed:
> 
> #define _GNU_SOURCE
> 
> #include <stdio.h>
> #include <malloc.h>
> #include <string.h>
> #include <sys/fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
>     char *buf;
>     int fd;
> 
>     if (argc != 2) {
>         return -1;
>     }
> 
>     fd = open(argv[1], O_WRONLY | O_DIRECT);
> 
>     do {
>         buf = memalign(512, 1024 * 1024);
>     } while (!(unsigned long)buf & (4096 - 1));

In other words, you are INTENTIONALLY grabbing an unaligned buffer for
use with an O_DIRECT fd, when O_DIRECT demands that you should be using
at least page alignment (4096 or greater).  Arguably, the bug is in your
program, not qemu.

That said, teaching qemu to split up a write_zeroes request into head,
tail, and aligned body, so at least the aligned part might benefit from
optimizations, seems like it might be worthwhile, particularly since my
pending NBD series changed from blk_write_zeroes (cluster-aligned) to
blk_pwrite_zeroes (byte-aligned), and it is conceivable that we  can
encounter a situation without O_DIRECT in the picture at all, where our
NBD server is connected to a client that specifically asks for the new
NBD_CMD_WRITE_ZEROES on any arbitrary byte alignment.

>     memset(buf, 0, 1024 * 1024);
>     write(fd, buf, 1024 * 1024);
>     return 0;
> }
> 
> This program is compiled as a.out.
> 
> Before the patch:
> hades ~/src/qemu $ qemu-img create -f qcow2 test.qcow2 64G
> Formatting 'test.qcow2', fmt=qcow2 size=68719476736 encryption=off
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> hades ~/src/qemu $ sudo ./qemu-nbd  --connect=/dev/nbd3 test.qcow2
> --detect-zeroes=on --aio=native --cache=none
> hades ~/src/qemu $ sudo ./a.out /dev/nbd3

Here, I'd argue that the kernel NBD module is buggy, for allowing a
user-space app to pass an unaligned buffer to an O_DIRECT fd.  But
that's outside the realm of qemu code.

But again, per my argument that you don't have to involve the kernel nor
O_DIRECT to be able to write a client that can attempt to cause an NBD
server to do unaligned operations, we can use this kernel bug as an
easier way to test any proposed fix to the qemu side of things, whether
or not the kernel module gets tightened in behavior down the road.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

  reply	other threads:[~2016-04-25 19:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-23 12:05 [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes() Denis V. Lunev
2016-04-25  9:05 ` Kevin Wolf
2016-04-25 10:20   ` Denis V. Lunev
2016-04-25 19:35     ` Eric Blake [this message]
2016-04-25 21:00       ` Denis V. Lunev
2016-04-26  8:23     ` Kevin Wolf
2016-04-26  9:35       ` Denis V. Lunev
2016-04-26 10:19         ` Kevin Wolf
2016-04-27  7:07           ` Denis V. Lunev
2016-04-27  8:12             ` Kevin Wolf
2016-04-27  8:32               ` Denis V. Lunev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=571E7170.1090305@redhat.com \
    --to=eblake@redhat.com \
    --cc=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).