From: Kevin Wolf <kwolf@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Cc: qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
qemu-block@nongnu.org
Subject: Re: [PATCH] block/file-posix: Consider discard flag when opening
Date: Wed, 19 Jun 2024 10:16:48 +0200 [thread overview]
Message-ID: <ZnKT8ECzHjpLGOm_@redhat.com> (raw)
In-Reply-To: <20240618212457.714456-1-nsoffer@redhat.com>
Am 18.06.2024 um 23:24 hat Nir Soffer geschrieben:
> Set has_discard only when BDRV_O_UNMAP is not set. With this users that
> want to keep their images fully allocated can disable hole punching
> when writing zeros or discarding using:
>
> -drive file=thick.img,discard=off
>
> This change is not entirely correct since it changes the default discard
> behavior. Previously we always allowed punching holes, but now you have
> must use discard=unmap|on to enable it. We probably need to add the
> BDDR_O_UNMAP flag by default.
>
> make check still works, so maybe we don't have tests for sparsifying
> images, or maybe you need to run special tests that do not run by
> default. We needs tests for keeping images non-sparse.
>
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
So first of all, I agree with you that this patch is wrong. ;-)
At first, I failed to understand the problem this is trying to solve. I
put a debug message in handle_aiocb_discard() and tried with which
options it triggers. [1] To me, this looked exactly like it should be.
We only try to discard blocks when discard=unmap is given as an option.
That leaves the case of write_zeroes. And while at the first sight, the
code looked good, we do seem to have a problem there and it tried to
unmap even with discard=off.
> block/file-posix.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index be25e35ff6..acac2abadc 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -738,11 +738,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> ret = -EINVAL;
> goto fail;
> }
> #endif /* !defined(CONFIG_LINUX_IO_URING) */
>
> - s->has_discard = true;
> + s->has_discard = !!(bdrv_flags & BDRV_O_UNMAP);
> s->has_write_zeroes = true;
>
> if (fstat(s->fd, &st) < 0) {
> ret = -errno;
> error_setg_errno(errp, errno, "Could not stat file");
s->has_discard is about what the host supports, not about the semantics
of the QEMU block node. So this doesn't feel right to me.
So for the buggy case, write_zeroes, bdrv_co_pwrite_zeroes() has code
that considers the case and clears the ~BDRV_REQ_MAY_UNMAP flags:
if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
flags &= ~BDRV_REQ_MAY_UNMAP;
}
But it turns out that we don't necessarily even go through this function
for the top node which has discard=off, so it can't take effect:
(gdb) bt
#0 0x00007ffff4f2f144 in __pthread_kill_implementation () at /lib64/libc.so.6
#1 0x00007ffff4ed765e in raise () at /lib64/libc.so.6
#2 0x00007ffff4ebf902 in abort () at /lib64/libc.so.6
#3 0x000055555615aff0 in raw_do_pwrite_zeroes (bs=0x555557f4bcf0, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP, blkdev=false) at ../block/file-posix.c:3643
#4 0x000055555615557e in raw_co_pwrite_zeroes (bs=0x555557f4bcf0, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/file-posix.c:3655
#5 0x00005555560cde2a in bdrv_co_do_pwrite_zeroes (bs=0x555557f4bcf0, offset=0, bytes=1048576, flags=6) at ../block/io.c:1901
#6 0x00005555560c72f9 in bdrv_aligned_pwritev (child=0x555557f51460, req=0x7fffed5ff800, offset=0, bytes=1048576, align=1, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2100
#7 0x00005555560c6b41 in bdrv_co_do_zero_pwritev (child=0x555557f51460, offset=0, bytes=1048576, flags=6, req=0x7fffed5ff800) at ../block/io.c:2183
#8 0x00005555560c6647 in bdrv_co_pwritev_part (child=0x555557f51460, offset=0, bytes=1048576, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2283
#9 0x00005555560c634f in bdrv_co_pwritev (child=0x555557f51460, offset=0, bytes=1048576, qiov=0x0, flags=6) at ../block/io.c:2216
#10 0x00005555560c75b5 in bdrv_co_pwrite_zeroes (child=0x555557f51460, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/io.c:2322
#11 0x0000555556117d24 in raw_co_pwrite_zeroes (bs=0x555557f44980, offset=0, bytes=1048576, flags=BDRV_REQ_MAY_UNMAP) at ../block/raw-format.c:307
#12 0x00005555560cde2a in bdrv_co_do_pwrite_zeroes (bs=0x555557f44980, offset=0, bytes=1048576, flags=6) at ../block/io.c:1901
#13 0x00005555560c72f9 in bdrv_aligned_pwritev (child=0x555557f513f0, req=0x7fffed5ffd90, offset=0, bytes=1048576, align=1, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2100
#14 0x00005555560c6b41 in bdrv_co_do_zero_pwritev (child=0x555557f513f0, offset=0, bytes=1048576, flags=6, req=0x7fffed5ffd90) at ../block/io.c:2183
#15 0x00005555560c6647 in bdrv_co_pwritev_part (child=0x555557f513f0, offset=0, bytes=1048576, qiov=0x0, qiov_offset=0, flags=6) at ../block/io.c:2283
#16 0x00005555560ad741 in blk_co_do_pwritev_part (blk=0x555557f51660, offset=0, bytes=1048576, qiov=0x0, qiov_offset=0, flags=6) at ../block/block-backend.c:1425
#17 0x00005555560ad5f2 in blk_co_pwritev_part (blk=0x555557f51660, offset=0, bytes=1048576, qiov=0x0, qiov_offset=0, flags=6) at ../block/block-backend.c:1440
#18 0x00005555560ad8cf in blk_co_pwritev (blk=0x555557f51660, offset=0, bytes=1048576, qiov=0x0, flags=6) at ../block/block-backend.c:1462
#19 0x00005555560b0f79 in blk_co_pwrite_zeroes (blk=0x555557f51660, offset=0, bytes=1048576, flags=6) at ../block/block-backend.c:2590
#20 0x000055555606d240 in blk_co_pwrite_zeroes_entry (opaque=0x7fffffffbc18) at block/block-gen.c:2162
#21 0x00005555562c36ba in coroutine_trampoline (i0=1475685216, i1=21845) at ../util/coroutine-ucontext.c:175
#22 0x00007ffff4ef1190 in ??? () at /lib64/libc.so.6
#23 0x00007fffffffc010 in ??? ()
#24 0x0000000000000000 in ??? ()
I haven't checked the details yet, but my first impression is that this
check should probably move to bdrv_co_do_pwrite_zeroes().
Kevin
[1] Tests I did:
# For discard
$ echo -e 'qemu-io none0 "discard 0 1M"\nquit' | ./qemu-system-x86_64 -drive if=none,file=/tmp/test.raw,format=raw -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) qemu-io none0 "discard 0 1M"
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.00 sec (396.171 GiB/sec and 405679.5132 ops/sec)
(qemu) quit
$ echo -e 'qemu-io none0 "discard 0 1M"\nquit' | ./qemu-system-x86_64 -drive if=none,file=/tmp/test.raw,format=raw,discard=unmap -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) qemu-io none0 "discard 0 1M"
handle_aiocb_discard
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.00 sec (6.623 GiB/sec and 6782.2820 ops/sec)
(qemu) quit
$ echo -e 'qemu-io none0 "discard 0 1M"\nquit' | ./qemu-system-x86_64 -drive if=none,file=/tmp/test.raw,format=raw,discard=off -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) qemu-io none0 "discard 0 1M"
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.00 sec (375.168 GiB/sec and 384172.1091 ops/sec)
(qemu) quit
# For write_zeroes
$ echo -e 'qemu-io none0 "write -zu 0 1M"\nquit' | ./qemu-system-x86_64 -drive if=none,file=/tmp/test.raw,format=raw,discard=unmap -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) qemu-io none0 "write -zu 0 1M"
handle_aiocb_write_zeroes_unmap
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.00 sec (4.943 GiB/sec and 5061.5997 ops/sec)
(qemu) quit
$ echo -e 'qemu-io none0 "write -zu 0 1M"\nquit' | ./qemu-system-x86_64 -drive if=none,file=/tmp/test.raw,format=raw,discard=off -monitor stdio
QEMU 9.0.50 monitor - type 'help' for more information
(qemu) qemu-io none0 "write -zu 0 1M"
handle_aiocb_write_zeroes_unmap
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 00.00 sec (7.208 GiB/sec and 7381.4357 ops/sec)
(qemu) quit
next prev parent reply other threads:[~2024-06-19 8:18 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-18 21:24 [PATCH] block/file-posix: Consider discard flag when opening Nir Soffer
2024-06-19 8:16 ` Kevin Wolf [this message]
2024-06-19 10:58 ` Nir Soffer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnKT8ECzHjpLGOm_@redhat.com \
--to=kwolf@redhat.com \
--cc=hreitz@redhat.com \
--cc=nsoffer@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.