From: Paolo Bonzini <pbonzini@redhat.com>
To: Richard Laager <rlaager@wiktel.com>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date: Mon, 12 Mar 2012 13:27:58 +0100 [thread overview]
Message-ID: <4F5DEBCE.3040409@redhat.com> (raw)
In-Reply-To: <1331402560.8577.46.camel@watermelon.coderich.net>
Il 10/03/2012 19:02, Richard Laager ha scritto:
> I propose adding the following behaviors in any event:
> * If a QEMU block device reports a discard_granularity > 0, it
> must be equal to 2^n (n >= 0), or QEMU's block core will change
> it to 0. (Non-power-of-two granularities are not likely to exist
> in the real world, and this assumption greatly simplifies
> ensuring correctness.)
Yeah, I was considering this to be simply a bug in the block device.
> * For SCSI, report an unmap_granularity to the guest as follows:
> max(logical_block_size, discard_granularity) / logical_block_size
This is more or less already in place later in the series.
> As a design concept, instead of guaranteeing that 512B zero'ing discards
> are supported, I think the QEMU block layer should instead guarantee
> aligned discards to QEMU block devices, emulating any misaligned
> discards (or portions thereof) by writing zeroes if (and only if)
> discard_zeros_data is set.
Yes, this can be done of course. This series does not include it yet.
> This leaves one remaining issue: In raw-posix.c, for files (i.e. not
> devices), I assume you're going to advertise discard_granularity=1 and
> discard_zeros_data=1 when compiled with support for
> fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
> guarantees that it zeros the data when punching holes.
It does, that's pretty much the definition of a hole.
> If the guest does a big discard (think mkfs) and fallocate() returns
> EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
> which, as you noted, will also allocate it (unless you explicitly check
> for holes). This is bad. It can be avoided by not advertising
> discard_zeros_data, but as you noted, that's unfortunate.
If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
be done by skipping the zero write on known holes.
This could even be done at the block layer level using bdrv_is_allocated.
> If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> going to work. This would side step these problems.
... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(
> You said it wasn't
> possible to probe for FALLOC_FL_PUNCH_HOLE. Have you considered probing
> by extending the file by one byte and then punching that:
> char buf = 0;
> fstat(s->fd, &st);
> pwrite(s->fd, &buf, 1, st.st_size + 1);
> has_discard = !fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> st.st_size + 1, 1);
> ftruncate(s->fd, st.st_size);
Nice trick. :) Yes, that could work.
Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?
Paolo
next prev parent reply other threads:[~2012-03-12 12:28 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-08 17:15 [Qemu-devel] [RFC PATCH 00/17] Improvements around discard and write zeroes Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 01/17] qemu-iotests: add a simple test for write_zeroes Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 02/17] qed: make write-zeroes bounce buffer smaller than a single cluster Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 03/17] block: add discard properties to BlockDriverInfo Paolo Bonzini
2012-03-09 16:47 ` Kevin Wolf
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 04/17] qed: implement bdrv_aio_discard Paolo Bonzini
2012-03-09 16:31 ` Kevin Wolf
2012-03-09 17:53 ` Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 05/17] block: pass around qiov for write_zeroes operation Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations Paolo Bonzini
2012-03-09 16:37 ` Kevin Wolf
2012-03-09 18:06 ` Paolo Bonzini
2012-03-10 18:02 ` Richard Laager
2012-03-12 12:27 ` Paolo Bonzini [this message]
2012-03-12 13:04 ` Kevin Wolf
2012-03-13 19:13 ` Richard Laager
2012-03-14 7:41 ` Paolo Bonzini
2012-03-14 12:01 ` Kevin Wolf
2012-03-14 12:14 ` Paolo Bonzini
2012-03-14 12:37 ` Kevin Wolf
2012-03-14 12:49 ` Paolo Bonzini
2012-03-14 13:04 ` Kevin Wolf
2012-03-24 15:33 ` Christoph Hellwig
2012-03-24 15:30 ` Christoph Hellwig
2012-03-26 19:40 ` Richard Laager
2012-03-27 10:20 ` Kevin Wolf
2012-03-24 15:29 ` Christoph Hellwig
2012-03-26 9:44 ` Daniel P. Berrange
2012-03-26 9:56 ` Christoph Hellwig
2012-03-15 0:42 ` Richard Laager
2012-03-15 9:36 ` Paolo Bonzini
2012-03-16 0:47 ` Richard Laager
2012-03-16 9:34 ` Paolo Bonzini
2012-03-24 15:27 ` Christoph Hellwig
2012-03-26 19:40 ` Richard Laager
2012-03-27 9:08 ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 07/17] block: make high level discard operation always zero Paolo Bonzini
2012-03-08 17:55 ` Avi Kivity
2012-03-09 16:42 ` Kevin Wolf
2012-03-12 10:42 ` Avi Kivity
2012-03-12 11:04 ` Kevin Wolf
2012-03-12 12:03 ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 08/17] block: kill the write zeroes operation Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 09/17] ide: issue discard asynchronously but serialize the pieces Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 10/17] ide/scsi: add discard_zeroes_data property Paolo Bonzini
2012-03-08 18:13 ` Avi Kivity
2012-03-08 18:14 ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 11/17] ide/scsi: prepare for flipping the discard defaults Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 12/17] ide/scsi: turn on discard Paolo Bonzini
2012-03-08 18:17 ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 13/17] block: fallback from discard to writes Paolo Bonzini
2012-03-24 15:35 ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 14/17] block: support FALLOC_FL_PUNCH_HOLE trimming Paolo Bonzini
2012-03-09 8:20 ` Chris Wedgwood
2012-03-09 8:31 ` Paolo Bonzini
2012-03-09 8:35 ` Chris Wedgwood
2012-03-09 8:40 ` Paolo Bonzini
2012-03-09 10:31 ` Stefan Hajnoczi
2012-03-09 10:43 ` Paolo Bonzini
2012-03-09 10:53 ` Stefan Hajnoczi
2012-03-09 10:57 ` Paolo Bonzini
2012-03-09 20:36 ` Richard Laager
2012-03-12 9:34 ` Paolo Bonzini
2012-03-24 15:40 ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 15/17] raw: add get_info Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 16/17] qemu-io: fix the alloc command Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 17/17] raw: implement is_allocated Paolo Bonzini
2012-03-24 15:42 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F5DEBCE.3040409@redhat.com \
--to=pbonzini@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rlaager@wiktel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).