public inbox for qemu-devel@nongnu.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH 2/5] block: Introduce BDRV_O_NO_DATA_{WRITE,RESIZE}
Date: Fri, 6 Mar 2026 14:04:53 +0100	[thread overview]
Message-ID: <aarQ9RSHeFHfsdwB@redhat.com> (raw)
In-Reply-To: <538cb2b6-5a4f-4927-afbf-ce4512320d21@redhat.com>

Am 06.03.2026 um 12:28 hat Hanna Czenczek geschrieben:
> On 06.03.26 12:12, Kevin Wolf wrote:
> > Am 06.03.2026 um 11:20 hat Hanna Czenczek geschrieben:
> > > On 04.03.26 17:13, Kevin Wolf wrote:
> > > > Am 04.03.2026 um 15:20 hat Hanna Czenczek geschrieben:
> > > > > On 02.03.26 15:30, Kevin Wolf wrote:
> > > > > > Am 05.02.2026 um 15:47 hat Hanna Czenczek geschrieben:
> > > > > > > Add BDS flags that prevent taking WRITE and/or RESIZE permissions on
> > > > > > > pure data (no metadata) children.  These are going to be used by qcow2
> > > > > > > during formatting, when we need write access to format the metadata
> > > > > > > file, but no write access to an external data file.  This will allow
> > > > > > > creating a qcow2 image for a raw image while the latter is currently in
> > > > > > > use by the VM.
> > > > > > > 
> > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > > > > > > ---
> > > > > > >     include/block/block-common.h | 11 +++++++++++
> > > > > > >     block.c                      | 15 +++++++++++++++
> > > > > > >     2 files changed, 26 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > > > > > > index c8c626daea..504f6aa113 100644
> > > > > > > --- a/include/block/block-common.h
> > > > > > > +++ b/include/block/block-common.h
> > > > > > > @@ -245,6 +245,17 @@ typedef enum {
> > > > > > >     #define BDRV_O_CBW_DISCARD_SOURCE 0x80000 /* for copy-before-write filter */
> > > > > > > +/*
> > > > > > > + * Promise not to write any data to pure (non-metadata-bearing) data storage
> > > > > > > + * children, so we don't need the WRITE permission for them.
> > > > > > > + * For image creation, formatting requires write access to the image, but not
> > > > > > > + * necessarily to its pure storage children.  This allows creating an image on
> > > > > > > + * top of an existing raw storage image that is already attached to the VM.
> > > > > > > + */
> > > > > > > +#define BDRV_O_NO_DATA_WRITE  0x100000
> > > > > > Can't we just use BDRV_O_NO_IO for this one? It is stricter because it
> > > > > > doesn't allow reading either, but I don't think image creation ever
> > > > > > requires reading from the image?
> > > > > How would qcow2 set it?  It opens the qcow2 image, so it can only set the
> > > > > flag on the qcow2 BDS (via a BlockBackend), but BDRV_O_NO_IO needs to go on
> > > > > the data-file child.  Maybe we can construct the graph manually…? It would
> > > > > be quite painful, I imagine, but I haven’t tried yet.
> > > > Why should it go on the data-file child? The child permissions are
> > > > defined by the qcow2 node. If the caller promises not to do any I/O (and
> > > > I'm fairly sure that apart from preallocation, creating the image
> > > > doesn't involve any I/O on the qcow2 node, just on the primary child),
> > > > then qcow2 doesn't need any permissions on data-file.
> > > > 
> > > > Or am I missing a reason why BDRV_O_NO_IO can't be set for the qcow2
> > > > node?
> > > First, bdrv_co_write_req_prepare() requires !BDRV_O_NO_IO and the RESIZE
> > > permission.
> > Ah, I wasn't aware that truncate requires !BDRV_O_NO_IO. It's not what I
> > would intuitively call I/O, but it's also justifiable because it changes
> > the result of reading some blocks.
> > 
> > The part where things start to feel questionable is with the way
> > qcow2_co_create() uses truncate. It doesn't actually want to truncate
> > the image (especially with an existing data file), but just allocate the
> > metadata for the full image size.
> > 
> > If the problem is just bdrv_co_write_req_prepare(), would a request flag
> > for the truncate solve it without having to introduce two global BDS
> > flags that can never be set by the user? BDRV_REQ_NO_DATA_IO or
> > something?
> 
> Oh, only one new flag.  We can drop the RESIZE flag, as you said, and make
> it !BDRV_O_RESIZE.  (If we call qcow2_co_truncate() directly, that is.)

The per-request part feels almost more important to me than having only
one flag. But yes, two separate flags for a single purpose isn't great
either, so moving to one would already be some improvement.

> Dropping that flag changes the error messages sometimes (because without
> this flag, WRITE will always imply RESIZE, so with a guest device that
> prevents concurrent resize, you will then always get RESIZE conflicts
> alongside WRITE), but that’s OK (because it’s only when you get a WRITE
> conflict anyway).
> 
> > > We could bypass this by calling qcow2_co_truncate() instead of
> > > blk_co_truncate().  Feels wrong to me to pass BDRV_O_NO_IO and
> > > !BDRV_O_RESIZE to blk_co_new_open() when we actually kind of want those
> > > things and just bypass the BB to get them, but only morally wrong. Not
> > > technically wrong.
> > > 
> > > Bigger problem: BDRV_O_NO_IO makes qcow2 skip opening the data-file
> > > altogether.  So we would need to distinguish between qemu-img info and
> > > this case somehow.
> > Do we need to have it opened, except for preallocation, which can't set
> > BDRV_O_NO_IO anyway?
> 
> Yes, for resizing it without preallocation (growing to fit).

Right. In this case, this series wouldn't set BDRV_O_NO_DATA_RESIZE
either, but it would set BDRV_O_NO_DATA_WRITE. This makes me wonder if
it would make sense...

> We could have qcow2_do_open() distinguish by checking whether the
> "data-file" option in the QDict is set and a string (a node-name), because
> if it is, it’s safe to take that existing BDS despite BDRV_O_NO_IO.

...to let qcow2_do_open() still open the data file with BDRV_O_NO_IO if
BDRV_O_RESIZE is given at the same time.

> I’m still not sure how I morally feel about passing BDRV_O_NO_IO when we do
> want to do I/O.  Yes, only metadata.  I know.  But I understand BDRV_O_NO_IO
> was introduced specifically for nothing at all, just querying image
> information.

"Just querying image information" is reading metadata. So if you include
metadata in your definition, you're already inconsistent.

For me I/O means reads, writes, discards etc. on the node. That is,
operations that actually access data. Metadata was never part of this
for me and obviously it is always accessed read-only at least because
otherwise you can't open the image. What's different here is that it's
also written to, but that's what BDRV_O_RDWR means. If you don't want to
write either data or metadata, you should just open the image read-only.

> And it works, yes – you just have to make sure you keep
> BDRV_O_RDWR in there, too, because otherwise the whole thing will be
> read-only and not work.  Feels really wrong to me, but I guess we already do
> the same thing at the end of qcow2_co_create() to flush the image, so…  Too
> late to protest now, I suppose.

Yes, that seems like the same case to me. Nothing is doing I/O on the
qcow2 node, but some metadata writes may be involved.

Kevin



  reply	other threads:[~2026-03-06 13:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 14:47 [PATCH 0/5] qcow2: Suppress data-file WRITE during creation Hanna Czenczek
2026-02-05 14:47 ` [PATCH 1/5] qcow2: Skip data-file resize if possible Hanna Czenczek
2026-02-05 14:47 ` [PATCH 2/5] block: Introduce BDRV_O_NO_DATA_{WRITE,RESIZE} Hanna Czenczek
2026-03-02 14:30   ` Kevin Wolf
2026-03-04 14:20     ` Hanna Czenczek
2026-03-04 16:13       ` Kevin Wolf
2026-03-06 10:20         ` Hanna Czenczek
2026-03-06 11:12           ` Kevin Wolf
2026-03-06 11:28             ` Hanna Czenczek
2026-03-06 13:04               ` Kevin Wolf [this message]
2026-02-05 14:47 ` [PATCH 3/5] qcow2: Preallocation: Do not COW after disk end Hanna Czenczek
2026-02-05 14:47 ` [PATCH 4/5] qcow2: Suppress data-file WRITE/RESIZE if possible Hanna Czenczek
2026-02-05 14:47 ` [PATCH 5/5] iotests: Add qcow2-live-data-file test Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aarQ9RSHeFHfsdwB@redhat.com \
    --to=kwolf@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox