All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH 2/5] block: Introduce BDRV_O_NO_DATA_{WRITE,RESIZE}
Date: Fri, 6 Mar 2026 14:04:53 +0100	[thread overview]
Message-ID: <aarQ9RSHeFHfsdwB@redhat.com> (raw)
In-Reply-To: <538cb2b6-5a4f-4927-afbf-ce4512320d21@redhat.com>

Am 06.03.2026 um 12:28 hat Hanna Czenczek geschrieben:
> On 06.03.26 12:12, Kevin Wolf wrote:
> > Am 06.03.2026 um 11:20 hat Hanna Czenczek geschrieben:
> > > On 04.03.26 17:13, Kevin Wolf wrote:
> > > > Am 04.03.2026 um 15:20 hat Hanna Czenczek geschrieben:
> > > > > On 02.03.26 15:30, Kevin Wolf wrote:
> > > > > > Am 05.02.2026 um 15:47 hat Hanna Czenczek geschrieben:
> > > > > > > Add BDS flags that prevent taking WRITE and/or RESIZE permissions on
> > > > > > > pure data (no metadata) children.  These are going to be used by qcow2
> > > > > > > during formatting, when we need write access to format the metadata
> > > > > > > file, but no write access to an external data file.  This will allow
> > > > > > > creating a qcow2 image for a raw image while the latter is currently in
> > > > > > > use by the VM.
> > > > > > > 
> > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > > > > > > ---
> > > > > > >     include/block/block-common.h | 11 +++++++++++
> > > > > > >     block.c                      | 15 +++++++++++++++
> > > > > > >     2 files changed, 26 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/include/block/block-common.h b/include/block/block-common.h
> > > > > > > index c8c626daea..504f6aa113 100644
> > > > > > > --- a/include/block/block-common.h
> > > > > > > +++ b/include/block/block-common.h
> > > > > > > @@ -245,6 +245,17 @@ typedef enum {
> > > > > > >     #define BDRV_O_CBW_DISCARD_SOURCE 0x80000 /* for copy-before-write filter */
> > > > > > > +/*
> > > > > > > + * Promise not to write any data to pure (non-metadata-bearing) data storage
> > > > > > > + * children, so we don't need the WRITE permission for them.
> > > > > > > + * For image creation, formatting requires write access to the image, but not
> > > > > > > + * necessarily to its pure storage children.  This allows creating an image on
> > > > > > > + * top of an existing raw storage image that is already attached to the VM.
> > > > > > > + */
> > > > > > > +#define BDRV_O_NO_DATA_WRITE  0x100000
> > > > > > Can't we just use BDRV_O_NO_IO for this one? It is stricter because it
> > > > > > doesn't allow reading either, but I don't think image creation ever
> > > > > > requires reading from the image?
> > > > > How would qcow2 set it?  It opens the qcow2 image, so it can only set the
> > > > > flag on the qcow2 BDS (via a BlockBackend), but BDRV_O_NO_IO needs to go on
> > > > > the data-file child.  Maybe we can construct the graph manually…? It would
> > > > > be quite painful, I imagine, but I haven’t tried yet.
> > > > Why should it go on the data-file child? The child permissions are
> > > > defined by the qcow2 node. If the caller promises not to do any I/O (and
> > > > I'm fairly sure that apart from preallocation, creating the image
> > > > doesn't involve any I/O on the qcow2 node, just on the primary child),
> > > > then qcow2 doesn't need any permissions on data-file.
> > > > 
> > > > Or am I missing a reason why BDRV_O_NO_IO can't be set for the qcow2
> > > > node?
> > > First, bdrv_co_write_req_prepare() requires !BDRV_O_NO_IO and the RESIZE
> > > permission.
> > Ah, I wasn't aware that truncate requires !BDRV_O_NO_IO. It's not what I
> > would intuitively call I/O, but it's also justifiable because it changes
> > the result of reading some blocks.
> > 
> > The part where things start to feel questionable is with the way
> > qcow2_co_create() uses truncate. It doesn't actually want to truncate
> > the image (especially with an existing data file), but just allocate the
> > metadata for the full image size.
> > 
> > If the problem is just bdrv_co_write_req_prepare(), would a request flag
> > for the truncate solve it without having to introduce two global BDS
> > flags that can never be set by the user? BDRV_REQ_NO_DATA_IO or
> > something?
> 
> Oh, only one new flag.  We can drop the RESIZE flag, as you said, and make
> it !BDRV_O_RESIZE.  (If we call qcow2_co_truncate() directly, that is.)

The per-request part feels almost more important to me than having only
one flag. But yes, two separate flags for a single purpose isn't great
either, so moving to one would already be some improvement.

> Dropping that flag changes the error messages sometimes (because without
> this flag, WRITE will always imply RESIZE, so with a guest device that
> prevents concurrent resize, you will then always get RESIZE conflicts
> alongside WRITE), but that’s OK (because it’s only when you get a WRITE
> conflict anyway).
> 
> > > We could bypass this by calling qcow2_co_truncate() instead of
> > > blk_co_truncate().  Feels wrong to me to pass BDRV_O_NO_IO and
> > > !BDRV_O_RESIZE to blk_co_new_open() when we actually kind of want those
> > > things and just bypass the BB to get them, but only morally wrong. Not
> > > technically wrong.
> > > 
> > > Bigger problem: BDRV_O_NO_IO makes qcow2 skip opening the data-file
> > > altogether.  So we would need to distinguish between qemu-img info and
> > > this case somehow.
> > Do we need to have it opened, except for preallocation, which can't set
> > BDRV_O_NO_IO anyway?
> 
> Yes, for resizing it without preallocation (growing to fit).

Right. In this case, this series wouldn't set BDRV_O_NO_DATA_RESIZE
either, but it would set BDRV_O_NO_DATA_WRITE. This makes me wonder if
it would make sense...

> We could have qcow2_do_open() distinguish by checking whether the
> "data-file" option in the QDict is set and a string (a node-name), because
> if it is, it’s safe to take that existing BDS despite BDRV_O_NO_IO.

...to let qcow2_do_open() still open the data file with BDRV_O_NO_IO if
BDRV_O_RESIZE is given at the same time.

> I’m still not sure how I morally feel about passing BDRV_O_NO_IO when we do
> want to do I/O.  Yes, only metadata.  I know.  But I understand BDRV_O_NO_IO
> was introduced specifically for nothing at all, just querying image
> information.

"Just querying image information" is reading metadata. So if you include
metadata in your definition, you're already inconsistent.

For me I/O means reads, writes, discards etc. on the node. That is,
operations that actually access data. Metadata was never part of this
for me and obviously it is always accessed read-only at least because
otherwise you can't open the image. What's different here is that it's
also written to, but that's what BDRV_O_RDWR means. If you don't want to
write either data or metadata, you should just open the image read-only.

> And it works, yes – you just have to make sure you keep
> BDRV_O_RDWR in there, too, because otherwise the whole thing will be
> read-only and not work.  Feels really wrong to me, but I guess we already do
> the same thing at the end of qcow2_co_create() to flush the image, so…  Too
> late to protest now, I suppose.

Yes, that seems like the same case to me. Nothing is doing I/O on the
qcow2 node, but some metadata writes may be involved.

Kevin



  reply	other threads:[~2026-03-06 13:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 14:47 [PATCH 0/5] qcow2: Suppress data-file WRITE during creation Hanna Czenczek
2026-02-05 14:47 ` [PATCH 1/5] qcow2: Skip data-file resize if possible Hanna Czenczek
2026-02-05 14:47 ` [PATCH 2/5] block: Introduce BDRV_O_NO_DATA_{WRITE,RESIZE} Hanna Czenczek
2026-03-02 14:30   ` Kevin Wolf
2026-03-04 14:20     ` Hanna Czenczek
2026-03-04 16:13       ` Kevin Wolf
2026-03-06 10:20         ` Hanna Czenczek
2026-03-06 11:12           ` Kevin Wolf
2026-03-06 11:28             ` Hanna Czenczek
2026-03-06 13:04               ` Kevin Wolf [this message]
2026-02-05 14:47 ` [PATCH 3/5] qcow2: Preallocation: Do not COW after disk end Hanna Czenczek
2026-02-05 14:47 ` [PATCH 4/5] qcow2: Suppress data-file WRITE/RESIZE if possible Hanna Czenczek
2026-02-05 14:47 ` [PATCH 5/5] iotests: Add qcow2-live-data-file test Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aarQ9RSHeFHfsdwB@redhat.com \
    --to=kwolf@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.