qemu-devel.nongnu.org archive mirror
From: Jason Dillaman <jdillama@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Josh Durgin <jdurgin@redhat.com>,
	qemu-block <qemu-block@nongnu.org>,
	qemu-devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>,
	John Snow <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback
Date: Mon, 8 Jul 2019 23:08:53 -0400
Message-ID: <CA+aFP1AgNGJMdAG_E23Q-rf2Gt1rpeKjDfrk1PLA3f4XiUkGtw@mail.gmail.com>
In-Reply-To: <20190705104318.dngmmu3lpuvbe2nh@steredhat>

On Fri, Jul 5, 2019 at 6:43 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Jul 05, 2019 at 11:58:43AM +0200, Max Reitz wrote:
> > On 05.07.19 11:32, Stefano Garzarella wrote:
> > > This patch allows 'qemu-img info' to show the 'disk size' for
> > > the RBD images that have the fast-diff feature enabled.
> > >
> > > If this feature is enabled, we use the rbd_diff_iterate2() API
> > > to calculate the allocated size for the image.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > v3:
> > >   - return -ENOTSUP instead of -1 when fast-diff is not available
> > >     [John, Jason]
> > > v2:
> > >   - calculate the actual usage only if the fast-diff feature is
> > >     enabled [Jason]
> > > ---
> > >  block/rbd.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 54 insertions(+)
> >
> > Well, the librbd documentation is non-existent as always, but while
> > googling, I at least found that libvirt has exactly the same code.  So I
> > suppose it must be quite correct, then.
> >
>
> While writing this code I looked at the libvirt implementation and also
> at the "rbd" tool in the ceph repository: compute_image_disk_usage() in
> src/tools/rbd/action/DiskUsage.cc
>
> > > diff --git a/block/rbd.c b/block/rbd.c
> > > index 59757b3120..b6bed683e5 100644
> > > --- a/block/rbd.c
> > > +++ b/block/rbd.c
> > > @@ -1084,6 +1084,59 @@ static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > >      return info.size;
> > >  }
> > >
> > > +static int rbd_allocated_size_cb(uint64_t offset, size_t len, int exists,
> > > +                                 void *arg)
> > > +{
> > > +    int64_t *alloc_size = (int64_t *) arg;
> > > +
> > > +    if (exists) {
> > > +        (*alloc_size) += len;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int64_t qemu_rbd_get_allocated_file_size(BlockDriverState *bs)
> > > +{
> > > +    BDRVRBDState *s = bs->opaque;
> > > +    uint64_t flags, features;
> > > +    int64_t alloc_size = 0;
> > > +    int r;
> > > +
> > > +    r = rbd_get_flags(s->image, &flags);
> > > +    if (r < 0) {
> > > +        return r;
> > > +    }
> > > +
> > > +    r = rbd_get_features(s->image, &features);
> > > +    if (r < 0) {
> > > +        return r;
> > > +    }
> > > +
> > > +    /*
> > > +     * We use rbd_diff_iterate2() only if the RBD image has the fast-diff
> > > +     * feature enabled. If it is disabled, rbd_diff_iterate2() could be
> > > +     * very slow on a big image.
> > > +     */
> > > +    if (!(features & RBD_FEATURE_FAST_DIFF) ||
> > > +        (flags & RBD_FLAG_FAST_DIFF_INVALID)) {
> > > +        return -ENOTSUP;
> > > +    }
> > > +
> > > +    /*
> > > +     * rbd_diff_iterate2(), if the source snapshot name is NULL, invokes
> > > +     * the callback on all allocated regions of the image.
> > > +     */
> > > +    r = rbd_diff_iterate2(s->image, NULL, 0,
> > > +                          bs->total_sectors * BDRV_SECTOR_SIZE, 0, 1,
> > > +                          &rbd_allocated_size_cb, &alloc_size);
> >
> > But I have a question.  This is basically block_status, right?  So it
> > gives us information on which areas are allocated and which are not.
> > The result thus gives us a lower bound on the allocation size, but is it
> > really exactly the allocation size?
> >
> > There are two things I’m concerned about:
> >
> > 1. What about metadata?
>
> Good question. I don't think it includes the size used by metadata, and I
> don't know if there is a way to get it. I'll look into it further.

It does not include the size of metadata, the "rbd_diff_iterate2"
function is literally just looking for touched data blocks within the
RBD image.

> >
> > 2. If you have multiple snapshots, this will only report the overall
> > allocation information, right?  So say there is something like this:
> >
> > (“A” means an allocated MB, “-” is an unallocated MB)
> >
> > Snapshot 1: AAAA---
> > Snapshot 2: --AAAAA
> > Snapshot 3: -AAAA--
> >
> > I think the allocated data size is the number of As in total (13 MB).
> > But I suppose this API will just return 7 MB, because it looks at
> > everything and sees the whole image range (7 MB) as allocated.  It
> > doesn’t report in how many snapshots a given region is allocated.

It should return 13 dirty data blocks (multiplied by the size of the
data block) since, when you don't provide a "from snapshot" name, it
will iterate from the first snapshot to the HEAD revision.

> Looking at the documentation of rbd_diff_iterate2() [1], it says:
>
>  *                        If the source snapshot name is NULL, we
>  * interpret that as the beginning of time and return all allocated
>  * regions of the image.
>
> But I don't know the answer to your question (maybe Jason can help
> here).
> I should study the implementation more closely to understand whether I
> can iterate over all snapshots to get the exact allocated data size.
>
> [1] https://github.com/ceph/ceph/blob/master/src/include/rbd/librbd.h#L925
>
> I'll get back to you when I have more details on the rbd implementation
> to better answer your questions.
>
> Thanks,
> Stefano



-- 
Jason



Thread overview: 11+ messages
2019-07-05  9:32 [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback Stefano Garzarella
2019-07-05  9:58 ` Max Reitz
2019-07-05 10:43   ` Stefano Garzarella
2019-07-09  3:08     ` Jason Dillaman [this message]
2019-07-09  8:55       ` Max Reitz
2019-07-09  9:45         ` Max Reitz
2019-07-09 12:55           ` Jason Dillaman
2019-07-09 13:09             ` Stefano Garzarella
2019-07-09 15:32               ` Max Reitz
2019-07-10  1:42                 ` Jason Dillaman
2019-07-10 15:28                   ` Stefano Garzarella
