Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking


From: Jun Li <junmuzi@126.com>
To: Max Reitz <mreitz@redhat.com>
Cc: kwolf@redhat.com, famz@redhat.com, juli@redhat.com,
	qemu-devel@nongnu.org, stefanha@redhat.com,
	Jun Li <junmuzi@gmail.com>
Subject: Re: [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking
Date: Mon, 19 Jan 2015 21:16:11 +0800
Message-ID: <20150119131611.GA2307@localhost.localdomain>
In-Reply-To: <54B80B4C.7010406@redhat.com>

On Thu, 01/15 13:47, Max Reitz wrote:
> On 2015-01-03 at 07:23, Jun Li wrote:
> >On Fri, 11/21 11:56, Max Reitz wrote:
> >>So, as for what I think we do need to do when shrinking (and keep in mind:
> >>The offset given to qcow2_truncate() is the guest size! NOT the host image
> >>size!):
> >>
> >>(1) Determine the first L2 table and the first entry in the table which will
> >>lie beyond the new guest disk size.
> >This is not always correct. Because of COW, using the offset to calculate
> >the first entry of the first L2 table will give the wrong result.
> 
> Again: This is *not* about the host disk size or the host offset of some
> cluster, but about the *guest* disk size.
> 
> Let's make up an example. You have a 2 GB disk but you want to resize it to
> 1.25 GB. The cluster size is 64 kB, therefore we have 2 GB / 64 kB = 32,768
> data clusters (as long as there aren't any internal snapshots, which is a
> prerequisite for resizing qcow2 images).
> 
> Every L2 table contains 65,536 / 8 = 8,192 entries; there are thus 32,768 /
> 8,192 = 4 L2 tables.
> 
> As you can see, one can directly derive the number of data clusters and L2
> tables from the guest disk size (as long as there aren't any internal
> snapshots).
> 
> So of course we can do the same for the target disk size: 1.25 GB / 64 kB =
> 20,480 data clusters; 20,480 / 8,192 = 2.5 L2 tables, therefore we need
> three L2 tables but only half of the last one (4,096 entries).
> 
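
The arithmetic in the example above can be checked with a short C sketch.
This is not QEMU code: the helper names (div_round_up_u64, data_clusters,
l2_tables) are made up for illustration, and it assumes 8-byte L2 entries and
no internal snapshots, as the example does.

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRY_SIZE 8  /* bytes per L2 entry, as in the example above */

/* Integer division rounding up (hypothetical helper, not a QEMU function). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* Number of data clusters needed to cover guest_size bytes. */
static uint64_t data_clusters(uint64_t guest_size, uint64_t cluster_size)
{
    return div_round_up_u64(guest_size, cluster_size);
}

/* Number of L2 tables needed to map guest_size bytes. */
static uint64_t l2_tables(uint64_t guest_size, uint64_t cluster_size)
{
    uint64_t entries_per_l2 = cluster_size / L2_ENTRY_SIZE;

    return div_round_up_u64(data_clusters(guest_size, cluster_size),
                            entries_per_l2);
}
```

With a 64 kB cluster size, data_clusters() gives 32,768 for 2 GB and 20,480
for 1.25 GB, and l2_tables() gives 4 and 3 respectively, matching the numbers
above.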

Sorry, that was my misunderstanding last time. If we do not use
qcow2_truncate(), I think the issue above does not exist.

What I originally meant to say was:
Sometimes the second L2 table will contain an entry whose pointer refers to
a cluster at an address above 1.25 GB.

So if we do not use qcow2_truncate(), that cluster at an address above
1.25 GB will not be discarded.

But I still have another concern.

Suppose "virtual size" and "disk size" are both 2G. After we resize the image
to 1.25G, it seems "virtual size" will be 1.25G but "disk size" will still be
2G if we do not use qcow2_truncate() to truncate the file (yes, I know that
using qcow2_truncate() is not a solution). This seems strange and less than
ideal.

> We know that every cluster referenced somewhere after that limit (that is,
> every entry in the fourth L2 table and every entry starting with index 4,096
> in the third L2 table) is a data cluster with a guest offset somewhere
> beyond 1.25 GB, so we don't need it anymore.
> 
> Thus, we simply discard all those data clusters and after that we can
> discard the fourth L2 table. That's it.
> 
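
The position of the first L2 entry beyond the new guest size, which the
paragraphs above rely on, can be sketched as follows. The struct and function
names are hypothetical and not part of QEMU; the sketch assumes 8-byte L2
entries and counts both indices from zero.

```c
#include <assert.h>
#include <stdint.h>

/* Position of an L2 entry (hypothetical type, for illustration only). */
struct l2_pos {
    uint64_t l1_index;  /* which L2 table, 0-based */
    uint64_t l2_index;  /* entry within that L2 table, 0-based */
};

/*
 * Return the position of the first L2 entry whose guest offset lies at or
 * beyond new_guest_size. Every entry from this position onwards maps guest
 * data that no longer exists after the shrink.
 */
static struct l2_pos first_entry_beyond(uint64_t new_guest_size,
                                        uint64_t cluster_size)
{
    assert(cluster_size >= 8);

    uint64_t entries_per_l2 = cluster_size / 8;
    uint64_t first_cluster =
        (new_guest_size + cluster_size - 1) / cluster_size;
    struct l2_pos pos = {
        first_cluster / entries_per_l2,
        first_cluster % entries_per_l2,
    };

    return pos;
}
```

For a new size of 1.25 GB and 64 kB clusters this yields l1_index 2 (the third
L2 table) and l2_index 4096, which agrees with the description above.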
> If we really want to we can calculate the highest cluster host offset in use
> and truncate the image accordingly. But that's optional, see the last point
> in my "problems with this approach" list (having discarded the clusters
> should save us all the space already). Furthermore, as I'm saying in that
> list, to really solve this issue, we'd need qcow2 defragmentation.
> 

Do we already have an implementation of "qcow2 defragmentation"?

Jun Li

> >What I have done for this scenario:
> >(1) If the first entry is the first entry of an L2 table, also scan "the
> >previous L2 table" (the one located before this L2 table in the L1 table).
> >If an entry of the previous L2 table is larger than offset, discard that
> >entry, too.
> >(2) If the first entry is not the first entry of the L2 table, still scan
> >the whole L2 table to make sure no entry is beyond offset.
> >
> >>(2) Discard all clusters beginning from there.
> >>(3) Discard all L2 tables which are then completely empty.
> >>(4) Update the header size.
> >I think the current implementation of this patch already includes the
> >above 4 steps. The current patch also has an additional step 5:
> >(5) Truncate the file.
> 
> As I wrote above, you can do that but it shouldn't matter much because the
> discarded clusters should not use any disk space.
> 
> >Here I think we should also discard the refcount table and refcount block
> >table when they are completely empty.
> >
> >>And that's it. We can either speed up step (2) by implementing it manually,
> >>or we just use bdrv_discard() on the qcow2 BDS (in the simplest case:
> >>bdrv_discard(bs, DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE), bs->total_sectors -
> >>DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE));.
> >>
> >>We can incorporate step (3) by extending qcow2_discard_clusters() to free L2
> >>tables when they are empty after discard_single_l2(). But we don't even have
> >>to do that now. It's an optimization we can go about later.
> >>
> >>So, we can do (1), (2) and (3) in a single step: Just one bdrv_discard()
> >>call. But it's probably better to use qcow2_discard_clusters() instead and
> >>set the full_discard parameter to true.
> >>
> >>So: qcow2_discard_clusters(bs, offset, bs->total_sectors - offset /
> >>BDRV_SECTOR_SIZE, true);. Then update the guest disk size field in the
> >>header. And we're done.
> >>
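
The sector range used in the bdrv_discard() expression quoted above can be
worked out in isolation. The following is only a sketch: SECTOR_SIZE stands in
for QEMU's BDRV_SECTOR_SIZE (512 bytes), and the struct and function names are
invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* stands in for BDRV_SECTOR_SIZE */

/* A run of sectors (hypothetical type, for illustration only). */
struct sector_range {
    uint64_t first;  /* first sector to discard */
    uint64_t count;  /* number of sectors to discard */
};

/*
 * Compute the sectors that lie beyond new_size (in bytes) on a disk that
 * currently has total_sectors sectors. The start is rounded up so that a
 * sector still partly covered by the new size is kept.
 */
static struct sector_range tail_sectors(uint64_t new_size,
                                        uint64_t total_sectors)
{
    uint64_t first = (new_size + SECTOR_SIZE - 1) / SECTOR_SIZE;
    struct sector_range r = { first, 0 };

    assert(SECTOR_SIZE > 0);
    if (first < total_sectors) {
        r.count = total_sectors - first;
    }
    return r;
}
```

For the 2 GB to 1.25 GB example, total_sectors is 4,194,304, the first sector
to discard is 2,621,440, and the count is 1,572,864.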
> >>There are four problems with this approach:
> >>- qcow2_discard_clusters() might be slower than optimal. I personally don't
> >>care at all.
> >>- If "bs->total_sectors * BDRV_SECTOR_SIZE - offset" is greater than
> >>INT_MAX, this won't work. Trivially solvable by encapsulating the
> >>qcow2_discard_clusters() call in a loop which limits nb_clusters to INT_MAX
> >>/ BDRV_SECTOR_SIZE.
> >>- The L1 table is not resized. Should not matter in practice at all.
> >Yes, agree with you.
> >
> >>- The file is not truncated. Does not matter either (because holes in the
> >>file are good enough), and we can't actually solve this problem without
> >>defragmentation anyway.
> >>
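
The loop suggested for the INT_MAX problem in the list above could look
roughly like this. It is a sketch under assumptions, not the qcow2 code: the
real qcow2_discard_clusters() call is replaced by a hypothetical callback type
(discard_fn), and count_discard() is a stand-in that only counts what it is
asked to discard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical callback standing in for the real discard function. */
typedef int (*discard_fn)(uint64_t first_sector, int nb_sectors, void *opaque);

/*
 * Discard nb_sectors starting at first_sector, never passing more than
 * INT_MAX / SECTOR_SIZE sectors to a single call, so that the byte length
 * of each chunk fits in an int. Stops at the first negative return value.
 */
static int discard_in_chunks(uint64_t first_sector, uint64_t nb_sectors,
                             discard_fn fn, void *opaque)
{
    const uint64_t max_chunk = INT_MAX / SECTOR_SIZE;

    assert(fn != NULL);
    while (nb_sectors > 0) {
        uint64_t n = nb_sectors < max_chunk ? nb_sectors : max_chunk;
        int ret = fn(first_sector, (int)n, opaque);

        if (ret < 0) {
            return ret;
        }
        first_sector += n;
        nb_sectors -= n;
    }
    return 0;
}

/* Example callback: counts calls and sectors, discards nothing. */
struct discard_stats {
    uint64_t calls;
    uint64_t sectors;
};

static int count_discard(uint64_t first_sector, int nb_sectors, void *opaque)
{
    struct discard_stats *s = opaque;

    (void)first_sector;
    s->calls++;
    s->sectors += (uint64_t)nb_sectors;
    return 0;
}
```

Passing count_discard and a zeroed struct discard_stats to
discard_in_chunks() shows how many calls a given range is split into without
touching any image.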
> >>There is one advantage:
> >>- It's extremely simple. It's literally below ten lines of code.
> >>
> >>I think the advantage far outweighs the disadvantage. But I may be wrong.
> >>What do you think?
> >Hi Max,
> >
> >   Sorry for the late reply; I have been very busy recently. I think we
> >should first agree on how to implement qcow2 shrinking; writing the code
> >will go better after that.
> 
> Yes, this will probably be for the best. :-)
> 
> >One more thing: as Gmail cannot currently be used in China, I have to use
> >this email address to reply. :)
> 
> No problem.
> 
> Max

Thread overview: 16+ messages
2014-10-26 15:20 [Qemu-devel] [PATCH v5 0/3] qcow2: Patch for shrinking qcow2 disk image Jun Li
2014-10-26 15:20 ` [Qemu-devel] [PATCH v5 1/3] qcow2: Add qcow2_shrink_l1_and_l2_table for qcow2 shrinking Jun Li
2014-11-21 10:56   ` Max Reitz
2014-11-24 17:49     ` Eric Blake
2015-01-03 12:23     ` Jun Li
2015-01-15 18:47       ` Max Reitz
2015-01-19 13:16         ` Jun Li [this message]
2015-01-22 19:14           ` Max Reitz
2015-01-27 14:06             ` Jun Li
2014-10-26 15:20 ` [Qemu-devel] [PATCH v5 2/3] qcow2: add update refcount table realization for update_refcount Jun Li
2014-11-21 12:41   ` Max Reitz
2014-11-24 18:11     ` Eric Blake
2014-10-26 15:20 ` [Qemu-devel] [PATCH v5 3/3] qcow2: Add qemu-iotests for qcow2 shrinking Jun Li
2014-11-21 13:01   ` Max Reitz
2014-11-10  8:36 ` [Qemu-devel] [PATCH v5 0/3] qcow2: Patch for shrinking qcow2 disk image Jun Li
2014-11-10  9:17   ` Kevin Wolf
