From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	stefanha@gmail.com, mjt@tls.msk.ru, qemu-devel@nongnu.org,
	hch@lst.de
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
Date: Wed, 25 Aug 2010 09:14:07 -0500
Message-ID: <4C75252F.6040002@codemonkey.ws>
In-Reply-To: <4C752211.5010600@redhat.com>

On 08/25/2010 09:00 AM, Avi Kivity wrote:
>  On 08/25/2010 04:42 PM, Anthony Liguori wrote:
>> On 08/25/2010 08:23 AM, Avi Kivity wrote:
>>>  On 08/25/2010 03:46 PM, Anthony Liguori wrote:
>>>>
>>>> If we had another disk format that only supported growth and 
>>>> metadata for a backing file, can you think of another failure 
>>>> scenario?
>>>
>>> btw, only supporting growth is a step backwards.  Currently 
>>> file-backed disks keep growing even if the guest-used storage doesn't 
>>> grow, since once we allocate something we never release it.  But 
>>> eventually guests will start using TRIM or DISCARD or however it's 
>>> called, and then we can expose it and reclaim unused blocks.
>>
>> You can do this in one of two ways.  You can do online compaction or 
>> you can maintain a free list.  Online compaction has an advantage 
>> because it does not require any operations in the fast path whereas a 
>> free list would require ordered metadata updates (must remove 
>> something from the free list before updating the L2 table) which 
>> implies a sync.
>
> DISCARD/TRIM can queue blocks to the same preallocated block list we 
> have to optimize allocation.  New allocations can come from this list; 
> if it grows too large we sync part of it to disk to avoid loss of a lot 
> of free space on power fail.
>
>> At a high level, I don't think online compaction requires any 
>> specific support from an image format.
>>
>
> You need to know that the block is free and can be reallocated.

Semantically, TRIM/DISCARD means "I don't care about the contents 
of the block anymore until I do another write."  Behind the scenes, we 
can keep track of which blocks have been discarded in an in-memory list, 
and the first write to a block evicts it from the discarded list.
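
As a rough illustration of that bookkeeping, here is a minimal Python 
sketch.  The class and method names (DiscardTracker, discard, write, 
is_discarded) are hypothetical and are not taken from QEMU; a real block 
driver would hook this into its request path.

```python
class DiscardTracker:
    """In-memory record of discarded guest blocks (hypothetical helper)."""

    def __init__(self):
        # Nothing here is persisted; losing the set on a crash only
        # means those blocks are not reclaimed.
        self._discarded = set()

    def discard(self, block):
        # TRIM/DISCARD: contents no longer matter until the next write.
        self._discarded.add(block)

    def write(self, block):
        # The first write after a discard evicts the block from the list.
        self._discarded.discard(block)

    def is_discarded(self, block):
        return block in self._discarded
```

Because the set lives only in memory, neither discard() nor write() 
needs a sync in this sketch.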

A background task would attempt to detect idle I/O and copy a block from 
the end of the file to a location on the discarded list.  When the copy 
has completed, you can then remove the L2 entry for the discarded block 
(effectively punching a hole in the image), sync, and then update the L2 
entry for the block at the end-of-file location to point to the new 
block location.  You can then ftruncate to reduce overall file size.
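
The ordering of those steps can be sketched on a toy model, assuming the 
image is a Python list of clusters and the L2 table is a dict from guest 
block to cluster index.  The function name compact_once and the log 
argument are hypothetical; the sketch only mirrors the order described 
above and does not model crash recovery or real file I/O.

```python
def compact_once(clusters, l2, discarded, log):
    """Run one compaction step; return True if the file shrank.

    clusters:  list of cluster payloads, index = position in the file
    l2:        dict mapping guest block -> cluster index
    discarded: set of guest blocks whose contents no longer matter
    log:       list that records the ordered operations
    """
    if not clusters:
        return False
    tail = len(clusters) - 1
    owner = next((g for g, c in l2.items() if c == tail), None)

    if owner is None or owner in discarded:
        # The tail cluster holds nothing the guest cares about.
        if owner is not None:
            del l2[owner]
            discarded.remove(owner)
            log.append("sync")
        clusters.pop()
        log.append("truncate")
        return True

    victim = next((g for g in discarded if g in l2), None)
    if victim is None:
        return False  # no hole to move the tail cluster into
    hole = l2[victim]

    clusters[hole] = clusters[tail]  # copy the tail data into the hole
    log.append("copy")
    del l2[victim]                   # drop the discarded block's L2 entry
    discarded.remove(victim)
    log.append("sync")               # order the removal before the repoint
    l2[owner] = hole                 # repoint the moved block
    log.append("update-l2")
    clusters.pop()                   # stands in for ftruncate
    log.append("truncate")
    return True
```

For example, with clusters ["A", "B", "C"], an identity L2 mapping and 
guest block 0 discarded, one call moves "C" into cluster 0 and shrinks 
the list to two entries; a second call finds no hole and returns False.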

If you tried to maintain a free list, then you would need to sync on 
TRIM/DISCARD which is potentially a fast path.  While a background task 
may be less efficient in the short term, it's just as efficient in the 
long term and it has the advantage of keeping any fast path fast.

Regards,

Anthony Liguori
