qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Benoît Canet" <benoit@irqsave.net>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
Date: Tue, 2 Jul 2013 16:54:46 +0200
Message-ID: <20130702145446.GG3031@dhcp-200-207.str.redhat.com>
In-Reply-To: <20130702144224.GF9870@stefanha-thinkpad.redhat.com>

On 02.07.2013 at 16:42, Stefan Hajnoczi wrote:
> On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> > ---
> >  docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> > 
> > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> > index 36a559d..a4ffc85 100644
> > --- a/docs/specs/qcow2.txt
> > +++ b/docs/specs/qcow2.txt
> > @@ -350,3 +350,45 @@ Snapshot table entry:
> >          variable:   Unique ID string for the snapshot (not null terminated)
> >  
> >          variable:   Name of the snapshot (not null terminated)
> > +
> > +== Journal ==
> > +
> > +QCOW2 can use one or more instance of a metadata journal.
> 
> s/instance/instances/
> 
> Is there a reason to use multiple journals rather than a single journal
> for all entry types?  The single journal area avoids seeks.
> 
> > +
> > +A journal is a sequential log of journal entries appended on a previously
> > +allocated and reseted area.
> 
> I think you say "previously reset area" instead of "reseted".  Another
> option is "initialized area".
> 
> > +A journal is designed like a linked list with each entry pointing to the next
> > +so it's easy to iterate over entries.
> > +
> > +A journal uses the following constants to denote the type of each entry
> > +
> > +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> > +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> > +                      cluster
> > +TYPE_HASH = 2         the entry contains a deduplication hash
> > +
> > +QCOW2 journal entry:
> > +
> > +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
> 
> This is not clear.  I'm wondering if the +2 is included in the byte
> value or not.  I'm also wondering what a byte value of zero means and
> what a byte value of 255 means.
> 
> Please include an example to illustrate how this field works.
> 
> > +
> > +         1         :    Type of the entry
> > +
> > +         2 - size  :    The optional n bytes structure carried by entry
> > +
> > +A journal is divided into clusters and no journal entry can be spilled on two
> > +clusters. This avoid having to read more than one cluster to get a single entry.
> > +
> > +For this purpose an entry with the end type is added at the end of a journal
> > +cluster before starting to write in the next cluster.
> > +The size of such an entry is set so the entry points to the next cluster.
> > +
> > +As any journal cluster must be ended with an end entry the size of regular
> > +journal entries is limited to 254 bytes in order to always left room for an end
> > +entry which mimimal size is two bytes.
> > +
> > +The only cases where size > 254 are none entries where size = 255.
> > +
> > +The replay of a journal stop when the first end none entry is reached.
> 
> s/stop/stops/
> 
> > +The journal cluster size is 4096 bytes.
> 
> Questions about this layout:
> 
> 1. Journal entries have no integrity mechanism, which is especially
>    important if they span physical sectors where cheap disks may perform
>    a partial write.  This would leave a corrupt journal.  If the last
>    bytes are a checksum then you can get some confidence that the entry
>    was fully written and is valid.
> 
>    Did I miss something?

Adding a checksum sounds like a good idea.
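
To make the idea concrete, here is a minimal sketch of an entry with a trailing checksum. It assumes the size byte counts the whole entry (header, payload, and checksum) and uses a big-endian CRC32 as the last four bytes. Neither the CRC32 choice nor the helper names encode_entry/decode_entry come from the patch; they are illustrative only.

```python
import struct
import zlib

TYPE_HASH = 2          # from the proposed spec
MAX_ENTRY_SIZE = 254   # from the proposed spec
HEADER_LEN = 2         # size byte + type byte
CRC_LEN = 4            # assumption: CRC32 trailer, not part of the proposal


def encode_entry(entry_type, payload):
    """Build size | type | payload | crc32(size, type, payload)."""
    size = HEADER_LEN + len(payload) + CRC_LEN
    if size > MAX_ENTRY_SIZE:
        raise ValueError("entry too large")
    body = bytes([size, entry_type]) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def decode_entry(buf, offset=0):
    """Return (type, payload), or None if the entry is torn or corrupt."""
    if offset + HEADER_LEN > len(buf):
        return None
    size = buf[offset]
    if size < HEADER_LEN + CRC_LEN or size > MAX_ENTRY_SIZE:
        return None  # also rejects the 0xFF bytes of a reset journal
    if offset + size > len(buf):
        return None
    body = buf[offset:offset + size - CRC_LEN]
    (stored,) = struct.unpack_from(">I", buf, offset + size - CRC_LEN)
    if zlib.crc32(body) != stored:
        return None
    return body[1], body[HEADER_LEN:]
```

With this layout a reader can stop replay at the first entry whose checksum does not match, instead of trusting a partially written entry.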

> 2. Byte-granularity means that read-modify-write is necessary to append
>    entries to the journal.  Therefore a failure could destroy previously
>    committed entries.
> 
>    Any ideas how existing journals handle this?

You commit only whole blocks. So in this case we can consider a block
only committed as soon as a TYPE_END entry has been written (and after
that we won't touch it any more until the journalled changes have been
flushed to disk).
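
As a rough sketch of that rule, replay could walk each block and accept its entries only if it finds a TYPE_END entry. The function name committed_entries and the helper entry are hypothetical, and the sketch assumes the size byte includes the two header bytes, which the proposed text leaves ambiguous.

```python
TYPE_NONE = 0xFF   # value of every byte in a reset journal (proposed spec)
TYPE_END = 1       # closes a journal block (proposed spec)
TYPE_HASH = 2
BLOCK_SIZE = 4096


def entry(entry_type, payload=b""):
    """Hypothetical helper: size byte counts the 2-byte header too."""
    return bytes([2 + len(payload), entry_type]) + payload


def committed_entries(block):
    """Return the entries of a sealed block, or None if it is uncommitted.

    A block counts as committed only once a TYPE_END entry is found;
    anything after that entry is ignored.
    """
    entries = []
    offset = 0
    while offset + 2 <= len(block):
        size, entry_type = block[offset], block[offset + 1]
        if entry_type == TYPE_NONE or size < 2 or offset + size > len(block):
            return None  # ran into reset or malformed bytes before TYPE_END
        if entry_type == TYPE_END:
            return entries
        entries.append((entry_type, block[offset + 2:offset + size]))
        offset += size
    return None
```

A block that still ends in reset bytes yields None, so a crash in the middle of filling it cannot expose half-written entries to replay.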

There's one "interesting" case: cache=writethrough. I'm not entirely
sure yet what to do with it, but it's slow anyway, so using one block
per entry and therefore flushing the journal very often might actually
not be totally unreasonable.
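
To put a rough number on that cost, the sketch below compares how many journal blocks are consumed in the two modes. The 32-byte payload is only an assumed example size, and blocks_needed is a hypothetical helper; the 4096-byte block, the 2-byte header, and the 2-byte minimal end entry are taken from the proposed spec.

```python
BLOCK_SIZE = 4096      # journal cluster size from the proposed spec
HEADER_LEN = 2         # size byte + type byte
END_ENTRY_LEN = 2      # minimal TYPE_END entry that closes a block


def blocks_needed(n_entries, payload_len, one_entry_per_block):
    """Count the journal blocks consumed by n_entries equal-sized entries."""
    if one_entry_per_block:
        # writethrough-style: every entry seals (and flushes) its own block
        return n_entries
    entry_len = HEADER_LEN + payload_len
    per_block = (BLOCK_SIZE - END_ENTRY_LEN) // entry_len
    return -(-n_entries // per_block)  # ceiling division


print(blocks_needed(1000, 32, one_entry_per_block=False))
print(blocks_needed(1000, 32, one_entry_per_block=True))
```

Under these assumptions 120 entries fit in a block, so the batched mode needs about two orders of magnitude fewer blocks and flushes than one block per entry.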

Another thing I'm not sure about is whether a fixed 4k block is good or
if we should leave it configurable. I don't think making it an option
would hurt (not necessarily modifiable with qemu-img, but as a field
in the file format).

Kevin

