From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "Benoît Canet" <benoit@irqsave.net>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
Date: Wed, 3 Jul 2013 09:51:14 +0200 [thread overview]
Message-ID: <20130703075114.GB16585@stefanha-thinkpad.muc.redhat.com> (raw)
In-Reply-To: <20130702145446.GG3031@dhcp-200-207.str.redhat.com>
On Tue, Jul 02, 2013 at 04:54:46PM +0200, Kevin Wolf wrote:
> Am 02.07.2013 um 16:42 hat Stefan Hajnoczi geschrieben:
> > On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> > > ---
> > > docs/specs/qcow2.txt | 42 ++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 42 insertions(+)
> > >
> > > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> > > index 36a559d..a4ffc85 100644
> > > --- a/docs/specs/qcow2.txt
> > > +++ b/docs/specs/qcow2.txt
> > > @@ -350,3 +350,45 @@ Snapshot table entry:
> > > variable: Unique ID string for the snapshot (not null terminated)
> > >
> > > variable: Name of the snapshot (not null terminated)
> > > +
> > > +== Journal ==
> > > +
> > > +QCOW2 can use one or more instance of a metadata journal.
> >
> > s/instance/instances/
> >
> > Is there a reason to use multiple journals rather than a single journal
> > for all entry types? The single journal area avoids seeks.
> >
> > > +
> > > +A journal is a sequential log of journal entries appended on a previously
> > > +allocated and reseted area.
> >
> > I think you say "previously reset area" instead of "reseted". Another
> > option is "initialized area".
> >
> > > +A journal is designed like a linked list with each entry pointing to the next
> > > +so it's easy to iterate over entries.
> > > +
> > > +A journal uses the following constants to denote the type of each entry
> > > +
> > > +TYPE_NONE = 0xFF default value of any bytes in a reseted journal
> > > +TYPE_END = 1 the entry ends a journal cluster and point to the next
> > > + cluster
> > > +TYPE_HASH = 2 the entry contains a deduplication hash
> > > +
> > > +QCOW2 journal entry:
> > > +
> > > + Byte 0 : Size of the entry: size = 2 + n with size <= 254
> >
> > This is not clear. I'm wondering if the +2 is included in the byte
> > value or not. I'm also wondering what a byte value of zero means and
> > what a byte value of 255 means.
> >
> > Please include an example to illustrate how this field works.
> >
> > > +
> > > + 1 : Type of the entry
> > > +
> > > + 2 - size : The optional n bytes structure carried by entry
> > > +
> > > +A journal is divided into clusters and no journal entry can be spilled on two
> > > +clusters. This avoid having to read more than one cluster to get a single entry.
> > > +
> > > +For this purpose an entry with the end type is added at the end of a journal
> > > +cluster before starting to write in the next cluster.
> > > +The size of such an entry is set so the entry points to the next cluster.
> > > +
> > > +As any journal cluster must be ended with an end entry the size of regular
> > > +journal entries is limited to 254 bytes in order to always left room for an end
> > > +entry which mimimal size is two bytes.
> > > +
> > > +The only cases where size > 254 are none entries where size = 255.
> > > +
> > > +The replay of a journal stop when the first end none entry is reached.
> >
> > s/stop/stops/
> >
> > > +The journal cluster size is 4096 bytes.
> >
> > Questions about this layout:
> >
> > 1. Journal entries have no integrity mechanism, which is especially
> > important if they span physical sectors where cheap disks may perform
> > a partial write. This would leave a corrupt journal. If the last
> > bytes are a checksum then you can get some confidence that the entry
> > was fully written and is valid.
> >
> > Did I miss something?
>
> Adding a checksum sounds like a good idea.
>
> > 2. Byte-granularity means that read-modify-write is necessary to append
> > entries to the journal. Therefore a failure could destroy previously
> > committed entries.
> >
> > Any ideas how existing journals handle this?
>
> You commit only whole blocks. So in this case we can consider a block
> only committed as soon as a TYPE_END entry has been written (and after
> that we won't touch it any more until the journalled changes have been
> flushed to disk).
>
> There's one "interesting" case: cache=writethrough. I'm not entirely
> sure yet what to do with it, but it's slow anyway, so using one block
> per entry and therefore flushing the journal very often might actually
> be not totally unreasonable.
>
> Another thing I'm not sure about is whether a fixed 4k block is good or
> if we should leave it configurable. I don't think making it an option
> would hurt (not necessarily modifyable with qemu-img, but as a field
> in the file format).
Making block size configurable seems like a good idea so we can adapt to
disk performance and data integrity characteristics.
Stefan
next prev parent reply other threads:[~2013-07-03 7:51 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-20 14:26 [Qemu-devel] [RFC V8 00/24] QCOW2 deduplication core functionality Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification Benoît Canet
2013-07-02 14:42 ` Stefan Hajnoczi
2013-07-02 14:54 ` Kevin Wolf
2013-07-02 21:26 ` Benoît Canet
2013-07-03 8:08 ` Kevin Wolf
2013-07-03 7:51 ` Stefan Hajnoczi [this message]
2013-07-02 21:23 ` Benoît Canet
2013-07-03 8:01 ` Stefan Hajnoczi
2013-07-03 12:35 ` Benoît Canet
2013-07-03 8:04 ` Kevin Wolf
2013-07-03 12:30 ` Benoît Canet
2013-07-03 8:12 ` Stefan Hajnoczi
2013-07-03 12:53 ` Benoît Canet
2013-07-04 7:13 ` Stefan Hajnoczi
2013-07-04 10:01 ` Benoît Canet
2013-07-16 22:45 ` Benoît Canet
2013-07-17 8:20 ` Kevin Wolf
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 02/24] qcow2: Add deduplication structures and fields Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 03/24] qcow2: Add journal Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 04/24] qcow2: Create the log store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 05/24] qcow2: Add the hash store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 06/24] qcow2: Add the deduplication store Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 07/24] qcow2: Add qcow2_dedup_read_missing_and_concatenate Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 08/24] qcow2: Create a way to link to l2 tables when deduplicating Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 09/24] qcow2: Make qcow2_update_cluster_refcount public Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 10/24] qcow2: Add qcow2_dedup and related functions Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 11/24] qcow2: Add qcow2_dedup_store_new_hashes Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 12/24] qcow2: Do allocate on rewrite on the dedup case Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 13/24] qcow2: Implement qcow2_compute_cluster_hash Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 14/24] qcow2: Load and save deduplication table header extension Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 15/24] qcow2: Extract qcow2_set_incompat_feature and qcow2_clear_incompat_feature Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 16/24] block: Add qcow2_dedup format and image creation code Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 17/24] qcow2: Drop hash for a given cluster when dedup makes refcount > 2^16/2 Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 18/24] qcow2: Remove hash when cluster is deleted Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 19/24] qcow2: Integrate deduplication in qcow2_co_writev loop Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 20/24] qcow2: Serialize write requests when deduplication is activated Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 21/24] qcow2: Integrate SKEIN hash algorithm in deduplication Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 22/24] qcow2: Add qcow2_dedup_init and qcow2_dedup_close Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 23/24] qcow2: Enable the deduplication feature Benoît Canet
2013-06-20 14:26 ` [Qemu-devel] [RFC V8 24/24] qcow2: Enable deduplication tests Benoît Canet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130703075114.GB16585@stefanha-thinkpad.muc.redhat.com \
--to=stefanha@redhat.com \
--cc=benoit@irqsave.net \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).