Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.

From: "Benoît Canet" <benoit.canet@irqsave.net>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
Date: Tue, 2 Jul 2013 23:23:56 +0200
Message-ID: <20130702212355.GB4985@irqsave.net>
In-Reply-To: <20130702144224.GF9870@stefanha-thinkpad.redhat.com>

> > +QCOW2 can use one or more instance of a metadata journal.
> 
> s/instance/instances/
> 
> Is there a reason to use multiple journals rather than a single journal
> for all entry types?  The single journal area avoids seeks.

Here are the main reasons for this:

For deduplication, some patterns such as cycles of insertion/deletion could
leave the hash table almost empty while filling the journal.

If the journal is full and the hash table is empty, a packing operation is
started.

Basically, a new journal is created and only the entries present in the hash
table are reinserted.
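
To illustrate the packing idea, here is a minimal sketch, not the code from the
patch series. It assumes the journal is an append-only list that records both
insertions and deletions and that the hash table is a plain dict; the names
DedupLog, JOURNAL_CAPACITY and pack are hypothetical.

```python
JOURNAL_CAPACITY = 8  # hypothetical and tiny on purpose, to trigger packing


class DedupLog:
    """Toy model: an append-only journal plus the in-memory hash table."""

    def __init__(self):
        self.journal = []     # records: ("ins" | "del", hash)
        self.hash_table = {}  # live hashes -> payload

    def _append(self, record):
        # A full journal triggers a pack before the new record is logged.
        if len(self.journal) >= JOURNAL_CAPACITY:
            self.pack()
        self.journal.append(record)

    def insert(self, h, payload):
        self._append(("ins", h))
        self.hash_table[h] = payload

    def delete(self, h):
        self._append(("del", h))
        self.hash_table.pop(h, None)

    def pack(self):
        # Start a new journal and reinsert only the entries that are
        # still present in the hash table.
        self.journal = [("ins", h) for h in self.hash_table]


log = DedupLog()
for i in range(10):
    key = b"hash-%d" % i
    log.insert(key, i)
    log.delete(key)
```

With ten insert/delete cycles the hash table ends up empty, and each pack
shrinks the journal back to the live entries instead of letting it overflow.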

This is why I want to keep the deduplication journal apart from the regular
qcow2 journal: to avoid interference between a pack operation and regular qcow2
journal entries.

The other thing is that freezing the log store would need a replay of regular
qcow2 entries, as it triggers a reset of the journal.

Also, since deduplication will not work on spinning disks, I discarded the seek
time factor.

Maybe committing the dedupe journal in erase-block-sized chunks would be a good
idea to reduce random writes to the SSD.

An additional reason for having multiple journals is that the SILT paper
proposes a mode where a prefix of the hash is used to dispatch insertions to
multiple stores, and that is easier to do with multiple journals.
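
A rough sketch of what prefix dispatch could look like, with one journal per
store. The number of stores, the use of the leading bits of the first hash byte
and the names store_index and dispatch_insert are assumptions made for the
example, not details taken from SILT or from the patches.

```python
NUM_STORES = 4  # hypothetical; assumed to be a power of two, at most 256


def store_index(hash_bytes, num_stores=NUM_STORES):
    """Select a store from the leading bits of the hash."""
    bits = num_stores.bit_length() - 1  # 4 stores -> 2 prefix bits
    if bits == 0:
        return 0
    return hash_bytes[0] >> (8 - bits)


# One independent journal per store, so each can be packed on its own.
journals = [[] for _ in range(NUM_STORES)]


def dispatch_insert(hash_bytes):
    """Append the hash to the journal of the store its prefix selects."""
    journals[store_index(hash_bytes)].append(hash_bytes)


dispatch_insert(bytes([0x00]) + bytes(31))  # prefix 00 -> store 0
dispatch_insert(bytes([0xC5]) + bytes(31))  # prefix 11 -> store 3
```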

> 
> > +
> > +A journal is a sequential log of journal entries appended on a previously
> > +allocated and reseted area.
> 
> I think you say "previously reset area" instead of "reseted".  Another
> option is "initialized area".
> 
> > +A journal is designed like a linked list with each entry pointing to the next
> > +so it's easy to iterate over entries.
> > +
> > +A journal uses the following constants to denote the type of each entry
> > +
> > +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> > +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> > +                      cluster
> > +TYPE_HASH = 2         the entry contains a deduplication hash
> > +
> > +QCOW2 journal entry:
> > +
> > +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254
> 
> This is not clear.  I'm wondering if the +2 is included in the byte
> value or not.  I'm also wondering what a byte value of zero means and
> what a byte value of 255 means.

I am counting the journal entry header in the size, so yes, the +2 is included
in the byte value.
A byte value of 0, 1 or 255 is an error.

Maybe this design is bogus and I should only count the payload size in the size
field. It would leave fewer tricky cases.
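
As an example of how the size byte works in the current proposal (header
included in the size), here is a small sketch. The constants mirror the quoted
specification; encode_entry and decode_entry are hypothetical helper names, and
the 32-byte hash length is only an illustration.

```python
TYPE_END = 1
TYPE_HASH = 2
HEADER_SIZE = 2       # byte 0: size, byte 1: type
MAX_ENTRY_SIZE = 254  # size = 2 + n with size <= 254


def encode_entry(entry_type, payload=b""):
    """Build an entry whose size byte counts the 2-byte header too."""
    size = HEADER_SIZE + len(payload)
    if size > MAX_ENTRY_SIZE:
        raise ValueError("payload too large for a journal entry")
    return bytes([size, entry_type]) + payload


def decode_entry(buf, offset=0):
    """Return (type, payload, offset of the next entry)."""
    size = buf[offset]
    # 0 and 1 cannot even cover the header; 255 is above the limit.
    if size < HEADER_SIZE or size > MAX_ENTRY_SIZE:
        raise ValueError("invalid entry size byte: %d" % size)
    entry_type = buf[offset + 1]
    payload = buf[offset + HEADER_SIZE:offset + size]
    return entry_type, payload, offset + size


# A 32-byte hash payload gives a size byte of 34 (2 + 32).
entry = encode_entry(TYPE_HASH, bytes(32))
```

An entry with no payload has a size byte of 2, and the next entry always starts
at offset + size, which is what makes the journal behave like a linked list.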

> 
> Please include an example to illustrate how this field works.
> 
> > +
> > +         1         :    Type of the entry
> > +
> > +         2 - size  :    The optional n bytes structure carried by entry
> > +
> > +A journal is divided into clusters and no journal entry can be spilled on two
> > +clusters. This avoid having to read more than one cluster to get a single entry.
> > +
> > +For this purpose an entry with the end type is added at the end of a journal
> > +cluster before starting to write in the next cluster.
> > +The size of such an entry is set so the entry points to the next cluster.
> > +
> > +As any journal cluster must be ended with an end entry the size of regular
> > +journal entries is limited to 254 bytes in order to always left room for an end
> > +entry which mimimal size is two bytes.
> > +
> > +The only cases where size > 254 are none entries where size = 255.
> > +
> > +The replay of a journal stop when the first end none entry is reached.
> 
> s/stop/stops/
> 
> > +The journal cluster size is 4096 bytes.
> 
> Questions about this layout:
> 
> 1. Journal entries have no integrity mechanism, which is especially
>    important if they span physical sectors where cheap disks may perform
>    a partial write.  This would leave a corrupt journal.  If the last
>    bytes are a checksum then you can get some confidence that the entry
>    was fully written and is valid.

I will add a checksum mechanism.

Do you have any preference regarding the checksum function?
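
To make the question concrete, this is the kind of thing I have in mind: a
checksum stored in the last bytes of each entry. The sketch uses CRC32 only
because it ships with the Python standard library; the choice of function, the
4-byte big-endian trailer and the names seal and verify are all assumptions for
the example, not a decision.

```python
import struct
import zlib


def seal(entry):
    """Append a 4-byte big-endian CRC32 computed over the entry bytes."""
    return entry + struct.pack(">I", zlib.crc32(entry))


def verify(sealed):
    """Return True if the trailing CRC32 matches the bytes before it."""
    if len(sealed) < 4:
        return False
    body, stored = sealed[:-4], sealed[-4:]
    return struct.pack(">I", zlib.crc32(body)) == stored


# Header (size 34, type 2) followed by a 32-byte payload.
sealed = seal(bytes([34, 2]) + bytes(32))

# Simulate a corrupted write by flipping one bit of the first byte.
damaged = bytes([sealed[0] ^ 0x01]) + sealed[1:]
```

A replay would then stop, or report corruption, at the first entry for which
verify returns False.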

> 
>    Did I miss something?
> 
> 2. Byte-granularity means that read-modify-write is necessary to append
>    entries to the journal.  Therefore a failure could destroy previously
>    committed entries.

It's designed to be committed in 4 KB blocks, i.e. one journal cluster at a
time.
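
A simplified sketch of that idea: entries are accumulated in an in-memory
4096-byte buffer pre-filled with 0xFF, and only whole clusters are handed to
the write path. ClusterBuffer and the write_cluster callback are hypothetical
names, and the sketch assumes that the size byte of the end entry is the
distance to the start of the next cluster.

```python
CLUSTER_SIZE = 4096
TYPE_NONE = 0xFF
TYPE_END = 1
END_MIN_SIZE = 2     # room always kept for the end entry header
MAX_ENTRY_SIZE = 254


class ClusterBuffer:
    """Accumulate entries in memory and write only whole clusters."""

    def __init__(self, write_cluster):
        self.write_cluster = write_cluster  # callback taking 4096 bytes
        self._reset()

    def _reset(self):
        self.buf = bytearray([TYPE_NONE] * CLUSTER_SIZE)
        self.used = 0

    def append(self, entry):
        if len(entry) > MAX_ENTRY_SIZE:
            raise ValueError("entry too large")
        # Never let an entry spill over, and keep room for the end entry.
        if self.used + len(entry) > CLUSTER_SIZE - END_MIN_SIZE:
            self._close_cluster()
        self.buf[self.used:self.used + len(entry)] = entry
        self.used += len(entry)

    def _close_cluster(self):
        # Between 2 and 255 bytes remain here by construction; whether 255
        # is acceptable for an end entry is left open in this sketch.
        remaining = CLUSTER_SIZE - self.used
        self.buf[self.used] = remaining
        self.buf[self.used + 1] = TYPE_END
        self.write_cluster(bytes(self.buf))
        self._reset()


written = []
cb = ClusterBuffer(written.append)
entry = bytes([34, 2]) + bytes(32)  # 34-byte hash entry
for _ in range(121):
    cb.append(entry)
```

Here 120 entries fill 4080 bytes, so the 121st append closes the first cluster
with a 16-byte end entry and starts a new one.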

> 
>    Any ideas how existing journals handle this?
>
