From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
"Fam Zheng" <famz@redhat.com>, "Jeff Cody" <jcody@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Max Reitz" <mreitz@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [RFC] qcow2 journalling draft
Date: Thu, 5 Sep 2013 17:20:02 +0200
Message-ID: <20130905152002.GH2826@dhcp-200-207.str.redhat.com>
In-Reply-To: <CAJSP0QWdpLpnoBv0OVVNNQPTdfiRBDOMN0infUoFHyvUkNgKxA@mail.gmail.com>
Am 05.09.2013 um 16:55 hat Stefan Hajnoczi geschrieben:
> On Thu, Sep 5, 2013 at 1:18 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 05.09.2013 um 11:21 hat Stefan Hajnoczi geschrieben:
> >> On Wed, Sep 04, 2013 at 11:39:51AM +0200, Kevin Wolf wrote:
> >> > > > +A journal is organised in journal blocks, all of which have a reference count
> >> > > > +of exactly 1. It starts with a block containing the following journal header:
> >> > > > +
> >> > > > + Byte 0 - 7: Magic ("qjournal" ASCII string)
> >> > > > +
> >> > > > + 8 - 11: Journal size in bytes, including the header
> >> > > > +
> >> > > > + 12 - 15: Journal block size order (block size in bytes = 1 << order)
> >> > > > + The block size must be at least 512 bytes and must not
> >> > > > + exceed the cluster size.
> >> > > > +
> >> > > > + 16 - 19: Journal block index of the descriptor for the last
> >> > > > + transaction that has been synced, starting with 1 for the
> >> > > > + journal block after the header. 0 is used for empty
> >> > > > + journals.
> >> > > > +
> >> > > > + 20 - 23: Sequence number of the last transaction that has been
> >> > > > + synced. 0 is recommended as the initial value.
> >> > > > +
> >> > > > + 24 - 27: Sequence number of the last transaction that has been
> >> > > > + committed. When replaying a journal, all transactions
> >> > > > + after the last synced one up to the last commit one must be
> >> > > > + synced. Note that this may include a wraparound of sequence
> >> > > > + numbers.
> >> > > > +
> >> > > > + 28 - 31: Checksum (one's complement of the sum of all bytes in the
> >> > > > + header journal block except those of the checksum field)
> >> > > > +
> >> > > > + 32 - 511: Reserved (set to 0)
> >> > >
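For reference while reading the discussion, the quoted header layout can be sketched roughly as follows. This is only an illustration: big-endian byte order, a 32-bit wrapping byte sum for the checksum, and the helper names (build_header, parse_header, header_checksum) are assumptions, not something the draft spells out.

```python
import struct

# Assumed big-endian: magic, journal size, block size order,
# last synced descriptor index, last synced seq, last committed seq.
HEADER_FMT = ">8sIIIII"
CHECKSUM_OFFSET = 28  # bytes 28 - 31 in the quoted layout


def header_checksum(block: bytes) -> int:
    """One's complement of the byte sum, skipping the checksum field itself."""
    covered = block[:CHECKSUM_OFFSET] + block[CHECKSUM_OFFSET + 4:]
    return ~sum(covered) & 0xFFFFFFFF  # truncation to 32 bits is an assumption


def build_header(size, order, synced_idx, synced_seq, committed_seq) -> bytes:
    block_size = 1 << order
    body = struct.pack(HEADER_FMT, b"qjournal", size, order,
                       synced_idx, synced_seq, committed_seq)
    block = bytearray(body.ljust(block_size, b"\0"))  # reserved bytes stay 0
    struct.pack_into(">I", block, CHECKSUM_OFFSET, header_checksum(bytes(block)))
    return bytes(block)


def parse_header(block: bytes) -> dict:
    magic, size, order, synced_idx, synced_seq, committed_seq = \
        struct.unpack_from(HEADER_FMT, block)
    (stored,) = struct.unpack_from(">I", block, CHECKSUM_OFFSET)
    if magic != b"qjournal" or stored != header_checksum(block):
        raise ValueError("bad journal header")
    return {
        "size": size,
        "block_size": 1 << order,
        "synced_index": synced_idx,
        "synced_seq": synced_seq,
        "committed_seq": committed_seq,
    }
```

Every commit or sync that changes one of the three counters also has to rewrite the checksum, which is the update cost being discussed here.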
> >> > > I'm not sure if these fields are necessary. They require updates (and
> >> > > maybe flush) after every commit and sync.
> >> > >
> >> > > The fewer metadata updates, the better, not just for performance but
> >> > > also to reduce the risk of data loss. If any metadata required to
> >> > > access the journal is corrupted, the image will be unavailable.
> >> > >
> >> > > It should be possible to determine this information by scanning the
> >> > > journal transactions.
> >> >
> >> > This is rather handwavy. Can you elaborate how this would work in detail?
> >> >
> >> >
> >> > For example, let's assume we get to read this journal (a journal can be
> >> > rather large, too, so I'm not sure if we want to read it in completely):
> >> >
> >> > - Descriptor, seq 42, 2 data blocks
> >> > - Data block
> >> > - Data block
> >> > - Data block starting with "qjbk"
> >> > - Data block
> >> > - Descriptor, seq 7, 0 data blocks
> >> > - Descriptor, seq 8, 1 data block
> >> > - Data block
> >> >
> >> > Which of these have already been synced? Which have been committed?
> >
> > So what's your algorithm for this?
>
> Scan the journal to find unsynced transactions, if they exist:
>
> last_sync_seq = 0
> last_seqno = 0
> while True:
> block = journal[(i++) % journal_nblocks]
> if i >= journal_nblocks * 2:
> break # avoid infinite loop
> if block.magic != 'qjbk':
> continue
Important implication: This doesn't allow data blocks starting with
'qjbk'. Otherwise you're not even guaranteed to find a descriptor block
to start your search with.
The second time you make this assumption is when there are stale data
blocks in the unused area between the head and tail of the journal.
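To make that concrete, here is a small sketch of the example journal from above, scanned purely by magic. The block layout (magic, then a 4-byte big-endian sequence number and block count) and the helper names are made up for illustration and are not taken from the draft.

```python
MAGIC = b"qjbk"
BLOCK_SIZE = 512


def descriptor(seq: int, ndata: int) -> bytes:
    # Hypothetical layout: magic, 4-byte seq, 4-byte data block count.
    raw = MAGIC + seq.to_bytes(4, "big") + ndata.to_bytes(4, "big")
    return raw.ljust(BLOCK_SIZE, b"\0")


def data(payload: bytes) -> bytes:
    return payload.ljust(BLOCK_SIZE, b"\0")


journal = [
    descriptor(42, 2),
    data(b"guest data"),
    data(b"guest data"),
    data(MAGIC + b"\x00\x00\x00\x63"),  # stale data that happens to start with "qjbk"
    data(b"guest data"),
    descriptor(7, 0),
    descriptor(8, 1),
    data(b"guest data"),
]


def scan_by_magic(blocks):
    """Sequence numbers of every block that merely looks like a descriptor."""
    return [int.from_bytes(b[4:8], "big") for b in blocks if b[:4] == MAGIC]


found = scan_by_magic(journal)
print(found)  # [42, 99, 7, 8]
```

The scanner reports a transaction 99 that never existed, because nothing but the first four bytes distinguishes a descriptor from data.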
> if block.seqno < last_seqno:
> # Wrapped around to oldest transaction
> break
Why can you stop here? There might be transactions in the second half of
the journal that aren't synced yet.
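A sketch of the case I have in mind, reduced to the sequence numbers of the descriptors in on-disk order. The concrete numbers and the helper name are invented for illustration; only the break condition is taken from the quoted pseudocode.

```python
def visited_before_break(seqnos):
    """Follow the quoted loop's rule: stop as soon as seqno decreases."""
    last_seqno = 0
    seen = []
    for seq in seqnos:
        if seq < last_seqno:
            break  # "Wrapped around to oldest transaction"
        seen.append(seq)
        last_seqno = seq
    return seen


# The newest transactions (10, 11) have wrapped to the start of the journal;
# the older ones follow. Assume 7 is the last transaction that was synced.
ring = [10, 11, 6, 7, 8, 9]
last_synced = 7

seen = visited_before_break(ring)
missed_unsynced = [s for s in ring if s > last_synced and s not in seen]
print(seen)             # [10, 11]
print(missed_unsynced)  # [8, 9]
```

The scan stops at 6 and never looks at 8 and 9, although both still have to be replayed.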
> elif block.seqno == seqno:
> # Corrupt journal, sequence number should be
> # monotonically increasing
> raise InvalidJournalException
> if block.last_sync_seq != last_sync_seq:
> last_sync_seq = block.last_sync_seq
The 'if' doesn't add anything here, so you end up using the
last_sync_seq field of the last valid descriptor.
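In other words, the two loops below always return the same value. This is a trivial sketch with made-up function names, operating on the last_sync_seq values of the descriptors in scan order.

```python
def with_if(sync_seqs):
    last_sync_seq = 0
    for value in sync_seqs:
        if value != last_sync_seq:  # the check from the quoted pseudocode
            last_sync_seq = value
    return last_sync_seq


def without_if(sync_seqs):
    last_sync_seq = 0
    for value in sync_seqs:
        last_sync_seq = value  # unconditional assignment
    return last_sync_seq


samples = [[], [3], [3, 3, 5], [5, 3, 3], [0, 0, 7, 0]]
print(all(with_if(s) == without_if(s) for s in samples))  # True
```

Either way the result is whatever the last descriptor seen says, so the scan order decides the answer.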
> last_seqno = block.seqno
>
> print 'First unsynced block seq no:', last_sync_seq
> print 'Last block seq no:', last_seqno
>
> This is broken pseudocode, but hopefully the idea makes sense.
One additional thought that might make the thing a bit more interesting:
Sequence numbers can wrap around as well.
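A plain "<" comparison breaks down at that point. One common way to handle it is modular (serial number) comparison; the sketch below assumes 32-bit sequence numbers, as in the header fields quoted above, and the function names are hypothetical.

```python
SEQ_BITS = 32
SEQ_MOD = 1 << SEQ_BITS
HALF = 1 << (SEQ_BITS - 1)


def seq_after(a: int, b: int) -> bool:
    """True if a is logically newer than b, tolerating wraparound.

    Only meaningful while the two numbers are less than half the
    sequence space apart.
    """
    return a != b and ((a - b) % SEQ_MOD) < HALF


def to_replay(last_synced: int, last_committed: int):
    """Sequence numbers after last_synced, up to and including last_committed."""
    count = (last_committed - last_synced) % SEQ_MOD
    return [(last_synced + i) % SEQ_MOD for i in range(1, count + 1)]


print(0 > 0xFFFFFFFF)            # False: naive comparison gets it wrong
print(seq_after(0, 0xFFFFFFFF))  # True: 0 follows 0xFFFFFFFF after the wrap
print([hex(s) for s in to_replay(0xFFFFFFFE, 1)])  # ['0xffffffff', '0x0', '0x1']
```

Any scanning algorithm would need something along these lines instead of "block.seqno < last_seqno".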
Kevin