From: Jason Dillaman <dillaman@redhat.com>
To: Gregory Farnum <greg@gregs42.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: RBD journal draft design
Date: Thu, 4 Jun 2015 20:36:13 -0400 (EDT)
Message-ID: <810657134.11416115.1433464573115.JavaMail.zimbra@redhat.com>
In-Reply-To: <CAC6JEv8v9Wzx094E5pReJE47=d2wbP8rR=FJcU+BNJcT7H488w@mail.gmail.com>

> >> ...Actually, doesn't *not* forcing a coordinated move from one object
> >> set to another mean that you don't actually have an ordering guarantee
> >> across tags if you replay the journal objects in order?
> >
> > The ordering between tags was meant to be a soft ordering guarantee (since
> > any number of delays could throw off the actual order as delivered from
> > the OS).  In the case of a VM using multiple RBD images sharing the same
> > journal, this provides an ordering guarantee per device but not between
> > devices.
> >
> > This is no worse than the case of each RBD image using its own journal
> > instead of sharing a journal and the behavior doesn't seem too different
> > from a non-RBD case when submitting requests to two different physical
> > devices (e.g. a SSD device and a NAS device will commit data at different
> > latencies).
> 
> Yes, it's exactly the same. But I thought the point was that if you
> commingle the journals then you actually have the appropriate ordering
> across clients/disks (if there's enough ordering and synchronization)
> that you can stream the journal off-site and know that if there's any
> kind of disaster you are always at least crash-consistent. If there's
> arbitrary re-ordering of different volume writes at object boundaries
> then I don't see what benefit there is to having a commingled journal
> at all.
> 
> I think there's a thing called a "consistency group" in various
> storage platforms that is sort of similar to this, where you can take
> a snapshot of a related group of volumes at once. I presume the
> commingled journal is an attempt at basically having an ongoing
> snapshot of the whole consistency group.

It seems that even with a SAN-type consistency group, you could still have temporal ordering issues between volume writes unless the snapshot is synchronized with the client OSes, so that all volumes are flushed at a consistent point before the snapshot is taken.

I suppose you could provide much tighter inter-volume ordering guarantees under QEMU if the RBD block device were modified so that the individual RBD image instances had a mechanism for coordinating the allocation of sequence numbers among themselves.  Right now, each image is opened in its own context with no knowledge of the others and no way to coordinate.  The currently proposed tag + sequence number approach could be used to provide the soft inter-volume ordering guarantees until QEMU / librbd can be modified to support volume groupings.
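
To make the difference concrete, here is a rough sketch (not librbd code; every class and function name below is hypothetical) contrasting the two schemes: a sequence counter per tag, as in the current proposal, versus a single allocator shared by all images in a group.  It assumes one tag per image and a replay that simply sorts entries by sequence number.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tag: str      # identifies the image that issued the write (assumed: one tag per image)
    seq: int      # sequence number assigned when the entry was recorded
    payload: str


class PerTagSequencer:
    """One counter per tag: ordering is defined only within a tag."""

    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()

    def next(self, tag):
        with self._lock:
            seq = self._counters.get(tag, 0)
            self._counters[tag] = seq + 1
            return seq


class SharedSequencer:
    """One counter shared by every image in a group: a total order."""

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def next(self, tag):
        # The tag is ignored; all images draw from the same counter.
        with self._lock:
            seq = self._next
            self._next += 1
            return seq


def record(sequencer, writes):
    """Assign a sequence number to each (tag, payload) write, in issue order."""
    return [Entry(tag, sequencer.next(tag), payload) for tag, payload in writes]


def replay_order(entries):
    """Sort by (seq, tag).  With per-tag counters, ties across tags are
    broken by tag name, which says nothing about the real issue order."""
    return sorted(entries, key=lambda e: (e.seq, e.tag))


if __name__ == "__main__":
    writes = [("img-a", "w1"), ("img-b", "w2"), ("img-b", "w3"), ("img-a", "w4")]
    for sequencer in (PerTagSequencer(), SharedSequencer()):
        ordered = replay_order(record(sequencer, writes))
        print(type(sequencer).__name__, [e.payload for e in ordered])
```

With per-tag counters, each image's writes replay in their own order, but w4 can replay ahead of w3 even though it was issued later.  With the shared allocator the replay matches the issue order across both images, which is the coordination that QEMU / librbd has no way to do today.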
