From: Jason Dillaman <dillaman@redhat.com>
To: Gregory Farnum <greg@gregs42.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: RBD journal draft design
Date: Thu, 4 Jun 2015 11:08:08 -0400 (EDT)
Message-ID: <1628237419.11058538.1433430488520.JavaMail.zimbra@redhat.com>
In-Reply-To: <CAC6JEv_M5jKk+FSDZja15Rmyf3W2x5-yeg9LRukKCHLnYq-j+w@mail.gmail.com>

> >> >A successful append will indicate whether or not the journal is now full
> >> >(larger than the max object size), indicating to the client that a new
> >> >journal object should be used.  If the journal is too large, an error code
> >> >response would alert the client that it needs to write to the current
> >> >active journal object.  In practice, the only time the journaler should
> >> >expect to see such a response would be in the case where multiple clients
> >> >are using the same journal and the active object update notification has
> >> >yet to be received.
> >>
> >> I'm confused. How does this work with the splay count thing you
> >> mentioned above? Can you define <splay count>?
> >
> > Similar to the stripe width.
> 
> Okay, that sort of makes sense but I don't see how you could legally
> be writing to different "sets" so why not just make it an explicit
> striping thing and move all journal entries for that "set" at once?
> 
> ...Actually, doesn't *not* forcing a coordinated move from one object
> set to another mean that you don't actually have an ordering guarantee 
> across tags if you replay the journal objects in order?

The ordering between tags was meant to be a soft guarantee, since any number of delays could change the order in which requests are actually delivered from the OS.  In the case of a VM using multiple RBD images that share the same journal, this provides an ordering guarantee per device but not between devices.

This is no worse than the case where each RBD image uses its own journal instead of sharing one, and the behavior doesn't seem too different from the non-RBD case of submitting requests to two different physical devices (e.g. an SSD device and a NAS device will commit data at different latencies). Without the forced coordinated move, the potential gap in request ordering between two devices would increase by the round-trip time of the notify message, but it avoids the need to potentially resend journal entries to a new journal object.
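
To make the per-device versus cross-device distinction concrete, here is a rough sketch. It is not the draft's actual layout: the (tag, seq) entry shape, the two-object split, and the helper names are all assumptions made up for illustration. It only shows that replaying journal objects one after another can preserve the order within each tag while losing the order in which entries from different tags were submitted.

```python
from collections import defaultdict

def replay_in_object_order(objects):
    """Replay each journal object front to back, one object after another."""
    return [entry for obj in objects for entry in obj]

def per_tag_order(entries):
    """Group sequence numbers by tag, keeping the order they were seen in."""
    by_tag = defaultdict(list)
    for tag, seq in entries:
        by_tag[tag].append(seq)
    return dict(by_tag)

# Hypothetical submission order from a VM with two devices (tags "a" and "b").
submitted = [("a", 0), ("b", 0), ("a", 1), ("b", 1)]

# Assumption for illustration only: each tag's entries land in their own object.
objects = [[], []]
for tag, seq in submitted:
    objects[0 if tag == "a" else 1].append((tag, seq))

replayed = replay_in_object_order(objects)

# Per-tag order survives the replay ...
assert per_tag_order(replayed) == per_tag_order(submitted)
# ... but the interleaving between tags does not.
assert replayed != submitted
```

The same outcome would be seen if each device had its own journal, which is the comparison made above.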
