From: Joao Eduardo Luis <joao@redhat.com>
To: Sage Weil <sweil@redhat.com>, gfarnum@redhat.com
Cc: ceph-devel@vger.kernel.org
Subject: Re: full osdmaps in mon txns
Date: Thu, 01 Jan 2015 15:50:09 +0000
Message-ID: <54A56CB1.4010207@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1412231306170.28630@cobra.newdream.net>

On 12/23/2014 09:10 PM, Sage Weil wrote:
> This fun issue came up again in the form of 10422:
>
> 	http://tracker.ceph.com/issues/10422
>
> I think we have 3 main options:
>
> 1. Ask users to do a mon scrub prior to upgrade to
> ensure it is safe.  If a mon is out of sync, manually kick it out, blow it
> away, and resync.
>
> 2. Do a one-time broadcast of the full osdmap across mons to ensure they
> are consistent after upgrade.  Bleh.
>
> 3. Include full encoded OSDMap in txns on updates going forward.
>
> I like 3 because it solves this and all related problems going forward.
> The local encoding of full osdmaps has proven to be a huge headache.
> And, the patch to do it is remarkably simple
>
> 	https://github.com/ceph/ceph/pull/3247/files
>
> and dovetails well with the new CRC.

I prefer 3 as well.  Below is my reply on the pull request, which I 
wrote before addressing this email, and I shall leave it here for posterity!

(Also, I think the approach in the pull request is correct.)

As far as I can tell, the whole idea of relying solely on incrementals 
to build full osdmaps locally goes as far back as a5e2dcb. This leads me 
to believe that, while the idea may have seemed good at the time, it may 
not have been motivated by a real issue.
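
To illustrate why locally built full maps are fragile, here is a toy 
Python sketch (not Ceph code). ToyOSDMap, build_full_locally(), 
leader_make_txn() and peon_apply_txn() are hypothetical names; JSON 
stands in for the real binary encoding and zlib.crc32 stands in for the 
new OSDMap CRC. It assumes two mons on different releases encode the same 
logical map into different bytes; under option 3 every mon simply stores 
the leader's bytes, so they cannot diverge.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyOSDMap:
    """Hypothetical stand-in for an OSDMap: an epoch plus per-OSD state."""
    epoch: int = 0
    osds: dict = field(default_factory=dict)

    def apply_incremental(self, inc: dict) -> None:
        # An incremental must advance the map by exactly one epoch.
        if inc["epoch"] != self.epoch + 1:
            raise ValueError("incremental does not follow current epoch")
        self.epoch = inc["epoch"]
        self.osds.update(inc["changes"])

    def encode(self, encoder_version: int) -> bytes:
        # Toy encoder: newer versions emit an extra field, mimicking mons
        # on different releases producing different bytes for one map.
        payload = {"epoch": self.epoch, "osds": self.osds}
        if encoder_version >= 2:
            payload["features"] = 0
        return json.dumps(payload, sort_keys=True).encode()


def build_full_locally(base: ToyOSDMap, inc: dict, encoder_version: int) -> bytes:
    """Old approach: each mon applies the incremental and encodes on its own."""
    m = ToyOSDMap(base.epoch, dict(base.osds))  # copy; leave base untouched
    m.apply_incremental(inc)
    return m.encode(encoder_version)


def leader_make_txn(base: ToyOSDMap, inc: dict, encoder_version: int) -> dict:
    """Option 3: the leader encodes the full map once and ships the bytes."""
    full = build_full_locally(base, inc, encoder_version)
    return {"inc": inc, "full": full, "full_crc": zlib.crc32(full)}


def peon_apply_txn(store: dict, txn: dict) -> None:
    """Every mon stores the leader's bytes verbatim after checking the CRC."""
    if zlib.crc32(txn["full"]) != txn["full_crc"]:
        raise ValueError("full map CRC mismatch")
    store[txn["inc"]["epoch"]] = txn["full"]


base = ToyOSDMap(epoch=10, osds={0: "up", 1: "up"})
inc = {"epoch": 11, "changes": {1: "down"}}

# Mons on different releases diverge when each encodes locally.
print(build_full_locally(base, inc, 1) == build_full_locally(base, inc, 2))

# With the full map carried in the txn, every store is byte-identical.
txn = leader_make_txn(base, inc, encoder_version=2)
store_a, store_b = {}, {}
peon_apply_txn(store_a, txn)
peon_apply_txn(store_b, txn)
print(store_a == store_b)
```

The cost of this design is that each txn now carries the full map as well 
as the incremental, which is the size concern discussed next.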

Anyway, relaying a few MBs' worth of osdmap (if it ever gets that big) 
over the wire doesn't particularly concern me; the one thing that may be 
annoying is writing them to leveldb.

I fear that writing a large enough map to leveldb may cause a hang. 
While we now have the async mechanism to handle this, we may still end 
up waiting for a big transaction to be applied to leveldb before 
accepting the value (e.g., in Paxos::handle_begin() we wait for the 
value to be applied to the store before sending out 
MMonPaxos::OP_ACCEPT). Then again, this should be easily surmountable 
by adjusting timeouts if we ever hit it.
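
To make the timeout concern concrete, a back-of-the-envelope Python 
model. Everything here is hypothetical: StoreModel, accept_delay(), 
accept_within_timeout() and all throughput, latency and timeout numbers 
are illustrative assumptions, not measured leveldb figures or real Ceph 
config options. The only behaviour taken from above is that the peon 
applies the txn to its store before sending OP_ACCEPT, so the leader 
waits for network transfer plus store apply time.

```python
from dataclasses import dataclass


@dataclass
class StoreModel:
    """Hypothetical write-cost model for the monitor's key/value store."""
    fixed_latency_s: float  # per-transaction overhead, e.g. an fsync
    throughput_bps: float   # sustained write throughput in bytes/second

    def apply_time(self, txn_bytes: int) -> float:
        return self.fixed_latency_s + txn_bytes / self.throughput_bps


def accept_delay(store: StoreModel, txn_bytes: int,
                 network_bps: float, rtt_s: float) -> float:
    """Seconds from the leader sending BEGIN until it sees the peon's ACCEPT,
    assuming the peon applies the txn to its store before replying."""
    transfer_s = txn_bytes / network_bps
    return rtt_s + transfer_s + store.apply_time(txn_bytes)


def accept_within_timeout(store: StoreModel, txn_bytes: int,
                          network_bps: float, rtt_s: float,
                          timeout_s: float) -> bool:
    return accept_delay(store, txn_bytes, network_bps, rtt_s) <= timeout_s


# Illustrative numbers only: a 4 MB txn (incremental plus full map)
# sent over a ~1 Gbit/s link (125e6 bytes/s) with a 1 ms round trip.
TXN_BYTES = 4_000_000
healthy = StoreModel(fixed_latency_s=0.01, throughput_bps=50e6)
degraded = StoreModel(fixed_latency_s=0.01, throughput_bps=1e6)

for name, store in (("healthy", healthy), ("degraded", degraded)):
    delay = accept_delay(store, TXN_BYTES, network_bps=125e6, rtt_s=0.001)
    ok = accept_within_timeout(store, TXN_BYTES, 125e6, 0.001, timeout_s=2.0)
    print(f"{name}: ACCEPT after ~{delay:.2f}s, within 2s timeout: {ok}")
```

In this model the wire transfer is small next to the store write on a 
slow disk, matching the point above that leveldb, not the network, is 
the likely bottleneck. Raising the (hypothetical) timeout covers the 
degraded case, at the price of noticing a genuinely stuck peon later.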


   -Joao

>
> What do you think?
> sage
>


-- 
Joao Eduardo Luis
Software Engineer | http://ceph.com


Thread overview: 4+ messages
2014-12-23 21:10 full osdmaps in mon txns Sage Weil
2015-01-01 15:50 ` Joao Eduardo Luis [this message]
2015-01-05  9:12   ` Dan van der Ster
2015-01-06  8:49     ` Dan van der Ster
