Re: Bounding OSD memory requirements during peering/recovery


From: David McBride <dwm37@cam.ac.uk>
To: Gregory Farnum <greg@gregs42.com>
Cc: Ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Bounding OSD memory requirements during peering/recovery
Date: Mon, 09 Feb 2015 21:36:16 +0000
Message-ID: <54D92850.5080409@cam.ac.uk>
In-Reply-To: <CAC6JEv8NYw2qk9O7pcSmrVwd2p=7mfLDrA+1tBmFxf2-_f-tZw@mail.gmail.com>

On 09/02/15 15:31, Gregory Farnum wrote:

> So, memory usage of an OSD is usually linear in the number of PGs it
> hosts. However, that memory can also grow based on at least one other
> thing: the number of OSD Maps required to go through peering. It
> *looks* to me like this is what you're running in to, not growth on
> the number of state machines. In particular, those past_intervals you
> mentioned. ;)
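Greg's description can be written as a rough linear model: a per-PG term plus an extra term that grows with the number of OSD maps that must be processed during peering. The sketch below is purely illustrative. The function name and every coefficient are hypothetical placeholders, not measured Ceph figures. Greg doesn't say whether the map-related cost is per OSD or per PG, so the sketch simply treats it as an additive term.

```python
def estimate_osd_memory_mib(num_pgs: int, maps_to_peer: int,
                            base_mib: float = 500.0,
                            per_pg_mib: float = 2.0,
                            per_map_mib: float = 1.0) -> float:
    """Rough, hypothetical estimate of one OSD's memory footprint in MiB.

    - base_mib:    fixed overhead of the daemon (placeholder value)
    - per_pg_mib:  cost per hosted PG, the "usually linear" part (placeholder)
    - per_map_mib: extra cost per OSD map to work through while peering
                   (placeholder; the real relationship may be more complex)
    """
    return base_mib + per_pg_mib * num_pgs + per_map_mib * maps_to_peer


# Same PG count, but a large map backlog dominates the estimate.
print(estimate_osd_memory_mib(num_pgs=100, maps_to_peer=0))     # 700.0
print(estimate_osd_memory_mib(num_pgs=100, maps_to_peer=5000))  # 5700.0
```

The point of the model is only that a long-neglected OSD can need far more memory than its PG count alone would suggest.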

Hi Greg,

Right, that sounds entirely plausible, and is very helpful.

In practice, that means I'll need to be careful to avoid this situation 
in production. Given that it's unlikely to occur except through 
non-trivial neglect, though, I don't think I need to be particularly 
concerned.

(Happily, I'm in the situation that my existing cluster is purely for 
testing purposes; the data is expendable.)

That said, for my own peace of mind, it would be valuable to have a 
procedure that can be used to recover from this state, even if it's 
unlikely to occur in practice.

I'm currently running an experiment in which I augment the RAM of each 
OSD node with a 10GB swapfile on each of its spinning OSD disks, so that 
there's a large-enough backing store to complete log reconstruction.

(You obviously wouldn't want to operate like this in normal production: 
the loss of a single drive would cause a hard machine crash, and 
performance will be fairly diabolical, particularly if you allow client 
workloads to carry on in the background.)
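The sizing behind the swapfile experiment is simple arithmetic: per node, usable memory is RAM plus one swapfile per OSD disk. A minimal sketch, assuming one ceph-osd daemon per disk and a hypothetical per-OSD peak footprint; the function names and example figures are illustrative, not from this cluster. (Linux allocates swap round-robin across areas of equal priority, which is also why losing any one disk's swapfile can take the whole machine down.)

```python
def node_backing_store_gib(ram_gib: float, osd_disks: int,
                           swapfile_gib: float = 10.0) -> float:
    """RAM plus one swapfile on each spinning OSD disk, in GiB."""
    return ram_gib + osd_disks * swapfile_gib


def fits(ram_gib: float, osd_disks: int, per_osd_peak_gib: float,
         swapfile_gib: float = 10.0) -> bool:
    """Would all OSDs on the node fit at their (hypothetical) peak?

    Assumes one ceph-osd daemon per disk, each peaking at per_osd_peak_gib.
    """
    need = osd_disks * per_osd_peak_gib
    return need <= node_backing_store_gib(ram_gib, osd_disks, swapfile_gib)


# Illustrative node: 32 GiB RAM, 12 spinning OSD disks.
print(node_backing_store_gib(32, 12))  # 152.0
print(fits(32, 12, 12))                # True  (144 GiB needed)
print(fits(32, 12, 13))                # False (156 GiB needed)
```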

I did try enabling zswap with the Utopic (3.16) LTS-enablement kernel 
offered as an option in Ubuntu 14.04; however, the kernel was not stable 
in that configuration, and several machines crashed under memory pressure.

I do have OSDs committing suicide periodically, probably because they 
become insufficiently responsive to heartbeats once they start to hit 
swap. I haven't yet experimented with the various OSD timeout tuning 
options, so some improvement may be possible.
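One plausible reading of these suicides, offered as an assumption rather than a diagnosis: a worker thread that is stalled on swap stops reporting progress, and a watchdog aborts the daemon once the stall exceeds a hard limit. The toy model below only illustrates that mechanism. `WorkerHeartbeat`, `grace_s` and `suicide_s` are hypothetical names, not Ceph internals or option names (Ceph's real knobs include settings such as `osd heartbeat grace`). In this model, raising the limits just widens the window before a slow thread is declared dead.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkerHeartbeat:
    """Hypothetical, simplified model of a per-thread suicide timeout."""
    grace_s: float    # stall longer than this: report the thread as unhealthy
    suicide_s: float  # stall longer than this: the daemon aborts itself
    last_progress: float = field(default_factory=time.monotonic)

    def touch(self, now: Optional[float] = None) -> None:
        """Record that the worker made progress."""
        self.last_progress = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> str:
        """Classify the current stall as 'ok', 'unhealthy' or 'suicide'."""
        now = time.monotonic() if now is None else now
        stalled = now - self.last_progress
        if stalled > self.suicide_s:
            return "suicide"
        if stalled > self.grace_s:
            return "unhealthy"
        return "ok"


hb = WorkerHeartbeat(grace_s=15, suicide_s=150, last_progress=0.0)
print(hb.check(now=10))   # ok
print(hb.check(now=20))   # unhealthy: a swapping thread is slow to respond
print(hb.check(now=160))  # suicide
```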

In the meantime, I've configured the ceph-osd Upstart jobs to run 
`sleep 3600` in a post-stop stanza, to reduce the rate at which they're 
respawned.
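The delay means a crash-looping OSD is restarted at most about once an hour rather than immediately. A minimal Python sketch of that throttled-respawn behaviour, assuming a generic command; it is not Upstart's mechanism, and `supervise`, `respawn_delay_s` and `max_runs` are hypothetical names:

```python
import subprocess
import time
from typing import List, Optional, Sequence


def supervise(cmd: Sequence[str], respawn_delay_s: float = 3600.0,
              max_runs: Optional[int] = None) -> List[int]:
    """Run cmd repeatedly, sleeping respawn_delay_s after each exit.

    Hypothetical stand-in for an init system's respawn-with-delay
    behaviour. With max_runs=None it loops forever; otherwise it stops
    after max_runs runs and returns the exit codes it observed.
    """
    exit_codes: List[int] = []
    while max_runs is None or len(exit_codes) < max_runs:
        result = subprocess.run(list(cmd))
        exit_codes.append(result.returncode)
        if max_runs is not None and len(exit_codes) >= max_runs:
            break  # don't sleep after the final run
        time.sleep(respawn_delay_s)  # throttle the respawn rate
    return exit_codes
```

With a 3600 s delay, a daemon that dies immediately every time is started at most about 24 times a day. That leaves the surviving OSDs time to make progress between restarts.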

So far, the resulting configuration seems to be making progress, albeit 
moderately slowly.

Cheers,
David
-- 
David McBride <dwm37@cam.ac.uk>
Unix Specialist, University Information Services
