Re: ceph stability - Mark Nelson

From: Mark Nelson <mark.nelson@inktank.com>
To: Roman Hlynovskiy <roman.hlynovskiy@gmail.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: ceph stability
Date: Wed, 19 Dec 2012 07:26:54 -0600
Message-ID: <50D1C09E.9000504@inktank.com>
In-Reply-To: <CAD5ewrosrMWD3jxznaMhab=VTL4VzQ7savVk42KDe374VSSzbA@mail.gmail.com>

On 12/19/2012 03:03 AM, Roman Hlynovskiy wrote:
> Hello,
>
> I have 2 issues with ceph stability and looking for help to resolve them.
> My setup is pretty simple - 3 debian 32bit stable systems each running
> osd, mon and mds.
> the conf is the following:
> --------------------
> [global]
>      auth cluster required = none
>      auth service required = none
>      auth client required = none
>
> [osd]
>      osd journal size = 1000
>      filestore xattr use omap = true
>
> [mon.a]
>      host = ceph-node01
>      mon addr = 192.168.7.11:6789
>
> [mon.b]
>      host = ceph-node02
>      mon addr = 192.168.7.12:6789
>
> [mon.c]
>          host = ceph-node03
>          mon addr = 192.168.7.13:6789
>
> [mds.a]
>          host = ceph-node01
>
> [mds.b]
>          host = ceph-node02
>
> [mds.c]
>          host = ceph-node03

A quick side note: multi-mds setups aren't supported in production
right now.  Not sure if your stat problems below are related, but you
may want to try starting out with a single mds and see if the problem
goes away.  If it does, there may be some hints in the mds logs
regarding what's going on.  Bug reports are welcome!
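
For reference, a minimal sketch of what the mds part of the quoted conf
might look like with a single mds.  This assumes "single mds" means
defining only one mds section and leaving the [global], [osd], [mon.*]
and [osd.*] sections exactly as quoted; the mds.b and mds.c daemons
would also need to be stopped on their hosts.  Note that the quoted
"ceph -s" output already shows one active mds with two standbys, so the
only change here is dropping the standbys.

```ini
; Sketch only: mds sections of ceph.conf reduced to a single daemon.
; All other sections stay as in the quoted conf.

[mds.a]
    host = ceph-node01

; [mds.b] (ceph-node02) and [mds.c] (ceph-node03) are left out on
; purpose, so that only mds.a is configured.
```

If the stat hangs stop with this layout, that points at the standby mds
daemons as a factor, and the mds.a log is the place to look next.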

>
> [osd.0]
>      host = ceph-node01
>
> [osd.1]
>      host = ceph-node02
>
> [osd.2]
>      host = ceph-node03
> --------------------
> ceph -s is:
>     health HEALTH_OK
>     monmap e4: 3 mons at
> {a=192.168.7.11:6789/0,b=192.168.7.12:6789/0,c=192.168.7.13:6789/0},
> election epoch 118, quorum 0,1,2 a,b,c
>     osdmap e197: 3 osds: 3 up, 3 in
>      pgmap v43305: 384 pgs: 384 active+clean; 72351 MB data, 144 GB
> used, 105 GB / 249 GB avail
>     mdsmap e4439: 1/1/1 up {0=a=up:active}, 2 up:standby
> --------------------
>
> My first problem - I am getting spurious mon's deaths, which usually
> looks like this:
>
> --- begin dump of recent events ---
>       0> 2012-12-19 10:35:58.912119 b41eab70 -1 *** Caught signal (Aborted) **
>   in thread b41eab70
>
>   ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
>   1: /usr/bin/ceph-mon() [0x8183a11]
>   2: [0xb7714400]
>   3: (gsignal()+0x47) [0xb7337577]
>   4: (abort()+0x182) [0xb733a962]
>   5: (__gnu_cxx::__verbose_terminate_handler()+0x14f) [0xb755653f]
>   6: (()+0xbd405) [0xb7554405]
>   7: (()+0xbd442) [0xb7554442]
>   8: (()+0xbd581) [0xb7554581]
>   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x80f) [0x824cabf]
>   10: /usr/bin/ceph-mon() [0x80e3c1d]
>   11: (MDSMonitor::tick()+0x1e3b) [0x811ea0b]
>   12: (MDSMonitor::on_active()+0x1d) [0x81188dd]
>   13: (PaxosService::_active()+0x212) [0x80e4b02]
>   14: (Context::complete(int)+0x19) [0x80c4cf9]
>   15: (finish_contexts(CephContext*, std::list<Context*,
> std::allocator<Context*> >&, int)+0x13f) [0x80d208f]
>   16: (Monitor::recovered_leader(int)+0x3ac) [0x80ac5ac]
>   17: (Paxos::handle_last(MMonPaxos*)+0xb02) [0x80e0572]
>   18: (Paxos::dispatch(PaxosServiceMessage*)+0x2c4) [0x80e0e94]
>   19: (Monitor::_ms_dispatch(Message*)+0x1181) [0x80c3b11]
>   20: (Monitor::ms_dispatch(Message*)+0x31) [0x80d5021]
>   21: (DispatchQueue::entry()+0x337) [0x82afa47]
>   22: (DispatchQueue::DispatchThread::entry()+0x20) [0x823eec0]
>   23: (Thread::_entry_func(void*)+0x11) [0x824be41]
>   24: (()+0x57b0) [0xb75ef7b0]
>   25: (clone()+0x5e) [0xb73d8cde]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>     0/ 5 none
>     0/ 1 lockdep
>     0/ 1 context
>     1/ 1 crush
>     1/ 5 mds
>     1/ 5 mds_balancer
>     1/ 5 mds_locker
>     1/ 5 mds_log
>     1/ 5 mds_log_expire
>     1/ 5 mds_migrator
>     0/ 1 buffer
>     0/ 1 timer
>     0/ 1 filer
>     0/ 1 striper
>     0/ 1 objecter
>     0/ 5 rados
>     0/ 5 rbd
>     0/ 5 journaler
>     0/ 5 objectcacher
>     0/ 5 client
>     0/ 5 osd
>     0/ 5 optracker
>     0/ 5 objclass
>     1/ 3 filestore
>     1/ 3 journal
>     0/ 5 ms
>     1/ 5 mon
>     0/10 monc
>     0/ 5 paxos
>     0/ 5 tp
>     1/ 5 auth
>     1/ 5 crypto
>     1/ 1 finisher
>     1/ 5 heartbeatmap
>     1/ 5 perfcounter
>     1/ 5 rgw
>     1/ 5 hadoop
>     1/ 5 javaclient
>     1/ 5 asok
>     1/ 1 throttle
>    -2/-2 (syslog threshold)
>    -1/-1 (stderr threshold)
>    max_recent    100000
>    max_new         1000
>    log_file /var/log/ceph/ceph-mon.a.log
> --- end dump of recent events ---
>
> the binaries are coming from ceph.com debian-testing repo.
>
> My second problem - I have 2 systems which mount ceph. Whenever I
> mount ceph on any other system it usually mounts but get stuck on
> stat* operations (i.e. simple ls -al will hang with read( from the
> ceph-mounted directory for ages). This kind of client stuck also
> affects two working clients. they also start to stuck on the stat*
> even after shutdown of the third client. so usually umount/mount or
> even reboot for existing clients solves the issue)
>
>
> --
> ...WBR, Roman Hlynovskiy
