From: Mark Nelson <mark.nelson@inktank.com>
To: Roman Hlynovskiy <roman.hlynovskiy@gmail.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: ceph stability
Date: Wed, 19 Dec 2012 07:26:54 -0600
Message-ID: <50D1C09E.9000504@inktank.com>
In-Reply-To: <CAD5ewrosrMWD3jxznaMhab=VTL4VzQ7savVk42KDe374VSSzbA@mail.gmail.com>
On 12/19/2012 03:03 AM, Roman Hlynovskiy wrote:
> Hello,
>
> I have 2 issues with ceph stability and am looking for help to resolve
> them. My setup is pretty simple - 3 Debian 32-bit stable systems, each
> running osd, mon and mds.
> The conf is the following:
> --------------------
> [global]
> auth cluster required = none
> auth service required = none
> auth client required = none
>
> [osd]
> osd journal size = 1000
> filestore xattr use omap = true
>
> [mon.a]
> host = ceph-node01
> mon addr = 192.168.7.11:6789
>
> [mon.b]
> host = ceph-node02
> mon addr = 192.168.7.12:6789
>
> [mon.c]
> host = ceph-node03
> mon addr = 192.168.7.13:6789
>
> [mds.a]
> host = ceph-node01
>
> [mds.b]
> host = ceph-node02
>
> [mds.c]
> host = ceph-node03
A quick side note: multi-mds setups aren't supported in production
right now. I'm not sure whether your stat problems below are related,
but you may want to try starting out with a single mds and see if the
problem goes away. If so, there may be some hints in the mds logs
regarding what's going on. Bug reports are welcome!
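
For illustration, here is a minimal sketch of what the mds part of the
conf might look like with a single daemon. It assumes you keep mds.a on
ceph-node01; which one you keep is an arbitrary choice on my part, and
the other two mds daemons would also need to be stopped for this to
take effect. The rest of the conf stays as you have it.

```ini
; Sketch only: a single mds instead of three.
; Keeping mds.a (rather than b or c) is an assumption for this example.
[mds.a]
        host = ceph-node01

; [mds.b] and [mds.c] sections removed for the test
```

If the stat hangs stop with this layout, that would point toward the
mds side, and the mds log from the failing run is the place to look.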
>
> [osd.0]
> host = ceph-node01
>
> [osd.1]
> host = ceph-node02
>
> [osd.2]
> host = ceph-node03
> --------------------
> ceph -s is:
> health HEALTH_OK
> monmap e4: 3 mons at
> {a=192.168.7.11:6789/0,b=192.168.7.12:6789/0,c=192.168.7.13:6789/0},
> election epoch 118, quorum 0,1,2 a,b,c
> osdmap e197: 3 osds: 3 up, 3 in
> pgmap v43305: 384 pgs: 384 active+clean; 72351 MB data, 144 GB
> used, 105 GB / 249 GB avail
> mdsmap e4439: 1/1/1 up {0=a=up:active}, 2 up:standby
> --------------------
>
> My first problem - I am getting spurious mon deaths, which usually
> look like this:
>
> --- begin dump of recent events ---
> 0> 2012-12-19 10:35:58.912119 b41eab70 -1 *** Caught signal (Aborted) **
> in thread b41eab70
>
> ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
> 1: /usr/bin/ceph-mon() [0x8183a11]
> 2: [0xb7714400]
> 3: (gsignal()+0x47) [0xb7337577]
> 4: (abort()+0x182) [0xb733a962]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x14f) [0xb755653f]
> 6: (()+0xbd405) [0xb7554405]
> 7: (()+0xbd442) [0xb7554442]
> 8: (()+0xbd581) [0xb7554581]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x80f) [0x824cabf]
> 10: /usr/bin/ceph-mon() [0x80e3c1d]
> 11: (MDSMonitor::tick()+0x1e3b) [0x811ea0b]
> 12: (MDSMonitor::on_active()+0x1d) [0x81188dd]
> 13: (PaxosService::_active()+0x212) [0x80e4b02]
> 14: (Context::complete(int)+0x19) [0x80c4cf9]
> 15: (finish_contexts(CephContext*, std::list<Context*,
> std::allocator<Context*> >&, int)+0x13f) [0x80d208f]
> 16: (Monitor::recovered_leader(int)+0x3ac) [0x80ac5ac]
> 17: (Paxos::handle_last(MMonPaxos*)+0xb02) [0x80e0572]
> 18: (Paxos::dispatch(PaxosServiceMessage*)+0x2c4) [0x80e0e94]
> 19: (Monitor::_ms_dispatch(Message*)+0x1181) [0x80c3b11]
> 20: (Monitor::ms_dispatch(Message*)+0x31) [0x80d5021]
> 21: (DispatchQueue::entry()+0x337) [0x82afa47]
> 22: (DispatchQueue::DispatchThread::entry()+0x20) [0x823eec0]
> 23: (Thread::_entry_func(void*)+0x11) [0x824be41]
> 24: (()+0x57b0) [0xb75ef7b0]
> 25: (clone()+0x5e) [0xb73d8cde]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 0/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 100000
> max_new 1000
> log_file /var/log/ceph/ceph-mon.a.log
> --- end dump of recent events ---
>
> the binaries are coming from ceph.com debian-testing repo.
>
> My second problem - I have 2 systems which mount ceph. Whenever I
> mount ceph on any other system, it usually mounts but gets stuck on
> stat* operations (i.e. a simple ls -al will hang with read() from the
> ceph-mounted directory for ages). This kind of client hang also
> affects the two working clients: they also start to get stuck on
> stat*, even after the third client is shut down, so usually only
> umount/mount or even a reboot of the existing clients solves the issue.
>
>
> --
> ...WBR, Roman Hlynovskiy