From: Wido den Hollander <wido@widodh.nl>
To: "Jens Kristian Søgaard" <jens@mermaidconsulting.dk>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Hit suicide timeout after adding new osd
Date: Thu, 17 Jan 2013 15:47:27 +0100
Message-ID: <50F80EFF.7020803@widodh.nl>
In-Reply-To: <50F80C3A.9020007@mermaidconsulting.dk>
Hi,
On 01/17/2013 03:35 PM, Jens Kristian Søgaard wrote:
> Hi guys,
>
> I had a functioning Ceph system that reported HEALTH_OK. It was running
> with 3 osds on 3 servers.
>
> Then I added an extra osd on 1 of the servers using the commands from
> the documentation here:
>
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
>
> Shortly after I did that 2 of the existing osds crashed.
>
> I restarted them and after some hours they were up and running again,
> but soon one of them crashed again - and a third existing osd crashed as
> well. I restarted those two and waited some hours for them to come up. A
> short while later one of them crashed again.
>
> I then restarted that last one and watched the logs
> closely. It seems the same pattern repeats itself every time. It starts
> up doing its normal maintenance before going "up" (takes a long while).
> Then it seems to be running, but logs the following every 5 seconds:
>
> heartbeat_map is_healthy 'OSD::op_tp thread 0x7f051b7f6700' had timed
> out after 30
>
> After some time it logs:
>
> ===================================================
> heartbeat_map is_healthy 'OSD::op_tp thread 0x7f051b7f6700' had suicide
> timed out after 300
>
> 2013-01-17 15:24:35.051524 7f053f149700 -1 common/HeartbeatMap.cc: In
> function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
> const char*, time_t)' thread 7f053f149700 time 2013-01-17 15:24:33.849654
> common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
>
> ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x2eb) [0x8462bb]
> 2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x846a9e]
> 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x846cc8]
> 4: (CephContextServiceThread::entry()+0x55) [0x8e01c5]
> 5: /lib64/libpthread.so.0() [0x360de07d14]
> 6: (clone()+0x6d) [0x360d6f167d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> 2013-01-17 15:24:35.301183 7f053f149700 -1 *** Caught signal (Aborted) **
> in thread 7f053f149700
>
> ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> 1: /usr/bin/ceph-osd() [0x82ea90]
> 2: /lib64/libpthread.so.0() [0x360de0efe0]
> 3: (gsignal()+0x35) [0x360d635925]
> 4: (abort()+0x148) [0x360d6370d8]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x3611660dad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> ===================================================
>
> How can I avoid this? - is it a bug, or have I done something wrong?
>
I think you are seeing the same issue I noticed about two weeks ago:
http://www.spinics.net/lists/ceph-devel/msg11328.html

See this issue: http://tracker.newdream.net/issues/3714

I can't find the wip-3714 branch anymore, so it may already have been
merged into 'next'.

You might want to try building from 'next' yourself, or fetch newer
packages from the RPM repos: http://eu.ceph.com/docs/master/install/rpm/

Wido

> I'm running Ceph 0.56.1 from the official RPMs on Fedora 17.
> The underlying disks and network connectivity have been tested and
> nothing seems to be wrong there.
>
> Thanks in advance for your assistance!
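
For readers following the log excerpt above: the two messages ("had timed out after 30" and "had suicide timed out after 300") suggest a two-threshold watchdog, where a stalled worker thread first triggers repeated warnings and later a deliberate abort. The sketch below is only an illustration of that idea in Python, not Ceph's HeartbeatMap code. The class name, the function name, and the return strings are hypothetical; only the values 30 and 300 and the thread pool name come from the log.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatHandle:
    """Hypothetical per-thread record; not Ceph's actual data structure."""
    name: str
    grace: float          # seconds of silence before a warning (30 in the log)
    suicide_grace: float  # seconds of silence before an abort (300 in the log)
    last_touch: float = field(default_factory=time.monotonic)

    def touch(self, now=None):
        # A worker thread would call this whenever it makes progress.
        self.last_touch = time.monotonic() if now is None else now


def check(handle, now):
    """Classify one worker as 'healthy', 'timed_out', or 'suicide'."""
    stalled = now - handle.last_touch
    if stalled > handle.suicide_grace:
        # In the log this is where the assert fires and the daemon aborts.
        return "suicide"
    if stalled > handle.grace:
        # In the log this is the warning repeated every few seconds.
        return "timed_out"
    return "healthy"


# Simulated clock values, so no real waiting is needed.
op_tp = HeartbeatHandle("OSD::op_tp", grace=30, suicide_grace=300, last_touch=0.0)
states = [check(op_tp, t) for t in (10, 31, 301)]
```

Under this model, a thread that stays blocked on one operation for longer than the second threshold brings the whole process down, which would match the crash pattern described in the report.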