* mon crash
@ 2013-06-19 9:53 James Harper
2013-06-19 15:30 ` Joao Eduardo Luis
2013-06-19 16:01 ` Sage Weil
0 siblings, 2 replies; 5+ messages in thread
From: James Harper @ 2013-06-19 9:53 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
Every time I start up one of my mons it crashes. The other two are running, but there seem to be long delays (several seconds) when running mon status (maybe this is the behaviour when one mon is down?).
The tail of /var/log/ceph/ceph-mon.4.log follows this email.
Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
This was also happening in a previous version, and in the one before that, but I thought I'd fixed it by wiping the errant mon and recreating it.
Anything else I can supply that might help?
Thanks
James
0> 2013-06-19 19:45:44.018695 7f472d995700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f472d995700 time 2013-06-19 19:45:44.017928
mon/Monitor.cc: 1101: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
1: /usr/bin/ceph-mon() [0x4c8eca]
2: (Context::complete(int)+0xa) [0x4d70fa]
3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
4: (SafeTimerThread::entry()+0xd) [0x64c3dd]
5: (()+0x6b50) [0x7f47c0c3ab50]
6: (clone()+0x6d) [0x7f47bf39ba7d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mon.4.log
--- end dump of recent events ---
2013-06-19 19:45:44.036036 7f472d995700 -1 *** Caught signal (Aborted) **
in thread 7f472d995700
ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
1: /usr/bin/ceph-mon() [0x5a08b2]
2: (()+0xf030) [0x7f47c0c43030]
3: (gsignal()+0x35) [0x7f47bf2f3475]
4: (abort()+0x180) [0x7f47bf2f66f0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f47bfb4889d]
6: (()+0x63996) [0x7f47bfb46996]
7: (()+0x639c3) [0x7f47bfb469c3]
8: (()+0x63bee) [0x7f47bfb46bee]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0x65418a]
10: /usr/bin/ceph-mon() [0x4c8eca]
11: (Context::complete(int)+0xa) [0x4d70fa]
12: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
13: (SafeTimerThread::entry()+0xd) [0x64c3dd]
14: (()+0x6b50) [0x7f47c0c3ab50]
15: (clone()+0x6d) [0x7f47bf39ba7d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2013-06-19 19:45:44.036036 7f472d995700 -1 *** Caught signal (Aborted) **
in thread 7f472d995700
ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
1: /usr/bin/ceph-mon() [0x5a08b2]
2: (()+0xf030) [0x7f47c0c43030]
3: (gsignal()+0x35) [0x7f47bf2f3475]
4: (abort()+0x180) [0x7f47bf2f66f0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f47bfb4889d]
6: (()+0x63996) [0x7f47bfb46996]
7: (()+0x639c3) [0x7f47bfb469c3]
8: (()+0x63bee) [0x7f47bfb46bee]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0x65418a]
10: /usr/bin/ceph-mon() [0x4c8eca]
11: (Context::complete(int)+0xa) [0x4d70fa]
12: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
13: (SafeTimerThread::entry()+0xd) [0x64c3dd]
14: (()+0x6b50) [0x7f47c0c3ab50]
15: (clone()+0x6d) [0x7f47bf39ba7d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mon.4.log
--- end dump of recent events ---
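The backtraces above carry raw addresses that, as the NOTE lines say, need the executable to interpret. As a minimal sketch (the helper `parse_frames` is hypothetical and not part of Ceph), the frame lines can be pulled out of such a log so the addresses are ready to hand to a symbolizer:

```python
import re

# Matches backtrace lines such as
#   "2: (Context::complete(int)+0xa) [0x4d70fa]"
#   "1: /usr/bin/ceph-mon() [0x4c8eca]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[(0x[0-9a-f]+)\]\s*$")


def parse_frames(log_text):
    """Return (frame_number, location, address) for each backtrace line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), int(m.group(3), 16)))
    return frames


# Excerpt copied from the log in this message.
SAMPLE = """\
ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
1: /usr/bin/ceph-mon() [0x4c8eca]
2: (Context::complete(int)+0xa) [0x4d70fa]
3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
"""

for number, location, address in parse_frames(SAMPLE):
    print(number, location, hex(address))
```

Frames without a symbol name, such as frame 1, could then be resolved with a tool like `addr2line -Cfe /usr/bin/ceph-mon 0x4c8eca`, assuming a binary with matching debug symbols is available.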
* Re: mon crash
From: Joao Eduardo Luis @ 2013-06-19 15:30 UTC (permalink / raw)
To: James Harper; +Cc: ceph-devel@vger.kernel.org
On 06/19/2013 10:53 AM, James Harper wrote:
> Every time I start up one of my mons it crashes. Two others are running but there seems to be long delays (=several seconds) when doing mon status (maybe this is the behaviour when one mon is down?)
>
> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
>
> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
>
> This was happening in a previous version, and then even before that but I thought I'd fixed it by wiping the errant mon and recreating it.
>
> Anything else I can supply that might help?
>
> Thanks
>
> James
>
> 0> 2013-06-19 19:45:44.018695 7f472d995700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f472d995700 time 2013-06-19 19:45:44.017928
> mon/Monitor.cc: 1101: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
>
> ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
> 1: /usr/bin/ceph-mon() [0x4c8eca]
> 2: (Context::complete(int)+0xa) [0x4d70fa]
> 3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
> 4: (SafeTimerThread::entry()+0xd) [0x64c3dd]
> 5: (()+0x6b50) [0x7f47c0c3ab50]
> 6: (clone()+0x6d) [0x7f47bf39ba7d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Issues in sync_timeout() have been seen before. I chase them for a
while, find nothing of worth (the logs usually don't help much), and
eventually have to move on.
http://tracker.ceph.com/issues/4845
and
http://tracker.ceph.com/issues/5171
contain two iterations of what appears to be the same bug. My guess is
that there's a lingering Context not being cancelled somewhere. Or it
might be some other thing altogether.
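
To illustrate the kind of failure guessed at here, the following is a toy Python model, not Ceph's actual code: a timeout callback is scheduled while the sync is in the chunks state, the state then changes, and the callback is left pending. Every name in it (`ToyTimer`, `ToyMonitor`, `run`) is made up for the sketch, and it only shows how an uncancelled callback could trip an assert of this shape; it does not claim this is the real cause.

```python
# Toy model of a lingering, uncancelled timeout callback.
# All names are illustrative assumptions, not Ceph's implementation.

SYNC_STATE_NONE = "none"
SYNC_STATE_CHUNKS = "chunks"


class ToyTimer:
    """Holds pending callbacks; fire_all() stands in for the timer thread."""

    def __init__(self):
        self.pending = []

    def add_event(self, callback):
        self.pending.append(callback)
        return callback

    def cancel_event(self, callback):
        if callback in self.pending:
            self.pending.remove(callback)

    def fire_all(self):
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            callback()


class ToyMonitor:
    def __init__(self, timer):
        self.timer = timer
        self.sync_state = SYNC_STATE_NONE
        self.timeout_event = None

    def start_chunks(self):
        self.sync_state = SYNC_STATE_CHUNKS
        self.timeout_event = self.timer.add_event(self.sync_timeout)

    def finish_sync(self, cancel_timeout):
        self.sync_state = SYNC_STATE_NONE
        if cancel_timeout and self.timeout_event is not None:
            self.timer.cancel_event(self.timeout_event)
            self.timeout_event = None

    def sync_timeout(self):
        # Same shape as the failed assertion in the log.
        assert self.sync_state == SYNC_STATE_CHUNKS


def run(cancel_timeout):
    timer = ToyTimer()
    mon = ToyMonitor(timer)
    mon.start_chunks()
    mon.finish_sync(cancel_timeout)
    try:
        timer.fire_all()
    except AssertionError:
        return "assert failed"
    return "ok"


print(run(cancel_timeout=True))   # ok
print(run(cancel_timeout=False))  # assert failed
```

In the toy model the fix is simply to cancel the pending event on every path that leaves the chunks state; whether the real bug has that form is exactly what a full log would help establish.
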
James, do you happen to have a full log you can share with us?
-Joao
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
* Re: mon crash
From: Sage Weil @ 2013-06-19 16:01 UTC (permalink / raw)
To: James Harper; +Cc: ceph-devel@vger.kernel.org
On Wed, 19 Jun 2013, James Harper wrote:
> Every time I start up one of my mons it crashes. Two others are running
> but there seems to be long delays (=several seconds) when doing mon
> status (maybe this is the behaviour when one mon is down?)
>
> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
>
> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
>
> This was happening in a previous version, and then even before that but
> I thought I'd fixed it by wiping the errant mon and recreating it.
>
> Anything else I can supply that might help?
Can you try installing the current cuttlefish branch package and see if
the problem is still present? If so, we can gather logs to fully
diagnose.
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/
or similar, depending on your distro. Or
ceph-deploy install --dev=cuttlefish <hostname>
Thanks!
sage
* RE: mon crash
From: James Harper @ 2013-06-19 23:31 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
>
> On Wed, 19 Jun 2013, James Harper wrote:
> > Every time I start up one of my mons it crashes. Two others are running
> > but there seems to be long delays (=several seconds) when doing mon
> > status (maybe this is the behaviour when one mon is down?)
> >
> > The tail of /var/log/ceph/ceph-mon.4.log follows this email.
> >
> > Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
> >
> > This was happening in a previous version, and then even before that but
> > I thought I'd fixed it by wiping the errant mon and recreating it.
> >
> > Anything else I can supply that might help?
>
> Can you try installing the current cuttlefish branch package and see if
> the problem is still present? If so, we can gather logs to fully
> diagnose.
>
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/
>
I used http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/cuttlefish and the mon started, joined the cluster within 30 seconds (probably much less), and appears to be running and stable.
Will this patch hit the regular repository soon?
Thanks
James
* Re: mon crash
From: Joao Eduardo Luis @ 2013-06-20 11:03 UTC (permalink / raw)
To: James Harper; +Cc: Sage Weil, ceph-devel@vger.kernel.org
On 06/20/2013 12:31 AM, James Harper wrote:
>>
>> On Wed, 19 Jun 2013, James Harper wrote:
>>> Every time I start up one of my mons it crashes. Two others are running
>>> but there seems to be long delays (=several seconds) when doing mon
>>> status (maybe this is the behaviour when one mon is down?)
>>>
>>> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
>>>
>>> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
>>>
>>> This was happening in a previous version, and then even before that but
>>> I thought I'd fixed it by wiping the errant mon and recreating it.
>>>
>>> Anything else I can supply that might help?
>>
>> Can you try installing the current cuttlefish branch package and see if
>> the problem is still present? If so, we can gather logs to fully
>> diagnose.
>>
>> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/
>>
>
> I used http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/cuttlefish and mon started and joined the cluster within 30 seconds (probably much less), and appears to be running and stable.
>
> Will this patch hit the regular repository soon?
I believe the current cuttlefish branch should be released as 0.61.4 soon.
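
Once that release is out, a check along the following lines could confirm whether a given mon is at or past it. This is only a sketch: it assumes the fix does ship in 0.61.4 as expected above (the thread does not confirm that), the helpers `parse_ceph_version` and `at_least` are hypothetical, and the input format is the `ceph version ...` line seen in the crash log.

```python
import re

# Matches the banner printed in the crash log, e.g.
#   "ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)"
VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def parse_ceph_version(line):
    """Return the version in a 'ceph version X.Y.Z (...)' line as an int tuple."""
    m = VERSION_RE.search(line)
    if m is None:
        raise ValueError("no ceph version found in: %r" % line)
    return tuple(int(part) for part in m.groups())


def at_least(line, minimum):
    """True if the version in `line` is >= `minimum`, e.g. (0, 61, 4)."""
    return parse_ceph_version(line) >= minimum


# Version line copied from the crash log in this thread.
crashing = "ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)"
print(parse_ceph_version(crashing))
print(at_least(crashing, (0, 61, 4)))
```

Tuples of integers are compared element by element, which avoids the pitfalls of comparing version strings lexically.
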
-Joao
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com