* mon crash
From: James Harper @ 2013-06-19  9:53 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Every time I start one of my mons, it crashes. The other two are running, but there seem to be long delays (several seconds) when running mon status (maybe this is the expected behaviour when one mon is down?).

The tail of /var/log/ceph/ceph-mon.4.log follows this email.

Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main

This was already happening in a previous version, and in the one before that, but I thought I'd fixed it by wiping the errant mon and recreating it.

Anything else I can supply that might help?

Thanks

James

     0> 2013-06-19 19:45:44.018695 7f472d995700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f472d995700 time 2013-06-19 19:45:44.017928
mon/Monitor.cc: 1101: FAILED assert(sync_state == SYNC_STATE_CHUNKS)

 ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
 1: /usr/bin/ceph-mon() [0x4c8eca]
 2: (Context::complete(int)+0xa) [0x4d70fa]
 3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
 4: (SafeTimerThread::entry()+0xd) [0x64c3dd]
 5: (()+0x6b50) [0x7f47c0c3ab50]
 6: (clone()+0x6d) [0x7f47bf39ba7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.4.log
--- end dump of recent events ---
2013-06-19 19:45:44.036036 7f472d995700 -1 *** Caught signal (Aborted) **
 in thread 7f472d995700

 ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
 1: /usr/bin/ceph-mon() [0x5a08b2]
 2: (()+0xf030) [0x7f47c0c43030]
 3: (gsignal()+0x35) [0x7f47bf2f3475]
 4: (abort()+0x180) [0x7f47bf2f66f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f47bfb4889d]
 6: (()+0x63996) [0x7f47bfb46996]
 7: (()+0x639c3) [0x7f47bfb469c3]
 8: (()+0x63bee) [0x7f47bfb46bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0x65418a]
 10: /usr/bin/ceph-mon() [0x4c8eca]
 11: (Context::complete(int)+0xa) [0x4d70fa]
 12: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
 13: (SafeTimerThread::entry()+0xd) [0x64c3dd]
 14: (()+0x6b50) [0x7f47c0c3ab50]
 15: (clone()+0x6d) [0x7f47bf39ba7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2013-06-19 19:45:44.036036 7f472d995700 -1 *** Caught signal (Aborted) **
 in thread 7f472d995700

 ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
 1: /usr/bin/ceph-mon() [0x5a08b2]
 2: (()+0xf030) [0x7f47c0c43030]
 3: (gsignal()+0x35) [0x7f47bf2f3475]
 4: (abort()+0x180) [0x7f47bf2f66f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f47bfb4889d]
 6: (()+0x63996) [0x7f47bfb46996]
 7: (()+0x639c3) [0x7f47bfb469c3]
 8: (()+0x63bee) [0x7f47bfb46bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0x65418a]
 10: /usr/bin/ceph-mon() [0x4c8eca]
 11: (Context::complete(int)+0xa) [0x4d70fa]
 12: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
 13: (SafeTimerThread::entry()+0xd) [0x64c3dd]
 14: (()+0x6b50) [0x7f47c0c3ab50]
 15: (clone()+0x6d) [0x7f47bf39ba7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mon.4.log
--- end dump of recent events ---


* Re: mon crash
From: Joao Eduardo Luis @ 2013-06-19 15:30 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org

On 06/19/2013 10:53 AM, James Harper wrote:
> Every time I start up one of my mons it crashes. Two others are running but there seems to be long delays (=several seconds) when doing mon status (maybe this is the behaviour when one mon is down?)
>
> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
>
> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
>
> This was happening in a previous version, and then even before that but I thought I'd fixed it by wiping the errant mon and recreating it.
>
> Anything else I can supply that might help?
>
> Thanks
>
> James
>
>       0> 2013-06-19 19:45:44.018695 7f472d995700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f472d995700 time 2013-06-19 19:45:44.017928
> mon/Monitor.cc: 1101: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
>
>   ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
>   1: /usr/bin/ceph-mon() [0x4c8eca]
>   2: (Context::complete(int)+0xa) [0x4d70fa]
>   3: (SafeTimer::timer_thread()+0x1af) [0x64ad4f]
>   4: (SafeTimerThread::entry()+0xd) [0x64c3dd]
>   5: (()+0x6b50) [0x7f47c0c3ab50]
>   6: (clone()+0x6d) [0x7f47bf39ba7d]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Issues in sync_timeout() have been seen before.  Each time I chase them
for a while, find nothing useful (the logs usually don't help much), and
eventually have to move on.

http://tracker.ceph.com/issues/4845

and

http://tracker.ceph.com/issues/5171

contain two iterations of what appears to be the same bug.  My guess is
that there's a lingering Context not being cancelled somewhere.  Or it
might be something else altogether.
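
To make the "lingering Context" guess concrete, here is a minimal sketch of
how a timeout callback that is never cancelled can fire after the state it
expects has already changed.  This is not Ceph's code: ToyTimer, ToyMonitor,
SyncState and count_stale_timeouts() are hypothetical names invented for the
illustration, and the sketch counts the stale callback instead of asserting
so that it can be run safely.

```cpp
#include <functional>
#include <map>

// Hypothetical stand-ins; none of these are Ceph's real types.
enum class SyncState { None, Chunks };

// Toy one-shot timer: events are stored by id and fired manually.
class ToyTimer {
public:
  using Callback = std::function<void()>;

  int add_event(Callback cb) {
    events_[next_id_] = cb;
    return next_id_++;
  }

  bool cancel_event(int id) { return events_.erase(id) > 0; }

  // Fire every pending event, as if all deadlines had passed.
  void fire_all() {
    std::map<int, Callback> pending;
    pending.swap(events_);
    for (auto& entry : pending) entry.second();
  }

private:
  int next_id_ = 1;
  std::map<int, Callback> events_;
};

class ToyMonitor {
public:
  explicit ToyMonitor(ToyTimer& timer) : timer_(timer) {}

  // Enter the chunk-sync state and arm a timeout for it.
  void start_chunk_sync() {
    state_ = SyncState::Chunks;
    timeout_id_ = timer_.add_event([this] { on_sync_timeout(); });
  }

  // Leave the chunk-sync state. With cancel_timeout == false the
  // timer event is left behind (the "lingering" callback).
  void finish_sync(bool cancel_timeout) {
    if (cancel_timeout && timeout_id_ != 0) timer_.cancel_event(timeout_id_);
    timeout_id_ = 0;
    state_ = SyncState::None;
  }

  int stale_timeouts() const { return stale_timeouts_; }

private:
  void on_sync_timeout() {
    // An assert(state == Chunks) at this point would abort the process;
    // the sketch only counts the stale call.
    if (state_ != SyncState::Chunks) {
      ++stale_timeouts_;
      return;
    }
    state_ = SyncState::None;  // a legitimate timeout: abandon the sync
  }

  ToyTimer& timer_;
  SyncState state_ = SyncState::None;
  int timeout_id_ = 0;
  int stale_timeouts_ = 0;
};

// Runs one start/finish/fire cycle and reports how many timeouts fired
// in the wrong state.
int count_stale_timeouts(bool cancel_on_finish) {
  ToyTimer timer;
  ToyMonitor mon(timer);
  mon.start_chunk_sync();
  mon.finish_sync(cancel_on_finish);
  timer.fire_all();
  return mon.stale_timeouts();
}
```

Calling count_stale_timeouts(false) from a main() models the suspected bug
(the callback runs after the state has moved on), while
count_stale_timeouts(true) models the cancelled case.  Whether this is what
actually happens in Monitor::sync_timeout() is exactly what the full log
would need to confirm.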

James, do you happen to have a full log you can share with us?


   -Joao

-- 
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


* Re: mon crash
From: Sage Weil @ 2013-06-19 16:01 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org

On Wed, 19 Jun 2013, James Harper wrote:
> Every time I start up one of my mons it crashes. Two others are running 
> but there seems to be long delays (=several seconds) when doing mon 
> status (maybe this is the behaviour when one mon is down?)
> 
> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
> 
> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish wheezy main
> 
> This was happening in a previous version, and then even before that but 
> I thought I'd fixed it by wiping the errant mon and recreating it.
> 
> Anything else I can supply that might help?

Can you try installing the current cuttlefish branch package and see if 
the problem is still present?  If so, we can gather logs to fully 
diagnose.

http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/

or similar, depending on your distro.  Or

 ceph-deploy install --dev=cuttlefish <hostname>

Thanks!
sage



* RE: mon crash
From: James Harper @ 2013-06-19 23:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel@vger.kernel.org

> 
> On Wed, 19 Jun 2013, James Harper wrote:
> > Every time I start up one of my mons it crashes. Two others are running
> > but there seems to be long delays (=several seconds) when doing mon
> > status (maybe this is the behaviour when one mon is down?)
> >
> > The tail of /var/log/ceph/ceph-mon.4.log follows this email.
> >
> > Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish
> wheezy main
> >
> > This was happening in a previous version, and then even before that but
> > I thought I'd fixed it by wiping the errant mon and recreating it.
> >
> > Anything else I can supply that might help?
> 
> Can you try installing the current cuttlefish branch package and see if
> the problem is still present?  If so, we can gather logs to fully
> diagnose.
> 
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/
> 

I used http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/cuttlefish and the mon started and joined the cluster within 30 seconds (probably much less). It appears to be running and stable.

Will this patch hit the regular repository soon?

Thanks

James


* Re: mon crash
From: Joao Eduardo Luis @ 2013-06-20 11:03 UTC (permalink / raw)
  To: James Harper; +Cc: Sage Weil, ceph-devel@vger.kernel.org

On 06/20/2013 12:31 AM, James Harper wrote:
>>
>> On Wed, 19 Jun 2013, James Harper wrote:
>>> Every time I start up one of my mons it crashes. Two others are running
>>> but there seems to be long delays (=several seconds) when doing mon
>>> status (maybe this is the behaviour when one mon is down?)
>>>
>>> The tail of /var/log/ceph/ceph-mon.4.log follows this email.
>>>
>>> Version is 0.61.3-1~bpo70+1 from http://ceph.com/debian-cuttlefish
>> wheezy main
>>>
>>> This was happening in a previous version, and then even before that but
>>> I thought I'd fixed it by wiping the errant mon and recreating it.
>>>
>>> Anything else I can supply that might help?
>>
>> Can you try installing the current cuttlefish branch package and see if
>> the problem is still present?  If so, we can gather logs to fully
>> diagnose.
>>
>> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/cuttlefish/
>>
>
> I used http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/cuttlefish and mon started and joined the cluster within 30 seconds (probably much less), and appears to be running and stable.
>
> Will this patch hit the regular repository soon?

I believe the current cuttlefish branch should be released as 0.61.4 soon.

   -Joao


-- 
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
