* MDS crash.
@ 2011-07-02 21:30 Fyodor Ustinov
  2011-07-02 22:03 ` Sage Weil
  2011-07-05 16:03 ` Sage Weil
  0 siblings, 2 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-07-02 21:30 UTC
  To: ceph-devel

Hi!

MDS version: 0.30

I cannot reproduce it, sorry.

mds/Locker.cc: In function 'void Locker::file_excl(ScatterLock*, bool*)', in thread '0x7fefc6c68700'
mds/Locker.cc: 3982: FAILED assert(in->get_loner() >= 0 && in->mds_caps_wanted.empty())
  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
  1: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
  2: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
  3: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
  4: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
  5: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x148) [0x5c8488]
  6: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
  7: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
  8: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x193) [0x4e2633]
  9: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
  10: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
  11: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
  12: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
  13: /usr/bin/cmds() [0x5b06d2]
  14: /usr/bin/cmds() [0x5b0817]
  15: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x175f) [0x5c517f]
  16: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) [0x5c7998]
  17: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x208) [0x5c8548]
  18: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
  19: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
  20: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, int)+0xc4) [0x5943b4]
  21: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) [0x6ae8ab]
  22: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
  23: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
  24: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
  25: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
  26: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
  27: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
  28: (()+0x6d8c) [0x7fefc969fd8c]
  29: (clone()+0x6d) [0x7fefc855204d]
*** Caught signal (Aborted) **
  in thread 0x7fefc6c68700
  ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
  1: /usr/bin/cmds() [0x70495e]
  2: (()+0xfc60) [0x7fefc96a8c60]
  3: (gsignal()+0x35) [0x7fefc849fd05]
  4: (abort()+0x186) [0x7fefc84a3ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fefc8d566dd]
  6: (()+0xb9926) [0x7fefc8d54926]
  7: (()+0xb9953) [0x7fefc8d54953]
  8: (()+0xb9a5e) [0x7fefc8d54a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x371) [0x6cffa1]
  10: (Locker::file_excl(ScatterLock*, bool*)+0x944) [0x5baa04]
  11: (Locker::simple_sync(SimpleLock*, bool*)+0x211) [0x5bea41]
  12: (Locker::file_eval(ScatterLock*, bool*)+0x2c7) [0x5bf537]
  13: (Locker::rdlock_finish(SimpleLock*, Mutation*, bool*)+0x174) [0x5c7ec4]
  14: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x148) [0x5c8488]
  15: (MDCache::request_cleanup(MDRequest*)+0x8a) [0x54247a]
  16: (MDCache::request_finish(MDRequest*)+0xf3) [0x542b23]
  17: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x193) [0x4e2633]
  18: (Server::handle_client_stat(MDRequest*)+0x344) [0x4ebbe4]
  19: (Server::dispatch_client_request(MDRequest*)+0x3d6) [0x50b4a6]
  20: (MDCache::dispatch_request(MDRequest*)+0x46) [0x52b056]
  21: (C_MDS_RetryRequest::finish(int)+0x11) [0x5180e1]
  22: /usr/bin/cmds() [0x5b06d2]
  23: /usr/bin/cmds() [0x5b0817]
  24: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x175f) [0x5c517f]
  25: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x378) [0x5c7998]
  26: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x208) [0x5c8548]
  27: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x2c6) [0x5cc6c6]
  28: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x5d9b1c]
  29: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, int)+0xc4) [0x5943b4]
  30: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20b) [0x6ae8ab]
  31: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xa6c) [0x692b6c]
  32: (MDS::handle_core_message(Message*)+0x82f) [0x4a3e6f]
  33: (MDS::_dispatch(Message*)+0x2a9) [0x4a4189]
  34: (MDS::ms_dispatch(Message*)+0x6d) [0x4a5d7d]
  35: (SimpleMessenger::dispatch_entry()+0x8f3) [0x6e16d3]
  36: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4829bc]
  37: (()+0x6d8c) [0x7fefc969fd8c]
  38: (clone()+0x6d) [0x7fefc855204d]

core bt:
(gdb) bt
#0  0x00007fefc96a8b3b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x0000000000703a53 in ?? ()
#2  0x0000000000704b7b in ?? ()
#3 <signal handler called>
#4  0x00007fefc849fd05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00007fefc84a3ab6 in abort () at abort.c:92
#6  0x00007fefc8d566dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fefc8d54926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007fefc8d54953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007fefc8d54a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000006cffa1 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#11 0x00000000005baa04 in Locker::file_excl(ScatterLock*, bool*) ()
#12 0x00000000005bea41 in Locker::simple_sync(SimpleLock*, bool*) ()
#13 0x00000000005bf537 in Locker::file_eval(ScatterLock*, bool*) ()
#14 0x00000000005c7ec4 in Locker::rdlock_finish(SimpleLock*, Mutation*, bool*) ()
#15 0x00000000005c8488 in Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*) ()
#16 0x000000000054247a in MDCache::request_cleanup(MDRequest*) ()
#17 0x0000000000542b23 in MDCache::request_finish(MDRequest*) ()
#18 0x00000000004e2633 in Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*) ()
#19 0x00000000004ebbe4 in Server::handle_client_stat(MDRequest*) ()
#20 0x000000000050b4a6 in Server::dispatch_client_request(MDRequest*) ()
#21 0x000000000052b056 in MDCache::dispatch_request(MDRequest*) ()
#22 0x00000000005180e1 in C_MDS_RetryRequest::finish(int) ()
#23 0x00000000005b06d2 in ?? ()
#24 0x00000000005b0817 in ?? ()
#25 0x00000000005c517f in Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*) ()
#26 0x00000000005c7998 in Locker::wrlock_finish(SimpleLock*, Mutation*, bool*) ()
#27 0x00000000005c8548 in Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*) ()
#28 0x00000000005cc6c6 in Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*) ()
#29 0x00000000005d9b1c in C_Locker_FileUpdate_finish::finish(int) ()
#30 0x00000000005943b4 in finish_contexts(std::list<Context*, std::allocator<Context*> >&, int) ()
#31 0x00000000006ae8ab in Journaler::_finish_flush(int, unsigned long, utime_t) ()
#32 0x0000000000692b6c in Objecter::handle_osd_op_reply(MOSDOpReply*) ()
#33 0x00000000004a3e6f in MDS::handle_core_message(Message*) ()
#34 0x00000000004a4189 in MDS::_dispatch(Message*) ()
#35 0x00000000004a5d7d in MDS::ms_dispatch(Message*) ()
#36 0x00000000006e16d3 in SimpleMessenger::dispatch_entry() ()
#37 0x00000000004829bc in SimpleMessenger::DispatchThread::entry() ()
#38 0x00007fefc969fd8c in start_thread (arg=0x7fefc6c68700) at pthread_create.c:304
#39 0x00007fefc855204d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#40 0x0000000000000000 in ?? ()

WBR,
     Fyodor.

* MDS crash
@ 2011-10-28 22:57 Noah Watkins
  0 siblings, 0 replies; 16+ messages in thread
From: Noah Watkins @ 2011-10-28 22:57 UTC
  To: ceph-devel

This is a trace of an MDS crash. I was running a simple setup (./vstart -d -n), and the log is from out/mds.b.

This is from the latest wip-getdir branch. I've included some context preceding the crash; I have the full trace if more context would help.

-Noah

================================

2011-10-28 15:50:00.251876 7f2f3102b700 mds.1.cache.dir(100000003f6) pop_and_dirty_projected_fnode 0x13ab180 v55
2011-10-28 15:50:00.251902 7f2f3102b700 mds.1.cache.dir(100000003f6) mark_dirty (already dirty) [dir 100000003f6 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/ [2,head] auth{0=1} pv=55 v=55 cv=0/0 ap=1+2+2 state=1610612738|complete f(v0 m2011-10-28 15:50:00.116185 3=0+3)->f(v0 m2011-10-28 15:50:00.116185 3=0+3) n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3)->n(v5 rc2011-10-28 15:50:00.116185 b284930 5=2+3) hs=3+1,ss=0+0 dirty=4 | child replicated dirty authpin 0x12b6770] version 55
2011-10-28 15:50:00.251909 7f2f3102b700 mds.1.cache.dir(100000003f5) pop_and_dirty_projected_fnode 0x13abb40 v52
2011-10-28 15:50:00.251936 7f2f3102b700 mds.1.cache.dir(100000003f5) mark_dirty (already dirty) [dir 100000003f5 /tmp/hadoop-nwatkins/mapred/staging/nwatkins/ [2,head] auth{0=1} pv=52 v=52 cv=0/0 ap=1+1+2 state=1610612738|complete f(v0 m2011-10-28 15:39:07.835948 1=0+1)->f(v0 m2011-10-28 15:39:07.835948 1=0+1) n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3)->n(v9 rc2011-10-28 15:50:00.116185 b284930 6=2+4)/n(v9 rc2011-10-28 15:46:30.070103 b284930 5=2+3) hs=1+0,ss=0+0 dirty=1 | child replicated dirty authpin 0x12b6378] version 52
2011-10-28 15:50:00.251957 7f2f3102b700 mds.1.cache send_dentry_link [dentry #1/tmp/hadoop-nwatkins/mapred/staging/nwatkins/.staging/job_201110281545_0003 [2,head] auth (dn xlock x=1 by 0x135bc00) (dversion lock w=1 last_client=4242) v=54 ap=2+0 inode=0x1311b60 | request lock inodepin dirty authpin 0x1345d80]
2011-10-28 15:50:00.251980 7f2f3102b700 mds.1.server reply_request 0 (Success) client_request(client.4242:11 mkdir #100000003f6/job_201110281545_0003) v1
2011-10-28 15:50:00.251990 7f2f3102b700 mds.1.server apply_allocated_inos 20000000004 / [20000000005~3e8] / 0
2011-10-28 15:50:00.252002 7f2f3102b700 mds.1.inotable: apply_alloc_id 20000000004 to [200000003ed~2fffffffc12]/[200000003ec~2fffffffc13]
./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = inodeno_t]', in thread '7f2f3102b700'
./include/interval_set.h: 385: FAILED assert(p->first <= start)
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 2: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 3: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 4: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 5: (Context::complete(int)+0xa) [0x4a4d7a]
 6: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 7: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 9: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 10: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 11: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 12: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 13: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 14: (()+0x7efc) [0x7f2f348f0efc]
 15: (clone()+0x6d) [0x7f2f3332a89d]
*** Caught signal (Aborted) **
 in thread 7f2f3102b700
 ceph version 0.37-192-g1a4eec2 (commit:1a4eec20a345ced993a48012aaaa8d8ca344a1ba)
 1: ./ceph-mds() [0x777fb6]
 2: (()+0x10060) [0x7f2f348f9060]
 3: (gsignal()+0x35) [0x7f2f3327f3a5]
 4: (abort()+0x17b) [0x7f2f33282b0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2f33b3dd7d]
 6: (()+0xb9f26) [0x7f2f33b3bf26]
 7: (()+0xb9f53) [0x7f2f33b3bf53]
 8: (()+0xba04e) [0x7f2f33b3c04e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x193) [0x6fedf3]
 10: (InoTable::apply_alloc_id(inodeno_t)+0x441) [0x647041]
 11: (Server::apply_allocated_inos(MDRequest*)+0x4dd) [0x509f3d]
 12: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x83) [0x50a283]
 13: (C_MDS_mknod_finish::finish(int)+0xfe) [0x53686e]
 14: (Context::complete(int)+0xa) [0x4a4d7a]
 15: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xc8) [0x4c3568]
 16: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x18f) [0x69dd9f]
 17: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xc57) [0x686c47]
 18: (MDS::handle_core_message(Message*)+0x987) [0x4bedf7]
 19: (MDS::_dispatch(Message*)+0x2f) [0x4bef8f]
 20: (MDS::ms_dispatch(Message*)+0x70) [0x4c06f0]
 21: (SimpleMessenger::dispatch_entry()+0x833) [0x6edd13]
 22: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49ed7c]
 23: (()+0x7efc) [0x7f2f348f0efc]
 24: (clone()+0x6d) [0x7f2f3332a89d]
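Reading the two log lines just before the assert, `apply_alloc_id` was asked to remove id 20000000004 from a free set printed as `[200000003ed~2fffffffc12]`. Assuming the usual Ceph log notation (hex inode numbers, `start~length` intervals), that id lies below the start of the only free interval, so `interval_set::erase` cannot find an interval beginning at or before it, which is what `assert(p->first <= start)` checks. The sketch below illustrates that containment check with a hypothetical, simplified `SimpleIntervalSet` (a `std::map` from interval start to length). It is not Ceph's `interval_set<T>`, and it throws where Ceph asserts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical, simplified stand-in for an interval set of ids:
// maps interval start -> length; intervals are kept disjoint.
struct SimpleIntervalSet {
  std::map<uint64_t, uint64_t> m;

  // Add [start, start+len); this sketch assumes callers never overlap ranges.
  void insert(uint64_t start, uint64_t len) {
    assert(len > 0);
    m[start] = len;
  }

  // True if [start, start+len) lies entirely inside one stored interval.
  bool contains(uint64_t start, uint64_t len) const {
    auto p = m.upper_bound(start);   // first interval starting after `start`
    if (p == m.begin()) return false; // nothing starts at or before `start`
    --p;                              // interval starting at or before `start`
    return start + len <= p->first + p->second;
  }

  // Remove [start, start+len); the whole range must already be in the set.
  void erase(uint64_t start, uint64_t len) {
    if (!contains(start, len))
      throw std::logic_error("erase of a range that is not in the set");
    auto p = m.upper_bound(start);
    --p;
    const uint64_t pstart = p->first, pend = p->first + p->second;
    m.erase(p);
    if (start > pstart) m[pstart] = start - pstart;  // keep the left remainder
    const uint64_t end = start + len;
    if (end < pend) m[end] = pend - end;             // keep the right remainder
  }
};
```

Feeding it the values from the log, `insert(0x200000003ed, 0x2fffffffc12)` followed by `erase(0x20000000004, 1)` throws, while `erase(0x200000003ed, 1)` succeeds and trims one id from the front of the interval.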

* MDS crash
@ 2011-05-23 21:21 Fyodor Ustinov
  2011-05-23 22:27 ` Sage Weil
  2011-05-24 23:54 ` Sage Weil
  0 siblings, 2 replies; 16+ messages in thread
From: Fyodor Ustinov @ 2011-05-23 21:21 UTC
  To: ceph-devel

Hi.

2011-05-24 00:17:45.490684 7f45415e1740 ceph version 0.28.commit: 071881d7e5599571e46bda17094bb4b48691e89a. process: cmds. pid: 4424
2011-05-24 00:17:45.492293 7f453ef81700 mds-1.0 ms_handle_connect on 77.120.112.193:6789/0
2011-05-24 00:17:49.497862 7f453ef81700 mds-1.0 handle_mds_map standby
2011-05-24 00:17:53.274911 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
2011-05-24 00:17:53.274939 7f453ef81700 mds0.5 handle_mds_map state change up:standby --> up:replay
2011-05-24 00:17:53.274951 7f453ef81700 mds0.5 replay_start
2011-05-24 00:17:53.274962 7f453ef81700 mds0.5  recovery set is
2011-05-24 00:17:53.274969 7f453ef81700 mds0.5  need osdmap epoch 104, have 103
2011-05-24 00:17:53.274985 7f453ef81700 mds0.5  waiting for osdmap 104 (which blacklists prior instance)
2011-05-24 00:17:53.275016 7f453ef81700 mds0.cache handle_mds_failure mds0 : recovery peers are
2011-05-24 00:17:53.276145 7f453ef81700 mds0.5 ms_handle_connect on 77.120.112.201:6800/29765
2011-05-24 00:17:53.276223 7f453ef81700 mds0.5 ms_handle_connect on 82.144.220.71:6800/5210
2011-05-24 00:17:53.276785 7f453ef81700 mds0.5 ms_handle_connect on 82.144.220.72:6800/3960
2011-05-24 00:17:53.301249 7f453ef81700 mds0.5 ms_handle_connect on 82.144.220.70:6800/25341
2011-05-24 00:17:53.307286 7f453ef81700 mds0.cache creating system inode with ino:100
2011-05-24 00:17:53.307441 7f453ef81700 mds0.cache creating system inode with ino:1
2011-05-24 00:17:53.308273 7f453ef81700 mds0.5 ms_handle_connect on 77.120.112.200:6800/9187
2011-05-24 00:17:54.506400 7f4537fff700 mds0.5 replay_done
2011-05-24 00:17:54.506431 7f4537fff700 mds0.5 making mds journal writeable
2011-05-24 00:17:54.511104 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
2011-05-24 00:17:54.511127 7f453ef81700 mds0.5 handle_mds_map state change up:replay --> up:reconnect
2011-05-24 00:17:54.511138 7f453ef81700 mds0.5 reconnect_start
2011-05-24 00:17:54.511144 7f453ef81700 mds0.5 reopen_log
2011-05-24 00:17:54.511163 7f453ef81700 mds0.server reconnect_clients -- 1 sessions
2011-05-24 00:17:54.511832 7f453c472700 -- 77.120.112.193:6800/4424 >> 77.120.112.209:0/3638704563 pipe(0x10f8370 sd=11 pgs=0 cs=0 l=0).accept peer addr is really 77.120.112.209:0/3638704563 (socket is 77.120.112.209:38599/0)
2011-05-24 00:17:54.512859 7f453ef81700 log [DBG] : reconnect by client4404 77.120.112.209:0/3638704563 after 0.001651
2011-05-24 00:17:54.513057 7f453ef81700 mds0.server missing 1000000860a #10000008019/vtapes/drive0/data (mine), will load later
2011-05-24 00:17:54.513091 7f453ef81700 mds0.5 reconnect_done
2011-05-24 00:17:54.515176 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
2011-05-24 00:17:54.515193 7f453ef81700 mds0.5 handle_mds_map state change up:reconnect --> up:rejoin
2011-05-24 00:17:54.515201 7f453ef81700 mds0.5 rejoin_joint_start
2011-05-24 00:17:54.522602 7f453ef81700 mds0.5 rejoin_done
2011-05-24 00:17:54.528794 7f453ef81700 mds0.5 handle_mds_map i am now mds0.5
2011-05-24 00:17:54.528812 7f453ef81700 mds0.5 handle_mds_map state change up:rejoin --> up:active
2011-05-24 00:17:54.528819 7f453ef81700 mds0.5 recovery_done -- successful recovery!
2011-05-24 00:17:54.529315 7f453ef81700 mds0.5 active_start
2011-05-24 00:17:54.531405 7f453ef81700 mds0.5 cluster recovered.
*** Caught signal (Segmentation fault) **
  in thread 0x7f453ef81700
  ceph version 0.28 (commit:071881d7e5599571e46bda17094bb4b48691e89a)
  1: /usr/bin/cmds() [0x712c5e]
  2: (()+0xfc60) [0x7f45411c0c60]
  3: (MDCache::get_or_create_stray_dentry(CInode*)+0x25) [0x5356f5]
  4: (Server::handle_client_unlink(MDRequest*)+0x997) [0x508857]
  5: (Server::handle_client_request(MClientRequest*)+0x522) [0x520852]
  6: (MDS::handle_deferrable_message(Message*)+0x9af) [0x4a266f]
  7: (MDS::_dispatch(Message*)+0x173e) [0x4b617e]
  8: (MDS::_dispatch(Message*)+0x427) [0x4b4e67]
  9: (MDS::ms_dispatch(Message*)+0x59) [0x4b66c9]
  10: (SimpleMessenger::dispatch_entry()+0x7ea) [0x4838aa]
  11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x47b26c]
  12: (()+0x6d8c) [0x7f45411b7d8c]
  13: (clone()+0x6d) [0x7f454006a04d]

WBR,
     Fyodor.

* mds crash
@ 2011-04-19 15:18 Mark Nigh
  2011-04-19 16:17 ` Sage Weil
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Nigh @ 2011-04-19 15:18 UTC
  To: ceph-devel@vger.kernel.org

I have recently been working on exporting Ceph over NFS. I have had stability problems with NFS (Ceph keeps working, but NFS crashes). Most recently, my mds0 will not start after one of these NFS incidents.

My setup: 2 MDSs, 1 mon (located on mds0), 5 OSDs, all running Ubuntu 10.10.

Here is the output when I try to start mds0. Is there any other debugging I can turn on?

/etc/init.d/ceph start mds0

2011-04-19 10:06:58.602640 7fb202fe4700 mds0.11 ms_handle_connect on 10.6.1.93:6800/945
./include/elist.h: In function 'elist<T>::item::~item() [with T = MDSlaveUpdate*]', in thread '0x7fb2004d5700'
./include/elist.h: 39: FAILED assert(!is_on_list())
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 3: (MDLog::_replay_thread()+0xb90) [0x67f850]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 5: (()+0x7971) [0x7fb20564a971]
 6: (clone()+0x6d) [0x7fb2042e692d]
*** Caught signal (Aborted) **
 in thread 0x7fb2004d5700
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: /usr/bin/cmds() [0x70fc38]
 2: (()+0xfb40) [0x7fb205652b40]
 3: (gsignal()+0x35) [0x7fb204233ba5]
 4: (abort()+0x180) [0x7fb2042376b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb204ad76bd]
 6: (()+0xb9906) [0x7fb204ad5906]
 7: (()+0xb9933) [0x7fb204ad5933]
 8: (()+0xb9a3e) [0x7fb204ad5a3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36a) [0x6f5eaa]
 10: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 11: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 12: (MDLog::_replay_thread()+0xb90) [0x67f850]
 13: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 14: (()+0x7971) [0x7fb20564a971]
 15: (clone()+0x6d) [0x7fb2042e692d]
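The failed assertion here is `assert(!is_on_list())` inside the destructor `elist<T>::item::~item()`, reached while `ESlaveUpdate::replay` destroys an `MDSlaveUpdate`. Assuming `elist` is an intrusive linked list (a reading of the name and the assertion, not something this thread confirms), the sketch below shows the invariant being checked: an item must be unlinked before it is destroyed, otherwise its neighbours keep pointers into freed memory. `ListItem`, `insert_after` and `remove_myself` are hypothetical names used only for illustration, not Ceph's code.

```cpp
#include <cassert>

// Hypothetical intrusive list node; a self-linked node is "not on a list".
// A separate ListItem can serve as the list head.
struct ListItem {
  ListItem* prev = this;
  ListItem* next = this;

  ListItem() = default;
  ListItem(const ListItem&) = delete;             // copying would duplicate links
  ListItem& operator=(const ListItem&) = delete;

  bool is_on_list() const { return next != this; }

  // Link this item immediately after `head`.
  void insert_after(ListItem* head) {
    assert(!is_on_list());
    next = head->next;
    prev = head;
    head->next->prev = this;
    head->next = this;
  }

  // Unlink from whatever list this item is on and become self-linked again.
  void remove_myself() {
    prev->next = next;
    next->prev = prev;
    prev = next = this;
  }

  // Destroying a still-linked item would leave dangling neighbour pointers,
  // so the destructor insists the item was unlinked first.
  ~ListItem() { assert(!is_on_list()); }
};
```

Calling `remove_myself()` before the item goes out of scope keeps the destructor's check satisfied; letting a still-linked item be destroyed trips the assertion, which appears to be the situation this trace reports for `MDSlaveUpdate`.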

I am not sure why the IP address 0.0.0.0 shows up when starting mds0:

root@mds0:/var/log/ceph# /etc/init.d/ceph start mds0
=== mds.0 ===
Starting Ceph mds.0 on mds0...
 ** WARNING: Ceph is still under heavy development, and is only suitable for **
 **          testing and review.  Do not trust it with important data.       **
starting mds.0 at 0.0.0.0:6800/2994

Thanks for your assistance.

Mark Nigh
Systems Architect
mnigh@netelligent.com
 (p) 314.392.6926


Thread overview: 16+ messages
2011-07-02 21:30 MDS crash Fyodor Ustinov
2011-07-02 22:03 ` Sage Weil
2011-07-02 22:16   ` Fyodor Ustinov
2011-07-05 16:03 ` Sage Weil
  -- strict thread matches above, loose matches on Subject: below --
2011-10-28 22:57 Noah Watkins
2011-05-23 21:21 Fyodor Ustinov
2011-05-23 22:27 ` Sage Weil
2011-05-23 22:45   ` Fyodor Ustinov
2011-05-23 23:08     ` Sage Weil
2011-05-23 23:52       ` Fyodor Ustinov
2011-05-24  0:32   ` Fyodor Ustinov
2011-05-24 23:54 ` Sage Weil
2011-04-19 15:18 mds crash Mark Nigh
2011-04-19 16:17 ` Sage Weil
2011-04-20 14:00   ` Mark Nigh
2011-04-21 19:08     ` Tommi Virtanen