From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Priebe - Profihost AG
Subject: still crashing osds with next branch
Date: Wed, 20 Jun 2012 12:03:30 +0200
Message-ID: <4FE19FF2.2090302@profihost.ag>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail.profihost.ag ([85.158.179.208]:44219 "EHLO mail.profihost.ag" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753873Ab2FTKDh (ORCPT ); Wed, 20 Jun 2012 06:03:37 -0400
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: "ceph-devel@vger.kernel.org"

Hello list,

I'm still seeing OSD crashes with the next branch under KVM load. If you need the core dump, please tell me.

Here are two different crashes, each with its last log lines:

########### CRASH 1 ###########

    -3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b( v 105'29708 (103'28588,105'29708] n=25 ec=56 les/c 105/105 104/104/104) [13] r=0 lpr=104 mlcod 105'29708 active+degraded] watch: oi.user_version=28492
    -2> 2012-06-20 11:59:06.496350 7f166074a700 0 osd.13 105 pg[4.64b( v 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104) [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch: ctx->obc=0x9f94840 cookie=1 oi.version=29709 ctx->at_version=105'29710
    -1> 2012-06-20 11:59:06.496386 7f166074a700 0 osd.13 105 pg[4.64b( v 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104) [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch: oi.user_version=28492
     0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f1664052700

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-osd() [0x70e429]
 2: (()+0xeff0) [0x7f16714d5ff0]
 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
 5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
 7: (()+0x68ca) [0x7f16714cd8ca]
 8: (clone()+0x6d) [0x7f166fb51c0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- end dump of recent events ---

########### CRASH 2 ###########

     0> 2012-06-20 11:56:46.339027 7f39d5c0a700 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7f39d5c0a700 time 2012-06-20 11:56:46.338403
./common/Mutex.h: 110: FAILED assert(r == 0)

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-osd() [0x51a05d]
 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
 4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
 6: (()+0x68ca) [0x7f39e10818ca]
 7: (clone()+0x6d) [0x7f39df705c0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- end dump of recent events ---
2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
 in thread 7f39d5c0a700

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-osd() [0x70e429]
 2: (()+0xeff0) [0x7f39e1089ff0]
 3: (gsignal()+0x35) [0x7f39df668225]
 4: (abort()+0x180) [0x7f39df66b030]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
 6: (()+0xcb166) [0x7f39dfefb166]
 7: (()+0xcb193) [0x7f39dfefb193]
 8: (()+0xcb28e) [0x7f39dfefb28e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x78ae90]
 10: /usr/bin/ceph-osd() [0x51a05d]
 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
 15: (()+0x68ca) [0x7f39e10818ca]
 16: (clone()+0x6d) [0x7f39df705c0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- begin dump of recent events ---
     0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
 in thread 7f39d5c0a700

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-osd() [0x70e429]
 2: (()+0xeff0) [0x7f39e1089ff0]
 3: (gsignal()+0x35) [0x7f39df668225]
 4: (abort()+0x180) [0x7f39df66b030]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
 6: (()+0xcb166) [0x7f39dfefb166]
 7: (()+0xcb193) [0x7f39dfefb193]
 8: (()+0xcb28e) [0x7f39dfefb28e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x78ae90]
 10: /usr/bin/ceph-osd() [0x51a05d]
 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
 15: (()+0x68ca) [0x7f39e10818ca]
 16: (clone()+0x6d) [0x7f39df705c0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- end dump of recent events ---

Stefan