* still crashing osds with next branch
@ 2012-06-20 10:03 Stefan Priebe - Profihost AG
2012-06-20 13:30 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-20 10:03 UTC
To: ceph-devel@vger.kernel.org
Hello list,
I'm still seeing OSD crashes with the next branch under KVM load. If you
need the core dump, please tell me.
Here are TWO different crashes.
Here are the last log lines:
########### CRASH 1 ###########
-3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b(
v 105'29708 (103'28588,105'29708] n=25 ec=56 les/c 105/105 104/104/104)
[13] r=0 lpr=104 mlcod 105'29708 active+degraded] watch:
oi.user_version=28492
-2> 2012-06-20 11:59:06.496350 7f166074a700 0 osd.13 105 pg[4.64b(
v 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
[13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
ctx->obc=0x9f94840 cookie=1 oi.version=29709 ctx->at_version=105'29710
-1> 2012-06-20 11:59:06.496386 7f166074a700 0 osd.13 105 pg[4.64b(
v 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
[13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
oi.user_version=28492
0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
(Segmentation fault) **
in thread 7f1664052700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7f16714d5ff0]
3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
7: (()+0x68ca) [0x7f16714cd8ca]
8: (clone()+0x6d) [0x7f166fb51c0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
########### CRASH 2 ###########
0> 2012-06-20 11:56:46.339027 7f39d5c0a700 -1 ./common/Mutex.h: In
function 'void Mutex::Lock(bool)' thread 7f39d5c0a700 time 2012-06-20
11:56:46.338403
./common/Mutex.h: 110: FAILED assert(r == 0)
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x51a05d]
2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
6: (()+0x68ca) [0x7f39e10818ca]
7: (clone()+0x6d) [0x7f39df705c0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
in thread 7f39d5c0a700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7f39e1089ff0]
3: (gsignal()+0x35) [0x7f39df668225]
4: (abort()+0x180) [0x7f39df66b030]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
6: (()+0xcb166) [0x7f39dfefb166]
7: (()+0xcb193) [0x7f39dfefb193]
8: (()+0xcb28e) [0x7f39dfefb28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x940) [0x78ae90]
10: /usr/bin/ceph-osd() [0x51a05d]
11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
15: (()+0x68ca) [0x7f39e10818ca]
16: (clone()+0x6d) [0x7f39df705c0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal
(Aborted) **
in thread 7f39d5c0a700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7f39e1089ff0]
3: (gsignal()+0x35) [0x7f39df668225]
4: (abort()+0x180) [0x7f39df66b030]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
6: (()+0xcb166) [0x7f39dfefb166]
7: (()+0xcb193) [0x7f39dfefb193]
8: (()+0xcb28e) [0x7f39dfefb28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x940) [0x78ae90]
10: /usr/bin/ceph-osd() [0x51a05d]
11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
15: (()+0x68ca) [0x7f39e10818ca]
16: (clone()+0x6d) [0x7f39df705c0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
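The frame addresses in the traces above can be pulled out mechanically before they are resolved against a matching ceph-osd binary. What follows is only a minimal sketch in Python, assuming the frame layout seen in these dumps (index, symbol, bracketed address, with long frames wrapped onto a second line). parse_frames is a hypothetical helper name and not part of Ceph.

```python
import re

# A frame starts with "<index>: " and ends with "[0x<address>]".
# Long frames in these mails are wrapped, so the address may only
# appear on the following line.
FRAME_START = re.compile(r"^\s*(\d+):\s+(.*)$")
ADDRESS = re.compile(r"\[(0x[0-9a-f]+)\]\s*$")


def parse_frames(text):
    """Return (index, symbol, address) tuples found in a backtrace dump."""
    frames = []
    pending = None  # (index, partial text) of a frame still missing its address
    for line in text.splitlines():
        start = FRAME_START.match(line)
        if start:
            pending = (int(start.group(1)), start.group(2).strip())
        elif pending is not None:
            # Continuation of a wrapped frame: glue it onto the partial text.
            pending = (pending[0], pending[1] + " " + line.strip())
        else:
            continue  # log line that is not part of any frame
        addr = ADDRESS.search(pending[1])
        if addr:
            symbol = pending[1][:addr.start()].strip()
            frames.append((pending[0], symbol, addr.group(1)))
            pending = None
    return frames


if __name__ == "__main__":
    example = (
        " 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]\n"
        " 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]\n"
    )
    for index, symbol, address in parse_frames(example):
        print(index, address, symbol)
```

Addresses that fall inside the ceph-osd binary itself (for example 0x5c80b8) could then be handed to a tool such as addr2line, together with the exact binary and its debug symbols. That step is not shown here.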
Stefan
* Re: still crashing osds with next branch
2012-06-20 10:03 still crashing osds with next branch Stefan Priebe - Profihost AG
@ 2012-06-20 13:30 ` Stefan Priebe - Profihost AG
2012-06-20 17:19 ` Stefan Priebe
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-20 13:30 UTC
To: ceph-devel@vger.kernel.org
Hmm, it is always the same OSDs that are crashing again now, mostly while
shutting down or restarting a KVM machine.
This time:
####### Server 1 ########################
0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
(Segmentation fault) **
in thread 7f1664052700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7f16714d5ff0]
3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
7: (()+0x68ca) [0x7f16714cd8ca]
8: (clone()+0x6d) [0x7f166fb51c0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
And on the second server:
####### Server 2 ########################
thread 7ff933ef4700 time 2012-06-20 15:20:12.450641
osd/ReplicatedPG.cc: 968: FAILED assert(obc->registered)
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
[0x56c3c0]
2: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
3: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
4: (ThreadPool::worker()+0xb38) [0x7bbf78]
5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
6: (()+0x68ca) [0x7ff9444768ca]
7: (clone()+0x6d) [0x7ff942afac0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
0> 2012-06-20 15:20:12.466828 7ff939800700 -1 ./common/Mutex.h: In
function 'void Mutex::Lock(bool)' thread 7ff939800700 time 2012-06-20
15:20:12.466152
./common/Mutex.h: 110: FAILED assert(r == 0)
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x51a05d]
2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
6: (()+0x68ca) [0x7ff9444768ca]
7: (clone()+0x6d) [0x7ff942afac0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal (Aborted) **
in thread 7ff933ef4700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7ff94447eff0]
3: (gsignal()+0x35) [0x7ff942a5d225]
4: (abort()+0x180) [0x7ff942a60030]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
6: (()+0xcb166) [0x7ff9432f0166]
7: (()+0xcb193) [0x7ff9432f0193]
8: (()+0xcb28e) [0x7ff9432f028e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x940) [0x78ae90]
10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
[0x56c3c0]
11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
13: (ThreadPool::worker()+0xb38) [0x7bbf78]
14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
15: (()+0x68ca) [0x7ff9444768ca]
16: (clone()+0x6d) [0x7ff942afac0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
0> 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal
(Aborted) **
in thread 7ff933ef4700
ceph version 0.47.2-521-g88c7629
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-osd() [0x70e429]
2: (()+0xeff0) [0x7ff94447eff0]
3: (gsignal()+0x35) [0x7ff942a5d225]
4: (abort()+0x180) [0x7ff942a60030]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
6: (()+0xcb166) [0x7ff9432f0166]
7: (()+0xcb193) [0x7ff9432f0193]
8: (()+0xcb28e) [0x7ff9432f028e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x940) [0x78ae90]
10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
[0x56c3c0]
11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
13: (ThreadPool::worker()+0xb38) [0x7bbf78]
14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
15: (()+0x68ca) [0x7ff9444768ca]
16: (clone()+0x6d) [0x7ff942afac0d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- end dump of recent events ---
* Re: still crashing osds with next branch
2012-06-20 13:30 ` Stefan Priebe - Profihost AG
@ 2012-06-20 17:19 ` Stefan Priebe
2012-06-20 17:35 ` Sage Weil
2012-06-20 22:56 ` Sage Weil
0 siblings, 2 replies; 9+ messages in thread
From: Stefan Priebe @ 2012-06-20 17:19 UTC
To: ceph-devel@vger.kernel.org
Does nobody have an idea? Should I open bugs in the tracker?
* Re: still crashing osds with next branch
2012-06-20 17:19 ` Stefan Priebe
@ 2012-06-20 17:35 ` Sage Weil
2012-06-20 18:10 ` Stefan Priebe
2012-06-20 22:56 ` Sage Weil
1 sibling, 1 reply; 9+ messages in thread
From: Sage Weil @ 2012-06-20 17:35 UTC
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
On Wed, 20 Jun 2012, Stefan Priebe wrote:
> Nobody an idea? Should i open up bugs in tracker?
Let's open up bugs. If they are reproducible, debug osd = 20 logs would
be awesome!
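For reference, a minimal sketch of how that logging level could be set in ceph.conf. This assumes the setting goes into the [osd] section so that it applies to every OSD on the host, and that the daemons are restarted afterwards to pick it up:

```ini
[osd]
    ; verbose OSD logging as requested above; expect large log files
    debug osd = 20
```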
Also, the crash you mentioned in your earlier email we did see:
http://tracker.newdream.net/issues/2599
If you have logs from that crash, those would also be helpful.
Thanks!
sage
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- begin dump of recent events ---
> > > 0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal
> > > (Aborted) **
> > > in thread 7f39d5c0a700
> > >
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f39e1089ff0]
> > > 3: (gsignal()+0x35) [0x7f39df668225]
> > > 4: (abort()+0x180) [0x7f39df66b030]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> > > 6: (()+0xcb166) [0x7f39dfefb166]
> > > 7: (()+0xcb193) [0x7f39dfefb193]
> > > 8: (()+0xcb28e) [0x7f39dfefb28e]
> > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x940) [0x78ae90]
> > > 10: /usr/bin/ceph-osd() [0x51a05d]
> > > 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 15: (()+0x68ca) [0x7f39e10818ca]
> > > 16: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- end dump of recent events ---
> > >
> > > Stefan
* Re: still crashing osds with next branch
2012-06-20 17:35 ` Sage Weil
@ 2012-06-20 18:10 ` Stefan Priebe
2012-06-20 18:11 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Priebe @ 2012-06-20 18:10 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
On 20.06.2012 at 19:35, Sage Weil wrote:
> On Wed, 20 Jun 2012, Stefan Priebe wrote:
>> Nobody an idea? Should i open up bugs in tracker?
>
> Let's open up bugs. If they are reproducible, debug osd = 20 logs would
> be awesome!
>
> Also, the crash you mentioned in your earlier email we did see:
>
> http://tracker.newdream.net/issues/2599
>
> If you have logs from that crash, those would also be helpful.
I can't reproduce it on demand, but it happens pretty often, so I can switch to
debug osd = 20 in general. What else do you need?
Stefan
* Re: still crashing osds with next branch
2012-06-20 18:10 ` Stefan Priebe
@ 2012-06-20 18:11 ` Sage Weil
2012-06-20 20:21 ` Stefan Priebe
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2012-06-20 18:11 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
On Wed, 20 Jun 2012, Stefan Priebe wrote:
> On 20.06.2012 at 19:35, Sage Weil wrote:
> > On Wed, 20 Jun 2012, Stefan Priebe wrote:
> > > Nobody an idea? Should i open up bugs in tracker?
> >
> > Let's open up bugs. If they are reproducible, debug osd = 20 logs would
> > be awesome!
> >
> > Also, the crash you mentioned in your earlier email we did see:
> >
> > http://tracker.newdream.net/issues/2599
> >
> > If you have logs from that crash, those would also be helpful.
>
> I can't reproduce it on demand, but it happens pretty often, so I can switch to
> debug osd = 20 in general. What else do you need?
That will probably be enough, but if you get a core file, don't throw it
away :)
thanks!
sage
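For reference, a minimal sketch of what "debug osd = 20" would look like in the configuration file, assuming the usual ceph.conf layout and that the setting goes in the [osd] section so it applies to every OSD on the host (the section placement is an assumption, not something stated in this thread):

```ini
[osd]
    ; verbose OSD logging as requested above; log volume grows considerably
    debug osd = 20
```

The OSD daemons would presumably need a restart to pick the setting up from the file.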
* Re: still crashing osds with next branch
2012-06-20 18:11 ` Sage Weil
@ 2012-06-20 20:21 ` Stefan Priebe
0 siblings, 0 replies; 9+ messages in thread
From: Stefan Priebe @ 2012-06-20 20:21 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
On 20.06.2012 at 20:11, Sage Weil wrote:
> On Wed, 20 Jun 2012, Stefan Priebe wrote:
>> On 20.06.2012 at 19:35, Sage Weil wrote:
>>> On Wed, 20 Jun 2012, Stefan Priebe wrote:
>>>> Nobody an idea? Should i open up bugs in tracker?
>>>
>>> Let's open up bugs. If they are reproducible, debug osd = 20 logs would
>>> be awesome!
>>>
>>> Also, the crash you mentioned in your earlier email we did see:
>>>
>>> http://tracker.newdream.net/issues/2599
>>>
>>> If you have logs from that crash, those would also be helpful.
>>
>> I can't reproduce it on demand, but it happens pretty often, so I can switch to
>> debug osd = 20 in general. What else do you need?
>
> That will probably be enough, but if you get a core file, don't throw it
> away :)
Hi, the first log is 2.6 GB and the core dump is 643 MB. Where should I put them?
Stefan
* Re: still crashing osds with next branch
2012-06-20 17:19 ` Stefan Priebe
2012-06-20 17:35 ` Sage Weil
@ 2012-06-20 22:56 ` Sage Weil
2012-06-21 5:54 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 9+ messages in thread
From: Sage Weil @ 2012-06-20 22:56 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
Just a quick update: there were some problems with doing a rolling
upgrade that may be responsible for these. We're testing the fix now.
Did this, by chance, happen on a cluster with a mix of 0.47.2 and 0.48?
sage
On Wed, 20 Jun 2012, Stefan Priebe wrote:
> Nobody an idea? Should i open up bugs in tracker?
>
> On 20.06.2012 at 15:30, Stefan Priebe - Profihost AG wrote:
> > Mhm always the same osd's are crashing now again. Mostly while shutting
> > down or restarting a KVM machine.
> >
> > This time:
> > ####### Server 1 ########################
> > 0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
> > (Segmentation fault) **
> > in thread 7f1664052700
> >
> > ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > 1: /usr/bin/ceph-osd() [0x70e429]
> > 2: (()+0xeff0) [0x7f16714d5ff0]
> > 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> > 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> > 5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
> > 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
> > 7: (()+0x68ca) [0x7f16714cd8ca]
> > 8: (clone()+0x6d) [0x7f166fb51c0d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > --- end dump of recent events ---
> >
> >
> > And the
> > ####### Server 2 ########################
> >
> > thread 7ff933ef4700 time 2012-06-20 15:20:12.450641
> > osd/ReplicatedPG.cc: 968: FAILED assert(obc->registered)
> >
> > ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > 1: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> > 2: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> > 3: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> > 4: (ThreadPool::worker()+0xb38) [0x7bbf78]
> > 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > 6: (()+0x68ca) [0x7ff9444768ca]
> > 7: (clone()+0x6d) [0x7ff942afac0d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > 0> 2012-06-20 15:20:12.466828 7ff939800700 -1 ./common/Mutex.h: In
> > function 'void Mutex::Lock(bool)' thread 7ff939800700 time
> > 2012-06-20 15:20:12.466152
> > ./common/Mutex.h: 110: FAILED assert(r == 0)
> >
> > ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > 1: /usr/bin/ceph-osd() [0x51a05d]
> > 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > 4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > 6: (()+0x68ca) [0x7ff9444768ca]
> > 7: (clone()+0x6d) [0x7ff942afac0d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > --- end dump of recent events ---
> > 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal (Aborted) **
> > in thread 7ff933ef4700
> >
> > ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > 1: /usr/bin/ceph-osd() [0x70e429]
> > 2: (()+0xeff0) [0x7ff94447eff0]
> > 3: (gsignal()+0x35) [0x7ff942a5d225]
> > 4: (abort()+0x180) [0x7ff942a60030]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
> > 6: (()+0xcb166) [0x7ff9432f0166]
> > 7: (()+0xcb193) [0x7ff9432f0193]
> > 8: (()+0xcb28e) [0x7ff9432f028e]
> > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x940) [0x78ae90]
> > 10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> > 11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> > 12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> > 13: (ThreadPool::worker()+0xb38) [0x7bbf78]
> > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > 15: (()+0x68ca) [0x7ff9444768ca]
> > 16: (clone()+0x6d) [0x7ff942afac0d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > --- begin dump of recent events ---
> > 0> 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal
> > (Aborted) **
> > in thread 7ff933ef4700
> >
> > ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > 1: /usr/bin/ceph-osd() [0x70e429]
> > 2: (()+0xeff0) [0x7ff94447eff0]
> > 3: (gsignal()+0x35) [0x7ff942a5d225]
> > 4: (abort()+0x180) [0x7ff942a60030]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
> > 6: (()+0xcb166) [0x7ff9432f0166]
> > 7: (()+0xcb193) [0x7ff9432f0193]
> > 8: (()+0xcb28e) [0x7ff9432f028e]
> > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x940) [0x78ae90]
> > 10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> > 11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> > 12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> > 13: (ThreadPool::worker()+0xb38) [0x7bbf78]
> > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > 15: (()+0x68ca) [0x7ff9444768ca]
> > 16: (clone()+0x6d) [0x7ff942afac0d]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> > --- end dump of recent events ---
> >
> >
> > > On 20.06.2012 at 12:03, Stefan Priebe - Profihost AG wrote:
> > > Hello list,
> > >
> > > i'm still seeing osd crashes with next branch under KVM load. If you
> > > need the core dump please tell me.
> > >
> > > Here are TWO different crashes.
> > >
> > > Here are the last log lines:
> > >
> > > ########### CRASH 1 ###########
> > >
> > > -3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b( v
> > > 105'29708 (103'28588,105'29708] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29708 active+degraded] watch:
> > > oi.user_version=28492
> > > -2> 2012-06-20 11:59:06.496350 7f166074a700 0 osd.13 105 pg[4.64b( v
> > > 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> > > ctx->obc=0x9f94840 cookie=1 oi.version=29709 ctx->at_version=105'29710
> > > -1> 2012-06-20 11:59:06.496386 7f166074a700 0 osd.13 105 pg[4.64b( v
> > > 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> > > oi.user_version=28492
> > > 0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
> > > (Segmentation fault) **
> > > in thread 7f1664052700
> > >
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f16714d5ff0]
> > > 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> > > 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> > > 5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
> > > 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
> > > 7: (()+0x68ca) [0x7f16714cd8ca]
> > > 8: (clone()+0x6d) [0x7f166fb51c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- end dump of recent events ---
> > >
> > >
> > > ########### CRASH 2 ###########
> > >
> > > 0> 2012-06-20 11:56:46.339027 7f39d5c0a700 -1 ./common/Mutex.h: In
> > > function 'void Mutex::Lock(bool)' thread 7f39d5c0a700 time 2012-06-20
> > > 11:56:46.338403
> > > ./common/Mutex.h: 110: FAILED assert(r == 0)
> > >
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x51a05d]
> > > 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 6: (()+0x68ca) [0x7f39e10818ca]
> > > 7: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- end dump of recent events ---
> > > 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
> > > in thread 7f39d5c0a700
> > >
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f39e1089ff0]
> > > 3: (gsignal()+0x35) [0x7f39df668225]
> > > 4: (abort()+0x180) [0x7f39df66b030]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> > > 6: (()+0xcb166) [0x7f39dfefb166]
> > > 7: (()+0xcb193) [0x7f39dfefb193]
> > > 8: (()+0xcb28e) [0x7f39dfefb28e]
> > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x940) [0x78ae90]
> > > 10: /usr/bin/ceph-osd() [0x51a05d]
> > > 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 15: (()+0x68ca) [0x7f39e10818ca]
> > > 16: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- begin dump of recent events ---
> > > 0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal
> > > (Aborted) **
> > > in thread 7f39d5c0a700
> > >
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f39e1089ff0]
> > > 3: (gsignal()+0x35) [0x7f39df668225]
> > > 4: (abort()+0x180) [0x7f39df66b030]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> > > 6: (()+0xcb166) [0x7f39dfefb166]
> > > 7: (()+0xcb193) [0x7f39dfefb193]
> > > 8: (()+0xcb28e) [0x7f39dfefb28e]
> > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x940) [0x78ae90]
> > > 10: /usr/bin/ceph-osd() [0x51a05d]
> > > 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 15: (()+0x68ca) [0x7f39e10818ca]
> > > 16: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > >
> > > --- end dump of recent events ---
> > >
> > > Stefan
* Re: still crashing osds with next branch
2012-06-20 22:56 ` Sage Weil
@ 2012-06-21 5:54 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 9+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-21 5:54 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
On 21.06.2012 at 00:56, Sage Weil wrote:
> Just a quick update: there were some problems with doing a rolling
> upgrade that may be responsible for these. We're testing the fix now.
>
> Did this, by chance, happen on a cluster with a mix of 0.47.2 and 0.48?
No, it was a clean install of the next branch, just two hours old.
Stefan
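Since several different backtraces are being compared in this thread, here is a minimal sketch of pulling the frames out of such a dump so that crashes can be grouped by their resolved symbols. The frame layout is inferred only from the dumps quoted above, not from any Ceph documentation, and the names `Frame` and `parse_frame` are hypothetical helpers introduced for illustration. Frames that the mail client wrapped across two lines have to be rejoined before parsing.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    symbol: Optional[str]   # None when the frame has no resolved symbol
    offset: Optional[int]   # offset inside the symbol, None if absent
    address: int            # absolute address from the [0x...] part


# Frame lines look like:
#   3: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
#   2: (()+0xeff0) [0x7f16714d5ff0]
#   1: /usr/bin/ceph-osd() [0x70e429]
# Leading "> " quote markers from email replies are tolerated.
FRAME_RE = re.compile(r"^\s*(?:>\s*)*(\d+):\s+(.*?)\s*\[0x([0-9a-f]+)\]\s*$")
SYMBOL_RE = re.compile(r"^\((.*)\+0x([0-9a-f]+)\)$")


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None for anything that is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    index = int(m.group(1))
    body = m.group(2)
    address = int(m.group(3), 16)
    sym = SYMBOL_RE.match(body)
    if sym is None:
        # e.g. "/usr/bin/ceph-osd()": only the binary name is known
        return Frame(index, None, None, address)
    name = sym.group(1)
    if name == "()":
        # "(()+0xeff0)": an offset into a library without a symbol name
        name = None
    return Frame(index, name, int(sym.group(2), 16), address)


if __name__ == "__main__":
    dump = """\
> > 1: /usr/bin/ceph-osd() [0x70e429]
> > 2: (()+0xeff0) [0x7f16714d5ff0]
> > 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> > 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> > --- end dump of recent events ---
"""
    frames = [f for f in map(parse_frame, dump.splitlines()) if f is not None]
    for f in frames:
        if f.symbol is not None:
            print(f.index, f.symbol, hex(f.address))
```

The list of resolved symbols can serve as a rough signature for telling the segfault in `OSD::disconnect_session_watches` apart from the `Mutex::Lock` assertion, without needing the binary at hand.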
Thread overview: 9+ messages
2012-06-20 10:03 still crashing osds with next branch Stefan Priebe - Profihost AG
2012-06-20 13:30 ` Stefan Priebe - Profihost AG
2012-06-20 17:19 ` Stefan Priebe
2012-06-20 17:35 ` Sage Weil
2012-06-20 18:10 ` Stefan Priebe
2012-06-20 18:11 ` Sage Weil
2012-06-20 20:21 ` Stefan Priebe
2012-06-20 22:56 ` Sage Weil
2012-06-21 5:54 ` Stefan Priebe - Profihost AG