From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: still crashing osds with next branch
Date: Wed, 20 Jun 2012 15:30:18 +0200
Message-ID: <4FE1D06A.1080906@profihost.ag>
In-Reply-To: <4FE19FF2.2090302@profihost.ag>

Hmm, the same OSDs are crashing again, mostly while shutting down or
restarting a KVM machine.

This time:
####### Server 1 ########################
      0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal (Segmentation fault) **
  in thread 7f1664052700

  ceph version 0.47.2-521-g88c7629 
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
  1: /usr/bin/ceph-osd() [0x70e429]
  2: (()+0xeff0) [0x7f16714d5ff0]
  3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
  4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
  5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
  6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
  7: (()+0x68ca) [0x7f16714cd8ca]
  8: (clone()+0x6d) [0x7f166fb51c0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- end dump of recent events ---


And on the second server:
####### Server 2 ########################

  thread 7ff933ef4700 time 2012-06-20 15:20:12.450641
osd/ReplicatedPG.cc: 968: FAILED assert(obc->registered)

  ceph version 0.47.2-521-g88c7629 
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
  1: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0) [0x56c3c0]
  2: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
  3: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
  4: (ThreadPool::worker()+0xb38) [0x7bbf78]
  5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
  6: (()+0x68ca) [0x7ff9444768ca]
  7: (clone()+0x6d) [0x7ff942afac0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

      0> 2012-06-20 15:20:12.466828 7ff939800700 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7ff939800700 time 2012-06-20 15:20:12.466152
./common/Mutex.h: 110: FAILED assert(r == 0)

  ceph version 0.47.2-521-g88c7629 
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
  1: /usr/bin/ceph-osd() [0x51a05d]
  2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
  3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
  4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
  5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
  6: (()+0x68ca) [0x7ff9444768ca]
  7: (clone()+0x6d) [0x7ff942afac0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- end dump of recent events ---
2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal (Aborted) **
  in thread 7ff933ef4700

  ceph version 0.47.2-521-g88c7629 
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
  1: /usr/bin/ceph-osd() [0x70e429]
  2: (()+0xeff0) [0x7ff94447eff0]
  3: (gsignal()+0x35) [0x7ff942a5d225]
  4: (abort()+0x180) [0x7ff942a60030]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
  6: (()+0xcb166) [0x7ff9432f0166]
  7: (()+0xcb193) [0x7ff9432f0193]
  8: (()+0xcb28e) [0x7ff9432f028e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x78ae90]
  10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0) [0x56c3c0]
  11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
  12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
  13: (ThreadPool::worker()+0xb38) [0x7bbf78]
  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
  15: (()+0x68ca) [0x7ff9444768ca]
  16: (clone()+0x6d) [0x7ff942afac0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- begin dump of recent events ---
      0> 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal (Aborted) **
  in thread 7ff933ef4700

  ceph version 0.47.2-521-g88c7629 
(commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
  1: /usr/bin/ceph-osd() [0x70e429]
  2: (()+0xeff0) [0x7ff94447eff0]
  3: (gsignal()+0x35) [0x7ff942a5d225]
  4: (abort()+0x180) [0x7ff942a60030]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
  6: (()+0xcb166) [0x7ff9432f0166]
  7: (()+0xcb193) [0x7ff9432f0193]
  8: (()+0xcb28e) [0x7ff9432f028e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x78ae90]
  10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0) [0x56c3c0]
  11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
  12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
  13: (ThreadPool::worker()+0xb38) [0x7bbf78]
  14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
  15: (()+0x68ca) [0x7ff9444768ca]
  16: (clone()+0x6d) [0x7ff942afac0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- end dump of recent events ---
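
In case it helps with reading the traces without the core dump: the dumps
note that the executable is needed to interpret the addresses. Below is a
minimal Python sketch, not part of Ceph, that pulls the numbered frames out
of a dump like the ones above and builds an addr2line command for the frames
that appear to lie in the ceph-osd binary itself. The helper names
(parse_frames, addr2line_command) are made up for illustration, and the
cutoff used to separate binary addresses from shared-library addresses is an
assumption based only on the address ranges visible in these logs. It
tolerates a frame that mail wrapping split across two lines, but it does not
strip "> " quote prefixes.

```python
import re

# One backtrace frame, e.g.
#   4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s*\[(0x[0-9a-fA-F]+)\]\s*$")
# A line that starts a frame ("<number>: ...").
START_RE = re.compile(r"^\s*\d+:\s")

# Assumption: addresses below this value belong to the main executable,
# higher ones (0x7f...) to shared libraries. True for the dumps above,
# not guaranteed in general (e.g. for position-independent executables).
LIB_CUTOFF = 0x7F0000000000


def parse_frames(text):
    """Return a list of (index, location, address) tuples from a dump."""
    frames = []
    pending = None  # first half of a frame that was wrapped onto two lines
    for line in text.splitlines():
        if pending is not None:
            # Glue the continuation line back onto the wrapped frame.
            line = pending + " " + line.strip()
            pending = None
        elif START_RE.match(line) and "[0x" not in line:
            # Frame start without an address: wait for the next line.
            pending = line.rstrip()
            continue
        match = FRAME_RE.match(line)
        if match:
            index, location, address = match.groups()
            frames.append((int(index), location, int(address, 16)))
    return frames


def addr2line_command(frames, executable="/usr/bin/ceph-osd"):
    """Build an addr2line invocation for frames inside the executable."""
    addresses = [hex(addr) for _, _, addr in frames if addr < LIB_CUTOFF]
    return " ".join(["addr2line", "-Cfe", executable] + addresses)
```

Calling addr2line_command(parse_frames(dump_text)) gives a command line to
run on the machine that has the matching (unstripped) ceph-osd binary; the
output is only meaningful if the binary is the exact build that crashed.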


On 20.06.2012 12:03, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I'm still seeing OSD crashes with the next branch under KVM load. If you
> need the core dump, please tell me.
>
> Here are TWO different crashes.
>
> Here are the last log lines:
>
> ########### CRASH 1 ###########
>
> -3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b( v
> 105'29708 (103'28588,105'29708] n=25 ec=56 les/c 105/105 104/104/104)
> [13] r=0 lpr=104 mlcod 105'29708 active+degraded] watch:
> oi.user_version=28492
> -2> 2012-06-20 11:59:06.496350 7f166074a700 0 osd.13 105 pg[4.64b( v
> 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> ctx->obc=0x9f94840 cookie=1 oi.version=29709 ctx->at_version=105'29710
> -1> 2012-06-20 11:59:06.496386 7f166074a700 0 osd.13 105 pg[4.64b( v
> 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> oi.user_version=28492
> 0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
> (Segmentation fault) **
> in thread 7f1664052700
>
> ceph version 0.47.2-521-g88c7629
> (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> 1: /usr/bin/ceph-osd() [0x70e429]
> 2: (()+0xeff0) [0x7f16714d5ff0]
> 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> 5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
> 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
> 7: (()+0x68ca) [0x7f16714cd8ca]
> 8: (clone()+0x6d) [0x7f166fb51c0d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- end dump of recent events ---
>
>
> ########### CRASH 2 ###########
>
> 0> 2012-06-20 11:56:46.339027 7f39d5c0a700 -1 ./common/Mutex.h: In
> function 'void Mutex::Lock(bool)' thread 7f39d5c0a700 time 2012-06-20
> 11:56:46.338403
> ./common/Mutex.h: 110: FAILED assert(r == 0)
>
> ceph version 0.47.2-521-g88c7629
> (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> 1: /usr/bin/ceph-osd() [0x51a05d]
> 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> 4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> 6: (()+0x68ca) [0x7f39e10818ca]
> 7: (clone()+0x6d) [0x7f39df705c0d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- end dump of recent events ---
> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
> in thread 7f39d5c0a700
>
> ceph version 0.47.2-521-g88c7629
> (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> 1: /usr/bin/ceph-osd() [0x70e429]
> 2: (()+0xeff0) [0x7f39e1089ff0]
> 3: (gsignal()+0x35) [0x7f39df668225]
> 4: (abort()+0x180) [0x7f39df66b030]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> 6: (()+0xcb166) [0x7f39dfefb166]
> 7: (()+0xcb193) [0x7f39dfefb193]
> 8: (()+0xcb28e) [0x7f39dfefb28e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x940) [0x78ae90]
> 10: /usr/bin/ceph-osd() [0x51a05d]
> 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> 15: (()+0x68ca) [0x7f39e10818ca]
> 16: (clone()+0x6d) [0x7f39df705c0d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- begin dump of recent events ---
> 0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal
> (Aborted) **
> in thread 7f39d5c0a700
>
> ceph version 0.47.2-521-g88c7629
> (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> 1: /usr/bin/ceph-osd() [0x70e429]
> 2: (()+0xeff0) [0x7f39e1089ff0]
> 3: (gsignal()+0x35) [0x7f39df668225]
> 4: (abort()+0x180) [0x7f39df66b030]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> 6: (()+0xcb166) [0x7f39dfefb166]
> 7: (()+0xcb193) [0x7f39dfefb193]
> 8: (()+0xcb28e) [0x7f39dfefb28e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x940) [0x78ae90]
> 10: /usr/bin/ceph-osd() [0x51a05d]
> 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> 15: (()+0x68ca) [0x7f39e10818ca]
> 16: (clone()+0x6d) [0x7f39df705c0d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- end dump of recent events ---
>
> Stefan
