* osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-14 14:04 UTC
To: ceph-devel

Hi,
today one of my OSDs died. The log is:

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread '7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
 in thread 7faeb6139700
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: /usr/bin/ceph-osd() [0x5b8b52]
 2: (()+0xfc60) [0x7faec4d1bc60]
 3: (gsignal()+0x35) [0x7faec34a1d05]
 4: (abort()+0x186) [0x7faec34a5ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
 6: (()+0xb9926) [0x7faec3d56926]
 7: (()+0xb9953) [0x7faec3d56953]
 8: (()+0xb9a5e) [0x7faec3d56a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x396) [0x5bddb6]
 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 13: (()+0x6d8c) [0x7faec4d12d8c]
 14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Gregory Farnum @ 2011-11-14 19:11 UTC
To: Martin Mailand; +Cc: ceph-devel

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost. :)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailand <martin@tuxadero.com> wrote:
> Anything else needed to debug this?
>
> -martin
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-14 19:45 UTC
To: Gregory Farnum; +Cc: ceph-devel

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always on,
would a core dump be the best solution?

-martin

Gregory Farnum wrote:
> Do you have any other system state? (More logs, core dumps.)
>
> Make a bug in the tracker either way so it doesn't get lost. :)
> -Greg
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Gregory Farnum @ 2011-11-14 19:54 UTC
To: martin; +Cc: ceph-devel

It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!

On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand <martin@tuxadero.com> wrote:
> Hi Gregory,
> I do not have more at the moment. As I cannot have the debug log always on,
> would a core dump be the best solution?
>
> -martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Sage Weil @ 2011-11-14 20:21 UTC
To: Gregory Farnum; +Cc: martin, ceph-devel

On Mon, 14 Nov 2011, Gregory Farnum wrote:
> It's not a big deal; logging is expensive. :) Just a backtrace isn't a
> lot to go on, but it's better than nothing!
>
> On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand <martin@tuxadero.com> wrote:
> > Hi Gregory,
> > I do not have more at the moment. As I cannot have the debug log always on,
> > would a core dump be the best solution?

I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread apply
all bt' may also be useful.

Thanks!
sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-15 19:57 UTC
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi,
I have a backtrace:
http://pastebin.com/QNcja2QK

-martin

Sage Weil wrote:
> I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread apply
> all bt' may also be useful.
>
> Thanks!
> sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-15 23:05 UTC
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi,
so, after a little help from Greg:

(gdb) print pending_ops
$1 = 0

-martin

Sage Weil wrote:
> I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread apply
> all bt' may also be useful.
>
> Thanks!
> sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Sage Weil @ 2011-11-16 21:12 UTC
To: Martin Mailand; +Cc: Gregory Farnum, ceph-devel

Hi Martin,

I've reread the code twice now and it's really not clear to me how
pending_ops could get out of sync with the actual queue size. I've pushed
a couple of patches that remove surrounding dead code and add an
additional assert sanity check to master. Have you seen this again, or
just that once?

Opened http://tracker.newdream.net/issues/1727

Thanks-
sage

On Wed, 16 Nov 2011, Martin Mailand wrote:
> Hi,
> so, after a little help from Greg:
>
> (gdb) print pending_ops
> $1 = 0
>
> -martin
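For readers following the thread, the invariant under discussion (a counter such as pending_ops that is supposed to mirror the number of queued ops) can be sketched roughly as below. This is not the Ceph code and not the patch that was pushed to master: CountedQueue, enqueue, dequeue, pending and check_invariant are made-up names, ops are plain ints for brevity, and the sketch assumes the counter and the queue are only ever touched under one lock.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a work queue whose length is mirrored by a
// separate counter. All names here are illustrative, not Ceph's.
class CountedQueue {
 public:
  void enqueue(int op) {
    std::lock_guard<std::mutex> l(lock_);
    ops_.push_back(op);
    ++pending_ops_;
    check_invariant();
  }

  // Pops the oldest op into *out. Returns false if nothing is queued.
  bool dequeue(int* out) {
    std::lock_guard<std::mutex> l(lock_);
    if (ops_.empty())
      return false;
    // Same shape as the assert that fired in the report: an op is about
    // to be consumed, so the counter must still be positive.
    assert(pending_ops_ > 0);
    *out = ops_.front();
    ops_.pop_front();
    --pending_ops_;
    check_invariant();
    return true;
  }

  int pending() const {
    std::lock_guard<std::mutex> l(lock_);
    return pending_ops_;
  }

 private:
  // Extra sanity check: counter and real queue length must agree after
  // every mutation. Caller must hold lock_.
  void check_invariant() const {
    assert(pending_ops_ >= 0);
    assert(static_cast<std::size_t>(pending_ops_) == ops_.size());
  }

  mutable std::mutex lock_;
  std::deque<int> ops_;
  int pending_ops_ = 0;
};
```

Checking agreement after every mutation means the first operation that breaks the pairing aborts on the spot, instead of a later dequeue tripping over a counter that is already wrong.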
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-17 12:07 UTC
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi Sage,
I have seen it only once, but the OSD node seems a bit dodgy. I re-imaged
the node today and will try again to reproduce it.

-martin

On 16.11.2011 22:12, Sage Weil wrote:
> Hi Martin,
>
> I've reread the code twice now and it's really not clear to me how
> pending_ops could get out of sync with the actual queue size. I've pushed
> a couple of patches that remove surrounding dead code and add an
> additional assert sanity check to master. Have you seen this again, or
> just that once?
>
> Opened http://tracker.newdream.net/issues/1727
>
> Thanks-
> sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
From: Martin Mailand @ 2011-11-24 13:23 UTC
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

Hi Sage,
I hit it again, this time on another OSD.

ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)

Thread 1 (Thread 2951):
#0  0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000005f5852 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  0x00000000005f5e4a in handle_fatal_signal (signum=6) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x00007f36ba0c2d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f36ba0c6ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f36ba9796dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f36ba977926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f36ba977953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f36ba977a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000005f6956 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70
#11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
#12 0x00000000005d4406 in ThreadPool::worker (this=0x25b0408) at common/WorkQueue.cc:54
#13 0x00000000005822dd in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:120
#14 0x00007f36bbb38d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f36ba17504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

(gdb) thread 1
[Switching to thread 1 (Thread 2951)]#0  0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 11
#11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
5518    osd/OSD.cc: No such file or directory.
        in osd/OSD.cc
(gdb) p pending_ops
$1 = 0

-martin

On 16.11.2011 22:12, Sage Weil wrote:
> Hi Martin,
>
> I've reread the code twice now and it's really not clear to me how
> pending_ops could get out of sync with the actual queue size. I've pushed
> a couple of patches that remove surrounding dead code and add an
> additional assert sanity check to master. Have you seen this again, or
> just that once?
>
> Opened http://tracker.newdream.net/issues/1727
>
> Thanks-
> sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
  2011-11-24 13:23 ` Martin Mailand
@ 2011-11-28 17:19 ` Sage Weil
  0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2011-11-28 17:19 UTC
To: Martin Mailand; +Cc: Gregory Farnum, ceph-devel

Hi Martin,

I reviewed this code again last week and realized the locking wasn't quite
right. And then that the pending_ops counter was largely useless. So most
of it has been simplified/rewritten now in master, and this problem will be
gone--at least in its current form.

Please let us know if you see any new issues with the latest master. (The
relevant commit is b47347bd7c377037f7fbc199f0c88b447c9626d1.)

Thanks-
sage

On Thu, 24 Nov 2011, Martin Mailand wrote:
> Hi Sage,
> I hit it again, this time on another osd
>
> ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
>
> Thread 1 (Thread 2951):
> #0 0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00000000005f5852 in reraise_fatal (signum=6) at global/signal_handler.cc:59
> #2 0x00000000005f5e4a in handle_fatal_signal (signum=6) at global/signal_handler.cc:106
> #3 <signal handler called>
> #4 0x00007f36ba0c2d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x00007f36ba0c6ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007f36ba9796dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7 0x00007f36ba977926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8 0x00007f36ba977953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9 0x00007f36ba977a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x00000000005f6956 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70
> #11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
> #12 0x00000000005d4406 in ThreadPool::worker (this=0x25b0408) at common/WorkQueue.cc:54
> #13 0x00000000005822dd in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:120
> #14 0x00007f36bbb38d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #15 0x00007f36ba17504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #16 0x0000000000000000 in ?? ()
> (gdb) thread 1
> [Switching to thread 1 (Thread 2951)]#0 0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) frame 11
> #11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
> 5518 osd/OSD.cc: No such file or directory.
>  in osd/OSD.cc
> (gdb) p pending_ops
> $1 = 0
>
> -martin
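For readers following the thread, here is a minimal sketch of the general idea behind the remark that the pending_ops counter was largely useless. It is not the Ceph code and not the change in the commit mentioned above; Op, OpQueue, enqueue, dequeue, pending and drain are hypothetical names chosen for illustration. The sketch keeps no separate counter at all and reads the number of pending ops from the container, under the same mutex that guards it, so the count and the queue cannot disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a queued operation; not a Ceph type.
struct Op {
  int id;
};

// Illustrative queue (an assumption for this sketch, not OSD's real queue).
// The pending count is derived from the container itself, so there is no
// second piece of state that could drift out of sync with it.
class OpQueue {
public:
  void enqueue(Op op) {
    std::lock_guard<std::mutex> guard(lock_);
    ops_.push_back(op);
  }

  // Pops the oldest op into *out and returns true, or returns false if the
  // queue is empty. An empty queue is reported to the caller, not asserted.
  bool dequeue(Op* out) {
    std::lock_guard<std::mutex> guard(lock_);
    if (ops_.empty())
      return false;
    *out = ops_.front();
    ops_.pop_front();
    return true;
  }

  // Number of queued ops, read under the same lock as every mutation.
  std::size_t pending() const {
    std::lock_guard<std::mutex> guard(lock_);
    return ops_.size();
  }

private:
  mutable std::mutex lock_;
  std::deque<Op> ops_;
};

// Single-threaded helper: drains the queue and returns how many ops it saw.
std::size_t drain(OpQueue& q) {
  std::size_t n = 0;
  Op op{0};
  while (q.dequeue(&op))
    ++n;
  // Holds here only because no other thread is enqueuing in this sketch.
  assert(q.pending() == 0);
  return n;
}
```

With this shape a worker calls dequeue() and simply handles a false return, instead of asserting on a hand-maintained count that has to be kept consistent at every enqueue and dequeue site.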
Thread overview: 11+ messages
2011-11-14 14:04 osd/OSD.cc: 5534: FAILED assert(pending_ops > 0) Martin Mailand
2011-11-14 19:11 ` Gregory Farnum
2011-11-14 19:45 ` Martin Mailand
2011-11-14 19:54 ` Gregory Farnum
2011-11-14 20:21 ` Sage Weil
2011-11-15 19:57 ` Martin Mailand
2011-11-15 23:05 ` Martin Mailand
2011-11-16 21:12 ` Sage Weil
2011-11-17 12:07 ` Martin Mailand
2011-11-24 13:23 ` Martin Mailand
2011-11-28 17:19 ` Sage Weil