* osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
@ 2011-11-14 14:04 Martin Mailand
2011-11-14 19:11 ` Gregory Farnum
0 siblings, 1 reply; 11+ messages in thread
From: Martin Mailand @ 2011-11-14 14:04 UTC (permalink / raw)
To: ceph-devel
Hi,
today one of my OSDs died. The log is below.
osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread '7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
4: (()+0x6d8c) [0x7faec4d12d8c]
5: (clone()+0x6d) [0x7faec355404d]
ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
4: (()+0x6d8c) [0x7faec4d12d8c]
5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
in thread 7faeb6139700
ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
1: /usr/bin/ceph-osd() [0x5b8b52]
2: (()+0xfc60) [0x7faec4d1bc60]
3: (gsignal()+0x35) [0x7faec34a1d05]
4: (abort()+0x186) [0x7faec34a5ab6]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
6: (()+0xb9926) [0x7faec3d56926]
7: (()+0xb9953) [0x7faec3d56953]
8: (()+0xb9a5e) [0x7faec3d56a5e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x396) [0x5bddb6]
10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
13: (()+0x6d8c) [0x7faec4d12d8c]
14: (clone()+0x6d) [0x7faec355404d]
Anything else needed to debug this?
-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 14:04 osd/OSD.cc: 5534: FAILED assert(pending_ops > 0) Martin Mailand
@ 2011-11-14 19:11 ` Gregory Farnum
2011-11-14 19:45 ` Martin Mailand
0 siblings, 1 reply; 11+ messages in thread
From: Gregory Farnum @ 2011-11-14 19:11 UTC (permalink / raw)
To: Martin Mailand; +Cc: ceph-devel
Do you have any other system state? (More logs, core dumps.)
Please file a bug in the tracker either way so it doesn't get lost. :)
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 19:11 ` Gregory Farnum
@ 2011-11-14 19:45 ` Martin Mailand
2011-11-14 19:54 ` Gregory Farnum
0 siblings, 1 reply; 11+ messages in thread
From: Martin Mailand @ 2011-11-14 19:45 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel
Hi Gregory,
I do not have anything more at the moment. Since I cannot leave debug
logging on all the time, would a core dump be the best solution?
-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 19:45 ` Martin Mailand
@ 2011-11-14 19:54 ` Gregory Farnum
2011-11-14 20:21 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Gregory Farnum @ 2011-11-14 19:54 UTC (permalink / raw)
To: martin; +Cc: ceph-devel
It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 19:54 ` Gregory Farnum
@ 2011-11-14 20:21 ` Sage Weil
2011-11-15 19:57 ` Martin Mailand
2011-11-15 23:05 ` Martin Mailand
0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2011-11-14 20:21 UTC (permalink / raw)
To: Gregory Farnum; +Cc: martin, ceph-devel
On Mon, 14 Nov 2011, Gregory Farnum wrote:
> It's not a big deal; logging is expensive. :) Just a backtrace isn't a
> lot to go on, but it's better than nothing!
>
> On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand <martin@tuxadero.com> wrote:
> > Hi Gregory,
> > I do not have more at the moment. As I cannot have the debug log always on,
> > a core dump would be the best solution?
I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread apply
all bt' may also be useful.
Thanks!
sage
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 20:21 ` Sage Weil
@ 2011-11-15 19:57 ` Martin Mailand
2011-11-15 23:05 ` Martin Mailand
1 sibling, 0 replies; 11+ messages in thread
From: Martin Mailand @ 2011-11-15 19:57 UTC (permalink / raw)
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel
Hi,
I have a backtrace:
http://pastebin.com/QNcja2QK
-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-14 20:21 ` Sage Weil
2011-11-15 19:57 ` Martin Mailand
@ 2011-11-15 23:05 ` Martin Mailand
2011-11-16 21:12 ` Sage Weil
1 sibling, 1 reply; 11+ messages in thread
From: Martin Mailand @ 2011-11-15 23:05 UTC (permalink / raw)
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel
Hi,
After a little help from Greg:
(gdb) print pending_ops
$1 = 0
-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-15 23:05 ` Martin Mailand
@ 2011-11-16 21:12 ` Sage Weil
2011-11-17 12:07 ` Martin Mailand
2011-11-24 13:23 ` Martin Mailand
0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2011-11-16 21:12 UTC (permalink / raw)
To: Martin Mailand; +Cc: Gregory Farnum, ceph-devel
Hi Martin,
I've reread the code twice now and it's really not clear to me how
pending_ops could get out of sync with the actual queue size. I've pushed
a couple of patches that remove surrounding dead code and add an
additional assert sanity check to master. Have you seen this again, or
just that once?
Opened http://tracker.newdream.net/issues/1727
Thanks-
sage
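
The invariant discussed above, a pending_ops counter that must stay in sync with the actual queue size, can be illustrated with a small model. This is only a sketch under stated assumptions, not the Ceph code: the class OpQueueModel, its methods, and the single mutex are hypothetical names chosen for illustration, and only the identifier pending_ops comes from the thread. It shows the kind of extra sanity check that compares the counter with the real queue size on every enqueue and dequeue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical model of an op queue with a separate pending counter.
// None of these names come from the Ceph source except pending_ops.
class OpQueueModel {
public:
  // Add an op and bump the counter under the same lock.
  void enqueue(int op) {
    std::lock_guard<std::mutex> guard(lock_);
    ops_.push_back(op);
    ++pending_ops_;
    check_in_sync();
  }

  // Remove the oldest op. Returns false (instead of aborting) when the
  // counter says nothing is pending, so the caller can report the state.
  bool dequeue(int &op) {
    std::lock_guard<std::mutex> guard(lock_);
    check_in_sync();
    if (pending_ops_ <= 0)
      return false;
    op = ops_.front();
    ops_.pop_front();
    --pending_ops_;
    return true;
  }

  // Current value of the counter, read under the lock.
  int pending_ops() const {
    std::lock_guard<std::mutex> guard(lock_);
    return pending_ops_;
  }

private:
  // Sanity check: the counter must always equal the real queue size.
  void check_in_sync() const {
    assert(pending_ops_ >= 0);
    assert(static_cast<std::size_t>(pending_ops_) == ops_.size());
  }

  mutable std::mutex lock_;
  std::deque<int> ops_;
  int pending_ops_ = 0;
};
```

In this model the counter and the queue are only ever modified together under one lock, so the check cannot fire. If a check of this kind did fire in real code, it would point at a path that changes one of the two without the other, or without holding the lock.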
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-16 21:12 ` Sage Weil
@ 2011-11-17 12:07 ` Martin Mailand
2011-11-24 13:23 ` Martin Mailand
1 sibling, 0 replies; 11+ messages in thread
From: Martin Mailand @ 2011-11-17 12:07 UTC (permalink / raw)
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel
Hi Sage,
I have only seen it once, but the OSD node seems a bit dodgy. I re-imaged
the node today and will try to reproduce it again.
-martin
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-16 21:12 ` Sage Weil
2011-11-17 12:07 ` Martin Mailand
@ 2011-11-24 13:23 ` Martin Mailand
2011-11-28 17:19 ` Sage Weil
1 sibling, 1 reply; 11+ messages in thread
From: Martin Mailand @ 2011-11-24 13:23 UTC (permalink / raw)
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel
Hi Sage,
I hit it again, this time on another OSD.
ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
Thread 1 (Thread 2951):
#0 0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000005f5852 in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2 0x00000000005f5e4a in handle_fatal_signal (signum=6) at global/signal_handler.cc:106
#3 <signal handler called>
#4 0x00007f36ba0c2d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f36ba0c6ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f36ba9796dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f36ba977926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f36ba977953 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007f36ba977a5e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00000000005f6956 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70
#11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
#12 0x00000000005d4406 in ThreadPool::worker (this=0x25b0408) at common/WorkQueue.cc:54
#13 0x00000000005822dd in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:120
#14 0x00007f36bbb38d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f36ba17504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 2951)]#0 0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 11
#11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized out>) at osd/OSD.cc:5518
5518 osd/OSD.cc: No such file or directory.
in osd/OSD.cc
(gdb) p pending_ops
$1 = 0
-martin
Am 16.11.2011 22:12, schrieb Sage Weil:
> Hi Martin,
>
> I've reread the code twice now and it's really not clear to me how
> pending_ops could get out of sync with the actual queue size. I've pushed
> a couple of patches that remove surrounding dead code and add an
> additional assert sanity check to master. Have you seen this again, or
> just that once?
>
> Opened http://tracker.newdream.net/issues/1727
>
> Thanks-
> sage
>
>
> On Wed, 16 Nov 2011, Martin Mailand wrote:
>
>> Hi,
>> so after a little help from greg.
>>
>> (gdb) print pending_ops
>> $1 = 0
>>
>> -martin
>>
>> Sage Weil wrote:
>>> On Mon, 14 Nov 2011, Gregory Farnum wrote:
>>>> It's not a big deal; logging is expensive. :) Just a backtrace isn't a
>>>> lot to go on, but it's better than nothing!
>>>>
>>>> On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand<martin@tuxadero.com>
>>>> wrote:
>>>>> Hi Gregory,
>>>>> I do not have more at the moment. As I cannot have the debug log always
>>>>> on,
>>>>> a core dump would be the best solution?
>>>
>>> I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread apply
>>> all bt' may also be useful.
>>>
>>> Thanks!
>>> sage
>>>
>>>
>>>>> -martin
>>>>>
>>>>> Gregory Farnum wrote:
>>>>>> Do you have any other system state? (More logs, core dumps.)
>>>>>>
>>>>>> Make a bug in the tracker either way so it doesn't get lost track of.
>>>>>> :)
>>>>>> -Greg
>>>>>>
>>>>>> On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailand<martin@tuxadero.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>> today one of my ods died, the log is.
>>>>>>>
>>>>>>> sd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
>>>>>>> '7faeb6139700'
>>>>>>> osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
>>>>>>> ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
>>>>>>> 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
>>>>>>> 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
>>>>>>> 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
>>>>>>> 4: (()+0x6d8c) [0x7faec4d12d8c]
>>>>>>> 5: (clone()+0x6d) [0x7faec355404d]
>>>>>>> ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
>>>>>>> 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
>>>>>>> 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
>>>>>>> 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
>>>>>>> 4: (()+0x6d8c) [0x7faec4d12d8c]
>>>>>>> 5: (clone()+0x6d) [0x7faec355404d]
>>>>>>> *** Caught signal (Aborted) **
>>>>>>> in thread 7faeb6139700
>>>>>>> ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
>>>>>>> 1: /usr/bin/ceph-osd() [0x5b8b52]
>>>>>>> 2: (()+0xfc60) [0x7faec4d1bc60]
>>>>>>> 3: (gsignal()+0x35) [0x7faec34a1d05]
>>>>>>> 4: (abort()+0x186) [0x7faec34a5ab6]
>>>>>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
>>>>>>> [0x7faec3d586dd]
>>>>>>> 6: (()+0xb9926) [0x7faec3d56926]
>>>>>>> 7: (()+0xb9953) [0x7faec3d56953]
>>>>>>> 8: (()+0xb9a5e) [0x7faec3d56a5e]
>>>>>>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>>>> const*)+0x396) [0x5bddb6]
>>>>>>> 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
>>>>>>> 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
>>>>>>> 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
>>>>>>> 13: (()+0x6d8c) [0x7faec4d12d8c]
>>>>>>> 14: (clone()+0x6d) [0x7faec355404d]
>>>>>>>
>>>>>>> Anything else needed to debug this?
>>>>>>>
>>>>>>> -martin
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>> ceph-devel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
2011-11-24 13:23 ` Martin Mailand
@ 2011-11-28 17:19 ` Sage Weil
0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2011-11-28 17:19 UTC (permalink / raw)
To: Martin Mailand; +Cc: Gregory Farnum, ceph-devel
Hi Martin,
I reviewed this code again last week and realized that the locking wasn't
quite right, and then that the pending_ops counter was largely useless. So
most of it has been simplified/rewritten now in master, and this problem
will be gone--at least in its current form.
Please let us know if you see any new issues with the latest master. (The
relevant commit is b47347bd7c377037f7fbc199f0c88b447c9626d1.)
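For illustration only (this is not the code from the commit above): a
minimal C++17 sketch of the general idea of dropping a separately
maintained counter, assuming a simple mutex-protected queue. OpQueue and
drain_concurrently are hypothetical names and do not come from the Ceph
tree. The number of pending ops is derived from the container under the
same lock, so there is no second variable that can drift out of sync.

```cpp
// Hypothetical sketch; OpQueue and drain_concurrently are illustrative
// names, not Ceph's OSD code.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

class OpQueue {
  std::mutex lock;
  std::deque<int> ops;  // the container is the only record of what is pending

public:
  void enqueue(int op) {
    std::lock_guard<std::mutex> l(lock);
    ops.push_back(op);
  }

  // Returns the next op, or std::nullopt when nothing is queued.
  std::optional<int> dequeue() {
    std::lock_guard<std::mutex> l(lock);
    if (ops.empty())
      return std::nullopt;
    int op = ops.front();
    ops.pop_front();
    return op;
  }

  // Derived under the same lock, so it cannot disagree with the queue.
  std::size_t pending() {
    std::lock_guard<std::mutex> l(lock);
    return ops.size();
  }
};

// Fills a queue with n_ops entries, drains it from n_workers threads, and
// returns how many ops were dequeued in total.
std::size_t drain_concurrently(std::size_t n_ops, unsigned n_workers) {
  OpQueue q;
  for (std::size_t i = 0; i < n_ops; ++i)
    q.enqueue(static_cast<int>(i));

  std::atomic<std::size_t> handled{0};
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < n_workers; ++w)
    workers.emplace_back([&q, &handled] {
      // An empty queue is a normal result here, not an assertion failure.
      while (q.dequeue())
        ++handled;
    });
  for (auto &t : workers)
    t.join();

  assert(q.pending() == 0);  // everything queued has been handed out
  return handled.load();
}
```

Calling drain_concurrently(1000, 4) from a main() should hand out each of
the 1000 ops exactly once, however the worker threads interleave.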
Thanks-
sage
On Thu, 24 Nov 2011, Martin Mailand wrote:
> Hi Sage,
> I hit it again, this time on another osd
>
> ceph version 0.38-181-g2e19550
> (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
>
> Thread 1 (Thread 2951):
> #0 0x00007f36bbb41b3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00000000005f5852 in reraise_fatal (signum=6) at
> global/signal_handler.cc:59
> #2 0x00000000005f5e4a in handle_fatal_signal (signum=6) at
> global/signal_handler.cc:106
> #3 <signal handler called>
> #4 0x00007f36ba0c2d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #5 0x00007f36ba0c6ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007f36ba9796dd in __gnu_cxx::__verbose_terminate_handler() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7 0x00007f36ba977926 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8 0x00007f36ba977953 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9 0x00007f36ba977a5e in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x00000000005f6956 in ceph::__ceph_assert_fail (assertion=<value optimized
> out>, file=<value optimized out>, line=<value optimized out>,
> func=<value optimized out>) at common/assert.cc:70
> #11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized
> out>) at osd/OSD.cc:5518
> #12 0x00000000005d4406 in ThreadPool::worker (this=0x25b0408) at
> common/WorkQueue.cc:54
> #13 0x00000000005822dd in ThreadPool::WorkThread::entry (this=<value optimized
> out>) at ./common/WorkQueue.h:120
> #14 0x00007f36bbb38d8c in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #15 0x00007f36ba17504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #16 0x0000000000000000 in ?? ()
> (gdb) thread 1
> [Switching to thread 1 (Thread 2951)]#0 0x00007f36bbb41b3b in raise () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) frame 11
> #11 0x000000000056616a in OSD::dequeue_op (this=0x25b0000, pg=<value optimized
> out>) at osd/OSD.cc:5518
> 5518 osd/OSD.cc: No such file or directory.
> in osd/OSD.cc
> (gdb) p pending_ops
> $1 = 0
>
>
>
> -martin
>
>
> On 16.11.2011 22:12, Sage Weil wrote:
> > Hi Martin,
> >
> > I've reread the code twice now and it's really not clear to me how
> > pending_ops could get out of sync with the actual queue size. I've pushed
> > a couple of patches that remove surrounding dead code and add an
> > additional assert sanity check to master. Have you seen this again, or
> > just that once?
> >
> > Opened http://tracker.newdream.net/issues/1727
> >
> > Thanks-
> > sage
> >
> >
> > On Wed, 16 Nov 2011, Martin Mailand wrote:
> >
> > > Hi,
> > > so after a little help from greg.
> > >
> > > (gdb) print pending_ops
> > > $1 = 0
> > >
> > > -martin
> > >
> > > Sage Weil wrote:
> > > > On Mon, 14 Nov 2011, Gregory Farnum wrote:
> > > > > It's not a big deal; logging is expensive. :) Just a backtrace isn't a
> > > > > lot to go on, but it's better than nothing!
> > > > >
> > > > > On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand<martin@tuxadero.com>
> > > > > wrote:
> > > > > > Hi Gregory,
> > > > > > I do not have more at the moment. As I cannot have the debug log
> > > > > > always
> > > > > > on,
> > > > > > a core dump would be the best solution?
> > > >
> > > > I'm mainly interested in whether pending_ops is 0 or < 0. A 'thread
> > > > apply
> > > > all bt' may also be useful.
> > > >
> > > > Thanks!
> > > > sage
> > > >
> > > >
> > > > > > -martin
> > > > > >
> > > > > > Gregory Farnum wrote:
> > > > > > > Do you have any other system state? (More logs, core dumps.)
> > > > > > >
> > > > > > > Make a bug in the tracker either way so it doesn't get lost track
> > > > > > > of.
> > > > > > > :)
> > > > > > > -Greg
> > > > > > >
> > > > > > > On Mon, Nov 14, 2011 at 6:04 AM, Martin
> > > > > > > Mailand<martin@tuxadero.com>
> > > > > > > wrote:
> > > > > > > > Hi,
> > > > > > > > today one of my ods died, the log is.
> > > > > > > >
> > > > > > > > sd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
> > > > > > > > '7faeb6139700'
> > > > > > > > osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
> > > > > > > > ceph version 0.38
> > > > > > > > (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
> > > > > > > > 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
> > > > > > > > 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
> > > > > > > > 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
> > > > > > > > 4: (()+0x6d8c) [0x7faec4d12d8c]
> > > > > > > > 5: (clone()+0x6d) [0x7faec355404d]
> > > > > > > > ceph version 0.38
> > > > > > > > (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
> > > > > > > > 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
> > > > > > > > 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
> > > > > > > > 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
> > > > > > > > 4: (()+0x6d8c) [0x7faec4d12d8c]
> > > > > > > > 5: (clone()+0x6d) [0x7faec355404d]
> > > > > > > > *** Caught signal (Aborted) **
> > > > > > > > in thread 7faeb6139700
> > > > > > > > ceph version 0.38
> > > > > > > > (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
> > > > > > > > 1: /usr/bin/ceph-osd() [0x5b8b52]
> > > > > > > > 2: (()+0xfc60) [0x7faec4d1bc60]
> > > > > > > > 3: (gsignal()+0x35) [0x7faec34a1d05]
> > > > > > > > 4: (abort()+0x186) [0x7faec34a5ab6]
> > > > > > > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
> > > > > > > > [0x7faec3d586dd]
> > > > > > > > 6: (()+0xb9926) [0x7faec3d56926]
> > > > > > > > 7: (()+0xb9953) [0x7faec3d56953]
> > > > > > > > 8: (()+0xb9a5e) [0x7faec3d56a5e]
> > > > > > > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int,
> > > > > > > > char
> > > > > > > > const*)+0x396) [0x5bddb6]
> > > > > > > > 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
> > > > > > > > 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
> > > > > > > > 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
> > > > > > > > 13: (()+0x6d8c) [0x7faec4d12d8c]
> > > > > > > > 14: (clone()+0x6d) [0x7faec355404d]
> > > > > > > >
> > > > > > > > Anything else needed to debug this?
> > > > > > > >
> > > > > > > > -martin
^ permalink raw reply [flat|nested] 11+ messages in thread
Thread overview: 11+ messages
2011-11-14 14:04 osd/OSD.cc: 5534: FAILED assert(pending_ops > 0) Martin Mailand
2011-11-14 19:11 ` Gregory Farnum
2011-11-14 19:45 ` Martin Mailand
2011-11-14 19:54 ` Gregory Farnum
2011-11-14 20:21 ` Sage Weil
2011-11-15 19:57 ` Martin Mailand
2011-11-15 23:05 ` Martin Mailand
2011-11-16 21:12 ` Sage Weil
2011-11-17 12:07 ` Martin Mailand
2011-11-24 13:23 ` Martin Mailand
2011-11-28 17:19 ` Sage Weil