Re: OSD crash - Fyodor Ustinov

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Fyodor Ustinov <ufm@ufm.su>
To: Gregory Farnum <gregf@hq.newdream.net>
Cc: ceph-devel@vger.kernel.org
Subject: Re: OSD crash
Date: Fri, 27 May 2011 19:41:50 +0300	[thread overview]
Message-ID: <4DDFD44E.8070305@ufm.su> (raw)
In-Reply-To: <BANLkTimxRbM95_LMJwAa+oUwj-HAePJM+Q@mail.gmail.com>

On 05/27/2011 06:16 PM, Gregory Farnum wrote:
> This is an interesting one -- the invariant that assert is checking
> isn't too complicated (that the object lives on the RecoveryWQ's
> queue) and seems to hold everywhere the RecoveryWQ is called. And the
> functions modifying the queue are always called under the workqueue
> lock, and do maintenance if the xlist::item is on a different list.
> Which makes me think that the problem must be from conflating the
> RecoveryWQ lock and the PG lock in the few places that modify the
> PG::recovery_item directly, rather than via RecoveryWQ functions.
> Anybody more familiar than me with this have ideas?
> Fyodor, based on the time stamps and output you've given us, I assume
> you don't have more detailed logs?
> -Greg

Greg, i got this crash again.
Let me tell you the configuration and what is happening:
Configuration:
6 osd servers. 4G RAM, 4*1T hdd (mdadmed to raid0), 2*1G etherchannel 
ethernet, Ubuntu server 11.04/64  with kernel 2.6.39 (hand compiled)
mon+mds server 24G RAM, the same os.

On each OSD Journal placed on 1G tempfs. OSD data - on xfs in this case.

Configuration file:

[global]
         max open files = 131072
         log file = /var/log/ceph/$name.log
         pid file = /var/run/ceph/$name.pid

[mon]
         mon data = /mfs/mon$id

[mon.0]
         mon addr  = 10.5.51.230:6789

[mds]
         keyring = /mfs/mds/keyring.$name

[mds.0]
         host = mds0


[osd]
         osd data = /$name
         osd journal = /journal/$name
         osd journal size = 950
         journal dio = false

[osd.0]
         host = osd0
         cluster addr = 10.5.51.10
         public addr = 10.5.51.140

[osd.1]
         host = osd1
         cluster addr = 10.5.51.11
         public addr = 10.5.51.141

[osd.2]
         host = osd2
         cluster addr = 10.5.51.12
         public addr = 10.5.51.142

[osd.3]
         host = osd3
         cluster addr = 10.5.51.13
         public addr = 10.5.51.143

[osd.4]
         host = osd4
         cluster addr = 10.5.51.14
         public addr = 10.5.51.144

[osd.5]
         host = osd5
         cluster addr = 10.5.51.15
         public addr = 10.5.51.145

What happening:
osd2 was crashed, rebooted, osd data and journal created from scratch by 
"cosd --mkfs -i 2 --monmap /tmp/monmap" and server started.
Additional - on osd2 enables "writeahaed", but I think it's not 
principal in this case.

Well, server start rebalancing:

2011-05-27 15:12:49.323558 7f3b69de5740 ceph version 0.28.1.commit: 
d66c6ca19bbde3c363b135b66072de44e67c6632. process: cosd. pid: 1694
2011-05-27 15:12:49.325331 7f3b69de5740 filestore(/osd.2) mount FIEMAP 
ioctl is NOT supported
2011-05-27 15:12:49.325378 7f3b69de5740 filestore(/osd.2) mount did NOT 
detect btrfs
2011-05-27 15:12:49.325467 7f3b69de5740 filestore(/osd.2) mount found 
snaps <>
2011-05-27 15:12:49.325512 7f3b69de5740 filestore(/osd.2) mount: 
WRITEAHEAD journal mode explicitly enabled in conf
2011-05-27 15:12:49.325526 7f3b69de5740 filestore(/osd.2) mount WARNING: 
not btrfs or ext3; data may be lost
2011-05-27 15:12:49.325606 7f3b69de5740 journal _open /journal/osd.2 fd 
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.325641 7f3b69de5740 journal read_entry 4096 : seq 1 
203 bytes
2011-05-27 15:12:49.325698 7f3b69de5740 journal _open /journal/osd.2 fd 
11: 996147200 bytes, block size 4096 bytes, directio = 0
2011-05-27 15:12:49.544716 7f3b59656700 -- 10.5.51.12:6801/1694 >> 
10.5.51.14:6801/5070 pipe(0x1239d20 sd=27 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544798 7f3b59c5c700 -- 10.5.51.12:6801/1694 >> 
10.5.51.13:6801/5165 pipe(0x104b950 sd=14 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544864 7f3b59757700 -- 10.5.51.12:6801/1694 >> 
10.5.51.15:6801/1574 pipe(0x11e7cd0 sd=16 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:12:49.544909 7f3b59959700 -- 10.5.51.12:6801/1694 >> 
10.5.51.10:6801/6148 pipe(0x11d7d30 sd=15 pgs=0 cs=0 l=0).accept we 
reset (peer sent cseq 2), sending RESETSESSION
2011-05-27 15:13:23.015637 7f3b64579700 journal check_for_full at 
66404352 : JOURNAL FULL 66404352 >= 851967 (max_size 996147200 start 
67256320)
2011-05-27 15:13:25.586081 7f3b5dc6b700 journal throttle: waited for bytes
2011-05-27 15:13:25.601789 7f3b5d46a700 journal throttle: waited for bytes

[...] and after 2 hours:

2011-05-27 17:30:21.355034 7f3b64579700 journal check_for_full at 
415199232 : JOURNAL FULL 415199232 >= 778239 (max_size 996147200 start 
415977472)
2011-05-27 17:30:23.441445 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:36.362877 7f3b64579700 journal check_for_full at 
414326784 : JOURNAL FULL 414326784 >= 872447 (max_size 996147200 start 
415199232)
2011-05-27 17:30:38.391372 7f3b5d46a700 journal throttle: waited for bytes
2011-05-27 17:30:50.373936 7f3b64579700 journal check_for_full at 
414314496 : JOURNAL FULL 414314496 >= 12287 (max_size 996147200 start 
414326784)
./include/xlist.h: In function 'void xlist<T>::remove(xlist<T>::item*) 
[with T = PG*]', in thread '0x7f3b5cc69700'
./include/xlist.h: 107: FAILED assert(i->_list == this)
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7f3b697b5d8c]
  6: (clone()+0x6d) [0x7f3b6866804d]
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  2: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  3: (ThreadPool::worker()+0x10a) [0x65799a]
  4: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  5: (()+0x6d8c) [0x7f3b697b5d8c]
  6: (clone()+0x6d) [0x7f3b6866804d]
*** Caught signal (Aborted) **
  in thread 0x7f3b5cc69700
  ceph version 0.28.1 (commit:d66c6ca19bbde3c363b135b66072de44e67c6632)
  1: /usr/bin/cosd() [0x6729f9]
  2: (()+0xfc60) [0x7f3b697bec60]
  3: (gsignal()+0x35) [0x7f3b685b5d05]
  4: (abort()+0x186) [0x7f3b685b9ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f3b68e6c6dd]
  6: (()+0xb9926) [0x7f3b68e6a926]
  7: (()+0xb9953) [0x7f3b68e6a953]
  8: (()+0xb9a5e) [0x7f3b68e6aa5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x362) [0x655e32]
  10: (xlist<PG*>::pop_front()+0xbb) [0x54f28b]
  11: (OSD::RecoveryWQ::_dequeue()+0x73) [0x56bcc3]
  12: (ThreadPool::worker()+0x10a) [0x65799a]
  13: (ThreadPool::WorkThread::entry()+0xd) [0x548c8d]
  14: (()+0x6d8c) [0x7f3b697b5d8c]
  15: (clone()+0x6d) [0x7f3b6866804d]

I.e. it's not "easy reproduced" bug. While I had less data in the 
cluster - I not seen this error.

I think that I do not have enough space for "full" log for 2-3 hours. Sorry.

WBR,
     Fyodor.

next prev parent reply	other threads:[~2011-05-27 16:41 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-27  0:12 OSD crash Fyodor Ustinov
2011-05-27 15:16 ` Gregory Farnum
2011-05-27 16:41   ` Fyodor Ustinov [this message]
2011-05-27 16:49     ` Gregory Farnum
2011-05-27 19:18       ` Gregory Farnum
2011-05-27 19:30         ` Fyodor Ustinov
2011-05-27 22:52         ` Fyodor Ustinov
  -- strict thread matches above, loose matches on Subject: below --
2020-09-07 16:42 osd crash Kaarlo Lahtela
     [not found] <8566685.312.1362419807745.JavaMail.dspano@it1>
2013-03-04 18:02 ` OSD Crash Dave Spano
2012-08-22 20:31 OSD crash Andrey Korolyov
2012-08-22 22:33 ` Sage Weil
2012-08-22 22:55   ` Andrey Korolyov
2012-08-23  0:09     ` Gregory Farnum
2012-08-25  8:30       ` Andrey Korolyov
2012-08-26 16:52         ` Andrey Korolyov
2012-08-26 20:44           ` Sage Weil
2012-09-04  8:13           ` Andrey Korolyov
2012-09-04 15:32             ` Sage Weil
2012-06-16 12:57 Stefan Priebe
2012-06-16 13:34 ` Stefan Priebe
2012-06-17 21:16   ` Sage Weil
2012-06-18  6:41     ` Stefan Priebe - Profihost AG
2011-05-11 20:47 OSD Crash Mark Nigh
2011-05-11 21:06 ` Sage Weil
2011-05-11 21:39 ` Colin McCabe
2011-05-13 17:03   ` Mark Nigh
2011-05-13 18:34     ` Sage Weil
2011-05-11 13:12 Mark Nigh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DDFD44E.8070305@ufm.su \
    --to=ufm@ufm.su \
    --cc=ceph-devel@vger.kernel.org \
    --cc=gregf@hq.newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.