Still inconsistent PGs, ceph-osd crashes reliably after trying to repair

* Still inconsistent PGs, ceph-osd crashes reliably after trying to repair
From: Oliver Francke @ 2012-03-01 17:15 UTC
  To: ceph-devel

Hi *,

After some crashes we still had to deal with some remaining
inconsistencies reported via
     ceph -w
and friends.
Well, we traced one of them down via
     ceph pg dump

and we picked pool 79, pg 79.7. The corresponding file, as named in
/var/log/ceph/osd.2.log, is
     /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
with the duplicate on
     /data/osd2/...
Strangely, both copies had the same checksum, yet a stat error was
reported. Anyway, we decided to run
     ceph pg repair 79.7
... and bye-bye ceph-osd on node2!
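
The manual comparison described above (same checksum on both copies, yet a stat error) can be sketched in Python. This is only an illustration under stated assumptions: the helper names `file_digest` and `compare_replicas` are hypothetical, SHA-256 is an arbitrary choice (the message does not say which checksum was used), and the path of the second copy has to be supplied by hand because it is elided above.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_replicas(path_a, path_b):
    """Compare two on-disk copies of one object (hypothetical helper).

    Reports whether the file contents and the stat sizes agree.
    """
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    return {
        "same_content": file_digest(path_a) == file_digest(path_b),
        "same_size": st_a.st_size == st_b.st_size,
        "sizes": (st_a.st_size, st_b.st_size),
    }
```

If content and size both agree, the difference reported by scrub presumably lies in the metadata rather than in the data itself.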

Here the trace:

=== 8-< ===

2012-03-01 17:49:13.024571 7f3944584700 -- 10.10.10.14:6802/4892 >> 10.10.10.10:6802/19139 pipe(0xfcd2c80 sd=16 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
2012-03-01 17:49:23.674162 7f395001b700 log [ERR] : 79.7 osd.4: soid 9fb2fa17/rb.0.0.00000000136c/headextra attr _, extra attr snapset
2012-03-01 17:49:23.674222 7f395001b700 log [ERR] : 79.7 repair 0 missing, 1 inconsistent objects
*** Caught signal (Aborted) **
  in thread 7f395001b700
  ceph version 0.42-142-gc9416e6 (commit:c9416e6184905501159e96115f734bdf65a74d28)
  1: /usr/bin/ceph-osd() [0x5a6b89]
  2: (()+0xeff0) [0x7f3960ca5ff0]
  3: (gsignal()+0x35) [0x7f395f2841b5]
  4: (abort()+0x180) [0x7f395f286fc0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f395fb18dc5]
  6: (()+0xcb166) [0x7f395fb17166]
  7: (()+0xcb193) [0x7f395fb17193]
  8: (()+0xcb28e) [0x7f395fb1728e]
  9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x13e) [0x67c5ce]
  10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x2c) [0x61663c]
  11: (PG::repair_object(hobject_t const&, ScrubMap::object*, int, int)+0x3be) [0x68d96e]
  12: (PG::scrub_finalize()+0x1438) [0x6b8568]
  13: (OSD::ScrubFinalizeWQ::_process(PG*)+0xc) [0x588edc]
  14: (ThreadPool::worker()+0xa26) [0x5bc426]
  15: (ThreadPool::WorkThread::entry()+0xd) [0x585f0d]
  16: (()+0x68ca) [0x7f3960c9d8ca]
  17: (clone()+0x6d) [0x7f395f32186d]
2012-03-01 17:49:30.017269 7f81b662b780 ceph version 0.42-142-gc9416e6 (commit:c9416e6184905501159e96115f734bdf65a74d28), process ceph-osd, pid 3111
2012-03-01 17:49:30.085426 7f81b662b780 filestore(/data/osd2) mount FIEMAP ioctl is NOT supported
2012-03-01 17:49:30.085466 7f81b662b780 filestore(/data/osd2) mount did NOT detect btrfs
2012-03-01 17:49:30.110409 7f81b662b780 filestore(/data/osd2) mount found snaps <>
2012-03-01 17:49:30.110476 7f81b662b780 filestore(/data/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2012-03-01 17:49:31.964977 7f81b662b780 journal _open /dev/sdc1 fd 16: 10737942528 bytes, block size 4096 bytes, directio = 1, aio = 0
2012-03-01 17:49:31.967549 7f81b662b780 journal read_entry 9292222464 : seq 67841857 11225 bytes

=== 8-< ===

... after some journal replay, things calmed down, but:

     2012-03-01 17:58:29.470446   log 2012-03-01 17:58:24.242369 osd.2 10.10.10.14:6801/3111 368 : [WRN] bad locator @56 on object @79 loc @56 op osd_op(client.44350.0:1412387 rb.0.0.00000000136c [write 2465792~49152] 56.9fb2fa17) v4

We see this type of message every so often... It refers to the same
object, but how is it related?

Can't we assume that all is well if both "rb.0.0..." snippets are
identical?
We had some other inconsistencies where we had to delete the whole pool
to get rid of the bad blocks. The ceph-osd died there too, after running
     rbd rm <pool>/<image>
and the one block in question remained, visible via
     rados ls -p <pool>

Any idea, or better, a clue? ;-)

Kind regards,

Oliver.

-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Still inconsistent PGs, ceph-osd crashes reliably after trying to repair
From: Oliver Francke @ 2012-03-01 18:07 UTC
  To: ceph-devel

Well,

On 01.03.2012 at 18:15, Oliver Francke wrote:

> Hi *,
> 
> After some crashes we still had to deal with some remaining inconsistencies reported via
>    ceph -w
> and friends.
> Well, we traced one of them down via
>    ceph pg dump
> 
> and we picked pool 79, pg 79.7. The corresponding file, as named in /var/log/ceph/osd.2.log, is
>    /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
> with the duplicate on
>    /data/osd2/...
> Strangely, both copies had the same checksum, yet a stat error was reported. Anyway, we decided to run
>    ceph pg repair 79.7
> ... and bye-bye ceph-osd on node2!
> 
> Here the trace:
> 
> === 8-< ===
> 
> 2012-03-01 17:49:13.024571 7f3944584700 -- 10.10.10.14:6802/4892 >> 10.10.10.10:6802/19139 pipe(0xfcd2c80 sd=16 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
> 2012-03-01 17:49:23.674162 7f395001b700 log [ERR] : 79.7 osd.4: soid 9fb2fa17/rb.0.0.00000000136c/headextra attr _, extra attr snapset

One clarification of our own: one copy is missing the xattrs, as checked via
	getfattr
But why can't this be corrected, and, worse, why does this crash happen?
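
The getfattr check can also be scripted. The following Python sketch assumes a Linux host (where `os.listxattr` is available); the helper names `list_xattrs` and `xattr_diff` are hypothetical. The scrub error names the attributes as `_` and `snapset`; the names actually stored on disk may carry a filesystem-level prefix, which is not verified here.

```python
import os


def list_xattrs(path):
    """Return the set of extended attribute names on a file (Linux only)."""
    return set(os.listxattr(path))


def xattr_diff(names_a, names_b):
    """Return (only_in_a, only_in_b) as sorted lists of attribute names."""
    a, b = set(names_a), set(names_b)
    return sorted(a - b), sorted(b - a)


# Example use, with the two replica paths filled in by hand:
#   only_a, only_b = xattr_diff(list_xattrs(path_a), list_xattrs(path_b))
```

An attribute that shows up in only one of the two lists is missing on the other copy, which would match what scrub reports as an "extra attr".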

* Re: Still inconsistent PGs, ceph-osd crashes reliably after trying to repair
From: Gregory Farnum @ 2012-03-01 19:03 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Thu, Mar 1, 2012 at 10:07 AM, Oliver Francke <Oliver.Francke@filoo.de> wrote:
> Well,
>
> On 01.03.2012 at 18:15, Oliver Francke wrote:
>
>> Hi *,
>>
>> After some crashes we still had to deal with some remaining inconsistencies reported via
>>    ceph -w
>> and friends.
>> Well, we traced one of them down via
>>    ceph pg dump
>>
>> and we picked pool 79, pg 79.7. The corresponding file, as named in /var/log/ceph/osd.2.log, is
>>    /data/osd4/current/79.7_head/rb.0.0.00000000136c__head_9FB2FA17
>> with the duplicate on
>>    /data/osd2/...
>> Strangely, both copies had the same checksum, yet a stat error was reported. Anyway, we decided to run
>>    ceph pg repair 79.7
>> ... and bye-bye ceph-osd on node2!
>>
>> Here the trace:
>>
>> === 8-< ===
>>
>> 2012-03-01 17:49:13.024571 7f3944584700 -- 10.10.10.14:6802/4892 >> 10.10.10.10:6802/19139 pipe(0xfcd2c80 sd=16 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
>> 2012-03-01 17:49:23.674162 7f395001b700 log [ERR] : 79.7 osd.4: soid 9fb2fa17/rb.0.0.00000000136c/headextra attr _, extra attr snapset
>
> One clarification of our own: one copy is missing the xattrs, as checked via
>        getfattr
> But why can't this be corrected, and, worse, why does this crash happen?

You've got a lot of odd things going on here, some of which are
obviously connected and some of which aren't. Right now Ceph doesn't
automatically resolve conflicts like missing xattrs, because doing so
with 2x replication is really hard, and even when you have more replicas
(and can do some form of voting) you have to write some complicated
code to make it actually happen. :) At some point in the future it
will, but for now we really want the attention anyway.

The reason it's crashing is that it lost the "_" xattr, which we
could handle, except... losing that xattr is really, really bad. What
backing filesystem are you using? Are you using snapshots?
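
Reading the backtrace, the abort appears to come from an uncaught exception raised while decoding the object info: frames 9 and 10 show object_info_t::decode calling buffer::list::iterator::copy, followed by the C++ terminate handler. The Python sketch below is an analogy, not Ceph code. The record layout `INFO_FORMAT` and the functions `decode_info` and `repair_object` are invented for illustration. It shows how decoding an empty attribute value fails, and how a guarded repair path could report the problem instead of taking the process down.

```python
import struct

# Hypothetical fixed layout standing in for the real object info encoding:
# one version byte followed by a 64-bit little-endian size (9 bytes total).
INFO_FORMAT = "<BQ"


def decode_info(raw):
    """Decode the stand-in object info; raises struct.error on short input."""
    version, size = struct.unpack(INFO_FORMAT, raw)
    return {"version": version, "size": size}


def repair_object(xattrs):
    """Guarded variant: return None when the '_' attr is missing or truncated."""
    raw = xattrs.get("_", b"")
    try:
        return decode_info(raw)
    except struct.error:
        # The caller has to treat this object as not automatically repairable.
        return None
```

Without the try/except, the empty buffer would raise straight out of the repair path, which is roughly what the trace suggests happened in the OSD.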

>> === 8-< ===
>>
>> ... after some journal-replay things calmed down, but:
>>
>>    2012-03-01 17:58:29.470446   log 2012-03-01 17:58:24.242369 osd.2 10.10.10.14:6801/3111 368 : [WRN] bad locator @56 on object @79 loc @56 op osd_op(client.44350.0:1412387 rb.0.0.00000000136c [write 2465792~49152] 56.9fb2fa17) v4

Does the bad locator always look like that, with the @56 and @79 values?
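
One way to answer that question is to tally the values across all such warnings. The Python sketch below is a rough aid, assuming the warnings keep the shape of the line quoted above; the function name `tally_bad_locators` is hypothetical, and the log file to feed it is whichever one contains these messages on your system.

```python
import re
from collections import Counter

# Matches the three @-values in a "bad locator" warning as quoted in this thread.
BAD_LOCATOR = re.compile(r"bad locator @(\d+) on object @(\d+) loc @(\d+)")


def tally_bad_locators(lines):
    """Count each (locator, object, loc) triple found in the given log lines."""
    counts = Counter()
    for line in lines:
        m = BAD_LOCATOR.search(line)
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return counts


# Example use, with the log path supplied by hand:
#   with open(log_path) as f:
#       print(tally_bad_locators(f))
```

If only one triple ever shows up, the warnings always pair the same two values; several distinct triples would suggest the mismatch is not specific to one pool pairing.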