osd crash when deep-scrubbing

* osd crash when deep-scrubbing
@ 2015-10-18 14:56 Jiaying Ren
From: Jiaying Ren @ 2015-10-18 14:56 UTC (permalink / raw)
  To: ceph-devel

Hi, cephers:

I've run into a problem: a PG is stuck in the inconsistent state:

$ ceph -s
    cluster 27d39faa-48ae-4356-a8e3-19d5b81e179e
     health HEALTH_ERR 1 pgs inconsistent; 34 near full osd(s); 1 scrub errors; noout flag(s) set
     monmap e4: 3 mons at {server-61.0.yyyy.xxxxxxxxx.in=10.8.0.61:6789/0,server-62.0.yyyy.xxxxxxxxx.in=10.8.0.62:6789/0,server-63.0.yyyy.xxxxxxxxx.in=10.8.0.63:6789/0}, election epoch 6706, quorum 0,1,2 server-61.0.yyyy.xxxxxxxxx.in,server-62.0.yyyy.xxxxxxxxx.in,server-63.0.yyyy.xxxxxxxxx.in
     osdmap e87808: 180 osds: 180 up, 180 in
            flags noout
      pgmap v29322850: 35026 pgs, 15 pools, 27768 GB data, 1905 kobjects
            83575 GB used, 114 TB / 196 TB avail
               35025 active+clean
                   1 active+clean+inconsistent
  client io 120 kB/s rd, 216 MB/s wr, 6398 op/s

The `pg repair` command didn't work, so I manually repaired the inconsistent
object (the pool size is 3; I removed the copy that differed from the other
two copies). After that, the PG is still in the inconsistent state:

$ ceph pg dump | grep active+clean+inconsistent
dumped all in format plain
3.d70   290     0       0       0       4600869888      3050    3050    stale+active+clean+inconsistent 2015-10-18 13:05:43.320451      87798'7631234   87798:10758311        [131,119,132]   131     [131,119,132]   131     85161'7599152   2015-10-16 14:34:21.283303      85161'7599152   2015-10-16 14:34:21.283303
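
The manual comparison described above (three copies, remove the one that
differs from the other two) can be sketched roughly as follows. This is only
an illustration under assumptions: the function names `sha256_of` and
`odd_replicas_out` are hypothetical, the three replica files are assumed to
have been located on their OSDs already, and only file contents are compared
(extended attributes and omap data are not looked at). A majority match does
not prove that the majority copy is the good one.

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large objects need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def odd_replicas_out(paths):
    """Return the paths whose content hash differs from the majority hash.

    Hypothetical helper: raises ValueError when no strict majority exists,
    because then there is no basis for choosing a copy to remove.
    """
    digests = {p: sha256_of(p) for p in paths}
    majority_digest, count = Counter(digests.values()).most_common(1)[0]
    if count * 2 <= len(paths):
        raise ValueError("no majority among replicas")
    return [p for p in paths if digests[p] != majority_digest]
```

With three copies where two match, the function returns a one-element list
naming the differing copy; it only reports and never deletes anything.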

After I restarted osd.131 (the primary OSD for this PG), it crashes. The backtrace:

 1: /usr/bin/ceph-osd() [0x9c6de1]
 2: (()+0xf790) [0x7f384b6b8790]
 3: (gsignal()+0x35) [0x7f384a58a625]
 4: (abort()+0x175) [0x7f384a58be05]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7f384ae44a5d]
 6: (()+0xbcbe6) [0x7f384ae42be6]
 7: (()+0xbcc13) [0x7f384ae42c13]
 8: (()+0xbcd0e) [0x7f384ae42d0e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x13e) [0x9cd0de]
 10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x81) [0x7dfaf1]
 11: (PG::_scan_snaps(ScrubMap&)+0x394) [0x84b8c4]
 12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x27b) [0x84cdab]
 13: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x5c4) [0x85c1b4]
 14: (PG::scrub(ThreadPool::TPHandle&)+0x181) [0x85d691]
 15: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x1c) [0x6737cc]
 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0x53d) [0x9e05dd]
 17: (ThreadPool::WorkThread::entry()+0x10) [0x9e1760]
 18: (()+0x7a51) [0x7f384b6b0a51]
 19: (clone()+0x6d) [0x7f384a6409ad]

The Ceph version is v0.80.9. Manually running `ceph pg deep-scrub 3.d70` also
crashes the OSD.

Any ideas? Or did I miss any logs needed for further investigation?

Thx.

--
Best Regards!
Jiaying Ren(mikulely)


* Re: osd crash when deep-scrubbing
@ 2015-10-20  3:04 changtao381
From: changtao381 @ 2015-10-20  3:04 UTC (permalink / raw)
  Cc: ceph-devel

I hit a similar problem when running the 'ceph pg deep-scrub' command: it also
crashed the OSD. I eventually found that some sectors of the disk were
corrupted, so please check the dmesg output to see whether there are any disk
errors.
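
As a rough illustration of that check, here is a small sketch that filters
captured dmesg text for lines that look like disk errors. The function name
`find_disk_errors` and the pattern list are assumptions made for this example;
the patterns are a few common kernel message fragments and are not exhaustive.

```python
import re

# Assumed, non-exhaustive fragments of kernel messages that often
# accompany failing sectors; adjust for the hardware and kernel in use.
DISK_ERROR_PATTERNS = [
    r"I/O error",
    r"Medium Error",
    r"Unrecovered read error",
]
_DISK_ERROR_RE = re.compile("|".join(DISK_ERROR_PATTERNS), re.IGNORECASE)


def find_disk_errors(dmesg_text):
    """Return the lines of dmesg_text that match any assumed disk-error pattern."""
    return [line for line in dmesg_text.splitlines() if _DISK_ERROR_RE.search(line)]
```

The text passed in would be the output of `dmesg` captured on the host of the
crashing OSD. An empty result only means none of the assumed patterns matched,
not that the disk is healthy.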


