From: Olivier Bonvalet <ceph.list@daevel.fr>
To: Samuel Just <sam.just@inktank.com>
Cc: Denis Kaganovich <mahatma@eu.by>,
"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>,
ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [ceph-users] scrub error: found clone without head
Date: Fri, 24 May 2013 00:27:53 +0200
Message-ID: <1369348073.3440.3.camel@localhost>
In-Reply-To: <CA+4uBUYPFWBgYqqS7qN1eoQ5ies1S5_R=CCUZoeFPY3Fe+=_mA@mail.gmail.com>
No:
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
But I suppose all of these PGs *used to* have osd.25 as primary (on the
same host), which is the buggy OSD (now disabled).
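The question of whether the affected PGs share a primary can be checked mechanically. The following is a minimal sketch, not part of the original message: it parses the four PG lines listed above and assumes that the first OSD in each acting set is the current primary. The names `HEALTH_LINES`, `PG_LINE` and `parse_acting` are hypothetical helpers introduced only for illustration.

```python
import re

# Lines copied verbatim from the PG listing in this message.
HEALTH_LINES = """\
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
"""

# Matches "pg <id> is <state>, acting [<osd>,<osd>,...]".
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\]$")


def parse_acting(text):
    """Map each PG id to its acting set (a list of OSD ids)."""
    result = {}
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match:
            result[match.group(1)] = [int(x) for x in match.group(3).split(",")]
    return result


acting = parse_acting(HEALTH_LINES)
# Assumption: the first OSD in the acting set is the current primary.
primaries = {pg: osds[0] for pg, osds in acting.items()}
print(primaries)
print("single shared primary:", len(set(primaries.values())) == 1)
```

Under that assumption, only 3.6b and 3.1 currently have osd.28 as primary, which matches the "No" above.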
Question: is "12d7" in the object path the snapshot id? If so, I don't
have any snapshot with this id for the rb.0.15c26.238e1f29 image.
So, which files should I remove?
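To make the object path easier to inspect, here is a small sketch (not from the original thread) that splits a "found clone without head" log line into its fields. It assumes the path layout is `hash/object-name/snap//pool` and that the snap field is hexadecimal, which is the interpretation asked about above and is not confirmed here. `SCRUB_ERR` and `parse_scrub_error` are hypothetical names.

```python
import re

# Two of the error lines quoted further down in this thread.
LOG_LINES = [
    "2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b "
    "ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head",
    "2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 "
    "b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head",
]

# Assumed layout of the object path: <hash>/<object name>/<snap>//<pool>.
SCRUB_ERR = re.compile(
    r"scrub (?P<pg>\S+) "
    r"(?P<hash>[0-9a-f]+)/(?P<name>[^/]+)/(?P<snap>[0-9a-f]+)//(?P<pool>\d+) "
    r"found clone without head"
)


def parse_scrub_error(line):
    """Return the fields of a 'found clone without head' line, or None."""
    match = SCRUB_ERR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the snap field is hexadecimal; convert it for comparison
    # with tools that list snapshot ids in decimal.
    fields["snap_decimal"] = int(fields["snap"], 16)
    return fields


for line in LOG_LINES:
    info = parse_scrub_error(line)
    print(info["pg"], info["name"], info["snap"], info["snap_decimal"])
```

If the hexadecimal reading is right, "12d7" corresponds to snapshot id 4823 in decimal, which is the value to look for when listing the snapshots of the image.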
Thanks for your help.
On Thursday, 23 May 2013 at 15:17 -0700, Samuel Just wrote:
> Do all of the affected PGs share osd.28 as the primary? I think the
> only recovery is probably to manually remove the orphaned clones.
> -Sam
>
> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > Not yet. I'm keeping it for now.
> >
> > On Wednesday, 22 May 2013 at 15:50 -0700, Samuel Just wrote:
> >> rb.0.15c26.238e1f29
> >>
> >> Has that rbd volume been removed?
> >> -Sam
> >>
> >> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
> >> >
> >> >
> >> > On Wednesday, 22 May 2013 at 12:00 -0700, Samuel Just wrote:
> >> >> What version are you running?
> >> >> -Sam
> >> >>
> >> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> >> > Is this enough?
> >> >> >
> >> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
> >> >> > 2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok
> >> >> > 2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok
> >> >> > 2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok
> >> >> > 2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok
> >> >> > 2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok
> >> >> > 2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
> >> >> > 2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
> >> >> > 2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
> >> >> > 2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors
> >> >> > 2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok
> >> >> > 2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok
> >> >> > 2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
> >> >> > 2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
> >> >> > --
> >> >> > 2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok
> >> >> > 2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok
> >> >> > 2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok
> >> >> > 2013-05-22 16:35:12.240246 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
> >> >> > 2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok
> >> >> > 2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
> >> >> > 2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
> >> >> > 2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
> >> >> > 2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors
> >> >> > 2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
> >> >> > 2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
> >> >> > 2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok
> >> >> > 2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok
> >> >> >
> >> >> >
> >> >> > Note: I have 8 scrub errors like that, on 4 affected PGs, and all affected objects belong to the same RBD image (rb.0.15c26.238e1f29).
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wednesday, 22 May 2013 at 11:01 -0700, Samuel Just wrote:
> >> >> >> Can you post your ceph.log for the period including all of these errors?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
> >> >> >> <mahatma@bspu.unibel.by> wrote:
> >> >> >> > Olivier Bonvalet wrote:
> >> >> >> >>
> >> >> >> >> On Monday, 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> >> >> >> >>> On Tuesday, 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> >> >> >> >>>> I have 4 scrub errors (3 PGs - "found clone without head"), on one OSD. They
> >> >> >> >>>> are not being repaired. How can I repair them without re-creating the OSD?
> >> >> >> >>>>
> >> >> >> >>>> Right now it is "easy" to wipe and re-create the OSD, but in theory, if
> >> >> >> >>>> multiple OSDs are affected, that may cause data loss.
> >> >> >> >>>
> >> >> >> >>> I have the same problem: 8 objects (4 PGs) with the error "found clone without
> >> >> >> >>> head". How can I fix that?
> >> >> >> >> Since "pg repair" doesn't handle this kind of error, is there a way to
> >> >> >> >> fix it manually? (It's a production cluster.)
> >> >> >> >
> >> >> >> > When I tried to fix it manually, I triggered assertions in the trimming process
> >> >> >> > (the OSD died), along with many other troubles. So, if you want to keep the
> >> >> >> > cluster running, wait for the developers' answer. IMHO.
> >> >> >> >
> >> >> >> > About the manual repair attempt: see issue #4937. There are similar results in
> >> >> >> > the thread with the subject "Inconsistent PG's, repair ineffective".
> >> >> >> >
> >> >> >> > --
> >> >> >> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
> >> >> >> > _______________________________________________
> >> >> >> > ceph-users mailing list
> >> >> >> > ceph-users@lists.ceph.com
> >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >>
> >> >> >
> >> >> >
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> >>
> >> >
> >> >
> >>
> >