* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22  7:00 UTC
  To: ceph-users@lists.ceph.com; +Cc: ceph-devel


On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> > I have 4 scrub errors (3 PGs, "found clone without head") on one OSD, and
> > repair does not fix them. How can I repair them without re-creating the OSD?
> > 
> > For now it is "easy" to wipe and re-create the OSD, but in theory, if
> > multiple OSDs are affected, that could cause data loss.
> > 
> > -- 
> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 
> 
> Hi,
> 
> I have the same problem: 8 objects (4 PGs) with the error "found clone
> without head". How can I fix that?
> 
> thanks,
> Olivier

Hi,

Since "pg repair" doesn't handle this kind of error, is there a way to fix
it manually? (It's a production cluster.)

thanks in advance,
Olivier
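
For context, the standard path that the question above says does not clear these errors is to run "ceph pg repair" on each PG flagged as inconsistent. The following is a minimal dry-run sketch, not a fix for "found clone without head". It assumes "ceph health detail" prints lines of the form "pg <pgid> is active+clean+inconsistent, acting [...]", and the function names are hypothetical. It only prints the repair commands, so they can be reviewed before anything touches a production cluster.

```shell
#!/bin/sh
# Dry run: list PGs that "ceph health detail" reports as inconsistent and
# print (not run) a "ceph pg repair" command for each one.
# Assumption: health-detail lines look like
#   pg <pgid> is active+clean+inconsistent, acting [<osd ids>]

# Hypothetical helper: print one inconsistent PG id per line.
list_inconsistent_pgs() {
    ceph health detail |
        awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print $2 }'
}

# Hypothetical helper: print the repair command for every inconsistent PG.
print_repair_commands() {
    for pg in $(list_inconsistent_pgs); do
        echo "ceph pg repair $pg"
    done
}
```

Running "print_repair_commands" only prints text. Each line still has to be executed by hand once it has been reviewed.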


* Re: scrub error: found clone without head
From: Dzianis Kahanovich @ 2013-05-22 12:39 UTC
  To: Olivier Bonvalet, ceph-users@lists.ceph.com; +Cc: ceph-devel

Olivier Bonvalet wrote:
> 
> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
>> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
>>> I have 4 scrub errors (3 PGs, "found clone without head") on one OSD, and
>>> repair does not fix them. How can I repair them without re-creating the OSD?
>>>
>>> For now it is "easy" to wipe and re-create the OSD, but in theory, if
>>> multiple OSDs are affected, that could cause data loss.
>>
>> I have the same problem: 8 objects (4 PGs) with the error "found clone
>> without head". How can I fix that?
> Since "pg repair" doesn't handle this kind of error, is there a way to fix
> it manually? (It's a production cluster.)

When I tried to fix it manually, I triggered assertions in the trimming process
(the OSD died), plus many other problems. So, if you want to keep the cluster
running, wait for an answer from the developers. IMHO.

About my manual repair attempt: see issue #4937. Similar results were also
reported in the thread "Inconsistent PG's, repair ineffective".

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/

* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 18:01 UTC
  To: mahatma@bspu.unibel.by
  Cc: ceph-devel, ceph-users@lists.ceph.com

Can you post your ceph.log with the period including all of these errors?
-Sam


* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22 18:25 UTC
  To: Samuel Just
  Cc: ceph-users@lists.ceph.com, ceph-devel,
	mahatma@bspu.unibel.by

Is this enough?

# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 errors
2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
--
2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok


Note: I have 8 scrub errors like these across 4 affected PGs, and all affected objects belong to the same RBD image (rb.0.15c26.238e1f29).
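
The per-PG and per-image tally in the note above can be reproduced from the OSD log. Below is a minimal sketch. It assumes the log line layout shown in the excerpt: the word "scrub", then the PG id, then an object written as hash/name/snapid//poolid. It also assumes that the RBD image prefix is the object name minus its final ".<hex>" object number. The function name is hypothetical.

```shell
#!/bin/sh
# Tally "found clone without head" scrub errors per PG and per RBD image
# from an OSD log read on stdin.
summarize_clone_errors() {
    awk '
    /found clone without head/ {
        pg = ""; obj = ""
        # The PG id and the object follow the word "scrub".
        for (i = 1; i < NF - 1; i++) {
            if ($i == "scrub") { pg = $(i + 1); obj = $(i + 2); break }
        }
        if (pg == "") next
        # Assumed object layout: hash/name/snapid//poolid
        split(obj, parts, "/")
        image = parts[2]
        # Drop the trailing ".<hex>" object number to get the image prefix.
        sub(/\.[0-9a-f]+$/, "", image)
        per_pg[pg]++
        per_image[image]++
        total++
    }
    END {
        for (p in per_pg)    printf "pg %s: %d\n", p, per_pg[p]
        for (m in per_image) printf "image %s: %d\n", m, per_image[m]
        printf "total: %d\n", total
    }' | LC_ALL=C sort
}
```

Usage: "summarize_clone_errors < /var/log/ceph/osd.28.log". The excerpt above was grepped from only the last 500 lines of the log. For the six error lines it contains, the function should print:

image rb.0.15c26.238e1f29: 6
pg 3.1: 3
pg 3.6b: 3
total: 6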

* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 19:00 UTC
  To: Olivier Bonvalet
  Cc: ceph-users@lists.ceph.com, ceph-devel,
	Denis Kaganovich

What version are you running?
-Sam


* Re: [ceph-users] scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22 19:18 UTC
  To: Samuel Just; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

0.61-11-g3b94f03 (0.61-1.1), but the bug already occurred with bobtail.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 22:50 UTC
  To: Olivier Bonvalet
  Cc: ceph-users@lists.ceph.com, ceph-devel,
	Denis Kaganovich

rb.0.15c26.238e1f29

Has that rbd volume been removed?
-Sam
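
One way to answer this question is to look for an image whose "block_name_prefix" matches the object prefix from the scrub errors. For format 1 images, "rbd info" prints this field. Below is a minimal sketch. It assumes "rbd ls" and "rbd info" are available on an admin node, and that you know the name of pool 3 (for example from "ceph osd lspools"). The function name is hypothetical. An empty result suggests that no image in that pool still uses the prefix.

```shell
#!/bin/sh
# Hypothetical helper: print every image in a pool whose block_name_prefix
# (as reported by "rbd info") equals the given prefix. Returns non-zero
# if no image matches. Variables are global because POSIX sh has no "local".
find_image_by_prefix() {
    pool=$1
    prefix=$2
    rc=1
    for img in $(rbd ls "$pool"); do
        if rbd info "$pool/$img" |
            awk -v p="$prefix" '$1 == "block_name_prefix:" && $2 == p { hit = 1 } END { exit !hit }'
        then
            echo "$pool/$img"
            rc=0
        fi
    done
    return "$rc"
}
```

Usage: "find_image_by_prefix <pool name> rb.0.15c26.238e1f29", where <pool name> is the pool whose id is 3.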


* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-23 12:00 UTC
  To: Samuel Just
  Cc: ceph-users@lists.ceph.com, ceph-devel,
	Denis Kaganovich

Not yet. I'm keeping it for now.

Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit :
> rb.0.15c26.238e1f29
> 
> Has that rbd volume been removed?
> -Sam
> 
> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail.
> >
> >
> > Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit :
> >> What version are you running?
> >> -Sam
> >>
> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> > Is it enough ?
> >> >
> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
> >> > 2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
> >> > 2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
> >> > 2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
> >> > 2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
> >> > 2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
> >> > 2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
> >> > 2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
> >> > 2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
> >> > 2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 errors
> >> > 2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
> >> > 2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
> >> > 2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
> >> > 2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
> >> > --
> >> > 2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
> >> > 2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
> >> > 2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
> >> > 2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
> >> > 2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
> >> > 2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
> >> > 2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
> >> > 2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
> >> > 2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
> >> > 2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
> >> > 2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
> >> > 2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
> >> > 2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
> >> >
> >> >
> >> > Note: I have 8 scrub errors like that, on 4 affected PGs, and all affected objects belong to the same RBD image (rb.0.15c26.238e1f29).
> >> >
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> 
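To check a summary like "8 errors on 4 PGs, all from one RBD image" against a full OSD log, a small filter can help. This is a hypothetical helper, not a Ceph tool. It assumes the `log [ERR] : scrub <pgid> <hash>/<name>/<snap>/<field>/<pool> found clone without head` layout seen in the osd.28 excerpt, and assumes format-1 RBD data object names of the form `<prefix>.<object number>`.

```python
import re
from collections import Counter

# Assumed layout of the per-object scrub errors in the osd.28 log, e.g.
# "... log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head"
ERR_RE = re.compile(
    r"log \[ERR\] : scrub (?P<pg>\d+\.[0-9a-f]+) "
    r"(?P<hash>[0-9a-f]+)/(?P<name>[^/]+)/(?P<snap>[^/]+)/[^/]*/(?P<pool>\d+) "
    r"found clone without head"
)


def summarize(lines):
    """Count 'clone without head' errors per PG and collect the RBD name prefixes."""
    per_pg = Counter()
    prefixes = set()
    for line in lines:
        m = ERR_RE.search(line)
        if not m:
            continue  # skips "scrub ok", summary lines, messenger noise, etc.
        per_pg[m["pg"]] += 1
        # Assumption: dropping the last dot-separated field (the object number)
        # leaves the image's block name prefix.
        prefixes.add(m["name"].rsplit(".", 1)[0])
    return per_pg, prefixes
```

Usage would be along the lines of `with open("/var/log/ceph/osd.28.log") as f: per_pg, prefixes = summarize(f)`; a single entry in `prefixes` would confirm that only one image is involved.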


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ceph-users] scrub error: found clone without head
  2013-05-23 12:00                       ` Olivier Bonvalet
@ 2013-05-23 22:17                         ` Samuel Just
  2013-05-23 22:27                           ` Olivier Bonvalet
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Just @ 2013-05-23 22:17 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

Do all of the affected PGs share osd.28 as the primary?  I think the
only recovery is probably to manually remove the orphaned clones.
-Sam
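The idea of an "orphaned clone" can be illustrated with a toy model: group a PG's objects by name and flag clones for which no head object is present. This is a hypothetical sketch of the concept, not Ceph's scrub code. The `(name, snap)` input format, with `snap` written as "head", "snapdir" or a hex clone id as in the logs, is an assumption, as is treating a snapdir object as an acceptable stand-in for a missing head.

```python
from collections import defaultdict


def find_orphan_clones(objects):
    """Return (name, snap) pairs for clones that have no head/snapdir of the same name.

    `objects` is an iterable of (name, snap) tuples, where snap is "head",
    "snapdir", or a hex clone id such as "12d7".
    """
    by_name = defaultdict(set)
    for name, snap in objects:
        by_name[name].add(snap)

    orphans = []
    for name, snaps in sorted(by_name.items()):
        if "head" in snaps or "snapdir" in snaps:
            continue  # the clones of this object still have an anchor
        orphans.extend((name, snap) for snap in sorted(snaps))
    return orphans
```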

On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> Not yet. I'm keeping it for now.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ceph-users] scrub error: found clone without head
  2013-05-23 22:17                         ` [ceph-users] " Samuel Just
@ 2013-05-23 22:27                           ` Olivier Bonvalet
  2013-05-23 22:53                             ` Samuel Just
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Bonvalet @ 2013-05-23 22:27 UTC (permalink / raw)
  To: Samuel Just; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

No:
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
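Sam's question can be answered mechanically from status lines like these, since the first OSD in each acting set is the primary. A minimal sketch; `primaries` is a hypothetical helper, and the regular expression is written against the four lines above rather than any documented format.

```python
import re

# Matches status lines such as
# "pg 3.6b is active+clean+inconsistent, acting [28,23,5]"
PG_RE = re.compile(r"pg (\S+) is (\S+), acting \[([\d,]+)\]")


def primaries(lines):
    """Map each PG id to the first OSD of its acting set (its primary)."""
    result = {}
    for line in lines:
        m = PG_RE.search(line)
        if m:
            acting = [int(osd) for osd in m.group(3).split(",")]
            result[m.group(1)] = acting[0]
    return result
```

For the four lines above it returns primaries 24, 28, 29 and 28, so osd.28 is the primary of only two of the four PGs.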

But I suspect that all these PGs *had* osd.25 as primary (on the
same host), which is the buggy OSD (now disabled).

Question: is "12d7" in the object path the snapshot id? If so, I
don't have any snapshot with this id for the
rb.0.15c26.238e1f29 image.
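On the snapshot-id question: "12d7" contains a hex digit, so in these log lines the clone id appears to be printed in hexadecimal, which would make it 4823 in decimal when compared against a listing that shows decimal ids. The sketch below splits the identifier, assuming the slash-separated layout `<hash>/<name>/<snap>/<empty field>/<pool>` seen in these lines. `parse_scrub_object` and `ScrubObject` are hypothetical names, and the meaning of the empty fourth field is not confirmed here.

```python
from typing import NamedTuple, Optional


class ScrubObject(NamedTuple):
    hash: int
    name: str
    snap: str               # "head", "snapdir" or a hex clone id, as logged
    snap_id: Optional[int]  # clone id as an integer; None for head/snapdir
    extra: str              # empty in these logs; meaning not confirmed here
    pool: int


def parse_scrub_object(ident: str) -> ScrubObject:
    """Split '<hash>/<name>/<snap>/<extra>/<pool>' as seen in the scrub ERR lines."""
    hash_hex, name, snap, extra, pool = ident.split("/")
    snap_id = None if snap in ("head", "snapdir") else int(snap, 16)
    return ScrubObject(int(hash_hex, 16), name, snap, snap_id, extra, int(pool))
```

For example, `parse_scrub_object("ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3").snap_id` is 4823.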

So, which files should I remove?

Thanks for your help.


On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
> Do all of the affected PGs share osd.28 as the primary?  I think the
> only recovery is probably to manually remove the orphaned clones.
> -Sam
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [ceph-users] scrub error: found clone without head
  2013-05-23 22:27                           ` Olivier Bonvalet
@ 2013-05-23 22:53                             ` Samuel Just
  2013-05-31 13:36                               ` Olivier Bonvalet
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Just @ 2013-05-23 22:53 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

Can you send the filenames in the pg directories for those 4 pgs?
-Sam
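A read-only way to collect what Sam asks for, similar in spirit to the `find ... | xargs ls -l` commands used later in the thread. `list_pg_objects` is a hypothetical helper; the `/var/lib/ceph/osd/ceph-<id>/current/<pgid>_head` layout is taken from those commands.

```python
import os


def list_pg_objects(pg_dir, prefix):
    """Return sorted (path, size) pairs for files under pg_dir whose name starts with prefix.

    Read-only; roughly what `find <pg_dir> -name '<prefix>*' | xargs ls -l` reports.
    A missing pg_dir simply yields an empty list.
    """
    found = []
    for root, _dirs, files in os.walk(pg_dir):
        for fname in files:
            if fname.startswith(prefix):
                path = os.path.join(root, fname)
                found.append((path, os.path.getsize(path)))
    return sorted(found)
```

For PG 3.6b one would call it once per OSD of the acting set, e.g. with `f"/var/lib/ceph/osd/ceph-{osd}/current/3.6b_head"` for `osd` in 28, 23 and 5, and the prefix `"rb.0.15c26.238e1f29"`.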

On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> No:
> pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> pg 3.6b is active+clean+inconsistent, acting [28,23,5]
> pg 3.d is active+clean+inconsistent, acting [29,4,11]
> pg 3.1 is active+clean+inconsistent, acting [28,19,5]
>
> But I suspect that all these PGs *had* osd.25 as primary (on the
> same host), which is the buggy OSD (now disabled).
>
> Question: is "12d7" in the object path the snapshot id? If so, I
> don't have any snapshot with this id for the
> rb.0.15c26.238e1f29 image.
>
> So, which files should I remove?
>
> Thanks for your help.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scrub error: found clone without head
  2013-05-23 22:53                             ` Samuel Just
@ 2013-05-31 13:36                               ` Olivier Bonvalet
  2013-05-31 14:34                                 ` Olivier Bonvalet
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Bonvalet @ 2013-05-31 13:36 UTC (permalink / raw)
  To: Samuel Just
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel,
	Denis Kaganovich

Hi,

Sorry for the late answer. While trying to fix this, I tried to delete the
image (rbd rm XXX): "rbd rm" completed without errors, but "rbd ls"
still displays this image.

What should I do?


Here are the files for PG 3.6b:

# find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 19 mai   22:52 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 19 mai   23:00 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 19 mai   22:59 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3
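The listings above can be cross-checked offline. Assuming the FileStore file name layout `<name>_<key>_<snap>_<HASH>_<namespace>_<pool>` (an assumption that matches the six underscore-separated fields of these names), the `DIR_*` levels should spell out the object hash read backwards: `ADE3C16B` reversed begins with `B61C`, matching `DIR_B/DIR_6/DIR_1/DIR_C`. A hypothetical checker:

```python
def parse_filestore_name(fname):
    """Split a FileStore object file name into its six '_'-separated fields.

    Assumed layout: <name>_<key>_<snap>_<HASH>_<namespace>_<pool>. Literal
    underscores inside fields are assumed to be escaped, so a plain split works.
    """
    parts = fname.split("_")
    if len(parts) != 6:
        raise ValueError(f"unexpected file name layout: {fname!r}")
    name, key, snap, hash_hex, nspace, pool = parts
    return {"name": name, "key": key, "snap": snap, "hash": hash_hex,
            "namespace": nspace, "pool": pool}


def dir_path_matches_hash(path):
    """Check that the DIR_* levels of a POSIX path spell the object hash read backwards."""
    components = path.split("/")
    levels = [c[len("DIR_"):] for c in components[:-1] if c.startswith("DIR_")]
    reversed_hash = parse_filestore_name(components[-1])["hash"][::-1]
    return "".join(levels) == reversed_hash[:len(levels)]
```

All three files on each OSD pass this check, so their placement at least looks consistent with their hashes.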


As you can see, the OSDs don't hold any other data in those PGs for this RBD image. Should I remove them through rados?


In fact, I remember that some of those files were truncated (size 0), so I manually copied the data from osd-5. That was probably a mistake.
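Since some copies were restored by hand, comparing the replicas of each object file can show whether they still agree. A minimal read-only sketch; `compare_replicas` is a hypothetical helper, and it compares only the size and SHA-256 of the file contents, not extended attributes or any other per-object metadata.

```python
import hashlib
import os


def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def compare_replicas(paths):
    """Return ({path: (size, sha256)}, identical) for copies of one object file."""
    info = {p: (os.path.getsize(p), file_digest(p)) for p in paths}
    identical = len(set(info.values())) <= 1
    return info, identical
```

It would be called with the same relative path under `ceph-28`, `ceph-23` and `ceph-5`, one call per object file from the listings above.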


Thanks,
Olivier

On Thursday, May 23, 2013 at 15:53 -0700, Samuel Just wrote:
> Can you send the filenames in the pg directories for those 4 pgs?
> -Sam
> 
> On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > No:
> > pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> > pg 3.6b is active+clean+inconsistent, acting [28,23,5]
> > pg 3.d is active+clean+inconsistent, acting [29,4,11]
> > pg 3.1 is active+clean+inconsistent, acting [28,19,5]
> >
> > But I suspect that all these PGs *had* osd.25 as primary (on the
> > same host), which is the buggy OSD (now disabled).
> >
> > Question: is "12d7" in the object path the snapshot id? If so, I
> > don't have any snapshot with this id for the
> > rb.0.15c26.238e1f29 image.
> >
> > So, which files should I remove?
> >
> > Thanks for your help.
> >
> >
> > On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
> >> Do all of the affected PGs share osd.28 as the primary?  I think the
> >> only recovery is probably to manually remove the orphaned clones.
> >> -Sam
> >>
> >> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > >> > Not yet. I'm keeping it for now.
> >> >
> > >> > On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
> >> >> rb.0.15c26.238e1f29
> >> >>
> >> >> Has that rbd volume been removed?
> >> >> -Sam
> >> >>
> >> >> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > >> >> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
> >> >> >
> >> >> >
> > >> >> > On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
> >> >> >> What version are you running?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > >> >> >> > Is that enough?
> >> >> >> >
> >> >> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
> >> >> >> > 2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
> >> >> >> > 2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
> >> >> >> > 2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
> >> >> >> > 2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
> >> >> >> > 2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
> >> >> >> > 2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 errors
> >> >> >> > 2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
> >> >> >> > 2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
> >> >> >> > 2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
> >> >> >> > 2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
> >> >> >> > --
> >> >> >> > 2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
> >> >> >> > 2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
> >> >> >> > 2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
> >> >> >> > 2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
> >> >> >> > 2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
> >> >> >> > 2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
> >> >> >> > 2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
> >> >> >> > 2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
> >> >> >> > 2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
> >> >> >> > 2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
> >> >> >> >
> >> >> >> >
> >> >> >> > Note: I have 8 scrub errors like that, in 4 affected PGs, and all affected objects belong to the same RBD image (rb.0.15c26.238e1f29).
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote:
> >> >> >> >> Can you post your ceph.log with the period including all of these errors?
> >> >> >> >> -Sam
> >> >> >> >>
> >> >> >> >> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
> >> >> >> >> <mahatma@bspu.unibel.by> wrote:
> >> >> >> >> > Olivier Bonvalet wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> >> >> >> >> >>> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> >> >> >> >> >>>> I have 4 scrub errors (3 PGs, "found clone without head") on one OSD. They are
> >> >> >> >> >>>> not repaired. How can I repair this without re-creating the OSD?
> >> >> >> >> >>>>
> >> >> >> >> >>>> Right now it is "easy" to wipe and re-create the OSD, but in theory, if
> >> >> >> >> >>>> several OSDs were affected, that could cause data loss.
> >> >> >> >> >>>
> >> >> >> >> >>> I have the same problem: 8 objects (4 PGs) with the error "found clone without
> >> >> >> >> >>> head". How can I fix that?
> >> >> >> >> >> Since "pg repair" doesn't handle that kind of error, is there a way to
> >> >> >> >> >> fix it manually? (It's a production cluster.)
> >> >> >> >> >
> >> >> >> >> > When I tried to fix it manually, I triggered assertions in the trimming process (the OSD died),
> >> >> >> >> > along with many other troubles. So, if you want to keep the cluster running, wait for
> >> >> >> >> > the developers' answer. IMHO.
> >> >> >> >> >
> >> >> >> >> > About my manual repair attempt: see issue #4937. Similar results are also reported in the thread
> >> >> >> >> > "Inconsistent PG's, repair ineffective".
> >> >> >> >> >
> >> >> >> >> > --
> >> >> >> >> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
> >
> 



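The scrub errors quoted above all share one shape: "scrub <pg> <hash>/<object>/<snap>//<pool> found clone without head". As a minimal sketch, assuming that layout (read off these log lines, not taken from Ceph documentation), the POSIX shell functions below pull the PG, hash, object name, clone id and pool out of an OSD log and count the errors per PG. The function names clone_errors and errors_per_pg are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for reading "found clone without head" scrub errors.
# Assumed field layout (taken from the log lines quoted in this thread):
#   ... scrub <pg> <hash>/<object>/<snap>//<pool> found clone without head

# Print "pg hash object snap pool" for each matching line read from stdin.
clone_errors() {
    awk '/found clone without head/ {
        for (i = 1; i < NF; i++) {
            if ($i == "scrub") {
                # split "<hash>/<object>/<snap>//<pool>" on "/";
                # the empty field between snap and pool is skipped
                n = split($(i + 2), f, "/")
                print $(i + 1), f[1], f[2], f[3], f[n]
                break
            }
        }
    }'
}

# Read clone_errors output on stdin and print "pg count", sorted by pg.
errors_per_pg() {
    awk '{ count[$1]++ } END { for (pg in count) print pg, count[pg] }' | sort
}
```

For example, "clone_errors < /var/log/ceph/osd.28.log | errors_per_pg". On the excerpt posted above, every object name it prints starts with rb.0.15c26.238e1f29, which is how the single affected RBD image stands out.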

* Re: scrub error: found clone without head
  2013-05-31 13:36                               ` Olivier Bonvalet
@ 2013-05-31 14:34                                 ` Olivier Bonvalet
  2013-05-31 15:55                                   ` [solved] " Olivier Bonvalet
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Bonvalet @ 2013-05-31 14:34 UTC
  To: Samuel Just
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel,
	Denis Kaganovich

Note that I still have scrub errors, but rados doesn't see those
objects:

root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
root! brontes:~# 



On Friday 31 May 2013 at 15:36 +0200, Olivier Bonvalet wrote:
> Hi,
> 
> Sorry for the late answer: trying to fix this, I tried to delete the
> image (rbd rm XXX). The "rbd rm" completed without errors, but "rbd ls"
> still displays this image.
> 
> What should I do?
> 
> 
> Here are the files for PG 3.6b:
> 
> # find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
> -rw-r--r-- 1 root root 4194304 19 mai   22:52 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
> -rw-r--r-- 1 root root 4194304 19 mai   23:00 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
> -rw-r--r-- 1 root root 4194304 19 mai   22:59 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3
> 
> # find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
> -rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
> -rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
> -rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3
> 
> # find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
> -rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
> -rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
> -rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3
> 
> 
> As you can see, the OSDs don't contain any other data in those PGs for this RBD image. Should I remove them through rados?
> 
> 
> In fact, I remember that some of those files were truncated (size 0), and then I manually copied the data from osd-5. That was probably a mistake.
> 
> 
> Thanks,
> Olivier
> 
> On Thursday 23 May 2013 at 15:53 -0700, Samuel Just wrote:
> > Can you send the filenames in the pg directories for those 4 pgs?
> > -Sam
> > 
> > On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > > No:
> > > pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> > > pg 3.6b is active+clean+inconsistent, acting [28,23,5]
> > > pg 3.d is active+clean+inconsistent, acting [29,4,11]
> > > pg 3.1 is active+clean+inconsistent, acting [28,19,5]
> > >
> > > But I suppose that all of these PGs *had* osd.25 as primary (on the
> > > same host), which is the buggy OSD (now disabled).
> > >
> > > Question: is "12d7" in the object path the snapshot id? If so,
> > > I don't have any snapshot with this id for the
> > > rb.0.15c26.238e1f29 image.
> > >
> > > So, which files should I remove?
> > >
> > > Thanks for your help.
> > >
> > >
> 
> 



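The find output quoted above suggests how a clone reported by scrub maps to a file on a FileStore OSD: the file name is <object>__<snap>_<HASH>__<pool> with the hash in upper case, and the DIR_ levels spell out the hash's hex digits starting from the last one (ADE3C16B gives DIR_B/DIR_6/DIR_1/DIR_C). This mapping is inferred from the three listings in this thread, not from Ceph documentation. The number of DIR_ levels is presumably not fixed, so the sketch takes it as an argument, and it assumes an object name that needs no escaping on disk (true for the rb.* names here). filestore_path is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: print the FileStore path where a clone reported by
# scrub is expected to live, using the naming seen in this thread's find output.
#   $1  hobject from the scrub log, e.g. ade3c16b/<object>/12d7//3
#   $2  PG directory, e.g. /var/lib/ceph/osd/ceph-28/current/3.6b_head
#   $3  number of DIR_ levels (4 in this thread)
filestore_path() {
    printf '%s\n' "$1" | awk -F/ -v pgdir="$2" -v depth="$3" '{
        hash = toupper($1)                 # on disk the hash is upper-case
        path = pgdir
        # one DIR_ level per hex digit, least significant (last) digit first
        for (i = 0; i < depth; i++)
            path = path "/DIR_" substr(hash, length(hash) - i, 1)
        # file name: <object>_<empty>_<snap>_<HASH>_<empty>_<pool>
        print path "/" $2 "__" $3 "_" hash "__" $NF
    }'
}
```

For example, "filestore_path 'ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3' /var/lib/ceph/osd/ceph-28/current/3.6b_head 4" prints exactly the first path in the osd.28 listing above, which is a cheap way to check that a log line and a file refer to the same clone before touching anything.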

* Re: [solved] scrub error: found clone without head
  2013-05-31 14:34                                 ` Olivier Bonvalet
@ 2013-05-31 15:55                                   ` Olivier Bonvalet
  0 siblings, 0 replies; 14+ messages in thread
From: Olivier Bonvalet @ 2013-05-31 15:55 UTC
  To: Samuel Just
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel,
	Denis Kaganovich

OK, so:
- after a second "rbd rm XXX", the image was gone
- and "rados ls" doesn't show any object from that image
- so I tried moving those files

=> scrub is now OK!

So for me it's fixed. Thanks!

On Friday 31 May 2013 at 16:34 +0200, Olivier Bonvalet wrote:
> Note that I still have scrub errors, but rados doesn't see those
> objects:
> 
> root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
> root! brontes:~# 
> 
> 
> 




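In this thread the sequence that cleared the errors was: remove the image (a second "rbd rm" was needed), confirm that "rados ls" no longer lists any object with the image prefix, then move the orphaned clone files away. The sketch below covers only the last step, assuming a FileStore PG directory laid out as in the find output quoted earlier: it moves, never deletes, every file whose name starts with a given prefix into a quarantine directory, and by default only prints what it would do. The function name, the quarantine directory and the APPLY=1 switch are hypothetical; the sketch does not talk to Ceph or stop the OSD, and the thread does not say whether the OSD was stopped first.

```sh
#!/bin/sh
# Hypothetical helper: move files whose name starts with a prefix out of a
# FileStore PG directory into a quarantine directory (never deletes anything).
#   $1  PG directory, e.g. /var/lib/ceph/osd/ceph-28/current/3.6b_head
#   $2  object name prefix, e.g. rb.0.15c26.238e1f29
#   $3  quarantine directory
# Dry run by default: prints what would be moved. Set APPLY=1 to move.
quarantine_clones() {
    pgdir=$1 prefix=$2 quarantine=$3
    if [ "${APPLY:-0}" = 1 ]; then
        mkdir -p "$quarantine" || return 1
    fi
    find "$pgdir" -type f -name "${prefix}*" | while IFS= read -r f; do
        if [ "${APPLY:-0}" = 1 ]; then
            mv -- "$f" "$quarantine/"
        else
            printf 'would move %s\n' "$f"
        fi
    done
}
```

Running it once per affected PG directory, first as a dry run and then with APPLY=1, keeps the files around in case they turn out to be needed after all.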

end of thread, other threads:[~2013-05-31 15:55 UTC | newest]

Thread overview: 14+ messages
     [not found] <5188F8D2.5040303@bspu.unibel.by>
     [not found] ` <1369001190.9705.37.camel@localhost>
2013-05-22  7:00   ` scrub error: found clone without head Olivier Bonvalet
2013-05-22 12:39     ` Dzianis Kahanovich
     [not found]       ` <519CBC66.9030607-jC57FVqSskN9IU3jSFkzTg@public.gmane.org>
2013-05-22 18:01         ` Samuel Just
     [not found]           ` <CA+4uBUYKNnGfH2JnDBmyRy86RovFEOV=pbW2cdhQ2kV0spijXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-05-22 18:25             ` Olivier Bonvalet
2013-05-22 19:00               ` Samuel Just
2013-05-22 19:18                 ` [ceph-users] " Olivier Bonvalet
2013-05-22 22:50                   ` Samuel Just
     [not found]                     ` <CA+4uBUZcsi5eK31p8t+-7jZaERu40O1LSRW6wJOkfho_FwYXkg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-05-23 12:00                       ` Olivier Bonvalet
2013-05-23 22:17                         ` [ceph-users] " Samuel Just
2013-05-23 22:27                           ` Olivier Bonvalet
2013-05-23 22:53                             ` Samuel Just
2013-05-31 13:36                               ` Olivier Bonvalet
2013-05-31 14:34                                 ` Olivier Bonvalet
2013-05-31 15:55                                   ` [solved] " Olivier Bonvalet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.