* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22 7:00 UTC
To: ceph-users; Cc: ceph-devel

On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> > I have 4 scrub errors (3 PGs - "found clone without head") on one OSD, and
> > they are not being repaired. How can I repair them without re-creating the OSD?
> >
> > Right now it is "easy" to clean and re-create the OSD, but in theory, if
> > multiple OSDs are affected, that may cause data loss.
> >
> > --
> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
>
> Hi,
>
> I have the same problem: 8 objects (4 PGs) with the error "found clone without
> head". How can I fix that?
>
> thanks,
> Olivier

Hi,

since "pg repair" doesn't handle that kind of error, is there a way to fix it
manually? (it's a production cluster)

thanks in advance,
Olivier
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
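Olivier notes that "pg repair" does not fix these errors. For readers following along, here is a minimal sketch of how one might collect the affected PG ids before attempting (or ruling out) a repair. It assumes `ceph health detail` prints one line per inconsistent PG in the shape `pg 3.6b is active+clean+inconsistent, acting [28,5]` (the acting set is illustrative); both function names are hypothetical. It only prints the commands, since according to this thread `ceph pg repair` does not clear "found clone without head".

```sh
#!/bin/sh
# Hypothetical helper: read `ceph health detail` output on stdin and print
# the id of every PG whose state contains "inconsistent", one per line.
# Assumed input line shape: "pg 3.6b is active+clean+inconsistent, acting [28,5]"
list_inconsistent_pgs() {
  awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print $2 }'
}

# Hypothetical helper: print (do not execute) one repair command per PG id on stdin.
print_repair_commands() {
  while IFS= read -r pg; do
    printf 'ceph pg repair %s\n' "$pg"
  done
}
```

Usage: `ceph health detail | list_inconsistent_pgs | print_repair_commands`. Printing instead of running keeps the operator in control on a production cluster.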
* Re: scrub error: found clone without head
From: Dzianis Kahanovich @ 2013-05-22 12:39 UTC
To: Olivier Bonvalet, ceph-users; Cc: ceph-devel

Olivier Bonvalet wrote:
> since "pg repair" doesn't handle that kind of error, is there a way to fix it
> manually? (it's a production cluster)

When I tried to fix it manually, I triggered assertions in the trimming process
(the OSD died), along with many other troubles. So, if you want to keep the
cluster running, wait for an answer from the developers. IMHO.

About my manual repair attempt: see issue #4937. Similar results are also
reported in the thread "Inconsistent PG's, repair ineffective".

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 18:01 UTC
To: mahatma@bspu.unibel.by
Cc: ceph-devel, ceph-users

Can you post your ceph.log with the period including all of these errors?
-Sam
* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22 18:25 UTC
To: Samuel Just
Cc: ceph-users, ceph-devel, mahatma@bspu.unibel.by

Is this enough?

# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors
2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
--
2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors
2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok

Note: I have 8 scrub errors like these across 4 affected PGs, and all affected
objects belong to the same RBD image (rb.0.15c26.238e1f29).

On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote:
> Can you post your ceph.log with the period including all of these errors?
> -Sam
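Olivier's observation that every affected object belongs to one RBD image can be checked mechanically. Below is a minimal sketch that tallies "found clone without head" errors per PG and per RBD name prefix. It assumes error lines have exactly the shape shown in the excerpt above (`scrub <pgid> <hash>/<object>/<snap>//<pool> found clone without head`) and that the RBD prefix is the object name minus its final `.<hex>` block number; the function name `summarize_clone_errors` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: tally "found clone without head" scrub errors from
# OSD log files (or stdin) per PG and per RBD object-name prefix.
# Assumed line shape (as in the log excerpt in this thread):
#   ... log [ERR] : scrub <pgid> <hash>/<object>/<snap>//<pool> found clone without head
summarize_clone_errors() {
  awk '
    /\[ERR\].*found clone without head/ {
      # Find the "scrub" token; the PG id and the object follow it.
      idx = 0
      for (i = 1; i <= NF; i++) if ($i == "scrub") { idx = i; break }
      if (idx == 0) next
      pg = $(idx + 1)
      split($(idx + 2), f, "/")        # f[1]=hash, f[2]=object name, f[3]=snap
      prefix = f[2]
      sub(/\.[0-9a-f]+$/, "", prefix)  # strip the trailing block number
      per_pg[pg]++
      per_prefix[prefix]++
      total++
    }
    END {
      for (p in per_pg) printf "pg %s %d\n", p, per_pg[p]
      for (p in per_prefix) printf "prefix %s %d\n", p, per_prefix[p]
      printf "total %d\n", total
    }
  ' "$@" | LC_ALL=C sort   # awk array order is unspecified; sort for stable output
}
```

Fed the excerpt above, it should print `pg 3.1 3`, `pg 3.6b 3`, `prefix rb.0.15c26.238e1f29 6` and `total 6`. Against a whole log it is called as `summarize_clone_errors /var/log/ceph/osd.28.log`. Keying on the "scrub" token rather than a fixed column keeps it tolerant of small differences in the log prefix.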
* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 19:00 UTC
To: Olivier Bonvalet
Cc: ceph-users, ceph-devel, Denis Kaganovich

What version are you running?
-Sam
* Re: [ceph-users] scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-22 19:18 UTC
To: Samuel Just; Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.

On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote:
> What version are you running?
> -Sam

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: scrub error: found clone without head
From: Samuel Just @ 2013-05-22 22:50 UTC
To: Olivier Bonvalet
Cc: ceph-users, ceph-devel, Denis Kaganovich

rb.0.15c26.238e1f29

Has that rbd volume been removed?
-Sam
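Sam's question is whether the image behind the prefix rb.0.15c26.238e1f29 still exists. Below is a minimal sketch for checking that. It assumes `rbd ls POOL` prints one image name per line and `rbd info POOL/IMAGE` prints a `block_name_prefix: ...` line, as it does for format 1 images, whose objects are named `rb.0.*`. The pool name must be supplied by the reader: the log only shows pool id 3, which `ceph osd lspools` can map to a name. The function name `find_image_by_prefix` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print every image in POOL whose block_name_prefix
# equals PREFIX exactly. No output means no image in that pool uses it.
find_image_by_prefix() {
  pool=$1
  prefix=$2
  rbd ls "$pool" | while IFS= read -r image; do
    # </dev/null keeps `rbd info` from consuming the image list on stdin.
    if rbd info "$pool/$image" </dev/null |
        awk -v p="$prefix" '$1 == "block_name_prefix:" && $2 == p { found = 1 }
                            END { exit !found }'; then
      printf '%s\n' "$image"
    fi
  done
}
```

Usage: `find_image_by_prefix rbd rb.0.15c26.238e1f29`, where `rbd` as the pool name is only a guess. Comparing the prefix as an exact awk field, rather than with `grep`, avoids the dots in the prefix being treated as regex wildcards.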
* Re: scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-23 12:00 UTC
To: Samuel Just
Cc: ceph-users@lists.ceph.com, ceph-devel, Denis Kaganovich

Not yet; I'm keeping it for now.

On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote:
> rb.0.15c26.238e1f29
>
> Has that rbd volume been removed?
> -Sam
>
> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
> >
> > On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote:
> >> What version are you running?
> >> -Sam
> >>
> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> > Is that enough?
> >> >
> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
> >> > 2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok
> >> > 2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok
> >> > 2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok
> >> > 2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok
> >> > 2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok
> >> > 2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
> >> > 2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
> >> > 2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
> >> > 2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors
> >> > 2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok
> >> > 2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok
> >> > 2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
> >> > 2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
> >> > --
> >> > 2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok
> >> > 2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok
> >> > 2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok
> >> > 2013-05-22 16:35:12.240246 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
> >> > 2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok
> >> > 2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
> >> > 2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
> >> > 2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
> >> > 2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors
> >> > 2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
> >> > 2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
> >> > 2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok
> >> > 2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok
> >> >
> >> > Note: I have 8 scrub errors like these, on 4 affected PGs, and all affected objects belong to the same RBD image (rb.0.15c26.238e1f29).
> >> >
> >> > On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote:
> >> >> Can you post your ceph.log with the period including all of these errors?
> >> >> -Sam
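With several OSD logs to go through, a small filter can count these errors per PG. Below is a minimal bash sketch, assuming the error lines look exactly like the osd.28.log excerpt quoted above (`... log [ERR] : scrub <pgid> <object id> found clone without head`). The function name is hypothetical, and because every scrub logs the errors again, the counts cover all scrub runs present in the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count "found clone without head" errors per PG.
# Assumes lines of the form seen in osd.28.log in this thread:
#   <date> <time> <thread> 0 log [ERR] : scrub <pgid> <object id> found clone without head
# Reads the given log files, or stdin when none are given.
summarize_clone_errors() {
  awk '/found clone without head/ {
         # The PG id is the token right after the word "scrub".
         for (i = 1; i < NF; i++)
           if ($i == "scrub") { count[$(i + 1)]++; break }
       }
       END { for (pg in count) print pg, count[pg] }' "$@" | LC_ALL=C sort
}
```

Run as `summarize_clone_errors /var/log/ceph/osd.28.log`; on the excerpt above alone it would report `3.1 3` and `3.6b 3`.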
* Re: [ceph-users] scrub error: found clone without head
From: Samuel Just @ 2013-05-23 22:17 UTC
To: Olivier Bonvalet; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

Do all of the affected PGs share osd.28 as the primary? I think the
only recovery is probably to manually remove the orphaned clones.
-Sam

On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> Not yet; I'm keeping it for now.
>
> On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote:
>> rb.0.15c26.238e1f29
>>
>> Has that rbd volume been removed?
>> -Sam

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
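Sam's question can be answered mechanically from the PG status lines. Below is a minimal bash sketch, assuming input lines of the form Olivier quotes in this thread (`pg 3.6b is active+clean+inconsistent, acting [28,23,5]`, which looks like `ceph health detail` output) and assuming the first OSD in the acting set is the primary. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<pgid> osd.<primary>" for each inconsistent PG on stdin.
# Assumption: lines look like "pg 3.6b is active+clean+inconsistent, acting [28,23,5]"
# and the first OSD of the acting set is the primary. Other lines are ignored.
primary_of_inconsistent_pgs() {
  sed -n 's/^pg \([0-9a-f.]*\) is [^,]*inconsistent[^,]*, acting \[\([0-9]*\).*/\1 osd.\2/p'
}
```

Piping the result through `awk '{print $2}' | sort | uniq -c` then shows how many inconsistent PGs each primary holds.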
* Re: [ceph-users] scrub error: found clone without head
From: Olivier Bonvalet @ 2013-05-23 22:27 UTC
To: Samuel Just; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

No:
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]

But I suppose that all of these PGs *had* osd.25 as primary (on the
same host), which is the buggy OSD (now disabled).

Question: is "12d7" in the object path the snapshot id? If so, I don't
have any snapshot with this id for the rb.0.15c26.238e1f29 image.

So, which files should I remove?

Thanks for your help.

On Thursday 23 May 2013 at 15:17 -0700, Samuel Just wrote:
> Do all of the affected PGs share osd.28 as the primary? I think the
> only recovery is probably to manually remove the orphaned clones.
> -Sam
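On the "12d7" question, here is a minimal bash sketch for splitting the object id printed in the scrub errors. It assumes the fields are hash/object/snap/namespace/pool (the fourth field is empty here) and that the snap id is printed in hexadecimal, which to my knowledge is how the OSD formats snapshot ids. Under that assumption 12d7 is snapshot id 4823 in decimal, so that is the number to look for if your snapshot listing shows decimal ids. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode an object id such as
#   ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3
# Assumption: fields are hash/object/snap/namespace/pool, and snap is hex.
# Only valid for clones; a head object would show "head" instead of a number.
decode_scrub_oid() {
  local hash name snap nspace pool
  IFS=/ read -r hash name snap nspace pool <<< "$1"
  printf 'hash=%s object=%s snap=%d pool=%s\n' "$hash" "$name" "$((16#$snap))" "$pool"
}
```

For example, `decode_scrub_oid ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3` prints `hash=ade3c16b object=rb.0.15c26.238e1f29.000000009221 snap=4823 pool=3`.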
* Re: [ceph-users] scrub error: found clone without head
From: Samuel Just @ 2013-05-23 22:53 UTC
To: Olivier Bonvalet; +Cc: Denis Kaganovich, ceph-users@lists.ceph.com, ceph-devel

Can you send the filenames in the PG directories for those 4 PGs?
-Sam

On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> No:
> pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> pg 3.6b is active+clean+inconsistent, acting [28,23,5]
> pg 3.d is active+clean+inconsistent, acting [29,4,11]
> pg 3.1 is active+clean+inconsistent, acting [28,19,5]
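To locate a given clone inside a PG directory, it helps to know how the files are nested. Below is a minimal bash sketch, assuming (as the paths in Olivier's listing for PG 3.6b suggest, e.g. ADE3C16B under DIR_B/DIR_6/DIR_1/DIR_C) that each nesting level is named after one hex digit of the object hash, least significant digit first, and that file names follow the `<object>__<snap>_<HASH>__<pool>` pattern seen in that listing. The nesting depth varies per PG as directories get split, so it is passed in. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, inferred from the FileStore listing of PG 3.6b in this thread.

# hash_to_dir HASH DEPTH: nested DIR_ path, one hex digit per level,
# taken from the least significant end of the (upper-cased) hash.
hash_to_dir() {
  local hash=${1^^} depth=$2 path="" i
  for (( i = 0; i < depth; i++ )); do
    path+="DIR_${hash:${#hash}-1-i:1}/"
  done
  printf '%s\n' "${path%/}"
}

# clone_filename OBJECT SNAP HASH POOL: on-disk name following the listing's pattern.
# The empty fields between double underscores are presumably key and namespace.
# Object names with '_' or other special characters may be escaped; not handled here.
clone_filename() {
  printf '%s__%s_%s__%s\n' "$1" "$2" "${3^^}" "$4"
}
```

For example, `ls -l "/var/lib/ceph/osd/ceph-28/current/3.6b_head/$(hash_to_dir ade3c16b 4)/$(clone_filename rb.0.15c26.238e1f29.000000009221 12d7 ade3c16b 3)"` should point at the first file of that listing, if the assumptions hold.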
* Re: scrub error: found clone without head 2013-05-23 22:53 ` Samuel Just @ 2013-05-31 13:36 ` Olivier Bonvalet 2013-05-31 14:34 ` Olivier Bonvalet 0 siblings, 1 reply; 14+ messages in thread From: Olivier Bonvalet @ 2013-05-31 13:36 UTC (permalink / raw) To: Samuel Just Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel, Denis Kaganovich Hi, sorry for the late answer : trying to fix that, I tried to delete the image (rbd rm XXX), the "rbd rm" complete without errors, but "rbd ls" still display this image. What should I do ? Here the files for the PG 3.6b : # find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l -rw-r--r-- 1 root root 4194304 19 mai 22:52 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 -rw-r--r-- 1 root root 4194304 19 mai 23:00 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 -rw-r--r-- 1 root root 4194304 19 mai 22:59 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 # find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l -rw-r--r-- 1 root root 4194304 25 mars 19:18 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 -rw-r--r-- 1 root root 4194304 25 mars 19:33 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 -rw-r--r-- 1 root root 4194304 25 mars 19:34 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 # find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l -rw-r--r-- 1 root root 4194304 25 mars 19:18 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 -rw-r--r-- 1 root root 4194304 25 mars 19:33 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 -rw-r--r-- 1 root root 4194304 25 mars 19:34 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 As you can see, OSD doesn't contain any other data on thoses PG for this RBD image. Should I remove them thought rados ? In fact I remember that some of thoses files was truncated (size 0), then I manually copy data from osd-5. It was probably an error to do that. Thanks, Olivier Le jeudi 23 mai 2013 à 15:53 -0700, Samuel Just a écrit : > Can you send the filenames in the pg directories for those 4 pgs? > -Sam > > On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > > No : > > pg 3.7c is active+clean+inconsistent, acting [24,13,39] > > pg 3.6b is active+clean+inconsistent, acting [28,23,5] > > pg 3.d is active+clean+inconsistent, acting [29,4,11] > > pg 3.1 is active+clean+inconsistent, acting [28,19,5] > > > > But I suppose that all PG *was* having the osd.25 as primary (on the > > same host), which is (disabled) buggy OSD. > > > > Question : "12d7" in object path is the snapshot id, right ? If it's the > > case, I haven't got any snapshot with this id for the > > rb.0.15c26.238e1f29 image. > > > > So, which files should I remove ? > > > > Thanks for your help. > > > > > > Le jeudi 23 mai 2013 à 15:17 -0700, Samuel Just a écrit : > >> Do all of the affected PGs share osd.28 as the primary? I think the > >> only recovery is probably to manually remove the orphaned clones. > >> -Sam > >> > >> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > >> > Not yet. I keep it for now. 
> >> > > >> > Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit : > >> >> rb.0.15c26.238e1f29 > >> >> > >> >> Has that rbd volume been removed? > >> >> -Sam > >> >> > >> >> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > >> >> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail. > >> >> > > >> >> > > >> >> > Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit : > >> >> >> What version are you running? > >> >> >> -Sam > >> >> >> > >> >> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > >> >> >> > Is it enough ? > >> >> >> > > >> >> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head' > >> >> >> > 2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok > >> >> >> > 2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok > >> >> >> > 2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok > >> >> >> > 2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok > >> >> >> > 2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok > >> >> >> > 2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head > >> >> >> > 2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head > >> >> >> > 2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head > >> >> >> > 2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors > >> >> >> > 2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok > >> >> >> > 2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok > >> >> >> > 2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 
cs=73 l=0).fault with nothing to send, going to standby > >> >> >> > 2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby > >> >> >> > -- > >> >> >> > 2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok > >> >> >> > 2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok > >> >> >> > 2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok > >> >> >> > 2013-05-22 16:35:12.240246 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby > >> >> >> > 2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok > >> >> >> > 2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head > >> >> >> > 2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head > >> >> >> > 2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head > >> >> >> > 2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors > >> >> >> > 2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby > >> >> >> > 2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby > >> >> >> > 2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok > >> >> >> > 2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok > >> >> >> > > >> >> >> > > >> >> >> > Note : I have 8 scrub 
errors like that, on 4 impacted PG, and all impacted objects are about the same RBD image (rb.0.15c26.238e1f29). > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote: > >> >> >> >> Can you post your ceph.log with the period including all of these errors? > >> >> >> >> -Sam > >> >> >> >> > >> >> >> >> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich > >> >> >> >> <mahatma@bspu.unibel.by> wrote: > >> >> >> >> > Olivier Bonvalet wrote: > >> >> >> >> >> > >> >> >> >> >> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote: > >> >> >> >> >>> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote: > >> >> >> >> >>>> I have 4 scrub errors (3 PGs - "found clone without head"), on one OSD. Not > >> >> >> >> >>>> repairing. How to repair it exclude re-creating of OSD? > >> >> >> >> >>>> > >> >> >> >> >>>> Now it "easy" to clean+create OSD, but in theory - in case there are multiple > >> >> >> >> >>>> OSDs - it may cause data lost. > >> >> >> >> >>> > >> >> >> >> >>> I have same problem : 8 objects (4 PG) with error "found clone without > >> >> >> >> >>> head". How can I fix that ? > >> >> >> >> >> since "pg repair" doesn't handle that kind of errors, is there a way to > >> >> >> >> >> manually fix that ? (it's a production cluster) > >> >> >> >> > > >> >> >> >> > Trying to fix manually I cause assertions in trimming process (died OSD). And > >> >> >> >> > many others troubles. So, if you want to keep cluster running, wait for > >> >> >> >> > developers answer. IMHO. > >> >> >> >> > > >> >> >> >> > About manual repair attempt: see issue #4937. Also similar results - in subject > >> >> >> >> > "Inconsistent PG's, repair ineffective".
> >> >> >> >> > > >> >> >> >> > -- > >> >> >> >> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
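The scrub errors quoted above all share one shape, so they can be tallied mechanically, which is how one arrives at "8 errors, 4 PGs, one RBD image". Below is a minimal Python sketch, assuming the `<pgid> <hash>/<object>/<snap>/<namespace>/<pool>` field order visible in these log lines (inferred from this thread, not from a documented log format). It groups "found clone without head" errors by PG, collects the RBD image prefixes involved, and decodes the snapshot id from hex. The function name `group_clone_errors` is hypothetical.

```python
import re
from collections import defaultdict

# Field order inferred from the OSD log lines quoted in this thread:
#   scrub <pgid> <hash>/<object name>/<snap id, hex>/<namespace>/<pool> found clone without head
CLONE_WITHOUT_HEAD = re.compile(
    r"scrub (?P<pg>\d+\.[0-9a-f]+) "
    r"(?P<hash>[0-9a-f]+)/(?P<name>[^/]+)/(?P<snap>[0-9a-f]+)/(?P<ns>[^/]*)/(?P<pool>\d+) "
    r"found clone without head"
)


def group_clone_errors(lines):
    """Return (object names per PG, RBD image prefixes, snap ids as integers)."""
    by_pg = defaultdict(list)
    prefixes = set()
    snap_ids = set()
    for line in lines:
        m = CLONE_WITHOUT_HEAD.search(line)
        if m is None:
            continue  # "scrub ok", "scrub N errors", messenger noise, ...
        name = m.group("name")
        by_pg[m.group("pg")].append(name)
        # Assumed RBD format 1 naming: "<image prefix>.<object number>"
        prefixes.add(name.rsplit(".", 1)[0])
        # The snap field is printed in hex ("12d7")
        snap_ids.add(int(m.group("snap"), 16))
    return dict(by_pg), prefixes, snap_ids


if __name__ == "__main__":
    log = [
        "2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head",
        "2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors",
        "2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head",
        "2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok",
    ]
    by_pg, prefixes, snap_ids = group_clone_errors(log)
    for pg, names in sorted(by_pg.items()):
        print(pg, len(names))
    print(prefixes, snap_ids)
```

Feeding it the full `grep 'found clone without head'` output of each OSD log would give the per-PG counts directly; the summary lines ("3.6b scrub 3 errors") and "scrub ok" lines are skipped by the pattern.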
* Re: scrub error: found clone without head 2013-05-31 13:36 ` Olivier Bonvalet @ 2013-05-31 14:34 ` Olivier Bonvalet 2013-05-31 15:55 ` [solved] " Olivier Bonvalet 0 siblings, 1 reply; 14+ messages in thread From: Olivier Bonvalet @ 2013-05-31 14:34 UTC (permalink / raw) To: Samuel Just Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel, Denis Kaganovich Note that I still have scrub errors, but rados doesn't see those objects: root@brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29' root@brontes:~# On Friday 31 May 2013 at 15:36 +0200, Olivier Bonvalet wrote: > Hi, > > sorry for the late answer : trying to fix that, I tried to delete the > image (rbd rm XXX), the "rbd rm" complete without errors, but "rbd ls" > still display this image. > > What should I do ? > > > Here the files for the PG 3.6b : > > # find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l > -rw-r--r-- 1 root root 4194304 19 mai 22:52 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 > -rw-r--r-- 1 root root 4194304 19 mai 23:00 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 > -rw-r--r-- 1 root root 4194304 19 mai 22:59 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 > > # find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l > -rw-r--r-- 1 root root 4194304 25 mars 19:18 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 > -rw-r--r-- 1 root root 4194304 25 mars 19:33 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 > -rw-r--r-- 1 root root 4194304 25 mars 19:34
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 > > # find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l > -rw-r--r-- 1 root root 4194304 25 mars 19:18 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3 > -rw-r--r-- 1 root root 4194304 25 mars 19:33 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3 > -rw-r--r-- 1 root root 4194304 25 mars 19:34 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3 > > > As you can see, OSD doesn't contain any other data on those PG for this RBD image. Should I remove them through rados ? > > > In fact I remember that some of those files was truncated (size 0), then I manually copy data from osd-5. It was probably an error to do that. > > > Thanks, > Olivier > > On Thursday 23 May 2013 at 15:53 -0700, Samuel Just wrote: > > Can you send the filenames in the pg directories for those 4 pgs? > > -Sam > > > > On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > > > No : > > > pg 3.7c is active+clean+inconsistent, acting [24,13,39] > > > pg 3.6b is active+clean+inconsistent, acting [28,23,5] > > > pg 3.d is active+clean+inconsistent, acting [29,4,11] > > > pg 3.1 is active+clean+inconsistent, acting [28,19,5] > > > > > > But I suppose that all PG *was* having the osd.25 as primary (on the > > > same host), which is (disabled) buggy OSD. > > > > > > Question : "12d7" in object path is the snapshot id, right ? If it's the > > > case, I haven't got any snapshot with this id for the > > > rb.0.15c26.238e1f29 image. > > > > > > So, which files should I remove ? > > > > > > Thanks for your help.
> > > > > > > > > On Thursday 23 May 2013 at 15:17 -0700, Samuel Just wrote: > > >> Do all of the affected PGs share osd.28 as the primary? I think the > > >> only recovery is probably to manually remove the orphaned clones. > > >> -Sam > > >> > > >> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > > >> > Not yet. I keep it for now. > > >> > > > >> > On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote: > > >> >> rb.0.15c26.238e1f29 > > >> >> > > >> >> Has that rbd volume been removed? > > >> >> -Sam > > >> >> > > >> >> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > > >> >> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail. > > >> >> > > > >> >> > > > >> >> > On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote: > > >> >> >> What version are you running? > > >> >> >> -Sam > > >> >> >> > > >> >> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote: > > >> >> >> > Is it enough ?
> > >> >> >> > > > >> >> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head' > > >> >> >> > 2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok > > >> >> >> > 2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok > > >> >> >> > 2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok > > >> >> >> > 2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok > > >> >> >> > 2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok > > >> >> >> > 2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head > > >> >> >> > 2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head > > >> >> >> > 2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head > > >> >> >> > 2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors > > >> >> >> > 2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok > > >> >> >> > 2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok > > >> >> >> > 2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby > > >> >> >> > 2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby > > >> >> >> > -- > > >> >> >> > 2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok > > >> >> >> > 2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok > > >> >> >> > 2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok > > >> >> >> > 2013-05-22 16:35:12.240246 
7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby > > >> >> >> > 2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok > > >> >> >> > 2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head > > >> >> >> > 2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head > > >> >> >> > 2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head > > >> >> >> > 2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors > > >> >> >> > 2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby > > >> >> >> > 2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby > > >> >> >> > 2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok > > >> >> >> > 2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok > > >> >> >> > > > >> >> >> > > > >> >> >> > Note : I have 8 scrub errors like that, on 4 impacted PG, and all impacted objects are about the same RBD image (rb.0.15c26.238e1f29). > > >> >> >> > > > >> >> >> > > > >> >> >> > > > >> >> >> > On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote: > > >> >> >> >> Can you post your ceph.log with the period including all of these errors?
> > >> >> >> >> -Sam > > >> >> >> >> > > >> >> >> >> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich > > >> >> >> >> <mahatma@bspu.unibel.by> wrote: > > >> >> >> >> > Olivier Bonvalet wrote: > > >> >> >> >> >> > > >> >> >> >> >> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote: > > >> >> >> >> >>> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote: > > >> >> >> >> >>>> I have 4 scrub errors (3 PGs - "found clone without head"), on one OSD. Not > > >> >> >> >> >>>> repairing. How to repair it exclude re-creating of OSD? > > >> >> >> >> >>>> > > >> >> >> >> >>>> Now it "easy" to clean+create OSD, but in theory - in case there are multiple > > >> >> >> >> >>>> OSDs - it may cause data lost. > > >> >> >> >> >>> > > >> >> >> >> >>> I have same problem : 8 objects (4 PG) with error "found clone without > > >> >> >> >> >>> head". How can I fix that ? > > >> >> >> >> >> since "pg repair" doesn't handle that kind of errors, is there a way to > > >> >> >> >> >> manually fix that ? (it's a production cluster) > > >> >> >> >> > > > >> >> >> >> > Trying to fix manually I cause assertions in trimming process (died OSD). And > > >> >> >> >> > many others troubles. So, if you want to keep cluster running, wait for > > >> >> >> >> > developers answer. IMHO. > > >> >> >> >> > > > >> >> >> >> > About manual repair attempt: see issue #4937. Also similar results - in subject > > >> >> >> >> > "Inconsistent PG's, repair ineffective".
> > >> >> >> >> > > > >> >> >> >> > -- > > >> >> >> >> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
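Olivier's question about `12d7` can be checked against the PG 3.6b file listings. Below is a minimal Python sketch, assuming the FileStore file names follow the layout `<name>_<key>_<snap>_<HASH>_<namespace>_<pool>` that these names appear to have (empty key and namespace, hence the doubled underscores), and ignoring the escaping FileStore applies to special characters in object names. It decodes the snapshot field from hex (0x12d7 is 4823 in decimal, so a snapshot listing that shows decimal ids would show 4823, not 12d7) and checks that the `DIR_*` nesting reads the object hash from its last hex digit backwards, which is what the listings show (`DIR_B/DIR_6/DIR_1/DIR_C` for hash `ADE3C16B`, the same value the scrub log prints as `ade3c16b`). The helper names are hypothetical.

```python
import posixpath


def decode_filestore_name(filename):
    """Split an object file name into its fields (name escaping is ignored)."""
    name, key, snap, hash_hex, namespace, pool = filename.split("_")
    # "head"/"snapdir" mark non-clone objects; anything else is a clone's snap id in hex
    snap_id = None if snap in ("head", "snapdir") else int(snap, 16)
    return {
        "name": name,
        "key": key,
        "snap": snap,
        "snap_id": snap_id,
        "hash": hash_hex,
        "namespace": namespace,
        "pool": pool,
    }


def expected_hash_dirs(hash_hex, depth):
    """DIR_* components implied by the hash, last hex digit first."""
    return ["DIR_" + digit for digit in reversed(hash_hex.upper())][:depth]


def inspect_object_path(path):
    """Decode an object file path and check its hashed directory nesting."""
    dirname, filename = posixpath.split(path)
    fields = decode_filestore_name(filename)
    parts = dirname.split("/")
    # The PG directory is named "<pgid>_head", e.g. "3.6b_head"
    pg_dirs = [p for p in parts if p.endswith("_head")]
    fields["pg"] = pg_dirs[0][: -len("_head")] if pg_dirs else None
    hash_dirs = [p for p in parts if p.startswith("DIR_")]
    fields["dirs_match_hash"] = hash_dirs == expected_hash_dirs(fields["hash"], len(hash_dirs))
    return fields


if __name__ == "__main__":
    path = ("/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/"
            "rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3")
    print(inspect_object_path(path))
```

The nesting depth is taken from the path itself rather than fixed, since how many `DIR_*` levels exist depends on how often the PG directory has been split.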
* Re: [solved] scrub error: found clone without head 2013-05-31 14:34 ` Olivier Bonvalet @ 2013-05-31 15:55 ` Olivier Bonvalet 0 siblings, 0 replies; 14+ messages in thread From: Olivier Bonvalet @ 2013-05-31 15:55 UTC (permalink / raw) To: Samuel Just Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-devel, Denis Kaganovich OK, so: - after a second "rbd rm XXX", the image was gone - and "rados ls" doesn't list any object from that image - so I tried moving those files => scrub is now OK! So for me it's fixed. Thanks On Friday 31 May 2013 at 16:34 +0200, Olivier Bonvalet wrote: > Note that I still have scrub errors, but rados doesn't see those > objects: > > root@brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29' > root@brontes:~#
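The fix above rests on one check: nothing in the cluster still references the clones before their files are moved. Below is a minimal Python sketch of that check, assuming FileStore file names of the form `<name>_<key>_<snap>_<HASH>_<namespace>_<pool>`, with `head` or `snapdir` in the snap field for non-clone objects (an assumption, not something stated in the thread). The hypothetical helper `orphan_clone_files` reports clone files whose object has neither a `head`/`snapdir` file in the same listing nor an entry in the `rados ls` output. It only reports candidates; it does not move or delete anything.

```python
def orphan_clone_files(filenames, rados_ls_names):
    """Return clone file names whose object has no head/snapdir file and is
    not listed by `rados ls`.

    Assumes names of the form <name>_<key>_<snap>_<HASH>_<ns>_<pool>.
    """
    live = set(rados_ls_names)
    clones = []
    for filename in filenames:
        name, _key, snap, _hash, _ns, _pool = filename.split("_")
        if snap in ("head", "snapdir"):
            # A head or snapdir object exists, so its clones are not orphans
            live.add(name)
        else:
            clones.append((name, filename))
    return sorted(filename for name, filename in clones if name not in live)


if __name__ == "__main__":
    # File names from the PG 3.6b listings in this thread
    pg_3_6b = [
        "rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3",
        "rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3",
        "rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3",
    ]
    # The `rados -p hdd3copies ls | grep ...` shown above printed nothing
    for candidate in orphan_clone_files(pg_3_6b, rados_ls_names=[]):
        print(candidate)
```

With the empty `rados ls` result from the thread, all three PG 3.6b clone files are reported as candidates.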
end of thread, other threads:[~2013-05-31 15:55 UTC | newest]
Thread overview: 14+ messages
[not found] <5188F8D2.5040303@bspu.unibel.by>
[not found] ` <1369001190.9705.37.camel@localhost>
2013-05-22 7:00 ` scrub error: found clone without head Olivier Bonvalet
2013-05-22 12:39 ` Dzianis Kahanovich
[not found] ` <519CBC66.9030607-jC57FVqSskN9IU3jSFkzTg@public.gmane.org>
2013-05-22 18:01 ` Samuel Just
[not found] ` <CA+4uBUYKNnGfH2JnDBmyRy86RovFEOV=pbW2cdhQ2kV0spijXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-05-22 18:25 ` Olivier Bonvalet
2013-05-22 19:00 ` Samuel Just
2013-05-22 19:18 ` [ceph-users] " Olivier Bonvalet
2013-05-22 22:50 ` Samuel Just
[not found] ` <CA+4uBUZcsi5eK31p8t+-7jZaERu40O1LSRW6wJOkfho_FwYXkg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-05-23 12:00 ` Olivier Bonvalet
2013-05-23 22:17 ` [ceph-users] " Samuel Just
2013-05-23 22:27 ` Olivier Bonvalet
2013-05-23 22:53 ` Samuel Just
2013-05-31 13:36 ` Olivier Bonvalet
2013-05-31 14:34 ` Olivier Bonvalet
2013-05-31 15:55 ` [solved] " Olivier Bonvalet