From: Olivier Bonvalet <ceph.list-PaEMFeTk6C1QFI55V6+gNQ@public.gmane.org>
To: Samuel Just <sam.just-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org>
Cc: "ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org"
	<ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>,
	ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Denis Kaganovich <mahatma-cw37gHAUgAY@public.gmane.org>
Subject: Re: scrub error: found clone without head
Date: Fri, 31 May 2013 15:36:10 +0200
Message-ID: <1370007370.2951.13.camel@localhost>
In-Reply-To: <CA+4uBUbMu-iJ8EPqaVe9smL0Ri2jT86puviFoc3b5di2Uop==A@mail.gmail.com>

Hi,

Sorry for the late answer. While trying to fix this, I tried to delete the
image (rbd rm XXX): "rbd rm" completed without errors, but "rbd ls"
still displays this image.

What should I do?


Here are the files for PG 3.6b:

# find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 19 mai   22:52 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 19 mai   23:00 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 19 mai   22:59 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.000000003671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.0000000086a2__12d7_B10DEAEB__3
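The file names in the listing above appear to carry the object's placement hash, and the DIR_* nesting seems to be that hash read backwards one hex digit per level. The bash sketch below checks that reading. It extracts the hash from a file name, rebuilds the expected DIR_* path, and maps the hash to a PG with the stable-mod rule Ceph uses for pg_num. The file-name layout is inferred from these three names, and pg_num = 128 for pool 3 is a hypothetical value, not one taken from this cluster. The function names are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS, inferred from the listing above rather than from
# Ceph documentation:
#   * FileStore file names look like <object>__<snap>_<HASH>__<pool>
#   * each DIR_<x> level is one hex digit of HASH, starting from the last one
#   * pool 3 has pg_num = 128 (hypothetical; check the real value in the osdmap)

# Print the 8-hex-digit hash field of a FileStore file name.
hash_of() {
  local f=${1##*/}          # drop the directory part
  f=${f%__*}                # drop the trailing "__<pool>"
  printf '%s\n' "${f##*_}"  # the last "_"-separated field is the hash
}

# Print the DIR_ nesting expected for a hash: one level per hex digit,
# reading the digits from the end of the hash (default depth: 4 levels).
dir_path_for_hash() {
  local h=$1 depth=${2:-4} out="" i
  for (( i = ${#h} - 1; i >= ${#h} - depth; i-- )); do
    out+="DIR_${h:i:1}/"
  done
  printf '%s\n' "${out%/}"
}

# Map a hash to a PG seed the way ceph_stable_mod() does: mask with
# (smallest power of two >= pg_num) - 1, and if that result is still
# >= pg_num, use half of that mask instead.
pg_of_hash() {
  local x=$(( 16#$1 )) b=$2 bmask=1
  while (( bmask < b )); do bmask=$(( bmask << 1 )); done
  bmask=$(( bmask - 1 ))
  if (( (x & bmask) < b )); then
    printf '%x\n' $(( x & bmask ))
  else
    printf '%x\n' $(( x & (bmask >> 1) ))
  fi
}

f=/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
h=$(hash_of "$f")
echo "hash=$h dir=$(dir_path_for_hash "$h") pg=3.$(pg_of_hash "$h" 128)"
```

For the first osd.28 file, this prints "hash=ADE3C16B dir=DIR_B/DIR_6/DIR_1/DIR_C pg=3.6b", which agrees with the path and PG shown above. Under the same pg_num assumption, the other two hashes (261CC0EB, B10DEAEB) also land in 3.6b.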


As you can see, the OSDs don't contain any other data in those PGs for this RBD image. Should I remove these files through rados?


In fact, I remember that some of those files were truncated (size 0), so I manually copied the data from osd-5. That was probably a mistake.
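Some replicas were restored by hand, so it is worth confirming that the copies of each object now agree across the acting set (osd.28, osd.23 and osd.5 for PG 3.6b). Below is a minimal sketch that assumes GNU findutils and coreutils on each OSD host. It builds a sorted manifest of relative path, size and MD5 for the image's files in one PG directory, and the manifests are then compared with diff. The function name and output file names are hypothetical. The sketch compares file contents only, not extended attributes, so it cannot show whether Ceph's own metadata for these objects is consistent.

```shell
#!/usr/bin/env bash
# Sketch: build a comparable manifest of one image's files in a PG directory,
# so that replicas on different OSD hosts can be diffed.
# ASSUMPTIONS: GNU find/sort/stat/md5sum are available; output fields are
# space-separated, so names containing spaces would be ambiguous (the rb.*
# names in this thread have none).

# Print "<path relative to the PG dir> <size in bytes> <md5>" for every file
# whose name matches the glob, sorted by path.
pg_manifest() {
  local dir=$1 pattern=$2
  ( cd "$dir" || exit 1
    find . -type f -name "$pattern" -print0 | sort -z |
      while IFS= read -r -d '' f; do
        printf '%s %s %s\n' "${f#./}" "$(stat -c %s "$f")" \
          "$(md5sum < "$f" | cut -d' ' -f1)"
      done )
}

# Usage (hypothetical output file names), run on each OSD host:
#   pg_manifest /var/lib/ceph/osd/ceph-28/current/3.6b_head 'rb.0.15c26.238e1f29*' > osd28.txt
#   pg_manifest /var/lib/ceph/osd/ceph-5/current/3.6b_head 'rb.0.15c26.238e1f29*' > osd5.txt
# then copy the manifests to one machine and compare them:
#   diff osd28.txt osd5.txt && echo "replicas identical"
```

A truncated copy shows up in the manifest with a size of 0 and a different checksum. That makes it visible in the diff even when the file names match on every OSD.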


Thanks,
Olivier

On Thursday 23 May 2013 at 15:53 -0700, Samuel Just wrote:
> Can you send the filenames in the pg directories for those 4 pgs?
> -Sam
> 
> On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > No:
> > pg 3.7c is active+clean+inconsistent, acting [24,13,39]
> > pg 3.6b is active+clean+inconsistent, acting [28,23,5]
> > pg 3.d is active+clean+inconsistent, acting [29,4,11]
> > pg 3.1 is active+clean+inconsistent, acting [28,19,5]
> >
> > But I suspect that all of these PGs *had* osd.25 as primary (on the
> > same host), which is the buggy (now disabled) OSD.
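Sam's question (whether all affected PGs share osd.28 as primary) can be answered directly from the acting sets listed above. Here is a small sketch that assumes the first OSD in each acting set is the primary. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "<pgid> osd.<N>" for each line shaped like
#   pg 3.6b is active+clean+inconsistent, acting [28,23,5]
# ASSUMPTION: the first OSD of the acting set is the primary.
pg_primaries() {
  sed -n 's/^pg \([0-9a-f]*\.[0-9a-f]*\) is .*acting \[\([0-9]*\).*/\1 osd.\2/p'
}

# The four inconsistent PGs listed above:
pg_primaries <<'EOF'
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
EOF
```

This gives osd.24, osd.28, osd.29 and osd.28 as primaries, so the four PGs do not all currently share osd.28. It only reflects the current acting sets, not which OSD was primary when the clones were orphaned.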
> >
> > Question: "12d7" in the object path is the snapshot id, right? If so,
> > I don't have any snapshot with this id for the
> > rb.0.15c26.238e1f29 image.
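If "12d7" is indeed the snapshot id, it is written in hexadecimal. Snapshot listings may show ids in decimal (we assume this for "rbd snap ls", but have not verified it here), so a direct text comparison can miss a match. The sketch below splits a file name from the listing into its parts and converts the snap field to decimal. The layout is inferred from the names in this thread, the "head" case is a presumption, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a FileStore file name into object name, snap id and pool.
# ASSUMED layout, inferred from the file names in this thread:
#   <object>__<snap>_<HASH>__<pool>
# where <snap> is hex for a clone (head objects presumably show "head" here).
split_filestore_name() {
  local f=${1##*/} obj rest snap pool dec
  obj=${f%%__*}     # everything before the first "__"
  rest=${f#*__}     # "<snap>_<HASH>__<pool>"
  snap=${rest%%_*}  # "<snap>"
  pool=${f##*__}    # everything after the last "__"
  if [[ $snap =~ ^[0-9a-fA-F]+$ ]]; then
    dec=$(( 16#$snap ))   # hex snap id -> decimal
  else
    dec=$snap             # e.g. "head": not a number
  fi
  printf 'object=%s snap=%s (decimal %s) pool=%s\n' "$obj" "$snap" "$dec" "$pool"
}

split_filestore_name rb.0.15c26.238e1f29.000000009221__12d7_ADE3C16B__3
```

For this name, the sketch prints "object=rb.0.15c26.238e1f29.000000009221 snap=12d7 (decimal 4823) pool=3". The value to look for among the image's snapshot ids would therefore be 4823, if the ids are listed in decimal.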
> >
> > So, which files should I remove?
> >
> > Thanks for your help.
> >
> >
> > On Thursday 23 May 2013 at 15:17 -0700, Samuel Just wrote:
> >> Do all of the affected PGs share osd.28 as the primary?  I think the
> >> only recovery is probably to manually remove the orphaned clones.
> >> -Sam
> >>
> >> On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> > Not yet. I'm keeping it for now.
> >> >
> >> > On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote:
> >> >> rb.0.15c26.238e1f29
> >> >>
> >> >> Has that rbd volume been removed?
> >> >> -Sam
> >> >>
> >> >> On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> >> > 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
> >> >> >
> >> >> >
> >> >> > On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote:
> >> >> >> What version are you running?
> >> >> >> -Sam
> >> >> >>
> >> >> >> On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >> >> >> > Is this enough?
> >> >> >> >
> >> >> >> > # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
> >> >> >> > 2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
> >> >> >> > 2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
> >> >> >> > 2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
> >> >> >> > 2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
> >> >> >> > 2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
> >> >> >> > 2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.000000009221/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.000000003671/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.0000000086a2/12d7//3 found clone without head
> >> >> >> > 2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 errors
> >> >> >> > 2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
> >> >> >> > 2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
> >> >> >> > 2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
> >> >> >> > 2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
> >> >> >> > --
> >> >> >> > 2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
> >> >> >> > 2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
> >> >> >> > 2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
> >> >> >> > 2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
> >> >> >> > 2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
> >> >> >> > 2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.000000000836/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.000000009a00/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.000000009820/12d7//3 found clone without head
> >> >> >> > 2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
> >> >> >> > 2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
> >> >> >> > 2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
> >> >> >> > 2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
> >> >> >> > 2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
> >> >> >> >
> >> >> >> >
> >> >> >> > Note: I have 8 scrub errors like these, on 4 impacted PGs, and all impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
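With errors spread over several PGs, it can help to reduce the log to one line per orphaned clone. The sed-based sketch below extracts PG, object name and snap id from lines shaped like the ERR lines above. The field layout of the object is inferred from this excerpt only, and the function name is hypothetical. Unlike the "tail -f" command above, it reads a whole log and then terminates.

```shell
#!/usr/bin/env bash
# Sketch: list "found clone without head" errors from an OSD log on stdin as
#   <pg> <object> <snap>
# ASSUMPTION: objects are printed as <hash>/<name>/<snap>/<...>/<pool>, as in
# the log lines above.
orphan_clones() {
  sed -n 's|.*scrub \([0-9a-f]*\.[0-9a-f]*\) [0-9a-f]*/\([^/]*\)/\([^/]*\)/.* found clone without head.*|\1 \2 \3|p'
}

# Count the errors per PG (log path as in the command above):
#   orphan_clones < /var/log/ceph/osd.28.log | cut -d' ' -f1 | sort | uniq -c
```

Summary lines such as "3.6b scrub 3 errors" and "scrub ok" lines do not match the pattern, so each output line corresponds to one orphaned clone.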
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote:
> >> >> >> >> Can you post your ceph.log with the period including all of these errors?
> >> >> >> >> -Sam
> >> >> >> >>
> >> >> >> >> On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
> >> >> >> >> <mahatma@bspu.unibel.by> wrote:
> >> >> >> >> > Olivier Bonvalet wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
> >> >> >> >> >>> On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
> >> >> >> >> >>>> I have 4 scrub errors (3 PGs, "found clone without head") on one OSD. Repair
> >> >> >> >> >>>> does not fix them. How can I repair this without re-creating the OSD?
> >> >> >> >> >>>>
> >> >> >> >> >>>> For now it is "easy" to clean and re-create the OSD, but in theory, if several
> >> >> >> >> >>>> OSDs are involved, that may cause data loss.
> >> >> >> >> >>>
> >> >> >> >> >>> I have the same problem: 8 objects (4 PGs) with the error "found clone without
> >> >> >> >> >>> head". How can I fix that?
> >> >> >> >> >> Since "pg repair" doesn't handle that kind of error, is there a way to
> >> >> >> >> >> fix it manually? (It's a production cluster.)
> >> >> >> >> >
> >> >> >> >> > When I tried to fix it manually, I triggered assertions in the trimming process (the OSD
> >> >> >> >> > died), along with many other problems. So, if you want to keep the cluster running, wait
> >> >> >> >> > for the developers' answer. IMHO.
> >> >> >> >> >
> >> >> >> >> > About my manual repair attempt: see issue #4937. Similar results are also reported in the
> >> >> >> >> > thread "Inconsistent PG's, repair ineffective".
> >> >> >> >> >
> >> >> >> >> > --
> >> >> >> >> > WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
> >> >> >> >> > _______________________________________________
> >> >> >> >> > ceph-users mailing list
> >> >> >> >> > ceph-users@lists.ceph.com
> >> >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> --
> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
> >
> 


