From: Josh Durgin <josh.durgin@dreamhost.com>
To: Henry C Chang <henry.cy.chang@gmail.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Questions about OSD recovery
Date: Wed, 08 Feb 2012 19:14:20 -0800
Message-ID: <4F333A0C.7080902@dreamhost.com>
In-Reply-To: <CANExM-NCijMZX89Bp6wOGkLmkqKeqiqhnMJ8VKKN_QSQXnm7LA@mail.gmail.com>

On 02/07/2012 06:54 PM, Henry C Chang wrote:
> Hi all,
>
> I did some experiments on the OSD and had some questions about it.
>
> I removed one object directly from the osd data store. As expected,
> the osd didn't notice it until I manually scrubbed the pg. However,
> the scrubbing doesn't trigger the recovery automatically. I had to do
> 'ceph pg repair' to fix it.
>
> So, my first question is: can the recovery process be triggered
> automatically once the scrubbing has detected the inconsistency?

It would be possible to run the current repair code
automatically, but that would be a bad idea. Repair just treats
the first osd that has the object (checking the primary before
the replicas) as authoritative, and copies that version to all
the relevant osds. If the primary has a corrupt copy, the
corruption will spread to the other osds. In your case, since you
removed the object entirely, repair could correct it.
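
To make the failure mode concrete, here is a rough Python sketch
of that "first osd that has the object wins" policy. It is not
the actual OSD code: the dict-per-osd layout and the name
naive_repair are made up for illustration.

```python
def naive_repair(stores):
    """Mimic 'first osd that has the object wins'.

    stores: list of dicts mapping object name -> bytes, ordered with
    the primary first and the replicas after it (hypothetical layout).
    Every store is overwritten in place with the chosen copies.
    """
    names = set()
    for store in stores:
        names.update(store)

    chosen = {}
    for name in names:
        for store in stores:          # primary is checked first
            if name in store:
                chosen[name] = store[name]
                break                 # no comparison with other copies

    for store in stores:
        store.update(chosen)
    return chosen


# Case 1: the object was deleted from the primary; a replica supplies it.
primary, replica = {}, {"obj": b"good"}
naive_repair([primary, replica])
print(primary["obj"])    # b'good'

# Case 2: the primary's copy is corrupt; it overwrites the good replica.
primary, replica = {"obj": b"corrupt"}, {"obj": b"good"}
naive_repair([primary, replica])
print(replica["obj"])    # b'corrupt'
```

The second case is why running this automatically after every
scrub would be risky: nothing in the policy checks whether the
chosen copy is the good one.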

In general, if an object is corrupted, there's currently no way
to tell which copy is correct. You could use btrfs checksumming
underneath the osd to protect against this, but the osds don't
checksum the objects themselves. Scrub/repair could certainly be
a lot smarter. It's been on the todo list for a while, but we
haven't gotten to it yet.
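
For illustration only, this is roughly what a smarter repair could
do if a digest were recorded for each object at write time. The
osds do not do this today; the recorded digest, the choice of
SHA-256, and the function names are all assumptions of the sketch.

```python
import hashlib


def digest(data):
    """Hex SHA-256 of an object's contents."""
    return hashlib.sha256(data).hexdigest()


def find_good_copies(copies, recorded):
    """Return the osds whose copy matches the recorded digest.

    copies: dict mapping an osd name to the object's bytes, or None
    if that osd has no copy (hypothetical layout).
    recorded: digest assumed to have been stored at write time.
    """
    return [osd for osd, data in copies.items()
            if data is not None and digest(data) == recorded]


recorded = digest(b"payload")
copies = {"osd.0": b"corrupt", "osd.1": b"payload", "osd.2": None}
print(find_good_copies(copies, recorded))    # ['osd.1']
```

With something like this, repair could copy from an osd in the
returned list instead of blindly trusting the primary.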

> Then, I tried again and removed another object. But this time, I
> didn't scrub the pg. I restarted the osd. As expected, the osd didn't
> notice that, either.
>
> My second question is: is it possible to check the existence of the
> objects when scanning the pg during osd startup? Does it make sense to
> do so?

Detecting missing objects on startup is possible by looking at
the pg log and comparing it to the objects on disk, but this can
be a pretty expensive operation. The osd might also be out of
date, so its log might be useless (for example, it could have
divergent history that was not acked). The osd can't know how
many objects that should be there are missing until it goes
through peering (to get an up-to-date and authoritative log) and
recovery (to get missing data the logs say should be there). This
is why scrub skips pgs that aren't active+clean. More details of
peering can be found at http://ceph.newdream.net/docs/latest/dev/peering/.
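
As a simplified sketch of the log-versus-disk comparison (not the
real pg log format): assume the log is an ordered list of
(op, object name) pairs, where op is "modify" or "delete". Those
op names and both function names are hypothetical.

```python
def expected_objects(log):
    """Replay a simplified log and return the names that should exist."""
    live = set()
    for op, name in log:
        if op == "modify":
            live.add(name)
        elif op == "delete":
            live.discard(name)
    return live


def missing_on_disk(log, on_disk):
    """Names the log says should exist but that are not on disk."""
    return sorted(expected_objects(log) - set(on_disk))


log = [("modify", "a"), ("modify", "b"), ("delete", "a"), ("modify", "c")]
print(missing_on_disk(log, {"b"}))    # ['c']
```

The answer is only as good as the log fed in. If the osd's local
log is stale or divergent, the result is wrong too, which is why
the check only makes sense after peering.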
