Re: Cluster sync doesn't finsh


From: Martin Mailand <martin@tuxadero.com>
To: Samuel Just <sam.just@dreamhost.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Cluster sync doesn't finsh
Date: Mon, 05 Dec 2011 13:44:10 +0100
Message-ID: <4EDCBC9A.1070107@tuxadero.com>
In-Reply-To: <CACLRD_3S1xfBE0v=vX5hQcLoBSd_EJVsUvLK8R=oRCBu2fSz4w@mail.gmail.com>

Hi Sam,
Is there anything new on this issue that I could test?

-martin


On 19.11.2011 02:05, Samuel Just wrote:
> I've inserted this bug as #1738.  Unfortunately, this will take a bit
> of effort to fix.  In the short term, you could switch to a crushmap
> where each node at the bottom level of the hierarchy contains more
> than one device.  (i.e., remove the node level and stop at the rack
> level).
>
> Thanks for the help!
> -Sam
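
For anyone trying the workaround above, here is a rough sketch of how one might list the buckets that hold fewer than two items, which is the condition Sam describes. It assumes the crushmap has already been decompiled to text (for example with `crushtool -d`) and that buckets look like `type name { ... item X weight W ... }`. The `SAMPLE` text and the helper name `single_item_buckets` are made up for illustration; this is not the map from this thread, and the script does not reproduce bug #1738.

```python
import re

# Hypothetical decompiled crushmap fragment, NOT the map from this thread.
SAMPLE = """
host node0 {
    id -2
    alg straw
    hash 0
    item osd.0 weight 1.000
}
host node1 {
    id -3
    alg straw
    hash 0
    item osd.1 weight 1.000
}
rack rack0 {
    id -4
    alg straw
    hash 0
    item node0 weight 1.000
    item node1 weight 1.000
}
"""

# "<type> <name> {" at the start of a line, up to a "}" at the start of a line.
BUCKET_RE = re.compile(r"^(\w+)\s+(\S+)\s*\{(.*?)^\}", re.MULTILINE | re.DOTALL)
ITEM_RE = re.compile(r"^\s*item\s+(\S+)", re.MULTILINE)


def single_item_buckets(text):
    """Return (type, name) for every bucket that holds fewer than two items."""
    found = []
    for btype, name, body in BUCKET_RE.findall(text):
        if btype == "rule":  # rules share the brace syntax but are not buckets
            continue
        if len(ITEM_RE.findall(body)) < 2:
            found.append((btype, name))
    return found


print(single_item_buckets(SAMPLE))
```

In the sample, both host buckets are reported because each contains a single device, while the rack bucket is not.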
>
> On Fri, Nov 18, 2011 at 12:17 PM, Martin Mailand<martin@tuxadero.com>  wrote:
>> Hi Sam,
>>
>> here is the crushmap:
>>
>> http://85.214.49.87/ceph/crushmap.txt
>> http://85.214.49.87/ceph/crushmap
>>
>> -martin
>>
>> Samuel Just wrote:
>>>
>>> It looks like a crushmap related problem.  Could you send us the crushmap?
>>>
>>> ceph osd getcrushmap
>>>
>>> Thanks
>>> -Sam
>>>
>>> On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum
>>> <gregory.farnum@dreamhost.com>  wrote:
>>>>
>>>> On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen
>>>> <tommi.virtanen@dreamhost.com>  wrote:
>>>>>
>>>>> On Thu, Nov 17, 2011 at 12:48, Martin Mailand<martin@tuxadero.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I am doing a cluster failure test, where I shut down one OSD and wait
>>>>>> for the cluster to sync. But the sync never finished; it stops at
>>>>>> around 4-5%. I stopped osd2.
>>>>>
>>>>> ...
>>>>>>
>>>>>> 2011-11-17 16:42:45.520740    pg v1337: 600 pgs: 547 active+clean, 53
>>>>>> active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB
>>>>>> avail;
>>>>>> 4025/82404 degraded (4.884%)
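
As a quick sanity check on the numbers in that status line, the sketch below parses it and recomputes the PG total and the degraded percentage. The line is copied from the report above with the mail wraps joined; the helper name `parse_status` is only illustrative, and the parsing assumes exactly this layout of the line.

```python
import re

# Status line as pasted in the quoted report (mail line wraps joined).
STATUS = (
    "pg v1337: 600 pgs: 547 active+clean, 53 active+clean+degraded; "
    "113 GB data, 184 GB used, 1141 GB / 1395 GB avail; "
    "4025/82404 degraded (4.884%)"
)


def parse_status(line):
    """Return (total_pgs, {state: count}, degraded_objects, total_objects)."""
    total_pgs = int(re.search(r"(\d+) pgs:", line).group(1))
    # The per-state counts sit between "pgs:" and the first ";".
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for chunk in states_part.split(","):
        count, state = chunk.split()
        states[state] = int(count)
    degraded, total = map(int, re.search(r"(\d+)/(\d+) degraded", line).groups())
    return total_pgs, states, degraded, total


total_pgs, states, degraded, total = parse_status(STATUS)
ratio = 100.0 * degraded / total
print(total_pgs, sum(states.values()), f"{ratio:.3f}%")
```

The per-state counts add up to the reported 600 PGs, and 4025 of 82404 objects works out to the 4.884% shown in the line.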
>>>>>
>>>>> ...
>>>>>>
>>>>>> The OSD log, ceph.conf, pg dump, and osd dump can be found here:
>>>>>>
>>>>>> http://85.214.49.87/ceph/
>>>>>
>>>>> This looks a bit worrying:
>>>>>
>>>>> 2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424>>
>>>>> 192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
>>>>> claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
>>>>> wrong node!
>>>>>
>>>>> So osd.0 is basically refusing to talk to one of the other OSDs. I
>>>>> don't understand the messenger well enough to know why this would be,
>>>>> but it wouldn't surprise me if this problem kept the objects degraded
>>>>> -- it looks like a breakage in the osd<->osd communication.
>>>>>
>>>>> Now if this was the reason, I'd expect a restart of all the OSDs to
>>>>> get it back in shape; messenger state is ephemeral. Can you confirm
>>>>> that?
>>>>
>>>> Probably not — that wrong node thing can occur for a lot of different
>>>> reasons, some of which matter and most of which don't. Sam's looking
>>>> into the problem; there's something going wrong with the CRUSH
>>>> calculations or the monitor PG placement overrides or something...
>>>> -Greg
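
To see what actually differs in the "wrong node" line quoted above, the following sketch pulls out the two addresses and compares them field by field. The log text is taken from the thread with the mail wraps joined; the helper name `compare_claim` is hypothetical. Reading the number after the slash as a per-process nonce, so that a mismatch would fit a restarted peer, is an assumption on my part and not something the thread confirms.

```python
import re

# The log line quoted earlier in the thread, with the mail line wraps joined.
LOG = (
    "2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424>> "
    "192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect "
    "claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 - "
    "wrong node!"
)

# ip:port/number, where the trailing number is assumed to be a nonce.
ADDR = r"(\d+\.\d+\.\d+\.\d+):(\d+)/(\d+)"
CLAIM_RE = re.compile(rf"claims to be {ADDR} not {ADDR}")


def compare_claim(line):
    """Compare the claimed and expected peer addresses, or return None."""
    m = CLAIM_RE.search(line)
    if m is None:
        return None
    fields = m.groups()
    claimed, expected = fields[:3], fields[3:]
    return {
        "same_ip": claimed[0] == expected[0],
        "same_port": claimed[1] == expected[1],
        "same_nonce": claimed[2] == expected[2],
    }


print(compare_claim(LOG))
```

For this line the IP and port match and only the trailing number differs (21507 versus 21115).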
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>



Thread overview: 7+ messages
2011-11-17 20:48 Cluster sync doesn't finsh Martin Mailand
2011-11-18 18:05 ` Tommi Virtanen
2011-11-18 18:13   ` Gregory Farnum
2011-11-18 19:18     ` Samuel Just
2011-11-18 20:17       ` Martin Mailand
2011-11-19  1:05         ` Samuel Just
2011-12-05 12:44           ` Martin Mailand [this message]
