From: Martin Mailand
Subject: Re: Cluster sync doesn't finish
Date: Mon, 05 Dec 2011 13:44:10 +0100
Message-ID: <4EDCBC9A.1070107@tuxadero.com>
References: <4EC57338.9040004@tuxadero.com> <4EC6BD55.5050709@tuxadero.com>
To: Samuel Just
Cc: ceph-devel@vger.kernel.org

Hi Sam,

Is there anything new on this issue that I could test?

-martin

On 19.11.2011 02:05, Samuel Just wrote:
> I've filed this bug as #1738. Unfortunately, this will take a bit
> of effort to fix. In the short term, you could switch to a crushmap
> where each node at the bottom level of the hierarchy contains more
> than one device (i.e., remove the node level and stop at the rack
> level).
>
> Thanks for the help!
> -Sam
>
> On Fri, Nov 18, 2011 at 12:17 PM, Martin Mailand wrote:
>> Hi Sam,
>>
>> here is the crushmap:
>>
>> http://85.214.49.87/ceph/crushmap.txt
>> http://85.214.49.87/ceph/crushmap
>>
>> -martin
>>
>> Samuel Just wrote:
>>>
>>> It looks like a crushmap-related problem. Could you send us the crushmap?
>>>
>>> ceph osd getcrushmap
>>>
>>> Thanks
>>> -Sam
>>>
>>> On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum wrote:
>>>>
>>>> On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen wrote:
>>>>>
>>>>> On Thu, Nov 17, 2011 at 12:48, Martin Mailand wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I am doing a cluster failure test, where I shut down one OSD and
>>>>>> wait for the cluster to sync. But the sync never finishes; it stops
>>>>>> at around 4-5%. I stopped osd2.
>>>>>
>>>>> ...
>>>>>>
>>>>>> 2011-11-17 16:42:45.520740 pg v1337: 600 pgs: 547 active+clean, 53
>>>>>> active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB
>>>>>> avail; 4025/82404 degraded (4.884%)
>>>>>
>>>>> ...
>>>>>>
>>>>>> The OSD log, ceph.conf, pg dump, and osd dump can be found here:
>>>>>>
>>>>>> http://85.214.49.87/ceph/
>>>>>
>>>>> This looks a bit worrying:
>>>>>
>>>>> 2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424 >>
>>>>> 192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
>>>>> claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
>>>>> wrong node!
>>>>>
>>>>> So osd.0 is basically refusing to talk to one of the other OSDs. I
>>>>> don't understand the messenger well enough to know why this would be,
>>>>> but it wouldn't surprise me if this problem kept the objects degraded;
>>>>> it looks like a breakage in the osd<->osd communication.
>>>>>
>>>>> Now, if this was the reason, I'd expect a restart of all the OSDs to
>>>>> get it back in shape, since messenger state is ephemeral. Can you
>>>>> confirm that?
>>>>
>>>> Probably not: that "wrong node" message can occur for a lot of different
>>>> reasons, some of which matter and most of which don't. Sam's looking
>>>> into the problem; something is going wrong with the CRUSH
>>>> calculations or the monitor PG placement overrides or something...
>>>> -Greg
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
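Sam's short-term workaround is a crushmap in which every bucket at the bottom level of the hierarchy holds more than one device, for example by removing the node level so racks hold the devices directly. The Python sketch below is a hypothetical helper, not part of Ceph: it models the hierarchy as a plain dict mapping each bucket name to its children (an assumption, not the real crushmap format), treats names starting with "osd." as devices, and lists the bottom-level buckets that contain exactly one device, i.e. the layout the workaround moves away from. The two example layouts are made up for illustration and are not the cluster from this thread.

```python
from typing import Dict, List

# Hypothetical, simplified model of a CRUSH hierarchy: each bucket name maps
# to the names of its children. Names starting with "osd." are devices, any
# other name is a bucket. This is an illustration, not the crushmap format.
Hierarchy = Dict[str, List[str]]


def is_device(name: str) -> bool:
    """Treat "osd.N" entries as devices (an assumption of this sketch)."""
    return name.startswith("osd.")


def single_device_leaf_buckets(tree: Hierarchy) -> List[str]:
    """Return, sorted, the buckets whose only child is a single device."""
    flagged = []
    for bucket, children in tree.items():
        # A bucket with exactly one child that is a device is a bottom-level
        # bucket holding only one device.
        if len(children) == 1 and is_device(children[0]):
            flagged.append(bucket)
    return sorted(flagged)


# Hypothetical layout with a node level: one device per node bucket.
with_node_level: Hierarchy = {
    "root": ["rack1", "rack2"],
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
    "node1": ["osd.0"],
    "node2": ["osd.1"],
    "node3": ["osd.2"],
    "node4": ["osd.3"],
}

# Hypothetical workaround layout: node level removed, racks hold the devices.
rack_level_only: Hierarchy = {
    "root": ["rack1", "rack2"],
    "rack1": ["osd.0", "osd.1"],
    "rack2": ["osd.2", "osd.3"],
}

if __name__ == "__main__":
    print(single_device_leaf_buckets(with_node_level))
    # -> ['node1', 'node2', 'node3', 'node4']
    print(single_device_leaf_buckets(rack_level_only))
    # -> []
```

An empty result for the rack-level layout is what the workaround is aiming for: no bottom-level bucket is left holding a single device.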