From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wido den Hollander
Subject: Re: cuttlefish countdown -- OSD doesn't get marked out
Date: Thu, 25 Apr 2013 14:56:06 +0200
Message-ID: <517927E6.2000903@42on.com>
References: <51791C83.3010403@tuxadero.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from websrv.42on.com ([31.25.102.167]:33293 "EHLO websrv.42on.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756155Ab3DYM4K (ORCPT ); Thu, 25 Apr 2013 08:56:10 -0400
In-Reply-To: <51791C83.3010403@tuxadero.com>
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Martin Mailand
Cc: Sage Weil , ceph-devel@vger.kernel.org

On 04/25/2013 02:07 PM, Martin Mailand wrote:
> Hi,
>
> If I shut down an OSD, the OSD gets marked down after 20 seconds; after
> 300 seconds the OSD should get marked out, and the cluster should resync.
> But that doesn't happen: the OSD stays in the status down/in forever,
> therefore the cluster stays degraded forever.
> I can reproduce it with a newly installed cluster.
>
> If I manually set the OSD out (ceph osd out 1), the cluster resync
> starts immediately.
>

Could you dump your osdmap? The first 10 lines would be interesting.

There is a flag, "noout", that keeps OSDs from being marked out
automatically. Could it be that this flag is set?

Wido

> I think that's a release-critical bug, because the cluster health is not
> automatically recovered.
>
> And I reported this behavior a while ago
> http://article.gmane.org/gmane.comp.file-systems.ceph.user/603/
>
> -martin
>
>
> Log:
>
>
> root@store1:~# ceph -s
>    health HEALTH_OK
>    monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
>    osdmap e204: 24 osds: 24 up, 24 in
>    pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
>    mdsmap e1: 0/0/1 up
>
> root@store1:~# ceph --version
> ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
> root@store1:~# /etc/init.d/ceph stop osd.1
> === osd.1 ===
> Stopping Ceph osd.1 on store1...bash: warning: setlocale: LC_ALL: cannot change locale (en_GB.utf8)
> kill 5492...done
> root@store1:~# ceph -s
>    health HEALTH_OK
>    monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
>    osdmap e204: 24 osds: 24 up, 24 in
>    pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB used, 173 TB / 174 TB avail
>    mdsmap e1: 0/0/1 up
>
> root@store1:~# date -R
> Thu, 25 Apr 2013 13:09:54 +0200
>
>
>
> root@store1:~# ceph -s && date -R
>    health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
>    monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
>    osdmap e206: 24 osds: 23 up, 24 in
>    pgmap v106715: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
>    mdsmap e1: 0/0/1 up
>
> Thu, 25 Apr 2013 13:10:14 +0200
>
>
> root@store1:~# ceph -s && date -R
>    health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery 10999/269486 degraded (4.081%); 1/24 in osds are down
>    monmap e1: 3 mons at {a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0}, election epoch 82, quorum 0,1,2 a,b,c
>    osdmap e206: 24 osds: 23 up, 24 in
>    pgmap v106719: 5056 pgs: 4633 active+clean, 423 active+degraded; 526 GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
>    mdsmap e1: 0/0/1 up
>
> Thu, 25 Apr 2013 13:23:01 +0200
>
> On 25.04.2013 01:46, Sage Weil wrote:
>> Hi everyone-
>>
>> We are down to a handful of urgent bugs (3!) and a cuttlefish release date
>> that is less than a week away. Thank you to everyone who has been
>> involved in coding, testing, and stabilizing this release. We are close!
>>
>> If you would like to test the current release candidate, your efforts
>> would be much appreciated! For deb systems, you can do
>>
>>  wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' | sudo apt-key add -
>>  echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/next $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
>>
>> For rpm users you can find packages at
>>
>>  http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/next/
>>  http://gitbuilder.ceph.com/ceph-rpm-fc17-x86_64-basic/ref/next/
>>  http://gitbuilder.ceph.com/ceph-rpm-fc18-x86_64-basic/ref/next/
>>
>> A draft of the release notes is up at
>>
>>  http://ceph.com/docs/master/release-notes/#v0-61
>>
>> Let me know if I've missed anything!
>>
>> sage
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-- 
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
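
For anyone following up on the osdmap question above, here is a minimal sketch of how the check could be scripted. It assumes the text output of `ceph osd dump` contains a line starting with `flags` followed by the flag names, separated by commas or spaces; that layout is an assumption and may differ between releases. The helper names and the sample text are hypothetical and are not taken from Martin's cluster.

```python
def osdmap_flags(dump_text):
    """Return the set of flag names found on the 'flags' line of an osdmap dump.

    Assumes a line of the form 'flags noout,noup' or 'flags noout noup';
    returns an empty set if the line is missing or lists no flags.
    """
    for line in dump_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "flags":
            if len(parts) == 1:
                return set()
            return set(parts[1].replace(",", " ").split())
    return set()


def blocks_auto_out(dump_text):
    """True if the 'noout' flag is set, i.e. down OSDs will not be marked out."""
    return "noout" in osdmap_flags(dump_text)


# Hypothetical sample text, for illustration only.
sample = """epoch 206
flags noout

max_osd 24
"""

print(blocks_auto_out(sample))
```

In practice the text would come from something like `ceph osd dump | head -n 10`. If the flag turns out to be set, `ceph osd unset noout` clears it.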