From: Mark Kirkwood
Subject: Re: Pg stuck stale...why?
Date: Wed, 11 Jul 2012 14:09:03 +1200
Message-ID: <4FFCE03F.3050602@catalyst.net.nz>
References: <4FFCD2AC.3040809@catalyst.net.nz> <4FFCD53F.108@inktank.com> <4FFCD7B7.5020707@catalyst.net.nz> <4FFCDD19.3010700@inktank.com>
In-Reply-To: <4FFCDD19.3010700@inktank.com>
To: Josh Durgin
Cc: ceph-devel@vger.kernel.org

On 11/07/12 13:55, Josh Durgin wrote:
> On 07/10/2012 06:32 PM, Mark Kirkwood wrote:
>> On 11/07/12 13:22, Josh Durgin wrote:
>>> On 07/10/2012 06:11 PM, Mark Kirkwood wrote:
>>>> I am seeing this:
>>>>
>>>> # ceph -s
>>>>    health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale
>>>>    monmap e1: 3 mons at {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0}, election epoch 18, quorum 0,1,2 ved1,ved2,ved3
>>>>    osdmap e62: 4 osds: 4 up, 4 in
>>>>    pgmap v47148: 768 pgs: 512 active+clean, 256 stale+active+clean; 2224 MB data, 15442 MB used, 86907 MB / 102350 MB avail
>>>>    mdsmap e1: 0/0/1
>>>>
>>>> In particular, 256 pgs are stuck stale. I've tried a) waiting a while
>>>> (overnight), b) a rolling restart of all 4 osds, and c) restarting all
>>>> ceph services on all 4 nodes, all without changing this.
>>>>
>>>> As far as I understand what the stuck state means, I can't see why they
>>>> need to stay that way, given all osds and mons are up. (I have no mds
>>>> configured.) Any ideas? Or is this just expected?
>>>>
>>>> Regards
>>>>
>>>> Mark
>>>
>>> What does 'ceph pg dump_stuck stale' show?
>>> Stale means that the monitors haven't received updates about those
>>> pgs from the osds within a certain period of time (default is 300
>>> seconds), so something may be wrong with your crushmap or with those
>>> pgs themselves.
>>>
>>> Josh
>>
>> I have attached the dump of stuck stale pgs, and the crushmap in use.
>>
>> I did wonder if it is to do with not using any mds, i.e. could this mean
>> the metadata pgs never get touched?
>>
>> Mark
>>
>
> It doesn't look like a problem with your crushmap: the pgs are all
> mapped to osds, and there's no common osd holding things up.
>
> Not using the mds doesn't affect the pgs. They should still be active.
> All the stuck ones are in pool 0, though. Is there anything special
> about that pool? Were there any changes before the pgs became stuck?
>
> I don't think it should work in this case, but you might try
> 'ceph pg force_create_pg 0.c'.
>
>

Hmm, good observation. Pool 0 is data, and I am only using rbd at the
moment, and:

$ ceph osd dump -o -|grep size
pool 0 'data' rep size 4 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 8 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 1 owner 0
pool 2 'rbd' rep size 4 crush_ruleset 2 object_hash rjenkins pg_num 256 pgp_num 256 last_change 11 owner 0
...

Pool 0 is mapped to crush ruleset 0, and I don't have a ruleset 0. Could
that be the problem?

Mark
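The mismatch Mark spotted can be checked mechanically by comparing each pool's crush_ruleset with the rulesets actually defined in the crushmap. This is a minimal sketch, not a Ceph tool. It assumes the pool lines follow the 'ceph osd dump' format quoted above, and that the crushmap has been decompiled to text (for example with 'ceph osd getcrushmap -o crush.bin' followed by 'crushtool -d crush.bin -o crush.txt'), where each rule block contains a 'ruleset N' line. The function names and the sample crushmap (rules 'metadata' and 'rbd' with rulesets 1 and 2) are hypothetical; the real crushmap was attached to the thread and is not reproduced here.

```python
import re

# Matches pool lines from 'ceph osd dump', e.g.
# pool 0 'data' rep size 4 crush_ruleset 0 object_hash rjenkins ...
POOL_RE = re.compile(r"^pool (\d+) '([^']*)' .*?\bcrush_ruleset (\d+)\b")

# Matches the 'ruleset N' line inside a rule block of a decompiled crushmap.
RULESET_RE = re.compile(r"^\s*ruleset (\d+)\s*$")


def pool_rulesets(osd_dump: str) -> dict[str, int]:
    """Return {pool name: crush_ruleset} parsed from 'ceph osd dump' text."""
    result = {}
    for line in osd_dump.splitlines():
        m = POOL_RE.match(line)
        if m:
            result[m.group(2)] = int(m.group(3))
    return result


def defined_rulesets(crushmap_text: str) -> set[int]:
    """Return the ruleset numbers defined in a decompiled crushmap."""
    found = set()
    for line in crushmap_text.splitlines():
        m = RULESET_RE.match(line)
        if m:
            found.add(int(m.group(1)))
    return found


def pools_with_missing_ruleset(osd_dump: str, crushmap_text: str) -> list[str]:
    """Names of pools whose crush_ruleset has no matching rule in the crushmap."""
    defined = defined_rulesets(crushmap_text)
    return sorted(
        name for name, ruleset in pool_rulesets(osd_dump).items()
        if ruleset not in defined
    )


# Pool lines copied from the 'ceph osd dump' output in this thread.
OSD_DUMP = """\
pool 0 'data' rep size 4 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 8 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 1 owner 0
pool 2 'rbd' rep size 4 crush_ruleset 2 object_hash rjenkins pg_num 256 pgp_num 256 last_change 11 owner 0
"""

# HYPOTHETICAL crushmap excerpt: rules for rulesets 1 and 2 only, no ruleset 0.
CRUSHMAP = """\
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type osd
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type osd
    step emit
}
"""

if __name__ == "__main__":
    print(pools_with_missing_ruleset(OSD_DUMP, CRUSHMAP))  # ['data']
```

Run against the pool lines above and a crushmap defining only rulesets 1 and 2, the sketch reports the 'data' pool, which is exactly the mismatch described in the reply. Parsing the text output keeps the check independent of any client library, at the cost of depending on this particular output format.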