From: Josh Durgin
Subject: Re: Pg stuck stale...why?
Date: Tue, 10 Jul 2012 18:22:07 -0700
Message-ID: <4FFCD53F.108@inktank.com>
References: <4FFCD2AC.3040809@catalyst.net.nz>
In-Reply-To: <4FFCD2AC.3040809@catalyst.net.nz>
To: Mark Kirkwood
Cc: ceph-devel@vger.kernel.org

On 07/10/2012 06:11 PM, Mark Kirkwood wrote:
> I am seeing this:
>
> # ceph -s
>    health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale
>    monmap e1: 3 mons at
> {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0},
> election epoch 18, quorum 0,1,2 ved1,ved2,ved3
>    osdmap e62: 4 osds: 4 up, 4 in
>    pgmap v47148: 768 pgs: 512 active+clean, 256 stale+active+clean; 2224 MB
> data, 15442 MB used, 86907 MB / 102350 MB avail
>    mdsmap e1: 0/0/1
>
> In particular 256 pgs stuck stale - I've tried a) waiting a while
> (overnight), b) a rolling restart of all 4 osd's, c) restarting all ceph
> services on all 4 nodes. All without changing this.
>
> As far as I understand what stuck state means, I can't see why they need
> to stay that way, given all osd's and mon's are up. (I have no mds
> configured)....any ideas? Or is this just expected?
>
> Regards
>
> Mark

What does 'ceph pg dump_stuck stale' show?

Stale means that the monitors haven't received updates about those pgs
from the osds within a certain period of time (the default is 300
seconds), so something may be wrong with your crushmap or with those
pgs themselves.

Josh
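
To make the staleness rule described above concrete, here is a minimal Python sketch. It is only an illustration of the idea "no update within a timeout means stale", not how the monitors actually implement it. The names PgReport, stale_pgs and STALE_AFTER_SECONDS are made up for this example, and the pg ids and timestamps are invented; only the 300 second default comes from the message.

```python
from dataclasses import dataclass

# Default taken from the message above: 300 seconds without an update.
STALE_AFTER_SECONDS = 300.0


@dataclass
class PgReport:
    """Hypothetical record of when the monitors last heard about a pg."""
    pgid: str
    last_report: float  # seconds on an arbitrary clock


def stale_pgs(reports, now, timeout=STALE_AFTER_SECONDS):
    """Return the ids of pgs whose last update is older than `timeout`."""
    return [r.pgid for r in reports if now - r.last_report > timeout]


# Invented sample data, purely for illustration.
reports = [
    PgReport("0.1", last_report=900.0),   # 400 s ago
    PgReport("0.2", last_report=1290.0),  # 10 s ago
    PgReport("1.7", last_report=650.0),   # 650 s ago
]

print(stale_pgs(reports, now=1300.0))  # prints ['0.1', '1.7']
```

In this toy model a pg stays in the list for as long as no new report arrives, no matter how many daemons are restarted. That matches the symptom in the quoted output: if no osd is reporting for those pgs, waiting will not clear the state.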