From mboxrd@z Thu Jan 1 00:00:00 1970
From: Łukasz Chrustek
Subject: Re: Problem with query and any operation on PGs
Date: Wed, 24 May 2017 15:19:52 +0200
Message-ID: <135176900.20170524151952@tlen.pl>
References: <175484591.20170523135449@tlen.pl> <483467685.20170523144818@tlen.pl> <1464688590.20170523185052@tlen.pl> <1075363645.20170523234331@tlen.pl>
Reply-To: Łukasz Chrustek
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 8BIT
Return-path:
Received: from mx-out.tlen.pl ([193.222.135.148]:42277 "EHLO mx-out.tlen.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1761687AbdEXNaQ (ORCPT ); Wed, 24 May 2017 09:30:16 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Sage Weil
Cc: ceph-devel@vger.kernel.org

Hi,

> On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> Hi,
>>
>> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> I haven't slept for over 30 hours, and still can't find a solution. I
>> >> did as you wrote, but turning off these OSDs
>> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>>
>> > The important bit is:
>>
>> >             "blocked": "peering is blocked due to down osds",
>> >             "down_osds_we_would_probe": [
>> >                 6,
>> >                 10,
>> >                 33,
>> >                 37,
>> >                 72
>> >             ],
>> >             "peering_blocked_by": [
>> >                 {
>> >                     "osd": 6,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 10,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 37,
>> >                     "current_lost_at": 0,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 },
>> >                 {
>> >                     "osd": 72,
>> >                     "current_lost_at": 113771,
>> >                     "comment": "starting or marking this osd lost may let
>> > us proceed"
>> >                 }
>> >             ]
>> >         },
>>
>> > Are any of those OSDs startable?
>>
>> They were all up and running, but I decided to shut them down and mark
>> them out of Ceph. Now Ceph appears to be working OK, but two PGs are
>> still in the down state. How do I get rid of that?

> If you haven't deleted the data, you should start the OSDs back up.

> If they are partially damaged you can use ceph-objectstore-tool to
> extract just the PGs in question to make sure you haven't lost anything,
> inject them on some other OSD(s) and restart those, and *then* mark the
> bad OSDs as 'lost'.

> If all else fails, you can just mark those OSDs 'lost', but in doing so
> you might be telling the cluster to lose data.

> The best thing to do is definitely to get those OSDs started again.

Now the situation looks like this:

[root@cc1 ~]# rbd info volumes/volume-ccc5d976-cecf-4938-a452-1bee6188987b
rbd image 'volume-ccc5d976-cecf-4938-a452-1bee6188987b':
        size 500 GB in 128000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.ed9d394a851426
        format: 2
        features: layering
        flags:

[root@cc1 ~]# rados -p volumes ls | grep rbd_data.ed9d394a851426
(output truncated)
rbd_data.ed9d394a851426.000000000000447c
rbd_data.ed9d394a851426.0000000000010857
rbd_data.ed9d394a851426.000000000000ec8b
rbd_data.ed9d394a851426.000000000000fa43
rbd_data.ed9d394a851426.000000000001ef2d
^C

The listing hangs on this object and goes no further. rbd cp also hangs,
and so does rbd map.

Can you advise what the solution could be in this case?

--
Regards,
Łukasz Chrustek
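
For reference, the peering fragment quoted earlier in this message can be summarized with a few lines of Python. This is only a sketch: the helper name `summarize_blockers` is made up for illustration, and it handles just the fragment as quoted in the thread, since the enclosing structure of the full `ceph pg query` output is not shown here.

```python
import json

# Fragment copied from the quoted pg query output in this thread.
# Only the keys visible in the thread are used; nothing else is assumed
# about the full output.
SAMPLE = """
{
  "blocked": "peering is blocked due to down osds",
  "down_osds_we_would_probe": [6, 10, 33, 37, 72],
  "peering_blocked_by": [
    {"osd": 6,  "current_lost_at": 0,
     "comment": "starting or marking this osd lost may let us proceed"},
    {"osd": 10, "current_lost_at": 0,
     "comment": "starting or marking this osd lost may let us proceed"},
    {"osd": 37, "current_lost_at": 0,
     "comment": "starting or marking this osd lost may let us proceed"},
    {"osd": 72, "current_lost_at": 113771,
     "comment": "starting or marking this osd lost may let us proceed"}
  ]
}
"""


def summarize_blockers(state):
    """Hypothetical helper: split the down OSDs into those listed as
    blocking peering and those that would only be probed."""
    blocking = [entry["osd"] for entry in state.get("peering_blocked_by", [])]
    probed = state.get("down_osds_we_would_probe", [])
    # OSDs that appear in the probe list but not in the blocker list.
    probe_only = [osd for osd in probed if osd not in blocking]
    return {"blocking": blocking, "probe_only": probe_only}


state = json.loads(SAMPLE)
summary = summarize_blockers(state)
print("OSDs blocking peering:", summary["blocking"])
print("Down OSDs only probed:", summary["probe_only"])
```

On the fragment above this separates OSDs 6, 10, 37 and 72 (listed as blocking) from OSD 33 (listed only under down_osds_we_would_probe).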