From: flisky
Subject: [EMG]incomplete pgs make RGW unusable
Date: Tue, 19 May 2015 08:06:01 +0800
To: ceph-devel@vger.kernel.org
List-ID: ceph-devel

Hi core developers,

Sorry, I have to raise this here after getting no response on the user list.

I reformatted the disks to increase the OSD journal size. During the process, I lost 9 PGs.

The incomplete PGs belong to the pool .rgw.buckets, which causes slow requests and makes RGW unusable. I have to restart the OSDs and RGWs from time to time to keep RGW responsive.

Then the scariest thing happened: I cannot recover the incomplete PGs. The [force_create_pg](http://tracker.ceph.com/issues/10098) tracker issue classifies handling this case as a feature. The procedure in the [blog](https://ceph.com/community/incomplete-pgs-oh-my/) does not work, either.

Finally I gave up on recovery and used 'rados cppool' to copy our huge pool, but it's STUCK...

I'm very frustrated, and very surprised that Ceph doesn't offer a way to simply let lost data stay lost.

Could anyone give any advice on this?

Thanks!

Sincerely,