From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Priebe - Profihost AG
Subject: Re: still recovery issues with cuttlefish
Date: Wed, 14 Aug 2013 09:04:16 +0200
Message-ID: <520B2BF0.2030208@profihost.ag>
References: <51FA1AC1.8040207@profihost.ag> <51FAAED3.3010509@cloudapt.com> <51FAB20F.3030707@profihost.ag> <51FB636B.5050301@profihost.ag> <51FBF765.9030700@profihost.ag> <51FBFE85.5040700@profihost.ag> <5203A597.4060701@cloudapt.com> <5203DFAE.9070100@profihost.ag> <52068FB6.1080209@profihost.ag> <673B805F-B036-4066-B8AD-770E6464B64C@profihost.ag>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-ph.de-nserver.de ([85.158.179.214]:42105 "EHLO mail-ph.de-nserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752942Ab3HNHEX (ORCPT ); Wed, 14 Aug 2013 03:04:23 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Samuel Just
Cc: Mike Dawson, "josh.durgin@inktank.com", Oliver Francke, "ceph-devel@vger.kernel.org", Stefan Hajnoczi

The same problem still occurs. I will need to check when I have time to gather logs again.

On 14.08.2013 01:11, Samuel Just wrote:
> I'm not sure, but your logs did show that you had >16 recovery ops in
> flight, so it's worth a try. If it doesn't help, you should collect
> the same set of logs and I'll look again. Also, there are a few other
> patches between 61.7 and current cuttlefish which may help.
> -Sam
>
> On Tue, Aug 13, 2013 at 2:03 PM, Stefan Priebe - Profihost AG wrote:
>>
>> On 13.08.2013 at 22:43, Samuel Just wrote:
>>
>>> I just backported a couple of patches from next to fix a bug where we
>>> weren't respecting the osd_recovery_max_active config in some cases
>>> (1ea6b56170fc9e223e7c30635db02fa2ad8f4b4e). You can either try the
>>> current cuttlefish branch or wait for a 61.8 release.
>>
>> Thanks! Are you sure that this is the issue? I don't believe so, but I'll give it a try.
>> I already tested a branch from Sage, in which he fixed a race regarding max active, some weeks ago. Active recovery was then at most 1, but the issue didn't go away.
>>
>> Stefan
>>
>>> -Sam
>>>
>>> On Mon, Aug 12, 2013 at 10:34 PM, Samuel Just wrote:
>>>> I got swamped today. I should be able to look tomorrow. Sorry!
>>>> -Sam
>>>>
>>>> On Mon, Aug 12, 2013 at 9:39 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Did you take a look?
>>>>>
>>>>> Stefan
>>>>>
>>>>> On 11.08.2013 at 05:50, Samuel Just wrote:
>>>>>
>>>>>> Great! I'll take a look on Monday.
>>>>>> -Sam
>>>>>>
>>>>>> On Sat, Aug 10, 2013 at 12:08 PM, Stefan Priebe wrote:
>>>>>>> Hi Samuel,
>>>>>>>
>>>>>>> On 09.08.2013 23:44, Samuel Just wrote:
>>>>>>>
>>>>>>>> I think Stefan's problem is probably distinct from Mike's.
>>>>>>>>
>>>>>>>> Stefan: Can you reproduce the problem with
>>>>>>>>
>>>>>>>> debug osd = 20
>>>>>>>> debug filestore = 20
>>>>>>>> debug ms = 1
>>>>>>>> debug optracker = 20
>>>>>>>>
>>>>>>>> on a few osds (including the restarted osd), and upload those osd logs
>>>>>>>> along with the ceph.log from before killing the osd until after the
>>>>>>>> cluster becomes clean again?
>>>>>>>
>>>>>>>
>>>>>>> Done. You'll find the logs in the cephdrop folder:
>>>>>>> slow_requests_recovering_cuttlefish
>>>>>>>
>>>>>>> osd.52 was the one recovering.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html