From: Jeff Layton
Subject: Re: [PATCH] libceph: Complete stuck requests to OSD with EIO
Date: Sun, 12 Feb 2017 07:34:48 -0500
Message-ID: <1486902888.17544.1.camel@redhat.com>
References: <4e080919-2df5-1b75-3c8a-3ae95eb8f08d@synesis.ru> <1486726280.4233.1.camel@redhat.com> <000f7e99-d074-d2a6-5e8a-f115106a913d@synesis.ru>
In-Reply-To: <000f7e99-d074-d2a6-5e8a-f115106a913d@synesis.ru>
To: Artur Molchanov, Ilya Dryomov
Cc: ceph-devel@vger.kernel.org

On Sat, 2017-02-11 at 23:30 +0300, Artur Molchanov wrote:
> Hi Jeff,
>
> On Fri, 2017-02-10 at 14:31, Jeff Layton wrote:
> > On Thu, 2017-02-09 at 16:04 +0300, Artur Molchanov wrote:
> > > From: Artur Molchanov
> > >
> > > Complete stuck requests to OSD with error EIO after osd_request_timeout expired.
> > > If osd_request_timeout equals 0 (default value) then do nothing with
> > > hung requests (keep default behavior).
> > >
> > > Create RBD map option osd_request_timeout to set timeout in seconds. Set
> > > osd_request_timeout to 0 by default.
> > >
> >
> > Also, what exactly are the requests blocked on when this occurs? Is the
> > ceph_osd_request_target ending up paused? I wonder if we might be better
> > off with something that returns a hard error under the circumstances
> > where you're hanging, rather than depending on timeouts.
>
> I think it is better to complete requests only after the timeout has expired,
> because a request can fail due to temporary network issues (e.g.
> a router restart) or a restart of a machine or service.
>
> > Having a job that has to wake up every second or so isn't ideal. Perhaps
> > you would be better off scheduling the delayed work in the request
> > submission codepath, and only rearm it when the tree isn't empty after
> > calling complete_osd_stuck_requests?
>
> Would you please tell me more about rearming the work only if the tree is not
> empty after calling complete_osd_stuck_requests? From what code should we call
> complete_osd_stuck_requests?
>

Sure. I'm saying you would want to call schedule_delayed_work for the
timeout handler from the request submission path when you link a request
that has a timeout into the tree. Maybe in __submit_request?

Then, instead of unconditionally calling schedule_delayed_work at the
end of handle_request_timeout, you'd only call it if there were
requests still sitting in the osdc trees.

> As I understand it, there are two primary cases:
> 1 - Requests to the OSD failed, but the monitors do not return a new osdmap
> (because all monitors are offline or the monitors have not updated the osdmap yet).
> In this case requests are retried by cyclically calling ceph_con_workfn -> con_fault
> -> ceph_con_workfn. We can check the request timestamp and, instead of calling
> con_fault, complete the request.
>
> 2 - The monitors return a new osdmap which does not have any OSD for RBD.
> In this case all requests to the last ready OSD will be linked on the "homeless" OSD
> and will not be retried until a new osdmap with an appropriate OSD is received. I think
> that we need an additional periodic check of the timestamps of such requests.
>
> Yes, there is already an existing job, handle_timeout. But the responsibility of
> this job is to send keepalive requests to slow OSDs. I'm not sure that it is a
> good idea to perform additional actions inside this job.
> I decided that creating a specific job, handle_osd_request_timeout, is more appropriate.
>
> This job will run only once with the default value of osd_request_timeout (0).

Ahh, I missed that -- thanks.
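To make the arming scheme discussed above concrete, here is a rough userspace model in C. It is only a sketch under stated assumptions, not kernel code and not what the patch does: every name in it (model_osdc, model_submit, model_handle_timeout, model_teardown) is hypothetical, a boolean stands in for a pending delayed work item, and timestamps are plain integers. It shows the work being armed from the submission path only when a timeout is set, rearmed from the handler only while requests remain, and cancelled on teardown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names come from libceph. */
#define MAX_REQS 16

struct model_osdc {
	unsigned long req_stamp[MAX_REQS]; /* submission time per slot */
	bool req_active[MAX_REQS];         /* slot holds an in-flight request */
	int req_err[MAX_REQS];             /* completion status, 0 until completed */
	size_t nr_active;                  /* number of in-flight requests */
	unsigned long timeout;             /* 0 disables the timeout */
	bool work_armed;                   /* stands in for a pending delayed work */
	unsigned int arm_count;            /* how often the work was scheduled */
};

/* Stand-in for scheduling the delayed work; a no-op if already pending. */
static void model_arm(struct model_osdc *osdc)
{
	if (!osdc->work_armed) {
		osdc->work_armed = true;
		osdc->arm_count++;
	}
}

/* Submission path: link the request, arm the work only if a timeout is set. */
static int model_submit(struct model_osdc *osdc, unsigned long now)
{
	for (size_t i = 0; i < MAX_REQS; i++) {
		if (!osdc->req_active[i]) {
			osdc->req_active[i] = true;
			osdc->req_stamp[i] = now;
			osdc->req_err[i] = 0;
			osdc->nr_active++;
			if (osdc->timeout)
				model_arm(osdc);
			return (int)i;
		}
	}
	return -1; /* no free slot */
}

/* Timeout handler: fail expired requests, rearm only if requests remain. */
static void model_handle_timeout(struct model_osdc *osdc, unsigned long now)
{
	if (!osdc->work_armed)
		return; /* nothing pending (never armed, or cancelled) */
	osdc->work_armed = false;

	for (size_t i = 0; i < MAX_REQS; i++) {
		if (osdc->req_active[i] &&
		    now - osdc->req_stamp[i] >= osdc->timeout) {
			osdc->req_active[i] = false;
			osdc->req_err[i] = -EIO;
			osdc->nr_active--;
		}
	}

	if (osdc->nr_active)
		model_arm(osdc);
}

/* Teardown: cancel pending work so it can never run against freed state. */
static void model_teardown(struct model_osdc *osdc)
{
	osdc->work_armed = false;
}
```

With timeout left at 0, model_submit never arms the work, so nothing wakes up periodically. In real kernel code the teardown step would presumably need a synchronous cancel that also waits for a handler that is already running, which this model does not capture.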
> At the same time, I think that users will not set too small a value for this
> parameter. I expect that a typical value will be about 1 minute or greater.
>
> > Also, I don't see where this job is ever cancelled when the osdc is torn
> > down. That needs to occur or you'll cause a use-after-free oops...
>
> It is my fault, thanks for the correction.
>

-- 
Jeff Layton