Message-ID: <5795EA02.2090200@oracle.com>
Date: Mon, 25 Jul 2016 18:29:22 +0800
From: Bob Liu
To: Roger Pau Monné
CC: linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
    konrad.wilk@oracle.com, jgross@suse.com
Subject: Re: [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
In-Reply-To: <20160725092002.madb7j6ryn4jcoho@mac>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/25/2016 05:20 PM, Roger Pau Monné wrote:
> On Sat, Jul 23, 2016 at 06:18:23AM +0800, Bob Liu wrote:
>>
>> On 07/22/2016 07:45 PM, Roger Pau Monné wrote:
>>> On Fri, Jul 22, 2016 at 05:43:32PM +0800, Bob Liu wrote:
>>>>
>>>> On 07/22/2016 05:34 PM, Roger Pau Monné wrote:
>>>>> On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote:
>>>>>>
>>>>>> On 07/22/2016 03:45 PM, Roger Pau Monné wrote:
>>>>>>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote:
>>>>>>>>
>>>>>>>> On 07/21/2016 04:57 PM, Roger Pau Monné wrote:
>>>>>> ..[snip]..
>>>>>>>>>> +
>>>>>>>>>> +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, ssize_t count)
>>>>>>>>>> +{
>>>>>>>>>> +	unsigned int i;
>>>>>>>>>> +	int err = -EBUSY;
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Make sure no migration in parallel, device lock is actually a
>>>>>>>>>> +	 * mutex.
>>>>>>>>>> +	 */
>>>>>>>>>> +	if (!device_trylock(&info->xbdev->dev)) {
>>>>>>>>>> +		pr_err("Fail to acquire dev:%s lock, may be in migration.\n",
>>>>>>>>>> +			dev_name(&info->xbdev->dev));
>>>>>>>>>> +		return err;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Prevent new requests and guarantee no uncompleted reqs.
>>>>>>>>>> +	 */
>>>>>>>>>> +	blk_mq_freeze_queue(info->rq);
>>>>>>>>>> +	if (part_in_flight(&info->gd->part0))
>>>>>>>>>> +		goto out;
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Front                         Backend
>>>>>>>>>> +	 * Switch to XenbusStateClosed
>>>>>>>>>> +	 *                               frontend_changed():
>>>>>>>>>> +	 *                                case XenbusStateClosed:
>>>>>>>>>> +	 *                                 xen_blkif_disconnect()
>>>>>>>>>> +	 *                                 Switch to XenbusStateClosed
>>>>>>>>>> +	 * blkfront_resume():
>>>>>>>>>> +	 *                               frontend_changed():
>>>>>>>>>> +	 *                                 reconnect
>>>>>>>>>> +	 * Wait until XenbusStateConnected
>>>>>>>>>> +	 */
>>>>>>>>>> +	info->reconfiguring = true;
>>>>>>>>>> +	xenbus_switch_state(info->xbdev, XenbusStateClosed);
>>>>>>>>>> +
>>>>>>>>>> +	/* Poll every 100ms, 1 minute timeout. */
>>>>>>>>>> +	for (i = 0; i < 600; i++) {
>>>>>>>>>> +		/*
>>>>>>>>>> +		 * Wait backend enter XenbusStateClosed, blkback_changed()
>>>>>>>>>> +		 * will clear reconfiguring.
>>>>>>>>>> +		 */
>>>>>>>>>> +		if (!info->reconfiguring)
>>>>>>>>>> +			goto resume;
>>>>>>>>>> +		schedule_timeout_interruptible(msecs_to_jiffies(100));
>>>>>>>>>> +	}
>>>>>>>>>
>>>>>>>>> Instead of having this wait, could you just set info->reconfiguring = 1, set
>>>>>>>>> the frontend state to XenbusStateClosed and mimic exactly what a resume from
>>>>>>>>> suspension does?
>>>>>>>>> blkback_changed would have to set the frontend state to
>>>>>>>>> InitWait when it detects that the backend has switched to Closed, and call
>>>>>>>>> blkfront_resume.
>>>>>>>>
>>>>>>>>
>>>>>>>> I think that won't work.
>>>>>>>> In the real "resume" case, the power management system triggers all the ->resume() paths.
>>>>>>>> But nothing triggers that path for dynamic configuration.
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I think it should be possible to set info->reconfiguring and wait for the
>>>>>>> backend to switch to state Closed, at that point we should call blkif_resume
>>>>>>> (from blkback_changed) and the backend will follow the reconnection.
>>>>>>>
>>>>>>
>>>>>> Okay, I get your point. Yes, that's an option.
>>>>>>
>>>>>> But this will make 'dynamic configuration' asynchronous, and I'm worried the end user will panic.
>>>>>> E.g.
>>>>>> An end user runs "echo > /sys/devices/vbd-xxx/max_indirect_segs",
>>>>>> but then the device will be Closed and disappear, and the user has to wait a random time for the device to resume.
>>>>>
>>>>> That should not happen, AFAICT on migration the device never disappears.
>>>>
>>>> Oh, yes.
>>>>
>>>>> alloc_disk and friends should not be called on resume from migration (see
>>>>> the switch in blkfront_connect, you should take the BLKIF_STATE_SUSPENDED
>>>>> path for the reconfiguration).
>>>>>
>>>>
>>>> What if the end user starts I/O immediately after writing a new value to /sys,
>>>> while the resume is still in progress?
>>>
>>> blkif_free already stops the queues by calling blk_mq_stop_hw_queues, and
>>> blkif_queue_request will refuse to put anything on the ring if the state
>>> is different than connected, which in turn makes blkif_queue_rq call
>>> blk_mq_stop_hw_queue to stop the queue, so it should be safe.
>>>
>>
>> But this will surprise the end user; our user script looks like this:
>> 1) echo > /sys/xxxx
>> 2) Start I/O immediately.
>>  ^^^ Fails because requests would be refused (even though the software queue was still frozen).
>
> Which error do you get? AFAICT reads/writes should either block when the

You are right, the reads/writes will just block, which is fine.

> queue is full, or if the fd is in non-blocking mode it should return EAGAIN
> (which a properly coded application should be prepared to deal with
> gracefully if it's using non-blocking fds anyway).
>
>> It's not good for the end user to have to wait a random time before restarting I/O.
>>
>>
>> There are two more concerns I have:
>> * blkif_resume() may fail; how can the end user find out if "echo > /sys/xxx" has already returned success?
>
> If you really think this is needed, can't you use a monitor or some kind of
> condition with a timeout instead of open-coding it? Although I'm still not
> convinced that blocking here is TRTTD.
>

Let me ask another way: if blkfront_resume() is moved into blkback_changed(),
should we check its return value? And what should we do if it returns an error?

>> * We take the device lock and call blk_mq_freeze_queue() in dynamic_reconfig_device(),
>> and then have to release them asynchronously in blkif_recover(), which makes the code harder to read.
>
> I'm not sure I'm following here, do you mean that you will pick the lock in
> dynamic_reconfig_device and release it in blkif_recover? Why wouldn't you

Yes.

> release the lock in dynamic_reconfig_device after doing whatever is needed?
>

Both 'dynamic configuration' and migration (xenbus_dev_resume()) run
{ blkfront_resume(); blkif_recover(); } and depend on changes to xbdev->state.
If they run at the same time, the xbdev->state machine becomes a mess and is
very difficult to track.

The device lock (actually a mutex) acts as a big lock that guarantees no such race can happen.

Thanks,
Bob Liu
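For comparison, the "condition with a timeout" that Roger suggests in place of the open-coded 600 x 100 ms poll can be sketched roughly as below. This is a userspace analogue built on POSIX threads, not blkfront code: struct reconfig_ctx and the reconfig_*() helpers are hypothetical names. In the kernel the same shape would presumably be a struct completion, with reinit_completion() before xenbus_switch_state(), complete() from blkback_changed() instead of clearing info->reconfiguring, and wait_for_completion_timeout() (which returns 0 on timeout) replacing the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the "reconfiguring" handshake state. */
struct reconfig_ctx {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            reconfiguring;
};

void reconfig_init(struct reconfig_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->cond, NULL);
	ctx->reconfiguring = false;
}

/* Reconfig path: mark the handshake as pending before switching to Closed. */
void reconfig_begin(struct reconfig_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->reconfiguring = true;
	pthread_mutex_unlock(&ctx->lock);
}

/* State-change callback: the backend reached Closed, wake any waiter. */
void reconfig_backend_closed(struct reconfig_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->reconfiguring = false;
	pthread_cond_broadcast(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * Sleep until the backend has closed or timeout_ms elapses.
 * Returns 0 on success, -ETIMEDOUT otherwise (in this sketch any other
 * pthread_cond_timedwait() error is also reported as a timeout).
 */
int reconfig_wait_closed(struct reconfig_ctx *ctx, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int err = 0;

	/* Absolute deadline on CLOCK_REALTIME, the condvar's default clock. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ctx->lock);
	/* Re-check the predicate after every wakeup, spurious or real. */
	while (ctx->reconfiguring && err == 0)
		err = pthread_cond_timedwait(&ctx->cond, &ctx->lock, &deadline);
	/* Decide on the flag, not on err: it may clear just as the timer fires. */
	done = !ctx->reconfiguring;
	pthread_mutex_unlock(&ctx->lock);

	return done ? 0 : -ETIMEDOUT;
}
```

Because the flag is checked under the mutex before sleeping, a waiter that arrives after reconfig_backend_closed() returns 0 immediately, and one that arrives before is woken by the broadcast rather than re-reading the flag every 100 ms.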