Date: Tue, 8 Dec 2009 17:50:56 -0500
From: Vivek Goyal
To: Jeff Moyer
Cc: Jens Axboe, linux kernel mailing list, Gui Jianfeng
Subject: Re: [PATCH] cfq-iosched: Wait for next request if we are about to expire the queue
Message-ID: <20091208225055.GE28615@redhat.com>
References: <20091208150412.GC28615@redhat.com>

On Tue, Dec 08, 2009 at 12:52:48PM -0500, Jeff Moyer wrote:
> Vivek Goyal writes:
>
> > If there is a sequential reader running in a group, we wait for the
> > next request to come in that group after slice expiry, and once the
> > new request is in, we expire the queue. Otherwise we delete the group
> > from the service tree and the group loses its fair share.
> >
> > So far I was marking a queue as wait_busy if it had consumed its slice
> > and it was the last queue in the group. But this condition did not
> > cover the following two cases.
> >
> > - If the think time of the process is more than the slice left, we
> >   will not arm the timer and the queue will be expired.
> >
> > - If we hit the boundary condition where the slice has not expired at
> >   request completion but expires almost immediately after (4-5 ns),
> >   select_queue() hits and by that time the slice has expired because
> >   jiffies has incremented by one.
> >
> > Gui was hitting the boundary condition and not getting fairness
> > numbers proportional to weight.
> >
> > This patch adds checks for the above two conditions and improves the
> > fairness numbers for a sequential workload on rotational media.
> >
> > Reported-by: Gui Jianfeng
> > Signed-off-by: Vivek Goyal
> > Acked-by: Gui Jianfeng
> > ---
> >  block/cfq-iosched.c | 30 ++++++++++++++++++++++++++----
> >  1 file changed, 26 insertions(+), 4 deletions(-)
> >
> > Index: linux1/block/cfq-iosched.c
> > ===================================================================
> > --- linux1.orig/block/cfq-iosched.c	2009-12-07 14:40:03.000000000 -0500
> > +++ linux1/block/cfq-iosched.c	2009-12-07 14:40:04.000000000 -0500
> > @@ -3251,6 +3251,29 @@ static void cfq_update_hw_tag(struct cfq
> >  		cfqd->hw_tag = 0;
> >  }
> >
> > +static inline bool
> > +cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> > +{
> > +	struct cfq_io_context *cic = cfqd->active_cic;
> > +
> > +	/* If there are other queues in the group, don't wait */
> > +	if (cfqq->cfqg->nr_cfqq > 1)
> > +		return false;
> > +
> > +	if (cfq_slice_used(cfqq))
> > +		return true;
> > +
> > +	if (cfqq->slice_end - jiffies == 1)
> > +		return true;
>
> This really looks like a hack.  At the very least, it requires a big
> comment above it explaining why it's there.  Would it be possible to
> detect such a condition in select_queue and handle it there?

Hi Jeff,

We have to wait for the next request in the queue somewhere to make sure
the group does not get deleted, otherwise it loses its share. We can wait
for the next request either in completed_request() or in select_queue().
Initially I had put the wait_busy() call in select_queue(), and after
taking care of all the cases it ended up looking uglier than the current
solution.
I had to modify the arm_slice_timer() routine to express the special case
that we are doing wait-busy, and also to call arm_slice_timer() from
select_queue() to make sure we don't stall the queue when there is nothing
on the dispatch list but we still need to wait for the group to get busy.
We want to make sure at least the idle timer is armed.

So, to avoid the modifications to arm_slice_timer() and to avoid calling
it from select_queue(), we chose to implement the wait_busy mechanism in
completed_request(). This is consistent with the CFQ theme of arming the
timer in completed_request() if need be, with select_queue() honoring it.

Now there are a couple of corner cases left.

1. A request completes and the slice has not expired yet. The next
   request comes in and is dispatched to the disk. Now select_queue()
   hits and the slice has expired, so the group is deleted. Because a
   request is still in the disk, this queue never gets a chance to
   wait_busy.

2. A request completes and the slice has not expired yet. Before the
   next request comes in (delayed due to think time), select_queue()
   hits and expires the queue, and hence the group. This queue never got
   a chance to wait busy.

I have now modified the patch to take care of case 1 in select_queue()
and of case 2 in completed_request(). It is a two-patch series that I
will post to lkml soon. Please have a look and see if you like that
better.

Thanks
Vivek