From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	containers@lists.linux-foundation.org, dm-devel@redhat.com,
	nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
	mikew@google.com, fchecconi@gmail.com, paolo.valente@unimore.it,
	ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
	s-uchida@ap.jp.nec.com, taka@valinux.co.jp, jmoyer@redhat.com,
	dhaval@linux.vnet.ibm.com, balbir@linux.vnet.ibm.com,
	righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com, agk@redhat.com,
	akpm@linux-foundation.org, peterz@infradead.org,
	jmarchan@redhat.com, torvalds@linux-foundation.org,
	mingo@elte.hu, riel@redhat.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 15/28] io-controller: Allow CFQ specific extra preemptions
Date: Fri, 25 Sep 2009 14:24:34 +0800
Message-ID: <4ABC6222.9090103@cn.fujitsu.com>
In-Reply-To: <1253820332-10246-16-git-send-email-vgoyal@redhat.com>

Vivek Goyal wrote:
> o CFQ allows a reader preempting a writer. So far we allow this within a group
>   but not across groups. But there seems to be the following special case where
>   this preemption might make sense.
> 
> 			root
> 			/  \
> 		       R   Group
> 			     |
> 			     W
> 
>   Now here reader should be able to preempt the writer. Think of there are
>   10 groups each running a writer and an admin trying to do "ls" and he
>   experiences suddenly high latencies for ls.

  Hi Vivek, 

  This preemption might be unfair to readers that share a group with the
  writer. Consider the following:

                     root
                     /  \
                    R1  Group
                        /  \
                       R2   W

  Say W is running and late preemption is enabled. When a request arrives for
  R1, R1 preempts W immediately, regardless of R2. R2 then doesn't get a chance
  to be scheduled, even if R1 has a very high vdisktime. That doesn't seem fair
  to R2. So I suggest that the number of readers in the group be taken into
  account when making this preemption decision: R1 should preempt W only when
  there are no readers in that group.
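
  A minimal sketch of the suggested check, written as a standalone model rather
  than against the real elevator-fq code. The sketch_* types and the function
  name are hypothetical; only the idea of a per-group count of backlogged sync
  queues (like sd->nr_sync in the patch) comes from the series itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a group's scheduling data. */
struct sketch_group {
	int nr_sync;	/* sync (reader) queues backlogged in this group */
};

/* Hypothetical, simplified stand-in for an io queue. */
struct sketch_queue {
	bool sync;			/* true for a reader queue */
	struct sketch_group *group;	/* owning group; NULL if directly under root */
};

/*
 * Decide whether a newly backlogged queue that sits directly under root
 * may preempt the active queue running inside a sibling group.
 */
static bool sketch_should_preempt(const struct sketch_queue *new_q,
				  const struct sketch_queue *active_q)
{
	assert(new_q != NULL && active_q != NULL);

	/* Only the "queue under root vs. queue inside a sibling group" case. */
	if (new_q->group != NULL || active_q->group == NULL)
		return false;

	/* Only a reader preempting a writer. */
	if (!new_q->sync || active_q->sync)
		return false;

	/*
	 * Suggested extra condition: do not preempt the writer if its
	 * group still has readers backlogged.
	 */
	return active_q->group->nr_sync == 0;
}
```

  With W alone in its group (nr_sync == 0) the reader under root preempts as in
  the patch; once R2 is backlogged in the same group (nr_sync == 1) it does not.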

  Thanks,
  Gui Jianfeng

> 
>   Same is true for meta data requests. If there is a meta data request and
>   a reader is running inside a sibling group, preemption will be allowed.
>   Note, following is not allowed.
> 			root
> 			/  \
> 	            group1 group2
> 		      |      |
> 	              R	     W
> 
>   Here reader can't preempt writer.
> 
> o Put meta data requesting queues at the front of the service tree. Generally
>   such queues will preempt currently running queue but not in following case.
> 			root
> 			/  \
> 	            group1 group2
> 		      |     / \
> 	              R1   R3  R2 (meta data)
> 
>  Here R2 is having a meta data request but it will not preempt R1. We need
>  to make sure that R2 gets queued ahead of R3 so that once group2 gets
>  going, we first service R2 and then R3 and not vice versa.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  block/elevator-fq.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>  block/elevator-fq.h |    3 +++
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/block/elevator-fq.c b/block/elevator-fq.c
> index 25beaf7..8ff8a19 100644
> --- a/block/elevator-fq.c
> +++ b/block/elevator-fq.c
> @@ -701,6 +701,7 @@ static void enqueue_io_entity(struct io_entity *entity)
>  	struct io_service_tree *st;
>  	struct io_sched_data *sd = io_entity_sched_data(entity);
>  	struct io_queue *ioq = ioq_of(entity);
> +	int add_front = 0;
>  
>  	if (entity->on_idle_st)
>  		dequeue_io_entity_idle(entity);
> @@ -716,12 +717,22 @@ static void enqueue_io_entity(struct io_entity *entity)
>  	st = entity->st;
>  	st->nr_active++;
>  	sd->nr_active++;
> +
>  	/* Keep a track of how many sync queues are backlogged on this group */
>  	if (ioq && elv_ioq_sync(ioq) && !elv_ioq_class_idle(ioq))
>  		sd->nr_sync++;
>  	entity->on_st = 1;
> -	place_entity(st, entity, 0);
> -	__enqueue_io_entity(st, entity, 0);
> +
> +	/*
> +	 * If a meta data request is pending in this queue, put this
> +	 * queue at the front so that it gets a chance to run first
> +	 * as soon as the associated group becomes eligbile to run.
> +	 */
> +	if (ioq && ioq->meta_pending)
> +		add_front = 1;
> +
> +	place_entity(st, entity, add_front);
> +	__enqueue_io_entity(st, entity, add_front);
>  	debug_update_stats_enqueue(entity);
>  }
>  
> @@ -2280,6 +2291,31 @@ static int elv_should_preempt(struct request_queue *q, struct io_queue *new_ioq,
>  		return 1;
>  
>  	/*
> +	 * Allow some additional preemptions where a reader queue gets
> +	 * backlogged and some writer queue is running under any of the
> +	 * sibling groups.
> +	 *
> +	 * 		     root
> +	 * 		     /  \
> +	 * 		    R  group
> +	 * 			 |
> +	 * 			 W
> +	 */
> +
> +	if (ioq_of(new_entity) == new_ioq  && iog_of(entity)) {
> +		/* Let reader queue preempt writer in sibling group */
> +		if (elv_ioq_sync(new_ioq) && !elv_ioq_sync(active_ioq))
> +			return 1;
> +		/*
> +		 * So both queues are sync. Let the new request get disk time if
> +		 * it's a metadata request and the current queue is doing
> +		 * regular IO.
> +		 */
> +		if (new_ioq->meta_pending && !active_ioq->meta_pending)
> +			return 1;
> +	}
> +
> +	/*
>  	 * If both the queues belong to same group, check with io scheduler
>  	 * if it has additional criterion based on which it wants to
>  	 * preempt existing queue.
> @@ -2335,6 +2371,8 @@ void elv_ioq_request_add(struct request_queue *q, struct request *rq)
>  	BUG_ON(!efqd);
>  	BUG_ON(!ioq);
>  	ioq->nr_queued++;
> +	if (rq_is_meta(rq))
> +		ioq->meta_pending++;
>  	elv_log_ioq(efqd, ioq, "add rq: rq_queued=%d", ioq->nr_queued);
>  
>  	if (!elv_ioq_busy(ioq))
> @@ -2669,6 +2707,11 @@ void elv_ioq_request_removed(struct elevator_queue *e, struct request *rq)
>  	ioq = rq->ioq;
>  	BUG_ON(!ioq);
>  	ioq->nr_queued--;
> +
> +	if (rq_is_meta(rq)) {
> +		WARN_ON(!ioq->meta_pending);
> +		ioq->meta_pending--;
> +	}
>  }
>  
>  /* A request got dispatched. Do the accounting. */
> diff --git a/block/elevator-fq.h b/block/elevator-fq.h
> index 2992d93..27ff5c4 100644
> --- a/block/elevator-fq.h
> +++ b/block/elevator-fq.h
> @@ -100,6 +100,9 @@ struct io_queue {
>  
>  	/* Pointer to io scheduler's queue */
>  	void *sched_queue;
> +
> +	/* pending metadata requests */
> +	int meta_pending;
>  };
>  
>  #ifdef CONFIG_GROUP_IOSCHED /* CONFIG_GROUP_IOSCHED */
