From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
containers@lists.linux-foundation.org, dm-devel@redhat.com,
nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
mikew@google.com, fchecconi@gmail.com, paolo.valente@unimore.it,
ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
s-uchida@ap.jp.nec.com, taka@valinux.co.jp, jmoyer@redhat.com,
dhaval@linux.vnet.ibm.com, balbir@linux.vnet.ibm.com,
righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com, agk@redhat.com,
akpm@linux-foundation.org, peterz@infradead.org,
jmarchan@redhat.com, torvalds@linux-foundation.org,
mingo@elte.hu, riel@redhat.com,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 15/28] io-controller: Allow CFQ specific extra preemptions
Date: Fri, 25 Sep 2009 14:24:34 +0800
Message-ID: <4ABC6222.9090103@cn.fujitsu.com>
In-Reply-To: <1253820332-10246-16-git-send-email-vgoyal@redhat.com>
Vivek Goyal wrote:
> o CFQ allows a reader to preempt a writer. So far we allow this within a group
> but not across groups. But there seems to be the following special case where
> this preemption might make sense.
>
>         root
>        /    \
>       R    Group
>              |
>              W
>
> Here the reader should be able to preempt the writer. Think of 10 groups,
> each running a writer, and an admin trying to run "ls" who suddenly
> experiences high latencies for it.

Hi Vivek,

This preemption might be unfair to readers that share a group with the
writer. Consider the following:
        root
       /    \
      R1   Group
           /   \
          R2    W
Say W is running and late preemption is enabled. When a request arrives for R1,
R1 will preempt W immediately, regardless of R2. R2 then gets no chance to be
scheduled, even if R1 has a very high vdisktime, which is not fair to R2.
So I suggest that the number of readers in the group be taken into account when
making this preemption decision: R1 should preempt W only when there are no
readers in that group.
Thanks,
Gui Jianfeng
>
> The same is true for metadata requests. If there is a metadata request and
> a reader is running inside a sibling group, preemption will be allowed.
> Note that the following is not allowed:
>          root
>         /    \
>    group1    group2
>       |         |
>       R         W
>
> Here the reader can't preempt the writer.
>
> o Put queues with pending metadata requests at the front of the service tree.
> Generally such queues will preempt the currently running queue, but not in the
> following case:
>          root
>         /    \
>    group1    group2
>       |      /    \
>       R1    R3    R2 (metadata)
>
> Here R2 has a metadata request, but it will not preempt R1. We need
> to make sure that R2 gets queued ahead of R3 so that once group2 gets
> going, we service R2 first and then R3, not vice versa.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
> block/elevator-fq.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
> block/elevator-fq.h | 3 +++
> 2 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/block/elevator-fq.c b/block/elevator-fq.c
> index 25beaf7..8ff8a19 100644
> --- a/block/elevator-fq.c
> +++ b/block/elevator-fq.c
> @@ -701,6 +701,7 @@ static void enqueue_io_entity(struct io_entity *entity)
> struct io_service_tree *st;
> struct io_sched_data *sd = io_entity_sched_data(entity);
> struct io_queue *ioq = ioq_of(entity);
> + int add_front = 0;
>
> if (entity->on_idle_st)
> dequeue_io_entity_idle(entity);
> @@ -716,12 +717,22 @@ static void enqueue_io_entity(struct io_entity *entity)
> st = entity->st;
> st->nr_active++;
> sd->nr_active++;
> +
> /* Keep a track of how many sync queues are backlogged on this group */
> if (ioq && elv_ioq_sync(ioq) && !elv_ioq_class_idle(ioq))
> sd->nr_sync++;
> entity->on_st = 1;
> - place_entity(st, entity, 0);
> - __enqueue_io_entity(st, entity, 0);
> +
> + /*
> + * If a meta data request is pending in this queue, put this
> + * queue at the front so that it gets a chance to run first
> + * as soon as the associated group becomes eligible to run.
> + */
> + if (ioq && ioq->meta_pending)
> + add_front = 1;
> +
> + place_entity(st, entity, add_front);
> + __enqueue_io_entity(st, entity, add_front);
> debug_update_stats_enqueue(entity);
> }
>
> @@ -2280,6 +2291,31 @@ static int elv_should_preempt(struct request_queue *q, struct io_queue *new_ioq,
> return 1;
>
> /*
> + * Allow some additional preemptions where a reader queue gets
> + * backlogged and some writer queue is running under any of the
> + * sibling groups.
> + *
> + *          root
> + *         /    \
> + *        R    group
> + *               |
> + *               W
> + */
> +
> + if (ioq_of(new_entity) == new_ioq && iog_of(entity)) {
> + /* Let reader queue preempt writer in sibling group */
> + if (elv_ioq_sync(new_ioq) && !elv_ioq_sync(active_ioq))
> + return 1;
> + /*
> + * So both queues are sync. Let the new request get disk time if
> + * it's a metadata request and the current queue is doing
> + * regular IO.
> + */
> + if (new_ioq->meta_pending && !active_ioq->meta_pending)
> + return 1;
> + }
> +
> + /*
> * If both the queues belong to same group, check with io scheduler
> * if it has additional criterion based on which it wants to
> * preempt existing queue.
> @@ -2335,6 +2371,8 @@ void elv_ioq_request_add(struct request_queue *q, struct request *rq)
> BUG_ON(!efqd);
> BUG_ON(!ioq);
> ioq->nr_queued++;
> + if (rq_is_meta(rq))
> + ioq->meta_pending++;
> elv_log_ioq(efqd, ioq, "add rq: rq_queued=%d", ioq->nr_queued);
>
> if (!elv_ioq_busy(ioq))
> @@ -2669,6 +2707,11 @@ void elv_ioq_request_removed(struct elevator_queue *e, struct request *rq)
> ioq = rq->ioq;
> BUG_ON(!ioq);
> ioq->nr_queued--;
> +
> + if (rq_is_meta(rq)) {
> + WARN_ON(!ioq->meta_pending);
> + ioq->meta_pending--;
> + }
> }
>
> /* A request got dispatched. Do the accounting. */
> diff --git a/block/elevator-fq.h b/block/elevator-fq.h
> index 2992d93..27ff5c4 100644
> --- a/block/elevator-fq.h
> +++ b/block/elevator-fq.h
> @@ -100,6 +100,9 @@ struct io_queue {
>
> /* Pointer to io scheduler's queue */
> void *sched_queue;
> +
> + /* pending metadata requests */
> + int meta_pending;
> };
>
> #ifdef CONFIG_GROUP_IOSCHED /* CONFIG_GROUP_IOSCHED */
--