From: Vivek Goyal <vgoyal@redhat.com>
To: Divyesh Shah <dpshah@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
nauman@google.com, lizf@cn.fujitsu.com, ryov@valinux.co.jp,
fernando@oss.ntt.co.jp, s-uchida@ap.jp.nec.com,
taka@valinux.co.jp, guijianfeng@cn.fujitsu.com,
balbir@linux.vnet.ibm.com, righi.andrea@gmail.com,
m-ikeda@ds.jp.nec.com, akpm@linux-foundation.org,
riel@redhat.com, kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH 03/20] blkio: Introduce the notion of weights
Date: Wed, 4 Nov 2009 14:00:33 -0500
Message-ID: <20091104190033.GG2870@redhat.com>
In-Reply-To: <af41c7c40911040907y11103944ief0654f84ffdf5ed@mail.gmail.com>

On Wed, Nov 04, 2009 at 09:07:41AM -0800, Divyesh Shah wrote:
> On Wed, Nov 4, 2009 at 7:41 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Nov 04, 2009 at 10:06:16AM -0500, Jeff Moyer wrote:
> > > Vivek Goyal <vgoyal@redhat.com> writes:
> > >
> > > > o Introduce the notion of weights. Priorities are mapped to weights internally.
> > > > These weights will be useful once IO groups are introduced and group's share
> > > > will be decided by the group weight.
> > >
> > > I'm sorry, but I need more background to review this patch. Where do
> > > the min and max come from? Why do you scale 7-0 from 200-900? How does
> > > this map to what was there before (exactly, approximately)?
> > >
> >
> > Well, so far we only have the notion of iopriority for a process, and
> > based on that we determine the time slice length.
> >
> > Soon we will throw cfq groups into the mix as well. Because the cpu controller
> > is weight driven, people have shown a preference that a group's share should
> > be decided based on its weight, rather than introducing the notion of ioprio
> > for groups.
> >
> > So now the core scheduling algorithm only recognizes weights for entities (be
> > they cfq queues or cfq groups), and we are required to convert the ioprio
> > of a cfqq into a weight.
> >
> > Now it is a matter of deciding what weight range we support and
> > how ioprio should be mapped onto these weights. We can always change the
> > mappings, but to begin with, I have followed the scheme below.
> >
> > Allow a weight range from 100 to 1000. Allowing too small a weight, like
> > "1", can lead to very interesting corner cases, and I wanted to avoid that
> > in the first implementation. For example, if some group with weight "1" gets
> > a time slice of 100ms, its vtime will be really high and after that it
> > will not get scheduled in for a very long time.
> >
> > Secondly, allowing too small a weight can make the vtime of the tree move very
> > fast, with faster wrap around of min_vdisktime. (Especially on SSDs, where
> > idling might not be enabled, for every queue expiry we will attribute a minimum
> > of 1ms of slice. If the weight of the group is "1", it will get a higher vtime
> > and min_vdisktime will move very fast.) We don't want too fast a wrap around
> > of min_vdisktime (especially in the case of the idle tree; that infrastructure
> > is not part of the current patches).
> >
> > Hence, to begin with I wanted to limit the range of weights allowed, because
> > a wider range opens up a lot of interesting corner cases. That's why I limited
> > the minimum weight to 100. So at most a user can expect 1000/100 = 10 times
> > service differentiation between the highest and lowest weight groups. If folks
> > need more than that, we can look into it once things stabilize.
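
To make the weight-range argument above concrete, here is a small sketch. It is
not code from the patch: it assumes CFS-style accounting in which the vtime
charged for a slice is inversely proportional to the entity's weight, and the
names vtime_charge, max_service_ratio and REFERENCE_WEIGHT are hypothetical.

```python
# Sketch only: assumes vtime advances as (service * reference_weight / weight).
# REFERENCE_WEIGHT is a hypothetical normalisation constant, chosen here as the
# mid point of the 100-1000 range discussed in the thread.
REFERENCE_WEIGHT = 500


def vtime_charge(slice_ms, weight, reference_weight=REFERENCE_WEIGHT):
    """Virtual time charged for slice_ms of real service at the given weight."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return slice_ms * reference_weight / weight


def max_service_ratio(min_weight, max_weight):
    """Largest service differentiation the weight range can express."""
    return max_weight / min_weight


if __name__ == "__main__":
    # Charge for the same 100ms slice at different weights.
    for weight in (1, 100, 500, 1000):
        print(weight, vtime_charge(100, weight))
    print(max_service_ratio(100, 1000))
```

Under these assumptions a weight-1 group is charged 50000 units for a 100ms
slice, against 50 units for a weight-1000 group, so it lands far to the right
of the tree and the tree's minimum vtime has a long way to go before that group
is picked again.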
>
> We definitely need the 1:100 differentiation. I'm ok with adding that
> later after the core set of patches stabilize but just letting you
> know that it is important to us.

Good to know. I will begin with a maximum service difference of 10 times and,
once things stabilize, will enable a wider range of weights.

> Also curious why you chose a higher
> range 100-1000 instead of 10-100? For smaller vtime leaps?

Good question. Initially we had thought that a range of 1-1000 should be
good enough. Later I decided to cap the minimum weight at 100. But the same can
be achieved with a smaller range of 1-100 and capping the minimum weight at 10.
This will also make vtime leap forward more slowly.

Later, if somebody needs a ratio higher than 1:100, we can think of
supporting an even wider weight range.

Thanks Divyesh for the idea. I think I will change the weight range to 10-100
and map ioprio 0-7 onto weights 90 down to 20.
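
A minimal sketch of what that mapping could look like, assuming it stays linear
between the two endpoints, as the 100-1000 version quoted below is. The function
name ioprio_to_weight and the constants are illustrative, not taken from the
patch.

```python
# Sketch only: a linear ioprio -> weight mapping for the proposed 10-100 range.
# All names here are hypothetical; only the endpoints come from the thread.
IOPRIO_LEVELS = 8                   # ioprio 0 (highest priority) .. 7 (lowest)
WEIGHT_MIN, WEIGHT_MAX = 10, 100    # proposed weight range
WEIGHT_STEP = 10                    # assumed per-level step


def ioprio_to_weight(ioprio):
    """Map ioprio 0..7 onto weights 90..20 (lower prio number, higher weight)."""
    if not 0 <= ioprio < IOPRIO_LEVELS:
        raise ValueError("ioprio must be in 0..7")
    return (WEIGHT_MAX - WEIGHT_STEP) - WEIGHT_STEP * ioprio


if __name__ == "__main__":
    for prio in range(IOPRIO_LEVELS):
        print(prio, ioprio_to_weight(prio))
```

With this shape prio 4 lands on weight 50, the mid point of the range, and the
mapped weights stay inside 10-100, leaving 10 and 100 free for groups.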
Thanks
Vivek
>
> >
> > Priority numbers and weights run in opposite directions: a higher ioprio
> > number (lower priority) means a lower weight, and vice versa.
> >
> > Currently we support 8 priority levels, and prio "4" is the middle point.
> > Prio numbers higher than 4 get a 20% smaller slice than prio 4, and
> > prio numbers lower than 4 get a 20% larger slice than prio 4 (20% higher/lower
> > for each priority level).
> >
> > For the weight range 100 - 1000, 500 can be considered the mid point. Now
> > this is what the priority mapping looks like.
> >
> >   100  200  300  400  500  600  700  800  900 1000  (Weights)
> >          7    6    5    4    3    2    1    0        (io prio)
> >
> > Once priorities are converted to weights, we are able to retain the notion
> > of a 20% difference between prio levels by choosing 500 as the mid point and
> > mapping prio 0-7 to weights 900-200 (each level is 100, i.e. 20% of 500),
> > hence this mapping.
> >
> > I am all ears if you have any suggestions on how this can be handled
> > better.
> >
> > Thanks
> > Vivek
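
The claim in the quoted text, that mapping prio 0-7 to weights 900-200 keeps
the 20% step between levels, can be checked with a short sketch. It assumes
each level changes the slice by a fixed 20% of the prio-4 slice and that a
weight-proportional slice is scaled against the mid-point weight 500. The 100ms
base slice and all names are assumptions for illustration, not values stated in
this thread.

```python
# Sketch only: compares a per-level 20% slice rule with a weight-proportional
# slice. BASE_SLICE_MS is an assumed value; the thread does not state it.
BASE_SLICE_MS = 100   # assumed slice for prio 4
SLICE_SCALE = 5       # 1/5 of the base slice = 20% per priority level
MID_PRIO = 4
MID_WEIGHT = 500


def prio_slice_ms(ioprio, base=BASE_SLICE_MS):
    """Slice under the priority rule: +/-20% of the base per level from prio 4."""
    return base + (base // SLICE_SCALE) * (MID_PRIO - ioprio)


def ioprio_to_weight(ioprio):
    """Quoted mapping for the 100-1000 range: prio 0 -> 900 ... prio 7 -> 200."""
    return 900 - 100 * ioprio


def weight_slice_ms(ioprio, base=BASE_SLICE_MS):
    """Slice if service is made proportional to weight, normalised at 500."""
    return base * ioprio_to_weight(ioprio) // MID_WEIGHT


if __name__ == "__main__":
    for prio in range(8):
        print(prio, ioprio_to_weight(prio), prio_slice_ms(prio), weight_slice_ms(prio))
```

Under these assumptions the two rules give the same slice at every level, from
180ms at prio 0 down to 40ms at prio 7.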