From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
guijianfeng@cn.fujitsu.com, balbir@linux.vnet.ibm.com,
righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com,
akpm@linux-foundation.org, riel@redhat.com,
kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps
Date: Wed, 4 Nov 2009 13:54:02 -0500 [thread overview]
Message-ID: <20091104185402.GF2870@redhat.com> (raw)
In-Reply-To: <4e5e476b0911040959t122df6aaqee1039f265cbd18a@mail.gmail.com>
On Wed, Nov 04, 2009 at 06:59:45PM +0100, Corrado Zoccolo wrote:
> On Wed, Nov 4, 2009 at 5:37 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Nov 04, 2009 at 09:30:34AM -0500, Jeff Moyer wrote:
> >> Vivek Goyal <vgoyal@redhat.com> writes:
> >>
> >
> > Thanks for the review Jeff.
> >
> >> > o Currently CFQ provides priority scaled time slices to processes. If a process
> >> > does not use the time slice, either because process did not have sufficient
> >> > IO to do or because think time of process is large and CFQ decided to disable
> >> > idling, then processes looses it time slice share.
> >> ^^^^^^
> >> loses
> >>
> should be 'process loses'
>
Will fix it.
> >> > +static inline u64 max_vdisktime(u64 min_vdisktime, u64 vdisktime)
> >> > +{
> >> > + s64 delta = (s64)(vdisktime - min_vdisktime);
> >> > + if (delta > 0)
> >> > + min_vdisktime = vdisktime;
> >> > +
> >> > + return min_vdisktime;
> >> > +}
> >> > +
> >> > +static inline u64 min_vdisktime(u64 min_vdisktime, u64 vdisktime)
> >> > +{
> >> > + s64 delta = (s64)(vdisktime - min_vdisktime);
> >> > + if (delta < 0)
> >> > + min_vdisktime = vdisktime;
> >> > +
> >> > + return min_vdisktime;
> >> > +}
> >>
> >> Is there a reason you've reimplemented min and max?
> >
> > I think you are referring to min_t and max_t. Will these macros take care
> > of wrapping too?
> >
> > For example, if I used min_t(u64, A, B), then unsigned comparision will
> > not work right wrapping has just taken place for any of the A or B. So if
> > A=-1 and B=2, then min_t() would return B as minimum. This is not right
> > in our case.
> >
> > If we do signed comparison (min_t(s64, A, B)), that also seems to be
> > broken in another case where a value of variable moves from 63bits to 64bits,
> > (A=0x7fffffffffffffff, B=0x8000000000000000). Above will return B as minimum but
> > in our scanario, vdisktime will progress from 0x7fffffffffffffff to
> > 0x8000000000000000 and A should be returned as minimum (unsigned
> > comparison).
> >
> > Hence I took these difnitions from CFS.
> If those are times (measured in jiffies), why are you using u64? You
> could use unsigned long and time_before/time_after, that perform the
> proper wrap checking.
>
This is virtual time and not exactly the jiffies. This can run faster than
real time. In current patchset there are two reasons for that.
- We assign minimum time slice used by queue as 1ms and queue if we expire
the queue immediately after dispatching a request. So if we have really
fast hardware and we decide not to idle, we will be doing queue switch
very fast assigning each queue 1ms as slice used and vtime will be
progressing much faster than real time.
- We shift real time by CFQ_SERVICE_SHIFT so that theoritically one can
see service difference between weights x and x+1 for all values in the
weight range of 1 to 1000. and not loose small difference in the division
part.
vtime calculation is as follows.
vtime = (slice_used << CFQ_SERVICE_SHIFT )* DEFAUTL_WEIGHT/cfqe->weight
minimum value of slice_used is 1 jiffy. DEFAULT WEIGHT is 500. So if one
wants to get different vtime values for queue weights of 998 and 999, we
need to shift slice used by atleast 12 bits.
We are giving 12 bits shift to the real time, we are left with 20 bits on 32
bit hardware. So even if vtime does not run faster, in 1M jiffies (~ 1000
seconds) we will wrap around. I did some 4K sized direct IO on one of the
SSD and it achieve 7K io per second. In the worst case if these IO were
coming from two queues and we were interleaving the requests between these
then we will do 7 queue switch in 1ms. That means vtime travelling 7 times
faster and wrap around will take place in 1000/7 ~= 130 seconds.
I thought that on 32bit hardware we are really close to pushing the
limits, hence I thought of continuing to use 64bit vdisktime/key instead
of a unsigned long one.
Thanks
Vivek
next prev parent reply other threads:[~2009-11-04 18:54 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-03 23:43 [RFC] Block IO Controller V1 Vivek Goyal
2009-11-03 23:43 ` [PATCH 01/20] blkio: Documentation Vivek Goyal
2009-11-04 13:37 ` Jeff Moyer
2009-11-04 17:21 ` Balbir Singh
2009-11-04 17:52 ` Vivek Goyal
2009-11-04 23:36 ` Balbir Singh
2009-11-03 23:43 ` [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps Vivek Goyal
2009-11-04 14:30 ` Jeff Moyer
2009-11-04 16:37 ` Vivek Goyal
2009-11-04 17:59 ` Corrado Zoccolo
2009-11-04 18:54 ` Vivek Goyal [this message]
2009-11-05 2:44 ` Divyesh Shah
2009-11-05 14:39 ` Vivek Goyal
2009-11-04 21:18 ` Corrado Zoccolo
2009-11-04 22:25 ` Vivek Goyal
2009-11-05 8:36 ` Corrado Zoccolo
2009-11-04 23:22 ` Vivek Goyal
2009-11-05 8:27 ` Corrado Zoccolo
2009-11-05 0:05 ` Vivek Goyal
2009-11-06 22:22 ` [RFC] Workload type Vs Groups (Was: Re: [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps) Vivek Goyal
2009-11-09 17:33 ` Nauman Rafique
2009-11-09 21:47 ` Corrado Zoccolo
2009-11-09 23:12 ` Vivek Goyal
2009-11-10 11:29 ` Corrado Zoccolo
2009-11-10 13:31 ` Vivek Goyal
2009-11-10 14:12 ` Vivek Goyal
2009-11-10 18:05 ` Corrado Zoccolo
2009-11-10 19:15 ` Vivek Goyal
2009-11-12 8:53 ` Corrado Zoccolo
2009-11-11 0:48 ` [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps Gui Jianfeng
2009-11-12 23:07 ` Vivek Goyal
2009-11-13 0:59 ` Gui Jianfeng
2009-11-13 1:24 ` Vivek Goyal
2009-11-13 2:05 ` Gui Jianfeng
2009-11-03 23:43 ` [PATCH 03/20] blkio: Introduce the notion of weights Vivek Goyal
2009-11-04 15:06 ` Jeff Moyer
2009-11-04 15:41 ` Vivek Goyal
2009-11-04 17:07 ` Divyesh Shah
2009-11-04 19:00 ` Vivek Goyal
2009-11-04 19:15 ` Jeff Moyer
2009-11-03 23:43 ` [PATCH 04/20] blkio: Introduce the notion of cfq entity Vivek Goyal
2009-11-03 23:43 ` [PATCH 05/20] blkio: Introduce the notion of cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 06/20] blkio: Introduce cgroup interface Vivek Goyal
2009-11-04 15:23 ` Jeff Moyer
2009-11-04 16:47 ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 07/20] blkio: Provide capablity to enqueue/dequeue group entities Vivek Goyal
2009-11-04 15:34 ` Jeff Moyer
2009-11-04 16:54 ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 08/20] blkio: Add support for dynamic creation of cfq_groups Vivek Goyal
2009-11-04 16:01 ` Jeff Moyer
2009-11-03 23:43 ` [PATCH 09/20] blkio: Porpogate blkio cgroup weight or ioprio class updation to cfq groups Vivek Goyal
2009-11-05 5:35 ` Gui Jianfeng
2009-11-05 14:42 ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 10/20] blkio: Implement cfq group deletion and reference counting support Vivek Goyal
2009-11-04 18:44 ` Jeff Moyer
2009-11-04 19:00 ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 11/20] blkio: Some CFQ debugging Aid Vivek Goyal
2009-11-04 18:52 ` Jeff Moyer
2009-11-04 19:12 ` Vivek Goyal
2009-11-04 19:25 ` Jeff Moyer
2009-11-05 3:10 ` Divyesh Shah
2009-11-05 14:42 ` Vivek Goyal
2009-11-06 0:56 ` Divyesh Shah
2009-11-03 23:43 ` [PATCH 12/20] blkio: Export disk time and sectors dispatched from cgroup interface Vivek Goyal
2009-11-03 23:43 ` [PATCH 13/20] blkio: Add a group dequeue interface in cgroup for debugging Vivek Goyal
2009-11-03 23:43 ` [PATCH 14/20] blkio: Do not allow request merging across cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 15/20] blkio: Take care of preemptions across groups Vivek Goyal
2009-11-04 19:00 ` Jeff Moyer
2009-11-04 19:27 ` Vivek Goyal
2009-11-04 19:30 ` Jeff Moyer
2009-11-06 7:55 ` Gui Jianfeng
2009-11-06 22:10 ` Vivek Goyal
2009-11-09 7:41 ` Gui Jianfeng
2009-11-03 23:43 ` [PATCH 16/20] blkio: do not select co-operating queues from different cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 17/20] blkio: Wait for queue to get backlogged before it expires Vivek Goyal
2009-11-03 23:43 ` [PATCH 18/20] blkio: arm idle timer even if think time is great then time slice left Vivek Goyal
2009-11-04 19:04 ` Jeff Moyer
2009-11-04 19:17 ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 19/20] blkio: Arm slice timer even if there are requests in driver Vivek Goyal
2009-11-03 23:43 ` [PATCH 20/20] blkio: Drop the reference to queue once the task changes cgroup Vivek Goyal
2009-11-04 19:09 ` Jeff Moyer
2009-11-04 19:18 ` Vivek Goyal
2009-11-04 7:43 ` [RFC] Block IO Controller V1 Jens Axboe
2009-11-04 13:39 ` Vivek Goyal
2009-11-04 19:12 ` Jeff Moyer
2009-11-04 19:19 ` Vivek Goyal
2009-11-04 19:27 ` Jeff Moyer
2009-11-04 19:38 ` Vivek Goyal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091104185402.GF2870@redhat.com \
--to=vgoyal@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=czoccolo@gmail.com \
--cc=dpshah@google.com \
--cc=fernando@oss.ntt.co.jp \
--cc=guijianfeng@cn.fujitsu.com \
--cc=jens.axboe@oracle.com \
--cc=jmoyer@redhat.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizf@cn.fujitsu.com \
--cc=m-ikeda@ds.jp.nec.com \
--cc=nauman@google.com \
--cc=riel@redhat.com \
--cc=righi.andrea@gmail.com \
--cc=ryov@valinux.co.jp \
--cc=s-uchida@ap.jp.nec.com \
--cc=taka@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).