From: Vivek Goyal <vgoyal@redhat.com>
To: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
s-uchida@ap.jp.nec.com, taka@valinux.co.jp, jmoyer@redhat.com,
righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com,
czoccolo@gmail.com, Alan.Brunelle@hp.com
Subject: Re: Block IO Controller V4
Date: Thu, 3 Dec 2009 09:36:41 -0500 [thread overview]
Message-ID: <20091203143641.GA3887@redhat.com> (raw)
In-Reply-To: <4B1779CE.1050801@cn.fujitsu.com>
On Thu, Dec 03, 2009 at 04:41:50PM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Wed, Dec 02, 2009 at 09:51:36AM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> Hi Jens,
> >>>
> >>> This is V4 of the Block IO controller patches on top of "for-2.6.33" branch
> >>> of block tree.
> >>>
> >>> A consolidated patch can be found here:
> >>>
> >>> http://people.redhat.com/vgoyal/io-controller/blkio-controller/blkio-controller-v4.patch
> >>>
> >> Hi Vivek,
> >>
> >> It seems this version doesn't work very well for "direct(O_DIRECT) sequence read" mode.
> >> For example, you can create group A and group B, then assign weight 100 to group A and
> >> weight 400 to group B, and you run "direct sequence read" workload in group A and B
> >> simultaneously. Ideally, we should see 1:4 disk time differentiation for group A and B.
> >> But actually, I see almost 1:2 disk time differentiation for group A and B. I'm looking
> >> into this issue.
> >> BTW, V3 works well for this case.
> >
> > Hi Gui,
> >
> > In my testing of 8 fio jobs in 8 cgroups, direct sequential reads seems to
> > be working fine.
> >
> > http://lkml.org/lkml/2009/12/1/367
> >
> > I suspect that in some case we choose not to idle on the group and it gets
> > deleted from service tree hence we loose share. Can you have a look at
> > blkio.dequeue files. If there are excessive deletions, that will signify
> > that we are loosing share because we chose not to idle.
> >
> > If yes, please also run blktrace to see in what cases we chose not to
> > idle.
> >
> > In V3, I had a stronger check to idle on the group if it is empty using
> > wait_busy() function. In V4 I have removed that and trying to wait busy
> > on a queue by extending its slice if it has consumed its allocated slice.
>
> Hi Vivek,
>
> I ckecked the blktrace output, it seems that io group was deleted all the time,
> because we don't have group idle any more. I pulled the wait_busy code back to
> V4, and retest it, problem seems disappeared.
>
> So i suggest that we need to retain the wait_busy code.
Hi Gui,
We need to figure out why the existing code is not working on your system.
In V4, I introduced the functionality to extend the slice by slice_idle
so that we will arm slice idle timer and wait for new request to come in
and then expire the queue. Following is the code to extend the slice.
/*
* If this queue consumed its slice and this is last queue
* in the group, wait for next request before we expire
* the queue
*/
if (cfq_slice_used(cfqq) && cfqq->cfqg->nr_cfqq == 1) {
cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
cfq_mark_cfqq_wait_busy(cfqq);
}
One loop hole I see is that, I extend the slice only if current slice has
been used. If if we on the boundary and slice has not been used yet, then
I will not extend the slice. We also might not arm the timer thinking that
remaining slice is less than think time of process and that can lead to
expiry of queue. To rule out this possibility, can you remove following
code in arm_slice_timer() and try it again.
/*
* If our average think time is larger than the remaining time
* slice, then don't idle. This avoids overrunning the allotted
* time slice.
*/
if (sample_valid(cic->ttime_samples) &&
(cfqq->slice_end - jiffies < cic->ttime_mean))
return;
The other possiblity is that at the request completion time slice has not
expired hence we don't extend the slice and arm the timer. But then
select_queue() hits and by that time slice has expired and we expire the
queue. I thought this will not happen very frequently.
Can you figure out what is happening on your system. Why we are not doing
wait busy on the queue/group (new queue wait_busy and wait_busy_done
flags) and instead expiring the queue and hence group.
You can send your blktrace logs to me also. I can also try figuring out
what is happening.
Thanks
Vivek
next prev parent reply other threads:[~2009-12-03 14:38 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-30 2:59 Block IO Controller V4 Vivek Goyal
2009-11-30 2:59 ` [PATCH 01/21] blkio: Set must_dispatch only if we decided to not dispatch the request Vivek Goyal
2009-12-02 14:06 ` Jeff Moyer
2009-11-30 2:59 ` [PATCH 02/21] blkio: Introduce the notion of cfq groups Vivek Goyal
2009-11-30 2:59 ` [PATCH 03/21] blkio: Implement macro to traverse each idle tree in group Vivek Goyal
2009-11-30 20:13 ` Divyesh Shah
2009-11-30 22:24 ` Vivek Goyal
2009-11-30 2:59 ` [PATCH 04/21] blkio: Keep queue on service tree until we expire it Vivek Goyal
2009-11-30 2:59 ` [PATCH 05/21] blkio: Introduce the root service tree for cfq groups Vivek Goyal
2009-11-30 23:55 ` Divyesh Shah
2009-12-02 15:42 ` Vivek Goyal
2009-12-02 15:49 ` Vivek Goyal
2009-11-30 2:59 ` [PATCH 06/21] blkio: Introduce blkio controller cgroup interface Vivek Goyal
2009-12-01 0:04 ` Divyesh Shah
2009-12-02 15:27 ` Vivek Goyal
2009-11-30 2:59 ` [PATCH 07/21] blkio: Introduce per cfq group weights and vdisktime calculations Vivek Goyal
2009-12-02 15:50 ` Vivek Goyal
2009-11-30 2:59 ` [PATCH 08/21] blkio: Implement per cfq group latency target and busy queue avg Vivek Goyal
2009-11-30 2:59 ` [PATCH 09/21] blkio: Group time used accounting and workload context save restore Vivek Goyal
2009-11-30 2:59 ` [PATCH 10/21] blkio: Dynamic cfq group creation based on cgroup tasks belongs to Vivek Goyal
2009-11-30 2:59 ` [PATCH 11/21] blkio: Take care of cgroup deletion and cfq group reference counting Vivek Goyal
2009-11-30 2:59 ` [PATCH 12/21] blkio: Some debugging aids for CFQ Vivek Goyal
2009-11-30 2:59 ` [PATCH 13/21] blkio: Export disk time and sectors used by a group to user space Vivek Goyal
2009-11-30 2:59 ` [PATCH 14/21] blkio: Provide some isolation between groups Vivek Goyal
2009-11-30 2:59 ` [PATCH 15/21] blkio: Drop the reference to queue once the task changes cgroup Vivek Goyal
2009-11-30 2:59 ` [PATCH 16/21] blkio: Propagate cgroup weight updation to cfq groups Vivek Goyal
2009-11-30 2:59 ` [PATCH 17/21] blkio: Wait for cfq queue to get backlogged if group is empty Vivek Goyal
2009-11-30 2:59 ` [PATCH 18/21] blkio: Determine async workload length based on total number of queues Vivek Goyal
2009-11-30 2:59 ` [PATCH 19/21] blkio: Implement group_isolation tunable Vivek Goyal
2009-11-30 2:59 ` [PATCH 20/21] blkio: Wait on sync-noidle queue even if rq_noidle = 1 Vivek Goyal
2009-11-30 2:59 ` [PATCH 21/21] blkio: Documentation Vivek Goyal
2009-11-30 15:34 ` Block IO Controller V4 Corrado Zoccolo
2009-11-30 16:00 ` Vivek Goyal
2009-11-30 21:34 ` Corrado Zoccolo
2009-11-30 21:58 ` Vivek Goyal
2009-11-30 22:00 ` Alan D. Brunelle
2009-11-30 22:56 ` Vivek Goyal
2009-11-30 23:50 ` Alan D. Brunelle
2009-12-02 19:12 ` Vivek Goyal
2009-12-08 15:17 ` Alan D. Brunelle
2009-12-08 16:32 ` Vivek Goyal
2009-12-08 18:05 ` Alan D. Brunelle
2009-12-10 3:44 ` Vivek Goyal
2009-12-01 22:27 ` Vivek Goyal
2009-12-02 1:51 ` Gui Jianfeng
2009-12-02 14:25 ` Vivek Goyal
2009-12-03 8:41 ` Gui Jianfeng
2009-12-03 14:36 ` Vivek Goyal [this message]
2009-12-03 18:10 ` Vivek Goyal
2009-12-03 23:51 ` Vivek Goyal
2009-12-07 8:45 ` Gui Jianfeng
2009-12-07 15:25 ` Vivek Goyal
2009-12-07 1:35 ` Gui Jianfeng
2009-12-07 8:41 ` Gui Jianfeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091203143641.GA3887@redhat.com \
--to=vgoyal@redhat.com \
--cc=Alan.Brunelle@hp.com \
--cc=czoccolo@gmail.com \
--cc=dpshah@google.com \
--cc=fernando@oss.ntt.co.jp \
--cc=guijianfeng@cn.fujitsu.com \
--cc=jens.axboe@oracle.com \
--cc=jmoyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizf@cn.fujitsu.com \
--cc=m-ikeda@ds.jp.nec.com \
--cc=nauman@google.com \
--cc=righi.andrea@gmail.com \
--cc=ryov@valinux.co.jp \
--cc=s-uchida@ap.jp.nec.com \
--cc=taka@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox