From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
	ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
	s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
	guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
	righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com,
	Alan.Brunelle@hp.com
Subject: Re: Block IO Controller V4
Date: Mon, 30 Nov 2009 11:00:24 -0500
Message-ID: <20091130160024.GD11670@redhat.com>
In-Reply-To: <4e5e476b0911300734h34a22c88oa5d7d4e5642ead50@mail.gmail.com>

On Mon, Nov 30, 2009 at 04:34:36PM +0100, Corrado Zoccolo wrote:
> Hi Vivek,
> On Mon, Nov 30, 2009 at 3:59 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > Hi Jens,
> > [snip]
> > TODO
> > ====
> > - Direct random writers seem to be very fickle in terms of workload
> >  classification. They seem to be switching between sync-idle and sync-noidle
> >  workload type in a little unpredictable manner. Debug and fix it.
> >
> 
> Are you still experiencing erratic behaviour after my patches were
> integrated in for-2.6.33?

Your patches helped with deep seeky queues. But if I run a random writer
with the default iodepth of 1 (without libaio), I still see the idle flag
flipping between 0 and 1 very frequently during a 30-second run.

As per the CFQ classification definition, a seeky random writer with shallow
depth should be classified as sync-noidle and stay there unless the
workload changes its nature. But that does not seem to be happening.

Just try two fio random writers, monitor the blktrace output, and see how
frequently we enable and disable idle on the queues.
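
To put a number on the flipping, here is a minimal sketch. It assumes the
per-queue idle flag values (0 or 1) have already been pulled out of the trace
in time order; that extraction step is not shown, and the helper name
count_idle_flips is hypothetical, not part of fio or blktrace.

```python
def count_idle_flips(idle_values):
    """Count transitions between consecutive idle-flag samples.

    idle_values is a time-ordered list of 0/1 values for one queue
    (assumed to be extracted from the trace by some other means).
    """
    flips = 0
    for prev, cur in zip(idle_values, idle_values[1:]):
        if prev != cur:
            flips += 1
    return flips


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    samples = [0, 1, 1, 0, 1, 0, 0]
    print(count_idle_flips(samples))
```

A queue that is classified once and stays put should give a count near zero;
a large count over a 30-second run is the behaviour described above.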

> 
> > - Support async IO control (buffered writes).
> I was thinking about this.
> Currently, writeback can either be issued by a kernel daemon (when
> actual dirty ratio is > background dirty ratio, but < dirty_ratio) or
> from various processes, if the actual dirty ratio is > dirty ratio.

- If the actual dirty ratio exceeds dirty_ratio, then a process will be
  throttled and it can do one of the following actions.

	- Pick one inode and start flushing its dirty pages. Now these
	  pages could have been dirtied by another process in another
	  group.

	- It might just wait for flusher threads to flush some pages and
	  sleep for that duration.
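
The thresholds discussed above can be sketched as a simplified model. This is
not the kernel's writeback code; the function name writeback_issuer and the
percentage values used in the example are assumptions for illustration only.

```python
def writeback_issuer(actual, background_dirty_ratio, dirty_ratio):
    """Return who is expected to issue writeback in this simplified model.

    All arguments are percentages of dirtyable memory.
    """
    if actual > dirty_ratio:
        # The dirtying process is throttled; it either flushes pages
        # itself or sleeps waiting for the flusher threads.
        return "process"
    if actual > background_dirty_ratio:
        # Background writeback by kernel flusher threads.
        return "flusher"
    return "none"


if __name__ == "__main__":
    # Example thresholds (assumed values): background 10%, hard limit 20%.
    for actual in (5, 15, 25):
        print(actual, writeback_issuer(actual, 10, 20))
```

Note that in the "process" case the pages being flushed may still belong to
another process in another group, which is what makes attribution hard.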

> Could the writeback issued in the context of a process be marked as sync?
> In this way:
> * normal writeback when system is not under pressure will run in the
> root group, without interfering with sync workload
> * the writeback issued when we have high dirty ratio will have more
> priority, so the system will return to a normal condition more quickly.

Marking async IO submitted in the context of processes (and not kernel
threads) as sync is interesting. We could try that, but in general the
processes that are being throttled are doing buffered writes, and these
are generally not very latency sensitive.

Group stuff apart, I would rather think of providing a consistent share to
the async workload, so that when lots of sync as well as async IO is going
on in the system, nobody starves and we provide access to the disk in
a deterministic manner.

That's why I like the idea of fixing a share for the async workload, so
that it does not starve in the face of a lot of sync IO. I am not sure how
effectively it is working, though.
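
As a rough illustration of what a fixed async share means, here is a minimal
sketch. It is not CFQ's actual slice calculation; split_period, period_ms and
async_share are hypothetical names chosen for the example.

```python
def split_period(period_ms, async_share, sync_busy, async_busy):
    """Split one scheduling period between sync and async workloads.

    Returns (sync_ms, async_ms). When both kinds of IO are pending,
    async always gets its fixed fraction, so it cannot be starved.
    """
    if sync_busy and async_busy:
        async_ms = period_ms * async_share
        return period_ms - async_ms, async_ms
    if async_busy:
        return 0.0, float(period_ms)
    if sync_busy:
        return float(period_ms), 0.0
    return 0.0, 0.0


if __name__ == "__main__":
    # Assumed values: a 100 ms period with 25% reserved for async IO.
    print(split_period(100, 0.25, True, True))
```

With a guaranteed fraction, the disk time each class receives under mixed load
is predictable regardless of how much sync IO is queued.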

Thanks
Vivek


> * your code will work out of the box, in fact processes with lower
> weight will complete less I/O, therefore they will be slowed down more
> than higher weight ones.
> 
> >
> >  Buffered writes are a beast and require changes in many places to solve the
> >  problem, and the patchset becomes huge. Hence we first plan to support only
> >  sync IO control, then work on async IO too.
> >
> >  Some of the work items identified are:
> >
> >        - Per memory cgroup dirty ratio
> >        - Possibly modification of writeback to force writeback from a
> >          particular cgroup.
> >        - Implement IO tracking support so that a bio can be mapped to a cgroup.
> >        - Per group request descriptor infrastructure in block layer.
> >        - At CFQ level, implement per cfq_group async queues.
> >
> >  In this patchset, all the async IO goes in system wide queues and there are
> >  no per group async queues. That means we will see service differentiation
> >  for sync IO only. Async IO will be handled later.
> >
> > - Support for higher level policies like max BW controller.
> > - Support groups of RT class also.
> 
> Thanks,
> Corrado
