Re: [PATCH] cfq-iosched: rework seeky detection

public inbox for linux-kernel@vger.kernel.org

From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Jeff Moyer <jmoyer@redhat.com>, Shaohua Li <shaohua.li@intel.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Yanmin Zhang <yanmin_zhang@linux.intel.com>
Subject: Re: [PATCH] cfq-iosched: rework seeky detection
Date: Wed, 13 Jan 2010 15:19:13 -0500
Message-ID: <20100113201913.GE6123@redhat.com>
In-Reply-To: <4e5e476b1001130005p4acfdd55na387f925ad6078f3@mail.gmail.com>

On Wed, Jan 13, 2010 at 09:05:21AM +0100, Corrado Zoccolo wrote:
> On Wed, Jan 13, 2010 at 12:17 AM, Corrado Zoccolo <czoccolo@gmail.com> wrote:
> > On Tue, Jan 12, 2010 at 11:36 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >>> The fact is, can we reliably determine which of those two setups we
> >>> have from cfq?
> >>
> >> I have no idea at this point of time but it looks like determining this
> >> will help.
> >>
> >> May be something like keep a track of number of processes on "sync-noidle"
> >> tree and average read times when sync-noidle tree is being served. Over a
> >> period of time we need to monitor what's the number of processes
> >> (threshold), after which average read time goes up. For sync-noidle we can
> >> then drive "queue_depth=nr_thrshold" and once queue depth reaches that,
> >> then idle on the process. So for single spindle, I guess tipping point
> >> will be 2 processes and we can idle on sync-noidle process. For more
> >> spindles, tipping point will be higher.
> >>
> >> These are just some random thoughts.
> > It seems reasonable.
> I think, though, that the implementation will be complex.
> We should limit this to request sizes that are <= stripe size (larger
> requests will hit more disks, and have a much lower optimal queue
> depth), so we need to add a new service_tree (they will become:
> SYNC_IDLE_LARGE, SYNC_IDLE_SMALL, SYNC_NOIDLE, ASYNC), and the
> optimization will apply only to the SYNC_IDLE_SMALL tree.
> Moreover, we can't just dispatch K queues and then idle on the last
> one. We need to have a set of K active queues, and wait on any of
> them. This makes this optimization very complex, and I think for
> little gain. In fact, usually we don't have sequential streams of
> small requests, unless we misuse mmap or direct I/O.

I guess a somewhat simpler approach would be to determine whether the
underlying media is a single disk/spindle or not. If the optimal queue depth
is more than 1, there is most likely more than one spindle, so we can drive
deeper queue depths and not idle on the mmap process. If the optimal queue
depth is 1, there is a single disk/spindle, and we can mark the mmap process
as sync-idle. No need for an extra service tree.

But I do agree that even determining the optimal queue depth might turn out
to be complex. In the long run, though, it might be useful to detect/know
whether we are operating on a single disk or an array of disks. I will play
around with it a bit if time permits.
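
To make the idea concrete, here is a rough userspace sketch in Python (not
kernel code, and not anything cfq does today). It assumes we can collect
(queue depth, read time) samples, takes the average read time at depth 1 as
the baseline, and treats the last depth whose average stays within a
tolerance of that baseline as the optimal depth. The function names, the
sample format and the 1.25 tolerance are all made up for illustration.

```python
from collections import defaultdict


def estimate_optimal_depth(samples, tolerance=1.25):
    """Guess the queue depth after which average read time goes up.

    samples: iterable of (depth, read_time_ms) pairs (hypothetical format).
    tolerance: allowed slowdown relative to the depth-1 average (assumed).
    """
    totals = defaultdict(lambda: [0.0, 0])  # depth -> [sum of times, count]
    for depth, read_ms in samples:
        totals[depth][0] += read_ms
        totals[depth][1] += 1

    if 1 not in totals:
        # No baseline yet: be conservative and assume a single spindle.
        return 1

    baseline = totals[1][0] / totals[1][1]
    optimal = 1
    depth = 2
    # Walk contiguous depths until the average read time exceeds the limit.
    while depth in totals:
        average = totals[depth][0] / totals[depth][1]
        if average > baseline * tolerance:
            break
        optimal = depth
        depth += 1
    return optimal


def should_idle_on_noidle_queue(samples):
    """Optimal depth of 1 suggests a single disk/spindle, so idling helps."""
    return estimate_optimal_depth(samples) == 1
```

With samples where read time doubles as soon as two processes are active,
this reports an optimal depth of 1 (treat as single spindle and idle). With
samples that stay flat up to depth 4 and then degrade, it reports 4.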

> BTW, the mmap problem could be easily fixed adding madvise(WILL_NEED)
> to the userspace program, when dealing with data.
> I think we only have to worry about binaries, here.
> 
> > Something similar to what we do to reduce depth for async writes.
> > Can you see if you get similar BW improvements also for parallel
> > sequential direct I/Os with block size < stripe size?
> 
> Thanks,
> Corrado
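
On the madvise suggestion quoted above: in madvise(2) the advice value is
spelled MADV_WILLNEED. A minimal sketch of what a userspace program could do
when it reads data through mmap, written in Python for brevity. It assumes
Python 3.8+ on Linux, where mmap.madvise() and mmap.MADV_WILLNEED are
available, and falls back to a plain read of the mapping otherwise. The
helper names are hypothetical.

```python
import mmap
import os
import tempfile


def read_with_willneed(path):
    """Map a file read-only, hint the kernel to read it ahead, return bytes."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only a hint: the kernel may start readahead for the mapping.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                m.madvise(mmap.MADV_WILLNEED)
            return m[:]


def demo_roundtrip():
    """Write a temporary file and check it reads back through the mapping."""
    payload = b"x" * 10000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        name = tmp.name
    try:
        return read_with_willneed(name) == payload
    finally:
        os.unlink(name)
```

The hint does not change what the program reads. It only asks the kernel to
prefetch, so page faults on the mapping are less likely to show up as a
stream of small synchronous reads.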
