From: Tejun Heo <tj@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Michal Hocko <mhocko@suse.cz>,
	cgroups@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Sha Zhengju <handai.szj@gmail.com>,
	devel@openvz.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH RFC] fsio: filesystem io accounting cgroup
Date: Tue, 9 Jul 2013 08:08:15 -0700
Message-ID: <20130709150815.GG2478@htj.dyndns.org>
In-Reply-To: <20130709145430.GB2237@redhat.com>

Hello, Vivek.

On Tue, Jul 09, 2013 at 10:54:30AM -0400, Vivek Goyal wrote:
> It is not clear whether counting bio or counting request is right
> thing to do here. It depends where you are trying to throttle. For
> bio based drivers there is request and they need throttling mechanism
> too. So keeping it common for both, kind of makes sense.

It gets weird because we may end up with wildly disagreeing statistics
from the queue and the resource management.  It should have been part of
request_queue, not something sitting on top.  Note that with
multi-queue support, we're unlikely to need bio based drivers except
for the stacking ones.

> Ok, so first of all you agree that time slice management is not a
> requirement for fast devices.

Not fast, but consistent.

> So time slice management is a problem even on slow devices which implement
> NCQ. IIRC, in the beginning even CFQ was doing some kind of request
> management (and not time slice management). And later it switched to
> time slice management in an effort to provide better fairness (If somebody
> is doing random IO and seek takes more time the process should be
> accounted for it).
> 
> But ideal time slice accounting requires driving a queue depth of 1
> and for any non-sequential IO, it kills performance.

Yeap, complete control only works with qd == 1 and even then write
buffering will throw you off.  But even w/ qd > 1 and write buffering,
time slice is fundamentally the right thing to manage, rather than iops,
for disks - e.g. you want to group IOs from the same issuer in the same
time slice even if the time accounting for that is not accurate so
that you can size the slice according to the operating characteristics
of the device and do things like idling in between.

> Seriously, time slice accounting is one way of managing resource. Same
> disk resource can be divided proportionally by counting either iops
> or by counting amount of IO done (bandwidth).

In practice, bio iops based proportional control becomes almost
completely worthless if you have any mix of random and sequential
accesses.  cfq wouldn't be accurate but it'd be *far* closer than
anything based on iops.
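
A rough sketch of that arithmetic, assuming made-up service times (8 ms
for a seek-bound random IO, 0.1 ms for a sequential one; neither figure
comes from this thread).  The helper name disk_time_share is
hypothetical.  With equal iops shares, the group doing random IO ends up
with nearly all of the disk time:

```python
# Illustrative numbers only; none of them come from the thread.
RANDOM_MS = 8.0      # assumed service time of one random IO (seek-dominated)
SEQUENTIAL_MS = 0.1  # assumed service time of one sequential IO


def disk_time_share(ios_a, cost_a_ms, ios_b, cost_b_ms):
    """Return the fraction of disk time consumed by group A and group B."""
    time_a = ios_a * cost_a_ms
    time_b = ios_b * cost_b_ms
    total = time_a + time_b
    return time_a / total, time_b / total


# Equal weights under iops-based control: each group completes the same
# number of IOs per round.
share_random, share_sequential = disk_time_share(
    100, RANDOM_MS, 100, SEQUENTIAL_MS
)
print(f"random group:     {share_random:.1%} of disk time")
print(f"sequential group: {share_sequential:.1%} of disk time")
```

Equal iops here means a very unequal split of what the disk actually
spends its time on, which is the sense in which the proportions stop
meaning much on rotating media.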

> If we count iops or bandwidth, it might not be most fair way of doing
> things on rotational media but it also should provide more accurate
> results in case of NCQ. When multiple requests have been dispatched
> to disk we have no idea which request consumed how much of disk time.
> So there is no way to account it properly. Iops or bandwidth based
> accounting will work just fine even with NCQ.

Sure, if iops or bw is what you explicitly want to control with hard
limits, it's fine, but doing proportional control with that on
rotating disk is just silly.

> So you want this generic block layer proportional implementation to
> do time slice management?
> 
> I thought we talked about this implementation to use some kind of
> token based mechanism so that it scales better on faster
> devices. And on slower devices one will continue to use CFQ.

I want to leave rotating disk proportional control to cfq-iosched for
as long as it matters and do iops / bw based things in the generic
layer.
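
For the iops side, a minimal sketch of the kind of token based limit
mentioned above, assuming a plain token bucket.  This is a generic
illustration, not code from blk-throttle or any existing kernel
implementation; the class and method names are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: allows up to `rate` IOs per second on average."""

    def __init__(self, rate, burst):
        self.rate = float(rate)    # tokens (IOs) added per second
        self.burst = float(burst)  # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.last = 0.0            # time of the last refill, in seconds

    def try_dispatch(self, now):
        """Return True if one IO may be dispatched at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=100, burst=10)
# Offer 1000 IOs within one second (one every millisecond).
dispatched = sum(bucket.try_dispatch(now=i / 1000.0) for i in range(1000))
print(dispatched)
```

The bucket only counts IOs, so it needs no knowledge of how long each
one takes on the device, which is why this style of control does not
depend on the queue depth.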

Thanks.

-- 
tejun

