From: Tao Ma <tm@tao.ma>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Tejun Heo <tj@kernel.org>,
axboe@kernel.dk, ctalbott@google.com, rni@google.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
containers@lists.linux-foundation.org,
Shaohua Li <shli@kernel.org>
Subject: Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
Date: Wed, 04 Apr 2012 00:36:24 +0800
Message-ID: <4F7B2708.6080504@tao.ma>
In-Reply-To: <20120403153736.GI5913@redhat.com>
Adding Shaohua to the Cc list.
On 04/03/2012 11:37 PM, Vivek Goyal wrote:
> On Tue, Apr 03, 2012 at 06:41:37AM +0800, Tao Ma wrote:
>> On 04/03/2012 06:25 AM, Vivek Goyal wrote:
>>> On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:
>>>
>>> [..]
>>>>> Yeah, just add config and stat files prefixed with the name of the new
>>>>> blkcg policy.
>>>> OK, I will add a new config file for it.
>>>
>>> If only CFQ could be modified to add an iops mode, flippable through a
>>> sysfs tunable, things would be much simpler. You would not have to add
>>> a new IO scheduler, nor new configuration/stat files in blkcg (which is
>>> already crowded).
>>>
>>> I don't think anybody has shown, with code, why CFQ can't be modified
>>> to support an iops mode.
>> Yes, I have thought of it, but it seems to me that the time slice is
>> deeply involved in cfq (even cfq's current iops mode uses the time
>> slice in its calculation), so I don't think it is feasible for me to
>> change it. Also, cfq works perfectly well for SAS/SATA environments and
>> its code is quite stable; more code and a more complicated algorithm
>> mean more bugs. So I think a new iops based scheduler is easier and not
>> intrusive for the user (since he can choose whether to use it or not).
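Much of the disagreement above comes down to the unit in which a queue or group is charged. The following is only a rough sketch and is not CFQ's actual code. The names `struct grp_usage`, `DEFAULT_WEIGHT` and the `iops_mode` flag are hypothetical. It shows how a single flippable tunable could choose between charging a group for the time it held the device and charging it for the number of requests it dispatched. In both cases the charge is scaled inversely by the group's weight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference weight: a group with this weight is charged 1:1. */
#define DEFAULT_WEIGHT 500u

/* Hypothetical per-group accounting record for one service period. */
struct grp_usage {
	uint64_t used_us;     /* time the group held the device, in us */
	uint64_t dispatched;  /* requests the group dispatched meanwhile */
	unsigned int weight;  /* cgroup weight, must be non-zero */
};

/* Raw charge in the unit selected by the hypothetical iops_mode switch. */
static uint64_t raw_charge(const struct grp_usage *u, bool iops_mode)
{
	return iops_mode ? u->dispatched : u->used_us;
}

/* Scale the charge inversely by weight: for the same service, a group
 * with twice the weight advances its virtual time half as fast. */
static uint64_t scaled_charge(const struct grp_usage *u, bool iops_mode)
{
	assert(u->weight > 0);
	return raw_charge(u, iops_mode) * DEFAULT_WEIGHT / u->weight;
}
```

This sketch only changes what gets charged. As Tao points out, a time slice may still decide how long a queue keeps the device.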
>
> Ok, let me take one step back.
>
> - What's the goal of an iops based scheduler? For what kind of workload
> and storage is it going to help?
>
> - Can't we just set slice_idle=0 and "quantum" to some high value, say
> "64" or "128", and achieve results similar to an iops based scheduler?
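For reference, slice_idle and quantum are CFQ tunables. They appear under /sys/block/<dev>/queue/iosched/ while CFQ is the device's active scheduler. Below is a minimal C sketch for applying Vivek's suggested values. The helper names are hypothetical, the device name is only an example, and writing the files requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Build /sys/block/<dev>/queue/iosched/<knob> into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int iosched_knob_path(char *buf, size_t len, const char *dev,
			     const char *knob)
{
	int n;

	assert(buf && dev && knob);
	n = snprintf(buf, len, "/sys/block/%s/queue/iosched/%s", dev, knob);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer to a scheduler tunable. Returns -1 if the file does
 * not exist (e.g. CFQ is not the active scheduler) or cannot be written. */
static int set_iosched_knob(const char *dev, const char *knob, int value)
{
	char path[256];
	FILE *f;

	if (iosched_knob_path(path, sizeof(path), dev, knob) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -1;
	}
	/* With stdio buffering, the actual write(2), and any error sysfs
	 * returns for it, may only happen at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, call `set_iosched_knob("sda", "slice_idle", 0)` followed by `set_iosched_knob("sda", "quantum", 64)`. Either call returns -1 if the knob is missing, for instance because another scheduler is active.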
Yes, I'd say cfq with slice_idle = 0 works well in most cases. But when
it comes to blkcg with an SSD, it is really a disaster. cfq has to choose
between different cgroups, so even if you choose 1ms as the service time
for each cgroup (actually, in my tests only >2ms works reliably), the
latency of some requests (those which have been issued by the user but
not yet submitted to the driver) is far too high for the application. I
don't think there is a way to resolve this in cfq.
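The arithmetic behind Tao's complaint can be sketched under one simplifying assumption: N backlogged cgroups are served strictly round-robin, and each holds the device for a fixed slice. A request whose group has just been switched out must then wait for the other N - 1 slices before its group runs again. The helper below is hypothetical and ignores idling and preemption.

```c
#include <assert.h>

/* Worst-case queueing delay, in microseconds, for a request whose cgroup
 * has just been switched out, assuming ngroups backlogged cgroups are
 * served strictly round-robin with a fixed slice of slice_us each. */
static unsigned long worst_case_wait_us(unsigned int ngroups,
					unsigned long slice_us)
{
	assert(ngroups > 0);
	/* every other group gets one full slice before ours runs again */
	return (unsigned long)(ngroups - 1) * slice_us;
}
```

Take 8 cgroups and the 2 ms slice Tao found to be the reliable minimum. `worst_case_wait_us(8, 2000)` gives 14000 us, which is 14 ms of queueing delay. On an SSD, individual requests typically complete in far less time than that.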
>
> In theory, the above will cut down on idling and try to provide fairness
> in terms of time. I thought fairness in terms of time is the fairest.
> The most common problem is that on NCQ hardware, measured time is not
> attributable to an individual queue. I guess that throws time
> measurement out of the window unless we have a better algorithm to
> measure time in an NCQ environment.
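On NCQ hardware, requests from several queues are in flight at the same time, so their dispatch-to-completion intervals overlap. Charging each queue the elapsed time of its own requests therefore counts the same device time more than once. Here is a minimal sketch with hypothetical types and made-up timestamps. Spans must be sorted by start time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one request's dispatch-to-completion span. */
struct req_span {
	int queue;            /* owning queue id */
	unsigned long start;  /* dispatch time, us */
	unsigned long end;    /* completion time, us */
};

/* Naive per-queue charge: sum of the elapsed times of its requests. */
static unsigned long naive_charge(const struct req_span *r, size_t n, int q)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].end >= r[i].start);
		if (r[i].queue == q)
			sum += r[i].end - r[i].start;
	}
	return sum;
}

/* Actual device busy time: length of the union of all spans. */
static unsigned long busy_time(const struct req_span *r, size_t n)
{
	unsigned long total = 0, cur_start = 0, cur_end = 0;
	int open = 0;

	for (size_t i = 0; i < n; i++) {
		if (!open || r[i].start > cur_end) {
			/* gap: close the current busy period, start a new one */
			if (open)
				total += cur_end - cur_start;
			cur_start = r[i].start;
			cur_end = r[i].end;
			open = 1;
		} else if (r[i].end > cur_end) {
			cur_end = r[i].end;  /* overlapping span extends period */
		}
	}
	if (open)
		total += cur_end - cur_start;
	return total;
}
```

Consider four 100 us requests, two per queue, dispatched as two pairs 10 us apart. Each queue would be charged 200 us, yet the device was busy for only 110 us.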
>
> I guess then we can just replace time with the number of requests
> dispatched from a process queue: allow it to dispatch requests for some
> time, then schedule it out, put it back on the service tree and charge
> it according to its weight.
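Vivek's proposal can be sketched as a weighted fair queueing loop where the charge is a request count rather than time. This illustration rests on several assumptions and is not CFQ code. A linear scan stands in for the red-black service tree, and `SKETCH_BASE_WEIGHT` is a hypothetical reference weight. Integer division also truncates small charges, whereas a real implementation would likely use fixed-point scaling.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BASE_WEIGHT 500u  /* hypothetical reference weight */

struct sched_group {
	unsigned int weight;      /* must be non-zero */
	unsigned long vdisktime;  /* virtual time; smallest is served next */
	unsigned long backlog;    /* queued requests */
	unsigned long served;     /* total requests dispatched */
};

/* Index of the backlogged group with the smallest vdisktime, or -1. */
static int pick_next(const struct sched_group *g, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!g[i].backlog)
			continue;
		if (best < 0 || g[i].vdisktime < g[best].vdisktime)
			best = (int)i;
	}
	return best;
}

/* One round: the chosen group dispatches up to `quantum` requests, is
 * scheduled out, and is charged by request count scaled by its weight. */
static int dispatch_round(struct sched_group *g, size_t n,
			  unsigned long quantum)
{
	int i = pick_next(g, n);
	unsigned long nr;

	if (i < 0)
		return -1;
	assert(g[i].weight > 0);
	nr = g[i].backlog < quantum ? g[i].backlog : quantum;
	g[i].backlog -= nr;
	g[i].served += nr;
	g[i].vdisktime += nr * SKETCH_BASE_WEIGHT / g[i].weight;
	return i;
}
```

Take two always-backlogged groups with weights 1000 and 500 and a quantum of 8. After 300 rounds they have dispatched 1600 and 800 requests respectively, which is the 2:1 split the weights ask for.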
As I have said, in this case the minimal time (1ms) multiplied by the
number of groups is too much for an SSD.
If we can use an iops based scheduler, we can use an iops_weight for each
cgroup and switch cgroups according to this number. That way all the
applications get a moderate response time which can be estimated.
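One reading of Tao's point about estimable response times goes like this. If each cgroup gets a per-round request budget proportional to its iops_weight, a group's worst-case wait is bounded by the other groups' budgets, not by one time slice per group. The sketch below assumes a fixed number of requests per round and serial service at an assumed average cost per request. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Requests group i may dispatch per round, proportional to its
 * iops_weight; round_reqs is an assumed per-round total. */
static unsigned long group_budget(const unsigned int *iops_weight, size_t n,
				  size_t i, unsigned long round_reqs)
{
	unsigned long sum = 0;

	for (size_t k = 0; k < n; k++)
		sum += iops_weight[k];
	assert(i < n && sum > 0);
	return round_reqs * iops_weight[i] / sum;
}

/* Worst-case wait, in us, for group i if every other group spends its
 * budget first; svc_us is an assumed average per-request service time
 * and service is treated as serial (no NCQ overlap). */
static unsigned long budget_wait_us(const unsigned int *iops_weight,
				    size_t n, size_t i,
				    unsigned long round_reqs,
				    unsigned long svc_us)
{
	unsigned long others = 0;

	for (size_t k = 0; k < n; k++)
		if (k != i)
			others += group_budget(iops_weight, n, k, round_reqs);
	return others * svc_us;
}
```

Take weights 100, 100, 200 and 400 with 64 requests per round. The budgets come out to 8, 8, 16 and 32 requests. At an assumed 50 us per request, the first group waits at most (8 + 16 + 32) * 50 us = 2800 us.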
BTW, I talked with Shaohua at LSF and we agreed that I will continue
his work and try to add cgroup support to it.
Thanks
Tao
>
> This all works only if we have the right workload: one which is not
> doing dependent reads and can keep the disk busy continuously. If there
> is think time involved and we do not idle, the process will lose its
> share and the whole scheme of trying to differentiate between processes
> will become ineffective.
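Vivek's concern about think time can be shown with a toy model, which is hypothetical, as is the function below. The device completes one request per tick and never idles, and both groups have equal weight. Group A is always backlogged. Group B issues one dependent read at a time and then thinks for a few ticks before issuing the next.

```c
#include <assert.h>

/* Count requests served for an always-backlogged group A and a group B
 * that issues one dependent read at a time, then thinks for think_ticks.
 * The device serves one request per tick and never idles: whenever B has
 * nothing queued, A's request is dispatched instead. */
static void simulate_no_idle(unsigned long ticks, unsigned long think_ticks,
			     unsigned long *served_a, unsigned long *served_b)
{
	unsigned long a = 0, b = 0, b_ready_at = 0;

	assert(served_a && served_b);
	for (unsigned long t = 0; t < ticks; t++) {
		/* equal weights: serve B whenever it is queued and not ahead */
		if (t >= b_ready_at && b <= a) {
			b++;
			/* B's next read depends on this one completing */
			b_ready_at = t + 1 + think_ticks;
		} else {
			a++;
		}
	}
	*served_a = a;
	*served_b = b;
}
```

Over 1000 ticks with a think time of 3 ticks, B gets 250 requests and A gets 750. B therefore ends up with a quarter of the device instead of half. With no think time, the split is 500/500.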
>
> So if you have come up with a better algorithm which can keep track of
> iops without idling and still provide service differentiation for
> common workloads, that would be interesting.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/