From: Tao Ma <tm-d1IQDZat3X0@public.gmane.org>
To: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Shaohua Li <shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Subject: Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
Date: Thu, 05 Apr 2012 00:45:05 +0800 [thread overview]
Message-ID: <4F7C7A91.8040707@tao.ma> (raw)
In-Reply-To: <20120404133705.GB12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On 04/04/2012 09:37 PM, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:
>
> [..]
>>>> How iops_weight and switching different than CFQ group scheduling logic?
>>>> I think shaohua was talking of using similar logic. What would you do
>>>> fundamentally different so that without idling you will get service
>>>> differentiation?
>>> I am thinking of differentiate different groups with iops, so if there
>>> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
>>> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
>>> finished within 100us. So the maximum latency for one io is about 600us,
>>> still less than 1ms. But with cfq, if all the cgroups are busy, we have
>>> to switch between these group in ms which means the maximum latency will
>>> be 6ms. It is terrible for some applications since they use ssds now.
>> Yes, with iops based scheduling, we do queue switching for every request.
>> Doing the same thing between groups is quite straightforward. The only issue
>> I found is this will introduce more process context switch, this isn't
>> a big issue
>> for io bound application, but depends. It cuts latency a lot, which I
>> guess is more
>> important for web 2.0 application.
>
> In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> and you should get the same behavior (with slice_idle=0 and group_idle=0).
> So why write a new scheduler.
really? How could we config cfq to work like this? Or you mean we can
change the code for it?
>
> Only thing is that with above, current code will provide iops fairness only
> for groups. We should be able to tweak queue scheduling to support iops
> fairness also.
OK, as I have said in another e-mail another my concern is the
complexity. It will make cfq too much complicated. I just checked the
source code of shaohua's original patch, fiops scheduler is only ~700
lines, so with cgroup support added it would be ~1000 lines I guess.
Currently cfq-iosched.c is around ~4000 lines even after Tejun's cleanup
of io context...
Thanks
Tao
>
> Anyway, we will end up doing that at some point of time. Supporting two
> scheduling algorihtms for queue and groups is not sustainable. There are
> already calls to make CFQ hierarchical and in that case both queue and
> groups need to be on a single service tree and that means need to follow
> same algorithm for scheduling.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
WARNING: multiple messages have this Message-ID (diff)
From: Tao Ma <tm@tao.ma>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Shaohua Li <shli@kernel.org>, Tejun Heo <tj@kernel.org>,
axboe@kernel.dk, ctalbott@google.com, rni@google.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
containers@lists.linux-foundation.org
Subject: Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
Date: Thu, 05 Apr 2012 00:45:05 +0800 [thread overview]
Message-ID: <4F7C7A91.8040707@tao.ma> (raw)
In-Reply-To: <20120404133705.GB12676@redhat.com>
On 04/04/2012 09:37 PM, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:
>
> [..]
>>>> How iops_weight and switching different than CFQ group scheduling logic?
>>>> I think shaohua was talking of using similar logic. What would you do
>>>> fundamentally different so that without idling you will get service
>>>> differentiation?
>>> I am thinking of differentiate different groups with iops, so if there
>>> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
>>> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
>>> finished within 100us. So the maximum latency for one io is about 600us,
>>> still less than 1ms. But with cfq, if all the cgroups are busy, we have
>>> to switch between these group in ms which means the maximum latency will
>>> be 6ms. It is terrible for some applications since they use ssds now.
>> Yes, with iops based scheduling, we do queue switching for every request.
>> Doing the same thing between groups is quite straightforward. The only issue
>> I found is this will introduce more process context switch, this isn't
>> a big issue
>> for io bound application, but depends. It cuts latency a lot, which I
>> guess is more
>> important for web 2.0 application.
>
> In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> and you should get the same behavior (with slice_idle=0 and group_idle=0).
> So why write a new scheduler.
really? How could we config cfq to work like this? Or you mean we can
change the code for it?
>
> Only thing is that with above, current code will provide iops fairness only
> for groups. We should be able to tweak queue scheduling to support iops
> fairness also.
OK, as I have said in another e-mail another my concern is the
complexity. It will make cfq too much complicated. I just checked the
source code of shaohua's original patch, fiops scheduler is only ~700
lines, so with cgroup support added it would be ~1000 lines I guess.
Currently cfq-iosched.c is around ~4000 lines even after Tejun's cleanup
of io context...
Thanks
Tao
>
> Anyway, we will end up doing that at some point of time. Supporting two
> scheduling algorihtms for queue and groups is not sustainable. There are
> already calls to make CFQ hierarchical and in that case both queue and
> groups need to be on a single service tree and that means need to follow
> same algorithm for scheduling.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2012-04-04 16:45 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
2012-03-28 22:51 ` Tejun Heo
[not found] ` <1332975091-10950-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-03-28 22:51 ` [PATCH 01/21] blkcg: remove unused @pol and @plid parameters Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 02/21] blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 03/21] blkcg: introduce blkg_stat and blkg_rwstat Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 04/21] blkcg: restructure statistics printing Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 05/21] blkcg: drop blkiocg_file_write_u64() Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 06/21] blkcg: restructure configuration printing Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 07/21] blkcg: restructure blkio_group configruation setting Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 08/21] blkcg: blkg_conf_prep() Tejun Heo
2012-03-28 22:51 ` Tejun Heo
[not found] ` <1332975091-10950-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-03-28 22:53 ` Tejun Heo
2012-03-28 22:53 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 09/21] blkcg: export conf/stat helpers to prepare for reorganization Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 10/21] blkcg: implement blkio_policy_type->cftypes Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 11/21] blkcg: move conf/stat file handling code to policies Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 12/21] cfq: collapse cfq.h into cfq-iosched.c Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 13/21] blkcg: move statistics update code to policies Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 14/21] blkcg: cfq doesn't need per-cpu dispatch stats Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 15/21] blkcg: add blkio_policy_ops operations for exit and stat reset Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 16/21] blkcg: move blkio_group_stats to cfq-iosched.c Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 17/21] blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
2012-03-28 22:51 ` Tejun Heo
[not found] ` <1332975091-10950-19-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-04-01 21:09 ` Vivek Goyal
2012-04-01 21:09 ` Vivek Goyal
[not found] ` <20120401210955.GE6116-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-01 21:22 ` Tejun Heo
2012-04-01 21:22 ` Tejun Heo
2012-04-01 21:22 ` Tejun Heo
2012-04-02 21:39 ` Tao Ma
2012-04-02 21:39 ` Tao Ma
2012-04-02 21:39 ` Tao Ma
[not found] ` <4F7A1C8B.3010402-d1IQDZat3X0@public.gmane.org>
2012-04-02 21:49 ` Tejun Heo
2012-04-02 21:49 ` Tejun Heo
2012-04-02 21:49 ` Tejun Heo
[not found] ` <20120402214938.GA19634-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-02 22:03 ` Tao Ma
2012-04-02 22:03 ` Tao Ma
[not found] ` <4F7A2217.2030201-d1IQDZat3X0@public.gmane.org>
2012-04-02 22:17 ` Tejun Heo
2012-04-02 22:17 ` Tejun Heo
[not found] ` <20120402221702.GA21017-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-02 22:20 ` Tao Ma
2012-04-02 22:20 ` Tao Ma
[not found] ` <4F7A261A.9000200-d1IQDZat3X0@public.gmane.org>
2012-04-02 22:25 ` Vivek Goyal
2012-04-02 22:25 ` Vivek Goyal
[not found] ` <20120402222504.GA2672-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-02 22:28 ` Tejun Heo
2012-04-02 22:28 ` Tejun Heo
2012-04-02 22:28 ` Tejun Heo
2012-04-02 22:41 ` Tao Ma
2012-04-02 22:41 ` Tao Ma
[not found] ` <4F7A2B21.5000907-d1IQDZat3X0@public.gmane.org>
2012-04-03 15:37 ` IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq) Vivek Goyal
2012-04-03 15:37 ` Vivek Goyal
2012-04-03 15:37 ` Vivek Goyal
[not found] ` <20120403153736.GI5913-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-03 16:36 ` Tao Ma
2012-04-03 16:36 ` Tao Ma
2012-04-03 16:36 ` Tao Ma
[not found] ` <4F7B2708.6080504-d1IQDZat3X0@public.gmane.org>
2012-04-03 16:50 ` Vivek Goyal
2012-04-03 16:50 ` Vivek Goyal
2012-04-03 16:50 ` Vivek Goyal
[not found] ` <20120403164959.GJ5913-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-03 17:26 ` Tao Ma
2012-04-03 17:26 ` Tao Ma
2012-04-03 17:26 ` Tao Ma
[not found] ` <4F7B32AE.7050900-d1IQDZat3X0@public.gmane.org>
2012-04-04 12:35 ` Shaohua Li
2012-04-04 12:35 ` Shaohua Li
[not found] ` <CANejiEU1qAsvogozY3MjZnpcrbYZO4CkRE8s73WGPc_R5LKV9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-04 13:37 ` Vivek Goyal
2012-04-04 13:37 ` Vivek Goyal
[not found] ` <20120404133705.GB12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 14:52 ` Shaohua Li
2012-04-04 14:52 ` Shaohua Li
[not found] ` <CANejiEVD6nFVqX8Jf_hmRHg8YyBvPxbTVjgEif3mO1aa925KkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-04 15:10 ` Vivek Goyal
2012-04-04 15:10 ` Vivek Goyal
[not found] ` <20120404151014.GD12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 16:06 ` Tao Ma
2012-04-04 16:06 ` Tao Ma
2012-04-04 16:06 ` Tao Ma
2012-04-04 15:10 ` Vivek Goyal
2012-04-04 16:45 ` Tao Ma [this message]
2012-04-04 16:45 ` Tao Ma
[not found] ` <4F7C7A91.8040707-d1IQDZat3X0@public.gmane.org>
2012-04-04 16:50 ` Vivek Goyal
2012-04-04 16:50 ` Vivek Goyal
[not found] ` <20120404165048.GF12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 17:17 ` Vivek Goyal
2012-04-04 17:17 ` Vivek Goyal
2012-04-04 17:17 ` Vivek Goyal
2012-04-04 17:18 ` Tao Ma
2012-04-04 17:18 ` Tao Ma
2012-04-04 17:18 ` Tao Ma
[not found] ` <4F7C824D.2050308-d1IQDZat3X0@public.gmane.org>
2012-04-04 17:27 ` Vivek Goyal
2012-04-04 17:27 ` Vivek Goyal
2012-04-04 18:22 ` Vivek Goyal
2012-04-04 18:22 ` Vivek Goyal
[not found] ` <20120404182238.GI12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 18:36 ` Tao Ma
2012-04-04 18:36 ` Tao Ma
2012-04-04 18:36 ` Tao Ma
2012-04-04 18:22 ` Vivek Goyal
2012-04-04 12:35 ` Shaohua Li
2012-04-04 13:31 ` Vivek Goyal
2012-04-04 13:31 ` Vivek Goyal
2012-04-02 22:41 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tao Ma
2012-04-02 22:20 ` Tao Ma
2012-04-02 22:03 ` Tao Ma
2012-03-28 22:51 ` [PATCH 19/21] blkcg: move blkio_group_conf->iops and ->bps to blk-throttle Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 20/21] blkcg: pass around pd->pdata instead of pd itself in prfill functions Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-28 22:51 ` [PATCH 21/21] blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros Tejun Heo
2012-03-28 22:51 ` Tejun Heo
2012-03-29 8:18 ` [PATCHSET] block: modularize blkcg config and stat file handling Jens Axboe
2012-03-29 8:18 ` Jens Axboe
[not found] ` <4F741AED.6090901-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2012-04-02 20:02 ` Tejun Heo
2012-04-02 20:02 ` Tejun Heo
2012-04-02 20:02 ` Tejun Heo
[not found] ` <20120402200233.GB17175-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-02 21:51 ` Jens Axboe
2012-04-02 21:51 ` Jens Axboe
[not found] ` <4F7A1F52.50706-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2012-04-02 22:33 ` Tejun Heo
2012-04-02 22:33 ` Tejun Heo
2012-04-02 22:33 ` Tejun Heo
2012-04-01 19:38 ` Vivek Goyal
2012-04-01 19:38 ` Vivek Goyal
2012-04-01 19:38 ` Vivek Goyal
2012-04-01 21:42 ` Tejun Heo
2012-04-01 21:42 ` Tejun Heo
2012-04-01 21:42 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F7C7A91.8040707@tao.ma \
--to=tm-d1iqdzat3x0@public.gmane.org \
--cc=axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.