From: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Tao Ma <tm-d1IQDZat3X0@public.gmane.org>
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Shaohua Li <shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Subject: Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
Date: Wed, 4 Apr 2012 09:31:19 -0400
Message-ID: <20120404133119.GA12676@redhat.com>
In-Reply-To: <4F7B32AE.7050900-d1IQDZat3X0@public.gmane.org>
On Wed, Apr 04, 2012 at 01:26:06AM +0800, Tao Ma wrote:
> On 04/04/2012 12:50 AM, Vivek Goyal wrote:
> > On Wed, Apr 04, 2012 at 12:36:24AM +0800, Tao Ma wrote:
> >
> > [..]
> >>> - Can't we just set the slice_idle=0 and "quantum" to some high value
> >>> say "64" or "128" and achieve similar results to iops based scheduler?
> >> yes, I should say cfq with slice_idle = 0 works well in most cases. But
> >> if it comes to blkcg with ssd, it is really a disaster. You know, cfq
> >> has to choose between different cgroups, so even if you choose 1ms as
> >> the service time for each cgroup(actually in my test, only >2ms can work
> >> reliably). the latency for some requests(which have been sent by the
> >> user while not submitting to the driver) is really too much for the
> >> application. I don't think there is a way to resolve it in cfq.
> >
> > Ok, so now you are saying that CFQ as such is not a problem but blkcg
> > logic in CFQ is an issue.
> >
> > What's the issue there? I think the issue there also is group idling.
> > If you set group_idle=0, that idling will be cut down and switching
> > between groups will be fast. That's a different thing that in the
> > process you will most likely lose service differentiation also for
> > most of the workloads.
> No, group_idle=0 doesn't help. We don't have problem with idling, the
> disk is busy for all the tasks, we just want it to be proportional and
> time endurable.
I am not sure what "time endurable" means here. If group idling is not
the problem, then what is? I still don't understand what problem you
are seeing.
[..]
> > How iops_weight and switching different than CFQ group scheduling logic?
> > I think shaohua was talking of using similar logic. What would you do
> > fundamentally different so that without idling you will get service
> > differentiation?
> I am thinking of differentiate different groups with iops, so if there
> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
> finished within 100us. So the maximum latency for one io is about 600us,
> still less than 1ms. But with cfq, if all the cgroups are busy, we have
> to switch between these group in ms which means the maximum latency will
> be 6ms. It is terrible for some applications since they use ssds now.
You can always do faster switching in CFQ. With idling disabled, you can
always expire a queue after dispatching a few requests. You don't have to
wait for 1ms. I am not sure why you are assuming that the minimum time
a queue/group has to dispatch for is 1ms.
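To make the numbers in the quoted example concrete, here is a small
back-of-the-envelope sketch. It assumes, as the example seems to, that a
group of weight 100 gets one unit per round (one IO in the iops case, one
1ms slice in the time case) and that every group is always backlogged.
The names and constants are only for illustration, nothing here is measured.

```python
# Worst-case wait for one request when every group is backlogged.
# All figures come from the example quoted above (assumed, not measured).

IO_SERVICE_US = 100                      # per-IO completion time on the SSD
WEIGHTS = {"g1": 100, "g2": 200, "g3": 300}
BASE_WEIGHT = 100                        # weight that earns one unit per round


def round_length_us(unit_us):
    """Length of one full round if each group gets weight/BASE_WEIGHT units."""
    units = sum(w // BASE_WEIGHT for w in WEIGHTS.values())
    return units * unit_us


iops_round_us = round_length_us(IO_SERVICE_US)   # 1 + 2 + 3 IOs of 100us each
time_round_us = round_length_us(1000)            # 1 + 2 + 3 slices of 1ms each

print(iops_round_us, time_round_us)              # 600 6000
```

The 6ms figure only follows if the smallest slice really is 1ms, which is
the assumption being questioned here.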
We already have the notion of not dispatching too many IOs from async
queues (cfq_prio_to_maxrq()). Something similar can be written quickly
for iops_mode(). Just define a quantum of requests to be dispatched (say
10), expire the queue after that, and charge the queue/group for those
10 requests. Based on its weight, it will automatically go to the right
position in the tree and you should get iops based scheduling.
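A rough user-space sketch of that idea follows. This is not CFQ code: a
heap stands in for the service tree, the groups are assumed to be always
backlogged, and QUANTUM, CHARGE_SCALE and simulate() are made-up names.
The only point is that charging by request count, scaled inversely to
weight, gives weight-proportional dispatch without any time slice.

```python
import heapq

QUANTUM = 10          # requests dispatched before the queue/group is expired
CHARGE_SCALE = 600    # arbitrary scale so the per-weight charge stays integral


def simulate(weights, picks):
    """Return requests dispatched per group after `picks` scheduling decisions."""
    # Heap entries are (virtual time, name); the smallest virtual time runs next.
    tree = [(0, name) for name in sorted(weights)]
    heapq.heapify(tree)
    dispatched = dict.fromkeys(weights, 0)
    for _ in range(picks):
        vtime, name = heapq.heappop(tree)
        dispatched[name] += QUANTUM
        # Charge by request count, scaled inversely to the group's weight.
        vtime += QUANTUM * CHARGE_SCALE // weights[name]
        heapq.heappush(tree, (vtime, name))
    return dispatched


result = simulate({"g1": 100, "g2": 200, "g3": 300}, picks=60)
print(result)         # {'g1': 100, 'g2': 200, 'g3': 300}
```

With weights 100/200/300 the groups end up with requests in a 1:2:3
ratio, and no group dispatches more than QUANTUM requests in a row
before the next scheduling decision.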
> >
> > If you explain your logic in detail, it will help.
> >
> > BTW, in last mail you mentioned that in iops_mode() we make use of time.
> > That's not the case. in iops_mode() we charge group based on number of
> > requests dispatched. (slice_dispatch records number of requests dispatched
> > from the queue in that slice). So to me counting number of requests
> > instead of time will effectively switch CFQ to iops based scheduler, isn't
> > it?
> yes, iops_mode in cfq is calculated iops, but it is switched according
> to the time slice, right? So it can't resolve the problem I mentioned above.
What do you mean by "switched according to the time slice"?

We currently have separate scheduling trees for queues and groups, and
iops mode currently works only for groups. We might still allocate a time
slice to a queue, but with idling disabled we will expire it much earlier,
because most workloads don't keep the queue busy long enough. If your
workload does keep the queue busy long enough (say for a few ms), then we
can introduce logic in queue expiry to expire the queue after dispatching
a few requests in iops mode, so that queues don't get extended time slices.
Thanks
Vivek