From: Munehiro Ikeda <m-ikeda@ds.jp.nec.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org, dm-devel@redhat.com,
jens.axboe@oracle.com, nauman@google.com, dpshah@google.com,
lizf@cn.fujitsu.com, mikew@google.com, fchecconi@gmail.com,
paolo.valente@unimore.it, ryov@valinux.co.jp,
fernando@oss.ntt.co.jp, s-uchida@ap.jp.nec.com,
taka@valinux.co.jp, jmoyer@redhat.com, dhaval@linux.vnet.ibm.com,
balbir@linux.vnet.ibm.com, righi.andrea@gmail.com,
jbaron@redhat.com, agk@redhat.com, snitzer@redhat.com,
akpm@linux-foundation.org, peterz@infradead.org
Subject: Re: [PATCH] io-controller: implement per group request allocation limitation
Date: Mon, 13 Jul 2009 17:08:08 -0400
Message-ID: <4A5BA238.3030902@ds.jp.nec.com>
In-Reply-To: <20090713160352.GA3714@redhat.com>
Vivek Goyal wrote, on 07/13/2009 12:03 PM:
> On Fri, Jul 10, 2009 at 09:56:21AM +0800, Gui Jianfeng wrote:
>> Hi Vivek,
>>
>> This patch exports a cgroup-based per-group request limits interface
>> and removes the global one. Now we can use this interface to apply
>> different request allocation limits to different groups.
>>
>
> Thanks Gui. Few points come to mind.
>
> - You seem to be making this a per-cgroup limit on all devices. I guess
> that different devices in the system can have different settings of
> q->nr_requests and hence will probably want different per-group limits.
> So we might have to make it a per-cgroup per-device limit.
From an implementation standpoint, a per-cgroup per-device limit looks
difficult to me because an io_group is allocated only when the associated
device is first used. I guess that is why Gui chose the per-cgroup limit
on all devices, right?
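To make the difficulty concrete, here is a rough userspace sketch (entirely hypothetical, not from the patch) of what a per-cgroup per-device limit lookup could look like: the cgroup keeps optional per-device overrides and falls back to its cgroup-wide value for any device it has no entry for yet, which corresponds to the lazily-allocated io_group case:

```c
#include <assert.h>

/* Hypothetical data layout (names are mine, not from the patch):
 * a cgroup-wide default plus a small table of per-device overrides. */

#define MAX_RULES 8

struct dev_rule {
	unsigned int major, minor;
	unsigned long nr_requests;
};

struct iocg_limits {
	unsigned long nr_group_requests;	/* cgroup-wide default */
	struct dev_rule rules[MAX_RULES];	/* per-device overrides */
	int nr_rules;
};

static unsigned long limit_for_dev(const struct iocg_limits *l,
				   unsigned int major, unsigned int minor)
{
	int i;

	/* Use the per-device override if one has been configured... */
	for (i = 0; i < l->nr_rules; i++)
		if (l->rules[i].major == major && l->rules[i].minor == minor)
			return l->rules[i].nr_requests;

	/* ...otherwise fall back to the cgroup-wide limit. This is the
	 * path a device whose io_group was never allocated would take. */
	return l->nr_group_requests;
}
```

The sketch sidesteps the allocation-ordering problem only because the rule table is configured independently of device use; doing the same in the kernel would mean keeping the per-device settings somewhere other than the io_group itself.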
> - There does not seem to be any checks for making sure that children
> cgroups don't have more request descriptors allocated than parent group.
>
> - I am re-thinking that what's the advantage of configuring request
> descriptors also through cgroups. It does bring in additional complexity
> with it and it should justify the advantages. Can you think of some?
>
> Until and unless we can come up with some significant advantages, I will
> prefer to continue to use per group limit through q->nr_group_requests
> interface instead of cgroup. Once things stabilize, we can revisit it and
> see how this interface can be improved.
I agree. I will try to clarify through some tests whether a per-group
per-device limit is needed (or whether its advantage outweighs the
complexity).
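As a side note while testing, the threshold arithmetic that elv_io_group_congestion_threshold() in the patch applies to a group's limit can be reproduced in plain userspace C to sanity-check the congestion on/off values (the function names below are mine; only the formulas come from the patch):

```c
#include <assert.h>

/* Congestion-on threshold: limit - limit/8 + 1, clamped to the limit. */
static int congestion_on(unsigned long nr_group_requests)
{
	int nr = nr_group_requests - (nr_group_requests / 8) + 1;

	if (nr > (int)nr_group_requests)
		nr = nr_group_requests;
	return nr;
}

/* Congestion-off threshold: limit - limit/8 - limit/16 - 1, at least 1. */
static int congestion_off(unsigned long nr_group_requests)
{
	int nr = nr_group_requests - (nr_group_requests / 8)
		- (nr_group_requests / 16) - 1;

	if (nr < 1)
		nr = 1;
	return nr;
}
```

For the default of 128 requests this gives on/off thresholds of 113 and 103, i.e. a group is marked congested near its limit and uncongested again only after some headroom opens up.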
Thanks a lot,
Muuhh
> Thanks
> Vivek
>
>> Signed-off-by: Gui Jianfeng<guijianfeng@cn.fujitsu.com>
>> ---
>> block/blk-core.c | 23 ++++++++++--
>> block/blk-settings.c | 1 -
>> block/blk-sysfs.c | 43 -----------------------
>> block/elevator-fq.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++---
>> block/elevator-fq.h | 4 ++
>> 5 files changed, 111 insertions(+), 54 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 79fe6a9..7010b76 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -722,13 +722,20 @@ static void ioc_set_batching(struct request_queue *q, struct io_context *ioc)
>> static void __freed_request(struct request_queue *q, int sync,
>> struct request_list *rl)
>> {
>> + struct io_group *iog;
>> + unsigned long nr_group_requests;
>> +
>> if (q->rq_data.count[sync] < queue_congestion_off_threshold(q))
>> blk_clear_queue_congested(q, sync);
>>
>> if (q->rq_data.count[sync] + 1 <= q->nr_requests)
>> blk_clear_queue_full(q, sync);
>>
>> - if (rl->count[sync] + 1 <= q->nr_group_requests) {
>> + iog = rl_iog(rl);
>> +
>> + nr_group_requests = get_group_requests(q, iog);
>> +
>> + if (nr_group_requests && rl->count[sync] + 1 <= nr_group_requests) {
>> if (waitqueue_active(&rl->wait[sync]))
>> wake_up(&rl->wait[sync]);
>> }
>> @@ -828,6 +835,8 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
>> const bool is_sync = rw_is_sync(rw_flags) != 0;
>> int may_queue, priv;
>> int sleep_on_global = 0;
>> + struct io_group *iog;
>> + unsigned long nr_group_requests;
>>
>> may_queue = elv_may_queue(q, rw_flags);
>> if (may_queue == ELV_MQUEUE_NO)
>> @@ -843,7 +852,12 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
>> if (q->rq_data.count[is_sync] + 1 >= q->nr_requests)
>> blk_set_queue_full(q, is_sync);
>>
>> - if (rl->count[is_sync] + 1 >= q->nr_group_requests) {
>> + iog = rl_iog(rl);
>> +
>> + nr_group_requests = get_group_requests(q, iog);
>> +
>> + if (nr_group_requests &&
>> + rl->count[is_sync] + 1 >= nr_group_requests) {
>> ioc = current_io_context(GFP_ATOMIC, q->node);
>> /*
>> * The queue request descriptor group will fill after this
>> @@ -852,7 +866,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
>> * This process will be allowed to complete a batch of
>> * requests, others will be blocked.
>> */
>> - if (rl->count[is_sync] <= q->nr_group_requests)
>> + if (rl->count[is_sync] <= nr_group_requests)
>> ioc_set_batching(q, ioc);
>> else {
>> if (may_queue != ELV_MQUEUE_MUST
>> @@ -898,7 +912,8 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
>> * from per group request list
>> */
>>
>> - if (rl->count[is_sync] >= (3 * q->nr_group_requests / 2))
>> + if (nr_group_requests &&
>> + rl->count[is_sync] >= (3 * nr_group_requests / 2))
>> goto out;
>>
>> rl->starved[is_sync] = 0;
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 78b8aec..bd582a7 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -148,7 +148,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>> * set defaults
>> */
>> q->nr_requests = BLKDEV_MAX_RQ;
>> - q->nr_group_requests = BLKDEV_MAX_GROUP_RQ;
>>
>> q->make_request_fn = mfn;
>> blk_queue_dma_alignment(q, 511);
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 92b9f25..706d852 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -78,40 +78,8 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
>> return ret;
>> }
>> #ifdef CONFIG_GROUP_IOSCHED
>> -static ssize_t queue_group_requests_show(struct request_queue *q, char *page)
>> -{
>> - return queue_var_show(q->nr_group_requests, (page));
>> -}
>> -
>> extern void elv_io_group_congestion_threshold(struct request_queue *q,
>> struct io_group *iog);
>> -
>> -static ssize_t
>> -queue_group_requests_store(struct request_queue *q, const char *page,
>> - size_t count)
>> -{
>> - struct hlist_node *n;
>> - struct io_group *iog;
>> - struct elv_fq_data *efqd;
>> - unsigned long nr;
>> - int ret = queue_var_store(&nr, page, count);
>> -
>> - if (nr < BLKDEV_MIN_RQ)
>> - nr = BLKDEV_MIN_RQ;
>> -
>> - spin_lock_irq(q->queue_lock);
>> -
>> - q->nr_group_requests = nr;
>> -
>> - efqd = &q->elevator->efqd;
>> -
>> - hlist_for_each_entry(iog, n, &efqd->group_list, elv_data_node) {
>> - elv_io_group_congestion_threshold(q, iog);
>> - }
>> -
>> - spin_unlock_irq(q->queue_lock);
>> - return ret;
>> -}
>> #endif
>>
>> static ssize_t queue_ra_show(struct request_queue *q, char *page)
>> @@ -278,14 +246,6 @@ static struct queue_sysfs_entry queue_requests_entry = {
>> .store = queue_requests_store,
>> };
>>
>> -#ifdef CONFIG_GROUP_IOSCHED
>> -static struct queue_sysfs_entry queue_group_requests_entry = {
>> - .attr = {.name = "nr_group_requests", .mode = S_IRUGO | S_IWUSR },
>> - .show = queue_group_requests_show,
>> - .store = queue_group_requests_store,
>> -};
>> -#endif
>> -
>> static struct queue_sysfs_entry queue_ra_entry = {
>> .attr = {.name = "read_ahead_kb", .mode = S_IRUGO | S_IWUSR },
>> .show = queue_ra_show,
>> @@ -360,9 +320,6 @@ static struct queue_sysfs_entry queue_iostats_entry = {
>>
>> static struct attribute *default_attrs[] = {
>> &queue_requests_entry.attr,
>> -#ifdef CONFIG_GROUP_IOSCHED
>> - &queue_group_requests_entry.attr,
>> -#endif
>> &queue_ra_entry.attr,
>> &queue_max_hw_sectors_entry.attr,
>> &queue_max_sectors_entry.attr,
>> diff --git a/block/elevator-fq.c b/block/elevator-fq.c
>> index 29392e7..bfb0210 100644
>> --- a/block/elevator-fq.c
>> +++ b/block/elevator-fq.c
>> @@ -59,6 +59,35 @@ elv_release_ioq(struct elevator_queue *eq, struct io_queue **ioq_ptr);
>> #define for_each_entity_safe(entity, parent) \
>> for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
>>
>> +unsigned short get_group_requests(struct request_queue *q,
>> + struct io_group *iog)
>> +{
>> + struct cgroup_subsys_state *css;
>> + struct io_cgroup *iocg;
>> + unsigned long nr_group_requests;
>> +
>> + if (!iog)
>> + return q->nr_requests;
>> +
>> + rcu_read_lock();
>> +
>> + if (!iog->iocg_id) {
>> + nr_group_requests = 0;
>> + goto out;
>> + }
>> +
>> + css = css_lookup(&io_subsys, iog->iocg_id);
>> + if (!css) {
>> + nr_group_requests = 0;
>> + goto out;
>> + }
>> +
>> + iocg = container_of(css, struct io_cgroup, css);
>> + nr_group_requests = iocg->nr_group_requests;
>> +out:
>> + rcu_read_unlock();
>> + return nr_group_requests;
>> +}
>>
>> static struct io_entity *bfq_lookup_next_entity(struct io_sched_data *sd,
>> int extract);
>> @@ -1257,14 +1286,17 @@ void elv_io_group_congestion_threshold(struct request_queue *q,
>> struct io_group *iog)
>> {
>> int nr;
>> + unsigned long nr_group_requests;
>>
>> - nr = q->nr_group_requests - (q->nr_group_requests / 8) + 1;
>> - if (nr > q->nr_group_requests)
>> - nr = q->nr_group_requests;
>> + nr_group_requests = get_group_requests(q, iog);
>> +
>> + nr = nr_group_requests - (nr_group_requests / 8) + 1;
>> + if (nr > nr_group_requests)
>> + nr = nr_group_requests;
>> iog->nr_congestion_on = nr;
>>
>> - nr = q->nr_group_requests - (q->nr_group_requests / 8)
>> - - (q->nr_group_requests / 16) - 1;
>> + nr = nr_group_requests - (nr_group_requests / 8)
>> + - (nr_group_requests / 16) - 1;
>> if (nr < 1)
>> nr = 1;
>> iog->nr_congestion_off = nr;
>> @@ -1283,6 +1315,7 @@ int elv_io_group_congested(struct request_queue *q, struct page *page, int sync)
>> {
>> struct io_group *iog;
>> int ret = 0;
>> + unsigned long nr_group_requests;
>>
>> rcu_read_lock();
>>
>> @@ -1300,10 +1333,11 @@ int elv_io_group_congested(struct request_queue *q, struct page *page, int sync)
>> }
>>
>> ret = elv_is_iog_congested(q, iog, sync);
>> + nr_group_requests = get_group_requests(q, iog);
>> if (ret)
>> elv_log_iog(&q->elevator->efqd, iog, "iog congested=%d sync=%d"
>> " rl.count[sync]=%d nr_group_requests=%d",
>> - ret, sync, iog->rl.count[sync], q->nr_group_requests);
>> + ret, sync, iog->rl.count[sync], nr_group_requests);
>> rcu_read_unlock();
>> return ret;
>> }
>> @@ -1549,6 +1583,48 @@ free_buf:
>> return ret;
>> }
>>
>> +static u64 io_cgroup_nr_requests_read(struct cgroup *cgroup,
>> + struct cftype *cftype)
>> +{
>> + struct io_cgroup *iocg;
>> + u64 ret;
>> +
>> + if (!cgroup_lock_live_group(cgroup))
>> + return -ENODEV;
>> +
>> + iocg = cgroup_to_io_cgroup(cgroup);
>> + spin_lock_irq(&iocg->lock);
>> + ret = iocg->nr_group_requests;
>> + spin_unlock_irq(&iocg->lock);
>> +
>> + cgroup_unlock();
>> +
>> + return ret;
>> +}
>> +
>> +static int io_cgroup_nr_requests_write(struct cgroup *cgroup,
>> + struct cftype *cftype,
>> + u64 val)
>> +{
>> + struct io_cgroup *iocg;
>> +
>> + if (val < BLKDEV_MIN_RQ)
>> + val = BLKDEV_MIN_RQ;
>> +
>> + if (!cgroup_lock_live_group(cgroup))
>> + return -ENODEV;
>> +
>> + iocg = cgroup_to_io_cgroup(cgroup);
>> +
>> + spin_lock_irq(&iocg->lock);
>> + iocg->nr_group_requests = (unsigned long)val;
>> + spin_unlock_irq(&iocg->lock);
>> +
>> + cgroup_unlock();
>> +
>> + return 0;
>> +}
>> +
>> #define SHOW_FUNCTION(__VAR) \
>> static u64 io_cgroup_##__VAR##_read(struct cgroup *cgroup, \
>> struct cftype *cftype) \
>> @@ -1735,6 +1811,11 @@ static int io_cgroup_disk_dequeue_read(struct cgroup *cgroup,
>>
>> struct cftype bfqio_files[] = {
>> {
>> + .name = "nr_group_requests",
>> + .read_u64 = io_cgroup_nr_requests_read,
>> + .write_u64 = io_cgroup_nr_requests_write,
>> + },
>> + {
>> .name = "policy",
>> .read_seq_string = io_cgroup_policy_read,
>> .write_string = io_cgroup_policy_write,
>> @@ -1790,6 +1871,7 @@ static struct cgroup_subsys_state *iocg_create(struct cgroup_subsys *subsys,
>>
>> spin_lock_init(&iocg->lock);
>> INIT_HLIST_HEAD(&iocg->group_data);
>> + iocg->nr_group_requests = BLKDEV_MAX_GROUP_RQ;
>> iocg->weight = IO_DEFAULT_GRP_WEIGHT;
>> iocg->ioprio_class = IO_DEFAULT_GRP_CLASS;
>> INIT_LIST_HEAD(&iocg->policy_list);
>> diff --git a/block/elevator-fq.h b/block/elevator-fq.h
>> index f089a55..df077d0 100644
>> --- a/block/elevator-fq.h
>> +++ b/block/elevator-fq.h
>> @@ -308,6 +308,7 @@ struct io_cgroup {
>> unsigned int weight;
>> unsigned short ioprio_class;
>>
>> + unsigned long nr_group_requests;
>> /* list of io_policy_node */
>> struct list_head policy_list;
>>
>> @@ -386,6 +387,9 @@ struct elv_fq_data {
>> unsigned int fairness;
>> };
>>
>> +extern unsigned short get_group_requests(struct request_queue *q,
>> + struct io_group *iog);
>> +
>> /* Logging facilities. */
>> #ifdef CONFIG_DEBUG_GROUP_IOSCHED
>> #define elv_log_ioq(efqd, ioq, fmt, args...) \
>> --
>> 1.5.4.rc3
>
--
IKEDA, Munehiro
NEC Corporation of America
m-ikeda@ds.jp.nec.com