From: Munehiro Ikeda <m-ikeda@ds.jp.nec.com>
To: Nauman Rafique <nauman@google.com>
Cc: linux-kernel@vger.kernel.org, Vivek Goyal <vgoyal@redhat.com>,
Ryo Tsuruta <ryov@valinux.co.jp>,
taka@valinux.co.jp, kamezawa.hiroyu@jp.fujitsu.com,
Andrea Righi <righi.andrea@gmail.com>,
Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
akpm@linux-foundation.org, balbir@linux.vnet.ibm.com
Subject: Re: [RFC][PATCH 10/11] blkiocg async: Async queue per cfq_group
Date: Fri, 13 Aug 2010 20:49:45 -0400 [thread overview]
Message-ID: <4C65E829.2030401@ds.jp.nec.com> (raw)
In-Reply-To: <AANLkTi=4V6XDbqjMBwNaLE9B+h1V-2hGPZFTXNsjbSpA@mail.gmail.com>
Nauman Rafique wrote, on 08/13/2010 07:01 PM:
> On Fri, Aug 13, 2010 at 2:00 PM, Munehiro Ikeda<m-ikeda@ds.jp.nec.com> wrote:
>> Nauman Rafique wrote, on 08/12/2010 09:24 PM:
>>>
>>> Muuhh,
>>> I do not understand the motivation behind making cfq_io_context per
>>> cgroup. We can extract cgroup id from bio, use that to lookup cgroup
>>> and its async queues. What am I missing?
>>
>> What cgroup ID can suggest is only cgroup to which the thread belongs,
>> but not the thread itself. This means that IO priority and IO prio-class
>> can't be determined by cgroup ID.
>
> One way to do it would be to get ioprio and class from the context
> that is used to submit the async request. IO priority and class is
> tied to a thread anyways. And the io context of that thread can be
> used to retrieve those values. Even if a thread is submitting IOs to
> different cgroups, I don't see how you can apply different IO priority
> and class to its async IOs for different cgroups. Please let me know
> if it does not make sense.
My talking about IO prio might be misleading and pointless, sorry.
I was confused IO context of flush thread and thread which dirtied
a page.
Then, reason why making cfq_io_context per cgroup.
That is simply because the current code retrieves cfqq from
cfq_io_context by cic_to_cfqq().
As you said, we can lookup cgroup by cgroup ID and its async queue.
This is done by cfq_get_queue() and cfq_async_queue_prio(). So if
we change all call of cic_to_cfqq() into cfq_get_queue() (or slightly
changed version of cfq_get_queue() ), we may avoid making cfq_io_context
per cgroup.
Which approach is better depends on the complexity of the patch. I chose
the former approach, but if the second approach is more simple,
it is better. I need to think it over and am feeling that the second
approach would be nicer. Thanks for the suggestion.
>> The pointers of async queues are held in cfqg->async_cfqq[class][prio].
>> It is impossible to find out which queue is appropriate by only cgroup
>> ID if considering IO priority.
>>
>> Frankly speaking, I'm not 100% confident if IO priority should be
>> applied on async write IOs, but anyway, I made up my mind to keep the
>> current scheme.
>>
>> Do I make sense? If you have any other idea, I am glad to hear.
>>
>>
>> Thanks,
>> Muuhh
>>
>>
>>>
>>> On Thu, Jul 8, 2010 at 8:22 PM, Munehiro Ikeda<m-ikeda@ds.jp.nec.com>
>>> wrote:
>>>>
>>>> This is the main body for async capability of blkio controller.
>>>> The basic ideas are
>>>> - To move async queues from cfq_data to cfq_group, and
>>>> - To associate cfq_io_context with cfq_group
>>>>
>>>> Each cfq_io_context, which was created per an io_context
>>>> per a device, is now created per an io_context per a cfq_group.
>>>> Each cfq_group is created per a cgroup per a device, so
>>>> cfq_io_context is now resulted in to be created per
>>>> an io_context per a cgroup per a device.
>>>>
>>>> To protect link between cfq_io_context and cfq_group,
>>>> locking code is modified in several parts.
>>>>
>>>> ToDo:
>>>> - Lock protection still might be imperfect and more investigation
>>>> is needed.
>>>>
>>>> - cic_index of root cfq_group (cfqd->root_group.cic_index) is not
>>>> removed and lasts beyond elevator switching.
>>>> This issues should be solved.
>>>>
>>>> Signed-off-by: Munehiro "Muuhh" Ikeda<m-ikeda@ds.jp.nec.com>
>>>> ---
>>>> block/cfq-iosched.c | 277
>>>> ++++++++++++++++++++++++++++-----------------
>>>> include/linux/iocontext.h | 2 +-
>>>> 2 files changed, 175 insertions(+), 104 deletions(-)
>>>>
>>>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>>>> index 68f76e9..4186c30 100644
>>>> --- a/block/cfq-iosched.c
>>>> +++ b/block/cfq-iosched.c
>>>> @@ -191,10 +191,23 @@ struct cfq_group {
>>>> struct cfq_rb_root service_trees[2][3];
>>>> struct cfq_rb_root service_tree_idle;
>>>>
>>>> + /*
>>>> + * async queue for each priority case
>>>> + */
>>>> + struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
>>>> + struct cfq_queue *async_idle_cfqq;
>>>> +
>>>> unsigned long saved_workload_slice;
>>>> enum wl_type_t saved_workload;
>>>> enum wl_prio_t saved_serving_prio;
>>>> struct blkio_group blkg;
>>>> + struct cfq_data *cfqd;
>>>> +
>>>> + /* lock for cic_list */
>>>> + spinlock_t lock;
>>>> + unsigned int cic_index;
>>>> + struct list_head cic_list;
>>>> +
>>>> #ifdef CONFIG_CFQ_GROUP_IOSCHED
>>>> struct hlist_node cfqd_node;
>>>> atomic_t ref;
>>>> @@ -254,12 +267,6 @@ struct cfq_data {
>>>> struct cfq_queue *active_queue;
>>>> struct cfq_io_context *active_cic;
>>>>
>>>> - /*
>>>> - * async queue for each priority case
>>>> - */
>>>> - struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
>>>> - struct cfq_queue *async_idle_cfqq;
>>>> -
>>>> sector_t last_position;
>>>>
>>>> /*
>>>> @@ -275,8 +282,6 @@ struct cfq_data {
>>>> unsigned int cfq_latency;
>>>> unsigned int cfq_group_isolation;
>>>>
>>>> - unsigned int cic_index;
>>>> - struct list_head cic_list;
>>>>
>>>> /*
>>>> * Fallback dummy cfqq for extreme OOM conditions
>>>> @@ -418,10 +423,16 @@ static inline int cfqg_busy_async_queues(struct
>>>> cfq_data *cfqd,
>>>> }
>>>>
>>>> static void cfq_dispatch_insert(struct request_queue *, struct request
>>>> *);
>>>> +static void __cfq_exit_single_io_context(struct cfq_data *,
>>>> + struct cfq_group *,
>>>> + struct cfq_io_context *);
>>>> static struct cfq_queue *cfq_get_queue(struct cfq_data *, bool,
>>>> - struct io_context *, gfp_t);
>>>> + struct cfq_io_context *, gfp_t);
>>>> static struct cfq_io_context *cfq_cic_lookup(struct cfq_data *,
>>>> - struct io_context *);
>>>> + struct cfq_group *,
>>>> + struct io_context *);
>>>> +static void cfq_put_async_queues(struct cfq_group *);
>>>> +static int cfq_alloc_cic_index(void);
>>>>
>>>> static inline struct cfq_queue *cic_to_cfqq(struct cfq_io_context *cic,
>>>> bool is_sync)
>>>> @@ -438,17 +449,28 @@ static inline void cic_set_cfqq(struct
>>>> cfq_io_context *cic,
>>>> #define CIC_DEAD_KEY 1ul
>>>> #define CIC_DEAD_INDEX_SHIFT 1
>>>>
>>>> -static inline void *cfqd_dead_key(struct cfq_data *cfqd)
>>>> +static inline void *cfqg_dead_key(struct cfq_group *cfqg)
>>>> {
>>>> - return (void *)(cfqd->cic_index<< CIC_DEAD_INDEX_SHIFT |
>>>> CIC_DEAD_KEY);
>>>> + return (void *)(cfqg->cic_index<< CIC_DEAD_INDEX_SHIFT |
>>>> CIC_DEAD_KEY);
>>>> +}
>>>> +
>>>> +static inline struct cfq_group *cic_to_cfqg(struct cfq_io_context *cic)
>>>> +{
>>>> + struct cfq_group *cfqg = cic->key;
>>>> +
>>>> + if (unlikely((unsigned long) cfqg& CIC_DEAD_KEY))
>>>> + cfqg = NULL;
>>>> +
>>>> + return cfqg;
>>>> }
>>>>
>>>> static inline struct cfq_data *cic_to_cfqd(struct cfq_io_context *cic)
>>>> {
>>>> - struct cfq_data *cfqd = cic->key;
>>>> + struct cfq_data *cfqd = NULL;
>>>> + struct cfq_group *cfqg = cic_to_cfqg(cic);
>>>>
>>>> - if (unlikely((unsigned long) cfqd& CIC_DEAD_KEY))
>>>> - return NULL;
>>>> + if (likely(cfqg))
>>>> + cfqd = cfqg->cfqd;
>>>>
>>>> return cfqd;
>>>> }
>>>> @@ -959,12 +981,12 @@ cfq_update_blkio_group_weight(struct blkio_group
>>>> *blkg, unsigned int weight)
>>>> }
>>>>
>>>> static struct cfq_group *
>>>> -cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int
>>>> create)
>>>> +cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg,
>>>> + int create)
>>>> {
>>>> - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
>>>> struct cfq_group *cfqg = NULL;
>>>> void *key = cfqd;
>>>> - int i, j;
>>>> + int i, j, idx;
>>>> struct cfq_rb_root *st;
>>>> struct backing_dev_info *bdi =&cfqd->queue->backing_dev_info;
>>>> unsigned int major, minor;
>>>> @@ -978,14 +1000,21 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct
>>>> cgroup *cgroup, int create)
>>>> if (cfqg || !create)
>>>> goto done;
>>>>
>>>> + idx = cfq_alloc_cic_index();
>>>> + if (idx< 0)
>>>> + goto done;
>>>> +
>>>> cfqg = kzalloc_node(sizeof(*cfqg), GFP_ATOMIC, cfqd->queue->node);
>>>> if (!cfqg)
>>>> - goto done;
>>>> + goto err;
>>>>
>>>> for_each_cfqg_st(cfqg, i, j, st)
>>>> *st = CFQ_RB_ROOT;
>>>> RB_CLEAR_NODE(&cfqg->rb_node);
>>>>
>>>> + spin_lock_init(&cfqg->lock);
>>>> + INIT_LIST_HEAD(&cfqg->cic_list);
>>>> +
>>>> /*
>>>> * Take the initial reference that will be released on destroy
>>>> * This can be thought of a joint reference by cgroup and
>>>> @@ -1002,7 +1031,14 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct
>>>> cgroup *cgroup, int create)
>>>>
>>>> /* Add group on cfqd list */
>>>> hlist_add_head(&cfqg->cfqd_node,&cfqd->cfqg_list);
>>>> + cfqg->cfqd = cfqd;
>>>> + cfqg->cic_index = idx;
>>>> + goto done;
>>>>
>>>> +err:
>>>> + spin_lock(&cic_index_lock);
>>>> + ida_remove(&cic_index_ida, idx);
>>>> + spin_unlock(&cic_index_lock);
>>>> done:
>>>> return cfqg;
>>>> }
>>>> @@ -1095,10 +1131,6 @@ static inline struct cfq_group
>>>> *cfq_ref_get_cfqg(struct cfq_group *cfqg)
>>>>
>>>> static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group
>>>> *cfqg)
>>>> {
>>>> - /* Currently, all async queues are mapped to root group */
>>>> - if (!cfq_cfqq_sync(cfqq))
>>>> - cfqg =&cfqq->cfqd->root_group;
>>>> -
>>>> cfqq->cfqg = cfqg;
>>>> /* cfqq reference on cfqg */
>>>> atomic_inc(&cfqq->cfqg->ref);
>>>> @@ -1114,6 +1146,16 @@ static void cfq_put_cfqg(struct cfq_group *cfqg)
>>>> return;
>>>> for_each_cfqg_st(cfqg, i, j, st)
>>>> BUG_ON(!RB_EMPTY_ROOT(&st->rb) || st->active != NULL);
>>>> +
>>>> + /**
>>>> + * ToDo:
>>>> + * root_group never reaches here and cic_index is never
>>>> + * removed. Unused cic_index lasts by elevator switching.
>>>> + */
>>>> + spin_lock(&cic_index_lock);
>>>> + ida_remove(&cic_index_ida, cfqg->cic_index);
>>>> + spin_unlock(&cic_index_lock);
>>>> +
>>>> kfree(cfqg);
>>>> }
>>>>
>>>> @@ -1122,6 +1164,15 @@ static void cfq_destroy_cfqg(struct cfq_data
>>>> *cfqd, struct cfq_group *cfqg)
>>>> /* Something wrong if we are trying to remove same group twice */
>>>> BUG_ON(hlist_unhashed(&cfqg->cfqd_node));
>>>>
>>>> + while (!list_empty(&cfqg->cic_list)) {
>>>> + struct cfq_io_context *cic =
>>>> list_entry(cfqg->cic_list.next,
>>>> + struct
>>>> cfq_io_context,
>>>> + group_list);
>>>> +
>>>> + __cfq_exit_single_io_context(cfqd, cfqg, cic);
>>>> + }
>>>> +
>>>> + cfq_put_async_queues(cfqg);
>>>> hlist_del_init(&cfqg->cfqd_node);
>>>>
>>>> /*
>>>> @@ -1497,10 +1548,12 @@ static struct request *
>>>> cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
>>>> {
>>>> struct task_struct *tsk = current;
>>>> + struct cfq_group *cfqg;
>>>> struct cfq_io_context *cic;
>>>> struct cfq_queue *cfqq;
>>>>
>>>> - cic = cfq_cic_lookup(cfqd, tsk->io_context);
>>>> + cfqg = cfq_get_cfqg(cfqd, bio, 0);
>>>> + cic = cfq_cic_lookup(cfqd, cfqg, tsk->io_context);
>>>> if (!cic)
>>>> return NULL;
>>>>
>>>> @@ -1611,6 +1664,7 @@ static int cfq_allow_merge(struct request_queue *q,
>>>> struct request *rq,
>>>> struct bio *bio)
>>>> {
>>>> struct cfq_data *cfqd = q->elevator->elevator_data;
>>>> + struct cfq_group *cfqg;
>>>> struct cfq_io_context *cic;
>>>> struct cfq_queue *cfqq;
>>>>
>>>> @@ -1624,7 +1678,8 @@ static int cfq_allow_merge(struct request_queue *q,
>>>> struct request *rq,
>>>> * Lookup the cfqq that this bio will be queued with. Allow
>>>> * merge only if rq is queued there.
>>>> */
>>>> - cic = cfq_cic_lookup(cfqd, current->io_context);
>>>> + cfqg = cfq_get_cfqg(cfqd, bio, 0);
>>>> + cic = cfq_cic_lookup(cfqd, cfqg, current->io_context);
>>>> if (!cic)
>>>> return false;
>>>>
>>>> @@ -2667,17 +2722,18 @@ static void cfq_exit_cfqq(struct cfq_data *cfqd,
>>>> struct cfq_queue *cfqq)
>>>> }
>>>>
>>>> static void __cfq_exit_single_io_context(struct cfq_data *cfqd,
>>>> + struct cfq_group *cfqg,
>>>> struct cfq_io_context *cic)
>>>> {
>>>> struct io_context *ioc = cic->ioc;
>>>>
>>>> - list_del_init(&cic->queue_list);
>>>> + list_del_init(&cic->group_list);
>>>>
>>>> /*
>>>> * Make sure dead mark is seen for dead queues
>>>> */
>>>> smp_wmb();
>>>> - cic->key = cfqd_dead_key(cfqd);
>>>> + cic->key = cfqg_dead_key(cfqg);
>>>>
>>>> if (ioc->ioc_data == cic)
>>>> rcu_assign_pointer(ioc->ioc_data, NULL);
>>>> @@ -2696,23 +2752,23 @@ static void __cfq_exit_single_io_context(struct
>>>> cfq_data *cfqd,
>>>> static void cfq_exit_single_io_context(struct io_context *ioc,
>>>> struct cfq_io_context *cic)
>>>> {
>>>> - struct cfq_data *cfqd = cic_to_cfqd(cic);
>>>> + struct cfq_group *cfqg = cic_to_cfqg(cic);
>>>>
>>>> - if (cfqd) {
>>>> - struct request_queue *q = cfqd->queue;
>>>> + if (cfqg) {
>>>> + struct cfq_data *cfqd = cfqg->cfqd;
>>>> unsigned long flags;
>>>>
>>>> - spin_lock_irqsave(q->queue_lock, flags);
>>>> + spin_lock_irqsave(&cfqg->lock, flags);
>>>>
>>>> /*
>>>> * Ensure we get a fresh copy of the ->key to prevent
>>>> * race between exiting task and queue
>>>> */
>>>> smp_read_barrier_depends();
>>>> - if (cic->key == cfqd)
>>>> - __cfq_exit_single_io_context(cfqd, cic);
>>>> + if (cic->key == cfqg)
>>>> + __cfq_exit_single_io_context(cfqd, cfqg, cic);
>>>>
>>>> - spin_unlock_irqrestore(q->queue_lock, flags);
>>>> + spin_unlock_irqrestore(&cfqg->lock, flags);
>>>> }
>>>> }
>>>>
>>>> @@ -2734,7 +2790,7 @@ cfq_alloc_io_context(struct cfq_data *cfqd, gfp_t
>>>> gfp_mask)
>>>>
>>>> cfqd->queue->node);
>>>> if (cic) {
>>>> cic->last_end_request = jiffies;
>>>> - INIT_LIST_HEAD(&cic->queue_list);
>>>> + INIT_LIST_HEAD(&cic->group_list);
>>>> INIT_HLIST_NODE(&cic->cic_list);
>>>> cic->dtor = cfq_free_io_context;
>>>> cic->exit = cfq_exit_io_context;
>>>> @@ -2801,8 +2857,7 @@ static void changed_ioprio(struct io_context *ioc,
>>>> struct cfq_io_context *cic)
>>>> cfqq = cic->cfqq[BLK_RW_ASYNC];
>>>> if (cfqq) {
>>>> struct cfq_queue *new_cfqq;
>>>> - new_cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic->ioc,
>>>> - GFP_ATOMIC);
>>>> + new_cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic,
>>>> GFP_ATOMIC);
>>>> if (new_cfqq) {
>>>> cic->cfqq[BLK_RW_ASYNC] = new_cfqq;
>>>> cfq_put_queue(cfqq);
>>>> @@ -2879,16 +2934,14 @@ static void cfq_ioc_set_cgroup(struct io_context
>>>> *ioc)
>>>>
>>>> static struct cfq_queue *
>>>> cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
>>>> - struct io_context *ioc, gfp_t gfp_mask)
>>>> + struct cfq_io_context *cic, gfp_t gfp_mask)
>>>> {
>>>> struct cfq_queue *cfqq, *new_cfqq = NULL;
>>>> - struct cfq_io_context *cic;
>>>> + struct io_context *ioc = cic->ioc;
>>>> struct cfq_group *cfqg;
>>>>
>>>> retry:
>>>> - cfqg = cfq_get_cfqg(cfqd, NULL, 1);
>>>> - cic = cfq_cic_lookup(cfqd, ioc);
>>>> - /* cic always exists here */
>>>> + cfqg = cic_to_cfqg(cic);
>>>> cfqq = cic_to_cfqq(cic, is_sync);
>>>>
>>>> /*
>>>> @@ -2930,36 +2983,38 @@ retry:
>>>> }
>>>>
>>>> static struct cfq_queue **
>>>> -cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int
>>>> ioprio)
>>>> +cfq_async_queue_prio(struct cfq_group *cfqg, int ioprio_class, int
>>>> ioprio)
>>>> {
>>>> switch (ioprio_class) {
>>>> case IOPRIO_CLASS_RT:
>>>> - return&cfqd->async_cfqq[0][ioprio];
>>>> + return&cfqg->async_cfqq[0][ioprio];
>>>> case IOPRIO_CLASS_BE:
>>>> - return&cfqd->async_cfqq[1][ioprio];
>>>> + return&cfqg->async_cfqq[1][ioprio];
>>>> case IOPRIO_CLASS_IDLE:
>>>> - return&cfqd->async_idle_cfqq;
>>>> + return&cfqg->async_idle_cfqq;
>>>> default:
>>>> BUG();
>>>> }
>>>> }
>>>>
>>>> static struct cfq_queue *
>>>> -cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct io_context
>>>> *ioc,
>>>> +cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_context
>>>> *cic,
>>>> gfp_t gfp_mask)
>>>> {
>>>> + struct io_context *ioc = cic->ioc;
>>>> const int ioprio = task_ioprio(ioc);
>>>> const int ioprio_class = task_ioprio_class(ioc);
>>>> + struct cfq_group *cfqg = cic_to_cfqg(cic);
>>>> struct cfq_queue **async_cfqq = NULL;
>>>> struct cfq_queue *cfqq = NULL;
>>>>
>>>> if (!is_sync) {
>>>> - async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class,
>>>> ioprio);
>>>> + async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class,
>>>> ioprio);
>>>> cfqq = *async_cfqq;
>>>> }
>>>>
>>>> if (!cfqq)
>>>> - cfqq = cfq_find_alloc_queue(cfqd, is_sync, ioc,
>>>> gfp_mask);
>>>> + cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic,
>>>> gfp_mask);
>>>>
>>>> /*
>>>> * pin the queue now that it's allocated, scheduler exit will
>>>> prune it
>>>> @@ -2977,19 +3032,19 @@ cfq_get_queue(struct cfq_data *cfqd, bool
>>>> is_sync, struct io_context *ioc,
>>>> * We drop cfq io contexts lazily, so we may find a dead one.
>>>> */
>>>> static void
>>>> -cfq_drop_dead_cic(struct cfq_data *cfqd, struct io_context *ioc,
>>>> - struct cfq_io_context *cic)
>>>> +cfq_drop_dead_cic(struct cfq_data *cfqd, struct cfq_group *cfqg,
>>>> + struct io_context *ioc, struct cfq_io_context *cic)
>>>> {
>>>> unsigned long flags;
>>>>
>>>> - WARN_ON(!list_empty(&cic->queue_list));
>>>> - BUG_ON(cic->key != cfqd_dead_key(cfqd));
>>>> + WARN_ON(!list_empty(&cic->group_list));
>>>> + BUG_ON(cic->key != cfqg_dead_key(cfqg));
>>>>
>>>> spin_lock_irqsave(&ioc->lock, flags);
>>>>
>>>> BUG_ON(ioc->ioc_data == cic);
>>>>
>>>> - radix_tree_delete(&ioc->radix_root, cfqd->cic_index);
>>>> + radix_tree_delete(&ioc->radix_root, cfqg->cic_index);
>>>> hlist_del_rcu(&cic->cic_list);
>>>> spin_unlock_irqrestore(&ioc->lock, flags);
>>>>
>>>> @@ -2997,11 +3052,14 @@ cfq_drop_dead_cic(struct cfq_data *cfqd, struct
>>>> io_context *ioc,
>>>> }
>>>>
>>>> static struct cfq_io_context *
>>>> -cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
>>>> +cfq_cic_lookup(struct cfq_data *cfqd, struct cfq_group *cfqg,
>>>> + struct io_context *ioc)
>>>> {
>>>> struct cfq_io_context *cic;
>>>> unsigned long flags;
>>>>
>>>> + if (!cfqg)
>>>> + return NULL;
>>>> if (unlikely(!ioc))
>>>> return NULL;
>>>>
>>>> @@ -3011,18 +3069,18 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct
>>>> io_context *ioc)
>>>> * we maintain a last-hit cache, to avoid browsing over the tree
>>>> */
>>>> cic = rcu_dereference(ioc->ioc_data);
>>>> - if (cic&& cic->key == cfqd) {
>>>> + if (cic&& cic->key == cfqg) {
>>>> rcu_read_unlock();
>>>> return cic;
>>>> }
>>>>
>>>> do {
>>>> - cic = radix_tree_lookup(&ioc->radix_root,
>>>> cfqd->cic_index);
>>>> + cic = radix_tree_lookup(&ioc->radix_root,
>>>> cfqg->cic_index);
>>>> rcu_read_unlock();
>>>> if (!cic)
>>>> break;
>>>> - if (unlikely(cic->key != cfqd)) {
>>>> - cfq_drop_dead_cic(cfqd, ioc, cic);
>>>> + if (unlikely(cic->key != cfqg)) {
>>>> + cfq_drop_dead_cic(cfqd, cfqg, ioc, cic);
>>>> rcu_read_lock();
>>>> continue;
>>>> }
>>>> @@ -3037,24 +3095,25 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct
>>>> io_context *ioc)
>>>> }
>>>>
>>>> /*
>>>> - * Add cic into ioc, using cfqd as the search key. This enables us to
>>>> lookup
>>>> + * Add cic into ioc, using cfqg as the search key. This enables us to
>>>> lookup
>>>> * the process specific cfq io context when entered from the block layer.
>>>> - * Also adds the cic to a per-cfqd list, used when this queue is
>>>> removed.
>>>> + * Also adds the cic to a per-cfqg list, used when the group is removed.
>>>> + * request_queue lock must be held.
>>>> */
>>>> -static int cfq_cic_link(struct cfq_data *cfqd, struct io_context *ioc,
>>>> - struct cfq_io_context *cic, gfp_t gfp_mask)
>>>> +static int cfq_cic_link(struct cfq_data *cfqd, struct cfq_group *cfqg,
>>>> + struct io_context *ioc, struct cfq_io_context
>>>> *cic)
>>>> {
>>>> unsigned long flags;
>>>> int ret;
>>>>
>>>> - ret = radix_tree_preload(gfp_mask);
>>>> + ret = radix_tree_preload(GFP_ATOMIC);
>>>> if (!ret) {
>>>> cic->ioc = ioc;
>>>> - cic->key = cfqd;
>>>> + cic->key = cfqg;
>>>>
>>>> spin_lock_irqsave(&ioc->lock, flags);
>>>> ret = radix_tree_insert(&ioc->radix_root,
>>>> - cfqd->cic_index, cic);
>>>> + cfqg->cic_index, cic);
>>>> if (!ret)
>>>> hlist_add_head_rcu(&cic->cic_list,&ioc->cic_list);
>>>> spin_unlock_irqrestore(&ioc->lock, flags);
>>>> @@ -3062,9 +3121,9 @@ static int cfq_cic_link(struct cfq_data *cfqd,
>>>> struct io_context *ioc,
>>>> radix_tree_preload_end();
>>>>
>>>> if (!ret) {
>>>> - spin_lock_irqsave(cfqd->queue->queue_lock,
>>>> flags);
>>>> - list_add(&cic->queue_list,&cfqd->cic_list);
>>>> - spin_unlock_irqrestore(cfqd->queue->queue_lock,
>>>> flags);
>>>> + spin_lock_irqsave(&cfqg->lock, flags);
>>>> + list_add(&cic->group_list,&cfqg->cic_list);
>>>> + spin_unlock_irqrestore(&cfqg->lock, flags);
>>>> }
>>>> }
>>>>
>>>> @@ -3080,10 +3139,12 @@ static int cfq_cic_link(struct cfq_data *cfqd,
>>>> struct io_context *ioc,
>>>> * than one device managed by cfq.
>>>> */
>>>> static struct cfq_io_context *
>>>> -cfq_get_io_context(struct cfq_data *cfqd, gfp_t gfp_mask)
>>>> +cfq_get_io_context(struct cfq_data *cfqd, struct bio *bio, gfp_t
>>>> gfp_mask)
>>>> {
>>>> struct io_context *ioc = NULL;
>>>> struct cfq_io_context *cic;
>>>> + struct cfq_group *cfqg, *cfqg2;
>>>> + unsigned long flags;
>>>>
>>>> might_sleep_if(gfp_mask& __GFP_WAIT);
>>>>
>>>> @@ -3091,18 +3152,38 @@ cfq_get_io_context(struct cfq_data *cfqd, gfp_t
>>>> gfp_mask)
>>>> if (!ioc)
>>>> return NULL;
>>>>
>>>> - cic = cfq_cic_lookup(cfqd, ioc);
>>>> + spin_lock_irqsave(cfqd->queue->queue_lock, flags);
>>>> +retry_cfqg:
>>>> + cfqg = cfq_get_cfqg(cfqd, bio, 1);
>>>> +retry_cic:
>>>> + cic = cfq_cic_lookup(cfqd, cfqg, ioc);
>>>> if (cic)
>>>> goto out;
>>>> + spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
>>>>
>>>> cic = cfq_alloc_io_context(cfqd, gfp_mask);
>>>> if (cic == NULL)
>>>> goto err;
>>>>
>>>> - if (cfq_cic_link(cfqd, ioc, cic, gfp_mask))
>>>> + spin_lock_irqsave(cfqd->queue->queue_lock, flags);
>>>> +
>>>> + /* check the consistency breakage during unlock period */
>>>> + cfqg2 = cfq_get_cfqg(cfqd, bio, 0);
>>>> + if (cfqg != cfqg2) {
>>>> + cfq_cic_free(cic);
>>>> + if (!cfqg2)
>>>> + goto retry_cfqg;
>>>> + else {
>>>> + cfqg = cfqg2;
>>>> + goto retry_cic;
>>>> + }
>>>> + }
>>>> +
>>>> + if (cfq_cic_link(cfqd, cfqg, ioc, cic))
>>>> goto err_free;
>>>>
>>>> out:
>>>> + spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
>>>> smp_read_barrier_depends();
>>>> if (unlikely(ioc->ioprio_changed))
>>>> cfq_ioc_set_ioprio(ioc);
>>>> @@ -3113,6 +3194,7 @@ out:
>>>> #endif
>>>> return cic;
>>>> err_free:
>>>> + spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
>>>> cfq_cic_free(cic);
>>>> err:
>>>> put_io_context(ioc);
>>>> @@ -3537,6 +3619,7 @@ static inline int __cfq_may_queue(struct cfq_queue
>>>> *cfqq)
>>>> static int cfq_may_queue(struct request_queue *q, struct bio *bio, int
>>>> rw)
>>>> {
>>>> struct cfq_data *cfqd = q->elevator->elevator_data;
>>>> + struct cfq_group *cfqg;
>>>> struct task_struct *tsk = current;
>>>> struct cfq_io_context *cic;
>>>> struct cfq_queue *cfqq;
>>>> @@ -3547,7 +3630,8 @@ static int cfq_may_queue(struct request_queue *q,
>>>> struct bio *bio, int rw)
>>>> * so just lookup a possibly existing queue, or return 'may queue'
>>>> * if that fails
>>>> */
>>>> - cic = cfq_cic_lookup(cfqd, tsk->io_context);
>>>> + cfqg = cfq_get_cfqg(cfqd, bio, 0);
>>>> + cic = cfq_cic_lookup(cfqd, cfqg, tsk->io_context);
>>>> if (!cic)
>>>> return ELV_MQUEUE_MAY;
>>>>
>>>> @@ -3636,7 +3720,7 @@ cfq_set_request(struct request_queue *q, struct
>>>> request *rq, struct bio *bio,
>>>>
>>>> might_sleep_if(gfp_mask& __GFP_WAIT);
>>>>
>>>> - cic = cfq_get_io_context(cfqd, gfp_mask);
>>>> + cic = cfq_get_io_context(cfqd, bio, gfp_mask);
>>>>
>>>> spin_lock_irqsave(q->queue_lock, flags);
>>>>
>>>> @@ -3646,7 +3730,7 @@ cfq_set_request(struct request_queue *q, struct
>>>> request *rq, struct bio *bio,
>>>> new_queue:
>>>> cfqq = cic_to_cfqq(cic, is_sync);
>>>> if (!cfqq || cfqq ==&cfqd->oom_cfqq) {
>>>> - cfqq = cfq_get_queue(cfqd, is_sync, cic->ioc, gfp_mask);
>>>> + cfqq = cfq_get_queue(cfqd, is_sync, cic, gfp_mask);
>>>> cic_set_cfqq(cic, cfqq, is_sync);
>>>> } else {
>>>> /*
>>>> @@ -3762,19 +3846,19 @@ static void cfq_shutdown_timer_wq(struct cfq_data
>>>> *cfqd)
>>>> cancel_work_sync(&cfqd->unplug_work);
>>>> }
>>>>
>>>> -static void cfq_put_async_queues(struct cfq_data *cfqd)
>>>> +static void cfq_put_async_queues(struct cfq_group *cfqg)
>>>> {
>>>> int i;
>>>>
>>>> for (i = 0; i< IOPRIO_BE_NR; i++) {
>>>> - if (cfqd->async_cfqq[0][i])
>>>> - cfq_put_queue(cfqd->async_cfqq[0][i]);
>>>> - if (cfqd->async_cfqq[1][i])
>>>> - cfq_put_queue(cfqd->async_cfqq[1][i]);
>>>> + if (cfqg->async_cfqq[0][i])
>>>> + cfq_put_queue(cfqg->async_cfqq[0][i]);
>>>> + if (cfqg->async_cfqq[1][i])
>>>> + cfq_put_queue(cfqg->async_cfqq[1][i]);
>>>> }
>>>>
>>>> - if (cfqd->async_idle_cfqq)
>>>> - cfq_put_queue(cfqd->async_idle_cfqq);
>>>> + if (cfqg->async_idle_cfqq)
>>>> + cfq_put_queue(cfqg->async_idle_cfqq);
>>>> }
>>>>
>>>> static void cfq_cfqd_free(struct rcu_head *head)
>>>> @@ -3794,15 +3878,6 @@ static void cfq_exit_queue(struct elevator_queue
>>>> *e)
>>>> if (cfqd->active_queue)
>>>> __cfq_slice_expired(cfqd, cfqd->active_queue, 0);
>>>>
>>>> - while (!list_empty(&cfqd->cic_list)) {
>>>> - struct cfq_io_context *cic =
>>>> list_entry(cfqd->cic_list.next,
>>>> - struct
>>>> cfq_io_context,
>>>> - queue_list);
>>>> -
>>>> - __cfq_exit_single_io_context(cfqd, cic);
>>>> - }
>>>> -
>>>> - cfq_put_async_queues(cfqd);
>>>> cfq_release_cfq_groups(cfqd);
>>>> cfq_blkiocg_del_blkio_group(&cfqd->root_group.blkg);
>>>>
>>>> @@ -3810,10 +3885,6 @@ static void cfq_exit_queue(struct elevator_queue
>>>> *e)
>>>>
>>>> cfq_shutdown_timer_wq(cfqd);
>>>>
>>>> - spin_lock(&cic_index_lock);
>>>> - ida_remove(&cic_index_ida, cfqd->cic_index);
>>>> - spin_unlock(&cic_index_lock);
>>>> -
>>>> /* Wait for cfqg->blkg->key accessors to exit their grace periods.
>>>> */
>>>> call_rcu(&cfqd->rcu, cfq_cfqd_free);
>>>> }
>>>> @@ -3823,7 +3894,7 @@ static int cfq_alloc_cic_index(void)
>>>> int index, error;
>>>>
>>>> do {
>>>> - if (!ida_pre_get(&cic_index_ida, GFP_KERNEL))
>>>> + if (!ida_pre_get(&cic_index_ida, GFP_ATOMIC))
>>>> return -ENOMEM;
>>>>
>>>> spin_lock(&cic_index_lock);
>>>> @@ -3839,20 +3910,18 @@ static int cfq_alloc_cic_index(void)
>>>> static void *cfq_init_queue(struct request_queue *q)
>>>> {
>>>> struct cfq_data *cfqd;
>>>> - int i, j;
>>>> + int i, j, idx;
>>>> struct cfq_group *cfqg;
>>>> struct cfq_rb_root *st;
>>>>
>>>> - i = cfq_alloc_cic_index();
>>>> - if (i< 0)
>>>> + idx = cfq_alloc_cic_index();
>>>> + if (idx< 0)
>>>> return NULL;
>>>>
>>>> cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO,
>>>> q->node);
>>>> if (!cfqd)
>>>> return NULL;
>>>>
>>>> - cfqd->cic_index = i;
>>>> -
>>>> /* Init root service tree */
>>>> cfqd->grp_service_tree = CFQ_RB_ROOT;
>>>>
>>>> @@ -3861,6 +3930,9 @@ static void *cfq_init_queue(struct request_queue
>>>> *q)
>>>> for_each_cfqg_st(cfqg, i, j, st)
>>>> *st = CFQ_RB_ROOT;
>>>> RB_CLEAR_NODE(&cfqg->rb_node);
>>>> + cfqg->cfqd = cfqd;
>>>> + cfqg->cic_index = idx;
>>>> + INIT_LIST_HEAD(&cfqg->cic_list);
>>>>
>>>> /* Give preference to root group over other groups */
>>>> cfqg->weight = 2*BLKIO_WEIGHT_DEFAULT;
>>>> @@ -3874,6 +3946,7 @@ static void *cfq_init_queue(struct request_queue
>>>> *q)
>>>> rcu_read_lock();
>>>> cfq_blkiocg_add_blkio_group(&blkio_root_cgroup,&cfqg->blkg,
>>>> (void *)cfqd, 0);
>>>> + hlist_add_head(&cfqg->cfqd_node,&cfqd->cfqg_list);
>>>> rcu_read_unlock();
>>>> #endif
>>>> /*
>>>> @@ -3893,8 +3966,6 @@ static void *cfq_init_queue(struct request_queue
>>>> *q)
>>>> atomic_inc(&cfqd->oom_cfqq.ref);
>>>> cfq_link_cfqq_cfqg(&cfqd->oom_cfqq,&cfqd->root_group);
>>>>
>>>> - INIT_LIST_HEAD(&cfqd->cic_list);
>>>> -
>>>> cfqd->queue = q;
>>>>
>>>> init_timer(&cfqd->idle_slice_timer);
>>>> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
>>>> index 64d5291..6c05b54 100644
>>>> --- a/include/linux/iocontext.h
>>>> +++ b/include/linux/iocontext.h
>>>> @@ -18,7 +18,7 @@ struct cfq_io_context {
>>>> unsigned long ttime_samples;
>>>> unsigned long ttime_mean;
>>>>
>>>> - struct list_head queue_list;
>>>> + struct list_head group_list;
>>>> struct hlist_node cic_list;
>>>>
>>>> void (*dtor)(struct io_context *); /* destructor */
>>>> --
>>>> 1.6.2.5
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
>>>> in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
>>>>
>>>
>>
>> --
>> IKEDA, Munehiro
>> NEC Corporation of America
>> m-ikeda@ds.jp.nec.com
>>
>>
>
--
IKEDA, Munehiro
NEC Corporation of America
m-ikeda@ds.jp.nec.com
next prev parent reply other threads:[~2010-08-14 0:53 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-09 2:57 [RFC][PATCH 00/11] blkiocg async support Munehiro Ikeda
2010-07-09 3:14 ` [RFC][PATCH 01/11] blkiocg async: Make page_cgroup independent from memory controller Munehiro Ikeda
2010-07-26 6:49 ` Balbir Singh
2010-07-09 3:15 ` [RFC][PATCH 02/11] blkiocg async: The main part of iotrack Munehiro Ikeda
2010-07-09 7:35 ` KAMEZAWA Hiroyuki
2010-07-09 23:06 ` Munehiro Ikeda
2010-07-12 0:11 ` KAMEZAWA Hiroyuki
2010-07-14 14:46 ` Munehiro IKEDA
2010-07-09 7:38 ` KAMEZAWA Hiroyuki
2010-07-09 23:09 ` Munehiro Ikeda
2010-07-10 10:06 ` Andrea Righi
2010-07-09 3:16 ` [RFC][PATCH 03/11] blkiocg async: Hooks for iotrack Munehiro Ikeda
2010-07-09 9:24 ` Andrea Righi
2010-07-09 23:43 ` Munehiro Ikeda
2010-07-09 3:16 ` [RFC][PATCH 04/11] blkiocg async: block_commit_write not to record process info Munehiro Ikeda
2010-07-09 3:17 ` [RFC][PATCH 05/11] blkiocg async: __set_page_dirty_nobuffer " Munehiro Ikeda
2010-07-09 3:17 ` [RFC][PATCH 06/11] blkiocg async: ext4_writepage not to overwrite iotrack info Munehiro Ikeda
2010-07-09 3:18 ` [RFC][PATCH 07/11] blkiocg async: Pass bio to elevator_ops functions Munehiro Ikeda
2010-07-09 3:19 ` [RFC][PATCH 08/11] blkiocg async: Function to search blkcg from css ID Munehiro Ikeda
2010-07-09 3:20 ` [RFC][PATCH 09/11] blkiocg async: Functions to get cfqg from bio Munehiro Ikeda
2010-07-09 3:22 ` [RFC][PATCH 10/11] blkiocg async: Async queue per cfq_group Munehiro Ikeda
2010-08-13 1:24 ` Nauman Rafique
2010-08-13 21:00 ` Munehiro Ikeda
2010-08-13 23:01 ` Nauman Rafique
2010-08-14 0:49 ` Munehiro Ikeda [this message]
2010-07-09 3:23 ` [RFC][PATCH 11/11] blkiocg async: Workload timeslice adjustment for async queues Munehiro Ikeda
2010-07-09 10:04 ` [RFC][PATCH 00/11] blkiocg async support Andrea Righi
2010-07-09 13:45 ` Vivek Goyal
2010-07-10 0:17 ` Munehiro Ikeda
2010-07-10 0:55 ` Nauman Rafique
2010-07-10 13:24 ` Vivek Goyal
2010-07-12 0:20 ` KAMEZAWA Hiroyuki
2010-07-12 13:18 ` Vivek Goyal
2010-07-13 4:36 ` KAMEZAWA Hiroyuki
2010-07-14 14:29 ` Vivek Goyal
2010-07-15 0:00 ` KAMEZAWA Hiroyuki
2010-07-16 13:43 ` Vivek Goyal
2010-07-16 14:15 ` Daniel P. Berrange
2010-07-16 14:35 ` Vivek Goyal
2010-07-16 14:53 ` Daniel P. Berrange
2010-07-16 15:12 ` Vivek Goyal
2010-07-27 10:40 ` Daniel P. Berrange
2010-07-27 14:03 ` Vivek Goyal
2010-07-22 19:28 ` Greg Thelen
2010-07-22 23:59 ` KAMEZAWA Hiroyuki
2010-07-26 6:41 ` Balbir Singh
2010-07-27 6:40 ` Greg Thelen
2010-07-27 6:39 ` KAMEZAWA Hiroyuki
2010-08-02 20:58 ` Vivek Goyal
2010-08-03 14:31 ` Munehiro Ikeda
2010-08-03 19:24 ` Nauman Rafique
2010-08-04 14:32 ` Munehiro Ikeda
2010-08-03 20:15 ` Vivek Goyal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C65E829.2030401@ds.jp.nec.com \
--to=m-ikeda@ds.jp.nec.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=guijianfeng@cn.fujitsu.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nauman@google.com \
--cc=righi.andrea@gmail.com \
--cc=ryov@valinux.co.jp \
--cc=taka@valinux.co.jp \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.