public inbox for linux-kernel@vger.kernel.org
From: Vivek Goyal <vgoyal@redhat.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: jaxboe@fusionio.com, jmoyer@redhat.com, czoccolo@gmail.com,
	guijianfeng@cn.fujitsu.com, linux-kernel@vger.kernel.org
Subject: Re: cfq-iosched preempt issues
Date: Wed, 2 Mar 2011 15:21:18 -0500
Message-ID: <20110302202118.GA2547@redhat.com>
In-Reply-To: <20110302124341.GA23940@sli10-conroe.sh.intel.com>

On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> queue preemption is good for some workloads and not for others. With commit
> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> 1. In a multi-threaded workload, each thread runs a random read/write (for
> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> threads are waiting on writes and, on the other hand, fewer threads are doing
> reads. This makes the queue depth small, so performance drops a little.
> So in this case, speeding up writes can speed up reads too, but we can't
> detect it.
> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> What if there are two queues, one sync and one async? If the sync queue's
> think time is small, we can treat it as the sole queue, because the sync queue
> will preempt the async queue, so we don't need to care about the async queue's
> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> patch for it.
> 
> Any idea?

CFQ is already very complicated; let's try to keep it simple. Because it
is complicated, making it hierarchical for cgroups becomes even harder.

IIUC, you are saying that the cfqd->busy_queues check is not sufficient
because it also takes async queues into account.

So we can keep another count, say cfqd->busy_sync_queues, and if there
are no busy_sync_queues, allow unlimited depth. That should be a really
simple change of a few lines.
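
To make the idea concrete, here is a minimal userspace sketch, not kernel
code and not a tested patch. The names sched_model, model_max_dispatch,
model_demo and NO_LIMIT are hypothetical, busy_sync_queues is the proposed
counter that does not exist yet, and the quantum value of 8 is only an
example. It assumes that "no busy_sync_queues" means "no busy sync queue
other than the one dispatching". The cfq_slice_used_soon() throttling in
cfq_may_dispatch() is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct cfq_data. */
struct sched_model {
	unsigned int busy_queues;      /* all busy queues, sync and async */
	unsigned int busy_sync_queues; /* proposed counter: busy sync queues only */
	unsigned int quantum;          /* stands in for cfqd->cfq_quantum */
};

/* Plays the role of the "-1" (no limit) value used for max_dispatch. */
#define NO_LIMIT UINT_MAX

/* Return the dispatch depth limit for the queue that wants to dispatch. */
unsigned int model_max_dispatch(const struct sched_model *m, bool queue_is_sync)
{
	/* Sole queue user, no limit (this is the existing check). */
	if (m->busy_queues == 1)
		return NO_LIMIT;

	/*
	 * Assumed reading of the proposal: the dispatching queue is sync and
	 * it is the only busy sync queue, so only async queues compete with
	 * it and they can be preempted anyway.
	 */
	if (queue_is_sync && m->busy_sync_queues == 1)
		return NO_LIMIT;

	/* Otherwise keep the usual upper limit. */
	return m->quantum;
}

/* Usage example: one sync queue and one async queue are busy. */
void model_demo(void)
{
	struct sched_model m = {
		.busy_queues = 2, .busy_sync_queues = 1, .quantum = 8
	};

	assert(model_max_dispatch(&m, true) == NO_LIMIT); /* sync: unlimited */
	assert(model_max_dispatch(&m, false) == 8);       /* async: capped */
}
```

With one sync and one async queue busy, the sync queue gets no limit while
the async queue is still capped at the quantum, which is what model_demo()
checks.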

But let's do it only if you have a real-life workload. Similarly, we can
worry about the RT case when there is a real workload behind it.

Thanks
Vivek

> 
> Thanks,
> Shaohua
> -----------------------------------------------
> Subject: cfq-iosched: don't limit sync queue depth with only one such sync queue
> 
> If there are a sync and an async queue and the sync queue's think time is small,
> we can ignore the sync queue's dispatch quantum. Because the sync queue will
> always preempt the async queue, we don't need to care about async's latency.
> In the same way, we can optimize an RT queue too to improve performance.
> This can fix a performance regression in the aiostress test, which was
> introduced by commit f8ae6e3eb825. The issue should exist even without the
> commit, but the commit amplifies the impact.
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> ---
>  block/cfq-iosched.c |   91 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 73 insertions(+), 18 deletions(-)
> 
> Index: linux/block/cfq-iosched.c
> ===================================================================
> --- linux.orig/block/cfq-iosched.c	2011-03-02 14:58:19.000000000 +0800
> +++ linux/block/cfq-iosched.c	2011-03-02 15:48:38.000000000 +0800
> @@ -1150,6 +1150,20 @@ void cfq_unlink_blkio_group(void *key, s
>  	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
>  }
>  
> +static bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> +	struct hlist_node *pos;
> +	struct cfq_group *cfqg;
> +	int cnt = 0;
> +
> +	hlist_for_each_entry(cfqg, pos, &cfqd->cfqg_list, cfqd_node) {
> +		cnt++;
> +		if (cnt > 1)
> +			break;
> +	}
> +	return cnt > 1;
> +}
> +
>  #else /* GROUP_IOSCHED */
>  static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
>  {
> @@ -1169,6 +1183,12 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfq
>  static void cfq_release_cfq_groups(struct cfq_data *cfqd) {}
>  static inline void cfq_put_cfqg(struct cfq_group *cfqg) {}
>  
> +static inline bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> +	return false;
> +}
> +
> +
>  #endif /* GROUP_IOSCHED */
>  
>  /*
> @@ -2381,6 +2401,57 @@ static inline bool cfq_slice_used_soon(s
>  	return false;
>  }
>  
> +static unsigned int cfq_queue_max_quantum(struct cfq_data *cfqd,
> +	struct cfq_queue *cfqq)
> +{
> +	int sync = cfq_cfqq_sync(cfqq);
> +	enum wl_prio_t prio = cfqq_prio(cfqq);
> +	struct cfq_group *cfqg = cfqq->cfqg;
> +	int sync_queues_cnt, async_queues_cnt;
> +	struct cfq_io_context *cic = RQ_CIC(cfqq->next_rq);
> +
> +	/* Sole queue user, no limit */
> +	if (cfqd->busy_queues == 1)
> +		return -1;
> +
> +	if (cfq_have_cfqgs(cfqd) || (!sync && prio != RT_WORKLOAD))
> +		goto normal;
> +
> +	sync_queues_cnt = cfqg->service_trees[prio][SYNC_NOIDLE_WORKLOAD].count
> +		+ cfqg->service_trees[prio][SYNC_WORKLOAD].count;
> +	async_queues_cnt = cfqg->service_trees[prio][ASYNC_WORKLOAD].count;
> +	/*
> +	 * If a queue is a sole sync queue and think time is small, we can ignore
> +	 * async queue here and give the sync queue no dispatch limit, because a
> +	 * sync queue can preempt async queue.
> +	 *
> +	 * If the queue is RT, we don't need check BE, because even the
> +	 * queue is expired, the dispatcher will select RT queue again next time.
> +	 *
> +	 * If the queue is BE, we don't check RT here, because dispatcher will
> +	 * switch to RT next time, so we at most dispatch one extra request.
> +	 */
> +	if (((!sync && prio == RT_WORKLOAD && sync_queues_cnt == 0 &&
> +		async_queues_cnt == 1) || sync_queues_cnt == 1) &&
> +		sample_valid(cic->ttime_samples) &&
> +		cic->ttime_mean < cfqd->cfq_slice_idle)
> +		return -1;
> +normal:
> +	/*
> +	 * We have other queues, don't allow more IO from this one
> +	 */
> +	if (cfq_slice_used_soon(cfqd, cfqq))
> +		return 0;
> +	else
> +		/*
> +		 * Normally we start throttling cfqq when cfq_quantum/2
> +		 * requests have been dispatched. But we can drive
> +		 * deeper queue depths at the beginning of slice
> +		 * subjected to upper limit of cfq_quantum.
> +		 * */
> +		return cfqd->cfq_quantum;
> +}
> +
>  static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
>  	unsigned int max_dispatch;
> @@ -2411,25 +2482,9 @@ static bool cfq_may_dispatch(struct cfq_
>  		if (cfq_class_idle(cfqq))
>  			return false;
>  
> -		/*
> -		 * We have other queues, don't allow more IO from this one
> -		 */
> -		if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
> +		max_dispatch = cfq_queue_max_quantum(cfqd, cfqq);
> +		if (max_dispatch == 0)
>  			return false;
> -
> -		/*
> -		 * Sole queue user, no limit
> -		 */
> -		if (cfqd->busy_queues == 1)
> -			max_dispatch = -1;
> -		else
> -			/*
> -			 * Normally we start throttling cfqq when cfq_quantum/2
> -			 * requests have been dispatched. But we can drive
> -			 * deeper queue depths at the beginning of slice
> -			 * subjected to upper limit of cfq_quantum.
> -			 * */
> -			max_dispatch = cfqd->cfq_quantum;
>  	}
>  
>  	/*
