* [patch]cfq-iosched: delete deep seeky queue idle logic
@ 2011-09-16 3:09 Shaohua Li
2011-09-16 6:04 ` Corrado Zoccolo
2011-09-16 9:54 ` Tao Ma
0 siblings, 2 replies; 19+ messages in thread
From: Shaohua Li @ 2011-09-16 3:09 UTC (permalink / raw)
To: lkml; +Cc: Jens Axboe, Maxim Patlasov, Vivek Goyal, Corrado Zoccolo
Recently Maxim and I discussed why his aiostress workload performs poorly. If
you didn't follow the discussion, here are the issues we found:
1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
the queue as seeky; but when A+1 is accessed it is already in the disk cache,
so this should really be detected as sequential. Not sure if any real workload
has such an access pattern, and it doesn't seem easy to find a clean fix
either. Any idea for this?
2. deep seeky queue idle. This makes RAID perform poorly. I think we should
revert the logic. Deep queues are more common with high-end hardware, and on
such hardware we'd better not idle.
Note, currently we set a queue's slice after the first request finishes. This
means the drive has already idled for a little time. If the queue is truly
deep, new requests should already have come in, so idling isn't required.
It looks like Vivek once posted a patch to revert it, but it was ignored:
http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a33bd43..f75439e 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -334,7 +334,6 @@ enum cfqq_state_flags {
CFQ_CFQQ_FLAG_sync, /* synchronous queue */
CFQ_CFQQ_FLAG_coop, /* cfqq is shared */
CFQ_CFQQ_FLAG_split_coop, /* shared cfqq will be splitted */
- CFQ_CFQQ_FLAG_deep, /* sync cfqq experienced large depth */
CFQ_CFQQ_FLAG_wait_busy, /* Waiting for next request */
};
@@ -363,7 +362,6 @@ CFQ_CFQQ_FNS(slice_new);
CFQ_CFQQ_FNS(sync);
CFQ_CFQQ_FNS(coop);
CFQ_CFQQ_FNS(split_coop);
-CFQ_CFQQ_FNS(deep);
CFQ_CFQQ_FNS(wait_busy);
#undef CFQ_CFQQ_FNS
@@ -2375,17 +2373,6 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
goto keep_queue;
}
- /*
- * This is a deep seek queue, but the device is much faster than
- * the queue can deliver, don't idle
- **/
- if (CFQQ_SEEKY(cfqq) && cfq_cfqq_idle_window(cfqq) &&
- (cfq_cfqq_slice_new(cfqq) ||
- (cfqq->slice_end - jiffies > jiffies - cfqq->slice_start))) {
- cfq_clear_cfqq_deep(cfqq);
- cfq_clear_cfqq_idle_window(cfqq);
- }
-
if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
cfqq = NULL;
goto keep_queue;
@@ -3298,13 +3285,10 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);
- if (cfqq->queued[0] + cfqq->queued[1] >= 4)
- cfq_mark_cfqq_deep(cfqq);
-
if (cfqq->next_rq && (cfqq->next_rq->cmd_flags & REQ_NOIDLE))
enable_idle = 0;
else if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
- (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
+ CFQQ_SEEKY(cfqq))
enable_idle = 0;
else if (sample_valid(cic->ttime.ttime_samples)) {
if (cic->ttime.ttime_mean > cfqd->cfq_slice_idle)
@@ -3874,11 +3858,6 @@ static void cfq_idle_slice_timer(unsigned long data)
*/
if (!RB_EMPTY_ROOT(&cfqq->sort_list))
goto out_kick;
-
- /*
- * Queue depth flag is reset only when the idle didn't succeed
- */
- cfq_clear_cfqq_deep(cfqq);
}
expire:
cfq_slice_expired(cfqd, timed_out);
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 3:09 [patch]cfq-iosched: delete deep seeky queue idle logic Shaohua Li
@ 2011-09-16 6:04 ` Corrado Zoccolo
2011-09-16 6:40 ` Shaohua Li
` (2 more replies)
2011-09-16 9:54 ` Tao Ma
1 sibling, 3 replies; 19+ messages in thread
From: Corrado Zoccolo @ 2011-09-16 6:04 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal
On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> Recently Maxim and I discussed why his aiostress workload performs poorly. If
> you didn't follow the discussion, here are the issues we found:
> 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> the queue as seeky; but when A+1 is accessed it is already in the disk cache,
> so this should really be detected as sequential. Not sure if any real workload
> has such an access pattern, and it doesn't seem easy to find a clean fix
> either. Any idea for this?
Not all disks will cache 4 independent streams, we can't make that
assumption in cfq.
The current behaviour of assuming it as seeky should work well enough,
in fact it will be put in the seeky tree, and it can enjoy the seeky
tree quantum of time. If the second round takes a short time, it will
be able to schedule a third round again after the idle time.
If there are other seeky processes competing for the tree, the cache
can be cleared by the time it gets back to your 4 streams process, so
it will behave exactly as a seeky process from cfq point of view.
If the various accesses were submitted in parallel, the deep seeky
queue logic should kick in and make sure the process gets a sequential
quantum, rather than sharing it with other seeky processes, so
depending on your disk, it could perform better.
>
> 2. deep seeky queue idle. This makes RAID perform poorly. I think we should
> revert the logic. Deep queues are more common with high-end hardware, and on
> such hardware we'd better not idle.
> Note, currently we set a queue's slice after the first request finishes. This
> means the drive has already idled for a little time. If the queue is truly
> deep, new requests should already have come in, so idling isn't required.
> It looks like Vivek once posted a patch to revert it, but it was ignored:
> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
I get a 404 here. I think you are seeing only one side of the coin.
That logic is there mainly to ensure fairness between deep seeky
processes and normal seeky processes that want low latency.
If you remove that logic, a single process making many parallel aio
reads could completely swamp one machine, preventing other seeky
processes from progressing.
Instead of removing the logic completely, you should make the depth
configurable, so multi-spindle storage could allow deeper queues
before switching to the fairness-enforcing policy.
> Signed-off-by: Shaohua Li<shaohua.li@intel.com>
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a33bd43..f75439e 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -334,7 +334,6 @@ enum cfqq_state_flags {
> CFQ_CFQQ_FLAG_sync, /* synchronous queue */
> CFQ_CFQQ_FLAG_coop, /* cfqq is shared */
> CFQ_CFQQ_FLAG_split_coop, /* shared cfqq will be splitted */
> - CFQ_CFQQ_FLAG_deep, /* sync cfqq experienced large depth */
> CFQ_CFQQ_FLAG_wait_busy, /* Waiting for next request */
> };
>
> @@ -363,7 +362,6 @@ CFQ_CFQQ_FNS(slice_new);
> CFQ_CFQQ_FNS(sync);
> CFQ_CFQQ_FNS(coop);
> CFQ_CFQQ_FNS(split_coop);
> -CFQ_CFQQ_FNS(deep);
> CFQ_CFQQ_FNS(wait_busy);
> #undef CFQ_CFQQ_FNS
>
> @@ -2375,17 +2373,6 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
> goto keep_queue;
> }
>
> - /*
> - * This is a deep seek queue, but the device is much faster than
> - * the queue can deliver, don't idle
> - **/
> - if (CFQQ_SEEKY(cfqq) && cfq_cfqq_idle_window(cfqq) &&
> - (cfq_cfqq_slice_new(cfqq) ||
> - (cfqq->slice_end - jiffies > jiffies - cfqq->slice_start))) {
> - cfq_clear_cfqq_deep(cfqq);
> - cfq_clear_cfqq_idle_window(cfqq);
> - }
> -
I haven't seen the patch that introduced this code hunk, but this
could disrupt the cache in your first scenario if the reads A B C D
were sent in parallel. You mistakenly assume your disk can issue more
requests in parallel only because many of them hit the cache. Now you
start sending other unrelated requests (your 4 stream process is not
identified as deep any more, so other processes in the seeky tree
compete with it), and this makes your cache hit ratio drop and
everything slows down.
> if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
> cfqq = NULL;
> goto keep_queue;
> @@ -3298,13 +3285,10 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>
> enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);
>
> - if (cfqq->queued[0] + cfqq->queued[1] >= 4)
> - cfq_mark_cfqq_deep(cfqq);
> -
> if (cfqq->next_rq && (cfqq->next_rq->cmd_flags & REQ_NOIDLE))
> enable_idle = 0;
> else if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
> - (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
> + CFQQ_SEEKY(cfqq))
> enable_idle = 0;
> else if (sample_valid(cic->ttime.ttime_samples)) {
> if (cic->ttime.ttime_mean > cfqd->cfq_slice_idle)
> @@ -3874,11 +3858,6 @@ static void cfq_idle_slice_timer(unsigned long data)
> */
> if (!RB_EMPTY_ROOT(&cfqq->sort_list))
> goto out_kick;
> -
> - /*
> - * Queue depth flag is reset only when the idle didn't succeed
> - */
> - cfq_clear_cfqq_deep(cfqq);
> }
> expire:
> cfq_slice_expired(cfqd, timed_out);
>
>
>
Thanks,
Corrado
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 6:04 ` Corrado Zoccolo
@ 2011-09-16 6:40 ` Shaohua Li
2011-09-16 19:25 ` Corrado Zoccolo
2011-09-16 13:24 ` Vivek Goyal
2011-09-16 13:37 ` Vivek Goyal
2 siblings, 1 reply; 19+ messages in thread
From: Shaohua Li @ 2011-09-16 6:40 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal
On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
> > you didn't follow the discussion, here are the issues we found:
> > 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> > A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> > the queue as seeky; but when A+1 is accessed it is already in the disk cache,
> > so this should really be detected as sequential. Not sure if any real workload
> > has such an access pattern, and it doesn't seem easy to find a clean fix
> > either. Any idea for this?
>
> Not all disks will cache 4 independent streams, we can't make that
> assumption in cfq.
Sure thing, we can't make such an assumption. I'm wondering whether we should
move the seeky detection to request completion: if the time between two
request completions is short, we treat the queue as sequential. This would
make the detection adaptive, but the time measurement doesn't seem easy.
> The current behaviour of assuming it as seeky should work well enough,
> in fact it will be put in the seeky tree, and it can enjoy the seeky
> tree quantum of time. If the second round takes a short time, it will
> be able to schedule a third round again after the idle time.
> If there are other seeky processes competing for the tree, the cache
> can be cleared by the time it gets back to your 4 streams process, so
> it will behave exactly as a seeky process from cfq point of view.
> If the various accesses were submitted in parallel, the deep seeky
> queue logic should kick in and make sure the process gets a sequential
> quantum, rather than sharing it with other seeky processes, so
> depending on your disk, it could perform better.
Yes, the idle logic makes it ok, but it sounds like "make things wrong
first (in seeky detection) and then fix it later (the idle logic)".
> > 2. deep seeky queue idle. This makes RAID perform poorly. I think we should
> > revert the logic. Deep queues are more common with high-end hardware, and on
> > such hardware we'd better not idle.
> > Note, currently we set a queue's slice after the first request finishes. This
> > means the drive has already idled for a little time. If the queue is truly
> > deep, new requests should already have come in, so idling isn't required.
What do you think about this? Assume a seeky request takes a long time;
then the queue is already idling for a little while anyway.
> > It looks like Vivek once posted a patch to revert it, but it was ignored:
> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> I get a 404 here. I think you are seeing only one side of the coin.
> That logic is there mainly to ensure fairness between deep seeky
> processes and normal seeky processes that want low latency.
I didn't understand that. The logic doesn't protect non-deep processes; how
could it make the normal seeky process have low latency? Or do you have a
test case for this, so I can analyze it?
I tried a workload with one task driving depth 4 and one task driving depth
16. The behavior appears unchanged with or without the logic.
> If you remove that logic, a single process making many parallel aio
> reads could completely swamp one machine, preventing other seeky
> processes from progressing.
> Instead of removing the logic completely, you should make the depth
> configurable, so multi-spindle storage could allow deeper queues
> before switching to the fairness-enforcing policy.
We already have too many tunables ;( And we don't have a way to get the
maximum depth a storage device can provide.
Could the driver detect that it's multi-spindle storage and report it to
the iosched, just like SSD detection?
Thanks,
Shaohua
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 3:09 [patch]cfq-iosched: delete deep seeky queue idle logic Shaohua Li
2011-09-16 6:04 ` Corrado Zoccolo
@ 2011-09-16 9:54 ` Tao Ma
2011-09-16 14:08 ` Christoph Hellwig
1 sibling, 1 reply; 19+ messages in thread
From: Tao Ma @ 2011-09-16 9:54 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal, Corrado Zoccolo
Hi Shaohua,
On 09/16/2011 11:09 AM, Shaohua Li wrote:
> Recently Maxim and I discussed why his aiostress workload performs poorly. If
> you didn't follow the discussion, here are the issues we found:
> 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> the queue as seeky; but when A+1 is accessed it is already in the disk cache,
> so this should really be detected as sequential. Not sure if any real workload
> has such an access pattern, and it doesn't seem easy to find a clean fix
> either. Any idea for this?
This year's FAST has a paper named "A Scheduling Framework That Makes
Any Disk Schedulers Non-Work-Conserving Solely Based on Request
Characteristics". It describes this situation and suggests a new
scheduler, the "stream scheduler", to resolve it. But I am not sure
whether CFQ can work like that or not.
Thanks
Tao
>
> 2. deep seeky queue idle. This makes RAID perform poorly. I think we should
> revert the logic. Deep queues are more common with high-end hardware, and on
> such hardware we'd better not idle.
> Note, currently we set a queue's slice after the first request finishes. This
> means the drive has already idled for a little time. If the queue is truly
> deep, new requests should already have come in, so idling isn't required.
> It looks like Vivek once posted a patch to revert it, but it was ignored:
> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
>
> Signed-off-by: Shaohua Li<shaohua.li@intel.com>
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a33bd43..f75439e 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -334,7 +334,6 @@ enum cfqq_state_flags {
> CFQ_CFQQ_FLAG_sync, /* synchronous queue */
> CFQ_CFQQ_FLAG_coop, /* cfqq is shared */
> CFQ_CFQQ_FLAG_split_coop, /* shared cfqq will be splitted */
> - CFQ_CFQQ_FLAG_deep, /* sync cfqq experienced large depth */
> CFQ_CFQQ_FLAG_wait_busy, /* Waiting for next request */
> };
>
> @@ -363,7 +362,6 @@ CFQ_CFQQ_FNS(slice_new);
> CFQ_CFQQ_FNS(sync);
> CFQ_CFQQ_FNS(coop);
> CFQ_CFQQ_FNS(split_coop);
> -CFQ_CFQQ_FNS(deep);
> CFQ_CFQQ_FNS(wait_busy);
> #undef CFQ_CFQQ_FNS
>
> @@ -2375,17 +2373,6 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
> goto keep_queue;
> }
>
> - /*
> - * This is a deep seek queue, but the device is much faster than
> - * the queue can deliver, don't idle
> - **/
> - if (CFQQ_SEEKY(cfqq) && cfq_cfqq_idle_window(cfqq) &&
> - (cfq_cfqq_slice_new(cfqq) ||
> - (cfqq->slice_end - jiffies > jiffies - cfqq->slice_start))) {
> - cfq_clear_cfqq_deep(cfqq);
> - cfq_clear_cfqq_idle_window(cfqq);
> - }
> -
> if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
> cfqq = NULL;
> goto keep_queue;
> @@ -3298,13 +3285,10 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>
> enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);
>
> - if (cfqq->queued[0] + cfqq->queued[1] >= 4)
> - cfq_mark_cfqq_deep(cfqq);
> -
> if (cfqq->next_rq && (cfqq->next_rq->cmd_flags & REQ_NOIDLE))
> enable_idle = 0;
> else if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
> - (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
> + CFQQ_SEEKY(cfqq))
> enable_idle = 0;
> else if (sample_valid(cic->ttime.ttime_samples)) {
> if (cic->ttime.ttime_mean > cfqd->cfq_slice_idle)
> @@ -3874,11 +3858,6 @@ static void cfq_idle_slice_timer(unsigned long data)
> */
> if (!RB_EMPTY_ROOT(&cfqq->sort_list))
> goto out_kick;
> -
> - /*
> - * Queue depth flag is reset only when the idle didn't succeed
> - */
> - cfq_clear_cfqq_deep(cfqq);
> }
> expire:
> cfq_slice_expired(cfqd, timed_out);
>
>
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 6:04 ` Corrado Zoccolo
2011-09-16 6:40 ` Shaohua Li
@ 2011-09-16 13:24 ` Vivek Goyal
2011-09-16 13:37 ` Vivek Goyal
2 siblings, 0 replies; 19+ messages in thread
From: Vivek Goyal @ 2011-09-16 13:24 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov
On Fri, Sep 16, 2011 at 08:04:49AM +0200, Corrado Zoccolo wrote:
[..]
> >
> > 2. deep seeky queue idle. This makes RAID perform poorly. I think we should
> > revert the logic. Deep queues are more common with high-end hardware, and on
> > such hardware we'd better not idle.
> > Note, currently we set a queue's slice after the first request finishes. This
> > means the drive has already idled for a little time. If the queue is truly
> > deep, new requests should already have come in, so idling isn't required.
> > It looks like Vivek once posted a patch to revert it, but it was ignored:
> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> I get a 404 here. I think you are seeing only one side of the coin.
> That logic is there mainly to ensure fairness between deep seeky
> processes and normal seeky processes that want low latency.
> If you remove that logic, a single process making many parallel aio
> reads could completely swamp one machine, preventing other seeky
> processes from progressing.
I think to tackle that we can expire the queue early once it has
dispatched a few requests (something along the lines of async queues
being expired once they dispatch cfq_prio_to_maxrq() requests).
That way one deep queue will not starve other random queues, and
at the same time we will not introduce per-queue idling for deep
queues.
Anyway, the deep queue here is seeky, so idling does not help improve
throughput.
> Instead of removing the logic completely, you should make the depth
> configurable, so multi-spindle storage could allow deeper queues
> before switching to the fairness-enforcing policy.
I would think getting rid of the deep queue logic is better than
introducing more tunables. We just need to expire the queue after
dispatching a few requests so that we do proper round-robin dispatch
among all the queues on the sync-noidle service tree.
Thanks
Vivek
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 6:04 ` Corrado Zoccolo
2011-09-16 6:40 ` Shaohua Li
2011-09-16 13:24 ` Vivek Goyal
@ 2011-09-16 13:37 ` Vivek Goyal
2 siblings, 0 replies; 19+ messages in thread
From: Vivek Goyal @ 2011-09-16 13:37 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov
On Fri, Sep 16, 2011 at 08:04:49AM +0200, Corrado Zoccolo wrote:
> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
> > you didn't follow the discussion, here are the issues we found:
> > 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> > A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> > the queue as seeky; but when A+1 is accessed it is already in the disk cache,
> > so this should really be detected as sequential. Not sure if any real workload
> > has such an access pattern, and it doesn't seem easy to find a clean fix
> > either. Any idea for this?
>
> Not all disks will cache 4 independent streams, we can't make that
> assumption in cfq.
> The current behaviour of assuming it as seeky should work well enough,
> in fact it will be put in the seeky tree, and it can enjoy the seeky
> tree quantum of time. If the second round takes a short time, it will
> be able to schedule a third round again after the idle time.
> If there are other seeky processes competing for the tree, the cache
> can be cleared by the time it gets back to your 4 streams process, so
> it will behave exactly as a seeky process from cfq point of view.
> If the various accesses were submitted in parallel, the deep seeky
> queue logic should kick in and make sure the process gets a sequential
> quantum, rather than sharing it with other seeky processes, so
> depending on your disk, it could perform better.
I think I agree that we probably can not optimize CFQ for cache behavior
without even knowing what the cache on a device might be doing. There
is no guarantee that by treating this 4-stream process as sequential you
will get better throughput; in fact, additional idling can kill the
throughput on faster storage. It should probably be left to the device
cache to optimize for such IO patterns.
Thanks
Vivek
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 9:54 ` Tao Ma
@ 2011-09-16 14:08 ` Christoph Hellwig
2011-09-16 14:50 ` Tao Ma
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-09-16 14:08 UTC (permalink / raw)
To: Tao Ma
Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal,
Corrado Zoccolo
On Fri, Sep 16, 2011 at 05:54:51PM +0800, Tao Ma wrote:
> This year's FAST has a paper named "A Scheduling Framework That Makes
> Any Disk Schedulers Non-Work-Conserving Solely Based on Request
> Characteristics". It has described this situation and suggests a new
> scheduler named "stream scheduler" to resolve this. But I am not sure
> whether CFQ can work like that or not.
As usual I suspect the best thing is to just use noop for these kinds of
cases. E.g. when you use xfs with the filestreams option you'll get
patterns pretty similar to that in the initial post - that is
intentional, as it is generally used to place the streams into different
areas of a complex RAID array. Any scheduler "smarts" will just help to
break these I/O streams.
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 14:08 ` Christoph Hellwig
@ 2011-09-16 14:50 ` Tao Ma
0 siblings, 0 replies; 19+ messages in thread
From: Tao Ma @ 2011-09-16 14:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal,
Corrado Zoccolo
On 09/16/2011 10:08 PM, Christoph Hellwig wrote:
> On Fri, Sep 16, 2011 at 05:54:51PM +0800, Tao Ma wrote:
>> This year's FAST has a paper named "A Scheduling Framework That Makes
>> Any Disk Schedulers Non-Work-Conserving Solely Based on Request
>> Characteristics". It has described this situation and suggests a new
>> scheduler named "stream scheduler" to resolve this. But I am not sure
>> whether CFQ can work like that or not.
>
> As usual I suspect the best thing is to just use noop for these kinds of
> cases. E.g. when you use xfs with the filestreams options you'll get
> patterns pretty similar to that in the initial post - that is
> intentional as it is generally use to place them into different areas
> of a complex RAID array. Any scheduler "smarts" will just help to break these
> I/O streams.
yeah, actually the paper does show that the performance of cfq is worse
than noop in this case. ;) See section 3.4 if you are interested.
Thanks
Tao
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 6:40 ` Shaohua Li
@ 2011-09-16 19:25 ` Corrado Zoccolo
2011-09-21 11:16 ` Shaohua Li
0 siblings, 1 reply; 19+ messages in thread
From: Corrado Zoccolo @ 2011-09-16 19:25 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal
On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
>> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
>> > you didn't follow the discussion, here are the issues we found:
>> > 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
>> > A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
>> > the queue as seeky; but when A+1 is accessed it is already in the disk cache,
>> > so this should really be detected as sequential. Not sure if any real workload
>> > has such an access pattern, and it doesn't seem easy to find a clean fix
>> > either. Any idea for this?
>>
>> Not all disks will cache 4 independent streams, we can't make that
>> assumption in cfq.
> Sure thing, we can't make such an assumption. I'm wondering whether we should
> move the seeky detection to request completion: if the time between two
> request completions is short, we treat the queue as sequential. This would
> make the detection adaptive, but the time measurement doesn't seem easy.
>
>> The current behaviour of assuming it as seeky should work well enough,
>> in fact it will be put in the seeky tree, and it can enjoy the seeky
>> tree quantum of time. If the second round takes a short time, it will
>> be able to schedule a third round again after the idle time.
>> If there are other seeky processes competing for the tree, the cache
>> can be cleared by the time it gets back to your 4 streams process, so
>> it will behave exactly as a seeky process from cfq point of view.
>> If the various accesses were submitted in parallel, the deep seeky
>> queue logic should kick in and make sure the process gets a sequential
>> quantum, rather than sharing it with other seeky processes, so
>> depending on your disk, it could perform better.
> Yes, the idle logic makes it ok, but it sounds like "make things wrong
> first (in seeky detection) and then fix it later (the idle logic)".
>
>> > 2. deep seeky queue idle. This makes RAID perform poorly. I think we should
>> > revert the logic. Deep queues are more common with high-end hardware, and on
>> > such hardware we'd better not idle.
>> > Note, currently we set a queue's slice after the first request finishes. This
>> > means the drive has already idled for a little time. If the queue is truly
>> > deep, new requests should already have come in, so idling isn't required.
> What do you think about this? Assume a seeky request takes a long time;
> then the queue is already idling for a little while anyway.
I don't think I understand. If cfq doesn't idle, it will dispatch
another request from the same or another queue (if present)
immediately, until all possible in-flight requests are sent. Now you
depend on NCQ for the order in which requests are handled, so you
cannot guarantee fairness any more.
>
>> > It looks like Vivek once posted a patch to revert it, but it was ignored:
>> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
>> I get a 404 here. I think you are seeing only one side of the coin.
>> That logic is there mainly to ensure fairness between deep seeky
>> processes and normal seeky processes that want low latency.
> I didn't understand that. The logic doesn't protect non-deep processes; how
> could it make the normal seeky process have low latency? Or do you have a
> test case for this, so I can analyze it?
> I tried a workload with one task driving depth 4 and one task driving depth
> 16. The behavior appears unchanged with or without the logic.
>
Try a workload with one shallow seeky queue and one deep (depth 16) one,
on a single-spindle NCQ disk.
I think the behaviour when I submitted my patch was that both were
getting a 100ms slice (if this is not happening, some subsequent
patch probably broke it).
If you remove idling, they will get disk time roughly in the proportion
16:1, i.e. pretty unfair.
>> If you remove that logic, a single process making many parallel aio
>> reads could completely swamp one machine, preventing other seeky
>> processes from progressing.
>> Instead of removing the logic completely, you should make the depth
>> configurable, so multi-spindle storage could allow deeper queues
>> before switching to the fairness-enforcing policy.
> We already have too many tunables ;( And we don't have a way to get the
> maximum depth a storage device can provide.
> Could the driver detect that it's multi-spindle storage and report it to
> the iosched, just like SSD detection?
I'd really like it if this were possible.
Thanks,
Corrado
>
> Thanks,
> Shaohua
>
>
--
__________________________________________________________________________
dott. Corrado Zoccolo mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-16 19:25 ` Corrado Zoccolo
@ 2011-09-21 11:16 ` Shaohua Li
2011-09-23 13:24 ` Vivek Goyal
[not found] ` <CADX3swq0qURdi7VYLAVbsAmX5psPrzq-uvbqANsnLkHO0xcOMQ@mail.gmail.com>
0 siblings, 2 replies; 19+ messages in thread
From: Shaohua Li @ 2011-09-21 11:16 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: lkml, Jens Axboe, Maxim Patlasov, Vivek Goyal
On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
> On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> >> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
> >> > you didn't follow the discussion, here are the issues we found:
> >> > 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> >> > A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> >> > the queue as seeky; but when A+1 is accessed it is already in the disk cache,
> >> > so this should really be detected as sequential. Not sure if any real workload
> >> > has such an access pattern, and it doesn't seem easy to find a clean fix
> >> > either. Any idea for this?
> >>
> >> Not all disks will cache 4 independent streams, we can't make that
> >> assumption in cfq.
> > Sure thing, we can't make such an assumption. I'm wondering whether we should
> > move the seeky detection to request completion: if the time between two
> > request completions is short, we treat the queue as sequential. This would
> > make the detection adaptive, but the time measurement doesn't seem easy.
> >
> >> The current behaviour of assuming it as seeky should work well enough,
> >> in fact it will be put in the seeky tree, and it can enjoy the seeky
> >> tree quantum of time. If the second round takes a short time, it will
> >> be able to schedule a third round again after the idle time.
> >> If there are other seeky processes competing for the tree, the cache
> >> can be cleared by the time it gets back to your 4 streams process, so
> >> it will behave exactly as a seeky process from cfq point of view.
> >> If the various accesses were submitted in parallel, the deep seeky
> >> queue logic should kick in and make sure the process gets a sequential
> >> quantum, rather than sharing it with other seeky processes, so
> >> depending on your disk, it could perform better.
> > yes, the idle logic makes it ok, but sounds like "make things wrong
> > first (in seeky detection) and then fix it later (the idle logic)".
> >
> >> > 2. deep seeky queue idle. This makes raid performs poorly. I would think we
> >> > revert the logic. Deep queue is more popular with high end hardware. In such
> >> > hardware, we'd better not do idle.
> >> > Note, currently we set a queue's slice after the first request is finished.
> >> > This means the drive already idles a little time. If the queue is truely deep,
> >> > new requests should already come in, so idle isn't required.
> > What did you think about this? Assume seeky request takes long time, so
> > the queue is already idling for a little time.
> I don't think I understand. If cfq doesn't idle, it will dispatch an
> other request from the same or an other queue (if present)
> immediately, until all possible in-flight requests are sent. Now, you
> depend on NCQ for the order requests are handled, so you cannot
> guarantee fairness any more.
>
> >
> >> > Looks Vivek used to post a patch to rever it, but it gets ignored.
> >> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> >> I get a 404 here. I think you are seeing only one half of the medal.
> >> That logic is there mainly to ensure fairness between deep seeky
> >> processes and normal seeky processes that want low latency.
> > didn't understand it. The logic doesn't protect non-deep process. how
> > could it make the normal seeky process have low latency? or did you have
> > a test case for this, so I can analyze?
> > I tried a workload with one task drives depth 4 and one task drives
> > depth 16. Appears the behavior isn't changed w/wo the logic.
Sorry for the delay.
> Try a workload with one shallow seeky queue and one deep (16) one, on
> a single spindle NCQ disk.
> I think the behaviour when I submitted my patch was that both were
> getting 100ms slice (if this is not happening, probably some
> subsequent patch broke it).
> If you remove idling, they will get disk time roughly in proportion
> 16:1, i.e. pretty unfair.
I thought you were talking about a workload with one thread at depth 4 and
the other thread at depth 16. I did some tests here. In an old kernel,
without the deep seeky idle logic, the two threads get disk time in
proportion 1:5. With it, they get almost equal disk time, so this
achieves your goal. In the latest kernel, there is no big difference
with or without the logic (the depth-16 thread gets about 5x more disk
time either way). With the logic, the depth-4 thread gets equal disk
time in the first several slices. But after an idle expiration (mostly
because the current block plug holds requests in a per-task list and
doesn't add them to the elevator right away), the queue never gets
detected as deep again, because it dispatches requests one by one. So
the logic has been broken for some time (probably since block plugging
was added).
Thanks,
Shaohua
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-21 11:16 ` Shaohua Li
@ 2011-09-23 13:24 ` Vivek Goyal
2011-09-25 7:34 ` Corrado Zoccolo
2011-09-26 0:51 ` Shaohua Li
[not found] ` <CADX3swq0qURdi7VYLAVbsAmX5psPrzq-uvbqANsnLkHO0xcOMQ@mail.gmail.com>
1 sibling, 2 replies; 19+ messages in thread
From: Vivek Goyal @ 2011-09-23 13:24 UTC (permalink / raw)
To: Shaohua Li; +Cc: Corrado Zoccolo, lkml, Jens Axboe, Maxim Patlasov
On Wed, Sep 21, 2011 at 07:16:20PM +0800, Shaohua Li wrote:
[..]
> > Try a workload with one shallow seeky queue and one deep (16) one, on
> > a single spindle NCQ disk.
> > I think the behaviour when I submitted my patch was that both were
> > getting 100ms slice (if this is not happening, probably some
> > subsequent patch broke it).
> > If you remove idling, they will get disk time roughly in proportion
> > 16:1, i.e. pretty unfair.
> I thought you are talking about a workload with one thread depth 4, and
> the other thread depth 16. I did some tests here. In an old kernel,
> without the deep seeky idle logic, the threads have disk time in
> proportion 1:5. With it, they get almost equal disk time. SO this
> reaches your goal. In a latest kernel, w/wo the logic, there is no big
> difference (the 16 depth thread get about 5x more disk time). With the
> logic, the depth 4 thread gets equal disk time in first several slices.
> But after an idle expiration(mostly because current block plug hold
> requests in task list and didn't add them to elevator), the queue never
> gets detected as deep, because the queue dispatch request one by one.
When the plugged requests are flushed, they will be added to the
elevator, and at that point the queue should be marked as deep?
Anyway, what's wrong with the idea I suggested in the other mail of
expiring a sync-noidle queue after a few request dispatches, so that it
does not starve other sync-noidle queues?
Thanks
Vivek
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-23 13:24 ` Vivek Goyal
@ 2011-09-25 7:34 ` Corrado Zoccolo
2011-09-27 13:08 ` Vivek Goyal
2011-09-26 0:51 ` Shaohua Li
1 sibling, 1 reply; 19+ messages in thread
From: Corrado Zoccolo @ 2011-09-25 7:34 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov
On Fri, Sep 23, 2011 at 3:24 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Sep 21, 2011 at 07:16:20PM +0800, Shaohua Li wrote:
>
> [..]
>> > Try a workload with one shallow seeky queue and one deep (16) one, on
>> > a single spindle NCQ disk.
>> > I think the behaviour when I submitted my patch was that both were
>> > getting 100ms slice (if this is not happening, probably some
>> > subsequent patch broke it).
>> > If you remove idling, they will get disk time roughly in proportion
>> > 16:1, i.e. pretty unfair.
>> I thought you are talking about a workload with one thread depth 4, and
>> the other thread depth 16. I did some tests here. In an old kernel,
>> without the deep seeky idle logic, the threads have disk time in
>> proportion 1:5. With it, they get almost equal disk time. SO this
>> reaches your goal. In a latest kernel, w/wo the logic, there is no big
>> difference (the 16 depth thread get about 5x more disk time). With the
>> logic, the depth 4 thread gets equal disk time in first several slices.
>> But after an idle expiration(mostly because current block plug hold
>> requests in task list and didn't add them to elevator), the queue never
>> gets detected as deep, because the queue dispatch request one by one.
>
> When the plugged requests are flushed, then they will be added to elevator
> and at that point of time queue should be marked as deep?
>
> Anyway, what's wrong with the idea I suggested in other mail of expiring
> a sync-noidle queue afer few reuqest dispatches so that it does not
> starve other sync-noidle queues.
I don't know the current state of the code. Are the noidle queues
sorted in some tree, by sector number?
If that is the case, then even an expired queue could still be in
front of the tree.
>
> Thanks
> Vivek
>
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-23 13:24 ` Vivek Goyal
2011-09-25 7:34 ` Corrado Zoccolo
@ 2011-09-26 0:51 ` Shaohua Li
2011-09-27 13:11 ` Vivek Goyal
1 sibling, 1 reply; 19+ messages in thread
From: Shaohua Li @ 2011-09-26 0:51 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Corrado Zoccolo, lkml, Jens Axboe, Maxim Patlasov
On Fri, 2011-09-23 at 21:24 +0800, Vivek Goyal wrote:
> On Wed, Sep 21, 2011 at 07:16:20PM +0800, Shaohua Li wrote:
>
> [..]
> > > Try a workload with one shallow seeky queue and one deep (16) one, on
> > > a single spindle NCQ disk.
> > > I think the behaviour when I submitted my patch was that both were
> > > getting 100ms slice (if this is not happening, probably some
> > > subsequent patch broke it).
> > > If you remove idling, they will get disk time roughly in proportion
> > > 16:1, i.e. pretty unfair.
> > I thought you are talking about a workload with one thread depth 4, and
> > the other thread depth 16. I did some tests here. In an old kernel,
> > without the deep seeky idle logic, the threads have disk time in
> > proportion 1:5. With it, they get almost equal disk time. SO this
> > reaches your goal. In a latest kernel, w/wo the logic, there is no big
> > difference (the 16 depth thread get about 5x more disk time). With the
> > logic, the depth 4 thread gets equal disk time in first several slices.
> > But after an idle expiration(mostly because current block plug hold
> > requests in task list and didn't add them to elevator), the queue never
> > gets detected as deep, because the queue dispatch request one by one.
>
> When the plugged requests are flushed, then they will be added to elevator
> and at that point of time queue should be marked as deep?
The problem is that only 2 or 3 requests are held in the per-task plug
list before they get flushed into the elevator, so the queue never
accumulates enough queued requests to be marked as deep.
> Anyway, what's wrong with the idea I suggested in other mail of expiring
> a sync-noidle queue afer few reuqest dispatches so that it does not
> starve other sync-noidle queues.
The problem is deciding how many requests a queue should dispatch
before it is expired. cfq_prio_to_maxrq() == 16, which is too many.
Maybe use 4, but that has its own risk: seeky requests from one task
might still be far away on disk from the requests of other tasks, so
switching queues that often adds extra seeks.
Thanks,
Shaohua
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
[not found] ` <CADX3swq0qURdi7VYLAVbsAmX5psPrzq-uvbqANsnLkHO0xcOMQ@mail.gmail.com>
@ 2011-09-26 0:55 ` Shaohua Li
2011-09-27 6:07 ` Corrado Zoccolo
0 siblings, 1 reply; 19+ messages in thread
From: Shaohua Li @ 2011-09-26 0:55 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: Vivek Goyal, Maxim Patlasov, Jens Axboe, lkml
On Fri, 2011-09-23 at 13:50 +0800, Corrado Zoccolo wrote:
> Il giorno 21/set/2011 13:16, "Shaohua Li" <shaohua.li@intel.com> ha
> scritto:
> >
> > On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
> > > On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com>
> wrote:
> > > > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> > > >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li
> <shaohua.li@intel.com> wrote:
> > > >> > Recently Maxim and I discussed why his aiostress workload
> performs poorly. If
> > > >> > you didn't follow the discussion, here are the issues we
> found:
> > > >> > 1. cfq seeky dection isn't good. Assume a task accesses
> sector A, B, C, D, A+1,
> > > >> > B+1, C+1, D+1, A+2...Accessing A, B, C, D is random. cfq will
> detect the queue
> > > >> > as seeky, but since when accessing A+1, A+1 is already in
> disk cache, this
> > > >> > should be detected as sequential really. Not sure if any real
> workload has such
> > > >> > access patern, and seems not easy to have a clean fix too.
> Any idea for this?
> > > >>
> > > >> Not all disks will cache 4 independent streams, we can't make
> that
> > > >> assumption in cfq.
> > > > sure thing. we can't make such assumption. I'm thinking if we
> should
> > > > move the seeky detection in request finish. If time between two
> requests
> > > > finish is short, we thought the queue is sequential. This will
> make the
> > > > detection adaptive. but seems time measurement isn't easy.
> > > >
> > > >> The current behaviour of assuming it as seeky should work well
> enough,
> > > >> in fact it will be put in the seeky tree, and it can enjoy the
> seeky
> > > >> tree quantum of time. If the second round takes a short time,
> it will
> > > >> be able to schedule a third round again after the idle time.
> > > >> If there are other seeky processes competing for the tree, the
> cache
> > > >> can be cleared by the time it gets back to your 4 streams
> process, so
> > > >> it will behave exactly as a seeky process from cfq point of
> view.
> > > >> If the various accesses were submitted in parallel, the deep
> seeky
> > > >> queue logic should kick in and make sure the process gets a
> sequential
> > > >> quantum, rather than sharing it with other seeky processes, so
> > > >> depending on your disk, it could perform better.
> > > > yes, the idle logic makes it ok, but sounds like "make things
> wrong
> > > > first (in seeky detection) and then fix it later (the idle
> logic)".
> > > >
> > > >> > 2. deep seeky queue idle. This makes raid performs poorly. I
> would think we
> > > >> > revert the logic. Deep queue is more popular with high end
> hardware. In such
> > > >> > hardware, we'd better not do idle.
> > > >> > Note, currently we set a queue's slice after the first
> request is finished.
> > > >> > This means the drive already idles a little time. If the
> queue is truely deep,
> > > >> > new requests should already come in, so idle isn't required.
> > > > What did you think about this? Assume seeky request takes long
> time, so
> > > > the queue is already idling for a little time.
> > > I don't think I understand. If cfq doesn't idle, it will dispatch
> an
> > > other request from the same or an other queue (if present)
> > > immediately, until all possible in-flight requests are sent. Now,
> you
> > > depend on NCQ for the order requests are handled, so you cannot
> > > guarantee fairness any more.
> > >
> > > >
> > > >> > Looks Vivek used to post a patch to rever it, but it gets
> ignored.
> > > >> >
> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> > > >> I get a 404 here. I think you are seeing only one half of the
> medal.
> > > >> That logic is there mainly to ensure fairness between deep
> seeky
> > > >> processes and normal seeky processes that want low latency.
> > > > didn't understand it. The logic doesn't protect non-deep
> process. how
> > > > could it make the normal seeky process have low latency? or did
> you have
> > > > a test case for this, so I can analyze?
> > > > I tried a workload with one task drives depth 4 and one task
> drives
> > > > depth 16. Appears the behavior isn't changed w/wo the logic.
> > sorry for the delay.
> >
> > > Try a workload with one shallow seeky queue and one deep (16) one,
> on
> > > a single spindle NCQ disk.
> > > I think the behaviour when I submitted my patch was that both were
> > > getting 100ms slice (if this is not happening, probably some
> > > subsequent patch broke it).
> > > If you remove idling, they will get disk time roughly in
> proportion
> > > 16:1, i.e. pretty unfair.
> > I thought you are talking about a workload with one thread depth 4,
> and
> > the other thread depth 16. I did some tests here. In an old kernel,
> > without the deep seeky idle logic, the threads have disk time in
> > proportion 1:5. With it, they get almost equal disk time. SO this
> > reaches your goal. In a latest kernel, w/wo the logic, there is no
> big
> > difference (the 16 depth thread get about 5x more disk time). With
> the
> > logic, the depth 4 thread gets equal disk time in first several
> slices.
> > But after an idle expiration(mostly because current block plug hold
> > requests in task list and didn't add them to elevator), the queue
> never
> > gets detected as deep, because the queue dispatch request one by
> one. So
> > the logic is already broken for some time (maybe since block plug is
> > added).
> Could be that dispatching requests one by one is harming the
> performance, then?
Not really. Say 4 requests are running and the task dispatches a new
request whenever a previous one completes. Requests are dispatched one
by one, but there are still 4 requests in flight at any time. Checking
the in-flight requests is more precise for deep detection.
Thanks,
Shaohua
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-26 0:55 ` Shaohua Li
@ 2011-09-27 6:07 ` Corrado Zoccolo
2011-09-27 6:33 ` Shaohua Li
0 siblings, 1 reply; 19+ messages in thread
From: Corrado Zoccolo @ 2011-09-27 6:07 UTC (permalink / raw)
To: Shaohua Li; +Cc: Vivek Goyal, Maxim Patlasov, Jens Axboe, lkml
On Mon, Sep 26, 2011 at 2:55 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Fri, 2011-09-23 at 13:50 +0800, Corrado Zoccolo wrote:
>> Il giorno 21/set/2011 13:16, "Shaohua Li" <shaohua.li@intel.com> ha
>> scritto:
>> >
>> > On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
>> > > On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com>
>> wrote:
>> > > > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
>> > > >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li
>> <shaohua.li@intel.com> wrote:
>> > > >> > Recently Maxim and I discussed why his aiostress workload
>> performs poorly. If
>> > > >> > you didn't follow the discussion, here are the issues we
>> found:
>> > > >> > 1. cfq seeky dection isn't good. Assume a task accesses
>> sector A, B, C, D, A+1,
>> > > >> > B+1, C+1, D+1, A+2...Accessing A, B, C, D is random. cfq will
>> detect the queue
>> > > >> > as seeky, but since when accessing A+1, A+1 is already in
>> disk cache, this
>> > > >> > should be detected as sequential really. Not sure if any real
>> workload has such
>> > > >> > access patern, and seems not easy to have a clean fix too.
>> Any idea for this?
>> > > >>
>> > > >> Not all disks will cache 4 independent streams, we can't make
>> that
>> > > >> assumption in cfq.
>> > > > sure thing. we can't make such assumption. I'm thinking if we
>> should
>> > > > move the seeky detection in request finish. If time between two
>> requests
>> > > > finish is short, we thought the queue is sequential. This will
>> make the
>> > > > detection adaptive. but seems time measurement isn't easy.
>> > > >
>> > > >> The current behaviour of assuming it as seeky should work well
>> enough,
>> > > >> in fact it will be put in the seeky tree, and it can enjoy the
>> seeky
>> > > >> tree quantum of time. If the second round takes a short time,
>> it will
>> > > >> be able to schedule a third round again after the idle time.
>> > > >> If there are other seeky processes competing for the tree, the
>> cache
>> > > >> can be cleared by the time it gets back to your 4 streams
>> process, so
>> > > >> it will behave exactly as a seeky process from cfq point of
>> view.
>> > > >> If the various accesses were submitted in parallel, the deep
>> seeky
>> > > >> queue logic should kick in and make sure the process gets a
>> sequential
>> > > >> quantum, rather than sharing it with other seeky processes, so
>> > > >> depending on your disk, it could perform better.
>> > > > yes, the idle logic makes it ok, but sounds like "make things
>> wrong
>> > > > first (in seeky detection) and then fix it later (the idle
>> logic)".
>> > > >
>> > > >> > 2. deep seeky queue idle. This makes raid performs poorly. I
>> would think we
>> > > >> > revert the logic. Deep queue is more popular with high end
>> hardware. In such
>> > > >> > hardware, we'd better not do idle.
>> > > >> > Note, currently we set a queue's slice after the first
>> request is finished.
>> > > >> > This means the drive already idles a little time. If the
>> queue is truely deep,
>> > > >> > new requests should already come in, so idle isn't required.
>> > > > What did you think about this? Assume seeky request takes long
>> time, so
>> > > > the queue is already idling for a little time.
>> > > I don't think I understand. If cfq doesn't idle, it will dispatch
>> an
>> > > other request from the same or an other queue (if present)
>> > > immediately, until all possible in-flight requests are sent. Now,
>> you
>> > > depend on NCQ for the order requests are handled, so you cannot
>> > > guarantee fairness any more.
>> > >
>> > > >
>> > > >> > Looks Vivek used to post a patch to rever it, but it gets
>> ignored.
>> > > >> >
>> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
>> > > >> I get a 404 here. I think you are seeing only one half of the
>> medal.
>> > > >> That logic is there mainly to ensure fairness between deep
>> seeky
>> > > >> processes and normal seeky processes that want low latency.
>> > > > didn't understand it. The logic doesn't protect non-deep
>> process. how
>> > > > could it make the normal seeky process have low latency? or did
>> you have
>> > > > a test case for this, so I can analyze?
>> > > > I tried a workload with one task drives depth 4 and one task
>> drives
>> > > > depth 16. Appears the behavior isn't changed w/wo the logic.
>> > sorry for the delay.
>> >
>> > > Try a workload with one shallow seeky queue and one deep (16) one,
>> on
>> > > a single spindle NCQ disk.
>> > > I think the behaviour when I submitted my patch was that both were
>> > > getting 100ms slice (if this is not happening, probably some
>> > > subsequent patch broke it).
>> > > If you remove idling, they will get disk time roughly in
>> proportion
>> > > 16:1, i.e. pretty unfair.
>> > I thought you are talking about a workload with one thread depth 4,
>> and
>> > the other thread depth 16. I did some tests here. In an old kernel,
>> > without the deep seeky idle logic, the threads have disk time in
>> > proportion 1:5. With it, they get almost equal disk time. SO this
>> > reaches your goal. In a latest kernel, w/wo the logic, there is no
>> big
>> > difference (the 16 depth thread get about 5x more disk time). With
>> the
>> > logic, the depth 4 thread gets equal disk time in first several
>> slices.
>> > But after an idle expiration(mostly because current block plug hold
>> > requests in task list and didn't add them to elevator), the queue
>> never
>> > gets detected as deep, because the queue dispatch request one by
>> one. So
>> > the logic is already broken for some time (maybe since block plug is
>> > added).
>> Could be that dispatching requests one by one is harming the
>> performance, then?
> Not really. Say 4 requests are running, the task dispatches a request
> after one previous request is completed. requests are dispatching one by
> one but there are still 4 requests running at any time. Checking the
> in_flight requests are more precise for the deep detection.
>
What happens if there are 4 tasks, each of which could dispatch 4
requests in parallel? Will we reach and sustain 16 in-flight requests,
or will it bounce around 4 in flight? I think we could see a big
difference here.
Probably it is better to move the deep queue detection logic into the
per-task queue?
Then cfq will decide whether it should dispatch a few requests from
every task (the shallow case) or all requests from a single task (the
deep case), and then idle.
Thanks
Corrado
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-27 6:07 ` Corrado Zoccolo
@ 2011-09-27 6:33 ` Shaohua Li
2011-09-28 7:09 ` Corrado Zoccolo
0 siblings, 1 reply; 19+ messages in thread
From: Shaohua Li @ 2011-09-27 6:33 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: Vivek Goyal, Maxim Patlasov, Jens Axboe, lkml
On Tue, 2011-09-27 at 14:07 +0800, Corrado Zoccolo wrote:
> On Mon, Sep 26, 2011 at 2:55 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > On Fri, 2011-09-23 at 13:50 +0800, Corrado Zoccolo wrote:
> >> Il giorno 21/set/2011 13:16, "Shaohua Li" <shaohua.li@intel.com> ha
> >> scritto:
> >> >
> >> > On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
> >> > > On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com>
> >> wrote:
> >> > > > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> >> > > >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li
> >> <shaohua.li@intel.com> wrote:
> >> > > >> > Recently Maxim and I discussed why his aiostress workload
> >> performs poorly. If
> >> > > >> > you didn't follow the discussion, here are the issues we
> >> found:
> >> > > >> > 1. cfq seeky dection isn't good. Assume a task accesses
> >> sector A, B, C, D, A+1,
> >> > > >> > B+1, C+1, D+1, A+2...Accessing A, B, C, D is random. cfq will
> >> detect the queue
> >> > > >> > as seeky, but since when accessing A+1, A+1 is already in
> >> disk cache, this
> >> > > >> > should be detected as sequential really. Not sure if any real
> >> workload has such
> >> > > >> > access patern, and seems not easy to have a clean fix too.
> >> Any idea for this?
> >> > > >>
> >> > > >> Not all disks will cache 4 independent streams, we can't make
> >> that
> >> > > >> assumption in cfq.
> >> > > > sure thing. we can't make such assumption. I'm thinking if we
> >> should
> >> > > > move the seeky detection in request finish. If time between two
> >> requests
> >> > > > finish is short, we thought the queue is sequential. This will
> >> make the
> >> > > > detection adaptive. but seems time measurement isn't easy.
> >> > > >
> >> > > >> The current behaviour of assuming it as seeky should work well
> >> enough,
> >> > > >> in fact it will be put in the seeky tree, and it can enjoy the
> >> seeky
> >> > > >> tree quantum of time. If the second round takes a short time,
> >> it will
> >> > > >> be able to schedule a third round again after the idle time.
> >> > > >> If there are other seeky processes competing for the tree, the
> >> cache
> >> > > >> can be cleared by the time it gets back to your 4 streams
> >> process, so
> >> > > >> it will behave exactly as a seeky process from cfq point of
> >> view.
> >> > > >> If the various accesses were submitted in parallel, the deep
> >> seeky
> >> > > >> queue logic should kick in and make sure the process gets a
> >> sequential
> >> > > >> quantum, rather than sharing it with other seeky processes, so
> >> > > >> depending on your disk, it could perform better.
> >> > > > yes, the idle logic makes it ok, but sounds like "make things
> >> wrong
> >> > > > first (in seeky detection) and then fix it later (the idle
> >> logic)".
> >> > > >
> >> > > >> > 2. deep seeky queue idle. This makes raid performs poorly. I
> >> would think we
> >> > > >> > revert the logic. Deep queue is more popular with high end
> >> hardware. In such
> >> > > >> > hardware, we'd better not do idle.
> >> > > >> > Note, currently we set a queue's slice after the first
> >> request is finished.
> >> > > >> > This means the drive already idles a little time. If the
> >> queue is truely deep,
> >> > > >> > new requests should already come in, so idle isn't required.
> >> > > > What did you think about this? Assume seeky request takes long
> >> time, so
> >> > > > the queue is already idling for a little time.
> >> > > I don't think I understand. If cfq doesn't idle, it will dispatch
> >> an
> >> > > other request from the same or an other queue (if present)
> >> > > immediately, until all possible in-flight requests are sent. Now,
> >> you
> >> > > depend on NCQ for the order requests are handled, so you cannot
> >> > > guarantee fairness any more.
> >> > >
> >> > > >
> >> > > >> > Looks Vivek used to post a patch to rever it, but it gets
> >> ignored.
> >> > > >> >
> >> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> >> > > >> I get a 404 here. I think you are seeing only one half of the
> >> medal.
> >> > > >> That logic is there mainly to ensure fairness between deep
> >> seeky
> >> > > >> processes and normal seeky processes that want low latency.
> >> > > > didn't understand it. The logic doesn't protect non-deep
> >> process. how
> >> > > > could it make the normal seeky process have low latency? or did
> >> you have
> >> > > > a test case for this, so I can analyze?
> >> > > > I tried a workload with one task drives depth 4 and one task
> >> drives
> >> > > > depth 16. Appears the behavior isn't changed w/wo the logic.
> >> > sorry for the delay.
> >> >
> >> > > Try a workload with one shallow seeky queue and one deep (16) one,
> >> on
> >> > > a single spindle NCQ disk.
> >> > > I think the behaviour when I submitted my patch was that both were
> >> > > getting 100ms slice (if this is not happening, probably some
> >> > > subsequent patch broke it).
> >> > > If you remove idling, they will get disk time roughly in
> >> proportion
> >> > > 16:1, i.e. pretty unfair.
> >> > I thought you are talking about a workload with one thread depth 4,
> >> and
> >> > the other thread depth 16. I did some tests here. In an old kernel,
> >> > without the deep seeky idle logic, the threads have disk time in
> >> > proportion 1:5. With it, they get almost equal disk time. SO this
> >> > reaches your goal. In a latest kernel, w/wo the logic, there is no
> >> big
> >> > difference (the 16 depth thread get about 5x more disk time). With
> >> the
> >> > logic, the depth 4 thread gets equal disk time in first several
> >> slices.
> >> > But after an idle expiration(mostly because current block plug hold
> >> > requests in task list and didn't add them to elevator), the queue
> >> never
> >> > gets detected as deep, because the queue dispatch request one by
> >> one. So
> >> > the logic is already broken for some time (maybe since block plug is
> >> > added).
> >> Could be that dispatching requests one by one is harming the
> >> performance, then?
> > Not really. Say 4 requests are running, the task dispatches a request
> > after one previous request is completed. requests are dispatching one by
> > one but there are still 4 requests running at any time. Checking the
> > in_flight requests are more precise for the deep detection.
> >
> What happens if there are 4 tasks, all that could dispatch 4 requests
> in parallel? Will we reach and sustain 16 in flight requests, or it
> will bounce around 4 in flight? I think here we could get a big
> difference.
Ah, yes, we really should change
if (cfqq->queued[0] + cfqq->queued[1] >= 4)
cfq_mark_cfqq_deep(cfqq);
to
if (cfqq->queued[0] + cfqq->queued[1] + cfqq->dispatched >= 4)
cfq_mark_cfqq_deep(cfqq);
This is a bug, though it isn't related to the original raid issue.
> Probably it is better to move the deep queue detection logic in the
> per-task queue?
The detection is already done on the per-task queue currently.
> Then cfq will decide if it should dispatch few requests from every
> task (shallow case) or all requests from a single task (deep), and
> then idle.
I don't get your point. Do you mean doing the deep detection across all
tasks combined?
Thanks,
Shaohua
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-25 7:34 ` Corrado Zoccolo
@ 2011-09-27 13:08 ` Vivek Goyal
0 siblings, 0 replies; 19+ messages in thread
From: Vivek Goyal @ 2011-09-27 13:08 UTC (permalink / raw)
To: Corrado Zoccolo; +Cc: Shaohua Li, lkml, Jens Axboe, Maxim Patlasov
On Sun, Sep 25, 2011 at 09:34:16AM +0200, Corrado Zoccolo wrote:
[..]
> >
> > Anyway, what's wrong with the idea I suggested in the other mail of
> > expiring a sync-noidle queue after a few request dispatches so that it
> > does not starve other sync-noidle queues?
> I don't know the current state of the code. Are the noidle queues
> sorted in some tree, by sector number?
> If that is the case, then even an expired queue could still be in
> front of the tree.
I am not aware of any queue sorting based on sector number on the
sync-noidle tree. So we should just be able to do round-robin.
Thanks
Vivek
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-26 0:51 ` Shaohua Li
@ 2011-09-27 13:11 ` Vivek Goyal
0 siblings, 0 replies; 19+ messages in thread
From: Vivek Goyal @ 2011-09-27 13:11 UTC (permalink / raw)
To: Shaohua Li; +Cc: Corrado Zoccolo, lkml, Jens Axboe, Maxim Patlasov
On Mon, Sep 26, 2011 at 08:51:39AM +0800, Shaohua Li wrote:
> On Fri, 2011-09-23 at 21:24 +0800, Vivek Goyal wrote:
> > On Wed, Sep 21, 2011 at 07:16:20PM +0800, Shaohua Li wrote:
> >
> > [..]
> > > > Try a workload with one shallow seeky queue and one deep (16) one, on
> > > > a single spindle NCQ disk.
> > > > I think the behaviour when I submitted my patch was that both were
> > > > getting 100ms slice (if this is not happening, probably some
> > > > subsequent patch broke it).
> > > > If you remove idling, they will get disk time roughly in proportion
> > > > 16:1, i.e. pretty unfair.
> > > I thought you were talking about a workload with one thread at depth
> > > 4 and the other thread at depth 16. I did some tests here. In an old
> > > kernel, without the deep seeky idle logic, the threads have disk time
> > > in proportion 1:5. With it, they get almost equal disk time. So this
> > > reaches your goal. In the latest kernel, w/wo the logic, there is no
> > > big difference (the 16-depth thread gets about 5x more disk time).
> > > With the logic, the depth-4 thread gets equal disk time in the first
> > > several slices. But after an idle expiration (mostly because the
> > > current block plug holds requests in the task list and doesn't add
> > > them to the elevator), the queue never gets detected as deep, because
> > > the queue dispatches requests one by one.
> >
> > When the plugged requests are flushed, then they will be added to elevator
> > and at that point of time queue should be marked as deep?
> The problem is that just 2 or 3 requests are held in the per-task
> list and then get flushed into the elevator later, so the queue isn't
> marked as deep.
That would be workload dependent, wouldn't it?
>
> > Anyway, what's wrong with the idea I suggested in the other mail of
> > expiring a sync-noidle queue after a few request dispatches so that it
> > does not starve other sync-noidle queues?
> The problem is how many requests a queue should dispatch.
> cfq_prio_to_maxrq() == 16, which is too many. Maybe use 4, but it has
> its risk: seeky requests from one task might still be far away from
> the requests of other tasks.
4-6 might be a reasonable number to begin with. I am not sure about the
throughput impact, because the seek distance might be larger when moving
to a different task. And fairness might have some cost too. Let's run
some tests and see if anything shows up.
Thanks
Vivek
* Re: [patch]cfq-iosched: delete deep seeky queue idle logic
2011-09-27 6:33 ` Shaohua Li
@ 2011-09-28 7:09 ` Corrado Zoccolo
0 siblings, 0 replies; 19+ messages in thread
From: Corrado Zoccolo @ 2011-09-28 7:09 UTC (permalink / raw)
To: Shaohua Li; +Cc: Vivek Goyal, Maxim Patlasov, Jens Axboe, lkml
On Tue, Sep 27, 2011 at 8:33 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Tue, 2011-09-27 at 14:07 +0800, Corrado Zoccolo wrote:
>> On Mon, Sep 26, 2011 at 2:55 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > On Fri, 2011-09-23 at 13:50 +0800, Corrado Zoccolo wrote:
>> >> On 21 Sep 2011, at 13:16, "Shaohua Li" <shaohua.li@intel.com> wrote:
>> >> >
>> >> > On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
>> >> > > On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> >> > > > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
>> >> > > >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> >> > > >> > Recently Maxim and I discussed why his aiostress workload
>> >> > > >> > performs poorly. If you didn't follow the discussion, here
>> >> > > >> > are the issues we found:
>> >> > > >> > 1. cfq seeky detection isn't good. Assume a task accesses
>> >> > > >> > sectors A, B, C, D, A+1, B+1, C+1, D+1, A+2... Accessing A,
>> >> > > >> > B, C, D is random; cfq will detect the queue as seeky, but
>> >> > > >> > since A+1 is already in the disk cache by the time it is
>> >> > > >> > accessed, this should really be detected as sequential. Not
>> >> > > >> > sure if any real workload has such an access pattern, and it
>> >> > > >> > seems not easy to find a clean fix either. Any idea for this?
>> >> > > >>
>> >> > > >> Not all disks will cache 4 independent streams; we can't make
>> >> > > >> that assumption in cfq.
>> >> > > > Sure thing, we can't make such an assumption. I'm thinking we
>> >> > > > should move the seeky detection to request finish. If the time
>> >> > > > between two request completions is short, we consider the queue
>> >> > > > sequential. This would make the detection adaptive, but it
>> >> > > > seems time measurement isn't easy.
>> >> > > >
>> >> > > >> The current behaviour of treating it as seeky should work well
>> >> > > >> enough; in fact it will be put in the seeky tree, and it can
>> >> > > >> enjoy the seeky tree's quantum of time. If the second round
>> >> > > >> takes a short time, it will be able to schedule a third round
>> >> > > >> again after the idle time. If there are other seeky processes
>> >> > > >> competing for the tree, the cache can be cleared by the time
>> >> > > >> it gets back to your 4-streams process, so it will behave
>> >> > > >> exactly as a seeky process from cfq's point of view. If the
>> >> > > >> various accesses were submitted in parallel, the deep seeky
>> >> > > >> queue logic should kick in and make sure the process gets a
>> >> > > >> sequential quantum, rather than sharing it with other seeky
>> >> > > >> processes, so depending on your disk, it could perform better.
>> >> > > > Yes, the idle logic makes it OK, but it sounds like "make
>> >> > > > things wrong first (in seeky detection) and then fix it later
>> >> > > > (the idle logic)".
>> >> > > >
>> >> > > >> > 2. Deep seeky queue idle. This makes raid perform poorly.
>> >> > > >> > I would suggest we revert the logic. Deep queues are more
>> >> > > >> > common with high-end hardware, and on such hardware we'd
>> >> > > >> > better not idle.
>> >> > > >> > Note, currently we set a queue's slice after the first
>> >> > > >> > request is finished. This means the drive already idles a
>> >> > > >> > little. If the queue is truly deep, new requests should
>> >> > > >> > already have come in, so idling isn't required.
>> >> > > > What did you think about this? Assume a seeky request takes a
>> >> > > > long time, so the queue is already idling for a little while.
>> >> > > I don't think I understand. If cfq doesn't idle, it will
>> >> > > immediately dispatch another request from the same or another
>> >> > > queue (if present), until all possible in-flight requests are
>> >> > > sent. Now you depend on NCQ for the order in which requests are
>> >> > > handled, so you cannot guarantee fairness any more.
>> >> > >
>> >> > > >
>> >> > > >> > Looks like Vivek once posted a patch to revert it, but it
>> >> > > >> > got ignored.
>> >> > > >> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
>> >> > > >> I get a 404 here. I think you are seeing only one side of the
>> >> > > >> coin. That logic is there mainly to ensure fairness between
>> >> > > >> deep seeky processes and normal seeky processes that want low
>> >> > > >> latency.
>> >> > > > I didn't understand that. The logic doesn't protect non-deep
>> >> > > > processes; how could it give the normal seeky process low
>> >> > > > latency? Or do you have a test case for this, so I can analyze
>> >> > > > it? I tried a workload with one task driving depth 4 and one
>> >> > > > task driving depth 16. It appears the behavior isn't changed
>> >> > > > w/wo the logic.
>> >> > Sorry for the delay.
>> >> >
>> >> > > Try a workload with one shallow seeky queue and one deep (16)
>> >> > > one, on a single-spindle NCQ disk.
>> >> > > I think the behaviour when I submitted my patch was that both
>> >> > > were getting a 100ms slice (if this is not happening, probably
>> >> > > some subsequent patch broke it).
>> >> > > If you remove idling, they will get disk time roughly in
>> >> > > proportion 16:1, i.e. pretty unfair.
>> >> > I thought you were talking about a workload with one thread at
>> >> > depth 4 and the other thread at depth 16. I did some tests here.
>> >> > In an old kernel, without the deep seeky idle logic, the threads
>> >> > have disk time in proportion 1:5. With it, they get almost equal
>> >> > disk time. So this reaches your goal. In the latest kernel, w/wo
>> >> > the logic, there is no big difference (the 16-depth thread gets
>> >> > about 5x more disk time). With the logic, the depth-4 thread gets
>> >> > equal disk time in the first several slices. But after an idle
>> >> > expiration (mostly because the current block plug holds requests
>> >> > in the task list and doesn't add them to the elevator), the queue
>> >> > never gets detected as deep, because the queue dispatches requests
>> >> > one by one. So the logic has already been broken for some time
>> >> > (maybe since block plug was added).
>> >> Could it be that dispatching requests one by one is harming the
>> >> performance, then?
>> > Not really. Say 4 requests are running; the task dispatches a request
>> > after a previous request completes. Requests are dispatched one by
>> > one, but there are still 4 requests running at any time. Checking the
>> > in-flight requests is more precise for deep detection.
>> >
>> What happens if there are 4 tasks, each of which could dispatch 4
>> requests in parallel? Will we reach and sustain 16 in-flight requests,
>> or will it bounce around 4 in flight? I think here we could get a big
>> difference.
> Ah, yes, we really should change
> if (cfqq->queued[0] + cfqq->queued[1] >= 4)
> 	cfq_mark_cfqq_deep(cfqq);
> to
> if (cfqq->queued[0] + cfqq->queued[1] + cfqq->dispatched >= 4)
> 	cfq_mark_cfqq_deep(cfqq);
> This is a bug, though it isn't related to the original raid issue.
>
>> Probably it is better to move the deep queue detection logic into the
>> per-task queue?
> It's already per-task queue currently.
>
>> Then cfq will decide if it should dispatch a few requests from every
>> task (shallow case) or all requests from a single task (deep), and
>> then idle.
> I don't get your point. Do the deep detection considering all tasks?
No no, I was just saying that if no queue is marked deep, then they
will all end up on the seeky tree, so cfq will dispatch requests from
each before idling. This is the ideal case for raid, I think, so you
only need to identify the sweet spot at which you prefer to insulate a
seeky queue (mark it as deep) from the others, for the reasons that
Vivek also points out, i.e. that a single queue, even being seeky,
could have its requests more concentrated, so the seek time could be
shorter.
Thanks,
Corrado
> Thanks,
> Shaohua
>
>
Thread overview: 19+ messages
2011-09-16 3:09 [patch]cfq-iosched: delete deep seeky queue idle logic Shaohua Li
2011-09-16 6:04 ` Corrado Zoccolo
2011-09-16 6:40 ` Shaohua Li
2011-09-16 19:25 ` Corrado Zoccolo
2011-09-21 11:16 ` Shaohua Li
2011-09-23 13:24 ` Vivek Goyal
2011-09-25 7:34 ` Corrado Zoccolo
2011-09-27 13:08 ` Vivek Goyal
2011-09-26 0:51 ` Shaohua Li
2011-09-27 13:11 ` Vivek Goyal
[not found] ` <CADX3swq0qURdi7VYLAVbsAmX5psPrzq-uvbqANsnLkHO0xcOMQ@mail.gmail.com>
2011-09-26 0:55 ` Shaohua Li
2011-09-27 6:07 ` Corrado Zoccolo
2011-09-27 6:33 ` Shaohua Li
2011-09-28 7:09 ` Corrado Zoccolo
2011-09-16 13:24 ` Vivek Goyal
2011-09-16 13:37 ` Vivek Goyal
2011-09-16 9:54 ` Tao Ma
2011-09-16 14:08 ` Christoph Hellwig
2011-09-16 14:50 ` Tao Ma