public inbox for linux-kernel@vger.kernel.org
* [PATCH] cfq-iosched: quantum check tweak --resend
@ 2010-03-01  1:50 Shaohua Li
  2010-03-01  8:02 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2010-03-01  1:50 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-kernel, czoccolo, vgoyal, jmoyer, guijianfeng

Currently a queue can dispatch at most 4 requests when other queues are present.
This isn't optimal: the device can handle more requests; AHCI, for example, can
queue 31. The limit is there for fairness, but we can tweak it: if the queue
still has plenty of its slice left, it seems safe to go past the limit. Testing
shows this boosts my workload (two threads doing random reads on an SSD) from
78MB/s to 100MB/s.
Thanks to Corrado and Vivek for their suggestions on this patch.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 block/cfq-iosched.c |   30 ++++++++++++++++++++++++++----
 1 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index f27e535..0db07d7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -19,7 +19,7 @@
  * tunables
  */
 /* max queue in one round of service */
-static const int cfq_quantum = 4;
+static const int cfq_quantum = 8;
 static const int cfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
 /* maximum backwards seek, in KiB */
 static const int cfq_back_max = 16 * 1024;
@@ -2197,6 +2197,19 @@ static int cfq_forced_dispatch(struct cfq_data *cfqd)
 	return dispatched;
 }
 
+static inline bool cfq_slice_used_soon(struct cfq_data *cfqd,
+	struct cfq_queue *cfqq)
+{
+	/* the queue hasn't finished any request yet; can't estimate */
+	if (cfq_cfqq_slice_new(cfqq))
+		return true;
+	if (time_after(jiffies + cfqd->cfq_slice_idle * cfqq->dispatched,
+		cfqq->slice_end))
+		return true;
+
+	return false;
+}
+
 static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
 	unsigned int max_dispatch;
@@ -2213,7 +2226,7 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 	if (cfqd->rq_in_flight[BLK_RW_SYNC] && !cfq_cfqq_sync(cfqq))
 		return false;
 
-	max_dispatch = cfqd->cfq_quantum;
+	max_dispatch = max_t(unsigned int, cfqd->cfq_quantum / 2, 1);
 	if (cfq_class_idle(cfqq))
 		max_dispatch = 1;
 
@@ -2230,13 +2243,22 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 		/*
 		 * We have other queues, don't allow more IO from this one
 		 */
-		if (cfqd->busy_queues > 1)
+		if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
 			return false;
 
 		/*
 		 * Sole queue user, no limit
 		 */
-		max_dispatch = -1;
+		if (cfqd->busy_queues == 1)
+			max_dispatch = -1;
+		else
+			/*
+			 * Normally we start throttling cfqq when cfq_quantum/2
+			 * requests have been dispatched. But we can drive
+			 * deeper queue depths at the beginning of the
+			 * slice, subject to the upper limit of cfq_quantum.
+			 */
+			max_dispatch = cfqd->cfq_quantum;
 	}
 
 	/*
-- 
1.6.3.3
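The estimate used by cfq_slice_used_soon() in the patch above can be modeled as
a standalone sketch (illustrative, not kernel code): each already-dispatched
request is charged one idle period, and if that projected time runs past the end
of the slice, the slice is considered nearly used up. Jiffies wraparound
(time_after) is ignored here for simplicity.

```c
#include <stdbool.h>

/* Standalone model of the slice-nearly-used check (names illustrative,
 * all times in jiffies, no wraparound handling). */
bool slice_used_soon(unsigned long now, unsigned long slice_end,
                     unsigned long slice_idle, unsigned int dispatched)
{
    /* projected completion of in-flight work vs. end of the slice */
    return now + slice_idle * dispatched > slice_end;
}
```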


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] cfq-iosched: quantum check tweak --resend
  2010-03-01  1:50 [PATCH] cfq-iosched: quantum check tweak --resend Shaohua Li
@ 2010-03-01  8:02 ` Jens Axboe
  2010-03-01  8:15   ` Shaohua Li
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2010-03-01  8:02 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, czoccolo, vgoyal, jmoyer, guijianfeng

On Mon, Mar 01 2010, Shaohua Li wrote:
> Currently a queue can only dispatch up to 4 requests if there are other queues.
> This isn't optimal, device can handle more requests, for example, AHCI can
> handle 31 requests. I can understand the limit is for fairness, but we could
> do a tweak: if the queue still has a lot of slice left, sounds we could
> ignore the limit. Test shows this boost my workload (two thread randread of
> a SSD) from 78m/s to 100m/s.
> Thanks for suggestions from Corrado and Vivek for the patch.

As mentioned before, I think we definitely want to ensure that we drive
the full queue depth whenever possible. I think your patch is a bit
dangerous, though. The problematic workload here is a buffered write,
interleaved with the occasional sync reader. If the sync reader has to
endure 32 requests every time, latency rises dramatically for him.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cfq-iosched: quantum check tweak --resend
  2010-03-01  8:02 ` Jens Axboe
@ 2010-03-01  8:15   ` Shaohua Li
  2010-03-01  8:19     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2010-03-01  8:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel@vger.kernel.org, czoccolo@gmail.com,
	vgoyal@redhat.com, jmoyer@redhat.com, guijianfeng@cn.fujitsu.com

On Mon, Mar 01, 2010 at 04:02:34PM +0800, Jens Axboe wrote:
> On Mon, Mar 01 2010, Shaohua Li wrote:
> > Currently a queue can only dispatch up to 4 requests if there are other queues.
> > This isn't optimal, device can handle more requests, for example, AHCI can
> > handle 31 requests. I can understand the limit is for fairness, but we could
> > do a tweak: if the queue still has a lot of slice left, sounds we could
> > ignore the limit. Test shows this boost my workload (two thread randread of
> > a SSD) from 78m/s to 100m/s.
> > Thanks for suggestions from Corrado and Vivek for the patch.
> 
> As mentioned before, I think we definitely want to ensure that we drive
> the full queue depth whenever possible. I think your patch is a bit
> dangerous, though. The problematic workload here is a buffered write,
> interleaved with the occasional sync reader. If the sync reader has to
> endure 32 requests every time, latency rises dramatically for him.
The patch still maintains a hard limit on dispatched requests. For an async
queue, the limit is cfq_slice_async/cfq_slice_idle = 5; for a sync queue, the
limit is 8. And we only dispatch that many requests at the beginning of a slice.
For the workload you mention here, we only dispatch 1 extra request.
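The arithmetic behind those two caps can be sketched as follows (a standalone
model; the millisecond values are assumptions based on CFQ's default tunables,
not stated in the patch):

```c
#include <stdbool.h>

/* Assumed CFQ defaults, expressed in milliseconds (HZ=1000). */
enum {
    CFQ_SLICE_ASYNC_MS = 40, /* default async slice (HZ/25) */
    CFQ_SLICE_IDLE_MS  = 8,  /* default idle window */
    CFQ_QUANTUM        = 8,  /* default quantum after the patch */
};

/* Hard cap on requests dispatched at the start of a slice. */
int hard_dispatch_limit(bool sync)
{
    if (sync)
        return CFQ_QUANTUM;                        /* sync: 8 */
    return CFQ_SLICE_ASYNC_MS / CFQ_SLICE_IDLE_MS; /* async: 40/8 = 5 */
}
```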

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cfq-iosched: quantum check tweak --resend
  2010-03-01  8:15   ` Shaohua Li
@ 2010-03-01  8:19     ` Jens Axboe
  2010-03-01  8:22       ` Shaohua Li
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2010-03-01  8:19 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-kernel@vger.kernel.org, czoccolo@gmail.com,
	vgoyal@redhat.com, jmoyer@redhat.com, guijianfeng@cn.fujitsu.com

On Mon, Mar 01 2010, Shaohua Li wrote:
> On Mon, Mar 01, 2010 at 04:02:34PM +0800, Jens Axboe wrote:
> > On Mon, Mar 01 2010, Shaohua Li wrote:
> > > Currently a queue can only dispatch up to 4 requests if there are other queues.
> > > This isn't optimal, device can handle more requests, for example, AHCI can
> > > handle 31 requests. I can understand the limit is for fairness, but we could
> > > do a tweak: if the queue still has a lot of slice left, sounds we could
> > > ignore the limit. Test shows this boost my workload (two thread randread of
> > > a SSD) from 78m/s to 100m/s.
> > > Thanks for suggestions from Corrado and Vivek for the patch.
> > 
> > As mentioned before, I think we definitely want to ensure that we drive
> > the full queue depth whenever possible. I think your patch is a bit
> > dangerous, though. The problematic workload here is a buffered write,
> > interleaved with the occasional sync reader. If the sync reader has to
> > endure 32 requests every time, latency rises dramatically for him.
> the patch still matains a hardlimit for dispatched request. For a async,
> the limit is cfq_slice_async/cfq_slice_idle = 5. For sync, the limit is 8.
> And we only pipe out such number of requests at the begining of a slice.
> For the workload you mentioned here, we only dispatch 1 extra request.

OK, that sounds appropriate. Final question - why change the quantum and
use quantum/2?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cfq-iosched: quantum check tweak --resend
  2010-03-01  8:19     ` Jens Axboe
@ 2010-03-01  8:22       ` Shaohua Li
  2010-03-01  8:25         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Shaohua Li @ 2010-03-01  8:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel@vger.kernel.org, czoccolo@gmail.com,
	vgoyal@redhat.com, jmoyer@redhat.com, guijianfeng@cn.fujitsu.com

On Mon, Mar 01, 2010 at 04:19:20PM +0800, Jens Axboe wrote:
> On Mon, Mar 01 2010, Shaohua Li wrote:
> > On Mon, Mar 01, 2010 at 04:02:34PM +0800, Jens Axboe wrote:
> > > On Mon, Mar 01 2010, Shaohua Li wrote:
> > > > Currently a queue can only dispatch up to 4 requests if there are other queues.
> > > > This isn't optimal, device can handle more requests, for example, AHCI can
> > > > handle 31 requests. I can understand the limit is for fairness, but we could
> > > > do a tweak: if the queue still has a lot of slice left, sounds we could
> > > > ignore the limit. Test shows this boost my workload (two thread randread of
> > > > a SSD) from 78m/s to 100m/s.
> > > > Thanks for suggestions from Corrado and Vivek for the patch.
> > > 
> > > As mentioned before, I think we definitely want to ensure that we drive
> > > the full queue depth whenever possible. I think your patch is a bit
> > > dangerous, though. The problematic workload here is a buffered write,
> > > interleaved with the occasional sync reader. If the sync reader has to
> > > endure 32 requests every time, latency rises dramatically for him.
> > the patch still matains a hardlimit for dispatched request. For a async,
> > the limit is cfq_slice_async/cfq_slice_idle = 5. For sync, the limit is 8.
> > And we only pipe out such number of requests at the begining of a slice.
> > For the workload you mentioned here, we only dispatch 1 extra request.
> 
> OK, that sound appropriate. Final question - why change the quantum and
> use quantum/2?
This was suggested by Vivek. This way, quantum remains the hard limit and
doesn't surprise users: we start throttling at 1/2 quantum (the soft limit)
and stop at quantum (the hard limit).
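The soft/hard limit behavior can be sketched as a small standalone model
(names and the per-request cost parameter are illustrative, not the kernel
code): dispatch freely up to quantum/2; between quantum/2 and quantum, allow
deeper depth only while the remaining slice can absorb the in-flight work;
never exceed quantum.

```c
#include <stdbool.h>

#define CFQ_QUANTUM 8

/* May this queue dispatch one more request? (standalone sketch) */
bool may_dispatch(unsigned int dispatched, unsigned long slice_left,
                  unsigned long per_rq_cost)
{
    if (dispatched < CFQ_QUANTUM / 2)   /* below the soft limit */
        return true;
    if (dispatched >= CFQ_QUANTUM)      /* at the hard limit */
        return false;
    /* between the limits: only while the slice has time to spare */
    return dispatched * per_rq_cost < slice_left;
}
```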

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] cfq-iosched: quantum check tweak --resend
  2010-03-01  8:22       ` Shaohua Li
@ 2010-03-01  8:25         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2010-03-01  8:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-kernel@vger.kernel.org, czoccolo@gmail.com,
	vgoyal@redhat.com, jmoyer@redhat.com, guijianfeng@cn.fujitsu.com

On Mon, Mar 01 2010, Shaohua Li wrote:
> On Mon, Mar 01, 2010 at 04:19:20PM +0800, Jens Axboe wrote:
> > On Mon, Mar 01 2010, Shaohua Li wrote:
> > > On Mon, Mar 01, 2010 at 04:02:34PM +0800, Jens Axboe wrote:
> > > > On Mon, Mar 01 2010, Shaohua Li wrote:
> > > > > Currently a queue can only dispatch up to 4 requests if there are other queues.
> > > > > This isn't optimal, device can handle more requests, for example, AHCI can
> > > > > handle 31 requests. I can understand the limit is for fairness, but we could
> > > > > do a tweak: if the queue still has a lot of slice left, sounds we could
> > > > > ignore the limit. Test shows this boost my workload (two thread randread of
> > > > > a SSD) from 78m/s to 100m/s.
> > > > > Thanks for suggestions from Corrado and Vivek for the patch.
> > > > 
> > > > As mentioned before, I think we definitely want to ensure that we drive
> > > > the full queue depth whenever possible. I think your patch is a bit
> > > > dangerous, though. The problematic workload here is a buffered write,
> > > > interleaved with the occasional sync reader. If the sync reader has to
> > > > endure 32 requests every time, latency rises dramatically for him.
> > > the patch still matains a hardlimit for dispatched request. For a async,
> > > the limit is cfq_slice_async/cfq_slice_idle = 5. For sync, the limit is 8.
> > > And we only pipe out such number of requests at the begining of a slice.
> > > For the workload you mentioned here, we only dispatch 1 extra request.
> > 
> > OK, that sound appropriate. Final question - why change the quantum and
> > use quantum/2?
> This is suggested by Vivek. In this way quantum is still the hard limit and
> doesn't surprise users. we do throttling at 1/2 quantum (softlimit) and
> then stop at quantum (hard limit)

OK, that makes sense. I will apply the patch, thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-03-01  8:25 UTC | newest]

Thread overview: 6+ messages
2010-03-01  1:50 [PATCH] cfq-iosched: quantum check tweak --resend Shaohua Li
2010-03-01  8:02 ` Jens Axboe
2010-03-01  8:15   ` Shaohua Li
2010-03-01  8:19     ` Jens Axboe
2010-03-01  8:22       ` Shaohua Li
2010-03-01  8:25         ` Jens Axboe
