* [RFC]block: change sort order of elv_dispatch_sort
@ 2010-12-08  5:42 Shaohua Li
  0 siblings, 1 reply; 7+ messages in thread

From: Shaohua Li @ 2010-12-08  5:42 UTC
To: lkml; +Cc: Jens Axboe, vgoyal

Change the sort order a little bit: sort requests with sectors above the
boundary in ascending order, and requests with sectors below the boundary
in descending order. The goal is less disk head movement.

For example, with boundary 7, we add sectors 8, 1, 9, 2, 3, 4, 10, 12, 5, 11, 6.

In the original sort, the sorted list is:
  8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6
The head moves 8->12->1->6, a total movement of roughly 12*2 sectors.

With the new sort, the list is:
  8, 9, 10, 11, 12, 6, 5, 4, 3, 2, 1
The head moves 8->12->6->1, a total movement of roughly 12*1.5 sectors.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/block/elevator.c b/block/elevator.c
index 2569512..1e01e49 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -443,12 +443,12 @@ void elv_dispatch_sort(struct request_queue *q, struct request *rq)
 		if (blk_rq_pos(rq) >= boundary) {
 			if (blk_rq_pos(pos) < boundary)
 				continue;
+			if (blk_rq_pos(rq) >= blk_rq_pos(pos))
+				break;
 		} else {
-			if (blk_rq_pos(pos) >= boundary)
+			if (blk_rq_pos(rq) < blk_rq_pos(pos))
 				break;
 		}
-		if (blk_rq_pos(rq) >= blk_rq_pos(pos))
-			break;
 	}

 	list_add(&rq->queuelist, entry);
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Jens Axboe @ 2010-12-08  6:56 UTC
To: Shaohua Li; +Cc: lkml, vgoyal@redhat.com

On 2010-12-08 13:42, Shaohua Li wrote:
> Change the sort order a little bit: sort requests with sectors above the
> boundary in ascending order, and requests with sectors below the boundary
> in descending order. The goal is less disk head movement.
> For example, with boundary 7, we add sectors 8, 1, 9, 2, 3, 4, 10, 12, 5, 11, 6.
> In the original sort, the sorted list is:
>   8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6
> The head moves 8->12->1->6, a total movement of roughly 12*2 sectors.
> With the new sort, the list is:
>   8, 9, 10, 11, 12, 6, 5, 4, 3, 2, 1
> The head moves 8->12->6->1, a total movement of roughly 12*1.5 sectors.

It was actually done this way on purpose; it's been a while since we
have done two-way elevators, even outside the dispatch list sorting
itself.

Do you have any results to back this change up? I'd argue that
continuing to the end, sweeping back, and reading forwards again will
usually be faster than doing backwards reads.

-- 
Jens Axboe
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Shaohua Li @ 2010-12-08  7:50 UTC
To: Jens Axboe; +Cc: lkml, vgoyal@redhat.com

On Wed, 2010-12-08 at 14:56 +0800, Jens Axboe wrote:
> On 2010-12-08 13:42, Shaohua Li wrote:
> > [...]
>
> It was actually done this way on purpose; it's been a while since we
> have done two-way elevators, even outside the dispatch list sorting
> itself.
>
> Do you have any results to back this change up? I'd argue that
> continuing to the end, sweeping back, and reading forwards again will
> usually be faster than doing backwards reads.

No, I have no data; that is why this is an RFC patch. Part of the
reason is that I don't know when we dispatch several requests to the
list; it appears the driver only takes one request at a time. What kind
of test do you suggest?

I'm curious why sweeping back is faster. It definitely needs more head
movement. Is there a hardware trick here?

Thanks,
Shaohua
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Jens Axboe @ 2010-12-08  8:01 UTC
To: Shaohua Li; +Cc: lkml, vgoyal@redhat.com

On 2010-12-08 15:50, Shaohua Li wrote:
> On Wed, 2010-12-08 at 14:56 +0800, Jens Axboe wrote:
> > [...]
> >
> > Do you have any results to back this change up? I'd argue that
> > continuing to the end, sweeping back, and reading forwards again will
> > usually be faster than doing backwards reads.
>
> No, I have no data; that is why this is an RFC patch. Part of the
> reason is that I don't know when we dispatch several requests to the
> list; it appears the driver only takes one request at a time. What
> kind of test do you suggest?

Yes, that is usually the case. The dispatch list is mainly meant as a
holding point for dispatch, for requeues, or for requests that don't
have a sort ordering. Or on IO scheduler switches, for instance.

> I'm curious why sweeping back is faster. It definitely needs more head
> movement. Is there a hardware trick here?

The idea is that while the initial seek is longer, drive prefetch makes
serving the latter half of the request series after the sweep faster.

I know the classic OS books mention this as a good method, but I don't
think that has been the case for a long time.

-- 
Jens Axboe
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Shaohua Li @ 2010-12-08 14:39 UTC
To: Jens Axboe; +Cc: lkml, vgoyal@redhat.com

On Wed, 2010-12-08 at 16:01 +0800, Jens Axboe wrote:
> On 2010-12-08 15:50, Shaohua Li wrote:
> > [...]
>
> Yes, that is usually the case. The dispatch list is mainly meant as a
> holding point for dispatch, for requeues, or for requests that don't
> have a sort ordering. Or on IO scheduler switches, for instance.

I ran a test in a hacked-up way: I modified the noop iosched so that
every time it tries to dispatch a request, it dispatches all the
requests in its list. The test does random reads. The result is
actually quite stable: the changed order always gives slightly better
throughput, but the improvement is quite small (<1%).

> > I'm curious why sweeping back is faster. It definitely needs more
> > head movement. Is there a hardware trick here?
>
> The idea is that while the initial seek is longer, drive prefetch
> makes serving the latter half of the request series after the sweep
> faster.
>
> I know the classic OS books mention this as a good method, but I don't
> think that has been the case for a long time.

Hmm, if this is sequential I/O, the requests are already merged. If
not, how could the drive know what to prefetch?

Thanks,
Shaohua
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Jens Axboe @ 2010-12-08 14:44 UTC
To: Shaohua Li; +Cc: lkml, vgoyal@redhat.com

On 2010-12-08 22:39, Shaohua Li wrote:
> On Wed, 2010-12-08 at 16:01 +0800, Jens Axboe wrote:
> > [...]
>
> I ran a test in a hacked-up way: I modified the noop iosched so that
> every time it tries to dispatch a request, it dispatches all the
> requests in its list. The test does random reads. The result is
> actually quite stable: the changed order always gives slightly better
> throughput, but the improvement is quite small (<1%).

First of all, I think 1% is too close to call unless your results are
REALLY stable. Secondly, a truly random workload is not a good test
case, as requests are going to be all over the map anyway. For
something more realistic (like your example, but of course not fully
contiguous) it would be interesting to see.

> Hmm, if this is sequential I/O, the requests are already merged. If
> not, how could the drive know what to prefetch?

Certainly, the requests are not going to look like in your example. I
didn't take it literally; I assumed you just meant increasing order on
both sides. Once the drive has positioned the head, it is going to read
more than just the single sector in that request. They do read caching.

-- 
Jens Axboe
* Re: [RFC]block: change sort order of elv_dispatch_sort

From: Shaohua Li @ 2010-12-09 13:17 UTC
To: Jens Axboe; +Cc: lkml, vgoyal@redhat.com

On Wed, 2010-12-08 at 22:44 +0800, Jens Axboe wrote:
> On 2010-12-08 22:39, Shaohua Li wrote:
> > [...]
>
> First of all, I think 1% is too close to call unless your results are
> REALLY stable. Secondly, a truly random workload is not a good test
> case, as requests are going to be all over the map anyway. For
> something more realistic (like your example, but of course not fully
> contiguous) it would be interesting to see.

I tried random reads with block sizes ranging from 4k to 64k, which I
thought was more realistic. The results for the two sort methods still
show only a slight difference. I'll give up on the patch unless there
is a better workload to try.

Thanks,
Shaohua
Thread overview: 7+ messages (newest: 2010-12-09 13:17 UTC)
2010-12-08  5:42 [RFC]block: change sort order of elv_dispatch_sort Shaohua Li
2010-12-08  6:56 ` Jens Axboe
2010-12-08  7:50   ` Shaohua Li
2010-12-08  8:01     ` Jens Axboe
2010-12-08 14:39       ` Shaohua Li
2010-12-08 14:44         ` Jens Axboe
2010-12-09 13:17           ` Shaohua Li