* [Regression] High latency when doing large I/O
From: Mathieu Desnoyers @ 2009-01-17  0:44 UTC
To: Jens Axboe, Andrea Arcangeli, akpm, Ben Gamari, ltt-dev
Cc: linux-kernel, Jens Axboe

Hi,

A long-standing I/O regression (since 2.6.18, still there today) has hit
Slashdot recently:
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://it.slashdot.org/article.pl?sid=09/01/15/049201

I've taken a trace reproducing the wrong behavior on my machine and I think
it's getting us somewhere.

LTTng 0.83, kernel 2.6.28
Machine: Intel Xeon E5405 dual quad-core, 16GB RAM
(I just created a new block-trace.c LTTng probe which is not released yet;
it basically replaces blktrace.)

echo 3 > /proc/sys/vm/drop_caches

lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace

dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
cp -ax music /tmp        (copying 1.1GB of mp3)

ls                       (takes 15 seconds to get the directory listing!)

lttctl -D trace

I looked at the trace (especially around the ls), and bash waits for a few
seconds for I/O in the exec system call (to exec ls).

While this happens, dd is doing lots and lots of bio_queue, with a
bio_backmerge after each bio_queue event. This is reasonable, because dd is
writing to a contiguous file.

However, I wonder if this is not the actual problem. dd holds the head
request in the elevator request queue. It progresses steadily by
plugging/unplugging the device periodically and gets its work done.
However, because requests are being dequeued at the same rate others are
being merged, I suspect it stays at the top of the queue and does not let
the other, unrelated requests run.

There is a test in blk-merge.c which makes sure that merged requests do not
get bigger than a certain size. However, if the request is steadily
dequeued, I don't think this test accomplishes anything.

If you are interested in looking at the trace I've taken, I can provide it.

Does that make sense?

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [RFC PATCH] block: Fix bio merge induced high I/O latency
From: Mathieu Desnoyers @ 2009-01-17 16:26 UTC
To: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds
Cc: linux-kernel, ltt-dev

A long-standing I/O regression (since 2.6.18, still there today) has hit
Slashdot recently:
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://it.slashdot.org/article.pl?sid=09/01/15/049201

I've taken a trace reproducing the wrong behavior on my machine and I think
it's getting us somewhere.

LTTng 0.83, kernel 2.6.28
Machine: Intel Xeon E5405 dual quad-core, 16GB RAM
(I just created a new block-trace.c LTTng probe which is not released yet;
it basically replaces blktrace.)

echo 3 > /proc/sys/vm/drop_caches

lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace

dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
cp -ax music /tmp        (copying 1.1GB of mp3)

ls                       (takes 15 seconds to get the directory listing!)

lttctl -D trace

I looked at the trace (especially around the ls), and bash waits for a few
seconds for I/O in the exec system call (to exec ls).

While this happens, dd is doing lots and lots of bio_queue, with a
bio_backmerge after each bio_queue event. This is reasonable, because dd is
writing to a contiguous file.

However, I wonder if this is not the actual problem. dd holds the head
request in the elevator request queue. It progresses steadily by
plugging/unplugging the device periodically and gets its work done.
However, because requests are being dequeued at the same rate others are
being merged, I suspect it stays at the top of the queue and does not let
the other, unrelated requests run.

There is a test in blk-merge.c which makes sure that merged requests do not
get bigger than a certain size. However, if the request is steadily
dequeued, I don't think this test accomplishes anything.

This patch implements a basic test to make sure we never merge more than
128 requests into the same request if it is the "last_merge" request. I
have not been able to trigger the problem again with the fix applied. It
might not be in a perfect state: there may be better solutions to the
problem, but I think it helps point out where the culprit lies.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <axboe@kernel.dk>
CC: Andrea Arcangeli <andrea@suse.de>
CC: akpm@linux-foundation.org
CC: Ingo Molnar <mingo@elte.hu>
CC: Linus Torvalds <torvalds@linux-foundation.org>
---
 block/blk-merge.c      |   12 +++++++++---
 block/elevator.c       |   31 ++++++++++++++++++++++++++++---
 include/linux/blkdev.h |    1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

Index: linux-2.6-lttng/include/linux/blkdev.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/blkdev.h	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/include/linux/blkdev.h	2009-01-17 09:50:29.000000000 -0500
@@ -313,6 +313,7 @@ struct request_queue
 	 */
 	struct list_head	queue_head;
 	struct request		*last_merge;
+	int			nr_cached_merge;
 	elevator_t		*elevator;

 	/*
Index: linux-2.6-lttng/block/elevator.c
===================================================================
--- linux-2.6-lttng.orig/block/elevator.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/elevator.c	2009-01-17 11:07:12.000000000 -0500
@@ -255,6 +255,7 @@ int elevator_init(struct request_queue *

 	INIT_LIST_HEAD(&q->queue_head);
 	q->last_merge = NULL;
+	q->nr_cached_merge = 0;
 	q->end_sector = 0;
 	q->boundary_rq = NULL;

@@ -438,8 +439,10 @@ void elv_dispatch_sort(struct request_qu
 	struct list_head *entry;
 	int stop_flags;

-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}

 	elv_rqhash_del(q, rq);

@@ -478,8 +481,10 @@ EXPORT_SYMBOL(elv_dispatch_sort);
  */
 void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
 {
-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}

 	elv_rqhash_del(q, rq);

@@ -498,6 +503,16 @@ int elv_merge(struct request_queue *q, s
 	int ret;

 	/*
+	 * Make sure we don't starve other requests by merging too many cached
+	 * requests together.
+	 */
+	if (q->nr_cached_merge >= BLKDEV_MAX_RQ) {
+		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+		return ELEVATOR_NO_MERGE;
+	}
+
+	/*
 	 * First try one-hit cache.
 	 */
 	if (q->last_merge) {
@@ -536,6 +551,10 @@ void elv_merged_request(struct request_q
 	if (type == ELEVATOR_BACK_MERGE)
 		elv_rqhash_reposition(q, rq);

+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }

@@ -551,6 +570,10 @@ void elv_merge_requests(struct request_q
 	elv_rqhash_del(q, next);

 	q->nr_sorted--;
+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }

@@ -626,8 +649,10 @@ void elv_insert(struct request_queue *q,
 	q->nr_sorted++;
 	if (rq_mergeable(rq)) {
 		elv_rqhash_add(q, rq);
-		if (!q->last_merge)
+		if (!q->last_merge) {
+			q->nr_cached_merge = 1;
 			q->last_merge = rq;
+		}
 	}

 	/*
Index: linux-2.6-lttng/block/blk-merge.c
===================================================================
--- linux-2.6-lttng.orig/block/blk-merge.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/blk-merge.c	2009-01-17 09:50:29.000000000 -0500
@@ -231,8 +231,10 @@ static inline int ll_new_hw_segment(stru
 	if (req->nr_phys_segments + nr_phys_segs > q->max_hw_segments ||
 	    req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}

@@ -256,8 +258,10 @@ int ll_back_merge_fn(struct request_queu

 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(req->biotail, BIO_SEG_VALID))
@@ -281,8 +285,10 @@ int ll_front_merge_fn(struct request_que

 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(bio, BIO_SEG_VALID))

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
From: Leon Woestenberg @ 2009-01-17 16:50 UTC
To: Mathieu Desnoyers
Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
    linux-kernel, ltt-dev, Thomas Gleixner

Hello Mathieu et al,

On Sat, Jan 17, 2009 at 5:26 PM, Mathieu Desnoyers
<mathieu.desnoyers@polymtl.ca> wrote:
> A long standing I/O regression (since 2.6.18, still there today) has hit
> Slashdot recently :
> http://bugzilla.kernel.org/show_bug.cgi?id=12309

Are you sure you are solving the *actual* problem?

The bugzilla entry shows a bisect attempt that leads to a patch involving
negative clock jumps.
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c29

with a corrected link to the bisect patch:
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c30

Wouldn't a negative clock jump be very influential to the (time-driven)
I/O schedulers and be a more probable cause?

Regards,
--
Leon

p.s. Added Thomas to the CC list as his name is on the patch Signed-off-by list.
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
From: Mathieu Desnoyers @ 2009-01-17 17:15 UTC
To: Leon Woestenberg
Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
    linux-kernel, ltt-dev, Thomas Gleixner

* Leon Woestenberg (leon.woestenberg@gmail.com) wrote:
> Hello Mathieu et al,
>
> On Sat, Jan 17, 2009 at 5:26 PM, Mathieu Desnoyers
> <mathieu.desnoyers@polymtl.ca> wrote:
> > A long standing I/O regression (since 2.6.18, still there today) has hit
> > Slashdot recently :
> > http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> Are you sure you are solving the *actual* problem?
>
> The bugzilla entry shows a bisect attempt that leads to a patch
> involving negative clock jumps.
> http://bugzilla.kernel.org/show_bug.cgi?id=12309#c29
>
> with a corrected link to the bisect patch:
> http://bugzilla.kernel.org/show_bug.cgi?id=12309#c30
>
> Wouldn't a negative clock jump be very influential to the
> (time-driven) I/O schedulers and be a more probable cause?
>

When a merge is done, the lowest timestamp between the existing request
and the new request to merge is kept as a start_time value for the merged
request we end up with. In this case, that would probably make that
request stay on top of the queue even if unrelated interactive I/O
requests come.

I suspect that this negative clock jump could have hidden the problem by
making the start time of the interactive request lower than the start
time of the merged request.

Mathieu

> Regards,
> --
> Leon
>
> p.s. Added Thomas to the CC list as his name is on the patch Signed-off-by list.

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
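To make the effect described above concrete, here is a small stand-alone C
model of it. This is an illustrative sketch only, not kernel code: the
struct, field names and the pick_next() helper are made up. The point is
just that if a merged request always keeps the lowest start_time, a
request that keeps absorbing back-merges always looks older than a newly
queued interactive request to any FIFO-by-start_time choice.

```c
/*
 * Stand-alone model (assumption-level sketch, not kernel code): a merged
 * request keeps the lowest start_time of everything folded into it, so a
 * pick-the-oldest policy keeps preferring it over a newer request.
 */
#include <stdio.h>

struct req {
	const char *name;
	unsigned long start_time;	/* "jiffies" when first queued */
};

/* merging a chunk queued at 'now' into 'r' keeps the older timestamp */
static void merge_into(struct req *r, unsigned long now)
{
	if (now < r->start_time)
		r->start_time = now;
}

/* pick whichever request has waited longest (smallest start_time) */
static const struct req *pick_next(const struct req *a, const struct req *b)
{
	return (a->start_time <= b->start_time) ? a : b;
}

int main(void)
{
	struct req dd = { "dd (merged)", 1000 };
	struct req ls = { "ls (exec)",   1500 };
	unsigned long now;

	/* dd keeps back-merging fresh bios while it is being serviced */
	for (now = 1000; now < 2000; now++)
		merge_into(&dd, now);

	/* the merged request still reports start_time 1000 and wins */
	printf("next request picked: %s\n", pick_next(&dd, &ls)->name);
	return 0;
}
```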
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2009-01-17 16:50 ` Leon Woestenberg @ 2009-01-17 19:04 ` Jens Axboe 2009-01-18 21:12 ` Mathieu Desnoyers 2009-01-19 15:45 ` Nikanth K 2009-01-17 20:03 ` Ben Gamari 2 siblings, 2 replies; 39+ messages in thread From: Jens Axboe @ 2009-01-17 19:04 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Sat, Jan 17 2009, Mathieu Desnoyers wrote: > A long standing I/O regression (since 2.6.18, still there today) has hit > Slashdot recently : > http://bugzilla.kernel.org/show_bug.cgi?id=12309 > http://it.slashdot.org/article.pl?sid=09/01/15/049201 > > I've taken a trace reproducing the wrong behavior on my machine and I > think it's getting us somewhere. > > LTTng 0.83, kernel 2.6.28 > Machine : Intel Xeon E5405 dual quad-core, 16GB ram > (just created a new block-trace.c LTTng probe which is not released yet. > It basically replaces blktrace) > > > echo 3 > /proc/sys/vm/drop_caches > > lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace > > dd if=/dev/zero of=/tmp/newfile bs=1M count=1M > cp -ax music /tmp (copying 1.1GB of mp3) > > ls (takes 15 seconds to get the directory listing !) > > lttctl -D trace > > I looked at the trace (especially at the ls surroundings), and bash is > waiting for a few seconds for I/O in the exec system call (to exec ls). > > While this happens, we have dd doing lots and lots of bio_queue. There > is a bio_backmerge after each bio_queue event. This is reasonable, > because dd is writing to a contiguous file. > > However, I wonder if this is not the actual problem. We have dd which > has the head request in the elevator request queue. It is progressing > steadily by plugging/unplugging the device periodically and gets its > work done. However, because requests are being dequeued at the same > rate others are being merged, I suspect it stays at the top of the queue > and does not let the other unrelated requests run. > > There is a test in the blk-merge.c which makes sure that merged requests > do not get bigger than a certain size. However, if the request is > steadily dequeued, I think this test is not doing anything. > > > This patch implements a basic test to make sure we never merge more > than 128 requests into the same request if it is the "last_merge" > request. I have not been able to trigger the problem again with the > fix applied. It might not be in a perfect state : there may be better > solutions to the problem, but I think it helps pointing out where the > culprit lays. To be painfully honest, I have no idea what you are attempting to solve with this patch. First of all, Linux has always merged any request possible. The one-hit cache is just that, a one hit cache frontend for merging. We'll be hitting the merge hash and doing the same merge if it fails. Since we even cap the size of the request, the merging is also bounded. Furthermore, the request being merged is not considered for IO yet. It has not been dispatched by the io scheduler. IOW, I'm surprised your patch makes any difference at all. Especially with your 128 limit, since 4kbx128kb is 512kb which is the default max merge size anyway. These sort of test cases tend to be very sensitive and exhibit different behaviour for many runs, so call me a bit skeptical and consider that an enouragement to do more directed testing. 
You could use fio for instance. Have two jobs in your job file. One is a
dd type process that just writes a huge file, the other job starts eg 10
seconds later and does a 4kb read of a file.

As a quick test, could you try and increase the slice_idle to eg 20ms?
Sometimes I've seen timing being slightly off, which makes us miss the
sync window for the ls (in your case) process. Then you get a mix of
async and sync IO all the time, which very much slows down the sync
process.

--
Jens Axboe
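As a concrete starting point, a minimal fio job file along the lines
suggested above could look like the following. Only the overall shape (one
streaming writer plus one small reader starting about 10 seconds later)
comes from the suggestion; the section names, sizes and block sizes are
assumptions for illustration.

```
; sketch of the suggested two-job test (assumed names and sizes)
[writer]
rw=write
size=10240m
direct=0
blocksize=1024k

[reader]
; start roughly 10 seconds after the writer, read a single 4kb block
startdelay=10
rw=read
size=4k
blocksize=4k
direct=0
```

The per-job run time (runt) that fio reports for the reader then gives a
direct latency comparison across schedulers, which is what the follow-up
measurements below tabulate.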
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-17 19:04 ` Jens Axboe @ 2009-01-18 21:12 ` Mathieu Desnoyers 2009-01-18 21:27 ` Mathieu Desnoyers 2009-01-19 18:26 ` Jens Axboe 2009-01-19 15:45 ` Nikanth K 1 sibling, 2 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-18 21:12 UTC (permalink / raw) To: Jens Axboe Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Jens Axboe (jens.axboe@oracle.com) wrote: > On Sat, Jan 17 2009, Mathieu Desnoyers wrote: > > A long standing I/O regression (since 2.6.18, still there today) has hit > > Slashdot recently : > > http://bugzilla.kernel.org/show_bug.cgi?id=12309 > > http://it.slashdot.org/article.pl?sid=09/01/15/049201 > > > > I've taken a trace reproducing the wrong behavior on my machine and I > > think it's getting us somewhere. > > > > LTTng 0.83, kernel 2.6.28 > > Machine : Intel Xeon E5405 dual quad-core, 16GB ram > > (just created a new block-trace.c LTTng probe which is not released yet. > > It basically replaces blktrace) > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace > > > > dd if=/dev/zero of=/tmp/newfile bs=1M count=1M > > cp -ax music /tmp (copying 1.1GB of mp3) > > > > ls (takes 15 seconds to get the directory listing !) > > > > lttctl -D trace > > > > I looked at the trace (especially at the ls surroundings), and bash is > > waiting for a few seconds for I/O in the exec system call (to exec ls). > > > > While this happens, we have dd doing lots and lots of bio_queue. There > > is a bio_backmerge after each bio_queue event. This is reasonable, > > because dd is writing to a contiguous file. > > > > However, I wonder if this is not the actual problem. We have dd which > > has the head request in the elevator request queue. It is progressing > > steadily by plugging/unplugging the device periodically and gets its > > work done. However, because requests are being dequeued at the same > > rate others are being merged, I suspect it stays at the top of the queue > > and does not let the other unrelated requests run. > > > > There is a test in the blk-merge.c which makes sure that merged requests > > do not get bigger than a certain size. However, if the request is > > steadily dequeued, I think this test is not doing anything. > > > > > > This patch implements a basic test to make sure we never merge more > > than 128 requests into the same request if it is the "last_merge" > > request. I have not been able to trigger the problem again with the > > fix applied. It might not be in a perfect state : there may be better > > solutions to the problem, but I think it helps pointing out where the > > culprit lays. > > To be painfully honest, I have no idea what you are attempting to solve > with this patch. First of all, Linux has always merged any request > possible. The one-hit cache is just that, a one hit cache frontend for > merging. We'll be hitting the merge hash and doing the same merge if it > fails. Since we even cap the size of the request, the merging is also > bounded. > Hi Jens, I was mostly trying to poke around and try to figure out what was going on in the I/O elevator. Sorry if my first attempts did not make much sense. Following your advice, I've looked more deeply into the test cases. > Furthermore, the request being merged is not considered for IO yet. It > has not been dispatched by the io scheduler. IOW, I'm surprised your > patch makes any difference at all. 
Especially with your 128 limit, since > 4kbx128kb is 512kb which is the default max merge size anyway. These > sort of test cases tend to be very sensitive and exhibit different > behaviour for many runs, so call me a bit skeptical and consider that an > enouragement to do more directed testing. You could use fio for > instance. Have two jobs in your job file. One is a dd type process that > just writes a huge file, the other job starts eg 10 seconds later and > does a 4kb read of a file. > I looked at the "ls" behavior (while doing a dd) within my LTTng trace to create a fio job file. The said behavior is appended below as "Part 1 - ls I/O behavior". Note that the original "ls" test case was done with the anticipatory I/O scheduler, which was active by default on my debian system with custom vanilla 2.6.28 kernel. Also note that I am running this on a raid-1, but have experienced the same problem on a standard partition I created on the same machine. I created the fio job file appended as "Part 2 - dd+ls fio job file". It consists of one dd-like job and many small jobs reading as many data as ls did. I used the small test script to batch run this ("Part 3 - batch test"). The results for the ls-like jobs are interesting : I/O scheduler runt-min (msec) runt-max (msec) noop 41 10563 anticipatory 63 8185 deadline 52 33387 cfq 43 1420 > As a quick test, could you try and increase the slice_idle to eg 20ms? > Sometimes I've seen timing being slightly off, which makes us miss the > sync window for the ls (in your case) process. Then you get a mix of > async and sync IO all the time, which very much slows down the sync > process. > Just to confirm, the quick test you are taking about would be : --- block/cfq-iosched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6-lttng/block/cfq-iosched.c =================================================================== --- linux-2.6-lttng.orig/block/cfq-iosched.c 2009-01-18 15:17:32.000000000 -0500 +++ linux-2.6-lttng/block/cfq-iosched.c 2009-01-18 15:46:38.000000000 -0500 @@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2; static const int cfq_slice_sync = HZ / 10; static int cfq_slice_async = HZ / 25; static const int cfq_slice_async_rq = 2; -static int cfq_slice_idle = HZ / 125; +static int cfq_slice_idle = 20; /* * offset from end of service tree It does not make much difference with the standard cfq test : I/O scheduler runt-min (msec) runt-max (msec) cfq (standard) 43 1420 cfq (20ms slice_idle) 31 1573 So, I guess 1.5s delay to run ls on a directory when the cache is cold with a cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10 and 33s response times for the anticipatory, noop and deadline I/O schedulers are. I wonder why on earth is the anticipatory I/O scheduler activated by default with my kernel given it results in so poor interactive behavior when doing large I/O ? 
Thanks for the advices, Mathieu * Part 1 - ls I/O behavior lttv -m textDump -t /traces/block-backmerge \ -e "state.pid=4145&event.subname=bio_queue" block.bio_queue: 662.707321959 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327680048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 662.707331445 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349175018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.968214766 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327696968, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.968222110 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349191938, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971662800 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971670417 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971684184 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697040, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971689854 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971695762 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971701135 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971706301 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971711698 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971723359 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.971729035 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.999391873 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697072, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 662.999397864 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192042, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 670.809328737 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 327697000, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, 
not_uptodate = 0 } block.bio_queue: 670.809337500 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 349191970, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 671.161036834 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360714880, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 671.161047247 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382209850, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 671.653601399 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360712184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 671.653611077 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382207154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } * Part 2 - dd+ls fio job file (test.job5) [job1] rw=write size=10240m direct=0 blocksize=1024k [global] rw=randread size=96k filesize=30m direct=0 bsrange=4k-52k [file1] startdelay=0 [file2] startdelay=4 [file3] startdelay=8 [file4] startdelay=12 [file5] startdelay=16 [file6] startdelay=20 [file7] startdelay=24 [file8] startdelay=28 [file9] startdelay=32 [file10] startdelay=36 [file11] startdelay=40 [file12] startdelay=44 [file13] startdelay=48 [file14] startdelay=52 [file15] startdelay=56 [file16] startdelay=60 [file17] startdelay=64 [file18] startdelay=68 [file19] startdelay=72 [file20] startdelay=76 [file21] startdelay=80 [file22] startdelay=84 [file23] startdelay=88 [file24] startdelay=92 [file25] startdelay=96 [file26] startdelay=100 [file27] startdelay=104 [file28] startdelay=108 [file29] startdelay=112 [file30] startdelay=116 [file31] startdelay=120 [file32] startdelay=124 [file33] startdelay=128 [file34] startdelay=132 [file35] startdelay=134 [file36] startdelay=138 [file37] startdelay=142 [file38] startdelay=146 [file39] startdelay=150 [file40] startdelay=200 [file41] startdelay=260 * Part 3 - batch test (do-tests.sh) #!/bin/sh TESTS="anticipatory noop deadline cfq" for TEST in ${TESTS}; do echo "Running ${TEST}" rm -f file*.0 job*.0 echo ${TEST} > /sys/block/sda/queue/scheduler echo ${TEST} > /sys/block/sdb/queue/scheduler sync echo 3 > /proc/sys/vm/drop_caches sleep 5 ./fio test.job5 --output test.result.${TEST} done -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
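For reference, the slice_idle value patched above can, to my knowledge,
also be changed at run time through cfq's sysfs tunables instead of
recompiling; the device names below are examples, and the value is assumed
to be interpreted in milliseconds by the cfq sysfs interface.

```sh
#!/bin/sh
# Runtime equivalent of the slice_idle change above (assumes cfq is the
# active scheduler; adjust device names to the disks behind the raid-1).
for dev in sda sdb; do
	echo cfq > /sys/block/$dev/queue/scheduler
	cat /sys/block/$dev/queue/iosched/slice_idle
	echo 20 > /sys/block/$dev/queue/iosched/slice_idle
done
```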
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-18 21:12 ` Mathieu Desnoyers @ 2009-01-18 21:27 ` Mathieu Desnoyers 2009-01-19 18:26 ` Jens Axboe 1 sibling, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-18 21:27 UTC (permalink / raw) To: Jens Axboe Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > * Jens Axboe (jens.axboe@oracle.com) wrote: > > On Sat, Jan 17 2009, Mathieu Desnoyers wrote: > > > A long standing I/O regression (since 2.6.18, still there today) has hit > > > Slashdot recently : > > > http://bugzilla.kernel.org/show_bug.cgi?id=12309 > > > http://it.slashdot.org/article.pl?sid=09/01/15/049201 > > > > > > I've taken a trace reproducing the wrong behavior on my machine and I > > > think it's getting us somewhere. > > > > > > LTTng 0.83, kernel 2.6.28 > > > Machine : Intel Xeon E5405 dual quad-core, 16GB ram > > > (just created a new block-trace.c LTTng probe which is not released yet. > > > It basically replaces blktrace) > > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace > > > > > > dd if=/dev/zero of=/tmp/newfile bs=1M count=1M > > > cp -ax music /tmp (copying 1.1GB of mp3) > > > > > > ls (takes 15 seconds to get the directory listing !) > > > > > > lttctl -D trace > > > > > > I looked at the trace (especially at the ls surroundings), and bash is > > > waiting for a few seconds for I/O in the exec system call (to exec ls). > > > > > > While this happens, we have dd doing lots and lots of bio_queue. There > > > is a bio_backmerge after each bio_queue event. This is reasonable, > > > because dd is writing to a contiguous file. > > > > > > However, I wonder if this is not the actual problem. We have dd which > > > has the head request in the elevator request queue. It is progressing > > > steadily by plugging/unplugging the device periodically and gets its > > > work done. However, because requests are being dequeued at the same > > > rate others are being merged, I suspect it stays at the top of the queue > > > and does not let the other unrelated requests run. > > > > > > There is a test in the blk-merge.c which makes sure that merged requests > > > do not get bigger than a certain size. However, if the request is > > > steadily dequeued, I think this test is not doing anything. > > > > > > > > > This patch implements a basic test to make sure we never merge more > > > than 128 requests into the same request if it is the "last_merge" > > > request. I have not been able to trigger the problem again with the > > > fix applied. It might not be in a perfect state : there may be better > > > solutions to the problem, but I think it helps pointing out where the > > > culprit lays. > > > > To be painfully honest, I have no idea what you are attempting to solve > > with this patch. First of all, Linux has always merged any request > > possible. The one-hit cache is just that, a one hit cache frontend for > > merging. We'll be hitting the merge hash and doing the same merge if it > > fails. Since we even cap the size of the request, the merging is also > > bounded. > > > > Hi Jens, > > I was mostly trying to poke around and try to figure out what was going > on in the I/O elevator. Sorry if my first attempts did not make much > sense. Following your advice, I've looked more deeply into the test > cases. 
> > > Furthermore, the request being merged is not considered for IO yet. It > > has not been dispatched by the io scheduler. IOW, I'm surprised your > > patch makes any difference at all. Especially with your 128 limit, since > > 4kbx128kb is 512kb which is the default max merge size anyway. These > > sort of test cases tend to be very sensitive and exhibit different > > behaviour for many runs, so call me a bit skeptical and consider that an > > enouragement to do more directed testing. You could use fio for > > instance. Have two jobs in your job file. One is a dd type process that > > just writes a huge file, the other job starts eg 10 seconds later and > > does a 4kb read of a file. > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > to create a fio job file. The said behavior is appended below as "Part > 1 - ls I/O behavior". Note that the original "ls" test case was done > with the anticipatory I/O scheduler, which was active by default on my > debian system with custom vanilla 2.6.28 kernel. Also note that I am > running this on a raid-1, but have experienced the same problem on a > standard partition I created on the same machine. > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > consists of one dd-like job and many small jobs reading as many data as > ls did. I used the small test script to batch run this ("Part 3 - batch > test"). > > The results for the ls-like jobs are interesting : > > I/O scheduler runt-min (msec) runt-max (msec) > noop 41 10563 > anticipatory 63 8185 > deadline 52 33387 > cfq 43 1420 > > > > As a quick test, could you try and increase the slice_idle to eg 20ms? > > Sometimes I've seen timing being slightly off, which makes us miss the > > sync window for the ls (in your case) process. Then you get a mix of > > async and sync IO all the time, which very much slows down the sync > > process. > > > > Just to confirm, the quick test you are taking about would be : > > --- > block/cfq-iosched.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-2.6-lttng/block/cfq-iosched.c > =================================================================== > --- linux-2.6-lttng.orig/block/cfq-iosched.c 2009-01-18 15:17:32.000000000 -0500 > +++ linux-2.6-lttng/block/cfq-iosched.c 2009-01-18 15:46:38.000000000 -0500 > @@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2; > static const int cfq_slice_sync = HZ / 10; > static int cfq_slice_async = HZ / 25; > static const int cfq_slice_async_rq = 2; > -static int cfq_slice_idle = HZ / 125; > +static int cfq_slice_idle = 20; > > /* > * offset from end of service tree > > > It does not make much difference with the standard cfq test : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (standard) 43 1420 > cfq (20ms slice_idle) 31 1573 > > > So, I guess 1.5s delay to run ls on a directory when the cache is cold > with a cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10 > and 33s response times for the anticipatory, noop and deadline I/O > schedulers are. I wonder why on earth is the anticipatory I/O scheduler > activated by default with my kernel given it results in so poor > interactive behavior when doing large I/O ? > I found out why : I had an old pre-2.6.18 .config hanging around in /boot on _many_ of my systems and upgraded to a newer vanilla kernel using these defaults. make oldconfig left CONFIG_DEFAULT_IOSCHED="anticipatory". Changing to CONFIG_DEFAULT_IOSCHED="cfq" makes everything run better under heavy I/O. 
I bet I'm not the only one in this situation. Mathieu > Thanks for the advices, > > Mathieu > > > > * Part 1 - ls I/O behavior > > lttv -m textDump -t /traces/block-backmerge \ > -e "state.pid=4145&event.subname=bio_queue" > > block.bio_queue: 662.707321959 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327680048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } > block.bio_queue: 662.707331445 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349175018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.968214766 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327696968, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.968222110 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349191938, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971662800 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971670417 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971684184 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697040, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971689854 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971695762 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971701135 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971706301 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971711698 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971723359 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.971729035 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.999391873 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697072, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 662.999397864 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192042, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 670.809328737 (/traces/block-backmerge/block_7), 4145, 4145, 
/bin/ls, , 4063, 0x0, TRAP { sector = 327697000, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 670.809337500 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 349191970, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 671.161036834 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360714880, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } > block.bio_queue: 671.161047247 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382209850, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > block.bio_queue: 671.653601399 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360712184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } > block.bio_queue: 671.653611077 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382207154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } > > > * Part 2 - dd+ls fio job file (test.job5) > > [job1] > rw=write > size=10240m > direct=0 > blocksize=1024k > > [global] > rw=randread > size=96k > filesize=30m > direct=0 > bsrange=4k-52k > > [file1] > startdelay=0 > > [file2] > startdelay=4 > > [file3] > startdelay=8 > > [file4] > startdelay=12 > > [file5] > startdelay=16 > > [file6] > startdelay=20 > > [file7] > startdelay=24 > > [file8] > startdelay=28 > > [file9] > startdelay=32 > > [file10] > startdelay=36 > > [file11] > startdelay=40 > > [file12] > startdelay=44 > > [file13] > startdelay=48 > > [file14] > startdelay=52 > > [file15] > startdelay=56 > > [file16] > startdelay=60 > > [file17] > startdelay=64 > > [file18] > startdelay=68 > > [file19] > startdelay=72 > > [file20] > startdelay=76 > > [file21] > startdelay=80 > > [file22] > startdelay=84 > > [file23] > startdelay=88 > > [file24] > startdelay=92 > > [file25] > startdelay=96 > > [file26] > startdelay=100 > > [file27] > startdelay=104 > > [file28] > startdelay=108 > > [file29] > startdelay=112 > > [file30] > startdelay=116 > > [file31] > startdelay=120 > > [file32] > startdelay=124 > > [file33] > startdelay=128 > > [file34] > startdelay=132 > > [file35] > startdelay=134 > > [file36] > startdelay=138 > > [file37] > startdelay=142 > > [file38] > startdelay=146 > > [file39] > startdelay=150 > > [file40] > startdelay=200 > > [file41] > startdelay=260 > > > * Part 3 - batch test (do-tests.sh) > > #!/bin/sh > > TESTS="anticipatory noop deadline cfq" > > for TEST in ${TESTS}; do > echo "Running ${TEST}" > > rm -f file*.0 job*.0 > > echo ${TEST} > /sys/block/sda/queue/scheduler > echo ${TEST} > /sys/block/sdb/queue/scheduler > sync > echo 3 > /proc/sys/vm/drop_caches > sleep 5 > > ./fio test.job5 --output test.result.${TEST} > done > > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
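A quick way to spot this kind of stale-default situation without digging
through .config is to ask the running kernel directly and, if needed,
override the default at boot. A sketch, with sd* as placeholders:

```sh
#!/bin/sh
# The scheduler shown in [brackets] is the one actually in use per disk.
grep . /sys/block/sd*/queue/scheduler

# Switch to cfq at run time...
for f in /sys/block/sd*/queue/scheduler; do
	echo cfq > $f
done

# ...or make it the boot-time default regardless of CONFIG_DEFAULT_IOSCHED
# by adding this to the kernel command line:
#   elevator=cfq
```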
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-18 21:12 ` Mathieu Desnoyers 2009-01-18 21:27 ` Mathieu Desnoyers @ 2009-01-19 18:26 ` Jens Axboe 2009-01-20 2:10 ` Mathieu Desnoyers 1 sibling, 1 reply; 39+ messages in thread From: Jens Axboe @ 2009-01-19 18:26 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > to create a fio job file. The said behavior is appended below as "Part > 1 - ls I/O behavior". Note that the original "ls" test case was done > with the anticipatory I/O scheduler, which was active by default on my > debian system with custom vanilla 2.6.28 kernel. Also note that I am > running this on a raid-1, but have experienced the same problem on a > standard partition I created on the same machine. > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > consists of one dd-like job and many small jobs reading as many data as > ls did. I used the small test script to batch run this ("Part 3 - batch > test"). > > The results for the ls-like jobs are interesting : > > I/O scheduler runt-min (msec) runt-max (msec) > noop 41 10563 > anticipatory 63 8185 > deadline 52 33387 > cfq 43 1420 Do you have queuing enabled on your drives? You can check that in /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all schedulers, would be good for comparison. raid personalities or dm complicates matters, since it introduces a disconnect between 'ls' and the io scheduler at the bottom... > > As a quick test, could you try and increase the slice_idle to eg 20ms? > > Sometimes I've seen timing being slightly off, which makes us miss the > > sync window for the ls (in your case) process. Then you get a mix of > > async and sync IO all the time, which very much slows down the sync > > process. > > > > Just to confirm, the quick test you are taking about would be : > > --- > block/cfq-iosched.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-2.6-lttng/block/cfq-iosched.c > =================================================================== > --- linux-2.6-lttng.orig/block/cfq-iosched.c 2009-01-18 15:17:32.000000000 -0500 > +++ linux-2.6-lttng/block/cfq-iosched.c 2009-01-18 15:46:38.000000000 -0500 > @@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2; > static const int cfq_slice_sync = HZ / 10; > static int cfq_slice_async = HZ / 25; > static const int cfq_slice_async_rq = 2; > -static int cfq_slice_idle = HZ / 125; > +static int cfq_slice_idle = 20; > > /* > * offset from end of service tree > > > It does not make much difference with the standard cfq test : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (standard) 43 1420 > cfq (20ms slice_idle) 31 1573 OK, that's good at least! > So, I guess 1.5s delay to run ls on a directory when the cache is cold > with a cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10 > and 33s response times for the anticipatory, noop and deadline I/O > schedulers are. I wonder why on earth is the anticipatory I/O scheduler > activated by default with my kernel given it results in so poor > interactive behavior when doing large I/O ? I see you already found out why :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
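In shell terms, the check and the change asked for above would look
roughly like this; sda and sdb are placeholders for the disks behind the
raid-1, and the original values are printed first so they can be restored
after the test.

```sh
#!/bin/sh
# Inspect and disable command queuing for the test run, as suggested above.
for dev in sda sdb; do
	printf "%s queue_depth was: " "$dev"
	cat /sys/block/$dev/device/queue_depth
	echo 1 > /sys/block/$dev/device/queue_depth
done
```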
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-19 18:26 ` Jens Axboe @ 2009-01-20 2:10 ` Mathieu Desnoyers 2009-01-20 7:37 ` Jens Axboe 0 siblings, 1 reply; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 2:10 UTC (permalink / raw) To: Jens Axboe Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Jens Axboe (jens.axboe@oracle.com) wrote: > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > to create a fio job file. The said behavior is appended below as "Part > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > with the anticipatory I/O scheduler, which was active by default on my > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > running this on a raid-1, but have experienced the same problem on a > > standard partition I created on the same machine. > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > consists of one dd-like job and many small jobs reading as many data as > > ls did. I used the small test script to batch run this ("Part 3 - batch > > test"). > > > > The results for the ls-like jobs are interesting : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > noop 41 10563 > > anticipatory 63 8185 > > deadline 52 33387 > > cfq 43 1420 > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did not make much difference (also tried with NO_HZ enabled). > Do you have queuing enabled on your drives? You can check that in > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > schedulers, would be good for comparison. > Here are the tests with a queue_depth of 1 : I/O scheduler runt-min (msec) runt-max (msec) noop 43 38235 anticipatory 44 8728 deadline 51 19751 cfq 48 427 Overall, I wouldn't say it makes much difference. > raid personalities or dm complicates matters, since it introduces a > disconnect between 'ls' and the io scheduler at the bottom... > Yes, ideally I should re-run those directly on the disk partitions. I am also tempted to create a fio job file which acts like a ssh server receiving a connexion after it has been pruned from the cache while the system if doing heavy I/O. "ssh", in this case, seems to be doing much more I/O than a simple "ls", and I think we might want to see if cfq behaves correctly in such case. Most of this I/O is coming from page faults (identified as traps in the trace) probably because the ssh executable has been thrown out of the cache by echo 3 > /proc/sys/vm/drop_caches The behavior of an incoming ssh connexion after clearing the cache is appended below (Part 1 - LTTng trace for incoming ssh connexion). The job file created (Part 2) reads, for each job, a 2MB file with random reads each between 4k-44k. The results are very interesting for cfq : I/O scheduler runt-min (msec) runt-max (msec) noop 586 110242 anticipatory 531 26942 deadline 561 108772 cfq 523 28216 So, basically, ssh being out of the cache can take 28s to answer an incoming ssh connexion even with the cfq scheduler. This is not exactly what I would call an acceptable latency. 
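The ssh-like fio job file itself is not reproduced here, but from the
description above (each job reading about 2MB of a file in random 4k-44k
chunks while the streaming writer runs), a job file in the same style as
the earlier dd+ls one would look roughly like this. This is a
reconstruction for illustration, not the actual "Part 2" file; the job
names and start delays are assumptions.

```
; sketch of the ssh-like test described above (assumed delays and names)
[job1]
rw=write
size=10240m
direct=0
blocksize=1024k

[global]
rw=randread
size=2m
direct=0
bsrange=4k-44k

[ssh1]
startdelay=10

[ssh2]
startdelay=40

[ssh3]
startdelay=70
```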
Mathieu * Part 1 - LTTng trace for incoming ssh connexion block.bio_queue: 14270.987362011 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 12312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14270.987370577 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 21507282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14271.002701211 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 376717312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14271.002708852 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 398212282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14271.994249134 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 376762504, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14271.994258500 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 398257474, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.005047300 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 186581088, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.005054182 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 208076058, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.197046688 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 186581680, size = 45056, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.197056120 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 208076650, size = 45056, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.214463959 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 376983192, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.214469777 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 398478162, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.358980449 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 376983312, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.358986893 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 398478282, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366179882 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504036296, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366188841 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525531266, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366228133 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, 
/usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504037392, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366233770 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525532362, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366245471 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504070144, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366250460 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525565114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366258431 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172624, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366263414 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667594, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366271329 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172640, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366275709 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667610, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366305707 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172664, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366311569 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667634, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366320581 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172680, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366327005 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667650, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366334928 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366339671 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366351578 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172696, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.366356064 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667666, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.394371136 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP 
{ sector = 504172704, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.394378840 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667674, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.394396826 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172744, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.394402397 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667714, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.504393076 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 376762496, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14272.504399733 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 398257466, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.651642743 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376819168, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.651650198 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398314138, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.651668568 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376819192, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.651673473 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398314162, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.813095173 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376930384, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.813103780 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398425354, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.818773204 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376983360, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.818779958 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398478330, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.867827280 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871792, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.867834786 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.867857878 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871816, 
size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14272.867863845 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000933599 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871832, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000941927 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366802, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000962547 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871856, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000967971 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366826, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000988999 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871896, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.000994441 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366866, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.016781818 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557798168, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.016787698 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579293138, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.027449494 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557798264, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.027455846 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579293234, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.079950572 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557801192, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.079957430 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579296162, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.087728033 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557800984, size = 106496, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.087734033 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579295954, size = 106496, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.205730103 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977904, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.205735312 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472874, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.213716615 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557596672, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.213725447 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579091642, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.376105867 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632888, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.376113769 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127858, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390329162 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744176, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390338057 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239146, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390366345 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390371136 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390384775 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744192, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390389617 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239162, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390402469 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744200, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390407113 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239170, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390420125 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390424982 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390432638 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744216, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390436805 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239186, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390462732 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744224, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.390467689 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239194, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.548801789 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744232, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.548812506 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239202, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.548844346 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744256, size = 32768, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.548850571 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239226, size = 32768, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555483129 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978008, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555489558 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472978, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555502566 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978016, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555507462 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472986, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555513691 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978024, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555518362 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472994, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555522790 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555527365 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555531940 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978040, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555536359 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555540953 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555545306 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555549707 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555554228 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555565226 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.555583185 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.556111195 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978072, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.556116436 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473042, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.556132550 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978104, size = 24576, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.556137395 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473074, size = 24576, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.557633755 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979192, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.557639746 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474162, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.557651417 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979240, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.557655782 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474210, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558790122 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978680, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558797670 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473650, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558810157 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558815023 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558826051 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978736, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.558830869 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473706, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.559618325 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978744, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.559624455 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473714, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.559648476 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978760, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.559653673 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473730, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.560470401 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632776, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.560475954 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127746, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.564633093 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557647824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.564639949 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.570412202 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557647944, size = 36864, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.570417494 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142914, size = 36864, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.570432050 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557648024, size = 28672, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.570436544 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142994, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.573250317 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557648112, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.573255825 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579143082, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.573813668 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557648208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.573819380 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579143178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.574357597 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649240, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.574363720 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144210, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.579745509 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632816, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.579750936 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.580137575 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.580143137 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.581782686 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649648, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.581787972 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144618, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.581798890 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649712, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.581803213 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144682, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.583373838 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376980416, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.583379589 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398475386, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.592597554 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376982864, size = 77824, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.592603461 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398477834, size = 77824, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.605484632 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649424, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.605490392 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144394, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.606285537 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376766472, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.606292749 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398261442, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.618255248 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841136, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.618262031 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336106, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.766848612 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957088, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.766854819 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452058, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.779173851 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.779179020 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.956064108 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 383516688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.956073127 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 405011658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14273.963661833 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504172672, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14273.963667482 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525667642, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.105890774 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857200, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.105897887 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352170, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.114466614 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 639844352, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14274.114471721 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 661339322, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.194546003 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857392, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.194551112 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352362, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.195244833 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978584, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.195250131 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473554, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.342679172 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.342686069 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.342702066 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977864, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.342706689 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472834, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514308041 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979128, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514316219 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474098, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514332549 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979144, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514337418 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514354278 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979160, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514358806 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474130, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514371841 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979176, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.514376353 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474146, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.671607720 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 110366736, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.671614533 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 131861706, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.688855653 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841144, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14274.688861789 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.710775517 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957224, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14274.710783249 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452194, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.711178453 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504036272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14274.711185887 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525531242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14275.753947620 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557727992, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14275.753956191 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579222962, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14275.891101527 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 558242792, size = 4096, 
rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14275.891109390 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579737762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.054306664 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566165504, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14276.054312781 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660474, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.202061219 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169560, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.202067900 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664530, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.343169743 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169656, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.343177097 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664626, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.435036005 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566171584, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.435042329 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587666554, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.587967625 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170576, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.587975446 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665546, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.714877542 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566171080, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.714885441 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587666050, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.885331923 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14276.885338400 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.041004774 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 
566170696, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.041011242 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665666, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.090024321 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170760, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.090030807 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665730, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.139160617 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170792, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.139166503 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.146527238 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170808, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.146532806 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665778, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.147041642 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170816, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.147046664 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.147056378 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170832, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.147060909 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665802, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.149654636 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504086544, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.149661995 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525581514, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.299441568 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566165512, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.299449098 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660482, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.316058849 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, 
SYSCALL { sector = 566165608, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.316064702 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660578, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.316655231 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566167536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.316661231 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587662506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.319198772 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566168544, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.319204644 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587663514, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.325427594 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.325432190 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.327980237 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169296, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.327985268 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664266, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.329234978 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169168, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.329239811 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664138, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.330769742 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169104, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.330775631 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664074, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.331300113 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169136, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.331305777 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664106, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.331634685 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, 
, 4159, 0x0, SYSCALL { sector = 566169120, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.331640664 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664090, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.332191280 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169112, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.332198036 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664082, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.332857870 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641990688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14277.332863016 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 663485658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.339925356 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504086552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.339930549 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525581522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.350000251 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503840960, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14277.350007112 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525335930, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.360440736 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503844888, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.360446037 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525339858, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.417649469 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841152, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14277.417655383 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336122, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418058555 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957240, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418063403 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452210, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418555076 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 
5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418560377 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418570217 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957280, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418574897 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452250, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418581063 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957288, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418585764 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452258, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418590078 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957296, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418594614 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452266, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418598451 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957304, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418602756 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452274, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418606908 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418611238 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418615216 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957320, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418619527 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452290, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418623322 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957328, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418627663 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452298, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418836246 
(/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957336, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.418841193 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452306, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.419381341 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957344, size = 65536, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.419386225 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452314, size = 65536, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.419849133 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957472, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.419853747 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452442, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.576690908 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 110510128, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.576698949 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 132005098, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.588845789 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503988328, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.588852656 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525483298, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.601952879 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503873536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14277.601959539 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525368506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.060232543 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376983048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.060241912 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398478018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.064129159 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.064138655 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 
14278.071310370 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504037776, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.071330264 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525532746, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.080891196 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503939072, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.080897109 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525434042, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084320641 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947512, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084328574 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442482, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084343616 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084348755 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084358266 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947568, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084363390 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442538, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084378252 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947576, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.084383308 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442546, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.096592889 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947584, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.096599909 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442554, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.096953622 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376946984, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.096958890 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398441954, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.101879473 
(/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503955464, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.101885305 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525450434, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.118154240 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503971864, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.118162137 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525466834, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.126133387 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503988608, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.126139687 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525483578, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.136351623 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857280, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.136357399 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352250, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.138499766 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169080, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.138506375 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664050, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139160026 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139165315 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139782848 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169072, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139788161 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664042, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139799535 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169088, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.139804017 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664058, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 
14278.141005857 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841632, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.141012172 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336602, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.149367501 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 503956240, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.149373775 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525451210, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.155173707 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 315408384, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.155179359 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 336903354, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.169842985 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483393984, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.169849091 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504888954, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.180896269 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483400808, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.180903577 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504895778, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.184431117 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483795656, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.184437162 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 505290626, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.209624125 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 503923064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.209631628 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525418034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.221083451 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503873552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.221090019 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525368522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } 
block.bio_queue: 14278.318767351 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 640040968, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 } block.bio_queue: 14278.318773435 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 661535938, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.325009226 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641367208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.325014566 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 662862178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.330573352 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641367216, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } block.bio_queue: 14278.330579649 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 662862186, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 } * Part 2 - ssh connexion job file (test.job.ssh) [job1] rw=write size=10240m direct=0 blocksize=1024k [global] rw=randread size=2048k filesize=30m direct=0 bsrange=4k-44k [file1] startdelay=0 [file2] startdelay=4 [file3] startdelay=8 [file4] startdelay=12 [file5] startdelay=16 [file6] startdelay=20 [file7] startdelay=24 [file8] startdelay=28 [file9] startdelay=32 [file10] startdelay=36 [file11] startdelay=40 [file12] startdelay=44 [file13] startdelay=48 [file14] startdelay=52 [file15] startdelay=56 [file16] startdelay=60 [file17] startdelay=64 [file18] startdelay=68 [file19] startdelay=72 [file20] startdelay=76 [file21] startdelay=80 [file22] startdelay=84 [file23] startdelay=88 [file24] startdelay=92 [file25] startdelay=96 [file26] startdelay=100 [file27] startdelay=104 [file28] startdelay=108 [file29] startdelay=112 [file30] startdelay=116 [file31] startdelay=120 [file32] startdelay=124 [file33] startdelay=128 [file34] startdelay=132 [file35] startdelay=134 [file36] startdelay=138 [file37] startdelay=142 [file38] startdelay=146 [file39] startdelay=150 [file40] startdelay=200 [file41] startdelay=260 -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 2:10 ` Mathieu Desnoyers @ 2009-01-20 7:37 ` Jens Axboe 2009-01-20 12:28 ` Jens Axboe ` (2 more replies) 0 siblings, 3 replies; 39+ messages in thread From: Jens Axboe @ 2009-01-20 7:37 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > * Jens Axboe (jens.axboe@oracle.com) wrote: > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > to create a fio job file. The said behavior is appended below as "Part > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > with the anticipatory I/O scheduler, which was active by default on my > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > running this on a raid-1, but have experienced the same problem on a > > > standard partition I created on the same machine. > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > consists of one dd-like job and many small jobs reading as many data as > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > test"). > > > > > > The results for the ls-like jobs are interesting : > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > noop 41 10563 > > > anticipatory 63 8185 > > > deadline 52 33387 > > > cfq 43 1420 > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > not make much difference (also tried with NO_HZ enabled). > > > Do you have queuing enabled on your drives? You can check that in > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > schedulers, would be good for comparison. > > > > Here are the tests with a queue_depth of 1 : > > I/O scheduler runt-min (msec) runt-max (msec) > noop 43 38235 > anticipatory 44 8728 > deadline 51 19751 > cfq 48 427 > > > Overall, I wouldn't say it makes much difference. 0,5 seconds vs 1,5 seconds isn't much of a difference? > > raid personalities or dm complicates matters, since it introduces a > > disconnect between 'ls' and the io scheduler at the bottom... > > > > Yes, ideally I should re-run those directly on the disk partitions. At least for comparison. > I am also tempted to create a fio job file which acts like a ssh server > receiving a connexion after it has been pruned from the cache while the > system if doing heavy I/O. "ssh", in this case, seems to be doing much > more I/O than a simple "ls", and I think we might want to see if cfq > behaves correctly in such case. Most of this I/O is coming from page > faults (identified as traps in the trace) probably because the ssh > executable has been thrown out of the cache by > > echo 3 > /proc/sys/vm/drop_caches > > The behavior of an incoming ssh connexion after clearing the cache is > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > job file created (Part 2) reads, for each job, a 2MB file with random > reads each between 4k-44k. The results are very interesting for cfq : > > I/O scheduler runt-min (msec) runt-max (msec) > noop 586 110242 > anticipatory 531 26942 > deadline 561 108772 > cfq 523 28216 > > So, basically, ssh being out of the cache can take 28s to answer an > incoming ssh connexion even with the cfq scheduler. This is not exactly > what I would call an acceptable latency. 
At some point, you have to stop and consider what is acceptable performance for a given IO pattern. Your ssh test case is purely random IO, and neither CFQ nor AS would do any idling for that. We can make this test case faster for sure; the hard part is making sure that we don't regress on async throughput at the same time. Also remember that with your raid1, it's not entirely reasonable to blame all performance issues on the IO scheduler as per my previous mail. It would be a lot fairer to view the disk numbers individually. Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set to 1 as well? However, I think we should be doing somewhat better at this test case. -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
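The tunables Jens asks about here are ordinary sysfs attributes; later messages in the thread set them with one-line echo commands (quantum, slice_async_rq, and the device queue_depth). Purely as an illustrative aside, the same experiment can be scripted. A minimal C sketch follows, assuming the disk is sda as in the tests below (the raid1 case would also need the matching sdb paths) and that CFQ is the active scheduler so the iosched files exist:

/*
 * Minimal sketch: set the CFQ tunables discussed above to 1.
 * Equivalent to:
 *   echo 1 > /sys/block/sda/queue/iosched/quantum
 *   echo 1 > /sys/block/sda/queue/iosched/slice_async_rq
 *   echo 1 > /sys/block/sda/device/queue_depth
 * The sda paths are an assumption taken from the tests in this thread.
 */
#include <stdio.h>

static int write_sysfs(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* cap how many requests CFQ dispatches to the driver in one go */
	write_sysfs("/sys/block/sda/queue/iosched/quantum", "1");
	/* cap async (writeback) requests dispatched within one slice */
	write_sysfs("/sys/block/sda/queue/iosched/slice_async_rq", "1");
	/* disable command queuing at the device level */
	write_sysfs("/sys/block/sda/device/queue_depth", "1");
	return 0;
}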
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 7:37 ` Jens Axboe @ 2009-01-20 12:28 ` Jens Axboe 2009-01-20 14:22 ` [ltt-dev] " Mathieu Desnoyers ` (2 more replies) 2009-01-20 13:45 ` [ltt-dev] " Mathieu Desnoyers 2009-01-20 20:22 ` Ben Gamari 2 siblings, 3 replies; 39+ messages in thread From: Jens Axboe @ 2009-01-20 12:28 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Tue, Jan 20 2009, Jens Axboe wrote: > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > to create a fio job file. The said behavior is appended below as "Part > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > running this on a raid-1, but have experienced the same problem on a > > > > standard partition I created on the same machine. > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > consists of one dd-like job and many small jobs reading as many data as > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > test"). > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 41 10563 > > > > anticipatory 63 8185 > > > > deadline 52 33387 > > > > cfq 43 1420 > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > not make much difference (also tried with NO_HZ enabled). > > > > > Do you have queuing enabled on your drives? You can check that in > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > schedulers, would be good for comparison. > > > > > > > Here are the tests with a queue_depth of 1 : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > noop 43 38235 > > anticipatory 44 8728 > > deadline 51 19751 > > cfq 48 427 > > > > > > Overall, I wouldn't say it makes much difference. > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > raid personalities or dm complicates matters, since it introduces a > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > At least for comparison. > > > I am also tempted to create a fio job file which acts like a ssh server > > receiving a connexion after it has been pruned from the cache while the > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > more I/O than a simple "ls", and I think we might want to see if cfq > > behaves correctly in such case. Most of this I/O is coming from page > > faults (identified as traps in the trace) probably because the ssh > > executable has been thrown out of the cache by > > > > echo 3 > /proc/sys/vm/drop_caches > > > > The behavior of an incoming ssh connexion after clearing the cache is > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > job file created (Part 2) reads, for each job, a 2MB file with random > > reads each between 4k-44k. 
The results are very interesting for cfq : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > noop 586 110242 > > anticipatory 531 26942 > > deadline 561 108772 > > cfq 523 28216 > > > > So, basically, ssh being out of the cache can take 28s to answer an > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > what I would call an acceptable latency. > > At some point, you have to stop and consider what is acceptable > performance for a given IO pattern. Your ssh test case is purely random > IO, and neither CFQ nor AS would do any idling for that. We can make > this test case faster for sure, the hard part is making sure that we > don't regress on async throughput at the same time. > > Also remember that with your raid1, it's not entirely reasonable to > blaim all performance issues on the IO scheduler as per my previous > mail. It would be a lot more fair to view the disk numbers individually. > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > to 1 as well? > > However, I think we should be doing somewhat better at this test case. Mathieu, does this improve anything for you? diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index e8525fa..a556512 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1765,6 +1765,32 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq, } /* + * Pull dispatched requests from 'cfqq' back into the scheduler + */ +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, + struct cfq_queue *cfqq) +{ + struct request_queue *q = cfqd->queue; + struct request *rq, *tmp; + + list_for_each_entry_safe(rq, tmp, &q->queue_head, queuelist) { + if ((rq->cmd_flags & REQ_STARTED) || RQ_CFQQ(rq) != cfqq) + continue; + + /* + * Pull off the dispatch list and put it back into the cfqq + */ + list_del(&rq->queuelist); + cfqq->dispatched--; + if (cfq_cfqq_sync(cfqq)) + cfqd->sync_flight--; + + list_add_tail(&rq->queuelist, &cfqq->fifo); + cfq_add_rq_rb(rq); + } +} + +/* * Check if new_cfqq should preempt the currently active queue. Return 0 for * no or if we aren't sure, a 1 will cause a preempt. */ @@ -1820,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, */ static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) { + struct cfq_queue *old_cfqq = cfqd->active_queue; + cfq_log_cfqq(cfqd, cfqq, "preempt"); - cfq_slice_expired(cfqd, 1); + + if (old_cfqq) { + __cfq_slice_expired(cfqd, old_cfqq, 1); + cfq_pull_dispatched_requests(cfqd, old_cfqq); + } /* * Put the new queue at the front of the of the current list, -- Jens Axboe ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 12:28 ` Jens Axboe @ 2009-01-20 14:22 ` Mathieu Desnoyers 2009-01-20 14:24 ` Jens Axboe 2009-01-20 23:27 ` Mathieu Desnoyers 2009-02-02 2:08 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2 siblings, 1 reply; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 14:22 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel * Jens Axboe (jens.axboe@oracle.com) wrote: > On Tue, Jan 20 2009, Jens Axboe wrote: > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > standard partition I created on the same machine. > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > test"). > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 41 10563 > > > > > anticipatory 63 8185 > > > > > deadline 52 33387 > > > > > cfq 43 1420 > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > schedulers, would be good for comparison. > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > noop 43 38235 > > > anticipatory 44 8728 > > > deadline 51 19751 > > > cfq 48 427 > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > At least for comparison. > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > receiving a connexion after it has been pruned from the cache while the > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > behaves correctly in such case. Most of this I/O is coming from page > > > faults (identified as traps in the trace) probably because the ssh > > > executable has been thrown out of the cache by > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). 
The > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > noop 586 110242 > > > anticipatory 531 26942 > > > deadline 561 108772 > > > cfq 523 28216 > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > what I would call an acceptable latency. > > > > At some point, you have to stop and consider what is acceptable > > performance for a given IO pattern. Your ssh test case is purely random > > IO, and neither CFQ nor AS would do any idling for that. We can make > > this test case faster for sure, the hard part is making sure that we > > don't regress on async throughput at the same time. > > > > Also remember that with your raid1, it's not entirely reasonable to > > blaim all performance issues on the IO scheduler as per my previous > > mail. It would be a lot more fair to view the disk numbers individually. > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > to 1 as well? > > > > However, I think we should be doing somewhat better at this test case. > > Mathieu, does this improve anything for you? > I got this message when running with your patch applied : cfq: forced dispatching is broken (nr_sorted=4294967275), please report this (message appeared 10 times in a job run) Here is the result : ssh test done on /dev/sda directly queue_depth=31 (default) /sys/block/sda/queue/iosched/slice_async_rq = 2 (default) /sys/block/sda/queue/iosched/quantum = 4 (default) I/O scheduler runt-min (msec) runt-max (msec) cfq (default) 523 6637 cfq (patched) 564 7195 Pretty much the same. Here is the test done on raid1 : queue_depth=31 (default) /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) I/O scheduler runt-min (msec) runt-max (msec) cfq (default, raid1) 523 28216 cfq (patched, raid1) 540 16454 With nearly same order of magnitude worse-case. Mathieu > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c > index e8525fa..a556512 100644 > --- a/block/cfq-iosched.c > +++ b/block/cfq-iosched.c > @@ -1765,6 +1765,32 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq, > } > > /* > + * Pull dispatched requests from 'cfqq' back into the scheduler > + */ > +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, > + struct cfq_queue *cfqq) > +{ > + struct request_queue *q = cfqd->queue; > + struct request *rq, *tmp; > + > + list_for_each_entry_safe(rq, tmp, &q->queue_head, queuelist) { > + if ((rq->cmd_flags & REQ_STARTED) || RQ_CFQQ(rq) != cfqq) > + continue; > + > + /* > + * Pull off the dispatch list and put it back into the cfqq > + */ > + list_del(&rq->queuelist); > + cfqq->dispatched--; > + if (cfq_cfqq_sync(cfqq)) > + cfqd->sync_flight--; > + > + list_add_tail(&rq->queuelist, &cfqq->fifo); > + cfq_add_rq_rb(rq); > + } > +} > + > +/* > * Check if new_cfqq should preempt the currently active queue. Return 0 for > * no or if we aren't sure, a 1 will cause a preempt. 
> */ > @@ -1820,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, > */ > static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) > { > + struct cfq_queue *old_cfqq = cfqd->active_queue; > + > cfq_log_cfqq(cfqd, cfqq, "preempt"); > - cfq_slice_expired(cfqd, 1); > + > + if (old_cfqq) { > + __cfq_slice_expired(cfqd, old_cfqq, 1); > + cfq_pull_dispatched_requests(cfqd, old_cfqq); > + } > > /* > * Put the new queue at the front of the of the current list, > > -- > Jens Axboe > > > _______________________________________________ > ltt-dev mailing list > ltt-dev@lists.casi.polymtl.ca > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 14:22 ` [ltt-dev] " Mathieu Desnoyers @ 2009-01-20 14:24 ` Jens Axboe 2009-01-20 15:42 ` Mathieu Desnoyers 0 siblings, 1 reply; 39+ messages in thread From: Jens Axboe @ 2009-01-20 14:24 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel On Tue, Jan 20 2009, Mathieu Desnoyers wrote: > * Jens Axboe (jens.axboe@oracle.com) wrote: > > On Tue, Jan 20 2009, Jens Axboe wrote: > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > > standard partition I created on the same machine. > > > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > > test"). > > > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > noop 41 10563 > > > > > > anticipatory 63 8185 > > > > > > deadline 52 33387 > > > > > > cfq 43 1420 > > > > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > > schedulers, would be good for comparison. > > > > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 43 38235 > > > > anticipatory 44 8728 > > > > deadline 51 19751 > > > > cfq 48 427 > > > > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > > > At least for comparison. > > > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > > receiving a connexion after it has been pruned from the cache while the > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > > behaves correctly in such case. Most of this I/O is coming from page > > > > faults (identified as traps in the trace) probably because the ssh > > > > executable has been thrown out of the cache by > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). 
The > > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 586 110242 > > > > anticipatory 531 26942 > > > > deadline 561 108772 > > > > cfq 523 28216 > > > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > > what I would call an acceptable latency. > > > > > > At some point, you have to stop and consider what is acceptable > > > performance for a given IO pattern. Your ssh test case is purely random > > > IO, and neither CFQ nor AS would do any idling for that. We can make > > > this test case faster for sure, the hard part is making sure that we > > > don't regress on async throughput at the same time. > > > > > > Also remember that with your raid1, it's not entirely reasonable to > > > blaim all performance issues on the IO scheduler as per my previous > > > mail. It would be a lot more fair to view the disk numbers individually. > > > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > > to 1 as well? > > > > > > However, I think we should be doing somewhat better at this test case. > > > > Mathieu, does this improve anything for you? > > > > I got this message when running with your patch applied : > cfq: forced dispatching is broken (nr_sorted=4294967275), please report this > (message appeared 10 times in a job run) Woops, missed a sort inc. Updated version below, or just ignore the warning. > Here is the result : > > ssh test done on /dev/sda directly > > queue_depth=31 (default) > /sys/block/sda/queue/iosched/slice_async_rq = 2 (default) > /sys/block/sda/queue/iosched/quantum = 4 (default) > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (default) 523 6637 > cfq (patched) 564 7195 > > Pretty much the same. Can you retry with depth=1 as well? There's not much to rip back out, if everything is immediately sent to the device. > > Here is the test done on raid1 : > queue_depth=31 (default) > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (default, raid1) 523 28216 > cfq (patched, raid1) 540 16454 > > With nearly same order of magnitude worse-case. diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index e8525fa..30714de 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1765,6 +1765,36 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq, } /* + * Pull dispatched requests from 'cfqq' back into the scheduler + */ +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, + struct cfq_queue *cfqq) +{ + struct request_queue *q = cfqd->queue; + struct request *rq; + + list_for_each_entry_reverse(rq, &q->queue_head, queuelist) { + if (rq->cmd_flags & REQ_STARTED) + break; + + if (RQ_CFQQ(rq) != cfqq) + continue; + + /* + * Pull off the dispatch list and put it back into the cfqq + */ + list_del(&rq->queuelist); + cfqq->dispatched--; + if (cfq_cfqq_sync(cfqq)) + cfqd->sync_flight--; + + cfq_add_rq_rb(rq); + q->nr_sorted++; + list_add_tail(&rq->queuelist, &cfqq->fifo); + } +} + +/* * Check if new_cfqq should preempt the currently active queue. Return 0 for * no or if we aren't sure, a 1 will cause a preempt. 
*/ @@ -1820,8 +1850,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, */ static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) { + struct cfq_queue *old_cfqq = cfqd->active_queue; + cfq_log_cfqq(cfqd, cfqq, "preempt"); - cfq_slice_expired(cfqd, 1); + + if (old_cfqq) { + __cfq_slice_expired(cfqd, old_cfqq, 1); + cfq_pull_dispatched_requests(cfqd, old_cfqq); + } /* * Put the new queue at the front of the of the current list, -- Jens Axboe ^ permalink raw reply related [flat|nested] 39+ messages in thread
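An aside on the odd number in the warning Mathieu quoted: nr_sorted is an unsigned counter of requests held in the elevator's sorted structures, and 4294967275 is 2^32 - 21, i.e. the counter was driven 21 decrements below zero. That is consistent with the "missed a sort inc" above: a pulled-back request is re-added to the cfqq without bumping nr_sorted, then decremented again when it is dispatched a second time, which is what the added q->nr_sorted++ repairs. A tiny standalone C illustration of the wrap-around (the 21 is just what that particular run happened to hit):

/*
 * Why the warning printed 4294967275: an unsigned int driven 21
 * decrements below zero wraps around to 2^32 - 21.
 */
#include <stdio.h>

int main(void)
{
	unsigned int nr_sorted = 0;
	int i;

	/* 21 "extra" decrements, i.e. second dispatches of requests that
	 * were pulled back without a matching increment */
	for (i = 0; i < 21; i++)
		nr_sorted--;

	printf("nr_sorted = %u\n", nr_sorted);	/* prints 4294967275 */
	return 0;
}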
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 14:24 ` Jens Axboe @ 2009-01-20 15:42 ` Mathieu Desnoyers 2009-01-20 23:06 ` Mathieu Desnoyers 0 siblings, 1 reply; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 15:42 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel * Jens Axboe (jens.axboe@oracle.com) wrote: > On Tue, Jan 20 2009, Mathieu Desnoyers wrote: > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > On Tue, Jan 20 2009, Jens Axboe wrote: > > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > > > standard partition I created on the same machine. > > > > > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > > > test"). > > > > > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > > noop 41 10563 > > > > > > > anticipatory 63 8185 > > > > > > > deadline 52 33387 > > > > > > > cfq 43 1420 > > > > > > > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > > > schedulers, would be good for comparison. > > > > > > > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 43 38235 > > > > > anticipatory 44 8728 > > > > > deadline 51 19751 > > > > > cfq 48 427 > > > > > > > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > > > > > At least for comparison. > > > > > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > > > receiving a connexion after it has been pruned from the cache while the > > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > > > behaves correctly in such case. 
Most of this I/O is coming from page > > > > > faults (identified as traps in the trace) probably because the ssh > > > > > executable has been thrown out of the cache by > > > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 586 110242 > > > > > anticipatory 531 26942 > > > > > deadline 561 108772 > > > > > cfq 523 28216 > > > > > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > > > what I would call an acceptable latency. > > > > > > > > At some point, you have to stop and consider what is acceptable > > > > performance for a given IO pattern. Your ssh test case is purely random > > > > IO, and neither CFQ nor AS would do any idling for that. We can make > > > > this test case faster for sure, the hard part is making sure that we > > > > don't regress on async throughput at the same time. > > > > > > > > Also remember that with your raid1, it's not entirely reasonable to > > > > blaim all performance issues on the IO scheduler as per my previous > > > > mail. It would be a lot more fair to view the disk numbers individually. > > > > > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > > > to 1 as well? > > > > > > > > However, I think we should be doing somewhat better at this test case. > > > > > > Mathieu, does this improve anything for you? > > > > > > > I got this message when running with your patch applied : > > cfq: forced dispatching is broken (nr_sorted=4294967275), please report this > > (message appeared 10 times in a job run) > > Woops, missed a sort inc. Updated version below, or just ignore the > warning. > > > Here is the result : > > > > ssh test done on /dev/sda directly > > > > queue_depth=31 (default) > > /sys/block/sda/queue/iosched/slice_async_rq = 2 (default) > > /sys/block/sda/queue/iosched/quantum = 4 (default) > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (default) 523 6637 > > cfq (patched) 564 7195 > > > > Pretty much the same. > > Can you retry with depth=1 as well? There's not much to rip back out, if > everything is immediately sent to the device. > echo 1 > /sys/block/sda/queue/iosched/quantum echo 1 > /sys/block/sda/queue/iosched/slice_async_rq echo 1 > /sys/block/sda/device/queue_depth ssh test done on /dev/sda directly oops, something wrong in the new patch ? 
[ 302.077063] BUG: unable to handle kernel paging request at 00000008 [ 302.078732] IP: [<ffffffff8040a1e5>] cfq_remove_request+0x35/0x1d0 [ 302.078732] PGD 43ac76067 PUD 43b1f3067 PMD 0 [ 302.078732] Oops: 0002 [#1] PREEMPT SMP [ 302.078732] LTT NESTING LEVEL : 0 [ 302.078732] last sysfs file: /sys/block/sda/stat [ 302.078732] Dumping ftrace buffer: [ 302.078732] (ftrace buffer empty) [ 302.078732] CPU 0 [ 302.078732] Modules linked in: e1000e loop ltt_tracer ltt_trace_control ltt_e [ 302.078732] Pid: 3748, comm: cron Not tainted 2.6.28 #53 [ 302.078732] RIP: 0010:[<ffffffff8040a1e5>] [<ffffffff8040a1e5>] cfq_remove_0 [ 302.078732] RSP: 0018:ffff8804388a38a8 EFLAGS: 00010087 [ 302.078732] RAX: 0000000000200200 RBX: ffff880437d92000 RCX: 000000002bcde392 [ 302.078732] RDX: 0000000000100100 RSI: ffff880437d92fd0 RDI: ffff880437d92fd0 [ 302.078732] RBP: ffff8804388a38d8 R08: ffff88043e8ce608 R09: 000000002bcdb78a [ 302.078732] R10: 000000002bcdbb8a R11: 0000000000000808 R12: ffff88043e8ce5d8 [ 302.078732] R13: ffff880437d92fd0 R14: ffff88043e433800 R15: ffff88043e8ce5d8 [ 302.078732] FS: 00007fd9637ea780(0000) GS:ffffffff808de7c0(0000) knlGS:00000 [ 302.078732] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 302.078732] CR2: 0000000000100108 CR3: 000000043ad52000 CR4: 00000000000006e0 [ 302.078732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 302.078732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 302.078732] Process cron (pid: 3748, threadinfo ffff8804388a2000, task ffff8) [ 302.078732] Stack: [ 302.078732] ffff88043e8ce5e8 ffff880437d92fd0 ffff88043e8ce5d8 ffff88043d550 [ 302.078732] ffff88043e433800 ffff88043e433800 ffff8804388a3908 ffffffff8040d [ 302.078732] ffff88043e8ce5d8 ffff88043e433800 ffff880437d92fd0 ffff88043e8c8 [ 302.078732] Call Trace: [ 302.078732] [<ffffffff8040a3bd>] cfq_dispatch_insert+0x3d/0x70 [ 302.078732] [<ffffffff8040a43c>] cfq_add_rq_rb+0x4c/0xb0 [ 302.078732] [<ffffffff8040ab6f>] cfq_insert_request+0x24f/0x420 [ 302.078732] [<ffffffff803fac30>] elv_insert+0x160/0x2f0 [ 302.078732] [<ffffffff803fae3b>] __elv_add_request+0x7b/0xd0 [ 302.078732] [<ffffffff803fe02d>] __make_request+0xfd/0x4f0 [ 302.078732] [<ffffffff803fc39c>] generic_make_request+0x40c/0x550 [ 302.078732] [<ffffffff8029ccab>] ? mempool_alloc+0x5b/0x150 [ 302.078732] [<ffffffff802f54c8>] ? __find_get_block+0xc8/0x210 [ 302.078732] [<ffffffff803fc582>] submit_bio+0xa2/0x150 [ 302.078732] [<ffffffff802fa75e>] ? bio_alloc_bioset+0x5e/0x100 [ 302.078732] [<ffffffff802f4d26>] submit_bh+0xf6/0x130 [ 302.078732] [<ffffffff8032fbc4>] __ext3_get_inode_loc+0x224/0x340 [ 302.078732] [<ffffffff8032fd40>] ext3_iget+0x60/0x420 [ 302.078732] [<ffffffff80336e68>] ext3_lookup+0xa8/0x100 [ 302.078732] [<ffffffff802e3d46>] ? d_alloc+0x186/0x1f0 [ 302.078732] [<ffffffff802d92a6>] do_lookup+0x206/0x260 [ 302.078732] [<ffffffff802db4f6>] __link_path_walk+0x756/0xfe0 [ 302.078732] [<ffffffff80262cd4>] ? get_lock_stats+0x34/0x70 [ 302.078732] [<ffffffff802dc16b>] ? do_path_lookup+0x9b/0x200 [ 302.078732] [<ffffffff802dbf9e>] path_walk+0x6e/0xe0 [ 302.078732] [<ffffffff802dc176>] do_path_lookup+0xa6/0x200 [ 302.078732] [<ffffffff802dad36>] ? getname+0x1c6/0x230 [ 302.078732] [<ffffffff802dd02b>] user_path_at+0x7b/0xb0 [ 302.078732] [<ffffffff8067d3a7>] ? _spin_unlock_irqrestore+0x47/0x80 [ 302.078732] [<ffffffff80259ad3>] ? hrtimer_try_to_cancel+0x53/0xb0 [ 302.078732] [<ffffffff80259b52>] ? 
hrtimer_cancel+0x22/0x30 [ 302.078732] [<ffffffff802d414d>] vfs_stat_fd+0x2d/0x60 [ 302.078732] [<ffffffff802d422c>] sys_newstat+0x2c/0x50 [ 302.078732] [<ffffffff80265901>] ? trace_hardirqs_on_caller+0x1b1/0x210 [ 302.078732] [<ffffffff8067cd0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 302.078732] [<ffffffff8020c5db>] system_call_fastpath+0x16/0x1b [ 302.078732] Code: 41 54 53 48 83 ec 08 0f 1f 44 00 00 4c 8b bf c0 00 00 00 4 [ 302.078732] RIP [<ffffffff8040a1e5>] cfq_remove_request+0x35/0x1d0 [ 302.078732] RSP <ffff8804388a38a8> [ 302.078732] CR2: 0000000000100108 [ 302.078732] ---[ end trace 925e67a354a83fdc ]--- [ 302.078732] note: cron[3748] exited with preempt_count 1 > > > > Here is the test done on raid1 : > > queue_depth=31 (default) > > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (default, raid1) 523 28216 > > cfq (patched, raid1) 540 16454 > > > > With nearly same order of magnitude worse-case. > > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c > index e8525fa..30714de 100644 > --- a/block/cfq-iosched.c > +++ b/block/cfq-iosched.c > @@ -1765,6 +1765,36 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq, > } > > /* > + * Pull dispatched requests from 'cfqq' back into the scheduler > + */ > +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, > + struct cfq_queue *cfqq) > +{ > + struct request_queue *q = cfqd->queue; > + struct request *rq; > + > + list_for_each_entry_reverse(rq, &q->queue_head, queuelist) { > + if (rq->cmd_flags & REQ_STARTED) > + break; > + > + if (RQ_CFQQ(rq) != cfqq) > + continue; > + > + /* > + * Pull off the dispatch list and put it back into the cfqq > + */ > + list_del(&rq->queuelist); > + cfqq->dispatched--; > + if (cfq_cfqq_sync(cfqq)) > + cfqd->sync_flight--; > + > + cfq_add_rq_rb(rq); > + q->nr_sorted++; > + list_add_tail(&rq->queuelist, &cfqq->fifo); > + } > +} > + > +/* > * Check if new_cfqq should preempt the currently active queue. Return 0 for > * no or if we aren't sure, a 1 will cause a preempt. > */ > @@ -1820,8 +1850,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq, > */ > static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) > { > + struct cfq_queue *old_cfqq = cfqd->active_queue; > + > cfq_log_cfqq(cfqd, cfqq, "preempt"); > - cfq_slice_expired(cfqd, 1); > + > + if (old_cfqq) { > + __cfq_slice_expired(cfqd, old_cfqq, 1); > + cfq_pull_dispatched_requests(cfqd, old_cfqq); > + } > > /* > * Put the new queue at the front of the of the current list, > > -- > Jens Axboe > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 15:42 ` Mathieu Desnoyers @ 2009-01-20 23:06 ` Mathieu Desnoyers 0 siblings, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 23:06 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel * Mathieu Desnoyers (compudj@krystal.dyndns.org) wrote: > * Jens Axboe (jens.axboe@oracle.com) wrote: > > On Tue, Jan 20 2009, Mathieu Desnoyers wrote: > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > On Tue, Jan 20 2009, Jens Axboe wrote: > > > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > > > > standard partition I created on the same machine. > > > > > > > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > > > > test"). > > > > > > > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > > > noop 41 10563 > > > > > > > > anticipatory 63 8185 > > > > > > > > deadline 52 33387 > > > > > > > > cfq 43 1420 > > > > > > > > > > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > > > > schedulers, would be good for comparison. > > > > > > > > > > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > noop 43 38235 > > > > > > anticipatory 44 8728 > > > > > > deadline 51 19751 > > > > > > cfq 48 427 > > > > > > > > > > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > > > > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > > > > > > > At least for comparison. > > > > > > > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > > > > receiving a connexion after it has been pruned from the cache while the > > > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > > > > behaves correctly in such case. 
Most of this I/O is coming from page > > > > > > faults (identified as traps in the trace) probably because the ssh > > > > > > executable has been thrown out of the cache by > > > > > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > > > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > noop 586 110242 > > > > > > anticipatory 531 26942 > > > > > > deadline 561 108772 > > > > > > cfq 523 28216 > > > > > > > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > > > > what I would call an acceptable latency. > > > > > > > > > > At some point, you have to stop and consider what is acceptable > > > > > performance for a given IO pattern. Your ssh test case is purely random > > > > > IO, and neither CFQ nor AS would do any idling for that. We can make > > > > > this test case faster for sure, the hard part is making sure that we > > > > > don't regress on async throughput at the same time. > > > > > > > > > > Also remember that with your raid1, it's not entirely reasonable to > > > > > blaim all performance issues on the IO scheduler as per my previous > > > > > mail. It would be a lot more fair to view the disk numbers individually. > > > > > > > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > > > > to 1 as well? > > > > > > > > > > However, I think we should be doing somewhat better at this test case. > > > > > > > > Mathieu, does this improve anything for you? > > > > > > > > > > I got this message when running with your patch applied : > > > cfq: forced dispatching is broken (nr_sorted=4294967275), please report this > > > (message appeared 10 times in a job run) > > > > Woops, missed a sort inc. Updated version below, or just ignore the > > warning. > > > > > Here is the result : > > > > > > ssh test done on /dev/sda directly > > > > > > queue_depth=31 (default) > > > /sys/block/sda/queue/iosched/slice_async_rq = 2 (default) > > > /sys/block/sda/queue/iosched/quantum = 4 (default) > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > cfq (default) 523 6637 > > > cfq (patched) 564 7195 > > > > > > Pretty much the same. > > > > Can you retry with depth=1 as well? There's not much to rip back out, if > > everything is immediately sent to the device. > > > > echo 1 > /sys/block/sda/queue/iosched/quantum > echo 1 > /sys/block/sda/queue/iosched/slice_async_rq > echo 1 > /sys/block/sda/device/queue_depth > > ssh test done on /dev/sda directly > > oops, something wrong in the new patch ? > [...] Don't waste time looking into this, here is the fixed version (list_del in a previously non-safe list iteration). 
Mathieu Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> --- block/cfq-iosched.c | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) Index: linux-2.6-lttng/block/cfq-iosched.c =================================================================== --- linux-2.6-lttng.orig/block/cfq-iosched.c 2009-01-20 10:31:46.000000000 -0500 +++ linux-2.6-lttng/block/cfq-iosched.c 2009-01-20 17:41:06.000000000 -0500 @@ -1761,6 +1761,36 @@ cfq_update_idle_window(struct cfq_data * } /* + * Pull dispatched requests from 'cfqq' back into the scheduler + */ +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, + struct cfq_queue *cfqq) +{ + struct request_queue *q = cfqd->queue; + struct request *rq, *tmp; + + list_for_each_entry_safe_reverse(rq, tmp, &q->queue_head, queuelist) { + if (rq->cmd_flags & REQ_STARTED) + break; + + if (RQ_CFQQ(rq) != cfqq) + continue; + + /* + * Pull off the dispatch list and put it back into the cfqq + */ + list_del(&rq->queuelist); + cfqq->dispatched--; + if (cfq_cfqq_sync(cfqq)) + cfqd->sync_flight--; + + cfq_add_rq_rb(rq); + q->nr_sorted++; + list_add_tail(&rq->queuelist, &cfqq->fifo); + } +} + +/* * Check if new_cfqq should preempt the currently active queue. Return 0 for * no or if we aren't sure, a 1 will cause a preempt. */ @@ -1816,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd */ static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) { + struct cfq_queue *old_cfqq = cfqd->active_queue; + cfq_log_cfqq(cfqd, cfqq, "preempt"); - cfq_slice_expired(cfqd, 1); + + if (old_cfqq) { + __cfq_slice_expired(cfqd, old_cfqq, 1); + cfq_pull_dispatched_requests(cfqd, old_cfqq); + } /* * Put the new queue at the front of the of the current list, -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
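A closing aside on this sub-thread: the register dump in the oops above shows 0x00100100, 0x00200200 and CR2 = 0x00100108, which look like the kernel's LIST_POISON values, the classic signature of following a list_head that has already been list_del()'d; that matches the diagnosis of a list_del() inside a non-safe reverse walk. The fixed patch switches to list_for_each_entry_safe_reverse(), which caches the previous entry before the loop body is allowed to delete the current one. Below is a small userspace sketch of the same pattern; the list helpers are simplified stand-ins for <linux/list.h>, not the real kernel macros:

/*
 * Sketch of the bug and the fix: deleting the current entry while
 * walking a doubly-linked list backwards. The loop shapes mirror
 * list_for_each_entry_reverse() vs list_for_each_entry_safe_reverse().
 */
#include <stdio.h>

struct node {
	struct node *prev, *next;
	int id;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	/* clear the links so stale use fails fast (the kernel writes
	 * poison values here instead of NULL) */
	n->prev = n->next = NULL;
}

int main(void)
{
	struct node head, nodes[4];
	struct node *pos, *tmp;
	int i;

	list_init(&head);
	for (i = 0; i < 4; i++) {
		nodes[i].id = i;
		list_add_tail(&nodes[i], &head);
	}

	/*
	 * UNSAFE (what the second version of the patch effectively did):
	 *
	 *   for (pos = head.prev; pos != &head; pos = pos->prev)
	 *           list_del(pos);
	 *
	 * After list_del(pos), pos->prev no longer points into the list,
	 * so the next step of the walk dereferences a bogus pointer --
	 * the same class of bug behind the oops above.
	 *
	 * SAFE (the list_for_each_entry_safe_reverse() pattern): remember
	 * the previous entry before the body may delete pos.
	 */
	for (pos = head.prev, tmp = pos->prev; pos != &head;
	     pos = tmp, tmp = pos->prev) {
		list_del(pos);
		printf("pulled back node %d\n", pos->id);
	}
	return 0;
}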
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 12:28 ` Jens Axboe 2009-01-20 14:22 ` [ltt-dev] " Mathieu Desnoyers @ 2009-01-20 23:27 ` Mathieu Desnoyers 2009-01-21 0:25 ` Mathieu Desnoyers 2009-01-23 3:21 ` [ltt-dev] " KOSAKI Motohiro 2009-02-02 2:08 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2 siblings, 2 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 23:27 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Jens Axboe (jens.axboe@oracle.com) wrote: > On Tue, Jan 20 2009, Jens Axboe wrote: > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > standard partition I created on the same machine. > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > test"). > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 41 10563 > > > > > anticipatory 63 8185 > > > > > deadline 52 33387 > > > > > cfq 43 1420 > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > schedulers, would be good for comparison. > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > noop 43 38235 > > > anticipatory 44 8728 > > > deadline 51 19751 > > > cfq 48 427 > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > At least for comparison. > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > receiving a connexion after it has been pruned from the cache while the > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > behaves correctly in such case. Most of this I/O is coming from page > > > faults (identified as traps in the trace) probably because the ssh > > > executable has been thrown out of the cache by > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). 
The > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > noop 586 110242 > > > anticipatory 531 26942 > > > deadline 561 108772 > > > cfq 523 28216 > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > what I would call an acceptable latency. > > > > At some point, you have to stop and consider what is acceptable > > performance for a given IO pattern. Your ssh test case is purely random > > IO, and neither CFQ nor AS would do any idling for that. We can make > > this test case faster for sure, the hard part is making sure that we > > don't regress on async throughput at the same time. > > > > Also remember that with your raid1, it's not entirely reasonable to > > blaim all performance issues on the IO scheduler as per my previous > > mail. It would be a lot more fair to view the disk numbers individually. > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > to 1 as well? > > > > However, I think we should be doing somewhat better at this test case. > > Mathieu, does this improve anything for you? > So, I ran the tests with my corrected patch, and the results are very good ! "incoming ssh connexion" test "config 2.6.28 cfq" Linux 2.6.28 /sys/block/sd{a,b}/device/queue_depth = 31 (default) /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) "config 2.6.28.1-patch1" Linux 2.6.28.1 Corrected cfq patch applied echo 1 > /sys/block/sd{a,b}/device/queue_depth echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum On /dev/sda : I/O scheduler runt-min (msec) runt-max (msec) cfq (2.6.28 cfq) 523 6637 cfq (2.6.28.1-patch1) 579 2082 On raid1 : I/O scheduler runt-min (msec) runt-max (msec) cfq (2.6.28 cfq) 523 28216 cfq (2.6.28.1-patch1) 517 3086 It looks like we are getting somewhere :) Are there any specific queue_depth, slice_async_rq, quantum variations you would like to be tested ? For reference, I attach my ssh-like job file (again) to this mail. Mathieu [job1] rw=write size=10240m direct=0 blocksize=1024k [global] rw=randread size=2048k filesize=30m direct=0 bsrange=4k-44k [file1] startdelay=0 [file2] startdelay=4 [file3] startdelay=8 [file4] startdelay=12 [file5] startdelay=16 [file6] startdelay=20 [file7] startdelay=24 [file8] startdelay=28 [file9] startdelay=32 [file10] startdelay=36 [file11] startdelay=40 [file12] startdelay=44 [file13] startdelay=48 [file14] startdelay=52 [file15] startdelay=56 [file16] startdelay=60 [file17] startdelay=64 [file18] startdelay=68 [file19] startdelay=72 [file20] startdelay=76 [file21] startdelay=80 [file22] startdelay=84 [file23] startdelay=88 [file24] startdelay=92 [file25] startdelay=96 [file26] startdelay=100 [file27] startdelay=104 [file28] startdelay=108 [file29] startdelay=112 [file30] startdelay=116 [file31] startdelay=120 [file32] startdelay=124 [file33] startdelay=128 [file34] startdelay=132 [file35] startdelay=134 [file36] startdelay=138 [file37] startdelay=142 [file38] startdelay=146 [file39] startdelay=150 [file40] startdelay=200 [file41] startdelay=260 -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 23:27 ` Mathieu Desnoyers @ 2009-01-21 0:25 ` Mathieu Desnoyers 2009-01-21 4:38 ` Ben Gamari 2009-01-22 22:59 ` Mathieu Desnoyers 2009-01-23 3:21 ` [ltt-dev] " KOSAKI Motohiro 1 sibling, 2 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-21 0:25 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > * Jens Axboe (jens.axboe@oracle.com) wrote: > > On Tue, Jan 20 2009, Jens Axboe wrote: > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > > standard partition I created on the same machine. > > > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > > test"). > > > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > noop 41 10563 > > > > > > anticipatory 63 8185 > > > > > > deadline 52 33387 > > > > > > cfq 43 1420 > > > > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > > schedulers, would be good for comparison. > > > > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 43 38235 > > > > anticipatory 44 8728 > > > > deadline 51 19751 > > > > cfq 48 427 > > > > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > > > At least for comparison. > > > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > > receiving a connexion after it has been pruned from the cache while the > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > > behaves correctly in such case. 
Most of this I/O is coming from page > > > > faults (identified as traps in the trace) probably because the ssh > > > > executable has been thrown out of the cache by > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 586 110242 > > > > anticipatory 531 26942 > > > > deadline 561 108772 > > > > cfq 523 28216 > > > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > > what I would call an acceptable latency. > > > > > > At some point, you have to stop and consider what is acceptable > > > performance for a given IO pattern. Your ssh test case is purely random > > > IO, and neither CFQ nor AS would do any idling for that. We can make > > > this test case faster for sure, the hard part is making sure that we > > > don't regress on async throughput at the same time. > > > > > > Also remember that with your raid1, it's not entirely reasonable to > > > blaim all performance issues on the IO scheduler as per my previous > > > mail. It would be a lot more fair to view the disk numbers individually. > > > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > > to 1 as well? > > > > > > However, I think we should be doing somewhat better at this test case. > > > > Mathieu, does this improve anything for you? > > > > So, I ran the tests with my corrected patch, and the results are very > good ! > > "incoming ssh connexion" test > > "config 2.6.28 cfq" > Linux 2.6.28 > /sys/block/sd{a,b}/device/queue_depth = 31 (default) > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > "config 2.6.28.1-patch1" > Linux 2.6.28.1 > Corrected cfq patch applied > echo 1 > /sys/block/sd{a,b}/device/queue_depth > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum > > On /dev/sda : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (2.6.28 cfq) 523 6637 > cfq (2.6.28.1-patch1) 579 2082 > > On raid1 : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (2.6.28 cfq) 523 28216 As a side-note : I'd like to have my results confirmed by others. I just found out that my 2 Seagate drives are in the "defect" list (ST3500320AS) that exhibits the behavior to stop for about 30s when doing "video streaming". (http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=storage&articleId=9126280&taxonomyId=19&intsrc=kc_top) (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931) Therefore, I would not take any decision based on such known bad firmware. But the last results we've got are definitely interesting. I'll upgrade my firmware as soon as Segate puts it back online so I can re-run more tests. Mathieu > cfq (2.6.28.1-patch1) 517 3086 > > It looks like we are getting somewhere :) Are there any specific > queue_depth, slice_async_rq, quantum variations you would like to be > tested ? > > For reference, I attach my ssh-like job file (again) to this mail. 
> > Mathieu > > > [job1] > rw=write > size=10240m > direct=0 > blocksize=1024k > > [global] > rw=randread > size=2048k > filesize=30m > direct=0 > bsrange=4k-44k > > [file1] > startdelay=0 > > [file2] > startdelay=4 > > [file3] > startdelay=8 > > [file4] > startdelay=12 > > [file5] > startdelay=16 > > [file6] > startdelay=20 > > [file7] > startdelay=24 > > [file8] > startdelay=28 > > [file9] > startdelay=32 > > [file10] > startdelay=36 > > [file11] > startdelay=40 > > [file12] > startdelay=44 > > [file13] > startdelay=48 > > [file14] > startdelay=52 > > [file15] > startdelay=56 > > [file16] > startdelay=60 > > [file17] > startdelay=64 > > [file18] > startdelay=68 > > [file19] > startdelay=72 > > [file20] > startdelay=76 > > [file21] > startdelay=80 > > [file22] > startdelay=84 > > [file23] > startdelay=88 > > [file24] > startdelay=92 > > [file25] > startdelay=96 > > [file26] > startdelay=100 > > [file27] > startdelay=104 > > [file28] > startdelay=108 > > [file29] > startdelay=112 > > [file30] > startdelay=116 > > [file31] > startdelay=120 > > [file32] > startdelay=124 > > [file33] > startdelay=128 > > [file34] > startdelay=132 > > [file35] > startdelay=134 > > [file36] > startdelay=138 > > [file37] > startdelay=142 > > [file38] > startdelay=146 > > [file39] > startdelay=150 > > [file40] > startdelay=200 > > [file41] > startdelay=260 > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
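
Since the drives' firmware is called into question here, a quick way to check each disk's model and firmware revision against the defect list, assuming the usual SCSI/libata sysfs attributes are present, is:

#!/bin/sh
# Sketch: print model and firmware revision for every sd* device, to compare
# against the affected Seagate models (e.g. ST3500320AS) mentioned above.
for d in /sys/block/sd*; do
        echo "$(basename $d): $(cat $d/device/model) rev $(cat $d/device/rev)"
done
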
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-21 0:25 ` Mathieu Desnoyers @ 2009-01-21 4:38 ` Ben Gamari 2009-01-21 4:54 ` [ltt-dev] " Mathieu Desnoyers 2009-01-22 22:59 ` Mathieu Desnoyers 1 sibling, 1 reply; 39+ messages in thread From: Ben Gamari @ 2009-01-21 4:38 UTC (permalink / raw) To: Mathieu Desnoyers, Jens Axboe Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote: > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > > As a side-note : I'd like to have my results confirmed by others. Well, I think the (fixed) patch did help to some degree (I haven't done fio benchmarks to compare against yet). Unfortunately, the I/O wait time problem still remains. I have been waiting 3 minutes now for evolution to start with 88% I/O wait time yet no visible signs of progress. I've confirmed I'm using the CFQ scheduler, so that's not the problem. Also, Jens, I'd just like to point out that the problem is reproducible across all schedulers. Does your patch seek to tackle a problem specific to the CFQ scheduler, leaving the I/O wait issue for later? Just wondering. I'll post some benchmarks numbers once I have them. Thanks, - Ben ^ permalink raw reply [flat|nested] 39+ messages in thread
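
For reference, the scheduler in use can be confirmed (and switched without rebooting) through sysfs; the device name below is an assumption:

# The entry in brackets is the scheduler currently in use for that device.
cat /sys/block/sda/queue/scheduler               # e.g. noop anticipatory deadline [cfq]
echo deadline > /sys/block/sda/queue/scheduler   # switch schedulers on the fly
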
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-21 4:38 ` Ben Gamari @ 2009-01-21 4:54 ` Mathieu Desnoyers 2009-01-21 6:17 ` Ben Gamari 0 siblings, 1 reply; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-21 4:54 UTC (permalink / raw) To: Ben Gamari Cc: Jens Axboe, akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel * Ben Gamari (bgamari@gmail.com) wrote: > On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers > <mathieu.desnoyers@polymtl.ca> wrote: > > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > > > > As a side-note : I'd like to have my results confirmed by others. > > Well, I think the (fixed) patch did help to some degree (I haven't > done fio benchmarks to compare against yet). Unfortunately, the I/O > wait time problem still remains. I have been waiting 3 minutes now for > evolution to start with 88% I/O wait time yet no visible signs of > progress. I've confirmed I'm using the CFQ scheduler, so that's not > the problem. > Did you also echo 1 > /sys/block/sd{a,b}/device/queue_depth echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum (replacing sd{a,b} with your actual drives) ? It seems to have been part of the factors that helped (along with the patch). And hopefully you don't have a recent Seagate hard drive like me ? :-) So you test case is : - start a large dd with 1M block size - time evolution ? Mathieu > Also, Jens, I'd just like to point out that the problem is > reproducible across all schedulers. Does your patch seek to tackle a > problem specific to the CFQ scheduler, leaving the I/O wait issue for > later? Just wondering. > > I'll post some benchmarks numbers once I have them. Thanks, > > - Ben > > _______________________________________________ > ltt-dev mailing list > ltt-dev@lists.casi.polymtl.ca > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-21 4:54 ` [ltt-dev] " Mathieu Desnoyers @ 2009-01-21 6:17 ` Ben Gamari 0 siblings, 0 replies; 39+ messages in thread From: Ben Gamari @ 2009-01-21 6:17 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Jens Axboe, akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel On Tue, 2009-01-20 at 23:54 -0500, Mathieu Desnoyers wrote: > * Ben Gamari (bgamari@gmail.com) wrote: > > On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers > > <mathieu.desnoyers@polymtl.ca> wrote: > > > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > > > > > > As a side-note : I'd like to have my results confirmed by others. > > > > Well, I think the (fixed) patch did help to some degree (I haven't > > done fio benchmarks to compare against yet). Unfortunately, the I/O > > wait time problem still remains. I have been waiting 3 minutes now for > > evolution to start with 88% I/O wait time yet no visible signs of > > progress. I've confirmed I'm using the CFQ scheduler, so that's not > > the problem. > > > > Did you also > > echo 1 > /sys/block/sd{a,b}/device/queue_depth I have been using this in some of my measurements (this is recorded, of course). > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum I haven't been doing this although I will collect a data set with these parameters set. It would be to compare the effect of this to the default configuration. > > (replacing sd{a,b} with your actual drives) ? > > It seems to have been part of the factors that helped (along with the > patch). > > And hopefully you don't have a recent Seagate hard drive like me ? :-) Thankfully, no. > > So you test case is : > - start a large dd with 1M block size > - time evolution > I've been using evolution to get a rough idea of the performance of the configurations but not as a benchmark per se. I have some pretty good-sized maildirs, so launching evolution for the first time can be quite a task, IO-wise. Also, switching between folders used to be quite time consuming. It seems like the patch did help a bit on this front though. For a quantitative benchmark I've been using the fio job that you posted earlier. I've been collecting results and should have a pretty good data set soon. I'll send out a compilation of all the data I've collected as soon as I've finished. - Ben ^ permalink raw reply [flat|nested] 39+ messages in thread
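
A small sketch of how the default and tuned configurations discussed above could be recorded and switched between runs; the device names and the save-file path are assumptions:

#!/bin/sh
# Sketch: save the default queue settings, then apply the tuned values (all 1).
for d in sda sdb; do
        for f in device/queue_depth queue/iosched/slice_async_rq queue/iosched/quantum; do
                echo "$f $(cat /sys/block/$d/$f)" >> /tmp/defaults-$d   # record default
                echo 1 > /sys/block/$d/$f                               # apply tuned value
        done
done
# Run the benchmark, then write the recorded values back to restore the defaults.
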
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-21 0:25 ` Mathieu Desnoyers 2009-01-21 4:38 ` Ben Gamari @ 2009-01-22 22:59 ` Mathieu Desnoyers 1 sibling, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-22 22:59 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote: > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > On Tue, Jan 20 2009, Jens Axboe wrote: > > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > > > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > > > > to create a fio job file. The said behavior is appended below as "Part > > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > > > > running this on a raid-1, but have experienced the same problem on a > > > > > > > standard partition I created on the same machine. > > > > > > > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > > > > consists of one dd-like job and many small jobs reading as many data as > > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > > > > test"). > > > > > > > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > > > noop 41 10563 > > > > > > > anticipatory 63 8185 > > > > > > > deadline 52 33387 > > > > > > > cfq 43 1420 > > > > > > > > > > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > > > > not make much difference (also tried with NO_HZ enabled). > > > > > > > > > > > Do you have queuing enabled on your drives? You can check that in > > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > > > > schedulers, would be good for comparison. > > > > > > > > > > > > > > > > Here are the tests with a queue_depth of 1 : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 43 38235 > > > > > anticipatory 44 8728 > > > > > deadline 51 19751 > > > > > cfq 48 427 > > > > > > > > > > > > > > > Overall, I wouldn't say it makes much difference. > > > > > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > > > > > > > > > > raid personalities or dm complicates matters, since it introduces a > > > > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > > > > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > > > > > > > At least for comparison. > > > > > > > > > I am also tempted to create a fio job file which acts like a ssh server > > > > > receiving a connexion after it has been pruned from the cache while the > > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > > > > more I/O than a simple "ls", and I think we might want to see if cfq > > > > > behaves correctly in such case. 
Most of this I/O is coming from page > > > > > faults (identified as traps in the trace) probably because the ssh > > > > > executable has been thrown out of the cache by > > > > > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > > > > > The behavior of an incoming ssh connexion after clearing the cache is > > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > > > > job file created (Part 2) reads, for each job, a 2MB file with random > > > > > reads each between 4k-44k. The results are very interesting for cfq : > > > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > > noop 586 110242 > > > > > anticipatory 531 26942 > > > > > deadline 561 108772 > > > > > cfq 523 28216 > > > > > > > > > > So, basically, ssh being out of the cache can take 28s to answer an > > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > > > > what I would call an acceptable latency. > > > > > > > > At some point, you have to stop and consider what is acceptable > > > > performance for a given IO pattern. Your ssh test case is purely random > > > > IO, and neither CFQ nor AS would do any idling for that. We can make > > > > this test case faster for sure, the hard part is making sure that we > > > > don't regress on async throughput at the same time. > > > > > > > > Also remember that with your raid1, it's not entirely reasonable to > > > > blaim all performance issues on the IO scheduler as per my previous > > > > mail. It would be a lot more fair to view the disk numbers individually. > > > > > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > > > > to 1 as well? > > > > > > > > However, I think we should be doing somewhat better at this test case. > > > > > > Mathieu, does this improve anything for you? > > > > > > > So, I ran the tests with my corrected patch, and the results are very > > good ! > > > > "incoming ssh connexion" test > > > > "config 2.6.28 cfq" > > Linux 2.6.28 > > /sys/block/sd{a,b}/device/queue_depth = 31 (default) > > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > > > "config 2.6.28.1-patch1" > > Linux 2.6.28.1 > > Corrected cfq patch applied > > echo 1 > /sys/block/sd{a,b}/device/queue_depth > > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq > > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum > > > > On /dev/sda : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (2.6.28 cfq) 523 6637 > > cfq (2.6.28.1-patch1) 579 2082 > > > > On raid1 : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (2.6.28 cfq) 523 28216 > > As a side-note : I'd like to have my results confirmed by others. I just > found out that my 2 Seagate drives are in the "defect" list > (ST3500320AS) that exhibits the behavior to stop for about 30s when doing > "video streaming". > (http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=storage&articleId=9126280&taxonomyId=19&intsrc=kc_top) > (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931) > > Therefore, I would not take any decision based on such known bad > firmware. But the last results we've got are definitely interesting. > > I'll upgrade my firmware as soon as Segate puts it back online so I can > re-run more tests. 
> After firmware upgrade : "incoming ssh connexion" test (ran the job file 2-3 times to get correct runt-max results) "config 2.6.28.1 dfl" Linux 2.6.28.1 /sys/block/sd{a,b}/device/queue_depth = 31 (default) /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) "config 2.6.28.1 1" Linux 2.6.28.1 echo 1 > /sys/block/sd{a,b}/device/queue_depth echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum "config 2.6.28.1-patch dfl" Linux 2.6.28.1 Corrected cfq patch applied /sys/block/sd{a,b}/device/queue_depth = 31 (default) /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) "config 2.6.28.1-patch 1" Linux 2.6.28.1 Corrected cfq patch applied echo 1 > /sys/block/sd{a,b}/device/queue_depth echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum On /dev/sda : I/O scheduler runt-min (msec) runt-avg (msec) runt-max (msec) cfq (2.6.28.1 dfl) 560 4134.04 12125 cfq (2.6.28.1-patch dfl) 508 4329.75 9625 cfq (2.6.28.1 1) 535 1068.46 2622 cfq (2.6.28.1-patch 1) 511 2239.87 4117 On /dev/md1 (raid1) : I/O scheduler runt-min (msec) runt-avg (msec) runt-max (msec) cfq (2.6.28.1 dfl) 507 4053.19 26265 cfq (2.6.28.1-patch dfl) 532 3991.75 18567 cfq (2.6.28.1 1) 510 1900.14 27410 cfq (2.6.28.1-patch 1) 539 2112.60 22859 A fio output taken from the raid1 cfq (2.6.28.1-patch 1) run looks like the following. It's a bit strange that we have readers started earlier which seems to complete only _after_ more recent readers have. Excerpt (full output appended after email) : Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): [W________________rrrrrPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 560/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1512/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): [W________________rrrr_rPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 144/ Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1932/ Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr__IPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [ 608/ Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1052/ Jobs: 5 (f=1): [W________________rrrr___PPPPPPPPPPPPPPPPPP] [0.0% 
done] [ 388/ Jobs: 5 (f=1): [W________________rrrr___IPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr____PPPPPPPPPPPPPPPPP] [0.0% done] [ 2076/ Jobs: 5 (f=5): [W________________rrrr____PPPPPPPPPPPPPPPPP] [49.0% done] [ 2936 Jobs: 2 (f=2): [W_________________r______PPPPPPPPPPPPPPPPP] [50.8% done] [ 5192 Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [16.0% done] [ 104 Given the numbers I get, I see that runt-max numbers does not appear to be so high at each job file run, which makes it difficult to compare them (since you never know if you've hit the worse-case yet). This could be related to raid1, because I've seen this both with and without your patch applied, and it only seems to appear on raid1 executions. However, the patch you sent does not seem to improve the behavior. It actually makes the average and max latency worse in almost every case. Changing the queue, slice_async_rq and quantum parameters clearly helps reducing both avg and max latency. Mathieu Full output : Running cfq Starting 42 processes Jobs: 1 (f=1): [W_PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [5.7% done] [ 0/ Jobs: 1 (f=1): [W_PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [7.7% done] [ 0/ Jobs: 1 (f=1): [W_IPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [9.6% done] [ 0/ Jobs: 2 (f=2): [W_rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [11.1% done] [ 979 Jobs: 1 (f=1): [W__PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [10.9% done] [ 1098 Jobs: 1 (f=1): [W__PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [12.9% done] [ 0 Jobs: 2 (f=2): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [112.5% done] [ Jobs: 2 (f=2): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.1% done] [ 1160 Jobs: 2 (f=1): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [15.9% done] [ 888 Jobs: 2 (f=1): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.0% done] [ 0 Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=3): [W___rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.7% done] [ 660 Jobs: 2 (f=2): [W____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [18.0% done] [ 2064 Jobs: 1 (f=1): [W_____PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [19.4% done] [ 1392 Jobs: 1 (f=1): [W_____PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [20.6% done] [ 0 Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [105.0% done] [ Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [110.0% done] [ Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [115.0% done] [ Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [120.0% done] [ Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [104.2% done] [ Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [108.3% done] [ Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [112.5% done] [ Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [116.7% done] [ Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [103.6% done] [ Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [107.1% done] [ Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [9.8% done] [ 280/ Jobs: 3 (f=3): [W_____r_rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [ 3624 Jobs: 2 (f=2): [W________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [ 2744 Jobs: 1 (f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [ 1620 Jobs: 1 
(f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [ 0 Jobs: 1 (f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.3% done] [ 0 Jobs: 2 (f=2): [W_________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.9% done] [ 116 Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.9% done] [ 1944 Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [35.8% done] [ 0 Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [36.4% done] [ 0 Jobs: 2 (f=2): [W__________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [36.9% done] [ 228 Jobs: 2 (f=2): [W__________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [37.2% done] [ 1420 Jobs: 1 (f=1): [W___________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [37.7% done] [ 400 Jobs: 1 (f=1): [W___________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.1% done] [ 0 Jobs: 2 (f=2): [W___________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.3% done] [ 268 Jobs: 2 (f=2): [W___________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.5% done] [ 944 Jobs: 1 (f=1): [W____________PPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [40.3% done] [ 848 Jobs: 1 (f=1): [W____________IPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [40.5% done] [ 0 Jobs: 2 (f=2): [W____________rPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.0% done] [ 400 Jobs: 2 (f=2): [W____________rPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.1% done] [ 1208 Jobs: 1 (f=1): [W_____________PPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.9% done] [ 456 Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [101.9% done] [ Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [42.9% done] [ 380 Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [43.3% done] [ 760 Jobs: 1 (f=1): [W______________PPPPPPPPPPPPPPPPPPPPPPPPPPP] [43.8% done] [ 912 Jobs: 2 (f=2): [W______________rPPPPPPPPPPPPPPPPPPPPPPPPPP] [44.2% done] [ 44 Jobs: 2 (f=2): [W______________rPPPPPPPPPPPPPPPPPPPPPPPPPP] [44.6% done] [ 1020 Jobs: 1 (f=1): [W_______________PPPPPPPPPPPPPPPPPPPPPPPPPP] [45.4% done] [ 1008 Jobs: 1 (f=1): [W_______________PPPPPPPPPPPPPPPPPPPPPPPPPP] [46.2% done] [ 0 Jobs: 2 (f=2): [W_______________rPPPPPPPPPPPPPPPPPPPPPPPPP] [46.6% done] [ 52 Jobs: 2 (f=2): [W_______________rPPPPPPPPPPPPPPPPPPPPPPPPP] [47.0% done] [ 1248 Jobs: 1 (f=1): [W________________PPPPPPPPPPPPPPPPPPPPPPPPP] [47.4% done] [ 760 Jobs: 1 (f=1): [W________________PPPPPPPPPPPPPPPPPPPPPPPPP] [48.1% done] [ 0 Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): 
[W________________rrrrrPPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 560/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1512/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): [W________________rrrr_rPPPPPPPPPPPPPPPPPPP] [0.0% done] [ 144/ Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1932/ Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr__IPPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [ 608/ Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [ 1052/ Jobs: 5 (f=1): [W________________rrrr___PPPPPPPPPPPPPPPPPP] [0.0% done] [ 388/ Jobs: 5 (f=1): [W________________rrrr___IPPPPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 5 (f=1): [W________________rrrr____PPPPPPPPPPPPPPPPP] [0.0% done] [ 2076/ Jobs: 5 (f=5): [W________________rrrr____PPPPPPPPPPPPPPPPP] [49.0% done] [ 2936 Jobs: 2 (f=2): [W_________________r______PPPPPPPPPPPPPPPPP] [50.8% done] [ 5192 Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [16.0% done] [ 104 Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [54.7% done] [ 1052 Jobs: 1 (f=1): [W_________________________PPPPPPPPPPPPPPPP] [56.6% done] [ 1016 Jobs: 1 (f=1): [W_________________________PPPPPPPPPPPPPPPP] [58.1% done] [ 0 Jobs: 2 (f=2): [W_________________________rPPPPPPPPPPPPPPP] [59.8% done] [ 52 Jobs: 2 (f=2): [W_________________________rPPPPPPPPPPPPPPP] [61.1% done] [ 1372 Jobs: 1 (f=1): [W__________________________PPPPPPPPPPPPPPP] [63.2% done] [ 652 Jobs: 1 (f=1): [W__________________________PPPPPPPPPPPPPPP] [65.0% done] [ 0 Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [ 0/ Jobs: 3 (f=3): [W__________________________rrPPPPPPPPPPPPP] [67.3% done] [ 1224 Jobs: 2 (f=2): [W___________________________rPPPPPPPPPPPPP] [68.8% done] [ 2124 Jobs: 1 (f=1): [W____________________________PPPPPPPPPPPPP] [69.8% done] [ 780 Jobs: 1 (f=1): [W____________________________PPPPPPPPPPPPP] [71.3% done] [ 0 Jobs: 2 (f=2): [W____________________________rPPPPPPPPPPPP] [72.9% done] [ 84 Jobs: 2 (f=2): [W____________________________rPPPPPPPPPPPP] [73.1% done] [ 1312 Jobs: 1 (f=1): [W_____________________________PPPPPPPPPPPP] [73.2% done] [ 688 Jobs: 1 (f=1): [W_____________________________PPPPPPPPPPPP] [73.9% done] [ 0 Jobs: 2 (f=2): [W_____________________________rPPPPPPPPPPP] [73.6% done] [ 476 Jobs: 1 (f=1): [W_____________________________EPPPPPPPPPPP] [73.8% done] [ 1608 Jobs: 1 (f=1): [W______________________________PPPPPPPPPPP] [73.9% done] [ 0 Jobs: 1 (f=1): [W______________________________PPPPPPPPPPP] [74.1% done] [ 0 Jobs: 2 (f=2): [W______________________________rPPPPPPPPPP] [74.7% done] [ 228 Jobs: 2 (f=2): [W______________________________rPPPPPPPPPP] [74.8% done] [ 1564 Jobs: 1 (f=1): [W_______________________________PPPPPPPPPP] [75.5% done] [ 264 Jobs: 1 (f=1): [W_______________________________PPPPPPPPPP] [76.1% done] [ 0 Jobs: 2 (f=2): [W_______________________________rPPPPPPPPP] [76.2% done] [ 516 Jobs: 1 (f=1): [W________________________________PPPPPPPPP] [75.9% done] [ 1532 Jobs: 1 (f=1): 
[W________________________________PPPPPPPPP] [76.0% done] [ 0 Jobs: 1 (f=1): [W________________________________PPPPPPPPP] [76.2% done] [ 0 Jobs: 2 (f=2): [W________________________________rPPPPPPPP] [76.5% done] [ 768 Jobs: 1 (f=1): [W_________________________________PPPPPPPP] [76.6% done] [ 1316 Jobs: 1 (f=1): [W_________________________________PPPPPPPP] [76.7% done] [ 0 Jobs: 1 (f=1): [W_________________________________IPPPPPPP] [77.8% done] [ 0 Jobs: 2 (f=2): [W_________________________________rPPPPPPP] [77.9% done] [ 604 Jobs: 1 (f=1): [W__________________________________IPPPPPP] [78.0% done] [ 1444 Jobs: 2 (f=2): [W__________________________________rPPPPPP] [78.2% done] [ 1145 Jobs: 1 (f=1): [W___________________________________PPPPPP] [78.3% done] [ 932 Jobs: 1 (f=1): [W___________________________________PPPPPP] [79.3% done] [ 0 Jobs: 2 (f=2): [W___________________________________rPPPPP] [100.7% done] [ Jobs: 2 (f=2): [W___________________________________rPPPPP] [80.0% done] [ 1012 Jobs: 1 (f=1): [W____________________________________PPPPP] [80.6% done] [ 1072 Jobs: 1 (f=1): [W____________________________________PPPPP] [81.6% done] [ 0 Jobs: 2 (f=2): [W____________________________________rPPPP] [72.2% done] [ 36 Jobs: 2 (f=2): [W____________________________________rPPPP] [82.3% done] [ 956 Jobs: 1 (f=1): [W_____________________________________PPPP] [82.9% done] [ 1076 Jobs: 1 (f=1): [W_____________________________________PPPP] [83.4% done] [ 0 Jobs: 2 (f=2): [W_____________________________________rPPP] [78.2% done] [ 48 Jobs: 2 (f=2): [W_____________________________________rPPP] [84.6% done] [ 1060 Jobs: 1 (f=1): [W______________________________________PPP] [85.1% done] [ 956 Jobs: 1 (f=1): [W______________________________________PPP] [85.7% done] [ 0 Jobs: 2 (f=2): [W______________________________________rPP] [86.3% done] [ 96 Jobs: 2 (f=2): [W______________________________________rPP] [86.4% done] [ 756 Jobs: 1 (f=1): [W_______________________________________PP] [86.9% done] [ 1212 Jobs: 1 (f=1): [W_______________________________________PP] [87.5% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [88.6% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [89.1% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [90.2% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [90.8% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [91.4% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [92.5% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [93.1% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [93.6% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [94.2% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [95.3% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [95.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [97.1% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [97.7% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.2% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.8% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.8% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% 
done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [99.0% done] [ 0 Jobs: 1 (f=1): [W_______________________________________PP] [99.0% done] [ 0 Jobs: 0 (f=0) [eta 00m:02s] Mathieu > Mathieu > > > cfq (2.6.28.1-patch1) 517 3086 > > > > It looks like we are getting somewhere :) Are there any specific > > queue_depth, slice_async_rq, quantum variations you would like to be > > tested ? > > > > For reference, I attach my ssh-like job file (again) to this mail. > > > > Mathieu > > > > > > [job1] > > rw=write > > size=10240m > > direct=0 > > blocksize=1024k > > > > [global] > > rw=randread > > size=2048k > > filesize=30m > > direct=0 > > bsrange=4k-44k > > > > [file1] > > startdelay=0 > > > > [file2] > > startdelay=4 > > > > [file3] > > startdelay=8 > > > > [file4] > > startdelay=12 > > > > [file5] > > startdelay=16 > > > > [file6] > > startdelay=20 > > > > [file7] > > startdelay=24 > > > > [file8] > > startdelay=28 > > > > [file9] > > startdelay=32 > > > > [file10] > > startdelay=36 > > > > [file11] > > startdelay=40 > > > > [file12] > > startdelay=44 > > > > [file13] > > startdelay=48 > > > > [file14] > > startdelay=52 > > > > [file15] > > startdelay=56 > > > > [file16] > > startdelay=60 > > > > [file17] > > startdelay=64 > > > > [file18] > > startdelay=68 > > > > [file19] > > startdelay=72 > > > > [file20] > > startdelay=76 > > > > [file21] > > startdelay=80 > > > > [file22] > > startdelay=84 > > > > [file23] > > startdelay=88 > > > > [file24] > > startdelay=92 > > > > [file25] > > startdelay=96 > > > > [file26] > > startdelay=100 > > > > [file27] > > startdelay=104 > > > > [file28] > > startdelay=108 > > > > [file29] > > startdelay=112 > > > > [file30] > > startdelay=116 > > > > [file31] > > startdelay=120 > > > > [file32] > > startdelay=124 > > > > [file33] > > startdelay=128 > > > > [file34] > > startdelay=132 > > > > [file35] > > startdelay=134 > > > > [file36] > > startdelay=138 > > > > [file37] > > startdelay=142 > > > > [file38] > > startdelay=146 > > > > [file39] > > startdelay=150 > > > > [file40] > > startdelay=200 > > > > [file41] > > startdelay=260 > > > > -- > > Mathieu Desnoyers > > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages 
in thread
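
Given the observation above that the worst-case runt values only show up on some runs, repeating each configuration several times and keeping every log makes the comparison less dependent on whether the bad case happened to trigger; a sketch, with the job file name and the configuration label assumed:

#!/bin/sh
# Sketch: repeat the job a few times for one configuration and keep all logs.
CONFIG=2.6.28.1-patch-1         # label for the kernel/tunable combination under test
for run in 1 2 3; do
        echo 3 > /proc/sys/vm/drop_caches
        fio ssh.fio > fio-$CONFIG-run$run.log 2>&1
done
grep runt fio-$CONFIG-run*.log  # per-job runtimes, if this fio version reports "runt="
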
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 23:27 ` Mathieu Desnoyers 2009-01-21 0:25 ` Mathieu Desnoyers @ 2009-01-23 3:21 ` KOSAKI Motohiro 2009-01-23 4:03 ` Mathieu Desnoyers 2009-02-10 3:36 ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers 1 sibling, 2 replies; 39+ messages in thread From: KOSAKI Motohiro @ 2009-01-23 3:21 UTC (permalink / raw) To: Mathieu Desnoyers Cc: kosaki.motohiro, Jens Axboe, akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel > So, I ran the tests with my corrected patch, and the results are very > good ! > > "incoming ssh connexion" test > > "config 2.6.28 cfq" > Linux 2.6.28 > /sys/block/sd{a,b}/device/queue_depth = 31 (default) > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > "config 2.6.28.1-patch1" > Linux 2.6.28.1 > Corrected cfq patch applied > echo 1 > /sys/block/sd{a,b}/device/queue_depth > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum > > On /dev/sda : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (2.6.28 cfq) 523 6637 > cfq (2.6.28.1-patch1) 579 2082 > > On raid1 : > > I/O scheduler runt-min (msec) runt-max (msec) > cfq (2.6.28 cfq) 523 28216 > cfq (2.6.28.1-patch1) 517 3086 Congraturation. In university machine room (at least, the university in japan), parallel ssh workload freqently happend. I like this patch :) > > It looks like we are getting somewhere :) Are there any specific > queue_depth, slice_async_rq, quantum variations you would like to be > tested ? > > For reference, I attach my ssh-like job file (again) to this mail. > > Mathieu > > > [job1] > rw=write > size=10240m > direct=0 > blocksize=1024k > > [global] > rw=randread > size=2048k > filesize=30m > direct=0 > bsrange=4k-44k > > [file1] > startdelay=0 > > [file2] > startdelay=4 > > [file3] > startdelay=8 > > [file4] > startdelay=12 > > [file5] > startdelay=16 > > [file6] > startdelay=20 > > [file7] > startdelay=24 > > [file8] > startdelay=28 > > [file9] > startdelay=32 > > [file10] > startdelay=36 > > [file11] > startdelay=40 > > [file12] > startdelay=44 > > [file13] > startdelay=48 > > [file14] > startdelay=52 > > [file15] > startdelay=56 > > [file16] > startdelay=60 > > [file17] > startdelay=64 > > [file18] > startdelay=68 > > [file19] > startdelay=72 > > [file20] > startdelay=76 > > [file21] > startdelay=80 > > [file22] > startdelay=84 > > [file23] > startdelay=88 > > [file24] > startdelay=92 > > [file25] > startdelay=96 > > [file26] > startdelay=100 > > [file27] > startdelay=104 > > [file28] > startdelay=108 > > [file29] > startdelay=112 > > [file30] > startdelay=116 > > [file31] > startdelay=120 > > [file32] > startdelay=124 > > [file33] > startdelay=128 > > [file34] > startdelay=132 > > [file35] > startdelay=134 > > [file36] > startdelay=138 > > [file37] > startdelay=142 > > [file38] > startdelay=146 > > [file39] > startdelay=150 > > [file40] > startdelay=200 > > [file41] > startdelay=260 > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 > > _______________________________________________ > ltt-dev mailing list > ltt-dev@lists.casi.polymtl.ca > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-23 3:21 ` [ltt-dev] " KOSAKI Motohiro @ 2009-01-23 4:03 ` Mathieu Desnoyers 2009-02-10 3:36 ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers 1 sibling, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-23 4:03 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-kernel, ltt-dev, Jens Axboe, akpm, Linus Torvalds, Ingo Molnar * KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote: > > So, I ran the tests with my corrected patch, and the results are very > > good ! > > > > "incoming ssh connexion" test > > > > "config 2.6.28 cfq" > > Linux 2.6.28 > > /sys/block/sd{a,b}/device/queue_depth = 31 (default) > > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default) > > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default) > > > > "config 2.6.28.1-patch1" > > Linux 2.6.28.1 > > Corrected cfq patch applied > > echo 1 > /sys/block/sd{a,b}/device/queue_depth > > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq > > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum > > > > On /dev/sda : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (2.6.28 cfq) 523 6637 > > cfq (2.6.28.1-patch1) 579 2082 > > > > On raid1 : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > cfq (2.6.28 cfq) 523 28216 > > cfq (2.6.28.1-patch1) 517 3086 > > Congraturation. > In university machine room (at least, the university in japan), > parallel ssh workload freqently happend. > > I like this patch :) > Please see my today's posts with numbers taken after my Seagate firmware upgrade. The runt-max case is pretty hard to trigger "for sure" and I had to do a few runs to trigger the problem. The latest tests are better. E.g. the 3086msec is actually just because the problem has not been hit. But the echo 1 > /sys/block/sd{a,b}/device/queue_depth echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum Are definitely helping a lot, as my last numbers also show. The patch, OTOH, degraded performances rather than making them better. Mathieu > > > > > > It looks like we are getting somewhere :) Are there any specific > > queue_depth, slice_async_rq, quantum variations you would like to be > > tested ? > > > > For reference, I attach my ssh-like job file (again) to this mail. 
> > > > Mathieu > > > > > > [job1] > > rw=write > > size=10240m > > direct=0 > > blocksize=1024k > > > > [global] > > rw=randread > > size=2048k > > filesize=30m > > direct=0 > > bsrange=4k-44k > > > > [file1] > > startdelay=0 > > > > [file2] > > startdelay=4 > > > > [file3] > > startdelay=8 > > > > [file4] > > startdelay=12 > > > > [file5] > > startdelay=16 > > > > [file6] > > startdelay=20 > > > > [file7] > > startdelay=24 > > > > [file8] > > startdelay=28 > > > > [file9] > > startdelay=32 > > > > [file10] > > startdelay=36 > > > > [file11] > > startdelay=40 > > > > [file12] > > startdelay=44 > > > > [file13] > > startdelay=48 > > > > [file14] > > startdelay=52 > > > > [file15] > > startdelay=56 > > > > [file16] > > startdelay=60 > > > > [file17] > > startdelay=64 > > > > [file18] > > startdelay=68 > > > > [file19] > > startdelay=72 > > > > [file20] > > startdelay=76 > > > > [file21] > > startdelay=80 > > > > [file22] > > startdelay=84 > > > > [file23] > > startdelay=88 > > > > [file24] > > startdelay=92 > > > > [file25] > > startdelay=96 > > > > [file26] > > startdelay=100 > > > > [file27] > > startdelay=104 > > > > [file28] > > startdelay=108 > > > > [file29] > > startdelay=112 > > > > [file30] > > startdelay=116 > > > > [file31] > > startdelay=120 > > > > [file32] > > startdelay=124 > > > > [file33] > > startdelay=128 > > > > [file34] > > startdelay=132 > > > > [file35] > > startdelay=134 > > > > [file36] > > startdelay=138 > > > > [file37] > > startdelay=142 > > > > [file38] > > startdelay=146 > > > > [file39] > > startdelay=150 > > > > [file40] > > startdelay=200 > > > > [file41] > > startdelay=260 > > > > -- > > Mathieu Desnoyers > > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 > > > > _______________________________________________ > > ltt-dev mailing list > > ltt-dev@lists.casi.polymtl.ca > > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > > > > > _______________________________________________ > ltt-dev mailing list > ltt-dev@lists.casi.polymtl.ca > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O 2009-01-23 3:21 ` [ltt-dev] " KOSAKI Motohiro 2009-01-23 4:03 ` Mathieu Desnoyers @ 2009-02-10 3:36 ` Mathieu Desnoyers 2009-02-10 3:55 ` Nick Piggin 2009-02-10 5:23 ` Linus Torvalds 1 sibling, 2 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-02-10 3:36 UTC (permalink / raw) To: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Linus Torvalds, Ingo Molnar, thomas.pi, Yuriy Lalym Cc: ltt-dev, linux-kernel, linux-mm Related to : http://bugzilla.kernel.org/show_bug.cgi?id=12309 Very annoying I/O latencies (20-30 seconds) are occuring under heavy I/O since ~2.6.18. Yuriy Lalym noticed that the oom killer was eventually called. So I took a look at /proc/meminfo and noticed that under my test case (fio job created from a LTTng block I/O trace, reproducing dd writing to a 20GB file and ssh sessions being opened), the Inactive(file) value increased, and the total memory consumed increased until only 80kB (out of 16GB) were left. So I first used cgroups to limit the memory usable by fio (or dd). This seems to fix the problem. Thomas noted that there seems to be a problem with pages being passed to the block I/O elevator not being counted as dirty. I looked at clear_page_dirty_for_io and noticed that page_mkclean clears the dirty bit and then set_page_dirty(page) is called on the page. This calls mm/page-writeback.c:set_page_dirty(). I assume that the mapping->a_ops->set_page_dirty is NULL, so it calls buffer.c:__set_page_dirty_buffers(). This calls set_buffer_dirty(bh). So we come back in clear_page_dirty_for_io where we decrement the dirty accounting. This is a problem, because we assume that the block layer will re-increment it when it gets the page, but because the buffer is marked as dirty, this won't happen. So this patch fixes this behavior by only decrementing the page accounting _after_ the block I/O writepage has been done. The effect on my workload is that the memory stops being completely filled by page cache under heavy I/O. The vfs_cache_pressure value seems to work again. However, this does not fully solve the high latency issue : when there are enough vfs pages in cache that the pages are being written directly to disk rather than left in the page cache, the CFQ I/O scheduler does not seem to be able to correctly prioritize I/O requests. I think this might be because when this high pressure point is reached, all tasks are blocked in the same way when they try to add pages to the page cache, independently of their I/O priority. Any idea on how to fix this is welcome. Related commits : commit 7658cc289288b8ae7dd2c2224549a048431222b3 Author: Linus Torvalds <torvalds@macmini.osdl.org> Date: Fri Dec 29 10:00:58 2006 -0800 VM: Fix nasty and subtle race in shared mmap'ed page writeback commit 8c08540f8755c451d8b96ea14cfe796bc3cd712d Author: Andrew Morton <akpm@osdl.org> Date: Sun Dec 10 02:19:24 2006 -0800 [PATCH] clean up __set_page_dirty_nobuffers() Both were merged Dec 2006, which is between kernel v2.6.19 and v2.6.20-rc3. This patch applies on 2.6.29-rc3. 
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> CC: Jens Axboe <jens.axboe@oracle.com> CC: akpm@linux-foundation.org CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Ingo Molnar <mingo@elte.hu> CC: thomas.pi@arcor.dea CC: Yuriy Lalym <ylalym@gmail.com> --- mm/page-writeback.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) Index: linux-2.6-lttng/mm/page-writeback.c =================================================================== --- linux-2.6-lttng.orig/mm/page-writeback.c 2009-02-09 20:18:41.000000000 -0500 +++ linux-2.6-lttng/mm/page-writeback.c 2009-02-09 20:42:39.000000000 -0500 @@ -945,6 +945,7 @@ int write_cache_pages(struct address_spa int cycled; int range_whole = 0; long nr_to_write = wbc->nr_to_write; + int lazyaccounting; if (wbc->nonblocking && bdi_write_congested(bdi)) { wbc->encountered_congestion = 1; @@ -1028,10 +1029,18 @@ continue_unlock: } BUG_ON(PageWriteback(page)); - if (!clear_page_dirty_for_io(page)) + lazyaccounting = clear_page_dirty_for_io(page); + if (!lazyaccounting) goto continue_unlock; ret = (*writepage)(page, wbc, data); + + if (lazyaccounting == 2) { + dec_zone_page_state(page, NR_FILE_DIRTY); + dec_bdi_stat(mapping->backing_dev_info, + BDI_RECLAIMABLE); + } + if (unlikely(ret)) { if (ret == AOP_WRITEPAGE_ACTIVATE) { unlock_page(page); @@ -1149,6 +1158,7 @@ int write_one_page(struct page *page, in { struct address_space *mapping = page->mapping; int ret = 0; + int lazyaccounting; struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL, .nr_to_write = 1, @@ -1159,7 +1169,8 @@ int write_one_page(struct page *page, in if (wait) wait_on_page_writeback(page); - if (clear_page_dirty_for_io(page)) { + lazyaccounting = clear_page_dirty_for_io(page); + if (lazyaccounting) { page_cache_get(page); ret = mapping->a_ops->writepage(page, &wbc); if (ret == 0 && wait) { @@ -1167,6 +1178,11 @@ int write_one_page(struct page *page, in if (PageError(page)) ret = -EIO; } + if (lazyaccounting == 2) { + dec_zone_page_state(page, NR_FILE_DIRTY); + dec_bdi_stat(mapping->backing_dev_info, + BDI_RECLAIMABLE); + } page_cache_release(page); } else { unlock_page(page); @@ -1312,6 +1328,11 @@ EXPORT_SYMBOL(set_page_dirty_lock); * * This incoherency between the page's dirty flag and radix-tree tag is * unfortunate, but it only exists while the page is locked. + * + * Return values : + * 0 : page is not dirty + * 1 : page is dirty, no lazy accounting update still have to be performed + * 2 : page is direct *and* lazy accounting update must still be performed */ int clear_page_dirty_for_io(struct page *page) { @@ -1358,12 +1379,8 @@ int clear_page_dirty_for_io(struct page * the desired exclusion. See mm/memory.c:do_wp_page() * for more comments. */ - if (TestClearPageDirty(page)) { - dec_zone_page_state(page, NR_FILE_DIRTY); - dec_bdi_stat(mapping->backing_dev_info, - BDI_RECLAIMABLE); - return 1; - } + if (TestClearPageDirty(page)) + return 2; return 0; } return TestClearPageDirty(page); -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
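
The cgroup workaround mentioned in the patch description (limiting the memory usable by fio or dd) is not spelled out in the thread; a minimal sketch using the memory controller of that era, with the mount point, group name, limit and output file chosen arbitrarily:

#!/bin/sh
# Sketch: confine the heavy writer's page cache with the memory cgroup
# controller so a streaming write cannot consume nearly all of RAM.
mkdir -p /cgroup
mount -t cgroup -o memory none /cgroup              # needs the memory resource controller
mkdir /cgroup/heavy-io
echo 512M > /cgroup/heavy-io/memory.limit_in_bytes  # arbitrary limit for the example
echo $$ > /cgroup/heavy-io/tasks                    # move this shell into the group
dd if=/dev/zero of=/tmp/bigfile bs=1M count=20480   # 20GB writer now runs under the limit
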
* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O 2009-02-10 3:36 ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers @ 2009-02-10 3:55 ` Nick Piggin 2009-02-10 5:23 ` Linus Torvalds 1 sibling, 0 replies; 39+ messages in thread From: Nick Piggin @ 2009-02-10 3:55 UTC (permalink / raw) To: Mathieu Desnoyers Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Linus Torvalds, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm On Tuesday 10 February 2009 14:36:53 Mathieu Desnoyers wrote: > Related to : > http://bugzilla.kernel.org/show_bug.cgi?id=12309 > > Very annoying I/O latencies (20-30 seconds) are occuring under heavy I/O > since ~2.6.18. > > Yuriy Lalym noticed that the oom killer was eventually called. So I took a > look at /proc/meminfo and noticed that under my test case (fio job created > from a LTTng block I/O trace, reproducing dd writing to a 20GB file and ssh > sessions being opened), the Inactive(file) value increased, and the total > memory consumed increased until only 80kB (out of 16GB) were left. > > So I first used cgroups to limit the memory usable by fio (or dd). This > seems to fix the problem. > > Thomas noted that there seems to be a problem with pages being passed to > the block I/O elevator not being counted as dirty. I looked at > clear_page_dirty_for_io and noticed that page_mkclean clears the dirty bit > and then set_page_dirty(page) is called on the page. This calls > mm/page-writeback.c:set_page_dirty(). I assume that the > mapping->a_ops->set_page_dirty is NULL, so it calls > buffer.c:__set_page_dirty_buffers(). This calls set_buffer_dirty(bh). > > So we come back in clear_page_dirty_for_io where we decrement the dirty > accounting. This is a problem, because we assume that the block layer will > re-increment it when it gets the page, but because the buffer is marked as > dirty, this won't happen. > > So this patch fixes this behavior by only decrementing the page accounting > _after_ the block I/O writepage has been done. > > The effect on my workload is that the memory stops being completely filled > by page cache under heavy I/O. The vfs_cache_pressure value seems to work > again. I don't think we're supposed to assume the block layer will re-increment the dirty count? It should be all in the VM. And the VM should increment writeback count before sending it to the block device, and dirty page throttling also takes into account the number of writeback pages, so it should not be allowed to fill up memory with dirty pages even if the block device queue size is unlimited. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O 2009-02-10 3:36 ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers 2009-02-10 3:55 ` Nick Piggin @ 2009-02-10 5:23 ` Linus Torvalds 2009-02-10 5:56 ` Nick Piggin 2009-02-10 6:12 ` Mathieu Desnoyers 1 sibling, 2 replies; 39+ messages in thread From: Linus Torvalds @ 2009-02-10 5:23 UTC (permalink / raw) To: Mathieu Desnoyers Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm On Mon, 9 Feb 2009, Mathieu Desnoyers wrote: > > So this patch fixes this behavior by only decrementing the page accounting > _after_ the block I/O writepage has been done. This makes no sense, really. Or rather, I don't mind the notion of updating the counters only after IO per se, and _that_ part of it probably makes sense. But why is it that you only then fix up two of the call-sites. There's a lot more call-sites than that for this function. So if this really makes a big difference, that's an interesting starting point for discussion, but I don't see how this particular patch could possibly be the right thing to do. Linus ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O 2009-02-10 5:23 ` Linus Torvalds @ 2009-02-10 5:56 ` Nick Piggin 2009-02-10 6:12 ` Mathieu Desnoyers 1 sibling, 0 replies; 39+ messages in thread From: Nick Piggin @ 2009-02-10 5:56 UTC (permalink / raw) To: Linus Torvalds Cc: Mathieu Desnoyers, KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm On Tuesday 10 February 2009 16:23:56 Linus Torvalds wrote: > On Mon, 9 Feb 2009, Mathieu Desnoyers wrote: > > So this patch fixes this behavior by only decrementing the page > > accounting _after_ the block I/O writepage has been done. > > This makes no sense, really. > > Or rather, I don't mind the notion of updating the counters only after IO > per se, and _that_ part of it probably makes sense. But why is it that you > only then fix up two of the call-sites. There's a lot more call-sites than > that for this function. Well if you do that, then I'd think you also have to change some calculations that today use dirty+writeback. In some ways it does make sense, but OTOH it is natural in the pagecache since it was introduced to treat writeback as basically equivalent to dirty. So writeback && !dirty pages shouldn't cause things to blow up, or if it does then hopefully it is a simple bug somewhere. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O 2009-02-10 5:23 ` Linus Torvalds 2009-02-10 5:56 ` Nick Piggin @ 2009-02-10 6:12 ` Mathieu Desnoyers 1 sibling, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-02-10 6:12 UTC (permalink / raw) To: Linus Torvalds Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm * Linus Torvalds (torvalds@linux-foundation.org) wrote: > > > On Mon, 9 Feb 2009, Mathieu Desnoyers wrote: > > > > So this patch fixes this behavior by only decrementing the page accounting > > _after_ the block I/O writepage has been done. > > This makes no sense, really. > > Or rather, I don't mind the notion of updating the counters only after IO > per se, and _that_ part of it probably makes sense. But why is it that you > only then fix up two of the call-sites. There's a lot more call-sites than > that for this function. > > So if this really makes a big difference, that's an interesting starting > point for discussion, but I don't see how this particular patch could > possibly be the right thing to do. > Yes, you are right. Looking in more details at /proc/meminfo under the workload, I notice this : MemTotal: 16028812 kB MemFree: 13651440 kB Buffers: 8944 kB Cached: 2209456 kB <--- increments up to ~16GB cached = global_page_state(NR_FILE_PAGES) - total_swapcache_pages - i.bufferram; SwapCached: 0 kB Active: 34668 kB Inactive: 2200668 kB <--- also K(pages[LRU_INACTIVE_ANON] + pages[LRU_INACTIVE_FILE]), Active(anon): 17136 kB Inactive(anon): 0 kB Active(file): 17532 kB Inactive(file): 2200668 kB <--- also K(pages[LRU_INACTIVE_FILE]), Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 19535024 kB SwapFree: 19535024 kB Dirty: 1159036 kB Writeback: 0 kB <--- stays close to 0 AnonPages: 17060 kB Mapped: 9476 kB Slab: 96188 kB SReclaimable: 79776 kB SUnreclaim: 16412 kB PageTables: 3364 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 27549428 kB Committed_AS: 54292 kB VmallocTotal: 34359738367 kB VmallocUsed: 9960 kB VmallocChunk: 34359727667 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 7552 kB DirectMap2M: 16769024 kB So I think simply substracting K(pages[LRU_INACTIVE_FILE]) from avail_dirty in clip_bdi_dirty_limit() and to consider it in balance_dirty_pages() and throttle_vm_writeout() would probably make my problem go away, but I would like to understand exactly why this is needed and if I would need to consider other types of page counts that would have been forgotten. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
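
A literal reading of the suggestion above would look roughly like the following. This is hypothetical and untested, written only to make the proposal concrete; the helper does not come from any posted patch.

        #include <linux/mm.h>
        #include <linux/vmstat.h>

        /*
         * Hypothetical illustration: subtract the inactive file pages from
         * the space considered available for dirtying, so a streaming writer
         * cannot keep growing the page cache until the OOM killer fires.
         */
        static unsigned long adjusted_avail_dirty(unsigned long avail_dirty)
        {
                unsigned long inactive_file =
                        global_page_state(NR_INACTIVE_FILE);

                if (avail_dirty <= inactive_file)
                        return 0;
                return avail_dirty - inactive_file;
        }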
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 12:28 ` Jens Axboe 2009-01-20 14:22 ` [ltt-dev] " Mathieu Desnoyers 2009-01-20 23:27 ` Mathieu Desnoyers @ 2009-02-02 2:08 ` Mathieu Desnoyers 2009-02-02 11:26 ` Jens Axboe 2 siblings, 1 reply; 39+ messages in thread From: Mathieu Desnoyers @ 2009-02-02 2:08 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev Hi Jens, I tried your patch at http://bugzilla.kernel.org/attachment.cgi?id=20001 On a 2.6.29-rc3 kernel. I get the following OOPS just after I start running the fio test. It happens after a few cfq: moving ffff88043d4b42e0 to dispatch cfq: moving ffff88043d4b4170 to dispatch messages (~20). Here is the oops : ------------[ cut here ]------------ kernel BUG at block/cfq-iosched.c:650! invalid opcode: 0000 [#1] PREEMPT SMP LTT NESTING LEVEL : 0 last sysfs file: /sys/block/sda/stat CPU 2 Modules linked in: loop ltt_tracer ltt_trace_control ltt_userspa] Pid: 2934, comm: kjournald Not tainted 2.6.29-rc3 #3 RIP: 0010:[<ffffffff80419c2b>] [<ffffffff80419c2b>] cfq_remove_0 RSP: 0018:ffff88043b167c20 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff88043fd9e088 RCX: 0000000000000001 RDX: 0000000000000010 RSI: ffff88043887b590 RDI: ffff88043887b590 RBP: ffff88043b167c50 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043fd9e088 R13: ffff88043887b590 R14: ffff88043fc40200 R15: ffff88043fd9e088 FS: 0000000000000000(0000) GS:ffff88043e81a080(0000) knlGS:00000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00007f2a5f98b8c0 CR3: 000000043e8c4000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kjournald (pid: 2934, threadinfo ffff88043b166000, task ) Stack: 000000000000003b ffff88043887b590 ffff88043fd9e088 ffff88043e5a0 ffff88043fc40200 ffff88002809ed50 ffff88043b167c80 ffffffff8041d 0000000000000001 ffff88043887b590 ffffe2001b805138 ffff88043e5a0 Call Trace: [<ffffffff80419e4d>] cfq_dispatch_insert+0x3d/0x70 [<ffffffff80419f2f>] cfq_wait_on_page+0xaf/0xc0 [<ffffffff804098ed>] elv_wait_on_page+0x1d/0x20 [<ffffffff8040d207>] blk_backing_dev_wop+0x17/0x50 [<ffffffff80301872>] sync_buffer+0x52/0x80 [<ffffffff806a33b2>] __wait_on_bit+0x62/0x90 [<ffffffff80301820>] ? sync_buffer+0x0/0x80 [<ffffffff80301820>] ? sync_buffer+0x0/0x80 [<ffffffff806a3459>] out_of_line_wait_on_bit+0x79/0x90 [<ffffffff8025a8a0>] ? wake_bit_function+0x0/0x50 [<ffffffff80301769>] __wait_on_buffer+0xf9/0x130 [<ffffffff80379acd>] journal_commit_transaction+0x72d/0x1650 [<ffffffff806a5c87>] ? _spin_unlock_irqrestore+0x47/0x80 [<ffffffff8024dd2f>] ? try_to_del_timer_sync+0x5f/0x70 [<ffffffff8037e488>] kjournald+0xe8/0x250 [<ffffffff8025a860>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8037e3a0>] ? kjournald+0x0/0x250 [<ffffffff8025a38e>] kthread+0x4e/0x90 [<ffffffff8025a340>] ? kthread+0x0/0x90 [<ffffffff8020db2a>] child_rip+0xa/0x20 [<ffffffff8020d480>] ? restore_args+0x0/0x30 [<ffffffff8025a340>] ? kthread+0x0/0x90 [<ffffffff8020db20>] ? 
child_rip+0x0/0x20 Code: 4d 89 6d 00 49 8b 9d c0 00 00 00 41 8b 45 48 4c 8b 73 08 2 RIP [<ffffffff80419c2b>] cfq_remove_request+0x6b/0x250 RSP <ffff88043b167c20> ---[ end trace eab134a8bd405d05 ]--- It seems that the cfqq->queued[sync] counter should either be incremented/decremented in the new cfq_wait_on_page, or that the fact that the type of request (sync vs !sync) changes would not be taken care of correctly. I have not looked at the code enough to find out exactly what is happening, but I though you might have an idea of the cause. Thanks, Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
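
The accounting invariant in question can be pictured with a toy model. This is not the cfq-iosched.c code, just an illustration of the pairing that the BUG_ON presumably checks.

        #include <linux/kernel.h>

        /*
         * Toy model of the per-queue request accounting: every insertion
         * into a cfqq bumps a counter for its sync class, every removal
         * decrements it.  If the test patch re-inserts a request, or the
         * sync vs !sync classification changes between the two operations
         * without the counter following, the removal underflows and the
         * BUG_ON fires.
         */
        struct toy_cfqq {
                int queued[2];          /* [0] = async, [1] = sync */
        };

        static void toy_add_request(struct toy_cfqq *cfqq, int is_sync)
        {
                cfqq->queued[is_sync]++;
        }

        static void toy_remove_request(struct toy_cfqq *cfqq, int is_sync)
        {
                BUG_ON(!cfqq->queued[is_sync]);
                cfqq->queued[is_sync]--;
        }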
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-02-02 2:08 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers @ 2009-02-02 11:26 ` Jens Axboe 2009-02-03 0:46 ` Mathieu Desnoyers 0 siblings, 1 reply; 39+ messages in thread From: Jens Axboe @ 2009-02-02 11:26 UTC (permalink / raw) To: Mathieu Desnoyers Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Sun, Feb 01 2009, Mathieu Desnoyers wrote: > Hi Jens, > > I tried your patch at > > http://bugzilla.kernel.org/attachment.cgi?id=20001 > > On a 2.6.29-rc3 kernel. I get the following OOPS just after I start > running the fio test. It happens after a few > > cfq: moving ffff88043d4b42e0 to dispatch > cfq: moving ffff88043d4b4170 to dispatch > > messages (~20). > > Here is the oops : > > ------------[ cut here ]------------ > kernel BUG at block/cfq-iosched.c:650! > invalid opcode: 0000 [#1] PREEMPT SMP > LTT NESTING LEVEL : 0 > last sysfs file: /sys/block/sda/stat > CPU 2 > Modules linked in: loop ltt_tracer ltt_trace_control ltt_userspa] > Pid: 2934, comm: kjournald Not tainted 2.6.29-rc3 #3 > RIP: 0010:[<ffffffff80419c2b>] [<ffffffff80419c2b>] cfq_remove_0 > RSP: 0018:ffff88043b167c20 EFLAGS: 00010046 > RAX: 0000000000000000 RBX: ffff88043fd9e088 RCX: 0000000000000001 > RDX: 0000000000000010 RSI: ffff88043887b590 RDI: ffff88043887b590 > RBP: ffff88043b167c50 R08: 0000000000000002 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043fd9e088 > R13: ffff88043887b590 R14: ffff88043fc40200 R15: ffff88043fd9e088 > FS: 0000000000000000(0000) GS:ffff88043e81a080(0000) knlGS:00000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 00007f2a5f98b8c0 CR3: 000000043e8c4000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process kjournald (pid: 2934, threadinfo ffff88043b166000, task ) > Stack: > 000000000000003b ffff88043887b590 ffff88043fd9e088 ffff88043e5a0 > ffff88043fc40200 ffff88002809ed50 ffff88043b167c80 ffffffff8041d > 0000000000000001 ffff88043887b590 ffffe2001b805138 ffff88043e5a0 > Call Trace: > [<ffffffff80419e4d>] cfq_dispatch_insert+0x3d/0x70 > [<ffffffff80419f2f>] cfq_wait_on_page+0xaf/0xc0 > [<ffffffff804098ed>] elv_wait_on_page+0x1d/0x20 > [<ffffffff8040d207>] blk_backing_dev_wop+0x17/0x50 > [<ffffffff80301872>] sync_buffer+0x52/0x80 > [<ffffffff806a33b2>] __wait_on_bit+0x62/0x90 > [<ffffffff80301820>] ? sync_buffer+0x0/0x80 > [<ffffffff80301820>] ? sync_buffer+0x0/0x80 > [<ffffffff806a3459>] out_of_line_wait_on_bit+0x79/0x90 > [<ffffffff8025a8a0>] ? wake_bit_function+0x0/0x50 > [<ffffffff80301769>] __wait_on_buffer+0xf9/0x130 > [<ffffffff80379acd>] journal_commit_transaction+0x72d/0x1650 > [<ffffffff806a5c87>] ? _spin_unlock_irqrestore+0x47/0x80 > [<ffffffff8024dd2f>] ? try_to_del_timer_sync+0x5f/0x70 > [<ffffffff8037e488>] kjournald+0xe8/0x250 > [<ffffffff8025a860>] ? autoremove_wake_function+0x0/0x40 > [<ffffffff8037e3a0>] ? kjournald+0x0/0x250 > [<ffffffff8025a38e>] kthread+0x4e/0x90 > [<ffffffff8025a340>] ? kthread+0x0/0x90 > [<ffffffff8020db2a>] child_rip+0xa/0x20 > [<ffffffff8020d480>] ? restore_args+0x0/0x30 > [<ffffffff8025a340>] ? kthread+0x0/0x90 > [<ffffffff8020db20>] ? 
child_rip+0x0/0x20 > Code: 4d 89 6d 00 49 8b 9d c0 00 00 00 41 8b 45 48 4c 8b 73 08 2 > RIP [<ffffffff80419c2b>] cfq_remove_request+0x6b/0x250 > RSP <ffff88043b167c20> > ---[ end trace eab134a8bd405d05 ]--- > > It seems that the cfqq->queued[sync] counter should either be > incremented/decremented in the new cfq_wait_on_page, or that the fact > that the type of request (sync vs !sync) changes would not be taken care > of correctly. I have not looked at the code enough to find out exactly > what is happening, but I though you might have an idea of the cause. Just ignore the patch for now, I'm not going to be spending more time on it. It was just an attempt at a quick test, I don't think this approach is very feasible since it doesn't appear to be the root of the problem. In any case, were we to continue on this path, the accounting logic in CFQ would have to be adjusted for this new behaviour. Otherwise there's a big risk of giving great preference to async writeout once things get tight. It's also working around the real problem for this specific issue, which is that you just don't want to have sync apps blocked waiting for async writeout in the first place. -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-02-02 11:26 ` Jens Axboe @ 2009-02-03 0:46 ` Mathieu Desnoyers 0 siblings, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-02-03 0:46 UTC (permalink / raw) To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Jens Axboe (jens.axboe@oracle.com) wrote: > It's also working around the real problem for this specific issue, which > is that you just don't want to have sync apps blocked waiting for async > writeout in the first place. > Maybe I could help to identify criterion for such sync requests which are treated as async. From a newcomer's look at the situation, I would assume that : - Small I/O requests - I/O requests caused by major page faults, except those caused by access to mmapped files which result in large consecutive file reads/writes. Should never *ever* fall into the async I/O request path. Am I correct ? If yes, then I could trigger some tracing test cases and identify the faulty scenarios with LTTng. Maybe the solution does not sit only within the block I/O layer : I guess we would also have to find out what is considered a "large" and a "small" I/O request. I think using open() flags to specify if I/O is expected to be synchronous or asynchronous for a particular file would be a good start (AFAIK, only O_DIRECT seems to be close to this, but it also has the side-effect of not using any kernel buffering, which I am not sure is wanted in every case). If this implies adding new flags to open(), then supporting older apps could be done by heuristics on the size of the requests. New applications which have very specific needs (e.g. large synchronous I/O) could be tuned with the new flags. Any small request coming from the page fault handler would be treated as synchronous. Requests coming from the page fault handler on a particular mmapped file would behave following the sync/async flags of the associated open(). If not flag is specified, the heuristic would apply to the resulting merged requests from the page fault handler. Therefore, large consecutive reads of mmapped files would fall in the "async" category by default. mmap of shared libraries and memory mapping done by exec() should clearly specify the "sync" flag, because those accesses *will* cause delays when the application needs to be executed. Hopefully what I am saying here makes sense. If you have links to some background information to point me to so I get a better understanding of how async vs sync requests are handled by the CFQ, I would greatly appreciate. Best regards, Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
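
As a strawman, the heuristic sketched in the message above for applications that pass no explicit flag could look like this. The names, parameters and the 128k threshold are invented for illustration; nothing here is existing or proposed kernel API.

        /*
         * Hypothetical classification helper: page-fault I/O (except large
         * contiguous mmap streaming) and small requests are treated as
         * latency sensitive ("sync"); large streaming requests are not.
         */
        #define SMALL_RQ_BYTES  (128 * 1024)    /* arbitrary placeholder */

        static int rq_is_latency_sensitive(unsigned int bytes,
                                           int from_page_fault,
                                           int large_contiguous_mmap_io)
        {
                if (from_page_fault && !large_contiguous_mmap_io)
                        return 1;
                return bytes <= SMALL_RQ_BYTES;
        }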
* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 7:37 ` Jens Axboe 2009-01-20 12:28 ` Jens Axboe @ 2009-01-20 13:45 ` Mathieu Desnoyers 2009-01-20 20:22 ` Ben Gamari 2 siblings, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 13:45 UTC (permalink / raw) To: Jens Axboe Cc: linux-kernel, ltt-dev, Andrea Arcangeli, akpm, Linus Torvalds, Ingo Molnar * Jens Axboe (jens.axboe@oracle.com) wrote: > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > > * Jens Axboe (jens.axboe@oracle.com) wrote: > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote: > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace > > > > to create a fio job file. The said behavior is appended below as "Part > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done > > > > with the anticipatory I/O scheduler, which was active by default on my > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am > > > > running this on a raid-1, but have experienced the same problem on a > > > > standard partition I created on the same machine. > > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It > > > > consists of one dd-like job and many small jobs reading as many data as > > > > ls did. I used the small test script to batch run this ("Part 3 - batch > > > > test"). > > > > > > > > The results for the ls-like jobs are interesting : > > > > > > > > I/O scheduler runt-min (msec) runt-max (msec) > > > > noop 41 10563 > > > > anticipatory 63 8185 > > > > deadline 52 33387 > > > > cfq 43 1420 > > > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did > > not make much difference (also tried with NO_HZ enabled). > > > > > Do you have queuing enabled on your drives? You can check that in > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all > > > schedulers, would be good for comparison. > > > > > > > Here are the tests with a queue_depth of 1 : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > noop 43 38235 > > anticipatory 44 8728 > > deadline 51 19751 > > cfq 48 427 > > > > > > Overall, I wouldn't say it makes much difference. > > 0,5 seconds vs 1,5 seconds isn't much of a difference? > threefold.. yes, that's significant, but not in term of usability in that specific case. > > > raid personalities or dm complicates matters, since it introduces a > > > disconnect between 'ls' and the io scheduler at the bottom... > > > > > > > Yes, ideally I should re-run those directly on the disk partitions. > > At least for comparison. > Here it is. ssh test done on /dev/sda directly queue_depth=31 (default) /sys/block/sda/queue/iosched/slice_async_rq = 2 (default) /sys/block/sda/queue/iosched/quantum = 4 (default) I/O scheduler runt-min (msec) runt-max (msec) noop 612 205684 anticipatory 562 5555 deadline 505 113153 cfq 523 6637 > > I am also tempted to create a fio job file which acts like a ssh server > > receiving a connexion after it has been pruned from the cache while the > > system if doing heavy I/O. "ssh", in this case, seems to be doing much > > more I/O than a simple "ls", and I think we might want to see if cfq > > behaves correctly in such case. 
Most of this I/O is coming from page > > faults (identified as traps in the trace) probably because the ssh > > executable has been thrown out of the cache by > > > > echo 3 > /proc/sys/vm/drop_caches > > > > The behavior of an incoming ssh connexion after clearing the cache is > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The > > job file created (Part 2) reads, for each job, a 2MB file with random > > reads each between 4k-44k. The results are very interesting for cfq : > > > > I/O scheduler runt-min (msec) runt-max (msec) > > noop 586 110242 > > anticipatory 531 26942 > > deadline 561 108772 > > cfq 523 28216 > > > > So, basically, ssh being out of the cache can take 28s to answer an > > incoming ssh connexion even with the cfq scheduler. This is not exactly > > what I would call an acceptable latency. > > At some point, you have to stop and consider what is acceptable > performance for a given IO pattern. Your ssh test case is purely random > IO, and neither CFQ nor AS would do any idling for that. We can make > this test case faster for sure, the hard part is making sure that we > don't regress on async throughput at the same time. > > Also remember that with your raid1, it's not entirely reasonable to > blaim all performance issues on the IO scheduler as per my previous > mail. It would be a lot more fair to view the disk numbers individually. > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set > to 1 as well? > Sure, ssh test done on /dev/sda queue_depth=31 (default) /sys/block/sda/queue/iosched/slice_async_rq = 1 /sys/block/sda/queue/iosched/quantum = 1 I/O scheduler runt-min (msec) runt-max (msec) cfq (default) 523 6637 cfq (s_rq=1,q=1) 503 6743 It did not do much difference. Mathieu > However, I think we should be doing somewhat better at this test case. > > -- > Jens Axboe > > > _______________________________________________ > ltt-dev mailing list > ltt-dev@lists.casi.polymtl.ca > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 7:37 ` Jens Axboe 2009-01-20 12:28 ` Jens Axboe 2009-01-20 13:45 ` [ltt-dev] " Mathieu Desnoyers @ 2009-01-20 20:22 ` Ben Gamari 2009-01-20 22:23 ` Ben Gamari 2009-01-22 2:35 ` Ben Gamari 2 siblings, 2 replies; 39+ messages in thread From: Ben Gamari @ 2009-01-20 20:22 UTC (permalink / raw) To: Jens Axboe Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote: > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: >> * Jens Axboe (jens.axboe@oracle.com) wrote: >> Yes, ideally I should re-run those directly on the disk partitions. > > At least for comparison. > I just completed my own set of benchmarks using the fio job file Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition formatted as ext3. As you can see, I tested all of the available schedulers with both queuing enabled and disabled. I'll test the Jens' patch soon. Would a blktrace of the fio run help? Let me know if there's any other benchmarking or profiling that could be done. Thanks, - Ben mint maxt ========================================================== queue_depth=31: anticipatory 35 msec 11036 msec cfq 37 msec 3350 msec deadline 36 msec 18144 msec noop 39 msec 41512 msec ========================================================== queue_depth=1: anticipatory 45 msec 9561 msec cfq 28 msec 3974 msec deadline 47 msec 16802 msec noop 35 msec 38173 msec ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 20:22 ` Ben Gamari @ 2009-01-20 22:23 ` Ben Gamari 2009-01-20 23:05 ` Mathieu Desnoyers 2009-01-22 2:35 ` Ben Gamari 1 sibling, 1 reply; 39+ messages in thread From: Ben Gamari @ 2009-01-20 22:23 UTC (permalink / raw) To: Jens Axboe Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev The kernel build finally finished. Unfortunately, it crashes quickly after booting with moderate disk IO, bringing down the entire machine. For this reason, I haven't been able to complete a fio benchmark. Jens, what do you think about this backtrace? - Ben BUG: unable to handle kernel paging request at 0000000008 IP: [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0x1da PGD b2902067 PUD b292e067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0t CPU 0 Modules linked in: aes_x86_64 aes_generic i915 drm i2c_algo_bit rfcomm bridge s] Pid: 3903, comm: evolution Not tainted 2.6.29-rc2ben #16 RIP: 0010:[<ffffffff811c4b2d>] [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0xa RSP: 0018:ffff8800bb853758 EFLAGS: 00010006 RAX: 0000000000200200 RBX: ffff8800b28f3420 RCX: 0000000009deabeb RDX: 0000000000100100 RSI: ffff8800b010afd0 RDI: ffff8800b010afd0 RBP: ffff8800bb853788 R08: ffff88011fc08250 R09: 000000000cf8b20b R10: 0000000009e15923 R11: ffff8800b28f3420 R12: ffff8800b010afd0 R13: ffff8800b010afd0 R14: ffff88011d4e8000 R15: ffff88011fc08220 FS: 00007f4b1ef407e0(0000) GS:ffffffff817e7000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000100108 CR3: 00000000b284b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process evolution (pid: 3903, threadinfo ffff8800bb852000, task ffff8800da0c2de) Stack: ffffffff811ccc19 ffff88011fc08220 ffff8800b010afd0 ffff88011d572000 ffff88011d4e8000 ffff88011d572000 ffff8800bb8537b8 ffffffff811c4ca8 ffff88011fc08220 ffff88011d572000 ffff8800b010afd0 ffff88011fc08250 Call Trace: [<ffffffff811ccc19>] ? rb_insert_color+0xbd/0xe6 [<ffffffff811c4ca8>] cfq_dispatch_insert+0x51/0x72 [<ffffffff811c4d0d>] cfq_add_rq_rb+0x44/0xcf [<ffffffff811c5519>] cfq_insert_request+0x34d/0x3d1 [<ffffffff811b6d81>] elv_insert+0x1a9/0x250 [<ffffffff811b6ec3>] __elv_add_request+0x9b/0xa4 [<ffffffff811b9769>] __make_request+0x3c4/0x446 [<ffffffff811b7f53>] generic_make_request+0x2bf/0x309 [<ffffffff811b8068>] submit_bio+0xcb/0xd4 [<ffffffff810f170b>] submit_bh+0x115/0x138 [<ffffffff810f31f7>] ll_rw_block+0xa5/0xf4 [<ffffffff810f3886>] __block_prepare_write+0x277/0x306 [<ffffffff8112c759>] ? ext3_get_block+0x0/0x101 [<ffffffff810f3a7e>] block_write_begin+0x8b/0xdd [<ffffffff8112bd66>] ext3_write_begin+0xee/0x1c0 [<ffffffff8112c759>] ? ext3_get_block+0x0/0x101 [<ffffffff8109f3be>] generic_file_buffered_write+0x12e/0x2e4 [<ffffffff8109f973>] __generic_file_aio_write_nolock+0x263/0x297 [<ffffffff810e4470>] ? touch_atime+0xdf/0x101 [<ffffffff8109feaa>] ? generic_file_aio_read+0x503/0x59c [<ffffffff810a01ed>] generic_file_aio_write+0x6c/0xc8 [<ffffffff81128c72>] ext3_file_write+0x23/0xa5 [<ffffffff810d2d77>] do_sync_write+0xec/0x132 [<ffffffff8105da1c>] ? autoremove_wake_function+0x0/0x3d [<ffffffff8119c880>] ? selinux_file_permission+0x40/0xcb [<ffffffff8119c902>] ? selinux_file_permission+0xc2/0xcb [<ffffffff81194cc4>] ? 
security_file_permission+0x16/0x18 [<ffffffff810d3693>] vfs_write+0xb0/0x10a [<ffffffff810d37bb>] sys_write+0x4c/0x74 [<ffffffff810114aa>] system_call_fastpath+0x16/0x1b Code: 48 85 c0 74 0c 4c 39 e0 48 8d b0 60 ff ff ff 75 02 31 f6 48 8b 7d d0 48 8 RIP [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0x1da RSP <ffff8800bb853758> CR2: 0000000000100108 ---[ end trace 6c5ef63f7957c4cf ]--- On Tue, Jan 20, 2009 at 3:22 PM, Ben Gamari <bgamari@gmail.com> wrote: > On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote: >> On Mon, Jan 19 2009, Mathieu Desnoyers wrote: >>> * Jens Axboe (jens.axboe@oracle.com) wrote: >>> Yes, ideally I should re-run those directly on the disk partitions. >> >> At least for comparison. >> > > I just completed my own set of benchmarks using the fio job file > Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition > formatted as ext3. As you can see, I tested all of the available > schedulers with both queuing enabled and disabled. I'll test the Jens' > patch soon. Would a blktrace of the fio run help? Let me know if > there's any other benchmarking or profiling that could be done. > Thanks, > > - Ben > > > mint maxt > ========================================================== > queue_depth=31: > anticipatory 35 msec 11036 msec > cfq 37 msec 3350 msec > deadline 36 msec 18144 msec > noop 39 msec 41512 msec > > ========================================================== > queue_depth=1: > anticipatory 45 msec 9561 msec > cfq 28 msec 3974 msec > deadline 47 msec 16802 msec > noop 35 msec 38173 msec > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 22:23 ` Ben Gamari @ 2009-01-20 23:05 ` Mathieu Desnoyers 0 siblings, 0 replies; 39+ messages in thread From: Mathieu Desnoyers @ 2009-01-20 23:05 UTC (permalink / raw) To: Ben Gamari Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev * Ben Gamari (bgamari@gmail.com) wrote: > The kernel build finally finished. Unfortunately, it crashes quickly > after booting with moderate disk IO, bringing down the entire machine. > For this reason, I haven't been able to complete a fio benchmark. > Jens, what do you think about this backtrace? > Hi Ben, Try with this new patch I just did. It solves the problem for me. Jens seems to have done a list_del in a non-safe list iteration. Mathieu Fixes cfq iosched test patch Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> --- block/cfq-iosched.c | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) Index: linux-2.6-lttng/block/cfq-iosched.c =================================================================== --- linux-2.6-lttng.orig/block/cfq-iosched.c 2009-01-20 10:31:46.000000000 -0500 +++ linux-2.6-lttng/block/cfq-iosched.c 2009-01-20 17:41:06.000000000 -0500 @@ -1761,6 +1761,36 @@ cfq_update_idle_window(struct cfq_data * } /* + * Pull dispatched requests from 'cfqq' back into the scheduler + */ +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd, + struct cfq_queue *cfqq) +{ + struct request_queue *q = cfqd->queue; + struct request *rq, *tmp; + + list_for_each_entry_safe_reverse(rq, tmp, &q->queue_head, queuelist) { + if (rq->cmd_flags & REQ_STARTED) + break; + + if (RQ_CFQQ(rq) != cfqq) + continue; + + /* + * Pull off the dispatch list and put it back into the cfqq + */ + list_del(&rq->queuelist); + cfqq->dispatched--; + if (cfq_cfqq_sync(cfqq)) + cfqd->sync_flight--; + + cfq_add_rq_rb(rq); + q->nr_sorted++; + list_add_tail(&rq->queuelist, &cfqq->fifo); + } +} + +/* * Check if new_cfqq should preempt the currently active queue. Return 0 for * no or if we aren't sure, a 1 will cause a preempt. */ @@ -1816,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd */ static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq) { + struct cfq_queue *old_cfqq = cfqd->active_queue; + cfq_log_cfqq(cfqd, cfqq, "preempt"); - cfq_slice_expired(cfqd, 1); + + if (old_cfqq) { + __cfq_slice_expired(cfqd, old_cfqq, 1); + cfq_pull_dispatched_requests(cfqd, old_cfqq); + } /* * Put the new queue at the front of the of the current list, -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 39+ messages in thread
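
The crash Mathieu attributes to a list_del() inside a non-safe iteration is a classic bug class; a minimal standalone illustration (not the cfq code itself) is below.

        #include <linux/list.h>

        /*
         * list_for_each_entry() advances through the node it just visited,
         * so calling list_del() on that node leaves the iterator pointing at
         * unlinked memory.  The _safe variant saves the next node ('tmp')
         * before the body runs, which is why the fix above iterates with
         * list_for_each_entry_safe_reverse().
         */
        struct pending_rq {
                struct list_head queuelist;
                int started;
        };

        static void requeue_unstarted(struct list_head *head)
        {
                struct pending_rq *rq, *tmp;

                list_for_each_entry_safe(rq, tmp, head, queuelist) {
                        if (rq->started)
                                continue;
                        list_del(&rq->queuelist);  /* safe: 'tmp' saved */
                        /* re-add 'rq' to the scheduler's own lists here */
                }
        }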
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-20 20:22 ` Ben Gamari 2009-01-20 22:23 ` Ben Gamari @ 2009-01-22 2:35 ` Ben Gamari 1 sibling, 0 replies; 39+ messages in thread From: Ben Gamari @ 2009-01-22 2:35 UTC (permalink / raw) To: Jens Axboe Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev I'm not sure if this will help, but I just completed another set of benchmarks using Jens' patch and a variety of device parameters. Again, I don't know if this will help anyone, but I figured it might help quantify the differences between device parameters. Let me know if there's any other benchmarking or testing that I can do. Thanks, - Ben mint maxt ========================================================== queue_depth=1, slice_async_rq=1, quantum=1, patched anticipatory 25 msec 4410 msec cfq 27 msec 1466 msec deadline 36 msec 10735 msec noop 48 msec 37439 msec ========================================================== queue_depth=1, slice_async_rq=1, quantum=4, patched anticipatory 38 msec 3579 msec cfq 35 msec 822 msec deadline 37 msec 10072 msec noop 32 msec 45535 msec ========================================================== queue_depth=1, slice_async_rq=2, quantum=1, patched anticipatory 33 msec 4480 msec cfq 28 msec 353 msec deadline 30 msec 6738 msec noop 36 msec 39691 msec ========================================================== queue_depth=1, slice_async_rq=2, quantum=4, patched anticipatory 40 msec 4498 msec cfq 35 msec 1395 msec deadline 41 msec 6877 msec noop 38 msec 46410 msec ========================================================== queue_depth=31, slice_async_rq=1, quantum=1, patched anticipatory 31 msec 6011 msec cfq 36 msec 4575 msec deadline 41 msec 18599 msec noop 38 msec 46347 msec ========================================================== queue_depth=31, slice_async_rq=2, quantum=1, patched anticipatory 30 msec 9985 msec cfq 33 msec 4200 msec deadline 38 msec 22285 msec noop 25 msec 40245 msec ========================================================== queue_depth=31, slice_async_rq=2, quantum=4, patched anticipatory 30 msec 12197 msec cfq 30 msec 3457 msec deadline 35 msec 18969 msec noop 34 msec 42803 msec On Tue, 2009-01-20 at 15:22 -0500, Ben Gamari wrote: > On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote: > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote: > >> * Jens Axboe (jens.axboe@oracle.com) wrote: > >> Yes, ideally I should re-run those directly on the disk partitions. > > > > At least for comparison. > > > > I just completed my own set of benchmarks using the fio job file > Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition > formatted as ext3. As you can see, I tested all of the available > schedulers with both queuing enabled and disabled. I'll test the Jens' > patch soon. Would a blktrace of the fio run help? Let me know if > there's any other benchmarking or profiling that could be done. > Thanks, > > - Ben > > > mint maxt > ========================================================== > queue_depth=31: > anticipatory 35 msec 11036 msec > cfq 37 msec 3350 msec > deadline 36 msec 18144 msec > noop 39 msec 41512 msec > > ========================================================== > queue_depth=1: > anticipatory 45 msec 9561 msec > cfq 28 msec 3974 msec > deadline 47 msec 16802 msec > noop 35 msec 38173 msec ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-17 19:04 ` Jens Axboe 2009-01-18 21:12 ` Mathieu Desnoyers @ 2009-01-19 15:45 ` Nikanth K 2009-01-19 18:23 ` Jens Axboe 1 sibling, 1 reply; 39+ messages in thread From: Nikanth K @ 2009-01-19 15:45 UTC (permalink / raw) To: Jens Axboe Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Sun, Jan 18, 2009 at 12:34 AM, Jens Axboe <jens.axboe@oracle.com> wrote: > > As a quick test, could you try and increase the slice_idle to eg 20ms? > Sometimes I've seen timing being slightly off, which makes us miss the > sync window for the ls (in your case) process. Then you get a mix of > async and sync IO all the time, which very much slows down the sync > process. > Do you mean to say that 'ls' could not submit another request until the previous sync request completes, but its idle window gets disabled as it takes way too long to complete during heavy load? But when there are requests in the driver, wont the idling be disabled anyway? Or did you mean to increase slice_sync? Thanks Nikanth ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-19 15:45 ` Nikanth K @ 2009-01-19 18:23 ` Jens Axboe 0 siblings, 0 replies; 39+ messages in thread From: Jens Axboe @ 2009-01-19 18:23 UTC (permalink / raw) To: Nikanth K Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Mon, Jan 19 2009, Nikanth K wrote: > On Sun, Jan 18, 2009 at 12:34 AM, Jens Axboe <jens.axboe@oracle.com> wrote: > > > > > As a quick test, could you try and increase the slice_idle to eg 20ms? > > Sometimes I've seen timing being slightly off, which makes us miss the > > sync window for the ls (in your case) process. Then you get a mix of > > async and sync IO all the time, which very much slows down the sync > > process. > > > > Do you mean to say that 'ls' could not submit another request until > the previous sync request completes, but its idle window gets disabled > as it takes way too long to complete during heavy load? But when there 'ls' would never submit a new request before the previous one completes, such is the nature of sync processes. That's the whole reason we have the idle window. > are requests in the driver, wont the idling be disabled anyway? Or did > you mean to increase slice_sync? No, idling is on a per-cfqq (process) basis. I did not mean to increase slice_sync, that wont help at all. It's the window between submissions of requests that I wanted to test being larger, but apparently that wasn't the case here. -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency 2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2009-01-17 16:50 ` Leon Woestenberg 2009-01-17 19:04 ` Jens Axboe @ 2009-01-17 20:03 ` Ben Gamari 2 siblings, 0 replies; 39+ messages in thread From: Ben Gamari @ 2009-01-17 20:03 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev On Sat, 2009-01-17 at 11:26 -0500, Mathieu Desnoyers wrote: > This patch implements a basic test to make sure we never merge more than 128 > requests into the same request if it is the "last_merge" request. I have not > been able to trigger the problem again with the fix applied. It might not be in > a perfect state : there may be better solutions to the problem, but I think it > helps pointing out where the culprit lays. Unfortunately, it seems like the patch hasn't really fixed much. After porting it forward to Linus' master, I haven't exhibited any difference in real world use cases (e.g. desktop use cases while building a kernel). Given Jen's remarks, I suppose this isn't too surprising. Does anyone else with greater familiarity with the block I/O subsystem have any more ideas about the source of the slowdown? It seems like the recent patches incorporating blktrace support into ftrace could be helpful for further data collection, correct? - Ben ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2009-02-10 6:12 UTC | newest] Thread overview: 39+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-17 0:44 [Regression] High latency when doing large I/O Mathieu Desnoyers 2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2009-01-17 16:50 ` Leon Woestenberg 2009-01-17 17:15 ` Mathieu Desnoyers 2009-01-17 19:04 ` Jens Axboe 2009-01-18 21:12 ` Mathieu Desnoyers 2009-01-18 21:27 ` Mathieu Desnoyers 2009-01-19 18:26 ` Jens Axboe 2009-01-20 2:10 ` Mathieu Desnoyers 2009-01-20 7:37 ` Jens Axboe 2009-01-20 12:28 ` Jens Axboe 2009-01-20 14:22 ` [ltt-dev] " Mathieu Desnoyers 2009-01-20 14:24 ` Jens Axboe 2009-01-20 15:42 ` Mathieu Desnoyers 2009-01-20 23:06 ` Mathieu Desnoyers 2009-01-20 23:27 ` Mathieu Desnoyers 2009-01-21 0:25 ` Mathieu Desnoyers 2009-01-21 4:38 ` Ben Gamari 2009-01-21 4:54 ` [ltt-dev] " Mathieu Desnoyers 2009-01-21 6:17 ` Ben Gamari 2009-01-22 22:59 ` Mathieu Desnoyers 2009-01-23 3:21 ` [ltt-dev] " KOSAKI Motohiro 2009-01-23 4:03 ` Mathieu Desnoyers 2009-02-10 3:36 ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers 2009-02-10 3:55 ` Nick Piggin 2009-02-10 5:23 ` Linus Torvalds 2009-02-10 5:56 ` Nick Piggin 2009-02-10 6:12 ` Mathieu Desnoyers 2009-02-02 2:08 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers 2009-02-02 11:26 ` Jens Axboe 2009-02-03 0:46 ` Mathieu Desnoyers 2009-01-20 13:45 ` [ltt-dev] " Mathieu Desnoyers 2009-01-20 20:22 ` Ben Gamari 2009-01-20 22:23 ` Ben Gamari 2009-01-20 23:05 ` Mathieu Desnoyers 2009-01-22 2:35 ` Ben Gamari 2009-01-19 15:45 ` Nikanth K 2009-01-19 18:23 ` Jens Axboe 2009-01-17 20:03 ` Ben Gamari