linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Jens Axboe <axboe@kernel.dk>, Andrea Arcangeli <andrea@suse.de>,
	akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, ltt-dev@lists.casi.polymtl.ca
Subject: [RFC PATCH] block: Fix bio merge induced high I/O latency
Date: Sat, 17 Jan 2009 11:26:57 -0500	[thread overview]
Message-ID: <20090117162657.GA31965@Krystal> (raw)
In-Reply-To: <20090117004439.GA11492@Krystal>

A long standing I/O regression (since 2.6.18, still there today) has hit
Slashdot recently :
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://it.slashdot.org/article.pl?sid=09/01/15/049201

I've taken a trace reproducing the wrong behavior on my machine and I
think it's getting us somewhere.

LTTng 0.83, kernel 2.6.28
Machine : Intel Xeon E5405 dual quad-core, 16GB ram
(just created a new block-trace.c LTTng probe which is not released yet.
It basically replaces blktrace)


echo 3 > /proc/sys/vm/drop_caches

lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace

dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
cp -ax music /tmp   (copying 1.1GB of mp3)

ls  (takes 15 seconds to get the directory listing !)

lttctl -D trace

I looked at the trace (especially at the ls surroundings), and bash is
waiting for a few seconds for I/O in the exec system call (to exec ls).

While this happens, we have dd doing lots and lots of bio_queue. There
is a bio_backmerge after each bio_queue event. This is reasonable,
because dd is writing to a contiguous file.

However, I wonder if this is not the actual problem. We have dd which
has the head request in the elevator request queue. It is progressing
steadily by plugging/unplugging the device periodically and gets its
work done. However, because requests are being dequeued at the same
rate others are being merged, I suspect it stays at the top of the queue
and does not let the other unrelated requests run.

There is a test in the blk-merge.c which makes sure that merged requests
do not get bigger than a certain size. However, if the request is
steadily dequeued, I think this test is not doing anything.


This patch implements a basic test to make sure we never merge more than 128
requests into the same request if it is the "last_merge" request. I have not
been able to trigger the problem again with the fix applied. It might not be in
a perfect state : there may be better solutions to the problem, but I think it
helps pointing out where the culprit lays.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <axboe@kernel.dk>
CC: Andrea Arcangeli <andrea@suse.de>
CC: akpm@linux-foundation.org
CC: Ingo Molnar <mingo@elte.hu>
CC: Linus Torvalds <torvalds@linux-foundation.org>
---
 block/blk-merge.c      |   12 +++++++++---
 block/elevator.c       |   31 ++++++++++++++++++++++++++++---
 include/linux/blkdev.h |    1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

Index: linux-2.6-lttng/include/linux/blkdev.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/blkdev.h	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/include/linux/blkdev.h	2009-01-17 09:50:29.000000000 -0500
@@ -313,6 +313,7 @@ struct request_queue
 	 */
 	struct list_head	queue_head;
 	struct request		*last_merge;
+	int			nr_cached_merge;
 	elevator_t		*elevator;
 
 	/*
Index: linux-2.6-lttng/block/elevator.c
===================================================================
--- linux-2.6-lttng.orig/block/elevator.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/elevator.c	2009-01-17 11:07:12.000000000 -0500
@@ -255,6 +255,7 @@ int elevator_init(struct request_queue *
 
 	INIT_LIST_HEAD(&q->queue_head);
 	q->last_merge = NULL;
+	q->nr_cached_merge = 0;
 	q->end_sector = 0;
 	q->boundary_rq = NULL;
 
@@ -438,8 +439,10 @@ void elv_dispatch_sort(struct request_qu
 	struct list_head *entry;
 	int stop_flags;
 
-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}
 
 	elv_rqhash_del(q, rq);
 
@@ -478,8 +481,10 @@ EXPORT_SYMBOL(elv_dispatch_sort);
  */
 void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
 {
-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}
 
 	elv_rqhash_del(q, rq);
 
@@ -498,6 +503,16 @@ int elv_merge(struct request_queue *q, s
 	int ret;
 
 	/*
+	 * Make sure we don't starve other requests by merging too many cached
+	 * requests together.
+	 */
+	if (q->nr_cached_merge >= BLKDEV_MAX_RQ) {
+		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+		return ELEVATOR_NO_MERGE;
+	}
+
+	/*
 	 * First try one-hit cache.
 	 */
 	if (q->last_merge) {
@@ -536,6 +551,10 @@ void elv_merged_request(struct request_q
 	if (type == ELEVATOR_BACK_MERGE)
 		elv_rqhash_reposition(q, rq);
 
+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }
 
@@ -551,6 +570,10 @@ void elv_merge_requests(struct request_q
 	elv_rqhash_del(q, next);
 
 	q->nr_sorted--;
+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }
 
@@ -626,8 +649,10 @@ void elv_insert(struct request_queue *q,
 		q->nr_sorted++;
 		if (rq_mergeable(rq)) {
 			elv_rqhash_add(q, rq);
-			if (!q->last_merge)
+			if (!q->last_merge) {
+				q->nr_cached_merge = 1;
 				q->last_merge = rq;
+			}
 		}
 
 		/*
Index: linux-2.6-lttng/block/blk-merge.c
===================================================================
--- linux-2.6-lttng.orig/block/blk-merge.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/blk-merge.c	2009-01-17 09:50:29.000000000 -0500
@@ -231,8 +231,10 @@ static inline int ll_new_hw_segment(stru
 	if (req->nr_phys_segments + nr_phys_segs > q->max_hw_segments
 	    || req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 
@@ -256,8 +258,10 @@ int ll_back_merge_fn(struct request_queu
 
 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(req->biotail, BIO_SEG_VALID))
@@ -281,8 +285,10 @@ int ll_front_merge_fn(struct request_que
 
 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(bio, BIO_SEG_VALID))

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-01-17 16:27 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-17  0:44 [Regression] High latency when doing large I/O Mathieu Desnoyers
2009-01-17 16:26 ` Mathieu Desnoyers [this message]
2009-01-17 16:50   ` [RFC PATCH] block: Fix bio merge induced high I/O latency Leon Woestenberg
2009-01-17 17:15     ` Mathieu Desnoyers
2009-01-17 19:04   ` Jens Axboe
2009-01-18 21:12     ` Mathieu Desnoyers
2009-01-18 21:27       ` Mathieu Desnoyers
2009-01-19 18:26       ` Jens Axboe
2009-01-20  2:10         ` Mathieu Desnoyers
2009-01-20  7:37           ` Jens Axboe
2009-01-20 12:28             ` Jens Axboe
2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
2009-01-20 14:24                 ` Jens Axboe
2009-01-20 15:42                   ` Mathieu Desnoyers
2009-01-20 23:06                     ` Mathieu Desnoyers
2009-01-20 23:27               ` Mathieu Desnoyers
2009-01-21  0:25                 ` Mathieu Desnoyers
2009-01-21  4:38                   ` Ben Gamari
2009-01-21  4:54                     ` [ltt-dev] " Mathieu Desnoyers
2009-01-21  6:17                       ` Ben Gamari
2009-01-22 22:59                   ` Mathieu Desnoyers
2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
2009-01-23  4:03                   ` Mathieu Desnoyers
2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
2009-02-10  3:55                     ` Nick Piggin
2009-02-10  5:23                     ` Linus Torvalds
2009-02-10  5:56                       ` Nick Piggin
2009-02-10  6:12                       ` Mathieu Desnoyers
2009-02-02  2:08               ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
2009-02-02 11:26                 ` Jens Axboe
2009-02-03  0:46                   ` Mathieu Desnoyers
2009-01-20 13:45             ` [ltt-dev] " Mathieu Desnoyers
2009-01-20 20:22             ` Ben Gamari
2009-01-20 22:23               ` Ben Gamari
2009-01-20 23:05                 ` Mathieu Desnoyers
2009-01-22  2:35               ` Ben Gamari
2009-01-19 15:45     ` Nikanth K
2009-01-19 18:23       ` Jens Axboe
2009-01-17 20:03   ` Ben Gamari

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090117162657.GA31965@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@suse.de \
    --cc=axboe@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ltt-dev@lists.casi.polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).