public inbox for linux-kernel@vger.kernel.org
From: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
To: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
	agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request baset device mapper in Linux
Date: Sat, 23 Jul 2011 16:28:06 +0900
Message-ID: <4E2A7806.1060601@ce.jp.nec.com>
In-Reply-To: <20110722081903.GB7561@ics.muni.cz>

Hi,

On 07/22/11 17:19, Lukas Hejtmanek wrote:
> this is a trace from sysprof, but it is not in exact order (and the percentage
> is not accurate and is across the whole system; ksoftirqd runs at 100% CPU
> according to top).
> 
>   [ksoftirqd/0]                        0.00%  33.45%
>     - - kernel - -                     0.00%  33.45%
>       __blk_recalc_rq_segments        16.61%  16.61%
>       _spin_unlock_irqrestore          6.17%   6.17%
>       kmem_cache_free                  2.21%   2.21%
>       blk_update_request               1.78%   1.80%
>       end_buffer_async_read            1.40%   1.40%
...
>> (How many CPUs do you have and how fast are those CPUs?
>>  I just tried, but no such phenomenon can be seen on the environment
>>  of 10 (FC) devices and 1 CPU (Xeon(R) E5205 1.86[GHz]).)
> 
> I have E5640  @ 2.67GHz with 16 cores (8 real cores with HT).
> 
> 10 devices is not enough. I cannot reproduce it with just 10 devices. At
> least 20 are necessary.

How fast is the single disk performance?
Could you check /proc/interrupts and /proc/softirqs and
see how they are distributed among CPUs?
As for the memory usage, what happens if you add 'iflag=direct' to dd?

Also, is it possible for you to try the attached patch?
I would like to know whether it changes the phenomenon you see.
This patch should reduce the number of calls that recalculate segments.
If that is the root cause, the patch should fix your case.
The patch is generated against 3.0 but should apply easily to
other versions of request-based dm.
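
To illustrate the idea behind the attached patch outside the kernel, here is
a minimal user-space sketch. It is not kernel code: struct toy_tio and the
toy_* functions are hypothetical stand-ins, and toy_update_request() only
counts how often the expensive update (blk_update_request() in the patch)
would run. Each bio completion just adds to done_bytes, and the update is
issued once when the whole clone is done.

```c
#include <stdio.h>

/* Hypothetical stand-in for the per-request bookkeeping; not the kernel struct. */
struct toy_tio {
	unsigned int done_bytes;   /* bytes completed by clone bios, not yet reported */
	unsigned int update_calls; /* how many times the expensive update ran */
	unsigned int reported;     /* bytes reported to the original request so far */
};

/* Stand-in for the expensive per-call work (blk_update_request() in the patch). */
static void toy_update_request(struct toy_tio *tio, unsigned int nr_bytes)
{
	tio->update_calls++;
	tio->reported += nr_bytes;
}

/* Per-bio completion: only accumulate, do not update the original request. */
static void toy_end_clone_bio(struct toy_tio *tio, unsigned int nr_bytes)
{
	tio->done_bytes += nr_bytes;
}

/* Per-request completion: report all accumulated bytes in a single update. */
static void toy_done(struct toy_tio *tio)
{
	if (tio->done_bytes) {
		toy_update_request(tio, tio->done_bytes);
		tio->done_bytes = 0;
	}
}

/* Complete nr_bios bios of bio_bytes each, then finish the request.
 * Returns the number of update calls that were needed. */
static unsigned int toy_simulate(struct toy_tio *tio, unsigned int nr_bios,
				 unsigned int bio_bytes)
{
	unsigned int i;

	for (i = 0; i < nr_bios; i++)
		toy_end_clone_bio(tio, bio_bytes);
	toy_done(tio);
	return tio->update_calls;
}
```

With a zero-initialized struct toy_tio, toy_simulate(&tio, 32, 4096) needs
one update call for all 32 bios, where a per-bio update would need 32.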

As Kiyoshi suggested, it is important to know whether this
problem occurs with the latest kernel.
So if you could try 3.0, it would be very helpful.

# and sorry, I will not be able to respond to e-mail next week.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


--- linux-3.0/drivers/md/dm.c.orig	2011-07-23 11:04:54.487100496 +0900
+++ linux-3.0/drivers/md/dm.c	2011-07-23 15:30:14.748606235 +0900
@@ -70,6 +70,7 @@ struct dm_rq_target_io {
 	struct mapped_device *md;
 	struct dm_target *ti;
 	struct request *orig, clone;
+	unsigned int done_bytes;
 	int error;
 	union map_info info;
 };
@@ -705,23 +706,8 @@ static void end_clone_bio(struct bio *cl
 
 	/*
 	 * I/O for the bio successfully completed.
-	 * Notice the data completion to the upper layer.
 	 */
-
-	/*
-	 * bios are processed from the head of the list.
-	 * So the completing bio should always be rq->bio.
-	 * If it's not, something wrong is happening.
-	 */
-	if (tio->orig->bio != bio)
-		DMERR("bio completion is going in the middle of the request");
-
-	/*
-	 * Update the original request.
-	 * Do not use blk_end_request() here, because it may complete
-	 * the original request before the clone, and break the ordering.
-	 */
-	blk_update_request(tio->orig, 0, nr_bytes);
+	tio->done_bytes += nr_bytes;
 }
 
 /*
@@ -850,6 +836,16 @@ static void dm_done(struct request *clon
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io;
 
+	/*
+	 * Update the original request.
+	 * Do not use blk_end_request() here, because it may complete
+	 * the original request before the clone, and break the ordering.
+	 */
+	if (tio->done_bytes) {
+		blk_update_request(tio->orig, 0, tio->done_bytes);
+		tio->done_bytes = 0;
+	}
+
 	if (mapped && rq_end_io)
 		r = rq_end_io(tio->ti, clone, error, &tio->info);
 
@@ -1507,6 +1503,7 @@ static struct request *clone_rq(struct r
 	tio->md = md;
 	tio->ti = NULL;
 	tio->orig = rq;
+	tio->done_bytes = 0;
 	tio->error = 0;
 	memset(&tio->info, 0, sizeof(tio->info));
 
