From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request baset device mapper in Linux
Date: Fri, 22 Jul 2011 15:56:40 +0900
Message-ID: <4E291F28.2040609@ct.jp.nec.com>
In-Reply-To: <20110721132627.GY7561@ics.muni.cz>

Hi Lukas,

Lukas Hejtmanek wrote:
> On Thu, Jul 21, 2011 at 08:11:32PM +0900, Kiyoshi Ueda wrote:
>> I don't understand why you are saying request-based device-mapper makes
>> serious troubles.
>> BIO merging is done in the block layer.  Don't you see the same thing
>> if you use sd devices (/dev/sdX)?
> 
> no, if I use sd devices directly, it does not overload ksoftirqd and this is
> obvious. The problem with overloading ksoftirqd has roots in request based
> stuff in dm layer, in particular in dm_softirq_done() call. 
> 
>> If you see your problem only with request-based device-mapper, please
>> elaborate about below:
>>     - end_clone_bio() has something like quadratic complexity.
>>         * What do you mean the "quadratic complexity"?
> 
> end_clone_bio calls blk_update_request which calls __blk_recalc_rq_segments
> which has code:
> for_each_bio(bio) {
>         bio_for_each_segment(bv, bio, i) {
> 
> in total, it seems that *whole* bio list is traversed again and again as some parts
> are done and some not which leads to complexity O(n^2) with respect to number
> of bio and segments. But this is just wild guess. The real problem could be
> elsewhere.
> 
> However, oprofile or sysprof show that ksoftirq spends most time in
> __blk_recalc_rq_segments().

Thank you very much for the detailed explanation.
Now I understand what you mentioned, except for the following:

> ksoftirqd eats 100% CPU as soon as all available memory is used for buffers.

If the slowdown were caused only by a lack of CPU power, memory usage
should not matter here.
Does the 100% CPU usage of ksoftirqd (and the slowdown) go away if you
use a size smaller than the memory size? (e.g. 500[MB] for each dd)

> The problem with overloading ksoftirqd has roots in request based
> stuff in dm layer, in particular in dm_softirq_done() call. 

Is that your actual trace result?
end_clone_bio(), which seems to be taking most of the time, is called
in the context of SCSI's softirq.
So if you really see that dm_softirq_done() is taking a long time,
there may be other problems, too.


>>     - each request contains more than 100 bios which makes serious
>>       troubles for ksoftirqd call backs.
>>         * What do you mean the "serious troubles for ksoftirqd call backs"?
> 
> serious troubles means that ksoftirqd eats 100% CPU and slows down I/O
> significantly (from 2.8GB/s to 500MB/s).

OK, at least, request-based device-mapper consumes more CPU/memory
resources than bio-based device-mapper, because its design clones the
bios as well as the request.
So I think you need more CPUs in environments that have lots of devices,
or you may be able to work around it by splitting each request into
shorter ones, as you mentioned.
(How many CPUs do you have and how fast are those CPUs?
 I just tried, but could not reproduce the phenomenon in an environment
 with 10 (FC) devices and 1 CPU (Xeon(R) E5205 1.86[GHz]).)

# I will be on vacation for the whole of next week, so I won't be able
# to respond until 8/1.  Sorry about that.

Thanks,
Kiyoshi Ueda
