From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request based device mapper in Linux
Date: Thu, 21 Jul 2011 20:11:32 +0900
Message-ID: <4E280964.6070000@ct.jp.nec.com>
In-Reply-To: <20110720082640.GZ7561@ics.muni.cz>

Hi Lukas,
Lukas Hejtmanek wrote:
> Hi,
>
> I encounter serious problems with your commit
> cec47e3d4a861e1d942b3a580d0bbef2700d2bb2 introducing request based device
> mapper in Linux.
>
> I have a machine with 80 SATA disks connected to two LSI SAS 2.0 controllers
> (mpt2sas driver).
>
> All disks are configured as multipath devices in failover mode:
>
> defaults {
> udev_dir /dev
> polling_interval 10
> selector "round-robin 0"
> path_grouping_policy failover
> path_checker directio
> rr_min_io 100
> no_path_retry queue
> user_friendly_names no
> }
>
> If I run the following command, ksoftirqd eats 100% CPU as soon as all
> available memory is used for buffers.
>
> for i in `seq 0 79`; do dd if=/dev/dm-$i of=/dev/null bs=1M count=10000 & done
>
> top looks like this:
>
> Mem: 48390M total, 45741M used, 2649M free, 43243M buffers
> Swap: 0M total, 0M used, 0M free, 1496M cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>    12 root      20   0     0    0    0 R   96  0.0   0:38.78 ksoftirqd/4
> 17263 root      20   0  9432 1752  616 R   14  0.0   0:03.19 dd
> 17275 root      20   0  9432 1756  616 D   14  0.0   0:03.16 dd
> 17271 root      20   0  9432 1756  616 D   10  0.0   0:02.60 dd
> 17258 root      20   0  9432 1756  616 D    7  0.0   0:02.67 dd
> 17260 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd
> 17262 root      20   0  9432 1752  616 D    7  0.0   0:02.38 dd
> 17264 root      20   0  9432 1756  616 D    7  0.0   0:02.42 dd
> 17267 root      20   0  9432 1756  616 D    7  0.0   0:02.35 dd
> 17268 root      20   0  9432 1756  616 D    7  0.0   0:02.45 dd
> 17274 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd
> 17277 root      20   0  9432 1756  616 D    7  0.0   0:02.53 dd
> 17261 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd
> 17265 root      20   0  9432 1756  616 R    7  0.0   0:02.47 dd
> 17266 root      20   0  9432 1756  616 R    7  0.0   0:02.44 dd
> 17269 root      20   0  9432 1756  616 D    7  0.0   0:02.62 dd
> 17270 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd
> 17272 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd
> 17273 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd
> 17276 root      20   0  9432 1752  616 D    7  0.0   0:02.36 dd
> 17278 root      20   0  9432 1752  616 D    7  0.0   0:02.44 dd
> 17259 root      20   0  9432 1752  616 D    6  0.0   0:02.37 dd
>
>
> It looks like device mapper produces long SG lists and end_clone_bio() has
> something like quadratic complexity.
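To make the "quadratic" claim concrete, here is a toy cost model in shell. It is not kernel code and does not describe what end_clone_bio() actually does; it only assumes, hypothetically, that each per-bio completion rescans every bio still pending on the request (versus touching only the bio that finished). Under that assumption, completing n bios costs n(n+1)/2 steps instead of n. The function names are illustrative only.

```shell
#!/bin/sh
# Toy cost model (hypothetical, NOT kernel code): a request carries n bios
# and they complete one at a time.

# Assumption A: each completion only touches the bio that finished.
linear_cost() {
    echo "$1"
}

# Assumption B (hypothetical): each completion rescans every bio still
# pending on the request, so the k-th completion costs (n - k + 1) steps.
quadratic_cost() {
    n=$1
    total=0
    k=1
    while [ "$k" -le "$n" ]; do
        total=$((total + n - k + 1))
        k=$((k + 1))
    done
    echo "$total"
}

# Closed form of assumption B: 1 + 2 + ... + n = n(n+1)/2.
quadratic_closed_form() {
    echo $(( $1 * ($1 + 1) / 2 ))
}

for n in 10 140; do
    echo "n=$n linear=$(linear_cost "$n") rescan=$(quadratic_cost "$n")"
done
```

With n around 140, roughly the bios-per-request ratio implied by the iostat figures later in this message, the rescan model does 9870 steps per request versus 140 for the linear one.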
>
> The problem can be worked around using:
> for i in /sys/block/dm-*; do echo 128 > $i/queue/max_sectors_kb; done
>
> to shorten the SG lists.
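The same workaround as a slightly more careful shell sketch: it touches only dm-* queues, takes the cap in KiB as an argument, and prints each device's old and new limit so the change can be reverted by hand. The function name cap_dm_max_sectors and the SYSFS_BLOCK override are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Sketch of the max_sectors_kb workaround with a report of the old values.
# SYSFS_BLOCK is a hypothetical override (default /sys/block) so the
# function can be tried on a scratch directory; writing to the real
# sysfs needs root, and the setting does not survive a reboot.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Usage: cap_dm_max_sectors [KB]   (default 128, the value used above)
cap_dm_max_sectors() {
    cap_kb=${1:-128}
    for f in "$SYSFS_BLOCK"/dm-*/queue/max_sectors_kb; do
        [ -e "$f" ] || continue            # glob matched no dm devices
        dev=${f#"$SYSFS_BLOCK"/}           # dm-N/queue/max_sectors_kb
        dev=${dev%%/*}                     # dm-N
        old=$(cat "$f")
        if echo "$cap_kb" 2>/dev/null > "$f"; then
            echo "$dev $old -> $(cat "$f")"
        else
            echo "$dev: could not write $cap_kb (old value $old)" >&2
        fi
    done
}
```

Running `cap_dm_max_sectors 128` as root is equivalent to the one-line loop above; the printed old values can be written back per device to undo it.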
>
> I use the SLES 2.6.32.36-0.5-default kernel.
>
> Using iostat -x, I can see about 25000 rrqm/s, while there are only
> 180 r/s, so it looks like each bio contains more than 100 requests, which makes
> serious troubles for ksoftirqd call backs.
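The iostat figures can be turned into a rough bios-per-request estimate. In iostat -x output, rrqm/s is the number of read requests merged per second and r/s the number of read requests completed per second after merging, so their ratio approximates how many extra bios were folded into each dispatched request. The helper names are hypothetical; this is only arithmetic on the two numbers quoted above.

```shell
#!/bin/sh
# Rough bios-per-request estimate from two iostat -x read columns:
#   rrqm/s = read requests merged per second
#   r/s    = read requests completed per second (after merging)
# Approximation: both are per-second averages over the same interval.

# Merges folded into each completed request.
merges_per_request() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.1f\n", m / r }'
}

# Each request = its first bio + the bios merged into it.
bios_per_request() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.1f\n", m / r + 1 }'
}

merges_per_request 25000 180
bios_per_request 25000 180
```

With 25000 and 180 this gives about 139 merges, i.e. roughly 140 bios per dispatched request.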

I don't understand why you say that request-based device-mapper causes
serious trouble.
BIO merging is done in the block layer. Don't you see the same thing
if you use sd devices (/dev/sdX)?
Also, what do you see with the latest upstream kernel (say, 3.0-rc7)?

If you see the problem only with request-based device-mapper, please
elaborate on the following:
  - end_clone_bio() has something like quadratic complexity.
      * What do you mean by "quadratic complexity"?
  - each request contains more than 100 bios which makes serious
    troubles for ksoftirqd call backs.
      * What do you mean by "serious troubles for ksoftirqd call backs"?

> Without the mentioned workaround, I get only 600MB/s summed over all dd readers.
> With the workaround, I get about 2.8GB/s summed over all dd readers.

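The throughput figures, worked out per reader. This assumes 2.8GB/s means 2800 MB/s and that all 80 dd processes from the reproduction loop ran concurrently; the helper names are hypothetical.

```shell
#!/bin/sh
# Aggregate-to-per-reader arithmetic for the figures quoted above.
# Assumption: "2.8GB/s" is read as 2800 MB/s; 80 concurrent dd readers.

# $1 = aggregate MB/s, $2 = number of readers
per_reader() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", t / n }'
}

# $1 = new aggregate MB/s, $2 = old aggregate MB/s
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "without cap: $(per_reader 600 80) MB/s per reader"
echo "with cap:    $(per_reader 2800 80) MB/s per reader"
echo "speedup:     $(speedup 2800 600)x"
```

That is about 7.5 MB/s versus 35 MB/s per reader, a speedup of roughly 4.7x.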
Thanks,
Kiyoshi Ueda