From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request based device mapper in Linux
Date: Thu, 21 Jul 2011 20:11:32 +0900
Message-ID: <4E280964.6070000@ct.jp.nec.com>
In-Reply-To: <20110720082640.GZ7561@ics.muni.cz>

Hi Lukas,

Lukas Hejtmanek wrote:
> Hi,
>
> I encounter serious problems with your commit
> cec47e3d4a861e1d942b3a580d0bbef2700d2bb2, which introduced request-based
> device mapper in Linux.
>
> I have a machine with 80 SATA disks connected to two LSI SAS 2.0 controllers
> (mpt2sas driver).
>
> All disks are configured as multipath devices in failover mode:
>
> defaults {
> udev_dir /dev
> polling_interval 10
> selector "round-robin 0"
> path_grouping_policy failover
> path_checker directio
> rr_min_io 100
> no_path_retry queue
> user_friendly_names no
> }
>
> If I run the following command, ksoftirqd eats 100% CPU as soon as all
> available memory is used for buffers.
>
> for i in `seq 0 79`; do dd if=/dev/dm-$i of=/dev/null bs=1M count=10000 & done
>
> top looks like this:
>
> Mem: 48390M total, 45741M used, 2649M free, 43243M buffers
> Swap: 0M total, 0M used, 0M free, 1496M cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12 root 20 0 0 0 0 R 96 0.0 0:38.78 ksoftirqd/4
> 17263 root 20 0 9432 1752 616 R 14 0.0 0:03.19 dd
> 17275 root 20 0 9432 1756 616 D 14 0.0 0:03.16 dd
> 17271 root 20 0 9432 1756 616 D 10 0.0 0:02.60 dd
> 17258 root 20 0 9432 1756 616 D 7 0.0 0:02.67 dd
> 17260 root 20 0 9432 1756 616 D 7 0.0 0:02.47 dd
> 17262 root 20 0 9432 1752 616 D 7 0.0 0:02.38 dd
> 17264 root 20 0 9432 1756 616 D 7 0.0 0:02.42 dd
> 17267 root 20 0 9432 1756 616 D 7 0.0 0:02.35 dd
> 17268 root 20 0 9432 1756 616 D 7 0.0 0:02.45 dd
> 17274 root 20 0 9432 1756 616 D 7 0.0 0:02.47 dd
> 17277 root 20 0 9432 1756 616 D 7 0.0 0:02.53 dd
> 17261 root 20 0 9432 1756 616 D 7 0.0 0:02.36 dd
> 17265 root 20 0 9432 1756 616 R 7 0.0 0:02.47 dd
> 17266 root 20 0 9432 1756 616 R 7 0.0 0:02.44 dd
> 17269 root 20 0 9432 1756 616 D 7 0.0 0:02.62 dd
> 17270 root 20 0 9432 1756 616 D 7 0.0 0:02.46 dd
> 17272 root 20 0 9432 1756 616 D 7 0.0 0:02.36 dd
> 17273 root 20 0 9432 1756 616 D 7 0.0 0:02.46 dd
> 17276 root 20 0 9432 1752 616 D 7 0.0 0:02.36 dd
> 17278 root 20 0 9432 1752 616 D 7 0.0 0:02.44 dd
> 17259 root 20 0 9432 1752 616 D 6 0.0 0:02.37 dd
>
>
> It looks like device mapper produces long SG lists and end_clone_bio() has
> something like quadratic complexity.
>
> The problem can be worked around using:
> for i in /sys/block/dm-*; do echo 128 > $i/queue/max_sectors_kb; done
>
> to shorten the SG lists.
>
> I use the SLES 2.6.32.36-0.5-default kernel.
>
> Using iostat -x, I can see there are about 25000 rrqm/s, while there are only
> 180 r/s, so it looks like each bio contains more than 100 requests, which makes
> serious troubles for ksoftirqd call backs.
I don't understand why you say request-based device-mapper causes
serious trouble.
BIO merging is done in the block layer. Don't you see the same thing
if you use sd devices (/dev/sdX)?
Also, what do you see with the latest upstream kernel (say, 3.0-rc7)?
If you see your problem only with request-based device-mapper, please
elaborate on the following:
- end_clone_bio() has something like quadratic complexity.
  * What do you mean by "quadratic complexity"?
- each request contains more than 100 bios which makes serious
  troubles for ksoftirqd call backs.
  * What do you mean by "serious troubles for ksoftirqd call backs"?
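One way to compare a dm device with the sd devices beneath it is to look at their request size limits side by side. A minimal sketch, assuming the usual sysfs layout (/sys/block/<dev>/queue/max_sectors_kb, and /sys/block/dm-N/slaves/ holding symlinks to whole-disk sd devices); the function name show_queue_limits and the SYSFS override are hypothetical:

```shell
#!/bin/sh
# Print max_sectors_kb for a device-mapper device and for each underlying
# path listed under its slaves/ directory.
# SYSFS defaults to /sys; it can point at a test tree instead.
show_queue_limits() {
    dev=$1
    sysfs=${SYSFS:-/sys}
    base="$sysfs/block/$dev"
    [ -d "$base" ] || { echo "no such device: $dev" >&2; return 1; }
    printf '%s %s\n' "$dev" "$(cat "$base/queue/max_sectors_kb")"
    for slave in "$base"/slaves/*; do
        [ -e "$slave" ] || continue   # glob did not match: no slaves
        # Each slave entry links to a whole-disk device directory (assumed),
        # which has its own queue/ subdirectory.
        printf '  %s %s\n' "$(basename "$slave")" \
            "$(cat "$slave/queue/max_sectors_kb")"
    done
}
```

For example, `show_queue_limits dm-0` prints the dm limit on the first line and one indented line per path. It only reads sysfs; applying the 128 KB limit is still done with the echo loop quoted earlier.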
> Without the mentioned workaround, I got only 600MB/s in total across all dd readers.
> With the workaround, I got about 2.8GB/s in total across all dd readers.
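The two totals above amount to roughly a 4.7x difference. A minimal sketch of the arithmetic, assuming decimal units (2.8GB/s = 2800MB/s) and that all 80 dd readers from the loop quoted earlier ran concurrently; throughput_summary is a hypothetical helper name:

```shell
#!/bin/sh
# Per-reader throughput before and after the workaround, plus the speedup.
# Inputs are aggregate MB/s figures and the number of concurrent readers.
throughput_summary() {
    before=$1   # aggregate MB/s without the workaround
    after=$2    # aggregate MB/s with the workaround
    readers=$3  # number of concurrent dd readers
    awk -v b="$before" -v a="$after" -v n="$readers" 'BEGIN {
        printf "per reader: %.1f -> %.1f MB/s, speedup %.1fx\n", b / n, a / n, a / b
    }'
}

throughput_summary 600 2800 80
```

This prints "per reader: 7.5 -> 35.0 MB/s, speedup 4.7x".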
Thanks,
Kiyoshi Ueda