From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752193Ab1GULXl (ORCPT );
	Thu, 21 Jul 2011 07:23:41 -0400
Received: from TYO200.gate.nec.co.jp ([202.32.8.215]:38144 "EHLO
	tyo200.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751690Ab1GULXj (ORCPT );
	Thu, 21 Jul 2011 07:23:39 -0400
Message-ID: <4E280964.6070000@ct.jp.nec.com>
Date: Thu, 21 Jul 2011 20:11:32 +0900
From: Kiyoshi Ueda
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
To: Lukas Hejtmanek
CC: agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request based device mapper in Linux
References: <20110720082640.GZ7561@ics.muni.cz>
In-Reply-To: <20110720082640.GZ7561@ics.muni.cz>
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Lukas,

Lukas Hejtmanek wrote:
> Hi,
>
> I am encountering serious problems with your commit
> cec47e3d4a861e1d942b3a580d0bbef2700d2bb2, which introduced request-based
> device mapper in Linux.
>
> I have a machine with 80 SATA disks connected to two LSI SAS 2.0
> controllers (mpt2sas driver).
>
> All disks are configured as multipath devices in failover mode:
>
> defaults {
>         udev_dir                /dev
>         polling_interval        10
>         selector                "round-robin 0"
>         path_grouping_policy    failover
>         path_checker            directio
>         rr_min_io               100
>         no_path_retry           queue
>         user_friendly_names     no
> }
>
> If I run the following command, ksoftirqd eats 100% CPU as soon as all
> available memory is used for buffers:
>
> for i in `seq 0 79`; do dd if=/dev/dm-$i of=/dev/null bs=1M count=10000 & done
>
> top looks like this:
>
> Mem:  48390M total, 45741M used,  2649M free, 43243M buffers
> Swap:     0M total,     0M used,     0M free,  1496M cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>    12 root      20   0     0    0    0 R   96  0.0  0:38.78 ksoftirqd/4
> 17263 root      20   0  9432 1752  616 R   14  0.0  0:03.19 dd
> 17275 root      20   0  9432 1756  616 D   14  0.0  0:03.16 dd
> 17271 root      20   0  9432 1756  616 D   10  0.0  0:02.60 dd
> 17258 root      20   0  9432 1756  616 D    7  0.0  0:02.67 dd
> 17260 root      20   0  9432 1756  616 D    7  0.0  0:02.47 dd
> 17262 root      20   0  9432 1752  616 D    7  0.0  0:02.38 dd
> 17264 root      20   0  9432 1756  616 D    7  0.0  0:02.42 dd
> 17267 root      20   0  9432 1756  616 D    7  0.0  0:02.35 dd
> 17268 root      20   0  9432 1756  616 D    7  0.0  0:02.45 dd
> 17274 root      20   0  9432 1756  616 D    7  0.0  0:02.47 dd
> 17277 root      20   0  9432 1756  616 D    7  0.0  0:02.53 dd
> 17261 root      20   0  9432 1756  616 D    7  0.0  0:02.36 dd
> 17265 root      20   0  9432 1756  616 R    7  0.0  0:02.47 dd
> 17266 root      20   0  9432 1756  616 R    7  0.0  0:02.44 dd
> 17269 root      20   0  9432 1756  616 D    7  0.0  0:02.62 dd
> 17270 root      20   0  9432 1756  616 D    7  0.0  0:02.46 dd
> 17272 root      20   0  9432 1756  616 D    7  0.0  0:02.36 dd
> 17273 root      20   0  9432 1756  616 D    7  0.0  0:02.46 dd
> 17276 root      20   0  9432 1752  616 D    7  0.0  0:02.36 dd
> 17278 root      20   0  9432 1752  616 D    7  0.0  0:02.44 dd
> 17259 root      20   0  9432 1752  616 D    6  0.0  0:02.37 dd
>
> It looks like device mapper produces long SG lists and end_clone_bio() has
> something like quadratic complexity.
>
> The problem can be worked around using:
>
> for i in /sys/block/dm-*; do echo 128 > $i/queue/max_sectors_kb; done
>
> to shorten the SG lists.
>
> I use the SLES 2.6.32.36-0.5-default kernel.
>
> Using iostat -x, I can see about 25000 rrqm/s but only 180 r/s, so it looks
> like each request contains more than 100 bios, which causes serious trouble
> for the ksoftirqd callbacks.
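As a sanity check on that arithmetic: 25000 rrqm/s against 180 r/s works out
to roughly 140 bios merged into each dispatched read request, so the merge
ratio itself matches your description. A rough way to watch the ratio live is
something like the one-liner below; it is only a sketch, and the field numbers
assume the classic sysstat "iostat -x" column order ($1 = device, $2 = rrqm/s,
$4 = r/s), which varies between sysstat versions, so adjust them to your
output.

  # print the per-device merge ratio for dm devices every 5 seconds
  iostat -dx 5 | awk '$1 ~ /^dm-/ && $4 > 0 { printf "%s: ~%.0f merged bios per read request\n", $1, $2/$4 }'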
I don't understand why you are saying that request-based device-mapper causes
serious trouble here; BIO merging is done in the block layer. Don't you see
the same thing if you use the sd devices (/dev/sdX) directly? Also, what do
you see with the latest upstream kernel (say, 3.0-rc7)?

If you see the problem only with request-based device-mapper, please
elaborate on the points below:

  - "end_clone_bio() has something like quadratic complexity."
      * What do you mean by "quadratic complexity"?

  - "each request contains more than 100 bios, which causes serious trouble
    for the ksoftirqd callbacks."
      * What do you mean by "serious trouble for the ksoftirqd callbacks"?

> Without the mentioned workaround, I got only 600MB/s summed over all dd
> readers. With the workaround, I got about 2.8GB/s.

Thanks,
Kiyoshi Ueda
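P.S. If you do try the /dev/sdX comparison, a sketch in the spirit of your
original loop might look like the following. It is untested: it assumes GNU
grep, and that "multipath -ll" reports the backing paths as sdXX names, so
double-check the extracted device list before trusting the result.

  # read through the raw SCSI paths, bypassing device-mapper, to see
  # whether ksoftirqd still saturates a CPU without dm in the stack
  for dev in $(multipath -ll | grep -o 'sd[a-z]\+' | sort -u); do
      dd if=/dev/$dev of=/dev/null bs=1M count=10000 &
  done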