public inbox for linux-kernel@vger.kernel.org
* request based device mapper in Linux
@ 2011-07-20  8:26 Lukas Hejtmanek
  2011-07-21 11:11 ` Kiyoshi Ueda
  0 siblings, 1 reply; 16+ messages in thread
From: Lukas Hejtmanek @ 2011-07-20  8:26 UTC (permalink / raw)
  To: k-ueda; +Cc: agk, linux-kernel

Hi,

I encountered serious problems with your commit
cec47e3d4a861e1d942b3a580d0bbef2700d2bb2, which introduced the request-based
device mapper in Linux.

I have a machine with 80 SATA disks connected to two LSI SAS 2.0 controllers
(mpt2sas driver).

All disks are configured as multipath devices in failover mode:

defaults {
        udev_dir        /dev
        polling_interval 10
        selector        "round-robin 0"
        path_grouping_policy    failover
        path_checker    directio
        rr_min_io       100
        no_path_retry   queue
        user_friendly_names no
}

If I run the following command, ksoftirqd eats 100% CPU as soon as all
available memory is used for buffers:

for i in `seq 0 79`; do dd if=/dev/dm-$i of=/dev/null bs=1M count=10000 & done

top looks like this:

Mem:     48390M total,    45741M used,     2649M free,    43243M buffers
Swap:        0M total,        0M used,        0M free,     1496M cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
   12 root      20   0     0    0    0 R   96  0.0   0:38.78 ksoftirqd/4                                   
17263 root      20   0  9432 1752  616 R   14  0.0   0:03.19 dd                                            
17275 root      20   0  9432 1756  616 D   14  0.0   0:03.16 dd                                            
17271 root      20   0  9432 1756  616 D   10  0.0   0:02.60 dd                                            
17258 root      20   0  9432 1756  616 D    7  0.0   0:02.67 dd                                            
17260 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd                                            
17262 root      20   0  9432 1752  616 D    7  0.0   0:02.38 dd                                            
17264 root      20   0  9432 1756  616 D    7  0.0   0:02.42 dd                                            
17267 root      20   0  9432 1756  616 D    7  0.0   0:02.35 dd                                            
17268 root      20   0  9432 1756  616 D    7  0.0   0:02.45 dd                                            
17274 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd                                            
17277 root      20   0  9432 1756  616 D    7  0.0   0:02.53 dd                                            
17261 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd                                            
17265 root      20   0  9432 1756  616 R    7  0.0   0:02.47 dd                                            
17266 root      20   0  9432 1756  616 R    7  0.0   0:02.44 dd                                            
17269 root      20   0  9432 1756  616 D    7  0.0   0:02.62 dd                                            
17270 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd                                            
17272 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd                                            
17273 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd                                            
17276 root      20   0  9432 1752  616 D    7  0.0   0:02.36 dd                                            
17278 root      20   0  9432 1752  616 D    7  0.0   0:02.44 dd                                            
17259 root      20   0  9432 1752  616 D    6  0.0   0:02.37 dd 


It looks like the device mapper produces long SG lists and end_clone_bio()
has something like quadratic complexity.

The problem can be worked around with:
for i in /sys/block/dm-*; do echo 128 > $i/queue/max_sectors_kb; done

which shortens the SG lists.
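For reference, the workaround above can be sketched as a small helper. The
function name and the sysfs-root parameter are my own additions for
illustration; on a real system the files live under /sys/block/dm-*/queue/:

```shell
# Hypothetical helper: write a max_sectors_kb limit into every dm-* queue
# under a given sysfs root (default /sys/block), then print the value the
# kernel actually accepted for each device.
set_dm_max_sectors() {
    local limit="$1" root="${2:-/sys/block}"
    local dev
    for dev in "$root"/dm-*; do
        [ -d "$dev/queue" ] || continue
        echo "$limit" > "$dev/queue/max_sectors_kb"
        printf '%s: %s\n' "${dev##*/}" "$(cat "$dev/queue/max_sectors_kb")"
    done
}

# e.g. set_dm_max_sectors 128
```

Reading the value back is worth doing because the block layer may clamp the
write to the hardware limit (max_hw_sectors_kb).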

I use the SLES 2.6.32.36-0.5-default kernel.

Using iostat -x, I can see about 25000 rrqm/s while there are only
180 r/s, so it looks like each issued request is merged from more than 100
bios, which makes serious trouble for the ksoftirqd callbacks.
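As a back-of-the-envelope check of that ratio (assuming the usual iostat -x
column meanings: rrqm/s = read requests merged per second, r/s = read
requests issued per second), the figures quoted above work out to:

```shell
# Estimate how many merged bios go into each issued request by dividing
# the merged-request rate by the issued-request rate.
merge_ratio() {
    awk -v rrqm="$1" -v r="$2" 'BEGIN { printf "%.0f\n", rrqm / r }'
}

merge_ratio 25000 180   # roughly 139 merges per issued request
```

That is consistent with the completion path having to walk well over 100
cloned bios per request.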

Without the mentioned workaround, I get only about 600 MB/s aggregate across
all dd readers. With the workaround, I get about 2.8 GB/s.

-- 
Lukáš Hejtmánek


Thread overview: 16+ messages
2011-07-20  8:26 request based device mapper in Linux Lukas Hejtmanek
2011-07-21 11:11 ` Kiyoshi Ueda
2011-07-21 13:26   ` Lukas Hejtmanek
2011-07-22  6:56     ` Kiyoshi Ueda
2011-07-22  8:19       ` Lukas Hejtmanek
2011-07-23  7:28         ` Jun'ichi Nomura
2011-07-24 22:16           ` Lukas Hejtmanek
2011-08-01  9:31             ` Kiyoshi Ueda
2011-09-08 13:27               ` Lukas Hejtmanek
2011-09-15 18:49                 ` Mike Snitzer
2011-09-16 14:08                   ` Lukas Hejtmanek
2011-09-19  5:50                     ` Jun'ichi Nomura
2011-09-29 20:57                       ` Lukas Hejtmanek
2011-10-05  8:13                         ` Jun'ichi Nomura
2011-10-05 10:35                           ` Lukas Hejtmanek
2011-10-06  5:11                             ` Jun'ichi Nomura
