From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Cc: agk@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: request baset device mapper in Linux
Date: Thu, 21 Jul 2011 20:11:32 +0900
Message-ID: <4E280964.6070000@ct.jp.nec.com>
In-Reply-To: <20110720082640.GZ7561@ics.muni.cz>

Hi Lukas,

Lukas Hejtmanek wrote:
> Hi,
> 
> I encounter serious problems with your commit
> cec47e3d4a861e1d942b3a580d0bbef2700d2bb2, which introduced request-based
> device mapper in Linux.
> 
> I have a machine with 80 SATA disks connected to two LSI SAS 2.0
> controllers (mpt2sas driver).
> 
> All disks are configured as multipath devices in failover mode:
> 
> defaults {
>         udev_dir        /dev
>         polling_interval 10
>         selector        "round-robin 0"
>         path_grouping_policy    failover
>         path_checker    directio
>         rr_min_io       100
>         no_path_retry   queue
>         user_friendly_names no
> }
> 
> If I run the following command, ksoftirqd eats 100% CPU as soon as all
> available memory is used for buffers.
> 
> for i in `seq 0 79`; do dd if=/dev/dm-$i of=/dev/null bs=1M count=10000 & done
> 
> top looks like this:
> 
> Mem:     48390M total,    45741M used,     2649M free,    43243M buffers
> Swap:        0M total,        0M used,        0M free,     1496M cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
>    12 root      20   0     0    0    0 R   96  0.0   0:38.78 ksoftirqd/4                                   
> 17263 root      20   0  9432 1752  616 R   14  0.0   0:03.19 dd                                            
> 17275 root      20   0  9432 1756  616 D   14  0.0   0:03.16 dd                                            
> 17271 root      20   0  9432 1756  616 D   10  0.0   0:02.60 dd                                            
> 17258 root      20   0  9432 1756  616 D    7  0.0   0:02.67 dd                                            
> 17260 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd                                            
> 17262 root      20   0  9432 1752  616 D    7  0.0   0:02.38 dd                                            
> 17264 root      20   0  9432 1756  616 D    7  0.0   0:02.42 dd                                            
> 17267 root      20   0  9432 1756  616 D    7  0.0   0:02.35 dd                                            
> 17268 root      20   0  9432 1756  616 D    7  0.0   0:02.45 dd                                            
> 17274 root      20   0  9432 1756  616 D    7  0.0   0:02.47 dd                                            
> 17277 root      20   0  9432 1756  616 D    7  0.0   0:02.53 dd                                            
> 17261 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd                                            
> 17265 root      20   0  9432 1756  616 R    7  0.0   0:02.47 dd                                            
> 17266 root      20   0  9432 1756  616 R    7  0.0   0:02.44 dd                                            
> 17269 root      20   0  9432 1756  616 D    7  0.0   0:02.62 dd                                            
> 17270 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd                                            
> 17272 root      20   0  9432 1756  616 D    7  0.0   0:02.36 dd                                            
> 17273 root      20   0  9432 1756  616 D    7  0.0   0:02.46 dd                                            
> 17276 root      20   0  9432 1752  616 D    7  0.0   0:02.36 dd                                            
> 17278 root      20   0  9432 1752  616 D    7  0.0   0:02.44 dd                                            
> 17259 root      20   0  9432 1752  616 D    6  0.0   0:02.37 dd 
> 
> 
> It looks like device mapper produces long SG lists and end_clone_bio() has
> something like quadratic complexity.
> 
> The problem can be worked around by shortening the SG lists:
> 
> for i in /sys/block/dm-*; do echo 128 > $i/queue/max_sectors_kb; done
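For anyone repeating this workaround, here is a slightly more defensive sketch of the same loop. It clamps the value to each device's max_hw_sectors_kb, since the kernel rejects max_sectors_kb values larger than that. The function name cap_dm_max_sectors and the SYSFS_BLOCK override (default /sys/block, there only so the loop can be tried on a scratch directory) are hypothetical:

```shell
#!/bin/sh
# Sketch of the max_sectors_kb workaround quoted above, applied to every
# dm device. SYSFS_BLOCK is a hypothetical override (default /sys/block)
# so the loop can be exercised against a scratch directory.
cap_dm_max_sectors() {
    limit_kb=$1
    root=${SYSFS_BLOCK:-/sys/block}
    for q in "$root"/dm-*/queue; do
        [ -d "$q" ] || continue            # glob matched nothing
        hw=$(cat "$q/max_hw_sectors_kb")
        val=$limit_kb
        # The kernel rejects values above max_hw_sectors_kb, so clamp.
        if [ "$hw" -lt "$val" ]; then
            val=$hw
        fi
        echo "$val" > "$q/max_sectors_kb"
    done
}
```

Run it as root, e.g. `cap_dm_max_sectors 128`. Like the original one-liner, the setting does not persist across reboots.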
> 
> I use the SLES 2.6.32.36-0.5-default kernel.
> 
> Using iostat -x, I can see about 25000 rrqm/s but only 180 r/s, so it
> looks like each request contains more than 100 bios, which causes serious
> trouble for the ksoftirqd callbacks.
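Taking these iostat numbers at face value, and assuming each merge counted in rrqm/s is one bio appended to an existing request, the average request would carry roughly 1 + (rrqm/s)/(r/s) bios. A small sketch of that arithmetic (the helper name bios_per_request is made up for illustration):

```shell
#!/bin/sh
# Rough estimate of bios per completed read request from two iostat -x
# columns: rrqm/s (reads merged into an existing request per second) and
# r/s (read requests completed per second). Assumes each counted merge
# adds one bio to a request that already had one: bios/req ~ 1 + rrqm/r.
bios_per_request() {
    awk -v rrqm="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 1 + rrqm / r }'
}

bios_per_request 25000 180
```

With the numbers above it prints 139.9, i.e. on the order of 140 bios per request.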

I don't understand why you are saying request-based device-mapper causes
serious trouble.
BIO merging is done in the block layer.  Don't you see the same thing
if you use sd devices (/dev/sdX)?
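To try that comparison, one way to pick an underlying sd path for each dm device is the slaves/ directory that sysfs provides under /sys/block/dm-N/. The sketch below only prints the corresponding dd commands (same options as your test) so the list can be reviewed before running it. The function name sd_read_commands and the SYSFS_BLOCK override are hypothetical, and with path_grouping_policy failover the first listed path is not necessarily the active one:

```shell
#!/bin/sh
# Print one dd command per dm device, reading from the first underlying
# path listed in <sysfs>/dm-N/slaves/ instead of the dm device itself.
# SYSFS_BLOCK is a hypothetical override (default /sys/block) for testing.
# The commands are only printed, never executed.
sd_read_commands() {
    root=${SYSFS_BLOCK:-/sys/block}
    for dm in "$root"/dm-*; do
        [ -d "$dm/slaves" ] || continue
        for s in "$dm"/slaves/*; do
            [ -e "$s" ] || continue        # empty slaves/ directory
            echo "dd if=/dev/$(basename "$s") of=/dev/null bs=1M count=10000 &"
            break                          # first path only
        done
    done
}
```

Piping the output to sh (as root) would then run the same parallel read against the sd paths.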

Also, what do you see with the latest upstream kernel (say, 3.0-rc7)?

If you see your problem only with request-based device-mapper, please
elaborate on the following:
    - end_clone_bio() has something like quadratic complexity.
        * What do you mean by "quadratic complexity"?
    - each request contains more than 100 bios, which causes serious
      trouble for the ksoftirqd callbacks.
        * What do you mean by "serious trouble for the ksoftirqd callbacks"?
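To make the first question concrete: "quadratic" would mean that completing each of the n clone bios in a request costs work proportional to the bios still outstanding, so one request costs n(n+1)/2 steps instead of n. The sketch below is only that hypothetical cost model, not a description of what end_clone_bio() actually does. The sizes assume one 4 KiB page per bio, so 32 corresponds to a 128 KiB request, 256 to a 1 MiB request, and 140 to the iostat estimate:

```shell
#!/bin/sh
# Hypothetical cost model only.
# linear:    one step per completed bio            -> n steps
# quadratic: completing a bio walks the k bios still outstanding,
#            n + (n-1) + ... + 1                   -> n(n+1)/2 steps
linear_cost()    { echo "$1"; }
quadratic_cost() { echo $(( $1 * ($1 + 1) / 2 )); }

for n in 1 32 140 256; do
    echo "n=$n linear=$(linear_cost "$n") quadratic=$(quadratic_cost "$n")"
done
```

If the cost really grew like this, a 1 MiB request would need 32896 steps against 528 for a 128 KiB one, a ratio of roughly 62, whereas a linear cost would give a ratio of 8.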

> Without the mentioned workaround, I got only 600MB/s in total across all
> dd readers. With the workaround, I got about 2.8GB/s in total.

Thanks,
Kiyoshi Ueda


Thread overview: 16+ messages
2011-07-20  8:26 request baset device mapper in Linux Lukas Hejtmanek
2011-07-21 11:11 ` Kiyoshi Ueda [this message]
2011-07-21 13:26   ` Lukas Hejtmanek
2011-07-22  6:56     ` Kiyoshi Ueda
2011-07-22  8:19       ` Lukas Hejtmanek
2011-07-23  7:28         ` Jun'ichi Nomura
2011-07-24 22:16           ` Lukas Hejtmanek
2011-08-01  9:31             ` Kiyoshi Ueda
2011-09-08 13:27               ` Lukas Hejtmanek
2011-09-15 18:49                 ` Mike Snitzer
2011-09-16 14:08                   ` Lukas Hejtmanek
2011-09-19  5:50                     ` Jun'ichi Nomura
2011-09-29 20:57                       ` Lukas Hejtmanek
2011-10-05  8:13                         ` Jun'ichi Nomura
2011-10-05 10:35                           ` Lukas Hejtmanek
2011-10-06  5:11                             ` Jun'ichi Nomura
