From: Steve Lord <lord@xfs.org>
To: rgautier@redhat.com, device-mapper development <dm-devel@redhat.com>
Cc: consult-list@redhat.com
Subject: Re: dm-multipath has great throughput but we'd like more!
Date: Thu, 18 May 2006 15:28:09 -0500
Message-ID: <446CD8D9.2010106@xfs.org>
In-Reply-To: <1147938254.27006.65.camel@baggage>


Provided things are cabled correctly and you have 2 HBA ports going either
into a switch or directly into the controllers of the RAID (which probably
has 4 ports), the theoretical bandwidth is closer to 400 Mbytes/sec. I am
pretty sure any reasonable Hitachi RAID will sustain close to that. Using
other software and RAID hardware I can generally sustain 375 Mbytes/sec from
2 QLogic HBA ports in a fairly old Dell server box, and that is going through
3 switches in the middle.
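
As a rough check of the 400 Mbytes/sec figure, the arithmetic can be sketched
as below. It assumes the nominal 2 Gbit/s link rate and the rule of thumb used
elsewhere in this thread of 10 line bits per payload byte; the function names
are made up for illustration.

```python
def link_mbytes_per_sec(line_rate_gbit: float) -> float:
    """Payload rate of one link, assuming 10 line bits per data byte."""
    return line_rate_gbit * 1000 / 10


def aggregate_mbytes_per_sec(ports: int, line_rate_gbit: float = 2.0) -> float:
    """Theoretical ceiling when I/O is spread evenly over all HBA ports."""
    return ports * link_mbytes_per_sec(line_rate_gbit)


print(aggregate_mbytes_per_sec(1))  # 200.0
print(aggregate_mbytes_per_sec(2))  # 400.0
```

By this estimate, 375 Mbytes/sec over two ports is a little under 94% of the
ceiling.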

You need sustained I/O directed at both sides of the RAID, though. I am
not sure about the HDS 9980, but I think it is an active/active RAID,
which means each controller can access each LUN in parallel. You really
need to be striping your I/O across the LUNs and controllers. You can
measure the fabric capacity separately from the storage bandwidth by
using the RAID's cache. Ensure caching is enabled in the RAID, and have
a file which is laid out across multiple LUNs. Read a file which is a
large percentage of the cache size using O_DIRECT (lmdd can be built
with direct I/O support). Then run the read again. If you did it right,
the second read is served from the RAID's cache, so you have eliminated
the spindles from the I/O.
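
If lmdd is not to hand, the two-pass read can be approximated with a short
script. This is only a sketch under stated assumptions: it is Linux-only
(os.O_DIRECT), the 1 MiB block size is an arbitrary choice, and timed_read
and mbytes_per_sec are hypothetical helper names, not part of any tool
mentioned here.

```python
import mmap
import os
import time


def timed_read(path, block_size=1 << 20, direct=True):
    """Read `path` sequentially and return (bytes_read, elapsed_seconds)."""
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # bypass the host page cache (Linux-only flag)
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, block_size)
    fd = os.open(path, flags)
    total = 0
    start = time.perf_counter()
    try:
        while True:
            n = os.readv(fd, [buf])
            total += n
            if n < block_size:  # a short (or empty) read means end of file
                break
    finally:
        os.close(fd)
        buf.close()
    return total, time.perf_counter() - start


def mbytes_per_sec(nbytes, seconds):
    """Convert a byte count and a duration into Mbytes/sec."""
    return nbytes / seconds / 1e6


# Usage (the path is a placeholder for a large file striped across the LUNs):
#   for label in ("first pass", "second pass"):
#       n, secs = timed_read("/path/to/large_file")
#       print(label, round(mbytes_per_sec(n, secs), 1), "Mbytes/sec")
```

If the second pass is much faster than the first, the difference is roughly
what the spindles cost; the second figure approximates what the fabric and
the RAID's cache can deliver.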

Again, I am not sure about the Hitachi RAID, but a LUN generally
belongs to one controller on the RAID, and there are usually two
controllers. Make sure that when you build the volume you stripe
the LUNs so that they alternate between controllers. Then make
sure that your I/Os are large enough to hit multiple disks
at once. There are lots of tricks to tuning this type of setup.
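
The alternating order can be worked out mechanically before building the
volume. A minimal sketch, assuming you already know which LUNs each of the
two controllers owns; the LUN names and the function name are hypothetical.

```python
from itertools import chain, zip_longest


def alternate_by_controller(luns_a, luns_b):
    """Interleave two per-controller LUN lists: A0, B0, A1, B1, ..."""
    pairs = zip_longest(luns_a, luns_b)
    # zip_longest pads the shorter list with None; drop the padding.
    return [lun for lun in chain.from_iterable(pairs) if lun is not None]


order = alternate_by_controller(["a0", "a1", "a2"], ["b0", "b1", "b2"])
print(order)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

Listing the stripe members in this order means consecutive stripe units land
on different controllers, so a large sequential I/O keeps both busy.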

The problem with the load balancing in dm-multipath is that it is not
really load balancing. It is round robin, on a per-LUN basis I think,
and it has no global picture of how much other load is currently going
to each HBA or controller port. The best you can do is drop the value
of rr_min_io in the /etc/multipath.conf file to a small value; try
something like 1 or 2.
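
A config fragment along these lines would do it. This is a sketch that
assumes the setting goes in the defaults section and that nothing more
specific in your file overrides it; check it against the multipath.conf
documentation for your version.

```
defaults {
        rr_min_io 1
}
```

With a value of 1, the round-robin selector moves to the next path after
every I/O instead of sending a long run down one path before switching.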

Steve


Bob Gautier wrote:
> On Thu, 2006-05-18 at 02:25 -0500, Jonathan E Brassow wrote:
>> The system bus isn't a limiting factor, is it?  64-bit PCI-X will get
>> 8.5 Gbit/s, about 1 GB/s (plenty), but 32-bit PCI 33MHz got 133MB/s.
>>
>> Can your disks sustain that much bandwidth? 10 striped drives might get 
>> better than 200MB/s if done right, I suppose.
>>
>> Don't the switches run at 2 Gbits/s?  2 Gbits/s / 10 (throw in 2 bits 
>> for protocol) ~= 200MB/s.
>>
> 
> Thanks for the fast responses:
> 
> The card is a 64-bit PCI-X, so I don't think the bus is the bottleneck,
> and anyway the vendor specifies a maximum throughput of 200Mbyte/s per
> card.
> 
> The disk array does not appear to be the bottleneck because we get
> 200Mbyte/s when we use *two* HBAs in load-balanced mode.
> 
> The question is really about why we only see O(100Mbyte/s) with one HBA
> when we can achieve O(200MByte/s) with two cards, given that one card
> should be able to achieve that throughput.
> 
> I don't think the method of producing the traffic (bonnie++ or something
> else) should be relevant but if it were that would be very interesting
> for the benchmark authors!
> 
> The storage is an HDS 9980 (I think?)
> 
>> Could be a bunch of reasons...
>>
>>   brassow
>>
>> On May 18, 2006, at 2:05 AM, Bob Gautier wrote:
>>
>>> Yesterday my client was testing multipath load balancing and
>>> failover
>>> on a system running ext3 on a logical volume which comprises about ten
>>> SAN LUNs all reached using multipath in multibus mode over two QL2340
>>> HBAs.
>>>
>>> On the one hand, the client is very impressed: running bonnie++
>>> (inspired by Ronan's GFS v VxFS example) we get just over 200Mbyte/s
>>> over the two HBAs, and when we pull a link we get about 120MByte/s.
>>>
>>> The throughput and failover response times are better than the client
>>> has ever seen, but we're wondering why we are not seeing higher
>>> throughput per-HBA -- the QL2340 datasheet says it should manage
>>> 200Mbyte/s and all switches etc. run at 2Gbit/s.
>>>
>>> Any ideas?
>>>
>>> Bob Gautier
>>> +44 7921 700996
>>>
