* [Multipath] Round-robin performance limit
@ 2011-04-28 15:55 Adam Chasen
2011-05-02 7:25 ` Pasi Kärkkäinen
0 siblings, 1 reply; 16+ messages in thread
From: Adam Chasen @ 2011-04-28 15:55 UTC (permalink / raw)
To: dm-devel
I am very pleased with the features and even some of the documentation
(once I found the Red Hat docs) surrounding the latest 0.4.9
multipath-tools.
I would imagine that the multipath driver would attempt to max out the
links available to it, but I am not seeing that behavior. I am unable
to achieve more bandwidth than a single one of the four links provides.
Is this the expected behavior?
Multipath does balance the traffic across the links, and when a path
fails the per-link bandwidth on the remaining links increases
proportionately. I originally thought this may be a problem outside of
multipath, but accessing the devices directly allows me to max out all
of my links.
If there is a more appropriate venue for this question, I would
appreciate a redirection.
The current setup is as follows:
* iSCSI with 4 portals with two LUNs defined
* server connected to each portal over 4 Gigabit ports (1 to 1 mapping
of ports) yielding 4 devices for each LUN, 8 devices total
There is one device per LUN per portal connection. Multipathing is
enabled with multibus so the round robin will leverage all (4) devices
available per LUN.
I have experienced the following scenarios. I used dd (reading from
device) and bmon (network interface monitor) for all of these tests.
Note that the bandwidth never exceeds 113MB/s.
* direct (no multipath)
** all links fully saturated
** bandwidth close to theoretical max of the gigabit connection (113MB/s).
* all devices active (multipath)
** all links equally balanced
** links show 1/4 saturation (~30MB/s)
** bandwidth ~113MB/s
* 3 of 4 devices active (multipath)
** remaining links equally balanced
** remaining links show 1/3 saturation (~40MB/s)
** bandwidth ~113MB/s
* 2 of 4 devices active (multipath)
** active links equally balanced
** active links show 1/2 saturation (~60MB/s)
** bandwidth ~113MB/s
* 1 of 4 devices active (multipath)
** active links equally balanced
** active link shows full saturation (~113MB/s)
** bandwidth ~113MB/s
To rule out the transport and the backing storage, I ran dd against
one of the underlying devices directly during the tests in which the
links were not fully saturated, and I was able to fully saturate that
link.
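The per-link figures in the scenarios above are roughly what an even split of one fixed aggregate would give. A minimal sketch of that arithmetic, assuming the ~113MB/s aggregate reported above and a perfectly even round-robin split (the function name is made up for illustration):

```python
# Aggregate throughput seen through the multipath device in the tests above (MB/s).
AGGREGATE_MB_S = 113


def per_link_share(active_links: int, aggregate: float = AGGREGATE_MB_S) -> float:
    """Even split of a fixed aggregate across the active links."""
    if active_links < 1:
        raise ValueError("need at least one active link")
    return aggregate / active_links


if __name__ == "__main__":
    for n in (4, 3, 2, 1):
        print(f"{n} active links: ~{per_link_share(n):.0f} MB/s per link")
```

The computed shares land close to the observed ~30, ~40, ~60 and ~113MB/s, which is consistent with the total being capped at one link's worth of bandwidth rather than each link being limited individually.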
[root@zed ~]# multipath -ll
3600c0ff000111346d473554d01000000 dm-3 DotHill,DH3000
size=1.1T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 88:0:0:0 sdd 8:48 active ready running
  |- 86:0:0:0 sdc 8:32 active ready running
  |- 89:0:0:0 sdg 8:96 active ready running
  `- 87:0:0:0 sdf 8:80 active ready running
3600c0ff00011148af973554d01000000 dm-2 DotHill,DH3000
size=1.1T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 89:0:0:1 sdk 8:160 active ready running
  |- 88:0:0:1 sdi 8:128 active ready running
  |- 86:0:0:1 sdh 8:112 active ready running
  `- 87:0:0:1 sdl 8:176 active ready running
/etc/multipath.conf
defaults {
        path_grouping_policy multibus
        rr_min_io 100
}
multipath-tools v0.4.9 (05/33, 2016)
2.6.35.11-2-fl.smp.gcc4.4.x86_64
Thanks,
Adam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-04-28 15:55 [Multipath] Round-robin performance limit Adam Chasen
@ 2011-05-02 7:25 ` Pasi Kärkkäinen
2011-05-02 13:36 ` Adam Chasen
0 siblings, 1 reply; 16+ messages in thread
From: Pasi Kärkkäinen @ 2011-05-02 7:25 UTC (permalink / raw)
To: device-mapper development
On Thu, Apr 28, 2011 at 11:55:55AM -0400, Adam Chasen wrote:
>
> [root@zed ~]# multipath -ll
> 3600c0ff000111346d473554d01000000 dm-3 DotHill,DH3000
> size=1.1T features='0' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
> |- 88:0:0:0 sdd 8:48 active ready running
> |- 86:0:0:0 sdc 8:32 active ready running
> |- 89:0:0:0 sdg 8:96 active ready running
> `- 87:0:0:0 sdf 8:80 active ready running
> 3600c0ff00011148af973554d01000000 dm-2 DotHill,DH3000
> size=1.1T features='0' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
> |- 89:0:0:1 sdk 8:160 active ready running
> |- 88:0:0:1 sdi 8:128 active ready running
> |- 86:0:0:1 sdh 8:112 active ready running
> `- 87:0:0:1 sdl 8:176 active ready running
>
> /etc/multipath.conf
> defaults {
> path_grouping_policy multibus
> rr_min_io 100
> }
Did you try a lower value for rr_min_io ?
-- Pasi
>
> multipath-tools v0.4.9 (05/33, 2016)
> 2.6.35.11-2-fl.smp.gcc4.4.x86_64
>
> Thanks,
> Adam
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-05-02 7:25 ` Pasi Kärkkäinen
@ 2011-05-02 13:36 ` Adam Chasen
2011-05-02 22:27 ` John A. Sullivan III
0 siblings, 1 reply; 16+ messages in thread
From: Adam Chasen @ 2011-05-02 13:36 UTC (permalink / raw)
To: device-mapper development
Lowering rr_min_io provides marginal improvement. I see a 6MB/s
improvement at an rr_min_io of 3 vs 100. I played around with it
before, all the way down to 1. People seem to settle on 3. Still, I am
not seeing the bandwidth I would expect from 4 aggregated links.
Some additional information: if I attempt to pull from my two
multipath devices simultaneously (different LUNs, but same iSCSI
connections), then I can pull additional data (50MB/s vs 27-30MB/s
from each link).
Adam
This is a response to a direct email I sent to someone who had a
similar issue on this list a while back:
Date: Sat, 30 Apr 2011 00:13:20 +0200
From: Bart Coninckx <bart.coninckx@telenet.be>
Hi Adam,
I believe setting rr_min_io to 3 instead of 100 improved things
significantly.
What is still an unexplainable issue though is dd-ing to the multipath
device (very slow) while reading from it is very fast. Doing the same
piped over SSH to the original devices on the iSCSI server was OK, so it
seems like either an iSCSI or still a multipath issue.
But I definitely remember that lowering rr_min_io helped quite a bit.
I think the paths are switched faster this way, resulting in more speed.
Good luck,
b.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-05-02 13:36 ` Adam Chasen
@ 2011-05-02 22:27 ` John A. Sullivan III
2011-05-03 5:04 ` Malahal Naineni
0 siblings, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2011-05-02 22:27 UTC (permalink / raw)
To: device-mapper development; +Cc: pmdaws
On Mon, 2011-05-02 at 09:36 -0400, Adam Chasen wrote:
> Lowering rr_min_io provides marginal improvement. I see 6MB/s
> improvement at an rr_min_io of 3 vs 100. I played around with it
> before, all the way down to 1. People seem to settle on 3. Still, I am
> not seeing the bandwidth I assume it should be (4 aggregated links).
>
> Some additional information. If I attempt to pull from my two
> multipath devices simultaneously (different LUNs, but same iSCSI
> connections) then I can pull additional data (50MB/s vs 27-30MB/s
> from each link).
>
> Adam
>
> This is a response to a direct email I sent to someone who had a
> similar issue on this list a while back:
> Date: Sat, 30 Apr 2011 00:13:20 +0200
> From: Bart Coninckx <bart.coninckx@telenet.be>
> Hi Adam,
>
> I believe setting rr_min_io to 3 instead of 100 improved things
> significantly.
> What is still an unexplainable issue though is dd-ing to the multipath
> device (very slow) while reading from it is very fast. Doing the same
> piped over SSH to the original devices on the iSCSI server was OK, so it
> seems like either an iSCSI or still a multipath issue.
>
> But I definitely remember that lowering rr_min_io helped quite a bit.
> I think the paths are switched faster this way, resulting in more speed.
>
> Good luck,
>
> b.
>
>
<snip>
I'm quite curious to see what you ultimately find on this as we have a
similar setup (four paths to an iSCSI SAN) and have struggled quite a
bit. We had settled on using multipath for failover but load balancing
using software RAID0 across the four devices. That seemed to provide
more even scaling under various IO patterns until we realized we could
not take a transactionally consistent snapshot of the SAN because we
would not know which RAID transaction had been committed at the time of
the snapshot. Thus, we are planning to implement multibus.
What scheduler are you using? We found that the default cfq scheduler in
our kernel versions (2.6.28 and 29) did not scale at all to the number
of parallel iSCSI sessions. Deadline or noop scaled almost linearly.
We then realized that our SAN (Nexenta running ZFS) was doing its own
optimization of writing to the physical media (we assumed that's what
the scheduler is for) so we had no need for the overhead of any
scheduler and set ours to noop except for local disks.
I'm also very curious about your findings on rr_min_io. I cannot find
my benchmarks but we tested various settings heavily. I do not recall
if we saw more even scaling with 10 or 100. I remember being surprised
that performance with it set to 1 was poor. I would have thought that,
in a bonded environment, changing paths per iSCSI command would give
optimal performance. Can anyone explain why it does not?
We speculated that it either added too much overhead to manage the
constant switching or it was the nature of iSCSI. Does each iSCSI
command need to be acknowledged before the next one can be sent? If so,
does multibus not increase the throughput of any individual iSCSI
stream, but only the aggregate as we multiplex iSCSI streams?
If that is the case, it would exacerbate the already significant problem
of Linux, iSCSI, and latency. We have found that in any Linux disk IO
that touches the Linux file system, iSCSI performance is quite poor
because it is latency bound due to the maximum 4KB page size. I'm only
parroting what others have told me so correct me if I am wrong. Since
iSCSI can only commit 4KB at a time in Linux (unless bypassing the file
system with raw devices, dd, or direct writes in something like Oracle),
and since each write needs to be acknowledged before the next is sent,
and because sending 4KB down a high speed pipe like 10Gbps or even 1Gbps
comes nowhere near to saturating the link, iSCSI Linux IO is latency
bound and no amount of increase in bandwidth or number of bound channels
will increase the throughput of an individual iSCSI stream. Only
minimizing latency will.
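To put numbers on that premise, here is a minimal sketch of the throughput ceiling when only one I/O is outstanding at a time. It takes the 4KB size from the paragraph above at face value; the round-trip times are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def serialized_throughput_mb_s(io_size_bytes: int, round_trip_s: float) -> float:
    """Throughput ceiling with one I/O in flight: size divided by round trip."""
    if round_trip_s <= 0:
        raise ValueError("round trip must be positive")
    return io_size_bytes / round_trip_s / 1e6


if __name__ == "__main__":
    # Round-trip times are assumed values chosen only to show the trend.
    for rtt_us in (100, 200, 500):
        mb_s = serialized_throughput_mb_s(4096, rtt_us * 1e-6)
        print(f"4 KiB per {rtt_us} us round trip: {mb_s:.1f} MB/s")
```

Under this model the ceiling depends only on I/O size and latency, so adding links would not raise it. Whether the one-outstanding-I/O premise actually holds is exactly the open question here.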
I hope some of that might have helped and look forward to hearing about
your optimization of multibus. Thanks - John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-05-02 22:27 ` John A. Sullivan III
@ 2011-05-03 5:04 ` Malahal Naineni
2011-05-03 10:12 ` John A. Sullivan III
0 siblings, 1 reply; 16+ messages in thread
From: Malahal Naineni @ 2011-05-03 5:04 UTC (permalink / raw)
To: dm-devel
John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> I'm also very curious about your findings on rr_min_io. I cannot find
> my benchmarks but we tested various settings heavily. I do not recall
> if we saw more even scaling with 10 or 100. I remember being surprised
> that performance with it set to 1 was poor. I would have thought that,
> in a bonded environment, changing paths per iSCSI command would give
> optimal performance. Can anyone explain why it does not?
rr_min_io of 1 will give poor performance if your multipath kernel
module doesn't support request based multipath. In those BIO based
multipath, multipath receives 4KB requests. Such requests can't be
coalesced if they are sent on different paths.
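A small sketch of why the switching interval matters for coalescing, assuming a purely sequential stream of 4KB bios and a plain round-robin selector that switches paths every rr_min_io bios. The helper names are hypothetical, and real merging is also bounded by the block layer's maximum request size, which is not modeled.

```python
from itertools import groupby

BIO_KIB = 4  # bio size named in the explanation above


def assign_paths(num_bios: int, num_paths: int, rr_min_io: int) -> list[int]:
    """Path index for each bio of a sequential stream under plain round-robin."""
    if num_bios < 1 or num_paths < 1 or rr_min_io < 1:
        raise ValueError("all arguments must be >= 1")
    return [(i // rr_min_io) % num_paths for i in range(num_bios)]


def largest_contiguous_kib(num_bios: int, num_paths: int, rr_min_io: int) -> int:
    """Size of the longest run of adjacent bios that land on the same path."""
    paths = assign_paths(num_bios, num_paths, rr_min_io)
    longest = max(len(list(run)) for _path, run in groupby(paths))
    return longest * BIO_KIB


if __name__ == "__main__":
    for rr in (1, 3, 100):
        print(f"rr_min_io={rr}: up to {largest_contiguous_kib(1000, 4, rr)} KiB contiguous per path")
```

With rr_min_io of 1 and four paths, no two adjacent bios share a path, so nothing below multipath can merge them into a larger request.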
Thanks, Malahal.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-05-03 5:04 ` Malahal Naineni
@ 2011-05-03 10:12 ` John A. Sullivan III
2011-10-04 3:08 ` Adam Chasen
0 siblings, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2011-05-03 10:12 UTC (permalink / raw)
To: device-mapper development; +Cc: pmdaws
On Mon, 2011-05-02 at 22:04 -0700, Malahal Naineni wrote:
> John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> > I'm also very curious about your findings on rr_min_io. I cannot find
> > my benchmarks but we tested various settings heavily. I do not recall
> > if we saw more even scaling with 10 or 100. I remember being surprised
> > that performance with it set to 1 was poor. I would have thought that,
> > in a bonded environment, changing paths per iSCSI command would give
> > optimal performance. Can anyone explain why it does not?
>
> rr_min_io of 1 will give poor performance if your multipath kernel
> module doesn't support request based multipath. In those BIO based
> multipath, multipath receives 4KB requests. Such requests can't be
> coalesced if they are sent on different paths.
<snip>
Ah, that makes perfect sense and would explain why 3 seems to be the
magic number in Linux (4000 / 1460 (or whatever the IP payload is)).
Does that change with jumbo frames? In fact, how would that be
optimized in Linux?
9KB seems to be a reasonable common jumbo frame value for various
vendors and that should contain two pages but, I would guess, Linux
can't utilize it as each block must be independently acknowledged. Is
that correct? Thus a frame size of a little over 4KB would be optimal
for Linux?
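The segment arithmetic can be checked with a short sketch. The 1460-byte payload is the figure used above; 8960 bytes is an assumed payload for a 9000-byte MTU with 40 bytes of IPv4 and TCP headers and no options. iSCSI PDU headers are ignored, and the function name is made up.

```python
import math


def tcp_segments(payload_bytes: int, mss_bytes: int) -> int:
    """Number of TCP segments needed to carry a payload of the given size."""
    if payload_bytes < 0 or mss_bytes < 1:
        raise ValueError("payload must be >= 0 and MSS must be >= 1")
    return math.ceil(payload_bytes / mss_bytes)


if __name__ == "__main__":
    print("standard frames:", tcp_segments(4096, 1460), "segments per 4 KiB")
    print("jumbo frames:   ", tcp_segments(4096, 8960), "segment per 4 KiB")
```

This only reproduces the division. rr_min_io counts I/Os rather than TCP segments, so whether the two numbers are actually related is not established in this thread.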
Would that mean that rr_min_io of 1 would become optimal? However, if
each block needs to be acknowledged before the next is sent, I would
think we are still latency bound, i.e., even if I can send four requests
down four separate paths, I cannot send the second until the first has
been acknowledged and since I can easily place four packets on the same
path within the latency period of four packets, multibus gives me
absolutely no performance advantage for a single iSCSI stream and only
proves useful as I start multiplexing multiple iSCSI streams.
Is that analysis correct? If so, what constitutes a separate iSCSI
stream? Are two separate file requests from the same file systems to the
same iSCSI device considered two iSCSI streams and thus can be
multiplexed and benefit from multipath or are they considered all part
of the same iSCSI stream? If they are considered one, do they become two
if they reside on different partitions and thus different file systems?
If not, then do we only see multibus performance gains between a single
file system host and a single iSCSI host when we use virtualization each
with their own iSCSI connection (as opposed to using iSCSI connections
in the underlying host and exposing them to the virtual machines as
local storage)?
I hope I'm not hijacking this thread and realize I've asked some
convoluted questions but optimizing multibus through bonded links for
single large hosts is still a bit of a mystery to me. Thanks - John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-05-03 10:12 ` John A. Sullivan III
@ 2011-10-04 3:08 ` Adam Chasen
2011-10-04 20:19 ` Adam Chasen
0 siblings, 1 reply; 16+ messages in thread
From: Adam Chasen @ 2011-10-04 3:08 UTC (permalink / raw)
To: device-mapper development; +Cc: jsullivan, malahal
Malahal,
After you mentioned BIO-based vs request-based multipath, I tried to
determine whether my kernel contains request-based mpath. It seems that
in 2.6.31 all mpath was switched to request based. My kernels are newer
than 2.6.31 (2.6.35 and 2.6.38, actually), so I believe I have
request-based mpath.
All,
There also appears to be a new multipath configuration option
documented in the RHEL 6 beta documentation:
rr_min_io_rq Specifies the number of I/O requests to route to a path
before switching to the next path in the current path group, using
request-based device-mapper-multipath. This setting should be used on
systems running current kernels. On systems running kernels older than
2.6.31, use rr_min_io. The default value is 1.
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6-Beta/html/DM_Multipath/config_file_multipath.html
I have not tested using this setting vs rr_min_io yet or even if my
system supports the configuration directive.
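The rule in the quoted documentation can be written down as a small sketch: kernels older than 2.6.31 use rr_min_io, anything newer uses rr_min_io_rq. The function name is hypothetical, and the sketch does not check whether the installed multipath-tools actually recognizes the newer directive.

```python
import re


def round_robin_directive(kernel_release: str) -> str:
    """Pick the directive named in the quoted docs for a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    version = tuple(int(part) for part in match.groups())
    # 2.6.31 is the cutoff stated in the documentation excerpt above.
    return "rr_min_io_rq" if version >= (2, 6, 31) else "rr_min_io"


if __name__ == "__main__":
    print(round_robin_directive("2.6.35.11-2-fl.smp.gcc4.4.x86_64"))
```

For the 2.6.35 kernel reported at the start of this thread, the rule selects rr_min_io_rq, which would mean the rr_min_io value tuned earlier may not have been the setting in effect.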
If I trust some of the claims of several VMware ESX iSCSI multipath
setups, it is possible (possibly using different software) to gain a
multiplicative throughput by adding additional Ethernet links. This
makes me hopeful that we can do this with open-iscsi and dm-multipath
as well.
It could be something obvious I am missing, but it appears a lot of
people experience this same issue.
Thanks,
Adam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-04 3:08 ` Adam Chasen
@ 2011-10-04 20:19 ` Adam Chasen
2011-10-05 3:07 ` John A. Sullivan III
0 siblings, 1 reply; 16+ messages in thread
From: Adam Chasen @ 2011-10-04 20:19 UTC (permalink / raw)
To: device-mapper development
Unfortunately, even after playing around with various settings, queues,
and other techniques, I was never able to exceed the bandwidth of a
single Ethernet link when accessing a single multipathed LUN.
When communicating with two different multipathed LUNs, which present
as two different multipath devices, I can saturate two links, but it
is still a one to one ratio of multipath devices to link saturation.
After further research on multipathing, it appears people are using md
RAID to achieve multipathed devices. In my initial testing, a raid0
md-raid device produces the behavior I expect of multipathed devices:
I can easily saturate both links during read operations.
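One way to see why striping spreads a sequential read is to map byte offsets to stripe members. The sketch below assumes a 64 KiB chunk size and two members, both illustrative values rather than the configuration actually tested, and the helper names are hypothetical.

```python
from collections import Counter


def raid0_member(offset_bytes: int, chunk_bytes: int, members: int) -> int:
    """Index of the RAID0 member device that holds the given byte offset."""
    return (offset_bytes // chunk_bytes) % members


def chunks_per_member(total_bytes: int, chunk_bytes: int, members: int) -> Counter:
    """Count how many chunks of a sequential read each member serves."""
    return Counter(
        raid0_member(offset, chunk_bytes, members)
        for offset in range(0, total_bytes, chunk_bytes)
    )


if __name__ == "__main__":
    # 1 MiB sequential read, 64 KiB chunks (assumed), 2 members.
    print(dict(chunks_per_member(1 << 20, 64 * 1024, 2)))
```

Each member receives its own contiguous chunks, so a single large read is split into independent streams, one per underlying device and therefore one per link.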
I feel using md-raid is a less elegant solution than using
dm-multipath, but it will have to suffice until someone can provide me
some additional guidance.
Thanks,
Adam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-04 20:19 ` Adam Chasen
@ 2011-10-05 3:07 ` John A. Sullivan III
2011-10-05 19:54 ` Adam Chasen
0 siblings, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2011-10-05 3:07 UTC (permalink / raw)
To: device-mapper development
On Tue, 2011-10-04 at 16:19 -0400, Adam Chasen wrote:
> Unfortunately even with playing around with various settings, queues,
> and other techniques, I was never able to exceed the bandwidth of more
> than one of the Ethernet links when accessing a single multipathed
> LUN.
>
> When communicating with two different multipathed LUNs, which present
> as two different multipath devices, I can saturate two links, but it
> is still a one to one ratio of multipath devices to link saturation.
>
> After further research on multipathing, it appears people are using md
> raid to achieve multipathed devices. My initial testing of using raid0
> md-raid device produces the behavior I expect of multipathed devices.
> I can easily saturate both links during read operations.
>
> I feel using md-raid is a less elegant solution than using
> dm-multipath, but it will have to suffice until someone can provide me
> some additional guidance.
>
> Thanks,
> Adam
We recently changed from the RAID0 approach to multipath multibus.
RAID0 did seem to give more even performance over a variety of IO
patterns but it had a critical flaw. We could not use the snapshot
capabilities of the SAN because we could never be certain of
snapshotting the RAID0 disks in a transactionally consistent state. If
I have four disks in a RAID0 array and snapshot them all, how can I be
assured that I have not done something like written two of three stripes
and no parity? This was our sole reason for discarding RAID0 over
iSCSI in favor of multipath multibus - John
>
> On Mon, Oct 3, 2011 at 11:08 PM, Adam Chasen <adam@chasen.name> wrote:
> > Malahal,
> > After your mentioning bio vs request based I attempted to determine if
> > my kernel contains the request based mpath. It seems in 2.6.31 all
> > mpath was switched to request based. I have a kernel 2.6.31+ (actually
> > .35 and .38), so I believe I have request-based mpath.
> >
> > All,
> > There also appears to be a new multipath configuration option
> > documented in the RHEL 6 beta documentation:
> > rr_min_io_rq Specifies the number of I/O requests to route to a path
> > before switching to the next path in the current path group, using
> > request-based device-mapper-multipath. This setting should be used on
> > systems running current kernels. On systems running kernels older than
> > 2.6.31, use rr_min_io. The default value is 1.
> >
> > http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6-Beta/html/DM_Multipath/config_file_multipath.html
> >
> > I have not tested using this setting vs rr_min_io yet or even if my
> > system supports the configuration directive.
> >
> > If I trust some of the claims of several VMware ESX iscsi multipath
> > setups, it is possible (possibly using different software) to gain a
> > multiplicative throughput by adding additional Ethernet links. This
> > makes me hopeful that we can do this with open-iscsi and dm-multipath
> > as well.
> >
> > It could be something obvious I am missing, but it appears a lot of
> > people experience this same issue.
> >
> > Thanks,
> > Adam
> >
> > On Tue, May 3, 2011 at 6:12 AM, John A. Sullivan III
> > <jsullivan@opensourcedevel.com> wrote:
> >> On Mon, 2011-05-02 at 22:04 -0700, Malahal Naineni wrote:
> >>> John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> >>> > I'm also very curious about your findings on rr_min_io. I cannot find
> >>> > my benchmarks but we tested various settings heavily. I do not recall
> >>> > if we saw more even scaling with 10 or 100. I remember being surprised
> >>> > that performance with it set to 1 was poor. I would have thought that,
> >>> > in a bonded environment, changing paths per iSCSI command would give
> >>> > optimal performance. Can anyone explain why it does not?
> >>>
> >>> rr_min_io of 1 will give poor performance if your multipath kernel
> >>> module doesn't support request based multipath. In those BIO based
> >>> multipath, multipath receives 4KB requests. Such requests can't be
> >>> coalesced if they are sent on different paths.
> >> <snip>
> >> Ah, that makes perfect sense and why 3 seems to be the magic number in
> >> Linux (4000 / 1460 (or whatever IP payload is)). Does that change with
> >> Jumbo frames? In fact, how would that be optimized in Linux?
> >>
> >> 9KB seems to be a reasonable common jumbo frame value for various
> >> vendors and that should contain two pages but, I would guess, Linux
> >> can't utilize it as each block must be independently acknowledged. Is
> >> that correct? Thus a frame size of a little over 4KB would be optimal
> >> for Linux?
> >>
> >> Would that mean that rr_min_io of 1 would become optimal? However, if
> >> each block needs to be acknowledged before the next is sent, I would
> >> think we are still latency bound, i.e., even if I can send four requests
> >> down four separate paths, I cannot send the second until the first has
> >> been acknowledged and since I can easily place four packets on the same
> >> path within the latency period of four packets, multibus gives me
> >> absolutely no performance advantage for a single iSCSI stream and only
> >> proves useful as I start multiplexing multiple iSCSI streams.
> >>
> >> Is that analysis correct? If so, what constitutes a separate iSCSI
> >> stream? Are two separate file requests from the same file systems to the
> >> same iSCSI device considered two iSCSI streams and thus can be
> >> multiplexed and benefit from multipath or are they considered all part
> >> of the same iSCSI stream? If they are considered one, do they become two
> >> if they reside on different partitions and thus different file systems?
> >> If not, then do we only see multibus performance gains between a single
> >> file system host and a single iSCSI host when we use virtualization each
> >> with their own iSCSI connection (as opposed to using iSCSI connections
> >> in the underlying host and exposing them to the virtual machines as
> >> local storage)?
> >>
> >> I hope I'm not hijacking this thread and realize I've asked some
> >> convoluted questions but optimizing multibus through bonded links for
> >> single large hosts is still a bit of a mystery to me. Thanks - John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-05 3:07 ` John A. Sullivan III
@ 2011-10-05 19:54 ` Adam Chasen
2011-10-05 20:14 ` John A. Sullivan III
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Adam Chasen @ 2011-10-05 19:54 UTC (permalink / raw)
To: device-mapper development, jsullivan
John,
I am limited in a similar fashion. I would much prefer to use multibus
multipath, but I was unable to achieve bandwidth exceeding that of a
single link, even though the traffic was spread over the 4 available
links. Were you able to achieve performance with multibus multipath
comparable to that of the RAID0 setup?
Thanks,
Adam
On Tue, Oct 4, 2011 at 11:07 PM, John A. Sullivan III
<jsullivan@opensourcedevel.com> wrote:
> On Tue, 2011-10-04 at 16:19 -0400, Adam Chasen wrote:
>> Unfortunately even with playing around with various settings, queues,
>> and other techniques, I was never able to exceed the bandwidth of more
>> than one of the Ethernet links when accessing a single multipathed
>> LUN.
>>
>> When communicating with two different multipathed LUNs, which present
>> as two different multipath devices, I can saturate two links, but it
>> is still a one to one ratio of multipath devices to link saturation.
>>
>> After further research on multipathing, it appears people are using md
>> raid to achieve multipathed devices. My initial testing of using raid0
>> md-raid device produces the behavior I expect of multipathed devices.
>> I can easily saturate both links during read operations.
>>
>> I feel using md-raid is a less elegant solution than using
>> dm-multipath, but it will have to suffice until someone can provide me
>> some additional guidance.
>>
>> Thanks,
>> Adam
> We recently changed from the RAID0 approach to multipath multibus.
> RAID0 did seem to give more even performance over a variety of IO
> patterns but it had a critical flaw. We could not use the snapshot
> capabilities of the SAN because we could never be certain of
> snapshotting the RAID0 disks in a transactionally consistent state. If
> I have four disks in a RAID0 array and snapshot them all, how can I be
> assured that I have not done something like written two of three stripes
> and no parity? This was our singular reason for discarding RAID0 over
> iSCSI for multipath multibus - John
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-05 19:54 ` Adam Chasen
@ 2011-10-05 20:14 ` John A. Sullivan III
2011-10-22 15:02 ` Pasi Kärkkäinen
2011-12-23 0:54 ` John A. Sullivan III
2 siblings, 0 replies; 16+ messages in thread
From: John A. Sullivan III @ 2011-10-05 20:14 UTC (permalink / raw)
To: Adam Chasen; +Cc: device-mapper development
On Wed, 2011-10-05 at 15:54 -0400, Adam Chasen wrote:
> John,
> I am limited in a similar fashion. I would much prefer to use multibus
> multipath, but was unable to achieve bandwidth which would exceed a
> single link even though it was spread over the 4 available links. Were
> you able to gain even a similar performance of the RAID0 setup with
> the multibus multipath?
>
> Thanks,
> Adam
Yes but it depends :) A single iSCSI conversation is limited to the
bandwidth of a single path. However, the aggregate bandwidth of several
simultaneous conversations can begin to utilize the links well. I wish
there was a simple answer. However, even with RAID0, do all four links
fire at once or are they fired serially to write each stripe? In which
case, again, the individual throughput is probably quite limited and the
real gain is in aggregate throughput through multiple concurrent reads
and writes. However, I am by no means an expert on this subject - John
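The claim above (one conversation is capped at a single link's rate, while
several simultaneous conversations can fill more links) can be sketched as a
small model. This is only an idealization for illustration, not a measurement
or a description of how dm-multipath is implemented: the 113 MB/s figure is the
per-link rate reported earlier in this thread, and the function names are
hypothetical.

```python
LINK_MB_S = 113.0  # approximate usable rate of one gigabit link, as reported in this thread


def aggregate_mb_s(conversations, links, link_mb_s=LINK_MB_S):
    """Idealized total throughput when each conversation is capped at one link's rate."""
    if conversations < 0 or links < 0:
        raise ValueError("conversations and links must be non-negative")
    # No more than `links` conversations' worth of bandwidth can be carried at once.
    return min(conversations, links) * link_mb_s


def per_link_mb_s(conversations, links, link_mb_s=LINK_MB_S):
    """Average load per link, assuming round-robin spreads the traffic evenly."""
    if links <= 0:
        raise ValueError("at least one link is required")
    return aggregate_mb_s(conversations, links, link_mb_s) / links


if __name__ == "__main__":
    for conversations in (1, 2, 4, 8):
        total = aggregate_mb_s(conversations, links=4)
        each = per_link_mb_s(conversations, links=4)
        print(f"{conversations} conversation(s): {total:.0f} MB/s total, {each:.2f} MB/s per link")
```

Under these assumptions a single conversation over four links gives about
28 MB/s per link, which is in line with the roughly 1/4 saturation reported in
the first message of the thread, and two conversations give twice the single
link rate, as seen with two multipathed LUNs.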
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-05 19:54 ` Adam Chasen
2011-10-05 20:14 ` John A. Sullivan III
@ 2011-10-22 15:02 ` Pasi Kärkkäinen
2011-10-24 11:50 ` Pasi Kärkkäinen
2011-12-23 0:54 ` John A. Sullivan III
2 siblings, 1 reply; 16+ messages in thread
From: Pasi Kärkkäinen @ 2011-10-22 15:02 UTC (permalink / raw)
To: device-mapper development; +Cc: jsullivan
On Wed, Oct 05, 2011 at 03:54:35PM -0400, Adam Chasen wrote:
> John,
> I am limited in a similar fashion. I would much prefer to use multibus
> multipath, but was unable to achieve bandwidth which would exceed a
> single link even though it was spread over the 4 available links. Were
> you able to gain even a similar performance of the RAID0 setup with
> the multibus multipath?
>
Utilizing multiple links works with, for example, this setup:
- VMware ESXi 4.1 software iSCSI initiator.
- Dell EqualLogic iSCSI target.
The steps needed for ESXi are:
- Configure multiple VMkernel (vmkX) IP interfaces.
- Configure the ESXi iSCSI initiator to use (bind to) all the vmkX interfaces.
- Configure the path selection policy to be RR (Round Robin).
- Configure multipath to switch paths after 3 IOs.
The same should work with Linux dm-multipath.
-- Pasi
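The last step above (switching paths after 3 IOs) can be illustrated with a toy
round-robin selector. This is a sketch of the idea only, not the ESXi or
dm-multipath code; on the Linux side the settings named in this thread for the
same purpose are the multibus policy together with rr_min_io (or rr_min_io_rq
for request-based multipath). The path names and the function name below are
hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin_paths(paths, min_io, total_ios):
    """Yield the path chosen for each I/O, moving to the next path after `min_io` I/Os."""
    if min_io < 1:
        raise ValueError("min_io must be at least 1")
    if not paths:
        raise ValueError("at least one path is required")
    ring = cycle(paths)
    current = next(ring)
    sent_on_current = 0
    for _ in range(total_ios):
        # Switch once the current path has received its quota of I/Os.
        if sent_on_current == min_io:
            current = next(ring)
            sent_on_current = 0
        sent_on_current += 1
        yield current


if __name__ == "__main__":
    # Hypothetical device names standing in for the four paths to one LUN.
    paths = ["sdb", "sdc", "sdd", "sde"]
    choices = list(round_robin_paths(paths, min_io=3, total_ios=24))
    print(choices[:8])
    print(Counter(choices))
```

With four paths and a switch every 3 IOs, 24 IOs end up evenly divided, six per
path. Even distribution alone does not say anything about total throughput,
which is the open question in this thread.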
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Multipath] Round-robin performance limit
2011-10-22 15:02 ` Pasi Kärkkäinen
@ 2011-10-24 11:50 ` Pasi Kärkkäinen
0 siblings, 0 replies; 16+ messages in thread
From: Pasi Kärkkäinen @ 2011-10-24 11:50 UTC (permalink / raw)
To: device-mapper development; +Cc: jsullivan
On Sat, Oct 22, 2011 at 06:02:47PM +0300, Pasi Kärkkäinen wrote:
> On Wed, Oct 05, 2011 at 03:54:35PM -0400, Adam Chasen wrote:
> > John,
> > I am limited in a similar fashion. I would much prefer to use multibus
> > multipath, but was unable to achieve bandwidth which would exceed a
> > single link even though it was spread over the 4 available links. Were
> > you able to gain even a similar performance of the RAID0 setup with
> > the multibus multipath?
> >
>
> Utilizing multiple links works with for example this setup:
> - VMware ESXi 4.1 software iSCSI initiator.
> - Dell Equallogic iSCSI target.
>
> The steps needed for ESXi are:
> - Configure multiple VMkernel (vmkX) IP interfaces.
And I forgot to write this:
- Bind the vmkX interfaces to portgroups that use dedicated NICs.
> - Configure ESXi iscsi initiator to use (bind to) all the vmkX interfaces.
> - Configure the path selection policy to be RR (RoundRobin).
> - Configure multipath to switch paths after 3 IOs.
>
>
> The same should work with Linux dm-multipath.
>
>
That should be it.
-- Pasi
>
> > Thanks,
> > Adam
> >
> > On Tue, Oct 4, 2011 at 11:07 PM, John A. Sullivan III
> > <jsullivan@opensourcedevel.com> wrote:
> > > On Tue, 2011-10-04 at 16:19 -0400, Adam Chasen wrote:
> > >> Unfortunately even with playing around with various settings, queues,
> > >> and other techniques, I was never able to exceed the bandwidth of more
> > >> than one of the Ethernet links when accessing a single multipathed
> > >> LUN.
> > >>
> > >> When communicating with two different multipathed LUNs, which present
> > >> as two different multipath devices, I can saturate two links, but it
> > >> is still a one to one ratio of multipath devices to link saturation.
> > >>
> > >> After further research on multipathing, it appears people are using md
> > >> raid to achieve multipathed devices. My initial testing of using raid0
> > >> md-raid device produces the behavior I expect of multipathed devices.
> > >> I can easily saturate both links during read operations.
> > >>
> > >> I feel using md-raid is a less elegant solution than using
> > >> dm-multipath, but it will have to suffice until someone can provide me
> > >> some additional guidance.
> > >>
> > >> Thanks,
> > >> Adam
> > > We recently changed from the RAID0 approach to multipath multibus.
> > > RAID0 did seem to give more even performance over a variety of IO
> > > patterns but it had a critical flaw. We could not use the snapshot
> > > capabilities of the SAN because we could never be certain of
> > > snapshotting the RAID0 disks in a transactionally consistent state. If
> > > I have four disks in a RAID0 array and snapshot them all, how can I be
> > > assured that I have not done something like written two of three stripes
> > > and no parity? This was our singular reason for discarding RAID0 over
> > > iSCSI for multipath multibus - John
> > >
> > >>
> > >> On Mon, Oct 3, 2011 at 11:08 PM, Adam Chasen <adam@chasen.name> wrote:
> > >> > Malahal,
> > >> > After your mentioning bio vs request based I attempted to determine if
> > >> > my kernel contains the request based mpath. It seems in 2.6.31 all
> > >> > mpath was switched to request based. I have a kernel 2.6.31+ (actually
> > >> > .35 and .38), so I believe I have request-based mpath.
> > >> >
> > >> > All,
> > >> > There also appears to be a new multipath configuration option
> > >> > documented in the RHEL 6 beta documentation:
> > >> > rr_min_io_rq Specifies the number of I/O requests to route to a path
> > >> > before switching to the next path in the current path group, using
> > >> > request-based device-mapper-multipath. This setting should be used on
> > >> > systems running current kernels. On systems running kernels older than
> > >> > 2.6.31, use rr_min_io. The default value is 1.
> > >> >
> > >> > http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6-Beta/html/DM_Multipath/config_file_multipath.html
> > >> >
> > >> > I have not tested using this setting vs rr_min_io yet or even if my
> > >> > system supports the configuration directive.
> > >> >
> > >> > If I trust some of the claims of several VMware ESX iSCSI multipath
> > >> > setups, it is possible (possibly using different software) to gain a
> > >> > multiplicative throughput by adding additional Ethernet links. This
> > >> > makes me hopeful that we can do this with open-iscsi and dm-multipath
> > >> > as well.
> > >> >
> > >> > It could be something obvious I am missing, but it appears a lot of
> > >> > people experience this same issue.
> > >> >
> > >> > Thanks,
> > >> > Adam
> > >> >
> > >> > On Tue, May 3, 2011 at 6:12 AM, John A. Sullivan III
> > >> > <jsullivan@opensourcedevel.com> wrote:
> > >> >> On Mon, 2011-05-02 at 22:04 -0700, Malahal Naineni wrote:
> > >> >>> John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> > >> >>> > I'm also very curious about your findings on rr_min_io. I cannot find
> > >> >>> > my benchmarks but we tested various settings heavily. I do not recall
> > >> >>> > if we saw more even scaling with 10 or 100. I remember being surprised
> > >> >>> > that performance with it set to 1 was poor. I would have thought that,
> > >> >>> > in a bonded environment, changing paths per iSCSI command would give
> > >> >>> > optimal performance. Can anyone explain why it does not?
> > >> >>>
> > >> >>> rr_min_io of 1 will give poor performance if your multipath kernel
> > >> >>> module doesn't support request based multipath. In those BIO based
> > >> >>> multipath, multipath receives 4KB requests. Such requests can't be
> > >> >>> coalesced if they are sent on different paths.
> > >> >> <snip>
> > >> >> Ah, that makes perfect sense and why 3 seems to be the magic number in
> > >> >> Linux (4000 / 1460 (or whatever IP payload is)). Does that change with
> > >> >> Jumbo frames? In fact, how would that be optimized in Linux?
> > >> >>
> > >> >> 9KB seems to be a reasonable common jumbo frame value for various
> > >> >> vendors and that should contain two pages but, I would guess, Linux
> > >> >> can't utilize it as each block must be independently acknowledged. Is
> > >> >> that correct? Thus a frame size of a little over 4KB would be optimal
> > >> >> for Linux?
> > >> >>
> > >> >> Would that mean that rr_min_io of 1 would become optimal? However, if
> > >> >> each block needs to be acknowledged before the next is sent, I would
> > >> >> think we are still latency bound, i.e., even if I can send four requests
> > >> >> down four separate paths, I cannot send the second until the first has
> > >> >> been acknowledged and since I can easily place four packets on the same
> > >> >> path within the latency period of four packets, multibus gives me
> > >> >> absolutely no performance advantage for a single iSCSI stream and only
> > >> >> proves useful as I start multiplexing multiple iSCSI streams.
> > >> >>
> > >> >> Is that analysis correct? If so, what constitutes a separate iSCSI
> > >> >> stream? Are two separate file requests from the same file systems to the
> > >> >> same iSCSI device considered two iSCSI streams and thus can be
> > >> >> multiplexed and benefit from multipath or are they considered all part
> > >> >> of the same iSCSI stream? If they are considered one, do they become two
> > >> >> if they reside on different partitions and thus different file systems?
> > >> >> If not, then do we only see multibus performance gains between a single
> > >> >> file system host and a single iSCSI host when we use virtualization each
> > >> >> with their own iSCSI connection (as opposed to using iSCSI connections
> > >> >> in the underlying host and exposing them to the virtual machines as
> > >> >> local storage)?
> > >> >>
> > >> >> I hope I'm not hijacking this thread and realize I've asked some
> > >> >> convoluted questions but optimizing multibus through bonded links for
> > >> >> single large hosts is still a bit of a mystery to me. Thanks - John
> > >> >>
> > >> >> --
> > >> >> dm-devel mailing list
> > >> >> dm-devel@redhat.com
> > >> >> https://www.redhat.com/mailman/listinfo/dm-devel
> > >> >>
> > >> >
> > >>
> > >> --
> > >> dm-devel mailing list
> > >> dm-devel@redhat.com
> > >> https://www.redhat.com/mailman/listinfo/dm-devel
> > >
> > >
> > > --
> > > dm-devel mailing list
> > > dm-devel@redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> > >
> >
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
* Re: [Multipath] Round-robin performance limit
2011-10-05 19:54 ` Adam Chasen
2011-10-05 20:14 ` John A. Sullivan III
2011-10-22 15:02 ` Pasi Kärkkäinen
@ 2011-12-23 0:54 ` John A. Sullivan III
2011-12-27 11:36 ` Pasi Kärkkäinen
2 siblings, 1 reply; 16+ messages in thread
From: John A. Sullivan III @ 2011-12-23 0:54 UTC (permalink / raw)
To: device-mapper development; +Cc: Paul T. Bibaud
On Wed, 2011-10-05 at 15:54 -0400, Adam Chasen wrote:
> John,
> I am limited in a similar fashion. I would much prefer to use multibus
> multipath, but was unable to achieve bandwidth which would exceed a
> single link even though it was spread over the 4 available links. Were
> you able to gain even a similar performance of the RAID0 setup with
> the multibus multipath?
>
> Thanks,
> Adam
<snip>
We just ran a quick benchmark before optimizing. Using multibus rather
than RAID0 with four GbE NICs, and testing with a simple cat /dev/zero >
zeros, we hit 3.664 Gbps!
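As a rough sanity check on that figure, the conversion below turns an aggregate rate in Gbps into MB/s overall and per link. It is only a sketch: the function name is hypothetical, MB means 10^6 bytes, and it assumes the load is spread evenly over the four links.

```shell
#!/bin/sh
# Hypothetical helper: convert an aggregate rate in Gbps to MB/s (10^6 bytes/s)
# and to MB/s per link, assuming the traffic is spread evenly across links.
gbps_to_mbytes() {
    awk -v gbps="$1" -v links="$2" 'BEGIN {
        total = gbps * 1000 / 8          # Gbit/s to MB/s
        printf "%.1f %.1f\n", total, total / links
    }'
}

# Figures from the benchmark above: 3.664 Gbps over four GbE NICs.
gbps_to_mbytes 3.664 4
```

That works out to roughly 458 MB/s in total, or about 114.5 MB/s per link, which is in line with the ~113MB/s single-link ceiling reported at the start of the thread.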
This is still on CentOS 5.4 so we are not able to play with
rr_min_io_rq. We have not yet activated jumbo frames. We are also
thinking of using SFQ as a qdisc instead of the default pfifo_fast. So,
we think we can make it go even faster.
We are delighted to be achieving this with multibus rather than RAID0 as
it means we can take transactionally consistent snapshots on the SAN.
Many thanks to whoever pointed out that tag queueing should solve the
4KB block size latency problem. The problem turned out to not be
latency as we were told but simply an under-resourced SAN. We brought
in new Nexenta SANs with much more RAM and they are flying - John
* Re: [Multipath] Round-robin performance limit
2011-12-23 0:54 ` John A. Sullivan III
@ 2011-12-27 11:36 ` Pasi Kärkkäinen
2011-12-27 19:18 ` John A. Sullivan III
0 siblings, 1 reply; 16+ messages in thread
From: Pasi Kärkkäinen @ 2011-12-27 11:36 UTC (permalink / raw)
To: device-mapper development; +Cc: Paul T. Bibaud
On Thu, Dec 22, 2011 at 07:54:46PM -0500, John A. Sullivan III wrote:
> On Wed, 2011-10-05 at 15:54 -0400, Adam Chasen wrote:
> > John,
> > I am limited in a similar fashion. I would much prefer to use multibus
> > multipath, but was unable to achieve bandwidth which would exceed a
> > single link even though it was spread over the 4 available links. Were
> > you able to gain even a similar performance of the RAID0 setup with
> > the multibus multipath?
> >
> > Thanks,
> > Adam
> <snip>
> We just ran a quick benchmark before optimizing. Using multibus rather
> than RAID0 with four GbE NICs, and testing with a simple cat /dev/zero >
> zeros, we hit 3.664 Gbps!
>
> This is still on CentOS 5.4 so we are not able to play with
> rr_min_io_rq. We have not yet activated jumbo frames. We are also
> thinking of using SFQ as a qdisc instead of the default pfifo_fast. So,
> we think we can make it go even faster.
>
> We are delighted to be achieving this with multibus rather than RAID0 as
> it means we can take transactionally consistent snapshots on the SAN.
>
> Many thanks to whoever pointed out that tag queueing should solve the
> 4KB block size latency problem. The problem turned out to not be
> latency as we were told but simply an under-resourced SAN. We brought
> in new Nexenta SANs with much more RAM and they are flying - John
>
Hey,
Can you please post your multipath configuration ?
Just for reference for future people googling for this :)
-- Pasi
* Re: [Multipath] Round-robin performance limit
2011-12-27 11:36 ` Pasi Kärkkäinen
@ 2011-12-27 19:18 ` John A. Sullivan III
0 siblings, 0 replies; 16+ messages in thread
From: John A. Sullivan III @ 2011-12-27 19:18 UTC (permalink / raw)
To: device-mapper development; +Cc: Paul T. Bibaud
On Tue, 2011-12-27 at 13:36 +0200, Pasi Kärkkäinen wrote:
> On Thu, Dec 22, 2011 at 07:54:46PM -0500, John A. Sullivan III wrote:
> > On Wed, 2011-10-05 at 15:54 -0400, Adam Chasen wrote:
> > > John,
> > > I am limited in a similar fashion. I would much prefer to use multibus
> > > multipath, but was unable to achieve bandwidth which would exceed a
> > > single link even though it was spread over the 4 available links. Were
> > > you able to gain even a similar performance of the RAID0 setup with
> > > the multibus multipath?
> > >
> > > Thanks,
> > > Adam
> > <snip>
> > We just ran a quick benchmark before optimizing. Using multibus rather
> > than RAID0 with four GbE NICs, and testing with a simple cat /dev/zero >
> > zeros, we hit 3.664 Gbps!
> >
> > This is still on CentOS 5.4 so we are not able to play with
> > rr_min_io_rq. We have not yet activated jumbo frames. We are also
> > thinking of using SFQ as a qdisc instead of the default pfifo_fast. So,
> > we think we can make it go even faster.
> >
> > We are delighted to be achieving this with multibus rather than RAID0 as
> > it means we can take transactionally consistent snapshots on the SAN.
> >
> > > Many thanks to whoever pointed out that tag queueing should solve the
> > > 4KB block size latency problem. The problem turned out to not be
> > > latency as we were told but simply an under-resourced SAN. We brought
> > in new Nexenta SANs with much more RAM and they are flying - John
> >
>
> Hey,
>
> Can you please post your multipath configuration ?
> Just for reference for future people googling for this :)
>
> -- Pasi
>
<snip>
Sure, although I would be a bit careful. I think there are a few things
we need to tweak in it and the lead engineer on the product and I just
haven't had the time to go over it. It is also based upon CentOS 5.4 so
we do not have rr_min_io_rq. We are a moderately secure environment so
I might need to scrub a bit of data:
multipath.conf
blacklist {
    # devnode "*"
    # sdb
    wwid SATA_ST3250310NS_9XX0LYYY
    #sda
    wwid SATA_ST3250310NS_9XX0LZZZ
    # The above does not seem to be working thus we will do
    devnode "^sd[ab]$"
    # This is usually a bad idea as the device names can change
    # However, since we add our iSCSI devices long after boot, I think we are safe
}
defaults {
    udev_dir /dev
    polling_interval 5
    selector "round-robin 0"
    path_grouping_policy multibus
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout "/bin/bash /sbin/mpath_prio_ssi %n" # This needs to be cleaned up
    prio_callout /bin/true
    path_checker directio
    rr_min_io 100
    max_fds 8192
    rr_weight uniform
    failback immediate
    no_path_retry fail
    # user_friendly_names yes
}
multipaths {
    multipath {
        wwid aaaaaaaaaa53f0d0000004e81f27d0001
        alias isda
    }
    multipath {
        wwid aaaaaaaaaa53f0d0000004e81f2910002
        alias isdb
    }
    multipath {
        wwid aaaaaaaaaa53f0d0000004e81f2ab0003
        alias isdc
    }
    multipath {
        wwid aaaaaaaaaa53f0d0000004e81f2c10004
        alias isdd
    }
}
devices {
    device {
        vendor "NEXENTA"
        product "COMSTAR"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        features "0"
        hardware_handler "0"
    }
}
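The configuration above sets rr_min_io because CentOS 5.4 predates request-based multipath. As a sketch of the rule quoted earlier in the thread from the RHEL 6 beta documentation (rr_min_io_rq on kernels 2.6.31 and later, rr_min_io on older ones), a small helper could pick the option name from a kernel version string. The function name is hypothetical, and distribution kernels with backports may not follow the upstream version cutoff.

```shell
#!/bin/sh
# Hypothetical helper: choose the round-robin tuning option for a kernel version.
# The 2.6.31 cutoff is the one quoted from the RHEL 6 beta documentation;
# vendor kernels with backported features may behave differently.
rr_option_for_kernel() {
    ver=${1%%-*}                 # drop any suffix such as "-164.el5"
    old_ifs=$IFS
    IFS=.
    set -- $ver                  # split "2.6.18" into 2, 6 and 18
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    minor=${minor%%[!0-9]*}      # keep leading digits only
    patch=${patch%%[!0-9]*}
    minor=${minor:-0}
    patch=${patch:-0}
    if [ "$major" -gt 2 ] ||
       { [ "$major" -eq 2 ] && [ "$minor" -gt 6 ]; } ||
       { [ "$major" -eq 2 ] && [ "$minor" -eq 6 ] && [ "$patch" -ge 31 ]; }
    then
        echo rr_min_io_rq
    else
        echo rr_min_io
    fi
}

# Example: rr_option_for_kernel "$(uname -r)"
```

On the 2.6.18-based CentOS 5.4 kernel this selects rr_min_io, matching the setting used in the defaults section.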
Other miscellaneous settings:
# Some optimizations for the SAN network
ip link set eth0 txqlen 2000
ip link set eth1 txqlen 2000
ip link set eth2 txqlen 2000
ip link set eth3 txqlen 2000
The more we read about and test bufferbloat
(http://www.bufferbloat.net/projects/bloat), the more we are thinking of
actually dramatically reducing these buffers as it is quite possible for
one new iSCSI conversation to become backlogged behind another and, I
suspect, that could also wreak havoc on command reordering if we are
doing round robin around the interfaces.
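For anyone wanting to experiment with smaller transmit queues, a dry-run sketch like the one below prints the commands rather than running them. The interface names are the ones from the settings above; the queue length of 100 is purely an illustrative assumption, not a tested recommendation, and the function name is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would set a
# transmit queue length on each listed interface.
# $1 = queue length (illustrative value only), remaining args = interfaces.
print_txqlen_cmds() {
    qlen=$1
    shift
    for dev in "$@"; do
        echo "ip link set dev $dev txqueuelen $qlen"
    done
}

print_txqlen_cmds 100 eth0 eth1 eth2 eth3
```

Piping the output to sh as root would apply the settings; printing first makes it easy to review them before touching a production SAN network.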
We are also thinking of changing the queuing discipline from the default
pfifo_fast. Since it is all the same traffic, there is no need to band
it like pfifo_fast does by examining the TOS bits. A regular fifo qdisc
might be a hair faster. On the other hand, we might want to go with SFQ
so that one heavy iSCSI conversation cannot starve others or keep them
from ramping up quickly through TCP slow start.
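The two alternatives mentioned here could be tried with tc. The sketch below only prints the commands for review; the function name is hypothetical and the SFQ perturb interval of 10 seconds is an assumed, commonly used value rather than something tested in this setup.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) tc commands that would replace the
# root qdisc on each interface with either SFQ or a plain packet FIFO.
# $1 = "sfq" or "pfifo", remaining args = interfaces.
print_qdisc_cmds() {
    kind=$1
    shift
    for dev in "$@"; do
        case $kind in
            sfq)   echo "tc qdisc replace dev $dev root sfq perturb 10" ;;
            pfifo) echo "tc qdisc replace dev $dev root pfifo" ;;
            *)     echo "unknown qdisc: $kind" >&2; return 1 ;;
        esac
    done
}

print_qdisc_cmds sfq eth0 eth1 eth2 eth3
```

After applying a change, tc -s qdisc show dev eth0 reports which qdisc is active along with its drop and backlog counters.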
multipath -F #flush
multipath
sleep 2
service multipathd start
sleep 2
blockdev --setra 1024 /dev/mapper/isda
blockdev --setra 1024 /dev/mapper/isdb
blockdev --setra 1024 /dev/mapper/isdc
blockdev --setra 1024 /dev/mapper/isdd
mount -o defaults,noatime /dev/mapper/id02sdd /backups # Note the noatime
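blockdev --setra takes its argument in 512-byte sectors, so the value 1024 used above corresponds to 512 KiB of read-ahead per device. A small sketch of the conversion, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: convert a blockdev --setra value (512-byte sectors) to KiB.
setra_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

setra_to_kib 1024    # read-ahead used for the isd* devices above
```

The value currently in effect for a device can be read back with blockdev --getra.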
From sysctl.conf:
# Controls tcp maximum receive window size
#net.core.rmem_max = 409600
#net.core.rmem_max = 8738000
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 8192 873800 16777216
# Controls tcp maximum send window size
#net.core.wmem_max = 409600
#net.core.wmem_max = 6553600
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 655360 16777216
# Prefers lower latency over higher throughput in the TCP stack
# (this does not by itself disable the Nagle algorithm or delayed acks)
net.ipv4.tcp_low_latency=1
net.core.netdev_max_backlog = 2000
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 65536
# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the total amount of shared memory, in pages
kernel.shmall = 4294967296
# Controls when we call for more entropy
# Since these systems have no mouse or keyboard and Linux no longer uses
# network I/O, we are regularly running low on entropy
kernel.random.write_wakeup_threshold = 1024
# Not really needed for iSCSI - just an interesting setting we use in
# conjunction with haveged to address the problem of lack of entropy on
# headless systems
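When judging whether the 16 MiB maximum buffers above are larger than needed, the bandwidth-delay product gives a rough lower bound on the window required to keep one link full. The sketch below uses a hypothetical helper and an assumed round-trip time of 200 microseconds, which is not a measured value from this setup.

```shell
#!/bin/sh
# Hypothetical helper: bandwidth-delay product in bytes.
# $1 = link rate in Mbit/s, $2 = round-trip time in microseconds (assumed, not measured).
bdp_bytes() {
    # Mbit/s multiplied by microseconds gives bits; divide by 8 for bytes
    echo $(( $1 * $2 / 8 ))
}

bdp_bytes 1000 200   # one GbE link with an assumed 200 us RTT
```

With those assumed numbers a single GbE link needs only about 25 KB in flight; measuring the real round-trip time to the SAN with ping would give a better input.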
We have not yet re-enabled jumbo packets as that actually reduced
throughput in the past but that may have been related to the lack of
resources in the original unit.
Hope this helps. We are not experts so, if someone sees something we
can tweak, please point it out - John
end of thread
Thread overview: 16+ messages
2011-04-28 15:55 [Multipath] Round-robin performance limit Adam Chasen
2011-05-02 7:25 ` Pasi Kärkkäinen
2011-05-02 13:36 ` Adam Chasen
2011-05-02 22:27 ` John A. Sullivan III
2011-05-03 5:04 ` Malahal Naineni
2011-05-03 10:12 ` John A. Sullivan III
2011-10-04 3:08 ` Adam Chasen
2011-10-04 20:19 ` Adam Chasen
2011-10-05 3:07 ` John A. Sullivan III
2011-10-05 19:54 ` Adam Chasen
2011-10-05 20:14 ` John A. Sullivan III
2011-10-22 15:02 ` Pasi Kärkkäinen
2011-10-24 11:50 ` Pasi Kärkkäinen
2011-12-23 0:54 ` John A. Sullivan III
2011-12-27 11:36 ` Pasi Kärkkäinen
2011-12-27 19:18 ` John A. Sullivan III