netdev.vger.kernel.org archive mirror
From: Jay Vosburgh <fubar@us.ibm.com>
To: "Oleg V. Ukhno" <olegu@yandex-team.ru>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	John Fastabend <john.r.fastabend@intel.com>
Subject: Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
Date: Fri, 14 Jan 2011 16:05:12 -0800
Message-ID: <26330.1295049912@death>
In-Reply-To: <4D30D37B.6090908@yandex-team.ru>

Oleg V. Ukhno <olegu@yandex-team.ru> wrote:
>Jay Vosburgh wrote:
>
>> 	This is a violation of the 802.3ad (now 802.1ax) standard, 5.2.1
>> (f), which requires that all frames of a given "conversation" are passed
>> to a single port.
>>
>> 	The existing layer3+4 hash has a similar problem (it may send
>> packets from a conversation to multiple ports), but in that case it's
>> an unlikely exception (only with IP fragmentation), whereas here it's
>> the norm.  At a minimum, this must be clearly documented.
>>
>> 	Also, what does a round robin in 802.3ad provide that the
>> existing round robin does not?  My presumption is that you're looking to
>> get the aggregator autoconfiguration that 802.3ad provides, but you
>> don't say.
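
The distinction drawn above between a hash policy and round robin can be
sketched as follows. This is only an illustration: the formula is modelled
on the layer3+4 transmit hash (ports XORed with the low 16 bits of the IP
addresses, modulo the slave count), the handling of fragments assumes the
port information is simply left out, and all names, addresses and port
numbers are made up.

```python
from itertools import count

def layer34_port(src_ip, dst_ip, src_port, dst_port, n_slaves, fragmented=False):
    """Pick an output slave from a layer3+4-style hash (illustrative only)."""
    ip_bits = (src_ip ^ dst_ip) & 0xFFFF
    if fragmented:
        # Assumption: fragments carry no usable port information,
        # so only the IP part of the hash is used.
        return ip_bits % n_slaves
    return ((src_port ^ dst_port) ^ ip_bits) % n_slaves

def make_round_robin(n_slaves):
    """Return a function that picks slaves 0, 1, 2, ... in turn."""
    counter = count()
    return lambda: next(counter) % n_slaves

N_SLAVES = 4
# One hypothetical TCP conversation (10.0.0.1:40001 -> 10.0.0.2:3260).
conv = dict(src_ip=0x0A000001, dst_ip=0x0A000002, src_port=40001, dst_port=3260)

# Hashing: every unfragmented packet of the conversation uses the same slave.
hashed = {layer34_port(n_slaves=N_SLAVES, **conv) for _ in range(8)}
# A fragment of the same conversation may hash to a different slave.
fragment = layer34_port(n_slaves=N_SLAVES, fragmented=True, **conv)
# Round robin: the same conversation is spread over all slaves.
next_port = make_round_robin(N_SLAVES)
striped = {next_port() for _ in range(8)}
```

With these made-up values the hashed conversation stays on one slave, the
fragment lands on another, and round robin touches all four, which is the
"exception versus norm" contrast described in the quoted text.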

	I'm still curious about this question.  Given the rather
intricate setup of your particular network (described below), I'm not
sure why 802.3ad is of benefit over traditional etherchannel
(balance-rr / balance-xor).

>> 	I don't necessarily think this is a bad cheat (round robining on
>> 802.3ad as an explicit non-standard extension), since everybody wants to
>> stripe their traffic across multiple slaves.  I've given some thought to
>> making round robin into just another hash mode, but this also does some
>> magic to the MAC addresses of the outgoing frames (more on that below).
>Yes, I am resetting MAC addresses when transmitting packets so that the
>switch puts packets onto different ports of the receiving etherchannel.

	By "etherchannel" do you really mean "Cisco switch with a
port-channel group using LACP"?

>I am using this patch to provide full-mesh iSCSI connectivity between at
>least 4 hosts (all hosts are of course in the same ethernet segment), and
>every host is connected by an aggregate link with (usually) 4 slaves.
>Using round-robin I get near-equal load striping when transmitting;
>using MAC address magic I force the switch to stripe packets over all
>slave links in the destination port-channel (when the number of rx-ing
>slaves is equal to the number of tx-ing slaves and is even).

	By "MAC address magic" do you mean that you're assigning
specifically chosen MAC addresses to the slaves so that the switch's
hash is essentially "assigning" the bonding slaves to particular ports
on the outgoing port-channel group?
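
To make the question concrete, here is a minimal sketch of how chosen
source MACs could steer a src-mac hash. The switch hash shown (source MAC
modulo the number of member ports) is purely hypothetical, since real
switch hashes are vendor-specific, and the MAC addresses are invented
locally administered ones.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)

def switch_src_mac_port(src_mac, n_ports):
    # HYPOTHETICAL switch behaviour: pick the egress member port from the
    # low-order bits of the source MAC. A real switch may hash differently.
    return mac_to_int(src_mac) % n_ports

N_PORTS = 4

# Hypothetical per-slave addresses chosen so their low bits differ.
slave_macs = [
    "02:00:00:00:00:10",
    "02:00:00:00:00:11",
    "02:00:00:00:00:12",
    "02:00:00:00:00:13",
]
per_slave_ports = [switch_src_mac_port(m, N_PORTS) for m in slave_macs]

# If every slave transmits with the same (bond) source MAC instead,
# the src-mac hash sends everything out of a single member port.
bond_mac = "02:00:00:00:00:10"
shared_ports = {switch_src_mac_port(bond_mac, N_PORTS) for _ in slave_macs}
```

Under this assumed hash each slave's traffic leaves the destination
port-channel on its own member port, while a single shared source MAC
collapses onto one port.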

	Assuming that this is the case, it's an interesting idea, but
I'm unconvinced that it's better on 802.3ad vs. balance-rr.  Unless I'm
missing something, you can get everything you need from an option to
have balance-rr / balance-xor utilize the slave's permanent address as
the source address for outgoing traffic.

>[...] So I am able to utilize all slaves
>for tx and for rx up to maximum capacity; besides, I am getting L2 link
>failure detection (and load rebalancing), which is (in my opinion) much
>faster and more robust than what L3 or dm-multipath provides.
>That is the idea behind my patch.

	Can somebody (John?) more knowledgeable than I about dm-multipath
comment on the above?

>> 	This is the code that resets the MAC header as described above.
>> It doesn't quite match the documentation, since it only resets the MAC
>> for ETH_P_IP packets.
>Yes, I really meant that my patch applies to ETH_P_IP packets, and I
>omitted that from the documentation I wrote.

	Is limiting this to just ETH_P_IP really a means to exclude ARP,
or is there some advantage to (effectively) only balancing IP traffic,
and leaving other traffic (IPv6, for one) essentially unbalanced (when
exiting the switch through the destination port-channel group, which
you've set to use a src-mac hash)?
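
The effect being asked about can be sketched like this. The EtherType
values are the standard ones; the rest is an assumption for illustration
only, namely that the rewrite substitutes a per-slave source address for
ETH_P_IP frames and leaves everything else with the bond's address, which
the thread does not spell out. The function and MAC values are
hypothetical.

```python
ETH_P_IP = 0x0800    # IPv4
ETH_P_ARP = 0x0806   # ARP
ETH_P_IPV6 = 0x86DD  # IPv6

def source_mac_for(ethertype, bond_mac, slave_mac):
    """Source MAC a frame would carry under the assumed ETH_P_IP-only rewrite."""
    if ethertype == ETH_P_IP:
        return slave_mac  # rewritten, so a src-mac hash can spread it
    return bond_mac       # untouched: ARP, IPv6 and everything else

bond_mac = "02:00:00:00:00:10"   # made-up addresses
slave_mac = "02:00:00:00:00:12"

ipv4_src = source_mac_for(ETH_P_IP, bond_mac, slave_mac)
arp_src = source_mac_for(ETH_P_ARP, bond_mac, slave_mac)
ipv6_src = source_mac_for(ETH_P_IPV6, bond_mac, slave_mac)
```

Under that assumption ARP and IPv6 frames all keep one source MAC, so a
src-mac hash on the destination port-channel would place them on a single
member port.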

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


