* [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Oleg V. Ukhno @ 2011-03-01 22:34 UTC
To: netdev; +Cc: Jay Vosburgh, David S. Miller
Patch introduces two new (related) features to bonding module.
First feature is round-robin hashing policy, which is primarily
intended for use with 802.3ad mode, and puts each successive IPv4 and
IPv6 packet on the next available slave, regardless of which layer 3
and above protocols are used.
Second feature makes possible choosing which MAC-address will be set
in the transmitted packet: when set to slave-src, it forces the slave
interface's real MAC address to be used as the source MAC address in
every packet sent via that slave interface.
The main goal of this patch is to make it possible to stripe a single
TCP stream equally, for both transmitted and received packets, over all
available slaves.
This operating mode is not fully 802.3ad compliant, and will cause
some packet reordering within a TCP stream, so some kernel tuning may
be required.
For correct operation, enabling the round-robin hashing policy together
with using the slaves' real MAC addresses as source MAC addresses in
transmitted packets requires a specific switch setting: the hashing mode
for the port-channel ("etherchannel") should be set to src-mac or
src-dst-mac to get correct load striping on the receiving host's etherchannel.
General requirements for using bonding in this operating mode are:
- an even, and preferably equal, number of slaves on the sending and
receiving hosts;
- equal RTT between the sending and receiving hosts on all slaves;
- a switch capable of etherchannels and of using the src-mac or
src-dst-mac hashing policy for egress load striping.
Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
---
Documentation/networking/bonding.txt | 109 +++++++++++++++++++++++++++++++++++
drivers/net/bonding/bond_3ad.c | 2
drivers/net/bonding/bond_main.c | 59 +++++++++++++++++-
drivers/net/bonding/bond_sysfs.c | 50 ++++++++++++++++
drivers/net/bonding/bonding.h | 11 ++-
include/linux/if_bonding.h | 1
6 files changed, 223 insertions(+), 9 deletions(-)
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/Documentation/networking/bonding.txt linux-2.6.p/Documentation/networking/bonding.txt
--- linux-2.6/Documentation/networking/bonding.txt 2011-02-08 16:03:01.290281998 +0300
+++ linux-2.6.p/Documentation/networking/bonding.txt 2011-03-01 22:27:56.570282000 +0300
@@ -83,6 +83,7 @@ Table of Contents
12. Configuring Bonding for Maximum Throughput
12.1 Maximum Throughput in a Single Switch Topology
12.1.1 MT Bonding Mode Selection for Single Switch Topology
+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology using layer2 mechanisms
12.1.2 MT Link Monitoring for Single Switch Topology
12.2 Maximum Throughput in a Multiple Switch Topology
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
@@ -761,11 +762,62 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
+ round-robin
+
+ This policy simply puts each successive packet on the
+ next slave interface, providing round-robin load striping
+ for transmitted data. This policy can be enabled with
+ any mode that supports choosing an alternate hash policy,
+ but was initially intended for 802.3ad mode.
+
+ The main goal of this policy is to stripe TX load
+ without regard to which layer 3 protocol is used; it
+ can even be used to stripe a single TCP connection.
+ When enabled, it will round-robin IPv4 and IPv6 packets
+ only.
+
+ There is also the src_mac_select option, which can be
+ used to obtain RX load striping by means of the switch's
+ hashing algorithm on the receiving side. See the detailed
+ description below.
+
+ This algorithm is not 802.3ad compliant. This hash
+ policy will generally cause TCP packets to be delivered
+ out of order. Other implementations of 802.3ad may or
+ may not tolerate this noncompliance.
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
layer2+3 value was added for bonding version 3.2.2.
+src_mac_select
+
+ Specifies the source MAC selection method for outgoing packets.
+
+ Possible values are:
+
+ default or 0
+
+ The normal selection policy for the bonding mode is
+ used. This varies according to the bonding mode. The
+ balance-xor, balance-rr, 802.3ad and broadcast modes use
+ the MAC address of the bonding master. The balance-alb
+ and balance-tlb modes use the MAC address of the slave
+ the packet is sent on.
+
+ slave-src or 1
+
+ Sets the source MAC for all outgoing IPv4 and IPv6
+ packets to the MAC address of the slave the packet is
+ sent on. This is intended to permit fine-grained load
+ balancing for 802.3ad and balance-xor modes by choosing
+ the slaves' MAC addresses. This is documented in detail
+ in section 12.1.1.1.
+
+ This option was added in bonding version 3.8.0.
+
+
resend_igmp
Specifies the number of IGMP membership reports to be issued after
@@ -2190,6 +2242,63 @@ balance-alb: This mode is everything tha
device driver must support changing the hardware address while
the device is open.
+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology
+ using layer2 mechanisms
+----------------------------------------------------------------------
+
+ It is also possible to use round-robin packet transmission,
+either in the balance-rr mode or using the round-robin xmit_hash_policy
+setting for balance-xor or 802.3ad modes to evenly stripe traffic across
+the set of slaves. In conjunction with the src_mac_select option's
+"slave-src" value and a carefully configured network, it is possible to
+achieve high throughput using round robin.
+
+ The network here involves multiple hosts running bonding, all
+connected to a common switch, for example:
+
+ +--------+             +------------+             +--------+
+ |        |eth0    port1|            |port3    eth0|        |
+ | Host A +-------------+   switch   +-------------+ Host B |
+ | bond0  +-------------+            +-------------+  bond0 |
+ |        |eth1    port2|            |port4    eth1|        |
+ +--------+             +------------+             +--------+
+
+ In this configuration, the switch has ports 1 and 2 in one port
+channel group, and ports 3 and 4 in another port channel group.
+These channel groups are set to hash transmitted packets according to
+the source MAC address.
+
+ Host A and Host B each have a bond utilizing a round-robin
+transmit scheme, with src_mac_select set to "slave-src" and each slave's
+MAC address chosen to hash sequentially when run through the switch's
+source MAC address egress hash.
+
+ For clarity, only two slaves / ports are shown in the diagram.
+
+ In this manner, traffic sent from Host A to Host B will
+generally be evenly striped all the way through to Host B. It is first
+sent round robin by Host A. The switch will then hash on the source MAC
+address of the packets, whose MAC addresses have been manually selected
+to hash sequentially, thus the switch's egress port channel will tend to
+keep the traffic separated by port similarly to how it arrived (e.g.,
+packets from eth0 arriving on port1 will tend to go out port3, and
+similarly for eth1, port2 and port4).
+
+ This scheme works best when the number of slaves and switch
+ports match at both ends. Note that it will work with differing numbers
+of slaves, but the traffic balance may not be optimal, and it may be
+possible for a host with more slaves to overrun a host with fewer
+slaves.
+
+ According to usage experience and test results, balancing multiple
+(8-16) TCP sessions across a bond of 2 slaves achieves 1.9-2.0 Gbps in
+both directions, but a bond of 4 slaves achieves only 3.2-3.5 Gbps.
+A single unidirectional data transfer can usually make use of the full
+aggregate bandwidth.
+
+ This usage scenario requires special switch configuration, as well
+as tuning TCP reordering related sysctl parameters.
+
12.1.2 MT Link Monitoring for Single Switch Topology
----------------------------------------------------
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_3ad.c linux-2.6.p/drivers/net/bonding/bond_3ad.c
--- linux-2.6/drivers/net/bonding/bond_3ad.c 2011-02-16 00:59:18.710282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_3ad.c 2011-03-01 22:57:17.530282004 +0300
@@ -2419,7 +2419,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
goto out;
}
- slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+ slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg, dev);
bond_for_each_slave(bond, slave, i) {
struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bonding.h linux-2.6.p/drivers/net/bonding/bonding.h
--- linux-2.6/drivers/net/bonding/bonding.h 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bonding.h 2011-03-01 22:53:16.160281999 +0300
@@ -23,8 +23,8 @@
#include "bond_3ad.h"
#include "bond_alb.h"
-#define DRV_VERSION "3.7.0"
-#define DRV_RELDATE "June 2, 2010"
+#define DRV_VERSION "3.8.0"
+#define DRV_RELDATE "March 3, 2011"
#define DRV_NAME "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"
@@ -162,6 +162,7 @@ struct bond_params {
int tx_queues;
int all_slaves_active;
int resend_igmp;
+ int src_mac_select;
};
struct bond_parm_tbl {
@@ -235,7 +236,7 @@ struct bonding {
#endif /* CONFIG_PROC_FS */
struct list_head bond_list;
struct netdev_hw_addr_list mc_list;
- int (*xmit_hash_policy)(struct sk_buff *, int);
+ int (*xmit_hash_policy)(struct sk_buff *, int, struct net_device *);
__be32 master_ip;
u16 flags;
u16 rr_tx_counter;
@@ -308,6 +309,9 @@ static inline bool bond_is_lb(const stru
#define BOND_ARP_VALIDATE_ALL (BOND_ARP_VALIDATE_ACTIVE | \
BOND_ARP_VALIDATE_BACKUP)
+#define BOND_SRC_MAC_DEFAULT 0
+#define BOND_SRC_MAC_SLAVE 1
+
static inline int slave_do_arp_validate(struct bonding *bond,
struct slave *slave)
{
@@ -402,6 +406,7 @@ extern const struct bond_parm_tbl arp_va
extern const struct bond_parm_tbl fail_over_mac_tbl[];
extern const struct bond_parm_tbl pri_reselect_tbl[];
extern struct bond_parm_tbl ad_select_tbl[];
+extern const struct bond_parm_tbl src_mac_select_tbl[];
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
void bond_send_unsolicited_na(struct bonding *bond);
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_main.c linux-2.6.p/drivers/net/bonding/bond_main.c
--- linux-2.6/drivers/net/bonding/bond_main.c 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_main.c 2011-03-01 23:00:24.770282003 +0300
@@ -111,6 +111,7 @@ static char *fail_over_mac;
static int all_slaves_active = 0;
static struct bond_params bonding_defaults;
static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
+static char *src_mac_select;
module_param(max_bonds, int, 0);
MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -152,7 +153,7 @@ module_param(ad_select, charp, 0);
MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
- ", 1 for layer 3+4");
+ ", 1 for layer 3+4, 3 for round-robin");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -167,6 +168,9 @@ MODULE_PARM_DESC(all_slaves_active, "Kee
"0 for never (default), 1 for always.");
module_param(resend_igmp, int, 0);
MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
+module_param(src_mac_select, charp, 0);
MODULE_PARM_DESC(src_mac_select, "Source MAC selection mode: 0 or default (default), "
+ "1 or slave-src to use slave's MAC as packet's src MAC");
/*----------------------------- Global variables ----------------------------*/
@@ -206,6 +210,7 @@ const struct bond_parm_tbl xmit_hashtype
{ "layer2", BOND_XMIT_POLICY_LAYER2},
{ "layer3+4", BOND_XMIT_POLICY_LAYER34},
{ "layer2+3", BOND_XMIT_POLICY_LAYER23},
+{ "round-robin", BOND_XMIT_POLICY_RR},
{ NULL, -1},
};
@@ -238,6 +243,12 @@ struct bond_parm_tbl ad_select_tbl[] = {
{ NULL, -1},
};
+const struct bond_parm_tbl src_mac_select_tbl[] = {
+{ "default", BOND_SRC_MAC_DEFAULT},
+{ "slave-src", BOND_SRC_MAC_SLAVE},
+{ NULL, -1},
+};
+
/*-------------------------- Forward declarations ---------------------------*/
static void bond_send_gratuitous_arp(struct bonding *bond);
@@ -422,6 +433,7 @@ struct vlan_entry *bond_next_vlan(struct
int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
struct net_device *slave_dev)
{
+ struct ethhdr *eth_data;
skb->dev = slave_dev;
skb->priority = 1;
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -433,6 +445,15 @@ int bond_dev_queue_xmit(struct bonding *
slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
} else
#endif
+ if (bond->params.src_mac_select == BOND_SRC_MAC_SLAVE &&
+ (skb->protocol == htons(ETH_P_IP) ||
+ skb->protocol == htons(ETH_P_IPV6))) {
+ /* use the slave's permanent MAC as the source MAC */
+ skb_reset_mac_header(skb);
+ eth_data = eth_hdr(skb);
+ memcpy(eth_data->h_source, slave_dev->perm_addr, ETH_ALEN);
+ dev_queue_xmit(skb);
+ } else
dev_queue_xmit(skb);
return 0;
@@ -3261,6 +3282,10 @@ static void bond_info_show_master(struct
bond->params.xmit_policy);
}
+ seq_printf(seq, "Source MAC select is: %s (%d)\n",
+ src_mac_select_tbl[bond->params.src_mac_select].modename,
+ bond->params.src_mac_select);
+
if (USES_PRIMARY(bond->params.mode)) {
seq_printf(seq, "Primary Slave: %s",
(bond->primary_slave) ?
@@ -3717,7 +3742,8 @@ void bond_unregister_arp(struct bonding
* Hash for the output device based upon layer 2 and layer 3 data. If
* the packet is not IP mimic bond_xmit_hash_policy_l2()
*/
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count,
+ struct net_device *bond_dev)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
struct iphdr *iph = ip_hdr(skb);
@@ -3735,7 +3761,8 @@ static int bond_xmit_hash_policy_l23(str
* the packet is a frag or not TCP or UDP, just use layer 3 data. If it is
* altogether not IP, mimic bond_xmit_hash_policy_l2()
*/
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count,
+ struct net_device *bond_dev)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
struct iphdr *iph = ip_hdr(skb);
@@ -3759,13 +3786,31 @@ static int bond_xmit_hash_policy_l34(str
/*
* Hash for the output device based upon layer 2 data
*/
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count,
+ struct net_device *bond_dev)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
return (data->h_dest[5] ^ data->h_source[5]) % count;
}
+/*
+ * Round-robin over all active slaves (one packet per slave); IPv4 IGMP
+ * traffic instead mimics bond_xmit_hash_policy_l2()
+ */
+static int bond_xmit_hash_policy_rr(struct sk_buff *skb, int count,
+ struct net_device *bond_dev)
+{
+ struct ethhdr *data = (struct ethhdr *)skb->data;
+ struct iphdr *iph = ip_hdr(skb);
+ struct bonding *bond = netdev_priv(bond_dev);
+ if ((skb->protocol == htons(ETH_P_IP)) &&
+ (iph->protocol == IPPROTO_IGMP)) {
+ return (data->h_dest[5] ^ data->h_source[5]) % count;
+ }
+ return bond->rr_tx_counter++ % count;
+}
+
/*-------------------------- Device entry points ----------------------------*/
static int bond_open(struct net_device *bond_dev)
@@ -4395,7 +4440,8 @@ static int bond_xmit_xor(struct sk_buff
if (!BOND_IS_OK(bond))
goto out;
- slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
+ slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt,
+ bond_dev);
bond_for_each_slave(bond, slave, i) {
slave_no--;
@@ -4492,6 +4538,9 @@ static void bond_set_xmit_hash_policy(st
case BOND_XMIT_POLICY_LAYER34:
bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
break;
+ case BOND_XMIT_POLICY_RR:
+ bond->xmit_hash_policy = bond_xmit_hash_policy_rr;
+ break;
case BOND_XMIT_POLICY_LAYER2:
default:
bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_sysfs.c linux-2.6.p/drivers/net/bonding/bond_sysfs.c
--- linux-2.6/drivers/net/bonding/bond_sysfs.c 2011-02-08 16:03:02.950282003 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_sysfs.c 2011-02-16 02:05:58.650281999 +0300
@@ -1643,6 +1643,55 @@ out:
static DEVICE_ATTR(resend_igmp, S_IRUGO | S_IWUSR,
bonding_show_resend_igmp, bonding_store_resend_igmp);
+/*
+ * Show and set the bonding src_mac_select param.
+ */
+
+static ssize_t bonding_show_src_mac_select(struct device *d,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct bonding *bond = to_bond(d);
+
+ return sprintf(buf, "%s %d\n",
+ src_mac_select_tbl[bond->params.src_mac_select].modename,
+ bond->params.src_mac_select);
+}
+
+static ssize_t bonding_store_src_mac_select(struct device *d,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int new_value, ret = count;
+ struct bonding *bond = to_bond(d);
+
+ if (bond->dev->flags & IFF_UP) {
+ pr_err("%s: Interface is up. Unable to update src mac select policy.\n",
+ bond->dev->name);
+ ret = -EPERM;
+ goto out;
+ }
+
+ new_value = bond_parse_parm(buf, src_mac_select_tbl);
+ if (new_value < 0) {
+ pr_err("%s: Ignoring invalid src mac select policy value %.*s.\n",
+ bond->dev->name,
+ (int)strlen(buf) - 1, buf);
+ ret = -EINVAL;
+ goto out;
+ } else {
+ bond->params.src_mac_select = new_value;
+ pr_info("%s: setting src mac select policy to %s (%d).\n",
+ bond->dev->name,
+ src_mac_select_tbl[new_value].modename, new_value);
+ }
+out:
+ return ret;
+}
+
+static DEVICE_ATTR(src_mac_select, S_IRUGO | S_IWUSR,
+ bonding_show_src_mac_select, bonding_store_src_mac_select);
+
static struct attribute *per_bond_attrs[] = {
&dev_attr_slaves.attr,
&dev_attr_mode.attr,
@@ -1671,6 +1720,7 @@ static struct attribute *per_bond_attrs[
&dev_attr_queue_id.attr,
&dev_attr_all_slaves_active.attr,
&dev_attr_resend_igmp.attr,
+ &dev_attr_src_mac_select.attr,
NULL,
};
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/include/linux/if_bonding.h linux-2.6.p/include/linux/if_bonding.h
--- linux-2.6/include/linux/if_bonding.h 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/include/linux/if_bonding.h 2011-03-01 23:00:56.610282019 +0300
@@ -91,6 +91,7 @@
#define BOND_XMIT_POLICY_LAYER2 0 /* layer 2 (MAC only), default */
#define BOND_XMIT_POLICY_LAYER34 1 /* layer 3+4 (IP ^ (TCP || UDP)) */
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_RR 3 /* round-robin mode */
typedef struct ifbond {
__s32 bond_mode;
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Stephen Hemminger @ 2011-03-01 23:29 UTC
To: Oleg V. Ukhno; +Cc: netdev, Jay Vosburgh, David S. Miller
On Wed, 2 Mar 2011 01:34:58 +0300
"Oleg V. Ukhno" <olegu@yandex-team.ru> wrote:
> [...]
It seems to me the whole bonding policy is getting so complex
that the code is a mess. Perhaps it should be somehow linked into
existing packet classification or firewall mechanisms. This would
increase the flexibility and reduce the amount of policy code
in the bonding driver itself.
--
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Jay Vosburgh @ 2011-03-02 2:56 UTC
To: Stephen Hemminger; +Cc: Oleg V. Ukhno, netdev, David S. Miller
Stephen Hemminger <shemminger@vyatta.com> wrote:
>On Wed, 2 Mar 2011 01:34:58 +0300
>"Oleg V. Ukhno" <olegu@yandex-team.ru> wrote:
>
>> [...]
>
>It seems to me the whole bonding policy is getting so complex
>that the code is a mess. Perhaps it should be somehow linked into
>existing packet classification or firewall mechanisms. This would
>increase the flexibility and reduce the amount of policy code
>in the bonding driver itself.
Hmm.
Yes, the number of special case knobs in bonding is getting
rather large, and there are one or two other proposals in the pipe
besides this one.
It would be handy to be able to do things like run ebtables
style rules against traffic going in and out of the bond. Right now
ebtables is pretty tightly coupled with the bridge, so we'd need to add
a whole new set of netfilter "bondtables" or something. Or add hooks
for ebtables outside of the bridge.
For this particular patch, the src-mac business could be handled
by a netfilter module. The round-robin hash policy part would probably
have to stay in bonding.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Oleg V. Ukhno @ 2011-03-02 9:15 UTC
To: Jay Vosburgh; +Cc: Stephen Hemminger, netdev, David S. Miller
On 03/02/2011 05:56 AM, Jay Vosburgh wrote:
> Stephen Hemminger<shemminger@vyatta.com> wrote:
>
>> On Wed, 2 Mar 2011 01:34:58 +0300
>> "Oleg V. Ukhno"<olegu@yandex-team.ru> wrote:
>>
>>
>> It seems to me the whole bonding policy is getting so complex
>> that the code is a mess. Perhaps it should be somehow linked into
>> existing packet classification or firewall mechanisms. This would
>> increase the flexibility and reduce the amount of policy code
>> in the bonding driver itself.
>
> Hmm.
>
> Yes, the number of special case knobs in bonding is getting
> rather large, and there are one or two other proposals in the pipe
> besides this one.
>
> It would be handy to be able to do things like run ebtables
> style rules against traffic going in and out of the bond. Right now
> ebtables is pretty tightly coupled with the bridge, so we'd need to add
> a whole new set of netfilter "bondtables" or something. Or add hooks
> for ebtables outside of the bridge.
>
> For this particular patch, the src-mac business could be handled
> by a netfilter module. The round-robin hash policy part would probably
> have to stay in bonding.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>
I am sorry, but I disagree with you, although it is possible to use
ebtables as a general mechanism to alter L2 headers.
It seems to be possible (I have never done so) to use ebtables to alter
the src-mac field of outgoing packets, but it is done in iptables/ipchains
manner, with manual configuration, and requires knowing all of the
MAC address to interface bindings.
My reasons for collecting all of this in the bonding module were:
- make bonding configuration with src-mac substitution as simple as
possible, which reduces the chance of human error when maintaining 100+
server deployments;
- make configuration equally simple for any number of slaves and allow
simple slave addition/removal;
- eliminate the need to track hardware address changes when replacing
network cards or the server itself;
- although I've never really used ebtables in production, my
experience with iptables (this may not be true in all cases, or may be
true only in some) tells me that using a fairly complex set of rules to
analyze and alter packets will introduce excessive CPU and latency
penalties, which may cause (much?) worse packet reordering than this
patch does;
- one important thing for me (maybe it is not always true): simplicity
of debugging any network problems with this kind of port teaming.
--
Best regards,
Oleg Ukhno
* [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Oleg V. Ukhno @ 2011-02-16 19:13 UTC
To: netdev; +Cc: Jay Vosburgh, David S. Miller
Patch introduces two new (related) features to bonding module.
First feature is round-robin hashing policy, which is primarily
intended for use with 802.3ad mode, and puts each successive IPv4 and
IPv6 packet on the next available slave, regardless of which layer 3
and above protocols are used.
Second feature makes possible choosing which MAC-address will be set
in the transmitted packet: when set to slave-src, it forces the slave
interface's real MAC address to be used as the source MAC address in
every packet sent via that slave interface.
The main goal of this patch is to make it possible to stripe a single
TCP stream equally, for both transmitted and received packets, over all
available slaves.
This operating mode is not fully 802.3ad compliant, and will cause
some packet reordering within a TCP stream, so some kernel tuning may
be required.
For correct operation, enabling the round-robin hashing policy together
with using the slaves' real MAC addresses as source MAC addresses in
transmitted packets requires a specific switch setting: the hashing mode
for the port-channel ("etherchannel") should be set to src-mac or
src-dst-mac to get correct load striping on the receiving host's etherchannel.
General requirements for using bonding in this operating mode are:
- an even, and preferably equal, number of slaves on the sending and
receiving hosts;
- equal RTT between the sending and receiving hosts on all slaves;
- a switch capable of etherchannels and of using the src-mac or
src-dst-mac hashing policy for egress load striping.
Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
---
Documentation/networking/bonding.txt | 65 +++++++++++++++++++++++++++++++++++
drivers/net/bonding/bond_3ad.c | 2 -
drivers/net/bonding/bond_main.c | 60 +++++++++++++++++++++++++++++---
drivers/net/bonding/bond_sysfs.c | 50 ++++++++++++++++++++++++++
drivers/net/bonding/bonding.h | 7 +++
include/linux/if_bonding.h | 1
6 files changed, 178 insertions(+), 7 deletions(-)
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/Documentation/networking/bonding.txt linux-2.6.p/Documentation/networking/bonding.txt
--- linux-2.6/Documentation/networking/bonding.txt 2011-02-08 16:03:01.290281998 +0300
+++ linux-2.6.p/Documentation/networking/bonding.txt 2011-02-16 22:03:09.650281997 +0300
@@ -83,6 +83,7 @@ Table of Contents
12. Configuring Bonding for Maximum Throughput
12.1 Maximum Throughput in a Single Switch Topology
12.1.1 MT Bonding Mode Selection for Single Switch Topology
+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology using layer2 mechanisms
12.1.2 MT Link Monitoring for Single Switch Topology
12.2 Maximum Throughput in a Multiple Switch Topology
12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
@@ -761,6 +762,34 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
+ round-robin
+
+ This policy simply puts each successive packet on the
+ next slave interface, providing round-robin load striping
+ for transmitted data. This policy can be enabled with
+ any mode that supports choosing an alternate hash policy,
+ but was initially intended for 802.3ad mode.
+
+ The main goal of this policy is to stripe TX load
+ without regard to which layer 3 protocol is used; it
+ can even be used to stripe a single TCP connection.
+ When enabled, it will round-robin IPv4 and IPv6 packets
+ only.
+
+ There is also the src_mac_select option, which can be
+ used to obtain RX load striping by means of the switch's
+ hashing algorithm on the receiving side. See the detailed
+ description below.
+
+ It is important to understand that this hashing policy
+ may cause out-of-order TCP packets when enabled, and
+ must not be used when slaves have different bandwidth
+ and/or RTT in the receiver's direction. This algorithm
+ is not fully 802.3ad compliant. Other implementations
+ of 802.3ad may or may not tolerate this noncompliance.
+
+ The hashing formula is:
+ (transmitted packet number) % (slave count)
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
@@ -2190,6 +2219,42 @@ balance-alb: This mode is everything tha
device driver must support changing the hardware address while
the device is open.
+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology
+ using layer2 mechanisms
+----------------------------------------------------------------------
+ Besides the load striping and HA methods mentioned above,
+you can use the round-robin hashing policy together with the
+src_mac_select "slave-src" setting to stripe TCP load nearly equally over an
+even number of slaves. Please note that enabling the round-robin policy in
+balance-xor mode effectively turns it into a mode similar to balance-rr.
+ A specific switch configuration is also required to get the full
+benefit of the round-robin hashing policy combined with the src_mac_select
+"slave-src" setting.
+ When you enable the round-robin xmit hashing policy and set
+src_mac_select to slave-src, each successive packet is transmitted
+over the next slave, with the packet's source MAC address set to the
+real MAC address of that slave interface rather than that of the
+aggregate interface.
+ Imagine that you have two hosts (say, A and B), each connected
+to a switch with 2 slave interfaces, with appropriate port-channels
+("etherchannels") configured. Once you start transmitting TCP data from A
+to B with the round-robin hashing policy enabled, you will see that the TX
+load is equally striped over host A's slaves, but all of this traffic is
+received on only one of host B's slaves.
+ Now set the src_mac_select parameter to "slave-src" and
+configure the switch to use src-mac hashing for outgoing etherchannel load
+striping. Every packet sent from host A now carries its slave's MAC as the
+source MAC address, and the switch distributes these packets across host B's
+receiving port-channel according to their source MAC addresses, so you get
+nearly equal RX load striping that does not depend on the layer3 and above
+protocols used for data transmission.
+ It is important to understand that this load striping mode
+will only work correctly if the number of slaves on each side is
+even, and preferably equal.
+ This load striping mode can also cause TCP packets to arrive out of
+order, so you may need to tune your kernel to handle an increased
+number of reordered packets.
+
12.1.2 MT Link Monitoring for Single Switch Topology
----------------------------------------------------
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_3ad.c linux-2.6.p/drivers/net/bonding/bond_3ad.c
--- linux-2.6/drivers/net/bonding/bond_3ad.c 2011-02-16 00:59:18.710282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_3ad.c 2011-02-16 01:30:47.770281998 +0300
@@ -2419,7 +2419,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
goto out;
}
- slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+ slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg, bond->rr_tx_counter++);
bond_for_each_slave(bond, slave, i) {
struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bonding.h linux-2.6.p/drivers/net/bonding/bonding.h
--- linux-2.6/drivers/net/bonding/bonding.h 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bonding.h 2011-02-16 01:33:11.610282004 +0300
@@ -162,6 +162,7 @@ struct bond_params {
int tx_queues;
int all_slaves_active;
int resend_igmp;
+ int src_mac_select;
};
struct bond_parm_tbl {
@@ -235,7 +236,7 @@ struct bonding {
#endif /* CONFIG_PROC_FS */
struct list_head bond_list;
struct netdev_hw_addr_list mc_list;
- int (*xmit_hash_policy)(struct sk_buff *, int);
+ int (*xmit_hash_policy)(struct sk_buff *, int, int);
__be32 master_ip;
u16 flags;
u16 rr_tx_counter;
@@ -308,6 +309,9 @@ static inline bool bond_is_lb(const stru
#define BOND_ARP_VALIDATE_ALL (BOND_ARP_VALIDATE_ACTIVE | \
BOND_ARP_VALIDATE_BACKUP)
+#define BOND_MAC_SRC_DEFAULT 0
+#define BOND_MAC_SRC_SLAVE 1
+
static inline int slave_do_arp_validate(struct bonding *bond,
struct slave *slave)
{
@@ -402,6 +406,7 @@ extern const struct bond_parm_tbl arp_va
extern const struct bond_parm_tbl fail_over_mac_tbl[];
extern const struct bond_parm_tbl pri_reselect_tbl[];
extern struct bond_parm_tbl ad_select_tbl[];
+extern const struct bond_parm_tbl src_mac_select_tbl[];
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
void bond_send_unsolicited_na(struct bonding *bond);
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_main.c linux-2.6.p/drivers/net/bonding/bond_main.c
--- linux-2.6/drivers/net/bonding/bond_main.c 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_main.c 2011-02-16 22:08:22.650281997 +0300
@@ -111,6 +111,7 @@ static char *fail_over_mac;
static int all_slaves_active = 0;
static struct bond_params bonding_defaults;
static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
+static char *src_mac_select;
module_param(max_bonds, int, 0);
MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -152,7 +153,7 @@ module_param(ad_select, charp, 0);
MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
- ", 1 for layer 3+4");
+ ", 1 for layer 3+4, 3 for round-robin");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -167,6 +168,9 @@ MODULE_PARM_DESC(all_slaves_active, "Kee
"0 for never (default), 1 for always.");
module_param(resend_igmp, int, 0);
MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
+module_param(src_mac_select, charp, 0);
+MODULE_PARM_DESC(src_mac_select, "Source MAC selection mode: 0 or default (default), "
+ "1 or slave-src to use slave's MAC as packet's src MAC");
/*----------------------------- Global variables ----------------------------*/
@@ -206,6 +210,7 @@ const struct bond_parm_tbl xmit_hashtype
{ "layer2", BOND_XMIT_POLICY_LAYER2},
{ "layer3+4", BOND_XMIT_POLICY_LAYER34},
{ "layer2+3", BOND_XMIT_POLICY_LAYER23},
+{ "round-robin", BOND_XMIT_POLICY_LAYERRR},
{ NULL, -1},
};
@@ -238,6 +243,12 @@ struct bond_parm_tbl ad_select_tbl[] = {
{ NULL, -1},
};
+const struct bond_parm_tbl src_mac_select_tbl[] = {
+{ "default", BOND_MAC_SRC_DEFAULT},
+{ "slave-src", BOND_MAC_SRC_SLAVE},
+{ NULL, -1},
+};
+
/*-------------------------- Forward declarations ---------------------------*/
static void bond_send_gratuitous_arp(struct bonding *bond);
@@ -422,6 +433,7 @@ struct vlan_entry *bond_next_vlan(struct
int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
struct net_device *slave_dev)
{
+ struct ethhdr *eth_data;
skb->dev = slave_dev;
skb->priority = 1;
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -433,6 +445,15 @@ int bond_dev_queue_xmit(struct bonding *
slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
} else
#endif
+ if (bond->params.src_mac_select == BOND_MAC_SRC_SLAVE &&
+ (skb->protocol == htons(ETH_P_IP) ||
+ skb->protocol == htons(ETH_P_IPV6))) {
+ skb_reset_mac_header(skb);
+ eth_data = eth_hdr(skb);
+ memcpy(eth_data->h_source, slave_dev->perm_addr,
+ ETH_ALEN);
+ }
+
dev_queue_xmit(skb);
return 0;
@@ -3261,6 +3282,13 @@ static void bond_info_show_master(struct
bond->params.xmit_policy);
}
+ if (bond->params.src_mac_select == BOND_MAC_SRC_DEFAULT ||
+ bond->params.src_mac_select == BOND_MAC_SRC_SLAVE) {
+ seq_printf(seq, "Source MAC select is: %s (%d)\n",
+ src_mac_select_tbl[bond->params.src_mac_select].modename,
+ bond->params.src_mac_select);
+ }
+
if (USES_PRIMARY(bond->params.mode)) {
seq_printf(seq, "Primary Slave: %s",
(bond->primary_slave) ?
@@ -3717,7 +3745,8 @@ void bond_unregister_arp(struct bonding
* Hash for the output device based upon layer 2 and layer 3 data. If
* the packet is not IP mimic bond_xmit_hash_policy_l2()
*/
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count,
+ int pktcount)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
struct iphdr *iph = ip_hdr(skb);
@@ -3735,7 +3764,8 @@ static int bond_xmit_hash_policy_l23(str
* the packet is a frag or not TCP or UDP, just use layer 3 data. If it is
* altogether not IP, mimic bond_xmit_hash_policy_l2()
*/
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count,
+ int pktcount)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
struct iphdr *iph = ip_hdr(skb);
@@ -3759,13 +3789,29 @@ static int bond_xmit_hash_policy_l34(str
/*
* Hash for the output device based upon layer 2 data
*/
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count,
+ int pktcount)
{
struct ethhdr *data = (struct ethhdr *)skb->data;
return (data->h_dest[5] ^ data->h_source[5]) % count;
}
+/*
+ * Round-robin over all active slaves(one packet per slave) for IP and IPv6,
+ * otherwise mimic bond_xmit_hash_policy_l2()
+ */
+static int bond_xmit_hash_policy_rr(struct sk_buff *skb, int count,
+ int pktcount)
+{
+ struct ethhdr *data = (struct ethhdr *)skb->data;
+ if (skb->protocol == htons(ETH_P_IP)
+ || skb->protocol == htons(ETH_P_IPV6)) {
+ return pktcount % count;
+ }
+ return (data->h_dest[5] ^ data->h_source[5]) % count;
+}
+
/*-------------------------- Device entry points ----------------------------*/
static int bond_open(struct net_device *bond_dev)
@@ -4395,7 +4441,8 @@ static int bond_xmit_xor(struct sk_buff
if (!BOND_IS_OK(bond))
goto out;
- slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
+ slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt,
+ bond->rr_tx_counter++);
bond_for_each_slave(bond, slave, i) {
slave_no--;
@@ -4492,6 +4539,9 @@ static void bond_set_xmit_hash_policy(st
case BOND_XMIT_POLICY_LAYER34:
bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
break;
+ case BOND_XMIT_POLICY_LAYERRR:
+ bond->xmit_hash_policy = bond_xmit_hash_policy_rr;
+ break;
case BOND_XMIT_POLICY_LAYER2:
default:
bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_sysfs.c linux-2.6.p/drivers/net/bonding/bond_sysfs.c
--- linux-2.6/drivers/net/bonding/bond_sysfs.c 2011-02-08 16:03:02.950282003 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_sysfs.c 2011-02-16 02:05:58.650281999 +0300
@@ -1643,6 +1643,55 @@ out:
static DEVICE_ATTR(resend_igmp, S_IRUGO | S_IWUSR,
bonding_show_resend_igmp, bonding_store_resend_igmp);
+/*
+ * Show and set the bonding src_mac_select param.
+ */
+
+static ssize_t bonding_show_src_mac_select(struct device *d,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct bonding *bond = to_bond(d);
+
+ return sprintf(buf, "%s %d\n",
+ src_mac_select_tbl[bond->params.src_mac_select].modename,
+ bond->params.src_mac_select);
+}
+
+static ssize_t bonding_store_src_mac_select(struct device *d,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int new_value, ret = count;
+ struct bonding *bond = to_bond(d);
+
+ if (bond->dev->flags & IFF_UP) {
+ pr_err("%s: Interface is up. Unable to update src mac select policy.\n",
+ bond->dev->name);
+ ret = -EPERM;
+ goto out;
+ }
+
+ new_value = bond_parse_parm(buf, src_mac_select_tbl);
+ if (new_value < 0) {
+ pr_err("%s: Ignoring invalid src mac select policy value %.*s.\n",
+ bond->dev->name,
+ (int)strlen(buf) - 1, buf);
+ ret = -EINVAL;
+ goto out;
+ } else {
+ bond->params.src_mac_select = new_value;
+ pr_info("%s: setting src mac select policy to %s (%d).\n",
+ bond->dev->name,
+ src_mac_select_tbl[new_value].modename, new_value);
+ }
+out:
+ return ret;
+}
+
+static DEVICE_ATTR(src_mac_select, S_IRUGO | S_IWUSR,
+ bonding_show_src_mac_select, bonding_store_src_mac_select);
+
static struct attribute *per_bond_attrs[] = {
&dev_attr_slaves.attr,
&dev_attr_mode.attr,
@@ -1671,6 +1720,7 @@ static struct attribute *per_bond_attrs[
&dev_attr_queue_id.attr,
&dev_attr_all_slaves_active.attr,
&dev_attr_resend_igmp.attr,
+ &dev_attr_src_mac_select.attr,
NULL,
};
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/include/linux/if_bonding.h linux-2.6.p/include/linux/if_bonding.h
--- linux-2.6/include/linux/if_bonding.h 2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/include/linux/if_bonding.h 2011-02-16 01:23:38.660282000 +0300
@@ -91,6 +91,7 @@
#define BOND_XMIT_POLICY_LAYER2 0 /* layer 2 (MAC only), default */
#define BOND_XMIT_POLICY_LAYER34 1 /* layer 3+4 (IP ^ (TCP || UDP)) */
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_LAYERRR 3 /* round-robin mode */
typedef struct ifbond {
__s32 bond_mode;
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
2011-02-16 19:13 Oleg V. Ukhno
@ 2011-02-28 0:08 ` David Miller
2011-02-28 10:09 ` Oleg V. Ukhno
2011-02-28 19:29 ` Jay Vosburgh
1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2011-02-28 0:08 UTC (permalink / raw)
To: olegu; +Cc: netdev, fubar
From: "Oleg V. Ukhno" <olegu@yandex-team.ru>
Date: Wed, 16 Feb 2011 22:13:41 +0300
> Patch introduces two new (related) features to the bonding module.
> The first feature is a round-robin hashing policy, which is primarily
> intended for use with 802.3ad mode, and puts each successive IPv4 and
> IPv6 packet on the next available slave without taking into account
> which layer3 and above protocol is used.
> The second feature makes it possible to choose which MAC address will be set
> in the transmitted packet; when set to src-mac it will force setting
> the slave interface's real MAC address as the source MAC address in every
> packet sent via this slave interface.
Can we get some feedback on this patch from bonding folks?
I'm not applying it blindly without at least one bonding developer
saying it at least looks ok.
Thanks.
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
2011-02-28 0:08 ` David Miller
@ 2011-02-28 10:09 ` Oleg V. Ukhno
0 siblings, 0 replies; 9+ messages in thread
From: Oleg V. Ukhno @ 2011-02-28 10:09 UTC (permalink / raw)
To: David Miller; +Cc: netdev, fubar
David, thank you for your reply.
Actually, this is the second version of the patch discussed previously in
http://patchwork.ozlabs.org/patch/78994/
I've reworked that patch into the current version
(patchwork.ozlabs.org/patch/83389/) the way Jay suggested.
Jay, could you please comment on the reworked patch?
On 02/28/2011 03:08 AM, David Miller wrote:
> From: "Oleg V. Ukhno"<olegu@yandex-team.ru>
> Date: Wed, 16 Feb 2011 22:13:41 +0300
>
>>
> Can we get some feedback on this patch from bonding folks?
>
> I'm not applying it blindly without at least one bonding developer
> saying it at least looks ok.
>
> Thanks.
>
--
Best regards,
Head of Operations for
Commercial and Financial Services
Yandex LLC
Oleg Ukhno
* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
2011-02-16 19:13 Oleg V. Ukhno
2011-02-28 0:08 ` David Miller
@ 2011-02-28 19:29 ` Jay Vosburgh
2011-03-01 22:38 ` Oleg V. Ukhno
1 sibling, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2011-02-28 19:29 UTC (permalink / raw)
To: Oleg V. Ukhno; +Cc: netdev, David S. Miller
Oleg V. Ukhno <olegu@yandex-team.ru> wrote:
>Patch introduces two new (related) features to the bonding module.
>The first feature is a round-robin hashing policy, which is primarily
>intended for use with 802.3ad mode, and puts each successive IPv4 and
>IPv6 packet on the next available slave without taking into account
>which layer3 and above protocol is used.
>The second feature makes it possible to choose which MAC address will be set
>in the transmitted packet; when set to src-mac it will force setting
>the slave interface's real MAC address as the source MAC address in every
>packet sent via this slave interface.
>The main goal of this patch is to make it possible for a single TCP stream
>to be equally striped for both transmitted and received packets over all
>available slaves.
>This operating mode is not fully 802.3ad compliant, and will cause
>some packet reordering in the TCP stream, so some kernel tuning may be
>required.
>For correct operation, enabling the round-robin hashing policy plus using
>the slaves' real MAC addresses as source MAC addresses in transmitted
>packets requires a specific switch setting: the hashing mode for the port-channel
>("etherchannel") should be set to src-mac or src-dst-mac to get
>correct load striping on the receiving host's etherchannel.
>General requirements for using bonding in this operating mode are:
>- an even and preferably equal number of slaves on the sending and receiving
>hosts;
>- equal RTT between the sending and receiving hosts on all slaves;
>- a switch capable of doing etherchannels and using the src-mac or src-dst-mac
>hashing policy for egress load striping
>
>Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
>---
>
> Documentation/networking/bonding.txt | 65 +++++++++++++++++++++++++++++++++++
> drivers/net/bonding/bond_3ad.c | 2 -
> drivers/net/bonding/bond_main.c | 60 +++++++++++++++++++++++++++++---
> drivers/net/bonding/bond_sysfs.c | 50 ++++++++++++++++++++++++++
> drivers/net/bonding/bonding.h | 7 +++
> include/linux/if_bonding.h | 1
> 6 files changed, 178 insertions(+), 7 deletions(-)
>
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/Documentation/networking/bonding.txt linux-2.6.p/Documentation/networking/bonding.txt
>--- linux-2.6/Documentation/networking/bonding.txt 2011-02-08 16:03:01.290281998 +0300
>+++ linux-2.6.p/Documentation/networking/bonding.txt 2011-02-16 22:03:09.650281997 +0300
>@@ -83,6 +83,7 @@ Table of Contents
> 12. Configuring Bonding for Maximum Throughput
> 12.1 Maximum Throughput in a Single Switch Topology
> 12.1.1 MT Bonding Mode Selection for Single Switch Topology
>+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology using layer2 mechanisms
> 12.1.2 MT Link Monitoring for Single Switch Topology
> 12.2 Maximum Throughput in a Multiple Switch Topology
> 12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
>@@ -761,6 +762,34 @@ xmit_hash_policy
> conversations. Other implementations of 802.3ad may
> or may not tolerate this noncompliance.
>
>+ round-robin
>+
>+ This policy simply places each successive packet on the
>+ next slave interface, providing round-robin load striping
>+ for transmitted data. This policy can be enabled with
>+ any mode which supports choosing an alternate hash policy,
>+ but was initially designed for 802.3ad mode.
>+
>+ The main goal of this policy is to stripe TX load
>+ regardless of the layer3 protocol in use; it can be used
>+ to stripe a single TCP connection. Only IPv4 and IPv6
>+ packets are round-robined; all other traffic falls back
>+ to the layer2 hash.
>+
>+ There is also the src_mac_select option, which can be used
>+ to achieve RX load striping via the switch's hashing
>+ algorithms on the receiving side. See the detailed
>+ description below.
The src_mac_select option should be documented separately (at
the appropriate place in the document) as well, something like:
src_mac_select
Specifies the source MAC selection method for outgoing packets.
Possible values are:
default or 0
The normal selection policy for the bonding mode is
used. This varies according to the bonding mode. The
balance-xor, balance-rr, 802.3ad and broadcast modes use
the MAC address of the bonding master. The balance-alb
and balance-tlb modes use the MAC address of the slave
the packet is sent on.
slave-src
Sets the source MAC for all outgoing IPv4 and IPv6
packets to the MAC address of the slave the packet is
sent on. This is intended to permit fine-grained load
balancing for 802.3ad and balance-xor modes by changing
the slaves' MAC addresses. This is documented in detail
in section 12.1.1.1.
This option was added for bonding version 3.8.0.
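The "slave-src" behaviour described above can be sketched in userspace C as follows. This is a minimal illustration, not the patch's code: it operates on a raw frame buffer instead of an sk_buff, checks the on-wire EtherType where the patch checks skb->protocol, and the names apply_src_mac_select, enum src_mac_mode and the SKETCH_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN   6
#define SKETCH_ETH_HLEN   14      /* dest MAC + source MAC + EtherType */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/* Hypothetical mode values mirroring "default" (0) and "slave-src" (1). */
enum src_mac_mode { SRC_MAC_DEFAULT = 0, SRC_MAC_SLAVE = 1 };

/*
 * Overwrite the source MAC (bytes 6..11 of the Ethernet header) with the
 * slave's address, but only in slave-src mode and only for IPv4/IPv6
 * frames. Returns 1 if the frame was rewritten, 0 otherwise.
 */
static int apply_src_mac_select(uint8_t *frame, size_t len,
                                enum src_mac_mode mode,
                                const uint8_t slave_mac[SKETCH_ETH_ALEN])
{
    uint16_t ethertype;

    assert(frame != NULL && slave_mac != NULL);
    if (mode != SRC_MAC_SLAVE || len < SKETCH_ETH_HLEN)
        return 0;

    /* The EtherType is stored big-endian on the wire. */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != SKETCH_ETH_P_IP && ethertype != SKETCH_ETH_P_IPV6)
        return 0;

    memcpy(frame + SKETCH_ETH_ALEN, slave_mac, SKETCH_ETH_ALEN);
    return 1;
}
```

Because only the outer EtherType is inspected, this sketch would leave VLAN-tagged (0x8100) frames untouched.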
>+ It is important to understand that this hashing policy
>+ may cause TCP packets to arrive out of order, and must
>+ not be used when slaves have different bandwidth and/or
>+ RTT in the receiver's direction. This algorithm is not
>+ fully 802.3ad compliant. Other implementations of 802.3ad
>+ may or may not tolerate this noncompliance.
I would phrase the above as:
This algorithm is not 802.3ad compliant. This hash
policy will generally cause TCP packets to be delivered
out of order. Other implementations of 802.3ad may or
may not tolerate this noncompliance.
>+ The hashing formula is: transmitted packet number % slave count.
You can leave this out.
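For reference, the "packet number % slave count" rule can be sketched as below, assuming the semantics visible in the hunks above: the caller passes bond->rr_tx_counter++, and rr_tx_counter is a u16 in struct bonding. rr_pick_slave is a hypothetical name; this sketch folds the counter increment into the helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packet number % slave count, as in bond_xmit_hash_policy_rr() from the
 * patch, where the caller passes a 16-bit counter that is post-incremented
 * on every transmit.
 */
static int rr_pick_slave(uint16_t *rr_tx_counter, int slave_count)
{
    assert(slave_count > 0);
    /* The u16 is promoted to int before %, so the result is in
     * 0..slave_count-1; the stored counter wraps from 65535 to 0. */
    return (*rr_tx_counter)++ % slave_count;
}
```

One consequence of the 16-bit counter: when the slave count does not divide 65536, the rotation skips a beat at wraparound (with three slaves, slave 0 is picked twice in a row once every 65536 packets). With two or four slaves the alternation is unbroken.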
>+
> The default value is layer2. This option was added in bonding
> version 2.6.3. In earlier versions of bonding, this parameter
> does not exist, and the layer2 policy is the only policy. The
Also, bump the DRV_VERSION to 3.8.0, and DRV_RELDATE to today's
date in bonding.h, and put a bit here that says "the round-robin value
was added for bonding version 3.8.0"
>@@ -2190,6 +2219,42 @@ balance-alb: This mode is everything tha
> device driver must support changing the hardware address while
> the device is open.
>
>+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology
>+ using layer2 mechanisms
>+----------------------------------------------------------------------
>+ Besides the load striping and HA methods mentioned above,
>+you can use the round-robin hashing policy together with the
>+src_mac_select "slave-src" setting to stripe TCP load nearly equally over an
>+even number of slaves. Please note that enabling the round-robin policy in
>+balance-xor mode effectively turns it into a mode similar to balance-rr.
>+ A specific switch configuration is also required to get the full
>+benefit of the round-robin hashing policy combined with the src_mac_select
>+"slave-src" setting.
>+ When you enable the round-robin xmit hashing policy and set
>+src_mac_select to slave-src, each successive packet is transmitted
>+over the next slave, with the packet's source MAC address set to the
>+real MAC address of that slave interface rather than that of the
>+aggregate interface.
>+ Imagine that you have two hosts (say, A and B), each connected
>+to a switch with 2 slave interfaces, with appropriate port-channels
>+("etherchannels") configured. Once you start transmitting TCP data from A
>+to B with the round-robin hashing policy enabled, you will see that the TX
>+load is equally striped over host A's slaves, but all of this traffic is
>+received on only one of host B's slaves.
>+ Now set the src_mac_select parameter to "slave-src" and
>+configure the switch to use src-mac hashing for outgoing etherchannel load
>+striping. Every packet sent from host A now carries its slave's MAC as the
>+source MAC address, and the switch distributes these packets across host B's
>+receiving port-channel according to their source MAC addresses, so you get
>+nearly equal RX load striping that does not depend on the layer3 and above
>+protocols used for data transmission.
>+ It is important to understand that this load striping mode
>+will only work correctly if the number of slaves on each side is
>+even, and preferably equal.
>+ This load striping mode can also cause TCP packets to arrive out of
>+order, so you may need to tune your kernel to handle an increased
>+number of reordered packets.
I'd write the above block as:
12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology
using layer2 mechanisms
----------------------------------------------------------------------
It is also possible to use round-robin packet transmission,
either in the balance-rr mode or using the round-robin xmit_hash_policy
setting for balance-xor or 802.3ad modes to evenly stripe traffic across
the set of slaves. In conjunction with the src_mac_select option's
"slave-src" value and a carefully configured network, it is possible to
achieve high throughput using round robin.
The network here involves multiple hosts running bonding, all
connected to a common switch, for example:
+--------+ +------------+ +--------+
| |eth0 port1| |port3 eth0| |
| Host A +-------------+ switch +-------------+ Host B |
| bond0 +-------------+ +-------------+ bond0 |
| |eth1 port2| |port4 eth1| |
+--------+ +------------+ +--------+
In this configuration, the switch has ports 1 and 2 in a port
channel group, and ports 3 and 4 in a separate port channel group.
These channel groups are set to hash transmitted packets according to
the source MAC address.
Host A and Host B each have a bond utilizing a round-robin
transmit scheme, with src_mac_select set to "slave-src" and each slave's
MAC address chosen to hash sequentially when run through the switch's
source MAC address egress hash.
For clarity, only two slaves / ports are shown in the diagram.
In this manner, traffic sent from Host A to Host B will
generally be evenly striped all the way through to Host B. It is first
sent round robin by Host A. The switch will then hash on the source MAC
address of the packets, whose MAC addresses have been manually selected
to hash sequentially, thus the switch's egress port channel will tend to
keep the traffic separated by port similarly to how it arrived (e.g.,
packets from eth0 arriving on port1 will tend to go out port3, and
similarly for eth1, port2 and port4).
This scheme works best when the number of slaves and switch
ports match at both ends. Note that it will work with differing numbers
of slaves, but the traffic balance may not be optimal, and it may be
possible for a host with more slaves to overrun a host with fewer
slaves.
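The phrase "chosen to hash sequentially" depends entirely on the switch's egress hash, which is vendor-specific. The sketch below assumes a hypothetical hash (last octet of the source MAC modulo the number of member ports) purely to illustrate how slave addresses might be chosen; switch_src_mac_egress_port and assign_sequential_macs are illustrative names, not switch or bonding APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN 6

/*
 * HYPOTHETICAL switch egress hash: the last octet of the source MAC modulo
 * the number of member ports picks the output port. Real switches use
 * vendor-specific hash functions.
 */
static int switch_src_mac_egress_port(const uint8_t src_mac[SKETCH_ETH_ALEN],
                                      int member_ports)
{
    assert(member_ports > 0);
    return src_mac[SKETCH_ETH_ALEN - 1] % member_ports;
}

/*
 * Give slave i the base address with i added to the last octet, so that
 * (with no carry out of the last octet) consecutive slaves land on
 * consecutive member ports under the hash above.
 */
static void assign_sequential_macs(uint8_t macs[][SKETCH_ETH_ALEN], int nslaves,
                                   const uint8_t base[SKETCH_ETH_ALEN])
{
    assert(nslaves >= 0 && base[SKETCH_ETH_ALEN - 1] + nslaves <= 0x100);
    for (int i = 0; i < nslaves; i++) {
        for (int b = 0; b < SKETCH_ETH_ALEN; b++)
            macs[i][b] = base[b];
        macs[i][SKETCH_ETH_ALEN - 1] = (uint8_t)(base[SKETCH_ETH_ALEN - 1] + i);
    }
}
```

With a base of 02:00:00:00:00:10 (locally administered bit set) and two slaves, eth0 gets ...:10 and eth1 gets ...:11, which this hypothetical hash sends to member ports 0 and 1 respectively. The switch's actual src-mac hash should be checked before picking addresses.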
> 12.1.2 MT Link Monitoring for Single Switch Topology
> ----------------------------------------------------
>
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_3ad.c linux-2.6.p/drivers/net/bonding/bond_3ad.c
>--- linux-2.6/drivers/net/bonding/bond_3ad.c 2011-02-16 00:59:18.710282002 +0300
>+++ linux-2.6.p/drivers/net/bonding/bond_3ad.c 2011-02-16 01:30:47.770281998 +0300
>@@ -2419,7 +2419,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
> goto out;
> }
>
>- slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>+ slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg, bond->rr_tx_counter++);
>
> bond_for_each_slave(bond, slave, i) {
> struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bonding.h linux-2.6.p/drivers/net/bonding/bonding.h
>--- linux-2.6/drivers/net/bonding/bonding.h 2011-02-16 00:59:18.720282002 +0300
>+++ linux-2.6.p/drivers/net/bonding/bonding.h 2011-02-16 01:33:11.610282004 +0300
>@@ -162,6 +162,7 @@ struct bond_params {
> int tx_queues;
> int all_slaves_active;
> int resend_igmp;
>+ int src_mac_select;
> };
>
> struct bond_parm_tbl {
>@@ -235,7 +236,7 @@ struct bonding {
> #endif /* CONFIG_PROC_FS */
> struct list_head bond_list;
> struct netdev_hw_addr_list mc_list;
>- int (*xmit_hash_policy)(struct sk_buff *, int);
>+ int (*xmit_hash_policy)(struct sk_buff *, int, int);
> __be32 master_ip;
> u16 flags;
> u16 rr_tx_counter;
>@@ -308,6 +309,9 @@ static inline bool bond_is_lb(const stru
> #define BOND_ARP_VALIDATE_ALL (BOND_ARP_VALIDATE_ACTIVE | \
> BOND_ARP_VALIDATE_BACKUP)
>
>+#define BOND_MAC_SRC_DEFAULT 0
>+#define BOND_MAC_SRC_SLAVE 1
For consistency with the option name (src_mac_select), I'd name
these BOND_SRC_MAC_DEFAULT and BOND_SRC_MAC_SLAVE. There may be a
dst_mac_select someday, and that would make the option value sets
clearer.
> static inline int slave_do_arp_validate(struct bonding *bond,
> struct slave *slave)
> {
>@@ -402,6 +406,7 @@ extern const struct bond_parm_tbl arp_va
> extern const struct bond_parm_tbl fail_over_mac_tbl[];
> extern const struct bond_parm_tbl pri_reselect_tbl[];
> extern struct bond_parm_tbl ad_select_tbl[];
>+extern const struct bond_parm_tbl src_mac_select_tbl[];
>
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> void bond_send_unsolicited_na(struct bonding *bond);
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_main.c linux-2.6.p/drivers/net/bonding/bond_main.c
>--- linux-2.6/drivers/net/bonding/bond_main.c 2011-02-16 00:59:18.720282002 +0300
>+++ linux-2.6.p/drivers/net/bonding/bond_main.c 2011-02-16 22:08:22.650281997 +0300
>@@ -111,6 +111,7 @@ static char *fail_over_mac;
> static int all_slaves_active = 0;
> static struct bond_params bonding_defaults;
> static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
>+static char *src_mac_select;
>
> module_param(max_bonds, int, 0);
> MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
>@@ -152,7 +153,7 @@ module_param(ad_select, charp, 0);
> MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
>- ", 1 for layer 3+4");
>+ ", 1 for layer 3+4, 3 for round-robin");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -167,6 +168,9 @@ MODULE_PARM_DESC(all_slaves_active, "Kee
> "0 for never (default), 1 for always.");
> module_param(resend_igmp, int, 0);
> MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
>+module_param(src_mac_select, charp, 0);
>+MODULE_PARM_DESC(src_mac_select, "Source MAC selection mode: 0 or default (default),"
>+ "1 or slave-src to use slave's MAC as packet's src MAC");
>
> /*----------------------------- Global variables ----------------------------*/
>
>@@ -206,6 +210,7 @@ const struct bond_parm_tbl xmit_hashtype
> { "layer2", BOND_XMIT_POLICY_LAYER2},
> { "layer3+4", BOND_XMIT_POLICY_LAYER34},
> { "layer2+3", BOND_XMIT_POLICY_LAYER23},
>+{ "round-robin", BOND_XMIT_POLICY_LAYERRR},
> { NULL, -1},
> };
>
>@@ -238,6 +243,12 @@ struct bond_parm_tbl ad_select_tbl[] = {
> { NULL, -1},
> };
>
>+const struct bond_parm_tbl src_mac_select_tbl[] = {
>+{ "default", BOND_MAC_SRC_DEFAULT},
>+{ "slave-src", BOND_MAC_SRC_SLAVE},
>+{ NULL, -1},
>+};
>+
> /*-------------------------- Forward declarations ---------------------------*/
>
> static void bond_send_gratuitous_arp(struct bonding *bond);
>@@ -422,6 +433,7 @@ struct vlan_entry *bond_next_vlan(struct
> int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
> struct net_device *slave_dev)
> {
>+ struct ethhdr *eth_data;
> skb->dev = slave_dev;
> skb->priority = 1;
> #ifdef CONFIG_NET_POLL_CONTROLLER
>@@ -433,6 +445,15 @@ int bond_dev_queue_xmit(struct bonding *
> slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
> } else
> #endif
>+ if (bond->params.src_mac_select == BOND_MAC_SRC_SLAVE &&
>+ (skb->protocol == htons(ETH_P_IP) ||
>+ skb->protocol == htons(ETH_P_IPV6))) {
>+ skb_reset_mac_header(skb);
>+ eth_data = eth_hdr(skb);
>+ memcpy(eth_data->h_source, slave_dev->perm_addr,
>+ ETH_ALEN);
>+ }
>+
> dev_queue_xmit(skb);
>
> return 0;
>@@ -3261,6 +3282,13 @@ static void bond_info_show_master(struct
> bond->params.xmit_policy);
> }
>
>+ if (bond->params.src_mac_select == BOND_MAC_SRC_DEFAULT ||
>+ bond->params.src_mac_select == BOND_MAC_SRC_DEFAULT) {
This tests the same thing twice, which doesn't seem right. I
think you should remove the if () block and do the seq_printf
unconditionally.
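A minimal sketch of the unconditional output suggested above, with one added assumption: the mode name is found by searching the table rather than by using the value as an index, so a corrupted value cannot read past the terminator. struct parm_tbl_sketch is a local stand-in for struct bond_parm_tbl (field names assumed), and snprintf stands in for seq_printf().

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for struct bond_parm_tbl; field names are assumed. */
struct parm_tbl_sketch {
    const char *modename;
    int mode;
};

static const struct parm_tbl_sketch src_mac_select_tbl_sketch[] = {
    { "default",   0 },
    { "slave-src", 1 },
    { NULL,       -1 },
};

/* Find the name by value instead of indexing the table with the value,
 * so an unexpected value cannot read past the NULL terminator. */
static const char *parm_tbl_name(const struct parm_tbl_sketch *tbl, int value)
{
    assert(tbl != NULL);
    for (; tbl->modename != NULL; tbl++)
        if (tbl->mode == value)
            return tbl->modename;
    return "unknown";
}

/* Unconditional report line; snprintf stands in for seq_printf(). */
static int show_src_mac_select(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "Source MAC select is: %s (%d)\n",
                    parm_tbl_name(src_mac_select_tbl_sketch, value), value);
}
```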
>+ seq_printf(seq, "Source MAC select is: %s (%d)\n",
>+ src_mac_select_tbl[bond->params.src_mac_select].modename,
>+ bond->params.src_mac_select);
>+ }
>+
> if (USES_PRIMARY(bond->params.mode)) {
> seq_printf(seq, "Primary Slave: %s",
> (bond->primary_slave) ?
>@@ -3717,7 +3745,8 @@ void bond_unregister_arp(struct bonding
> * Hash for the output device based upon layer 2 and layer 3 data. If
> * the packet is not IP mimic bond_xmit_hash_policy_l2()
> */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count,
>+ int pktcount)
> {
> struct ethhdr *data = (struct ethhdr *)skb->data;
> struct iphdr *iph = ip_hdr(skb);
>@@ -3735,7 +3764,8 @@ static int bond_xmit_hash_policy_l23(str
> * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is
> * altogether not IP, mimic bond_xmit_hash_policy_l2()
> */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count,
>+ int pktcount)
> {
> struct ethhdr *data = (struct ethhdr *)skb->data;
> struct iphdr *iph = ip_hdr(skb);
>@@ -3759,13 +3789,29 @@ static int bond_xmit_hash_policy_l34(str
> /*
> * Hash for the output device based upon layer 2 data
> */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count,
>+ int pktcount)
> {
> struct ethhdr *data = (struct ethhdr *)skb->data;
>
> return (data->h_dest[5] ^ data->h_source[5]) % count;
> }
>
>+/*
>+ * Round-robin over all active slaves(one packet per slave) for IP and IPv6,
>+ * otherwise mimic bond_xmit_hash_policy_l2()
>+ */
>+static int bond_xmit_hash_policy_rr(struct sk_buff *skb, int count,
>+ int pktcount)
>+{
>+ struct ethhdr *data = (struct ethhdr *)skb->data;
>+ if (skb->protocol == htons(ETH_P_IP)
>+ || skb->protocol == htons(ETH_P_IPV6)) {
>+ return pktcount % count;
>+ }
>+ return (data->h_dest[5] ^ data->h_source[5]) % count;
>+}
Why does this only round robin for IP/IPv6? This hash policy
should behave identically to the bond_xmit_roundrobin logic, which round
robins all traffic except for IPv4 IGMP.
For consistency, it would be desirable for bond_xmit_roundrobin
and bond_xmit_hash_policy_rr to use a common backend function to select
the slave (probably passing in the struct bonding * instead of a counter).
>+
> /*-------------------------- Device entry points ----------------------------*/
>
> static int bond_open(struct net_device *bond_dev)
>@@ -4395,7 +4441,8 @@ static int bond_xmit_xor(struct sk_buff
> if (!BOND_IS_OK(bond))
> goto out;
>
>- slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
>+ slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt,
>+ bond->rr_tx_counter++);
>
> bond_for_each_slave(bond, slave, i) {
> slave_no--;
>@@ -4492,6 +4539,9 @@ static void bond_set_xmit_hash_policy(st
> case BOND_XMIT_POLICY_LAYER34:
> bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
> break;
>+ case BOND_XMIT_POLICY_LAYERRR:
>+ bond->xmit_hash_policy = bond_xmit_hash_policy_rr;
>+ break;
> case BOND_XMIT_POLICY_LAYER2:
> default:
> bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_sysfs.c linux-2.6.p/drivers/net/bonding/bond_sysfs.c
>--- linux-2.6/drivers/net/bonding/bond_sysfs.c 2011-02-08 16:03:02.950282003 +0300
>+++ linux-2.6.p/drivers/net/bonding/bond_sysfs.c 2011-02-16 02:05:58.650281999 +0300
>@@ -1643,6 +1643,55 @@ out:
> static DEVICE_ATTR(resend_igmp, S_IRUGO | S_IWUSR,
> bonding_show_resend_igmp, bonding_store_resend_igmp);
>
>+/*
>+ * Show and set the bonding src_mac_select param.
>+ */
>+
>+static ssize_t bonding_show_src_mac_select(struct device *d,
>+ struct device_attribute *attr,
>+ char *buf)
>+{
>+ struct bonding *bond = to_bond(d);
>+
>+ return sprintf(buf, "%s %d\n",
>+ src_mac_select_tbl[bond->params.src_mac_select].modename,
>+ bond->params.src_mac_select);
>+}
>+
>+static ssize_t bonding_store_src_mac_select(struct device *d,
>+ struct device_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ int new_value, ret = count;
>+ struct bonding *bond = to_bond(d);
>+
>+ if (bond->dev->flags & IFF_UP) {
>+ pr_err("%s: Interface is up. Unable to update src mac select policy.\n",
>+ bond->dev->name);
>+ ret = -EPERM;
>+ goto out;
>+ }
>+
>+ new_value = bond_parse_parm(buf, src_mac_select_tbl);
>+ if (new_value < 0) {
>+ pr_err("%s: Ignoring invalid src mac select policy value %.*s.\n",
>+ bond->dev->name,
>+ (int)strlen(buf) - 1, buf);
>+ ret = -EINVAL;
>+ goto out;
>+ } else {
>+ bond->params.src_mac_select = new_value;
>+ pr_info("%s: setting src mac select policy to %s (%d).\n",
>+ bond->dev->name,
>+ src_mac_select_tbl[new_value].modename, new_value);
>+ }
>+out:
>+ return ret;
>+}
>+
>+static DEVICE_ATTR(src_mac_select, S_IRUGO | S_IWUSR,
>+ bonding_show_src_mac_select, bonding_store_src_mac_select);
>+
> static struct attribute *per_bond_attrs[] = {
> &dev_attr_slaves.attr,
> &dev_attr_mode.attr,
>@@ -1671,6 +1720,7 @@ static struct attribute *per_bond_attrs[
> &dev_attr_queue_id.attr,
> &dev_attr_all_slaves_active.attr,
> &dev_attr_resend_igmp.attr,
>+ &dev_attr_src_mac_select.attr,
> NULL,
> };
>
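For illustration, a userspace sketch of the name-or-number lookup the store handler relies on. It is meant to approximate what bond_parse_parm() is assumed to do with src_mac_select_tbl and is not the kernel function; parse_parm_sketch(), the 20-character name limit and the 0/1 mode values are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct bond_parm_tbl_sketch {
	const char *modename;
	int mode;
};

static const struct bond_parm_tbl_sketch src_mac_select_tbl[] = {
	{ "default",   0 },  /* assumed value of BOND_MAC_SRC_DEFAULT */
	{ "slave-src", 1 },  /* assumed value of BOND_MAC_SRC_SLAVE */
	{ NULL, -1 },
};

/*
 * Accept either a mode name ("slave-src") or its number ("1"), as written
 * to the sysfs file, possibly with a trailing newline. Returns the mode,
 * or -1 for anything not in the table.
 */
static int parse_parm_sketch(const char *buf,
			     const struct bond_parm_tbl_sketch *tbl)
{
	char name[21] = "";
	int num = -1;
	const char *p;

	assert(buf != NULL && tbl != NULL);

	/* Only digits and whitespace means the value is numeric. */
	for (p = buf; *p; p++)
		if (!isdigit((unsigned char)*p) && !isspace((unsigned char)*p))
			break;

	if (*p) {
		if (sscanf(buf, "%20s", name) != 1)
			return -1;
	} else if (sscanf(buf, "%d", &num) != 1) {
		return -1;
	}

	for (int i = 0; tbl[i].modename; i++)
		if (num == tbl[i].mode || strcmp(name, tbl[i].modename) == 0)
			return tbl[i].mode;

	return -1;
}
```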
>diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/include/linux/if_bonding.h linux-2.6.p/include/linux/if_bonding.h
>--- linux-2.6/include/linux/if_bonding.h 2011-02-16 00:59:18.720282002 +0300
>+++ linux-2.6.p/include/linux/if_bonding.h 2011-02-16 01:23:38.660282000 +0300
>@@ -91,6 +91,7 @@
> #define BOND_XMIT_POLICY_LAYER2 0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34 1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_LAYERRR 3 /* round-robin mode */
A nit perhaps, but round robin isn't really a layer. I'd be fine
with calling this BOND_XMIT_POLICY_RR.
> typedef struct ifbond {
> __s32 bond_mode;
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
end of thread, other threads:[~2011-03-02 9:22 UTC | newest]
Thread overview: 9+ messages
2011-03-01 22:34 [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode Oleg V. Ukhno
2011-03-01 23:29 ` Stephen Hemminger
2011-03-02 2:56 ` Jay Vosburgh
2011-03-02 9:15 ` Oleg V. Ukhno
-- strict thread matches above, loose matches on Subject: below --
2011-02-16 19:13 Oleg V. Ukhno
2011-02-28 0:08 ` David Miller
2011-02-28 10:09 ` Oleg V. Ukhno
2011-02-28 19:29 ` Jay Vosburgh
2011-03-01 22:38 ` Oleg V. Ukhno