netdev.vger.kernel.org archive mirror
* [PATCH] bonding: add mark mode
@ 2009-05-01 17:39 Andy Gospodarek
  2009-05-01 19:34 ` Neil Horman
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Gospodarek @ 2009-05-01 17:39 UTC (permalink / raw)
  To: netdev, fubar; +Cc: bonding-devel


Quite a few people are happy with the way bonding manages to split
traffic to different members of a bond.  Many are quite disappointed
that users or administrators cannot be given more control over the
traffic distribution and would like something that they can control more
easily.  I looked at extending some of the existing modes, but the
cleanest option seemed to be one that created an additional mode to
handle this case.  I hated to create yet another mode, but the
simplicity of this mode made it a nice candidate for a new mode.  I have
decided to call this mode 'mark' (or mode 7).

The mark mode of bonding relies on the skb->mark field for outgoing
device selection.  Unmarked frames (ones where the mark is still zero)
will be sent by the first active enslaved device.  For marked frames,
the outgoing device is chosen by taking the mark modulo the number of
enslaved devices.  If that device is inactive
(link-down), the traffic will default back to the first active enslaved
device.  I debated how to use the mark to decide the outgoing device,
but it seemed that modulo of the mark and the number of enslaved devices
would provide the most flexibility for those who currently mark frames
for other purposes.
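
The selection rule described above can be sketched in a few lines of
userspace C.  This is only an illustration under stated assumptions, not
the kernel code from the patch: struct demo_slave and demo_pick_slave()
are hypothetical names, and the "usable" flag stands in for the kernel's
link-up and active-state checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an enslaved device (not the kernel's struct slave). */
struct demo_slave {
	const char *name;
	bool usable;		/* assumed to mean: link up and active */
};

/*
 * Return the index of the slave that a frame with the given mark would be
 * sent on, or -1 if no slave is usable.
 */
static int demo_pick_slave(const struct demo_slave *slaves, size_t count,
			   uint32_t mark)
{
	size_t wanted, i;

	assert(slaves != NULL);

	if (count == 0)
		return -1;

	/* An unmarked frame (mark == 0) maps to index 0, as does 0 % count. */
	wanted = mark % count;
	if (slaves[wanted].usable)
		return (int)wanted;

	/* Wanted slave is down: fall back to the first usable slave in order. */
	for (i = 0; i < count; i++)
		if (slaves[i].usable)
			return (int)i;

	return -1;
}
```

With three slaves where only the third is down, a mark of 4 would select
the second slave, while a mark of 5 would fall back to the first.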

I considered some other options for choosing destination devices based
on marks, but the ones I came up with would require additional sysfs
configuration parameters and I would prefer not to add any more to an
already crowded space.

I've tested this on a slightly older kernel than the net-next-2.6 tree
that this patch is against, by marking frames using the mark and connmark
iptables options, and it seems to work as I expect.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
---

 Documentation/networking/bonding.txt |   25 ++++++++++++++
 drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
 include/linux/if_bonding.h           |    1 
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 0876275..7a0d4c2 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -582,6 +582,31 @@ mode
 		swapped with the new curr_active_slave that was
 		chosen.
 
+	mark or 7
+
+		Mark-based policy: skbuffs that arrive to be
+		transmitted will have the mark field inspected to
+		determine the destination slave device.  When the
+		skbuff's mark is zero, the first active device in the
+		ordered list of enslaved devices will be used.  When
+		the mark is non-zero the modulo of the mark and the
+		number of enslaved devices will determine the
+		interface used for transmission.  If this device is
+		not active (link-down) then the mark will essentially
+		be ignored and the first active device in the ordered
+		list of enslaved devices will be used.
+
+		The flexibility offered with this mode allows users
+		of netfilter to move various types of traffic to
+		different slaves quite easily.  Information on this
+		can be found in the manpages for iptables/ebtables
+		as well as netfilter documentation.
+
+		Prerequisites:
+
+		1.  Without the ability to mark skbuffs this mode is
+		not useful.  Netfilter greatly aids skbuff marking.
+
 num_grat_arp
 
 	Specifies the number of gratuitous ARPs to be issued after a
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index fd73836..5e1d166 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -123,7 +123,7 @@ module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
 		       "1 for active-backup, 2 for balance-xor, "
 		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
-		       "6 for balance-alb");
+		       "6 for balance-alb, 7 for mark");
 module_param(primary, charp, 0);
 MODULE_PARM_DESC(primary, "Primary network device to use");
 module_param(lacp_rate, charp, 0);
@@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
 {	"802.3ad",		BOND_MODE_8023AD},
 {	"balance-tlb",		BOND_MODE_TLB},
 {	"balance-alb",		BOND_MODE_ALB},
+{	"mark",			BOND_MODE_MARK},
 {	NULL,			-1},
 };
 
@@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
 		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
 		[BOND_MODE_TLB] = "transmit load balancing",
 		[BOND_MODE_ALB] = "adaptive load balancing",
+		[BOND_MODE_MARK] = "mark-based transmit balancing",
 	};
 
 	if (mode < 0 || mode > BOND_MODE_ALB)
@@ -4464,6 +4466,57 @@ out:
 	return 0;
 }
 
+static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+	int i, slave_no, res = 1;
+
+	read_lock(&bond->lock);
+
+	if (!BOND_IS_OK(bond)) {
+		goto out;
+	}
+
+	/* Use the mark as the determining factor for which slave to
+	 * choose for transmission.  When behaving normally all should
+	 * work just fine.  When a slave that is destined to be the
+	 * transmitter of this frame is down, start at the front of the
+	 * list and find the first available slave. */
+
+	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
+
+	bond_for_each_slave(bond, slave, i) {
+		slave_no--;
+		if (slave_no < 0) {
+			break;
+		}
+	}
+
+	if (IS_UP(slave->dev) &&
+	    (slave->link == BOND_LINK_UP) &&
+	    (slave->state == BOND_STATE_ACTIVE)) {
+		res = bond_dev_queue_xmit(bond, skb, slave->dev);
+	} else {
+		/* If desired slave is down, send on the first up and
+		 * active slave in the list. */
+		bond_for_each_slave(bond, slave, i) {
+			if (IS_UP(slave->dev) &&
+			    (slave->link == BOND_LINK_UP) &&
+			    (slave->state == BOND_STATE_ACTIVE)) {
+				res = bond_dev_queue_xmit(bond, skb, slave->dev);
+				break;
+			}
+		}
+	}
+out:
+	if (res) {
+		/* no suitable interface, frame not sent */
+		dev_kfree_skb(skb);
+	}
+	read_unlock(&bond->lock);
+	return 0;
+}
 /*------------------------- Device initialization ---------------------------*/
 
 static void bond_set_xmit_hash_policy(struct bonding *bond)
@@ -4500,6 +4553,8 @@ static int bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	case BOND_MODE_ALB:
 	case BOND_MODE_TLB:
 		return bond_alb_xmit(skb, dev);
+	case BOND_MODE_MARK:
+		return bond_xmit_mark(skb, dev);
 	default:
 		/* Should never happen, mode already checked */
 		printk(KERN_ERR DRV_NAME ": %s: Error: Unknown bonding mode %d\n",
@@ -4537,6 +4592,8 @@ void bond_set_mode_ops(struct bonding *bond, int mode)
 		/* FALLTHRU */
 	case BOND_MODE_TLB:
 		break;
+	case BOND_MODE_MARK:
+		break;
 	default:
 		/* Should never happen, mode already checked */
 		printk(KERN_ERR DRV_NAME
diff --git a/include/linux/if_bonding.h b/include/linux/if_bonding.h
index 65c2d24..253098f 100644
--- a/include/linux/if_bonding.h
+++ b/include/linux/if_bonding.h
@@ -70,6 +70,7 @@
 #define BOND_MODE_8023AD        4
 #define BOND_MODE_TLB           5
 #define BOND_MODE_ALB		6 /* TLB + RLB (receive load balancing) */
+#define BOND_MODE_MARK		7
 
 /* each slave's link has 4 states */
 #define BOND_LINK_UP    0           /* link is up and running */


* Re: [PATCH] bonding: add mark mode
  2009-05-01 17:39 [PATCH] bonding: add mark mode Andy Gospodarek
@ 2009-05-01 19:34 ` Neil Horman
  2009-05-01 19:52   ` Jay Vosburgh
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Horman @ 2009-05-01 19:34 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, fubar, bonding-devel

On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
> +{
> +	struct bonding *bond = netdev_priv(bond_dev);
> +	struct slave *slave;
> +	int i, slave_no, res = 1;
> +
> +	read_lock(&bond->lock);
> +
> +	if (!BOND_IS_OK(bond)) {
> +		goto out;
> +	}
> +
> +	/* Use the mark as the determining factor for which slave to
> +	 * choose for transmission.  When behaving normally all should
> +	 * work just fine.  When a slave that is destined to be the
> +	 * transmitter of this frame is down, start at the front of the
> +	 * list and find the first available slave. */
> +
> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
> +
Would it be worthwhile to add a special case here (say, all f's in the mark) to
indicate that a frame should be sent out all slaves on the bond, in case you
have traffic that might need to go to all interfaces (like maybe IGMP)?
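
To make the suggestion concrete, here is a rough userspace sketch of what
such a special case might look like.  It is not part of the posted patch:
DEMO_MARK_ALL and demo_tx_mask() are hypothetical names, and slaves are
modelled as bits in a mask (bit i set means slave i is up) purely for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MARK_ALL 0xffffffffu	/* hypothetical "send on every slave" mark */

/*
 * usable: bit i is set when slave i is up and active.
 * Returns a bitmask of the slaves the frame would be transmitted on.
 */
static uint32_t demo_tx_mask(uint32_t usable, unsigned int count, uint32_t mark)
{
	uint32_t all = (count == 32) ? 0xffffffffu : (1u << count) - 1u;
	unsigned int wanted, i;

	assert(count > 0 && count <= 32);

	/* Special case: flood the frame to every usable slave. */
	if (mark == DEMO_MARK_ALL)
		return usable & all;

	/* Normal case: mark modulo slave count, as in the patch description. */
	wanted = mark % count;
	if (usable & (1u << wanted))
		return 1u << wanted;

	/* Wanted slave is down: fall back to the first usable slave. */
	for (i = 0; i < count; i++)
		if (usable & (1u << i))
			return 1u << i;

	return 0;
}
```

One consequence of this sketch is that the all-f's value can no longer be
used as an ordinary mark for modulo selection.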


Neil



* Re: [PATCH] bonding: add mark mode
  2009-05-01 19:34 ` Neil Horman
@ 2009-05-01 19:52   ` Jay Vosburgh
  2009-05-01 20:09     ` Neil Horman
  2009-05-01 20:25     ` Andy Gospodarek
  0 siblings, 2 replies; 12+ messages in thread
From: Jay Vosburgh @ 2009-05-01 19:52 UTC (permalink / raw)
  To: Neil Horman; +Cc: Andy Gospodarek, netdev, bonding-devel

Neil Horman <nhorman@tuxdriver.com> wrote:

>On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
>> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
>> +{
>> +	struct bonding *bond = netdev_priv(bond_dev);
>> +	struct slave *slave;
>> +	int i, slave_no, res = 1;
>> +
>> +	read_lock(&bond->lock);
>> +
>> +	if (!BOND_IS_OK(bond)) {
>> +		goto out;
>> +	}
>> +
>> +	/* Use the mark as the determining factor for which slave to
>> +	 * choose for transmission.  When behaving normally all should
>> +	 * work just fine.  When a slave that is destined to be the
>> +	 * transmitter of this frame is down, start at the front of the
>> +	 * list and find the first available slave. */

	Why not simply use the N'th up slave instead of reverting to
slave 0 for the down slave case?  I'm guessing this has to do with
trying to maintain some fixed balancing of traffic even in the face of
slave failure.
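
	One possible reading of the "N'th up slave" alternative is
sketched below in userspace C, for comparison with the fall-back-to-first
behaviour in the patch.  This is an assumption about what is meant, not
code from the patch: demo_pick_nth_up() is a hypothetical name, and the
mark is taken modulo the number of slaves that are currently up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * up[i] is true when slave i is up and active.
 * Returns the index of the (mark % up_count)'th up slave, or -1 if none is up.
 */
static int demo_pick_nth_up(const bool *up, size_t count, uint32_t mark)
{
	size_t up_count = 0, nth, i;

	assert(up != NULL || count == 0);

	for (i = 0; i < count; i++)
		if (up[i])
			up_count++;

	if (up_count == 0)
		return -1;

	/* Spread marks over the slaves that are up, skipping the down ones. */
	nth = mark % up_count;
	for (i = 0; i < count; i++) {
		if (!up[i])
			continue;
		if (nth == 0)
			return (int)i;
		nth--;
	}

	return -1;
}
```

	Note that with this variant the mapping from mark to slave
shifts whenever any slave changes state, which is the trade-off against
a fixed mapping with a single fallback device.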

>> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
>> +
>Would it be worthwhile to add a special case here (say, all f's in the mark) to
>indicate that a frame should be sent out all slaves on the bond, in case you
>have traffic that might need to go to all interfaces (like maybe IGMP)?

	If I'm not misunderstanding the purpose of the mode, I think
it's etherchannel compatible (meaning that the switch has to be
configured), so I'm not sure why there would ever be a need to flood
packets to all ports.

	I think this would generally be better as a special hash policy,
in which case both the etherchannel (balance-xor) and 802.3ad modes
could take advantage of it.  I'd hazard to guess that Andy thought about
that, too, so what was the impediment?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH] bonding: add mark mode
  2009-05-01 19:52   ` Jay Vosburgh
@ 2009-05-01 20:09     ` Neil Horman
  2009-05-01 20:31       ` Jay Vosburgh
  2009-05-01 20:25     ` Andy Gospodarek
  1 sibling, 1 reply; 12+ messages in thread
From: Neil Horman @ 2009-05-01 20:09 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Andy Gospodarek, netdev, bonding-devel

> 
> 	If I'm not misunderstanding the purpose of the mode, I think
> it's etherchannel compatible (meaning that the switch has to be
> configured), so I'm not sure why there would ever be a need to flood
> packets to all ports.
> 
It's entirely possible that there may be no need to flood frames, I was just
asking the question.  And while it is possible that this might be etherchannel
compatible (in fact I agree, it does look compatible), I can see uses for it
beyond that (active-backup with traffic-class load balancing for example).

> 	I think this would generally be better as a special hash policy,
> in which case both the etherchannel (balance-xor) and 802.3ad modes
> could take advantage of it.  I'd hazard to guess that Andy thought about
> that, too, so what was the impediment?
> 

I honestly don't know, although I will say that making a special hash mode for
balance-xor seems a bit odd, since the implication there would be that output
port was no longer chosen by an xor operation.
Neil



* Re: [PATCH] bonding: add mark mode
  2009-05-01 19:52   ` Jay Vosburgh
  2009-05-01 20:09     ` Neil Horman
@ 2009-05-01 20:25     ` Andy Gospodarek
  2009-05-01 20:44       ` Jay Vosburgh
  1 sibling, 1 reply; 12+ messages in thread
From: Andy Gospodarek @ 2009-05-01 20:25 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Neil Horman, Andy Gospodarek, netdev, bonding-devel

On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
> >> +{
> >> +	struct bonding *bond = netdev_priv(bond_dev);
> >> +	struct slave *slave;
> >> +	int i, slave_no, res = 1;
> >> +
> >> +	read_lock(&bond->lock);
> >> +
> >> +	if (!BOND_IS_OK(bond)) {
> >> +		goto out;
> >> +	}
> >> +
> >> +	/* Use the mark as the determining factor for which slave to
> >> +	 * choose for transmission.  When behaving normally all should
> >> +	 * work just fine.  When a slave that is destined to be the
> >> +	 * transmitter of this frame is down, start at the front of the
> >> +	 * list and find the first available slave. */
> 
> 	Why not simply use the N'th up slave instead of reverting to
> slave 0 for the down slave case?  I'm guessing this has to do with
> trying to maintain some fixed balancing of traffic even in the face of
> slave failure.
> 

That could be a reasonable option as well -- in fact that was my first
implementation.  I decided to do it this way, so that a user could
consider the first device as the primary slave for unmarked traffic and
the slave for marked traffic destined for a slave that was down rather
than what might seem like a random choice for traffic.

> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
> >> +
> >Would it be worthwhile to add a special case here (say, all f's in the mark) to
> >indicate that a frame should be sent out all slaves on the bond, in case you
> >have traffic that might need to go to all interfaces (like maybe IGMP)?
> 
> 	If I'm not misunderstanding the purpose of the mode, I think
> it's etherchannel compatible (meaning that the switch has to be
> configured), so I'm not sure why there would ever be a need to flood
> packets to all ports.
> 
> 	I think this would generally be better as a special hash policy,
> in which case both the etherchannel (balance-xor) and 802.3ad modes
> could take advantage of it.  I'd hazard to guess that Andy thought about
> that, too, so what was the impediment?
> 

I wasn't designing this to be etherchannel compatible.  Though it could
work that way, my idea was to use this in an environment where slaves
may be connected to different switches that are capable of moving all of
the traffic to its destination if needed, but if the traffic is split
in a predictable, application-specific manner the network can run more
efficiently.  Think of this more as an active-backup mode where the
backup links could actually be utilized.

When I first started on this I considered just extending the
active-backup mode's transmit function, but I didn't want to deal with
the potential loss of duplicate frames in skb_bond_should_drop and I
also didn't want to deal with the complicated failover scenarios of
active-backup mode.

It's probably the most like round-robin (in fact it's quite close), but
I felt that this mode could eventually grow into something more useful
(who knows how it might be extended) and deserved its own mode rather
than dealing with the possibility of having a larger more complicated
transmit routine in the future when another mode would be just fine.



* Re: [PATCH] bonding: add mark mode
  2009-05-01 20:09     ` Neil Horman
@ 2009-05-01 20:31       ` Jay Vosburgh
  2009-05-01 20:39         ` Neil Horman
  0 siblings, 1 reply; 12+ messages in thread
From: Jay Vosburgh @ 2009-05-01 20:31 UTC (permalink / raw)
  To: Neil Horman; +Cc: Andy Gospodarek, netdev, bonding-devel

Neil Horman <nhorman@tuxdriver.com> wrote:

>> 	If I'm not misunderstanding the purpose of the mode, I think
>> it's etherchannel compatible (meaning that the switch has to be
>> configured), so I'm not sure why there would ever be a need to flood
>> packets to all ports.
>> 
>It's entirely possible that there may be no need to flood frames, I was just
>asking the question.  And while it is possible that this might be etherchannel
>compatible (in fact I agree, it does look compatible), I can see uses for it
>beyond that (active-backup with traffic-class load balancing for example).

	For that last comment (active-backup), are you thinking in terms
of N active slaves and one backup?

>> 	I think this would generally be better as a special hash policy,
>> in which case both the etherchannel (balance-xor) and 802.3ad modes
>> could take advantage of it.  I'd hazard to guess that Andy thought about
>> that, too, so what was the impediment?
>
>I honestly don't know, although I will say that making a special hash mode for
>balance-xor seems a bit odd, since the implication there would be that output
>port was no longer chosen by an xor operation.

	Well, there are already three different hash algorithms that can
be selected, and if the only thing stopping this functionality as some
type of different slave selection gizmo is terminology, then I'm happy
to change the name of "xmit_hash_policy" to simply "xmit_policy" (in a
backwards compatible manner, of course) and slap this in as a new
policy.  Yah, balance-xor has "xor" in the name, but that can either be
documented around or, again, aliased as "balance-policy" or something.

	I see no reason the terminology should stand in the way of
setting things up The Right Way (which, in this discussion, revolves
around this type of new slave selection stuff working for the largest
reasonable set of modes, here, etherchannel and 802.3ad).
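
	To make the "new policy" idea concrete, here is a minimal
user-space sketch of what a table of transmit policies with a "mark"
entry could look like.  It is not code from the patch or from
bond_main.c: struct sketch_frame, the sketch_* functions and the table
are hypothetical names made up for illustration, and the layer2 entry is
a simplified version of the documented (source MAC XOR destination MAC)
modulo slave count hash.  Callers are assumed to pass a non-zero slave
count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-frame inputs; this is not the kernel's struct sk_buff. */
struct sketch_frame {
	uint8_t src_mac[6];
	uint8_t dst_mac[6];
	uint32_t mark;
};

/* A policy maps a frame to a slave index in [0, slave_cnt). */
typedef unsigned int (*sketch_xmit_policy_fn)(const struct sketch_frame *f,
					       unsigned int slave_cnt);

/* Simplified layer2-style policy: XOR of the last MAC octets. */
static unsigned int sketch_policy_l2(const struct sketch_frame *f,
				     unsigned int slave_cnt)
{
	return (unsigned int)(f->src_mac[5] ^ f->dst_mac[5]) % slave_cnt;
}

/* Hypothetical mark policy: the mark alone chooses the slave. */
static unsigned int sketch_policy_mark(const struct sketch_frame *f,
				       unsigned int slave_cnt)
{
	return f->mark % slave_cnt;
}

struct sketch_policy_entry {
	const char *name;
	sketch_xmit_policy_fn fn;
};

static const struct sketch_policy_entry sketch_policies[] = {
	{ "layer2", sketch_policy_l2 },
	{ "mark",   sketch_policy_mark },
};

/* Look a policy up by name; returns NULL for an unknown name. */
static sketch_xmit_policy_fn sketch_lookup_policy(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_policies) / sizeof(sketch_policies[0]); i++)
		if (strcmp(sketch_policies[i].name, name) == 0)
			return sketch_policies[i].fn;
	return NULL;
}
```

	With a layout like this, balance-xor and 802.3ad would both call
whichever function the configured policy selects, so neither mode needs
its own copy of the mark logic.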

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH] bonding: add mark mode
  2009-05-01 20:31       ` Jay Vosburgh
@ 2009-05-01 20:39         ` Neil Horman
  0 siblings, 0 replies; 12+ messages in thread
From: Neil Horman @ 2009-05-01 20:39 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Andy Gospodarek, netdev, bonding-devel

On Fri, May 01, 2009 at 01:31:23PM -0700, Jay Vosburgh wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >> 	If I'm not misunderstanding the purpose of the mode, I think
> >> it's etherchannel compatible (meaning that the switch has to be
> >> configured), so I'm not sure why there would ever be a need to flood
> >> packets to all ports.
> >> 
> >It's entirely possible that there may be no need to flood frames, I was just
> >asking the question.  And while it is possible that this might be etherchannel
> >compatible (in fact I agree, it does look compatible), I can see uses for it
> >beyond that (active-backup with traffic-class load balancing for example).
> 
> 	For that last comment (active-backup), are you thinking in terms
> of N active slaves and one backup?
> 
I suppose.  I really didn't have anything specific in mind, I was just musing on
what potential this might have.  As Andy mentioned in his post, this could be
used as an active backup mode where the backup links can still carry traffic.

> >> 	I think this would generally be better as a special hash policy,
> >> in which case both the etherchannel (balance-xor) and 802.3ad modes
> >> could take advantage of it.  I'd hazard to guess that Andy thought about
> >> that, too, so what was the impediment?
> >
> >I honestly don't know, although I will say that making a special hash mode for
> >balance-xor seems a bit odd, since the implication there would be that output
> >port was no longer chosen by an xor operation.
> 
> 	Well, there are already three different hash algorithms that can
> be selected, and if the only thing stopping this functionality as some
> type of different slave selection gizmo is terminology, then I'm happy
> to change the name of "xmit_hash_policy" to simply "xmit_policy" (in a
> backwards compatible manner, of course) and slap this in as a new
> policy.  Yah, balance-xor has "xor" in the name, but that can either be
> documented around or, again, aliased as "balance-policy" or something.
> 
> 	I see no reason the terminology should stand in the way of
> setting things up The Right Way (which, in this discussion, revolves
> around this type of new slave selection stuff working for the largest
> reasonable set of modes, here, etherchannel and 802.3ad).
> 
I'll stay out of your way and let you and Andy hash out "The Right Way" :).  I
just like the idea of being able to select your output destination :)

Neil



* Re: [PATCH] bonding: add mark mode
  2009-05-01 20:25     ` Andy Gospodarek
@ 2009-05-01 20:44       ` Jay Vosburgh
  2009-05-01 21:06         ` Andy Gospodarek
  0 siblings, 1 reply; 12+ messages in thread
From: Jay Vosburgh @ 2009-05-01 20:44 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Neil Horman, netdev, bonding-devel

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
>> Neil Horman <nhorman@tuxdriver.com> wrote:
>> 
>> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
>> >> 
>> >> Quite a few people are happy with the way bonding manages to split
>> >> traffic to different members of a bond.  Many are quite disappointed
>> >> that users or administrators cannot be given more control over the
>> >> traffic distribution and would like something that they can control more
>> >> easily.  I looked at extending some of the existing modes, but the
>> >> cleanest option seemed to be one that created an additional mode to
>> >> handle this case.  I hated to create yet another mode, but the
>> >> simplicity of this mode made it a nice candidate for a new mode.  I have
>> >> decided to call this mode 'mark' (or mode 7).
>> >> 
>> >> The mark mode of bonding relies on the skb->mark field for outgoing
>> >> device selection.  Unmarked frames (ones where the mark is still zero),
>> >> will be sent by the first active enslaved device.  Any marked frames
>> >> will choose the outgoing device based on result of the modulo of the mark
>> >> and the number of enslaved devices.  If that device is inactive
>> >> (link-down), the traffic will default back to the first active enslaved
>> >> device.  I debated how to use the mark to decide the outgoing device,
>> >> but it seemed that modulo of the mark and the number of enslaved devices
>> >> would provide the most flexibility for those who currently mark frames
>> >> for other purposes.
>> >> 
>> >> I considered some other options for choosing destination devices based
>> >> on marks, but the ones I came up with would require additional sysfs
>> >> configuration parameters and I would prefer not to add any more to an
>> >> already crowded space.
>> >> 
>> >> I've tested this on a slightly older kernel than the net-next-2.6 tree
>> >> that this patch is against by marking frames using mark and connmark
>> >> iptables options and it seems to work as I expect.
>> >> 
>> >> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>> >> ---
>> >> 
>> >>  Documentation/networking/bonding.txt |   25 ++++++++++++++
>> >>  drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
>> >>  include/linux/if_bonding.h           |    1 
>> >>  3 files changed, 84 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>> >> index 0876275..7a0d4c2 100644
>> >> --- a/Documentation/networking/bonding.txt
>> >> +++ b/Documentation/networking/bonding.txt
>> >> @@ -582,6 +582,31 @@ mode
>> >>  		swapped with the new curr_active_slave that was
>> >>  		chosen.
>> >>  
>> >> +	mark or 7
>> >> +
>> >> +		Mark-based policy: skbuffs that arrive to be
>> >> +		transmitted will have the mark field inspected to
>> >> +		determine the destination slave device.  When the
>> >> +		skbuff's mark is zero, the first active device in the
>> >> +		ordered list of enslaved devices will be used.  When
>> >> +		the mark is non-zero the modulo of the mark and the
>> >> +		number of enslaved devices will determine the
>> >> +		interface used for transmission.  If this device is
>> >> +		not active (link-down) then the mark will essentially
>> >> +		be ignored and the first active device in the ordered
>> >> +		list of enslaved devices will be used.
>> >> +
>> >> +		The flexibility offered with this mode allows users
>> >> +		of netfilter to move various types of traffic to
>> >> +		different slaves quite easily.  Information on this
>> >> +		can be found in the manpages for iptables/ebtables
>> >> +		as well as netfilter documentation.
>> >> +
>> >> +		Prerequisites:
>> >> +
>> >> +		1.  Without the ability to mark skbuffs this mode is
>> >> +		not useful.  Netfilter greatly aides skbuff marking.
>> >> +
>> >>  num_grat_arp
>> >>  
>> >>  	Specifies the number of gratuitous ARPs to be issued after a
>> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> >> index fd73836..5e1d166 100644
>> >> --- a/drivers/net/bonding/bond_main.c
>> >> +++ b/drivers/net/bonding/bond_main.c
>> >> @@ -123,7 +123,7 @@ module_param(mode, charp, 0);
>> >>  MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
>> >>  		       "1 for active-backup, 2 for balance-xor, "
>> >>  		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
>> >> -		       "6 for balance-alb");
>> >> +		       "6 for balance-alb, 7 for mark");
>> >>  module_param(primary, charp, 0);
>> >>  MODULE_PARM_DESC(primary, "Primary network device to use");
>> >>  module_param(lacp_rate, charp, 0);
>> >> @@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
>> >>  {	"802.3ad",		BOND_MODE_8023AD},
>> >>  {	"balance-tlb",		BOND_MODE_TLB},
>> >>  {	"balance-alb",		BOND_MODE_ALB},
>> >> +{	"mark",			BOND_MODE_MARK},
>> >>  {	NULL,			-1},
>> >>  };
>> >>  
>> >> @@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
>> >>  		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
>> >>  		[BOND_MODE_TLB] = "transmit load balancing",
>> >>  		[BOND_MODE_ALB] = "adaptive load balancing",
>> >> +		[BOND_MODE_MARK] = "mark-based transmit balancing",
>> >>  	};
>> >>  
>> >>  	if (mode < 0 || mode > BOND_MODE_ALB)
>> >> @@ -4464,6 +4466,57 @@ out:
>> >>  	return 0;
>> >>  }
>> >>  
>> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
>> >> +{
>> >> +	struct bonding *bond = netdev_priv(bond_dev);
>> >> +	struct slave *slave;
>> >> +	int i, slave_no, res = 1;
>> >> +
>> >> +	read_lock(&bond->lock);
>> >> +
>> >> +	if (!BOND_IS_OK(bond)) {
>> >> +		goto out;
>> >> +	}
>> >> +
>> >> +	/* Use the mark as the determining factor for which slave to
>> >> +	 * choose for transmission.  When behaving normally all should
>> >> +	 * work just fine.  When a slave that is destined to be the
>> >> +	 * transmitter of this frame is down, start at the front of the
>> >> +	 * list and find the first available slave. */
>> 
>> 	Why not simply use the N'th up slave instead of reverting to
>> slave 0 for the down slave case?  I'm guessing this has to do with
>> trying to maintain some fixed balancing of traffic even in the face of
>> slave failure.
>> 
>
>That could be a reasonable option as well -- in fact that was my first
>implementation.  I decided to do it this way, so that a user could
>consider the first device as the primary slave for unmarked traffic and
>the slave for marked traffic destined for a slave that was down rather
>than what might seem like a random choice for traffic.

	Ok, understood.
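
	For comparison, one possible reading of the "N'th up slave"
alternative is sketched below as plain user-space C.  It is an
assumption about what that alternative could look like, not Andy's first
implementation (which is not shown in this thread); the function name
and the link_up array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical alternative: index only the slaves whose link is up.
 * link_up[i] is non-zero when slave i is usable.
 * Returns the chosen slave index, or -1 when every slave is down.
 */
static int sketch_nth_up_slave(const int *link_up, size_t slave_cnt,
			       uint32_t mark)
{
	size_t i, up_cnt = 0, wanted;

	for (i = 0; i < slave_cnt; i++)
		if (link_up[i])
			up_cnt++;
	if (up_cnt == 0)
		return -1;

	/* Take the modulo over the up slaves, not over all slaves. */
	wanted = mark % up_cnt;
	for (i = 0; i < slave_cnt; i++) {
		if (!link_up[i])
			continue;
		if (wanted == 0)
			return (int)i;
		wanted--;
	}
	return -1;
}
```

	The trade-off is visible here: a single link failure changes
up_cnt, so the slave chosen for almost every mark shifts, whereas
falling back to the first slave only moves the traffic that was bound
for the failed link.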

>> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
>> >> +
>> >Would it be worthwhile to add a special case here (say all f's in mark) to
>> >indicate a frame should be sent out all slaves on the bond?  In the case you
>> >have traffic that might need to go to all interfaces (like maybe igmp)?
>> 
>> 	If I'm not misunderstanding the purpose of the mode, I think
>> it's etherchannel compatible (meaning that the switch has to be
>> configured), so I'm not sure why there would ever be a need to flood
>> packets to all ports.
>> 
>> 	I think this would generally be better as a special hash policy,
>> in which case both the etherchannel (balance-xor) and 802.3ad modes
>> could take advantage of it.  I'd hazard to guess that Andy thought about
>> that, too, so what was the impediment?
>> 
>
>I wasn't designing this to be etherchannel compatible.  Though it could
>work that way, my idea was to use this in an environment where slaves
>may be connected to different switches that are capable of moving all of
>the traffic to its destination if needed, but if the traffic is split
>in a predictable application-specific manner the network can run more
>efficiently.  Think of this more as an active-backup mode where the
>backup links could actually be utilized.

	Hmm.  If the slaves don't connect to the same places, why use
bonding at all, then?  Or, perhaps, two smaller bonds instead of one big
one.

	Are you thinking of something like a load balancer back end, or
am I missing the point entirely?

	And, really, active-backup using the backup links is just a load
balancing mode.  I'm not sure I see the special distinction there.

>When I first started on this I considered just extending the
>active-backup mode's transmit function, but I didn't want to deal with
>the potential loss of duplicate frames in skb_bond_should_drop and I
>also didn't want to deal with the complicated failover scenarios of
>active-backup mode.
>
>It's probably the most like round-robin (in fact it's quite close), but
>I felt that this mode could eventually grow into something more useful
>(who knows how it might be extended) and deserved its own mode rather
>than dealing with the possibility of having a larger more complicated
>transmit routine in the future when another mode would be just fine.

	I'm still not quite sure how this differs from balance-xor mode
with a hypothetical "xmit_hash_policy=mark" sort of arrangement, other
than, perhaps, intended purpose.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH] bonding: add mark mode
  2009-05-01 20:44       ` Jay Vosburgh
@ 2009-05-01 21:06         ` Andy Gospodarek
  2009-05-01 21:51           ` Jay Vosburgh
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Gospodarek @ 2009-05-01 21:06 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Andy Gospodarek, Neil Horman, netdev, bonding-devel

On Fri, May 01, 2009 at 01:44:53PM -0700, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> >On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
> >> Neil Horman <nhorman@tuxdriver.com> wrote:
> >> 
> >> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
> >> >> 
> >> >> Quite a few people are happy with the way bonding manages to split
> >> >> traffic to different members of a bond.  Many are quite disappointed
> >> >> that users or administrators cannot be given more control over the
> >> >> traffic distribution and would like something that they can control more
> >> >> easily.  I looked at extending some of the existing modes, but the
> >> >> cleanest option seemed to be one that created an additional mode to
> >> >> handle this case.  I hated to create yet another mode, but the
> >> >> simplicity of this mode made it a nice candidate for a new mode.  I have
> >> >> decided to call this mode 'mark' (or mode 7).
> >> >> 
> >> >> The mark mode of bonding relies on the skb->mark field for outgoing
> >> >> device selection.  Unmarked frames (ones where the mark is still zero),
> >> >> will be sent by the first active enslaved device.  Any marked frames
> >> >> will choose the outgoing device based on result of the modulo of the mark
> >> >> and the number of enslaved devices.  If that device is inactive
> >> >> (link-down), the traffic will default back to the first active enslaved
> >> >> device.  I debated how to use the mark to decide the outgoing device,
> >> >> but it seemed that modulo of the mark and the number of enslaved devices
> >> >> would provide the most flexibility for those who currently mark frames
> >> >> for other purposes.
> >> >> 
> >> >> I considered some other options for choosing destination devices based
> >> >> on marks, but the ones I came up with would require additional sysfs
> >> >> configuration parameters and I would prefer not to add any more to an
> >> >> already crowded space.
> >> >> 
> >> >> I've tested this on a slightly older kernel than the net-next-2.6 tree
> >> >> that this patch is against by marking frames using mark and connmark
> >> >> iptables options and it seems to work as I expect.
> >> >> 
> >> >> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> >> >> ---
> >> >> 
> >> >>  Documentation/networking/bonding.txt |   25 ++++++++++++++
> >> >>  drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
> >> >>  include/linux/if_bonding.h           |    1 
> >> >>  3 files changed, 84 insertions(+), 1 deletion(-)
> >> >> 
> >> >> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> >> >> index 0876275..7a0d4c2 100644
> >> >> --- a/Documentation/networking/bonding.txt
> >> >> +++ b/Documentation/networking/bonding.txt
> >> >> @@ -582,6 +582,31 @@ mode
> >> >>  		swapped with the new curr_active_slave that was
> >> >>  		chosen.
> >> >>  
> >> >> +	mark or 7
> >> >> +
> >> >> +		Mark-based policy: skbuffs that arrive to be
> >> >> +		transmitted will have the mark field inspected to
> >> >> +		determine the destination slave device.  When the
> >> >> +		skbuff's mark is zero, the first active device in the
> >> >> +		ordered list of enslaved devices will be used.  When
> >> >> +		the mark is non-zero the modulo of the mark and the
> >> >> +		number of enslaved devices will determine the
> >> >> +		interface used for transmission.  If this device is
> >> >> +		not active (link-down) then the mark will essentially
> >> >> +		be ignored and the first active device in the ordered
> >> >> +		list of enslaved devices will be used.
> >> >> +
> >> >> +		The flexibility offered with this mode allows users
> >> >> +		of netfilter to move various types of traffic to
> >> >> +		different slaves quite easily.  Information on this
> >> >> +		can be found in the manpages for iptables/ebtables
> >> >> +		as well as netfilter documentation.
> >> >> +
> >> >> +		Prerequisites:
> >> >> +
> >> >> +		1.  Without the ability to mark skbuffs this mode is
> >> >> +		not useful.  Netfilter greatly aides skbuff marking.
> >> >> +
> >> >>  num_grat_arp
> >> >>  
> >> >>  	Specifies the number of gratuitous ARPs to be issued after a
> >> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> >> index fd73836..5e1d166 100644
> >> >> --- a/drivers/net/bonding/bond_main.c
> >> >> +++ b/drivers/net/bonding/bond_main.c
> >> >> @@ -123,7 +123,7 @@ module_param(mode, charp, 0);
> >> >>  MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
> >> >>  		       "1 for active-backup, 2 for balance-xor, "
> >> >>  		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
> >> >> -		       "6 for balance-alb");
> >> >> +		       "6 for balance-alb, 7 for mark");
> >> >>  module_param(primary, charp, 0);
> >> >>  MODULE_PARM_DESC(primary, "Primary network device to use");
> >> >>  module_param(lacp_rate, charp, 0);
> >> >> @@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
> >> >>  {	"802.3ad",		BOND_MODE_8023AD},
> >> >>  {	"balance-tlb",		BOND_MODE_TLB},
> >> >>  {	"balance-alb",		BOND_MODE_ALB},
> >> >> +{	"mark",			BOND_MODE_MARK},
> >> >>  {	NULL,			-1},
> >> >>  };
> >> >>  
> >> >> @@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
> >> >>  		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
> >> >>  		[BOND_MODE_TLB] = "transmit load balancing",
> >> >>  		[BOND_MODE_ALB] = "adaptive load balancing",
> >> >> +		[BOND_MODE_MARK] = "mark-based transmit balancing",
> >> >>  	};
> >> >>  
> >> >>  	if (mode < 0 || mode > BOND_MODE_ALB)
> >> >> @@ -4464,6 +4466,57 @@ out:
> >> >>  	return 0;
> >> >>  }
> >> >>  
> >> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
> >> >> +{
> >> >> +	struct bonding *bond = netdev_priv(bond_dev);
> >> >> +	struct slave *slave;
> >> >> +	int i, slave_no, res = 1;
> >> >> +
> >> >> +	read_lock(&bond->lock);
> >> >> +
> >> >> +	if (!BOND_IS_OK(bond)) {
> >> >> +		goto out;
> >> >> +	}
> >> >> +
> >> >> +	/* Use the mark as the determining factor for which slave to
> >> >> +	 * choose for transmission.  When behaving normally all should
> >> >> +	 * work just fine.  When a slave that is destined to be the
> >> >> +	 * transmitter of this frame is down, start at the front of the
> >> >> +	 * list and find the first available slave. */
> >> 
> >> 	Why not simply use the N'th up slave instead of reverting to
> >> slave 0 for the down slave case?  I'm guessing this has to do with
> >> trying to maintain some fixed balancing of traffic even in the face of
> >> slave failure.
> >> 
> >
> >That could be a reasonable option as well -- in fact that was my first
> >implementation.  I decided to do it this way, so that a user could
> >consider the first device as the primary slave for unmarked traffic and
> >the slave for marked traffic destined for a slave that was down rather
> >than what might seem like a random choice for traffic.
> 
> 	Ok, understood.
> 
> >> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
> >> >> +
> >> >Would it be worthwhile to add a special case here (say all f's in mark) to
> >> >indicate a frame should be sent out all slaves on the bond?  In the case you
> >> >have traffic that might need to go to all interfaces (like maybe igmp)?
> >> 
> >> 	If I'm not misunderstanding the purpose of the mode, I think
> >> it's etherchannel compatible (meaning that the switch has to be
> >> configured), so I'm not sure why there would ever be a need to flood
> >> packets to all ports.
> >> 
> >> 	I think this would generally be better as a special hash policy,
> >> in which case both the etherchannel (balance-xor) and 802.3ad modes
> >> could take advantage of it.  I'd hazard to guess that Andy thought about
> >> that, too, so what was the impediment?
> >> 
> >
> >I wasn't designing this to be etherchannel compatible.  Though it could
> >work that way, my idea was to use this in an environment where slaves
> >may be connected to different switches that are capable of moving all of
> >the traffic to its destination if needed, but if the traffic is split
> >in a predictable application-specific manner the network can run more
> >efficiently.  Think of this more as an active-backup mode where the
> >backup links could actually be utilized.
> 
> 	Hmm.  If the slaves don't connect to the same places, why use
> bonding at all, then?  Or, perhaps, two smaller bonds instead of one big
> one.
> 
> 	Are you thinking of something like a load balancer back end, or
> am I missing the point entirely?
> 
> 	And, really, active-backup using the backup links is just a load
> balancing mode.  I'm not sure I see the special distinction there.
> 

OK, let's forget I mentioned active-backup. :)

The best way to think about this is a user-configured load-balancing
mode.  It can work with switches that are unaware of the existence of
the bond.  This marking mode transmit could also be used for xor and
802.3ad if there was a desire, but my original intent (and continued
desire) was to create a mode where the user controlled the load balancing and
the option of using single or multiple switches could exist.  I've
certainly seen network topologies where this would be useful.  I can take
some time and draw one of those up if you like (I'm just pressed for
time right now).
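
In the meantime, the selection rule itself is small enough to sketch
outside the kernel.  The following is a user-space approximation of what
the patch description says (mark modulo the number of slaves, falling
back to the first slave with link when the chosen one is down).  It is
not the bond_xmit_mark() body from the patch; struct sketch_slave and
sketch_select_slave() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bond slave; not the kernel's struct slave. */
struct sketch_slave {
	int link_up;	/* non-zero when the link is up */
};

/*
 * Choose the transmit slave for a frame carrying `mark`.
 * Returns the slave index, or -1 when no slave can transmit.
 */
static int sketch_select_slave(const struct sketch_slave *slaves,
			       size_t slave_cnt, uint32_t mark)
{
	size_t i, wanted;

	if (slave_cnt == 0)
		return -1;

	/* Unmarked frames (mark == 0) prefer the first slave. */
	wanted = mark ? mark % slave_cnt : 0;
	if (slaves[wanted].link_up)
		return (int)wanted;

	/* Preferred slave is down: use the first slave that has link. */
	for (i = 0; i < slave_cnt; i++)
		if (slaves[i].link_up)
			return (int)i;

	return -1;
}
```

With three slaves, marks 1 and 4 both select the second slave, and any
mark that maps to a down slave lands on the first slave that still has
link, which is what lets the first device act as the predictable
default.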

> >When I first started on this I considered just extending the
> >active-backup mode's transmit function, but I didn't want to deal with
> >the potential loss of duplicate frames in skb_bond_should_drop and I
> >also didn't want to deal with the complicated failover scenarios of
> >active-backup mode.
> >
> >It's probably the most like round-robin (in fact it's quite close), but
> >I felt that this mode could eventually grow into something more useful
> >(who knows how it might be extended) and deserved its own mode rather
> >than dealing with the possibility of having a larger more complicated
> >transmit routine in the future when another mode would be just fine.
> 
> 	I'm still not quite sure how this differs from balance-xor mode
> with a hypothetical "xmit_hash_policy=mark" sort of arrangement, other
> than, perhaps, intended purpose.
> 

XOR is designed to work with switches that are aware of the bond, right?
This does not have the same requirement and I wanted to make sure it
could work with switches that were unaware (or possibly separate
switches).


* Re: [PATCH] bonding: add mark mode
  2009-05-01 21:06         ` Andy Gospodarek
@ 2009-05-01 21:51           ` Jay Vosburgh
  2009-05-15 20:38             ` Andy Gospodarek
  0 siblings, 1 reply; 12+ messages in thread
From: Jay Vosburgh @ 2009-05-01 21:51 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Neil Horman, netdev, bonding-devel

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Fri, May 01, 2009 at 01:44:53PM -0700, Jay Vosburgh wrote:
>> Andy Gospodarek <andy@greyhouse.net> wrote:
>> 
>> >On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
>> >> Neil Horman <nhorman@tuxdriver.com> wrote:
>> >> 
>> >> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
>> >> >> 
>> >> >> Quite a few people are happy with the way bonding manages to split
>> >> >> traffic to different members of a bond.  Many are quite disappointed
>> >> >> that users or administrators cannot be given more control over the
>> >> >> traffic distribution and would like something that they can control more
>> >> >> easily.  I looked at extending some of the existing modes, but the
>> >> >> cleanest option seemed to be one that created an additional mode to
>> >> >> handle this case.  I hated to create yet another mode, but the
>> >> >> simplicity of this mode made it a nice candidate for a new mode.  I have
>> >> >> decided to call this mode 'mark' (or mode 7).
>> >> >> 
>> >> >> The mark mode of bonding relies on the skb->mark field for outgoing
>> >> >> device selection.  Unmarked frames (ones where the mark is still zero),
>> >> >> will be sent by the first active enslaved device.  Any marked frames
>> >> >> will choose the outgoing device based on result of the modulo of the mark
>> >> >> and the number of enslaved devices.  If that device is inactive
>> >> >> (link-down), the traffic will default back to the first active enslaved
>> >> >> device.  I debated how to use the mark to decide the outgoing device,
>> >> >> but it seemed that modulo of the mark and the number of enslaved devices
>> >> >> would provide the most flexibility for those who currently mark frames
>> >> >> for other purposes.
>> >> >> 
>> >> >> I considered some other options for choosing destination devices based
>> >> >> on marks, but the ones I came up with would require additional sysfs
>> >> >> configuration parameters and I would prefer not to add any more to an
>> >> >> already crowded space.
>> >> >> 
>> >> >> I've tested this on a slightly older kernel than the net-next-2.6 tree
>> >> >> that this patch is against by marking frames using mark and connmark
>> >> >> iptables options and it seems to work as I expect.
>> >> >> 
>> >> >> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>> >> >> ---
>> >> >> 
>> >> >>  Documentation/networking/bonding.txt |   25 ++++++++++++++
>> >> >>  drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
>> >> >>  include/linux/if_bonding.h           |    1 
>> >> >>  3 files changed, 84 insertions(+), 1 deletion(-)
>> >> >> 
>> >> >> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>> >> >> index 0876275..7a0d4c2 100644
>> >> >> --- a/Documentation/networking/bonding.txt
>> >> >> +++ b/Documentation/networking/bonding.txt
>> >> >> @@ -582,6 +582,31 @@ mode
>> >> >>  		swapped with the new curr_active_slave that was
>> >> >>  		chosen.
>> >> >>  
>> >> >> +	mark or 7
>> >> >> +
>> >> >> +		Mark-based policy: skbuffs that arrive to be
>> >> >> +		transmitted will have the mark field inspected to
>> >> >> +		determine the destination slave device.  When the
>> >> >> +		skbuff's mark is zero, the first active device in the
>> >> >> +		ordered list of enslaved devices will be used.  When
>> >> >> +		the mark is non-zero the modulo of the mark and the
>> >> >> +		number of enslaved devices will determine the
>> >> >> +		interface used for transmission.  If this device is
>> >> >> +		not active (link-down) then the mark will essentially
>> >> >> +		be ignored and the first active device in the ordered
>> >> >> +		list of enslaved devices will be used.
>> >> >> +
>> >> >> +		The flexibility offered with this mode allows users
>> >> >> +		of netfilter to move various types of traffic to
>> >> >> +		different slaves quite easily.  Information on this
>> >> >> +		can be found in the manpages for iptables/ebtables
>> >> >> +		as well as netfilter documentation.
>> >> >> +
>> >> >> +		Prerequisites:
>> >> >> +
>> >> >> +		1.  Without the ability to mark skbuffs this mode is
>> >> >> +		not useful.  Netfilter greatly aides skbuff marking.
>> >> >> +
>> >> >>  num_grat_arp
>> >> >>  
>> >> >>  	Specifies the number of gratuitous ARPs to be issued after a
>> >> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> >> >> index fd73836..5e1d166 100644
>> >> >> --- a/drivers/net/bonding/bond_main.c
>> >> >> +++ b/drivers/net/bonding/bond_main.c
>> >> >> @@ -123,7 +123,7 @@ module_param(mode, charp, 0);
>> >> >>  MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
>> >> >>  		       "1 for active-backup, 2 for balance-xor, "
>> >> >>  		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
>> >> >> -		       "6 for balance-alb");
>> >> >> +		       "6 for balance-alb, 7 for mark");
>> >> >>  module_param(primary, charp, 0);
>> >> >>  MODULE_PARM_DESC(primary, "Primary network device to use");
>> >> >>  module_param(lacp_rate, charp, 0);
>> >> >> @@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
>> >> >>  {	"802.3ad",		BOND_MODE_8023AD},
>> >> >>  {	"balance-tlb",		BOND_MODE_TLB},
>> >> >>  {	"balance-alb",		BOND_MODE_ALB},
>> >> >> +{	"mark",			BOND_MODE_MARK},
>> >> >>  {	NULL,			-1},
>> >> >>  };
>> >> >>  
>> >> >> @@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
>> >> >>  		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
>> >> >>  		[BOND_MODE_TLB] = "transmit load balancing",
>> >> >>  		[BOND_MODE_ALB] = "adaptive load balancing",
>> >> >> +		[BOND_MODE_MARK] = "mark-based transmit balancing",
>> >> >>  	};
>> >> >>  
>> >> >>  	if (mode < 0 || mode > BOND_MODE_ALB)
>> >> >> @@ -4464,6 +4466,57 @@ out:
>> >> >>  	return 0;
>> >> >>  }
>> >> >>  
>> >> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
>> >> >> +{
>> >> >> +	struct bonding *bond = netdev_priv(bond_dev);
>> >> >> +	struct slave *slave;
>> >> >> +	int i, slave_no, res = 1;
>> >> >> +
>> >> >> +	read_lock(&bond->lock);
>> >> >> +
>> >> >> +	if (!BOND_IS_OK(bond)) {
>> >> >> +		goto out;
>> >> >> +	}
>> >> >> +
>> >> >> +	/* Use the mark as the determining factor for which slave to
>> >> >> +	 * choose for transmission.  When behaving normally all should
>> >> >> +	 * work just fine.  When a slave that is destined to be the
>> >> >> +	 * transmitter of this frame is down, start at the front of the
>> >> >> +	 * list and find the first available slave. */
>> >> 
>> >> 	Why not simply use the N'th up slave instead of reverting to
>> >> slave 0 for the down slave case?  I'm guessing this has to do with
>> >> trying to maintain some fixed balancing of traffic even in the face of
>> >> slave failure.
>> >> 
>> >
>> >That could be a reasonable option as well -- in fact that was my first
>> >implementation.  I decided to do it this way, so that a user could
>> >consider the first device as the primary slave for unmarked traffic and
>> >the slave for marked traffic destined for a slave that was down rather
>> >than what might seem like a random choice for traffic.
>> 
>> 	Ok, understood.
>> 
>> >> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
>> >> >> +
>> >> >Would it be worthwhile to add a special case here (say all f's in mark) to
>> >> >indicate a frame should be sent out all slaves on the bond?  In the case you
>> >> >have traffic that might need to go to all interfaces (like maybe igmp)?
>> >> 
>> >> 	If I'm not misunderstanding the purpose of the mode, I think
>> >> it's etherchannel compatible (meaning that the switch has to be
>> >> configured), so I'm not sure why there would ever be a need to flood
>> >> packets to all ports.
>> >> 
>> >> 	I think this would generally be better as a special hash policy,
>> >> in which case both the etherchannel (balance-xor) and 802.3ad modes
>> >> could take advantage of it.  I'd hazard to guess that Andy thought about
>> >> that, too, so what was the impediment?
>> >> 
>> >
>> >I wasn't designing this to be etherchannel compatible.  Though it could
>> >work that way, my idea was to use this in an environment where slaves
>> >may be connected to different switches that are capable of moving all of
>> >the traffic to its destination if needed, but if the traffic is split
>> >in a predictable application specific manner the network can run more
>> >efficiently.  Think of this more as an active-backup mode where the
>> >backup links could actually be utilized.
>> 
>> 	Hmm.  If the slaves don't connect to the same places, why use
>> bonding at all, then?  Or, perhaps, two smaller bonds instead of one big
>> one.
>> 
>> 	Are you thinking of something like a load balancer back end, or
>> am I missing the point entirely?
>> 
>> 	And, really, active-backup using the backup links is just a load
>> balancing mode.  I'm not sure I see the special distinction there.
>> 
>
>OK, let's forget I mentioned active-backup. :)

	Fair enough!

>The best way to think about this is a user-configured load-balancing
>mode.  It can work with switches that are unaware of the existence of
>the bond.  This marking mode transmit could also be used for xor and
>802.3ad if there was a desire, but my original intent (and continued
>desire) was to create a mode where the user controlled the load balancing and
>the option of using single or multiple switches could exist.  I've
>certainly seen network topologies where this would be useful.  I can take
>some time and draw one of those up if you like (I'm just pressed for
>time right now).

	I'm pretty sure there would be interest for a mark policy in
802.3ad.

>> >When I first started on this I considered just extending the
>> >active-backup mode's transmit function, but I didn't want to deal with
>> >the potential loss of duplicate frames in skb_bond_should_drop and I
>> >also didn't want to deal with the complicated failover scenarios of
>> >active-backup mode.
>> >
>> >It's probably the most like round-robin (in fact it's quite close), but
>> >I felt that this mode could eventually grow into something more useful
>> >(who knows how it might be extended) and deserved its own mode rather
>> >than dealing with the possibility of having a larger more complicated
>> >transmit routine in the future when another mode would be just fine.
>> 
>> 	I'm still not quite sure how this differs from balance-xor mode
>> with a hypothetical "xmit_hash_policy=mark" sort of arrangement, other
>> than, perhaps, intended purpose.
>> 
>
>XOR is designed to work with switches that are aware of the bond, right?
>This does not have the same requirement and I wanted to make sure it
>could work with switches that were unaware (or possibly separate
>switches).

	Well, theoretically, yes, but all "etherchannel compatible"
means is that all of the slaves are set to the same MAC address, and
we're prepared to accept traffic for the bond from any of the slaves.
Your new mode does both of those, and etherchannel compatible doesn't
mean "won't work without etherchannel."  There's no voodoo for
etherchannel beyond this (it's really not very sophisticated).

	The 802.3ad protocol, on the other hand, does have what you're
worried about: handshaking and various chit chat between the bond and the
switch.

	The xor mode would work fine, for example, in a configuration
with two switches connecting to discrete networks, and arranged for the
hash (or mark, hint hint) to put the traffic for each network on the
proper slave.  If the networks converge somewhere down the line, there
may be loop issues, but that's not really a new thing.  If the networks
don't converge, then each of the switches just sees its own little
etherchannel compatible world (of however many slaves are plugged into
that switch).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH] bonding: add mark mode
  2009-05-01 21:51           ` Jay Vosburgh
@ 2009-05-15 20:38             ` Andy Gospodarek
  2009-05-15 21:58               ` Jay Vosburgh
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Gospodarek @ 2009-05-15 20:38 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Andy Gospodarek, Neil Horman, netdev, bonding-devel

On Fri, May 01, 2009 at 02:51:42PM -0700, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> >On Fri, May 01, 2009 at 01:44:53PM -0700, Jay Vosburgh wrote:
> >> Andy Gospodarek <andy@greyhouse.net> wrote:
> >> 
> >> >On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
> >> >> Neil Horman <nhorman@tuxdriver.com> wrote:
> >> >> 
> >> >> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
> >> >> >> 
> >> >> >> Quite a few people are happy with the way bonding manages to split
> >> >> >> traffic to different members of a bond.  Many are quite disappointed
> >> >> >> that users or administrators cannot be given more control over the
> >> >> >> traffic distribution and would like something that they can control more
> >> >> >> easily.  I looked at extending some of the existing modes, but the
> >> >> >> cleanest option seemed to be one that created an additional mode to
> >> >> >> handle this case.  I hated to create yet another mode, but the
> >> >> >> simplicity of this mode made it a nice candidate for a new mode.  I have
> >> >> >> decided to call this mode 'mark' (or mode 7).
> >> >> >> 
> >> >> >> The mark mode of bonding relies on the skb->mark field for outgoing
> >> >> >> device selection.  Unmarked frames (ones where the mark is still zero),
> >> >> >> will be sent by the first active enslaved device.  Any marked frames
> >> >> >> will choose the outgoing device based on result of the modulo of the mark
> >> >> >> and the number of enslaved devices.  If that device is inactive
> >> >> >> (link-down), the traffic will default back to the first active enslaved
> >> >> >> device.  I debated how to use the mark to decide the outgoing device,
> >> >> >> but it seemed that modulo of the mark and the number of enslaved devices
> >> >> >> would provide the most flexibility for those who currently mark frames
> >> >> >> for other purposes.
> >> >> >> 
> >> >> >> I considered some other options for choosing destination devices based
> >> >> >> on marks, but the ones I came up would require additional sysfs
> >> >> >> configuration parameters and I would prefer not to add any more to an
> >> >> >> already crowded space.
> >> >> >> 
> >> >> >> I've tested this on a slightly older kernel than the net-next-2.6 tree
> >> >> >> than this patch is against by marking frames using mark and connmark
> >> >> >> iptables options and it seems to work as I expect.
> >> >> >> 
> >> >> >> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> >> >> >> ---
> >> >> >> 
> >> >> >>  Documentation/networking/bonding.txt |   25 ++++++++++++++
> >> >> >>  drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
> >> >> >>  include/linux/if_bonding.h           |    1 
> >> >> >>  3 files changed, 84 insertions(+), 1 deletion(-)
> >> >> >> 
> >> >> >> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> >> >> >> index 0876275..7a0d4c2 100644
> >> >> >> --- a/Documentation/networking/bonding.txt
> >> >> >> +++ b/Documentation/networking/bonding.txt
> >> >> >> @@ -582,6 +582,31 @@ mode
> >> >> >>  		swapped with the new curr_active_slave that was
> >> >> >>  		chosen.
> >> >> >>  
> >> >> >> +	mark or 7
> >> >> >> +
> >> >> >> +		Mark-based policy: skbuffs that arrive to be
> >> >> >> +		transmitted will have the mark field inspected to
> >> >> >> +		determine the destination slave device.  When the
> >> >> >> +		skbuff's mark is zero, the first active device in the
> >> >> >> +		ordered list of enslaved devices will be used.  When
> >> >> >> +		the mark is non-zero the modulo of the mark and the
> >> >> >> +		number of enslaved devices will determine the
> >> >> >> +		interface used for transmission.  If this device is
> >> >> >> +		not active (link-down) then the mark will essentially
> >> >> >> +		be ignored and the first active device in the ordered
> >> >> >> +		list of enslaved devices will be used.
> >> >> >> +
> >> >> >> +		The flexibility offered with this mode allows users
> >> >> >> +		of netfilter to move various types of traffic to
> >> >> >> +		different slaves quite easily.  Information on this
> >> >> >> +		can be found in the manpages for iptables/ebtables
> >> >> >> +		as well as netfilter documentation.
> >> >> >> +
> >> >> >> +		Prerequisites:
> >> >> >> +
> >> >> >> +		1.  Without the ability to mark skbuffs this mode is
> >> >> >> +		not useful.  Netfilter greatly aides skbuff marking.
> >> >> >> +
> >> >> >>  num_grat_arp
> >> >> >>  
> >> >> >>  	Specifies the number of gratuitous ARPs to be issued after a
> >> >> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> >> >> index fd73836..5e1d166 100644
> >> >> >> --- a/drivers/net/bonding/bond_main.c
> >> >> >> +++ b/drivers/net/bonding/bond_main.c
> >> >> >> @@ -123,7 +123,7 @@ module_param(mode, charp, 0);
> >> >> >>  MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
> >> >> >>  		       "1 for active-backup, 2 for balance-xor, "
> >> >> >>  		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
> >> >> >> -		       "6 for balance-alb");
> >> >> >> +		       "6 for balance-alb, 7 for mark");
> >> >> >>  module_param(primary, charp, 0);
> >> >> >>  MODULE_PARM_DESC(primary, "Primary network device to use");
> >> >> >>  module_param(lacp_rate, charp, 0);
> >> >> >> @@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
> >> >> >>  {	"802.3ad",		BOND_MODE_8023AD},
> >> >> >>  {	"balance-tlb",		BOND_MODE_TLB},
> >> >> >>  {	"balance-alb",		BOND_MODE_ALB},
> >> >> >> +{	"mark",			BOND_MODE_MARK},
> >> >> >>  {	NULL,			-1},
> >> >> >>  };
> >> >> >>  
> >> >> >> @@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
> >> >> >>  		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
> >> >> >>  		[BOND_MODE_TLB] = "transmit load balancing",
> >> >> >>  		[BOND_MODE_ALB] = "adaptive load balancing",
> >> >> >> +		[BOND_MODE_MARK] = "mark-based transmit balancing",
> >> >> >>  	};
> >> >> >>  
> >> >> >>  	if (mode < 0 || mode > BOND_MODE_ALB)
> >> >> >> @@ -4464,6 +4466,57 @@ out:
> >> >> >>  	return 0;
> >> >> >>  }
> >> >> >>  
> >> >> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
> >> >> >> +{
> >> >> >> +	struct bonding *bond = netdev_priv(bond_dev);
> >> >> >> +	struct slave *slave;
> >> >> >> +	int i, slave_no, res = 1;
> >> >> >> +
> >> >> >> +	read_lock(&bond->lock);
> >> >> >> +
> >> >> >> +	if (!BOND_IS_OK(bond)) {
> >> >> >> +		goto out;
> >> >> >> +	}
> >> >> >> +
> >> >> >> +	/* Use the mark as the determining factor for which slave to
> >> >> >> +	 * choose for transmission.  When behaving normally all should
> >> >> >> +	 * work just fine.  When a slave that is destined to be the
> >> >> >> +	 * transmitter of this frame is down, start at the front of the
> >> >> >> +	 * list and find the first available slave. */
> >> >> 
> >> >> 	Why not simply use the N'th up slave instead of reverting to
> >> >> slave 0 for the down slave case?  I'm guessing this has to do with
> >> >> trying to maintain some fixed balancing of traffic even in the face of
> >> >> slave failure.
> >> >> 
> >> >
> >> >That could be a reasonable option as well -- in fact that was my first
> >> >implementation.  I decided to do it this way, so that a user could
> >> >consider the first device as the primary slave for unmarked traffic and
> >> >the slave for marked traffic destined for a slave that was down rather
> >> >than what might seem like a random choice for traffic.
> >> 
> >> 	Ok, understood.
> >> 
> >> >> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
> >> >> >> +
> >> >> >Would it be worthwhile to add a special case here (say all f's in mark) to
> >> >> >indicate a frame should be sent out all slaves on the bond?  In the case you
> >> >> >have traffic that might need to go to all interfaces (like maybe igmp)?
> >> >> 
> >> >> 	If I'm not misunderstanding the purpose of the mode, I think
> >> >> it's etherchannel compatible (meaning that the switch has to be
> >> >> configured), so I'm not sure why there would ever be a need to flood
> >> >> packets to all ports.
> >> >> 
> >> >> 	I think this would generally be better as a special hash policy,
> >> >> in which case both the etherchannel (balance-xor) and 802.3ad modes
> >> >> could take advantage of it.  I'd hazard to guess that Andy thought about
> >> >> that, too, so what was the impediment?
> >> >> 
> >> >
> >> >I wasn't designing this to be etherchannel compatible.  Though it could
> >> >work that way, my idea was to use this in an environment where slaves
> >> >may be connected to different switches that are capable of moving all of
> >> >the traffic to its destination if needed, but if the traffic is split
> >> >in a predictable application specific manner the network can run more
> >> >efficiently.  Think of this more as an active-backup mode where the
> >> >backup links could actually be utilized.
> >> 
> >> 	Hmm.  If the slaves don't connect to the same places, why use
> >> bonding at all, then?  Or, perhaps, two smaller bonds instead of one big
> >> one.
> >> 
> >> 	Are you thinking of something like a load balancer back end, or
> >> am I missing the point entirely?
> >> 
> >> 	And, really, active-backup using the backup links is just a load
> >> balancing mode.  I'm not sure I see the special distinction there.
> >> 
> >
> >OK, let's forget I mentioned active-backup. :)
> 
> 	Fair enough!
> 
> >The best way to think about this is a user-configured load-balancing
> >mode.  It can work with switches that are unaware of the existence of
> >the bond.  This marking mode transmit could also be used for xor and
> >802.3ad if there was a desire, but my original intent (and continued
> >desire) was to create a mode where the user controlled the load balancing and
> >the option of using single or multiple switches could exist.  I've
> >certainly seen network topologies where this would be useful.  I can take
> >some time and draw one of those up if you like (I'm just pressed for
> >time right now).
> 
> 	I'm pretty sure there would be interest for a mark policy in
> 802.3ad.
> 
> >> >When I first started on this I considered just extending the
> >> >active-backup mode's transmit function, but I didn't want to deal with
> >> >the potential loss of duplicate frames in skb_bond_should_drop and I
> >> >also didn't want to deal with the complicated failover scenarios of
> >> >active-backup mode.
> >> >
> >> >It's probably the most like round-robin (in fact it's quite close), but
> >> >I felt that this mode could eventually grow into something more useful
> >> >(who knows how it might be extended) and deserved its own mode rather
> >> >than dealing with the possibility of having a larger more complicated
> >> >transmit routine in the future when another mode would be just fine.
> >> 
> >> 	I'm still not quite sure how this differs from balance-xor mode
> >> with a hypothetical "xmit_hash_policy=mark" sort of arrangement, other
> >> than, perhaps, intended purpose.
> >> 
> >
> >XOR is designed to work with switches that are aware of the bond, right?
> >This does not have the same requirement and I wanted to make sure it
> >could work with switches that were unaware (or possibly separate
> >switches).
> 
> 	Well, theoretically, yes, but all "etherchannel compatible"
> means is that all of the slaves are set to the same MAC address, and
> we're prepared to accept traffic for the bond from any of the slaves.
> Your new mode does both of those, and etherchannel compatible doesn't
> mean "won't work without etherchannel."  There's no voodoo for
> etherchannel beyond this (it's really not very sophisticated).
> 
> 	The 802.3ad protocol, on the other hand, does have what you're
> worried about: handshaking and various chit chat between the bond and the
> switch.
> 
> 	The xor mode would work fine, for example, in a configuration
> with two switches connecting to discrete networks, and arranged for the
> hash (or mark, hint hint) to put the traffic for each network on the
> proper slave.  If the networks converge somewhere down the line, there
> may be loop issues, but that's not really a new thing.  If the networks
> don't converge, then each of the switches just sees its own little
> etherchannel compatible world (of however many slaves are plugged into
> that switch).
> 

Jay,

Sorry for the delayed response.  Here are my thoughts.

My initial impression was that XOR was a bit more complex, but the
reality is the code is not. There's really just not much to it.
I would see no reason why the mark-based mode (mode 7) could be added
and the code that does slave selection could be shared, so that a
mark-based hashing algorithm could be used for XOR and 802.3ad.  It
could also be that the new mode is abandoned altogether and just the
hashing algorithm is used (as you have suggested in previous emails).
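
To make the shared slave selection concrete, here is a rough userspace
sketch of what a mark-based hash could look like.  It is not the patch
itself: mark_hash() is a hypothetical name, 'mark' stands in for
skb->mark and 'count' for bond->slave_cnt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a mark-based hash.
 * 'mark' stands in for skb->mark, 'count' for the number of
 * enslaved devices.  Returns an index in the range [0, count).
 */
static int mark_hash(uint32_t mark, int count)
{
	/* The caller must only ask when at least one slave exists. */
	assert(count > 0);

	/* Unmarked frames (mark == 0) select the first slave. */
	return mark ? (int)(mark % (uint32_t)count) : 0;
}
```

With three slaves, marks 3, 4 and 5 would map to indexes 0, 1 and 2.
Handling of a selected slave that is link-down is deliberately left out
here, since that would remain the job of whichever mode calls the hash.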

The only real problem I see with not creating a new mode is what happens
when new options to control mark-based behavior are added with the
intent of not using them in an 802.3ad scenario.  Initially these are
the ones I've planned (and I'm sure they could grow from here):

	mark_default - Modify the patch I've sent already to send all
	traffic marked for an interface that is down to the specified
	interface (rather than the 'next' in the list -- which would be
	the new default behavior).  This would take a single device
	argument.  (This could also just be an overload of the 'primary'
	directive if desired.)

	mark_order - This would be a list of interfaces to use as the
	order for the marks used.  You can bet that users will find
	mark mode/hashing useful, but would rather use interfaces in the
	order they specify like eth2, eth0, eth1 instead of the
	enslavement order (which would normally be eth0, eth1, eth2).
	This parameter would take multiple arguments like
	/sys/class/net/bondX/bonding/slaves currently does.
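
As a sketch only, this is roughly how I picture the two options
interacting.  It is a userspace model, not driver code: struct
mark_bond, mark_select() and every field name are made up for
illustration, and sending unmarked frames to the first entry of
mark_order is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLAVES 8

/* Hypothetical model of a bond; none of these fields exist in the driver. */
struct mark_bond {
	int  slave_cnt;               /* number of enslaved devices */
	bool link_up[MAX_SLAVES];     /* link state, in enslavement order */
	int  mark_order[MAX_SLAVES];  /* mark_order: position -> slave index */
	int  mark_default;            /* mark_default: slave index, -1 if unset */
};

/* Returns the slave index to transmit on, or -1 if no slave is up. */
static int mark_select(const struct mark_bond *b, uint32_t mark)
{
	int pos, i, slave;

	assert(b->slave_cnt > 0 && b->slave_cnt <= MAX_SLAVES);

	/* Mark modulo slave count picks a position in mark_order. */
	pos = mark ? (int)(mark % (uint32_t)b->slave_cnt) : 0;
	slave = b->mark_order[pos];
	if (b->link_up[slave])
		return slave;

	/* Selected slave is down: use mark_default if it is set and up. */
	if (b->mark_default >= 0 && b->link_up[b->mark_default])
		return b->mark_default;

	/* Otherwise fall through to the next slave that is up. */
	for (i = 1; i < b->slave_cnt; i++) {
		slave = b->mark_order[(pos + i) % b->slave_cnt];
		if (b->link_up[slave])
			return slave;
	}
	return -1;
}
```

With mark_order set to eth2, eth0, eth1 (indexes 2, 0, 1), a frame with
mark 1 would go out eth0; if eth0 were down it would go to mark_default
when one is set, and otherwise to eth1 as the next entry in the list.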

So the decision that is left is whether we want bonding arguments like
this:

mode=balance-xor xmit_hash_policy=mark

or

mode=mark

initially with the chance to grow into something like this:

mode=balance-xor xmit_hash_policy=mark mark_default=eth0 mark_order="eth0, eth1, eth2"

rather than

mode=mark mark_default=eth0 mark_order="eth0, eth1, eth2"

To me the shorter one seems better and leaves more room for expansion
of features specific to the marking mode, but I can also understand
the potential lack of interest in another bonding mode since it's just
more code to maintain when something breaks or when changing the way
transmission is done.

Of course I won't deny that I think it would be fun to be 'the guy that
created that new mode of bonding,' but I'm not sure that would really
impress anyone anyway. :-)

Let me know what you think,

-andy



* Re: [PATCH] bonding: add mark mode
  2009-05-15 20:38             ` Andy Gospodarek
@ 2009-05-15 21:58               ` Jay Vosburgh
  0 siblings, 0 replies; 12+ messages in thread
From: Jay Vosburgh @ 2009-05-15 21:58 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Neil Horman, netdev, bonding-devel

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Fri, May 01, 2009 at 02:51:42PM -0700, Jay Vosburgh wrote:
>> Andy Gospodarek <andy@greyhouse.net> wrote:
>> 
>> >On Fri, May 01, 2009 at 01:44:53PM -0700, Jay Vosburgh wrote:
>> >> Andy Gospodarek <andy@greyhouse.net> wrote:
>> >> 
>> >> >On Fri, May 01, 2009 at 12:52:16PM -0700, Jay Vosburgh wrote:
>> >> >> Neil Horman <nhorman@tuxdriver.com> wrote:
>> >> >> 
>> >> >> >On Fri, May 01, 2009 at 01:39:16PM -0400, Andy Gospodarek wrote:
>> >> >> >> 
>> >> >> >> Quite a few people are happy with the way bonding manages to split
>> >> >> >> traffic to different members of a bond.  Many are quite disappointed
>> >> >> >> that users or administrators cannot be given more control over the
>> >> >> >> traffic distribution and would like something that they can control more
>> >> >> >> easily.  I looked at extending some of the existing modes, but the
>> >> >> >> cleanest option seemed to be one that created an additional mode to
>> >> >> >> handle this case.  I hated to create yet another mode, but the
>> >> >> >> simplicity of this mode made it a nice candidate for a new mode.  I have
>> >> >> >> decided to call this mode 'mark' (or mode 7).
>> >> >> >> 
>> >> >> >> The mark mode of bonding relies on the skb->mark field for outgoing
>> >> >> >> device selection.  Unmarked frames (ones where the mark is still zero),
>> >> >> >> will be sent by the first active enslaved device.  Any marked frames
>> >> >> >> will choose the outgoing device based on result of the modulo of the mark
>> >> >> >> and the number of enslaved devices.  If that device is inactive
>> >> >> >> (link-down), the traffic will default back to the first active enslaved
>> >> >> >> device.  I debated how to use the mark to decide the outgoing device,
>> >> >> >> but it seemed that modulo of the mark and the number of enslaved devices
>> >> >> >> would provide the most flexibility for those who currently mark frames
>> >> >> >> for other purposes.
>> >> >> >> 
>> >> >> >> I considered some other options for choosing destination devices based
>> >> >> >> on marks, but the ones I came up would require additional sysfs
>> >> >> >> configuration parameters and I would prefer not to add any more to an
>> >> >> >> already crowded space.
>> >> >> >> 
>> >> >> >> I've tested this on a slightly older kernel than the net-next-2.6 tree
>> >> >> >> than this patch is against by marking frames using mark and connmark
>> >> >> >> iptables options and it seems to work as I expect.
>> >> >> >> 
>> >> >> >> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>> >> >> >> ---
>> >> >> >> 
>> >> >> >>  Documentation/networking/bonding.txt |   25 ++++++++++++++
>> >> >> >>  drivers/net/bonding/bond_main.c      |   59 ++++++++++++++++++++++++++++++++++-
>> >> >> >>  include/linux/if_bonding.h           |    1 
>> >> >> >>  3 files changed, 84 insertions(+), 1 deletion(-)
>> >> >> >> 
>> >> >> >> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>> >> >> >> index 0876275..7a0d4c2 100644
>> >> >> >> --- a/Documentation/networking/bonding.txt
>> >> >> >> +++ b/Documentation/networking/bonding.txt
>> >> >> >> @@ -582,6 +582,31 @@ mode
>> >> >> >>  		swapped with the new curr_active_slave that was
>> >> >> >>  		chosen.
>> >> >> >>  
>> >> >> >> +	mark or 7
>> >> >> >> +
>> >> >> >> +		Mark-based policy: skbuffs that arrive to be
>> >> >> >> +		transmitted will have the mark field inspected to
>> >> >> >> +		determine the destination slave device.  When the
>> >> >> >> +		skbuff's mark is zero, the first active device in the
>> >> >> >> +		ordered list of enslaved devices will be used.  When
>> >> >> >> +		the mark is non-zero the modulo of the mark and the
>> >> >> >> +		number of enslaved devices will determine the
>> >> >> >> +		interface used for transmission.  If this device is
>> >> >> >> +		not active (link-down) then the mark will essentially
>> >> >> >> +		be ignored and the first active device in the ordered
>> >> >> >> +		list of enslaved devices will be used.
>> >> >> >> +
>> >> >> >> +		The flexibility offered with this mode allows users
>> >> >> >> +		of netfilter to move various types of traffic to
>> >> >> >> +		different slaves quite easily.  Information on this
>> >> >> >> +		can be found in the manpages for iptables/ebtables
>> >> >> >> +		as well as netfilter documentation.
>> >> >> >> +
>> >> >> >> +		Prerequisites:
>> >> >> >> +
>> >> >> >> +		1.  Without the ability to mark skbuffs this mode is
>> >> >> >> +		not useful.  Netfilter greatly aides skbuff marking.
>> >> >> >> +
>> >> >> >>  num_grat_arp
>> >> >> >>  
>> >> >> >>  	Specifies the number of gratuitous ARPs to be issued after a
>> >> >> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> >> >> >> index fd73836..5e1d166 100644
>> >> >> >> --- a/drivers/net/bonding/bond_main.c
>> >> >> >> +++ b/drivers/net/bonding/bond_main.c
>> >> >> >> @@ -123,7 +123,7 @@ module_param(mode, charp, 0);
>> >> >> >>  MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
>> >> >> >>  		       "1 for active-backup, 2 for balance-xor, "
>> >> >> >>  		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
>> >> >> >> -		       "6 for balance-alb");
>> >> >> >> +		       "6 for balance-alb, 7 for mark");
>> >> >> >>  module_param(primary, charp, 0);
>> >> >> >>  MODULE_PARM_DESC(primary, "Primary network device to use");
>> >> >> >>  module_param(lacp_rate, charp, 0);
>> >> >> >> @@ -175,6 +175,7 @@ const struct bond_parm_tbl bond_mode_tbl[] = {
>> >> >> >>  {	"802.3ad",		BOND_MODE_8023AD},
>> >> >> >>  {	"balance-tlb",		BOND_MODE_TLB},
>> >> >> >>  {	"balance-alb",		BOND_MODE_ALB},
>> >> >> >> +{	"mark",			BOND_MODE_MARK},
>> >> >> >>  {	NULL,			-1},
>> >> >> >>  };
>> >> >> >>  
>> >> >> >> @@ -224,6 +225,7 @@ static const char *bond_mode_name(int mode)
>> >> >> >>  		[BOND_MODE_8023AD]= "IEEE 802.3ad Dynamic link aggregation",
>> >> >> >>  		[BOND_MODE_TLB] = "transmit load balancing",
>> >> >> >>  		[BOND_MODE_ALB] = "adaptive load balancing",
>> >> >> >> +		[BOND_MODE_MARK] = "mark-based transmit balancing",
>> >> >> >>  	};
>> >> >> >>  
>> >> >> >>  	if (mode < 0 || mode > BOND_MODE_ALB)
>> >> >> >> @@ -4464,6 +4466,57 @@ out:
>> >> >> >>  	return 0;
>> >> >> >>  }
>> >> >> >>  
>> >> >> >> +static int bond_xmit_mark(struct sk_buff *skb, struct net_device *bond_dev)
>> >> >> >> +{
>> >> >> >> +	struct bonding *bond = netdev_priv(bond_dev);
>> >> >> >> +	struct slave *slave;
>> >> >> >> +	int i, slave_no, res = 1;
>> >> >> >> +
>> >> >> >> +	read_lock(&bond->lock);
>> >> >> >> +
>> >> >> >> +	if (!BOND_IS_OK(bond)) {
>> >> >> >> +		goto out;
>> >> >> >> +	}
>> >> >> >> +
>> >> >> >> +	/* Use the mark as the determining factor for which slave to
>> >> >> >> +	 * choose for transmission.  When behaving normally all should
>> >> >> >> +	 * work just fine.  When a slave that is destined to be the
>> >> >> >> +	 * transmitter of this frame is down, start at the front of the
>> >> >> >> +	 * list and find the first available slave. */
>> >> >> 
>> >> >> 	Why not simply use the N'th up slave instead of reverting to
>> >> >> slave 0 for the down slave case?  I'm guessing this has to do with
>> >> >> trying to maintain some fixed balancing of traffic even in the face of
>> >> >> slave failure.
>> >> >> 
>> >> >
>> >> >That could be a reasonable option as well -- in fact that was my first
>> >> >implementation.  I decided to do it this way, so that a user could
>> >> >consider the first device as the primary slave for unmarked traffic and
>> >> >the slave for marked traffic destined for a slave that was down rather
>> >> >than what might seem like a random choice for traffic.
>> >> 
>> >> 	Ok, understood.
>> >> 
>> >> >> >> +	slave_no = skb->mark ? skb->mark % bond->slave_cnt : 0;
>> >> >> >> +
>> >> >> >Would it be worthwhile to add a special case here (say all f's in mark) to
>> >> >> >indicate a frame should be sent out all slaves on the bond?  In the case you
>> >> >> >have traffic that might need to go to all interfaces (like maybe igmp)?
>> >> >> 
>> >> >> 	If I'm not misunderstanding the purpose of the mode, I think
>> >> >> it's etherchannel compatible (meaning that the switch has to be
>> >> >> configured), so I'm not sure why there would ever be a need to flood
>> >> >> packets to all ports.
>> >> >> 
>> >> >> 	I think this would generally be better as a special hash policy,
>> >> >> in which case both the etherchannel (balance-xor) and 802.3ad modes
>> >> >> could take advantage of it.  I'd hazard to guess that Andy thought about
>> >> >> that, too, so what was the impediment?
>> >> >> 
>> >> >
>> >> >I wasn't designing this to be etherchannel compatible.  Though it could
>> >> >work that way, my idea was to use this in an environment where slaves
>> >> >may be connected to different switches that are capable of moving all of
>> >> >the traffic to its destination if needed, but if the traffic is split
>> >> >in a predictable application specific manner the network can run more
>> >> >efficiently.  Think of this more as an active-backup mode where the
>> >> >backup links could actually be utilized.
>> >> 
>> >> 	Hmm.  If the slaves don't connect to the same places, why use
>> >> bonding at all, then?  Or, perhaps, two smaller bonds instead of one big
>> >> one.
>> >> 
>> >> 	Are you thinking of something like a load balancer back end, or
>> >> am I missing the point entirely?
>> >> 
>> >> 	And, really, active-backup using the backup links is just a load
>> >> balancing mode.  I'm not sure I see the special distinction there.
>> >> 
>> >
>> >OK, let's forget I mentioned active-backup. :)
>> 
>> 	Fair enough!
>> 
>> >The best way to think about this is a user-configured load-balancing
>> >mode.  It can work with switches that are unaware of the existence of
>> >the bond.  This marking mode transmit could also be used for xor and
>> >802.3ad if there was a desire, but my original intent (and continued
>> >desire) was to create a mode where the user controlled the load balancing and
>> >the option of using single or multiple switches could exist.  I've
>> >certainly seen network topologies where this would be useful.  I can take
>> >some time and draw one of those up if you like (I'm just pressed for
>> >time right now).
>> 
>> 	I'm pretty sure there would be interest for a mark policy in
>> 802.3ad.
>> 
>> >> >When I first started on this I considered just extending the
>> >> >active-backup mode's transmit function, but I didn't want to deal with
>> >> >the potential loss of duplicate frames in skb_bond_should_drop and I
>> >> >also didn't want to deal with the complicated failover scenarios of
>> >> >active-backup mode.
>> >> >
>> >> >It's probably the most like round-robin (in fact it's quite close), but
>> >> >I felt that this mode could eventually grow into something more useful
>> >> >(who knows how it might be extended) and deserved its own mode rather
>> >> >than dealing with the possibility of having a larger more complicated
>> >> >transmit routine in the future when another mode would be just fine.
>> >> 
>> >> 	I'm still not quite sure how this differs from balance-xor mode
>> >> with a hypothetical "xmit_hash_policy=mark" sort of arrangement, other
>> >> than, perhaps, intended purpose.
>> >> 
>> >
>> >XOR is designed to work with switches that are aware of the bond, right?
>> >This does not have the same requirement and I wanted to make sure it
>> >could work with switches that were unaware (or possibly separate
>> >switches).
>> 
>> 	Well, theoretically, yes, but all "etherchannel compatible"
>> means is that all of the slaves are set to the same MAC address, and
>> we're prepared to accept traffic for the bond from any of the slaves.
>> Your new mode does both of those, and etherchannel compatible doesn't
>> mean "won't work without etherchannel."  There's no voodoo for
>> etherchannel beyond this (it's really not very sophisticated).
>> 
>> 	The 802.3ad protocol, on the other hand, does have what you're
>> worried about: handshaking and various chit chat between the bond and the
>> switch.
>> 
>> 	The xor mode would work fine, for example, in a configuration
>> with two switches connecting to discrete networks, and arranged for the
>> hash (or mark, hint hint) to put the traffic for each network on the
>> proper slave.  If the networks converge somewhere down the line, there
>> may be loop issues, but that's not really a new thing.  If the networks
>> don't converge, then each of the switches just sees its own little
>> etherchannel compatible world (of however many slaves are plugged into
>> that switch).
>> 
>
>Jay,
>
>Sorry for the delayed response.  Here are my thoughts.
>
>My initial impression was that XOR was a bit more complex, but the
>reality is the code is not. There's really just not much to it.
>I would see no reason why the mark-based mode (mode 7) could be added
>and the code that does slave selection could be shared, so that a

	Presumably you mean "could not be added."

>mark-based hashing algorithm could be used for XOR and 802.3ad.  It
>could also be that the new mode is abandoned altogether and just the
>hashing algorithm is used (as you have suggested in previous emails).
>
>The only real problem I see with not creating a new mode is what happens
>when new options to control mark-based behavior are added with the
>intent of not using them in an 802.3ad scenario.  Initially these are
>the ones I've planned (and I'm sure they could grow from here):

	I'm happy with having hash modes that are "use at own risk in
802.3ad"; there's one there already that's selectable but not fully
standard compliant.  As long as the documentation is clear on this, I
have no issue with it.

>	mark_default - Modify the patch I've sent already to send all
>	traffic marked for an interface that is down to the specified
>	interface (rather than the 'next' in the list -- which would be
>	the new default behavior).  This would take a single device
>	argument.  (This could also just be an overload of the 'primary'
>	directive if desired.)
>
>	mark_order - This would be a list of interfaces to use as the
>	order for the marks used.  You can bet that users will find
>	mark mode/hashing useful, but would rather use interfaces in the
>	order they specify like eth2, eth0, eth1 instead of the
>	enslavement order (which would normally be eth0, eth1, eth2).
>	This parameter would take multiple arguments like
>	/sys/class/net/bondX/bonding/slaves currently does.
>
>So the decision that is left is whether we want bonding arguments like
>this:
>
>mode=balance-xor xmit_hash_policy=mark
>
>or
>
>mode=mark
>
>initially with the chance to grow into something like this:
>
>mode=balance-xor xmit_hash_policy=mark mark_default=eth0 mark_order="eth0, eth1, eth2"
>
>rather than
>
>mode=mark mark_default=eth0 mark_order="eth0, eth1, eth2"
>
>To me the shorter one seems better and leaves more room for expansion
>of features specific to the marking mode, but I can also understand
>the potential lack of interest in another bonding mode since it's just
>more code to maintain when something breaks or when changing the way
>transmission is done.

	It's not that I'm trying to avoid adding a mode, it's just that
this functionality looks to have application in multiple existing modes.
It therefore seems to me that it's better as a modifier for existing
modes (xor and 802.3ad), rather than a discrete mode (that would
interoperate only with etherchannel, and not 802.3ad).
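
	A minimal sketch of the selection logic under discussion, written
as a standalone function so it could in principle back either a mode or
a hash policy.  Everything here is hypothetical: struct slave_sketch,
mark_select_slave() and the default_idx argument (standing in for the
proposed mark_default) are illustration names, not bonding driver code.
The slaves[] array is assumed to already be in the desired order
(enslavement order, or whatever a mark_order option would produce).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bonding slave; not the kernel's struct slave. */
struct slave_sketch {
	const char *name;
	int link_up;	/* nonzero when the link is usable */
};

/*
 * Pick a transmit slave from an skb->mark style value.
 * default_idx plays the role of the proposed mark_default; pass -1 for
 * "none".  Returns an index into slaves[], or -1 if no slave has link.
 */
int mark_select_slave(const struct slave_sketch *slaves, size_t count,
		      uint32_t mark, int default_idx)
{
	size_t i;

	if (mark != 0 && count > 0) {
		/* Marked frame: modulo of the mark and the slave count. */
		size_t idx = mark % count;

		if (slaves[idx].link_up)
			return (int)idx;

		/* Chosen slave is down: try the configured default, if any. */
		if (default_idx >= 0 && (size_t)default_idx < count &&
		    slaves[default_idx].link_up)
			return default_idx;
	}

	/* Unmarked frame, or no usable choice yet: first active slave. */
	for (i = 0; i < count; i++)
		if (slaves[i].link_up)
			return (int)i;

	return -1;
}
```

	With slaves ordered eth2, eth0, eth1 and eth1 down, a mark of 4
would select eth0 (4 % 3 == 1), and a mark of 5 would fall back to the
default or, absent one, to eth2.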

	Or, instead of "mark," create a new mode that's called
"etherchannel" and figure out a framework to have options sets for the
various slave selection gizmos (hash, mark, round-robin, anything else
coming up) apply generically to the new "etherchannel" and existing
"802.3ad" modes.

	For 802.3ad, the schemes that are not standards compliant are
either "use at own risk" or prohibited, probably on a case-by-case
basis.  E.g., just about any strict hash is probably ok, but round-robin
is probably never appropriate for 802.3ad.  Or, now that I ruminate a
bit, just print a "whoa: not standards compliant" warning and let 'em go
for it.  If people wanna run round-robin over their 802.3ad aggregations
in the privacy of their own datacenter, who am I to complain?
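
	One rough way to picture the "warn and let 'em go" approach is a
table of selection policies with a per-policy compliance flag.  This is
only a sketch: the enum names, policy_table and policy_check() are made
up for illustration, and flagging "mark" as non-compliant for 802.3ad is
an assumption based on the "use at own risk" comment earlier in this
message, not a settled decision.

```c
#include <stdio.h>

/* Hypothetical names; none of these exist in the bonding driver. */
enum bond_mode_sketch { MODE_ETHERCHANNEL, MODE_8023AD };

enum select_policy_sketch {
	POLICY_HASH,
	POLICY_MARK,
	POLICY_ROUND_ROBIN,
	POLICY_COUNT
};

struct policy_info {
	const char *name;
	int ok_for_8023ad;	/* assumed compliance flag */
};

static const struct policy_info policy_table[POLICY_COUNT] = {
	[POLICY_HASH]        = { "hash",        1 },
	[POLICY_MARK]        = { "mark",        0 },	/* assumption */
	[POLICY_ROUND_ROBIN] = { "round-robin", 0 },
};

/*
 * Returns 0 if the combination is fine, 1 if it is allowed but a
 * warning was printed, and -1 for an unknown policy.
 */
int policy_check(enum bond_mode_sketch mode,
		 enum select_policy_sketch policy)
{
	if ((unsigned int)policy >= POLICY_COUNT)
		return -1;

	if (mode == MODE_8023AD && !policy_table[policy].ok_for_8023ad) {
		fprintf(stderr,
			"whoa: %s is not standards compliant with 802.3ad\n",
			policy_table[policy].name);
		return 1;
	}

	return 0;
}
```

	The point of the table is that nothing is prohibited outright;
the caller gets a warning and the configuration still goes through.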

	In the above, balance-xor, balance-rr and probably broadcast are
aliases to some particular set of options for "etherchannel" mode.  I'm
not really sure what broadcast is really used for (I suspect it's not
really used at all), but the choice is either drop it or treat it
equally, so treat it equally.

	Yah, I know you're not really thinking necessarily in terms of
Etherchannel compatibility, but if all the slaves always have the same
MAC and traffic is accepted equally on any slave, that's Etherchannel
compatible.  Even if you run some weird split-peer arrangement on
discrete switch segments (and, honestly, I'm still not too clear on why
that's better than two bonds, one to each switch).

>Of course I won't deny that I think it would be fun to be 'the guy that
>created that new mode of bonding,' but I'm not sure that would really
>impress anyone anyway. :-)
>
>Let me know what you think,

	I'm kinda liking my random idea (new "etherchannel" mode) on
first thought.  I'm not yet 100% sure it's really less bad than what's
there now, though.  On the up side, the nomenclature is clearer, and,
most importantly, you still get to be famous as having created a new
mode.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


end of thread, other threads:[~2009-05-15 21:57 UTC | newest]

Thread overview: 12+ messages
2009-05-01 17:39 [PATCH] bonding: add mark mode Andy Gospodarek
2009-05-01 19:34 ` Neil Horman
2009-05-01 19:52   ` Jay Vosburgh
2009-05-01 20:09     ` Neil Horman
2009-05-01 20:31       ` Jay Vosburgh
2009-05-01 20:39         ` Neil Horman
2009-05-01 20:25     ` Andy Gospodarek
2009-05-01 20:44       ` Jay Vosburgh
2009-05-01 21:06         ` Andy Gospodarek
2009-05-01 21:51           ` Jay Vosburgh
2009-05-15 20:38             ` Andy Gospodarek
2009-05-15 21:58               ` Jay Vosburgh
