Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] bonding: documentation and code cleanup for resend_igmp
From: Rick Jones @ 2011-05-25 18:44 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1306348738-23946-1-git-send-email-fbl@redhat.com>

On Wed, 2011-05-25 at 15:38 -0300, Flavio Leitner wrote:
> Improves the documentation about how IGMP resend parameter
> works, fix two missing checks and coding style issues.
> 
> Signed-off-by: Flavio Leitner <fbl@redhat.com>

Acked-by: Rick Jones <rick.jones2@hp.com>

At least as far as the documentation changes go.

rick jones



^ permalink raw reply

* [PATCH v2] bonding: documentation and code cleanup for resend_igmp
From: Flavio Leitner @ 2011-05-25 18:38 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek, Rick Jones, Flavio Leitner

Improves the documentation about how IGMP resend parameter
works, fix two missing checks and coding style issues.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
 Documentation/networking/bonding.txt |   13 +++++++++++--
 drivers/net/bonding/bond_main.c      |   12 +++++++-----
 drivers/net/bonding/bond_sysfs.c     |   10 +++++-----
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1f45bd8..675612f 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -770,8 +770,17 @@ resend_igmp
 	a failover event. One membership report is issued immediately after
 	the failover, subsequent packets are sent in each 200ms interval.
 
-	The valid range is 0 - 255; the default value is 1. This option
-	was added for bonding version 3.7.0.
+	The valid range is 0 - 255; the default value is 1. A value of 0
+	prevents the IGMP membership report from being issued in response
+	to the failover event.
+
+	This option is useful for bonding modes balance-rr (0), active-backup
+	(1), balance-tlb (5) and balance-alb (6), in which a failover can
+	switch the IGMP traffic from one slave to another.  Therefore a fresh
+	IGMP report must be issued to cause the switch to forward the incoming
+	IGMP traffic over the newly selected slave.
+
+	This option was added for bonding version 3.7.0.
 
 3. Configuring Bonding Devices
 ==============================
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6dc4284..86a8001 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -852,7 +852,7 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
 {
 	struct bonding *bond = container_of(work, struct bonding,
-							mcast_work.work);
+					    mcast_work.work);
 	bond_resend_igmp_join_requests(bond);
 }
 
@@ -1172,10 +1172,12 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 	}
 
 	/* resend IGMP joins since active slave has changed or
-	 * all were sent on curr_active_slave */
-	if (((USES_PRIMARY(bond->params.mode) && new_active) ||
-	     bond->params.mode == BOND_MODE_ROUNDROBIN) &&
-	    netif_running(bond->dev)) {
+	 * all were sent on curr_active_slave.
+	 * resend only if bond is brought up with the affected
+	 * bonding modes and the retransmission is enabled */
+	if (netif_running(bond->dev) && (bond->params.resend_igmp > 0) &&
+	    ((USES_PRIMARY(bond->params.mode) && new_active) ||
+	     bond->params.mode == BOND_MODE_ROUNDROBIN)) {
 		bond->igmp_retrans = bond->params.resend_igmp;
 		queue_delayed_work(bond->wq, &bond->mcast_work, 0);
 	}
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..a32848a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1539,8 +1539,8 @@ static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
  * Show and set the number of IGMP membership reports to send on link failure
  */
 static ssize_t bonding_show_resend_igmp(struct device *d,
-					 struct device_attribute *attr,
-					 char *buf)
+					struct device_attribute *attr,
+					char *buf)
 {
 	struct bonding *bond = to_bond(d);
 
@@ -1548,8 +1548,8 @@ static ssize_t bonding_show_resend_igmp(struct device *d,
 }
 
 static ssize_t bonding_store_resend_igmp(struct device *d,
-					  struct device_attribute *attr,
-					  const char *buf, size_t count)
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
 {
 	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
@@ -1561,7 +1561,7 @@ static ssize_t bonding_store_resend_igmp(struct device *d,
 		goto out;
 	}
 
-	if (new_value < 0) {
+	if (new_value < 0 || new_value > 255) {
 		pr_err("%s: Invalid resend_igmp value %d not in range 0-255; rejected.\n",
 		       bond->dev->name, new_value);
 		ret = -EINVAL;
-- 
1.7.4


^ permalink raw reply related

* [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
From: Neil Horman @ 2011-05-25 18:13 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, nicolas.2p.debian,
	David S. Miller
In-Reply-To: <1306265765-8257-1-git-send-email-nhorman@tuxdriver.com>

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

v3) Fix spelling error

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/bonding/bond_alb.c   |    4 ----
 drivers/net/bonding/bond_main.c  |   16 ++++++++++------
 drivers/net/bonding/bond_sysfs.c |    6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8f2d2e7..2df9276 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
 	struct tlb_client_info *new_hashtbl;
 	int i;
 
-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
-
 	new_hashtbl = kzalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
 	int i;
 
-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
-
 	new_hashtbl = kmalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..c0045d7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			   bond_dev->name, slave_dev->name);
 	}
 
-	/* bond must be initialized by bond_open() before enslaving */
-	if (!(bond_dev->flags & IFF_UP)) {
-		pr_warning("%s: master_dev is not up in bond_enslave\n",
-			   bond_dev->name);
-	}
-
 	/* already enslaved */
 	if (slave_dev->flags & IFF_SLAVE) {
 		pr_debug("Error, Device was already enslaved\n");
@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
 
+	/*
+	 * Initialize locks that may be required during
+	 * en/deslave operations.  All of the bond_open work
+	 * (of which this is part) should really be moved to
+	 * a phase prior to dev_open
+	 */
+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
+
 	bond->wq = create_singlethread_workqueue(bond_dev->name);
 	if (!bond->wq)
 		return -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..bb1319f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	struct net_device *dev;
 	struct bonding *bond = to_bond(d);
 
-	/* Quick sanity check -- is the bond interface up? */
-	if (!(bond->dev->flags & IFF_UP)) {
-		pr_warning("%s: doing slave updates when interface is down.\n",
-			   bond->dev->name);
-	}
-
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-- 
1.7.5.2


^ permalink raw reply related

* Re: [PATCH] bonding: documentation and code cleanup for resend_igmp
From: Rick Jones @ 2011-05-25 18:10 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <4DDD3D6E.1040706@redhat.com>

> > 
> > Grammar nit - "prevents the ICMP membership report from being issued in
> > response to the failover event."  Or "prevents issuing of the IGMP
> > membership report in response to a failover event."
> 
> I like the first suggestion. I'll update the patch fixing the ICMP
> typo.

Petards, petards, everywhere - and why are they all pointing at me :)

> 
> >> +
> >> +	This option is useful for bonding modes balance-rr or 0,
> >> +	active-backup or 1, balance-tlb or 5, balance-alb or 6, which a
> >> +	failover can move the IGMP traffic from one slave to another.
> >> +	Therefore, the switch must be notified with an extra IGMP report
> >> +	to start forwarding the IGMP traffic over the new selected slave.
> > 
> > More nits.  How about (with some added guesses on my part about
> > direction)
> > 
> > This option is useful for bonding modes balance-rr (0), active-backup
> > (1), balance-tlb (5) and balance-alb (6), in which a failover can switch
> > the outgoing IGMP traffic from one slave to another.  Therefore a fresh
> > IGMP report must be issued to cause the switch to forward the incoming
> > IGMP traffic over the newly selected slave.
> 
> It's also better. thanks. However, the failover can switch both
> incoming and outgoing IGMP traffic so I'd leave it generic as before.
> For instance:
> 
> This option is useful for bonding modes balance-rr (0), active-backup
> (1), balance-tlb (5) and balance-alb (6), in which a failover can switch
> the IGMP traffic from one slave to another.  Therefore a fresh IGMP
> report must be issued to cause the switch to forward the incoming
> IGMP traffic over the newly selected slave.
> 
> What do you think?

I'm find with leaving it generic.

rick jones


^ permalink raw reply

* Re: [PATCHv2] net: Define enum for the bits used in features.
From: Mahesh Bandewar @ 2011-05-25 18:05 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Miller, netdev, Tom Herbert, Stephen Hemminger
In-Reply-To: <BANLkTik4KYrR4vYm1R_hYNbwFjQ79FEoHQ@mail.gmail.com>

On Wed, May 25, 2011 at 2:43 AM, Michał Mirosław <mirqus@gmail.com> wrote:
> 2011/5/25 Mahesh Bandewar <maheshb@google.com>:
>> Little bit cleanup by defining enum for all bits used. Also use those enum
>> values to redefine flags.
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> ---
>> Changes since v1:
>>  Split the patch into two pieces.
>>
>>  include/linux/netdevice.h |  100 +++++++++++++++++++++++++++++++-------------
>>  1 files changed, 70 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index ca333e7..b4520b2 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -51,6 +51,7 @@
>>  #ifdef CONFIG_DCB
>>  #include <net/dcbnl.h>
>>  #endif
>> +#include <linux/netdev_features.h>
>
> This should go to other part.
>
Ahh! split mishap!

>>  struct vlan_group;
>>  struct netpoll_info;
>> @@ -981,6 +982,49 @@ struct net_device_ops {
>>  };
>>
>>  /*
>> + * Net device feature bits; if you change something,
>> + * also update netdev_features_strings[] in ethtool.c
>> + */
>> +enum netdev_features {
> [...]
>
>> +       RESERVED16_BIT,         /* the GSO_MASK reserved bit 16 */
>> +       RESERVED17_BIT,         /* the GSO_MASK reserved bit 17 */
>> +       RESERVED18_BIT,         /* the GSO_MASK reserved bit 18 */
>> +       RESERVED19_BIT,         /* the GSO_MASK reserved bit 19 */
>> +       RESERVED20_BIT,         /* the GSO_MASK reserved bit 20 */
>> +       RESERVED21_BIT,         /* the GSO_MASK reserved bit 21 */
>> +       RESERVED22_BIT,         /* the GSO_MASK reserved bit 22 */
>> +       RESERVED23_BIT,         /* the GSO_MASK reserved bit 23 */
>
> This could also define NETIF_F_GSO_SHIFT and name TSO and others.
> Maybe like this:
>
>
> NETIF_F_GSO_SHIFT, /* must == 16, for now */
> GSO_RESERVED0_BIT = NETIF_F_GSO_SHIFT,
> GSO_RESERVED1_BIT,
> ...
> GSO_RESERVED7_BIT,
> ... (other bits)
>
Yes, I was thinking about that but then thought that itself will
deserve a separate patch.

>> +       /* Add you bit above this */
>
> your.
>
>> +       ND_FEATURE_NUM_BITS,     /* (LAST VALUE) Total bits in use */
>
> And here GSO aliases:
>
> TSO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_TCPV4,
> UFO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_UDP,
> ...
>
> Afther this is done, I can convert the feature-names table in
> ethtool.c to use C99 array assignments, so that no one else will trip
> on a missing comma or wrong entries order there.
>
> About the bit names: NETIF_F_xxx_BIT?
>
I thought it was getting longer so tried to trim it, but don't have
reservations against it.

--mahesh..

> Best Regards,
> Michał Mirosław
>

^ permalink raw reply

* Re: pull request: wireless-next-2.6 2011-05-24
From: David Miller @ 2011-05-25 17:54 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20110524210400.GA8864@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 24 May 2011 17:04:01 -0400

> Here is the last (belated) batch of new wireless bits intended for
> 2.6.40.  I had intended to send this over the weekend, but there was
> a dust-up over some merge confusion that had to get settled first.
> In the meantime, this got a few more days to cook in -next. :-)
> 
> There isn't anything Earth-shattering here -- mostly just some
> last-minute bits posted a bit before the release.  Some of them are
> fixes for bugs in other new code.  There is also the addition of a
> generic GPIO-based rfkill driver.
> 
> Please let me know if there are problems!

Pulled, thanks John.

^ permalink raw reply

* [PATCH] net: hold rtnl again in dump callbacks
From: Eric Dumazet @ 2011-05-25 17:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick McHardy, Stephen Hemminger

Commit e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks)
missed fact that rtnl_fill_ifinfo() must be called with rtnl held.

Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
its not easy to solve this problem, so revert this part of the patch.

It also forgot one rcu_read_unlock() in FIB dump_rules()

Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
I tried to solve this problem differently but failed. We need more
preliminary steps.

 net/core/fib_rules.c |    1 +
 net/core/rtnetlink.c |    9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 3911586..008dc70 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -602,6 +602,7 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
 skip:
 		idx++;
 	}
+	rcu_read_unlock();
 	cb->args[1] = idx;
 	rules_ops_put(ops);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d1644e3..2d56cb9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -850,6 +850,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	struct nlattr *attr, *af_spec;
 	struct rtnl_af_ops *af_ops;
 
+	ASSERT_RTNL();
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
@@ -1876,6 +1877,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	int min_len;
 	int family;
 	int type;
+	int err;
 
 	type = nlh->nlmsg_type;
 	if (type > RTM_MAX)
@@ -1902,8 +1904,11 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		if (dumpit == NULL)
 			return -EOPNOTSUPP;
 
+		__rtnl_unlock();
 		rtnl = net->rtnl;
-		return netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
+		err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
+		rtnl_lock();
+		return err;
 	}
 
 	memset(rta_buf, 0, (rtattr_max * sizeof(struct rtattr *)));
@@ -1975,7 +1980,7 @@ static int __net_init rtnetlink_net_init(struct net *net)
 {
 	struct sock *sk;
 	sk = netlink_kernel_create(net, NETLINK_ROUTE, RTNLGRP_MAX,
-				   rtnetlink_rcv, NULL, THIS_MODULE);
+				   rtnetlink_rcv, &rtnl_mutex, THIS_MODULE);
 	if (!sk)
 		return -ENOMEM;
 	net->rtnl = sk;



^ permalink raw reply related

* Re: [PATCH] bonding: documentation and code cleanup for resend_igmp
From: Flavio Leitner @ 2011-05-25 17:33 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1306343973.8149.1556.camel@tardy>

On 05/25/2011 02:19 PM, Rick Jones wrote:
> On Wed, 2011-05-25 at 11:44 -0300, Flavio Leitner wrote:
>> Improves the documentation about how IGMP resend parameter
>> works, fix two missing checks and coding style issues.
>>
>> Signed-off-by: Flavio Leitner <fbl@redhat.com>
>> ---
>>  Documentation/networking/bonding.txt |   13 +++++++++++--
>>  drivers/net/bonding/bond_main.c      |   12 +++++++-----
>>  drivers/net/bonding/bond_sysfs.c     |   10 +++++-----
>>  3 files changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>> index 1f45bd8..683ef76 100644
>> --- a/Documentation/networking/bonding.txt
>> +++ b/Documentation/networking/bonding.txt
>> @@ -770,8 +770,17 @@ resend_igmp
>>  	a failover event. One membership report is issued immediately after
>>  	the failover, subsequent packets are sent in each 200ms interval.
>>  
>> -	The valid range is 0 - 255; the default value is 1. This option
>> -	was added for bonding version 3.7.0.
>> +	The valid range is 0 - 255; the default value is 1. A value of 0
>> +	prevents the IGMP membership report to be issued due the failover
>> +	event.
> 
> Grammar nit - "prevents the ICMP membership report from being issued in
> response to the failover event."  Or "prevents issuing of the IGMP
> membership report in response to a failover event."

I like the first suggestion. I'll update the patch fixing the ICMP
typo.

>> +
>> +	This option is useful for bonding modes balance-rr or 0,
>> +	active-backup or 1, balance-tlb or 5, balance-alb or 6, which a
>> +	failover can move the IGMP traffic from one slave to another.
>> +	Therefore, the switch must be notified with an extra IGMP report
>> +	to start forwarding the IGMP traffic over the new selected slave.
> 
> More nits.  How about (with some added guesses on my part about
> direction)
> 
> This option is useful for bonding modes balance-rr (0), active-backup
> (1), balance-tlb (5) and balance-alb (6), in which a failover can switch
> the outgoing IGMP traffic from one slave to another.  Therefore a fresh
> IGMP report must be issued to cause the switch to forward the incoming
> IGMP traffic over the newly selected slave.

It's also better. thanks. However, the failover can switch both
incoming and outgoing IGMP traffic so I'd leave it generic as before.
For instance:

This option is useful for bonding modes balance-rr (0), active-backup
(1), balance-tlb (5) and balance-alb (6), in which a failover can switch
the IGMP traffic from one slave to another.  Therefore a fresh IGMP
report must be issued to cause the switch to forward the incoming
IGMP traffic over the newly selected slave.

What do you think?

thanks,
fbl

^ permalink raw reply

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
From: Jay Vosburgh @ 2011-05-25 17:32 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, nicolas.2p.debian, David S. Miller
In-Reply-To: <1306343805-3223-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down.  The bonding driver initializes some data structures
>only after its ndo_open routine is called.  Among them is the initalization of
>the alb tx and rx hash locks.  So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do.  As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem.  We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe

	One spelling nit, below; fix that and I'm good with this.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J

>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c   |    4 ----
> drivers/net/bonding/bond_main.c  |   16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c |    6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> 	struct tlb_client_info *new_hashtbl;
> 	int i;
>
>-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> 	new_hashtbl = kzalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> 	int i;
>
>-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> 	new_hashtbl = kmalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..59bf5c5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 			   bond_dev->name, slave_dev->name);
> 	}
>
>-	/* bond must be initialized by bond_open() before enslaving */
>-	if (!(bond_dev->flags & IFF_UP)) {
>-		pr_warning("%s: master_dev is not up in bond_enslave\n",
>-			   bond_dev->name);
>-	}
>-
> 	/* already enslaved */
> 	if (slave_dev->flags & IFF_SLAVE) {
> 		pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+	/*
>+	 * Initialize locks that may be requiered during

	"Required."

>+	 * en/deslave operations.  All of the bond_open work
>+	 * (of which this is part) should really be moved to
>+	 * a phase prior to dev_open
>+	 */
>+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> 	bond->wq = create_singlethread_workqueue(bond_dev->name);
> 	if (!bond->wq)
> 		return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> 	struct net_device *dev;
> 	struct bonding *bond = to_bond(d);
>
>-	/* Quick sanity check -- is the bond interface up? */
>-	if (!(bond->dev->flags & IFF_UP)) {
>-		pr_warning("%s: doing slave updates when interface is down.\n",
>-			   bond->dev->name);
>-	}
>-
> 	if (!rtnl_trylock())
> 		return restart_syscall();
>
>-- 
>1.7.5.2
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] bonding: documentation and code cleanup for resend_igmp
From: Rick Jones @ 2011-05-25 17:19 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: netdev, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1306334642-20733-1-git-send-email-fbl@redhat.com>

On Wed, 2011-05-25 at 11:44 -0300, Flavio Leitner wrote:
> Improves the documentation about how IGMP resend parameter
> works, fix two missing checks and coding style issues.
> 
> Signed-off-by: Flavio Leitner <fbl@redhat.com>
> ---
>  Documentation/networking/bonding.txt |   13 +++++++++++--
>  drivers/net/bonding/bond_main.c      |   12 +++++++-----
>  drivers/net/bonding/bond_sysfs.c     |   10 +++++-----
>  3 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> index 1f45bd8..683ef76 100644
> --- a/Documentation/networking/bonding.txt
> +++ b/Documentation/networking/bonding.txt
> @@ -770,8 +770,17 @@ resend_igmp
>  	a failover event. One membership report is issued immediately after
>  	the failover, subsequent packets are sent in each 200ms interval.
>  
> -	The valid range is 0 - 255; the default value is 1. This option
> -	was added for bonding version 3.7.0.
> +	The valid range is 0 - 255; the default value is 1. A value of 0
> +	prevents the IGMP membership report to be issued due the failover
> +	event.

Grammar nit - "prevents the ICMP membership report from being issued in
response to the failover event."  Or "prevents issuing of the IGMP
membership report in response to a failover event."

> +
> +	This option is useful for bonding modes balance-rr or 0,
> +	active-backup or 1, balance-tlb or 5, balance-alb or 6, which a
> +	failover can move the IGMP traffic from one slave to another.
> +	Therefore, the switch must be notified with an extra IGMP report
> +	to start forwarding the IGMP traffic over the new selected slave.

More nits.  How about (with some added guesses on my part about
direction)

This option is useful for bonding modes balance-rr (0), active-backup
(1), balance-tlb (5) and balance-alb (6), in which a failover can switch
the outgoing IGMP traffic from one slave to another.  Therefore a fresh
IGMP report must be issued to cause the switch to forward the incoming
IGMP traffic over the newly selected slave.

rick jones



^ permalink raw reply

* [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
From: Neil Horman @ 2011-05-25 17:16 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jay Vosburgh, Andy Gospodarek, nicolas.2p.debian,
	David S. Miller
In-Reply-To: <1306265765-8257-1-git-send-email-nhorman@tuxdriver.com>

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/bonding/bond_alb.c   |    4 ----
 drivers/net/bonding/bond_main.c  |   16 ++++++++++------
 drivers/net/bonding/bond_sysfs.c |    6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 8f2d2e7..2df9276 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
 	struct tlb_client_info *new_hashtbl;
 	int i;
 
-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
-
 	new_hashtbl = kzalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
 	int i;
 
-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
-
 	new_hashtbl = kmalloc(size, GFP_KERNEL);
 	if (!new_hashtbl) {
 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..59bf5c5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			   bond_dev->name, slave_dev->name);
 	}
 
-	/* bond must be initialized by bond_open() before enslaving */
-	if (!(bond_dev->flags & IFF_UP)) {
-		pr_warning("%s: master_dev is not up in bond_enslave\n",
-			   bond_dev->name);
-	}
-
 	/* already enslaved */
 	if (slave_dev->flags & IFF_SLAVE) {
 		pr_debug("Error, Device was already enslaved\n");
@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
 
+	/*
+	 * Initialize locks that may be requiered during
+	 * en/deslave operations.  All of the bond_open work
+	 * (of which this is part) should really be moved to
+	 * a phase prior to dev_open
+	 */
+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
+
 	bond->wq = create_singlethread_workqueue(bond_dev->name);
 	if (!bond->wq)
 		return -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..bb1319f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	struct net_device *dev;
 	struct bonding *bond = to_bond(d);
 
-	/* Quick sanity check -- is the bond interface up? */
-	if (!(bond->dev->flags & IFF_UP)) {
-		pr_warning("%s: doing slave updates when interface is down.\n",
-			   bond->dev->name);
-	}
-
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-- 
1.7.5.2


^ permalink raw reply related

* [PATCH] Add Fujitsu 1000base-SX PCI ID to tg3
From: Meelis Roos @ 2011-05-25 15:43 UTC (permalink / raw)
  To: Matt Carlson, Michael Chan; +Cc: netdev

This patch adds the PCI ID of Fujitsu 1000base-SX NIC to tg3 driver. 
Tested to detect the card, MAC and serdes, not tested with link at the 
moment since I have no fiber switch here. I did not add new constants to 
the pci_ids.h header file since these constants are used only here.

Signed-off-by: Meelis Roos <mroos@linux.ee>

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 7a5daef..7b1901e 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -273,6 +273,7 @@ static DEFINE_PCI_DEVICE_TABLE(tg3_pci_tbl) = {
 	{PCI_DEVICE(PCI_VENDOR_ID_ALTIMA, PCI_DEVICE_ID_ALTIMA_AC1003)},
 	{PCI_DEVICE(PCI_VENDOR_ID_ALTIMA, PCI_DEVICE_ID_ALTIMA_AC9100)},
 	{PCI_DEVICE(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_TIGON3)},
+	{PCI_DEVICE(0x10cf, 0x11a2)}, /* Fujitsu 1000base-SX with BCM5703SKHB */
 	{}
 };
 

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply related

* Bad behaviour when unintentionally mixing ipv4 and ipv6 addresses
From: Marcus Meissner @ 2011-05-25 15:59 UTC (permalink / raw)
  To: netdev; +Cc: max

[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]

Hi,

By chance Reinhard Max spotted an interesting flaw in Linux bind(2)...

If you create a IPv4 socket and then incorrectly bind(2) to a IPv6 address
(which you got from getaddrinfo(3) or similar), the socket will be bound
to INADDR_ANY.

The reason is that the kernel just takes the sockaddr_in6 struct and
evaluates it as a sockaddr_in struct, with the port being OK, but the IPv4
sin_addr overlaying the IPv6 sin6_flowinfo field.

As the sin6_flowinfo field is usually 0, your service can end up listening
to the world.

A testprogram that you can strace is attached, run netstat -apn |grep 12345
afterwards to see it binds 0.0.0.0:12345.

Perhaps add a check like the one below? (untested)
Or use if (addr->sin_family == AF_INET6) to just catch the IPv6 case?

Ciao, Marcus


Subject: [PATCH] net/ipv4: Check for mistakenly passed in non-IPv4 address

Hi,

Check against mistakenly passing in IPv6 addresses (which would result
in an INADDR_ANY bind) or similar incompatible sockaddrs.

Ciao, Marcus

Signed-off-by: Marcus Meissner <meissner@suse.de>
Cc: Reinhard Max <max@suse.de>
---
 net/ipv4/af_inet.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cc14631..9c19260 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -465,6 +465,9 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	if (addr_len < sizeof(struct sockaddr_in))
 		goto out;
 
+	if (addr->sin_family != AF_INET)
+		goto out;
+
 	chk_addr_ret = inet_addr_type(sock_net(sk), addr->sin_addr.s_addr);
 
 	/* Not specified by any standard per-se, however it breaks too
-- 
1.7.4.1


[-- Attachment #2: xx.c --]
[-- Type: text/x-c++src, Size: 845 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char **argv) {
	int			fd,fd2;
	struct sockaddr_in6	saddrin,saddrin2;
	size_t			len;

	fd = socket(AF_INET,SOCK_STREAM,0);
	if (fd == -1) {
		perror("socket");
		exit(1);
	}
	memset(&saddrin,0,sizeof(saddrin));

	/*memcpy(&saddrin.sin6_addr, &in6addr_loopback, sizeof(saddrin.sin6_addr));*/
	memset(&saddrin.sin6_addr, 0x42, sizeof(saddrin.sin6_addr));

	saddrin.sin6_family = AF_INET6;
	saddrin.sin6_port = htons(12345);
	if (-1 == bind(fd, (struct sockaddr*)&saddrin, sizeof(saddrin))) {
		perror("bind");
		exit(1);
	}
	if (-1 == listen(fd, 5)) {
		perror("listen");
		exit(1);
	}
	len  = sizeof(saddrin2);
	fd2 = accept (fd, (struct sockaddr*)&saddrin2, &len);
	if (fd2 == -1) perror("accept");
	close(fd);
	return 0;
}

^ permalink raw reply related

* Re: [PATCH 17/34] hdlcdrv: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 15:24 UTC (permalink / raw)
  To: Thomas Sailer
  Cc: linux-kbuild, lkml, linux-hams, Linux Kernel Network Developers
In-Reply-To: <1302015790.4124.54.camel@xbox360.hq.axsem.com>

On 5.4.2011 17:03, Thomas Sailer wrote:
> On Tue, 2011-04-05 at 16:59 +0200, Michal Marek wrote:
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>>
>> Cc: Thomas Sailer<t.sailer@alumni.ethz.ch>
>> Cc: linux-hams@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek<mmarek@suse.cz>
> Acked-By: Thomas Sailer<t.sailer@alumni.ethz.ch>
>
> Ok with me

I didn't find the commit in today's linux-next, so I applied this to
kbuild-2.6.git#trivial.

Michal

^ permalink raw reply

* Re: [PATCH 15/34] baycom: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 15:23 UTC (permalink / raw)
  To: Thomas Sailer
  Cc: linux-kbuild, linux-hams, Linux Kernel Network Developers, lkml
In-Reply-To: <1302015780.4124.53.camel@xbox360.hq.axsem.com>

On 5.4.2011 17:03, Thomas Sailer wrote:
> On Tue, 2011-04-05 at 16:59 +0200, Michal Marek wrote:
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>>
>> Cc: Thomas Sailer<t.sailer@alumni.ethz.ch>
>> Cc: linux-hams@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek<mmarek@suse.cz>
> Acked-By: Thomas Sailer<t.sailer@alumni.ethz.ch>

I didn't found the commit in today's linux-next, so I applied this to 
kbuild-2.6.git#trivial.

Michal

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Geert Uytterhoeven @ 2011-05-25 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <20110525131746.GA19118@elte.hu>

On Wed, May 25, 2011 at 15:17, Ingo Molnar <mingo@elte.hu> wrote:
> * Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Wed, May 25, 2011 at 14:47, Ingo Molnar <mingo@elte.hu> wrote:
>> > * Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> >
>> >> > But at least the primary, 'native' syscall table of every arch
>> >> > could be kept rather fresh via generic enumeration.
>> >>
>> >> So we can start all over at offset 501 (alpha just started using
>> >> 500) with a unified, clean, and compressed list of syscalls? Or do
>> >> we have some more other-os-compat syscalls around in this range?
>> >
>> > No, that would leave a big hole in the syscall table of most
>> > architectures.
>>
>> Sure, but we could (a) optimize for the case where the syscall number is
>> larger than 500 and/or (b) drop support for syscall numbers smaller than
>> 501, depending on a config option.
>
> Dunno why there is so much desire to complicate and break
> well-working ABIs while we have a 14+ MLOC kernel with so much code
> in it that is in dire need to be improved! :-)

Because we (think we) need less active brain cells to write emails that to code.
So when we're not "active" enough to hack, we tend to respond to long winding
getting off-topic email threads...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Neil Horman @ 2011-05-25 15:20 UTC (permalink / raw)
  To: Américo Wang
  Cc: Andy Gospodarek, Linux Kernel Network Developers, David Miller,
	Jay Vosburgh
In-Reply-To: <BANLkTinoRXmEAp0mbZdxSmMbntsE-QeRsQ@mail.gmail.com>

On Wed, May 25, 2011 at 08:43:21PM +0800, Américo Wang wrote:
> On Wed, May 25, 2011 at 12:33 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> > This all sounds like change for the sake of change to me.  I don't see any
> > compelling argument for moving bonding (or bridging or vlans, etc) around at
> > all.  All of these software drivers have feet in multple subsystems, but that
> > just means that theres not going to be a compelling argument to move them any
> > place,  at least not without an immediate subsequent argument that it really
> > belonged back where it was.  Unless you can show a solid benefit to moving the
> > code, I don't see why any reorganization is needed.
> 
> Well, as a people who worked on bonding code, I have no problem to know
> bonding code is under drivers/net/, but for people who don't know this, probably
> net/ is the first place they want to search.
> 
you've asked them?

cd git/net-next-2.6
find . -name '*bond*'

Thats going to find bonding code for you regardless of which place its in.  If
someone sufficiently unversed in working with a source tree goes looking for
some code by randomly poking around in directories, then they have bigger
problems than not finding what they want in the first place they look.  If thats
the only thing this move is meant to solve, it seems like a non-problem to me.

> The other similar thing is that pktgen is in net/core/ while netconsole is in
> drivers/net/, which seems a little strange too.
> 
Its is, and loopback is in /drivers/net, as is tun-tap and xen-netfront (none of
which touch actual hardware), as is slip and ppp, which are net drivers, but
only operate on non net hardware. ppoe is another example which operates on
ethernet, but sits on top of a secondary physical device (and so has no real
hardware itself).  These could all arguably be moved to /net, because they have
no real hardware, but are in drivers because they implement instances of the net
driver interface.

> vlan vs macvlan is the third example.
> 
yes, it is, macvlan, like you assert could be moved, but you could just as
easily move vlan to /drivers because it implements an instance of the net device
driver interface.

The bottom line is, sometimes things aren't black and white, they're gray. And
we put code where it makes the most sense at the time.  We move it when it
makes sense to, but I don't hear any compelling argument to make that move.
Yes, we're not always consistent with where we put hardwareless drivers, but to
make a policy and shove everything to one place or another doesn't any more
sense.  If we do that we either wind up with things that we think of as drivers
in the /net directory, or we wind up with stuff thats really protocol oriented
in the /drivers directory.  And if the major problem we solve by doing so is
making life easier for someone who otherwise wouldn't be able to find said code
with a quick grep or find operation, it really doesn't seem like its worthwhile
to me

> In short, there are three callers of netdev_rx_handler_register(), macvlan,
> bonding and bridge, only bridge code stays in net/ directory.
Why is calling netdev_rx_handler_register a gating factor here?

Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Neil Horman @ 2011-05-25 15:01 UTC (permalink / raw)
  To: Américo Wang
  Cc: Andy Gospodarek, Linux Kernel Network Developers, David Miller,
	Jay Vosburgh
In-Reply-To: <BANLkTi=UCH8+ZVFwZ96YvFuDqDLwq8ihkQ@mail.gmail.com>

On Wed, May 25, 2011 at 08:32:43PM +0800, Américo Wang wrote:
> On Tue, May 24, 2011 at 11:03 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> > On Tue, May 24, 2011 at 10:00:23PM +0800, Américo Wang wrote:
> >> On Mon, May 23, 2011 at 11:13 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> >> > On Mon, May 23, 2011 at 08:45:14PM +0800, Américo Wang wrote:
> >> >> Hello, Jay, Andy,
> >> >>
> >> >> Is there any peculiar reason that bonding code has to stay
> >> >> in drivers/net/ directory?
> >> >>
> >> >> Since bonding and bridge are somewhat similar, both of
> >> >> which are used to "bond" two or more devices into one,
> >> >> and bridge code is already in net/bridge/, so I think it also
> >> >> makes sense to move bonding code into net/bonding/ too.
> >> >>
> >> >> This could also help to grep the source more easily. :)
> >> >>
> >> >> What do you think about the patch below?
> >> >> (Note, this patch is against net-next-2.6.)
> >> >>
> >> >
> >> > I would rather keep the code for bonding in drivers/net since it is
> >> > really a pure device (though not directly tied to any specific
> >> > hardware) rather than a device + forwarding or management code.
> >>
> >> Is this a reason strong enough to leave it to drivers/net/ ?
> >> I think it is generic enough to be moved to net/ directory... :-/
> >>
> >
> > I think the distinction is an important one and is one of the main
> > reasons why I would like to see bonding stay in drivers/net.
> >
> 
> Is this a rule that we judge if a piece of code should be in net/
> or drivers/net/ ? For me, that big difference between these
> two directory is clearly if the code is generic or hardware/platform
> independent.
> 
While thats a fine rule to draw a distinction on, it also creates other
organizational oddities.  By this reasoning, the loopback/tun-tap and xen
netfront drivers should also be moved to /net.  While this might be an ok move
to make, I think we can all agree, that while they don't touch specific
hardware, they implement instances of the driver model, and as such are
reasonably placed in /drivers.

> >> >
> >> > It has bothered me for a long time that the code just to manage the VLAN
> >> > and bridge devices sits outside of drivers/net, but I've never proposed
> >> > a patch to move the files as I suspect the maintainers of that code
> >> > would like to keep it all together.  Maybe it is time to do that.
> >> >
> >>
> >> You mean move net/8021q/ to drivers/net/8021q/ ?
> >>
> >
> > No.  I'm talking about the parts in the bridging and vlan code that
> > specifically setup devices, not all of the code.  I would be happier
> > if code that created objects of type net_device_ops all lived in
> > drivers/net.  Then all the drivers (real, stacked, or virtual) are in
> > the same place.
> 
> This would make grepping the source code more harder
> than it is, at least now we can grepping all bridge source code
> in net/bridge/, for example.
> 
This is really a false assertion.  Theres nothing more or less hard about
finding bonding code in /drivers than there is in /net.  grep and find let you
locate the code in either place equally well, and cscope really makes it all
moot anyway.

Neil

> Actually the vlan case you mentioned can be another example
> to show that to moving bonding code into net/ makes sense.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH net-2.6] bnx2x: protect sequence increment with mutex
From: Dmitry Kravkov @ 2011-05-25 14:55 UTC (permalink / raw)
  To: davem, netdev@vger.kernel.org; +Cc: Eilon Greenstein


Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
index a97d9be..4b70311 100644
--- a/drivers/net/bnx2x/bnx2x_main.c
+++ b/drivers/net/bnx2x/bnx2x_main.c
@@ -2222,12 +2222,13 @@ static void bnx2x_pmf_update(struct bnx2x *bp)
 u32 bnx2x_fw_command(struct bnx2x *bp, u32 command, u32 param)
 {
 	int mb_idx = BP_FW_MB_IDX(bp);
-	u32 seq = ++bp->fw_seq;
+	u32 seq;
 	u32 rc = 0;
 	u32 cnt = 1;
 	u8 delay = CHIP_REV_IS_SLOW(bp) ? 100 : 10;
 
 	mutex_lock(&bp->fw_mb_mutex);
+	seq = ++bp->fw_seq;
 	SHMEM_WR(bp, func_mb[mb_idx].drv_mb_param, param);
 	SHMEM_WR(bp, func_mb[mb_idx].drv_mb_header, (command | seq));
 
-- 
1.7.2.2





^ permalink raw reply related

* [PATCH] bonding: documentation and code cleanup for resend_igmp
From: Flavio Leitner @ 2011-05-25 14:44 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek, Flavio Leitner

Improves the documentation about how IGMP resend parameter
works, fix two missing checks and coding style issues.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
 Documentation/networking/bonding.txt |   13 +++++++++++--
 drivers/net/bonding/bond_main.c      |   12 +++++++-----
 drivers/net/bonding/bond_sysfs.c     |   10 +++++-----
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1f45bd8..683ef76 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -770,8 +770,17 @@ resend_igmp
 	a failover event. One membership report is issued immediately after
 	the failover, subsequent packets are sent in each 200ms interval.
 
-	The valid range is 0 - 255; the default value is 1. This option
-	was added for bonding version 3.7.0.
+	The valid range is 0 - 255; the default value is 1. A value of 0
+	prevents the IGMP membership report to be issued due the failover
+	event.
+
+	This option is useful for bonding modes balance-rr or 0,
+	active-backup or 1, balance-tlb or 5, balance-alb or 6, which a
+	failover can move the IGMP traffic from one slave to another.
+	Therefore, the switch must be notified with an extra IGMP report
+	to start forwarding the IGMP traffic over the new selected slave.
+
+	This option was added for bonding version 3.7.0.
 
 3. Configuring Bonding Devices
 ==============================
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6dc4284..86a8001 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -852,7 +852,7 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
 {
 	struct bonding *bond = container_of(work, struct bonding,
-							mcast_work.work);
+					    mcast_work.work);
 	bond_resend_igmp_join_requests(bond);
 }
 
@@ -1172,10 +1172,12 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 	}
 
 	/* resend IGMP joins since active slave has changed or
-	 * all were sent on curr_active_slave */
-	if (((USES_PRIMARY(bond->params.mode) && new_active) ||
-	     bond->params.mode == BOND_MODE_ROUNDROBIN) &&
-	    netif_running(bond->dev)) {
+	 * all were sent on curr_active_slave.
+	 * resend only if bond is brought up with the affected
+	 * bonding modes and the retransmission is enabled */
+	if (netif_running(bond->dev) && (bond->params.resend_igmp > 0) &&
+	    ((USES_PRIMARY(bond->params.mode) && new_active) ||
+	     bond->params.mode == BOND_MODE_ROUNDROBIN)) {
 		bond->igmp_retrans = bond->params.resend_igmp;
 		queue_delayed_work(bond->wq, &bond->mcast_work, 0);
 	}
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4059bfc..a32848a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1539,8 +1539,8 @@ static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
  * Show and set the number of IGMP membership reports to send on link failure
  */
 static ssize_t bonding_show_resend_igmp(struct device *d,
-					 struct device_attribute *attr,
-					 char *buf)
+					struct device_attribute *attr,
+					char *buf)
 {
 	struct bonding *bond = to_bond(d);
 
@@ -1548,8 +1548,8 @@ static ssize_t bonding_show_resend_igmp(struct device *d,
 }
 
 static ssize_t bonding_store_resend_igmp(struct device *d,
-					  struct device_attribute *attr,
-					  const char *buf, size_t count)
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
 {
 	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
@@ -1561,7 +1561,7 @@ static ssize_t bonding_store_resend_igmp(struct device *d,
 		goto out;
 	}
 
-	if (new_value < 0) {
+	if (new_value < 0 || new_value > 255) {
 		pr_err("%s: Invalid resend_igmp value %d not in range 0-255; rejected.\n",
 		       bond->dev->name, new_value);
 		ret = -EINVAL;
-- 
1.7.4


^ permalink raw reply related

* Re: [PATCH] bonding:update xmit_hash_policy description
From: Andy Gospodarek @ 2011-05-25 14:40 UTC (permalink / raw)
  To: Weiping Pan
  Cc: jpirko, Jay Vosburgh, Andy Gospodarek, open list:BONDING DRIVER,
	open list
In-Reply-To: <1306317904-18740-1-git-send-email-panweiping3@gmail.com>

On Wed, May 25, 2011 at 06:05:04PM +0800, Weiping Pan wrote:
> xmit_hash_policy supports layer2+3 now,
> and both balance-xor and 802.3ad use xmit_hash_policy.
> 
> Signed-off-by: Weiping Pan <panweiping3@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 088fd84..d1f5606 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -147,8 +147,7 @@ MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner "
>  module_param(ad_select, charp, 0);
>  MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
>  module_param(xmit_hash_policy, charp, 0);
> -MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
> -				   ", 1 for layer 3+4");
> +MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method: 0 for layer 2 (default), 1 for layer 3+4, 2 for layer 2+3");

This is a good idea and a definite bug-fix.  Thanks for posting it.

I do not want to create a description that is longer than 80 characters
in the source if I can help it.  I went ahead and cleaned up all the
descriptions and have posted a followup patch.

^ permalink raw reply

* [PATCH net-next-2.6] bonding: cleanup module option descriptions
From: Andy Gospodarek @ 2011-05-25 14:41 UTC (permalink / raw)
  To: netdev; +Cc: Weiping Pan

Weiping Pan noticed that the module option description for
xmit_hash_policy was incorrect and was nice enough to post a patch to
fix it.  The text was correct, but created a line over 80 characters and
I would rather not add those.  I realized I could take a few minutes and
clean up all the descriptions and things would look much better.  This
is the result.

Based on patch from Weiping Pan <panweiping3@gmail.com>.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net> 
CC: Weiping Pan <panweiping3@gmail.com>

---
 drivers/net/bonding/bond_main.c |   34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..31a61d3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -113,9 +113,11 @@ MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
 module_param(tx_queues, int, 0);
 MODULE_PARM_DESC(tx_queues, "Max number of transmit queues (default = 16)");
 module_param_named(num_grat_arp, num_peer_notif, int, 0644);
-MODULE_PARM_DESC(num_grat_arp, "Number of peer notifications to send on failover event (alias of num_unsol_na)");
+MODULE_PARM_DESC(num_grat_arp, "Number of peer notifications to send on "
+			       "failover event (alias of num_unsol_na)");
 module_param_named(num_unsol_na, num_peer_notif, int, 0644);
-MODULE_PARM_DESC(num_unsol_na, "Number of peer notifications to send on failover event (alias of num_grat_arp)");
+MODULE_PARM_DESC(num_unsol_na, "Number of peer notifications to send on "
+			       "failover event (alias of num_grat_arp)");
 module_param(miimon, int, 0);
 MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
 module_param(updelay, int, 0);
@@ -127,7 +129,7 @@ module_param(use_carrier, int, 0);
 MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; "
 			      "0 for off, 1 for on (default)");
 module_param(mode, charp, 0);
-MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
+MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
 		       "1 for active-backup, 2 for balance-xor, "
 		       "3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, "
 		       "6 for balance-alb");
@@ -142,27 +144,35 @@ MODULE_PARM_DESC(primary_reselect, "Reselect primary slave "
 				   "2 for only on active slave "
 				   "failure");
 module_param(lacp_rate, charp, 0);
-MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner "
-			    "(slow/fast)");
+MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner; "
+			    "0 for slow, 1 for fast");
 module_param(ad_select, charp, 0);
-MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
+MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic; "
+			    "0 for stable (default), 1 for bandwidth, "
+			    "2 for count");
 module_param(xmit_hash_policy, charp, 0);
-MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
-				   ", 1 for layer 3+4");
+MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
+				   "0 for layer 2 (default), 1 for layer 3+4, "
+				   "2 for layer 2+3");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
 module_param(arp_validate, charp, 0);
-MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all");
+MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes; "
+			       "0 for none (default), 1 for active, "
+			       "2 for backup, 3 for all");
 module_param(fail_over_mac, charp, 0);
-MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to the same MAC.  none (default), active or follow");
+MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to "
+				"the same MAC; 0 for none (default), "
+				"1 for active, 2 for follow");
 module_param(all_slaves_active, int, 0);
 MODULE_PARM_DESC(all_slaves_active, "Keep all frames received on an interface"
-				     "by setting active flag for all slaves.  "
+				     "by setting active flag for all slaves; "
 				     "0 for never (default), 1 for always.");
 module_param(resend_igmp, int, 0);
-MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
+MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on "
+			      "link failure");
 
 /*----------------------------- Global variables ----------------------------*/
 

^ permalink raw reply related

* [PATCH] sch_sfq: fix peek() implementation
From: Eric Dumazet @ 2011-05-25 14:40 UTC (permalink / raw)
  To: David Miller
  Cc: Jarek Poplawski, Patrick McHardy, netdev, Jesper Dangaard Brouer

Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
sfq_peek() can return a different skb that would be normally dequeued by
sfq_dequeue() [ if current slot->allot is negative ]

Use generic qdisc_peek_dequeued() instead of custom implementation, to
get consistent result.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
---
 net/sched/sch_sfq.c |   14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index b1d00f8..b6ea6af 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -414,18 +414,6 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 }
 
 static struct sk_buff *
-sfq_peek(struct Qdisc *sch)
-{
-	struct sfq_sched_data *q = qdisc_priv(sch);
-
-	/* No active slots */
-	if (q->tail == NULL)
-		return NULL;
-
-	return q->slots[q->tail->next].skblist_next;
-}
-
-static struct sk_buff *
 sfq_dequeue(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
@@ -706,7 +694,7 @@ static struct Qdisc_ops sfq_qdisc_ops __read_mostly = {
 	.priv_size	=	sizeof(struct sfq_sched_data),
 	.enqueue	=	sfq_enqueue,
 	.dequeue	=	sfq_dequeue,
-	.peek		=	sfq_peek,
+	.peek		=	qdisc_peek_dequeued,
 	.drop		=	sfq_drop,
 	.init		=	sfq_init,
 	.reset		=	sfq_reset,



^ permalink raw reply related

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Ingo Molnar @ 2011-05-25 13:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <BANLkTimQcyX9XSY__9Bpg3tg7-P7gtOTkw@mail.gmail.com>


* Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> On Wed, May 25, 2011 at 14:47, Ingo Molnar <mingo@elte.hu> wrote:
> > * Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> >> > But at least the primary, 'native' syscall table of every arch
> >> > could be kept rather fresh via generic enumeration.
> >>
> >> So we can start all over at offset 501 (alpha just started using
> >> 500) with a unified, clean, and compressed list of syscalls? Or do
> >> we have some more other-os-compat syscalls around in this range?
> >
> > No, that would leave a big hole in the syscall table of most
> > architectures.
> 
> Sure, but we could (a) optimize for the case where the syscall number is
> larger than 500 and/or (b) drop support for syscall numbers smaller than
> 501, depending on a config option.

Dunno why there is so much desire to complicate and break 
well-working ABIs while we have a 14+ MLOC kernel with so much code 
in it that is in dire need to be improved! :-)

Yes, we can reduce the syscall addition pain via the 
ARCH_SYSCALLS_BASE trick, but we should really forget about 
*removing* (or reordering) syscall numbers as the advantages are 
marginal at best while the disadvantages are huge.

Messy syscall tables are irreversibly ingrained in tens of millions 
of systems and there's nothing we can do about that. We can improve 
the future shape of syscall tables and we can try not to make new 
mistakes, and that's a large enough job in itself ;-)

Thanks,

	Ingo

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Geert Uytterhoeven @ 2011-05-25 13:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <20110525124741.GC29300@elte.hu>

On Wed, May 25, 2011 at 14:47, Ingo Molnar <mingo@elte.hu> wrote:
> * Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
>> > But at least the primary, 'native' syscall table of every arch
>> > could be kept rather fresh via generic enumeration.
>>
>> So we can start all over at offset 501 (alpha just started using
>> 500) with a unified, clean, and compressed list of syscalls? Or do
>> we have some more other-os-compat syscalls around in this range?
>
> No, that would leave a big hole in the syscall table of most
> architectures.

Sure, but we could (a) optimize for the case where the syscall number is
larger than 500 and/or (b) drop support for syscall numbers smaller than
501, depending on a config option.

> So what would be needed is for each architecture to define a 'generic
> syscall table base index', ARCH_SYSCALL_BASE or so, and the generic
> syscalls would be added for that.
>
> Alpha would have 501, the others lower numbers.
>
> The only general assumption we can rely on is that there's a range of
> not yet used syscall numbers starting at the end of the current
> syscall table.

Yep, that would work too.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox