Netdev List
* [PATCH net-next-2.6] bonding: introduce primary_passive option
From: Jiri Pirko @ 2009-09-01 17:42 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel, nicolas.2p.debian

(updated)

In some cases it is not desirable to switch back to the primary interface when
its link recovers; it is better to stay with the currently active one, because
we need to avoid packet loss as much as we can. This is solved by introducing
the primary_passive option. Note that a newly enslaved primary slave is set as
the current active slave no matter what.

This patch depends on the following one:
[net-next-2.6] bonding: make ab_arp select active slaves as other modes
http://patchwork.ozlabs.org/patch/32684/

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index d5181ce..e70fa8e 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -614,6 +614,17 @@ primary
 
 	The primary option is only valid for active-backup mode.
 
+primary_passive
+
+	Specifies the behaviour of the primary slave when its link
+	recovery is detected. By default (value 0) the primary slave is
+	set as the active slave immediately after the link recovery. If
+	the value is 1 or 2, the current active slave does not change as
+	long as its link status does not change. This prevents the
+	bonding device from flip-flopping. If the value is 1, this happens
+	only if the speed and duplex of the primary slave are not higher
+	than the current active slave's. If the value is 2, it always happens.
+
 updelay
 
 	Specifies the time, in milliseconds, to wait before enabling a
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 699bfdd..65066c1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -94,6 +94,7 @@ static int downdelay;
 static int use_carrier	= 1;
 static char *mode;
 static char *primary;
+static int primary_passive;
 static char *lacp_rate;
 static char *ad_select;
 static char *xmit_hash_policy;
@@ -126,6 +127,9 @@ MODULE_PARM_DESC(mode, "Mode of operation : 0 for balance-rr, "
 		       "6 for balance-alb");
 module_param(primary, charp, 0);
 MODULE_PARM_DESC(primary, "Primary network device to use");
+module_param(primary_passive, int, 0);
+MODULE_PARM_DESC(primary_passive, "Do not set primary slave active once it comes up; "
+			       "0 for off (default), 1 for on only if speed of primary is not higher, 2 for on");
 module_param(lacp_rate, charp, 0);
 MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner "
 			    "(slow/fast)");
@@ -1070,6 +1074,25 @@ out:
 
 }
 
+static bool bond_should_loose_active(struct bonding *bond)
+{
+	struct slave *prim = bond->primary_slave;
+	struct slave *curr = bond->curr_active_slave;
+
+	if (!prim || !curr || curr->link != BOND_LINK_UP)
+		return true;
+	if (bond->force_primary) {
+		bond->force_primary = false;
+		return true;
+	}
+	if (bond->params.primary_passive == 1 &&
+	    (prim->speed < curr->speed ||
+	     (prim->speed == curr->speed && prim->duplex <= curr->duplex)))
+		return false;
+	if (bond->params.primary_passive == 2)
+		return false;
+	return true;
+}
 
 /**
  * find_best_interface - select the best available slave to be the active one
@@ -1094,7 +1117,8 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	}
 
 	if ((bond->primary_slave) &&
-	    bond->primary_slave->link == BOND_LINK_UP) {
+	    bond->primary_slave->link == BOND_LINK_UP &&
+	    bond_should_loose_active(bond)) {
 		new_active = bond->primary_slave;
 	}
 
@@ -1675,8 +1699,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
 		/* if there is a primary slave, remember it */
-		if (strcmp(bond->params.primary, new_slave->dev->name) == 0)
+		if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
 			bond->primary_slave = new_slave;
+			bond->force_primary = true;
+		}
 	}
 
 	write_lock_bh(&bond->curr_slave_lock);
@@ -4942,6 +4968,18 @@ static int bond_check_params(struct bond_params *params)
 		primary = NULL;
 	}
 
+	if (primary) {
+		if ((primary_passive != 0) && (primary_passive != 1) &&
+		    (primary_passive != 2)) {
+			pr_warning(DRV_NAME
+				   ": Warning: primary_passive module parameter "
+				   "(%d), not of valid value (0/1/2), so it was "
+				   "set to 0\n",
+				   primary_passive);
+			primary_passive = 0;
+		}
+	}
+
 	if (fail_over_mac) {
 		fail_over_mac_value = bond_parse_parm(fail_over_mac,
 						      fail_over_mac_tbl);
@@ -4973,6 +5011,7 @@ static int bond_check_params(struct bond_params *params)
 	params->use_carrier = use_carrier;
 	params->lacp_fast = lacp_fast;
 	params->primary[0] = 0;
+	params->primary_passive = primary_passive;
 	params->fail_over_mac = fail_over_mac_value;
 
 	if (primary) {
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6044e12..e813d48 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1212,6 +1212,59 @@ static DEVICE_ATTR(primary, S_IRUGO | S_IWUSR,
 		   bonding_show_primary, bonding_store_primary);
 
 /*
+ * Show and set the primary_passive flag.
+ */
+static ssize_t bonding_show_primary_passive(struct device *d,
+					    struct device_attribute *attr,
+					    char *buf)
+{
+	struct bonding *bond = to_bond(d);
+
+	return sprintf(buf, "%d\n", bond->params.primary_passive);
+}
+
+static ssize_t bonding_store_primary_passive(struct device *d,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count)
+{
+	int new_value, ret = count;
+	struct bonding *bond = to_bond(d);
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (sscanf(buf, "%d", &new_value) != 1) {
+		pr_err(DRV_NAME
+		       ": %s: no primary_passive value specified.\n",
+		       bond->dev->name);
+		ret = -EINVAL;
+		goto out;
+	}
+	if (new_value == 0 || new_value == 1 || new_value == 2) {
+		bond->params.primary_passive = new_value;
+		pr_info(DRV_NAME ": %s: Setting primary_passive to %d.\n",
+		       bond->dev->name, new_value);
+		if (new_value == 0 || new_value == 1) {
+			bond->force_primary = true;
+			read_lock(&bond->lock);
+			write_lock_bh(&bond->curr_slave_lock);
+			bond_select_active_slave(bond);
+			write_unlock_bh(&bond->curr_slave_lock);
+			read_unlock(&bond->lock);
+		}
+	} else {
+		pr_info(DRV_NAME
+		       ": %s: Ignoring invalid primary_passive value %d.\n",
+		       bond->dev->name, new_value);
+	}
+out:
+	rtnl_unlock();
+	return ret;
+}
+static DEVICE_ATTR(primary_passive, S_IRUGO | S_IWUSR,
+		   bonding_show_primary_passive, bonding_store_primary_passive);
+
+/*
  * Show and set the use_carrier flag.
  */
 static ssize_t bonding_show_carrier(struct device *d,
@@ -1500,6 +1553,7 @@ static struct attribute *per_bond_attrs[] = {
 	&dev_attr_num_unsol_na.attr,
 	&dev_attr_miimon.attr,
 	&dev_attr_primary.attr,
+	&dev_attr_primary_passive.attr,
 	&dev_attr_use_carrier.attr,
 	&dev_attr_active_slave.attr,
 	&dev_attr_mii_status.attr,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 6290a50..b6287e0 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -131,6 +131,7 @@ struct bond_params {
 	int lacp_fast;
 	int ad_select;
 	char primary[IFNAMSIZ];
+	int primary_passive;
 	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
 };
 
@@ -190,6 +191,7 @@ struct bonding {
 	struct   slave *curr_active_slave;
 	struct   slave *current_arp_slave;
 	struct   slave *primary_slave;
+	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
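
For readers who want to try the decision logic outside the kernel, here is a
minimal userspace sketch of the check added in bond_main.c. The model_* types
and names are hypothetical stand-ins for struct slave and struct bonding, not
kernel API; only the branch structure is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's link states and structures. */
enum model_link { MODEL_LINK_DOWN = 0, MODEL_LINK_UP = 1 };

struct model_slave {
	enum model_link link;
	unsigned int speed;	/* Mbit/s */
	unsigned int duplex;	/* 0 = half, 1 = full */
};

struct model_bond {
	struct model_slave *primary;
	struct model_slave *curr_active;
	bool force_primary;	/* one-shot: set when the primary is enslaved */
	int primary_passive;	/* 0, 1 or 2 */
};

/* Returns true when the primary slave should replace the current active one. */
bool model_should_lose_active(struct model_bond *bond)
{
	struct model_slave *prim = bond->primary;
	struct model_slave *curr = bond->curr_active;

	/* Nothing usable is active: the primary may always take over. */
	if (!prim || !curr || curr->link != MODEL_LINK_UP)
		return true;
	/* A forced switch is honoured once, then the flag is cleared. */
	if (bond->force_primary) {
		bond->force_primary = false;
		return true;
	}
	/* Mode 1: stay put unless the primary is strictly better. */
	if (bond->primary_passive == 1 &&
	    (prim->speed < curr->speed ||
	     (prim->speed == curr->speed && prim->duplex <= curr->duplex)))
		return false;
	/* Mode 2: always stay with the current active slave. */
	if (bond->primary_passive == 2)
		return false;
	return true;
}
```

With two equal 1000/full slaves, mode 1 keeps the current active slave; making
the primary faster lets it take over again, while mode 2 never switches unless
the switch is forced or the active link goes down.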

^ permalink raw reply related

* Re: UDP is bypassing qdisc statistics ....
From: Christoph Lameter @ 2009-09-01 21:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, Mark Smith, Jarek Poplawski, netdev
In-Reply-To: <4A9D448D.6050309@gmail.com>


Ok i got some meaningful statistics now. Why is the qlen zero for all
devices?

$ cat /proc/net/qdisc_stats
Type     Device  St    Bytes Packts Qlen Bklg  Drops Requeu Overlimit
TX0 root     lo   0        0      0    0    0      0      0      0
 RX root     lo   0        0      0    0    0      0      0      0
TX0 root   eth0   0    12463     91    0    0      0      0      0
TX1 root   eth0   0     2954     39    0    0      0      0      0
TX2 root   eth0   0     2132     23    0    0      0      0      0
TX3 root   eth0   0    29293    210    0    0      0      0      0
TX4 root   eth0   0     2805     31    0    0      0      0      0
TX5 root   eth0   0 1286862317 3762765    0    0 643253  30886      0
TX6 root   eth0   0     1310     17    0    0      0      0      0
TX7 root   eth0   0     2860     31    0    0      0      0      0

---
 net/sched/sch_api.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

Index: linux-2.6/net/sched/sch_api.c
===================================================================
--- linux-2.6.orig/net/sched/sch_api.c	2009-09-01 12:27:24.000000000 -0500
+++ linux-2.6/net/sched/sch_api.c	2009-09-01 12:40:30.000000000 -0500
@@ -1699,6 +1699,77 @@ static const struct file_operations psch
 	.llseek = seq_lseek,
 	.release = single_release,
 };
+
+static void dump_qdisc(struct seq_file *seq, struct net_device *dev,
+				struct Qdisc *q, char *inout, char *text)
+{
+	seq_printf(seq, "%3s %2s %6s %3lx %8lld %6d %4d %4d %6d %6d %6d\n",
+		inout, text, dev->name, q->state,
+		q->bstats.bytes, q->bstats.packets,
+		q->qstats.qlen, q->qstats.backlog, q->qstats.drops,
+		q->qstats.requeues, q->qstats.overlimits);
+}
+
+static void dump_qdisc_root(struct seq_file *seq, struct net_device *dev,
+					 struct Qdisc *root, char *inout)
+{
+	struct Qdisc *q;
+	int n = 0;
+
+	if (!root)
+		return;
+
+	dump_qdisc(seq, dev, root, inout, "root");
+
+	list_for_each_entry(q, &root->list, list) {
+		char buffer[10];
+
+		sprintf(buffer,"Q%d", ++n);
+		dump_qdisc(seq, dev, q, inout, buffer);
+	}
+}
+
+
+static int qdisc_show(struct seq_file *seq, void *v)
+{
+	struct net_device *dev;
+
+	seq_printf(seq, "Type     Device  St    Bytes Packts "
+			"Qlen Bklg  Drops Requeu Overlimit\n");
+
+	read_lock(&dev_base_lock);
+
+	for_each_netdev(&init_net, dev) {
+		struct netdev_queue *dev_queue;
+		int i;
+
+		for (i = 0; i < dev->real_num_tx_queues; i++) {
+			char buffer[10];
+
+			dev_queue = netdev_get_tx_queue(dev, i);
+			sprintf(buffer, "TX%d", i);
+			dump_qdisc_root(seq, dev, dev_queue->qdisc_sleeping, buffer);
+		}
+		dev_queue = &dev->rx_queue;
+		dump_qdisc_root(seq, dev, dev_queue->qdisc_sleeping, "RX");
+	}
+
+	read_unlock(&dev_base_lock);
+	return 0;
+}
+
+static int qdisc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, qdisc_show, PDE(inode)->data);
+}
+
+static const struct file_operations qdisc_fops = {
+	.owner = THIS_MODULE,
+	.open = qdisc_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
 #endif

 static int __init pktsched_init(void)
@@ -1706,6 +1777,7 @@ static int __init pktsched_init(void)
 	register_qdisc(&pfifo_qdisc_ops);
 	register_qdisc(&bfifo_qdisc_ops);
 	proc_net_fops_create(&init_net, "psched", 0, &psched_fops);
+	proc_net_fops_create(&init_net, "qdisc_stats", 0, &qdisc_fops);

 	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
 	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL);

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: use compare_ether_addr_64bits() in ALB
From: Jiri Pirko @ 2009-09-01 18:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List, Jay Vosburgh
In-Reply-To: <4A9D4C56.1090804@gmail.com>

Tue, Sep 01, 2009 at 06:31:18PM CEST, eric.dumazet@gmail.com wrote:
>We can speedup ether addresses compares using compare_ether_addr_64bits()
>instead of memcmp(). We make sure all operands are at least 8 bytes long and
>16bits aligned (or better, long word aligned if possible)
>
>Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>---
> drivers/net/bonding/bond_alb.c |   71 ++++++++++++++++---------------
> drivers/net/bonding/bonding.h  |    2
> 2 files changed, 38 insertions(+), 35 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 2108706..9b5936f 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -79,8 +79,15 @@
>  */
> #define RLB_PROMISC_TIMEOUT	10*ALB_TIMER_TICKS_PER_SEC
> 
>-static const u8 mac_bcast[ETH_ALEN] = {0xff,0xff,0xff,0xff,0xff,0xff};
>-static const u8 mac_v6_allmcast[ETH_ALEN] = {0x33,0x33,0x00,0x00,0x00,0x01};
>+#ifndef __long_aligned
>+#define __long_aligned __attribute__((aligned((sizeof(long)))))
>+#endif
>+static const u8 mac_bcast[ETH_ALEN] __long_aligned = {
>+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff
>+};
>+static const u8 mac_v6_allmcast[ETH_ALEN] __long_aligned = {
>+	0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>+};
> static const int alb_delta_in_ticks = HZ / ALB_TIMER_TICKS_PER_SEC;
> 
> #pragma pack(1)
>@@ -460,8 +467,8 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
> 
> 			if (assigned_slave) {
> 				rx_hash_table[index].slave = assigned_slave;
>-				if (memcmp(rx_hash_table[index].mac_dst,
>-					   mac_bcast, ETH_ALEN)) {
>+				if (compare_ether_addr_64bits(rx_hash_table[index].mac_dst,
>+							      mac_bcast)) {
> 					bond_info->rx_hashtbl[index].ntt = 1;
> 					bond_info->rx_ntt = 1;
> 					/* A slave has been removed from the
>@@ -575,7 +582,7 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla
> 		client_info = &(bond_info->rx_hashtbl[hash_index]);
> 
> 		if ((client_info->slave == slave) &&
>-		    memcmp(client_info->mac_dst, mac_bcast, ETH_ALEN)) {
>+		    compare_ether_addr_64bits(client_info->mac_dst, mac_bcast)) {
> 			client_info->ntt = 1;
> 			ntt = 1;
> 		}
>@@ -616,9 +623,9 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip)
> 		 * unicast mac address.
> 		 */
> 		if ((client_info->ip_src == src_ip) &&
>-		    memcmp(client_info->slave->dev->dev_addr,
>-			   bond->dev->dev_addr, ETH_ALEN) &&
>-		    memcmp(client_info->mac_dst, mac_bcast, ETH_ALEN)) {
>+		    compare_ether_addr_64bits(client_info->slave->dev->dev_addr,
>+			   bond->dev->dev_addr) &&
			^^^ I guess you wanted to add some whitespace here

Otherwise the patch looks good. I wanted to make a similar change a while ago.

Reviewed-by: Jiri Pirko <jpirko@redhat.com>

>+		    compare_ether_addr_64bits(client_info->mac_dst, mac_bcast)) {
> 			client_info->ntt = 1;
> 			bond_info->rx_ntt = 1;
> 		}
>@@ -645,7 +652,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> 		if ((client_info->ip_src == arp->ip_src) &&
> 		    (client_info->ip_dst == arp->ip_dst)) {
> 			/* the entry is already assigned to this client */
>-			if (memcmp(arp->mac_dst, mac_bcast, ETH_ALEN)) {
>+			if (compare_ether_addr_64bits(arp->mac_dst, mac_bcast)) {
> 				/* update mac address from arp */
> 				memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
> 			}
>@@ -680,7 +687,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> 		memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
> 		client_info->slave = assigned_slave;
> 
>-		if (memcmp(client_info->mac_dst, mac_bcast, ETH_ALEN)) {
>+		if (compare_ether_addr_64bits(client_info->mac_dst, mac_bcast)) {
> 			client_info->ntt = 1;
> 			bond->alb_info.rx_ntt = 1;
> 		} else {
>@@ -1046,21 +1053,18 @@ static void alb_change_hw_addr_on_detach(struct bonding *bond, struct slave *sla
> 	int perm_curr_diff;
> 	int perm_bond_diff;
> 
>-	perm_curr_diff = memcmp(slave->perm_hwaddr,
>-				slave->dev->dev_addr,
>-				ETH_ALEN);
>-	perm_bond_diff = memcmp(slave->perm_hwaddr,
>-				bond->dev->dev_addr,
>-				ETH_ALEN);
>+	perm_curr_diff = compare_ether_addr_64bits(slave->perm_hwaddr,
>+						   slave->dev->dev_addr);
>+	perm_bond_diff = compare_ether_addr_64bits(slave->perm_hwaddr,
>+						   bond->dev->dev_addr);
> 
> 	if (perm_curr_diff && perm_bond_diff) {
> 		struct slave *tmp_slave;
> 		int i, found = 0;
> 
> 		bond_for_each_slave(bond, tmp_slave, i) {
>-			if (!memcmp(slave->perm_hwaddr,
>-				    tmp_slave->dev->dev_addr,
>-				    ETH_ALEN)) {
>+			if (!compare_ether_addr_64bits(slave->perm_hwaddr,
>+						       tmp_slave->dev->dev_addr)) {
> 				found = 1;
> 				break;
> 			}
>@@ -1114,10 +1118,10 @@ static int alb_handle_addr_collision_on_attach(struct bonding *bond, struct slav
> 	 * check uniqueness of slave's mac address against the other
> 	 * slaves in the bond.
> 	 */
>-	if (memcmp(slave->perm_hwaddr, bond->dev->dev_addr, ETH_ALEN)) {
>+	if (compare_ether_addr_64bits(slave->perm_hwaddr, bond->dev->dev_addr)) {
> 		bond_for_each_slave(bond, tmp_slave1, i) {
>-			if (!memcmp(tmp_slave1->dev->dev_addr, slave->dev->dev_addr,
>-				    ETH_ALEN)) {
>+			if (!compare_ether_addr_64bits(tmp_slave1->dev->dev_addr,
>+						       slave->dev->dev_addr)) {
> 				found = 1;
> 				break;
> 			}
>@@ -1140,9 +1144,8 @@ static int alb_handle_addr_collision_on_attach(struct bonding *bond, struct slav
> 	bond_for_each_slave(bond, tmp_slave1, i) {
> 		found = 0;
> 		bond_for_each_slave(bond, tmp_slave2, j) {
>-			if (!memcmp(tmp_slave1->perm_hwaddr,
>-				    tmp_slave2->dev->dev_addr,
>-				    ETH_ALEN)) {
>+			if (!compare_ether_addr_64bits(tmp_slave1->perm_hwaddr,
>+						       tmp_slave2->dev->dev_addr)) {
> 				found = 1;
> 				break;
> 			}
>@@ -1157,9 +1160,8 @@ static int alb_handle_addr_collision_on_attach(struct bonding *bond, struct slav
> 		}
> 
> 		if (!has_bond_addr) {
>-			if (!memcmp(tmp_slave1->dev->dev_addr,
>-				    bond->dev->dev_addr,
>-				    ETH_ALEN)) {
>+			if (!compare_ether_addr_64bits(tmp_slave1->dev->dev_addr,
>+						       bond->dev->dev_addr)) {
> 
> 				has_bond_addr = tmp_slave1;
> 			}
>@@ -1313,7 +1315,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 	case ETH_P_IP: {
> 		const struct iphdr *iph = ip_hdr(skb);
> 
>-		if ((memcmp(eth_data->h_dest, mac_bcast, ETH_ALEN) == 0) ||
>+		if (!compare_ether_addr_64bits(eth_data->h_dest, mac_bcast) ||
> 		    (iph->daddr == ip_bcast) ||
> 		    (iph->protocol == IPPROTO_IGMP)) {
> 			do_tx_balance = 0;
>@@ -1327,7 +1329,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 		/* IPv6 doesn't really use broadcast mac address, but leave
> 		 * that here just in case.
> 		 */
>-		if (memcmp(eth_data->h_dest, mac_bcast, ETH_ALEN) == 0) {
>+		if (!compare_ether_addr_64bits(eth_data->h_dest, mac_bcast)) {
> 			do_tx_balance = 0;
> 			break;
> 		}
>@@ -1335,7 +1337,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 		/* IPv6 uses all-nodes multicast as an equivalent to
> 		 * broadcasts in IPv4.
> 		 */
>-		if (memcmp(eth_data->h_dest, mac_v6_allmcast, ETH_ALEN) == 0) {
>+		if (!compare_ether_addr_64bits(eth_data->h_dest, mac_v6_allmcast)) {
> 			do_tx_balance = 0;
> 			break;
> 		}
>@@ -1660,8 +1662,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
> 		struct slave *tmp_slave;
> 		/* find slave that is holding the bond's mac address */
> 		bond_for_each_slave(bond, tmp_slave, i) {
>-			if (!memcmp(tmp_slave->dev->dev_addr,
>-				    bond->dev->dev_addr, ETH_ALEN)) {
>+			if (!compare_ether_addr_64bits(tmp_slave->dev->dev_addr,
>+						       bond->dev->dev_addr)) {
> 				swap_slave = tmp_slave;
> 				break;
> 			}
>@@ -1739,7 +1741,8 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
> 	swap_slave = NULL;
> 
> 	bond_for_each_slave(bond, slave, i) {
>-		if (!memcmp(slave->dev->dev_addr, bond_dev->dev_addr, ETH_ALEN)) {
>+		if (!compare_ether_addr_64bits(slave->dev->dev_addr,
>+					       bond_dev->dev_addr)) {
> 			swap_slave = slave;
> 			break;
> 		}
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 6290a50..6824771 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -163,9 +163,9 @@ struct slave {
> 	u32    original_flags;
> 	u32    original_mtu;
> 	u32    link_failure_count;
>+	u8     perm_hwaddr[ETH_ALEN];
> 	u16    speed;
> 	u8     duplex;
>-	u8     perm_hwaddr[ETH_ALEN];
> 	struct ad_slave_info ad_info; /* HUGE - better to dynamically alloc */
> 	struct tlb_slave_info tlb_info;
> };
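
As a side note on why the operands need to be at least 8 bytes long: the idea
is to compare both addresses with one 64-bit load each and ignore the two
trailing bytes. Below is a portable userspace sketch of that idea. It is not
the kernel's compare_ether_addr_64bits() implementation, and the function name
is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch only: compare two 6-byte Ethernet addresses using a
 * single 64-bit XOR. Both buffers must have at least 8 readable bytes;
 * bytes 6 and 7 are padding and are masked out of the result.
 * Returns 0 if the addresses match, 1 otherwise.
 */
unsigned int ether_addr_differs_64(const uint8_t a[8], const uint8_t b[8])
{
	/* Building the mask from bytes keeps the code endian-neutral. */
	static const uint8_t mask_bytes[8] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00
	};
	uint64_t x, y, mask;

	/* memcpy() sidesteps alignment and strict-aliasing problems. */
	memcpy(&x, a, sizeof(x));
	memcpy(&y, b, sizeof(y));
	memcpy(&mask, mask_bytes, sizeof(mask));

	return ((x ^ y) & mask) != 0;
}
```

This is also why the patch moves perm_hwaddr in front of speed and duplex in
struct slave: the two bytes following the address must be readable, even
though their contents are ignored.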

^ permalink raw reply

* ipsec not forwarding (suspect SA issue)
From: Andrew Dickinson @ 2009-09-01 18:57 UTC (permalink / raw)
  To: netdev

Howdy netdev,

First, I'm not positive that this is the right list for this question,
so feel free to steer me in the right direction.  I'm trying to work
out an issue with ipsec not forwarding traffic from my LAN down my
tunnel.  I've walked through the troubleshooting-doc on the lartc site
and everything seems kosher...

Here's my setup.

I've got a linux-based router/firewall on the edge of my network with
two interfaces, $WAN and $LAN.  The router is MASQUERADING to the
internet.  My LAN is 10.0.0.0/24.  I'm trying to peer with a remote
network 10.254.0.0/23.  The remote network does not have internet
connectivity, so all non-10.254/23 traffic should traverse the VPN to
get to my router to go to the internet or my local LAN.

I'm using racoon and setkey to establish the VPN tunnel and BGP (via
quagga) to advertise routes into the remote network.  The routers are
using 169.254.255.0/30 for BGP.

The problem that I'm having is that traffic from my LAN to 10.254/23
is not going down the VPN tunnel; it just disappears.  I can see it
come in on the LAN interface, but I don't see it leave the WAN
interface as either unencrypted traffic or as esp traffic.  Traffic
from the router, however, works fine.

------ BEGIN racoon.conf ------
log info;

path pre_shared_key "/etc/racoon/psk.txt";


listen {
    adminsock "/var/run/racoon/racoon.sock" "root" "operator" 0660;
    isakmp <MY_IP>
}

timer {
    counter 5;
    interval 20 sec;
    persend 1;
    phase1 30 sec;
    phase2 15 sec;
}

remote anonymous {
    exchange_mode main,aggressive,base;
    lifetime time 28800 sec;
    proposal_check obey;
    dpd_delay 10;
    dpd_retry 10;
    dpd_maxfail 3;
    esp_frag 1396;
    proposal {
        encryption_algorithm aes;
        hash_algorithm sha1;
        authentication_method pre_shared_key;
        dh_group 2;
    }
}

sainfo anonymous {
    authentication_algorithm hmac_sha1;
    encryption_algorithm aes;
    lifetime time 3600 seconds;
    compression_algorithm deflate;
    pfs_group 2;
}

------- END racoon.conf -------

------ BEGIN setkey.conf -----
flush;
spdflush;


spdadd 169.254.255.1 169.254.255.2 any -P in ipsec
    esp/tunnel/REMOTE_IP-MY_IP/unique;
spdadd 169.254.255.2 169.254.255.1 any -P out ipsec
    esp/tunnel/MY_IP-REMOTE_IP/unique;

spdadd 10.254.0.0/23 0.0.0.0/0 any -P in ipsec
    esp/tunnel/REMOTE_IP-MY_IP/unique;
spdadd 0.0.0.0/0 10.254.0.0/23 any -P fwd ipsec
    esp/tunnel/MY_IP-REMOTE_IP/unique;
spdadd 0.0.0.0/0 10.254.0.0/23 any -P out ipsec
    esp/tunnel/MY_IP-REMOTE_IP/unique;

spdadd 0.0.0.0/0 0.0.0.0/0 254 -P in ipsec
    esp/tunnel/REMOTE_IP-MY_IP/unique;
spdadd 0.0.0.0/0 0.0.0.0/0 254 -P out ipsec
    esp/tunnel/MY_IP-REMOTE_IP/unique;

spdadd 0.0.0.0/0 0.0.0.0/0 tcp -P in none;
spdadd 0.0.0.0/0 0.0.0.0/0 tcp -P out none;
spdadd 0.0.0.0/0 0.0.0.0/0 udp -P in none;
spdadd 0.0.0.0/0 0.0.0.0/0 udp -P out none;

----- END setkey.conf ----

There are two things that are potentially funky with this config that
I'm not proud of (and which might be part of my problem).
When racoon goes to phase2 negotiation, it looks for an SPD entry with
0/0 - 0/0 [any].  I've installed an SPD entry of 0/0 - 0/0 [254] in
order to make racoon happy.  This isn't a problem for me as I don't
have any traffic using IP protocol #254 (obviously).  The other thing
is that I'm explicitly adding a fwd rule... that was my effort to try
to fix my problem (it didn't help).  Beyond that, the rest of the
rules seem fairly straightforward.

Further, when I initially connect the VPN, I see racoon do an SA
negotiation for the 0/0 rules.  When I start quagga, I see it do an SA
for the 169.254... rules.  If I ping a remote machine from the
routers, I see it do an SA for the 10.254.0.0/23 rules.  But if I ping
from something on my LAN there's no negotiation (this is true whether
I ping from the router first or not).

Here's what I've double checked:
1) iptables nat table has rules to ACCEPT 10.254.0.0/23 destined
traffic to prevent it from being MASQUERADE'd (which I see counters
for when I ping from the router)
2) iptables (main) table has FORWARD rules to ACCEPT 10.254.0.0/23
destined traffic (which I never see counters for)
3) IP forwarding is enabled (as this router is happily forwarding
other traffic to-from the LAN to the internet)

It seems like this is an issue with an SA not getting found for
forwarding traffic and the kernel silently dropping the packet.  How
do I debug this?

-A

^ permalink raw reply

* [PATCH 0/2] RTO connection timeout calculation
From: Damian Lukowski @ 2009-09-01 20:23 UTC (permalink / raw)
  To: Netdev

Hello,

this series of patches implements some changes concerning the RTO
retransmission timeouts.

1) retransmits_timed_out() gets a short comment and
   more meaningful variable names.
2) The sysctl documentation is being updated.

Regards

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>


^ permalink raw reply

* [PATCH 1/2] RTO connection timeout: coding style fixes and comments
From: Damian Lukowski @ 2009-09-01 20:24 UTC (permalink / raw)
  To: Netdev

This patch affects the retransmits_timed_out() function.

Changes:
1) Variables have more meaningful names
2) retransmits_timed_out() has an introductory comment.
3) Small coding style changes.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
---
 include/net/tcp.h |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e531949..df50bc4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1252,22 +1252,27 @@ static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
 #define tcp_for_write_queue_from_safe(skb, tmp, sk)			\
 	skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
 
+/* This function calculates a "timeout" which is equivalent to the timeout of a
+ * TCP connection after "boundary" unsuccessful, exponentially backed-off
+ * retransmissions with an initial RTO of TCP_RTO_MIN.
+ */
 static inline bool retransmits_timed_out(const struct sock *sk,
 					 unsigned int boundary)
 {
-	int limit, K;
+	unsigned int timeout, linear_backoff_thresh;
+
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
-	K = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
+	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
 
-	if (boundary <= K)
-		limit = ((2 << boundary) - 1) * TCP_RTO_MIN;
+	if (boundary <= linear_backoff_thresh)
+		timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
 	else
-		limit = ((2 << K) - 1) * TCP_RTO_MIN +
-			(boundary - K) * TCP_RTO_MAX;
+		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
+			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 
-	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= limit;
+	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
 }
 
 static inline struct sk_buff *tcp_send_head(struct sock *sk)
-- 
1.6.3.3
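
To make the calculation easier to check by hand, here is a userspace sketch of
the same closed form in milliseconds. It assumes TCP_RTO_MIN = HZ/5 (200 ms)
and TCP_RTO_MAX = 120*HZ (120 s); the model_* names are hypothetical and the
jiffies arithmetic of the real function is left out.

```c
#define RTO_MIN_MS 200u		/* assumed: TCP_RTO_MIN = HZ/5 */
#define RTO_MAX_MS 120000u	/* assumed: TCP_RTO_MAX = 120*HZ */

/* floor(log2(v)) for v >= 1; a simple stand-in for the kernel's ilog2(). */
unsigned int model_ilog2(unsigned int v)
{
	unsigned int r = 0;

	while ((v >>= 1) != 0)
		r++;
	return r;
}

/* Hypothetical timeout, in ms, after "boundary" backed-off retransmissions. */
unsigned int model_rto_timeout_ms(unsigned int boundary)
{
	unsigned int linear_backoff_thresh = model_ilog2(RTO_MAX_MS / RTO_MIN_MS);

	/* Pure exponential phase: 200 + 400 + ... is a geometric series. */
	if (boundary <= linear_backoff_thresh)
		return ((2u << boundary) - 1) * RTO_MIN_MS;

	/* Past the threshold every further RTO is clamped to RTO_MAX_MS. */
	return ((2u << linear_backoff_thresh) - 1) * RTO_MIN_MS +
	       (boundary - linear_backoff_thresh) * RTO_MAX_MS;
}
```

Under these assumptions the threshold is ilog2(600) = 9, a boundary of 3 gives
3000 ms, and a boundary of 15 gives 924600 ms.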



^ permalink raw reply related

* [PATCH 2/2] RTO connection timeout: sysctl documentation update
From: Damian Lukowski @ 2009-09-01 20:24 UTC (permalink / raw)
  To: Netdev

This patch updates the sysctl documentation concerning the interpretation
of tcp_retries{1,2} and tcp_orphan_retries.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
---
 Documentation/networking/ip-sysctl.txt |   37 ++++++++++++++++++++++---------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 8be7623..b6ee71f 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -311,9 +311,12 @@ tcp_no_metrics_save - BOOLEAN
 	connections.
 
 tcp_orphan_retries - INTEGER
-	How may times to retry before killing TCP connection, closed
-	by our side. Default value 7 corresponds to ~50sec-16min
-	depending on RTO. If you machine is loaded WEB server,
+	This value influences the timeout of a locally closed TCP connection,
+	when RTO retransmissions remain unacknowledged.
+	See tcp_retries2 for more details.
+
+	The default value is 7.
+	If your machine is a loaded WEB server,
 	you should think about lowering this value, such sockets
 	may consume significant resources. Cf. tcp_max_orphans.
 
@@ -327,16 +330,28 @@ tcp_retrans_collapse - BOOLEAN
 	certain TCP stacks.
 
 tcp_retries1 - INTEGER
-	How many times to retry before deciding that something is wrong
-	and it is necessary to report this suspicion to network layer.
-	Minimal RFC value is 3, it is default, which corresponds
-	to ~3sec-8min depending on RTO.
+	This value influences the time after which TCP decides that
+	something is wrong due to unacknowledged RTO retransmissions,
+	and reports this suspicion to the network layer.
+	See tcp_retries2 for more details.
+
+	RFC 1122 recommends at least 3 retransmissions, which is the
+	default.
 
 tcp_retries2 - INTEGER
-	How may times to retry before killing alive TCP connection.
-	RFC1122 says that the limit should be longer than 100 sec.
-	It is too small number.	Default value 15 corresponds to ~13-30min
-	depending on RTO.
+	This value influences the timeout of an alive TCP connection,
+	when RTO retransmissions remain unacknowledged.
+	Given a value of N, a hypothetical TCP connection following
+	exponential backoff with an initial RTO of TCP_RTO_MIN would
+	retransmit N times before killing the connection at the (N+1)th RTO.
+
+	The default value of 15 yields a hypothetical timeout of 924.6
+	seconds and is a lower bound for the effective timeout.
+	TCP will effectively time out at the first RTO which exceeds the
+	hypothetical timeout.
+
+	RFC 1122 recommends at least 100 seconds for the timeout,
+	which corresponds to a value of at least 8.
 
 tcp_rfc1337 - BOOLEAN
 	If set, the TCP stack behaves conforming to RFC1337. If unset,
-- 
1.6.3.3
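
The numbers in the new text can be reproduced by simply summing the RTOs of
the hypothetical connection. The sketch below does that in userspace, assuming
an initial RTO of 200 ms (TCP_RTO_MIN) that doubles up to a cap of 120 s
(TCP_RTO_MAX). The function names are illustrative and not part of the kernel.

```c
#define RTO_MIN_MS 200u		/* assumed: TCP_RTO_MIN */
#define RTO_MAX_MS 120000u	/* assumed: TCP_RTO_MAX */

/*
 * Sum of the first n+1 RTOs under exponential backoff: n retransmissions
 * are sent, and the connection is killed when the (n+1)th RTO expires.
 */
unsigned long backoff_timeout_ms(unsigned int n)
{
	unsigned long total = 0;
	unsigned long rto = RTO_MIN_MS;
	unsigned int i;

	for (i = 0; i <= n; i++) {
		total += rto;
		rto *= 2;
		if (rto > RTO_MAX_MS)
			rto = RTO_MAX_MS;	/* backoff is capped */
	}
	return total;
}

/* Smallest n whose hypothetical timeout is at least target_ms. */
unsigned int min_retries_for_ms(unsigned long target_ms)
{
	unsigned int n = 0;

	while (backoff_timeout_ms(n) < target_ms)
		n++;
	return n;
}
```

With these assumptions, a value of 15 sums to 924600 ms (924.6 seconds), and 8
is the smallest value that reaches the 100 seconds recommended by RFC 1122.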


^ permalink raw reply related

* Re: UDP is bypassing qdisc statistics ....
From: Jarek Poplawski @ 2009-09-01 20:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, Patrick McHardy, Mark Smith, netdev
In-Reply-To: <alpine.DEB.1.10.0909011753230.25607@V090114053VZO-1>

On Tue, Sep 01, 2009 at 05:55:44PM -0400, Christoph Lameter wrote:
> 
> Ok i got some meaningful statistics now. Why is the qlen zero for all
> devices?

Looks like it's updated with q->q.qlen before dumps only. Nice work!
Btw, isn't there something wrong with your localhost's time?

Original-Received: from ...
	...
	for <netdev@vger.kernel.org>; Tue,  1 Sep 2009 13:58:12 -0400 (EDT)
Original-Received: from cl (helo=localhost)
	...
	id 1MibKK-0007nd-Ib; Tue, 01 Sep 2009 17:55:44 -0400

Jarek P.
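
A toy model of what I mean, with made-up names: the counter in the stats block
is only a snapshot that gets refreshed on the dump path, so reading it
directly shows a stale value. This is a userspace illustration under that
assumption, not the actual sch_api.c code.

```c
/* Hypothetical model: live queue length vs. the copy kept in the stats. */
struct model_qdisc {
	unsigned int live_qlen;		/* plays the role of q->q.qlen */
	unsigned int stats_qlen;	/* plays the role of q->qstats.qlen */
};

/* Enqueueing only touches the live counter. */
void model_enqueue(struct model_qdisc *q)
{
	q->live_qlen++;
}

/* A dump path refreshes the snapshot right before reporting it. */
unsigned int model_dump_qlen(struct model_qdisc *q)
{
	q->stats_qlen = q->live_qlen;
	return q->stats_qlen;
}
```

So a reader that prints the stats field without going through the dump path
would want to print the live counter instead.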

^ permalink raw reply

* [PATCH] net: make neigh_ops constant
From: Stephen Hemminger @ 2009-09-01 21:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


These tables are never modified at runtime. Move to read-only
section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 include/net/arp.h       |    2 +-
 include/net/neighbour.h |    2 +-
 net/atm/clip.c          |    2 +-
 net/decnet/dn_neigh.c   |    6 +++---
 net/ipv4/arp.c          |    8 ++++----
 net/ipv6/ndisc.c        |    6 +++---
 6 files changed, 13 insertions(+), 13 deletions(-)
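
For reference, the pattern applied throughout this patch looks like the
following in plain C. This is a self-contained sketch with hypothetical
model_* names, not code from the tree: the ops table is declared const, and
every user holds a pointer-to-const so the compiler rejects writes through it.

```c
/* Hypothetical miniature of an ops table and its user. */
struct model_neigh_ops {
	int family;
	int (*output)(int pkt);
};

int model_output(int pkt)
{
	return pkt + 1;
}

/* const lets the toolchain place the table in a read-only section. */
const struct model_neigh_ops model_ops = {
	.family = 2,
	.output = model_output,
};

struct model_neighbour {
	const struct model_neigh_ops *ops;	/* pointer-to-const, as in the patch */
};

/* Calls go through the table unchanged; only writes are forbidden. */
int model_send(const struct model_neighbour *n, int pkt)
{
	return n->ops->output(pkt);
}
```

An assignment such as n->ops->family = 0 no longer compiles, which is the
point of the change.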

--- a/include/net/neighbour.h	2009-09-01 10:03:19.768820418 -0700
+++ b/include/net/neighbour.h	2009-09-01 10:43:34.777044929 -0700
@@ -118,7 +118,7 @@ struct neighbour
 	int			(*output)(struct sk_buff *skb);
 	struct sk_buff_head	arp_queue;
 	struct timer_list	timer;
-	struct neigh_ops	*ops;
+	const struct neigh_ops	*ops;
 	u8			primary_key[0];
 };
 
--- a/net/atm/clip.c	2009-09-01 10:03:19.738819943 -0700
+++ b/net/atm/clip.c	2009-09-01 10:03:40.185578062 -0700
@@ -267,7 +267,7 @@ static void clip_neigh_error(struct neig
 	kfree_skb(skb);
 }
 
-static struct neigh_ops clip_neigh_ops = {
+static const struct neigh_ops clip_neigh_ops = {
 	.family =		AF_INET,
 	.solicit =		clip_neigh_solicit,
 	.error_report =		clip_neigh_error,
--- a/net/decnet/dn_neigh.c	2009-09-01 10:03:19.744923430 -0700
+++ b/net/decnet/dn_neigh.c	2009-09-01 10:03:40.186548865 -0700
@@ -59,7 +59,7 @@ static int dn_phase3_output(struct sk_bu
 /*
  * For talking to broadcast devices: Ethernet & PPP
  */
-static struct neigh_ops dn_long_ops = {
+static const struct neigh_ops dn_long_ops = {
 	.family =		AF_DECnet,
 	.error_report =		dn_long_error_report,
 	.output =		dn_long_output,
@@ -71,7 +71,7 @@ static struct neigh_ops dn_long_ops = {
 /*
  * For talking to pointopoint and multidrop devices: DDCMP and X.25
  */
-static struct neigh_ops dn_short_ops = {
+static const struct neigh_ops dn_short_ops = {
 	.family =		AF_DECnet,
 	.error_report =		dn_short_error_report,
 	.output =		dn_short_output,
@@ -83,7 +83,7 @@ static struct neigh_ops dn_short_ops = {
 /*
  * For talking to DECnet phase III nodes
  */
-static struct neigh_ops dn_phase3_ops = {
+static const struct neigh_ops dn_phase3_ops = {
 	.family =		AF_DECnet,
 	.error_report =		dn_short_error_report, /* Can use short version here */
 	.output =		dn_phase3_output,
--- a/net/ipv4/arp.c	2009-09-01 10:03:19.756847564 -0700
+++ b/net/ipv4/arp.c	2009-09-01 10:38:38.791532681 -0700
@@ -130,7 +130,7 @@ static void arp_solicit(struct neighbour
 static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb);
 static void parp_redo(struct sk_buff *skb);
 
-static struct neigh_ops arp_generic_ops = {
+static const struct neigh_ops arp_generic_ops = {
 	.family =		AF_INET,
 	.solicit =		arp_solicit,
 	.error_report =		arp_error_report,
@@ -140,7 +140,7 @@ static struct neigh_ops arp_generic_ops 
 	.queue_xmit =		dev_queue_xmit,
 };
 
-static struct neigh_ops arp_hh_ops = {
+static const struct neigh_ops arp_hh_ops = {
 	.family =		AF_INET,
 	.solicit =		arp_solicit,
 	.error_report =		arp_error_report,
@@ -150,7 +150,7 @@ static struct neigh_ops arp_hh_ops = {
 	.queue_xmit =		dev_queue_xmit,
 };
 
-static struct neigh_ops arp_direct_ops = {
+static const struct neigh_ops arp_direct_ops = {
 	.family =		AF_INET,
 	.output =		dev_queue_xmit,
 	.connected_output =	dev_queue_xmit,
@@ -158,7 +158,7 @@ static struct neigh_ops arp_direct_ops =
 	.queue_xmit =		dev_queue_xmit,
 };
 
-struct neigh_ops arp_broken_ops = {
+const struct neigh_ops arp_broken_ops = {
 	.family =		AF_INET,
 	.solicit =		arp_solicit,
 	.error_report =		arp_error_report,
--- a/net/ipv6/ndisc.c	2009-09-01 10:03:19.750820837 -0700
+++ b/net/ipv6/ndisc.c	2009-09-01 10:03:40.192819983 -0700
@@ -98,7 +98,7 @@ static int pndisc_constructor(struct pne
 static void pndisc_destructor(struct pneigh_entry *n);
 static void pndisc_redo(struct sk_buff *skb);
 
-static struct neigh_ops ndisc_generic_ops = {
+static const struct neigh_ops ndisc_generic_ops = {
 	.family =		AF_INET6,
 	.solicit =		ndisc_solicit,
 	.error_report =		ndisc_error_report,
@@ -108,7 +108,7 @@ static struct neigh_ops ndisc_generic_op
 	.queue_xmit =		dev_queue_xmit,
 };
 
-static struct neigh_ops ndisc_hh_ops = {
+static const struct neigh_ops ndisc_hh_ops = {
 	.family =		AF_INET6,
 	.solicit =		ndisc_solicit,
 	.error_report =		ndisc_error_report,
@@ -119,7 +119,7 @@ static struct neigh_ops ndisc_hh_ops = {
 };
 
 
-static struct neigh_ops ndisc_direct_ops = {
+static const struct neigh_ops ndisc_direct_ops = {
 	.family =		AF_INET6,
 	.output =		dev_queue_xmit,
 	.connected_output =	dev_queue_xmit,
--- a/include/net/arp.h	2009-09-01 10:38:14.812653889 -0700
+++ b/include/net/arp.h	2009-09-01 10:39:53.838862539 -0700
@@ -26,6 +26,6 @@ extern struct sk_buff *arp_create(int ty
 				  const unsigned char *target_hw);
 extern void arp_xmit(struct sk_buff *skb);
 
-extern struct neigh_ops arp_broken_ops;
+extern const struct neigh_ops arp_broken_ops;
 
 #endif	/* _ARP_H */

^ permalink raw reply

* Re: neighbour table RCU
From: Stephen Hemminger @ 2009-09-01 21:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <4A9D4834.4090902@gmail.com>

On Tue, 01 Sep 2009 18:13:40 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > On Tue, 01 Sep 2009 08:50:17 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> >> Stephen Hemminger a écrit :
> >>> Looking at the neighbour table, it should be possible to get
> >>> rid of the two reader/writer locks.  The hash table lock is pretty
> >>> amenable to RCU, but the dynamic resizing makes it non-trivial.
> >>> Thinking of using a combination of RCU and sequence counts so that the
> >>> reader would just rescan if resize was in progress.
> >> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> >> see any contention on it.
> >>
> >>> The reader/writer lock on the neighbour entry is more of a problem.
> >>> Probably would be simpler/faster to change it into a spinlock and
> >>> be done with it.
> >>>
> >>> The reader/writer lock is also used for the proxy list hash table,
> >>> but that can just be a simple spinlock.
> >>>
> >> This is probably the only thing we want to do at this moment,
> >> halving atomic ops on neigh_resolve_output()
> >>
> >> But why neigh_resolve_output() was called so much in the bench
> >> is the question...
> >>
> > 
> > Every packet has to have an ARP resolution.
> > 
> 
> Sure, but I thought we had a cache ?
> 
> static inline int ip_finish_output2(struct sk_buff *skb)
> {
> ...
> 	if (dst->hh)
> 		return neigh_hh_output(dst->hh, skb);
> 	else if (dst->neighbour)
> 		return dst->neighbour->output(skb);  << should fill cache first time >>
> ...
> }
> 
> in my pktgen benches, I always hit the same dst, so it should take the hh cache?

I ping the remote host before starting pktgen; that way, I figure
the ARP entry and cache are available.

-- 

^ permalink raw reply

* Re: [PATCH 08/14] pktgen: reorganize transmit loop
From: Stephen Hemminger @ 2009-09-01 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, robert.olsson, netdev, tglx
In-Reply-To: <20090828.230428.211927371.davem@davemloft.net>

On Fri, 28 Aug 2009 23:04:28 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu, 27 Aug 2009 22:49:02 -0700
> 
> > On Thu, 27 Aug 2009 20:52:32 -0700
> > Ben Greear <greearb@candelatech.com> wrote:
> > 
> >> +		default: /* Drivers are not supposed to return other values! */
> >> +			if (net_ratelimit())
> >> +				pr_info("pktgen: %s xmit error: %d\n",
> >> +					odev->name, ret);
> >>  			pkt_dev->errors++;
> >> 
> >> I believe this is faulty.  Things like vlans can send pkts to qdiscs
> >> of the underlying device and those can return other values.
> >> 
> >> Patrick McHardy put in some patches recently to achieve this in a more
> >> uniform manner:
> >> 
> >> http://patchwork.ozlabs.org/patch/28340/
> >> 
> >> Thanks,
> >> Ben
> >> 
> > 
> > Since pktgen has its own way of generating vlan tags, it
> > makes no sense to use it on top of 8021q vlan driver.
> 
> I think Patrick's goals are quite sound, and with his patch we
> could let pktgen transmit over vlan just like any other device.
> 
> Otherwise we give no way to use pktgen to test the VLAN transmit path.

The only valid return values from a device are OK, BUSY, or LOCKED;
anything else is an error and should never occur.

If the vlan code returns other values, it must translate.
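
One possible reading of "must translate", sketched under loud
assumptions: the enum names and values below are hypothetical stand-ins
for the driver and qdisc return codes, and the rule that any qdisc
verdict means the packet was consumed is an assumption, not something
taken from the vlan code.

```c
/* Hypothetical stand-ins; names and values are illustrative only. */
enum demo_tx_code { DEMO_TX_OK, DEMO_TX_BUSY, DEMO_TX_LOCKED };
enum demo_qdisc_verdict { DEMO_XMIT_SUCCESS, DEMO_XMIT_DROP, DEMO_XMIT_CN };

struct demo_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

/*
 * Map the verdict of the underlying device's qdisc onto the small set
 * of codes a caller such as pktgen expects from a driver.
 */
enum demo_tx_code demo_stacked_xmit_result(enum demo_qdisc_verdict verdict,
					   struct demo_stats *stats)
{
	if (verdict == DEMO_XMIT_DROP)
		stats->tx_dropped++;
	else
		stats->tx_packets++;

	/* Either way the packet is no longer ours, so report OK upward. */
	return DEMO_TX_OK;
}
```

The drop is still visible in the statistics, but the caller never sees a
code outside the OK, BUSY, LOCKED set.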

-- 

^ permalink raw reply

* [NET] Add proc file to display the state of all qdiscs.
From: Christoph Lameter @ 2009-09-01 23:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, David Miller, Patrick McHardy

[NET] Add proc file to display the state of all qdiscs

TC is a complicated tool and it currently does not allow the display of all
qdisc states. It supports neither multiple TX queues nor localhost, and it
does not display the current operating state of the queues.

This functionality could be added to tc / netlink but the tool is already
complex to handle. The simple proc file here allows easy scanning by
scripts and other tools. However, tc still needs to be updated to allow
the modifications of multiqueue TX settings. tc's main focus is the
configuration of qdiscs. The qdisc_stats file just shows the current
state.

This patch adds

	/proc/net/qdisc_stats

which displays the current state of all qdiscs on the system.

For example:

$ cat /proc/net/qdisc_stats
Queue    Device  State   Bytes  Packets Qlen Blog   Drops Requeue Overlimit
TX0/root     lo   -          0        0    0    0       0       0       0
 RX/root     lo   -          0        0    0    0       0       0       0
TX0/root   eth0   -       5518       60    0    0       0       0       0
TX1/root   eth0   -       2549       37    0    0       0       0       0
TX2/root   eth0   -      63625      272    0    0       0       0       0
TX3/root   eth0   -       1580       21    0    0       0       0       0
TX4/root   eth0   R   88979440   260183    0 3532   43176    2111       0
TX5/root   eth0   -       4698       56    0    0       0       0       0
TX6/root   eth0   - 3598883129 10523140    0    0       0       0       0
TX7/root   eth0   -       1750       21    0    0       0       0       0
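
As an illustration of the "easy scanning" point, a data row in the
format shown above can be picked apart with a single sscanf() call. This
is a sketch, not part of the patch: the struct, the function name and
the field widths are assumptions, and it relies on the State column
never being empty.

```c
#include <stdio.h>

/* Hypothetical holder for one row of /proc/net/qdisc_stats. */
struct demo_qdisc_row {
	char queue[16];		/* e.g. "TX4" or "RX" */
	char handle[16];	/* e.g. "root" */
	char dev[32];
	char state[8];
	unsigned long long bytes;
	unsigned long packets;
	unsigned int qlen, backlog, drops, requeues, overlimits;
};

/* Returns 0 on success, -1 if the line is not a data row (e.g. the header). */
int demo_parse_qdisc_row(const char *line, struct demo_qdisc_row *row)
{
	int n = sscanf(line, " %15[^/ ]/%15s %31s %7s %llu %lu %u %u %u %u %u",
		       row->queue, row->handle, row->dev, row->state,
		       &row->bytes, &row->packets, &row->qlen, &row->backlog,
		       &row->drops, &row->requeues, &row->overlimits);

	return n == 11 ? 0 : -1;
}
```

The header line is rejected because it has no "/" after the first word,
so a reader can feed every line of the file to the same function.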


Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 net/sched/sch_api.c |  130 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

Index: linux-2.6/net/sched/sch_api.c
===================================================================
--- linux-2.6.orig/net/sched/sch_api.c	2009-09-01 12:27:24.000000000 -0500
+++ linux-2.6/net/sched/sch_api.c	2009-09-01 14:39:27.000000000 -0500
@@ -1699,6 +1699,135 @@ static const struct file_operations psch
 	.llseek = seq_lseek,
 	.release = single_release,
 };
+
+static void dump_qdisc(struct seq_file *seq, struct net_device *dev,
+				struct Qdisc *q, char *inout, char *text)
+{
+	char state[5];
+	char *p = state;
+
+	if (test_bit(__QDISC_STATE_RUNNING, &q->state))
+		*p++ = 'R';
+	if (test_bit(__QDISC_STATE_SCHED, &q->state))
+		*p++ = 'S';
+	if (test_bit(__QDISC_STATE_DEACTIVATED, &q->state))
+		*p++ = 'D';
+	if (q->state == 0)
+		*p++ = '-';
+	*p = 0;
+
+	seq_printf(seq, "%3s/%2s %6s %3s %10lld %8d %4d %4d %7d %7d %7d\n",
+		inout, text, dev->name, state,
+		q->bstats.bytes, q->bstats.packets,
+		q->qstats.qlen, q->qstats.backlog, q->qstats.drops,
+		q->qstats.requeues, q->qstats.overlimits);
+}
+
+static void dump_qdisc_root(struct seq_file *seq, struct net_device *dev,
+					 struct Qdisc *root, char *inout)
+{
+	struct Qdisc *q;
+	int n = 0;
+
+	if (!root)
+		return;
+
+	dump_qdisc(seq, dev, root, inout, "root");
+
+	list_for_each_entry(q, &root->list, list) {
+		char buffer[10];
+
+		sprintf(buffer,"Q%d", ++n);
+		dump_qdisc(seq, dev, q, inout, buffer);
+	}
+}
+
+
+static void qdisc_seq_out(struct seq_file *seq, struct net_device *dev)
+{
+	struct netdev_queue *dev_queue;
+	int i;
+
+	for (i = 0; i < dev->real_num_tx_queues; i++) {
+		char buffer[10];
+
+		dev_queue = netdev_get_tx_queue(dev, i);
+		sprintf(buffer, "TX%d", i);
+		dump_qdisc_root(seq, dev, dev_queue->qdisc_sleeping, buffer);
+	}
+	dev_queue = &dev->rx_queue;
+	dump_qdisc_root(seq, dev, dev_queue->qdisc_sleeping, "RX");
+}
+
+static int qdisc_seq_show(struct seq_file *seq, void *v)
+{
+	if (v == SEQ_START_TOKEN) {
+		seq_printf(seq, "Queue    Device  State   Bytes  Packets "
+			"Qlen Blog   Drops Requeue Overlimit\n");
+	} else
+		qdisc_seq_out(seq, v);
+
+	return 0;
+}
+
+static void *qdisc_seq_start(struct seq_file *seq, loff_t *pos)
+	__acquires(dev_base_lock)
+{
+	struct net *net = seq_file_net(seq);
+	loff_t off;
+	struct net_device *dev;
+
+	read_lock(&dev_base_lock);
+
+	if (!*pos)
+		return SEQ_START_TOKEN;
+
+	off = 1;
+
+	for_each_netdev(net, dev)
+		if (off++ == *pos)
+			return dev;
+
+	return NULL;
+}
+
+static void *qdisc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct net *net = seq_file_net(seq);
+
+	++*pos;
+	return v == SEQ_START_TOKEN ?
+		first_net_device(net) : next_net_device((struct net_device *)v);
+}
+
+static void qdisc_seq_stop(struct seq_file *seq, void *v)
+	__releases(dev_base_lock)
+{
+	read_unlock(&dev_base_lock);
+}
+
+
+static const struct seq_operations qdisc_seq_ops = {
+	.start	= qdisc_seq_start,
+	.next	= qdisc_seq_next,
+	.stop	= qdisc_seq_stop,
+	.show	= qdisc_seq_show,
+};
+
+static int qdisc_open(struct inode *inode, struct file *file)
+{
+	return seq_open_net(inode, file, &qdisc_seq_ops,
+			sizeof(struct seq_net_private));
+}
+
+
+static const struct file_operations qdisc_fops = {
+	.owner = THIS_MODULE,
+	.open = qdisc_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
 #endif

 static int __init pktsched_init(void)
@@ -1706,6 +1835,7 @@ static int __init pktsched_init(void)
 	register_qdisc(&pfifo_qdisc_ops);
 	register_qdisc(&bfifo_qdisc_ops);
 	proc_net_fops_create(&init_net, "psched", 0, &psched_fops);
+	proc_net_fops_create(&init_net, "qdisc_stats", 0, &qdisc_fops);

 	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
 	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL);


^ permalink raw reply

* Re: UDP is bypassing qdisc statistics ....
From: Christoph Lameter @ 2009-09-02  1:36 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, Patrick McHardy, Mark Smith, netdev
In-Reply-To: <20090901202432.GA3200@ami.dom.local>

On Tue, 1 Sep 2009, Jarek Poplawski wrote:

> Btw, isn't there something wrong with your localhost's time?

Yup. I have virtual hosting and the provider screwed up time. I can do
nothing about it.

^ permalink raw reply

* Re: Crypto oops in async_chainiv_do_postponed
From: Herbert Xu @ 2009-09-01 22:17 UTC (permalink / raw)
  To: Brad Bosch; +Cc: linux-crypto, netdev, offbase0
In-Reply-To: <19101.16628.347039.619378@waldo.imnotcreative.homeip.net>

On Tue, Sep 01, 2009 at 10:42:44AM -0500, Brad Bosch wrote:
> 
> Now, ctx->err may be used by both async_chainiv_postpone_request to
> store the return value from skcipher_enqueue_givcrypt and by
> async_chainiv_givencrypt_tail to store the return value from
> crypto_ablkcipher_encrypt at the same time.  This can cause the
> calling function to think async_chainiv_givencrypt has completed its
> work, when in fact, the work was deferred.

async_chainiv_postpone_request never touches ctx->err unless
it can obtain the INUSE bit lock.  On the other hand, the normal
path async_chainiv_givencrypt_tail never relinquishes the INUSE
bit until it is finished with ctx->err.
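
The invariant can be sketched in userspace with C11 atomics. This is a
generic illustration of a bit lock guarding a field, not the chainiv
code: struct demo_ctx and the demo_* helpers are hypothetical, and an
atomic_flag stands in for the INUSE bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context; not the real chainiv structure. */
struct demo_ctx {
	atomic_flag inuse;	/* plays the role of the INUSE bit */
	int err;		/* only written while inuse is held */
};

/* Try to take the bit lock; true means the caller now owns ctx->err. */
bool demo_try_lock(struct demo_ctx *ctx)
{
	return !atomic_flag_test_and_set_explicit(&ctx->inuse,
						  memory_order_acquire);
}

void demo_unlock(struct demo_ctx *ctx)
{
	atomic_flag_clear_explicit(&ctx->inuse, memory_order_release);
}

/* Record err only if the lock can be obtained; otherwise leave it alone. */
bool demo_store_err(struct demo_ctx *ctx, int err)
{
	if (!demo_try_lock(ctx))
		return false;
	ctx->err = err;
	return true;	/* caller keeps the lock until it is done with err */
}
```

A second writer that loses the test-and-set never touches err, so the
value seen by the lock holder cannot change underneath it.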
 
> OK.  I see now that your offset patch should indeed solve that
> problem.  But why did you choose to fix it in a complex way?  My
> suggestion just adds a single test while yours adds new parameters, a
> new function and an extra function call.

Because that introduces two NULL checks where the second one is
useless.  Not a big deal but then again, my patch wasn't that
complicated either :)

Please let me know whether it actually fixes your problem though
so I can get this upstream.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [RFC] iproute2: gracefully exit from rtnl_listen()
From: David Ward @ 2009-09-01 21:41 UTC (permalink / raw)
  To: netdev; +Cc: David Ward

rtnl_listen() (in iproute2's libnetlink) allows a userspace application to
monitor an RTNETLINK multicast group, so that the application can react to
changes in the kernel's routing table, neighbor cache, etc. as they occur.
(This is in contrast with using RTNETLINK to solicit the kernel for a full
copy of the routing table or neighbor cache at a specific point in time.)

However rtnl_listen() creates an infinite loop, which does not break in the
absence of errors, so there does not seem to be any way to gracefully exit
from it. Existing applications that call rtnl_listen(), such as rtmon, break
from this loop by terminating the entire application. There should be some
mechanism for stopping rtnl_listen() and continuing program execution.

If you assume that rtnl_listen() will only need to be stopped because the
application has received a signal (such as SIGINT), then the application's
signal handler can call rtnl_close() to close the RTNETLINK socket.
Afterwards, with this patch, rtnl_listen() can detect that the socket was
closed during the interrupt and exit.

I am offering this as a starting point for consideration, but I feel that
this is at best a poor solution. What is a better way to fix rtnl_listen()
(and similar libnetlink functions for consistency)?
---
 lib/libnetlink.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index b68e2fd..afb5b23 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -196,8 +196,11 @@ int rtnl_dump_filter(struct rtnl_handle *rth,
 		status = recvmsg(rth->fd, &msg, 0);
 
 		if (status < 0) {
-			if (errno == EINTR || errno == EAGAIN)
+			if (errno == EINTR || errno == EAGAIN) {
+				if (rth->fd < 0)
+					return 0;
 				continue;
+			}
 			fprintf(stderr, "netlink receive error %s (%d)\n",
 				strerror(errno), errno);
 			return -1;
@@ -300,8 +303,11 @@ int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, pid_t peer,
 		status = recvmsg(rtnl->fd, &msg, 0);
 
 		if (status < 0) {
-			if (errno == EINTR || errno == EAGAIN)
+			if (errno == EINTR || errno == EAGAIN) {
+				if (rtnl->fd < 0)
+					return 0;
 				continue;
+			}
 			fprintf(stderr, "netlink receive error %s (%d)\n",
 				strerror(errno), errno);
 			return -1;
@@ -405,8 +411,11 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 		status = recvmsg(rtnl->fd, &msg, 0);
 
 		if (status < 0) {
-			if (errno == EINTR || errno == EAGAIN)
+			if (errno == EINTR || errno == EAGAIN) {
+				if (rtnl->fd < 0)
+					return 0;
 				continue;
+			}
 			fprintf(stderr, "netlink receive error %s (%d)\n",
 				strerror(errno), errno);
 			return -1;
-- 
1.5.5.6
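
For comparison, a variation on the same idea, sketched under
assumptions: instead of closing the socket in the signal handler as the
patch does, the handler only sets a flag, and the receive loop checks
the flag when recvmsg() fails with EINTR or EAGAIN. All demo_* names are
hypothetical and none of this is existing libnetlink API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for sigaction() */
#endif

#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t demo_stop_requested;

static void demo_on_signal(int signo)
{
	(void)signo;
	demo_stop_requested = 1;	/* only async-signal-safe work here */
}

/*
 * Install the handler without SA_RESTART, so that a blocked recvmsg()
 * fails with EINTR instead of being restarted transparently.
 */
int demo_install_stop_handler(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_on_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}

/*
 * Classify a failed receive by its errno value:
 * 1 = retry, 0 = leave the loop cleanly, -1 = real error.
 */
int demo_classify_recv_error(int err)
{
	if (err == EINTR || err == EAGAIN)
		return demo_stop_requested ? 0 : 1;
	return -1;
}
```

A listen loop would call demo_classify_recv_error(errno) where it
currently tests for EINTR and EAGAIN, and return 0 to its caller when
the result is 0; the caller can then close the socket outside signal
context.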


^ permalink raw reply related

* Re: TSecr != 0 check in inet_lro.c
From: David Miller @ 2009-09-01 22:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: opurdila, themann, netdev, raisch
In-Reply-To: <4A9379C9.6050108@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 25 Aug 2009 07:42:33 +0200

> Octavian Purdila a écrit :
>> 
>> I'm failing to understand the purpose of this check. Any hints? :)
>> 
> 
> rfc1323 badly interpreted ?
> 
> I remember tsecr=0 was forbidden by Linux, while apparently rfc is not
> so clear.
> 
> rfc1323 : 3.2
>          The Timestamp Echo Reply field (TSecr) is only valid if the ACK
>          bit is set in the TCP header; if it is valid, it echos a
>          timestamp value that was sent by the remote TCP in the TSval field
>          of a Timestamps option.  When TSecr is not valid, its value
>          must be zero.  The TSecr value will generally be from the most
>          recent Timestamp option that was received; however, there are
>          exceptions that are explained below.
> 
> Note how this is not saying "a zero Tsecr value is not valid"

Indeed this is a very tricky situation.

What it's saying here is that you can emit a zero TSecr when you
simply haven't received a valid timestamp from the peer yet.

I think we can therefore remove all of these special zero checks, but
we must be mindful to make sure we handle the connection startup case
properly (where we see these 'no valid timestamp yet' zero TSecrs).

^ permalink raw reply

* Re: [PATCH 1/2]: pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer
From: David Miller @ 2009-09-01 22:49 UTC (permalink / raw)
  To: jarkao2; +Cc: netdev, tglx
In-Reply-To: <20090827083640.GA11152@ff.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 27 Aug 2009 08:36:41 +0000

> On 22-08-2009 02:03, David Miller wrote:
>> None of this stuff should execute in hw IRQ context, therefore
>> use a tasklet_hrtimer so that it runs in softirq context.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
> ...
>>  void qdisc_watchdog_cancel(struct qdisc_watchdog *wd)
>>  {
>> -	hrtimer_cancel(&wd->timer);
>> +	tasklet_hrtimer_cancel(&wd->timer);
>>  	wd->qdisc->flags &= ~TCQ_F_THROTTLED;
>>  }
>>  EXPORT_SYMBOL(qdisc_watchdog_cancel);
> 
> I've had a second look at it and I wonder why it's not reported, but
> since qdisc_watchdog_cancel is run in qdisc_reset with BHs disabled
> it seems there should be (at least) a warning triggered in
> tasklet_kill on positive in_interrupt test (unless I miss something,
> but can't test it now).

There are a lot of unknowns about this stuff.

It's very likely I'm going to simply revert these two patches
and try to work on these issues in net-next-2.6 instead.

^ permalink raw reply

* Re: [PATCH] Re: iproute2 / tbf with large burst seems broken again
From: David Miller @ 2009-09-01 22:51 UTC (permalink / raw)
  To: jarkao2; +Cc: denys, netdev
In-Reply-To: <20090831085848.GF5005@ff.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 31 Aug 2009 08:58:49 +0000

> On Mon, Aug 31, 2009 at 01:37:38AM -0700, David Miller wrote:
>> From: Denys Fedoryschenko <denys@visp.net.lb>
>> Date: Mon, 31 Aug 2009 11:18:42 +0300
>> 
>> > With 2.6.31-rc8 my dmesg got flooded. The first message appears even
>> > without the patch; I will now find out the reason for that (I have HTB
>> > classes on ifb0, and TBF on pppX).
>> > 
>> > [  155.199923] Attempt to kill tasklet from interrupt
>> 
>> We know the reasons for this message, it's an HTB regression and we're
>> working on it.
> 
> There are a few more than HTB. Btw, IMHO, we don't risk much (if at
> all) if we revert this qdisc_watchdog change. There is still CBQ
> problem, where we could probably reconsider Thomas Gleixner's first
> attempt with additional locking and irq disabling as a temporary
> solution?

As I stated in another email, I think my plan is going to be to
revert both tasklet_hrtimer changes and try to deal with this in
net-next-2.6 instead.

^ permalink raw reply

* Re: [RFC] iproute2: gracefully exit from rtnl_listen()
From: Stephen Hemminger @ 2009-09-01 22:59 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, David Ward
In-Reply-To: <1251841286-32463-1-git-send-email-david.ward@ll.mit.edu>

On Tue,  1 Sep 2009 17:41:26 -0400
David Ward <david.ward@ll.mit.edu> wrote:

> rtnl_listen() (in iproute2's libnetlink) allows a userspace application to
> monitor an RTNETLINK multicast group, so that the application can react to
> changes in the kernel's routing table, neighbor cache, etc. as they occur.
> (This is in contrast with using RTNETLINK to solicit the kernel for a full
> copy of the routing table or neighbor cache at a specific point in time.)
> 
> However rtnl_listen() creates an infinite loop, which does not break in the
> absence of errors, so there does not seem to be any way to gracefully exit
> from it. Existing applications that call rtnl_listen(), such as rtmon, break
> from this loop by terminating the entire application. There should be some
> mechanism for stopping rtnl_listen() and continuing program execution.
> 
> If you assume that rtnl_listen() will only need to be stopped because the
> application has received a signal (such as SIGINT), then the application's
> signal handler can call rtnl_close() to close the RTNETLINK socket.
> Afterwards, with this patch, rtnl_listen() can detect that the socket was
> closed during the interrupt and exit.
> 
> I am offering this as a starting point for consideration, but I feel that
> this is at best a poor solution. What is a better way to fix rtnl_listen()
> (and similar libnetlink functions for consistency)?
> ---
>  lib/libnetlink.c |   15 ++++++++++++---
>  1 files changed, 12 insertions(+), 3 deletions(-)

Good idea, but why not just change the while(1) loop?

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index b68e2fd..cecfa34 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -188,7 +188,7 @@ int rtnl_dump_filter(struct rtnl_handle *rth,
 	char buf[16384];
 
 	iov.iov_base = buf;
-	while (1) {
+	while (rth->fd >= 0) {
 		int status;
 		struct nlmsghdr *h;
 
@@ -200,12 +200,12 @@ int rtnl_dump_filter(struct rtnl_handle *rth,
 				continue;
 			fprintf(stderr, "netlink receive error %s (%d)\n",
 				strerror(errno), errno);
-			return -1;
+			break;
 		}
 
 		if (status == 0) {
 			fprintf(stderr, "EOF on netlink\n");
-			return -1;
+			break;
 		}
 
 		h = (struct nlmsghdr*)buf;
@@ -251,6 +251,8 @@ skip_it:
 			exit(1);
 		}
 	}
+
+	return -1;
 }
 
 int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, pid_t peer,



^ permalink raw reply related

* [net-next PATCH 1/3] ixgbe: refactor link setup code
From: Jeff Kirsher @ 2009-09-01 23:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, Mallikarjuna R Chilakala, Peter P Waskiewicz Jr,
	Jeff Kirsher

From: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>

Link code cleanup: a number of redundant functions and MAC variables are
cleaned up, with some functions being consolidated into a single-purpose
code path. Removed the following deprecated link functions and MAC variables:
 * ixgbe_setup_copper_link_speed_82598
 * ixgbe_setup_mac_link_speed_multispeed_fiber
 * ixgbe_setup_mac_link_speed_82599
 * mac.autoneg, mac.autoneg_succeeded, phy.autoneg_wait_to_complete

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_82598.c   |   49 +++----------
 drivers/net/ixgbe/ixgbe_82599.c   |  135 ++++++++++++-------------------------
 drivers/net/ixgbe/ixgbe_ethtool.c |    4 +
 drivers/net/ixgbe/ixgbe_main.c    |   17 ++---
 drivers/net/ixgbe/ixgbe_type.h    |    7 --
 5 files changed, 67 insertions(+), 145 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82598.c b/drivers/net/ixgbe/ixgbe_82598.c
index 916430f..cb7f0c3 100644
--- a/drivers/net/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ixgbe/ixgbe_82598.c
@@ -41,8 +41,7 @@
 static s32 ixgbe_get_copper_link_capabilities_82598(struct ixgbe_hw *hw,
                                              ixgbe_link_speed *speed,
                                              bool *autoneg);
-static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw);
-static s32 ixgbe_setup_copper_link_speed_82598(struct ixgbe_hw *hw,
+static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw,
                                                ixgbe_link_speed speed,
                                                bool autoneg,
                                                bool autoneg_wait_to_complete);
@@ -156,8 +155,6 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
 	/* Overwrite the link function pointers if copper PHY */
 	if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper) {
 		mac->ops.setup_link = &ixgbe_setup_copper_link_82598;
-		mac->ops.setup_link_speed =
-		                     &ixgbe_setup_copper_link_speed_82598;
 		mac->ops.get_link_capabilities =
 		                  &ixgbe_get_copper_link_capabilities_82598;
 	}
@@ -465,13 +462,14 @@ out:
 }
 
 /**
- *  ixgbe_setup_mac_link_82598 - Configures MAC link settings
+ *  ixgbe_start_mac_link_82598 - Configures MAC link settings
  *  @hw: pointer to hardware structure
  *
  *  Configures link settings based on values in the ixgbe_hw struct.
  *  Restarts the link.  Performs autonegotiation if needed.
  **/
-static s32 ixgbe_setup_mac_link_82598(struct ixgbe_hw *hw)
+static s32 ixgbe_start_mac_link_82598(struct ixgbe_hw *hw,
+                                      bool autoneg_wait_to_complete)
 {
 	u32 autoc_reg;
 	u32 links_reg;
@@ -484,7 +482,7 @@ static s32 ixgbe_setup_mac_link_82598(struct ixgbe_hw *hw)
 	IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg);
 
 	/* Only poll for autoneg to complete if specified to do so */
-	if (hw->phy.autoneg_wait_to_complete) {
+	if (autoneg_wait_to_complete) {
 		if ((autoc_reg & IXGBE_AUTOC_LMS_MASK) ==
 		     IXGBE_AUTOC_LMS_KX4_AN ||
 		    (autoc_reg & IXGBE_AUTOC_LMS_MASK) ==
@@ -600,7 +598,7 @@ out:
 
 
 /**
- *  ixgbe_setup_mac_link_speed_82598 - Set MAC link speed
+ *  ixgbe_setup_mac_link_82598 - Set MAC link speed
  *  @hw: pointer to hardware structure
  *  @speed: new link speed
  *  @autoneg: true if auto-negotiation enabled
@@ -608,7 +606,7 @@ out:
  *
  *  Set the link speed in the AUTOC register and restarts link.
  **/
-static s32 ixgbe_setup_mac_link_speed_82598(struct ixgbe_hw *hw,
+static s32 ixgbe_setup_mac_link_82598(struct ixgbe_hw *hw,
                                            ixgbe_link_speed speed, bool autoneg,
                                            bool autoneg_wait_to_complete)
 {
@@ -638,14 +636,12 @@ static s32 ixgbe_setup_mac_link_speed_82598(struct ixgbe_hw *hw,
 	}
 
 	if (status == 0) {
-		hw->phy.autoneg_wait_to_complete = autoneg_wait_to_complete;
-
 		/*
 		 * Setup and restart the link based on the new values in
 		 * ixgbe_hw This will write the AUTOC register based on the new
 		 * stored values
 		 */
-		status = ixgbe_setup_mac_link_82598(hw);
+		status = ixgbe_start_mac_link_82598(hw, autoneg_wait_to_complete);
 	}
 
 	return status;
@@ -653,29 +649,7 @@ static s32 ixgbe_setup_mac_link_speed_82598(struct ixgbe_hw *hw,
 
 
 /**
- *  ixgbe_setup_copper_link_82598 - Setup copper link settings
- *  @hw: pointer to hardware structure
- *
- *  Configures link settings based on values in the ixgbe_hw struct.
- *  Restarts the link.  Performs autonegotiation if needed.  Restart
- *  phy and wait for autonegotiate to finish.  Then synchronize the
- *  MAC and PHY.
- **/
-static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw)
-{
-	s32 status;
-
-	/* Restart autonegotiation on PHY */
-	status = hw->phy.ops.setup_link(hw);
-
-	/* Set up MAC */
-	ixgbe_setup_mac_link_82598(hw);
-
-	return status;
-}
-
-/**
- *  ixgbe_setup_copper_link_speed_82598 - Set the PHY autoneg advertised field
+ *  ixgbe_setup_copper_link_82598 - Set the PHY autoneg advertised field
  *  @hw: pointer to hardware structure
  *  @speed: new link speed
  *  @autoneg: true if autonegotiation enabled
@@ -683,7 +657,7 @@ static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw)
  *
  *  Sets the link speed in the AUTOC register in the MAC and restarts link.
  **/
-static s32 ixgbe_setup_copper_link_speed_82598(struct ixgbe_hw *hw,
+static s32 ixgbe_setup_copper_link_82598(struct ixgbe_hw *hw,
                                                ixgbe_link_speed speed,
                                                bool autoneg,
                                                bool autoneg_wait_to_complete)
@@ -695,7 +669,7 @@ static s32 ixgbe_setup_copper_link_speed_82598(struct ixgbe_hw *hw,
 	                                      autoneg_wait_to_complete);
 
 	/* Set up MAC */
-	ixgbe_setup_mac_link_82598(hw);
+	ixgbe_start_mac_link_82598(hw, autoneg_wait_to_complete);
 
 	return status;
 }
@@ -1163,7 +1137,6 @@ static struct ixgbe_mac_operations mac_ops_82598 = {
 	.read_analog_reg8	= &ixgbe_read_analog_reg8_82598,
 	.write_analog_reg8	= &ixgbe_write_analog_reg8_82598,
 	.setup_link		= &ixgbe_setup_mac_link_82598,
-	.setup_link_speed	= &ixgbe_setup_mac_link_speed_82598,
 	.check_link		= &ixgbe_check_mac_link_82598,
 	.get_link_capabilities	= &ixgbe_get_link_capabilities_82598,
 	.led_on			= &ixgbe_led_on_generic,
diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index 364b6d2..5ee5608 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -38,23 +38,23 @@
 #define IXGBE_82599_MC_TBL_SIZE   128
 #define IXGBE_82599_VFT_TBL_SIZE  128
 
-static s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw);
-static s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw,
-                                     ixgbe_link_speed speed, bool autoneg,
-                                     bool autoneg_wait_to_complete);
-static s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw);
-static s32 ixgbe_setup_mac_link_speed_82599(struct ixgbe_hw *hw,
-                                            ixgbe_link_speed speed,
-                                            bool autoneg,
-                                            bool autoneg_wait_to_complete);
+s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
+                                          ixgbe_link_speed speed,
+                                          bool autoneg,
+                                          bool autoneg_wait_to_complete);
+s32 ixgbe_start_mac_link_82599(struct ixgbe_hw *hw,
+                               bool autoneg_wait_to_complete);
+s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw,
+                               ixgbe_link_speed speed,
+                               bool autoneg,
+                               bool autoneg_wait_to_complete);
 static s32 ixgbe_get_copper_link_capabilities_82599(struct ixgbe_hw *hw,
                                              ixgbe_link_speed *speed,
                                              bool *autoneg);
-static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw);
-static s32 ixgbe_setup_copper_link_speed_82599(struct ixgbe_hw *hw,
-                                               ixgbe_link_speed speed,
-                                               bool autoneg,
-                                               bool autoneg_wait_to_complete);
+static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw,
+                                         ixgbe_link_speed speed,
+                                         bool autoneg,
+                                         bool autoneg_wait_to_complete);
 static s32 ixgbe_verify_fw_version_82599(struct ixgbe_hw *hw);
 
 static void ixgbe_init_mac_link_ops_82599(struct ixgbe_hw *hw)
@@ -62,15 +62,9 @@ static void ixgbe_init_mac_link_ops_82599(struct ixgbe_hw *hw)
 	struct ixgbe_mac_info *mac = &hw->mac;
 	if (hw->phy.multispeed_fiber) {
 		/* Set up dual speed SFP+ support */
-		mac->ops.setup_link =
-		          &ixgbe_setup_mac_link_multispeed_fiber;
-		mac->ops.setup_link_speed =
-		          &ixgbe_setup_mac_link_speed_multispeed_fiber;
+		mac->ops.setup_link = &ixgbe_setup_mac_link_multispeed_fiber;
 	} else {
-		mac->ops.setup_link =
-		          &ixgbe_setup_mac_link_82599;
-		mac->ops.setup_link_speed =
-		          &ixgbe_setup_mac_link_speed_82599;
+		mac->ops.setup_link = &ixgbe_setup_mac_link_82599;
 	}
 }
 
@@ -178,8 +172,6 @@ static s32 ixgbe_init_phy_ops_82599(struct ixgbe_hw *hw)
 	/* If copper media, overwrite with copper function pointers */
 	if (mac->ops.get_media_type(hw) == ixgbe_media_type_copper) {
 		mac->ops.setup_link = &ixgbe_setup_copper_link_82599;
-		mac->ops.setup_link_speed =
-		                     &ixgbe_setup_copper_link_speed_82599;
 		mac->ops.get_link_capabilities =
 		                  &ixgbe_get_copper_link_capabilities_82599;
 	}
@@ -354,13 +346,15 @@ out:
 }
 
 /**
- *  ixgbe_setup_mac_link_82599 - Setup MAC link settings
+ *  ixgbe_start_mac_link_82599 - Setup MAC link settings
  *  @hw: pointer to hardware structure
+ *  @autoneg_wait_to_complete: true when waiting for completion is needed
  *
  *  Configures link settings based on values in the ixgbe_hw struct.
  *  Restarts the link.  Performs autonegotiation if needed.
  **/
-static s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw)
+s32 ixgbe_start_mac_link_82599(struct ixgbe_hw *hw,
+                               bool autoneg_wait_to_complete)
 {
 	u32 autoc_reg;
 	u32 links_reg;
@@ -373,7 +367,7 @@ static s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw)
 	IXGBE_WRITE_REG(hw, IXGBE_AUTOC, autoc_reg);
 
 	/* Only poll for autoneg to complete if specified to do so */
-	if (hw->phy.autoneg_wait_to_complete) {
+	if (autoneg_wait_to_complete) {
 		if ((autoc_reg & IXGBE_AUTOC_LMS_MASK) ==
 		     IXGBE_AUTOC_LMS_KX4_KX_KR ||
 		    (autoc_reg & IXGBE_AUTOC_LMS_MASK) ==
@@ -401,25 +395,7 @@ static s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw)
 }
 
 /**
- *  ixgbe_setup_mac_link_multispeed_fiber - Setup MAC link settings
- *  @hw: pointer to hardware structure
- *
- *  Configures link settings based on values in the ixgbe_hw struct.
- *  Restarts the link for multi-speed fiber at 1G speed, if link
- *  fails at 10G.
- *  Performs autonegotiation if needed.
- **/
-static s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw)
-{
-	s32 status = 0;
-	ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_82599_AUTONEG;
-	status = ixgbe_setup_mac_link_speed_multispeed_fiber(hw, link_speed,
-	                                                     true, true);
-	return status;
-}
-
-/**
- *  ixgbe_setup_mac_link_speed_multispeed_fiber - Set MAC link speed
+ *  ixgbe_setup_mac_link_multispeed_fiber - Set MAC link speed
  *  @hw: pointer to hardware structure
  *  @speed: new link speed
  *  @autoneg: true if autonegotiation enabled
@@ -427,10 +403,10 @@ static s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw)
  *
  *  Set the link speed in the AUTOC register and restarts link.
  **/
-static s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw,
-                                                ixgbe_link_speed speed,
-                                                bool autoneg,
-                                                bool autoneg_wait_to_complete)
+s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
+                                          ixgbe_link_speed speed,
+                                          bool autoneg,
+                                          bool autoneg_wait_to_complete)
 {
 	s32 status = 0;
 	ixgbe_link_speed phy_link_speed;
@@ -485,10 +461,10 @@ static s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw,
 		/* Allow module to change analog characteristics (1G->10G) */
 		msleep(40);
 
-		status = ixgbe_setup_mac_link_speed_82599(hw,
-		                                     IXGBE_LINK_SPEED_10GB_FULL,
-		                                     autoneg,
-		                                     autoneg_wait_to_complete);
+		status = ixgbe_setup_mac_link_82599(hw,
+		                               IXGBE_LINK_SPEED_10GB_FULL,
+		                               autoneg,
+		                               autoneg_wait_to_complete);
 		if (status != 0)
 			goto out;
 
@@ -539,7 +515,7 @@ static s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw,
 		/* Allow module to change analog characteristics (10G->1G) */
 		msleep(40);
 
-		status = ixgbe_setup_mac_link_speed_82599(hw,
+		status = ixgbe_setup_mac_link_82599(hw,
 		                                      IXGBE_LINK_SPEED_1GB_FULL,
 		                                      autoneg,
 		                                      autoneg_wait_to_complete);
@@ -576,10 +552,10 @@ static s32 ixgbe_setup_mac_link_speed_multispeed_fiber(struct ixgbe_hw *hw,
 	 * single highest speed that the user requested.
 	 */
 	if (speedcnt > 1)
-		status = ixgbe_setup_mac_link_speed_multispeed_fiber(hw,
-		                                     highest_link_speed,
-		                                     autoneg,
-		                                     autoneg_wait_to_complete);
+		status = ixgbe_setup_mac_link_multispeed_fiber(hw,
+		                                               highest_link_speed,
+		                                               autoneg,
+		                                               autoneg_wait_to_complete);
 
 out:
 	return status;
@@ -640,7 +616,7 @@ static s32 ixgbe_check_mac_link_82599(struct ixgbe_hw *hw,
 }
 
 /**
- *  ixgbe_setup_mac_link_speed_82599 - Set MAC link speed
+ *  ixgbe_setup_mac_link_82599 - Set MAC link speed
  *  @hw: pointer to hardware structure
  *  @speed: new link speed
  *  @autoneg: true if autonegotiation enabled
@@ -648,10 +624,9 @@ static s32 ixgbe_check_mac_link_82599(struct ixgbe_hw *hw,
  *
  *  Set the link speed in the AUTOC register and restarts link.
  **/
-static s32 ixgbe_setup_mac_link_speed_82599(struct ixgbe_hw *hw,
-                                            ixgbe_link_speed speed,
-                                            bool autoneg,
-                                            bool autoneg_wait_to_complete)
+s32 ixgbe_setup_mac_link_82599(struct ixgbe_hw *hw,
+                               ixgbe_link_speed speed, bool autoneg,
+                               bool autoneg_wait_to_complete)
 {
 	s32 status = 0;
 	u32 autoc = IXGBE_READ_REG(hw, IXGBE_AUTOC);
@@ -751,26 +726,7 @@ out:
 }
 
 /**
- *  ixgbe_setup_copper_link_82599 - Setup copper link settings
- *  @hw: pointer to hardware structure
- *
- *  Restarts the link on PHY and then MAC. Performs autonegotiation if needed.
- **/
-static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw)
-{
-	s32 status;
-
-	/* Restart autonegotiation on PHY */
-	status = hw->phy.ops.setup_link(hw);
-
-	/* Set up MAC */
-	ixgbe_setup_mac_link_82599(hw);
-
-	return status;
-}
-
-/**
- *  ixgbe_setup_copper_link_speed_82599 - Set the PHY autoneg advertised field
+ *  ixgbe_setup_copper_link_82599 - Set the PHY autoneg advertised field
  *  @hw: pointer to hardware structure
  *  @speed: new link speed
  *  @autoneg: true if autonegotiation enabled
@@ -778,10 +734,10 @@ static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw)
  *
  *  Restarts link on PHY and MAC based on settings passed in.
  **/
-static s32 ixgbe_setup_copper_link_speed_82599(struct ixgbe_hw *hw,
-                                               ixgbe_link_speed speed,
-                                               bool autoneg,
-                                               bool autoneg_wait_to_complete)
+static s32 ixgbe_setup_copper_link_82599(struct ixgbe_hw *hw,
+                                         ixgbe_link_speed speed,
+                                         bool autoneg,
+                                         bool autoneg_wait_to_complete)
 {
 	s32 status;
 
@@ -789,7 +745,7 @@ static s32 ixgbe_setup_copper_link_speed_82599(struct ixgbe_hw *hw,
 	status = hw->phy.ops.setup_link_speed(hw, speed, autoneg,
 	                                      autoneg_wait_to_complete);
 	/* Set up MAC */
-	ixgbe_setup_mac_link_82599(hw);
+	ixgbe_start_mac_link_82599(hw, autoneg_wait_to_complete);
 
 	return status;
 }
@@ -2470,7 +2426,6 @@ static struct ixgbe_mac_operations mac_ops_82599 = {
 	.read_analog_reg8       = &ixgbe_read_analog_reg8_82599,
 	.write_analog_reg8      = &ixgbe_write_analog_reg8_82599,
 	.setup_link             = &ixgbe_setup_mac_link_82599,
-	.setup_link_speed       = &ixgbe_setup_mac_link_speed_82599,
 	.check_link             = &ixgbe_check_mac_link_82599,
 	.get_link_capabilities  = &ixgbe_get_link_capabilities_82599,
 	.led_on                 = &ixgbe_led_on_generic,
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 1444ec5..026e94a 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -233,11 +233,11 @@ static int ixgbe_set_settings(struct net_device *netdev,
 			return err;
 		/* this sets the link speed and restarts auto-neg */
 		hw->mac.autotry_restart = true;
-		err = hw->mac.ops.setup_link_speed(hw, advertised, true, true);
+		err = hw->mac.ops.setup_link(hw, advertised, true, true);
 		if (err) {
 			DPRINTK(PROBE, INFO,
 			        "setup link failed with code %d\n", err);
-			hw->mac.ops.setup_link_speed(hw, old, true, true);
+			hw->mac.ops.setup_link(hw, old, true, true);
 		}
 	} else {
 		/* in this case we currently only support 10Gb/FULL */
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index f907836..4042d87 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2516,7 +2516,7 @@ static void ixgbe_sfp_link_config(struct ixgbe_adapter *adapter)
 static int ixgbe_non_sfp_link_config(struct ixgbe_hw *hw)
 {
 	u32 autoneg;
-	bool link_up = false;
+	bool negotiation, link_up = false;
 	u32 ret = IXGBE_ERR_LINK_SETUP;
 
 	if (hw->mac.ops.check_link)
@@ -2526,13 +2526,12 @@ static int ixgbe_non_sfp_link_config(struct ixgbe_hw *hw)
 		goto link_cfg_out;
 
 	if (hw->mac.ops.get_link_capabilities)
-		ret = hw->mac.ops.get_link_capabilities(hw, &autoneg,
-		                                        &hw->mac.autoneg);
+		ret = hw->mac.ops.get_link_capabilities(hw, &autoneg, &negotiation);
 	if (ret)
 		goto link_cfg_out;
 
-	if (hw->mac.ops.setup_link_speed)
-		ret = hw->mac.ops.setup_link_speed(hw, autoneg, true, link_up);
+	if (hw->mac.ops.setup_link)
+		ret = hw->mac.ops.setup_link(hw, autoneg, negotiation, link_up);
 link_cfg_out:
 	return ret;
 }
@@ -4517,14 +4516,14 @@ static void ixgbe_multispeed_fiber_task(struct work_struct *work)
 	                                             multispeed_fiber_task);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 autoneg;
+	bool negotiation;
 
 	adapter->flags |= IXGBE_FLAG_IN_SFP_LINK_TASK;
 	autoneg = hw->phy.autoneg_advertised;
 	if ((!autoneg) && (hw->mac.ops.get_link_capabilities))
-		hw->mac.ops.get_link_capabilities(hw, &autoneg,
-		                                  &hw->mac.autoneg);
-	if (hw->mac.ops.setup_link_speed)
-		hw->mac.ops.setup_link_speed(hw, autoneg, true, true);
+		hw->mac.ops.get_link_capabilities(hw, &autoneg, &negotiation);
+	if (hw->mac.ops.setup_link)
+		hw->mac.ops.setup_link(hw, autoneg, negotiation, true);
 	adapter->flags |= IXGBE_FLAG_NEED_LINK_UPDATE;
 	adapter->flags &= ~IXGBE_FLAG_IN_SFP_LINK_TASK;
 }
diff --git a/drivers/net/ixgbe/ixgbe_type.h b/drivers/net/ixgbe/ixgbe_type.h
index f0f3406..8ba90ee 100644
--- a/drivers/net/ixgbe/ixgbe_type.h
+++ b/drivers/net/ixgbe/ixgbe_type.h
@@ -2332,9 +2332,7 @@ struct ixgbe_mac_operations {
 	s32 (*enable_rx_dma)(struct ixgbe_hw *, u32);
 
 	/* Link */
-	s32 (*setup_link)(struct ixgbe_hw *);
-	s32 (*setup_link_speed)(struct ixgbe_hw *, ixgbe_link_speed, bool,
-	                        bool);
+	s32 (*setup_link)(struct ixgbe_hw *, ixgbe_link_speed, bool, bool);
 	s32 (*check_link)(struct ixgbe_hw *, ixgbe_link_speed *, bool *, bool);
 	s32 (*get_link_capabilities)(struct ixgbe_hw *, ixgbe_link_speed *,
 	                             bool *);
@@ -2406,8 +2404,6 @@ struct ixgbe_mac_info {
 	u32                             orig_autoc;
 	u32                             orig_autoc2;
 	bool                            orig_link_settings_stored;
-	bool                            autoneg;
-	bool                            autoneg_succeeded;
 	bool                            autotry_restart;
 };
 
@@ -2422,7 +2418,6 @@ struct ixgbe_phy_info {
 	enum ixgbe_media_type           media_type;
 	bool                            reset_disable;
 	ixgbe_autoneg_advertised        autoneg_advertised;
-	bool                            autoneg_wait_to_complete;
 	bool                            multispeed_fiber;
 };
 


^ permalink raw reply related

* [net-next PATCH 2/3] ixgbe: Properly disable DCB arbiters prior to applying changes
From: Jeff Kirsher @ 2009-09-01 23:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Peter P Waskiewicz Jr, Jeff Kirsher
In-Reply-To: <20090901234916.4617.73579.stgit@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

When disabling the Rx and Tx data arbiters prior to configuration changes,
the arbiters were not being shut down properly.  This can create a race
in the DCB hardware blocks, and potentially hang the arbiters.  Also, the
Tx descriptor arbiter shouldn't be disabled when applying configuration
changes; disabling this arbiter can cause a Tx hang.
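
One way to read the change, sketched under assumptions: a register write replaces the whole 32-bit value, so writing only the disable bit also clears any mode bits in the same register. The patch instead writes the disable bit together with the mode bits it wants to keep. The names and bit positions below (DEMO_RRM, DEMO_RAC, DEMO_ARBDIS, demo_reg) are hypothetical stand-ins, not the real ixgbe definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_RRM    (1u << 1)  /* stand-in for a "recycle mode" bit */
#define DEMO_RAC    (1u << 2)  /* stand-in for an arbitration-control bit */
#define DEMO_ARBDIS (1u << 6)  /* stand-in for the arbiter-disable bit */

/* Stand-in for a memory-mapped control register. */
static uint32_t demo_reg;

/* A full-width write: every bit not present in value ends up cleared. */
static void demo_write_reg(uint32_t value)
{
	demo_reg = value;
}

/* Old pattern: only the disable bit is written, so mode bits are lost. */
static void demo_disable_bare(void)
{
	demo_write_reg(DEMO_ARBDIS);
}

/* Patched pattern: disable the arbiter while keeping the mode bits set. */
static void demo_disable_with_mode(void)
{
	demo_write_reg(DEMO_RRM | DEMO_RAC | DEMO_ARBDIS);
}
```

With the bare write, the register holds only the disable bit afterwards; with the combined write, the mode bits stay set while the arbiter is disabled.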

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_dcb_82599.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_dcb_82599.c b/drivers/net/ixgbe/ixgbe_dcb_82599.c
index 589f62c..ec8a252 100644
--- a/drivers/net/ixgbe/ixgbe_dcb_82599.c
+++ b/drivers/net/ixgbe/ixgbe_dcb_82599.c
@@ -145,8 +145,12 @@ s32 ixgbe_dcb_config_rx_arbiter_82599(struct ixgbe_hw *hw,
 	u32    credit_max    = 0;
 	u8     i             = 0;
 
-	/* Disable the arbiter before changing parameters */
-	IXGBE_WRITE_REG(hw, IXGBE_RTRPCS, IXGBE_RTRPCS_ARBDIS);
+	/*
+	 * Disable the arbiter before changing parameters
+	 * (always enable recycle mode; WSP)
+	 */
+	reg = IXGBE_RTRPCS_RRM | IXGBE_RTRPCS_RAC | IXGBE_RTRPCS_ARBDIS;
+	IXGBE_WRITE_REG(hw, IXGBE_RTRPCS, reg);
 
 	/* Map all traffic classes to their UP, 1 to 1 */
 	reg = 0;
@@ -194,9 +198,6 @@ s32 ixgbe_dcb_config_tx_desc_arbiter_82599(struct ixgbe_hw *hw,
 	u32    reg, max_credits;
 	u8     i;
 
-	/* Disable the arbiter before changing parameters */
-	IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, IXGBE_RTTDCS_ARBDIS);
-
 	/* Clear the per-Tx queue credits; we use per-TC instead */
 	for (i = 0; i < 128; i++) {
 		IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, i);
@@ -244,8 +245,14 @@ s32 ixgbe_dcb_config_tx_data_arbiter_82599(struct ixgbe_hw *hw,
 	u32 reg;
 	u8 i;
 
-	/* Disable the arbiter before changing parameters */
-	IXGBE_WRITE_REG(hw, IXGBE_RTTPCS, IXGBE_RTTPCS_ARBDIS);
+	/*
+	 * Disable the arbiter before changing parameters
+	 * (always enable recycle mode; SP; arb delay)
+	 */
+	reg = IXGBE_RTTPCS_TPPAC | IXGBE_RTTPCS_TPRM |
+	      (IXGBE_RTTPCS_ARBD_DCB << IXGBE_RTTPCS_ARBD_SHIFT) |
+	      IXGBE_RTTPCS_ARBDIS;
+	IXGBE_WRITE_REG(hw, IXGBE_RTTPCS, reg);
 
 	/* Map all traffic classes to their UP, 1 to 1 */
 	reg = 0;


^ permalink raw reply related

* [net-next PATCH 3/3] ixgbe: Patch to fix 82599 multispeed fiber link issues when driver is loaded without any cable and reconnecting it to 1G partner
From: Jeff Kirsher @ 2009-09-01 23:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, Mallikarjuna R Chilakala, Peter P Waskiewicz Jr,
	Jeff Kirsher
In-Reply-To: <20090901234916.4617.73579.stgit@localhost.localdomain>

From: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>

In the 82599 multispeed fiber case, when the driver is loaded without
a cable attached, reconnecting the cable to a 1G link partner does not
bring the link up in 1Gb mode. When there is no link, we first set up
the link at 10G and 1G, then try to re-establish the link at the highest
speed (10G), which changes the autoneg_advertised value to 10G only.
After the cable is reconnected to a 1G link partner, we never try 1G
because autoneg_advertised now allows 10G only. This patch fixes the
issue by initializing the autoneg_advertised value just before exiting
the link setup routine.
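
The ordering problem can be sketched with a simplified model of the no-link path only. This is not the driver code: demo_hw, demo_setup_link, the record_at_exit flag, and the DEMO_SPEED_* values are hypothetical. The routine calls itself with the single highest speed when more than one speed was requested, and whichever call writes autoneg_advertised last wins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical speed flags, not the real IXGBE_LINK_SPEED_* values. */
#define DEMO_SPEED_1G  0x1u
#define DEMO_SPEED_10G 0x2u

struct demo_hw {
	uint32_t autoneg_advertised;
};

/*
 * Simplified model of the no-link path. record_at_exit == 0 mimics the
 * old ordering (record on entry); record_at_exit == 1 mimics the patched
 * ordering (record just before returning).
 */
static void demo_setup_link(struct demo_hw *hw, uint32_t speed,
                            int record_at_exit)
{
	int multiple = (speed & DEMO_SPEED_10G) && (speed & DEMO_SPEED_1G);
	uint32_t highest = (speed & DEMO_SPEED_10G) ? DEMO_SPEED_10G
	                                            : (speed & DEMO_SPEED_1G);

	if (!record_at_exit)
		hw->autoneg_advertised = speed;

	/* Retry at the single highest speed; the nested call records it too. */
	if (multiple)
		demo_setup_link(hw, highest, record_at_exit);

	if (record_at_exit)
		hw->autoneg_advertised = speed;
}
```

In this model the old ordering leaves only the 10G flag recorded after a request for both speeds, while recording at exit restores the full requested set once the nested call has returned.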

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_82599.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index 5ee5608..61af47e 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -421,15 +421,6 @@ s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
 	hw->mac.ops.get_link_capabilities(hw, &phy_link_speed, &negotiation);
 	speed &= phy_link_speed;
 
-	 /* Set autoneg_advertised value based on input link speed */
-	hw->phy.autoneg_advertised = 0;
-
-	if (speed & IXGBE_LINK_SPEED_10GB_FULL)
-		hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_10GB_FULL;
-
-	if (speed & IXGBE_LINK_SPEED_1GB_FULL)
-		hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_1GB_FULL;
-
 	/*
 	 * When the driver changes the link speeds that it can support,
 	 * it sets autotry_restart to true to indicate that we need to
@@ -466,7 +457,7 @@ s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
 		                               autoneg,
 		                               autoneg_wait_to_complete);
 		if (status != 0)
-			goto out;
+			return status;
 
 		/* Flap the tx laser if it has not already been done */
 		if (hw->mac.autotry_restart) {
@@ -520,7 +511,7 @@ s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
 		                                      autoneg,
 		                                      autoneg_wait_to_complete);
 		if (status != 0)
-			goto out;
+			return status;
 
 		/* Flap the tx laser if it has not already been done */
 		if (hw->mac.autotry_restart) {
@@ -558,6 +549,15 @@ s32 ixgbe_setup_mac_link_multispeed_fiber(struct ixgbe_hw *hw,
 		                                               autoneg_wait_to_complete);
 
 out:
+	/* Set autoneg_advertised value based on input link speed */
+	hw->phy.autoneg_advertised = 0;
+
+	if (speed & IXGBE_LINK_SPEED_10GB_FULL)
+		hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_10GB_FULL;
+
+	if (speed & IXGBE_LINK_SPEED_1GB_FULL)
+		hw->phy.autoneg_advertised |= IXGBE_LINK_SPEED_1GB_FULL;
+
 	return status;
 }
 


^ permalink raw reply related

* Re: [net-next PATCH 1/3] ixgbe: refactor link setup code
From: David Miller @ 2009-09-02  0:43 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, mallikarjuna.chilakala, peter.p.waskiewicz.jr
In-Reply-To: <20090901234916.4617.73579.stgit@localhost.localdomain>


All 3 patches applied, thanks.

^ permalink raw reply

* Re: [net-next PATCH] e1000: Fix for e1000 kills IPMI on a tagged vlan.
From: David Miller @ 2009-09-02  0:43 UTC (permalink / raw)
  To: ole; +Cc: jeffrey.t.kirsher, netdev, gospo, david.graham
In-Reply-To: <alpine.LNX.1.10.0909010247460.5651@bizon.gios.gov.pl>

From: Krzysztof Oledzki <ole@ans.pl>
Date: Tue, 1 Sep 2009 02:53:29 +0200 (CEST)

> 
> 
> On Mon, 31 Aug 2009, Jeff Kirsher wrote:
> 
>> From: Graham, David <david.graham@intel.com>
> 
> <CUT>
>> A complete solution to this issue would require further driver changes.
>> The driver would need to discover if (and which) management VLANs are
>> active before enabling VLAN filtering, so that it could ensure that the
>> managed VLANs are included in the VLAN filter table. This discovery
>> requires that the BMC identifies its VLAN in registers accessible
>> to the driver, and at least on Dell PE2850 systems the BMC does not
>> identify its VLAN to allow such discovery. Intel is pursuing this issue
>> with the BMC vendor.
>>
>> Signed-off-by: Dave Graham <david.graham@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> 
> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] macvlan: Use compare_ether_addr_64bits()
From: David Miller @ 2009-09-02  0:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kaber, netdev
In-Reply-To: <4A9D41BD.9000406@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 01 Sep 2009 17:46:05 +0200

> To speed up Ethernet address comparisons, we can use
> compare_ether_addr_64bits() (all operands are guaranteed to be at
> least 8 bytes long)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
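
For context, the idea behind a 64-bit address compare can be sketched in user-space C. This is an illustration, not the kernel helper: demo_ether_addr_differs_64 is a hypothetical name, and the sketch assumes, as the quoted message notes, that both buffers are at least 8 bytes long so that reading two bytes past the 6-byte address is safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte Ethernet addresses with one 64-bit XOR.
 * Both buffers must be at least 8 bytes long; the trailing two bytes
 * are read but masked out of the result.
 * Returns 0 if the addresses are equal, non-zero otherwise.
 */
static int demo_ether_addr_differs_64(const uint8_t a[8], const uint8_t b[8])
{
	/* Building the mask from bytes keeps it independent of endianness. */
	static const uint8_t mask_bytes[8] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00
	};
	uint64_t x, y, mask;

	memcpy(&x, a, sizeof(x));
	memcpy(&y, b, sizeof(y));
	memcpy(&mask, mask_bytes, sizeof(mask));

	return ((x ^ y) & mask) != 0;
}
```

The single XOR and mask replace a byte-by-byte or 16-bit-wise comparison; the padding bytes after the address do not affect the result.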

^ permalink raw reply

