Netdev List

* [patch net-next 0/3] team: bug fixes
From: Jiri Pirko @ 2011-11-16 21:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, bhutchings, shemminger, andy, fbl, jzupka,
	ivecera

Jiri Pirko (3):
  team: Do not hold rcu_read_lock when running netlink cmds
  team: convert overall spinlock to mutex
  team: replicate options on register

 drivers/net/team/team.c                   |  120 +++++++++++++++++++++--------
 drivers/net/team/team_mode_activebackup.c |    5 +-
 include/linux/if_team.h                   |   10 +-
 3 files changed, 93 insertions(+), 42 deletions(-)

-- 
1.7.6


* [PATCH 2/2] net: add documentation for net_prio cgroups
From: Neil Horman @ 2011-11-16 20:51 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, John Fastabend, Robert Love, David S. Miller
In-Reply-To: <1321476666-8225-1-git-send-email-nhorman@tuxdriver.com>

Add the requisite documentation to explain to new users how net_prio cgroups work.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 Documentation/cgroups/net_prio.txt |   53 ++++++++++++++++++++++++++++++++++++
 1 files changed, 53 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/cgroups/net_prio.txt

diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt
new file mode 100644
index 0000000..01b3226
--- /dev/null
+++ b/Documentation/cgroups/net_prio.txt
@@ -0,0 +1,53 @@
+Network priority cgroup
+-------------------------
+
+The Network priority cgroup provides an interface to allow an administrator to
+dynamically set the priority of network traffic generated by various
+applications.
+
+Nominally, an application would set the priority of its traffic via the
+SO_PRIORITY socket option.  This, however, is not always possible because:
+
+1) The application may not have been coded to set this value.
+2) The priority of application traffic is often a site-specific administrative
+   decision rather than an application defined one.
+
+This cgroup allows an administrator to assign a process to a group which defines
+the priority of egress traffic on a given interface. Network priority groups can
+be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
+
+With the above step, the initial group acting as the parent accounting group
+becomes visible at '/sys/fs/cgroup/net_prio'.  This group includes all tasks in
+the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup.
+
+Each net_prio cgroup contains two files that are subsystem specific:
+
+net_prio.prioidx
+This file is read-only, and is simply informative.  It contains a unique integer
+value that the kernel uses as an internal representation of this cgroup.
+
+net_prio.ifpriomap
+This file contains a map of the priorities assigned to traffic originating from
+processes in this group and egressing the system on various interfaces. It
+contains a list of tuples in the form <ifname priority>.  Contents of this file
+can be modified by echoing a string into the file using the same tuple format.
+For example:
+
+echo "eth0 5" > /sys/fs/cgroup/net_prio/iscsi/net_prio.ifpriomap
+
+This command would force any traffic originating from processes belonging to the
+iscsi net_prio cgroup and egressing on interface eth0 to have the priority of
+said traffic set to the value 5. The parent accounting group also has a
+writeable 'net_prio.ifpriomap' file that can be used to set a system default
+priority.
+
+Priorities are set immediately prior to queueing a frame to the device
+queueing discipline (qdisc) so priorities will be assigned prior to the hardware
+queue selection being made.
+
+One usage for the net_prio cgroup is with mqprio qdisc allowing application
+traffic to be steered to hardware/driver based traffic classes. These mappings
+can then be managed by administrators or other networking protocols such as
+DCBX.
-- 
1.7.6.4

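For illustration only, the tuple write described in net_prio.txt can also be
done from a small C helper instead of echo. This is a minimal userspace sketch,
not part of the patch: the helper names format_ifpriomap_entry() and
write_ifpriomap() are hypothetical, and the path passed in is assumed to be the
net_prio.ifpriomap file of a group the administrator has already created.

```c
#include <stdio.h>
#include <string.h>
#include <net/if.h>	/* IF_NAMESIZE */

/*
 * Build one "<ifname> <priority>" tuple, the format the documentation gives
 * for writes to net_prio.ifpriomap. Returns 0 on success, -1 if the name is
 * empty, too long for an interface name, contains a space, or if the buffer
 * is too small. (Hypothetical helper, not part of the patch.)
 */
int format_ifpriomap_entry(char *buf, size_t len,
			   const char *ifname, unsigned long prio)
{
	int n;

	if (!ifname || !*ifname || strlen(ifname) >= IF_NAMESIZE ||
	    strchr(ifname, ' '))
		return -1;

	n = snprintf(buf, len, "%s %lu", ifname, prio);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one tuple to the ifpriomap file at 'path'. Returns 0 or -1. */
int write_ifpriomap(const char *path, const char *ifname, unsigned long prio)
{
	char entry[IF_NAMESIZE + 32];
	FILE *f;
	int ret = 0;

	if (format_ifpriomap_entry(entry, sizeof(entry), ifname, prio))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(entry, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* write errors may only surface on close */
		ret = -1;
	return ret;
}
```

With the mount point used in the documentation, a call such as
write_ifpriomap("/sys/fs/cgroup/net_prio/iscsi/net_prio.ifpriomap", "eth0", 5)
would be the equivalent of the echo example, assuming the iscsi group exists.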

* [PATCH 1/2] net: add network priority cgroup infrastructure
From: Neil Horman @ 2011-11-16 20:51 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, John Fastabend, Robert Love, David S. Miller
In-Reply-To: <1321476666-8225-1-git-send-email-nhorman@tuxdriver.com>

This patch adds the infrastructure code to create the network priority
cgroup.  The cgroup, in addition to the standard processes file, creates two
control files:

1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map.

2) priomap - This is a writeable file.  On read it reports a table of 2-tuples
<name:priority> where name is the name of a network interface and priority
indicates the priority assigned to frames egressing on the named interface and
originating from a pid in this cgroup.

This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is beneficial for DCB enabled systems, in that it allows any
application to use DCB configured priorities without application modification.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 include/linux/cgroup_subsys.h |    8 +
 include/linux/netdevice.h     |    4 +
 include/net/netprio_cgroup.h  |   66 ++++++++
 include/net/sock.h            |    3 +
 net/Kconfig                   |    7 +
 net/core/Makefile             |    1 +
 net/core/dev.c                |   13 ++
 net/core/netprio_cgroup.c     |  340 +++++++++++++++++++++++++++++++++++++++++
 net/core/sock.c               |   22 +++-
 net/socket.c                  |    2 +
 10 files changed, 465 insertions(+), 1 deletions(-)
 create mode 100644 include/net/netprio_cgroup.h
 create mode 100644 net/core/netprio_cgroup.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ac663c1..0bd390c 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -59,8 +59,16 @@ SUBSYS(net_cls)
 SUBSYS(blkio)
 #endif
 
+/* */
+
 #ifdef CONFIG_CGROUP_PERF
 SUBSYS(perf)
 #endif
 
 /* */
+
+#ifdef CONFIG_NETPRIO_CGROUP
+SUBSYS(net_prio)
+#endif
+
+/* */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0db1f5f..86e8c3f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -50,6 +50,7 @@
 #ifdef CONFIG_DCB
 #include <net/dcbnl.h>
 #endif
+#include <net/netprio_cgroup.h>
 
 struct vlan_group;
 struct netpoll_info;
@@ -1312,6 +1313,9 @@ struct net_device {
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
 #endif
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+	struct netprio_map *priomap;
+#endif
 	/* phy device may attach itself for hardware timestamping */
 	struct phy_device *phydev;
 
diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
new file mode 100644
index 0000000..6b65936
--- /dev/null
+++ b/include/net/netprio_cgroup.h
@@ -0,0 +1,66 @@
+/*
+ * netprio_cgroup.h			Control Group Priority set
+ *
+ *
+ * Authors:	Neil Horman <nhorman@tuxdriver.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#ifndef _NETPRIO_CGROUP_H
+#define _NETPRIO_CGROUP_H
+#include <linux/module.h>
+#include <linux/cgroup.h>
+#include <linux/hardirq.h>
+#include <linux/rcupdate.h>
+
+struct cgroup_netprio_state
+{
+	struct cgroup_subsys_state css;
+	u32 prioidx;
+};
+
+struct netprio_map {
+	struct rcu_head rcu;
+	u32 priomap_len;
+	u32 priomap[];
+};
+
+#ifdef CONFIG_CGROUPS
+
+#ifndef CONFIG_NETPRIO_CGROUP
+extern int net_prio_subsys_id;
+#endif
+
+extern void sock_update_netprioidx(struct sock *sk);
+extern void skb_update_prio(struct sk_buff *skb);
+
+static inline struct cgroup_netprio_state
+		*task_netprio_state(struct task_struct *p)
+{
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
+	return container_of(task_subsys_state(p, net_prio_subsys_id),
+			    struct cgroup_netprio_state, css);
+#else
+	return NULL;
+#endif
+}
+
+#else
+
+#define sock_update_netprioidx(sk)
+#define skb_update_prio(skb)
+
+static inline struct cgroup_netprio_state
+		*task_netprio_state(struct task_struct *p)
+{
+	return NULL;
+}
+
+#endif
+
+#endif  /* _NETPRIO_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5ac682f..87b24aa 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -321,6 +321,9 @@ struct sock {
 	unsigned short		sk_ack_backlog;
 	unsigned short		sk_max_ack_backlog;
 	__u32			sk_priority;
+#ifdef CONFIG_CGROUPS
+	__u32			sk_cgrp_prioidx;
+#endif
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
diff --git a/net/Kconfig b/net/Kconfig
index a073148..63d2c5d 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -232,6 +232,13 @@ config XPS
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
 	default y
 
+config NETPRIO_CGROUP
+	tristate "Network priority cgroup"
+	depends on CGROUPS
+	---help---
+	  Cgroup subsystem for use in assigning processes to network priorities on
+	  a per-interface basis
+
 config HAVE_BPF_JIT
 	bool
 
diff --git a/net/core/Makefile b/net/core/Makefile
index 0d357b1..3606d40 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
 obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
+obj-$(CONFIG_NETPRIO_CGROUP) += netprio_cgroup.o
diff --git a/net/core/dev.c b/net/core/dev.c
index b7ba81a..a1dca83 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2456,6 +2456,17 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
+#ifdef CONFIG_CGROUPS
+void skb_update_prio(struct sk_buff *skb)
+{
+	struct netprio_map *map = rcu_dereference(skb->dev->priomap);
+
+	if ((!skb->priority) && (skb->sk) && map)
+		skb->priority = map->priomap[skb->sk->sk_cgrp_prioidx];
+}
+EXPORT_SYMBOL_GPL(skb_update_prio);
+#endif
+
 static DEFINE_PER_CPU(int, xmit_recursion);
 #define RECURSION_LIMIT 10
 
@@ -2496,6 +2507,8 @@ int dev_queue_xmit(struct sk_buff *skb)
 	 */
 	rcu_read_lock_bh();
 
+	skb_update_prio(skb);
+
 	txq = dev_pick_tx(dev, skb);
 	q = rcu_dereference_bh(txq->qdisc);
 
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
new file mode 100644
index 0000000..14e896c
--- /dev/null
+++ b/net/core/netprio_cgroup.c
@@ -0,0 +1,340 @@
+/*
+ * net/core/netprio_cgroup.c	Priority Control Group
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/cgroup.h>
+#include <linux/rcupdate.h>
+#include <linux/atomic.h>
+#include <net/rtnetlink.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+#include <net/netprio_cgroup.h>
+
+static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
+					       struct cgroup *cgrp);
+static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
+static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp);
+
+struct cgroup_subsys net_prio_subsys = {
+	.name		= "net_prio",
+	.create		= cgrp_create,
+	.destroy	= cgrp_destroy,
+	.populate	= cgrp_populate,
+#ifdef CONFIG_NETPRIO_CGROUP
+	.subsys_id	= net_prio_subsys_id,
+#endif
+	.module		= THIS_MODULE
+};
+
+#define PRIOIDX_SZ 128
+
+static unsigned long prioidx_map[PRIOIDX_SZ];
+static DEFINE_SPINLOCK(prioidx_map_lock);
+static atomic_t max_prioidx = ATOMIC_INIT(0);
+
+static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgrp)
+{
+	return container_of(cgroup_subsys_state(cgrp, net_prio_subsys_id),
+			    struct cgroup_netprio_state, css);
+}
+
+static int get_prioidx(u32 *prio)
+{
+	unsigned long flags;
+	u32 prioidx;
+
+	spin_lock_irqsave(&prioidx_map_lock, flags);
+	prioidx = find_first_zero_bit(prioidx_map, sizeof(unsigned long) * PRIOIDX_SZ);
+	set_bit(prioidx, prioidx_map);
+	spin_unlock_irqrestore(&prioidx_map_lock, flags);
+	if (prioidx == sizeof(unsigned long) * PRIOIDX_SZ)
+		return -ENOSPC;
+
+	atomic_set(&max_prioidx, prioidx);
+	*prio = prioidx;
+	return 0;
+}
+
+static void put_prioidx(u32 idx)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&prioidx_map_lock, flags);
+	clear_bit(idx, prioidx_map);
+	spin_unlock_irqrestore(&prioidx_map_lock, flags);
+}
+
+static void extend_netdev_table(struct net_device *dev, u32 new_len)
+{
+	size_t new_size = sizeof(struct netprio_map) +
+			   ((sizeof(u32) * new_len));
+	struct netprio_map *new_priomap = kzalloc(new_size, GFP_KERNEL);
+	struct netprio_map *old_priomap;
+	int i;
+
+	old_priomap  = rcu_dereference_protected(dev->priomap, 1);
+
+
+	if (!new_priomap) {
+		printk(KERN_WARNING "Unable to alloc new priomap!\n");
+		return;
+	}
+
+	for (i = 0;
+	     dev->priomap && (i < dev->priomap->priomap_len);
+	     i++)
+		new_priomap->priomap[i] = dev->priomap->priomap[i];
+
+	new_priomap->priomap_len = new_len;
+
+	rcu_assign_pointer(dev->priomap, new_priomap);
+	if (old_priomap)
+		kfree_rcu(old_priomap, rcu);
+
+}
+
+static void update_netdev_tables(void)
+{
+	struct net_device *dev;
+	u32 max_len = atomic_read(&max_prioidx);
+
+	rtnl_lock();
+
+	for_each_netdev(&init_net, dev) {
+		if ((!dev->priomap) ||
+		    (dev->priomap->priomap_len < max_len))
+			extend_netdev_table(dev, max_len);
+	}
+
+	rtnl_unlock();
+}
+
+static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
+						 struct cgroup *cgrp)
+{
+	struct cgroup_netprio_state *cs;
+	int ret;
+
+	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
+	if (!cs)
+		return ERR_PTR(-ENOMEM);
+
+	if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx)
+		return ERR_PTR(-EINVAL);
+
+	ret = get_prioidx(&cs->prioidx);
+	if (ret != 0) {
+		printk(KERN_WARNING "No space in priority index array\n");
+		return ERR_PTR(ret);
+	}
+
+	return &cs->css;
+}
+
+static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct cgroup_netprio_state *cs;
+	struct net_device *dev;
+
+	cs = cgrp_netprio_state(cgrp);
+	rtnl_lock();
+	for_each_netdev(&init_net, dev) {
+		if (dev->priomap)
+			dev->priomap->priomap[cs->prioidx] = 0;
+	}
+	rtnl_unlock();
+	put_prioidx(cs->prioidx);
+out_free:
+	kfree(cs);
+}
+
+static u64 read_prioidx(struct cgroup *cgrp, struct cftype *cft)
+{
+	return (u64)cgrp_netprio_state(cgrp)->prioidx;
+}
+
+static int read_priomap(struct cgroup *cont, struct cftype *cft,
+			struct cgroup_map_cb *cb)
+{
+	struct net_device *dev;
+	u32 prioidx = cgrp_netprio_state(cont)->prioidx;
+	u32 priority;
+
+	/*
+	 * Report this cgroup's priority on every interface; 0 if none is set
+	 */
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
+		priority = dev->priomap ? dev->priomap->priomap[prioidx] : 0;
+		cb->fill(cb, dev->name, priority);
+	}
+	rcu_read_unlock();
+	return 0;
+}
+
+static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
+			 const char *buffer)
+{
+	char *devname;
+	int ret = -EINVAL;
+	u32 prioidx = cgrp_netprio_state(cgrp)->prioidx;
+	unsigned long priority;
+	char *priostr;
+	struct net_device *dev;
+
+	devname = kstrdup(buffer, GFP_KERNEL);
+	if (!devname)
+		return -ENOMEM;
+
+	/*
+	 * Minimally sized valid priomap string
+	 */
+	if (strlen(devname) < 3)
+		goto out_free_devname;
+
+	priostr = strstr(devname, " ");
+	if (!priostr)
+		goto out_free_devname;
+
+	/*
+	 * Separate the devname from the associated priority
+	 * and advance the priostr pointer to the priority value
+	 */
+	*priostr = '\0';
+	priostr++;
+
+	/*
+	 * If priostr points to the NUL terminator, we're at the end of the
+	 * passed in string, and it's not a valid write
+	 */
+	if (*priostr == '\0')
+		goto out_free_devname;
+
+	ret = kstrtoul(priostr, 10, &priority);
+	if (ret < 0)
+		goto out_free_devname;
+
+	ret = -ENODEV;
+
+	dev = dev_get_by_name(&init_net, devname);
+	if (!dev)
+		goto out_free_devname;
+
+	update_netdev_tables();
+	ret = 0;
+	if (dev->priomap)
+		dev->priomap->priomap[prioidx] = priority;
+
+	dev_put(dev);
+
+out_free_devname:
+	kfree(devname);
+	return ret;
+}
+
+static struct cftype ss_files[] = {
+	{
+		.name = "prioidx",
+		.read_u64 = read_prioidx,
+	},
+	{
+		.name = "ifpriomap",
+		.read_map = read_priomap,
+		.write_string = write_priomap,
+	},
+};
+
+static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files));
+}
+
+static int netprio_device_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct netprio_map *old;
+	u32 max_len = atomic_read(&max_prioidx);
+
+	old = rcu_dereference_protected(dev->priomap, 1);
+	/*
+	 * Note this is called with rtnl_lock held so we have update side
+	 * protection on our rcu assignments
+	 */
+
+	switch (event) {
+
+	case NETDEV_REGISTER:
+		if (max_len)
+			extend_netdev_table(dev, max_len);
+		break;
+	case NETDEV_UNREGISTER:
+		rcu_assign_pointer(dev->priomap, NULL);
+		if (old)
+			kfree_rcu(old, rcu);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block netprio_device_notifier = {
+	.notifier_call = netprio_device_event
+};
+
+static int __init init_cgroup_netprio(void)
+{
+	int ret;
+
+	ret = cgroup_load_subsys(&net_prio_subsys);
+	if (ret)
+		goto out;
+#ifndef CONFIG_NETPRIO_CGROUP
+	smp_wmb();
+	net_prio_subsys_id = net_prio_subsys.subsys_id;
+#endif
+
+	register_netdevice_notifier(&netprio_device_notifier);
+
+out:
+	return ret;
+}
+
+static void __exit exit_cgroup_netprio(void)
+{
+	struct netprio_map *old;
+	struct net_device *dev;
+
+	unregister_netdevice_notifier(&netprio_device_notifier);
+
+	cgroup_unload_subsys(&net_prio_subsys);
+
+#ifndef CONFIG_NETPRIO_CGROUP
+	net_prio_subsys_id = -1;
+	synchronize_rcu();
+#endif
+
+	rtnl_lock();
+	for_each_netdev(&init_net, dev) {
+		old = dev->priomap;
+		rcu_assign_pointer(dev->priomap, NULL);
+		if (old)
+			kfree_rcu(old, rcu);
+	}
+	rtnl_unlock();
+}
+
+module_init(init_cgroup_netprio);
+module_exit(exit_cgroup_netprio);
+MODULE_LICENSE("GPL v2");
diff --git a/net/core/sock.c b/net/core/sock.c
index 5a08762..77a4888 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -125,6 +125,7 @@
 #include <net/xfrm.h>
 #include <linux/ipsec.h>
 #include <net/cls_cgroup.h>
+#include <net/netprio_cgroup.h>
 
 #include <linux/filter.h>
 
@@ -221,10 +222,16 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
-#if defined(CONFIG_CGROUPS) && !defined(CONFIG_NET_CLS_CGROUP)
+#if defined(CONFIG_CGROUPS)
+#if !defined(CONFIG_NET_CLS_CGROUP)
 int net_cls_subsys_id = -1;
 EXPORT_SYMBOL_GPL(net_cls_subsys_id);
 #endif
+#if !defined(CONFIG_NETPRIO_CGROUP)
+int net_prio_subsys_id = -1;
+EXPORT_SYMBOL_GPL(net_prio_subsys_id);
+#endif
+#endif
 
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
@@ -1111,6 +1118,18 @@ void sock_update_classid(struct sock *sk)
 		sk->sk_classid = classid;
 }
 EXPORT_SYMBOL(sock_update_classid);
+
+void sock_update_netprioidx(struct sock *sk)
+{
+	struct cgroup_netprio_state *state;
+	if (in_interrupt())
+		return;
+	rcu_read_lock();
+	state = task_netprio_state(current);
+	sk->sk_cgrp_prioidx = state ? state->prioidx : 0;
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(sock_update_netprioidx);
 #endif
 
 /**
@@ -1138,6 +1157,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
 		sock_update_classid(sk);
+		sock_update_netprioidx(sk);
 	}
 
 	return sk;
diff --git a/net/socket.c b/net/socket.c
index 2877647..108716f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -549,6 +549,8 @@ static inline int __sock_sendmsg_nosec(struct kiocb *iocb, struct socket *sock,
 
 	sock_update_classid(sock->sk);
 
+	sock_update_netprioidx(sock->sk);
+
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
-- 
1.7.6.4

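As a reading aid for the prioidx description above, the index allocation can be
sketched in plain userspace C. This is a simplified illustration under stated
assumptions, not the patch's code: struct idx_pool, idx_get() and idx_put() are
hypothetical names, there is no locking (the patch takes a spinlock), and
failure is reported as -1 rather than -ENOSPC. Unlike get_prioidx() above, the
sketch checks for exhaustion before marking a bit and only ever raises the
recorded maximum.

```c
#include <limits.h>
#include <stddef.h>

#define IDX_WORDS     4
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define IDX_BITS      (IDX_WORDS * BITS_PER_WORD)	/* capacity in bits */

struct idx_pool {
	unsigned long map[IDX_WORDS];	/* bit set = index in use */
	unsigned int max_idx;		/* highest index handed out so far */
};

/*
 * Hand out the lowest free index. Returns 0 and stores the index in *out,
 * or -1 when every index is in use (nothing is marked in that case).
 */
int idx_get(struct idx_pool *p, unsigned int *out)
{
	size_t i;

	for (i = 0; i < IDX_BITS; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_WORD);
		unsigned long *word = &p->map[i / BITS_PER_WORD];

		if (*word & bit)
			continue;	/* already taken */
		*word |= bit;
		if (i > p->max_idx)
			p->max_idx = (unsigned int)i;	/* never lowered */
		*out = (unsigned int)i;
		return 0;
	}
	return -1;
}

/* Release an index so that a later idx_get() can reuse it. */
void idx_put(struct idx_pool *p, unsigned int idx)
{
	if (idx < IDX_BITS)
		p->map[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

Keeping the maximum monotonic matters if it is used, as max_prioidx is in the
patch, to size per-device tables: a reused low index must not shrink the length
that later table extensions are based on.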

* [PATCH 0/2] net: Add network priority cgroup
From: Neil Horman @ 2011-11-16 20:51 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, John Fastabend, Robert Love, David S. Miller

Data Center Bridging environments are currently somewhat limited in their
ability to provide a general mechanism for controlling traffic priority.
Specifically they are unable to administratively control the priority at which
various types of network traffic are sent.

Currently, the primary ways to set the priority of a network buffer are:

1) Through the use of the SO_PRIORITY socket option
2) By using low level hooks, like a tc action, or iptables

(1) is difficult from an administrative perspective because it requires that the
application be coded to not just assume the default priority is sufficient, and
that it expose an administrative interface to allow priority adjustment.  Such
a solution is not scalable in a DCB environment.

(2) is also difficult, as it requires constant administrative oversight of
applications so as to build appropriate rules to match traffic belonging to
various classes, so that priority can be appropriately set. It is further
limiting when DCB enabled hardware is in use, due to the fact that tc rules are
only run after a root qdisc has been selected (DCB enabled hardware may reserve
hw queues for various traffic classes and needs the priority to be set prior to
selecting the root qdisc).

I've discussed various solutions with John Fastabend, and we saw a cgroup as
being a good general solution to this problem.  The network priority cgroup
allows for a per-interface priority map to be built per cgroup.  Any traffic
originating from an application in a cgroup that does not explicitly set its
priority with SO_PRIORITY will have its priority assigned to the value
designated for that group on that interface.  This allows a user space daemon,
when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
based on the APP_TLV value received and administratively assign applications to
that priority using the existing cgroup utility infrastructure.

Tested by John and myself, with good results.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>

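The selection rule described above (an explicitly set priority wins, otherwise
the per-interface value for the sender's cgroup applies) can be sketched as a
small pure function. This is a userspace illustration only: struct prio_map and
effective_priority() are hypothetical names, and the bounds check against the
table length is an addition that skb_update_prio() in patch 1/2 does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-interface table: prio[i] is the priority for the cgroup with index i. */
struct prio_map {
	uint32_t len;
	const uint32_t *prio;
};

/*
 * Return the priority a frame would leave with.
 *  cur_prio  priority already on the frame (0 means "not set")
 *  has_sock  non-zero if the frame belongs to a local socket
 *  map       the egress interface's table, or NULL if it has none
 *  prioidx   cgroup index recorded on the socket
 */
uint32_t effective_priority(uint32_t cur_prio, int has_sock,
			    const struct prio_map *map, uint32_t prioidx)
{
	/* An explicit priority, a missing socket, or no table: leave as is. */
	if (cur_prio != 0 || !has_sock || !map)
		return cur_prio;

	/* Assumption: an index beyond the table is treated as "no entry". */
	if (prioidx >= map->len)
		return cur_prio;

	return map->prio[prioidx];
}
```

One consequence of using 0 as the "not set" marker, here and in the patch, is
that a socket which explicitly sets SO_PRIORITY to 0 cannot be told apart from
one that never set it, so the cgroup value applies to both.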

* Re: [PATCH] route: add more relaxed option for secure_redirects
From: Flavio Leitner @ 2011-11-16 20:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111111.203321.93615255271311412.davem@davemloft.net>

On Fri, 11 Nov 2011 20:33:21 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Flavio Leitner <fbl@redhat.com>
> Date: Mon,  7 Nov 2011 22:03:50 -0200
> 
> > When the host uses a gateway IP address that is actually an alias
> > address, the ICMP redirect message source address can be the
> > gateway's main IP address, so the message is ignored by the host
> > regardless of the secure_redirects setup.
> > 
> > The new value (2) allows that ICMP message to be processed.
> > The possible values are:
> > 
> >  0 - Accept ICMP redirect messages only if its source address is the
> >      previous gateway address.
> >  1 - The same as above. However, if shared_media is FALSE, it has to
> >      be for gateways listed in default gateway list as well.
> >  2 - Accept ICMP redirects messages ignoring the conditions above.
> >  default value is 1.
> > 
> > Signed-off-by: Flavio Leitner <fbl@redhat.com>
> 
> The more I look at this the less I like it.
> 
> Look, if IPVS or whatever is translating addresses and this is what
> causes the problem then this entity can very well translate the damn
> addresses right back in the redirect so it looks legitimate to the
> sender.
> 
I thought about that, see below.

> You can't translate people's addresses, and then let them see that
> internal remapping in ICMP errors.  The redirect is dropped by the
> sender because it not only looks like crap, it is crap.
> 
> This is fundamentally not the correct way to handle this.
> 

I agree. The problem is that the communication is between the host
and another external host, so it's not with the gateway. Therefore,
the original packet has the saddr of the origin host and the daddr
is an external host.  When the gateway receives that packet, there
is no way to tell which IP address should be used to reply with.

Today Linux picks the primary address of an existing interface.
There is the sysctl icmp_errors_use_inbound_ifaddr to use the
primary address of the interface that _received_ the packet that
caused the icmp error. Yet, it doesn't help if the host used
the alias address (i.e. secondary address) as the gw address.

It may be possible to promote the secondary address to be the primary
one in the gateway, but this can't be done if you use more than
just one address in the same subnet as gw for some reason.

Thus, the only option at the sender side would be using iptables
to change the ICMP redirect source address to be the float address,
but that does not work either. (It isn't passing through -t nat)

fbl


* Re: [PATCH net-next v5 07/10] forcedeth: implement ndo_get_stats64() API
From: David Decotigny @ 2011-11-16 20:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ben Hutchings, Stephen Hemminger, netdev, linux-kernel,
	David S. Miller, Ian Campbell, Jeff Kirsher, Jiri Pirko,
	Joe Perches, Szymon Janc, Richard Jones, Ayaz Abdulla
In-Reply-To: <1321475553.3274.3.camel@edumazet-laptop>

Thanks to you all!

I have an implementation that's similar to sky2's. I will run the
usual set of tests and send the updated patch series here.

Regards,

--
David Decotigny



On Wed, Nov 16, 2011 at 12:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 16 novembre 2011 à 19:32 +0000, Ben Hutchings a écrit :
>> On Wed, 2011-11-16 at 11:25 -0800, David Decotigny wrote:
>> > Thanks for your feedback on these concerns.
>> >
>> > On Wed, Nov 16, 2011 at 9:14 AM, Ben Hutchings
>> > <bhutchings@solarflare.com> wrote:
>> > > On Wed, 2011-11-16 at 08:55 -0800, Stephen Hemminger wrote:
>> > >> Or realize that the Tx side is single threaded by dev->xmit_lock already
>> > >> and the Rx side is single threaded by NAPI.
>> > >
>> > > Right.
>> >
>> > Yes, that's what I meant by "Each software stat field is updated by
>> > one single writer." in a previous email. However, I think that TX and
>> > RX paths are not always synchronized. So I'm afraid that if I'm using
>> > a single seqcount, I might run into trouble in the absence of a lock
>> > around each update.... and I'd really prefer to avoid such a lock. Are
>> > you suggesting I should use 2 independent seqcounts? One for RX path,
>> > the other for TX path, all this without a lock around writers?
>>
>> Yes, there should be 1 for each path.
>
> David, drivers/net/ethernet/marvell/sky2.c contains code like that
> (different syncp for rx/tx)
>
> TX path:
>                        u64_stats_update_begin(&sky2->tx_stats.syncp);
>                        ++sky2->tx_stats.packets;
>                        sky2->tx_stats.bytes += skb->len;
>                        u64_stats_update_end(&sky2->tx_stats.syncp);
>
>
> RX path:
>
>        u64_stats_update_begin(&sky2->rx_stats.syncp);
>        sky2->rx_stats.packets += packets;
>        sky2->rx_stats.bytes += bytes;
>        u64_stats_update_end(&sky2->rx_stats.syncp);
>
>
>


* Re: [PATCH net-next v5 07/10] forcedeth: implement ndo_get_stats64() API
From: Eric Dumazet @ 2011-11-16 20:32 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Decotigny, Stephen Hemminger, netdev, linux-kernel,
	David S. Miller, Ian Campbell, Jeff Kirsher, Jiri Pirko,
	Joe Perches, Szymon Janc, Richard Jones, Ayaz Abdulla
In-Reply-To: <1321471967.2709.14.camel@bwh-desktop>

Le mercredi 16 novembre 2011 à 19:32 +0000, Ben Hutchings a écrit :
> On Wed, 2011-11-16 at 11:25 -0800, David Decotigny wrote:
> > Thanks for your feedback on these concerns.
> > 
> > On Wed, Nov 16, 2011 at 9:14 AM, Ben Hutchings
> > <bhutchings@solarflare.com> wrote:
> > > On Wed, 2011-11-16 at 08:55 -0800, Stephen Hemminger wrote:
> > >> Or realize that the Tx side is single threaded by dev->xmit_lock already
> > >> and the Rx side is single threaded by NAPI.
> > >
> > > Right.
> > 
> > Yes, that's what I meant by "Each software stat field is updated by
> > one single writer." in a previous email. However, I think that TX and
> > RX paths are not always synchronized. So I'm afraid that if I'm using
> > a single seqcount, I might run into trouble in the absence of a lock
> > around each update.... and I'd really prefer to avoid such a lock. Are
> > you suggesting I should use 2 independent seqcounts? One for RX path,
> > the other for TX path, all this without a lock around writers?
> 
> Yes, there should be 1 for each path.

David, drivers/net/ethernet/marvell/sky2.c contains code like that
(different syncp for rx/tx)

TX path:
                        u64_stats_update_begin(&sky2->tx_stats.syncp);
                        ++sky2->tx_stats.packets;
                        sky2->tx_stats.bytes += skb->len;
                        u64_stats_update_end(&sky2->tx_stats.syncp);


RX path:

        u64_stats_update_begin(&sky2->rx_stats.syncp);
        sky2->rx_stats.packets += packets;
        sky2->rx_stats.bytes += bytes;
        u64_stats_update_end(&sky2->rx_stats.syncp);

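To make the "one sequence counter per path" idea concrete outside the kernel,
here is a userspace sketch using C11 atomics. It is not the u64_stats_sync API
and not sky2 code: struct dir_stats, dir_stats_add() and dir_stats_read() are
hypothetical names, and the sketch assumes exactly one writer per dir_stats
instance (one instance for TX, another for RX), which is what lets the writers
run without a lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One block per direction, each guarded by its own sequence counter. */
struct dir_stats {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Writer side. Must only be called by the single writer of 's'. */
void dir_stats_add(struct dir_stats *s, uint64_t packets, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	/* Make the counter odd, then order it before the data stores. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + packets,
		memory_order_relaxed);
	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);

	/* Publish: the counter becomes even again after the data stores. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side. Retries until it has seen a consistent snapshot. */
void dir_stats_read(struct dir_stats *s, uint64_t *packets, uint64_t *bytes)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```

If TX and RX shared one counter, two unsynchronized writers could both
increment it and leave it even in the middle of an update, which is why each
path gets its own.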

* Re: [3.1] Divide by zero in __tcp_select_window()
From: Eric Dumazet @ 2011-11-16 20:26 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, tglx, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo
In-Reply-To: <20111116195419.GE24411@hostway.ca>

Le mercredi 16 novembre 2011 à 11:54 -0800, Simon Kirby a écrit :

> Looks good, thanks! Working on ~25 boxes without issue for >36 hours.
> 
> Simon-
> 
> Tested-by: Simon Kirby <sim@hostway.ca>

Thanks Simon for this feedback.


* Re: seeing crash on running ltp proc01
From: Andrew Morton @ 2011-11-16 20:16 UTC (permalink / raw)
  To: madhuri appana; +Cc: linux-kernel, netdev, Mihai Maruseac, Eric Dumazet
In-Reply-To: <1321431477.4212.7.camel@madhuri-laptop>

(cc's added)

On Wed, 16 Nov 2011 13:47:57 +0530
madhuri appana <madhuria@linux.vnet.ibm.com> wrote:

> Hi All,
> 
> I am seeing a crash when running the LTP proc01 test on a KVM guest, using
> the latest qemu 1.0-rc0 and the latest Linus kernel.
> 
> Here is the crash report:
> 
> [  256.805661] The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case.  If you have one, please send an email to linux-mm@kvack.org.
> [  256.820968] process `proc01' is using deprecated sysctl (syscall) net.ipv6.neigh.default.retrans_time; Use net.ipv6.neigh.default.retrans_time_ms instead.
> [  837.127321] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [  837.127980] IP: [<ffffffff8157690c>] dev_from_new_bucket+0xb/0x79
> [  837.128053] PGD 3a89d067 PUD 3ad09067 PMD 0
> [  837.128053] Oops: 0000 [#1] SMP
> [  837.128053] CPU 2
> [  837.128053] Modules linked in:
> [  837.128053]
> [  837.128053] Pid: 9933, comm: proc01 Not tainted 3.1.0+ #3 Bochs Bochs
> [  837.128053] RIP: 0010:[<ffffffff8157690c>]  [<ffffffff8157690c>] dev_from_new_bucket+0xb/0x79
> [  837.128053] RSP: 0018:ffff880039885e18  EFLAGS: 00010246
> [  837.128053] RAX: ffffffff818df330 RBX: 0000000000000400 RCX: 0000000000000000
> [  837.128053] RDX: ffff880039885e70 RSI: 0000000000000000 RDI: ffff88003c427600
> [  837.128053] RBP: ffff880039885e18 R08: 0000000000000010 R09: ffff88003c427600
> [  837.128053] R10: ffffffff81c1f610 R11: ffffffff81576fa2 R12: ffff88003a4e6140
> [  837.128053] R13: ffff88003c427600 R14: 0000000000000001 R15: 0000000000000000
> [  837.128053] FS:  00007f0c56deb7c0(0000) GS:ffff88003f100000(0000) knlGS:0000000000000000
> [  837.128053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  837.128053] CR2: 0000000000000010 CR3: 00000000398a4000 CR4: 00000000000006e0
> [  837.128053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  837.128053] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  837.128053] Process proc01 (pid: 9933, threadinfo ffff880039884000, task ffff88003a91e6d0)
> [  837.128053] Stack:
> [  837.128053]  ffff880039885e28 ffffffff815769db ffff880039885ea8 ffffffff8111f614
> [  837.128053]  0000000000000000 ffff880039885e70 ffff88003c427638 0000000000000246
> [  837.128053]  ffff880039885f58 00000000006243a0 ffffffff8111f463 0000000000000001
> [  837.128053] Call Trace:
> [  837.128053]  [<ffffffff815769db>] dev_seq_next+0x61/0x63
> [  837.128053]  [<ffffffff8111f614>] seq_read+0x1b1/0x369
> [  837.128053]  [<ffffffff8111f463>] ? seq_lseek+0xd4/0xd4
> [  837.128053]  [<ffffffff8111f463>] ? seq_lseek+0xd4/0xd4
> [  837.128053]  [<ffffffff81150a5c>] proc_reg_read+0x90/0xaf
> [  837.128053]  [<ffffffff8110519c>] vfs_read+0xa2/0xde
> [  837.128053]  [<ffffffff81099ec1>] ? trace_hardirqs_on_caller+0x11e/0x155
> [  837.128053]  [<ffffffff81105291>] sys_read+0x45/0x6c
> [  837.128053]  [<ffffffff81685bbb>] system_call_fastpath+0x16/0x1b
> [  837.128053] Code: 42 08 c9 c3 55 83 ff 27 b8 ea ff ff ff 48 89 e5 77
> 0c 89 ff 31 c0 48 89 34 fd 00 8e 63 82 c9 c3 4c 8b 87 d0 00 00 00 55 48
> 89 e5 
> [  837.128053]  8b 10 c1 ea 18 4c 8b 8f d0 00 00 00 31 c9 41 8b 01 89 c6
> 41 
> [  837.128053] RIP  [<ffffffff8157690c>] dev_from_new_bucket+0xb/0x79
> [  837.128053]  RSP <ffff880039885e18>
> [  837.128053] CR2: 0000000000000010
> [  837.148037] ---[ end trace 9639850acd813c5b ]---
> [root@qemu-img-64 bin]
> 
> Thanks,
> Madhuri

^ permalink raw reply

* Re: [3.1] Divide by zero in __tcp_select_window()
From: Simon Kirby @ 2011-11-16 19:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, tglx, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo
In-Reply-To: <1321304216.2719.23.camel@edumazet-laptop>

On Mon, Nov 14, 2011 at 09:56:56PM +0100, Eric Dumazet wrote:

> On Monday 14 November 2011 at 15:36 -0500, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 08 Nov 2011 22:23:25 +0100
> > 
> > > OK, it seems we leave a timer running while we free the socket (same
> > > error path as in your previous bug report, because of the NULL route)
> > > 
> > > We arm this keepalive timer in tcp_create_openreq_child()
> > > 
> > > net/ipv4/tcp_minisocks.c:513
> > > 	if (sock_flag(newsk, SOCK_KEEPOPEN))
> > > 		inet_csk_reset_keepalive_timer(newsk,
> > > 			keepalive_time_when(newtp));
> > > 
> > > I would try to add a call to tcp_clear_xmit_timers() as well
> > > 
> > > Please try the following patch:
> > 
> > We've been waiting quite some time to get some testing validation on
> > this patch, but I think it's correct.
> > 
> > Eric can you formally submit this?  Thanks!
> 
> Sure, here it is.
> 
> Please Simon feel free to add your "Tested-by" signature 
> 
> Thanks

Looks good, thanks! Working on ~25 boxes without issue for >36 hours.

Simon-

Tested-by: Simon Kirby <sim@hostway.ca>

> [PATCH] tcp: clear xmit timers in tcp_v4_syn_recv_sock()
> 
> Simon Kirby reported divide-by-zero errors in __tcp_select_window().
> 
> This happens when inet_csk_route_child_sock() returns a NULL pointer:
> 
> We free the new socket although we may already have armed the keepalive
> timer in tcp_create_openreq_child().
> 
> Fix this with a call to tcp_clear_xmit_timers().
> 
> [ This is a followup to commit 918eb39962dff (net: add missing
> bh_unlock_sock() calls) ]
> 
> Reported-by: Simon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/tcp_ipv4.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a744315..a9db4b1 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	tcp_clear_xmit_timers(newsk);
>  	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
> 
> 
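
The ordering the patch establishes on the put_and_exit path (clear any
armed timers first, then drop the socket) can be illustrated with a toy
model. This is only a sketch under stated assumptions: struct sim_sock
and the sim_* helpers are hypothetical and do not model the kernel's
timer or refcount machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a child socket. */
struct sim_sock {
	bool keepalive_armed;	/* a pending timer still points at the socket */
	bool abandoned;		/* the error path gave up on this socket */
};

/* Models tcp_create_openreq_child() arming the keepalive timer. */
static void sim_create_child(struct sim_sock *newsk)
{
	newsk->keepalive_armed = true;
	newsk->abandoned = false;
}

/* Models the effect of tcp_clear_xmit_timers(): nothing stays pending. */
static void sim_clear_xmit_timers(struct sim_sock *newsk)
{
	newsk->keepalive_armed = false;
}

/*
 * Models the put_and_exit error path. clear_timers selects the patched
 * (true) or unpatched (false) behaviour.
 */
static void sim_put_and_exit(struct sim_sock *newsk, bool clear_timers)
{
	if (clear_timers)
		sim_clear_xmit_timers(newsk);
	newsk->abandoned = true;
}

/* True if a timer could later run against a socket that was given up. */
static bool sim_timer_hits_abandoned_sock(const struct sim_sock *sk)
{
	return sk->abandoned && sk->keepalive_armed;
}
```

In this model only the unpatched path leaves a timer pending on an
abandoned socket.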

^ permalink raw reply

* [PATCH net-next 5/5] ucc_geth: Convert ENET_NUM_OCTETS_PER_ADDRESS uses to ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: Li Yang; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1321472142.git.joe@perches.com>

Reduce the number of #defines, use the normal #define from if_ether.h

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/freescale/ucc_geth.c |    2 +-
 drivers/net/ethernet/freescale/ucc_geth.h |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index b5dc027..ba2dc08 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -443,7 +443,7 @@ static void hw_add_addr_in_hash(struct ucc_geth_private *ugeth,
 
 static inline int compare_addr(u8 **addr1, u8 **addr2)
 {
-	return memcmp(addr1, addr2, ENET_NUM_OCTETS_PER_ADDRESS);
+	return memcmp(addr1, addr2, ETH_ALEN);
 }
 
 #ifdef DEBUG
diff --git a/drivers/net/ethernet/freescale/ucc_geth.h b/drivers/net/ethernet/freescale/ucc_geth.h
index d12fcad..2e395a2 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.h
+++ b/drivers/net/ethernet/freescale/ucc_geth.h
@@ -20,6 +20,7 @@
 
 #include <linux/kernel.h>
 #include <linux/list.h>
+#include <linux/if_ether.h>
 
 #include <asm/immap_qe.h>
 #include <asm/qe.h>
@@ -881,7 +882,6 @@ struct ucc_geth_hardware_statistics {
 #define TX_RING_MOD_MASK(size)                  (size-1)
 #define RX_RING_MOD_MASK(size)                  (size-1)
 
-#define ENET_NUM_OCTETS_PER_ADDRESS             6
 #define ENET_GROUP_ADDR                         0x01	/* Group address mask
 							   for ethernet
 							   addresses */
@@ -1051,7 +1051,7 @@ enum ucc_geth_num_of_station_addresses {
 
 /* UCC GETH 82xx Ethernet Address Container */
 struct enet_addr_container {
-	u8 address[ENET_NUM_OCTETS_PER_ADDRESS];	/* ethernet address */
+	u8 address[ETH_ALEN];	/* ethernet address */
 	enum ucc_geth_enet_address_recognition_location location;	/* location in
 								   82xx address
 								   recognition
@@ -1194,7 +1194,7 @@ struct ucc_geth_private {
 	u16 cpucount[NUM_TX_QUEUES];
 	u16 __iomem *p_cpucount[NUM_TX_QUEUES];
 	int indAddrRegUsed[NUM_OF_PADDRS];
-	u8 paddr[NUM_OF_PADDRS][ENET_NUM_OCTETS_PER_ADDRESS];	/* ethernet address */
+	u8 paddr[NUM_OF_PADDRS][ETH_ALEN];	/* ethernet address */
 	u8 numGroupAddrInHash;
 	u8 numIndAddrInHash;
 	u8 numIndAddrInReg;
-- 
1.7.6.405.gc1be0
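
For reference, ETH_ALEN is the six-octet Ethernet address length defined
in if_ether.h, so driver-private copies of the same constant add nothing.
A small user-space sketch of using it directly, assuming a Linux build
where <linux/if_ether.h> is available (mac_equal and mac_copy are
hypothetical helper names, not kernel functions):

```c
#include <assert.h>
#include <string.h>
#include <linux/if_ether.h>	/* ETH_ALEN */

/* The shared constant replaces per-driver defines such as MAC_ADDR_LEN. */
static_assert(ETH_ALEN == 6, "Ethernet addresses are six octets");

/* Hypothetical helper: returns nonzero when both addresses are equal. */
static int mac_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/* Hypothetical helper: copies one Ethernet address. */
static void mac_copy(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, ETH_ALEN);
}
```

Both helpers take plain byte pointers, so the length passed to memcmp()
and memcpy() always matches the size of the buffers being compared.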

^ permalink raw reply related

* [PATCH net-next 4/5] amd8111e:  Convert ETH_ADDR_LEN uses to ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev
In-Reply-To: <cover.1321472142.git.joe@perches.com>

Reduce the number of #defines, use the normal #define from if_ether.h

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/amd/amd8111e.c |    6 +++---
 drivers/net/ethernet/amd/amd8111e.h |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/amd/amd8111e.c b/drivers/net/ethernet/amd/amd8111e.c
index a388118..33e0a8c 100644
--- a/drivers/net/ethernet/amd/amd8111e.c
+++ b/drivers/net/ethernet/amd/amd8111e.c
@@ -499,7 +499,7 @@ static int amd8111e_restart(struct net_device *dev)
 	writel( VAL0 | APAD_XMT | REX_RTRY, mmio + CMD2 );
 
 	/* Setting the MAC address to the device */
-	for(i = 0; i < ETH_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		writeb( dev->dev_addr[i], mmio + PADR + i );
 
 	/* Enable interrupt coalesce */
@@ -1550,7 +1550,7 @@ static int amd8111e_set_mac_address(struct net_device *dev, void *p)
 	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
 	spin_lock_irq(&lp->lock);
 	/* Setting the MAC address to the device */
-	for(i = 0; i < ETH_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		writeb( dev->dev_addr[i], lp->mmio + PADR + i );
 
 	spin_unlock_irq(&lp->lock);
@@ -1886,7 +1886,7 @@ static int __devinit amd8111e_probe_one(struct pci_dev *pdev,
 	}
 
 	/* Initializing MAC address */
-	for(i = 0; i < ETH_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = readb(lp->mmio + PADR + i);
 
 	/* Setting user defined parametrs */
diff --git a/drivers/net/ethernet/amd/amd8111e.h b/drivers/net/ethernet/amd/amd8111e.h
index 2ff2e7a..5bbb53a 100644
--- a/drivers/net/ethernet/amd/amd8111e.h
+++ b/drivers/net/ethernet/amd/amd8111e.h
@@ -586,7 +586,6 @@ typedef enum {
 
 #define PKT_BUFF_SZ			1536
 #define MIN_PKT_LEN			60
-#define ETH_ADDR_LEN			6
 
 #define  AMD8111E_TX_TIMEOUT		(3 * HZ)/* 3 sec */
 #define SOFT_TIMER_FREQ 		0xBEBC  /* 0.5 sec */
-- 
1.7.6.405.gc1be0

^ permalink raw reply related

* [PATCH net-next 3/5] bna: Convert MAC_ADDRLEN uses to ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: Rasesh Mody; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1321472142.git.joe@perches.com>

Reduce the number of #defines, use the normal #define from if_ether.h

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/brocade/bna/cna.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/cna.h b/drivers/net/ethernet/brocade/bna/cna.h
index 1b3e90d..32e8f17 100644
--- a/drivers/net/ethernet/brocade/bna/cna.h
+++ b/drivers/net/ethernet/brocade/bna/cna.h
@@ -43,8 +43,7 @@ extern char bfa_version[];
 
 #pragma pack(1)
 
-#define MAC_ADDRLEN	(6)
-typedef struct mac { u8 mac[MAC_ADDRLEN]; } mac_t;
+typedef struct mac { u8 mac[ETH_ALEN]; } mac_t;
 
 #pragma pack()
 
-- 
1.7.6.405.gc1be0

^ permalink raw reply related

* [PATCH net-next 2/5] ethernet: Convert ETHER_ADDR_LEN uses to ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: Yoshinori Sato; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1321472142.git.joe@perches.com>

Reduce the number of #defines, use the normal #define from if_ether.h

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/8390/8390.h           |    2 --
 drivers/net/ethernet/8390/apne.c           |    2 +-
 drivers/net/ethernet/8390/ax88796.c        |    5 ++---
 drivers/net/ethernet/8390/es3210.c         |    2 +-
 drivers/net/ethernet/8390/hp-plus.c        |    2 +-
 drivers/net/ethernet/8390/hp.c             |    2 +-
 drivers/net/ethernet/8390/hydra.c          |    2 +-
 drivers/net/ethernet/8390/lne390.c         |    4 ++--
 drivers/net/ethernet/8390/ne-h8300.c       |    2 +-
 drivers/net/ethernet/8390/ne.c             |    4 ++--
 drivers/net/ethernet/8390/ne2.c            |    2 +-
 drivers/net/ethernet/8390/ne3210.c         |    2 +-
 drivers/net/ethernet/8390/stnic.c          |    2 +-
 drivers/net/ethernet/8390/zorro8390.c      |    2 +-
 drivers/net/ethernet/amd/nmclan_cs.c       |   12 +++++-------
 drivers/net/ethernet/broadcom/sb1250-mac.c |    3 +--
 16 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/8390/8390.h b/drivers/net/ethernet/8390/8390.h
index 58a12e4..ef325ff 100644
--- a/drivers/net/ethernet/8390/8390.h
+++ b/drivers/net/ethernet/8390/8390.h
@@ -14,8 +14,6 @@
 
 #define TX_PAGES 12	/* Two Tx slots */
 
-#define ETHER_ADDR_LEN 6
-
 /* The 8390 specific per-packet-header format. */
 struct e8390_pkt_hdr {
   unsigned char status; /* status */
diff --git a/drivers/net/ethernet/8390/apne.c b/drivers/net/ethernet/8390/apne.c
index 5477373..3ad5d2f 100644
--- a/drivers/net/ethernet/8390/apne.c
+++ b/drivers/net/ethernet/8390/apne.c
@@ -318,7 +318,7 @@ static int __init apne_probe1(struct net_device *dev, int ioaddr)
     i = request_irq(dev->irq, apne_interrupt, IRQF_SHARED, DRV_NAME, dev);
     if (i) return i;
 
-    for(i = 0; i < ETHER_ADDR_LEN; i++)
+    for (i = 0; i < ETH_ALEN; i++)
 	dev->dev_addr[i] = SA_prom[i];
 
     printk(" %pM\n", dev->dev_addr);
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index e9f8432..2a3b8c2 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -735,15 +735,14 @@ static int ax_init_dev(struct net_device *dev)
 	if (ax->plat->flags & AXFLG_MAC_FROMDEV) {
 		ei_outb(E8390_NODMA + E8390_PAGE1 + E8390_STOP,
 			ei_local->mem + E8390_CMD); /* 0x61 */
-		for (i = 0; i < ETHER_ADDR_LEN; i++)
+		for (i = 0; i < ETH_ALEN; i++)
 			dev->dev_addr[i] =
 				ei_inb(ioaddr + EN1_PHYS_SHIFT(i));
 	}
 
 	if ((ax->plat->flags & AXFLG_MAC_FROMPLATFORM) &&
 	    ax->plat->mac_addr)
-		memcpy(dev->dev_addr, ax->plat->mac_addr,
-		       ETHER_ADDR_LEN);
+		memcpy(dev->dev_addr, ax->plat->mac_addr, ETH_ALEN);
 
 	ax_reset_8390(dev);
 
diff --git a/drivers/net/ethernet/8390/es3210.c b/drivers/net/ethernet/8390/es3210.c
index 7a09575..6428f9e 100644
--- a/drivers/net/ethernet/8390/es3210.c
+++ b/drivers/net/ethernet/8390/es3210.c
@@ -195,7 +195,7 @@ static int __init es_probe1(struct net_device *dev, int ioaddr)
 		goto out;
 	}
 
-	for (i = 0; i < ETHER_ADDR_LEN ; i++)
+	for (i = 0; i < ETH_ALEN ; i++)
 		dev->dev_addr[i] = inb(ioaddr + ES_SA_PROM + i);
 
 /*	Check the Racal vendor ID as well. */
diff --git a/drivers/net/ethernet/8390/hp-plus.c b/drivers/net/ethernet/8390/hp-plus.c
index eeac843..d42938b 100644
--- a/drivers/net/ethernet/8390/hp-plus.c
+++ b/drivers/net/ethernet/8390/hp-plus.c
@@ -202,7 +202,7 @@ static int __init hpp_probe1(struct net_device *dev, int ioaddr)
 	/* Retrieve and checksum the station address. */
 	outw(MAC_Page, ioaddr + HP_PAGING);
 
-	for(i = 0; i < ETHER_ADDR_LEN; i++) {
+	for(i = 0; i < ETH_ALEN; i++) {
 		unsigned char inval = inb(ioaddr + 8 + i);
 		dev->dev_addr[i] = inval;
 		checksum += inval;
diff --git a/drivers/net/ethernet/8390/hp.c b/drivers/net/ethernet/8390/hp.c
index 18564d4..113f1e0 100644
--- a/drivers/net/ethernet/8390/hp.c
+++ b/drivers/net/ethernet/8390/hp.c
@@ -156,7 +156,7 @@ static int __init hp_probe1(struct net_device *dev, int ioaddr)
 
 	printk("%s: %s (ID %02x) at %#3x,", dev->name, name, board_id, ioaddr);
 
-	for(i = 0; i < ETHER_ADDR_LEN; i++)
+	for(i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = inb(ioaddr + i);
 
 	printk(" %pM", dev->dev_addr);
diff --git a/drivers/net/ethernet/8390/hydra.c b/drivers/net/ethernet/8390/hydra.c
index 3dac937..5370c88 100644
--- a/drivers/net/ethernet/8390/hydra.c
+++ b/drivers/net/ethernet/8390/hydra.c
@@ -129,7 +129,7 @@ static int __devinit hydra_init(struct zorro_dev *z)
     if (!dev)
 	return -ENOMEM;
 
-    for(j = 0; j < ETHER_ADDR_LEN; j++)
+    for (j = 0; j < ETH_ALEN; j++)
 	dev->dev_addr[j] = *((u8 *)(board + HYDRA_ADDRPROM + 2*j));
 
     /* We must set the 8390 for word mode. */
diff --git a/drivers/net/ethernet/8390/lne390.c b/drivers/net/ethernet/8390/lne390.c
index f9888d2..69490ae 100644
--- a/drivers/net/ethernet/8390/lne390.c
+++ b/drivers/net/ethernet/8390/lne390.c
@@ -191,14 +191,14 @@ static int __init lne390_probe1(struct net_device *dev, int ioaddr)
 		|| inb(ioaddr + LNE390_SA_PROM + 1) != LNE390_ADDR1
 		|| inb(ioaddr + LNE390_SA_PROM + 2) != LNE390_ADDR2 ) {
 		printk("lne390.c: card not found");
-		for(i = 0; i < ETHER_ADDR_LEN; i++)
+		for (i = 0; i < ETH_ALEN; i++)
 			printk(" %02x", inb(ioaddr + LNE390_SA_PROM + i));
 		printk(" (invalid prefix).\n");
 		return -ENODEV;
 	}
 #endif
 
-	for(i = 0; i < ETHER_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = inb(ioaddr + LNE390_SA_PROM + i);
 	printk("lne390.c: LNE390%X in EISA slot %d, address %pM.\n",
 	       0xa+revision, ioaddr/0x1000, dev->dev_addr);
diff --git a/drivers/net/ethernet/8390/ne-h8300.c b/drivers/net/ethernet/8390/ne-h8300.c
index cd36a6a..9b9c77d 100644
--- a/drivers/net/ethernet/8390/ne-h8300.c
+++ b/drivers/net/ethernet/8390/ne-h8300.c
@@ -312,7 +312,7 @@ static int __init ne_probe1(struct net_device *dev, int ioaddr)
 
 	dev->base_addr = ioaddr;
 
-	for(i = 0; i < ETHER_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = SA_prom[i];
 	printk(" %pM\n", dev->dev_addr);
 
diff --git a/drivers/net/ethernet/8390/ne.c b/drivers/net/ethernet/8390/ne.c
index 1063093..f92ea2a 100644
--- a/drivers/net/ethernet/8390/ne.c
+++ b/drivers/net/ethernet/8390/ne.c
@@ -503,12 +503,12 @@ static int __init ne_probe1(struct net_device *dev, unsigned long ioaddr)
 #ifdef CONFIG_PLAT_MAPPI
 	outb_p(E8390_NODMA + E8390_PAGE1 + E8390_STOP,
 		ioaddr + E8390_CMD); /* 0x61 */
-	for (i = 0 ; i < ETHER_ADDR_LEN ; i++) {
+	for (i = 0; i < ETH_ALEN; i++) {
 		dev->dev_addr[i] = SA_prom[i]
 			= inb_p(ioaddr + EN1_PHYS_SHIFT(i));
 	}
 #else
-	for(i = 0; i < ETHER_ADDR_LEN; i++) {
+	for (i = 0; i < ETH_ALEN; i++) {
 		dev->dev_addr[i] = SA_prom[i];
 	}
 #endif
diff --git a/drivers/net/ethernet/8390/ne2.c b/drivers/net/ethernet/8390/ne2.c
index 70cdc69..922b320 100644
--- a/drivers/net/ethernet/8390/ne2.c
+++ b/drivers/net/ethernet/8390/ne2.c
@@ -460,7 +460,7 @@ static int __init ne2_probe1(struct net_device *dev, int slot)
 
 	dev->base_addr = base_addr;
 
-	for(i = 0; i < ETHER_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = SA_prom[i];
 
 	printk(" %pM\n", dev->dev_addr);
diff --git a/drivers/net/ethernet/8390/ne3210.c b/drivers/net/ethernet/8390/ne3210.c
index 243ed2a..2a3e805 100644
--- a/drivers/net/ethernet/8390/ne3210.c
+++ b/drivers/net/ethernet/8390/ne3210.c
@@ -125,7 +125,7 @@ static int __init ne3210_eisa_probe (struct device *device)
 #endif
 
 	port_index = inb(ioaddr + NE3210_CFG2) >> 6;
-	for(i = 0; i < ETHER_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = inb(ioaddr + NE3210_SA_PROM + i);
 	printk("ne3210.c: NE3210 in EISA slot %d, media: %s, addr: %pM.\n",
 		edev->slot, ifmap[port_index], dev->dev_addr);
diff --git a/drivers/net/ethernet/8390/stnic.c b/drivers/net/ethernet/8390/stnic.c
index d85f0a8..3b90375 100644
--- a/drivers/net/ethernet/8390/stnic.c
+++ b/drivers/net/ethernet/8390/stnic.c
@@ -114,7 +114,7 @@ static int __init stnic_probe(void)
 #ifdef CONFIG_SH_STANDARD_BIOS
   sh_bios_get_node_addr (stnic_eadr);
 #endif
-  for (i = 0; i < ETHER_ADDR_LEN; i++)
+  for (i = 0; i < ETH_ALEN; i++)
     dev->dev_addr[i] = stnic_eadr[i];
 
   /* Set the base address to point to the NIC, not the "real" base! */
diff --git a/drivers/net/ethernet/8390/zorro8390.c b/drivers/net/ethernet/8390/zorro8390.c
index 3aa9fe9..bcd2732 100644
--- a/drivers/net/ethernet/8390/zorro8390.c
+++ b/drivers/net/ethernet/8390/zorro8390.c
@@ -365,7 +365,7 @@ static int __devinit zorro8390_init(struct net_device *dev,
 	if (i)
 		return i;
 
-	for (i = 0; i < ETHER_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = SA_prom[i];
 
 	pr_debug("Found ethernet address: %pM\n", dev->dev_addr);
diff --git a/drivers/net/ethernet/amd/nmclan_cs.c b/drivers/net/ethernet/amd/nmclan_cs.c
index 3accd5d..25f4862 100644
--- a/drivers/net/ethernet/amd/nmclan_cs.c
+++ b/drivers/net/ethernet/amd/nmclan_cs.c
@@ -160,8 +160,6 @@ Include Files
 Defines
 ---------------------------------------------------------------------------- */
 
-#define ETHER_ADDR_LEN			ETH_ALEN
-					/* 6 bytes in an Ethernet Address */
 #define MACE_LADRF_LEN			8
 					/* 8 bytes in Logical Address Filter */
 
@@ -600,7 +598,7 @@ static int mace_init(mace_private *lp, unsigned int ioaddr, char *enet_addr)
   	}
   }
   /* Set PADR register */
-  for (i = 0; i < ETHER_ADDR_LEN; i++)
+  for (i = 0; i < ETH_ALEN; i++)
     mace_write(lp, ioaddr, MACE_PADR, enet_addr[i]);
 
   /* MAC Configuration Control Register should be written last */
@@ -639,11 +637,11 @@ static int nmclan_config(struct pcmcia_device *link)
 
   /* Read the ethernet address from the CIS. */
   len = pcmcia_get_tuple(link, 0x80, &buf);
-  if (!buf || len < ETHER_ADDR_LEN) {
+  if (!buf || len < ETH_ALEN) {
 	  kfree(buf);
 	  goto failed;
   }
-  memcpy(dev->dev_addr, buf, ETHER_ADDR_LEN);
+  memcpy(dev->dev_addr, buf, ETH_ALEN);
   kfree(buf);
 
   /* Verify configuration by reading the MACE ID. */
@@ -1420,7 +1418,7 @@ Output
 static void set_multicast_list(struct net_device *dev)
 {
   mace_private *lp = netdev_priv(dev);
-  int adr[ETHER_ADDR_LEN] = {0}; /* Ethernet address */
+  int adr[ETH_ALEN] = {0}; /* Ethernet address */
   struct netdev_hw_addr *ha;
 
 #ifdef PCMCIA_DEBUG
@@ -1442,7 +1440,7 @@ static void set_multicast_list(struct net_device *dev)
     /* Calculate multicast logical address filter */
     memset(lp->multicast_ladrf, 0, MACE_LADRF_LEN);
     netdev_for_each_mc_addr(ha, dev) {
-      memcpy(adr, ha->addr, ETHER_ADDR_LEN);
+      memcpy(adr, ha->addr, ETH_ALEN);
       BuildLAF(lp->multicast_ladrf, adr);
     }
   }
diff --git a/drivers/net/ethernet/broadcom/sb1250-mac.c b/drivers/net/ethernet/broadcom/sb1250-mac.c
index 0a1d7f2..aa58f9e 100644
--- a/drivers/net/ethernet/broadcom/sb1250-mac.c
+++ b/drivers/net/ethernet/broadcom/sb1250-mac.c
@@ -163,7 +163,6 @@ enum sbmac_state {
 #define SBMAC_MAX_TXDESCR	256
 #define SBMAC_MAX_RXDESCR	256
 
-#define ETHER_ADDR_LEN		6
 #define ENET_PACKET_SIZE	1518
 /*#define ENET_PACKET_SIZE	9216 */
 
@@ -266,7 +265,7 @@ struct sbmac_softc {
 	int			sbm_pause;	/* current pause setting */
 	int			sbm_link;	/* current link state */
 
-	unsigned char		sbm_hwaddr[ETHER_ADDR_LEN];
+	unsigned char		sbm_hwaddr[ETH_ALEN];
 
 	struct sbmacdma		sbm_txdma;	/* only channel 0 for now */
 	struct sbmacdma		sbm_rxdma;
-- 
1.7.6.405.gc1be0

^ permalink raw reply related

* [PATCH net-next 1/5] ethernet: Convert MAC_ADDR_LEN uses to ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: Realtek linux nic maintainers, Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1321472142.git.joe@perches.com>

Reduce the number of #defines, use the normal #define from if_ether.h

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/freescale/gianfar.c |   12 +++---
 drivers/net/ethernet/freescale/gianfar.h |    3 --
 drivers/net/ethernet/micrel/ksz884x.c    |   59 ++++++++++++++----------------
 drivers/net/ethernet/realtek/r8169.c     |    5 +--
 drivers/net/ethernet/sis/sis190.c        |    8 ++---
 5 files changed, 38 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 83199fd..d103475 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -734,7 +734,7 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 
 	mac_addr = of_get_mac_address(np);
 	if (mac_addr)
-		memcpy(dev->dev_addr, mac_addr, MAC_ADDR_LEN);
+		memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
 
 	if (model && !strcasecmp(model, "TSEC"))
 		priv->device_flags =
@@ -3114,7 +3114,7 @@ static void gfar_set_multi(struct net_device *dev)
 static void gfar_clear_exact_match(struct net_device *dev)
 {
 	int idx;
-	static const u8 zero_arr[MAC_ADDR_LEN] = {0, 0, 0, 0, 0, 0};
+	static const u8 zero_arr[ETH_ALEN] = {0, 0, 0, 0, 0, 0};
 
 	for(idx = 1;idx < GFAR_EM_NUM + 1;idx++)
 		gfar_set_mac_for_addr(dev, idx, zero_arr);
@@ -3137,7 +3137,7 @@ static void gfar_set_hash_for_addr(struct net_device *dev, u8 *addr)
 {
 	u32 tempval;
 	struct gfar_private *priv = netdev_priv(dev);
-	u32 result = ether_crc(MAC_ADDR_LEN, addr);
+	u32 result = ether_crc(ETH_ALEN, addr);
 	int width = priv->hash_width;
 	u8 whichbit = (result >> (32 - width)) & 0x1f;
 	u8 whichreg = result >> (32 - width + 5);
@@ -3158,7 +3158,7 @@ static void gfar_set_mac_for_addr(struct net_device *dev, int num,
 	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	int idx;
-	char tmpbuf[MAC_ADDR_LEN];
+	char tmpbuf[ETH_ALEN];
 	u32 tempval;
 	u32 __iomem *macptr = &regs->macstnaddr1;
 
@@ -3166,8 +3166,8 @@ static void gfar_set_mac_for_addr(struct net_device *dev, int num,
 
 	/* Now copy it into the mac registers backwards, cuz */
 	/* little endian is silly */
-	for (idx = 0; idx < MAC_ADDR_LEN; idx++)
-		tmpbuf[MAC_ADDR_LEN - 1 - idx] = addr[idx];
+	for (idx = 0; idx < ETH_ALEN; idx++)
+		tmpbuf[ETH_ALEN - 1 - idx] = addr[idx];
 
 	gfar_write(macptr, *((u32 *) (tmpbuf)));
 
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 9aa4377..d93d49c 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -74,9 +74,6 @@ struct ethtool_rx_list {
  * will be the next highest multiple of 512 bytes. */
 #define INCREMENTAL_BUFFER_SIZE 512
 
-
-#define MAC_ADDR_LEN 6
-
 #define PHY_INIT_TIMEOUT 100000
 #define GFAR_PHY_CHANGE_TIME 2
 
diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c
index 3b67fe6..cbdbc77d 100644
--- a/drivers/net/ethernet/micrel/ksz884x.c
+++ b/drivers/net/ethernet/micrel/ksz884x.c
@@ -743,8 +743,7 @@
 /* Change default LED mode. */
 #define SET_DEFAULT_LED			LED_SPEED_DUPLEX_ACT
 
-#define MAC_ADDR_LEN			6
-#define MAC_ADDR_ORDER(i)		(MAC_ADDR_LEN - 1 - (i))
+#define MAC_ADDR_ORDER(i)		(ETH_ALEN - 1 - (i))
 
 #define MAX_ETHERNET_BODY_SIZE		1500
 #define ETHERNET_HEADER_SIZE		14
@@ -1043,7 +1042,7 @@ enum {
  * @valid:	Valid setting indicating the entry is being used.
  */
 struct ksz_mac_table {
-	u8 mac_addr[MAC_ADDR_LEN];
+	u8 mac_addr[ETH_ALEN];
 	u16 vid;
 	u8 fid;
 	u8 ports;
@@ -1187,8 +1186,8 @@ struct ksz_switch {
 	u8 diffserv[DIFFSERV_ENTRIES];
 	u8 p_802_1p[PRIO_802_1P_ENTRIES];
 
-	u8 br_addr[MAC_ADDR_LEN];
-	u8 other_addr[MAC_ADDR_LEN];
+	u8 br_addr[ETH_ALEN];
+	u8 other_addr[ETH_ALEN];
 
 	u8 broad_per;
 	u8 member;
@@ -1292,14 +1291,14 @@ struct ksz_hw {
 	int tx_int_mask;
 	int tx_size;
 
-	u8 perm_addr[MAC_ADDR_LEN];
-	u8 override_addr[MAC_ADDR_LEN];
-	u8 address[ADDITIONAL_ENTRIES][MAC_ADDR_LEN];
+	u8 perm_addr[ETH_ALEN];
+	u8 override_addr[ETH_ALEN];
+	u8 address[ADDITIONAL_ENTRIES][ETH_ALEN];
 	u8 addr_list_size;
 	u8 mac_override;
 	u8 promiscuous;
 	u8 all_multi;
-	u8 multi_list[MAX_MULTICAST_LIST][MAC_ADDR_LEN];
+	u8 multi_list[MAX_MULTICAST_LIST][ETH_ALEN];
 	u8 multi_bits[HW_MULTICAST_SIZE];
 	u8 multi_list_size;
 
@@ -3654,7 +3653,7 @@ static void hw_add_wol_bcast(struct ksz_hw *hw)
 	static const u8 mask[] = { 0x3F };
 	static const u8 pattern[] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
 
-	hw_set_wol_frame(hw, 2, 1, mask, MAC_ADDR_LEN, pattern);
+	hw_set_wol_frame(hw, 2, 1, mask, ETH_ALEN, pattern);
 }
 
 /**
@@ -3689,7 +3688,7 @@ static void hw_add_wol_ucast(struct ksz_hw *hw)
 {
 	static const u8 mask[] = { 0x3F };
 
-	hw_set_wol_frame(hw, 0, 1, mask, MAC_ADDR_LEN, hw->override_addr);
+	hw_set_wol_frame(hw, 0, 1, mask, ETH_ALEN, hw->override_addr);
 }
 
 /**
@@ -4055,7 +4054,7 @@ static void hw_set_addr(struct ksz_hw *hw)
 {
 	int i;
 
-	for (i = 0; i < MAC_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		writeb(hw->override_addr[MAC_ADDR_ORDER(i)],
 			hw->io + KS884X_ADDR_0_OFFSET + i);
 
@@ -4072,17 +4071,16 @@ static void hw_read_addr(struct ksz_hw *hw)
 {
 	int i;
 
-	for (i = 0; i < MAC_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		hw->perm_addr[MAC_ADDR_ORDER(i)] = readb(hw->io +
 			KS884X_ADDR_0_OFFSET + i);
 
 	if (!hw->mac_override) {
-		memcpy(hw->override_addr, hw->perm_addr, MAC_ADDR_LEN);
+		memcpy(hw->override_addr, hw->perm_addr, ETH_ALEN);
 		if (empty_addr(hw->override_addr)) {
-			memcpy(hw->perm_addr, DEFAULT_MAC_ADDRESS,
-				MAC_ADDR_LEN);
+			memcpy(hw->perm_addr, DEFAULT_MAC_ADDRESS, ETH_ALEN);
 			memcpy(hw->override_addr, DEFAULT_MAC_ADDRESS,
-				MAC_ADDR_LEN);
+			       ETH_ALEN);
 			hw->override_addr[5] += hw->id;
 			hw_set_addr(hw);
 		}
@@ -4130,16 +4128,16 @@ static int hw_add_addr(struct ksz_hw *hw, u8 *mac_addr)
 	int i;
 	int j = ADDITIONAL_ENTRIES;
 
-	if (!memcmp(hw->override_addr, mac_addr, MAC_ADDR_LEN))
+	if (!memcmp(hw->override_addr, mac_addr, ETH_ALEN))
 		return 0;
 	for (i = 0; i < hw->addr_list_size; i++) {
-		if (!memcmp(hw->address[i], mac_addr, MAC_ADDR_LEN))
+		if (!memcmp(hw->address[i], mac_addr, ETH_ALEN))
 			return 0;
 		if (ADDITIONAL_ENTRIES == j && empty_addr(hw->address[i]))
 			j = i;
 	}
 	if (j < ADDITIONAL_ENTRIES) {
-		memcpy(hw->address[j], mac_addr, MAC_ADDR_LEN);
+		memcpy(hw->address[j], mac_addr, ETH_ALEN);
 		hw_ena_add_addr(hw, j, hw->address[j]);
 		return 0;
 	}
@@ -4151,8 +4149,8 @@ static int hw_del_addr(struct ksz_hw *hw, u8 *mac_addr)
 	int i;
 
 	for (i = 0; i < hw->addr_list_size; i++) {
-		if (!memcmp(hw->address[i], mac_addr, MAC_ADDR_LEN)) {
-			memset(hw->address[i], 0, MAC_ADDR_LEN);
+		if (!memcmp(hw->address[i], mac_addr, ETH_ALEN)) {
+			memset(hw->address[i], 0, ETH_ALEN);
 			writel(0, hw->io + ADD_ADDR_INCR * i +
 				KS_ADD_ADDR_0_HI);
 			return 0;
@@ -5676,7 +5674,7 @@ static int netdev_set_mac_address(struct net_device *dev, void *addr)
 		hw_del_addr(hw, dev->dev_addr);
 	else {
 		hw->mac_override = 1;
-		memcpy(hw->override_addr, mac->sa_data, MAC_ADDR_LEN);
+		memcpy(hw->override_addr, mac->sa_data, ETH_ALEN);
 	}
 
 	memcpy(dev->dev_addr, mac->sa_data, MAX_ADDR_LEN);
@@ -5786,7 +5784,7 @@ static void netdev_set_rx_mode(struct net_device *dev)
 		netdev_for_each_mc_addr(ha, dev) {
 			if (i >= MAX_MULTICAST_LIST)
 				break;
-			memcpy(hw->multi_list[i++], ha->addr, MAC_ADDR_LEN);
+			memcpy(hw->multi_list[i++], ha->addr, ETH_ALEN);
 		}
 		hw->multi_list_size = (u8) i;
 		hw_set_grp_addr(hw);
@@ -6861,7 +6859,7 @@ static void get_mac_addr(struct dev_info *hw_priv, u8 *macaddr, int port)
 	int num;
 
 	i = j = num = got_num = 0;
-	while (j < MAC_ADDR_LEN) {
+	while (j < ETH_ALEN) {
 		if (macaddr[i]) {
 			int digit;
 
@@ -6892,7 +6890,7 @@ static void get_mac_addr(struct dev_info *hw_priv, u8 *macaddr, int port)
 		}
 		i++;
 	}
-	if (MAC_ADDR_LEN == j) {
+	if (ETH_ALEN == j) {
 		if (MAIN_PORT == port)
 			hw_priv->hw.mac_override = 1;
 	}
@@ -7059,7 +7057,7 @@ static int __devinit pcidev_init(struct pci_dev *pdev,
 
 	/* Multiple device interfaces mode requires a second MAC address. */
 	if (hw->dev_count > 1) {
-		memcpy(sw->other_addr, hw->override_addr, MAC_ADDR_LEN);
+		memcpy(sw->other_addr, hw->override_addr, ETH_ALEN);
 		read_other_addr(hw);
 		if (mac1addr[0] != ':')
 			get_mac_addr(hw_priv, mac1addr, OTHER_PORT);
@@ -7109,12 +7107,11 @@ static int __devinit pcidev_init(struct pci_dev *pdev,
 		dev->irq = pdev->irq;
 		if (MAIN_PORT == i)
 			memcpy(dev->dev_addr, hw_priv->hw.override_addr,
-				MAC_ADDR_LEN);
+			       ETH_ALEN);
 		else {
-			memcpy(dev->dev_addr, sw->other_addr,
-				MAC_ADDR_LEN);
+			memcpy(dev->dev_addr, sw->other_addr, ETH_ALEN);
 			if (!memcmp(sw->other_addr, hw->override_addr,
-					MAC_ADDR_LEN))
+				    ETH_ALEN))
 				dev->dev_addr[5] += port->first_port;
 		}
 
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index cdf66d6..847d855 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -69,9 +69,6 @@
    The RTL chips use a 64 element hash table based on the Ethernet CRC. */
 static const int multicast_filter_limit = 32;
 
-/* MAC address length */
-#define MAC_ADDR_LEN	6
-
 #define MAX_READ_REQUEST_SHIFT	12
 #define TX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
 #define SafeMtu		0x1c20	/* ... actually life sucks beyond ~7k */
@@ -4099,7 +4096,7 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	spin_lock_init(&tp->lock);
 
 	/* Get MAC address */
-	for (i = 0; i < MAC_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		dev->dev_addr[i] = RTL_R8(MAC0 + i);
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
diff --git a/drivers/net/ethernet/sis/sis190.c b/drivers/net/ethernet/sis/sis190.c
index 220e982..5b118cd 100644
--- a/drivers/net/ethernet/sis/sis190.c
+++ b/drivers/net/ethernet/sis/sis190.c
@@ -47,8 +47,6 @@
 #define sis190_rx_skb			netif_rx
 #define sis190_rx_quota(count, quota)	count
 
-#define MAC_ADDR_LEN		6
-
 #define NUM_TX_DESC		64	/* [8..1024] */
 #define NUM_RX_DESC		64	/* [8..8192] */
 #define TX_RING_BYTES		(NUM_TX_DESC * sizeof(struct TxDesc))
@@ -1601,7 +1599,7 @@ static int __devinit sis190_get_mac_addr_from_eeprom(struct pci_dev *pdev,
 	}
 
 	/* Get MAC address from EEPROM */
-	for (i = 0; i < MAC_ADDR_LEN / 2; i++) {
+	for (i = 0; i < ETH_ALEN / 2; i++) {
 		u16 w = sis190_read_eeprom(ioaddr, EEPROMMACAddr + i);
 
 		((__le16 *)dev->dev_addr)[i] = cpu_to_le16(w);
@@ -1653,7 +1651,7 @@ static int __devinit sis190_get_mac_addr_from_apc(struct pci_dev *pdev,
 	udelay(50);
 	pci_read_config_byte(isa_bridge, 0x48, &reg);
 
-        for (i = 0; i < MAC_ADDR_LEN; i++) {
+        for (i = 0; i < ETH_ALEN; i++) {
                 outb(0x9 + i, 0x78);
                 dev->dev_addr[i] = inb(0x79);
         }
@@ -1692,7 +1690,7 @@ static inline void sis190_init_rxfilter(struct net_device *dev)
 	 */
 	SIS_W16(RxMacControl, ctl & ~0x0f00);
 
-	for (i = 0; i < MAC_ADDR_LEN; i++)
+	for (i = 0; i < ETH_ALEN; i++)
 		SIS_W8(RxMacAddr + i, dev->dev_addr[i]);
 
 	SIS_W16(RxMacControl, ctl);
-- 
1.7.6.405.gc1be0

^ permalink raw reply related

* [PATCH net-next 0/5] Use ETH_ALEN
From: Joe Perches @ 2011-11-16 19:38 UTC (permalink / raw)
  To: netdev, linuxppc-dev; +Cc: linux-kernel

Remove other #defines and uses in favor of ETH_ALEN

Joe Perches (5):
  ethernet: Convert MAC_ADDR_LEN uses to ETH_ALEN
  ethernet: Convert ETHER_ADDR_LEN uses to ETH_ALEN
  bna: Convert MAC_ADDRLEN uses to ETH_ALEN
  amd8111e:  Convert ETH_ADDR_LEN uses to ETH_ALEN
  ucc_geth: Convert ENET_NUM_OCTETS_PER_ADDRESS uses to ETH_ALEN

 drivers/net/ethernet/8390/8390.h           |    2 -
 drivers/net/ethernet/8390/apne.c           |    2 +-
 drivers/net/ethernet/8390/ax88796.c        |    5 +-
 drivers/net/ethernet/8390/es3210.c         |    2 +-
 drivers/net/ethernet/8390/hp-plus.c        |    2 +-
 drivers/net/ethernet/8390/hp.c             |    2 +-
 drivers/net/ethernet/8390/hydra.c          |    2 +-
 drivers/net/ethernet/8390/lne390.c         |    4 +-
 drivers/net/ethernet/8390/ne-h8300.c       |    2 +-
 drivers/net/ethernet/8390/ne.c             |    4 +-
 drivers/net/ethernet/8390/ne2.c            |    2 +-
 drivers/net/ethernet/8390/ne3210.c         |    2 +-
 drivers/net/ethernet/8390/stnic.c          |    2 +-
 drivers/net/ethernet/8390/zorro8390.c      |    2 +-
 drivers/net/ethernet/amd/amd8111e.c        |    6 +-
 drivers/net/ethernet/amd/amd8111e.h        |    1 -
 drivers/net/ethernet/amd/nmclan_cs.c       |   12 ++---
 drivers/net/ethernet/broadcom/sb1250-mac.c |    3 +-
 drivers/net/ethernet/brocade/bna/cna.h     |    3 +-
 drivers/net/ethernet/freescale/gianfar.c   |   12 +++---
 drivers/net/ethernet/freescale/gianfar.h   |    3 -
 drivers/net/ethernet/freescale/ucc_geth.c  |    2 +-
 drivers/net/ethernet/freescale/ucc_geth.h  |    6 +-
 drivers/net/ethernet/micrel/ksz884x.c      |   59 +++++++++++++--------------
 drivers/net/ethernet/realtek/r8169.c       |    5 +--
 drivers/net/ethernet/sis/sis190.c          |    8 +--
 26 files changed, 68 insertions(+), 87 deletions(-)

-- 
1.7.6.405.gc1be0
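
For readers outside the kernel tree, a minimal userspace sketch of the
pattern this series converges on: take the address length from the shared
ETH_ALEN constant instead of a per-driver #define. The helper names
copy_mac() and same_mac() are hypothetical and do not appear in any of the
touched drivers.

```c
#include <string.h>
#include <linux/if_ether.h>	/* provides ETH_ALEN (6 octets) to userspace */

/* Hypothetical helper: copy one Ethernet address. */
static void copy_mac(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, ETH_ALEN);
}

/* Hypothetical helper: nonzero when two Ethernet addresses are equal. */
static int same_mac(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}
```

Using one constant everywhere means a memcpy()/memcmp() pair such as the
one in the ksz884x hunks cannot disagree about the length.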

^ permalink raw reply

* Re: [PATCH net-next v5 07/10] forcedeth: implement ndo_get_stats64() API
From: Ben Hutchings @ 2011-11-16 19:32 UTC (permalink / raw)
  To: David Decotigny
  Cc: Stephen Hemminger, netdev, linux-kernel, David S. Miller,
	Ian Campbell, Eric Dumazet, Jeff Kirsher, Jiri Pirko, Joe Perches,
	Szymon Janc, Richard Jones, Ayaz Abdulla
In-Reply-To: <CAG88wWa7UsPSfX_NdNDkQeuWN0LYYDVckShgDj9GvMq2mtrpbQ@mail.gmail.com>

On Wed, 2011-11-16 at 11:25 -0800, David Decotigny wrote:
> Thanks for your feedback on these concerns.
> 
> On Wed, Nov 16, 2011 at 9:14 AM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > On Wed, 2011-11-16 at 08:55 -0800, Stephen Hemminger wrote:
> >> Or realize that the Tx side is single threaded by dev->xmit_lock already
> >> and the Rx side is single threaded by NAPI.
> >
> > Right.
> 
> Yes, that's what I meant by "Each software stat field is updated by
> one single writer." in a previous email. However, I think that TX and
> RX paths are not always synchronized. So I'm afraid that if I'm using
> a single seqcount, I might run into trouble in the absence of a lock
> around each update.... and I'd really prefer to avoid such a lock. Are
> you suggesting I should use 2 independent seqcounts? One for RX path,
> the other for TX path, all this without a lock around writers?

Yes, there should be 1 for each path.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next v5 07/10] forcedeth: implement ndo_get_stats64() API
From: David Decotigny @ 2011-11-16 19:25 UTC (permalink / raw)
  To: Stephen Hemminger, Ben Hutchings
  Cc: netdev, linux-kernel, David S. Miller, Ian Campbell, Eric Dumazet,
	Jeff Kirsher, Jiri Pirko, Joe Perches, Szymon Janc, Richard Jones,
	Ayaz Abdulla
In-Reply-To: <1321463694.2709.1.camel@bwh-desktop>

Thanks for your feedback on these concerns.

On Wed, Nov 16, 2011 at 9:14 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2011-11-16 at 08:55 -0800, Stephen Hemminger wrote:
>> Or realize that the Tx side is single threaded by dev->xmit_lock already
>> and the Rx side is single threaded by NAPI.
>
> Right.

Yes, that's what I meant by "Each software stat field is updated by
one single writer." in a previous email. However, I think that TX and
RX paths are not always synchronized. So I'm afraid that if I'm using
a single seqcount, I might run into trouble in the absence of a lock
around each update.... and I'd really prefer to avoid such a lock. Are
you suggesting I should use 2 independent seqcounts? One for RX path,
the other for TX path, all this without a lock around writers?

Side-note: what are the bad implications of using atomic_t (in the fast paths)?
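
For illustration only (this is not code from the forcedeth patch), the
"one seqcount per path" idea can be sketched in userspace C11 roughly as
below. Everything here is an assumption made for the sketch: the struct
path_stats layout, the helper names, and the use of relaxed C11 atomics
for the counters, which only keeps the sketch free of formal data races.
In the kernel the existing u64_stats_sync helpers would normally play
this role.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-path counters: one instance for RX, another for TX. */
struct path_stats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Each path has a single writer, so no lock is taken around the update. */
static void path_stats_add(struct path_stats *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
	uint64_t bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);

	/* Mark the update as in progress before touching the counters. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->packets, packets + 1, memory_order_relaxed);
	atomic_store_explicit(&s->bytes, bytes + len, memory_order_relaxed);
	/* Publish the new values by making the sequence even again. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Readers retry until they observe an even, unchanged sequence number. */
static void path_stats_read(struct path_stats *s, uint64_t *packets,
			    uint64_t *bytes)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

Because the RX and TX instances never share a sequence counter, the two
writers cannot corrupt each other's sequence number even though they are
not synchronized with one another.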

^ permalink raw reply

* [PATCH 3/5] tcp: use SACKs and DSACKs that arrive on ACKs below snd_una
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell
In-Reply-To: <1321469885-10885-1-git-send-email-ncardwell@google.com>

The bug: When the ACK field is below snd_una (which can happen when
ACKs are reordered), senders ignored DSACKs (preventing undo) and did
not call tcp_fastretrans_alert, so they did not increment
prr_delivered to reflect newly-SACKed sequence ranges, and did not
call tcp_xmit_retransmit_queue, thus passing up chances to send out
more retransmitted and new packets based on any newly-SACKed packets.

The change: When the ACK field is below snd_una (the "old_ack" goto
label), call tcp_fastretrans_alert to allow undo based on any
newly-arrived DSACKs and try to send out more packets based on
newly-SACKed packets.

Other patches in this series will provide other changes that are
necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b49e418..751d390 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3805,10 +3805,14 @@ invalid_ack:
 	return -1;

 old_ack:
+	/* If data was SACKed, tag it and see if we should send more data.
+	 * If data was DSACKed, see if we can undo a cwnd reduction.
+	 */
 	if (TCP_SKB_CB(skb)->sacked) {
-		tcp_sacktag_write_queue(sk, skb, prior_snd_una);
-		if (icsk->icsk_ca_state == TCP_CA_Open)
-			tcp_try_keep_open(sk);
+		flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una);
+		newly_acked_sacked = tp->sacked_out - prior_sacked;
+		tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked,
+				      is_dupack, flag);
 	}

 	SOCK_DEBUG(sk, "Ack %u before %u:%u\n", ack, tp->snd_una, tp->snd_nxt);
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell
In-Reply-To: <1321469885-10885-1-git-send-email-ncardwell@google.com>

The problem: Senders were overriding cwnd values picked during an undo
by calling tcp_moderate_cwnd() in tcp_try_to_open().

The fix: Don't moderate cwnd in tcp_try_to_open() if we're in
TCP_CA_Open, since doing so is generally unnecessary and specifically
would override a DSACK-based undo of a cwnd reduction made in fast
recovery.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a4efdd7..78dd38c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2881,7 +2881,8 @@ static void tcp_try_to_open(struct sock *sk, int flag)
 
 	if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
 		tcp_try_keep_open(sk);
-		tcp_moderate_cwnd(tp);
+		if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open)
+			tcp_moderate_cwnd(tp);
 	} else {
 		tcp_cwnd_down(sk, flag);
 	}
-- 
1.7.3.1
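
A toy model of the behavior change, for illustration only. It assumes
that tcp_moderate_cwnd() caps cwnd at the number of packets in flight
plus a burst allowance; the struct toy_tcp fields, the max_burst
parameter and the toy_* function names are hypothetical, and the
tcp_try_keep_open() and tcp_cwnd_down() steps are left out.

```c
/* Simplified stand-ins for the congestion-avoidance states. */
enum toy_ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical, heavily reduced view of the sender state. */
struct toy_tcp {
	enum toy_ca_state state;
	unsigned int cwnd;	/* congestion window, in packets */
	unsigned int in_flight;	/* packets currently in flight */
	unsigned int max_burst;	/* assumed burst allowance */
};

/* Assumed moderation rule: cwnd may not exceed in_flight + max_burst. */
static void toy_moderate_cwnd(struct toy_tcp *tp)
{
	unsigned int limit = tp->in_flight + tp->max_burst;

	if (tp->cwnd > limit)
		tp->cwnd = limit;
}

/* Post-patch shape: skip moderation when the state is already Open. */
static void toy_try_to_open(struct toy_tcp *tp)
{
	if (tp->state != CA_CWR) {
		if (tp->state != CA_OPEN)
			toy_moderate_cwnd(tp);
	}
	/* The CWR branch (tcp_cwnd_down) is omitted from this sketch. */
}
```

With cwnd restored to 20 by an undo, 2 packets in flight and a burst
allowance of 3, the Open state now keeps cwnd at 20, whereas the
Disorder state still clamps it to 5.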

^ permalink raw reply related

* [PATCH 4/5] tcp: allow undo from reordered DSACKs
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell
In-Reply-To: <1321469885-10885-1-git-send-email-ncardwell@google.com>

Previously, SACK-enabled connections hung around in TCP_CA_Disorder
state while snd_una==high_seq, just waiting to accumulate DSACKs and
hopefully undo a cwnd reduction. This could and did lead to the
following unfortunate scenario: if some incoming ACKs advance snd_una
beyond high_seq then we were setting undo_marker to 0 and moving to
TCP_CA_Open, so if (due to reordering in the ACK return path) we
shortly thereafter received a DSACK then we were no longer able to
undo the cwnd reduction.

The change: Simplify the congestion avoidance state machine by
removing the behavior where SACK-enabled connections hung around in
the TCP_CA_Disorder state just waiting for DSACKs. Instead, when
snd_una advances to high_seq or beyond we typically move to
TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or
TCP_CA_Disorder if we later receive enough DSACKs.

Other patches in this series will provide other changes that are
necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |   15 ++-------------
 1 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 751d390..a4efdd7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	int state = TCP_CA_Open;

-	if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker)
+	if (tcp_left_out(tp) || tcp_any_retrans_done(sk))
 		state = TCP_CA_Disorder;

 	if (inet_csk(sk)->icsk_ca_state != state) {
@@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
 			}
 			break;

-		case TCP_CA_Disorder:
-			tcp_try_undo_dsack(sk);
-			if (!tp->undo_marker ||
-			    /* For SACK case do not Open to allow to undo
-			     * catching for all duplicate ACKs. */
-			    tcp_is_reno(tp) || tp->snd_una != tp->high_seq) {
-				tp->undo_marker = 0;
-				tcp_set_ca_state(sk, TCP_CA_Open);
-			}
-			break;
-
 		case TCP_CA_Recovery:
 			if (tcp_is_reno(tp))
 				tcp_reset_reno_sack(tp);
@@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
 				tcp_add_reno_sack(sk);
 		}

-		if (icsk->icsk_ca_state == TCP_CA_Disorder)
+		if (icsk->icsk_ca_state <= TCP_CA_Disorder)
 			tcp_try_undo_dsack(sk);

 		if (!tcp_time_to_recover(sk)) {
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell
In-Reply-To: <1321469885-10885-1-git-send-email-ncardwell@google.com>

The bug: Senders ignored DSACKs after recovery when there were no
outstanding packets (a common scenario for HTTP servers).

The change: when there are no outstanding packets (the "no_queue" goto
label), call tcp_fastretrans_alert() in order to use DSACKs to undo
congestion window reductions.

Other patches in this series will provide other changes that are
necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f772aaa..b49e418 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3788,6 +3788,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	return 1;
 
 no_queue:
+	/* If data was DSACKed, see if we can undo a cwnd reduction. */
+	if (flag & FLAG_DSACKING_ACK)
+		tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked,
+				      is_dupack, flag);
 	/* If this ack opens up a zero window, clear backoff.  It was
 	 * being used to time the probes, and is probably far higher than
 	 * it needs to be for normal retransmission.
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert()
From: Neal Cardwell @ 2011-11-16 18:58 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell

Allow callers to decide whether an ACK is a duplicate ACK. This is a
prerequisite to allowing fastretrans_alert to be called from new
contexts, such as the no_queue and old_ack code paths, from which we
have extra info that tells us whether an ACK is a dupack.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 52b5c2d..f772aaa 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3009,11 +3009,11 @@ static void tcp_update_cwnd_in_recovery(struct sock *sk, int newly_acked_sacked,
  * tcp_xmit_retransmit_queue().
  */
 static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
-				  int newly_acked_sacked, int flag)
+				  int newly_acked_sacked, bool is_dupack,
+				  int flag)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	int is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
 	int do_lost = is_dupack || ((flag & FLAG_DATA_SACKED) &&
 				    (tcp_fackets_out(tp) > tp->reordering));
 	int fast_rexmit = 0, mib_idx;
@@ -3681,10 +3681,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	u32 prior_snd_una = tp->snd_una;
 	u32 ack_seq = TCP_SKB_CB(skb)->seq;
 	u32 ack = TCP_SKB_CB(skb)->ack_seq;
+	bool is_dupack = false;
 	u32 prior_in_flight;
 	u32 prior_fackets;
 	int prior_packets;
 	int prior_sacked = tp->sacked_out;
+	int pkts_acked = 0;
 	int newly_acked_sacked = 0;
 	int frto_cwnd = 0;
 
@@ -3757,6 +3759,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	/* See if we can take anything off of the retransmit queue. */
 	flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una);
 
+	pkts_acked = prior_packets - tp->packets_out;
 	newly_acked_sacked = (prior_packets - prior_sacked) -
 			     (tp->packets_out - tp->sacked_out);
 
@@ -3771,8 +3774,9 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd &&
 		    tcp_may_raise_cwnd(sk, flag))
 			tcp_cong_avoid(sk, ack, prior_in_flight);
-		tcp_fastretrans_alert(sk, prior_packets - tp->packets_out,
-				      newly_acked_sacked, flag);
+		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
+		tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked,
+				      is_dupack, flag);
 	} else {
 		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd)
 			tcp_cong_avoid(sk, ack, prior_in_flight);
-- 
1.7.3.1

^ permalink raw reply related

* Re: [PATCH iproute2] ss: report ecnseen
From: Eric Dumazet @ 2011-11-16 17:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20111116094550.61b73eee@nehalam.linuxnetplumber.net>

On Wednesday, 16 November 2011 at 09:45 -0800, Stephen Hemminger wrote:

> Please use same name as include/linux/tcp.h TCPI_OPT_ECN_SEEN
> Also, need to get this to glibc to make it included in standard distro
> headers.

Thanks, I'll take care of glibc headers as well.

[PATCH iproute2] ss: report ecnseen

Support ECNSEEN reporting in ss command.

ESTAB      0      0           10.170.73.123:4900
10.170.73.125:51001    uid:501 ino:385994 sk:f31e5f00
         mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:8,8 rto:210
rtt:18.75/15 ato:40 cwnd:10 send 69.9Mbps rcv_space:32768

"ecn" means the TCP session negotiated ECN capability (TCP layer) at setup
time.

"ecnseen" means at least one frame with ECT(0), ECT(1) or CE (IP layer) was
received from the peer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/netinet/tcp.h |    1 +
 misc/ss.c             |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/netinet/tcp.h b/include/netinet/tcp.h
index 282b29c..3f890a1 100644
--- a/include/netinet/tcp.h
+++ b/include/netinet/tcp.h
@@ -172,6 +172,7 @@ enum
 # define TCPI_OPT_SACK		2
 # define TCPI_OPT_WSCALE	4
 # define TCPI_OPT_ECN		8
+# define TCPI_OPT_ECN_SEEN	16
 
 /* Values for tcpi_state.  */
 enum tcp_ca_state
diff --git a/misc/ss.c b/misc/ss.c
index b00841b..487154e 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1357,6 +1357,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r)
 				printf(" sack");
 			if (info->tcpi_options & TCPI_OPT_ECN)
 				printf(" ecn");
+			if (info->tcpi_options & TCPI_OPT_ECN_SEEN)
+				printf(" ecnseen");
 		}
 
 		if (tb[INET_DIAG_CONG])
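
As an illustration of how the new bit is consumed, here is a
self-contained sketch that decodes a tcpi_options bitmask into the tokens
ss prints. The TCPI_OPT_SACK, TCPI_OPT_WSCALE, TCPI_OPT_ECN and
TCPI_OPT_ECN_SEEN values are the ones shown in the header hunk above;
TCPI_OPT_TIMESTAMPS is the existing bit 1. The function name
format_tcpi_options() is hypothetical, wscale is left out because ss
prints it separately with its values, and the caller is assumed to pass
a buffer of at least one byte.

```c
#include <stddef.h>
#include <string.h>

#define TCPI_OPT_TIMESTAMPS	1
#define TCPI_OPT_SACK		2
#define TCPI_OPT_WSCALE		4
#define TCPI_OPT_ECN		8
#define TCPI_OPT_ECN_SEEN	16

/* Hypothetical helper: append one token per set option bit to buf. */
static const char *format_tcpi_options(unsigned int options, char *buf,
				       size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} names[] = {
		{ TCPI_OPT_TIMESTAMPS, " ts" },
		{ TCPI_OPT_SACK, " sack" },
		{ TCPI_OPT_ECN, " ecn" },
		{ TCPI_OPT_ECN_SEEN, " ecnseen" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (options & names[i].bit)
			strncat(buf, names[i].name, len - strlen(buf) - 1);
	return buf;
}
```

A socket that negotiated ECN but has not yet received an ECN-marked
frame would show "ecn" without "ecnseen".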

^ permalink raw reply related
