Netdev List
* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Frank Li @ 2012-12-17 14:48 UTC
  To: Sascha Hauer
  Cc: Frank Li, lznua, richardcochran, shawn.guo, linux-arm-kernel,
	netdev, davem
In-Reply-To: <20121217091345.GA753@pengutronix.de>

2012/12/17 Sascha Hauer <s.hauer@pengutronix.de>:
> On Wed, Oct 31, 2012 at 12:25:31PM +0800, Frank Li wrote:
>> This patch adds a driver for the FEC(MX6) that offers time
>> stamping and a PTP hardware clock. Because FEC\ENET(MX6)
>> hardware frequency adjustment is complex, we have implemented
>> this in software by changing the multiplication factor of the
>> timecounter.
>>
>> Signed-off-by: Frank Li <Frank.Li@freescale.com>
>> ---
>>  drivers/net/ethernet/freescale/Kconfig   |    9 +
>>  drivers/net/ethernet/freescale/Makefile  |    1 +
>>  drivers/net/ethernet/freescale/fec.c     |   88 +++++++-
>>  drivers/net/ethernet/freescale/fec.h     |   38 +++
>>  drivers/net/ethernet/freescale/fec_ptp.c |  386 ++++++++++++++++++++++++++++++
>>  5 files changed, 521 insertions(+), 1 deletions(-)
>>  create mode 100644 drivers/net/ethernet/freescale/fec_ptp.c
>>
>> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
>> index feff516..ff3be53 100644
>> --- a/drivers/net/ethernet/freescale/Kconfig
>> +++ b/drivers/net/ethernet/freescale/Kconfig
>> @@ -92,4 +92,13 @@ config GIANFAR
>>         This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx,
>>         and MPC86xx family of chips, and the FEC on the 8540.
>>
>> +config FEC_PTP
>> +     bool "PTP Hardware Clock (PHC)"
>> +     depends on FEC
>> +     select PPS
>> +     select PTP_1588_CLOCK
>> +     --help---
>> +       Say Y here if you want to use PTP Hardware Clock (PHC) in the
>> +       driver.  Only the basic clock operations have been implemented.
>> +
>>  endif # NET_VENDOR_FREESCALE
>> diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
>> index 3d1839a..d4d19b3 100644
>> --- a/drivers/net/ethernet/freescale/Makefile
>> +++ b/drivers/net/ethernet/freescale/Makefile
>> @@ -3,6 +3,7 @@
>>  #
>>
>>  obj-$(CONFIG_FEC) += fec.o
>> +obj-$(CONFIG_FEC_PTP) += fec_ptp.o
>>  obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx.o
>>  ifeq ($(CONFIG_FEC_MPC52xx_MDIO),y)
>>       obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx_phy.o
>> diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
>> index d0e1b33..2665162 100644
>> --- a/drivers/net/ethernet/freescale/fec.c
>> +++ b/drivers/net/ethernet/freescale/fec.c
>> @@ -280,6 +280,17 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>                       | BD_ENET_TX_LAST | BD_ENET_TX_TC);
>>       bdp->cbd_sc = status;
>>
>> +#ifdef CONFIG_FEC_PTP
>
> This ifdef desert in the fec driver currently breaks all SoCs except
> i.MX6 in the imx_v6_v7_defconfig.
>
> Most of these could be fixed with something like if (fec_use_ptp(fep)),
>
>
>>  #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
>>      defined(CONFIG_M520x) || defined(CONFIG_M532x) || \
>>      defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
>> @@ -88,6 +94,13 @@ struct bufdesc {
>>       unsigned short cbd_datlen;      /* Data length */
>>       unsigned short cbd_sc;  /* Control and status info */
>>       unsigned long cbd_bufaddr;      /* Buffer address */
>> +#ifdef CONFIG_FEC_PTP
>> +     unsigned long cbd_esc;
>> +     unsigned long cbd_prot;
>> +     unsigned long cbd_bdu;
>> +     unsigned long ts;
>> +     unsigned short res0[4];
>> +#endif
>>  };
>
> This one changes the layout of the hardware buffer descriptor, which is
> not so easy to fix.

Yes, it is not easy to fix if we have to check dynamically for MX6 versus
other devices.
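
For illustration only, a dynamic check could keep the legacy descriptor as a
prefix of an extended one and pick the ring stride at run time from a
per-device flag. This is a rough userspace sketch, not driver code: the names
bufdesc_ex, fec_priv_sketch, use_ptp, bd_size and bd_at are invented here, and
the fields use fixed-width types instead of the unsigned long/short in the
quoted struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy descriptor: the three fields shared by every variant. */
struct bufdesc {
	uint16_t cbd_datlen;	/* Data length */
	uint16_t cbd_sc;	/* Control and status info */
	uint32_t cbd_bufaddr;	/* Buffer address */
};

/* Extended descriptor (hypothetical name): legacy part first, followed by
 * the extra words that the quoted patch adds under CONFIG_FEC_PTP. */
struct bufdesc_ex {
	struct bufdesc desc;
	uint32_t cbd_esc;
	uint32_t cbd_prot;
	uint32_t cbd_bdu;
	uint32_t ts;
	uint16_t res0[4];
};

/* Hypothetical per-device state; use_ptp would be decided at probe time. */
struct fec_priv_sketch {
	int use_ptp;
	void *ring;		/* start of the descriptor ring */
	size_t ring_len;	/* number of descriptors in the ring */
};

/* Size of one descriptor for this device. */
static size_t bd_size(const struct fec_priv_sketch *fep)
{
	return fep->use_ptp ? sizeof(struct bufdesc_ex) : sizeof(struct bufdesc);
}

/* Address of descriptor i, stepping by the run-time stride. */
static struct bufdesc *bd_at(const struct fec_priv_sketch *fep, size_t i)
{
	assert(i < fep->ring_len);
	return (struct bufdesc *)((char *)fep->ring + i * bd_size(fep));
}
```

Code that only touches cbd_sc, cbd_datlen and cbd_bufaddr can use the pointer
returned by bd_at() on either kind of device; only the time stamping paths
would need to look at the extended fields.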

>
> I don't know how to continue from here. Since the whole patch doesn't
> seem to have been reviewed very much, I tend to say we should revert it
> for now and let Frank redo it for the next merge window.
>
> Other opinions?

Can we just disable CONFIG_FEC_PTP by default instead of reverting the whole
patch?

>
> Sascha
>
> --
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* [PATCH 06/15] net,9p: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
  To: David S. Miller, Sasha Levin, Eric Van Hensbergen, Joe Perches,
	netdev, linux-kernel
  Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>

Switch 9p error table to use the new hashtable implementation. This reduces
the amount of generic unrelated code in 9p.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/9p/error.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/9p/error.c b/net/9p/error.c
index 2ab2de7..a394b37 100644
--- a/net/9p/error.c
+++ b/net/9p/error.c
@@ -34,6 +34,7 @@
 #include <linux/jhash.h>
 #include <linux/errno.h>
 #include <net/9p/9p.h>
+#include <linux/hashtable.h>
 
 /**
  * struct errormap - map string errors from Plan 9 to Linux numeric ids
@@ -50,8 +51,8 @@ struct errormap {
 	struct hlist_node list;
 };
 
-#define ERRHASHSZ		32
-static struct hlist_head hash_errmap[ERRHASHSZ];
+#define ERR_HASH_BITS 5
+static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);
 
 /* FixMe - reduce to a reasonable size */
 static struct errormap errmap[] = {
@@ -193,18 +194,14 @@ static struct errormap errmap[] = {
 int p9_error_init(void)
 {
 	struct errormap *c;
-	int bucket;
-
-	/* initialize hash table */
-	for (bucket = 0; bucket < ERRHASHSZ; bucket++)
-		INIT_HLIST_HEAD(&hash_errmap[bucket]);
+	u32 hash;
 
 	/* load initial error map into hash table */
 	for (c = errmap; c->name != NULL; c++) {
 		c->namelen = strlen(c->name);
-		bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
+		hash = jhash(c->name, c->namelen, 0);
 		INIT_HLIST_NODE(&c->list);
-		hlist_add_head(&c->list, &hash_errmap[bucket]);
+		hash_add(hash_errmap, &c->list, hash);
 	}
 
 	return 1;
@@ -223,13 +220,13 @@ int p9_errstr2errno(char *errstr, int len)
 	int errno;
 	struct hlist_node *p;
 	struct errormap *c;
-	int bucket;
+	u32 hash;
 
 	errno = 0;
 	p = NULL;
 	c = NULL;
-	bucket = jhash(errstr, len, 0) % ERRHASHSZ;
-	hlist_for_each_entry(c, p, &hash_errmap[bucket], list) {
+	hash = jhash(errstr, len, 0);
+	hash_for_each_possible(hash_errmap, c, p, list, hash) {
 		if (c->namelen == len && !memcmp(c->name, errstr, len)) {
 			errno = c->val;
 			break;
-- 
1.8.0
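
As a rough picture of what the error table does, here is a userspace sketch
of a string-keyed table with 2^ERR_HASH_BITS buckets. It is not the kernel
code: FNV-1a stands in for jhash(), a plain next pointer stands in for
hlist_node, the bucket is picked with a simple mask, and the names str_hash,
errmap_add and errstr2errno are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_HASH_BITS 5
#define ERR_HASH_SIZE (1u << ERR_HASH_BITS)

struct errormap {
	const char *name;
	int val;
	size_t namelen;
	struct errormap *next;	/* bucket chain; the kernel uses hlist_node */
};

static struct errormap *buckets[ERR_HASH_SIZE];

/* FNV-1a, used here only as a stand-in for the kernel's jhash(). */
static uint32_t str_hash(const char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/* Cache the name length and link the entry into its bucket. */
static void errmap_add(struct errormap *c)
{
	uint32_t b;

	assert(c->name != NULL);
	c->namelen = strlen(c->name);
	b = str_hash(c->name, c->namelen) & (ERR_HASH_SIZE - 1);
	c->next = buckets[b];
	buckets[b] = c;
}

/* Return the mapped value, or 0 when the string is not in the table. */
static int errstr2errno(const char *errstr, size_t len)
{
	uint32_t b = str_hash(errstr, len) & (ERR_HASH_SIZE - 1);

	for (struct errormap *c = buckets[b]; c; c = c->next)
		if (c->namelen == len && !memcmp(c->name, errstr, len))
			return c->val;
	return 0;
}
```

The lookup only walks one bucket and still compares length and bytes, because
different strings can land in the same bucket.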

* [PATCH 08/15] SUNRPC/cache: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
  To: Trond Myklebust, J. Bruce Fields, David S. Miller, linux-nfs,
	netdev, linux-kernel
  Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>

Switch cache to use the new hashtable implementation. This reduces the amount
of generic unrelated code in the cache implementation.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.

Tested-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/sunrpc/cache.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 9afa439..d4539b6 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -28,6 +28,7 @@
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
 #include <linux/pagemap.h>
+#include <linux/hashtable.h>
 #include <asm/ioctls.h>
 #include <linux/sunrpc/types.h>
 #include <linux/sunrpc/cache.h>
@@ -524,19 +525,18 @@ EXPORT_SYMBOL_GPL(cache_purge);
  * it to be revisited when cache info is available
  */
 
-#define	DFR_HASHSIZE	(PAGE_SIZE/sizeof(struct list_head))
-#define	DFR_HASH(item)	((((long)item)>>4 ^ (((long)item)>>13)) % DFR_HASHSIZE)
+#define	DFR_HASH_BITS	9
 
 #define	DFR_MAX	300	/* ??? */
 
 static DEFINE_SPINLOCK(cache_defer_lock);
 static LIST_HEAD(cache_defer_list);
-static struct hlist_head cache_defer_hash[DFR_HASHSIZE];
+static DEFINE_HASHTABLE(cache_defer_hash, DFR_HASH_BITS);
 static int cache_defer_cnt;
 
 static void __unhash_deferred_req(struct cache_deferred_req *dreq)
 {
-	hlist_del_init(&dreq->hash);
+	hash_del(&dreq->hash);
 	if (!list_empty(&dreq->recent)) {
 		list_del_init(&dreq->recent);
 		cache_defer_cnt--;
@@ -545,10 +545,7 @@ static void __unhash_deferred_req(struct cache_deferred_req *dreq)
 
 static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_head *item)
 {
-	int hash = DFR_HASH(item);
-
-	INIT_LIST_HEAD(&dreq->recent);
-	hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
+	hash_add(cache_defer_hash, &dreq->hash, (unsigned long)item);
 }
 
 static void setup_deferral(struct cache_deferred_req *dreq,
@@ -600,7 +597,7 @@ static void cache_wait_req(struct cache_req *req, struct cache_head *item)
 		 * to clean up
 		 */
 		spin_lock(&cache_defer_lock);
-		if (!hlist_unhashed(&sleeper.handle.hash)) {
+		if (hash_hashed(&sleeper.handle.hash)) {
 			__unhash_deferred_req(&sleeper.handle);
 			spin_unlock(&cache_defer_lock);
 		} else {
@@ -671,12 +668,11 @@ static void cache_revisit_request(struct cache_head *item)
 	struct cache_deferred_req *dreq;
 	struct list_head pending;
 	struct hlist_node *lp, *tmp;
-	int hash = DFR_HASH(item);
 
 	INIT_LIST_HEAD(&pending);
 	spin_lock(&cache_defer_lock);
 
-	hlist_for_each_entry_safe(dreq, lp, tmp, &cache_defer_hash[hash], hash)
+	hash_for_each_possible_safe(cache_defer_hash, dreq, lp, tmp, hash, (unsigned long)item)
 		if (dreq->item == item) {
 			__unhash_deferred_req(dreq);
 			list_add(&dreq->recent, &pending);
-- 
1.8.0
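
The deferral table is keyed by the address of the cache item. A minimal
userspace sketch of that idea follows, assuming a golden-ratio multiplicative
hash to pick the bucket (the exact constant is an assumption, not taken from
this patch). The names deferred_req, ptr_bucket, req_add, req_del, req_hashed
and req_find are invented; the next/pprev pair imitates how an hlist node can
tell whether it is currently linked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DFR_HASH_BITS 9
#define DFR_HASH_SIZE (1u << DFR_HASH_BITS)

struct deferred_req {
	const void *item;		/* key: address of the cache entry */
	struct deferred_req *next;
	struct deferred_req **pprev;	/* NULL while the node is unhashed */
};

static struct deferred_req *defer_hash[DFR_HASH_SIZE];

/* Multiplicative hash of a pointer value; keeps the top DFR_HASH_BITS bits. */
static unsigned int ptr_bucket(const void *p)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	return (unsigned int)((v * 0x61C8864680B583EBull) >> (64 - DFR_HASH_BITS));
}

static int req_hashed(const struct deferred_req *r)
{
	return r->pprev != NULL;
}

/* Link r at the head of the bucket chosen by the item address. */
static void req_add(struct deferred_req *r, const void *item)
{
	struct deferred_req **head = &defer_hash[ptr_bucket(item)];

	assert(!req_hashed(r));
	r->item = item;
	r->next = *head;
	if (r->next)
		r->next->pprev = &r->next;
	r->pprev = head;
	*head = r;
}

/* Unlink r and reset it, so a second delete is a harmless no-op. */
static void req_del(struct deferred_req *r)
{
	if (!req_hashed(r))
		return;
	*r->pprev = r->next;
	if (r->next)
		r->next->pprev = r->pprev;
	r->next = NULL;
	r->pprev = NULL;
}

/* Walk only the bucket that the item address maps to. */
static struct deferred_req *req_find(const void *item)
{
	for (struct deferred_req *r = defer_hash[ptr_bucket(item)]; r; r = r->next)
		if (r->item == item)
			return r;
	return NULL;
}
```

Because deletion resets the node, a later req_hashed() check behaves like the
hash_hashed() test in cache_wait_req() above.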

* [PATCH 10/15] net,l2tp: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
  To: David S. Miller, James Chapman, Eric Dumazet, Dmitry Kozlov,
	Sasha Levin, Chris Elston, Joe Perches, netdev, linux-kernel
  Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>

Switch l2tp to use the new hashtable implementation. This reduces the amount
of generic unrelated code in l2tp.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/l2tp/l2tp_core.c    | 140 +++++++++++++++++++-----------------------------
 net/l2tp/l2tp_core.h    |  15 ++++--
 net/l2tp/l2tp_debugfs.c |  19 +++----
 3 files changed, 74 insertions(+), 100 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1a9f372..0b369e4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -44,6 +44,7 @@
 #include <linux/udp.h>
 #include <linux/l2tp.h>
 #include <linux/hash.h>
+#include <linux/hashtable.h>
 #include <linux/sort.h>
 #include <linux/file.h>
 #include <linux/nsproxy.h>
@@ -107,8 +108,14 @@ static unsigned int l2tp_net_id;
 struct l2tp_net {
 	struct list_head l2tp_tunnel_list;
 	spinlock_t l2tp_tunnel_list_lock;
-	struct hlist_head l2tp_session_hlist[L2TP_HASH_SIZE_2];
-	spinlock_t l2tp_session_hlist_lock;
+/*
+ * Session hash global list for L2TPv3.
+ * The session_id SHOULD be random according to RFC3931, but several
+ * L2TP implementations use incrementing session_ids.  So we do a real
+ * hash on the session_id, rather than a simple bitmask.
+ */
+	DECLARE_HASHTABLE(l2tp_session_hash, L2TP_HASH_BITS_2);
+	spinlock_t l2tp_session_hash_lock;
 };
 
 static void l2tp_session_set_header_len(struct l2tp_session *session, int version);
@@ -156,30 +163,17 @@ do {									\
 #define l2tp_tunnel_dec_refcount(t) l2tp_tunnel_dec_refcount_1(t)
 #endif
 
-/* Session hash global list for L2TPv3.
- * The session_id SHOULD be random according to RFC3931, but several
- * L2TP implementations use incrementing session_ids.  So we do a real
- * hash on the session_id, rather than a simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash_2(struct l2tp_net *pn, u32 session_id)
-{
-	return &pn->l2tp_session_hlist[hash_32(session_id, L2TP_HASH_BITS_2)];
-
-}
-
 /* Lookup a session by id in the global session list
  */
 static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
 {
 	struct l2tp_net *pn = l2tp_pernet(net);
-	struct hlist_head *session_list =
-		l2tp_session_id_hash_2(pn, session_id);
 	struct l2tp_session *session;
 	struct hlist_node *walk;
 
 	rcu_read_lock_bh();
-	hlist_for_each_entry_rcu(session, walk, session_list, global_hlist) {
+	hash_for_each_possible_rcu(pn->l2tp_session_hash, session, walk,
+					global_hlist, session_id) {
 		if (session->session_id == session_id) {
 			rcu_read_unlock_bh();
 			return session;
@@ -190,23 +184,10 @@ static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
 	return NULL;
 }
 
-/* Session hash list.
- * The session_id SHOULD be random according to RFC2661, but several
- * L2TP implementations (Cisco and Microsoft) use incrementing
- * session_ids.  So we do a real hash on the session_id, rather than a
- * simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash(struct l2tp_tunnel *tunnel, u32 session_id)
-{
-	return &tunnel->session_hlist[hash_32(session_id, L2TP_HASH_BITS)];
-}
-
 /* Lookup a session by id
  */
 struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id)
 {
-	struct hlist_head *session_list;
 	struct l2tp_session *session;
 	struct hlist_node *walk;
 
@@ -217,15 +198,14 @@ struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunn
 	if (tunnel == NULL)
 		return l2tp_session_find_2(net, session_id);
 
-	session_list = l2tp_session_id_hash(tunnel, session_id);
-	read_lock_bh(&tunnel->hlist_lock);
-	hlist_for_each_entry(session, walk, session_list, hlist) {
+	read_lock_bh(&tunnel->hash_lock);
+	hash_for_each_possible(tunnel->session_hash, session, walk, hlist, session_id) {
 		if (session->session_id == session_id) {
-			read_unlock_bh(&tunnel->hlist_lock);
+			read_unlock_bh(&tunnel->hash_lock);
 			return session;
 		}
 	}
-	read_unlock_bh(&tunnel->hlist_lock);
+	read_unlock_bh(&tunnel->hash_lock);
 
 	return NULL;
 }
@@ -238,17 +218,15 @@ struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth)
 	struct l2tp_session *session;
 	int count = 0;
 
-	read_lock_bh(&tunnel->hlist_lock);
-	for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
-		hlist_for_each_entry(session, walk, &tunnel->session_hlist[hash], hlist) {
-			if (++count > nth) {
-				read_unlock_bh(&tunnel->hlist_lock);
-				return session;
-			}
+	read_lock_bh(&tunnel->hash_lock);
+	hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+		if (++count > nth) {
+			read_unlock_bh(&tunnel->hash_lock);
+			return session;
 		}
 	}
 
-	read_unlock_bh(&tunnel->hlist_lock);
+	read_unlock_bh(&tunnel->hash_lock);
 
 	return NULL;
 }
@@ -265,12 +243,10 @@ struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname)
 	struct l2tp_session *session;
 
 	rcu_read_lock_bh();
-	for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++) {
-		hlist_for_each_entry_rcu(session, walk, &pn->l2tp_session_hlist[hash], global_hlist) {
-			if (!strcmp(session->ifname, ifname)) {
-				rcu_read_unlock_bh();
-				return session;
-			}
+	hash_for_each_rcu(pn->l2tp_session_hash, hash, walk, session, global_hlist) {
+		if (!strcmp(session->ifname, ifname)) {
+			rcu_read_unlock_bh();
+			return session;
 		}
 	}
 
@@ -1272,7 +1248,7 @@ end:
  */
 static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 {
-	int hash;
+	int hash, found = 0;
 	struct hlist_node *walk;
 	struct hlist_node *tmp;
 	struct l2tp_session *session;
@@ -1282,16 +1258,14 @@ static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 	l2tp_info(tunnel, L2TP_MSG_CONTROL, "%s: closing all sessions...\n",
 		  tunnel->name);
 
-	write_lock_bh(&tunnel->hlist_lock);
-	for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
-again:
-		hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
-			session = hlist_entry(walk, struct l2tp_session, hlist);
-
+	write_lock_bh(&tunnel->hash_lock);
+	do {
+		found = 0;
+		hash_for_each_safe(tunnel->session_hash, hash, walk, tmp, session, hlist) {
 			l2tp_info(session, L2TP_MSG_CONTROL,
 				  "%s: closing session\n", session->name);
 
-			hlist_del_init(&session->hlist);
+			hash_del(&session->hlist);
 
 			/* Since we should hold the sock lock while
 			 * doing any unbinding, we need to release the
@@ -1302,14 +1276,14 @@ again:
 			if (session->ref != NULL)
 				(*session->ref)(session);
 
-			write_unlock_bh(&tunnel->hlist_lock);
+			write_unlock_bh(&tunnel->hash_lock);
 
 			if (tunnel->version != L2TP_HDR_VER_2) {
 				struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
 
-				spin_lock_bh(&pn->l2tp_session_hlist_lock);
-				hlist_del_init_rcu(&session->global_hlist);
-				spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+				spin_lock_bh(&pn->l2tp_session_hash_lock);
+				hash_del_rcu(&session->global_hlist);
+				spin_unlock_bh(&pn->l2tp_session_hash_lock);
 				synchronize_rcu();
 			}
 
@@ -1319,17 +1293,17 @@ again:
 			if (session->deref != NULL)
 				(*session->deref)(session);
 
-			write_lock_bh(&tunnel->hlist_lock);
+			write_lock_bh(&tunnel->hash_lock);
 
 			/* Now restart from the beginning of this hash
 			 * chain.  We always remove a session from the
 			 * list so we are guaranteed to make forward
 			 * progress.
 			 */
-			goto again;
+			found = 1;
 		}
-	}
-	write_unlock_bh(&tunnel->hlist_lock);
+	} while (found);
+	write_unlock_bh(&tunnel->hash_lock);
 }
 
 /* Really kill the tunnel.
@@ -1576,7 +1550,7 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
 
 	tunnel->magic = L2TP_TUNNEL_MAGIC;
 	sprintf(&tunnel->name[0], "tunl %u", tunnel_id);
-	rwlock_init(&tunnel->hlist_lock);
+	rwlock_init(&tunnel->hash_lock);
 
 	/* The net we belong to */
 	tunnel->l2tp_net = net;
@@ -1613,6 +1587,8 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
 
 	/* Add tunnel to our list */
 	INIT_LIST_HEAD(&tunnel->list);
+
+	hash_init(tunnel->session_hash);
 	atomic_inc(&l2tp_tunnel_count);
 
 	/* Bump the reference count. The tunnel context is deleted
@@ -1677,17 +1653,17 @@ void l2tp_session_free(struct l2tp_session *session)
 		BUG_ON(tunnel->magic != L2TP_TUNNEL_MAGIC);
 
 		/* Delete the session from the hash */
-		write_lock_bh(&tunnel->hlist_lock);
-		hlist_del_init(&session->hlist);
-		write_unlock_bh(&tunnel->hlist_lock);
+		write_lock_bh(&tunnel->hash_lock);
+		hash_del(&session->hlist);
+		write_unlock_bh(&tunnel->hash_lock);
 
 		/* Unlink from the global hash if not L2TPv2 */
 		if (tunnel->version != L2TP_HDR_VER_2) {
 			struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
 
-			spin_lock_bh(&pn->l2tp_session_hlist_lock);
-			hlist_del_init_rcu(&session->global_hlist);
-			spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+			spin_lock_bh(&pn->l2tp_session_hash_lock);
+			hash_del_rcu(&session->global_hlist);
+			spin_unlock_bh(&pn->l2tp_session_hash_lock);
 			synchronize_rcu();
 		}
 
@@ -1800,19 +1776,17 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn
 		sock_hold(tunnel->sock);
 
 		/* Add session to the tunnel's hash list */
-		write_lock_bh(&tunnel->hlist_lock);
-		hlist_add_head(&session->hlist,
-			       l2tp_session_id_hash(tunnel, session_id));
-		write_unlock_bh(&tunnel->hlist_lock);
+		write_lock_bh(&tunnel->hash_lock);
+		hash_add(tunnel->session_hash, &session->hlist, session_id);
+		write_unlock_bh(&tunnel->hash_lock);
 
 		/* And to the global session list if L2TPv3 */
 		if (tunnel->version != L2TP_HDR_VER_2) {
 			struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
 
-			spin_lock_bh(&pn->l2tp_session_hlist_lock);
-			hlist_add_head_rcu(&session->global_hlist,
-					   l2tp_session_id_hash_2(pn, session_id));
-			spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+			spin_lock_bh(&pn->l2tp_session_hash_lock);
+			hash_add(pn->l2tp_session_hash, &session->global_hlist, session_id);
+			spin_unlock_bh(&pn->l2tp_session_hash_lock);
 		}
 
 		/* Ignore management session in session count value */
@@ -1831,15 +1805,13 @@ EXPORT_SYMBOL_GPL(l2tp_session_create);
 static __net_init int l2tp_init_net(struct net *net)
 {
 	struct l2tp_net *pn = net_generic(net, l2tp_net_id);
-	int hash;
 
 	INIT_LIST_HEAD(&pn->l2tp_tunnel_list);
 	spin_lock_init(&pn->l2tp_tunnel_list_lock);
 
-	for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++)
-		INIT_HLIST_HEAD(&pn->l2tp_session_hlist[hash]);
+	hash_init(pn->l2tp_session_hash);
 
-	spin_lock_init(&pn->l2tp_session_hlist_lock);
+	spin_lock_init(&pn->l2tp_session_hash_lock);
 
 	return 0;
 }
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 56d583e..fc58c85 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -11,17 +11,17 @@
 #ifndef _L2TP_CORE_H_
 #define _L2TP_CORE_H_
 
+#include <linux/hashtable.h>
+
 /* Just some random numbers */
 #define L2TP_TUNNEL_MAGIC	0x42114DDA
 #define L2TP_SESSION_MAGIC	0x0C04EB7D
 
 /* Per tunnel, session hash table size */
 #define L2TP_HASH_BITS	4
-#define L2TP_HASH_SIZE	(1 << L2TP_HASH_BITS)
 
 /* System-wide, session hash table size */
 #define L2TP_HASH_BITS_2	8
-#define L2TP_HASH_SIZE_2	(1 << L2TP_HASH_BITS_2)
 
 /* Debug message categories for the DEBUG socket option */
 enum {
@@ -164,8 +164,15 @@ struct l2tp_tunnel_cfg {
 struct l2tp_tunnel {
 	int			magic;		/* Should be L2TP_TUNNEL_MAGIC */
 	struct rcu_head rcu;
-	rwlock_t		hlist_lock;	/* protect session_hlist */
-	struct hlist_head	session_hlist[L2TP_HASH_SIZE];
+	rwlock_t		hash_lock;	/* protect session_hash */
+/*
+ * Session hash list.
+ * The session_id SHOULD be random according to RFC2661, but several
+ * L2TP implementations (Cisco and Microsoft) use incrementing
+ * session_ids.  So we do a real hash on the session_id, rather than a
+ * simple bitmask.
+*/
+	DECLARE_HASHTABLE(session_hash, L2TP_HASH_BITS);
 						/* hashed list of sessions,
 						 * hashed by id */
 	u32			tunnel_id;
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index c3813bc..655f1fa 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -105,21 +105,16 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
 	int session_count = 0;
 	int hash;
 	struct hlist_node *walk;
-	struct hlist_node *tmp;
+	struct l2tp_session *session;
 
-	read_lock_bh(&tunnel->hlist_lock);
-	for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
-		hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
-			struct l2tp_session *session;
+	read_lock_bh(&tunnel->hash_lock);
+	hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+		if (session->session_id == 0)
+			continue;
 
-			session = hlist_entry(walk, struct l2tp_session, hlist);
-			if (session->session_id == 0)
-				continue;
-
-			session_count++;
-		}
+		session_count++;
 	}
-	read_unlock_bh(&tunnel->hlist_lock);
+	read_unlock_bh(&tunnel->hash_lock);
 
 	seq_printf(m, "\nTUNNEL %u peer %u", tunnel->tunnel_id, tunnel->peer_tunnel_id);
 	if (tunnel->sock) {
-- 
1.8.0
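
The comments moved by this patch say that session IDs get a real hash rather
than a simple bitmask. A small userspace sketch of the difference follows. A
mask keeps only the low bits, so IDs that share their low bits all collide,
while a multiplicative hash also mixes in the high bits. The multiplier and
the helper names bucket_mask, bucket_mult and buckets_used are assumptions
made for the example, and the IDs spaced 16 apart are a made-up allocation
pattern.

```c
#include <assert.h>
#include <stdint.h>

#define L2TP_HASH_BITS 4

/* Simple bitmask: keeps only the low L2TP_HASH_BITS bits of the id. */
static unsigned int bucket_mask(uint32_t id)
{
	return id & ((1u << L2TP_HASH_BITS) - 1);
}

/* Multiplicative hash: multiply, then keep the top L2TP_HASH_BITS bits. */
static unsigned int bucket_mult(uint32_t id)
{
	return (uint32_t)(id * 0x61C88647u) >> (32 - L2TP_HASH_BITS);
}

/* Count the distinct buckets hit by start, start + stride, ... (n ids). */
static unsigned int buckets_used(unsigned int (*fn)(uint32_t),
				 uint32_t start, uint32_t stride,
				 unsigned int n)
{
	uint32_t seen = 0;	/* one bit per bucket */
	unsigned int used = 0;

	assert(L2TP_HASH_BITS <= 5);	/* bitmap must fit in 32 bits */
	for (unsigned int i = 0; i < n; i++)
		seen |= 1u << fn(start + i * stride);
	for (; seen; seen &= seen - 1)
		used++;
	return used;
}
```

With 16 IDs spaced 16 apart, buckets_used(bucket_mask, 16, 16, 16) reports a
single bucket, whereas bucket_mult spreads the same IDs over several.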

* [PATCH 13/15] net,rds: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
  To: Venkat Venkatsubra, David S. Miller, rds-devel, netdev,
	linux-kernel
  Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>

Switch rds to use the new hashtable implementation. This reduces the amount of
generic unrelated code in rds.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/rds/bind.c       |  20 +++++------
 net/rds/connection.c | 100 ++++++++++++++++++++++-----------------------------
 2 files changed, 53 insertions(+), 67 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 637bde5..a99e524 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -36,16 +36,16 @@
 #include <linux/if_arp.h>
 #include <linux/jhash.h>
 #include <linux/ratelimit.h>
+#include <linux/hashtable.h>
 #include "rds.h"
 
-#define BIND_HASH_SIZE 1024
-static struct hlist_head bind_hash_table[BIND_HASH_SIZE];
+#define BIND_HASH_BITS 10
+static DEFINE_HASHTABLE(bind_hash_table, BIND_HASH_BITS);
 static DEFINE_SPINLOCK(rds_bind_lock);
 
-static struct hlist_head *hash_to_bucket(__be32 addr, __be16 port)
+static u32 rds_hash(__be32 addr, __be16 port)
 {
-	return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
-				  (BIND_HASH_SIZE - 1));
+	return jhash_2words((u32)addr, (u32)port, 0);
 }
 
 static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
@@ -53,12 +53,12 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
 {
 	struct rds_sock *rs;
 	struct hlist_node *node;
-	struct hlist_head *head = hash_to_bucket(addr, port);
+	u32 key = rds_hash(addr, port);
 	u64 cmp;
 	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(rs, node, head, rs_bound_node) {
+	hash_for_each_possible_rcu(bind_hash_table, rs, node, rs_bound_node, key) {
 		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
 		      be16_to_cpu(rs->rs_bound_port);
 
@@ -74,13 +74,13 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
 		 * make sure our addr and port are set before
 		 * we are added to the list, other people
 		 * in rcu will find us as soon as the
-		 * hlist_add_head_rcu is done
+		 * hash_add_rcu is done
 		 */
 		insert->rs_bound_addr = addr;
 		insert->rs_bound_port = port;
 		rds_sock_addref(insert);
 
-		hlist_add_head_rcu(&insert->rs_bound_node, head);
+		hash_add_rcu(bind_hash_table, &insert->rs_bound_node, key);
 	}
 	return NULL;
 }
@@ -152,7 +152,7 @@ void rds_remove_bound(struct rds_sock *rs)
 		  rs, &rs->rs_bound_addr,
 		  ntohs(rs->rs_bound_port));
 
-		hlist_del_init_rcu(&rs->rs_bound_node);
+		hash_del_rcu(&rs->rs_bound_node);
 		rds_sock_put(rs);
 		rs->rs_bound_addr = 0;
 	}
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 9e07c75..a9afcb8 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -34,28 +34,24 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/hashtable.h>
 #include <net/inet_hashtables.h>
 
 #include "rds.h"
 #include "loop.h"
 
 #define RDS_CONNECTION_HASH_BITS 12
-#define RDS_CONNECTION_HASH_ENTRIES (1 << RDS_CONNECTION_HASH_BITS)
-#define RDS_CONNECTION_HASH_MASK (RDS_CONNECTION_HASH_ENTRIES - 1)
 
 /* converting this to RCU is a chore for another day.. */
 static DEFINE_SPINLOCK(rds_conn_lock);
 static unsigned long rds_conn_count;
-static struct hlist_head rds_conn_hash[RDS_CONNECTION_HASH_ENTRIES];
+static DEFINE_HASHTABLE(rds_conn_hash, RDS_CONNECTION_HASH_BITS);
 static struct kmem_cache *rds_conn_slab;
 
-static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
+static unsigned long rds_conn_hashfn(__be32 laddr, __be32 faddr)
 {
 	/* Pass NULL, don't need struct net for hash */
-	unsigned long hash = inet_ehashfn(NULL,
-					  be32_to_cpu(laddr), 0,
-					  be32_to_cpu(faddr), 0);
-	return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK];
+	return inet_ehashfn(NULL,  be32_to_cpu(laddr), 0,  be32_to_cpu(faddr), 0);
 }
 
 #define rds_conn_info_set(var, test, suffix) do {		\
@@ -64,14 +60,14 @@ static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
 } while (0)
 
 /* rcu read lock must be held or the connection spinlock */
-static struct rds_connection *rds_conn_lookup(struct hlist_head *head,
-					      __be32 laddr, __be32 faddr,
+static struct rds_connection *rds_conn_lookup(__be32 laddr, __be32 faddr,
 					      struct rds_transport *trans)
 {
 	struct rds_connection *conn, *ret = NULL;
 	struct hlist_node *pos;
+	unsigned long key = rds_conn_hashfn(laddr, faddr);
 
-	hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
+	hash_for_each_possible_rcu(rds_conn_hash, conn, pos, c_hash_node, key) {
 		if (conn->c_faddr == faddr && conn->c_laddr == laddr &&
 				conn->c_trans == trans) {
 			ret = conn;
@@ -117,13 +113,12 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
 				       int is_outgoing)
 {
 	struct rds_connection *conn, *parent = NULL;
-	struct hlist_head *head = rds_conn_bucket(laddr, faddr);
 	struct rds_transport *loop_trans;
 	unsigned long flags;
 	int ret;
 
 	rcu_read_lock();
-	conn = rds_conn_lookup(head, laddr, faddr, trans);
+	conn = rds_conn_lookup(laddr, faddr, trans);
 	if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
 	    !is_outgoing) {
 		/* This is a looped back IB connection, and we're
@@ -224,13 +219,15 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
 		/* Creating normal conn */
 		struct rds_connection *found;
 
-		found = rds_conn_lookup(head, laddr, faddr, trans);
+		found = rds_conn_lookup(laddr, faddr, trans);
 		if (found) {
 			trans->conn_free(conn->c_transport_data);
 			kmem_cache_free(rds_conn_slab, conn);
 			conn = found;
 		} else {
-			hlist_add_head_rcu(&conn->c_hash_node, head);
+			unsigned long key = rds_conn_hashfn(laddr, faddr);
+
+			hash_add_rcu(rds_conn_hash, &conn->c_hash_node, key);
 			rds_cong_add_conn(conn);
 			rds_conn_count++;
 		}
@@ -303,7 +300,7 @@ void rds_conn_shutdown(struct rds_connection *conn)
 	 * conn - the reconnect is always triggered by the active peer. */
 	cancel_delayed_work_sync(&conn->c_conn_w);
 	rcu_read_lock();
-	if (!hlist_unhashed(&conn->c_hash_node)) {
+	if (hash_hashed(&conn->c_hash_node)) {
 		rcu_read_unlock();
 		rds_queue_reconnect(conn);
 	} else {
@@ -329,7 +326,7 @@ void rds_conn_destroy(struct rds_connection *conn)
 
 	/* Ensure conn will not be scheduled for reconnect */
 	spin_lock_irq(&rds_conn_lock);
-	hlist_del_init_rcu(&conn->c_hash_node);
+	hash_del(&conn->c_hash_node);
 	spin_unlock_irq(&rds_conn_lock);
 	synchronize_rcu();
 
@@ -375,7 +372,6 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,
 				  struct rds_info_lengths *lens,
 				  int want_send)
 {
-	struct hlist_head *head;
 	struct hlist_node *pos;
 	struct list_head *list;
 	struct rds_connection *conn;
@@ -388,27 +384,24 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,
 
 	rcu_read_lock();
 
-	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
-	     i++, head++) {
-		hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
-			if (want_send)
-				list = &conn->c_send_queue;
-			else
-				list = &conn->c_retrans;
-
-			spin_lock_irqsave(&conn->c_lock, flags);
-
-			/* XXX too lazy to maintain counts.. */
-			list_for_each_entry(rm, list, m_conn_item) {
-				total++;
-				if (total <= len)
-					rds_inc_info_copy(&rm->m_inc, iter,
-							  conn->c_laddr,
-							  conn->c_faddr, 0);
-			}
-
-			spin_unlock_irqrestore(&conn->c_lock, flags);
+	hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+		if (want_send)
+			list = &conn->c_send_queue;
+		else
+			list = &conn->c_retrans;
+
+		spin_lock_irqsave(&conn->c_lock, flags);
+
+		/* XXX too lazy to maintain counts.. */
+		list_for_each_entry(rm, list, m_conn_item) {
+			total++;
+			if (total <= len)
+				rds_inc_info_copy(&rm->m_inc, iter,
+						  conn->c_laddr,
+						  conn->c_faddr, 0);
 		}
+
+		spin_unlock_irqrestore(&conn->c_lock, flags);
 	}
 	rcu_read_unlock();
 
@@ -438,7 +431,6 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 			  size_t item_len)
 {
 	uint64_t buffer[(item_len + 7) / 8];
-	struct hlist_head *head;
 	struct hlist_node *pos;
 	struct rds_connection *conn;
 	size_t i;
@@ -448,23 +440,19 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 	lens->nr = 0;
 	lens->each = item_len;
 
-	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
-	     i++, head++) {
-		hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
-
-			/* XXX no c_lock usage.. */
-			if (!visitor(conn, buffer))
-				continue;
-
-			/* We copy as much as we can fit in the buffer,
-			 * but we count all items so that the caller
-			 * can resize the buffer. */
-			if (len >= item_len) {
-				rds_info_copy(iter, buffer, item_len);
-				len -= item_len;
-			}
-			lens->nr++;
+	hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+		/* XXX no c_lock usage.. */
+		if (!visitor(conn, buffer))
+			continue;
+
+		/* We copy as much as we can fit in the buffer,
+		 * but we count all items so that the caller
+		 * can resize the buffer. */
+		if (len >= item_len) {
+			rds_info_copy(iter, buffer, item_len);
+			len -= item_len;
 		}
+		lens->nr++;
 	}
 	rcu_read_unlock();
 }
@@ -525,8 +513,6 @@ void rds_conn_exit(void)
 {
 	rds_loop_exit();
 
-	WARN_ON(!hlist_empty(rds_conn_hash));
-
 	kmem_cache_destroy(rds_conn_slab);
 
 	rds_info_deregister_func(RDS_INFO_CONNECTIONS, rds_conn_info);
-- 
1.8.0

^ permalink raw reply related

* [PATCH 14/15] openvswitch: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC (permalink / raw)
  To: Jesse Gross, David S. Miller, dev, netdev, linux-kernel; +Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>

Switch openvswitch to use the new hashtable implementation. This reduces the
amount of generic unrelated code in openvswitch.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
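
For readers unfamiliar with the pattern, here is a minimal userspace sketch of
a statically sized, chained hash table keyed by a name hash. It is an
illustration only, not the kernel implementation: struct port, name_hash(),
port_add() and port_find() are hypothetical names, the djb2 hash stands in for
full_name_hash(), and the simple mask is an assumption (the kernel macros
reduce the key to a bucket index in their own way).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_BITS 3			/* 1 << 3 = 8 buckets */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hypothetical entry type; 'next' chains entries within one bucket. */
struct port {
	const char *name;
	struct port *next;
};

/* Statically sized table: no allocation or error path at init time. */
static struct port *table[TABLE_SIZE];

/* Hypothetical string hash (djb2), standing in for full_name_hash(). */
unsigned int name_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Insert at the head of the bucket selected by the name hash. */
void port_add(struct port *p)
{
	unsigned int b;

	assert(p != NULL && p->name != NULL);
	b = name_hash(p->name) & (TABLE_SIZE - 1);
	p->next = table[b];
	table[b] = p;
}

/* Walk only the one bucket the name can live in; compare to resolve
 * collisions. Returns NULL when the name is not present. */
struct port *port_find(const char *name)
{
	unsigned int b = name_hash(name) & (TABLE_SIZE - 1);
	struct port *p;

	for (p = table[b]; p; p = p->next)
		if (strcmp(p->name, name) == 0)
			return p;
	return NULL;
}
```

The lookup still has to compare the full key, because several names can land
in the same bucket.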

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/openvswitch/vport.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 70af0be..a946529 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -28,6 +28,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/compat.h>
 #include <net/net_namespace.h>
+#include <linux/hashtable.h>
 
 #include "datapath.h"
 #include "vport.h"
@@ -41,8 +42,8 @@ static const struct vport_ops *vport_ops_list[] = {
 };
 
 /* Protected by RCU read lock for reading, RTNL lock for writing. */
-static struct hlist_head *dev_table;
-#define VPORT_HASH_BUCKETS 1024
+#define VPORT_HASH_BITS 10
+static DEFINE_HASHTABLE(dev_table, VPORT_HASH_BITS);
 
 /**
  *	ovs_vport_init - initialize vport subsystem
@@ -51,11 +52,6 @@ static struct hlist_head *dev_table;
  */
 int ovs_vport_init(void)
 {
-	dev_table = kzalloc(VPORT_HASH_BUCKETS * sizeof(struct hlist_head),
-			    GFP_KERNEL);
-	if (!dev_table)
-		return -ENOMEM;
-
 	return 0;
 }
 
@@ -66,13 +62,6 @@ int ovs_vport_init(void)
  */
 void ovs_vport_exit(void)
 {
-	kfree(dev_table);
-}
-
-static struct hlist_head *hash_bucket(struct net *net, const char *name)
-{
-	unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
-	return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
 }
 
 /**
@@ -84,13 +73,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
  */
 struct vport *ovs_vport_locate(struct net *net, const char *name)
 {
-	struct hlist_head *bucket = hash_bucket(net, name);
 	struct vport *vport;
 	struct hlist_node *node;
+	int key = full_name_hash(name, strlen(name));
 
-	hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
-		if (!strcmp(name, vport->ops->get_name(vport)) &&
-		    net_eq(ovs_dp_get_net(vport->dp), net))
+	hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
+		if (!strcmp(name, vport->ops->get_name(vport)))
 			return vport;
 
 	return NULL;
@@ -174,7 +162,8 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
 
 	for (i = 0; i < ARRAY_SIZE(vport_ops_list); i++) {
 		if (vport_ops_list[i]->type == parms->type) {
-			struct hlist_head *bucket;
+			int key;
+			const char *name;
 
 			vport = vport_ops_list[i]->create(parms);
 			if (IS_ERR(vport)) {
@@ -182,9 +171,9 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
 				goto out;
 			}
 
-			bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-					     vport->ops->get_name(vport));
-			hlist_add_head_rcu(&vport->hash_node, bucket);
+			name = vport->ops->get_name(vport);
+			key = full_name_hash(name, strlen(name));
+			hash_add_rcu(dev_table, &vport->hash_node, key);
 			return vport;
 		}
 	}
@@ -225,7 +214,7 @@ void ovs_vport_del(struct vport *vport)
 {
 	ASSERT_RTNL();
 
-	hlist_del_rcu(&vport->hash_node);
+	hash_del_rcu(&vport->hash_node);
 
 	vport->ops->destroy(vport);
 }
-- 
1.8.0

^ permalink raw reply related

* Re: Do I need to skb_put() Ethernet frames to a minimum of 60 bytes?
From: Arvid Brodin @ 2012-12-17 15:15 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: Ben Hutchings, netdev@vger.kernel.org, Eric Dumazet,
	linux-arm-kernel
In-Reply-To: <50CF216F.2010107@atmel.com>

On 2012-12-17 14:43, Nicolas Ferre wrote:
> On 08/21/2012 07:34 PM, Arvid Brodin :
>> On 2012-08-14 22:35, Ben Hutchings wrote:
>>> On Tue, 2012-08-14 at 18:53 +0000, Arvid Brodin wrote:
>>>> Hi,
>>>>
>>>> If I create an sk_buff with a payload of less than 28 bytes (ethheader + data),
>>>> and send it using the cadence/macb (Ethernet) driver, I get
>>>>
>>>> eth0: TX underrun, resetting buffers
>>>>
>>>> Now I know the minimum Ethernet frame size is 64 bytes (including the 4-byte
>>>> FCS), but whose responsibility is it to pad the frame to this size if necessary?
>>>> Mine or the driver's - i.e. should I just skb_put() to the minimum size or
>>>> should I report the underrun as a driver bug?
>>>
>>> If the hardware doesn't pad frames automatically then it's the driver's
>>> responsibility to do so.
>>>
>>
>> Nicolas, can you take a look at this? At the moment I'm using the following change
>> in macb.c to avoid TX underruns on short packages:
>>
>> --- a/drivers/net/ethernet/cadence/macb.c	2012-05-04 19:14:41.927719667 +0200
>> +++ b/drivers/net/ethernet/cadence/macb.c	2012-08-21 19:22:40.063739049 +0200
>> @@ -618,6 +618,7 @@ static void macb_poll_controller(struct
>>  }
>>  #endif
>>
>> +#define MIN_ETHFRAME_LEN	60
>>  static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>  {
>>  	struct macb *bp = netdev_priv(dev);
>> @@ -635,6 +636,12 @@ static int macb_start_xmit(struct sk_buf
>>  	printk("\n");
>>  #endif
>>
>> +	if (skb->len < MIN_ETHFRAME_LEN) {
>> +		/* Pad skb to minium Ethernet frame size */
>> +		if (skb_tailroom(skb) >= MIN_ETHFRAME_LEN - skb->len)
>> +			memset(skb_put(skb, MIN_ETHFRAME_LEN - skb->len), 0,
>> +						MIN_ETHFRAME_LEN - skb->len);
>> +	}
>>  	len = skb->len;
>>  	spin_lock_irqsave(&bp->lock, flags);
>>
>>
>> ... but as you can see this is limited to linear skbs which have been allocated with
>> enough tailroom. Perhaps there are better ways to fix the problem? (Maybe the hardware
>> is actually doing the padding already and the problem has to do with the way the DMA
>> transfer is set up?)
> 
> I come back to this issue. It seems to me that the macb Cadence IP
> automatically pads packets that are too short. That is the usual behavior
> unless you specify otherwise in the CTRL register embedded in the tx
> descriptor. I also verified this with wireshark on both ICMP and UDP
> packets.
> 
> The error that you are experiencing is on at91sam9260 or at91sam9263
> SoCs, am I right?

No, this was on an AVR32 AP7000 board.

I believe this is what I did to solve the issue (patch for linux-2.6.37):

diff -Nurp linux-2.6.37-001-bsa400/drivers/net//macb.c
linux-2.6.37-macb-hsr/drivers/net//macb.c
--- linux-2.6.37-orig/drivers/net//macb.c	2012-09-16 22:41:02.746754672 +0200
+++ linux-2.6.37-macb/drivers/net//macb.c	2012-09-17 00:34:35.161389720 +0200
@@ -376,8 +379,9 @@ static void macb_tx(struct macb *bp)

 			rmb();

-			dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
-							 DMA_TO_DEVICE);
+			dma_unmap_single(&bp->pdev->dev, rp->mapping,
+					 max(skb->len, (unsigned int) ETH_ZLEN),
+					 DMA_TO_DEVICE);
 			rp->skb = NULL;
 			dev_kfree_skb_irq(skb);
 		}
@@ -413,7 +417,8 @@ static void macb_tx(struct macb *bp)

 		dev_dbg(&bp->pdev->dev, "skb %u (data %p) TX complete\n",
 			tail, skb->data);
-		dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
+		dma_unmap_single(&bp->pdev->dev, rp->mapping,
+				 max(skb->len, (unsigned int) ETH_ZLEN),
 				 DMA_TO_DEVICE);
 		bp->stats.tx_packets++;
 		bp->stats.tx_bytes += skb->len;
@@ -675,7 +680,10 @@ static int macb_start_xmit(struct sk_buf
 	printk("\n");
 #endif

-	len = skb->len;
+	if (skb_padto(skb, ETH_ZLEN) != 0)
+		return NETDEV_TX_OK; /* There is no NETDEV_TX_FAIL... */
+
+	len = max(skb->len, (unsigned int) ETH_ZLEN);
 	spin_lock_irqsave(&bp->lock, flags);

 	/* This is a hard error, log it. */
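
Outside the kernel, the padding step can be sketched roughly as follows. This
is only an illustration of the idea behind skb_padto(): pad_frame() and
MIN_FRAME_LEN are hypothetical names, and a flat buffer with a known capacity
is assumed instead of an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60	/* same value as the kernel's ETH_ZLEN */

/* Hypothetical helper: zero-pad a frame in place up to MIN_FRAME_LEN.
 * Returns the length to hand to the hardware, or 0 if the buffer is too
 * small to hold a padded frame (the caller would then drop the frame). */
size_t pad_frame(unsigned char *frame, size_t len, size_t capacity)
{
	assert(len <= capacity);

	if (len >= MIN_FRAME_LEN)
		return len;		/* already long enough */
	if (capacity < MIN_FRAME_LEN)
		return 0;		/* no room for the padding */

	/* Zero the padding so no stale memory leaks onto the wire. */
	memset(frame + len, 0, MIN_FRAME_LEN - len);
	return MIN_FRAME_LEN;
}
```

The returned length, not the original one, is what the DMA mapping and
unmapping would have to use, which is why the patch above changes both paths.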


-- 
Arvid Brodin | Consultant (Linux)
XDIN AB | Knarrarnäsgatan 7 | SE-164 40 Kista | Sweden | xdin.com

^ permalink raw reply

* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Shawn Guo @ 2012-12-17 15:14 UTC (permalink / raw)
  To: Frank Li
  Cc: Sascha Hauer, Frank Li, lznua, richardcochran, linux-arm-kernel,
	netdev, davem
In-Reply-To: <CAHrpEqTVuSR_-Tpdzb98=VJbg7grSFvSQ9xA6mPsHpGb7RvNCg@mail.gmail.com>

Hi Sascha,

On Mon, Dec 17, 2012 at 10:48:31PM +0800, Frank Li wrote:
> > I don't know how to continue from here. Since the whole patch doesn't
> > seem to have been reviewed very much I tend to say we should revert it for now and
> > let Frank redo it for the next merge window.
> >
> > Other opinions?
> 
> Can we just disable CONFIG_FEC_PTP by default instead of reverting the whole patch?
> 
To be clear, the following is what Frank meant.  Since Frank is out of
office for some time, I will send this immediate fix to David, if you
are fine with it.

Shawn

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index 5ba6e1c..d1edb2e 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -96,7 +96,6 @@ config FEC_PTP
        bool "PTP Hardware Clock (PHC)"
        depends on FEC && ARCH_MXC
        select PTP_1588_CLOCK
-       default y if SOC_IMX6Q
        --help---
          Say Y here if you want to use PTP Hardware Clock (PHC) in the
          driver.  Only the basic clock operations have been implemented.

^ permalink raw reply related

* Re: [PATCH] ipv6: Fix Makefile offload objects
From: Vlad Yasevich @ 2012-12-17 15:40 UTC (permalink / raw)
  To: Simon Arlott; +Cc: David Miller, Linux Kernel Mailing List, netdev
In-Reply-To: <50CDFB36.1020604@simon.arlott.org.uk>

On 12/16/2012 11:47 AM, Simon Arlott wrote:
> The following commit breaks IPv6 TCP transmission for me:
> 	Commit 75fe83c32248d99e6d5fe64155e519b78bb90481
> 	Author: Vlad Yasevich <vyasevic@redhat.com>
> 	Date:   Fri Nov 16 09:41:21 2012 +0000
> 	ipv6: Preserve ipv6 functionality needed by NET
>
> This patch fixes the typo "ipv6_offload" which should be
> "ipv6-offload".
>
> I don't know why not including the offload modules should
> break TCP. Disabling all offload options on the NIC didn't
> help. Outgoing pulseaudio traffic kept stalling.

Did you restart your application to restart the socket?

The trouble is that when GSO is turned on, we try to perform
it on output.  If the output path can't find the GSO handler
for the protocol (in your case TCP over IPv6), it drops the
packet.  This causes TCP to eventually retransmit without GSO.

If you were in a VM, GSO is always used even though you might
disable it on the interface with ethtool.  The only way I've been
able to disable it when using the virtio driver is by passing the
gso=0 parameter to the module.

-vlad

>
> Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
> ---
>   net/ipv6/Makefile |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index 2068ac4..4ea2448 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -41,6 +41,6 @@ obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o
>   obj-$(CONFIG_IPV6_GRE) += ip6_gre.o
>
>   obj-y += addrconf_core.o exthdrs_core.o
> -obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6_offload)
> +obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload)
>
>   obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o
>

^ permalink raw reply

* [PATCH] atm: use scnprintf() instead of sprintf()
From: chas williams - CONTRACTOR @ 2012-12-17 16:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Chen Gang


As reported by Chen Gang <gang.chen@asianux.com>, we should ensure there
is enough space when formatting the sysfs buffers.
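
As a rough userspace illustration of the behaviour relied on here (a sketch,
not the kernel code): bounded_printf() below is a hypothetical stand-in for
scnprintf(), built on vsnprintf(). It returns the number of characters
actually stored, so a running count can never move past the end of the
buffer. format_pair() is likewise a hypothetical caller that accumulates the
count the way show_atmaddress() does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually stored in buf, excluding the trailing NUL. */
int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL || size == 0);

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0 || size == 0)
		return 0;		/* error, or nothing could be stored */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* Append two fields while tracking how much of the buffer is used. */
int format_pair(char *buf, size_t size, int a, int b)
{
	int count = 0;

	count += bounded_printf(buf + count, size - count, "%02x", a);
	count += bounded_printf(buf + count, size - count, ".%02x\n", b);
	return count;
}
```

With plain snprintf() the return value is the length that would have been
written, so adding it to a running offset could step outside the buffer after
a truncation.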

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
---
 net/atm/atm_sysfs.c |   40 +++++++++++++++-------------------------
 1 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/net/atm/atm_sysfs.c b/net/atm/atm_sysfs.c
index f49da58..350bf62 100644
--- a/net/atm/atm_sysfs.c
+++ b/net/atm/atm_sysfs.c
@@ -14,49 +14,45 @@ static ssize_t show_type(struct device *cdev,
 			 struct device_attribute *attr, char *buf)
 {
 	struct atm_dev *adev = to_atm_dev(cdev);
-	return sprintf(buf, "%s\n", adev->type);
+
+	return scnprintf(buf, PAGE_SIZE, "%s\n", adev->type);
 }
 
 static ssize_t show_address(struct device *cdev,
 			    struct device_attribute *attr, char *buf)
 {
-	char *pos = buf;
 	struct atm_dev *adev = to_atm_dev(cdev);
-	int i;
-
-	for (i = 0; i < (ESI_LEN - 1); i++)
-		pos += sprintf(pos, "%02x:", adev->esi[i]);
-	pos += sprintf(pos, "%02x\n", adev->esi[i]);
 
-	return pos - buf;
+	return scnprintf(buf, PAGE_SIZE, "%pM\n", adev->esi);
 }
 
 static ssize_t show_atmaddress(struct device *cdev,
 			       struct device_attribute *attr, char *buf)
 {
 	unsigned long flags;
-	char *pos = buf;
 	struct atm_dev *adev = to_atm_dev(cdev);
 	struct atm_dev_addr *aaddr;
 	int bin[] = { 1, 2, 10, 6, 1 }, *fmt = bin;
-	int i, j;
+	int i, j, count = 0;
 
 	spin_lock_irqsave(&adev->lock, flags);
 	list_for_each_entry(aaddr, &adev->local, entry) {
 		for (i = 0, j = 0; i < ATM_ESA_LEN; ++i, ++j) {
 			if (j == *fmt) {
-				pos += sprintf(pos, ".");
+				count += scnprintf(buf + count,
+						   PAGE_SIZE - count, ".");
 				++fmt;
 				j = 0;
 			}
-			pos += sprintf(pos, "%02x",
-				       aaddr->addr.sas_addr.prv[i]);
+			count += scnprintf(buf + count,
+					   PAGE_SIZE - count, "%02x",
+					   aaddr->addr.sas_addr.prv[i]);
 		}
-		pos += sprintf(pos, "\n");
+		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
 	}
 	spin_unlock_irqrestore(&adev->lock, flags);
 
-	return pos - buf;
+	return count;
 }
 
 static ssize_t show_atmindex(struct device *cdev,
@@ -64,25 +60,21 @@ static ssize_t show_atmindex(struct device *cdev,
 {
 	struct atm_dev *adev = to_atm_dev(cdev);
 
-	return sprintf(buf, "%d\n", adev->number);
+	return scnprintf(buf, PAGE_SIZE, "%d\n", adev->number);
 }
 
 static ssize_t show_carrier(struct device *cdev,
 			    struct device_attribute *attr, char *buf)
 {
-	char *pos = buf;
 	struct atm_dev *adev = to_atm_dev(cdev);
 
-	pos += sprintf(pos, "%d\n",
-		       adev->signal == ATM_PHY_SIG_LOST ? 0 : 1);
-
-	return pos - buf;
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+			 adev->signal == ATM_PHY_SIG_LOST ? 0 : 1);
 }
 
 static ssize_t show_link_rate(struct device *cdev,
 			      struct device_attribute *attr, char *buf)
 {
-	char *pos = buf;
 	struct atm_dev *adev = to_atm_dev(cdev);
 	int link_rate;
 
@@ -100,9 +92,7 @@ static ssize_t show_link_rate(struct device *cdev,
 	default:
 		link_rate = adev->link_rate * 8 * 53;
 	}
-	pos += sprintf(pos, "%d\n", link_rate);
-
-	return pos - buf;
+	return scnprintf(buf, PAGE_SIZE, "%d\n", link_rate);
 }
 
 static DEVICE_ATTR(address, S_IRUGO, show_address, NULL);
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH iproute2 6/6] ip/link_iptnl: fix indentation
From: Stephen Hemminger @ 2012-12-17 16:10 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <50CEDB8A.40303@6wind.com>

On Mon, 17 Dec 2012 09:44:58 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> Le 14/12/2012 19:02, Stephen Hemminger a écrit :
> > On Thu, 13 Dec 2012 14:42:54 +0100
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >
> >> Use tabs instead of space when possible.
> >>
> >> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >
> > Thanks applied all these.
> >
> Two patches are missing in your tree:
> 1/6 ip: update man pages and usage() for 'ip monitor'
> 2/6 ip: add man pages for netconf
> 
> Should I resend them?

yes, probably got lost in the merge day.


^ permalink raw reply

* Re: RFC  [PATCH] iproute2:  temporary solution to fix xt breakage
From: Stephen Hemminger @ 2012-12-17 16:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50CF1071.1050405@mojatatu.com>

On Mon, 17 Dec 2012 07:30:41 -0500
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On 12-12-16 03:41 PM, Jamal Hadi Salim wrote:
> >
> > There is an "intermediate solution" from Hasan which doesn't require
> > the kernel change. It changes the kernel endpoint to "ipt". I am
> > conflicted because it is a quick hack while otoh forcing people to
> > upgrade kernel is a usability issue.
> >
> 
> 
> Attached. Author is Hasan - I didn't sign it because I am looking for
> feedback and I find it distasteful but it solves the problem.
> This is needed until we have a proper fix in the kernel propagated.
> Once that kernel change is ubiquitous this change is noise and a
> maintenance pain. I am making it hard to even turn it on
> (i.e. someone knowledgeable will have to compile with CONFIG_XT_HACK)
> 
> cheers,
> jamal
> 
> 

Maybe xtables should have a stable API/ABI and use shim routines there?

^ permalink raw reply

* Re: [PATCH] add a `make dist` helper
From: Stephen Hemminger @ 2012-12-17 16:13 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Stephen Hemminger, netdev
In-Reply-To: <201212161657.15449.vapier@gentoo.org>

On Sun, 16 Dec 2012 16:57:14 -0500
Mike Frysinger <vapier@gentoo.org> wrote:

> On Friday 14 December 2012 12:09:35 Stephen Hemminger wrote:
> > On Thu, 13 Dec 2012 23:16:10 -0800 Stephen Hemminger wrote:
> > > I appreciate the effort but there are a number of more steps to doing a
> > > release and I need to script them all together.
> 
> np
> 
> > The tarballs have been rebased, and I built an iproute2-release script for
> > next time.
> 
> commit it to the tree ? :)
> -mike

It has too many things that are unique to kernel.org and the package.
Plus at this time it only works for me.

^ permalink raw reply

* Re: [PATCH] ip: add the type 'vxlan' in the output of "ip link help"
From: Stephen Hemminger @ 2012-12-17 16:16 UTC (permalink / raw)
  To: zwu.kernel; +Cc: netdev, linux-kernel, Zhi Yong Wu
In-Reply-To: <1355588468-4964-1-git-send-email-zwu.kernel@gmail.com>

On Sun, 16 Dec 2012 00:21:08 +0800
zwu.kernel@gmail.com wrote:

> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> 
>   The new type 'vxlan' is added in the output of "ip link help"
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  ip/iplink.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/ip/iplink.c b/ip/iplink.c
> index d73c705..5ff8f85 100644
> --- a/ip/iplink.c
> +++ b/ip/iplink.c
> @@ -84,7 +84,7 @@ void iplink_usage(void)
>  	if (iplink_have_newlink()) {
>  		fprintf(stderr, "\n");
>  		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can |\n");
> -		fprintf(stderr, "          bridge | ipoib | ip6tnl | ipip | sit }\n");
> +		fprintf(stderr, "          bridge | ipoib | ip6tnl | ipip | sit | vxlan }\n");
>  	}
>  	exit(-1);
>  }

Applied

^ permalink raw reply

* [PATCH lksctp-tools] sctp_send: fix msg_control data corruption
From: Daniel Borkmann @ 2012-12-17 16:32 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers, Daniel Borkmann

The byte array outcmsg is allocated on the stack within the if-block
that tests for a valid sctp_sndrcvinfo structure. There, it is assigned
to outmsg.msg_control, which is passed to sendmsg only after the
if-block has been left, when outcmsg is no longer in scope. With this
minimal example, the following is happening:

int main(void)
{
	int fd;
	struct sctp_sndrcvinfo sndinfo;

	fd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
	assert(fd > 0);

	sctp_send(fd, "bla", strlen("bla") + 1, &sndinfo, 0);

	return 0;
}

strace ./a.out before this patch:

  sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"bla\0", 4}],
          msg_controllen=48, {cmsg_len=3364590592, cmsg_level=0x4003e8
          /* SOL_??? */, cmsg_type=, ...}, msg_flags=0}, 0)

  --> cmsg_len corrupted

strace ./a.out after this patch:

  sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"bla\0", 4}],
          msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */,
                              cmsg_type=, ...}, msg_flags=0}, 0)

This has basically been the case since 2005, when it was introduced in
commit 91239acf ("Add sctp_send() API support and testcases"). However,
the bug probably only surfaced because of different compiler behaviour /
optimization (?), since it was not visible on older Linux versions.
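
The corrected pattern can be sketched in isolation as follows. This is an
illustration under assumptions, not the lksctp-tools code: struct demo_info is
a hypothetical stand-in for struct sctp_sndrcvinfo, build_ctrl() is a
hypothetical function, the cmsg level and type are placeholders, and no
sendmsg() call is made. The point is only that the control buffer is declared
at function scope, so it is still live where the message would be sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical ancillary payload standing in for struct sctp_sndrcvinfo. */
struct demo_info {
	unsigned short stream;
	unsigned int ppid;
};

/* Builds the control message and returns the cmsg_len that a later
 * sendmsg() call would see through msg.msg_control (0 if none is set). */
size_t build_ctrl(const struct demo_info *info)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	/* Function scope: outlives the if-block below. The union keeps the
	 * buffer suitably aligned for struct cmsghdr. */
	union {
		char buf[CMSG_SPACE(sizeof(struct demo_info))];
		struct cmsghdr align;
	} ctrl;

	memset(&msg, 0, sizeof(msg));

	if (info) {
		memset(&ctrl, 0, sizeof(ctrl));
		msg.msg_control = ctrl.buf;
		msg.msg_controllen = sizeof(ctrl.buf);

		cmsg = CMSG_FIRSTHDR(&msg);
		assert(cmsg != NULL);
		cmsg->cmsg_level = SOL_SOCKET;	/* placeholder level */
		cmsg->cmsg_type = 0;		/* placeholder type */
		cmsg->cmsg_len = CMSG_LEN(sizeof(*info));
		memcpy(CMSG_DATA(cmsg), info, sizeof(*info));
	}

	/* ctrl is still live here, where sendmsg(fd, &msg, 0) would run. */
	cmsg = CMSG_FIRSTHDR(&msg);
	return cmsg ? (size_t)cmsg->cmsg_len : 0;
}
```

Had ctrl been declared inside the if-block, reading it after the block would
be undefined behaviour, and the compiler would be free to reuse that stack
slot.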

Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 src/lib/sendmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/sendmsg.c b/src/lib/sendmsg.c
index 1de592d..9046174 100644
--- a/src/lib/sendmsg.c
+++ b/src/lib/sendmsg.c
@@ -76,6 +76,7 @@ sctp_send(int s, const void *msg, size_t len,
 {
 	struct msghdr outmsg;
 	struct iovec iov;
+	char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
 
 	outmsg.msg_name = NULL;
 	outmsg.msg_namelen = 0;
@@ -86,7 +87,6 @@ sctp_send(int s, const void *msg, size_t len,
 	outmsg.msg_controllen = 0;
 
 	if (sinfo) {	
-		char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
 		struct cmsghdr *cmsg;
 
 		outmsg.msg_control = outcmsg;
-- 
1.7.11.7

^ permalink raw reply related

* [RESEND PATCH iproute2 2/2] ip: update man pages and usage() for 'ip monitor'
From: Nicolas Dichtel @ 2012-12-17 16:41 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355762487-4082-1-git-send-email-nicolas.dichtel@6wind.com>

Sync with the current code.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/ipmonitor.c        |  5 ++++-
 man/man8/ip-monitor.8 | 15 +++++++++------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 09a339c..a9ff1e8 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -29,7 +29,10 @@ int prefix_banner;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ]\n");
+	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ]\n");
+	fprintf(stderr, "LISTofOBJECTS := link | address | route | mroute | prefix |\n");
+	fprintf(stderr, "                 neigh | netconf\n");
+	fprintf(stderr, "FILE := file FILENAME\n");
 	exit(-1);
 }
 
diff --git a/man/man8/ip-monitor.8 b/man/man8/ip-monitor.8
index 351a744..b07cb0e 100644
--- a/man/man8/ip-monitor.8
+++ b/man/man8/ip-monitor.8
@@ -1,4 +1,4 @@
-.TH IP\-MONITOR 8 "20 Dec 2011" "iproute2" "Linux"
+.TH IP\-MONITOR 8 "13 Dec 2012" "iproute2" "Linux"
 .SH "NAME"
 ip-monitor, rtmon \- state monitoring
 .SH "SYNOPSIS"
@@ -6,8 +6,8 @@ ip-monitor, rtmon \- state monitoring
 .ad l
 .in +8
 .ti -8
-.BR "ip monitor" " [ " all " |"
-.IR LISTofOBJECTS " ]"
+.BR "ip " " [ ip-OPTIONS ] " "monitor" " [ " all " |"
+.IR LISTofOBJECTS " ] [ file " FILENAME " ]
 .sp
 
 .SH DESCRIPTION
@@ -20,12 +20,13 @@ Namely, the
 command is the first in the command line and then the object list follows:
 
 .BR "ip monitor" " [ " all " |"
-.IR LISTofOBJECTS " ]"
+.IR LISTofOBJECTS " ] [ file " FILENAME " ]
 
 .I OBJECT-LIST
 is the list of object types that we want to monitor.
 It may contain
-.BR link ", " address " and " route "."
+.BR link ", " address ", " route ", " mroute ", " prefix ", "
+.BR neigh " and " netconf "."
 If no
 .B file
 argument is given,
@@ -34,7 +35,9 @@ opens RTNETLINK, listens on it and dumps state changes in the format
 described in previous sections.
 
 .P
-If a file name is given, it does not listen on RTNETLINK,
+If a
+.I FILENAME
+is given, it does not listen on RTNETLINK,
 but opens the file containing RTNETLINK messages saved in binary format
 and dumps them.  Such a history file can be generated with the
 .B rtmon
-- 
1.8.0.1

^ permalink raw reply related

* [RESEND PATCH iproute2 1/2] ip: add man pages for netconf
From: Nicolas Dichtel @ 2012-12-17 16:41 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <20121217081026.5acd58a4@nehalam.linuxnetplumber.net>

This patch adds the documentation for the 'ip netconf' command.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 man/man8/Makefile     |  2 +-
 man/man8/ip-netconf.8 | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/ip-netconf.8

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 4bad9d6..d208f3b 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -9,7 +9,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 \
 	ip-addrlabel.8 ip-l2tp.8 \
 	ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 \
 	ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 \
-	ip-tcp_metrics.8
+	ip-tcp_metrics.8 ip-netconf.8
 
 all: $(TARGETS)
 
diff --git a/man/man8/ip-netconf.8 b/man/man8/ip-netconf.8
new file mode 100644
index 0000000..8041ea2
--- /dev/null
+++ b/man/man8/ip-netconf.8
@@ -0,0 +1,36 @@
+.TH IP\-NETCONF 8 "13 Dec 2012" "iproute2" "Linux"
+.SH "NAME"
+ip-netconf \- network configuration monitoring
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.BR "ip " " [ ip-OPTIONS ] " "netconf show" " [ "
+.B dev
+.IR STRING " ]"
+
+.SH DESCRIPTION
+The
+.B ip netconf
+utility can monitor IPv4 and IPv6 parameters (see
+.BR "/proc/sys/net/ipv[4|6]/conf/[all|DEV]/" ")"
+like forwarding, rp_filter
+or mc_forwarding status.
+
+If no interface is specified, the entry
+.B all
+is displayed.
+
+.SS ip netconf show - display network parameters
+
+.TP
+.BI dev " STRING"
+the name of the device to display network parameters.
+
+.SH SEE ALSO
+.br
+.BR ip (8)
+
+.SH AUTHOR
+Original Manpage by Nicolas Dichtel <nicolas.dichtel@6wind.com>
-- 
1.8.0.1

^ permalink raw reply related

* Re: [RESEND PATCH iproute2 2/2] ip: update man pages and usage() for 'ip monitor'
From: Stephen Hemminger @ 2012-12-17 16:48 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev
In-Reply-To: <1355762487-4082-2-git-send-email-nicolas.dichtel@6wind.com>

On Mon, 17 Dec 2012 17:41:27 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> Sync with the current code.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Ok applied

^ permalink raw reply

* Re: [PATCH] netlink: align attributes on 64-bits
From: Nicolas Dichtel @ 2012-12-17 16:53 UTC (permalink / raw)
  To: David Laight; +Cc: tgraf, netdev, davem
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F0@saturn3.aculab.com>

Le 17/12/2012 10:59, David Laight a écrit :
>> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
>> +	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), sizeof(void *)) ? 0 : 4;
>> +
>> +	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen) + align))
>>   		return -EMSGSIZE;
>>
>> +	if (align) {
>> +		/* Goal is to add an attribute with size 4. We know that
>> +		 * NLA_HDRLEN is 4, hence payload is 0.
>> +		 */
>> +		__nla_reserve(skb, 0, 0);
>> +	}
>> +
>
> Shouldn't the size of the dummy parameter be based on the value
> of 'align' - and that be based on the amount of padding needed?
>
Align is 4 or 0. Instead of the comment and 0, I can put 'NLA_HDRLEN - align',
which will always be 0. We made this patch because we don't want to change
values like NLA_HDRLEN, since many user apps have these values/structures
hardcoded.
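
The arithmetic under discussion can be modelled in a few lines. This is a
sketch that assumes the tail pointer is already 4-byte aligned; HDRLEN,
pad_before_attr() and payload_addr() are hypothetical names that mirror the
expression in the patch rather than any kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define HDRLEN 4	/* mirrors NLA_HDRLEN */

/* Mirrors the patch's test: number of bytes taken by the empty padding
 * attribute emitted before the real attribute, given the tail address. */
unsigned int pad_before_attr(uintptr_t tail)
{
	assert(tail % 4 == 0);
	return (tail % 8 == 0) ? 0 : HDRLEN;
}

/* Address at which the real attribute's header would start. */
uintptr_t header_addr(uintptr_t tail)
{
	return tail + pad_before_attr(tail);
}

/* Address at which the real attribute's payload would start. */
uintptr_t payload_addr(uintptr_t tail)
{
	return header_addr(tail) + HDRLEN;
}
```

If this model is accurate, the header of the real attribute always lands on
an 8-byte boundary and its payload starts HDRLEN bytes after that boundary.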

> That aligns the write pointer, what guarantees the alignment of
> the start of the buffer - so that the reader will find aligned data?
As Thomas said, skb->head will be aligned, am I wrong?

>
> What guarantees that the reader will read the data into an
> 8-byte aligned buffer.
>
> There is also the lurking issue of items that require more
> than 8-byte alignment.
> (x86/amd64 requires 16-byte alignment for 16-byte SSE2 regs and
> 32-byte alignment for the AVX regs.)
>
> Will anyone ever want to put such items into a netlink message?
>
> 	David

^ permalink raw reply

* [PATCH v2] netlink: align attributes on 64-bits
From: Nicolas Dichtel @ 2012-12-17 16:49 UTC (permalink / raw)
  To: bhutchings; +Cc: tgraf, netdev, davem, David.Laight, Nicolas Dichtel
In-Reply-To: <1355500160.2626.9.camel@bwh-desktop.uk.solarflarecom.com>

We must ensure that attributes are always aligned on a 64-bit boundary because
some architectures may trap when accessing an unaligned 64-bit value. We do that
by adding attributes of type 0, size 4 (alignment on 32 bits is already done)
when needed. Attribute type 0 should be available and unused in all netlink
families.

Some callers of nlmsg_new() calculate the exact length of the attributes they
want to add to their netlink messages. Because we may add some unexpected
attributes of type 0, we should reserve more room for that.

Note that I made the choice to align all kinds of netlink attributes (even u8,
u16, ...) to simplify the netlink API. Having two sorts of nla_put() functions
would certainly be a source of wrong usage. Moreover, it ensures that all
existing code will be fine.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v2: align attributes on all arch, not only on 64-bits arch

 include/net/netlink.h |  9 +++++++++
 lib/nlattr.c          | 11 ++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 9690b0f..bd9e48f 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -492,6 +492,15 @@ static inline struct nlmsghdr *nlmsg_put_answer(struct sk_buff *skb,
  */
 static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
 {
+	/* Because attributes may be aligned on 64-bits boundary with fake
+	 * attribute (type 0, size 4 (attributes are 32-bits align by default)),
+	 * an exact payload size cannot be calculated. Hence, we need to reserve
+	 * more space for these attributes.
+	 * 128 is arbitrary: it allows to align up to 32 attributes.
+	 */
+	if (payload < NLMSG_DEFAULT_SIZE)
+		payload = min(payload + 128, (size_t)NLMSG_DEFAULT_SIZE);
+
 	return alloc_skb(nlmsg_total_size(payload), flags);
 }
 
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 18eca78..7440a80 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -450,9 +450,18 @@ EXPORT_SYMBOL(__nla_put_nohdr);
  */
 int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
 {
-	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
+	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) ? 0 : 4;
+
+	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen) + align))
 		return -EMSGSIZE;
 
+	if (align) {
+		/* Goal is to add an attribute with size 4. We know that
+		 * NLA_HDRLEN is 4, hence payload is 0.
+		 */
+		__nla_reserve(skb, 0, 0);
+	}
+
 	__nla_put(skb, attrtype, attrlen, data);
 	return 0;
 }
-- 
1.8.0.1

^ permalink raw reply related

* RE: [PATCH v2] netlink: align attributes on 64-bits
From: David Laight @ 2012-12-17 17:06 UTC (permalink / raw)
  To: Nicolas Dichtel, bhutchings; +Cc: tgraf, netdev, davem
In-Reply-To: <1355762980-4285-1-git-send-email-nicolas.dichtel@6wind.com>

>  int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
>  {
> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
> +	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) ? 0 : 4;

I've just realised where you are adding this!
You only want to add padding if the attribute is a single 64-bit item,
not whenever the destination is misaligned.

E.g. what happens if you add a 4-byte item after an 8-byte one?

Are there attributes that consist of a pair of 4-byte values?
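
For illustration, the question above can be worked through with a small
userspace model of the check quoted from the patch. This is only a sketch:
place_attr() and struct placement are hypothetical names, the tail is
modelled as a byte offset from an 8-byte-aligned base rather than a real
skb pointer, and NLA_HDRLEN and NLA_ALIGNTO are both taken to be 4.

```c
#include <assert.h>
#include <stddef.h>

#define NLA_ALIGNTO 4
#define NLA_HDRLEN  4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(size_t)(NLA_ALIGNTO - 1))

/* Where one attribute ends up (hypothetical helper type, not kernel API). */
struct placement {
	size_t pad;       /* bytes taken by the empty padding attribute */
	size_t hdr;       /* offset of the attribute header */
	size_t payload;   /* offset of the attribute payload */
	size_t next_tail; /* tail offset after the attribute is appended */
};

/*
 * Model of the quoted v2 logic: if the tail is not 8-byte aligned, a
 * 4-byte empty attribute is emitted first, then the real attribute.
 */
static struct placement place_attr(size_t tail, size_t attrlen)
{
	struct placement p;

	/* Netlink attributes always start on a 4-byte boundary. */
	assert(tail % NLA_ALIGNTO == 0);

	p.pad = (tail % 8 == 0) ? 0 : 4;
	p.hdr = tail + p.pad;
	p.payload = p.hdr + NLA_HDRLEN;
	p.next_tail = p.hdr + NLA_ALIGN(NLA_HDRLEN + attrlen);
	return p;
}
```

Starting from offset 0, an 8-byte attribute is placed without padding and
moves the tail to 12. A following 4-byte attribute then gets 4 bytes of
padding, so its header lands at 16 and the tail ends at 24. In this model
it is the header that lands on the 8-byte boundary; the payload starts
4 bytes after it.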

...
> +	if (align) {
> +		/* Goal is to add an attribute with size 4. We know that
> +		 * NLA_HDRLEN is 4, hence payload is 0.
> +		 */
> +		__nla_reserve(skb, 0, 0);

One of those zeros should be 'align - 4', then the comment
can be more descriptive.

	David

* Re: [PATCH 3/3] configure: pull AR from the env too
From: Stephen Hemminger @ 2012-12-17 17:14 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: stephen.hemminger, netdev, jengelh
In-Reply-To: <1355695757-9957-3-git-send-email-vapier@gentoo.org>

On Sun, 16 Dec 2012 17:09:17 -0500
Mike Frysinger <vapier@gentoo.org> wrote:

> This matches the existing CC behavior.
> 
> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
> ---
>  configure | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/configure b/configure
> index ea1038d..7c2db9b 100755
> --- a/configure
> +++ b/configure
> @@ -10,7 +10,9 @@ trap 'status=$?; rm -rf $TMPDIR; exit $status' EXIT HUP INT QUIT TERM
>  check_toolchain()
>  {
>  : ${PKG_CONFIG:=pkg-config}
> +: ${AR=ar}
>  : ${CC=gcc}
> +echo "AR:=${AR}" >>Config
>  echo "CC:=${CC}" >>Config
>  echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
>  }

All applied

* Re: [PATCH] bugfix: network namespace & device dummy
From: Vitaly E. Lavrov @ 2012-12-17 17:25 UTC (permalink / raw)
  To: netdev
In-Reply-To: <50CF1797.803@guap.ru>

On 17.12.2012 17:01, V. Lavrov wrote:
> If a container has a network device dummyX (with lxc.network.type =
> phys), then it disappears from the system after you close the container.
> The patch returns the device dummyX to the initial network namespace
> after the container is closed.
Do not use this patch. Network devices such as "ifb" and "dummy" can be
re-created with the command "ip li add ...".

This behavior should be documented in LXC.

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Nicolas Dichtel @ 2012-12-17 17:35 UTC (permalink / raw)
  To: David Laight; +Cc: bhutchings, tgraf, netdev, davem
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F1@saturn3.aculab.com>

Le 17/12/2012 18:06, David Laight a écrit :
>>   int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
>>   {
>> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
>> +	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) ? 0 : 4;
>
> I've just realised where you are adding this!
> You only want to add padding if the attribute is a single 64-bit item,
> not whenever the destination is misaligned.
As said in the commit log, I want to align all attributes. An attribute can be 
like this:

struct foo {
	__u32 bar1;
	__u32 bar2;
	__u64 bar3;
};

nla_put() doesn't know what the attribute contains.
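
As a rough illustration of why the contents matter, here is a sketch that
checks where bar3 would land for a given attribute header offset. It
assumes NLA_HDRLEN is 4 and uses userspace fixed-width types in place of
__u32/__u64; bar3_is_aligned() is a hypothetical helper, not part of the
netlink API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NLA_HDRLEN 4

struct foo {
	uint32_t bar1;
	uint32_t bar2;
	uint64_t bar3;
};

/* The two leading 32-bit fields put bar3 at offset 8 inside the payload. */
static_assert(offsetof(struct foo, bar3) == 8, "bar3 expected at offset 8");

/*
 * hdr_off is the offset of the attribute header from an 8-byte-aligned
 * base. Returns true if bar3 then sits on an 8-byte boundary.
 */
static bool bar3_is_aligned(size_t hdr_off)
{
	return (hdr_off + NLA_HDRLEN + offsetof(struct foo, bar3)) % 8 == 0;
}
```

With the header at offset 0, bar3 lands at offset 12, which is not a
multiple of 8; with the header at offset 4, it lands at 16.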

>
> E.g. what happens if you add a 4-byte item after an 8-byte one?
>
> Are there attributes that consist of a pair of 4-byte values?
>
> ...
>> +	if (align) {
>> +		/* Goal is to add an attribute with size 4. We know that
>> +		 * NLA_HDRLEN is 4, hence payload is 0.
>> +		 */
>> +		__nla_reserve(skb, 0, 0);
>
> One of those zeros should be 'align - 4', then the comment
> can be more descriptive.
If you look at why we use 0, you will see that the first 0 is the type and
the second is the payload size...

* 3.6.10 tcp crash - net/ipv4/tcp.c:1667 & tcp.c:1655
From: Benjamin LaHaise @ 2012-12-17 17:41 UTC (permalink / raw)
  To: netdev

Hi folks,

I just hit the following crash with Fedora's 3.6.10-2.fc17 kernel.  I don't 
have time to debug this myself at the moment, but can certainly test patches 
or provide more info as needed.  I wasn't doing anything unusual at the time, 
just reading email/web browsing.  I believe the network driver in use was 
ipheth for tethering to an iPhone 4S over USB (the other driver being used 
intermittently on this laptop is iwlwifi).  Any ideas?

		-ben
-- 
"Thought is the essence of where you are now."

Dec 17 12:28:40 lappy kernel: [ 4044.846922] ------------[ cut here ]------------
Dec 17 12:28:40 lappy kernel: [ 4044.846931] WARNING: at net/ipv4/tcp.c:1667 tcp_recvmsg+0xc25/0xd80()
Dec 17 12:28:40 lappy kernel: [ 4044.846933] Hardware name: HP Pavilion dv7 Notebook PC
Dec 17 12:28:40 lappy kernel: [ 4044.846935] recvmsg bug 2: copied DE50E114 seq 90D65A21 rcvnxt DE50E114 fl 0
Dec 17 12:28:40 lappy kernel: [ 4044.846936] Modules linked in: fuse lockd sunrpc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack btusb bluetooth snd_hda_codec_hdmi arc4 iwldvm ipheth mac80211 iTCO_wdt iTCO_vendor_support hp_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media coretemp microcode i7core_edac edac_core snd_hda_codec_idt i2c_i801 ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sanyo_decoder iwlwifi snd_hda_intel ir_sony_decoder snd_hda_codec cfg80211 snd_hwdep jmb38x_ms snd_seq snd_seq_device ir_jvc_decoder lpc_ich memstick mfd_core ir_rc6_decoder snd_pcm r8169 mii rfkill ir_rc5_decoder ir_nec_decoder snd_page_alloc snd_timer snd soundcore vhost_net t
 un macvtap macvlan rc_rc6_mce kvm ene_ir rc_core hp_accel lis3lv02d input_polldev uinput crc32c_intel sdhci_pci firewire_ohci sdhci firewire_core mmc_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm
Dec 17 12:28:40 lappy kernel: i2c_core
Dec 17 12:28:40 lappy kernel: [ 4044.847025] Pid: 2080, comm: Socket Thread Tainted: G        W    3.6.10-2.fc17.x86_64 #1
Dec 17 12:28:40 lappy kernel: [ 4044.847030] Call Trace:
Dec 17 12:28:40 lappy kernel: [ 4044.847035]  [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
Dec 17 12:28:40 lappy kernel: [ 4044.847038]  [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
Dec 17 12:28:40 lappy kernel: [ 4044.847040]  [<ffffffff815576c5>] tcp_recvmsg+0xc25/0xd80
Dec 17 12:28:40 lappy kernel: [ 4044.847043]  [<ffffffff8157cb1b>] inet_recvmsg+0x6b/0x80
Dec 17 12:28:40 lappy kernel: [ 4044.847047]  [<ffffffff814fa707>] sock_recvmsg+0xd7/0x110
Dec 17 12:28:40 lappy kernel: [ 4044.847051]  [<ffffffff811a2fd0>] ? __pollwait+0xf0/0xf0
Dec 17 12:28:40 lappy kernel: [ 4044.847053]  [<ffffffff811a2fd0>] ? __pollwait+0xf0/0xf0
Dec 17 12:28:40 lappy kernel: [ 4044.847055]  [<ffffffff814fc11f>] sys_recvfrom+0xef/0x170
Dec 17 12:28:40 lappy kernel: [ 4044.847058]  [<ffffffff811a2fd0>] ? __pollwait+0xf0/0xf0
Dec 17 12:28:40 lappy kernel: [ 4044.847062]  [<ffffffff810d868c>] ? __audit_syscall_entry+0xcc/0x300
Dec 17 12:28:40 lappy kernel: [ 4044.847064]  [<ffffffff810d8cac>] ? __audit_syscall_exit+0x3ec/0x450
Dec 17 12:28:40 lappy kernel: [ 4044.847067]  [<ffffffff816270e9>] system_call_fastpath+0x16/0x1b
Dec 17 12:28:40 lappy kernel: [ 4044.847068] ---[ end trace 28d4acf1e1aa598d ]---
Dec 17 12:28:40 lappy kernel: [ 4044.847069] ------------[ cut here ]------------
Dec 17 12:28:40 lappy kernel: [ 4044.847071] WARNING: at net/ipv4/tcp.c:1655 tcp_recvmsg+0x671/0xd80()
Dec 17 12:28:40 lappy kernel: [ 4044.847072] Hardware name: HP Pavilion dv7 Notebook PC
Dec 17 12:28:40 lappy kernel: [ 4044.847073] recvmsg bug: copied DE50E114 seq 0 rcvnxt DE50E114 fl 0
Dec 17 12:28:40 lappy kernel: [ 4044.847074] Modules linked in: fuse lockd sunrpc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack btusb bluetooth snd_hda_codec_hdmi arc4 iwldvm ipheth mac80211 iTCO_wdt iTCO_vendor_support hp_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media coretemp microcode i7core_edac edac_core snd_hda_codec_idt i2c_i801 ir_lirc_codec lirc_dev ir_mce_kbd_decDec 17 12:29:32 lappy kernel: imklog 5.8.10, log source = /proc/kmsg started.
