Netdev List
* [PATCH 6/8] net: implement emergency pools
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch implements emergency pools which are bound to a specific
network device. They can be activated via the socket interface and used
for a specific socket.
The pools are built on top of rx-recycling. The socket interface allows
setting the number of skbs in the pool and activating the pool.
The size of the skbs which are accepted by / added to the pool cannot be
changed via the socket. It is set by the network driver and is altered
on MTU change. This requires dropping the current pool and re-allocating
it. If the driver does not set the skb size, the emergency pools cannot
be used.
Once the emergency pools are activated, all rx-skb allocations by the
network driver are taken from the pool. tx-skbs are allocated from the
emergency pool only for the relevant socket, i.e. the one which
activated the emergency mode.
Since the socket _and_ the driver can add/remove skbs to/from the pool,
the list operations now use the locked skb_queue_head() and
skb_dequeue(). There is a patch later in the series which tries to bring
the old unlocked behavior back if the emergency pools are not used by
the user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/alpha/include/asm/socket.h   |    4 +
 arch/arm/include/asm/socket.h     |    3 +
 arch/avr32/include/asm/socket.h   |    3 +
 arch/cris/include/asm/socket.h    |    5 +-
 arch/frv/include/asm/socket.h     |    4 +-
 arch/h8300/include/asm/socket.h   |    3 +
 arch/ia64/include/asm/socket.h    |    3 +
 arch/m32r/include/asm/socket.h    |    3 +
 arch/m68k/include/asm/socket.h    |    3 +
 arch/mips/include/asm/socket.h    |    3 +
 arch/mn10300/include/asm/socket.h |    3 +
 arch/parisc/include/asm/socket.h  |    3 +
 arch/powerpc/include/asm/socket.h |    3 +
 arch/s390/include/asm/socket.h    |    3 +
 arch/sparc/include/asm/socket.h   |    3 +
 arch/xtensa/include/asm/socket.h  |    3 +
 include/asm-generic/socket.h      |    4 +
 include/linux/netdevice.h         |   52 +++++++------
 include/linux/skbuff.h            |    1 +
 include/net/sock.h                |    2 +
 net/core/skbuff.c                 |    8 ++
 net/core/sock.c                   |  142 +++++++++++++++++++++++++++++++++++++
 22 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 06edfef..ea49db3 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -69,6 +69,10 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
+
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 90ffd04..b827010 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index c8d1fae..64a7d45 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 1a4a619..9b8e7ed 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -64,6 +64,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
-
diff --git a/arch/frv/include/asm/socket.h b/arch/frv/include/asm/socket.h
index a6b2688..15a262f 100644
--- a/arch/frv/include/asm/socket.h
+++ b/arch/frv/include/asm/socket.h
@@ -62,5 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 04c0f45..d46d64e 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index 51427ea..04983aa 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -71,4 +71,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 469787c..a0e5431 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index 9bf49c8..7018ceb 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index 9de5190..9f9d93a 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -82,6 +82,9 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #ifdef __KERNEL__
 
 /** sock_type - Socket types
diff --git a/arch/mn10300/include/asm/socket.h b/arch/mn10300/include/asm/socket.h
index 4e60c42..70476eb 100644
--- a/arch/mn10300/include/asm/socket.h
+++ b/arch/mn10300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index 225b7d6..a4706d0 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -61,6 +61,9 @@
 
 #define SO_RXQ_OVFL             0x4021
 
+#define SO_EPOOL_QLEN		0x4022
+#define SO_EPOOL_SIZE		0x4023
+#define SO_EPOOL_MODE		0x4024
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index 866f760..dce10f9 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -69,4 +69,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index fdff1e9..73d0117 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -70,4 +70,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index 9d3fefc..39eea91 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -58,6 +58,9 @@
 
 #define SO_RXQ_OVFL             0x0024
 
+#define SO_EPOOL_QLEN           0x0025
+#define SO_EPOOL_SIZE           0x0026
+#define SO_EPOOL_MODE           0x0027
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index cbdf2ff..161a2e5 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -73,4 +73,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 9a6115e..fa9ccbb 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -64,4 +64,8 @@
 #define SO_DOMAIN		39
 
 #define SO_RXQ_OVFL             40
+
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4fa400b..fa7e951 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1095,6 +1095,28 @@ struct net_device {
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+static inline void dev_put(struct net_device *dev)
+{
+	atomic_dec(&dev->refcnt);
+}
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+static inline void dev_hold(struct net_device *dev)
+{
+	atomic_inc(&dev->refcnt);
+}
+
 static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
 {
 	dev->rx_rec_skbs_max = qlen;
@@ -1118,9 +1140,13 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	if (skb->emerg_dev) {
+		dev_put(skb->emerg_dev);
+		skb->emerg_dev = NULL;
+	}
 	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
 			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		__skb_queue_head(&dev->rx_recycle, skb);
+		skb_queue_head(&dev->rx_recycle, skb);
 	else
 		dev_kfree_skb_any(skb);
 }
@@ -1129,7 +1155,7 @@ static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = __skb_dequeue(&dev->rx_recycle);
+	skb = skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
@@ -1783,28 +1809,6 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	atomic_dec(&dev->refcnt);
-}
-
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	atomic_inc(&dev->refcnt);
-}
-
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
  * who is responsible for serialization of these calls.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac74ee0..caee62c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -319,6 +319,7 @@ struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*emerg_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f26f2f..3f3518a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -314,6 +314,8 @@ struct sock {
 #endif
 	__u32			sk_mark;
 	u32			sk_classid;
+	u32			emerg_en;
+	/* XXX 4 bytes hole on 64 bit */
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 34432b4..f02737d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -425,6 +425,13 @@ static void skb_release_all(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+	struct net_device *ndev = skb->emerg_dev;
+
+	if (ndev) {
+		net_recycle_add(ndev, skb);
+		return;
+	}
+
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
@@ -563,6 +570,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 {
 #define C(x) n->x = skb->x
 
+	n->emerg_dev = NULL;
 	n->next = n->prev = NULL;
 	n->sk = NULL;
 	__copy_skb_header(n, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fef2434..33aa1a5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -472,6 +472,71 @@ static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
 		sock_reset_flag(sk, bit);
 }
 
+static int sock_epool_set_qlen(struct sock *sk, int val)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+
+	net_recycle_qlen(dev, val);
+	dev_put(dev);
+	return 0;
+}
+
+static int sock_epool_set_mode(struct sock *sk, int val)
+{
+	int ret;
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!val) {
+		sk->emerg_en = 0;
+		return 0;
+	}
+	if (sk->emerg_en && val)
+		return -EBUSY;
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+	ret = -ENODEV;
+	if (!dev->rx_rec_skb_size)
+		goto out;
+
+	do {
+		struct sk_buff *skb;
+
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max) {
+			ret = 0;
+			break;
+		}
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb) {
+			ret = -ENOMEM;
+			break;
+		}
+		net_recycle_add(dev, skb);
+	} while (1);
+
+	if (!ret)
+		sk->emerg_en = 1;
+out:
+	dev_put(dev);
+	return ret;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -740,6 +805,15 @@ set_rcvbuf:
 		else
 			sock_reset_flag(sk, SOCK_RXQ_OVFL);
 		break;
+	case SO_EPOOL_QLEN:
+		ret = sock_epool_set_qlen(sk, val);
+		break;
+	case SO_EPOOL_SIZE:
+		ret = -EINVAL;
+		break;
+	case SO_EPOOL_MODE:
+		ret = sock_epool_set_mode(sk, valbool);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -961,6 +1035,35 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
 		break;
 
+	case SO_EPOOL_QLEN:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skbs_max;
+		break;
+	}
+	case SO_EPOOL_SIZE:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skb_size;
+		break;
+	}
+	case SO_EPOOL_MODE:
+		v.val = sk->emerg_en;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1459,6 +1562,37 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 	return timeo;
 }
 
+static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+	int err;
+	struct sk_buff *skb;
+
+	err = -ENODEV;
+	if (!sk->sk_bound_dev_if)
+		return ERR_PTR(err);
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return ERR_PTR(err);
+	err = -EINVAL;
+	if (dev->rx_rec_skb_size < skb_len) {
+		dev_put(dev);
+		return ERR_PTR(err);
+	}
+	skb = skb_dequeue(&dev->rx_recycle);
+	if (!skb) {
+		dev_put(dev);
+		err = -ENOBUFS;
+		return ERR_PTR(err);
+	}
+	/*
+	 * dev will be put once the skb is back from
+	 * its journey.
+	 */
+	skb->emerg_dev = dev;
+	return skb;
+}
 
 /*
  *	Generic send/receive buffer handlers
@@ -1488,6 +1622,14 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			goto failure;
 
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+			if (sk->emerg_en) {
+				skb = alloc_emerg_skb(sk, header_len + data_len);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					goto failure;
+				}
+				break;
+			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
 				int npages;
-- 
1.6.6.1



* [PATCH 7/8] net/emergency_skb: create a deep copy on clone
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

skb_clone() creates a clone of the skb: a new head is allocated from the
slab cache and the reference counter for the data part is incremented.
For the skbs from the emergency pool, we don't really want to clone
them that way:
- talking to slab may lead to lock contention which in turn increases
  the latency.
- the original (with the data part) may return earlier to the pool than
  the clone. In that case we would "lose" the skb from the emergency
  pool.

Instead we do a copy of head and data into an skb from the emergency
pool.
This patch splits pskb_copy(): a helper function, __pskb_copy(), does
the bare copy work into an skb supplied by the caller, and the remaining
pskb_copy() allocates a new skb and calls the helper.
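
The difference between the two approaches can be modelled in userspace.
This is only an illustrative sketch: struct head, struct data and the
three helpers are hypothetical stand-ins, not kernel code. A shallow
clone shares the data area and bumps its user count, so the data cannot
be reused until the last head is released. A deep copy into a buffer
which already owns its data area leaves both buffers independent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct data {
	int users;		/* heads currently referencing this area */
	char bytes[64];
};

struct head {
	struct data *d;
	size_t len;
};

/* skb_clone()-like: the new head shares the data, the count goes up. */
static void shallow_clone(struct head *dst, const struct head *src)
{
	*dst = *src;
	dst->d->users++;
}

/*
 * Deep copy into a head which already has its own data area (think:
 * one taken from the pool). Fails if the payload does not fit.
 */
static int deep_copy(struct head *dst, const struct head *src)
{
	if (src->len > sizeof(dst->d->bytes))
		return -1;
	memcpy(dst->d->bytes, src->d->bytes, src->len);
	dst->len = src->len;
	return 0;
}

/* Drop one user; returns 1 if the data area is now free for reuse. */
static int release(struct head *h)
{
	return --h->d->users == 0;
}
```

With the shallow clone, releasing the original first does not free the
data area, which is the "lost" pool skb case described above.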

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c |   80 +++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f02737d..9e094fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -613,6 +613,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n);
 /**
  *	skb_clone	-	duplicate an sk_buff
  *	@skb: buffer to clone
@@ -631,6 +632,20 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
+	if (skb->emerg_dev) {
+		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
+		if (!n)
+			goto norm_clone;
+		/* undo any earlier skb_reserve() */
+		skb_reserve(n, - skb_headroom(n));
+		if (!__pskb_copy(skb, n)) {
+			n->emerg_dev = skb->emerg_dev;
+			dev_hold(skb->emerg_dev);
+			return n;
+		}
+		net_recycle_add(skb->emerg_dev, n);
+	}
+norm_clone:
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
 	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
@@ -720,31 +735,22 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 EXPORT_SYMBOL(skb_copy);
 
 /**
- *	pskb_copy	-	create copy of an sk_buff with private head.
- *	@skb: buffer to copy
- *	@gfp_mask: allocation priority
+ *      __pskb_copy     -       create copy of an sk_buff with private head.
+ *      @skb: buffer to copy
+ *      @n: skb to copy it into
  *
- *	Make a copy of both an &sk_buff and part of its data, located
- *	in header. Fragmented data remain shared. This is used when
- *	the caller wishes to modify only header of &sk_buff and needs
- *	private copy of the header to alter. Returns %NULL on failure
- *	or the pointer to the buffer on success.
- *	The returned buffer has a reference count of 1.
+ *      This function behaves like pskb_copy() except that it takes
+ *      an already allocated skb into which it will copy head and data.
+ *      The returned buffer has a reference count of 1.
  */
-
-struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n)
 {
-	/*
-	 *	Allocate the copy buffer
-	 */
-	struct sk_buff *n;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
-	n = alloc_skb(skb->end, gfp_mask);
+	if (skb->end > n->end)
 #else
-	n = alloc_skb(skb->end - skb->head, gfp_mask);
+	if ((skb->end - skb->head) > (n->end - n->head))
 #endif
-	if (!n)
-		goto out;
+		return -EMSGSIZE;
 
 	/* Set the data pointer */
 	skb_reserve(n, skb->data - skb->head);
@@ -773,8 +779,40 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	copy_skb_header(n, skb);
-out:
-	return n;
+	return 0;
+}
+
+/**
+ *	pskb_copy	-	create copy of an sk_buff with private head.
+ *	@skb: buffer to copy
+ *	@gfp_mask: allocation priority
+ *
+ *	Make a copy of both an &sk_buff and part of its data, located
+ *	in header. Fragmented data remain shared. This is used when
+ *	the caller wishes to modify only header of &sk_buff and needs
+ *	private copy of the header to alter. Returns %NULL on failure
+ *	or the pointer to the buffer on success.
+ *	The returned buffer has a reference count of 1.
+ */
+
+struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	/*
+	 *	Allocate the copy buffer
+	 */
+	struct sk_buff *n;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	n = alloc_skb(skb->end, gfp_mask);
+#else
+	n = alloc_skb(skb->end - skb->head, gfp_mask);
+#endif
+	if (!n)
+		return NULL;
+	if (!__pskb_copy(skb, n))
+		return n;
+	kfree_skb(n);
+	return NULL;
+
 }
 EXPORT_SYMBOL(pskb_copy);
 
-- 
1.6.6.1



* [PATCH 4/8] net/stmmac: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/stmmac/stmmac.h      |    1 -
 drivers/net/stmmac/stmmac_main.c |   26 +++++++-------------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index ebebc64..dbf9f95 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -44,7 +44,6 @@ struct stmmac_priv {
 	unsigned int dirty_rx;
 	struct sk_buff **rx_skbuff;
 	dma_addr_t *rx_skbuff_dma;
-	struct sk_buff_head rx_recycle;
 
 	struct net_device *dev;
 	int is_gmac;
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..722a5e6 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -636,18 +636,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			p->des3 = 0;
 
 		if (likely(skb != NULL)) {
-			/*
-			 * If there's room in the queue (limit it to size)
-			 * we add this skb back into the pool,
-			 * if it's the right size.
-			 */
-			if ((skb_queue_len(&priv->rx_recycle) <
-				priv->dma_rx_size) &&
-				skb_recycle_check(skb, priv->dma_buf_sz))
-				__skb_queue_head(&priv->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-
+			net_recycle_add(priv->dev, skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
 
@@ -843,6 +832,9 @@ static int stmmac_open(struct net_device *dev)
 	priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
 	init_dma_desc_rings(dev);
 
+	net_recycle_init(priv->dev, priv->dma_rx_size, priv->dma_buf_sz +
+			NET_IP_ALIGN);
+
 	/* DMA initialization and SW reset */
 	if (unlikely(priv->hw->dma->init(ioaddr, priv->pbl, priv->dma_tx_phy,
 					 priv->dma_rx_phy) < 0)) {
@@ -894,7 +886,6 @@ static int stmmac_open(struct net_device *dev)
 		phy_start(priv->phydev);
 
 	napi_enable(&priv->napi);
-	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
 	return 0;
 }
@@ -925,7 +916,7 @@ static int stmmac_release(struct net_device *dev)
 		kfree(priv->tm);
 #endif
 	napi_disable(&priv->napi);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->dev);
 
 	/* Free the IRQ lines */
 	free_irq(dev->irq, dev);
@@ -1157,13 +1148,10 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
 		if (likely(priv->rx_skbuff[entry] == NULL)) {
 			struct sk_buff *skb;
 
-			skb = __skb_dequeue(&priv->rx_recycle);
-			if (skb == NULL)
-				skb = netdev_alloc_skb_ip_align(priv->dev,
-								bfsize);
-
+			skb = net_recycle_get(priv->dev);
 			if (unlikely(skb == NULL))
 				break;
+			skb_reserve(skb, NET_IP_ALIGN);
 
 			priv->rx_skbuff[entry] = skb;
 			priv->rx_skbuff_dma[entry] =
-- 
1.6.6.1



* [PATCH 5/8] net/ucc_geth: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ucc_geth.c |   30 ++++++++----------------------
 drivers/net/ucc_geth.h |    2 --
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..9d6097b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -210,10 +210,7 @@ static struct sk_buff *get_new_skb(struct ucc_geth_private *ugeth,
 {
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&ugeth->rx_recycle);
-	if (!skb)
-		skb = dev_alloc_skb(ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+	skb = net_recycle_get(ugeth->ndev);
 	if (skb == NULL)
 		return NULL;
 
@@ -1992,8 +1989,6 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 		iounmap(ugeth->ug_regs);
 		ugeth->ug_regs = NULL;
 	}
-
-	skb_queue_purge(&ugeth->rx_recycle);
 }
 
 static void ucc_geth_set_multi(struct net_device *dev)
@@ -2069,6 +2064,7 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	ugeth->phydev = NULL;
 
 	ucc_geth_memclean(ugeth);
+	net_recycle_cleanup(ugeth->ndev);
 }
 
 static int ucc_struct_init(struct ucc_geth_private *ugeth)
@@ -2205,9 +2201,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 			ugeth_err("%s: Failed to ioremap regs.", __func__);
 		return -ENOMEM;
 	}
-
-	skb_queue_head_init(&ugeth->rx_recycle);
-
 	return 0;
 }
 
@@ -3213,12 +3206,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 			if (netif_msg_rx_err(ugeth))
 				ugeth_err("%s, %d: ERROR!!! skb - 0x%08x",
 					   __func__, __LINE__, (u32) skb);
-			if (skb) {
-				skb->data = skb->head + NET_SKB_PAD;
-				skb->len = 0;
-				skb_reset_tail_pointer(skb);
-				__skb_queue_head(&ugeth->rx_recycle, skb);
-			}
+			if (skb)
+				net_recycle_add(dev, skb);
 
 			ugeth->rx_skbuff[rxQ][ugeth->skb_currx[rxQ]] = NULL;
 			dev->stats.rx_dropped++;
@@ -3288,13 +3277,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 
 		dev->stats.tx_packets++;
 
-		if (skb_queue_len(&ugeth->rx_recycle) < RX_BD_RING_LEN &&
-			     skb_recycle_check(skb,
-				    ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT))
-			__skb_queue_head(&ugeth->rx_recycle, skb);
-		else
-			dev_kfree_skb(skb);
+		net_recycle_add(dev, skb);
 
 		ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]] = NULL;
 		ugeth->skb_dirtytx[txQ] =
@@ -3929,6 +3912,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 	netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, 64);
 	dev->mtu = 1500;
 
+	net_recycle_init(dev, RX_BD_RING_LEN, ug_info->uf_info.max_rx_buf_length
+			+ UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+
 	ugeth->msg_enable = netif_msg_init(debug.msg_enable, UGETH_MSG_DEFAULT);
 	ugeth->phy_interface = phy_interface;
 	ugeth->max_speed = max_speed;
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..07c0816 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1213,8 +1213,6 @@ struct ucc_geth_private {
 	/* index of the first skb which hasn't been transmitted yet. */
 	u16 skb_dirtytx[NUM_TX_QUEUES];
 
-	struct sk_buff_head rx_recycle;
-
 	struct ugeth_mii_info *mii_info;
 	struct phy_device *phydev;
 	phy_interface_t phy_interface;
-- 
1.6.6.1



* [PATCH 2/8] net/gianfar: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c         |   28 ++++++++--------------------
 drivers/net/gianfar.h         |    2 --
 drivers/net/gianfar_ethtool.c |    1 +
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index fccb7a3..1a1a249 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1148,6 +1148,9 @@ static int gfar_probe(struct of_device *ofdev,
 		priv->rx_queue[i]->rxic = DEFAULT_RXIC;
 	}
 
+	net_recycle_init(dev, DEFAULT_RX_RING_SIZE,
+			priv->rx_buffer_size + RXBUF_ALIGNMENT);
+
 	/* enable filer if using multiple RX queues*/
 	if(priv->num_rx_queues > 1)
 		priv->rx_filer_enable = 1;
@@ -1768,7 +1771,7 @@ static void free_skb_resources(struct gfar_private *priv)
 			sizeof(struct rxbd8) * priv->total_rx_ring_size,
 			priv->tx_queue[0]->tx_bd_base,
 			priv->tx_queue[0]->tx_bd_dma_base);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->ndev);
 }
 
 void gfar_start(struct net_device *dev)
@@ -1949,8 +1952,6 @@ static int gfar_enet_open(struct net_device *dev)
 
 	enable_napi(priv);
 
-	skb_queue_head_init(&priv->rx_recycle);
-
 	/* Initialize a bunch of registers */
 	init_registers(dev);
 
@@ -2366,6 +2367,7 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 		stop_gfar(dev);
 
 	priv->rx_buffer_size = tempsize;
+	net_recycle_size(dev, tempsize + RXBUF_ALIGNMENT);
 
 	dev->mtu = new_mtu;
 
@@ -2498,16 +2500,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		/*
-		 * If there's room in the queue (limit it to rx_buffer_size)
-		 * we add this skb back into the pool, if it's the right size
-		 */
-		if (skb_queue_len(&priv->rx_recycle) < rx_queue->rx_ring_size &&
-				skb_recycle_check(skb, priv->rx_buffer_size +
-					RXBUF_ALIGNMENT))
-			__skb_queue_head(&priv->rx_recycle, skb);
-		else
-			dev_kfree_skb_any(skb);
+		net_recycle_add(dev, skb);
 
 		tx_queue->tx_skbuff[skb_dirtytx] = NULL;
 
@@ -2573,14 +2566,9 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 struct sk_buff * gfar_new_skb(struct net_device *dev)
 {
 	unsigned int alignamount;
-	struct gfar_private *priv = netdev_priv(dev);
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&priv->rx_recycle);
-	if (!skb)
-		skb = netdev_alloc_skb(dev,
-				priv->rx_buffer_size + RXBUF_ALIGNMENT);
-
+	skb = net_recycle_get(dev);
 	if (!skb)
 		return NULL;
 
@@ -2753,7 +2741,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 				 * recycle list.
 				 */
 				skb_reserve(skb, -GFAR_CB(skb)->alignamount);
-				__skb_queue_head(&priv->rx_recycle, skb);
+				net_recycle_add(dev, skb);
 			}
 		} else {
 			/* Increment the number of packets */
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 710810e..66f8c04 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -1068,8 +1068,6 @@ struct gfar_private {
 
 	u32 cur_filer_idx;
 
-	struct sk_buff_head rx_recycle;
-
 	struct vlan_group *vlgrp;
 
 
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 9bda023..8a6d567 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -508,6 +508,7 @@ static int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rva
 		priv->tx_queue[i]->tx_ring_size = rvals->tx_pending;
 		priv->tx_queue[i]->num_txbdfree = priv->tx_queue[i]->tx_ring_size;
 	}
+	net_recycle_qlen(dev, rvals->rx_pending);
 
 	/* Rebuild the rings with the new size */
 	if (dev->flags & IFF_UP) {
-- 
1.6.6.1



* Generic rx-recycling and emergency skb pool
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx


This is version two of generic rx-recycling followed by version one of
emergency skb pools which are built on top of rx-recycling.
The change from v1 of generic rx-recycling is that the list access is
unlocked instead of locked.
Patch six, which introduces the emergency pools, adds the locking back.
This is required since the list then has two users which are not
serialized against each other: the driver and the socket. In order not
to punish everyone, patch eight removes this locking again if the
emergency pools are not used. That patch converts only two drivers, so
you get an idea of what I think is required to get the locking removed.
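
As a rough illustration of what generic rx-recycling does, here is a
userspace model. It is only a sketch and not kernel code: struct buf,
struct rec_pool and the rec_*() functions are made-up stand-ins for skbs,
the per-device recycle list and the net_recycle_*() helpers, and the size
test only approximates what skb_recycle_check() verifies. There is
deliberately no locking, which mirrors the unlocked list access described
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: only the buffer size matters here. */
struct buf {
	struct buf *next;
	size_t size;
};

/* Hypothetical stand-in for the per-device recycle list. */
struct rec_pool {
	struct buf *head;	/* LIFO list of recycled buffers */
	unsigned int len;	/* number of buffers on the list */
	unsigned int max;	/* upper bound, e.g. the rx ring size */
	size_t buf_size;	/* minimum size a buffer needs to be reused */
};

static void rec_init(struct rec_pool *p, unsigned int max, size_t buf_size)
{
	p->head = NULL;
	p->len = 0;
	p->max = max;
	p->buf_size = buf_size;
}

static struct buf *buf_alloc(size_t size)
{
	struct buf *b = malloc(sizeof(*b));

	if (b) {
		b->next = NULL;
		b->size = size;
	}
	return b;
}

/* Keep the buffer if there is room and it is big enough, else free it. */
static void rec_add(struct rec_pool *p, struct buf *b)
{
	if (p->len < p->max && b->size >= p->buf_size) {
		b->next = p->head;
		p->head = b;
		p->len++;
	} else {
		free(b);
	}
}

/* Prefer a recycled buffer and fall back to the allocator. */
static struct buf *rec_get(struct rec_pool *p)
{
	struct buf *b = p->head;

	if (b) {
		p->head = b->next;
		b->next = NULL;
		p->len--;
		/* rec_add() never queues a buffer that is too small */
		assert(b->size >= p->buf_size);
		return b;
	}
	return buf_alloc(p->buf_size);
}

/* Drop everything that is still on the list. */
static void rec_cleanup(struct rec_pool *p)
{
	while (p->head) {
		struct buf *b = p->head;

		p->head = b->next;
		free(b);
	}
	p->len = 0;
}
```

In this model rec_add() is what a driver would call from its tx-reclaim
path and rec_get() is what it would call when refilling the rx ring.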

The idea behind emergency pools is to have pre-allocated skbs for TX and
RX. Using the memory allocator for them leads to latencies under memory
pressure. The pre-allocated skbs are "tagged" and should get back to the
pool once they are through the stack, so the pool should never get
exhausted.
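
The "tagging" can be pictured with the following userspace sketch. It is
an illustration under assumptions only: struct ebuf, struct epool and the
epool_*() / ebuf_put() names are invented for this example and do not
appear in the patches. The pool is filled once up front, and a buffer
that carries the tag goes back to its pool instead of to the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct epool;

/* Hypothetical buffer; 'owner' is the tag that marks emergency buffers. */
struct ebuf {
	struct ebuf *next;
	struct epool *owner;	/* NULL for ordinary allocations */
};

/* Hypothetical emergency pool, filled once before it is needed. */
struct epool {
	struct ebuf *free_list;
	unsigned int avail;
};

/* Pre-allocate 'count' tagged buffers; false if the allocator fails. */
static bool epool_fill(struct epool *p, unsigned int count)
{
	p->free_list = NULL;
	p->avail = 0;
	while (p->avail < count) {
		struct ebuf *b = malloc(sizeof(*b));

		if (!b)
			return false;
		b->owner = p;
		b->next = p->free_list;
		p->free_list = b;
		p->avail++;
	}
	return true;
}

/* Hot path: never calls the allocator, so no allocation latency. */
static struct ebuf *epool_get(struct epool *p)
{
	struct ebuf *b = p->free_list;

	if (b) {
		/* every buffer on the list carries this pool's tag */
		assert(b->owner == p);
		p->free_list = b->next;
		b->next = NULL;
		p->avail--;
	}
	return b;
}

/* Called when a buffer is done: tagged buffers return to their pool. */
static void ebuf_put(struct ebuf *b)
{
	struct epool *p = b->owner;

	if (!p) {
		free(b);
		return;
	}
	b->next = p->free_list;
	p->free_list = b;
	p->avail++;
}

/* Release all buffers that are currently in the pool. */
static void epool_destroy(struct epool *p)
{
	while (p->free_list) {
		struct ebuf *b = p->free_list;

		p->free_list = b->next;
		free(b);
	}
	p->avail = 0;
}
```

As long as every tagged buffer passes through ebuf_put() when it is
released, the number of buffers in circulation stays constant, which is
the property the paragraph above relies on.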

While it was easy to convert the drivers which share the same concept of
rx-recycling to use the emergency pools, it was difficult to hook up the
more complex drivers like e1000e. The e1000e can use split skbs / a frag
list, which is different from the allocation currently used. So instead of
forcing all drivers to do things the same way, I've been thinking about
providing a dedicated callback for skb allocation and for checking whether
a given skb is "good enough". This is not yet implemented.
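
One possible shape for such a callback interface, sketched in userspace
C. None of this is part of the series and every name below (struct xbuf,
struct recycle_ops, split_*, core_try_recycle) is hypothetical. It only
shows how a driver with its own buffer layout could supply both the
allocation and the "good enough" test while the core keeps the recycling
logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer with a linear part and optional fragments. */
struct xbuf {
	size_t linear_len;
	unsigned int nr_frags;
};

/* Hypothetical per-driver hooks; nothing like this exists in the series. */
struct recycle_ops {
	/* build a buffer the way this driver needs it */
	struct xbuf *(*alloc)(void *drv);
	/* may this buffer be reused by this driver? */
	bool (*good_enough)(void *drv, const struct xbuf *b);
};

/* Example driver that wants a small header part plus fragments. */
struct split_drv {
	size_t hdr_len;
	unsigned int frags;
};

static struct xbuf *split_alloc(void *drv)
{
	struct split_drv *d = drv;
	struct xbuf *b = malloc(sizeof(*b));

	if (b) {
		b->linear_len = d->hdr_len;
		b->nr_frags = d->frags;
	}
	return b;
}

static bool split_good_enough(void *drv, const struct xbuf *b)
{
	const struct split_drv *d = drv;

	return b->linear_len >= d->hdr_len && b->nr_frags == d->frags;
}

static const struct recycle_ops split_ops = {
	.alloc = split_alloc,
	.good_enough = split_good_enough,
};

/*
 * Core side, reduced to a one-entry pool: keep the buffer only if the
 * slot is free and the driver accepts it, otherwise free it.
 * Returns true if the buffer was kept.
 */
static bool core_try_recycle(const struct recycle_ops *ops, void *drv,
			     struct xbuf *b, struct xbuf **slot)
{
	assert(ops && ops->good_enough);

	if (*slot == NULL && ops->good_enough(drv, b)) {
		*slot = b;
		return true;
	}
	free(b);
	return false;
}
```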

I would be glad to receive some feedback on this patch series before I go
any further. Unfortunately I'm on vacation for the next two weeks so I
can't respond earlier. tglx is on Cc and should be able to respond earlier :)

Sebastian


* [PATCH 3/8] net/mv643xx: use generic recycling infrastructure
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/mv643xx_eth.c |   27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 82b720f..a58ba48 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -404,8 +404,6 @@ struct mv643xx_eth_private {
 	u8 work_rx_refill;
 
 	int skb_size;
-	struct sk_buff_head rx_recycle;
-
 	/*
 	 * RX state.
 	 */
@@ -649,6 +647,7 @@ err:
 static int rxq_refill(struct rx_queue *rxq, int budget)
 {
 	struct mv643xx_eth_private *mp = rxq_to_mp(rxq);
+	struct net_device *dev = mp->dev;
 	int refilled;
 
 	refilled = 0;
@@ -658,10 +657,7 @@ static int rxq_refill(struct rx_queue *rxq, int budget)
 		struct rx_desc *rx_desc;
 		int size;
 
-		skb = __skb_dequeue(&mp->rx_recycle);
-		if (skb == NULL)
-			skb = dev_alloc_skb(mp->skb_size);
-
+		skb = net_recycle_get(dev);
 		if (skb == NULL) {
 			mp->oom = 1;
 			goto oom;
@@ -921,6 +917,7 @@ out:
 static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 {
 	struct mv643xx_eth_private *mp = txq_to_mp(txq);
+	struct net_device *dev = mp->dev;
 	struct netdev_queue *nq = netdev_get_tx_queue(mp->dev, txq->index);
 	int reclaimed;
 
@@ -967,14 +964,8 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 				       desc->byte_cnt, DMA_TO_DEVICE);
 		}
 
-		if (skb != NULL) {
-			if (skb_queue_len(&mp->rx_recycle) <
-					mp->rx_ring_size &&
-			    skb_recycle_check(skb, mp->skb_size))
-				__skb_queue_head(&mp->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-		}
+		if (skb)
+			net_recycle_add(dev, skb);
 	}
 
 	__netif_tx_unlock(nq);
@@ -1563,7 +1554,7 @@ mv643xx_eth_set_ringparam(struct net_device *dev, struct ethtool_ringparam *er)
 
 	mp->rx_ring_size = er->rx_pending < 4096 ? er->rx_pending : 4096;
 	mp->tx_ring_size = er->tx_pending < 4096 ? er->tx_pending : 4096;
-
+	net_recycle_qlen(dev, mp->rx_ring_size);
 	if (netif_running(dev)) {
 		mv643xx_eth_stop(dev);
 		if (mv643xx_eth_open(dev)) {
@@ -2344,9 +2335,9 @@ static int mv643xx_eth_open(struct net_device *dev)
 
 	mv643xx_eth_recalc_skb_size(mp);
 
-	napi_enable(&mp->napi);
+	net_recycle_init(mp->dev, mp->rx_ring_size, mp->skb_size);
 
-	skb_queue_head_init(&mp->rx_recycle);
+	napi_enable(&mp->napi);
 
 	mp->int_mask = INT_EXT;
 
@@ -2442,7 +2433,7 @@ static int mv643xx_eth_stop(struct net_device *dev)
 	mib_counters_update(mp);
 	del_timer_sync(&mp->mib_counters_timer);
 
-	skb_queue_purge(&mp->rx_recycle);
+	net_recycle_cleanup(dev);
 
 	for (i = 0; i < mp->rxq_count; i++)
 		rxq_deinit(mp->rxq + i);
-- 
1.6.6.1



* [PATCH 1/8] net: implement generic rx recycling
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior
In-Reply-To: <1278098421-21296-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The code is basically what other drivers like gianfar are using right
now.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h |   67 +++++++++++++++++++++++++++++++++++++--------
 net/core/dev.c            |    1 +
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fa5e5a..4fa400b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -595,6 +595,18 @@ struct netdev_rx_queue {
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
+/* Use this variant when it is known for sure that it
+ * is executing from hardware interrupt context or with hardware interrupts
+ * disabled.
+ */
+extern void dev_kfree_skb_irq(struct sk_buff *skb);
+
+/* Use this variant in places where it could be invoked
+ * from either hardware interrupt or other context, with hardware interrupts
+ * either disabled or enabled.
+ */
+extern void dev_kfree_skb_any(struct sk_buff *skb);
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1077,9 +1089,52 @@ struct net_device {
 #endif
 	/* n-tuple filter list attached to this device */
 	struct ethtool_rx_ntuple_list ethtool_ntuple_list;
+	struct sk_buff_head rx_recycle;
+	u32 rx_rec_skbs_max;
+	u32 rx_rec_skb_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
+{
+	dev->rx_rec_skbs_max = qlen;
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_size(struct net_device *dev, u32 size)
+{
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_qlen(struct net_device *dev, u32 qlen)
+{
+	dev->rx_rec_skbs_max = qlen;
+}
+
+static inline void net_recycle_cleanup(struct net_device *dev)
+{
+	skb_queue_purge(&dev->rx_recycle);
+}
+
+static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		__skb_queue_head(&dev->rx_recycle, skb);
+	else
+		dev_kfree_skb_any(skb);
+}
+
+static inline struct sk_buff *net_recycle_get(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	skb = __skb_dequeue(&dev->rx_recycle);
+	if (skb)
+		return skb;
+	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
+}
+
 #define	NETDEV_ALIGN		32
 
 static inline
@@ -1672,18 +1727,6 @@ static inline int netif_is_multiqueue(const struct net_device *dev)
 	return (dev->num_tx_queues > 1);
 }
 
-/* Use this variant when it is known for sure that it
- * is executing from hardware interrupt context or with hardware interrupts
- * disabled.
- */
-extern void dev_kfree_skb_irq(struct sk_buff *skb);
-
-/* Use this variant in places where it could be invoked
- * from either hardware interrupt or other context, with hardware interrupts
- * either disabled or enabled.
- */
-extern void dev_kfree_skb_any(struct sk_buff *skb);
-
 #define HAVE_NETIF_RX 1
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index e85cc5f..db9acd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5404,6 +5404,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	netdev_init_queues(dev);
 
+	skb_queue_head_init(&dev->rx_recycle);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
-- 
1.6.6.1



* [PATCH] s2io: resolve statistics issues
From: Jon Mason @ 2010-07-02 19:13 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Ramkrishna Vepa, Sivakumar Subramani,
	Sreenivasa Honnur, Michal Schmidt
In-Reply-To: <20100630005454.GJ21324@exar.com>

This patch resolves a number of issues in the statistics gathering of
the s2io driver.

On Xframe adapters, the received multicast statistics counter includes
pause frames, which are not indicated to the driver.  As a result, the
reported multicast packet count can be higher than what has actually
been received, possibly even higher than the total number of packets
received.
    
The driver software counters are replaced with the adapter hardware
statistics for rx_packets, rx_bytes, and tx_bytes.  The driver also uses
the overflow registers to determine whether a statistic wrapped its
32-bit register (closing the window in which a reported value could be
lower than on the previous call).  The rx_length_errors statistic now
includes undersized packets in addition to oversized packets.  Finally,
rx_crc_errors are now being counted.
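
The counter handling described above can be sketched in plain C as
follows. This is a simplified userspace model, not the driver code:
struct stat_pair and the hw_counter64() / stat_update() / stat_reset() /
mcast_without_pause() helpers are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a 32-bit counter with its 32-bit overflow register. */
static uint64_t hw_counter64(uint32_t oflow, uint32_t low)
{
	return ((uint64_t)oflow << 32) | low;
}

/* Hypothetical pair: value given to the system and last hardware value. */
struct stat_pair {
	uint64_t reported;	/* keeps growing, survives a device reset */
	uint64_t last_hw;	/* hardware value seen at the previous update */
};

/* Add what the hardware counted since the previous update. */
static void stat_update(struct stat_pair *s, uint64_t hw_now)
{
	uint64_t delta = hw_now - s->last_hw;

	s->last_hw += delta;
	s->reported += delta;
	assert(s->last_hw == hw_now);
}

/* A device reset zeroes the hardware counter, so zero the copy as well. */
static void stat_reset(struct stat_pair *s)
{
	s->last_hw = 0;
}

/* Multicast frames as counted by the MAC, minus the pause frames. */
static uint64_t mcast_without_pause(uint32_t oflow, uint32_t low,
				    uint64_t pause_frames)
{
	return hw_counter64(oflow, low) - pause_frames;
}
```

Because only the delta is added, the value handed to the system does not
drop back to zero when a reset (for example due to an MTU change) clears
the adapter counters.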
    
Signed-off-by: Jon Mason <jon.mason@exar.com>

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 22371f1..d0af924 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -3129,7 +3129,6 @@ static void tx_intr_handler(struct fifo_info *fifo_data)
 		pkt_cnt++;
 
 		/* Updating the statistics block */
-		nic->dev->stats.tx_bytes += skb->len;
 		swstats->mem_freed += skb->truesize;
 		dev_kfree_skb_irq(skb);
 
@@ -4900,48 +4899,81 @@ static void s2io_updt_stats(struct s2io_nic *sp)
  *  Return value:
  *  pointer to the updated net_device_stats structure.
  */
-
 static struct net_device_stats *s2io_get_stats(struct net_device *dev)
 {
 	struct s2io_nic *sp = netdev_priv(dev);
-	struct config_param *config = &sp->config;
 	struct mac_info *mac_control = &sp->mac_control;
 	struct stat_block *stats = mac_control->stats_info;
-	int i;
+	u64 delta;
 
 	/* Configure Stats for immediate updt */
 	s2io_updt_stats(sp);
 
-	/* Using sp->stats as a staging area, because reset (due to mtu
-	   change, for example) will clear some hardware counters */
-	dev->stats.tx_packets += le32_to_cpu(stats->tmac_frms) -
-		sp->stats.tx_packets;
-	sp->stats.tx_packets = le32_to_cpu(stats->tmac_frms);
-
-	dev->stats.tx_errors += le32_to_cpu(stats->tmac_any_err_frms) -
-		sp->stats.tx_errors;
-	sp->stats.tx_errors = le32_to_cpu(stats->tmac_any_err_frms);
-
-	dev->stats.rx_errors += le64_to_cpu(stats->rmac_drop_frms) -
-		sp->stats.rx_errors;
-	sp->stats.rx_errors = le64_to_cpu(stats->rmac_drop_frms);
-
-	dev->stats.multicast = le32_to_cpu(stats->rmac_vld_mcst_frms) -
-		sp->stats.multicast;
-	sp->stats.multicast = le32_to_cpu(stats->rmac_vld_mcst_frms);
-
-	dev->stats.rx_length_errors = le64_to_cpu(stats->rmac_long_frms) -
-		sp->stats.rx_length_errors;
-	sp->stats.rx_length_errors = le64_to_cpu(stats->rmac_long_frms);
+	/* A device reset will cause the on-adapter statistics to be zero'ed.
+	 * This can be done while running by changing the MTU.  To prevent the
+	 * system from having the stats zero'ed, the driver keeps a copy of the
+	 * last update to the system (which is also zero'ed on reset).  This
+	 * enables the driver to accurately know the delta between the last
+	 * update and the current update.
+	 */
+	delta = ((u64) le32_to_cpu(stats->rmac_vld_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_vld_frms)) - sp->stats.rx_packets;
+	sp->stats.rx_packets += delta;
+	dev->stats.rx_packets += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_frms_oflow) << 32 |
+		le32_to_cpu(stats->tmac_frms)) - sp->stats.tx_packets;
+	sp->stats.tx_packets += delta;
+	dev->stats.tx_packets += delta;
+
+	delta = ((u64) le32_to_cpu(stats->rmac_data_octets_oflow) << 32 |
+		le32_to_cpu(stats->rmac_data_octets)) - sp->stats.rx_bytes;
+	sp->stats.rx_bytes += delta;
+	dev->stats.rx_bytes += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_data_octets_oflow) << 32 |
+		le32_to_cpu(stats->tmac_data_octets)) - sp->stats.tx_bytes;
+	sp->stats.tx_bytes += delta;
+	dev->stats.tx_bytes += delta;
+
+	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_errors;
+	sp->stats.rx_errors += delta;
+	dev->stats.rx_errors += delta;
+
+	delta = ((u64) le32_to_cpu(stats->tmac_any_err_frms_oflow) << 32 |
+		le32_to_cpu(stats->tmac_any_err_frms)) - sp->stats.tx_errors;
+	sp->stats.tx_errors += delta;
+	dev->stats.tx_errors += delta;
+
+	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_dropped;
+	sp->stats.rx_dropped += delta;
+	dev->stats.rx_dropped += delta;
+
+	delta = le64_to_cpu(stats->tmac_drop_frms) - sp->stats.tx_dropped;
+	sp->stats.tx_dropped += delta;
+	dev->stats.tx_dropped += delta;
+
+	/* The adapter MAC interprets pause frames as multicast packets, but
+	 * does not pass them up.  This erroneously increases the multicast
+	 * packet count and needs to be deducted when the multicast frame count
+	 * is queried.
+	 */
+	delta = (u64) le32_to_cpu(stats->rmac_vld_mcst_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_vld_mcst_frms);
+	delta -= le64_to_cpu(stats->rmac_pause_ctrl_frms);
+	delta -= sp->stats.multicast;
+	sp->stats.multicast += delta;
+	dev->stats.multicast += delta;
 
-	/* collect per-ring rx_packets and rx_bytes */
-	dev->stats.rx_packets = dev->stats.rx_bytes = 0;
-	for (i = 0; i < config->rx_ring_num; i++) {
-		struct ring_info *ring = &mac_control->rings[i];
+	delta = ((u64) le32_to_cpu(stats->rmac_usized_frms_oflow) << 32 |
+		le32_to_cpu(stats->rmac_usized_frms)) +
+		le64_to_cpu(stats->rmac_long_frms) - sp->stats.rx_length_errors;
+	sp->stats.rx_length_errors += delta;
+	dev->stats.rx_length_errors += delta;
 
-		dev->stats.rx_packets += ring->rx_packets;
-		dev->stats.rx_bytes += ring->rx_bytes;
-	}
+	delta = le64_to_cpu(stats->rmac_fcs_err_frms) - sp->stats.rx_crc_errors;
+	sp->stats.rx_crc_errors += delta;
+	dev->stats.rx_crc_errors += delta;
 
 	return &dev->stats;
 }
@@ -7494,15 +7526,11 @@ static int rx_osm_handler(struct ring_info *ring_data, struct RxD_t * rxdp)
 		}
 	}
 
-	/* Updating statistics */
-	ring_data->rx_packets++;
 	rxdp->Host_Control = 0;
 	if (sp->rxd_mode == RXD_MODE_1) {
 		int len = RXD_GET_BUFFER0_SIZE_1(rxdp->Control_2);
 
-		ring_data->rx_bytes += len;
 		skb_put(skb, len);
-
 	} else if (sp->rxd_mode == RXD_MODE_3B) {
 		int get_block = ring_data->rx_curr_get_info.block_index;
 		int get_off = ring_data->rx_curr_get_info.offset;
@@ -7511,7 +7539,6 @@ static int rx_osm_handler(struct ring_info *ring_data, struct RxD_t * rxdp)
 		unsigned char *buff = skb_push(skb, buf0_len);
 
 		struct buffAdd *ba = &ring_data->ba[get_block][get_off];
-		ring_data->rx_bytes += buf0_len + buf2_len;
 		memcpy(buff, ba->ba_0, buf0_len);
 		skb_put(skb, buf2_len);
 	}
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 47c36e0..5e52c75 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -745,10 +745,6 @@ struct ring_info {
 
 	/* Buffer Address store. */
 	struct buffAdd **ba;
-
-	/* per-Ring statistics */
-	unsigned long rx_packets;
-	unsigned long rx_bytes;
 } ____cacheline_aligned;
 
 /* Fifo specific structure */


* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Peter Zijlstra @ 2010-07-02 18:11 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Tejun Heo, Oleg Nesterov, Michael S. Tsirkin, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <4C2E2987.9040702@us.ibm.com>

On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
>  
>  Does  it (Tejun's kthread_clone() patch) also  inherit the 
> cgroup of the caller?

Of course, it's a simple do_fork() which inherits everything just as you
would expect from a similar sys_clone()/sys_fork() call.


* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Sridhar Samudrala @ 2010-07-02 18:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Oleg Nesterov, Michael S. Tsirkin, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <1277996135.1917.198.camel@laptop>

On 7/1/2010 7:55 AM, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:53 +0200, Tejun Heo wrote:
>    
>> Hello,
>>
>> On 07/01/2010 04:46 PM, Oleg Nesterov wrote:
>>      
>>>> It might be a good idea to make the function take extra clone flags
>>>> but anyways once created cloned task can be treated the same way as
>>>> other kthreads, so nothing else needs to be changed.
>>>>          
>>> This makes kthread_stop() work. Otherwise the new thread is just
>>> the CLONE_VM child of the caller, and the caller is the user-mode
>>> task doing ioctl() ?
>>>        
>> Hmmm, indeed.  It makes the attribute inheritance work but circumvents
>> the whole reason there is kthreadd.
>>      
> I thought the whole reason there was kthreadd was to avoid the
> inheritance? So avoiding the avoiding of inheritance seems like the goal
> here, no?
>    
I think so. Does it (Tejun's kthread_clone() patch) also inherit the
cgroup of the caller, or do we still need the explicit
call to attach the thread to the current task's cgroup?

I am on vacation next week and cannot look into this until Jul 12. I hope
this will be resolved by then.
If not, I will look into it after I am back.

Thanks
Sridhar



* [PATCH 5/6] atm/suni.c: call atm_dev_signal_change() when signal changes.
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

Propagate changes to upper atm layer.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 drivers/atm/suni.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/atm/suni.c b/drivers/atm/suni.c
index da4b91f..41c56ea 100644
--- a/drivers/atm/suni.c
+++ b/drivers/atm/suni.c
@@ -291,8 +291,9 @@ static int suni_ioctl(struct atm_dev *dev,unsigned int cmd,void __user *arg)
 
 static void poll_los(struct atm_dev *dev)
 {
-	dev->signal = GET(RSOP_SIS) & SUNI_RSOP_SIS_LOSV ? ATM_PHY_SIG_LOST :
-	  ATM_PHY_SIG_FOUND;
+	atm_dev_signal_change(dev,
+		GET(RSOP_SIS) & SUNI_RSOP_SIS_LOSV ?
+		ATM_PHY_SIG_LOST : ATM_PHY_SIG_FOUND);
 }
 
 
-- 
1.7.1



* [PATCH 6/6] atm/adummy: add syfs DEVICE_ATTR to change signal
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 drivers/atm/adummy.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/drivers/atm/adummy.c b/drivers/atm/adummy.c
index 6d44f07..46b9476 100644
--- a/drivers/atm/adummy.c
+++ b/drivers/atm/adummy.c
@@ -40,6 +40,42 @@ struct adummy_dev {
 
 static LIST_HEAD(adummy_devs);
 
+static ssize_t __set_signal(struct device *dev,
+		struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct atm_dev *atm_dev = container_of(dev, struct atm_dev, class_dev);
+	int signal;
+
+	if (sscanf(buf, "%d", &signal) == 1) {
+
+		if (signal < ATM_PHY_SIG_LOST || signal > ATM_PHY_SIG_FOUND)
+			signal = ATM_PHY_SIG_UNKNOWN;
+
+		atm_dev_signal_change(atm_dev, signal);
+		return len;
+	}
+	return -EINVAL;
+}
+
+static ssize_t __show_signal(struct device *dev,
+	struct device_attribute *attr, char *buf)
+{
+	struct atm_dev *atm_dev = container_of(dev, struct atm_dev, class_dev);
+	return sprintf(buf, "%d\n", atm_dev->signal);
+}
+static DEVICE_ATTR(signal, 0644, __show_signal, __set_signal);
+
+static struct attribute *adummy_attrs[] = {
+	&dev_attr_signal.attr,
+	NULL
+};
+
+static struct attribute_group adummy_group_attrs = {
+	.name = NULL, /* We want them in dev's root folder */
+	.attrs = adummy_attrs
+};
+
 static int __init
 adummy_start(struct atm_dev *dev)
 {
@@ -128,6 +164,9 @@ static int __init adummy_init(void)
 	adummy_dev->atm_dev = atm_dev;
 	atm_dev->dev_data = adummy_dev;
 
+	if (sysfs_create_group(&atm_dev->class_dev.kobj, &adummy_group_attrs))
+		dev_err(&atm_dev->class_dev, "Could not register attrs for adummy\n");
+
 	if (adummy_start(atm_dev)) {
 		printk(KERN_ERR DEV_LABEL ": adummy_start() failed\n");
 		err = -ENODEV;
-- 
1.7.1



* [PATCH 1/6] atm: add hooks to propagate signal changes to netdevice
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

On DSL and ATM devices it's useful to know whether you have a carrier signal.
netdevice LOWER_UP changes can be propagated to userspace via a netlink monitor.
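
For reference, a userspace monitor could look for LOWER_UP roughly as
sketched below. This uses the standard rtnetlink interface
(NETLINK_ROUTE, RTMGRP_LINK, RTM_NEWLINK, IFF_LOWER_UP); the function
names link_monitor_open() and link_msg_lower_up() are made up for the
example, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if.h>

/* Open an rtnetlink socket subscribed to link notifications; -1 on error. */
static int link_monitor_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Inspect one netlink message: returns 1 if the link reports LOWER_UP,
 * 0 if it does not, and -1 if this is not a complete RTM_NEWLINK message.
 */
static int link_msg_lower_up(const struct nlmsghdr *nh)
{
	const struct ifinfomsg *ifi;

	assert(nh != NULL);
	if (nh->nlmsg_type != RTM_NEWLINK ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
		return -1;
	ifi = NLMSG_DATA(nh);
	return (ifi->ifi_flags & IFF_LOWER_UP) ? 1 : 0;
}
```

A monitor would recv() on the descriptor returned by link_monitor_open()
and pass each message in the received buffer to link_msg_lower_up().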

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 include/linux/atmdev.h |    5 +++++
 net/atm/common.c       |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
index 817b237..c6958ec 100644
--- a/include/linux/atmdev.h
+++ b/include/linux/atmdev.h
@@ -311,6 +311,7 @@ struct atm_vcc {
 	void (*pop)(struct atm_vcc *vcc,struct sk_buff *skb); /* optional */
 	int (*push_oam)(struct atm_vcc *vcc,void *cell);
 	int (*send)(struct atm_vcc *vcc,struct sk_buff *skb);
+	void (*signal_change)(struct atm_vcc *vcc); /* optional. to propagate LOWER_UP */
 	void		*dev_data;	/* per-device data */
 	void		*proto_data;	/* per-protocol data */
 	struct k_atm_aal_stats *stats;	/* pointer to AAL stats group */
@@ -431,6 +432,10 @@ struct atm_dev *atm_dev_register(const char *type,const struct atmdev_ops *ops,
     int number,unsigned long *flags); /* number == -1: pick first available */
 struct atm_dev *atm_dev_lookup(int number);
 void atm_dev_deregister(struct atm_dev *dev);
+/**
+* Propagate lower layer signal change in atm_dev->signal to netdevice.
+*/
+void atm_dev_signal_change(struct atm_dev *dev, char signal);
 void vcc_insert_socket(struct sock *sk);
 
 
diff --git a/net/atm/common.c b/net/atm/common.c
index b43feb1..ccf09f2 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -212,6 +212,39 @@ void vcc_release_async(struct atm_vcc *vcc, int reply)
 }
 EXPORT_SYMBOL(vcc_release_async);
 
+void atm_dev_signal_change(struct atm_dev *dev, char signal)
+{
+	int i;
+	pr_debug("%s signal=%d dev=%p number=%d dev->signal=%d\n",
+		__func__, signal, dev, dev->number, dev->signal);
+
+	if (dev->signal == signal)
+		return; /* no change */
+
+	dev->signal = signal;
+
+	read_lock_irq(&vcc_sklist_lock);
+	for (i = 0; i < VCC_HTABLE_SIZE; i++) {
+		struct hlist_head *head = &vcc_hash[i];
+		struct hlist_node *node, *tmp;
+		struct sock *s;
+		struct atm_vcc *vcc;
+		sk_for_each_safe(s, node, tmp, head) {
+			vcc = atm_sk(s);
+			pr_debug("%s signal=%d vcc=%p dev=%p vcc->dev=%p %d.%d itf=%d meta=%d\n",
+				__func__, signal, vcc, dev, vcc->dev,
+				vcc->vpi, vcc->vci, vcc->itf, test_bit(ATM_VF_META, &vcc->flags));
+			/* if there is a signal change callback and dev matches,
+				or if this is a meta dev (clip atm_dev is arpd) */
+			if (vcc->signal_change && (vcc->dev == dev
+				|| test_bit(ATM_VF_META, &vcc->flags))) {
+				vcc->signal_change(vcc);
+			}
+		}
+	}
+	read_unlock_irq(&vcc_sklist_lock);
+}
+EXPORT_SYMBOL(atm_dev_signal_change);
 
 void atm_dev_release_vccs(struct atm_dev *dev)
 {
-- 
1.7.1



* [PATCH 4/6] atm/solos-pci: call atm_dev_signal_change() when signal changes.
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

Propagate changes to the upper ATM layer, so that a userspace netlink monitor knows when DSL showtime is reached.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 drivers/atm/solos-pci.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index ded76c4..6174965 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -383,7 +383,7 @@ static int process_status(struct solos_card *card, int port, struct sk_buff *skb
 
 	/* Anything but 'Showtime' is down */
 	if (strcmp(state_str, "Showtime")) {
-		card->atmdev[port]->signal = ATM_PHY_SIG_LOST;
+		atm_dev_signal_change(card->atmdev[port], ATM_PHY_SIG_LOST);
 		release_vccs(card->atmdev[port]);
 		dev_info(&card->dev->dev, "Port %d: %s\n", port, state_str);
 		return 0;
@@ -401,7 +401,7 @@ static int process_status(struct solos_card *card, int port, struct sk_buff *skb
 		 snr[0]?", SNR ":"", snr, attn[0]?", Attn ":"", attn);
 	
 	card->atmdev[port]->link_rate = rate_down / 424;
-	card->atmdev[port]->signal = ATM_PHY_SIG_FOUND;
+	atm_dev_signal_change(card->atmdev[port], ATM_PHY_SIG_FOUND);
 
 	return 0;
 }
@@ -1246,7 +1246,7 @@ static int atm_init(struct solos_card *card)
 		card->atmdev[i]->ci_range.vci_bits = 16;
 		card->atmdev[i]->dev_data = card;
 		card->atmdev[i]->phy_data = (void *)(unsigned long)i;
-		card->atmdev[i]->signal = ATM_PHY_SIG_UNKNOWN;
+		atm_dev_signal_change(card->atmdev[i], ATM_PHY_SIG_UNKNOWN);
 
 		skb = alloc_skb(sizeof(*header), GFP_ATOMIC);
 		if (!skb) {
-- 
1.7.1



* [PATCH 0/6] atm:  propagate atm_dev signal carrier to LOWER_UP of netdevice
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto

In userspace it's helpful to know if a network device has a carrier signal.
Often it is monitored via netlink.  This patchset provides a way for the
struct atm_dev drivers to pass carrier on/off to the netdevice.

For DSL, carrier is on when the line has reached showtime state.

Currently this patchset only propagates the changes to br2684 vccs,
as this is the only type of hardware I have to test.
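
The propagation path can be modelled in userspace as shown below. This is
a simplified sketch with invented names (struct vdev, struct vconn,
dev_signal_change(), bridge_signal_change()): the model keeps a
per-device array of connections, whereas the real code walks the global
vcc hash table under vcc_sklist_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signal states, loosely following ATM_PHY_SIG_*. */
enum phy_signal { SIG_LOST, SIG_UNKNOWN, SIG_FOUND };

struct vdev;

/* Hypothetical stand-in for a VCC that sits on top of a device. */
struct vconn {
	struct vdev *dev;
	void (*signal_change)(struct vconn *vc);	/* optional */
	int carrier;	/* what the upper layer (a netdevice) would see */
};

/* Hypothetical stand-in for the ATM device. */
struct vdev {
	enum phy_signal signal;
	struct vconn **conns;
	size_t nr_conns;
};

/* Upper-layer callback: carrier is off only when the signal is lost. */
static void bridge_signal_change(struct vconn *vc)
{
	vc->carrier = vc->dev->signal != SIG_LOST;
}

/*
 * Record the new state and notify every connection that registered a
 * callback. Returns the number of connections that were notified.
 */
static size_t dev_signal_change(struct vdev *dev, enum phy_signal signal)
{
	size_t i, notified = 0;

	if (dev->signal == signal)
		return 0;	/* no change, nothing to propagate */

	dev->signal = signal;
	for (i = 0; i < dev->nr_conns; i++) {
		struct vconn *vc = dev->conns[i];

		if (vc->signal_change) {
			assert(vc->dev == dev);
			vc->signal_change(vc);
			notified++;
		}
	}
	return notified;
}
```

Connections without a callback are simply skipped, which is why only
br2684 vccs see the change with this patchset.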

If you prefer git you can pull from:
git://github.com/karlhiramoto/linux-2.6.git linux-atm

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>

Karl Hiramoto (6):
  atm: add hooks to propagate signal changes to netdevice
  atm br2684: add callback for carrier signal changes.
  atm/idt77105.c: call atm_dev_signal_change() when signal changes.
  atm/solos-pci: call atm_dev_signal_change() when signal changes.
  atm/suni.c: call atm_dev_signal_change() when signal changes.
  atm/adummy: add syfs DEVICE_ATTR to change signal

 drivers/atm/adummy.c    |   39 +++++++++++++++++++++++++++++++++++++++
 drivers/atm/idt77105.c  |   11 ++++++-----
 drivers/atm/solos-pci.c |    6 +++---
 drivers/atm/suni.c      |    5 +++--
 include/linux/atmdev.h  |    5 +++++
 net/atm/br2684.c        |   13 +++++++++++++
 net/atm/common.c        |   33 +++++++++++++++++++++++++++++++++
 7 files changed, 102 insertions(+), 10 deletions(-)



* [PATCH 2/6] atm br2684: add callback for carrier signal changes.
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

When a signal change event occurs call netif_carrier_on/off.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 net/atm/br2684.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 6719af6..e0136ec 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -448,6 +448,17 @@ free_skb:
 	dev_kfree_skb(skb);
 }
 
+static void br2684_signal_change(struct atm_vcc *atmvcc)
+{
+	struct br2684_vcc *brvcc = BR2684_VCC(atmvcc);
+	struct net_device *net_dev = brvcc->device;
+
+	if (atmvcc->dev->signal == ATM_PHY_SIG_LOST)
+		netif_carrier_off(net_dev);
+	else
+		netif_carrier_on(net_dev);
+}
+
 /*
  * Assign a vcc to a dev
  * Note: we do not have explicit unassign, but look at _push()
@@ -514,6 +525,7 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 	barrier();
 	atmvcc->push = br2684_push;
 	atmvcc->pop = br2684_pop;
+	atmvcc->signal_change = br2684_signal_change;
 
 	__skb_queue_head_init(&queue);
 	rq = &sk_atm(atmvcc)->sk_receive_queue;
@@ -530,6 +542,7 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 
 		br2684_push(atmvcc, skb);
 	}
+	br2684_signal_change(atmvcc); /* initialize netdev carrier state */
 	__module_get(THIS_MODULE);
 	return 0;
 
-- 
1.7.1



* [PATCH 3/6] atm/idt77105.c: call atm_dev_signal_change() when signal changes.
From: Karl Hiramoto @ 2010-07-02 17:47 UTC (permalink / raw)
  To: linux-atm-general, netdev, chas; +Cc: nathan, Karl Hiramoto
In-Reply-To: <1278092830-10473-1-git-send-email-karl@hiramoto.org>

Propagate changes to upper atm layer.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
---
 drivers/atm/idt77105.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/atm/idt77105.c b/drivers/atm/idt77105.c
index dab5cf5..bca9cb8 100644
--- a/drivers/atm/idt77105.c
+++ b/drivers/atm/idt77105.c
@@ -126,7 +126,7 @@ static void idt77105_restart_timer_func(unsigned long dummy)
                 istat = GET(ISTAT); /* side effect: clears all interrupt status bits */
                 if (istat & IDT77105_ISTAT_GOODSIG) {
                     /* Found signal again */
-                    dev->signal = ATM_PHY_SIG_FOUND;
+                    atm_dev_signal_change(dev, ATM_PHY_SIG_FOUND);
 	            printk(KERN_NOTICE "%s(itf %d): signal detected again\n",
                         dev->type,dev->number);
                     /* flush the receive FIFO */
@@ -222,7 +222,7 @@ static void idt77105_int(struct atm_dev *dev)
             /* Rx Signal Condition Change - line went up or down */
             if (istat & IDT77105_ISTAT_GOODSIG) {   /* signal detected again */
                 /* This should not happen (restart timer does it) but JIC */
-                dev->signal = ATM_PHY_SIG_FOUND;
+		atm_dev_signal_change(dev, ATM_PHY_SIG_FOUND);
             } else {    /* signal lost */
                 /*
                  * Disable interrupts and stop all transmission and
@@ -235,7 +235,7 @@ static void idt77105_int(struct atm_dev *dev)
                     IDT77105_MCR_DRIC|
                     IDT77105_MCR_HALTTX
                     ) & ~IDT77105_MCR_EIP, MCR);
-                dev->signal = ATM_PHY_SIG_LOST;
+		atm_dev_signal_change(dev, ATM_PHY_SIG_LOST);
 	        printk(KERN_NOTICE "%s(itf %d): signal lost\n",
                     dev->type,dev->number);
             }
@@ -272,8 +272,9 @@ static int idt77105_start(struct atm_dev *dev)
 	memset(&PRIV(dev)->stats,0,sizeof(struct idt77105_stats));
         
         /* initialise dev->signal from Good Signal Bit */
-        dev->signal = GET(ISTAT) & IDT77105_ISTAT_GOODSIG ? ATM_PHY_SIG_FOUND :
-	  ATM_PHY_SIG_LOST;
+	atm_dev_signal_change(dev,
+		GET(ISTAT) & IDT77105_ISTAT_GOODSIG ?
+		ATM_PHY_SIG_FOUND : ATM_PHY_SIG_LOST);
 	if (dev->signal == ATM_PHY_SIG_LOST)
 		printk(KERN_WARNING "%s(itf %d): no signal\n",dev->type,
 		    dev->number);
-- 
1.7.1


^ permalink raw reply related

* [PATCHv2] xfrm: fix xfrm by MARK logic
From: Peter Kosyh @ 2010-07-02 17:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

From: Peter Kosyh <p.kosyh@gmail.com>

While using the xfrm-by-MARK feature in 2.6.34 - 2.6.35 kernels, the mark
is always cleared in the flowi structure by the memset in
_decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails.
The IPv6 code (_decode_session6) is affected by this bug too.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
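For illustration only (not part of the patch): a minimal user-space sketch
of the bug described above. The names mock_skb, mock_flowi,
mock_policy_mark, decode_session_old, decode_session_new and mark_matches
are hypothetical stand-ins for the kernel structures and functions, and the
mask-and-compare match is a simplifying assumption about how a policy
lookup uses the mark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for sk_buff and flowi; only a few fields are modeled. */
struct mock_skb   { uint32_t mark; };
struct mock_flowi { uint32_t mark; uint32_t daddr; uint32_t saddr; };

/* Hypothetical policy selector: a value and a mask applied to the flow mark. */
struct mock_policy_mark { uint32_t v; uint32_t m; };

/* Pre-patch behavior: memset() zeroes every field, so the mark is lost. */
static void decode_session_old(const struct mock_skb *skb, struct mock_flowi *fl)
{
	assert(skb != NULL && fl != NULL);
	memset(fl, 0, sizeof(*fl));
}

/* Patched behavior: restore the mark from the packet right after clearing. */
static void decode_session_new(const struct mock_skb *skb, struct mock_flowi *fl)
{
	assert(skb != NULL && fl != NULL);
	memset(fl, 0, sizeof(*fl));
	fl->mark = skb->mark;
}

/* Assumed match rule: the masked flow mark must equal the policy value. */
static int mark_matches(const struct mock_policy_mark *pm, const struct mock_flowi *fl)
{
	return (fl->mark & pm->m) == pm->v;
}
```

With a packet mark of 0x10 and a policy of value 0x10 / mask 0xff, the old
decode leaves fl.mark at 0 and the match fails, while the patched decode
keeps 0x10 and the match succeeds.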

diff -uprN linux-2.6.35-rc3.orig/net/ipv4/xfrm4_policy.c linux-2.6.35-rc3/net/ipv4/xfrm4_policy.c
--- linux-2.6.35-rc3.orig/net/ipv4/xfrm4_policy.c	2010-06-12 06:14:04.000000000 +0400
+++ linux-2.6.35-rc3/net/ipv4/xfrm4_policy.c	2010-07-02 20:20:49.000000000 +0400
@@ -108,6 +108,8 @@ _decode_session4(struct sk_buff *skb, st
 	u8 *xprth = skb_network_header(skb) + iph->ihl * 4;
 
 	memset(fl, 0, sizeof(struct flowi));
+	fl->mark = skb->mark;
+
 	if (!(iph->frag_off & htons(IP_MF | IP_OFFSET))) {
 		switch (iph->protocol) {
 		case IPPROTO_UDP:
diff -uprN linux-2.6.35-rc3.orig/net/ipv6/xfrm6_policy.c linux-2.6.35-rc3/net/ipv6/xfrm6_policy.c
--- linux-2.6.35-rc3.orig/net/ipv6/xfrm6_policy.c	2010-06-12 06:14:04.000000000 +0400
+++ linux-2.6.35-rc3/net/ipv6/xfrm6_policy.c	2010-07-02 20:20:22.000000000 +0400
@@ -124,6 +124,8 @@ _decode_session6(struct sk_buff *skb, st
 	u8 nexthdr = nh[IP6CB(skb)->nhoff];
 
 	memset(fl, 0, sizeof(struct flowi));
+	fl->mark = skb->mark;
+
 	ipv6_addr_copy(&fl->fl6_dst, reverse ? &hdr->saddr : &hdr->daddr);
 	ipv6_addr_copy(&fl->fl6_src, reverse ? &hdr->daddr : &hdr->saddr);
 

^ permalink raw reply

* [PATCH net-2.6] net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined
From: Ben Hutchings @ 2010-07-02 17:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers, Joe Perches

netif_vdbg() was originally defined as entirely equivalent to
netdev_vdbg(), but I assume that it was intended to take the same
parameters as netif_dbg() etc.  (Currently it is only used by the
sfc driver, in which I worked on that assumption.)

In commit a4ed89c I changed the definition used when VERBOSE_DEBUG is
not defined, but I failed to notice that the definition used when
VERBOSE_DEBUG is defined was also not as I expected.  Change that to
match netif_dbg() as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
Since there are no users of this macro in net-2.6, this should be safe
to change.  If you don't want to make this change in net-2.6 then please
revert a4ed89c in net-2.6 and apply this in net-next-2.6 so that the two
alternate definitions are consistent in each tree.

Ben.
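
For illustration only: a user-space sketch of why the two parameter lists
matter. All mock_* names and the message-level bit values are hypothetical
stand-ins, not the definitions in include/linux/netdevice.h. The
device-level print takes (dev, format, ...), while the filtered variant
takes (priv, type, dev, format, ...), so the verbose alias only accepts the
same arguments if it points at the filtered variant.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message-level bits; the values are illustrative. */
#define MOCK_MSG_DRV  0x0001u
#define MOCK_MSG_LINK 0x0004u

struct mock_net_device { const char *name; };
struct mock_priv { unsigned int msg_enable; };

#define mock_msg_drv(p)  ((p)->msg_enable & MOCK_MSG_DRV)
#define mock_msg_link(p) ((p)->msg_enable & MOCK_MSG_LINK)

static int log_count;       /* number of messages emitted so far */
static char last_msg[128];  /* most recent formatted message */

/* Device-level debug print: takes (dev, format, ...). */
static void mock_netdev_dbg(const struct mock_net_device *dev, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(dev != NULL && fmt != NULL);
	n = snprintf(last_msg, sizeof(last_msg), "%s: ", dev->name);
	if (n < 0 || (size_t)n >= sizeof(last_msg))
		n = 0;  /* prefix did not fit: drop it */
	va_start(ap, fmt);
	vsnprintf(last_msg + n, sizeof(last_msg) - (size_t)n, fmt, ap);
	va_end(ap);
	log_count++;
}

/* Filtered print: takes (priv, type, dev, format, ...). */
#define mock_netif_dbg(priv, type, dev, ...)		\
do {							\
	if (mock_msg_##type(priv))			\
		mock_netdev_dbg(dev, __VA_ARGS__);	\
} while (0)

/*
 * Aliasing the verbose variant to mock_netdev_dbg would reject callers
 * that pass (priv, type, dev, ...); aliasing it to mock_netif_dbg keeps
 * the parameter lists identical.
 */
#define mock_netif_vdbg mock_netif_dbg
```

A call such as mock_netif_vdbg(&p, drv, &d, "ring %d\n", 3) is emitted only
when the drv bit is set in p.msg_enable.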

 include/linux/netdevice.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fa5e5a..f823fd8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2352,7 +2352,7 @@ do {								\
 #endif
 
 #if defined(VERBOSE_DEBUG)
-#define netif_vdbg	netdev_dbg
+#define netif_vdbg	netif_dbg
 #else
 #define netif_vdbg(priv, type, dev, format, args...)		\
 ({								\
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* Re: [PATCH] xfrm bugs with mark logic
From: Eric Dumazet @ 2010-07-02 16:59 UTC (permalink / raw)
  To: Peter Kosyh; +Cc: netdev
In-Reply-To: <20100702162403.GA10809@myhost>

On Friday, 02 July 2010 at 20:24 +0400, Peter Kosyh wrote:
> > 
> > Hi Peter
> > 
> > The XFRMA_MARK part is already in the net-2.6 tree:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=4efd7e833591721bec21cc4730a7f6261417840f
> > 
> > Please submit another patch for the second problem you spotted.
> > 
> 
> Yes, I see. Here it is, but I am a little unsure about the IPv6 fix.
> 

Seems fine, but please read Documentation/SubmittingPatches to submit an
official patch.

Thanks !



^ permalink raw reply

* Re: [PATCH net-next-2.6 1/3] ethtool: Change ethtool_op_set_flags to validate flags
From: Randy Dunlap @ 2010-07-02 16:55 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Amit Salecha, linux-net-drivers, Dimitris Michailidis,
	Stanislaw Gruszka, Amerigo Wang, Jeff, e1000-devel, netdev,
	Anirban Chakraborty, Garzik, Vasanthy Kolluri, Brice Goglin,
	Andrew Gallatin, Scott Feldman, Stephen Hemminger, David Miller,
	Lennert Buytenhek, Roopa Prabhu
In-Reply-To: <1277901872.2082.10.camel@achroite.uk.solarflarecom.com>

On Wed, 30 Jun 2010 13:44:32 +0100 Ben Hutchings wrote:

> ethtool_op_set_flags() does not check for unsupported flags, and has
> no way of doing so.  This means it is not suitable for use as a
> default implementation of ethtool_ops::set_flags.
> 
> Add a 'supported' parameter specifying the flags that the driver and
> hardware support, validate the requested flags against this, and
> change all current callers to pass this parameter.
> 
> Change some other trivial implementations of ethtool_ops::set_flags to
> call ethtool_op_set_flags().
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
> Acked-by: Jeff Garzik <jgarzik@redhat.com>
> ---
>  drivers/net/cxgb4/cxgb4_main.c    |    9 +--------
>  drivers/net/enic/enic_main.c      |    1 -
>  drivers/net/ixgbe/ixgbe_ethtool.c |    5 ++++-
>  drivers/net/mv643xx_eth.c         |    7 ++++++-
>  drivers/net/myri10ge/myri10ge.c   |   10 +++++++---
>  drivers/net/niu.c                 |    9 +--------
>  drivers/net/sfc/ethtool.c         |    5 +----
>  drivers/net/sky2.c                |   16 ++++++----------
>  include/linux/ethtool.h           |    2 +-
>  net/core/ethtool.c                |   28 +++++-----------------------
>  10 files changed, 32 insertions(+), 60 deletions(-)
> 

> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 2c8af09..084ddb3 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -457,7 +457,7 @@ int ethtool_op_set_tso(struct net_device *dev, u32 data);
>  u32 ethtool_op_get_ufo(struct net_device *dev);
>  int ethtool_op_set_ufo(struct net_device *dev, u32 data);
>  u32 ethtool_op_get_flags(struct net_device *dev);
> -int ethtool_op_set_flags(struct net_device *dev, u32 data);
> +int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported);

That one-line change is missing from linux-next-20100702, causing:

drivers/infiniband/ulp/ipoib/ipoib_ethtool.c:157: warning: initialization from incompatible pointer type


but the change (below) to net/core/ethtool.c is merged.
I don't quite see how this happened...


>  void ethtool_ntuple_flush(struct net_device *dev);
>  
>  /**
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index a0f4964..5d42fae 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -144,31 +144,13 @@ u32 ethtool_op_get_flags(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(ethtool_op_get_flags);
>  
> -int ethtool_op_set_flags(struct net_device *dev, u32 data)
> +int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported)
>  {
> -	const struct ethtool_ops *ops = dev->ethtool_ops;
> -	unsigned long features = dev->features;
> -
> -	if (data & ETH_FLAG_LRO)
> -		features |= NETIF_F_LRO;
> -	else
> -		features &= ~NETIF_F_LRO;
> -
> -	if (data & ETH_FLAG_NTUPLE) {
> -		if (!ops->set_rx_ntuple)
> -			return -EOPNOTSUPP;
> -		features |= NETIF_F_NTUPLE;
> -	} else {
> -		/* safe to clear regardless */
> -		features &= ~NETIF_F_NTUPLE;
> -	}
> -
> -	if (data & ETH_FLAG_RXHASH)
> -		features |= NETIF_F_RXHASH;
> -	else
> -		features &= ~NETIF_F_RXHASH;
> +	if (data & ~supported)
> +		return -EINVAL;
>  
> -	dev->features = features;
> +	dev->features = ((dev->features & ~flags_dup_features) |
> +			 (data & flags_dup_features));
>  	return 0;
>  }
>  EXPORT_SYMBOL(ethtool_op_set_flags);
> -- 

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


^ permalink raw reply

* Re: iwl3945: HARDWARE GONE??
From: Stephen Hemminger @ 2010-07-02 16:48 UTC (permalink / raw)
  To: John W. Linville; +Cc: Priit Laes, netdev, linux-kernel
In-Reply-To: <20100702162650.GC2381@tuxdriver.com>

On Fri, 2 Jul 2010 12:26:50 -0400
"John W. Linville" <linville@tuxdriver.com> wrote:

> On Fri, Jul 02, 2010 at 07:02:55PM +0300, Priit Laes wrote:
> > Heya!
> > 
> > Bumped my kernel to version 2.6.35-rc3-00391-g97e0214 and ran into
> > HARDWARE GONE error message..
> > 
> > Hardware is Lenovo x60s and wireless card is intel 3945:
> > 
> > 03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
> > 	Subsystem: Intel Corporation ThinkPad R60e/X60s
> > 	Flags: bus master, fast devsel, latency 0, IRQ 47
> > 	Memory at edf00000 (32-bit, non-prefetchable) [size=4K]
> > 	Capabilities: [c8] Power Management version 2
> > 	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > 	Capabilities: [e0] Express Legacy Endpoint, MSI 00
> > 	Capabilities: [100] Advanced Error Reporting
> > 	Capabilities: [140] Device Serial Number 00-xx-xx-xx-xx-xx-xx-xx
> > 	Kernel driver in use: iwl3945
> > 	Kernel modules: iwl3945
> > 
> > I have attached dmesg error message...
> > 
> > Cheers,
> > Priit :)
> 
> > [13813.347617] Uhhuh. NMI received for unknown reason a1 on CPU 0.
> > [13813.347617] You have some hardware problem, likely on the PCI bus.
> > [13813.347617] Dazed and confused, but trying to continue
> > [13813.347617] iwl3945 0000:03:00.0: HARDWARE GONE?? INTA == 0xffffffff
> 
> We've been seeing this sort of thing a lot -- somehow the iwl3945 gets
> disconnected from the PCI bus.  Anyone have any clue how that happens?

Usually this kind of problem is a power management issue.

^ permalink raw reply

* soft lockup with conntrackd / keepalived / VLAN
From: Adam Gundy @ 2010-07-02 16:04 UTC (permalink / raw)
  To: netdev

I've built a pair of router boxes which use keepalived and conntrackd to
provide a redundant router setup. We're also using a single 802.1Q VLAN on the
box.

Occasionally, the box will lock up for 5 minutes, during which time routed
traffic is extremely delayed (2 or 3 second ping times). Initially, there were
no log messages about the lockup. We switched away from the internal nvidia
(forcedeth) NIC in the belief that it may have been causing the problem.
However, with the new gigabit NICs we still see the hangs, but we also get
this in the kernel log:

> Jul  2 07:50:12 cerberus1 kernel: [31895.510006] BUG: soft lockup - CPU#0 stuck for 61s! [conntrackd:1951]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] Modules linked in: authenc xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_tr
> ansport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel af_key cls_u32 sch_sfq sch_htb nf_conntrack_netlink nfnetli
> nk deflate zlib_deflate ctr camellia cast5 rmd160 sha1_generic crypto_null ccm serpent blowfish twofish twofish_common xcbc sha256_generic sha512_generic des_generi
> c cryptd aes_x86_64 aes_generic tunnel4 xfrm_ipcomp tunnel6 xt_MARK xt_tcpudp xt_esp ipt_ah xt_TCPMSS xt_HL xt_DSCP ipt_MASQUERADE ipt_REDIRECT ipt_LOG ipt_REJECT x
> t_mac xt_length xt_hl xt_dscp xt_tcpmss nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_ftp nf_conntrack_ftp ip_queue iptable_mangle ip
> table_filter xt_mark xt_recent xt_iprange xt_multiport xt_state xt_limit xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_t
> ables 8021q fbcon tileblit font bitblit garp s
> Jul  2 07:50:17 cerberus1 kernel: oftcursor stp vga16fb vgastate lp shpchp sis_agp parport 8139too 8139cp r8169 mii sata_sis [last unloaded: af_key]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] CPU 0:
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] Modules linked in: authenc xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_tr
> ansport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel af_key cls_u32 sch_sfq sch_htb nf_conntrack_netlink nfnetli
> nk deflate zlib_deflate ctr camellia cast5 rmd160 sha1_generic crypto_null ccm serpent blowfish twofish twofish_common xcbc sha256_generic sha512_generic des_generi
> c cryptd aes_x86_64 aes_generic tunnel4 xfrm_ipcomp tunnel6 xt_MARK xt_tcpudp xt_esp ipt_ah xt_TCPMSS xt_HL xt_DSCP ipt_MASQUERADE ipt_REDIRECT ipt_LOG ipt_REJECT x
> t_mac xt_length xt_hl xt_dscp xt_tcpmss nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_ftp nf_conntrack_ftp ip_queue iptable_mangle ip
> table_filter xt_mark xt_recent xt_iprange xt_multiport xt_state xt_limit xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_t
> ables 8021q fbcon tileblit font bitblit garp s
> Jul  2 07:50:17 cerberus1 kernel: oftcursor stp vga16fb vgastate lp shpchp sis_agp parport 8139too 8139cp r8169 mii sata_sis [last unloaded: af_key]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] Pid: 1951, comm: conntrackd Not tainted 2.6.32-23-server #37-Ubuntu ps6002
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] RIP: 0010:[<ffffffff814e9f52>]  [<ffffffff814e9f52>] __xfrm4_find_bundle+0x52/0xc0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] RSP: 0018:ffff880001c03a88  EFLAGS: 00000282
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] RAX: 0000000000000000 RBX: ffff880001c03ab8 RCX: ffff880001c00000
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] RDX: 000000180c2813ac RSI: ffff880073f9e800 RDI: ffff880073f9e828
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] RBP: ffffffff81013cb3 R08: 0000000000000000 R09: ffff880001c03c44
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001c03a00
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] R13: ffff880001c03c08 R14: ffff88000c299980 R15: ffffffff8155effc
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] FS:  00007fdec3db5700(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] CR2: 0000000000f1d078 CR3: 00000000715c0000 CR4: 00000000000006f0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006] Call Trace:
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  <IRQ>  [<ffffffff814e9f28>] ? __xfrm4_find_bundle+0x28/0xc0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814ee91e>] ? __xfrm_lookup+0x14e/0x4e0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8146c591>] ? __skb_checksum_complete+0x11/0x20
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814ee71d>] ? __xfrm_policy_check+0x57d/0x630
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814e9ee5>] ? _decode_session4+0x245/0x260
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814eecc6>] ? xfrm_lookup+0x16/0x40
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814eed5b>] ? __xfrm_route_forward+0x6b/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a355d>] ? ip_forward+0x2ed/0x420
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a15ed>] ? ip_rcv_finish+0x12d/0x440
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a1b75>] ? ip_rcv+0x275/0x360
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8153043d>] ? packet_rcv_spkt+0x4d/0x190
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8147236a>] ? netif_receive_skb+0x38a/0x5d0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81534be0>] ? __vlan_hwaccel_rx+0x140/0x230
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa0012596>] ? rtl8169_rx_interrupt+0x216/0x5b0 [r8169]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa0012aad>] ? rtl8169_poll+0x3d/0x270 [r8169]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81472e5f>] ? net_rx_action+0x10f/0x250
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8106e2f7>] ? __do_softirq+0xb7/0x1e0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff810142ec>] ? call_softirq+0x1c/0x30
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  <EOI>  [<ffffffff81015cb5>] ? do_softirq+0x65/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8106e77a>] ? local_bh_enable+0x9a/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00b6185>] ? ipt_do_table+0x295/0x678 [ip_tables]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8106e77a>] ? local_bh_enable+0x9a/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00b6185>] ? ipt_do_table+0x295/0x678 [ip_tables]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8110ce7b>] ? __krealloc+0x3b/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00cb983>] ? __nf_ct_ext_add+0x143/0x180 [nf_conntrack]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff811330fd>] ? ksize+0x1d/0xd0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8110ce7b>] ? __krealloc+0x3b/0xa0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00f91d4>] ? nf_nat_rule_find+0x24/0x80 [iptable_nat]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00f9469>] ? nf_nat_fn+0x109/0x1b0 [iptable_nat]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffffa00f9558>] ? nf_nat_local_fn+0x48/0xe0 [iptable_nat]
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814997cc>] ? nf_iterate+0x6c/0xb0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a4b40>] ? dst_output+0x0/0x20
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81499884>] ? nf_hook_slow+0x74/0x100
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a4b40>] ? dst_output+0x0/0x20
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a6c9f>] ? __ip_local_out+0x9f/0xb0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a6cc6>] ? ip_local_out+0x16/0x30
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814a6f72>] ? ip_push_pending_frames+0x292/0x420
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814c9210>] ? udp_push_pending_frames+0x170/0x410
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814c9ef2>] ? udp_sendmsg+0x4e2/0x850
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814d2249>] ? inet_sendmsg+0x29/0x60
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff812862ed>] ? aa_revalidate_sk+0x6d/0x90
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81283bb9>] ? apparmor_socket_sendmsg+0x19/0x20
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff814618ab>] ? sock_sendmsg+0x10b/0x140
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8113f3d5>] ? __mem_cgroup_try_charge+0x55/0x1f0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff810fa819>] ? __alloc_pages_nodemask+0xd9/0x180
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81461a65>] ? sys_sendto+0x125/0x180
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff81115d7f>] ? handle_mm_fault+0x31f/0x3c0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff8155c883>] ? do_page_fault+0x153/0x3b0
> Jul  2 07:50:17 cerberus1 kernel: [31895.510006]  [<ffffffff810131b2>] ? system_call_fastpath+0x16/0x1b


Note: the second machine in the cluster is currently turned off. The behavior
is the same with or without it.

I'm seriously inclined to think that the VLAN is a big factor:
the reason for the VLAN is to separate VOIP phones (nothing else is on that
VLAN). This morning, it was possible to trigger the lockup by placing a call.
As soon as the call was hung up, everything returned to normal.

It's not exactly repeatable. What seems to happen is that pings through the
machine get slower and slower over the course of a few hours (from, say, 15ms
to 50ms), then all of a sudden the machine will throw a five-minute 'fit'.

PS: this is an Ubuntu Lucid kernel, 2.6.32. I'm working on a stock kernel to
see if it still happens.

^ permalink raw reply

* Re: iwl3945: HARDWARE GONE??
From: John W. Linville @ 2010-07-02 16:26 UTC (permalink / raw)
  To: Priit Laes; +Cc: netdev, linux-kernel
In-Reply-To: <1278086575.2889.8.camel@chi>

On Fri, Jul 02, 2010 at 07:02:55PM +0300, Priit Laes wrote:
> Heya!
> 
> Bumped my kernel to version 2.6.35-rc3-00391-g97e0214 and ran into
> HARDWARE GONE error message..
> 
> Hardware is Lenovo x60s and wireless card is intel 3945:
> 
> 03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
> 	Subsystem: Intel Corporation ThinkPad R60e/X60s
> 	Flags: bus master, fast devsel, latency 0, IRQ 47
> 	Memory at edf00000 (32-bit, non-prefetchable) [size=4K]
> 	Capabilities: [c8] Power Management version 2
> 	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 	Capabilities: [e0] Express Legacy Endpoint, MSI 00
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [140] Device Serial Number 00-xx-xx-xx-xx-xx-xx-xx
> 	Kernel driver in use: iwl3945
> 	Kernel modules: iwl3945
> 
> I have attached dmesg error message...
> 
> Cheers,
> Priit :)

> [13813.347617] Uhhuh. NMI received for unknown reason a1 on CPU 0.
> [13813.347617] You have some hardware problem, likely on the PCI bus.
> [13813.347617] Dazed and confused, but trying to continue
> [13813.347617] iwl3945 0000:03:00.0: HARDWARE GONE?? INTA == 0xffffffff

We've been seeing this sort of thing a lot -- somehow the iwl3945 gets
disconnected from the PCI bus.  Anyone have any clue how that happens?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

