Netdev List

* [PATCH 3/5] inet: Add family scope inetpeer flushes.
From: David Miller @ 2012-06-11  9:29 UTC
  To: netdev


This implementation can deal with having many inetpeer roots, which is
a necessary prerequisite for per-FIB table rooted peer tables.

Each family (AF_INET, AF_INET6) has a sequence number which we bump
when we get a family invalidation request.

Each peer lookup cheaply checks whether the flush sequence of the
root we are using is out of date, and if so flushes it and updates
the sequence number.
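
For readers who want the lazy-flush idea in isolation, here is a
minimal userspace sketch in C.  It is not the kernel code: struct
peer_base, family_seq, peer_base_flush() and the other names are
hypothetical stand-ins, C11 atomics replace atomic_t, only one family
is modelled, and the sketch reads the counter once per check where the
patch reads it twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct inet_peer_base. */
struct peer_base {
	int      total;     /* number of cached entries */
	uint32_t flush_seq; /* family sequence seen at the last flush */
};

/* One global sequence per family; only one family is modelled here. */
static atomic_uint family_seq;

void peer_base_init(struct peer_base *bp)
{
	bp->total = 0;
	bp->flush_seq = ~0U; /* differs from the initial family_seq of 0 */
}

/* Stand-in for inetpeer_invalidate_tree(): drop every cached entry. */
void peer_base_flush(struct peer_base *bp)
{
	bp->total = 0;
}

/* Run at the start of every lookup.  Returns 1 if the root was flushed. */
int flush_check(struct peer_base *bp)
{
	uint32_t cur;

	assert(bp != NULL);
	cur = atomic_load(&family_seq);
	if (bp->flush_seq == cur)
		return 0;
	peer_base_flush(bp);
	bp->flush_seq = cur;
	return 1;
}

/* Family-wide invalidation: one atomic increment, no tree is touched. */
void invalidate_family(void)
{
	atomic_fetch_add(&family_seq, 1);
}
```

In this model, invalidating a family costs a single increment no matter
how many roots exist, and each root pays for its own flush on its next
lookup.  Because flush_seq starts at ~0U while the family counter
starts at 0, a freshly initialised root flushes on its first lookup.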

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/inetpeer.h |    2 ++
 net/ipv4/inetpeer.c    |   28 ++++++++++++++++++++++++++++
 net/ipv4/route.c       |    2 +-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index d432489..e15c086 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -68,6 +68,7 @@ struct inet_peer {
 struct inet_peer_base {
 	struct inet_peer __rcu	*root;
 	seqlock_t		lock;
+	u32			flush_seq;
 	int			total;
 };
 
@@ -168,6 +169,7 @@ extern void inet_putpeer(struct inet_peer *p);
 extern bool inet_peer_xrlim_allow(struct inet_peer *peer, int timeout);
 
 extern void inetpeer_invalidate_tree(struct inet_peer_base *);
+extern void inetpeer_invalidate_family(int family);
 
 /*
  * temporary check to make sure we dont access rid, ip_id_count, tcp_ts,
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index e4cba56..cac02ad 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -86,10 +86,36 @@ void inet_peer_base_init(struct inet_peer_base *bp)
 {
 	bp->root = peer_avl_empty_rcu;
 	seqlock_init(&bp->lock);
+	bp->flush_seq = ~0U;
 	bp->total = 0;
 }
 EXPORT_SYMBOL_GPL(inet_peer_base_init);
 
+static atomic_t v4_seq = ATOMIC_INIT(0);
+static atomic_t v6_seq = ATOMIC_INIT(0);
+
+static atomic_t *inetpeer_seq_ptr(int family)
+{
+	return (family == AF_INET ? &v4_seq : &v6_seq);
+}
+
+static inline void flush_check(struct inet_peer_base *base, int family)
+{
+	atomic_t *fp = inetpeer_seq_ptr(family);
+
+	if (unlikely(base->flush_seq != atomic_read(fp))) {
+		inetpeer_invalidate_tree(base);
+		base->flush_seq = atomic_read(fp);
+	}
+}
+
+void inetpeer_invalidate_family(int family)
+{
+	atomic_t *fp = inetpeer_seq_ptr(family);
+
+	atomic_inc(fp);
+}
+
 #define PEER_MAXDEPTH 40 /* sufficient for about 2^27 nodes */
 
 /* Exported for sysctl_net_ipv4.  */
@@ -437,6 +463,8 @@ struct inet_peer *inet_getpeer(struct inet_peer_base *base,
 	unsigned int sequence;
 	int invalidated, gccnt = 0;
 
+	flush_check(base, daddr->family);
+
 	/* Attempt a lockless lookup first.
 	 * Because of a concurrent writer, we might not find an existing entry.
 	 */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4f5834c..456a947 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -935,7 +935,7 @@ static void rt_cache_invalidate(struct net *net)
 
 	get_random_bytes(&shuffle, sizeof(shuffle));
 	atomic_add(shuffle + 1U, &net->ipv4.rt_genid);
-	inetpeer_invalidate_tree(net->ipv4.peers);
+	inetpeer_invalidate_family(AF_INET);
 }
 
 /*
-- 
1.7.10

* [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: David Miller @ 2012-06-11  9:29 UTC
  To: netdev


There is zero point to this function.

Its only real substance is to perform an extremely outdated BSD4.2
ICMP check, which we can safely remove.  If you really have an MTU
limited link being routed by a BSD4.2 derived system, here's a nickel,
go buy yourself a real router.

The other actions of ip_rt_frag_needed(), checking and conditionally
updating the peer, are done by the per-protocol handlers of the ICMP
event.

TCP, UDP, et al. have a handler which will receive this event and
transmit it back into the associated route via dst_ops->update_pmtu().

This simplification is important, because it eliminates the one place
where we do not have a proper route context in which to make an
inetpeer lookup.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h  |    2 --
 net/ipv4/icmp.c      |    4 +---
 net/ipv4/route.c     |   61 --------------------------------------------------
 net/rxrpc/ar-error.c |    4 ----
 4 files changed, 1 insertion(+), 70 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6340c37..cc693a5 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -215,8 +215,6 @@ static inline int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 s
 	return ip_route_input_common(skb, dst, src, tos, devin, true);
 }
 
-extern unsigned short	ip_rt_frag_needed(struct net *net, const struct iphdr *iph,
-					  unsigned short new_mtu, struct net_device *dev);
 extern void		ip_rt_send_redirect(struct sk_buff *skb);
 
 extern unsigned int		inet_addr_type(struct net *net, __be32 addr);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 0c78ef1..e1caa1a 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -673,9 +673,7 @@ static void icmp_unreach(struct sk_buff *skb)
 				LIMIT_NETDEBUG(KERN_INFO pr_fmt("%pI4: fragmentation needed and DF set\n"),
 					       &iph->daddr);
 			} else {
-				info = ip_rt_frag_needed(net, iph,
-							 ntohs(icmph->un.frag.mtu),
-							 skb->dev);
+				info = ntohs(icmph->un.frag.mtu);
 				if (!info)
 					goto out;
 			}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 03e5b61..4f5834c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1664,67 +1664,6 @@ out:	kfree_skb(skb);
 	return 0;
 }
 
-/*
- *	The last two values are not from the RFC but
- *	are needed for AMPRnet AX.25 paths.
- */
-
-static const unsigned short mtu_plateau[] =
-{32000, 17914, 8166, 4352, 2002, 1492, 576, 296, 216, 128 };
-
-static inline unsigned short guess_mtu(unsigned short old_mtu)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(mtu_plateau); i++)
-		if (old_mtu > mtu_plateau[i])
-			return mtu_plateau[i];
-	return 68;
-}
-
-unsigned short ip_rt_frag_needed(struct net *net, const struct iphdr *iph,
-				 unsigned short new_mtu,
-				 struct net_device *dev)
-{
-	unsigned short old_mtu = ntohs(iph->tot_len);
-	unsigned short est_mtu = 0;
-	struct inet_peer *peer;
-
-	peer = inet_getpeer_v4(net->ipv4.peers, iph->daddr, 1);
-	if (peer) {
-		unsigned short mtu = new_mtu;
-
-		if (new_mtu < 68 || new_mtu >= old_mtu) {
-			/* BSD 4.2 derived systems incorrectly adjust
-			 * tot_len by the IP header length, and report
-			 * a zero MTU in the ICMP message.
-			 */
-			if (mtu == 0 &&
-			    old_mtu >= 68 + (iph->ihl << 2))
-				old_mtu -= iph->ihl << 2;
-			mtu = guess_mtu(old_mtu);
-		}
-
-		if (mtu < ip_rt_min_pmtu)
-			mtu = ip_rt_min_pmtu;
-		if (!peer->pmtu_expires || mtu < peer->pmtu_learned) {
-			unsigned long pmtu_expires;
-
-			pmtu_expires = jiffies + ip_rt_mtu_expires;
-			if (!pmtu_expires)
-				pmtu_expires = 1UL;
-
-			est_mtu = mtu;
-			peer->pmtu_learned = mtu;
-			peer->pmtu_expires = pmtu_expires;
-			atomic_inc(&__rt_peer_genid);
-		}
-
-		inet_putpeer(peer);
-	}
-	return est_mtu ? : new_mtu;
-}
-
 static void check_peer_pmtu(struct dst_entry *dst, struct inet_peer *peer)
 {
 	unsigned long expires = ACCESS_ONCE(peer->pmtu_expires);
diff --git a/net/rxrpc/ar-error.c b/net/rxrpc/ar-error.c
index 5d6b572..a920608 100644
--- a/net/rxrpc/ar-error.c
+++ b/net/rxrpc/ar-error.c
@@ -81,10 +81,6 @@ void rxrpc_UDP_error_report(struct sock *sk)
 			_net("I/F MTU %u", mtu);
 		}
 
-		/* ip_rt_frag_needed() may have eaten the info */
-		if (mtu == 0)
-			mtu = ntohs(icmp_hdr(skb)->un.frag.mtu);
-
 		if (mtu == 0) {
 			/* they didn't give us a size, estimate one */
 			if (mtu > 1500) {
-- 
1.7.10

* [PATCH 1/5] inet: Hide route peer accesses behind helpers.
From: David Miller @ 2012-06-11  9:29 UTC
  To: netdev


We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.
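
The encoding can be tried outside the kernel.  The following is a
minimal single-threaded sketch in C, assuming both structures are at
least 2-byte aligned so that bit 0 of a valid pointer is always clear.
The slot_*() names and the two placeholder structs are hypothetical,
and a plain store stands in for the cmpxchg() the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BASE_BIT ((uintptr_t)0x1)

/* Placeholder types; only their alignment matters for the sketch. */
struct peer      { int refcnt; };
struct peer_base { int total; };

/* Store a tree root in the slot, tagged so it is not taken for a peer. */
void slot_init(uintptr_t *slot, struct peer_base *base)
{
	assert(((uintptr_t)base & BASE_BIT) == 0); /* bit 0 must be free */
	*slot = (uintptr_t)base | BASE_BIT;
}

/* A clear tag bit means the slot holds a peer pointer (possibly NULL). */
bool slot_is_peer(uintptr_t val)
{
	return !(val & BASE_BIT);
}

/* Callers must check slot_is_peer() first, as with inetpeer_ptr(). */
struct peer *slot_peer(uintptr_t val)
{
	assert(slot_is_peer(val));
	return (struct peer *)val;
}

/* Returns the tree root, or NULL once a peer has been bound. */
struct peer_base *slot_base(uintptr_t val)
{
	if (slot_is_peer(val))
		return NULL;
	return (struct peer_base *)(val & ~BASE_BIT);
}

/* Replace the root with a resolved peer; fails if already bound. */
bool slot_set_peer(uintptr_t *slot, struct peer *p)
{
	if (slot_is_peer(*slot) || p == NULL)
		return false;
	*slot = (uintptr_t)p; /* storing the peer clears the tag bit */
	return true;
}
```

One word thus answers two questions: before binding it says which tree
to search, and after binding it is the peer itself.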

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/inetpeer.h  |   54 +++++++++++++++++++++++++++++++++++++++++++++
 include/net/ip6_fib.h   |   32 ++++++++++++++++++++++++++-
 include/net/ip6_route.h |    6 ++---
 include/net/route.h     |   42 +++++++++++++++++++++++++++++++----
 net/ipv4/route.c        |   56 ++++++++++++++++++++++++++++-------------------
 net/ipv4/xfrm4_policy.c |   10 ++++-----
 net/ipv6/route.c        |   42 ++++++++++++++++++++---------------
 net/ipv6/xfrm6_policy.c |   10 ++++-----
 8 files changed, 193 insertions(+), 59 deletions(-)

diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index b84b32f..d432489 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -71,6 +71,60 @@ struct inet_peer_base {
 	int			total;
 };
 
+#define INETPEER_BASE_BIT	0x1UL
+
+static inline struct inet_peer *inetpeer_ptr(unsigned long val)
+{
+	BUG_ON(val & INETPEER_BASE_BIT);
+	return (struct inet_peer *) val;
+}
+
+static inline struct inet_peer_base *inetpeer_base_ptr(unsigned long val)
+{
+	if (!(val & INETPEER_BASE_BIT))
+		return NULL;
+	val &= ~INETPEER_BASE_BIT;
+	return (struct inet_peer_base *) val;
+}
+
+static inline bool inetpeer_ptr_is_peer(unsigned long val)
+{
+	return !(val & INETPEER_BASE_BIT);
+}
+
+static inline void __inetpeer_ptr_set_peer(unsigned long *val, struct inet_peer *peer)
+{
+	/* This implicitly clears INETPEER_BASE_BIT */
+	*val = (unsigned long) peer;
+}
+
+static inline bool inetpeer_ptr_set_peer(unsigned long *ptr, struct inet_peer *peer)
+{
+	unsigned long val = (unsigned long) peer;
+	unsigned long orig = *ptr;
+
+	if (!(orig & INETPEER_BASE_BIT) || !val ||
+	    cmpxchg(ptr, orig, val) != orig)
+		return false;
+	return true;
+}
+
+static inline void inetpeer_init_ptr(unsigned long *ptr, struct inet_peer_base *base)
+{
+	*ptr = (unsigned long) base | INETPEER_BASE_BIT;
+}
+
+static inline void inetpeer_transfer_peer(unsigned long *to, unsigned long *from)
+{
+	unsigned long val = *from;
+
+	*to = val;
+	if (inetpeer_ptr_is_peer(val)) {
+		struct inet_peer *peer = inetpeer_ptr(val);
+		atomic_inc(&peer->refcnt);
+	}
+}
+
 extern void inet_peer_base_init(struct inet_peer_base *);
 
 void			inet_initpeers(void) __init;
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 0ae759a..3ac5f15 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -107,7 +107,7 @@ struct rt6_info {
 	u32				rt6i_peer_genid;
 
 	struct inet6_dev		*rt6i_idev;
-	struct inet_peer		*rt6i_peer;
+	unsigned long			_rt6i_peer;
 
 #ifdef CONFIG_XFRM
 	u32				rt6i_flow_cache_genid;
@@ -118,6 +118,36 @@ struct rt6_info {
 	u8				rt6i_protocol;
 };
 
+static inline struct inet_peer *rt6_peer_ptr(struct rt6_info *rt)
+{
+	return inetpeer_ptr(rt->_rt6i_peer);
+}
+
+static inline bool rt6_has_peer(struct rt6_info *rt)
+{
+	return inetpeer_ptr_is_peer(rt->_rt6i_peer);
+}
+
+static inline void __rt6_set_peer(struct rt6_info *rt, struct inet_peer *peer)
+{
+	__inetpeer_ptr_set_peer(&rt->_rt6i_peer, peer);
+}
+
+static inline bool rt6_set_peer(struct rt6_info *rt, struct inet_peer *peer)
+{
+	return inetpeer_ptr_set_peer(&rt->_rt6i_peer, peer);
+}
+
+static inline void rt6_init_peer(struct rt6_info *rt, struct inet_peer_base *base)
+{
+	inetpeer_init_ptr(&rt->_rt6i_peer, base);
+}
+
+static inline void rt6_transfer_peer(struct rt6_info *rt, struct rt6_info *ort)
+{
+	inetpeer_transfer_peer(&rt->_rt6i_peer, &ort->_rt6i_peer);
+}
+
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
 {
 	return ((struct rt6_info *)dst)->rt6i_idev;
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 73d7502..f88a85c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -57,11 +57,11 @@ extern void rt6_bind_peer(struct rt6_info *rt, int create);
 
 static inline struct inet_peer *__rt6_get_peer(struct rt6_info *rt, int create)
 {
-	if (rt->rt6i_peer)
-		return rt->rt6i_peer;
+	if (rt6_has_peer(rt))
+		return rt6_peer_ptr(rt);
 
 	rt6_bind_peer(rt, create);
-	return rt->rt6i_peer;
+	return rt6_peer_ptr(rt);
 }
 
 static inline struct inet_peer *rt6_get_peer(struct rt6_info *rt)
diff --git a/include/net/route.h b/include/net/route.h
index 433fc6c..6340c37 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -67,10 +67,44 @@ struct rtable {
 	/* Miscellaneous cached information */
 	__be32			rt_spec_dst; /* RFC1122 specific destination */
 	u32			rt_peer_genid;
-	struct inet_peer	*peer; /* long-living peer info */
+	unsigned long		_peer; /* long-living peer info */
 	struct fib_info		*fi; /* for client ref to shared metrics */
 };
 
+static inline struct inet_peer *rt_peer_ptr(struct rtable *rt)
+{
+	return inetpeer_ptr(rt->_peer);
+}
+
+static inline bool rt_has_peer(struct rtable *rt)
+{
+	return inetpeer_ptr_is_peer(rt->_peer);
+}
+
+static inline void __rt_set_peer(struct rtable *rt, struct inet_peer *peer)
+{
+	__inetpeer_ptr_set_peer(&rt->_peer, peer);
+}
+
+static inline bool rt_set_peer(struct rtable *rt, struct inet_peer *peer)
+{
+	return inetpeer_ptr_set_peer(&rt->_peer, peer);
+}
+
+static inline void rt_init_peer(struct rtable *rt, struct inet_peer_base *base)
+{
+	inetpeer_init_ptr(&rt->_peer, base);
+}
+
+static inline void rt_transfer_peer(struct rtable *rt, struct rtable *ort)
+{
+	rt->_peer = ort->_peer;
+	if (rt_has_peer(ort)) {
+		struct inet_peer *peer = rt_peer_ptr(ort);
+		atomic_inc(&peer->refcnt);
+	}
+}
+
 static inline bool rt_is_input_route(const struct rtable *rt)
 {
 	return rt->rt_route_iif != 0;
@@ -298,11 +332,11 @@ extern void rt_bind_peer(struct rtable *rt, __be32 daddr, int create);
 
 static inline struct inet_peer *__rt_get_peer(struct rtable *rt, __be32 daddr, int create)
 {
-	if (rt->peer)
-		return rt->peer;
+	if (rt_has_peer(rt))
+		return rt_peer_ptr(rt);
 
 	rt_bind_peer(rt, daddr, create);
-	return rt->peer;
+	return rt_peer_ptr(rt);
 }
 
 static inline struct inet_peer *rt_get_peer(struct rtable *rt, __be32 daddr)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2aa663a..03e5b61 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -677,7 +677,7 @@ static inline int rt_fast_clean(struct rtable *rth)
 static inline int rt_valuable(struct rtable *rth)
 {
 	return (rth->rt_flags & (RTCF_REDIRECTED | RTCF_NOTIFY)) ||
-		(rth->peer && rth->peer->pmtu_expires);
+		(rt_has_peer(rth) && rt_peer_ptr(rth)->pmtu_expires);
 }
 
 static int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2)
@@ -1325,12 +1325,16 @@ static u32 rt_peer_genid(void)
 
 void rt_bind_peer(struct rtable *rt, __be32 daddr, int create)
 {
-	struct net *net = dev_net(rt->dst.dev);
+	struct inet_peer_base *base;
 	struct inet_peer *peer;
 
-	peer = inet_getpeer_v4(net->ipv4.peers, daddr, create);
+	base = inetpeer_base_ptr(rt->_peer);
+	if (!base)
+		return;
+
+	peer = inet_getpeer_v4(base, daddr, create);
 
-	if (peer && cmpxchg(&rt->peer, NULL, peer) != NULL)
+	if (!rt_set_peer(rt, peer))
 		inet_putpeer(peer);
 	else
 		rt->rt_peer_genid = rt_peer_genid();
@@ -1533,8 +1537,10 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 						rt_genid(dev_net(dst->dev)));
 			rt_del(hash, rt);
 			ret = NULL;
-		} else if (rt->peer && peer_pmtu_expired(rt->peer)) {
-			dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig);
+		} else if (rt_has_peer(rt)) {
+			struct inet_peer *peer = rt_peer_ptr(rt);
+			if (peer_pmtu_expired(peer))
+				dst_metric_set(dst, RTAX_MTU, peer->pmtu_orig);
 		}
 	}
 	return ret;
@@ -1796,14 +1802,13 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 static void ipv4_dst_destroy(struct dst_entry *dst)
 {
 	struct rtable *rt = (struct rtable *) dst;
-	struct inet_peer *peer = rt->peer;
 
 	if (rt->fi) {
 		fib_info_put(rt->fi);
 		rt->fi = NULL;
 	}
-	if (peer) {
-		rt->peer = NULL;
+	if (rt_has_peer(rt)) {
+		struct inet_peer *peer = rt_peer_ptr(rt);
 		inet_putpeer(peer);
 	}
 }
@@ -1816,8 +1821,11 @@ static void ipv4_link_failure(struct sk_buff *skb)
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
 
 	rt = skb_rtable(skb);
-	if (rt && rt->peer && peer_pmtu_cleaned(rt->peer))
-		dst_metric_set(&rt->dst, RTAX_MTU, rt->peer->pmtu_orig);
+	if (rt && rt_has_peer(rt)) {
+		struct inet_peer *peer = rt_peer_ptr(rt);
+		if (peer_pmtu_cleaned(peer))
+			dst_metric_set(&rt->dst, RTAX_MTU, peer->pmtu_orig);
+	}
 }
 
 static int ip_rt_bug(struct sk_buff *skb)
@@ -1919,7 +1927,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
 static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			    struct fib_info *fi)
 {
-	struct net *net = dev_net(rt->dst.dev);
+	struct inet_peer_base *base;
 	struct inet_peer *peer;
 	int create = 0;
 
@@ -1929,8 +1937,12 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 	if (fl4 && (fl4->flowi4_flags & FLOWI_FLAG_PRECOW_METRICS))
 		create = 1;
 
-	rt->peer = peer = inet_getpeer_v4(net->ipv4.peers, rt->rt_dst, create);
+	base = inetpeer_base_ptr(rt->_peer);
+	BUG_ON(!base);
+
+	peer = inet_getpeer_v4(base, rt->rt_dst, create);
 	if (peer) {
+		__rt_set_peer(rt, peer);
 		rt->rt_peer_genid = rt_peer_genid();
 		if (inet_metrics_new(peer))
 			memcpy(peer->metrics, fi->fib_metrics,
@@ -2046,7 +2058,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
 	rth->rt_peer_genid = 0;
-	rth->peer = NULL;
+	rt_init_peer(rth, dev_net(dev)->ipv4.peers);
 	rth->fi = NULL;
 	if (our) {
 		rth->dst.input= ip_local_deliver;
@@ -2174,7 +2186,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
 	rth->rt_peer_genid = 0;
-	rth->peer = NULL;
+	rt_init_peer(rth, dev_net(rth->dst.dev)->ipv4.peers);
 	rth->fi = NULL;
 
 	rth->dst.input = ip_forward;
@@ -2357,7 +2369,7 @@ local_input:
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
 	rth->rt_peer_genid = 0;
-	rth->peer = NULL;
+	rt_init_peer(rth, net->ipv4.peers);
 	rth->fi = NULL;
 	if (res.type == RTN_UNREACHABLE) {
 		rth->dst.input= ip_error;
@@ -2561,7 +2573,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	rth->rt_gateway = fl4->daddr;
 	rth->rt_spec_dst= fl4->saddr;
 	rth->rt_peer_genid = 0;
-	rth->peer = NULL;
+	rt_init_peer(rth, dev_net(dev_out)->ipv4.peers);
 	rth->fi = NULL;
 
 	RT_CACHE_STAT_INC(out_slow_tot);
@@ -2898,9 +2910,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 		rt->rt_src = ort->rt_src;
 		rt->rt_gateway = ort->rt_gateway;
 		rt->rt_spec_dst = ort->rt_spec_dst;
-		rt->peer = ort->peer;
-		if (rt->peer)
-			atomic_inc(&rt->peer->refcnt);
+		rt_transfer_peer(rt, ort);
 		rt->fi = ort->fi;
 		if (rt->fi)
 			atomic_inc(&rt->fi->fib_clntref);
@@ -2938,7 +2948,6 @@ static int rt_fill_info(struct net *net,
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
 	unsigned long expires = 0;
-	const struct inet_peer *peer = rt->peer;
 	u32 id = 0, ts = 0, tsage = 0, error;
 
 	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*r), flags);
@@ -2994,8 +3003,9 @@ static int rt_fill_info(struct net *net,
 		goto nla_put_failure;
 
 	error = rt->dst.error;
-	if (peer) {
-		inet_peer_refcheck(rt->peer);
+	if (rt_has_peer(rt)) {
+		const struct inet_peer *peer = rt_peer_ptr(rt);
+		inet_peer_refcheck(peer);
 		id = atomic_read(&peer->ip_id_count) & 0xffff;
 		if (peer->tcp_ts_stamp) {
 			ts = peer->tcp_ts;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 0d3426c..8855d82 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -90,9 +90,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	xdst->u.dst.dev = dev;
 	dev_hold(dev);
 
-	xdst->u.rt.peer = rt->peer;
-	if (rt->peer)
-		atomic_inc(&rt->peer->refcnt);
+	rt_transfer_peer(&xdst->u.rt, rt);
 
 	/* Sheit... I remember I did this right. Apparently,
 	 * it was magically lost, so this code needs audit */
@@ -212,8 +210,10 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
 
 	dst_destroy_metrics_generic(dst);
 
-	if (likely(xdst->u.rt.peer))
-		inet_putpeer(xdst->u.rt.peer);
+	if (rt_has_peer(&xdst->u.rt)) {
+		struct inet_peer *peer = rt_peer_ptr(&xdst->u.rt);
+		inet_putpeer(peer);
+	}
 
 	xfrm_dst_destroy(xdst);
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8fc41d5..17a9b86 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -258,16 +258,18 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 #endif
 
 /* allocate dst with ip6_dst_ops */
-static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops,
+static inline struct rt6_info *ip6_dst_alloc(struct net *net,
 					     struct net_device *dev,
 					     int flags)
 {
-	struct rt6_info *rt = dst_alloc(ops, dev, 0, 0, flags);
+	struct rt6_info *rt = dst_alloc(&net->ipv6.ip6_dst_ops, dev,
+					0, 0, flags);
 
-	if (rt)
+	if (rt) {
 		memset(&rt->rt6i_table, 0,
 		       sizeof(*rt) - sizeof(struct dst_entry));
-
+		rt6_init_peer(rt, net->ipv6.peers);
+	}
 	return rt;
 }
 
@@ -275,7 +277,6 @@ static void ip6_dst_destroy(struct dst_entry *dst)
 {
 	struct rt6_info *rt = (struct rt6_info *)dst;
 	struct inet6_dev *idev = rt->rt6i_idev;
-	struct inet_peer *peer = rt->rt6i_peer;
 
 	if (!(rt->dst.flags & DST_HOST))
 		dst_destroy_metrics_generic(dst);
@@ -288,8 +289,8 @@ static void ip6_dst_destroy(struct dst_entry *dst)
 	if (!(rt->rt6i_flags & RTF_EXPIRES) && dst->from)
 		dst_release(dst->from);
 
-	if (peer) {
-		rt->rt6i_peer = NULL;
+	if (rt6_has_peer(rt)) {
+		struct inet_peer *peer = rt6_peer_ptr(rt);
 		inet_putpeer(peer);
 	}
 }
@@ -303,11 +304,15 @@ static u32 rt6_peer_genid(void)
 
 void rt6_bind_peer(struct rt6_info *rt, int create)
 {
-	struct net *net = dev_net(rt->dst.dev);
+	struct inet_peer_base *base;
 	struct inet_peer *peer;
 
-	peer = inet_getpeer_v6(net->ipv6.peers, &rt->rt6i_dst.addr, create);
-	if (peer && cmpxchg(&rt->rt6i_peer, NULL, peer) != NULL)
+	base = inetpeer_base_ptr(rt->_rt6i_peer);
+	if (!base)
+		return;
+
+	peer = inet_getpeer_v6(base, &rt->rt6i_dst.addr, create);
+	if (!rt6_set_peer(rt, peer))
 		inet_putpeer(peer);
 	else
 		rt->rt6i_peer_genid = rt6_peer_genid();
@@ -950,6 +955,7 @@ struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_ori
 	rt = dst_alloc(&ip6_dst_blackhole_ops, ort->dst.dev, 1, 0, 0);
 	if (rt) {
 		memset(&rt->rt6i_table, 0, sizeof(*rt) - sizeof(struct dst_entry));
+		rt6_init_peer(rt, net->ipv6.peers);
 
 		new = &rt->dst;
 
@@ -994,7 +1000,7 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
 
 	if (rt->rt6i_node && (rt->rt6i_node->fn_sernum == cookie)) {
 		if (rt->rt6i_peer_genid != rt6_peer_genid()) {
-			if (!rt->rt6i_peer)
+			if (!rt6_has_peer(rt))
 				rt6_bind_peer(rt, 0);
 			rt->rt6i_peer_genid = rt6_peer_genid();
 		}
@@ -1108,7 +1114,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	if (unlikely(!idev))
 		return ERR_PTR(-ENODEV);
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev, 0);
+	rt = ip6_dst_alloc(net, dev, 0);
 	if (unlikely(!rt)) {
 		in6_dev_put(idev);
 		dst = ERR_PTR(-ENOMEM);
@@ -1290,7 +1296,7 @@ int ip6_route_add(struct fib6_config *cfg)
 	if (!table)
 		goto out;
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL, DST_NOCOUNT);
+	rt = ip6_dst_alloc(net, NULL, DST_NOCOUNT);
 
 	if (!rt) {
 		err = -ENOMEM;
@@ -1812,8 +1818,7 @@ static struct rt6_info *ip6_rt_copy(struct rt6_info *ort,
 				    const struct in6_addr *dest)
 {
 	struct net *net = dev_net(ort->dst.dev);
-	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    ort->dst.dev, 0);
+	struct rt6_info *rt = ip6_dst_alloc(net, ort->dst.dev, 0);
 
 	if (rt) {
 		rt->dst.input = ort->dst.input;
@@ -2097,8 +2102,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 				    bool anycast)
 {
 	struct net *net = dev_net(idev->dev);
-	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    net->loopback_dev, 0);
+	struct rt6_info *rt = ip6_dst_alloc(net, net->loopback_dev, 0);
 	int err;
 
 	if (!rt) {
@@ -2519,7 +2523,9 @@ static int rt6_fill_node(struct net *net,
 	else
 		expires = INT_MAX;
 
-	peer = rt->rt6i_peer;
+	peer = NULL;
+	if (rt6_has_peer(rt))
+		peer = rt6_peer_ptr(rt);
 	ts = tsage = 0;
 	if (peer && peer->tcp_ts_stamp) {
 		ts = peer->tcp_ts;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 8625fba..d749484 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -99,9 +99,7 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	if (!xdst->u.rt6.rt6i_idev)
 		return -ENODEV;
 
-	xdst->u.rt6.rt6i_peer = rt->rt6i_peer;
-	if (rt->rt6i_peer)
-		atomic_inc(&rt->rt6i_peer->refcnt);
+	rt6_transfer_peer(&xdst->u.rt6, rt);
 
 	/* Sheit... I remember I did this right. Apparently,
 	 * it was magically lost, so this code needs audit */
@@ -223,8 +221,10 @@ static void xfrm6_dst_destroy(struct dst_entry *dst)
 	if (likely(xdst->u.rt6.rt6i_idev))
 		in6_dev_put(xdst->u.rt6.rt6i_idev);
 	dst_destroy_metrics_generic(dst);
-	if (likely(xdst->u.rt6.rt6i_peer))
-		inet_putpeer(xdst->u.rt6.rt6i_peer);
+	if (rt6_has_peer(&xdst->u.rt6)) {
+		struct inet_peer *peer = rt6_peer_ptr(&xdst->u.rt6);
+		inet_putpeer(peer);
+	}
 	xfrm_dst_destroy(xdst);
 }
 
-- 
1.7.10

* [PATCH 0/5] Inetpeer roots in FIB tables
From: David Miller @ 2012-06-11  9:28 UTC
  To: netdev

This patch series should fix the problem in the bugzilla Stephen
forwarded last week, namely that we don't cache metrics properly for
source-based routes.

Committed to net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Roland Stigge @ 2012-06-11  9:26 UTC
  To: David Miller
  Cc: eric.dumazet, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <20120611.020352.1962768244524496467.davem@davemloft.net>

Hi!

On 06/11/2012 11:03 AM, David Miller wrote:
> From: Roland Stigge <stigge@antcom.de>
> Date: Mon, 11 Jun 2012 10:36:45 +0200
> 
>> But maybe this is wrong. Can you please give me a hint how the net
>> subsystem makes sure that this doesn't happen under normal circumstances?
> 
> Well if you are asking this question then you didn't read my feedback,
> because I explained exactly what prevents this.

Re-reading your feedback, you are right, sorry!

My question was based on the assumption that the driver was behaving
correctly, which was wrong.

Thank you and Eric for clarifying!

Eric's second (cumulative) patch works fine for now, and I can't
reproduce the issue. Will do more test runs now and will reply back
later with an updated patch set.

Is it sensible at this point to increase the TX buffers anyway? For
different reasons of course: We have enough SRAM available and TX
buffers (16->32) are still fewer than RX buffers (48).

Roland

* [PATCH] lpc_eth: add missing ndo_change_mtu()
From: Eric Dumazet @ 2012-06-11  9:24 UTC
  To: David Miller; +Cc: netdev, stigge, kevin.wells, aletes.xgr, srinivas.bakki

From: Eric Dumazet <edumazet@google.com>

lpc_eth copies transmitted skbs to the DMA area without checking
skb lengths, so it can trigger buffer overflows:

memcpy(pldat->tx_buff_v + txidx * ENET_MAXF_SIZE, skb->data, len);

One way to get bigger skbs is to allow MTU changes above the 1500 limit.

Calling eth_change_mtu() in ndo_change_mtu() makes sure this cannot
happen.
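
The check this relies on is small.  The sketch below is a userspace
approximation of the kind of bounds check eth_change_mtu() is expected
to perform, assuming the conventional limits of 68 and 1500 bytes.
struct netdev_sketch and change_mtu_checked() are hypothetical names,
not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MIN_MTU 68   /* assumed lower bound for an IPv4 MTU */
#define SKETCH_MAX_MTU 1500 /* assumed standard Ethernet payload size */

/* Hypothetical stand-in for the one net_device field that matters here. */
struct netdev_sketch {
	int mtu;
};

/* Reject any MTU outside [68, 1500]; leave dev->mtu untouched on error. */
int change_mtu_checked(struct netdev_sketch *dev, int new_mtu)
{
	assert(dev != NULL);
	if (new_mtu < SKETCH_MIN_MTU || new_mtu > SKETCH_MAX_MTU)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```

With the MTU capped at 1500, a frame handed to the driver should fit in
one tx buffer slot, provided ENET_MAXF_SIZE is at least a full Ethernet
frame (an assumption here; its value is not shown in this mail).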

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roland Stigge <stigge@antcom.de>
Cc: Kevin Wells <kevin.wells@nxp.com>
---
diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index 8d2666f..10febdc 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -1320,6 +1320,7 @@ static const struct net_device_ops lpc_netdev_ops = {
 	.ndo_set_rx_mode	= lpc_eth_set_multicast_list,
 	.ndo_do_ioctl		= lpc_eth_ioctl,
 	.ndo_set_mac_address	= lpc_set_mac_address,
+	.ndo_change_mtu		= eth_change_mtu,
 };
 
 static int lpc_eth_drv_probe(struct platform_device *pdev)

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: David Miller @ 2012-06-11  9:03 UTC
  To: stigge
  Cc: eric.dumazet, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <4FD5AE1D.9030807@antcom.de>

From: Roland Stigge <stigge@antcom.de>
Date: Mon, 11 Jun 2012 10:36:45 +0200

> But maybe this is wrong. Can you please give me a hint how the net
> subsystem makes sure that this doesn't happen under normal circumstances?

Well if you are asking this question then you didn't read my feedback,
because I explained exactly what prevents this.

* [PATCH net 3/3] bonding:force to use primary slave
From: Weiping Pan @ 2012-06-11  9:00 UTC
  To: netdev
In-Reply-To: <cover.1339404887.git.wpan@redhat.com>

When we set the primary slave with module parameters, bond will always use
this primary slave as the active slave.

But when we modify the primary slave via sysfs, it will call
bond_should_change_active() and take primary_reselect into account.

I think we should use the new primary slave as the new active slave
regardless of the value of primary_reselect.
The behavior is then the same as with module parameters and meets the
administrator's expectation.

Signed-off-by: Weiping Pan <wpan@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 1b0f3cd..7256ae4 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1077,6 +1077,7 @@ static ssize_t bonding_store_primary(struct device *d,
 				bond->dev->name, slave->dev->name);
 			bond->primary_slave = slave;
 			strcpy(bond->params.primary, slave->dev->name);
+			bond->force_primary = true;
 			bond_select_active_slave(bond);
 			goto out;
 		}
-- 
1.7.4

* [PATCH net 2/3] bonding:check mode when modify primary_reselect
From: Weiping Pan @ 2012-06-11  9:00 UTC
  To: netdev
In-Reply-To: <cover.1339404887.git.wpan@redhat.com>

Setting primary_reselect only makes sense in active-backup, TLB, or ALB modes.

Signed-off-by: Weiping Pan <wpan@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 485bedb..1b0f3cd 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1123,6 +1123,13 @@ static ssize_t bonding_store_primary_reselect(struct device *d,
 	if (!rtnl_trylock())
 		return restart_syscall();
 
+	if (!USES_PRIMARY(bond->params.mode)) {
+		pr_err("%s: Unable to set primary_reselect; %s is in mode %d\n",
+			bond->dev->name, bond->dev->name, bond->params.mode);
+		ret = -EINVAL;
+		goto out;
+	}
+
 	new_value = bond_parse_parm(buf, pri_reselect_tbl);
 	if (new_value < 0)  {
 		pr_err("%s: Ignoring invalid primary_reselect value %.*s.\n",
-- 
1.7.4

* [PATCH net 1/3] bonding:record primary when modify it via sysfs
From: Weiping Pan @ 2012-06-11  9:00 UTC
  To: netdev
In-Reply-To: <cover.1339404887.git.wpan@redhat.com>

If we modify primary via sysfs and it is not a valid slave,
we should record it for future use, which matches the behavior of
bond_check_params().

Signed-off-by: Weiping Pan <wpan@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index aef42f0..485bedb 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1082,8 +1082,12 @@ static ssize_t bonding_store_primary(struct device *d,
 		}
 	}
 
-	pr_info("%s: Unable to set %.*s as primary slave.\n",
-		bond->dev->name, (int)strlen(buf) - 1, buf);
+	strncpy(bond->params.primary, ifname, IFNAMSIZ);
+	bond->params.primary[IFNAMSIZ - 1] = 0;
+
+	pr_info("%s: Recording %s as primary, "
+		"but it has not been enslaved to %s yet.\n",
+		bond->dev->name, ifname, bond->dev->name);
 out:
 	write_unlock_bh(&bond->curr_slave_lock);
 	read_unlock(&bond->lock);
-- 
1.7.4
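
The bounded-copy idiom used in the hunk above (strncpy() followed by an explicit terminator) can be sketched in userspace as follows. This is only an illustration under stated assumptions: NAME_LEN stands in for IFNAMSIZ (16 on Linux), and record_name() is a hypothetical helper, not a function from the bonding driver.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16	/* stand-in for IFNAMSIZ; assumption for this sketch */

/*
 * Hypothetical helper: copy src into a fixed-size name buffer.
 * strncpy() does not terminate dst when src has NAME_LEN or more
 * characters, so the last byte is always set to NUL afterwards.
 */
static void record_name(char dst[NAME_LEN], const char *src)
{
	assert(src != NULL);
	strncpy(dst, src, NAME_LEN);
	dst[NAME_LEN - 1] = '\0';
}
```

A short name is stored unchanged; a name longer than the buffer is cut to NAME_LEN - 1 characters and is still a valid C string.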

^ permalink raw reply related

* [PATCH net 0/3] correct behavior when modify primary via sysfs
From: Weiping Pan @ 2012-06-11  9:00 UTC (permalink / raw)
  To: netdev

When the primary slave is set with module parameters, bond always uses
that primary slave as the active slave.

But when we modify the primary slave via sysfs, it calls
bond_should_change_active() and takes primary_reselect into account.

I think we should use the new primary slave as the new active slave
regardless of the value of primary_reselect.
Thus the behavior is the same as with module parameters and meets the
administrator's expectation.

Weiping Pan (3):
  bonding:record primary when modify it via sysfs
  bonding:check mode when modify primary_reselect
  bonding:force to use primary slave

 drivers/net/bonding/bond_sysfs.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

-- 
1.7.4

^ permalink raw reply

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Eric Dumazet @ 2012-06-11  8:53 UTC (permalink / raw)
  To: Roland Stigge
  Cc: davem, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <4FD5AE1D.9030807@antcom.de>

On Mon, 2012-06-11 at 10:36 +0200, Roland Stigge wrote:

> I encountered cases where this happened for me on a custom board under
> heavy load.
> 
> I discussed this with Kevin Wells, the original driver author. We
> identified the case of xmit()'s TX request (from .ndo_start_xmit) with
> full TX driver buffers as valid when ethernet is busy.
> 
> But maybe this is wrong. Can you please give me a hint how the net
> subsystem makes sure that this doesn't happen under normal circumstances?

When the TX ring is about to be filled, the driver's lpc_eth_hard_start_xmit()
calls netif_stop_queue(ndev);

So the network stack should not call lpc_eth_hard_start_xmit() again.

I would say the bug(s) come from __lpc_handle_xmit(), since it does :

if (netif_queue_stopped(ndev))
	netif_wake_queue(ndev);

without making sure some room is available in TX ring.

cumulative patch :

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index 8d2666f..59b37c8 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -946,16 +946,16 @@ static void __lpc_handle_xmit(struct net_device *ndev)
 			/* Update stats */
 			ndev->stats.tx_packets++;
 			ndev->stats.tx_bytes += skb->len;
-
-			/* Free buffer */
-			dev_kfree_skb_irq(skb);
 		}
+		dev_kfree_skb_irq(skb);
 
 		txcidx = readl(LPC_ENET_TXCONSUMEINDEX(pldat->net_base));
 	}
 
-	if (netif_queue_stopped(ndev))
-		netif_wake_queue(ndev);
+	if (pldat->num_used_tx_buffs <= ENET_TX_DESC/2) { 
+		if (netif_queue_stopped(ndev))
+			netif_wake_queue(ndev);
+	}
 }
 
 static int __lpc_handle_recv(struct net_device *ndev, int budget)
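
The stop/wake rule described above can be modelled in plain userspace C. This is a minimal sketch, not the driver code: struct sim_dev, sim_xmit() and sim_tx_done() are hypothetical names, the queue is stopped only when the ring is completely full (a simplification), and the wake threshold of half the ring follows the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_TX_DESC 16	/* ring size for the model; assumption */

struct sim_dev {
	int  used;	/* descriptors currently in flight */
	bool stopped;	/* models the stopped state of the TX queue */
};

/*
 * Models the stack calling the xmit method. The stack does not call
 * xmit while the queue is stopped, so reaching the assert with a full
 * ring would be the "hard error" case discussed in this thread.
 */
static bool sim_xmit(struct sim_dev *d)
{
	if (d->stopped)
		return false;		/* stack keeps the packet queued */
	assert(d->used < SIM_TX_DESC);
	d->used++;
	if (d->used == SIM_TX_DESC)
		d->stopped = true;	/* ring full: stop the queue */
	return true;
}

/*
 * Models TX completion: reclaim n descriptors, and wake the queue
 * only once at least half of the ring is free again.
 */
static void sim_tx_done(struct sim_dev *d, int n)
{
	if (n > d->used)
		n = d->used;
	d->used -= n;
	if (d->stopped && d->used <= SIM_TX_DESC / 2)
		d->stopped = false;
}
```

With the unconditional wake, a single completed descriptor would restart the queue; with the threshold, the queue stays stopped until enough room is available, so xmit never sees a full ring.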

^ permalink raw reply related

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Eric Dumazet @ 2012-06-11  8:39 UTC (permalink / raw)
  To: Roland Stigge
  Cc: davem, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339403108.6001.1697.camel@edumazet-glaptop>

On Mon, 2012-06-11 at 10:25 +0200, Eric Dumazet wrote:
> On Mon, 2012-06-11 at 10:03 +0200, Roland Stigge wrote:
> > A WARN() trace indicating a "BUG!" was identified as a "normal" case in the
> > xmit function in case all TX descriptors are occupied already. In this case,
> > NETDEV_TX_BUSY is returned, nothing buggy at all.
> > 
> > Signed-off-by: Roland Stigge <stigge@antcom.de>
> > Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
> > 
> > ---
> >  drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
> > +++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
> > @@ -1114,7 +1114,7 @@ static int lpc_eth_hard_start_xmit(struc
> >  		   buffers */
> >  		netif_stop_queue(ndev);
> >  		spin_unlock_irq(&pldat->lock);
> > -		WARN(1, "BUG! TX request when no free TX buffers!\n");
> > +		pr_warn("Note: TX request when no free TX buffers.\n");
> >  		return NETDEV_TX_BUSY;
> >  	}
> >  
> 
> Entering this path is a bug, don't hide it...
> 
> Please share with us how this bug was identified as a "normal case" ?
> 
> 


There is an skb leak in this driver, maybe it's the real problem.

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index 8d2666f..0d0f4cb 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -946,10 +946,8 @@ static void __lpc_handle_xmit(struct net_device *ndev)
 			/* Update stats */
 			ndev->stats.tx_packets++;
 			ndev->stats.tx_bytes += skb->len;
-
-			/* Free buffer */
-			dev_kfree_skb_irq(skb);
 		}
+		dev_kfree_skb_irq(skb);
 
 		txcidx = readl(LPC_ENET_TXCONSUMEINDEX(pldat->net_base));
 	}

^ permalink raw reply related

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Roland Stigge @ 2012-06-11  8:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339403108.6001.1697.camel@edumazet-glaptop>

Hi Dave and Eric,

thanks for your feedback!

On 06/11/2012 10:25 AM, Eric Dumazet wrote:
> On Mon, 2012-06-11 at 10:03 +0200, Roland Stigge wrote:
>> A WARN() trace indicating a "BUG!" was identified as a "normal" case in the
>> xmit function in case all TX descriptors are occupied already. In this case,
>> NETDEV_TX_BUSY is returned, nothing buggy at all.
>>
>> Signed-off-by: Roland Stigge <stigge@antcom.de>
>> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
>>
>> ---
>>  drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
>> +++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
>> @@ -1114,7 +1114,7 @@ static int lpc_eth_hard_start_xmit(struc
>>  		   buffers */
>>  		netif_stop_queue(ndev);
>>  		spin_unlock_irq(&pldat->lock);
>> -		WARN(1, "BUG! TX request when no free TX buffers!\n");
>> +		pr_warn("Note: TX request when no free TX buffers.\n");
>>  		return NETDEV_TX_BUSY;
>>  	}
>>  
> 
> Entering this path is a bug, don't hide it...
> 
> Please share with us how this bug was identified as a "normal case" ?

I encountered cases where this happened for me on a custom board under
heavy load.

I discussed this with Kevin Wells, the original driver author. We
identified the case of xmit()'s TX request (from .ndo_start_xmit) with
full TX driver buffers as valid when ethernet is busy.

But maybe this is wrong. Can you please give me a hint how the net
subsystem makes sure that this doesn't happen under normal circumstances?

Thanks in advance!

Roland

^ permalink raw reply

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Eric Dumazet @ 2012-06-11  8:25 UTC (permalink / raw)
  To: Roland Stigge
  Cc: davem, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339401793-12258-1-git-send-email-stigge@antcom.de>

On Mon, 2012-06-11 at 10:03 +0200, Roland Stigge wrote:
> A WARN() trace indicating a "BUG!" was identified as a "normal" case in the
> xmit function in case all TX descriptors are occupied already. In this case,
> NETDEV_TX_BUSY is returned, nothing buggy at all.
> 
> Signed-off-by: Roland Stigge <stigge@antcom.de>
> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
> 
> ---
>  drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
> +++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
> @@ -1114,7 +1114,7 @@ static int lpc_eth_hard_start_xmit(struc
>  		   buffers */
>  		netif_stop_queue(ndev);
>  		spin_unlock_irq(&pldat->lock);
> -		WARN(1, "BUG! TX request when no free TX buffers!\n");
> +		pr_warn("Note: TX request when no free TX buffers.\n");
>  		return NETDEV_TX_BUSY;
>  	}
>  

Entering this path is a bug, don't hide it...

Please share with us how this bug was identified as a "normal case" ?

^ permalink raw reply

* Re: [PATCH 2/3] net: lpc_eth: Increase number of TX descriptors
From: Eric Dumazet @ 2012-06-11  8:21 UTC (permalink / raw)
  To: Roland Stigge
  Cc: davem, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339401793-12258-2-git-send-email-stigge@antcom.de>

On Mon, 2012-06-11 at 10:03 +0200, Roland Stigge wrote:
> Since we have enough SRAM, we can increase the number of TX descriptors, so the
> "BUSY" warning about occupied TX descriptors doesn't need to show up as often.
> 
> Signed-off-by: Roland Stigge <stigge@antcom.de>
> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
> 
> ---
>  drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
> +++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
> @@ -56,7 +56,7 @@
>  
>  #define ENET_MAXF_SIZE 1536
>  #define ENET_RX_DESC 48
> -#define ENET_TX_DESC 16
> +#define ENET_TX_DESC 32
>  
>  #define NAPI_WEIGHT 16
>  


Sorry, this is not the right fix.

You only lower probability of the bug.

What is this BUSY warning mentioned in changelog ?

^ permalink raw reply

* Re: [PATCH] ieee802154: verify packet size before trying to allocate it
From: Sasha Levin @ 2012-06-11  8:18 UTC (permalink / raw)
  To: David Miller; +Cc: dbaryshkov, slapin, linux-zigbee-devel, netdev, linux-kernel
In-Reply-To: <20120610.200443.971015025499077057.davem@davemloft.net>

On Sun, 2012-06-10 at 20:04 -0700, David Miller wrote:
> From: Sasha Levin <levinsasha928@gmail.com>
> Date: Sun, 10 Jun 2012 13:10:19 +0200
> 
> > Currently when sending data over datagram, the send function will attempt to
> > allocate any size passed on from the userspace.
> > 
> > We should make sure that this size is checked and limited. The maximum size
> > of an IP packet seemed like the safest limit here.
> > 
> > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> 
> Why not limit to the device MTU?  That's exactly what I suggested
> to you.

That's what I ended up doing in the reply to this mail.

^ permalink raw reply

* Re: [PATCH 2/3] net: lpc_eth: Increase number of TX descriptors
From: David Miller @ 2012-06-11  8:11 UTC (permalink / raw)
  To: stigge
  Cc: eric.dumazet, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339401793-12258-2-git-send-email-stigge@antcom.de>

From: Roland Stigge <stigge@antcom.de>
Date: Mon, 11 Jun 2012 10:03:12 +0200

> Since we have enough SRAM, we can increase the number of TX descriptors, so the
> "BUSY" warning about occupied TX descriptors doesn't need to show up as often.
> 
> Signed-off-by: Roland Stigge <stigge@antcom.de>
> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

This is way too terse, and as I described in my reply to your first
patch it is not normal for the transmit function to be invoked when
there are no TX descriptors available.  That's a bug if it is happening.

^ permalink raw reply

* Re: [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: David Miller @ 2012-06-11  8:10 UTC (permalink / raw)
  To: stigge
  Cc: eric.dumazet, netdev, linux-kernel, kevin.wells, srinivas.bakki,
	aletes.xgr, linux-arm-kernel
In-Reply-To: <1339401793-12258-1-git-send-email-stigge@antcom.de>

From: Roland Stigge <stigge@antcom.de>
Date: Mon, 11 Jun 2012 10:03:11 +0200

> A WARN() trace indicating a "BUG!" was identified as a "normal" case in the
> xmit function in case all TX descriptors are occupied already. In this case,
> NETDEV_TX_BUSY is returned, nothing buggy at all.
> 
> Signed-off-by: Roland Stigge <stigge@antcom.de>
> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

This is not normal.

Read the comment above this code you are changing.  If we are
out of TX descriptors, the queue must be stopped, and therefore
if the queue is stopped this transmit method should not be
invoked.

It is a hard error condition, should never occur, and indicates
a very serious error condition in the driver.

^ permalink raw reply

* Re: Generic user-space routing library -- need collaborator
From: Thomas Graf @ 2012-06-11  8:09 UTC (permalink / raw)
  To: Philip Prindeville; +Cc: Netdev
In-Reply-To: <4FCBBD6C.8020904@redfish-solutions.com>

On Sun, Jun 03, 2012 at 01:39:24PM -0600, Philip Prindeville wrote:
> Hi.
> 
> I'm working on adding a few more portability classes to Poco (a multi-platform C++ toolkit) and wanted to add a Net::Routing class for examining and manipulating the routing tables.
> 
> The C++ would just be convenience wrappers around a core C library that handles the netlink semantics. I've looked at libmnl and it's handy, but I need a higher level of abstraction (for instance, parsing an RTA_NETMASK for IPv6 is anything but well-documented).

You want to look at libnl. It's similar to libmnl but provides a higher
level of abstraction. It implements routing, netfilter and generic
netlink parsing and message construction.

http://www.infradead.org/~tgr/libnl/

You should be able to easily construct C++ wrappers around the lib.

~Thomas

^ permalink raw reply

* [PATCH 3/3] net: lpc_eth: Driver cleanup
From: Roland Stigge @ 2012-06-11  8:03 UTC (permalink / raw)
  To: davem, eric.dumazet, netdev, linux-kernel, kevin.wells,
	srinivas.bakki, aletes.xgr, linux-arm-kernel
  Cc: Roland Stigge
In-Reply-To: <1339401793-12258-1-git-send-email-stigge@antcom.de>

This patch removes some nowadays superfluous definitions (one unused define and
an obsolete function forward declaration) and corrects a netdev_err() to
netdev_dbg().

Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

---
 drivers/net/ethernet/nxp/lpc_eth.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
+++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
@@ -52,7 +52,6 @@
 
 #define MODNAME "lpc-eth"
 #define DRV_VERSION "1.00"
-#define PHYDEF_ADDR 0x00
 
 #define ENET_MAXF_SIZE 1536
 #define ENET_RX_DESC 48
@@ -416,9 +415,6 @@ static bool use_iram_for_net(struct devi
 #define TXDESC_CONTROL_LAST		(1 << 30)
 #define TXDESC_CONTROL_INT		(1 << 31)
 
-static int lpc_eth_hard_start_xmit(struct sk_buff *skb,
-				   struct net_device *ndev);
-
 /*
  * Structure of a TX/RX descriptors and RX status
  */
@@ -1441,7 +1437,7 @@ static int lpc_eth_drv_probe(struct plat
 			res->start);
 	netdev_dbg(ndev, "IO address size      :%d\n",
 			res->end - res->start + 1);
-	netdev_err(ndev, "IO address (mapped)  :0x%p\n",
+	netdev_dbg(ndev, "IO address (mapped)  :0x%p\n",
 			pldat->net_base);
 	netdev_dbg(ndev, "IRQ number           :%d\n", ndev->irq);
 	netdev_dbg(ndev, "DMA buffer size      :%d\n", pldat->dma_buff_size);

^ permalink raw reply

* [PATCH 2/3] net: lpc_eth: Increase number of TX descriptors
From: Roland Stigge @ 2012-06-11  8:03 UTC (permalink / raw)
  To: davem, eric.dumazet, netdev, linux-kernel, kevin.wells,
	srinivas.bakki, aletes.xgr, linux-arm-kernel
  Cc: Roland Stigge
In-Reply-To: <1339401793-12258-1-git-send-email-stigge@antcom.de>

Since we have enough SRAM, we can increase the number of TX descriptors, so the
"BUSY" warning about occupied TX descriptors doesn't need to show up as often.

Signed-off-by: Roland Stigge <stigge@antcom.de>
Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

---
 drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
+++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
@@ -56,7 +56,7 @@
 
 #define ENET_MAXF_SIZE 1536
 #define ENET_RX_DESC 48
-#define ENET_TX_DESC 16
+#define ENET_TX_DESC 32
 
 #define NAPI_WEIGHT 16

^ permalink raw reply

* [PATCH 1/3] net: lpc_eth: Replace WARN() trace with simple pr_warn()
From: Roland Stigge @ 2012-06-11  8:03 UTC (permalink / raw)
  To: davem, eric.dumazet, netdev, linux-kernel, kevin.wells,
	srinivas.bakki, aletes.xgr, linux-arm-kernel
  Cc: Roland Stigge

A WARN() trace indicating a "BUG!" was identified as a "normal" case in the
xmit function in case all TX descriptors are occupied already. In this case,
NETDEV_TX_BUSY is returned, nothing buggy at all.

Signed-off-by: Roland Stigge <stigge@antcom.de>
Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

---
 drivers/net/ethernet/nxp/lpc_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.orig/drivers/net/ethernet/nxp/lpc_eth.c
+++ linux-2.6/drivers/net/ethernet/nxp/lpc_eth.c
@@ -1114,7 +1114,7 @@ static int lpc_eth_hard_start_xmit(struc
 		   buffers */
 		netif_stop_queue(ndev);
 		spin_unlock_irq(&pldat->lock);
-		WARN(1, "BUG! TX request when no free TX buffers!\n");
+		pr_warn("Note: TX request when no free TX buffers.\n");
 		return NETDEV_TX_BUSY;
 	}
 

^ permalink raw reply

* [PATCH v2] dummy: fix rcu_sched self-detected stalls
From: Eric Dumazet @ 2012-06-11  7:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120610.224800.602807319671005890.davem@davemloft.net>

From: Eric Dumazet <edumazet@google.com>

Trying to "modprobe dummy numdummies=30000" triggers :

INFO: rcu_sched self-detected stall on CPU { 8} (t=60000 jiffies)

After this splat, RTNL is locked and reboot is needed.

We must call cond_resched() to avoid this, even holding RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/dummy.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 442d91a..bab0158 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -187,8 +187,10 @@ static int __init dummy_init_module(void)
 	rtnl_lock();
 	err = __rtnl_link_register(&dummy_link_ops);
 
-	for (i = 0; i < numdummies && !err; i++)
+	for (i = 0; i < numdummies && !err; i++) {
 		err = dummy_init_one();
+		cond_resched();
+	}
 	if (err < 0)
 		__rtnl_link_unregister(&dummy_link_ops);
 	rtnl_unlock();

^ permalink raw reply related

* Re: [PATCH] net: Reorder initialization in ip_route_output to fix gcc warning
From: David Miller @ 2012-06-11  7:05 UTC (permalink / raw)
  To: roland; +Cc: netdev
In-Reply-To: <CAG4TOxNpw0BAWPK3Zde_mThTvY-ho+Ce_aau3DoLc-0TRFq3PQ@mail.gmail.com>

From: Roland Dreier <roland@kernel.org>
Date: Mon, 11 Jun 2012 00:00:51 -0700

>> I can't figure out what it is actually warning about, can you?
> 
> I think gcc thinks it already initialized the __fl_common struct
> once when it hits the .flowi4_oif initializer (which expands to
> __fl_common.flowic_oif), and then having the .daddr / .saddr
> initializers makes it think it's done with that structure.
> 
> So it thinks it's initializing __fl_common.flowic_tos to 0, and
> then the .flowi4_tos initializer comes along as a surprise.
> 
> Hmm, looks like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52880
> which is fixed after gcc 4.7.0.  I think it's probably worth working
> around this gcc issue, since this makes W=1 way noisier in
> my build.

I suspected it was a compiler bug :-)

Anyways, agreed, applied.
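
The initializer pattern Roland describes can be sketched with simplified types. Everything here is a hypothetical stand-in (struct demo_common, struct demo_flow4, the demo_* macros), not the kernel's flowi definitions, and the "grouped" order is only an assumption about what a reordering might look like. In standard C both orders produce the same object, because the designators name different members of the nested struct; only the affected gcc versions reportedly warn about the interleaved form.

```c
#include <assert.h>

/* Hypothetical stand-in for a shared nested header struct. */
struct demo_common {
	int oif;
	int tos;
};

/* Hypothetical stand-in for the per-family flow key. */
struct demo_flow4 {
	struct demo_common fl_common;
	unsigned int daddr;
	unsigned int saddr;
};

/* Accessor macros that expand to nested member designators. */
#define demo_oif fl_common.oif
#define demo_tos fl_common.tos

/* Interleaved: nested member, outer members, nested member again. */
static struct demo_flow4 make_interleaved(void)
{
	struct demo_flow4 fl = {
		.demo_oif = 1,
		.daddr = 2,
		.saddr = 3,
		.demo_tos = 4,
	};
	return fl;
}

/* Grouped: all nested members first, then the outer members. */
static struct demo_flow4 make_grouped(void)
{
	struct demo_flow4 fl = {
		.demo_oif = 1,
		.demo_tos = 4,
		.daddr = 2,
		.saddr = 3,
	};
	return fl;
}

/* Both orders must yield identical field values. */
static void demo_selfcheck(void)
{
	struct demo_flow4 a = make_interleaved();
	struct demo_flow4 b = make_grouped();

	assert(a.fl_common.oif == b.fl_common.oif);
	assert(a.fl_common.tos == b.fl_common.tos);
	assert(a.daddr == b.daddr && a.saddr == b.saddr);
}
```

Grouping the nested designators does not change the resulting values; it only avoids the pattern that trips the warning.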

^ permalink raw reply
