Netdev List
* Re: [PATCH] dsa/mv88e6131: add support for mv88e6085 switch
From: Lennert Buytenhek @ 2011-04-05 13:39 UTC (permalink / raw)
  To: Peter Korsgaard; +Cc: davem, netdev
In-Reply-To: <1302008636-6544-1-git-send-email-jacmet@sunsite.dk>

On Tue, Apr 05, 2011 at 03:03:56PM +0200, Peter Korsgaard wrote:

> The mv88e6085 is identical to the mv88e6095, except that all ports are
> 10/100 Mb/s, so use the existing setup code except for the cpu/dsa speed
> selection in _setup_port().
> 
> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>

I don't have access to DSA chip docs anymore, but assuming that you've
tested this:

Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>

* problem of "ipv4: revert Set rt->rt_iif more sanely on output routes."
From: OGAWA Hirofumi @ 2011-04-05 13:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Hi,

ipv4: Set rt->rt_iif more sanely on output routes.
(1018b5c01636c7c6bda31a719bda34fc631db29a)

The above patch seems to be the cause of an avahi breakage.

I haven't debugged this fully, but avahi uses IP_PKTINFO and checks that
in_pktinfo->ipi_ifindex > 0.

If I revert the above patch, avahi's IP_PKTINFO problem seems to go away.

Ideas?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
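
For readers unfamiliar with IP_PKTINFO, here is a minimal userspace sketch
of the kind of check described above: a UDP socket enables IP_PKTINFO and
reads in_pktinfo->ipi_ifindex from the control message of a received
datagram. This is not avahi's code. The helper name recv_ifindex_loopback()
and the loopback self-send are assumptions made purely for illustration.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one UDP datagram to ourselves over loopback
 * and return the ipi_ifindex reported via IP_PKTINFO, or -1 on any error.
 */
int recv_ifindex_loopback(void)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;	/* forces suitable alignment */
	} ctl;
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	char payload = 'x', data[16];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int on = 1, ifindex = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* sin_port stays 0 so the kernel picks a free port */

	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &alen) < 0 ||
	    sendto(fd, &payload, sizeof(payload), 0,
		   (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(fd, &msg, 0) < 0)
		goto out;

	/* Walk the ancillary data and pick out the IP_PKTINFO record. */
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_PKTINFO) {
			struct in_pktinfo pi;

			memcpy(&pi, CMSG_DATA(cmsg), sizeof(pi));
			ifindex = pi.ipi_ifindex;
		}
	}
out:
	close(fd);
	return ifindex;
}
```

An application that insists on ipi_ifindex > 0, as avahi reportedly does,
would treat a result of 0 from a helper like this as a failure.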

* RE: Netxen packet loss with VLANs and LRO (was: [PATCH] netxen: fix LRO disable warning)
From: Amit Salecha @ 2011-04-05 13:15 UTC (permalink / raw)
  To: Marc Haber
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Ameen Rahman,
	Rajesh Borundia
In-Reply-To: <20110405124158.GA28382@torres.zugschlus.de>

> On Tue, Apr 05, 2011 at 12:38:11AM -0500, Amit Salecha wrote:
> > > On Mon, Mar 21, 2011 at 03:37:08AM -0700, Amit Kumar Salecha wrote:
> > > > netxen_nic_set_flags() rejects data if other flag than
> > > > ETH_FLAG_LRO is set.
> > > > Driver also supports NETIF_F_HW_VLAN_TX.
> > > > Now compare data with ethtool_op_get_flags(), to get all
> > > > supported features.
> > >
> > > Could that be the cause for packet loss on kernel 2.6.38.2 if:
> > >
> > >   - receiving card is NX3031 [4040:0100]
> > >   - frames are received with VLAN tags
> > >   - large received offload is on.
> >
> > If ip_forwarding or routing is enable ....then you may see packet
> > loss.
>
> The box is intended to route, so disabling routing is
> contraproductive. I just would like to download software updates to
> the box with decent speed as well.

What I meant is that with LRO enabled and routing, you may see packet loss.
>
> > > Packet Loss of this kind is noticed when doing TCP data transfers
> > > towards the host with the Netxen Interface and the TCP session is
> > > terminated on the Netxen host itself. TCP sessions routed through
> > > the Netxen host are not affected.
> > >
> > > My ethtool doesn't allow me to influence the LRO setting alone - it
> > > is disabled when I set rx off but doesn't come on again when rx is
> > > set to on again. So, ethtool -K rx off, ethtool -K rx on fixes the
> > > issue.
> > >
> > If rx csum is disabled, LRO will be disable. LRO won't be enabled
> > automatically if you enable rx csum.
> > You need to explicitly enable LRO.
>
> Explicitly enabling LRO does not work ("invalid argument", if I recall
> correctly).
>

With the patch below, enabling/disabling LRO will work.

> > > Is this a known bug, maybe with an available patch?
> > >
> > You need to retest with this patch
> > http://patchwork.ozlabs.org/patch/88060/. This patch got applied
> > instead of mine.
>
> Will that patch fix the behavior of the interface regarding the packet
> loss, or only its connection to ethtool?
>
This will fix the LRO configuration problem. Do you see packet loss with LRO disabled?
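
The dependency between rx checksumming and LRO discussed in this thread can
be modelled in a few lines. This is only a sketch of the behaviour as
described here, not the netxen driver's code. struct demo_nic and both
setters are hypothetical names, and refusing to enable LRO while rx csum
is off is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two offload settings under discussion. */
struct demo_nic {
	bool rx_csum;
	bool lro;
};

/*
 * Turning rx checksumming off also turns LRO off.
 * Turning it back on does not restore LRO.
 */
void demo_set_rx_csum(struct demo_nic *nic, bool on)
{
	nic->rx_csum = on;
	if (!on)
		nic->lro = false;
}

/* Assumption: enabling LRO is refused while rx checksumming is off. */
int demo_set_lro(struct demo_nic *nic, bool on)
{
	if (on && !nic->rx_csum)
		return -1;
	nic->lro = on;
	return 0;
}
```

In this model, "rx off" followed by "rx on" leaves LRO disabled until it
is explicitly enabled again, which matches the workaround reported above.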

* Re: [PATCH 12/14] ehea: Add some more ethtool operations and 64bit stats
From: Ben Hutchings @ 2011-04-05 13:07 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: leitao, netdev, michael
In-Reply-To: <20110405214249.36d248e7@kryten>

On Tue, 2011-04-05 at 21:42 +1000, Anton Blanchard wrote:
> We can use the standard ethtool functions for set_tx_csum and set_sg.

These are deprecated.  Please use the generic features interface
instead.

> Also switch to using ndo_get_stats64 to get 64bit tx/rx stats.
[...]

Should be a separate patch, really.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH 2/8] netfilter: ipset: references are protected by rwlock instead of mutex
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

The timeout variant of the list:set type must reference the member sets.
However, its garbage collector runs in timer interrupt context, so
protecting the references with a mutex is not an option. Therefore the
reference protection is converted to an rwlock.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/ipset/ip_set.h       |    2 +-
 include/linux/netfilter/ipset/ip_set_ahash.h |    3 +-
 net/netfilter/ipset/ip_set_bitmap_ip.c       |    3 +-
 net/netfilter/ipset/ip_set_bitmap_ipmac.c    |    3 +-
 net/netfilter/ipset/ip_set_bitmap_port.c     |    3 +-
 net/netfilter/ipset/ip_set_core.c            |  109 ++++++++++++++++----------
 net/netfilter/ipset/ip_set_list_set.c        |    6 +-
 7 files changed, 73 insertions(+), 56 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index ec333d8..5a262e3 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -293,7 +293,7 @@ struct ip_set {
 	/* Lock protecting the set data */
 	rwlock_t lock;
 	/* References to the set */
-	atomic_t ref;
+	u32 ref;
 	/* The core set type */
 	struct ip_set_type *type;
 	/* The type variant doing the real job */
diff --git a/include/linux/netfilter/ipset/ip_set_ahash.h b/include/linux/netfilter/ipset/ip_set_ahash.h
index ec9d9be..a0196ac 100644
--- a/include/linux/netfilter/ipset/ip_set_ahash.h
+++ b/include/linux/netfilter/ipset/ip_set_ahash.h
@@ -515,8 +515,7 @@ type_pf_head(struct ip_set *set, struct sk_buff *skb)
 	if (h->netmask != HOST_MASK)
 		NLA_PUT_U8(skb, IPSET_ATTR_NETMASK, h->netmask);
 #endif
-	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES,
-		      htonl(atomic_read(&set->ref) - 1));
+	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1));
 	NLA_PUT_NET32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize));
 	if (with_timeout(h->timeout))
 		NLA_PUT_NET32(skb, IPSET_ATTR_TIMEOUT, htonl(h->timeout));
diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c
index bca9699..a113ff0 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ip.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
@@ -338,8 +338,7 @@ bitmap_ip_head(struct ip_set *set, struct sk_buff *skb)
 	NLA_PUT_IPADDR4(skb, IPSET_ATTR_IP_TO, htonl(map->last_ip));
 	if (map->netmask != 32)
 		NLA_PUT_U8(skb, IPSET_ATTR_NETMASK, map->netmask);
-	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES,
-		      htonl(atomic_read(&set->ref) - 1));
+	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1));
 	NLA_PUT_NET32(skb, IPSET_ATTR_MEMSIZE,
 		      htonl(sizeof(*map) + map->memsize));
 	if (with_timeout(map->timeout))
diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index 5e79017..00a3324 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -434,8 +434,7 @@ bitmap_ipmac_head(struct ip_set *set, struct sk_buff *skb)
 		goto nla_put_failure;
 	NLA_PUT_IPADDR4(skb, IPSET_ATTR_IP, htonl(map->first_ip));
 	NLA_PUT_IPADDR4(skb, IPSET_ATTR_IP_TO, htonl(map->last_ip));
-	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES,
-		      htonl(atomic_read(&set->ref) - 1));
+	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1));
 	NLA_PUT_NET32(skb, IPSET_ATTR_MEMSIZE,
 		      htonl(sizeof(*map)
 			    + (map->last_ip - map->first_ip + 1) * map->dsize));
diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c b/net/netfilter/ipset/ip_set_bitmap_port.c
index 165f09b..6b38eb8 100644
--- a/net/netfilter/ipset/ip_set_bitmap_port.c
+++ b/net/netfilter/ipset/ip_set_bitmap_port.c
@@ -320,8 +320,7 @@ bitmap_port_head(struct ip_set *set, struct sk_buff *skb)
 		goto nla_put_failure;
 	NLA_PUT_NET16(skb, IPSET_ATTR_PORT, htons(map->first_port));
 	NLA_PUT_NET16(skb, IPSET_ATTR_PORT_TO, htons(map->last_port));
-	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES,
-		      htonl(atomic_read(&set->ref) - 1));
+	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1));
 	NLA_PUT_NET32(skb, IPSET_ATTR_MEMSIZE,
 		      htonl(sizeof(*map) + map->memsize));
 	if (with_timeout(map->timeout))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index d6b4823..e88ac3c 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -26,6 +26,7 @@
 
 static LIST_HEAD(ip_set_type_list);		/* all registered set types */
 static DEFINE_MUTEX(ip_set_type_mutex);		/* protects ip_set_type_list */
+static DEFINE_RWLOCK(ip_set_ref_lock);		/* protects the set refs */
 
 static struct ip_set **ip_set_list;		/* all individual sets */
 static ip_set_id_t ip_set_max = CONFIG_IP_SET_MAX; /* max number of sets */
@@ -301,13 +302,18 @@ EXPORT_SYMBOL_GPL(ip_set_get_ipaddr6);
 static inline void
 __ip_set_get(ip_set_id_t index)
 {
-	atomic_inc(&ip_set_list[index]->ref);
+	write_lock_bh(&ip_set_ref_lock);
+	ip_set_list[index]->ref++;
+	write_unlock_bh(&ip_set_ref_lock);
 }
 
 static inline void
 __ip_set_put(ip_set_id_t index)
 {
-	atomic_dec(&ip_set_list[index]->ref);
+	write_lock_bh(&ip_set_ref_lock);
+	BUG_ON(ip_set_list[index]->ref == 0);
+	ip_set_list[index]->ref--;
+	write_unlock_bh(&ip_set_ref_lock);
 }
 
 /*
@@ -324,7 +330,7 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb,
 	struct ip_set *set = ip_set_list[index];
 	int ret = 0;
 
-	BUG_ON(set == NULL || atomic_read(&set->ref) == 0);
+	BUG_ON(set == NULL);
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (dim < set->type->dimension ||
@@ -356,7 +362,7 @@ ip_set_add(ip_set_id_t index, const struct sk_buff *skb,
 	struct ip_set *set = ip_set_list[index];
 	int ret;
 
-	BUG_ON(set == NULL || atomic_read(&set->ref) == 0);
+	BUG_ON(set == NULL);
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (dim < set->type->dimension ||
@@ -378,7 +384,7 @@ ip_set_del(ip_set_id_t index, const struct sk_buff *skb,
 	struct ip_set *set = ip_set_list[index];
 	int ret = 0;
 
-	BUG_ON(set == NULL || atomic_read(&set->ref) == 0);
+	BUG_ON(set == NULL);
 	pr_debug("set %s, index %u\n", set->name, index);
 
 	if (dim < set->type->dimension ||
@@ -397,7 +403,6 @@ EXPORT_SYMBOL_GPL(ip_set_del);
  * Find set by name, reference it once. The reference makes sure the
  * thing pointed to, does not go away under our feet.
  *
- * The nfnl mutex must already be activated.
  */
 ip_set_id_t
 ip_set_get_byname(const char *name, struct ip_set **set)
@@ -423,15 +428,12 @@ EXPORT_SYMBOL_GPL(ip_set_get_byname);
  * reference count by 1. The caller shall not assume the index
  * to be valid, after calling this function.
  *
- * The nfnl mutex must already be activated.
  */
 void
 ip_set_put_byindex(ip_set_id_t index)
 {
-	if (ip_set_list[index] != NULL) {
-		BUG_ON(atomic_read(&ip_set_list[index]->ref) == 0);
+	if (ip_set_list[index] != NULL)
 		__ip_set_put(index);
-	}
 }
 EXPORT_SYMBOL_GPL(ip_set_put_byindex);
 
@@ -441,7 +443,6 @@ EXPORT_SYMBOL_GPL(ip_set_put_byindex);
  * can't be destroyed. The set cannot be renamed due to
  * the referencing either.
  *
- * The nfnl mutex must already be activated.
  */
 const char *
 ip_set_name_byindex(ip_set_id_t index)
@@ -449,7 +450,7 @@ ip_set_name_byindex(ip_set_id_t index)
 	const struct ip_set *set = ip_set_list[index];
 
 	BUG_ON(set == NULL);
-	BUG_ON(atomic_read(&set->ref) == 0);
+	BUG_ON(set->ref == 0);
 
 	/* Referenced, so it's safe */
 	return set->name;
@@ -515,10 +516,7 @@ void
 ip_set_nfnl_put(ip_set_id_t index)
 {
 	nfnl_lock();
-	if (ip_set_list[index] != NULL) {
-		BUG_ON(atomic_read(&ip_set_list[index]->ref) == 0);
-		__ip_set_put(index);
-	}
+	ip_set_put_byindex(index);
 	nfnl_unlock();
 }
 EXPORT_SYMBOL_GPL(ip_set_nfnl_put);
@@ -526,7 +524,7 @@ EXPORT_SYMBOL_GPL(ip_set_nfnl_put);
 /*
  * Communication protocol with userspace over netlink.
  *
- * We already locked by nfnl_lock.
+ * The commands are serialized by the nfnl mutex.
  */
 
 static inline bool
@@ -657,7 +655,6 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb,
 		return -ENOMEM;
 	rwlock_init(&set->lock);
 	strlcpy(set->name, name, IPSET_MAXNAMELEN);
-	atomic_set(&set->ref, 0);
 	set->family = family;
 
 	/*
@@ -690,8 +687,8 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb,
 
 	/*
 	 * Here, we have a valid, constructed set and we are protected
-	 * by nfnl_lock. Find the first free index in ip_set_list and
-	 * check clashing.
+	 * by the nfnl mutex. Find the first free index in ip_set_list
+	 * and check clashing.
 	 */
 	if ((ret = find_free_id(set->name, &index, &clash)) != 0) {
 		/* If this is the same set and requested, ignore error */
@@ -751,31 +748,51 @@ ip_set_destroy(struct sock *ctnl, struct sk_buff *skb,
 	       const struct nlattr * const attr[])
 {
 	ip_set_id_t i;
+	int ret = 0;
 
 	if (unlikely(protocol_failed(attr)))
 		return -IPSET_ERR_PROTOCOL;
 
-	/* References are protected by the nfnl mutex */
+	/* Commands are serialized and references are
+	 * protected by the ip_set_ref_lock.
+	 * External systems (i.e. xt_set) must call
+	 * ip_set_put|get_nfnl_* functions, that way we
+	 * can safely check references here.
+	 *
+	 * list:set timer can only decrement the reference
+	 * counter, so if it's already zero, we can proceed
+	 * without holding the lock.
+	 */
+	read_lock_bh(&ip_set_ref_lock);
 	if (!attr[IPSET_ATTR_SETNAME]) {
 		for (i = 0; i < ip_set_max; i++) {
-			if (ip_set_list[i] != NULL &&
-			    (atomic_read(&ip_set_list[i]->ref)))
-				return -IPSET_ERR_BUSY;
+			if (ip_set_list[i] != NULL && ip_set_list[i]->ref) {
+				ret = -IPSET_ERR_BUSY;
+				goto out;
+			}
 		}
+		read_unlock_bh(&ip_set_ref_lock);
 		for (i = 0; i < ip_set_max; i++) {
 			if (ip_set_list[i] != NULL)
 				ip_set_destroy_set(i);
 		}
 	} else {
 		i = find_set_id(nla_data(attr[IPSET_ATTR_SETNAME]));
-		if (i == IPSET_INVALID_ID)
-			return -ENOENT;
-		else if (atomic_read(&ip_set_list[i]->ref))
-			return -IPSET_ERR_BUSY;
+		if (i == IPSET_INVALID_ID) {
+			ret = -ENOENT;
+			goto out;
+		} else if (ip_set_list[i]->ref) {
+			ret = -IPSET_ERR_BUSY;
+			goto out;
+		}
+		read_unlock_bh(&ip_set_ref_lock);
 
 		ip_set_destroy_set(i);
 	}
 	return 0;
+out:
+	read_unlock_bh(&ip_set_ref_lock);
+	return ret;
 }
 
 /* Flush sets */
@@ -834,6 +851,7 @@ ip_set_rename(struct sock *ctnl, struct sk_buff *skb,
 	struct ip_set *set;
 	const char *name2;
 	ip_set_id_t i;
+	int ret = 0;
 
 	if (unlikely(protocol_failed(attr) ||
 		     attr[IPSET_ATTR_SETNAME] == NULL ||
@@ -843,25 +861,33 @@ ip_set_rename(struct sock *ctnl, struct sk_buff *skb,
 	set = find_set(nla_data(attr[IPSET_ATTR_SETNAME]));
 	if (set == NULL)
 		return -ENOENT;
-	if (atomic_read(&set->ref) != 0)
-		return -IPSET_ERR_REFERENCED;
+
+	read_lock_bh(&ip_set_ref_lock);
+	if (set->ref != 0) {
+		ret = -IPSET_ERR_REFERENCED;
+		goto out;
+	}
 
 	name2 = nla_data(attr[IPSET_ATTR_SETNAME2]);
 	for (i = 0; i < ip_set_max; i++) {
 		if (ip_set_list[i] != NULL &&
-		    STREQ(ip_set_list[i]->name, name2))
-			return -IPSET_ERR_EXIST_SETNAME2;
+		    STREQ(ip_set_list[i]->name, name2)) {
+			ret = -IPSET_ERR_EXIST_SETNAME2;
+			goto out;
+		}
 	}
 	strncpy(set->name, name2, IPSET_MAXNAMELEN);
 
-	return 0;
+out:
+	read_unlock_bh(&ip_set_ref_lock);
+	return ret;
 }
 
 /* Swap two sets so that name/index points to the other.
  * References and set names are also swapped.
  *
- * We are protected by the nfnl mutex and references are
- * manipulated only by holding the mutex. The kernel interfaces
+ * The commands are serialized by the nfnl mutex and references are
+ * protected by the ip_set_ref_lock. The kernel interfaces
  * do not hold the mutex but the pointer settings are atomic
  * so the ip_set_list always contains valid pointers to the sets.
  */
@@ -874,7 +900,6 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb,
 	struct ip_set *from, *to;
 	ip_set_id_t from_id, to_id;
 	char from_name[IPSET_MAXNAMELEN];
-	u32 from_ref;
 
 	if (unlikely(protocol_failed(attr) ||
 		     attr[IPSET_ATTR_SETNAME] == NULL ||
@@ -899,17 +924,15 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb,
 	      from->type->family == to->type->family))
 		return -IPSET_ERR_TYPE_MISMATCH;
 
-	/* No magic here: ref munging protected by the nfnl_lock */
 	strncpy(from_name, from->name, IPSET_MAXNAMELEN);
-	from_ref = atomic_read(&from->ref);
-
 	strncpy(from->name, to->name, IPSET_MAXNAMELEN);
-	atomic_set(&from->ref, atomic_read(&to->ref));
 	strncpy(to->name, from_name, IPSET_MAXNAMELEN);
-	atomic_set(&to->ref, from_ref);
 
+	write_lock_bh(&ip_set_ref_lock);
+	swap(from->ref, to->ref);
 	ip_set_list[from_id] = to;
 	ip_set_list[to_id] = from;
+	write_unlock_bh(&ip_set_ref_lock);
 
 	return 0;
 }
@@ -926,7 +949,7 @@ ip_set_dump_done(struct netlink_callback *cb)
 {
 	if (cb->args[2]) {
 		pr_debug("release set %s\n", ip_set_list[cb->args[1]]->name);
-		__ip_set_put((ip_set_id_t) cb->args[1]);
+		ip_set_put_byindex((ip_set_id_t) cb->args[1]);
 	}
 	return 0;
 }
@@ -1068,7 +1091,7 @@ release_refcount:
 	/* If there was an error or set is done, release set */
 	if (ret || !cb->args[2]) {
 		pr_debug("release set %s\n", ip_set_list[index]->name);
-		__ip_set_put(index);
+		ip_set_put_byindex(index);
 	}
 
 	/* If we dump all sets, continue with dumping last ones */
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index f4a46c0..e9159e9 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -366,8 +366,7 @@ list_set_head(struct ip_set *set, struct sk_buff *skb)
 	NLA_PUT_NET32(skb, IPSET_ATTR_SIZE, htonl(map->size));
 	if (with_timeout(map->timeout))
 		NLA_PUT_NET32(skb, IPSET_ATTR_TIMEOUT, htonl(map->timeout));
-	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES,
-		      htonl(atomic_read(&set->ref) - 1));
+	NLA_PUT_NET32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1));
 	NLA_PUT_NET32(skb, IPSET_ATTR_MEMSIZE,
 		      htonl(sizeof(*map) + map->size * map->dsize));
 	ipset_nest_end(skb, nested);
@@ -457,8 +456,7 @@ list_set_gc(unsigned long ul_set)
 	struct list_set *map = set->data;
 	struct set_telem *e;
 	u32 i;
-	
-	/* nfnl_lock should be called */
+
 	write_lock_bh(&set->lock);
 	for (i = 0; i < map->size; i++) {
 		e = list_set_telem(map, i);
-- 
1.7.2.3


* [PATCH 1/8] netfilter: ipset: list:set timeout variant fixes
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

- the timeout value was actually not set
- the garbage collector was broken

The variant is fixed and tests are added to the ipset testsuite.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/ipset/ip_set_list_set.c |   53 +++++++++++++++------------------
 1 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index a47c329..f4a46c0 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -43,14 +43,19 @@ struct list_set {
 static inline struct set_elem *
 list_set_elem(const struct list_set *map, u32 id)
 {
-	return (struct set_elem *)((char *)map->members + id * map->dsize);
+	return (struct set_elem *)((void *)map->members + id * map->dsize);
+}
+
+static inline struct set_telem *
+list_set_telem(const struct list_set *map, u32 id)
+{
+	return (struct set_telem *)((void *)map->members + id * map->dsize);
 }
 
 static inline bool
 list_set_timeout(const struct list_set *map, u32 id)
 {
-	const struct set_telem *elem =
-		(const struct set_telem *) list_set_elem(map, id);
+	const struct set_telem *elem = list_set_telem(map, id);
 
 	return ip_set_timeout_test(elem->timeout);
 }
@@ -58,19 +63,11 @@ list_set_timeout(const struct list_set *map, u32 id)
 static inline bool
 list_set_expired(const struct list_set *map, u32 id)
 {
-	const struct set_telem *elem =
-		(const struct set_telem *) list_set_elem(map, id);
+	const struct set_telem *elem = list_set_telem(map, id);
 
 	return ip_set_timeout_expired(elem->timeout);
 }
 
-static inline int
-list_set_exist(const struct set_telem *elem)
-{
-	return elem->id != IPSET_INVALID_ID &&
-	       !ip_set_timeout_expired(elem->timeout);
-}
-
 /* Set list without and with timeout */
 
 static int
@@ -146,11 +143,11 @@ list_elem_tadd(struct list_set *map, u32 i, ip_set_id_t id,
 	struct set_telem *e;
 
 	for (; i < map->size; i++) {
-		e = (struct set_telem *)list_set_elem(map, i);
+		e = list_set_telem(map, i);
 		swap(e->id, id);
+		swap(e->timeout, timeout);
 		if (e->id == IPSET_INVALID_ID)
 			break;
-		swap(e->timeout, timeout);
 	}
 }
 
@@ -164,7 +161,7 @@ list_set_add(struct list_set *map, u32 i, ip_set_id_t id,
 		/* Last element replaced: e.g. add new,before,last */
 		ip_set_put_byindex(e->id);
 	if (with_timeout(map->timeout))
-		list_elem_tadd(map, i, id, timeout);
+		list_elem_tadd(map, i, id, ip_set_timeout_set(timeout));
 	else
 		list_elem_add(map, i, id);
 
@@ -172,11 +169,11 @@ list_set_add(struct list_set *map, u32 i, ip_set_id_t id,
 }
 
 static int
-list_set_del(struct list_set *map, ip_set_id_t id, u32 i)
+list_set_del(struct list_set *map, u32 i)
 {
 	struct set_elem *a = list_set_elem(map, i), *b;
 
-	ip_set_put_byindex(id);
+	ip_set_put_byindex(a->id);
 
 	for (; i < map->size - 1; i++) {
 		b = list_set_elem(map, i + 1);
@@ -308,11 +305,11 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[],
 				 (before == 0 ||
 				  (before > 0 &&
 				   next_id_eq(map, i, refid))))
-				ret = list_set_del(map, id, i);
+				ret = list_set_del(map, i);
 			else if (before < 0 &&
 				 elem->id == refid &&
 				 next_id_eq(map, i, id))
-				ret = list_set_del(map, id, i + 1);
+				ret = list_set_del(map, i + 1);
 		}
 		break;
 	default:
@@ -460,17 +457,15 @@ list_set_gc(unsigned long ul_set)
 	struct list_set *map = set->data;
 	struct set_telem *e;
 	u32 i;
-
-	/* We run parallel with other readers (test element)
-	 * but adding/deleting new entries is locked out */
-	read_lock_bh(&set->lock);
-	for (i = map->size - 1; i >= 0; i--) {
-		e = (struct set_telem *) list_set_elem(map, i);
-		if (e->id != IPSET_INVALID_ID &&
-		    list_set_expired(map, i))
-			list_set_del(map, e->id, i);
+	
+	/* nfnl_lock should be called */
+	write_lock_bh(&set->lock);
+	for (i = 0; i < map->size; i++) {
+		e = list_set_telem(map, i);
+		if (e->id != IPSET_INVALID_ID && list_set_expired(map, i))
+			list_set_del(map, i);
 	}
-	read_unlock_bh(&set->lock);
+	write_unlock_bh(&set->lock);
 
 	map->gc.expires = jiffies + IPSET_GC_PERIOD(map->timeout) * HZ;
 	add_timer(&map->gc);
-- 
1.7.2.3
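
One way to read the loop in list_elem_tadd() is as a shifting insert that
carries each displaced (id, timeout) pair forward together. The sketch
below shows that idea on a plain array. It is an illustration only:
struct demo_telem, demo_tadd() and DEMO_INVALID_ID are hypothetical
stand-ins for the kernel types, and the swaps are written out by hand.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_ID 0xFFFFu	/* stand-in for IPSET_INVALID_ID */
#define DEMO_SIZE 4

struct demo_telem {
	uint16_t id;
	unsigned long timeout;
};

/*
 * Insert (id, timeout) at slot i and push the following entries down by
 * one, keeping each id paired with its own timeout. The loop stops once
 * an empty slot's value has been carried forward; an entry pushed past
 * the end of the array is dropped.
 */
void demo_tadd(struct demo_telem *e, uint32_t size, uint32_t i,
	       uint16_t id, unsigned long timeout)
{
	for (; i < size; i++) {
		uint16_t old_id = e[i].id;
		unsigned long old_timeout = e[i].timeout;

		e[i].id = id;
		e[i].timeout = timeout;
		id = old_id;
		timeout = old_timeout;
		if (e[i].id == DEMO_INVALID_ID)
			break;
	}
}
```

Starting from ids {1, 2, invalid, invalid} and inserting id 9 at slot 0
gives {9, 1, 2, invalid}, with every timeout still attached to its id.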


* [PATCH 7/8] netfilter: xt_addrtype: replace rt6_lookup with nf_afinfo->route
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Florian Westphal <fw@strlen.de>

This avoids pulling in the ipv6 module when using (ipv4-only) iptables
-m addrtype.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/Kconfig       |    1 -
 net/netfilter/xt_addrtype.c |   42 ++++++++++++++++++++++++++++--------------
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c3f988a..32bff6d 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -652,7 +652,6 @@ comment "Xtables matches"
 config NETFILTER_XT_MATCH_ADDRTYPE
 	tristate '"addrtype" address type match support'
 	depends on NETFILTER_ADVANCED
-	depends on (IPV6 || IPV6=n)
 	---help---
 	  This option allows you to match what routing thinks of an address,
 	  eg. UNICAST, LOCAL, BROADCAST, ...
diff --git a/net/netfilter/xt_addrtype.c b/net/netfilter/xt_addrtype.c
index 2220b85..b77d383 100644
--- a/net/netfilter/xt_addrtype.c
+++ b/net/netfilter/xt_addrtype.c
@@ -32,11 +32,32 @@ MODULE_ALIAS("ipt_addrtype");
 MODULE_ALIAS("ip6t_addrtype");
 
 #if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
-static u32 xt_addrtype_rt6_to_type(const struct rt6_info *rt)
+static u32 match_lookup_rt6(struct net *net, const struct net_device *dev,
+			    const struct in6_addr *addr)
 {
+	const struct nf_afinfo *afinfo;
+	struct flowi6 flow;
+	struct rt6_info *rt;
 	u32 ret;
+	int route_err;
 
-	if (!rt)
+	memset(&flow, 0, sizeof(flow));
+	ipv6_addr_copy(&flow.daddr, addr);
+	if (dev)
+		flow.flowi6_oif = dev->ifindex;
+
+	rcu_read_lock();
+
+	afinfo = nf_get_afinfo(NFPROTO_IPV6);
+	if (afinfo != NULL)
+		route_err = afinfo->route(net, (struct dst_entry **)&rt,
+					flowi6_to_flowi(&flow), !!dev);
+	else
+		route_err = 1;
+
+	rcu_read_unlock();
+
+	if (route_err)
 		return XT_ADDRTYPE_UNREACHABLE;
 
 	if (rt->rt6i_flags & RTF_REJECT)
@@ -48,6 +69,9 @@ static u32 xt_addrtype_rt6_to_type(const struct rt6_info *rt)
 		ret |= XT_ADDRTYPE_LOCAL;
 	if (rt->rt6i_flags & RTF_ANYCAST)
 		ret |= XT_ADDRTYPE_ANYCAST;
+
+
+	dst_release(&rt->dst);
 	return ret;
 }
 
@@ -65,18 +89,8 @@ static bool match_type6(struct net *net, const struct net_device *dev,
 		return false;
 
 	if ((XT_ADDRTYPE_LOCAL | XT_ADDRTYPE_ANYCAST |
-	     XT_ADDRTYPE_UNREACHABLE) & mask) {
-		struct rt6_info *rt;
-		u32 type;
-		int ifindex = dev ? dev->ifindex : 0;
-
-		rt = rt6_lookup(net, addr, NULL, ifindex, !!dev);
-
-		type = xt_addrtype_rt6_to_type(rt);
-
-		dst_release(&rt->dst);
-		return !!(mask & type);
-	}
+	     XT_ADDRTYPE_UNREACHABLE) & mask)
+		return !!(mask & match_lookup_rt6(net, dev, addr));
 	return true;
 }
 
-- 
1.7.2.3


* [PATCH 8/8] netfilter: xt_conntrack: fix inverted conntrack direction test
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Florian Westphal <fw@strlen.de>

--ctdir ORIGINAL matches REPLY packets, and vice versa:

userspace sets "invert_flags &= ~XT_CONNTRACK_DIRECTION" in ORIGINAL
case.

Thus: (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) ^
      !!(info->invert_flags & XT_CONNTRACK_DIRECTION))

yields "1 ^ 0", which is true -> returns false.

Reproducer:
iptables -I OUTPUT 1 -p tcp --syn -m conntrack --ctdir ORIGINAL

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/xt_conntrack.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/xt_conntrack.c b/net/netfilter/xt_conntrack.c
index 2c0086a..481a86f 100644
--- a/net/netfilter/xt_conntrack.c
+++ b/net/netfilter/xt_conntrack.c
@@ -195,7 +195,7 @@ conntrack_mt(const struct sk_buff *skb, struct xt_action_param *par,
 		return info->match_flags & XT_CONNTRACK_STATE;
 	if ((info->match_flags & XT_CONNTRACK_DIRECTION) &&
 	    (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) ^
-	    !!(info->invert_flags & XT_CONNTRACK_DIRECTION))
+	    !(info->invert_flags & XT_CONNTRACK_DIRECTION))
 		return false;
 
 	if (info->match_flags & XT_CONNTRACK_ORIGSRC)
-- 
1.7.2.3
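
The truth table behind this one-character fix can be checked in userspace.
The sketch below evaluates the old and the new expression for both packet
directions. DEMO_DIRECTION is a stand-in for XT_CONNTRACK_DIRECTION, the
function names are hypothetical, and it is assumed that userspace sets the
invert bit for --ctdir REPLY, mirroring the ORIGINAL case described above.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_DIRECTION 0x1000u	/* stand-in for XT_CONNTRACK_DIRECTION */

/* true means conntrack_mt() would bail out with "return false" */
bool demo_reject_old(bool pkt_is_original, unsigned int invert_flags)
{
	return pkt_is_original ^ !!(invert_flags & DEMO_DIRECTION);
}

/* Same test with the single negation from the patch. */
bool demo_reject_new(bool pkt_is_original, unsigned int invert_flags)
{
	return pkt_is_original ^ !(invert_flags & DEMO_DIRECTION);
}
```

With the invert bit clear (--ctdir ORIGINAL), the old expression rejects
packets in the original direction, while the new one accepts them and
rejects reply packets instead.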


* [PATCH 6/8] netfilter: af_info: add 'strict' parameter to limit lookup to .oif
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Florian Westphal <fw@strlen.de>

ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search
to an interface, but this flag cannot be set via struct flowi.

Also, it cannot be set via ip6_route_output: this function uses the
passed sock struct to determine if this flag is required
(by testing for nonzero sk_bound_dev_if).

Work around this by passing in an artificial struct sk in case
'strict' argument is true.

This is required to replace the rt6_lookup call in xt_addrtype.c with
nf_afinfo->route().

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter.h              |    2 +-
 net/ipv4/netfilter.c                   |    2 +-
 net/ipv6/netfilter.c                   |   12 ++++++++++--
 net/netfilter/nf_conntrack_h323_main.c |    8 ++++----
 net/netfilter/xt_TCPMSS.c              |    2 +-
 5 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 20ed452..7fa95df 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -271,7 +271,7 @@ struct nf_afinfo {
 					    unsigned int len,
 					    u_int8_t protocol);
 	int		(*route)(struct net *net, struct dst_entry **dst,
-				 struct flowi *fl);
+				 struct flowi *fl, bool strict);
 	void		(*saveroute)(const struct sk_buff *skb,
 				     struct nf_queue_entry *entry);
 	int		(*reroute)(struct sk_buff *skb,
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index f1035f0..4614bab 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -222,7 +222,7 @@ static __sum16 nf_ip_checksum_partial(struct sk_buff *skb, unsigned int hook,
 }
 
 static int nf_ip_route(struct net *net, struct dst_entry **dst,
-		       struct flowi *fl)
+		       struct flowi *fl, bool strict __always_unused)
 {
 	struct rtable *rt = ip_route_output_key(net, &fl->u.ip4);
 	if (IS_ERR(rt))
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index e008b9b..28bc1f6 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -91,9 +91,17 @@ static int nf_ip6_reroute(struct sk_buff *skb,
 }
 
 static int nf_ip6_route(struct net *net, struct dst_entry **dst,
-			struct flowi *fl)
+			struct flowi *fl, bool strict)
 {
-	*dst = ip6_route_output(net, NULL, &fl->u.ip6);
+	static const struct ipv6_pinfo fake_pinfo;
+	static const struct inet_sock fake_sk = {
+		/* makes ip6_route_output set RT6_LOOKUP_F_IFACE: */
+		.sk.sk_bound_dev_if = 1,
+		.pinet6 = (struct ipv6_pinfo *) &fake_pinfo,
+	};
+	const void *sk = strict ? &fake_sk : NULL;
+
+	*dst = ip6_route_output(net, sk, &fl->u.ip6);
 	return (*dst)->error;
 }
 
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 39a4538..18b2ce5 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -732,9 +732,9 @@ static int callforward_do_filter(const union nf_inet_addr *src,
 		memset(&fl2, 0, sizeof(fl2));
 		fl2.daddr = dst->ip;
 		if (!afinfo->route(&init_net, (struct dst_entry **)&rt1,
-				   flowi4_to_flowi(&fl1))) {
+				   flowi4_to_flowi(&fl1), false)) {
 			if (!afinfo->route(&init_net, (struct dst_entry **)&rt2,
-					   flowi4_to_flowi(&fl2))) {
+					   flowi4_to_flowi(&fl2), false)) {
 				if (rt1->rt_gateway == rt2->rt_gateway &&
 				    rt1->dst.dev  == rt2->dst.dev)
 					ret = 1;
@@ -756,9 +756,9 @@ static int callforward_do_filter(const union nf_inet_addr *src,
 		memset(&fl2, 0, sizeof(fl2));
 		ipv6_addr_copy(&fl2.daddr, &dst->in6);
 		if (!afinfo->route(&init_net, (struct dst_entry **)&rt1,
-				   flowi6_to_flowi(&fl1))) {
+				   flowi6_to_flowi(&fl1), false)) {
 			if (!afinfo->route(&init_net, (struct dst_entry **)&rt2,
-					   flowi6_to_flowi(&fl2))) {
+					   flowi6_to_flowi(&fl2), false)) {
 				if (!memcmp(&rt1->rt6i_gateway, &rt2->rt6i_gateway,
 					    sizeof(rt1->rt6i_gateway)) &&
 				    rt1->dst.dev == rt2->dst.dev)
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 8690125..9e63b43 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -166,7 +166,7 @@ static u_int32_t tcpmss_reverse_mtu(const struct sk_buff *skb,
 	rcu_read_lock();
 	ai = nf_get_afinfo(family);
 	if (ai != NULL)
-		ai->route(&init_net, (struct dst_entry **)&rt, &fl);
+		ai->route(&init_net, (struct dst_entry **)&rt, &fl, false);
 	rcu_read_unlock();
 
 	if (rt != NULL) {
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH 5/8] netfilter: af_info: add network namespace parameter to route hook
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Florian Westphal <fw@strlen.de>

This is required to eventually replace the rt6_lookup call in
xt_addrtype.c with nf_afinfo->route().

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter.h              |    3 ++-
 net/ipv4/netfilter.c                   |    5 +++--
 net/ipv6/netfilter.c                   |    5 +++--
 net/netfilter/nf_conntrack_h323_main.c |    8 ++++----
 net/netfilter/xt_TCPMSS.c              |    2 +-
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index eeec00a..20ed452 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -270,7 +270,8 @@ struct nf_afinfo {
 					    unsigned int dataoff,
 					    unsigned int len,
 					    u_int8_t protocol);
-	int		(*route)(struct dst_entry **dst, struct flowi *fl);
+	int		(*route)(struct net *net, struct dst_entry **dst,
+				 struct flowi *fl);
 	void		(*saveroute)(const struct sk_buff *skb,
 				     struct nf_queue_entry *entry);
 	int		(*reroute)(struct sk_buff *skb,
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index f3c0b54..f1035f0 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -221,9 +221,10 @@ static __sum16 nf_ip_checksum_partial(struct sk_buff *skb, unsigned int hook,
 	return csum;
 }
 
-static int nf_ip_route(struct dst_entry **dst, struct flowi *fl)
+static int nf_ip_route(struct net *net, struct dst_entry **dst,
+		       struct flowi *fl)
 {
-	struct rtable *rt = ip_route_output_key(&init_net, &fl->u.ip4);
+	struct rtable *rt = ip_route_output_key(net, &fl->u.ip4);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 	*dst = &rt->dst;
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 39aaca2..e008b9b 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -90,9 +90,10 @@ static int nf_ip6_reroute(struct sk_buff *skb,
 	return 0;
 }
 
-static int nf_ip6_route(struct dst_entry **dst, struct flowi *fl)
+static int nf_ip6_route(struct net *net, struct dst_entry **dst,
+			struct flowi *fl)
 {
-	*dst = ip6_route_output(&init_net, NULL, &fl->u.ip6);
+	*dst = ip6_route_output(net, NULL, &fl->u.ip6);
 	return (*dst)->error;
 }
 
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 533a183..39a4538 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -731,9 +731,9 @@ static int callforward_do_filter(const union nf_inet_addr *src,
 
 		memset(&fl2, 0, sizeof(fl2));
 		fl2.daddr = dst->ip;
-		if (!afinfo->route((struct dst_entry **)&rt1,
+		if (!afinfo->route(&init_net, (struct dst_entry **)&rt1,
 				   flowi4_to_flowi(&fl1))) {
-			if (!afinfo->route((struct dst_entry **)&rt2,
+			if (!afinfo->route(&init_net, (struct dst_entry **)&rt2,
 					   flowi4_to_flowi(&fl2))) {
 				if (rt1->rt_gateway == rt2->rt_gateway &&
 				    rt1->dst.dev  == rt2->dst.dev)
@@ -755,9 +755,9 @@ static int callforward_do_filter(const union nf_inet_addr *src,
 
 		memset(&fl2, 0, sizeof(fl2));
 		ipv6_addr_copy(&fl2.daddr, &dst->in6);
-		if (!afinfo->route((struct dst_entry **)&rt1,
+		if (!afinfo->route(&init_net, (struct dst_entry **)&rt1,
 				   flowi6_to_flowi(&fl1))) {
-			if (!afinfo->route((struct dst_entry **)&rt2,
+			if (!afinfo->route(&init_net, (struct dst_entry **)&rt2,
 					   flowi6_to_flowi(&fl2))) {
 				if (!memcmp(&rt1->rt6i_gateway, &rt2->rt6i_gateway,
 					    sizeof(rt1->rt6i_gateway)) &&
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 6e6b46c..8690125 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -166,7 +166,7 @@ static u_int32_t tcpmss_reverse_mtu(const struct sk_buff *skb,
 	rcu_read_lock();
 	ai = nf_get_afinfo(family);
 	if (ai != NULL)
-		ai->route((struct dst_entry **)&rt, &fl);
+		ai->route(&init_net, (struct dst_entry **)&rt, &fl);
 	rcu_read_unlock();
 
 	if (rt != NULL) {
-- 
1.7.2.3
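
The shape of this change, a per-family hook that receives the namespace
from its caller instead of hard-coding init_net, can be sketched in user
space. All demo_* names below are hypothetical and are not kernel
definitions; the routing lookup is reduced to reading one field so the
example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: one "routing table" entry. */
struct demo_net {
	int default_gw;
};

/* Hypothetical stand-in for struct nf_afinfo. */
struct demo_afinfo {
	/* As in the patch, the hook takes the namespace explicitly. */
	int (*route)(struct demo_net *net, int *gw, int daddr);
};

/* Look up the gateway in the namespace handed in by the caller. */
static int demo_ip_route(struct demo_net *net, int *gw, int daddr)
{
	if (net == NULL || daddr == 0)
		return -1;	/* lookup failed */
	*gw = net->default_gw;
	return 0;
}

static const struct demo_afinfo demo_ipv4_afinfo = {
	.route = demo_ip_route,
};
```

Callers that have no namespace at hand can keep passing the initial one,
which is what the h323 and TCPMSS hunks above do with &init_net.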


^ permalink raw reply related

* [PATCH 4/8] IPVS: fix NULL ptr dereference in ip_vs_ctl.c ip_vs_genl_dump_daemons()
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

"ipvsadm -ln --daemon" triggers a NULL pointer dereference because
ip_vs_genl_dump_daemons() uses skb_net() instead of skb_sknet().

To protect other callers from the same NULL pointer dereference, a
check on skb_dst() is added to skb_net() in ip_vs.h.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/ip_vs.h            |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 30b49ed..4d1b71a 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -52,7 +52,7 @@ static inline struct net *skb_net(const struct sk_buff *skb)
 	 */
 	if (likely(skb->dev && skb->dev->nd_net))
 		return dev_net(skb->dev);
-	if (skb_dst(skb)->dev)
+	if (skb_dst(skb) && skb_dst(skb)->dev)
 		return dev_net(skb_dst(skb)->dev);
 	WARN(skb->sk, "Maybe skb_sknet should be used in %s() at line:%d\n",
 		      __func__, __LINE__);
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 33733c8..ae47090 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3120,7 +3120,7 @@ nla_put_failure:
 static int ip_vs_genl_dump_daemons(struct sk_buff *skb,
 				   struct netlink_callback *cb)
 {
-	struct net *net = skb_net(skb);
+	struct net *net = skb_sknet(skb);
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	mutex_lock(&__ip_vs_mutex);
-- 
1.7.2.3
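
The guard added to skb_net() can be illustrated with a stand-alone
sketch. The demo_* types are hypothetical stand-ins for sk_buff,
dst_entry and net_device, not the kernel definitions, and returning NULL
at the end is a choice made for the demo only. The point is that the dst
pointer is tested before ->dev is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_net { int id; };
struct demo_dev { struct demo_net *net; };
struct demo_dst { struct demo_dev *dev; };
struct demo_skb {
	struct demo_dev *dev;
	struct demo_dst *dst;	/* may be NULL, e.g. for netlink requests */
};

/* Mirrors the patched lookup order: device first, then dst, and dst is
 * only dereferenced after it has been checked for NULL. */
static struct demo_net *demo_skb_net(const struct demo_skb *skb)
{
	if (skb->dev && skb->dev->net)
		return skb->dev->net;
	if (skb->dst && skb->dst->dev)
		return skb->dst->dev->net;
	return NULL;	/* demo only: nothing to find in this skb */
}
```

Without the first half of the second condition, an skb with neither a
device nor a dst would be dereferenced through a NULL pointer.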


^ permalink raw reply related

* [PATCH 3/8] netfilter: h323: bug in parsing of ASN1 SEQOF field
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev
In-Reply-To: <1302008659-21141-1-git-send-email-kaber@trash.net>

From: David Sterba <dsterba@suse.cz>

The clang static analyzer found a dead store that appears to be a bug in
reading the item count of a SEQOF field: only the lower byte of the
16-bit word is stored. This may lead to a corrupted read and
communication shutdown.

The bug has been in the module since its first inclusion in the Linux
kernel.

[Patrick: the bug is real, but without practical consequence since the
 largest amount of sequence-of members we parse is 30.]

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nf_conntrack_h323_asn1.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index 8678823..bcd5ed6 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -631,7 +631,7 @@ static int decode_seqof(bitstr_t *bs, const struct field_t *f,
 		CHECK_BOUND(bs, 2);
 		count = *bs->cur++;
 		count <<= 8;
-		count = *bs->cur++;
+		count += *bs->cur++;
 		break;
 	case SEMI:
 		BYTE_ALIGN(bs);
-- 
1.7.2.3
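
The effect of the one-character fix can be reproduced in a small
user-space sketch. The helper names read_count_buggy() and
read_count_fixed() are hypothetical and not part of
nf_conntrack_h323_asn1.c; they only mirror the three statements around
the changed line.

```c
#include <assert.h>

/* Hypothetical helper mirroring the pre-patch code: the second plain
 * assignment overwrites the shifted high byte (the dead store). */
static unsigned int read_count_buggy(const unsigned char *cur)
{
	unsigned int count;

	count = *cur++;
	count <<= 8;
	count = *cur++;		/* high byte is lost here */
	return count;
}

/* Hypothetical helper mirroring the patched code: the low byte is
 * added to the shifted high byte, giving a 16-bit big-endian value. */
static unsigned int read_count_fixed(const unsigned char *cur)
{
	unsigned int count;

	count = *cur++;
	count <<= 8;
	count += *cur++;
	return count;
}
```

For the two bytes 0x01 0x2c the buggy variant yields 44 and the fixed
one 300. For counts below 256, such as 30, both agree, which matches the
note above that the bug has no practical consequence.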


^ permalink raw reply related

* [PATCH 0/8] netfilter: netfilter fixes for 2.6.39-rc1
From: kaber @ 2011-04-05 13:04 UTC (permalink / raw)
  To: davem; +Cc: netfilter-devel, netdev

Hi Dave,

the following patches for 2.6.39-rc1 fix a few bugs in netfilter:

- ipset list:set timeout variant was broken, from Jozsef

- incorrect locking in ipset, from Jozsef

- incorrect parsing of the amount of elements encoded in SEQUENCE OF for
  numbers between 2^8 and 2^16-1 in the H.323 conntrack ASN.1 parser,
  from David Sterba

- a NULL pointer dereference in ip_vs_genl_dump_daemons(), from Hans

- patches from Florian to avoid pulling in the IPv6 module when loading
  the xt_addrtype module

- an inverted check for inversion in the conntrack direction match,
  from Florian

Please apply or pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.git master

Thanks!


^ permalink raw reply

* [PATCH] dsa/mv88e6131: add support for mv88e6085 switch
From: Peter Korsgaard @ 2011-04-05 13:03 UTC (permalink / raw)
  To: davem, buytenh, netdev; +Cc: Peter Korsgaard

The mv88e6085 is identical to the mv88e6095, except that all ports are
10/100 Mb/s, so use the existing setup code except for the cpu/dsa speed
selection in _setup_port().

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
---
 net/dsa/mv88e6131.c |   23 +++++++++++++++++++----
 net/dsa/mv88e6xxx.h |    2 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/dsa/mv88e6131.c b/net/dsa/mv88e6131.c
index bb2b41b..a8e4f8c 100644
--- a/net/dsa/mv88e6131.c
+++ b/net/dsa/mv88e6131.c
@@ -14,6 +14,13 @@
 #include "dsa_priv.h"
 #include "mv88e6xxx.h"
 
+/*
+ * Switch product IDs
+ */
+#define ID_6085		0x04a0
+#define ID_6095		0x0950
+#define ID_6131		0x1060
+
 static char *mv88e6131_probe(struct mii_bus *bus, int sw_addr)
 {
 	int ret;
@@ -21,9 +28,11 @@ static char *mv88e6131_probe(struct mii_bus *bus, int sw_addr)
 	ret = __mv88e6xxx_reg_read(bus, sw_addr, REG_PORT(0), 0x03);
 	if (ret >= 0) {
 		ret &= 0xfff0;
-		if (ret == 0x0950)
+		if (ret == ID_6085)
+			return "Marvell 88E6085";
+		if (ret == ID_6095)
 			return "Marvell 88E6095/88E6095F";
-		if (ret == 0x1060)
+		if (ret == ID_6131)
 			return "Marvell 88E6131";
 	}
 
@@ -164,6 +173,7 @@ static int mv88e6131_setup_global(struct dsa_switch *ds)
 
 static int mv88e6131_setup_port(struct dsa_switch *ds, int p)
 {
+	struct mv88e6xxx_priv_state *ps = (void *)(ds + 1);
 	int addr = REG_PORT(p);
 	u16 val;
 
@@ -171,10 +181,13 @@ static int mv88e6131_setup_port(struct dsa_switch *ds, int p)
 	 * MAC Forcing register: don't force link, speed, duplex
 	 * or flow control state to any particular values on physical
 	 * ports, but force the CPU port and all DSA ports to 1000 Mb/s
-	 * full duplex.
+	 * (100 Mb/s on 6085) full duplex.
 	 */
 	if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
-		REG_WRITE(addr, 0x01, 0x003e);
+		if (ps->id == ID_6085)
+			REG_WRITE(addr, 0x01, 0x003d); /* 100 Mb/s */
+		else
+			REG_WRITE(addr, 0x01, 0x003e); /* 1000 Mb/s */
 	else
 		REG_WRITE(addr, 0x01, 0x0003);
 
@@ -286,6 +299,8 @@ static int mv88e6131_setup(struct dsa_switch *ds)
 	mv88e6xxx_ppu_state_init(ds);
 	mutex_init(&ps->stats_mutex);
 
+	ps->id = REG_READ(REG_PORT(0), 0x03) & 0xfff0;
+
 	ret = mv88e6131_switch_reset(ds);
 	if (ret < 0)
 		return ret;
diff --git a/net/dsa/mv88e6xxx.h b/net/dsa/mv88e6xxx.h
index eb0e0aa..61156ca2 100644
--- a/net/dsa/mv88e6xxx.h
+++ b/net/dsa/mv88e6xxx.h
@@ -39,6 +39,8 @@ struct mv88e6xxx_priv_state {
 	 * Hold this mutex over snapshot + dump sequences.
 	 */
 	struct mutex	stats_mutex;
+
+	int		id; /* switch product id */
 };
 
 struct mv88e6xxx_hw_stat {
-- 
1.7.2.3
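
The ID handling in this patch can be summarised in a stand-alone sketch.
The ID_* values, the 0xfff0 mask and the register values 0x003d/0x003e
are taken from the diff above; the demo_* helper names are hypothetical
and do not exist in net/dsa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_6085		0x04a0
#define ID_6095		0x0950
#define ID_6131		0x1060

/* Hypothetical helper: ignore the low four bits of the product ID
 * register, as the probe code does, and map the rest to a name. */
static const char *demo_switch_name(uint16_t reg)
{
	switch (reg & 0xfff0) {
	case ID_6085:
		return "Marvell 88E6085";
	case ID_6095:
		return "Marvell 88E6095/88E6095F";
	case ID_6131:
		return "Marvell 88E6131";
	default:
		return NULL;	/* not handled by this driver */
	}
}

/* Hypothetical helper: value written to the MAC forcing register
 * (0x01) on CPU and DSA ports, 100 Mb/s on the 6085, else 1000 Mb/s. */
static uint16_t demo_forced_speed(uint16_t id)
{
	return id == ID_6085 ? 0x003d : 0x003e;
}
```

Keeping the masked ID in the private state, as the patch does with
ps->id, means the register only has to be read once in setup.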


^ permalink raw reply related

* Re: Netxen packet loss with VLANs and LRO (was: [PATCH] netxen: fix LRO disable warning)
From: Marc Haber @ 2011-04-05 12:41 UTC (permalink / raw)
  To: Amit Salecha
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Ameen Rahman,
	Rajesh Borundia
In-Reply-To: <99737F4847ED0A48AECC9F4A1974A4B80FD10E8E3C@MNEXMB2.qlogic.org>

On Tue, Apr 05, 2011 at 12:38:11AM -0500, Amit Salecha wrote:
> > On Mon, Mar 21, 2011 at 03:37:08AM -0700, Amit Kumar Salecha wrote:
> > > netxen_nic_set_flags() rejects data if other flag than ETH_FLAG_LRO
> > is set.
> > > Driver also supports NETIF_F_HW_VLAN_TX.
> > > Now compare data with ethtool_op_get_flags(), to get all supported
> > features.
> >
> > Could that be the cause for packet loss on kernel 2.6.38.2 if:
> >
> >   - receiving card is NX3031 [4040:0100]
> >   - frames are received with VLAN tags
> >   - large received offload is on.
> 
> If ip_forwarding or routing is enable ....then you may see packet loss.

The box is intended to route, so disabling routing is
counterproductive. I would just like to download software updates to
the box at a decent speed as well.

> > Packet Loss of this kind is noticed when doing TCP data transfers
> > towards the host with the Netxen Interface and the TCP session is
> > terminated on the Netxen host itself. TCP sessions routed through the
> > Netxen host are not affected.
> >
> > My ethtool doesn't allow me to influence the LRO setting alone - it is
> > disabled when I set rx off but doesn't come on again when rx is set to
> > on again. So, ethtool -K rx off, ethtool -K rx on fixes the issue.
> >
> If rx csum is disabled, LRO will be disable. LRO won't be enabled automatically if you enable rx csum.
> You need to explicitly enable LRO.

Explicitly enabling LRO does not work ("invalid argument", if I recall
correctly).

> > Is this a known bug, maybe with an available patch?
> >
> You need to retest with this patch
> http://patchwork.ozlabs.org/patch/88060/. This patch got applied
> instead of mine.

Will that patch fix the behavior of the interface regarding the packet
loss, or only its connection to ethtool?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190

^ permalink raw reply

* Re: extending feature word.
From: Ben Hutchings @ 2011-04-05 12:07 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: Mahesh Bandewar, linux-netdev, David Miller
In-Reply-To: <20110405113007.GA21358@rere.qmqm.pl>

On Tue, 2011-04-05 at 13:30 +0200, Michał Mirosław wrote:
> On Sat, Apr 02, 2011 at 08:09:14PM -0700, Mahesh Bandewar wrote:
> > On Sat, Apr 2, 2011 at 5:42 AM, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> > > On Fri, Apr 01, 2011 at 07:07:05PM -0700, Mahesh Bandewar wrote:
> > > If you want to split the work, it would be clearer to first convert
> > > hw_features and wanted_features (with all the core code touching it -
> > > this is the easy part), then vlan_features (this includes drivers'
> > > and VLAN code) and then features (it's all over).
> > I like the idea of splitting but it will be only useful when all of it
> > is done and not partially, isn't it? Or am I missing something?
> 
> Since this is a big change, when split it might be easier to follow.
> OTOH, with your idea of macro it might be easier to do incremental
> changes (I think this will be a lot of work for no gain in this case).

I strongly disagree with using macros for this.  They are very likely to
conflict with other identifiers.

We might be able to get away with something like:

	union {
		u32 features;
		u32 feature[N];
	};
	union {
		u32 vlan_features;
		u32 vlan_feature[N];
	};
	union {
		u32 hw_features;
		u32 hw_feature[N];
	};

(assuming hw_features is new enough that there is no need for the
alias).

Anyway, if we're going to put all the feature words in net_device
there's no longer any reason for NETIF_F_LOOPBACK not to be in the first
word.
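
A minimal user-space sketch of the union idea above, assuming C11
anonymous unions (or the equivalent GNU extension). The struct name
demo_net_device, the value chosen for N and the two helpers are
hypothetical; only the member names features and feature come from the
proposal.

```c
#include <assert.h>
#include <stdint.h>

#define N 2	/* hypothetical number of 32-bit feature words */

struct demo_net_device {
	union {
		uint32_t features;	/* old name, aliases feature[0] */
		uint32_t feature[N];
	};
};

/* Set feature bit 'bit' (0 .. 32*N-1) in the word that holds it. */
static void demo_set_feature(struct demo_net_device *dev, unsigned int bit)
{
	dev->feature[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/* Return 1 if feature bit 'bit' is set, 0 otherwise. */
static int demo_test_feature(const struct demo_net_device *dev,
			     unsigned int bit)
{
	return (dev->feature[bit / 32] >> (bit % 32)) & 1;
}
```

Existing code that reads or writes dev->features keeps working on the
first word, while new code can index feature[] for bits 32 and up.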

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCHv2 0/9] macb: add support for Cadence GEM
From: Jean-Christophe PLAGNIOL-VILLARD @ 2011-04-05 11:57 UTC (permalink / raw)
  To: Jamie Iles
  Cc: Russell King - ARM Linux, netdev, Nicolas Ferre, David Miller,
	Peter Korsgaard, Andrew Victor, linux-arm-kernel
In-Reply-To: <20110405114754.GG4797@pulham.picochip.com>

On 12:47 Tue 05 Apr     , Jamie Iles wrote:
> On Tue, Apr 05, 2011 at 01:21:02PM +0200, Jean-Christophe PLAGNIOL-VILLARD wrote:
> > On 11:49 Tue 05 Apr     , Jamie Iles wrote:
> > > On Tue, Apr 05, 2011 at 12:28:42PM +0200, Jean-Christophe PLAGNIOL-VILLARD wrote:
> > > > work fine on 9263ek except the IP version detection.
> > > > 
> > > > the at91 macb ip version is supposed to be at 0x0601010C but it's not.
> > > > At least on 9263 it's 0x0001010C. So we can not detect the arch at runtime
> > > > but we can detect that it's a macb.
> > > > 
> > > > So could keep the ifdef for 2 archs but use the ip version on arm
> > > 
> > > OK, well I think my patches are already doing that so should be OK as 
> > > they are.
> > > 
> > > Russell, are you able to take these through your tree (I think they 
> > > count as consolidation work) or should I ask Stephen for a tree in 
> > > linux-next for a while first?
> > no please do not us the is_gem but the same way as I did in the ip detection
> > keep the version register and then check it.
> > 
> > as this ip can be used on other arch we do not want to see thousands 
> > of is_xxx
> 
> But GEM isn't an architecture/machine type, it's a new Cadence Ethernet 
> controller that follows on from MACB, not some arch specific tweaks so 
> we really only have two options - MACB or GEM.
I agree, but this IP can appear in other SoCs, so we need to try to make it
as generic as possible, and SoCs can have specific tweaks.
So using the version register and masking it is better in my mind, and more
flexible.
It's a shame that we cannot detect the difference between avr32 and at91 via
the IP version.

I hope that IP variations can be detected via the version register
next time.
> 
> Still, if it's important to you then I'll make the change.
If you don't mind, please.

Best Regards,
J.

^ permalink raw reply

* Re: [PATCH v4] net: Allow no-cache copy from user on transmit
From: David Miller @ 2011-04-05 11:56 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.2.00.1104042058370.18421@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 4 Apr 2011 21:03:30 -0700 (PDT)

> This patch uses __copy_from_user_nocache on transmit to bypass data
> cache for a performance improvement.  skb_add_data_nocache and
> skb_copy_to_page_nocache can be called by sendmsg functions to use
> this feature, initial support is in tcp_sendmsg.  This functionality is
> configurable per device using ethtool.
 ...
> Signed-off-by: Tom Herbert <therbert@google.com>

Applied, thanks Tom.

^ permalink raw reply

* Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
From: Patrick McHardy @ 2011-04-05 11:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Oleg A. Arkhangelsky, Changli Gao, netfilter-devel, netdev,
	Paul E McKenney
In-Reply-To: <1301582872.3169.44.camel@edumazet-laptop>

On 31.03.2011 16:47, Eric Dumazet wrote:
> Le jeudi 31 mars 2011 à 18:03 +0400, "Oleg A. Arkhangelsky" a écrit :
>>
>> 26.03.2011, 16:44, "Changli Gao" <xiaosuo@gmail.com>:
>>> On Thu, Mar 3, 2011 at 3:33 PM, Changli Gao <xiaosuo@gmail.com>; wrote:
>>>
>>>>  Please try the patch attached and test if the problem is solved or not. Thanks.
>>>
>>> Any feedback? Thanks.
>>>
>>
>> Seems that patch is fine.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=21512
>>
> 
> I wonder if this is not hiding another bug.
> 
> Adding an RCU grace period might reduce the probability window.
> 
> By the time nf_conntrack_free(ct) is called, no other cpu/thread
> could/should use ct, or ct->ext ?
> 
> Sure, another thread can find/pass_on ct in a lookup but should not use
> it, since its refcount (ct_general.use) should be 0.
> 
> Patrick ?

I think what's happening is that the conntrack entry is destroyed
and the NAT ct_extend destructor invoked, which removes the nat
extension from the RCU protected bysource hash, after which the
entire extension area is freed. Another CPU might still find the
old NAT entry with undefined contents in the hash though, so I
think using RCU to free the extension area is correct.

^ permalink raw reply

* Re: [PATCHv2 0/9] macb: add support for Cadence GEM
From: Jamie Iles @ 2011-04-05 11:47 UTC (permalink / raw)
  To: Jean-Christophe PLAGNIOL-VILLARD
  Cc: Jamie Iles, Russell King - ARM Linux, netdev, Nicolas Ferre,
	David Miller, Peter Korsgaard, Andrew Victor, linux-arm-kernel
In-Reply-To: <20110405112102.GB19268@game.jcrosoft.org>

On Tue, Apr 05, 2011 at 01:21:02PM +0200, Jean-Christophe PLAGNIOL-VILLARD wrote:
> On 11:49 Tue 05 Apr     , Jamie Iles wrote:
> > On Tue, Apr 05, 2011 at 12:28:42PM +0200, Jean-Christophe PLAGNIOL-VILLARD wrote:
> > > work fine on 9263ek except the IP version detection.
> > > 
> > > the at91 macb ip version is supposed to be at 0x0601010C but it's not.
> > > At least on 9263 it's 0x0001010C. So we can not detect the arch at runtime
> > > but we can detect that it's a macb.
> > > 
> > > So could keep the ifdef for 2 archs but use the ip version on arm
> > 
> > OK, well I think my patches are already doing that so should be OK as 
> > they are.
> > 
> > Russell, are you able to take these through your tree (I think they 
> > count as consolidation work) or should I ask Stephen for a tree in 
> > linux-next for a while first?
> no please do not us the is_gem but the same way as I did in the ip detection
> keep the version register and then check it.
> 
> as this ip can be used on other arch we do not want to see thousands 
> of is_xxx

But GEM isn't an architecture/machine type, it's a new Cadence Ethernet 
controller that follows on from MACB, not some arch specific tweaks so 
we really only have two options - MACB or GEM.

Still, if it's important to you then I'll make the change.

Jamie

^ permalink raw reply

* [PATCH 14/14] ehea: Add GRO support
From: Anton Blanchard @ 2011-04-05 11:45 UTC (permalink / raw)
  To: leitao; +Cc: netdev, michael
In-Reply-To: <20110405212825.6eb85677@kryten>


Add GRO support to the ehea driver.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-2.6/drivers/net/ehea/ehea_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_main.c	2011-04-05 20:47:47.870323182 +1000
+++ linux-2.6/drivers/net/ehea/ehea_main.c	2011-04-05 20:47:50.709998860 +1000
@@ -634,18 +634,6 @@ static int ehea_treat_poll_error(struct
 	return 0;
 }
 
-static void ehea_proc_skb(struct ehea_port_res *pr, struct ehea_cqe *cqe,
-			  struct sk_buff *skb)
-{
-	int vlan_extracted = ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT) &&
-			      pr->port->vgrp);
-
-	if (vlan_extracted)
-		vlan_hwaccel_receive_skb(skb, pr->port->vgrp, cqe->vlan_tag);
-	else
-		netif_receive_skb(skb);
-}
-
 static int ehea_proc_rwqes(struct net_device *dev,
 			   struct ehea_port_res *pr,
 			   int budget)
@@ -728,7 +716,14 @@ static int ehea_proc_rwqes(struct net_de
 			}
 
 			processed_bytes += skb->len;
-			ehea_proc_skb(pr, cqe, skb);
+
+			if ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT) &&
+			    pr->port->vgrp) {
+				vlan_gro_receive(&pr->napi, pr->port->vgrp,
+					cqe->vlan_tag, skb);
+			} else {
+				napi_gro_receive(&pr->napi, skb);
+			}
 		} else {
 			pr->p_stats.poll_receive_errors++;
 			port_reset = ehea_treat_poll_error(pr, rq, cqe,
@@ -3045,7 +3040,7 @@ struct ehea_port *ehea_setup_single_port
 	dev->netdev_ops = &ehea_netdev_ops;
 	ehea_set_ethtool_ops(dev);
 
-	dev->features = NETIF_F_SG | NETIF_F_TSO
+	dev->features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_GRO
 		      | NETIF_F_HIGHDMA | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_TX
 		      | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_FILTER;
 	dev->vlan_features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_HIGHDMA |


^ permalink raw reply

* [PATCH 13/14] ehea: Remove LRO support
From: Anton Blanchard @ 2011-04-05 11:43 UTC (permalink / raw)
  To: leitao; +Cc: netdev, michael
In-Reply-To: <20110405212825.6eb85677@kryten>


In preparation for adding GRO to ehea, remove LRO.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-2.6/drivers/net/ehea/ehea_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_main.c	2011-04-05 20:47:47.280390562 +1000
+++ linux-2.6/drivers/net/ehea/ehea_main.c	2011-04-05 20:47:47.870323182 +1000
@@ -61,8 +61,6 @@ static int rq2_entries = EHEA_DEF_ENTRIE
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 1;
-static int use_lro;
-static int lro_max_aggr = EHEA_LRO_MAX_AGGR;
 static int prop_carrier_state;
 
 module_param(msg_level, int, 0);
@@ -72,8 +70,6 @@ module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
 module_param(prop_carrier_state, int, 0);
 module_param(use_mcs, int, 0);
-module_param(use_lro, int, 0);
-module_param(lro_max_aggr, int, 0);
 
 MODULE_PARM_DESC(msg_level, "msg_level");
 MODULE_PARM_DESC(prop_carrier_state, "Propagate carrier state of physical "
@@ -92,11 +88,6 @@ MODULE_PARM_DESC(sq_entries, " Number of
 		 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")");
 MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");
 
-MODULE_PARM_DESC(lro_max_aggr, " LRO: Max packets to be aggregated. Default = "
-		 __MODULE_STRING(EHEA_LRO_MAX_AGGR));
-MODULE_PARM_DESC(use_lro, " Large Receive Offload, 1: enable, 0: disable, "
-		 "Default = 0");
-
 static int port_name_cnt;
 static LIST_HEAD(adapter_list);
 static unsigned long ehea_driver_flags;
@@ -643,58 +634,16 @@ static int ehea_treat_poll_error(struct
 	return 0;
 }
 
-static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
-		       void **tcph, u64 *hdr_flags, void *priv)
-{
-	struct ehea_cqe *cqe = priv;
-	unsigned int ip_len;
-	struct iphdr *iph;
-
-	/* non tcp/udp packets */
-	if (!cqe->header_length)
-		return -1;
-
-	/* non tcp packet */
-	skb_reset_network_header(skb);
-	iph = ip_hdr(skb);
-	if (iph->protocol != IPPROTO_TCP)
-		return -1;
-
-	ip_len = ip_hdrlen(skb);
-	skb_set_transport_header(skb, ip_len);
-	*tcph = tcp_hdr(skb);
-
-	/* check if ip header and tcp header are complete */
-	if (ntohs(iph->tot_len) < ip_len + tcp_hdrlen(skb))
-		return -1;
-
-	*hdr_flags = LRO_IPV4 | LRO_TCP;
-	*iphdr = iph;
-
-	return 0;
-}
-
 static void ehea_proc_skb(struct ehea_port_res *pr, struct ehea_cqe *cqe,
 			  struct sk_buff *skb)
 {
 	int vlan_extracted = ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT) &&
 			      pr->port->vgrp);
 
-	if (skb->dev->features & NETIF_F_LRO) {
-		if (vlan_extracted)
-			lro_vlan_hwaccel_receive_skb(&pr->lro_mgr, skb,
-						     pr->port->vgrp,
-						     cqe->vlan_tag,
-						     cqe);
-		else
-			lro_receive_skb(&pr->lro_mgr, skb, cqe);
-	} else {
-		if (vlan_extracted)
-			vlan_hwaccel_receive_skb(skb, pr->port->vgrp,
-						 cqe->vlan_tag);
-		else
-			netif_receive_skb(skb);
-	}
+	if (vlan_extracted)
+		vlan_hwaccel_receive_skb(skb, pr->port->vgrp, cqe->vlan_tag);
+	else
+		netif_receive_skb(skb);
 }
 
 static int ehea_proc_rwqes(struct net_device *dev,
@@ -790,8 +739,6 @@ static int ehea_proc_rwqes(struct net_de
 		}
 		cqe = ehea_poll_rq1(qp, &wqe_index);
 	}
-	if (dev->features & NETIF_F_LRO)
-		lro_flush_all(&pr->lro_mgr);
 
 	pr->rx_packets += processed;
 	pr->rx_bytes += processed_bytes;
@@ -1616,15 +1563,6 @@ static int ehea_init_port_res(struct ehe
 
 	netif_napi_add(pr->port->netdev, &pr->napi, ehea_poll, 64);
 
-	pr->lro_mgr.max_aggr = pr->port->lro_max_aggr;
-	pr->lro_mgr.max_desc = MAX_LRO_DESCRIPTORS;
-	pr->lro_mgr.lro_arr = pr->lro_desc;
-	pr->lro_mgr.get_skb_header = get_skb_hdr;
-	pr->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID;
-	pr->lro_mgr.dev = port->netdev;
-	pr->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY;
-	pr->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY;
-
 	ret = 0;
 	goto out;
 
@@ -3114,9 +3052,6 @@ struct ehea_port *ehea_setup_single_port
 			NETIF_F_IP_CSUM;
 	dev->watchdog_timeo = EHEA_WATCH_DOG_TIMEOUT;
 
-	if (use_lro)
-		dev->features |= NETIF_F_LRO;
-
 	INIT_WORK(&port->reset_task, ehea_reset_port);
 
 	ret = register_netdev(dev);
@@ -3125,8 +3060,6 @@ struct ehea_port *ehea_setup_single_port
 		goto out_unreg_port;
 	}
 
-	port->lro_max_aggr = lro_max_aggr;
-
 	ret = ehea_get_jumboframe_status(port, &jumbo);
 	if (ret)
 		netdev_err(dev, "failed determining jumbo frame status\n");
Index: linux-2.6/drivers/net/ehea/ehea.h
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea.h	2011-04-05 20:47:42.730910160 +1000
+++ linux-2.6/drivers/net/ehea/ehea.h	2011-04-05 20:47:47.870323182 +1000
@@ -33,7 +33,6 @@
 #include <linux/ethtool.h>
 #include <linux/vmalloc.h>
 #include <linux/if_vlan.h>
-#include <linux/inet_lro.h>
 
 #include <asm/ibmebus.h>
 #include <asm/abs_addr.h>
@@ -58,7 +57,6 @@
 #define EHEA_MIN_ENTRIES_QP  127
 
 #define EHEA_SMALL_QUEUES
-#define EHEA_LRO_MAX_AGGR 64
 
 #ifdef EHEA_SMALL_QUEUES
 #define EHEA_MAX_CQE_COUNT      1023
@@ -85,8 +83,6 @@
 #define EHEA_RQ2_PKT_SIZE       2048
 #define EHEA_L_PKT_SIZE         256	/* low latency */
 
-#define MAX_LRO_DESCRIPTORS 8
-
 /* Send completion signaling */
 
 /* Protection Domain Identifier */
@@ -382,8 +378,6 @@ struct ehea_port_res {
 	u64 tx_bytes;
 	u64 rx_packets;
 	u64 rx_bytes;
-	struct net_lro_mgr lro_mgr;
-	struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS];
 	int sq_restart_flag;
 };
 
@@ -468,7 +462,6 @@ struct ehea_port {
 	u32 msg_enable;
 	u32 sig_comp_iv;
 	u32 state;
-	u32 lro_max_aggr;
 	u8 phy_link;
 	u8 full_duplex;
 	u8 autoneg;
Index: linux-2.6/drivers/net/ehea/ehea_ethtool.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_ethtool.c	2011-04-05 20:47:47.280390562 +1000
+++ linux-2.6/drivers/net/ehea/ehea_ethtool.c	2011-04-05 20:47:47.870323182 +1000
@@ -192,9 +192,6 @@ static char ehea_ethtool_stats_keys[][ET
 	{"PR13 free_swqes"},
 	{"PR14 free_swqes"},
 	{"PR15 free_swqes"},
-	{"LRO aggregated"},
-	{"LRO flushed"},
-	{"LRO no_desc"},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -251,33 +248,6 @@ static void ehea_get_ethtool_stats(struc
 
 	for (k = 0; k < 16; k++)
 		data[i++] = atomic_read(&port->port_res[k].swqe_avail);
-
-	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
-		tmp |= port->port_res[k].lro_mgr.stats.aggregated;
-	data[i++] = tmp;
-
-	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
-		tmp |= port->port_res[k].lro_mgr.stats.flushed;
-	data[i++] = tmp;
-
-	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
-		tmp |= port->port_res[k].lro_mgr.stats.no_desc;
-	data[i++] = tmp;
-
-}
-
-static int ehea_set_flags(struct net_device *dev, u32 data)
-{
-	/* Avoid changing the VLAN flags */
-	if ((data & (ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN)) !=
-	    (ethtool_op_get_flags(dev) & (ETH_FLAG_RXVLAN |
-					  ETH_FLAG_TXVLAN))){
-		return -EINVAL;
-	}
-
-	return ethtool_op_set_flags(dev, data, ETH_FLAG_LRO
-					| ETH_FLAG_TXVLAN
-					| ETH_FLAG_RXVLAN);
 }
 
 const struct ethtool_ops ehea_ethtool_ops = {
@@ -293,7 +263,6 @@ const struct ethtool_ops ehea_ethtool_op
 	.get_rx_csum = ehea_get_rx_csum,
 	.set_settings = ehea_set_settings,
 	.get_flags = ethtool_op_get_flags,
-	.set_flags = ehea_set_flags,
 	.nway_reset = ehea_nway_reset,		/* Restart autonegotiation */
 	.set_tx_csum = ethtool_op_set_tx_csum,
 	.set_sg = ethtool_op_set_sg,

* [PATCH 12/14] ehea: Add some more ethtool operations and 64bit stats
From: Anton Blanchard @ 2011-04-05 11:42 UTC (permalink / raw)
  To: leitao; +Cc: netdev, michael
In-Reply-To: <20110405212825.6eb85677@kryten>


We can use the standard ethtool functions for set_tx_csum and set_sg.
Also switch to using ndo_get_stats64 to get 64bit tx/rx stats.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
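Note for readers: the ndo_get_stats64 callback in the diff below fills a
caller-supplied 64-bit stats structure instead of returning a pointer to a
structure stored in the port. A minimal userspace sketch of that calling
pattern follows. All model_* names and MODEL_NUM_QUEUES are hypothetical
stand-ins, not kernel or driver identifiers, and the sketch ignores the
hypervisor query that the real ehea routine also performs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_QUEUES 4	/* hypothetical queue count */

/* Hypothetical stand-in for the caller-supplied 64-bit stats block. */
struct model_stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
};

/* Hypothetical per-queue counters, kept as 64-bit values. */
struct model_queue {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
};

struct model_port {
	struct model_queue q[MODEL_NUM_QUEUES];
};

/*
 * Sum the per-queue counters into the buffer the caller passed in and
 * return that same buffer. This model zeroes the buffer itself, so the
 * caller does not need to initialise it.
 */
static struct model_stats64 *model_get_stats64(const struct model_port *port,
					       struct model_stats64 *stats)
{
	int i;

	assert(port && stats);
	memset(stats, 0, sizeof(*stats));

	for (i = 0; i < MODEL_NUM_QUEUES; i++) {
		stats->rx_packets += port->q[i].rx_packets;
		stats->tx_packets += port->q[i].tx_packets;
		stats->rx_bytes += port->q[i].rx_bytes;
		stats->tx_bytes += port->q[i].tx_bytes;
	}

	return stats;
}
```

Because the caller owns the buffer, the port no longer needs a stats member
of its own, which matches the removal of the local "stats" pointer in the
ehea_main.c hunk.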

Index: linux-2.6/drivers/net/ehea/ehea_ethtool.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_ethtool.c	2011-03-25 18:34:06.358846652 +1100
+++ linux-2.6/drivers/net/ehea/ehea_ethtool.c	2011-03-25 18:37:37.982330408 +1100
@@ -295,6 +295,8 @@ const struct ethtool_ops ehea_ethtool_op
 	.get_flags = ethtool_op_get_flags,
 	.set_flags = ehea_set_flags,
 	.nway_reset = ehea_nway_reset,		/* Restart autonegotiation */
+	.set_tx_csum = ethtool_op_set_tx_csum,
+	.set_sg = ethtool_op_set_sg,
 };
 
 void ehea_set_ethtool_ops(struct net_device *netdev)
Index: linux-2.6/drivers/net/ehea/ehea_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_main.c	2011-03-25 18:34:06.378861533 +1100
+++ linux-2.6/drivers/net/ehea/ehea_main.c	2011-03-25 18:37:02.810025945 +1100
@@ -321,10 +321,10 @@ out:
 	spin_unlock_irqrestore(&ehea_bcmc_regs.lock, flags);
 }
 
-static struct net_device_stats *ehea_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *ehea_get_stats64(struct net_device *dev,
+					struct rtnl_link_stats64 *stats)
 {
 	struct ehea_port *port = netdev_priv(dev);
-	struct net_device_stats *stats = &port->stats;
 	struct hcp_ehea_port_cb2 *cb2;
 	u64 hret, rx_packets, tx_packets, rx_bytes = 0, tx_bytes = 0;
 	int i;
@@ -3038,7 +3038,7 @@ static const struct net_device_ops ehea_
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ehea_netpoll,
 #endif
-	.ndo_get_stats		= ehea_get_stats,
+	.ndo_get_stats64	= ehea_get_stats64,
 	.ndo_set_mac_address	= ehea_set_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_multicast_list	= ehea_set_multicast_list,

* [PATCH 11/14] ehea: Remove some unused definitions
From: Anton Blanchard @ 2011-04-05 11:41 UTC (permalink / raw)
  To: leitao; +Cc: netdev, michael
In-Reply-To: <20110405212825.6eb85677@kryten>


The queue macros are many levels deep and it makes it harder to
work your way through them when many of the versions are unused.
Remove the unused versions.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-2.6/drivers/net/ehea/ehea_hw.h
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_hw.h	2011-03-21 14:18:18.642367504 +1100
+++ linux-2.6/drivers/net/ehea/ehea_hw.h	2011-03-21 14:18:52.523619465 +1100
@@ -210,36 +210,11 @@ static inline void epa_store_acc(struct
 	__raw_writeq(value, (void __iomem *)(epa.addr + offset));
 }
 
-#define epa_store_eq(epa, offset, value)\
-	epa_store(epa, EQTEMM_OFFSET(offset), value)
-#define epa_load_eq(epa, offset)\
-	epa_load(epa, EQTEMM_OFFSET(offset))
-
 #define epa_store_cq(epa, offset, value)\
 	epa_store(epa, CQTEMM_OFFSET(offset), value)
 #define epa_load_cq(epa, offset)\
 	epa_load(epa, CQTEMM_OFFSET(offset))
 
-#define epa_store_qp(epa, offset, value)\
-	epa_store(epa, QPTEMM_OFFSET(offset), value)
-#define epa_load_qp(epa, offset)\
-	epa_load(epa, QPTEMM_OFFSET(offset))
-
-#define epa_store_qped(epa, offset, value)\
-	epa_store(epa, QPEDMM_OFFSET(offset), value)
-#define epa_load_qped(epa, offset)\
-	epa_load(epa, QPEDMM_OFFSET(offset))
-
-#define epa_store_mrmw(epa, offset, value)\
-	epa_store(epa, MRMWMM_OFFSET(offset), value)
-#define epa_load_mrmw(epa, offset)\
-	epa_load(epa, MRMWMM_OFFSET(offset))
-
-#define epa_store_base(epa, offset, value)\
-	epa_store(epa, HCAGR_OFFSET(offset), value)
-#define epa_load_base(epa, offset)\
-	epa_load(epa, HCAGR_OFFSET(offset))
-
 static inline void ehea_update_sqa(struct ehea_qp *qp, u16 nr_wqes)
 {
 	struct h_epa epa = qp->epas.kernel;

* [PATCH 10/14] ehea: Simplify type 3 transmit routine
From: Anton Blanchard @ 2011-04-05 11:41 UTC (permalink / raw)
  To: leitao; +Cc: netdev, michael
In-Reply-To: <20110405212825.6eb85677@kryten>


If a nonlinear skb fits within the immediate area, use skb_copy_bits
instead of copying the frags by hand.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
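Note for readers: the open-coded loop removed below and the skb_copy_bits
call that replaces it both gather a buffer that is split into a linear area
plus page fragments into one contiguous destination. A rough userspace
sketch of that gather operation follows. The model_* types and function are
hypothetical stand-ins for illustration only, not the kernel's sk_buff
layout or its skb_copy_bits implementation, and the sketch assumes that
"len" in the model equals the linear length plus the sum of fragment sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one paged fragment. */
struct model_frag {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical stand-in for a buffer with a linear area and fragments. */
struct model_skb {
	const unsigned char *head;	/* linear area */
	size_t headlen;			/* bytes in the linear area */
	const struct model_frag *frags;
	size_t nr_frags;
	size_t len;			/* headlen + sum of fragment sizes */
};

/*
 * Copy 'len' bytes starting at 'offset' into 'to', taking bytes from the
 * linear area first and then from each fragment in order. Returns 0 on
 * success, or -1 if the requested range runs past the end of the buffer.
 */
static int model_copy_bits(const struct model_skb *skb, size_t offset,
			   void *to, size_t len)
{
	unsigned char *dst = to;
	size_t start;	/* buffer offset at which the current piece begins */
	size_t i;

	if (offset > skb->len || len > skb->len - offset)
		return -1;

	/* Linear part, if the range starts inside it. */
	if (offset < skb->headlen) {
		size_t n = skb->headlen - offset;

		if (n > len)
			n = len;
		memcpy(dst, skb->head + offset, n);
		dst += n;
		offset += n;
		len -= n;
	}
	start = skb->headlen;

	/* Fragments, skipping any that end before 'offset'. */
	for (i = 0; i < skb->nr_frags && len > 0; i++) {
		const struct model_frag *f = &skb->frags[i];
		size_t end = start + f->size;

		if (offset < end) {
			size_t n = end - offset;

			if (n > len)
				n = len;
			memcpy(dst, f->data + (offset - start), n);
			dst += n;
			offset += n;
			len -= n;
		}
		start = end;
	}

	/* Fires only if skb->len disagrees with the pieces (see assumption). */
	assert(len == 0);
	return 0;
}
```

In the patch itself the whole packet is copied from offset 0, so the call is
simply skb_copy_bits(skb, 0, imm_data, skb->len).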

Index: linux-2.6/drivers/net/ehea/ehea_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ehea/ehea_main.c	2011-03-21 18:32:44.258651836 +1100
+++ linux-2.6/drivers/net/ehea/ehea_main.c	2011-03-21 18:32:50.638888481 +1100
@@ -2095,29 +2095,14 @@ static void ehea_xmit2(struct sk_buff *s
 static void ehea_xmit3(struct sk_buff *skb, struct net_device *dev,
 		       struct ehea_swqe *swqe)
 {
-	int nfrags = skb_shinfo(skb)->nr_frags;
 	u8 *imm_data = &swqe->u.immdata_nodesc.immediate_data[0];
-	skb_frag_t *frag;
-	int i;
 
 	xmit_common(skb, swqe);
 
-	if (nfrags == 0) {
+	if (!skb->data_len)
 		skb_copy_from_linear_data(skb, imm_data, skb->len);
-	} else {
-		skb_copy_from_linear_data(skb, imm_data,
-					  skb_headlen(skb));
-		imm_data += skb_headlen(skb);
-
-		/* ... then copy data from the fragments */
-		for (i = 0; i < nfrags; i++) {
-			frag = &skb_shinfo(skb)->frags[i];
-			memcpy(imm_data,
-			       page_address(frag->page) + frag->page_offset,
-			       frag->size);
-			imm_data += frag->size;
-		}
-	}
+	else
+		skb_copy_bits(skb, 0, imm_data, skb->len);
 
 	swqe->immediate_data_length = skb->len;
 	dev_kfree_skb(skb);
