Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
From: Andrew Morton @ 2010-10-07 19:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric W. Biederman, Américo Wang, Robin Holt, linux-kernel,
	Willy Tarreau, David S. Miller, netdev, James Morris,
	Pekka Savola (ipv6), Patrick McHardy, Alexey Kuznetsov
In-Reply-To: <1286470743.2912.276.camel@edumazet-laptop>

On Thu, 07 Oct 2010 18:59:03 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le jeudi 07 octobre 2010 __ 09:37 -0700, Eric W. Biederman a __crit :
> 
> > The difference between long handling and int handling is a
> > usability issue.  I don't expect we will be exporting new
> > vectors via sysctl, so the conversion of a handful of vectors
> > from int to long is where this is most likely to be used.
> > 
> > I skimmed through all of what I presume are the current users
> > aka linux-2.6.36-rcX and there don't appear to be any users
> > of proc_dounlongvec_minmax that use it's vector properties there.
> > 
> > Which doubly tells me that incrementing the min and max pointers
> > is not what we want to do.
> > 
> 
> Thats fine by me, thanks Eric.
> 
> Andrew, please remove previous patch from your tree and replace it by
> following one :
> 
> [PATCH v2] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
> 
> When proc_doulongvec_minmax() is used with an array of longs,
> and no min/max check requested (.extra1 or .extra2 being NULL), we
> dereference a NULL pointer for the second element of the array.
> 
> Noticed while doing some changes in network stack for the "16TB problem"
> 
> Fix is to not change min & max pointers in
> __do_proc_doulongvec_minmax(), so that all elements of the vector share
> an unique min/max limit, like proc_dointvec_minmax().
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  kernel/sysctl.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f88552c..8e45451 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2485,7 +2485,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
>  		kbuf[left] = 0;
>  	}
>  
> -	for (; left && vleft--; i++, min++, max++, first=0) {
> +	for (; left && vleft--; i++, first=0) {
>  		unsigned long val;
>  
>  		if (write) {

Did we check to see whether any present callers are passing in pointers
to arrays of min/max values?

I wonder if there's any documentation for this interface which just
became wrong.

^ permalink raw reply

* Re: [PATCH] net: Fix sk_dst_check() to reset the obsolete dst_entry of a socket.
From: Chung-Yih Wang (王崇懿) @ 2010-10-07 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, timo.teras
In-Reply-To: <AANLkTimF7piLEy7BUP4-BnmiiGT4NimZw0gE+kmYQd-c@mail.gmail.com>

As I am testing the l2tp/ipsec client(it is working fine on 2.6.32 but
failed on 2.6.35 with the same client). Please see the following log
dump for sk_dst_check().

<2>[  201.390289] ==== sk_dst_check sk=C7485800 dst=C717AC60
obsolete=FFFFFFFF cookie=0 check=C0296510
<2>[  211.247467] ==== sk_dst_check sk=C7485000 dst=C717AC60
obsolete=FFFFFFFF cookie=0 check=C0296510

[Basically, the ipsec tunnel is built and then free the dst_entry for
this l2tp socket. Therefore, the obsolete entry should be reset in
sk_dst_check(). However, the kernel 2.6.35 will do nothing here since
the ipv4_dst_check still return the obsolete entry even if it is
obsolete(dst->obsolete=2)]

<2>[  216.571350] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<6>[  218.069396] alg: No test for authenc(hmac(sha1),cbc(des3_ede))
(authenc(hmac(sha1-generic),cbc(des3_ede-generic)))
<6>[  218.164764] batt:  96%, 4114 mV, 0 mA (-6 avg), 27.2 C, 1305 mAh
<2>[  218.575561] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  220.580535] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  222.585754] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  224.591522] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  226.599212] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  228.602600] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  230.608062] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  232.613464] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  234.618896] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  236.623504] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  238.628936] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  240.634338] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  242.639709] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  244.645111] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  246.648864] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  248.654693] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  250.660125] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510
<2>[  252.665527] ==== sk_dst_check sk=C7485400 dst=C6F670E0
obsolete=00000002 cookie=0 check=C0296510

^ permalink raw reply

* Re: [PATCH] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
From: Eric W. Biederman @ 2010-10-07 19:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Dumazet, Américo Wang, Robin Holt, linux-kernel,
	Willy Tarreau, David S. Miller, netdev, James Morris,
	Pekka Savola (ipv6), Patrick McHardy, Alexey Kuznetsov
In-Reply-To: <20101007121840.ca49e2ac.akpm@linux-foundation.org>

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 07 Oct 2010 18:59:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

>> Thats fine by me, thanks Eric.
>> 
>> Andrew, please remove previous patch from your tree and replace it by
>> following one :
>> 
>> [PATCH v2] sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
>> 
>> When proc_doulongvec_minmax() is used with an array of longs,
>> and no min/max check requested (.extra1 or .extra2 being NULL), we
>> dereference a NULL pointer for the second element of the array.
>> 
>> Noticed while doing some changes in network stack for the "16TB problem"
>> 
>> Fix is to not change min & max pointers in
>> __do_proc_doulongvec_minmax(), so that all elements of the vector share
>> an unique min/max limit, like proc_dointvec_minmax().
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
>>  kernel/sysctl.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index f88552c..8e45451 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -2485,7 +2485,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
>>  		kbuf[left] = 0;
>>  	}
>>  
>> -	for (; left && vleft--; i++, min++, max++, first=0) {
>> +	for (; left && vleft--; i++, first=0) {
>>  		unsigned long val;
>>  
>>  		if (write) {
>
> Did we check to see whether any present callers are passing in pointers
> to arrays of min/max values?

In 2.6.36 there are not any callers that pass in a vector of anything, I
don't know about linux-next.  It looks to me like incrementing min and
max was simply a bug.

> I wonder if there's any documentation for this interface which just
> became wrong.

Or it just became right.  Clearly no one has been expecting min
and max to be vectors.

Eric

^ permalink raw reply

* Re: ixgbe: normalize frag_list usage
From: Alexander Duyck @ 2010-10-07 19:59 UTC (permalink / raw)
  To: David Miller
  Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W,
	netdev@vger.kernel.org
In-Reply-To: <20101006.235837.241424681.davem@davemloft.net>

On 10/6/2010 11:58 PM, David Miller wrote:
> From: "Duyck, Alexander H"<alexander.h.duyck@intel.com>
> Date: Tue, 5 Oct 2010 15:45:32 -0700
>
>> The patch below is kind of what I had in mind for a way to do RSC and maintain
>> the pointer scheme you are looking for.  Consider this patch an RFC for now
>> since I based this off of Jeff's internal testing tree and so it would need
>> some modification to apply cleanly to net-next.
>
> Can you really not remember the head somewhere?

I can track it in the RSC_CB if that works for you.  Right now though I 
guess I am not seeing the difference between tracking this in 
skb->frag_next vs IXGBE_RSC_CB(skb)->frag_head.  I think it might help 
if you were to provide some functions that demonstrate exactly what you 
had in mind for frag list handling.  Specifically if you were to add a 
function for merging a frag into the frag list, and for how you want to 
approach cleaning up the skb->prev/frag_tail_tracker pointer when you 
are cleaning up an active frag_list.

> What I wanted is for everyone to build their frag list SKBs from head to tail,
> always.  So that I could, as I mentioned in my original posting, do something
> like:

My change is doing just that.  It is building from head to tail.  The 
only difference is that tail is tracking head since it has to be updated 
after the tail is added, and then I can drop next/frag_next back to NULL.

> struct sk_buff {
> 	union {
> 		struct list_head list;
> 		struct {
> 			struct sk_buff	*frag_next;
> 			struct sk_buff	*frag_tail_tracker;
> 		};
> 	};
> };
>
> The ->frag_tail_tracker is only used in the head SKB to maintain where the
> last SKB in the frag list is.

That is what I was doing in the head skb.

> You're tracking the head from the inner SKBs, such that my intended
> conventions are not being followed.

What I am doing is tracking the head from the tail skb.  The reason for 
this is that I need some way to update the len, data_len, and truesize 
after I have placed data in the tail via skb_put.  All of the other SKBs 
in the frag list are just tracking the next like any other SKB in the 
frag_list.

Now that I think about it I guess the issue is that I am adding the SKB 
to the frag_list before it is complete.  I will send another patch out 
in the next couple of hours that should address most of the issues.

Thanks,

Alex






^ permalink raw reply

* [PATCH] net: clear heap allocation for ETHTOOL_GRXCLSRLALL
From: Kees Cook @ 2010-10-07 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: David S. Miller, Ben Hutchings, Jeff Garzik, Jeff Kirsher,
	Peter P Waskiewicz Jr, netdev

Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel
heap without clearing it. For the one driver (niu) that implements it,
it will leave the unused portion of heap unchanged and copy the full
contents back to userspace.

Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
---
 net/core/ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 7a85367..4016ac6 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -348,7 +348,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 	if (info.cmd == ETHTOOL_GRXCLSRLALL) {
 		if (info.rule_cnt > 0) {
 			if (info.rule_cnt <= KMALLOC_MAX_SIZE / sizeof(u32))
-				rule_buf = kmalloc(info.rule_cnt * sizeof(u32),
+				rule_buf = kzalloc(info.rule_cnt * sizeof(u32),
 						   GFP_USER);
 			if (!rule_buf)
 				return -ENOMEM;
-- 
1.7.1

-- 
Kees Cook
Ubuntu Security Team

^ permalink raw reply related

* [PATCH net-next-2.6] net: Fix rxq ref counting
From: Tom Herbert @ 2010-10-07 20:09 UTC (permalink / raw)
  To: davem, netdev

The rx->count reference is used to track reference counts to the
number of rx-queue kobjects created for the device.  This patch
eliminates initialization of the counter in netif_alloc_rx_queues
and instead increments the counter each time a kobject is created.
This is now symmetric with the decrement that is done when an object is
released.

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 7d14955..58b31d1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5026,7 +5026,6 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 			return -ENOMEM;
 		}
 		dev->_rx = rx;
-		atomic_set(&rx->count, count);
 
 		/*
 		 * Set a pointer to first element in the array which holds the
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index fa81fd0..b143173 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -726,6 +726,7 @@ static struct kobj_type rx_queue_ktype = {
 static int rx_queue_add_kobject(struct net_device *net, int index)
 {
 	struct netdev_rx_queue *queue = net->_rx + index;
+	struct netdev_rx_queue *first = queue->first;
 	struct kobject *kobj = &queue->kobj;
 	int error = 0;
 
@@ -738,6 +739,7 @@ static int rx_queue_add_kobject(struct net_device *net, int index)
 	}
 
 	kobject_uevent(kobj, KOBJ_ADD);
+	atomic_inc(&first->count);
 
 	return error;
 }

^ permalink raw reply related

* Re: [PATCHv4 net-next-2.6 1/5] XFRM,IPv6: Remove xfrm_spi_hash() dependency on destination address
From: Arnaud Ebalard @ 2010-10-07 20:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, netdev
In-Reply-To: <20101005041707.GA26458@gondor.apana.org.au>

Hi,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Tue, Oct 05, 2010 at 10:11:14AM +0800, Herbert Xu wrote:
>>
>> I'm thinking about the case where each remote end (or one remote
>> end with many IP addresses) chooses to use a single SPI which then
>> all gets hashed to the same bucket.
>> 
>> Outbound SAs are hashed into the same SPI hash table as inbound SAs.
>
> Another solution would be to create a hash table for inbound SAs
> only.  Unfortunately I don't think we have anything in our current
> user-interface to indicate whether an SA is inbound or outbound.
>
> So to do this you'll need to use a heuristic such as doing a
> lookup on the source/destination address at insertion time.

I spent some time scratching my head trying to find a good solution. It
would indeed be perfect to have a specific hash table for inbound
SA. But as you point, this would only be via a heuristic at insertion
time and there are various cases which would not work: a SA can be
installed w/o any of the addresses being configured on the system.

I think I will try the following alternative approach based on your
comments and proposals:

 - drop my patch to change spi hash computation
 - handle destination address remapping during input upon failure of
   xfrm_state_lookup()
 - handle source address remapping as it is currently done in the patch,
   i.e. by comparing received one against x->props.saddr once the
   state found and do 

To support the destination address remapping, I will have to reverse the
logic I currently have for destination remapping states, to allow the
lookup to be done based on the on-wire address (CoA) instead of the
address in the SA (HoA). If a remapping state is found for the on-wire
address, then a new lookup is done using the associated HoA this time.

I think it would make the feature easier less intrusive for the IPsec
stack.

Thanks for your support and patience, Herbert.

a+

^ permalink raw reply

* Re: [PATCH] net: clear heap allocation for ETHTOOL_GRXCLSRLALL
From: Ben Hutchings @ 2010-10-07 20:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David S. Miller, Jeff Garzik, Jeff Kirsher,
	Peter P Waskiewicz Jr, netdev
In-Reply-To: <20101007200348.GA6038@outflux.net>

On Thu, 2010-10-07 at 13:03 -0700, Kees Cook wrote:
> Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel
> heap without clearing it. For the one driver (niu) that implements it,
> it will leave the unused portion of heap unchanged and copy the full
> contents back to userspace.
> 
> Cc: stable@kernel.org
> Signed-off-by: Kees Cook <kees.cook@canonical.com>

Acked-by: Ben Hutchings <bhutchings@solarflare.com>

Should have spotted this myself. :-(

Ben.

> ---
>  net/core/ethtool.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 7a85367..4016ac6 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -348,7 +348,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
>  	if (info.cmd == ETHTOOL_GRXCLSRLALL) {
>  		if (info.rule_cnt > 0) {
>  			if (info.rule_cnt <= KMALLOC_MAX_SIZE / sizeof(u32))
> -				rule_buf = kmalloc(info.rule_cnt * sizeof(u32),
> +				rule_buf = kzalloc(info.rule_cnt * sizeof(u32),
>  						   GFP_USER);
>  			if (!rule_buf)
>  				return -ENOMEM;
> -- 
> 1.7.1
> 

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH V2] Use firmware provided index to register a network interface
From: Matt Domsch @ 2010-10-07 20:42 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg KH, K, Narendra, netdev@vger.kernel.org,
	linux-hotplug@vger.kernel.org, linux-pci@vger.kernel.org,
	Hargrave, Jordan, Nijhawan, Vijay, Rose, Charles
In-Reply-To: <AANLkTinP5WPDPvk+kq8vsyP=xC9qcoe+c=1EBp0XJNPk@mail.gmail.com>

On Thu, Oct 07, 2010 at 10:05:14AM -0700, Kay Sievers wrote:
> On Thu, Oct 7, 2010 at 18:48, Greg KH <greg@kroah.com> wrote:
> > On Thu, Oct 07, 2010 at 11:31:13AM -0500, Matt Domsch wrote:
> >> 1) SMBIOS type 41 method.  Windows does not use this today, and I
> >>    can't speak to their future plans.  Narendra's kernel patch does,
> >>    as has biosdevname, the udev helper we first wrote for this
> >>    purpose, for several years.
> >
> > Then stick with that udev helper please :)
> 
> What about just exporting this information in sysfs, and not touch the naming?

The config tools all take a netdevice name as their argument.  What
would it look like then?


$ ifconfig $(netdevname 'Embedded NIC 1')

repeat for each tool that's called?  This is similar to what we
proposed with the userspace patch and libnetdevname, so the lookup can
happen inside each app, rather than the system admin having to do the
translation themselves.  That was rejected too...

otherwise, it's up to a human to do the translation in their head,
which isn't script-friendly.

-Matt

--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

^ permalink raw reply

* [PATCH net-next] neigh: speedup neigh_resolve_output()
From: Eric Dumazet @ 2010-10-07 20:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1286456008.2912.171.camel@edumazet-laptop>

Le jeudi 07 octobre 2010 à 14:53 +0200, Eric Dumazet a écrit :

> Further improvements would need to use a seqlock instead of an rwlock to
> protect neigh->ha[], to not dirty neigh too often and remove two atomic
> ops.
> 

I implemented this idea in following patch, on top on previous one.

[PATCH net-next] neigh: speedup neigh_resolve_output()

Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situation (many different flows / dsts)

Dirtying takes place because of read_lock(&n->lock) and n->used writes.

Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/neighbour.h |   16 ++++++++++++
 net/core/neighbour.c    |   47 ++++++++++++++++++++++++--------------
 net/ipv4/arp.c          |    6 +---
 net/sched/sch_teql.c    |    8 +++---
 4 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a4538d5..f04e7a2 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -105,6 +105,7 @@ struct neighbour {
 	atomic_t		refcnt;
 	atomic_t		probes;
 	rwlock_t		lock;
+	seqlock_t		ha_lock;
 	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
 	struct hh_cache		*hh;
 	int			(*output)(struct sk_buff *skb);
@@ -302,7 +303,10 @@ static inline void neigh_confirm(struct neighbour *neigh)
 
 static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 {
-	neigh->used = jiffies;
+	unsigned long now = ACCESS_ONCE(jiffies);
+	
+	if (neigh->used != now)
+		neigh->used = now;
 	if (!(neigh->nud_state&(NUD_CONNECTED|NUD_DELAY|NUD_PROBE)))
 		return __neigh_event_send(neigh, skb);
 	return 0;
@@ -373,4 +377,14 @@ struct neighbour_cb {
 
 #define NEIGH_CB(skb)	((struct neighbour_cb *)(skb)->cb)
 
+static inline void neigh_ha_snapshot(char *dst, const struct neighbour *n,
+				     const struct net_device *dev)
+{
+	unsigned int seq;
+
+	do {
+		seq = read_seqbegin(&n->ha_lock);
+		memcpy(dst, n->ha, dev->addr_len);
+	} while (read_seqretry(&n->ha_lock, seq));
+}
 #endif
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 53cc548..54aef9c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -294,6 +294,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl)
 
 	skb_queue_head_init(&n->arp_queue);
 	rwlock_init(&n->lock);
+	seqlock_init(&n->ha_lock);
 	n->updated	  = n->used = now;
 	n->nud_state	  = NUD_NONE;
 	n->output	  = neigh_blackhole;
@@ -1015,7 +1016,7 @@ out_unlock_bh:
 }
 EXPORT_SYMBOL(__neigh_event_send);
 
-static void neigh_update_hhs(struct neighbour *neigh)
+static void neigh_update_hhs(const struct neighbour *neigh)
 {
 	struct hh_cache *hh;
 	void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
@@ -1151,7 +1152,9 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 	}
 
 	if (lladdr != neigh->ha) {
+		write_seqlock(&neigh->ha_lock);
 		memcpy(&neigh->ha, lladdr, dev->addr_len);
+		write_sequnlock(&neigh->ha_lock);
 		neigh_update_hhs(neigh);
 		if (!(new & NUD_CONNECTED))
 			neigh->confirmed = jiffies -
@@ -1214,6 +1217,7 @@ static inline bool neigh_hh_lookup(struct neighbour *n, struct dst_entry *dst,
 {
 	struct hh_cache *hh;
 
+	smp_rmb(); /* paired with smp_wmb() in neigh_hh_init() */
 	for (hh = n->hh; hh; hh = hh->hh_next) {
 		if (hh->hh_type == protocol) {
 			atomic_inc(&hh->hh_refcnt);
@@ -1248,8 +1252,8 @@ static void neigh_hh_init(struct neighbour *n, struct dst_entry *dst,
 		kfree(hh);
 		return;
 	}
-	read_unlock(&n->lock);
-	write_lock(&n->lock);
+
+	write_lock_bh(&n->lock);
 	
 	/* must check if another thread already did the insert */
 	if (neigh_hh_lookup(n, dst, protocol)) {
@@ -1263,13 +1267,13 @@ static void neigh_hh_init(struct neighbour *n, struct dst_entry *dst,
 		hh->hh_output = n->ops->output;
 
 	hh->hh_next = n->hh;
+	smp_wmb(); /* paired with smp_rmb() in neigh_hh_lookup() */
 	n->hh	    = hh;
 
 	if (unlikely(cmpxchg(&dst->hh, NULL, hh) != NULL))
 		hh_cache_put(hh);
 end:
-	write_unlock(&n->lock);
-	read_lock(&n->lock);
+	write_unlock_bh(&n->lock);
 }
 
 /* This function can be used in contexts, where only old dev_queue_xmit
@@ -1308,16 +1312,18 @@ int neigh_resolve_output(struct sk_buff *skb)
 	if (!neigh_event_send(neigh, skb)) {
 		int err;
 		struct net_device *dev = neigh->dev;
+		unsigned int seq;
 
-		read_lock_bh(&neigh->lock);
 		if (dev->header_ops->cache &&
 		    !dst->hh &&
 		    !(dst->flags & DST_NOCACHE))
 			neigh_hh_init(neigh, dst, dst->ops->protocol);
 
-		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-				      neigh->ha, NULL, skb->len);
-		read_unlock_bh(&neigh->lock);
+		do {
+			seq = read_seqbegin(&neigh->ha_lock);
+			err = dev_hard_header(skb, dev, ntohs(skb->protocol),
+					      neigh->ha, NULL, skb->len);
+		} while (read_seqretry(&neigh->ha_lock, seq));
 
 		if (err >= 0)
 			rc = neigh->ops->queue_xmit(skb);
@@ -1344,13 +1350,16 @@ int neigh_connected_output(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct neighbour *neigh = dst->neighbour;
 	struct net_device *dev = neigh->dev;
+	unsigned int seq;
 
 	__skb_pull(skb, skb_network_offset(skb));
 
-	read_lock_bh(&neigh->lock);
-	err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-			      neigh->ha, NULL, skb->len);
-	read_unlock_bh(&neigh->lock);
+	do {
+		seq = read_seqbegin(&neigh->ha_lock);
+		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
+				      neigh->ha, NULL, skb->len);
+	} while (read_seqretry(&neigh->ha_lock, seq));
+
 	if (err >= 0)
 		err = neigh->ops->queue_xmit(skb);
 	else {
@@ -2148,10 +2157,14 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
 
 	read_lock_bh(&neigh->lock);
 	ndm->ndm_state	 = neigh->nud_state;
-	if ((neigh->nud_state & NUD_VALID) &&
-	    nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, neigh->ha) < 0) {
-		read_unlock_bh(&neigh->lock);
-		goto nla_put_failure;
+	if (neigh->nud_state & NUD_VALID) {
+		char haddr[MAX_ADDR_LEN];
+
+		neigh_ha_snapshot(haddr, neigh, neigh->dev);
+		if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0) {
+			read_unlock_bh(&neigh->lock);
+			goto nla_put_failure;
+		}
 	}
 
 	ci.ndm_used	 = jiffies_to_clock_t(now - neigh->used);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index f353095..d8e540c 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -502,10 +502,8 @@ int arp_find(unsigned char *haddr, struct sk_buff *skb)
 
 	if (n) {
 		n->used = jiffies;
-		if (n->nud_state&NUD_VALID || neigh_event_send(n, skb) == 0) {
-			read_lock_bh(&n->lock);
-			memcpy(haddr, n->ha, dev->addr_len);
-			read_unlock_bh(&n->lock);
+		if (n->nud_state & NUD_VALID || neigh_event_send(n, skb) == 0) {
+			neigh_ha_snapshot(haddr, n, dev);
 			neigh_release(n);
 			return 0;
 		}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index feaabc1..401af95 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -241,11 +241,11 @@ __teql_resolve(struct sk_buff *skb, struct sk_buff *skb_res, struct net_device *
 	}
 	if (neigh_event_send(n, skb_res) == 0) {
 		int err;
+		char haddr[MAX_ADDR_LEN];
 
-		read_lock(&n->lock);
-		err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-				      n->ha, NULL, skb->len);
-		read_unlock(&n->lock);
+		neigh_ha_snapshot(haddr, n, dev);
+		err = dev_hard_header(skb, dev, ntohs(skb->protocol), haddr,
+				      NULL, skb->len);
 
 		if (err < 0) {
 			neigh_release(n);



^ permalink raw reply related

* Re: [PATCH net-next-2.6] net: Fix rxq ref counting
From: Eric Dumazet @ 2010-10-07 20:56 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.1.00.1010071304240.4792@pokey.mtv.corp.google.com>

Le jeudi 07 octobre 2010 à 13:09 -0700, Tom Herbert a écrit :
> The rx->count reference is used to track reference counts to the
> number of rx-queue kobjects created for the device.  This patch
> eliminates initialization of the counter in netif_alloc_rx_queues
> and instead increments the counter each time a kobject is created.
> This is now symmetric with the decrement that is done when an object is
> released.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7d14955..58b31d1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5026,7 +5026,6 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>  			return -ENOMEM;
>  		}
>  		dev->_rx = rx;
> -		atomic_set(&rx->count, count);
>  
>  		/*
>  		 * Set a pointer to first element in the array which holds the
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index fa81fd0..b143173 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -726,6 +726,7 @@ static struct kobj_type rx_queue_ktype = {
>  static int rx_queue_add_kobject(struct net_device *net, int index)
>  {
>  	struct netdev_rx_queue *queue = net->_rx + index;
> +	struct netdev_rx_queue *first = queue->first;
>  	struct kobject *kobj = &queue->kobj;
>  	int error = 0;
>  
> @@ -738,6 +739,7 @@ static int rx_queue_add_kobject(struct net_device *net, int index)
>  	}
>  
>  	kobject_uevent(kobj, KOBJ_ADD);
> +	atomic_inc(&first->count);
>  
>  	return error;
>  }

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Fix rxq ref counting
From: Eric Dumazet @ 2010-10-07 20:59 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.1.00.1010071304240.4792@pokey.mtv.corp.google.com>

Le jeudi 07 octobre 2010 à 13:09 -0700, Tom Herbert a écrit :
> The rx->count reference is used to track reference counts to the
> number of rx-queue kobjects created for the device.  This patch
> eliminates initialization of the counter in netif_alloc_rx_queues
> and instead increments the counter each time a kobject is created.
> This is now symmetric with the decrement that is done when an object is
> released.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7d14955..58b31d1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5026,7 +5026,6 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>  			return -ENOMEM;
>  		}
>  		dev->_rx = rx;
> -		atomic_set(&rx->count, count);
>  
>  		/*
>  		 * Set a pointer to first element in the array which holds the
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index fa81fd0..b143173 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -726,6 +726,7 @@ static struct kobj_type rx_queue_ktype = {
>  static int rx_queue_add_kobject(struct net_device *net, int index)
>  {
>  	struct netdev_rx_queue *queue = net->_rx + index;
> +	struct netdev_rx_queue *first = queue->first;
>  	struct kobject *kobj = &queue->kobj;
>  	int error = 0;
>  
> @@ -738,6 +739,7 @@ static int rx_queue_add_kobject(struct net_device *net, int index)
>  	}
>  
>  	kobject_uevent(kobj, KOBJ_ADD);
> +	atomic_inc(&first->count);
>  
>  	return error;
>  }

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>




^ permalink raw reply

* [PATCH] net: clear heap allocations for privileged ethtool actions
From: Kees Cook @ 2010-10-07 21:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: David S. Miller, Ben Hutchings, Jeff Garzik, Jeff Kirsher,
	Peter P Waskiewicz Jr, netdev

Several other ethtool functions leave heap uncleared (potentially) by
drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.

Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
---
 net/core/ethtool.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 7a85367..fb9cf30 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -397,7 +397,7 @@ static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
 	    (KMALLOC_MAX_SIZE - sizeof(*indir)) / sizeof(*indir->ring_index))
 		return -ENOMEM;
 	full_size = sizeof(*indir) + sizeof(*indir->ring_index) * table_size;
-	indir = kmalloc(full_size, GFP_USER);
+	indir = kzalloc(full_size, GFP_USER);
 	if (!indir)
 		return -ENOMEM;
 
@@ -538,7 +538,7 @@ static int ethtool_get_rx_ntuple(struct net_device *dev, void __user *useraddr)
 
 	gstrings.len = ret;
 
-	data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
+	data = kzalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
 	if (!data)
 		return -ENOMEM;
 
@@ -775,7 +775,7 @@ static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
 	if (regs.len > reglen)
 		regs.len = reglen;
 
-	regbuf = kmalloc(reglen, GFP_USER);
+	regbuf = kzalloc(reglen, GFP_USER);
 	if (!regbuf)
 		return -ENOMEM;
 
-- 
1.7.1

-- 
Kees Cook
Ubuntu Security Team

^ permalink raw reply related

* Re: [PATCH] net: clear heap allocations for privileged ethtool actions
From: Eric Dumazet @ 2010-10-07 21:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David S. Miller, Ben Hutchings, Jeff Garzik,
	Jeff Kirsher, Peter P Waskiewicz Jr, netdev
In-Reply-To: <20101007211004.GA20267@outflux.net>

Le jeudi 07 octobre 2010 à 14:10 -0700, Kees Cook a écrit :
> Several other ethtool functions leave heap uncleared (potentially) by
> drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
> are well controlled. In some situations (e.g. unchecked error conditions),
> the heap will remain unchanged in areas before copying back to userspace.
> Note that these are less of an issue since these all require CAP_NET_ADMIN.

> @@ -775,7 +775,7 @@ static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
>  	if (regs.len > reglen)
>  		regs.len = reglen;
>  
> -	regbuf = kmalloc(reglen, GFP_USER);
> +	regbuf = kzalloc(reglen, GFP_USER);
>  	if (!regbuf)
>  		return -ENOMEM;
>  
> -- 
> 1.7.1
> 

Are you sure this is not hiding a more problematic problem ?

Code does :

reglen = ops->get_regs_len(dev);
if (regs.len > reglen)
	regs.len = reglen;
regbuf = kmalloc(reglen, GFP_USER);

So we can not copy back kernel memory.

However, what happens if user provides regs.len = 1 byte, and driver
get_regs() doesnt properly checks regs.len and write past end of regbuf
-> We probably write on other parts of kernel memory

An audit is needed, but first driver I checked is buggy
(drivers/net/qlcnic/qlcnic_ethtool.c)

->
 	memset(p, 0, qlcnic_get_regs_len(dev));




^ permalink raw reply

* Re: [PATCH] net: clear heap allocations for privileged ethtool actions
From: Ben Hutchings @ 2010-10-07 21:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David S. Miller, Jeff Garzik, Jeff Kirsher,
	Peter P Waskiewicz Jr, netdev
In-Reply-To: <20101007211004.GA20267@outflux.net>

On Thu, 2010-10-07 at 14:10 -0700, Kees Cook wrote:
> Several other ethtool functions leave heap uncleared (potentially) by
> drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
> are well controlled. In some situations (e.g. unchecked error conditions),
> the heap will remain unchanged in areas before copying back to userspace.
> Note that these are less of an issue since these all require CAP_NET_ADMIN.
> 
> Cc: stable@kernel.org
> Signed-off-by: Kees Cook <kees.cook@canonical.com>
> ---
>  net/core/ethtool.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 7a85367..fb9cf30 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -397,7 +397,7 @@ static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
>  	    (KMALLOC_MAX_SIZE - sizeof(*indir)) / sizeof(*indir->ring_index))
>  		return -ENOMEM;
>  	full_size = sizeof(*indir) + sizeof(*indir->ring_index) * table_size;
> -	indir = kmalloc(full_size, GFP_USER);
> +	indir = kzalloc(full_size, GFP_USER);
>  	if (!indir)
>  		return -ENOMEM;
>  
[...]

Acked-by: Ben Hutchings <bhutchings@solarflare.com>

You could alternately recalculate full_size before copying back to the
user buffer:

	full_size = sizeof(*indir) + sizeof(*indir->ring_index) * indir->size;

but kzalloc() is more obviously safe.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH] net: clear heap allocations for privileged ethtool actions
From: Ben Hutchings @ 2010-10-07 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kees Cook, linux-kernel, David S. Miller, Jeff Garzik,
	Jeff Kirsher, Peter P Waskiewicz Jr, netdev
In-Reply-To: <1286487085.3745.99.camel@edumazet-laptop>

On Thu, 2010-10-07 at 23:31 +0200, Eric Dumazet wrote:
> Le jeudi 07 octobre 2010 à 14:10 -0700, Kees Cook a écrit :
> > Several other ethtool functions leave heap uncleared (potentially) by
> > drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
> > are well controlled. In some situations (e.g. unchecked error conditions),
> > the heap will remain unchanged in areas before copying back to userspace.
> > Note that these are less of an issue since these all require CAP_NET_ADMIN.
> 
> > @@ -775,7 +775,7 @@ static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
> >  	if (regs.len > reglen)
> >  		regs.len = reglen;
> >  
> > -	regbuf = kmalloc(reglen, GFP_USER);
> > +	regbuf = kzalloc(reglen, GFP_USER);

Actually, I recently changed this to vmalloc() so your patch won't
apply.

> >  	if (!regbuf)
> >  		return -ENOMEM;
> >  
> > -- 
> > 1.7.1
> > 
> 
> Are you sure this is not hiding a more problematic problem ?
> 
> Code does :
> 
> reglen = ops->get_regs_len(dev);
> if (regs.len > reglen)
> 	regs.len = reglen;
> regbuf = kmalloc(reglen, GFP_USER);
> 
> So we can not copy back kernel memory.
> 
> However, what happens if user provides regs.len = 1 byte, and driver
> get_regs() doesnt properly checks regs.len and write past end of regbuf
> -> We probably write on other parts of kernel memory
[...]

Why should the driver's get_regs() check regs.len?  The buffer is
allocated based on reglen which is provided by the driver, not the user.

reglen (length of the kernel buffer) is not reduced; regs.len (length of
the user buffer) is.  That lets the user know how much of the user
buffer was actually used.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] net: clear heap allocations for privileged ethtool actions
From: Kees Cook @ 2010-10-07 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Ben Hutchings, Jeff Garzik,
	Jeff Kirsher, Peter P Waskiewicz Jr, netdev
In-Reply-To: <1286487085.3745.99.camel@edumazet-laptop>

Hi Eric,

On Thu, Oct 07, 2010 at 11:31:25PM +0200, Eric Dumazet wrote:
> Le jeudi 07 octobre 2010 à 14:10 -0700, Kees Cook a écrit :
> > Several other ethtool functions leave heap uncleared (potentially) by
> > drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
> > are well controlled. In some situations (e.g. unchecked error conditions),
> > the heap will remain unchanged in areas before copying back to userspace.
> > Note that these are less of an issue since these all require CAP_NET_ADMIN.
> 
> > @@ -775,7 +775,7 @@ static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
> >  	if (regs.len > reglen)
> >  		regs.len = reglen;
> >  
> > -	regbuf = kmalloc(reglen, GFP_USER);
> > +	regbuf = kzalloc(reglen, GFP_USER);
> >  	if (!regbuf)
> >  		return -ENOMEM;
> >  
> > -- 
> > 1.7.1
> > 
> 
> Are you sure this is not hiding a more problematic problem ?
> 
> Code does :
> 
> reglen = ops->get_regs_len(dev);
> if (regs.len > reglen)
> 	regs.len = reglen;
> regbuf = kmalloc(reglen, GFP_USER);
> 
> So we can not copy back kernel memory.
> 
> However, what happens if user provides regs.len = 1 byte, and driver
> get_regs() doesnt properly checks regs.len and write past end of regbuf
> -> We probably write on other parts of kernel memory

This code is basically a max() call from what I see.

regbuf = kmalloc(max(regs.len, ops->get_regs_len(dev)), GFP_USER);

If the user passes regs.len = 1, it will be ignored in favor of reglen,
so we'll not write past the end of regbuf, unless I'm misunderstanding.

-Kees

-- 
Kees Cook
Ubuntu Security Team

^ permalink raw reply

* IPv4: sysctl table check failed [was: mmotm 2010-10-07-14-08 uploaded]
From: Jiri Slaby @ 2010-10-07 22:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, mm-commits, ML netdev, David S. Miller
In-Reply-To: <201010072140.o97Le69i025659@imap1.linux-foundation.org>

On 10/07/2010 11:08 PM, akpm@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2010-10-07-14-08 has been uploaded to

Hi, I got bunch of "sysctl table check failed" below. All seem to be
related to ipv4:

sysctl table check failed: /net/ipv4/tcp_mem  No min
Pid: 1, comm: swapper Not tainted 2.6.36-rc7-mm1_64+ #1285
Call Trace:
 [<ffffffff8108d444>] set_fail+0xa4/0xf0
 [<ffffffff8108d736>] sysctl_check_table+0x2a6/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff81072204>] __register_sysctl_paths+0xf4/0x320
 [<ffffffff815c1232>] ? printk+0x3c/0x42
 [<ffffffff8115d3bc>] ? sysfs_add_file+0xc/0x10
 [<ffffffff818b18bd>] ? sysctl_ipv4_init+0x0/0x87
 [<ffffffff81072456>] register_sysctl_paths+0x26/0x30
 [<ffffffff818b18fd>] sysctl_ipv4_init+0x40/0x87
 [<ffffffff810002df>] do_one_initcall+0x3f/0x170
 [<ffffffff81884d44>] kernel_init+0x158/0x1e2
 [<ffffffff8102fb54>] kernel_thread_helper+0x4/0x10
 [<ffffffff81884bec>] ? kernel_init+0x0/0x1e2
 [<ffffffff8102fb50>] ? kernel_thread_helper+0x0/0x10
sysctl table check failed: /net/ipv4/tcp_mem  No max
Pid: 1, comm: swapper Not tainted 2.6.36-rc7-mm1_64+ #1285
Call Trace:
 [<ffffffff8108d444>] set_fail+0xa4/0xf0
 [<ffffffff8108d4da>] sysctl_check_table+0x4a/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff81072204>] __register_sysctl_paths+0xf4/0x320
 [<ffffffff815c1232>] ? printk+0x3c/0x42
 [<ffffffff8115d3bc>] ? sysfs_add_file+0xc/0x10
 [<ffffffff818b18bd>] ? sysctl_ipv4_init+0x0/0x87
 [<ffffffff81072456>] register_sysctl_paths+0x26/0x30
 [<ffffffff818b18fd>] sysctl_ipv4_init+0x40/0x87
 [<ffffffff810002df>] do_one_initcall+0x3f/0x170
 [<ffffffff81884d44>] kernel_init+0x158/0x1e2
 [<ffffffff8102fb54>] kernel_thread_helper+0x4/0x10
 [<ffffffff81884bec>] ? kernel_init+0x0/0x1e2
 [<ffffffff8102fb50>] ? kernel_thread_helper+0x0/0x10
sysctl table check failed: /net/ipv4/udp_mem  No min
Pid: 1, comm: swapper Not tainted 2.6.36-rc7-mm1_64+ #1285
Call Trace:
 [<ffffffff8108d444>] set_fail+0xa4/0xf0
 [<ffffffff8108d736>] sysctl_check_table+0x2a6/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff81072204>] __register_sysctl_paths+0xf4/0x320
 [<ffffffff815c1232>] ? printk+0x3c/0x42
 [<ffffffff8115d3bc>] ? sysfs_add_file+0xc/0x10
 [<ffffffff818b18bd>] ? sysctl_ipv4_init+0x0/0x87
 [<ffffffff81072456>] register_sysctl_paths+0x26/0x30
 [<ffffffff818b18fd>] sysctl_ipv4_init+0x40/0x87
 [<ffffffff810002df>] do_one_initcall+0x3f/0x170
 [<ffffffff81884d44>] kernel_init+0x158/0x1e2
 [<ffffffff8102fb54>] kernel_thread_helper+0x4/0x10
 [<ffffffff81884bec>] ? kernel_init+0x0/0x1e2
 [<ffffffff8102fb50>] ? kernel_thread_helper+0x0/0x10
sysctl table check failed: /net/ipv4/udp_mem  No max
Pid: 1, comm: swapper Not tainted 2.6.36-rc7-mm1_64+ #1285
Call Trace:
 [<ffffffff8108d444>] set_fail+0xa4/0xf0
 [<ffffffff8108d4da>] sysctl_check_table+0x4a/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
 [<ffffffff81072204>] __register_sysctl_paths+0xf4/0x320
 [<ffffffff815c1232>] ? printk+0x3c/0x42
 [<ffffffff8115d3bc>] ? sysfs_add_file+0xc/0x10
 [<ffffffff818b18bd>] ? sysctl_ipv4_init+0x0/0x87
 [<ffffffff81072456>] register_sysctl_paths+0x26/0x30
 [<ffffffff818b18fd>] sysctl_ipv4_init+0x40/0x87
 [<ffffffff810002df>] do_one_initcall+0x3f/0x170
 [<ffffffff81884d44>] kernel_init+0x158/0x1e2
 [<ffffffff8102fb54>] kernel_thread_helper+0x4/0x10
 [<ffffffff81884bec>] ? kernel_init+0x0/0x1e2
 [<ffffffff8102fb50>] ? kernel_thread_helper+0x0/0x10

regards,
-- 
js

^ permalink raw reply

* Re: IPv4: sysctl table check failed [was: mmotm 2010-10-07-14-08 uploaded]
From: Eric Dumazet @ 2010-10-07 22:22 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-kernel, akpm, mm-commits, ML netdev, David S. Miller
In-Reply-To: <4CAE4479.6010606@gmail.com>

Le vendredi 08 octobre 2010 à 00:06 +0200, Jiri Slaby a écrit :
> On 10/07/2010 11:08 PM, akpm@linux-foundation.org wrote:
> > The mm-of-the-moment snapshot 2010-10-07-14-08 has been uploaded to
> 
> Hi, I got bunch of "sysctl table check failed" below. All seem to be
> related to ipv4:

I would say, sysctl check is buggy :(

min/max are optional

[PATCH] sysctl: min/max bounds are optional

sysctl check complains when proc_doulongvec_minmax or
proc_doulongvec_ms_jiffies_minmax are used by a vector of longs (with
more than one element), with no min or max value specified.

This is unexpected, given we had a bug on this min/max handling :)

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 kernel/sysctl_check.c |    9 ---------
 1 file changed, 9 deletions(-)

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 04cdcf7..10b90d8 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -143,15 +143,6 @@ int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
 				if (!table->maxlen)
 					set_fail(&fail, table, "No maxlen");
 			}
-			if ((table->proc_handler == proc_doulongvec_minmax) ||
-			    (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
-				if (table->maxlen > sizeof (unsigned long)) {
-					if (!table->extra1)
-						set_fail(&fail, table, "No min");
-					if (!table->extra2)
-						set_fail(&fail, table, "No max");
-				}
-			}
 #ifdef CONFIG_PROC_SYSCTL
 			if (table->procname && !table->proc_handler)
 				set_fail(&fail, table, "No proc_handler");



^ permalink raw reply related

* Re: IPv4: sysctl table check failed [was: mmotm 2010-10-07-14-08 uploaded]
From: Andrew Morton @ 2010-10-07 22:22 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: linux-kernel, mm-commits, ML netdev, David S. Miller,
	Eric Dumazet, Eric W. Biederman
In-Reply-To: <4CAE4479.6010606@gmail.com>

On Fri, 08 Oct 2010 00:06:49 +0200
Jiri Slaby <jirislaby@gmail.com> wrote:

> On 10/07/2010 11:08 PM, akpm@linux-foundation.org wrote:
> > The mm-of-the-moment snapshot 2010-10-07-14-08 has been uploaded to
> 
> Hi, I got bunch of "sysctl table check failed" below. All seem to be
> related to ipv4:
> 
> sysctl table check failed: /net/ipv4/tcp_mem  No min
> Pid: 1, comm: swapper Not tainted 2.6.36-rc7-mm1_64+ #1285
> Call Trace:
>  [<ffffffff8108d444>] set_fail+0xa4/0xf0
>  [<ffffffff8108d736>] sysctl_check_table+0x2a6/0x310
>  [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
>  [<ffffffff8108d4eb>] sysctl_check_table+0x5b/0x310
>  [<ffffffff81072204>] __register_sysctl_paths+0xf4/0x320
>  [<ffffffff815c1232>] ? printk+0x3c/0x42
>  [<ffffffff8115d3bc>] ? sysfs_add_file+0xc/0x10
>  [<ffffffff818b18bd>] ? sysctl_ipv4_init+0x0/0x87
>  [<ffffffff81072456>] register_sysctl_paths+0x26/0x30
>  [<ffffffff818b18fd>] sysctl_ipv4_init+0x40/0x87
>  [<ffffffff810002df>] do_one_initcall+0x3f/0x170
>  [<ffffffff81884d44>] kernel_init+0x158/0x1e2
>  [<ffffffff8102fb54>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81884bec>] ? kernel_init+0x0/0x1e2
>  [<ffffffff8102fb50>] ? kernel_thread_helper+0x0/0x10

OK, thanks.

Eric D's net-avoid-limits-overflow.patch switched tcp_mem and udp_mem
from proc_dointvec() to proc_doulongvec_minmax().  And
sysctl_check_table() checks `min' and `max' for
proc_doulongvec_minmax() but not for proc_dointvec().

I'm not sure which Eric to blame ;) .min and .max are optional, so
perhaps the check is wrong?


^ permalink raw reply

* Re: IPv4: sysctl table check failed [was: mmotm 2010-10-07-14-08 uploaded]
From: Andrew Morton @ 2010-10-07 22:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Slaby, linux-kernel, mm-commits, ML netdev, David S. Miller,
	Eric W. Biederman
In-Reply-To: <1286490135.6536.75.camel@edumazet-laptop>

On Fri, 08 Oct 2010 00:22:15 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le vendredi 08 octobre 2010 __ 00:06 +0200, Jiri Slaby a __crit :
> > On 10/07/2010 11:08 PM, akpm@linux-foundation.org wrote:
> > > The mm-of-the-moment snapshot 2010-10-07-14-08 has been uploaded to
> > 
> > Hi, I got bunch of "sysctl table check failed" below. All seem to be
> > related to ipv4:
> 
> I would say, sysctl check is buggy :(
> 
> min/max are optional
> 
> [PATCH] sysctl: min/max bounds are optional
> 
> sysctl check complains when proc_doulongvec_minmax or
> proc_doulongvec_ms_jiffies_minmax are used by a vector of longs (with
> more than one element), with no min or max value specified.
> 
> This is unexpected, given we had a bug on this min/max handling :)
> 
> Reported-by: Jiri Slaby <jirislaby@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  kernel/sysctl_check.c |    9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
> index 04cdcf7..10b90d8 100644
> --- a/kernel/sysctl_check.c
> +++ b/kernel/sysctl_check.c
> @@ -143,15 +143,6 @@ int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
>  				if (!table->maxlen)
>  					set_fail(&fail, table, "No maxlen");
>  			}
> -			if ((table->proc_handler == proc_doulongvec_minmax) ||
> -			    (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
> -				if (table->maxlen > sizeof (unsigned long)) {
> -					if (!table->extra1)
> -						set_fail(&fail, table, "No min");
> -					if (!table->extra2)
> -						set_fail(&fail, table, "No max");
> -				}
> -			}
>  #ifdef CONFIG_PROC_SYSCTL
>  			if (table->procname && !table->proc_handler)
>  				set_fail(&fail, table, "No proc_handler");

That will probably fix it ;)

net-avoid-limits-overflow.patch is dependent on this patch.  Unless
Eric B squeaks I'll plan on sending this patch in for 2.6.37.


^ permalink raw reply

* [PATCH] ehea: Fix a checksum issue on the receive path
From: leitao @ 2010-10-07 23:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Breno Leitao, Jay Vosburgh

Currently we set all skbs with CHECKSUM_UNNECESSARY, even
those whose protocol we don't know. This patch just
add the CHECKSUM_COMPLETE tag for non TCP/UDP packets.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/ehea/ehea_main.c |    9 ++++++++-
 drivers/net/ehea/ehea_qmr.h  |    1 +
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 0471cae..45fd045 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -533,8 +533,15 @@ static inline void ehea_fill_skb(struct net_device *dev,
 	int length = cqe->num_bytes_transfered - 4;	/*remove CRC */
 
 	skb_put(skb, length);
-	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	skb->protocol = eth_type_trans(skb, dev);
+
+	/* The packet was not an IPV4 packet so a complemented checksum was
+	   calculated. The value is found in the Internet Checksum field. */
+	if (cqe->status & EHEA_CQE_BLIND_CKSUM) {
+		skb->ip_summed = CHECKSUM_COMPLETE;
+		skb->csum = csum_unfold(~cqe->inet_checksum_value);
+	} else
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
 }
 
 static inline struct sk_buff *get_skb_by_index(struct sk_buff **skb_array,
diff --git a/drivers/net/ehea/ehea_qmr.h b/drivers/net/ehea/ehea_qmr.h
index f608a6c..3810473 100644
--- a/drivers/net/ehea/ehea_qmr.h
+++ b/drivers/net/ehea/ehea_qmr.h
@@ -150,6 +150,7 @@ struct ehea_rwqe {
 #define EHEA_CQE_TYPE_RQ           0x60
 #define EHEA_CQE_STAT_ERR_MASK     0x700F
 #define EHEA_CQE_STAT_FAT_ERR_MASK 0xF
+#define EHEA_CQE_BLIND_CKSUM       0x8000
 #define EHEA_CQE_STAT_ERR_TCP      0x4000
 #define EHEA_CQE_STAT_ERR_IP       0x2000
 #define EHEA_CQE_STAT_ERR_CRC      0x1000
-- 
1.6.5


^ permalink raw reply related

* [PATCH] ehea: simplify conditional
From: Nicolas Kaiser @ 2010-10-07 23:14 UTC (permalink / raw)
  To: Breno Leitao; +Cc: netdev, linux-kernel

Simplify: ((a && b) || (!a && !b)) => (a == b)

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
---
 drivers/net/ehea/ehea_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 190fb69..ef305f3 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1914,7 +1914,7 @@ static void ehea_promiscuous(struct net_device *dev, int enable)
 	struct hcp_ehea_port_cb7 *cb7;
 	u64 hret;
 
-	if ((enable && port->promisc) || (!enable && !port->promisc))
+	if (enable == port->promisc)
 		return;
 
 	cb7 = (void *)get_zeroed_page(GFP_ATOMIC);
-- 
1.7.2.2

^ permalink raw reply related

* Re: [PATCH] ehea: simplify conditional
From: Breno Leitao @ 2010-10-07 23:49 UTC (permalink / raw)
  To: Nicolas Kaiser; +Cc: netdev, linux-kernel
In-Reply-To: <20101008011450.0126ffc0@absol.kitzblitz>

On 10/07/2010 08:14 PM, Nicolas Kaiser wrote:
> Simplify: ((a&&  b) || (!a&&  !b)) =>  (a == b)
>
> Signed-off-by: Nicolas Kaiser<nikai@nikai.net>
Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com>

^ permalink raw reply

* [PATCH 2/2 net-next] bnx2: Enable AER on PCIE devices only
From: Michael Chan @ 2010-10-07 23:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1286494973-5115-1-git-send-email-mchan@broadcom.com>

To prevent unnecessary error message.  pci_save_state() is also moved to
the end of ->probe() so that all PCI config, including AER state, will be
saved.

Update version to 2.0.18.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   36 ++++++++++++++++++++++--------------
 1 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 56f3dfe..ae894bc 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -59,8 +59,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.17"
-#define DRV_MODULE_RELDATE	"July 18, 2010"
+#define DRV_MODULE_VERSION	"2.0.18"
+#define DRV_MODULE_RELDATE	"Oct 7, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-6.0.15.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-6.0.15.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-6.0.17.fw"
@@ -7915,16 +7915,7 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 		goto err_out_disable;
 	}
 
-	/* AER (Advanced Error Reporting) hooks */
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-				    "0x%x\n", err);
-		/* non-fatal, continue */
-	}
-
 	pci_set_master(pdev);
-	pci_save_state(pdev);
 
 	bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
 	if (bp->pm_cap == 0) {
@@ -7979,6 +7970,15 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 		bp->flags |= BNX2_FLAG_PCIE;
 		if (CHIP_REV(bp) == CHIP_REV_Ax)
 			bp->flags |= BNX2_FLAG_JUMBO_BROKEN;
+
+		/* AER (Advanced Error Reporting) hooks */
+		err = pci_enable_pcie_error_reporting(pdev);
+		if (err) {
+			dev_err(&pdev->dev, "pci_enable_pcie_error_reporting "
+					    "failed 0x%x\n", err);
+			/* non-fatal, continue */
+		}
+
 	} else {
 		bp->pcix_cap = pci_find_capability(pdev, PCI_CAP_ID_PCIX);
 		if (bp->pcix_cap == 0) {
@@ -8235,16 +8235,20 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 	bp->timer.data = (unsigned long) bp;
 	bp->timer.function = bnx2_timer;
 
+	pci_save_state(pdev);
+
 	return 0;
 
 err_out_unmap:
+	if (bp->flags & BNX2_FLAG_PCIE)
+		pci_disable_pcie_error_reporting(pdev);
+
 	if (bp->regview) {
 		iounmap(bp->regview);
 		bp->regview = NULL;
 	}
 
 err_out_release:
-	pci_disable_pcie_error_reporting(pdev);
 	pci_release_regions(pdev);
 
 err_out_disable:
@@ -8434,9 +8438,10 @@ bnx2_remove_one(struct pci_dev *pdev)
 
 	kfree(bp->temp_stats_blk);
 
-	free_netdev(dev);
+	if (bp->flags & BNX2_FLAG_PCIE)
+		pci_disable_pcie_error_reporting(pdev);
 
-	pci_disable_pcie_error_reporting(pdev);
+	free_netdev(dev);
 
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
@@ -8550,6 +8555,9 @@ static pci_ers_result_t bnx2_io_slot_reset(struct pci_dev *pdev)
 	}
 	rtnl_unlock();
 
+	if (!(bp->flags & BNX2_FLAG_PCIE))
+		return result;
+
 	err = pci_cleanup_aer_uncorrect_error_status(pdev);
 	if (err) {
 		dev_err(&pdev->dev,
-- 
1.6.4.GIT



^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox