Netdev List
 help / color / mirror / Atom feed
* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-08  7:31 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, netdev, Herbert Xu
In-Reply-To: <20101106142817.GA27212@verge.net.au>

[ CCing Herbet Xu ]

On Sat, Nov 06, 2010 at 11:28:21PM +0900, Simon Horman wrote:
> On Sat, Nov 06, 2010 at 04:18:21PM +0200, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Sat, 6 Nov 2010, Simon Horman wrote:
> > 
> > >This is a first attempt at allowing LVS to transmit
> > >skbs of greater than MTU length that have been aggregated by GRO.
> > >
> > >I have lightly tested the ip_vs_dr_xmit() portion of this patch and
> > >although it seems to work I am unsure that netif_needs_gso() is the correct
> > >test to use.
> > 
> > 	ip_forward() uses !skb_is_gso(skb), so may be it is
> > enough to check for GRO instead of using netif_needs_gso?
> 
> Thanks, I'll look into that.

Hi Julian,

just to clarify, you think that !skb_is_gso(skb) should be
used in ip_vs_xmit.c? If so, yes I think that makes sense
and I'll re-spin my patch accordingly.

^ permalink raw reply

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Eric Dumazet @ 2010-11-08  7:33 UTC (permalink / raw)
  To: David Miller
  Cc: andi, drosenberg, chas3, tytso, torvalds, kuznet, pekkas, jmorris,
	yoshfuji, kaber, remi.denis-courmont, netdev, security
In-Reply-To: <20101107.180108.71121019.davem@davemloft.net>

Le dimanche 07 novembre 2010 à 18:01 -0800, David Miller a écrit :
> From: Andi Kleen <andi@firstfloor.org>
> Date: Mon, 8 Nov 2010 00:56:10 +0100
> 
> > I would just remove the pointers from /proc and supply 
> > gdb macros that extract the equivalent information from /proc/kcore.
> > This is a bit racy, but for debugging it should be no
> > problem to run them multiple times as needed.
> 
> I do not think at all that this is tenable for the kind of
> things people use the socket pointers for when debugging
> problems.
> 
> I defeinitely prefer the inode number to this idea.

We currently have no guarantee of sockets inode numbers unicity.
I admit chances of clash are low.

When a printk() happens right before a BUG(), how are we going to check
the dumped registers are possibly close the socket involved, if we dont
have access to the machine, and only the crashlog ?

BTW, any local user can look at "dmesg", and crash reports. These
reports are even published on a remote site (bugzilla) so that hostile
hackers can be feeded.

I am OK to delete socket pointers from /proc files for non root users
(after checking things like lsof continue to work correctly).
I dont remember using them while doing debugging stuff.

BTW, rtnetlink also expose socket pointers to non root users :

$ ss -e dst 192.168.20.108
State      Recv-Q Send-Q    Local Address:Port    Peer Address:Port   
ESTAB      0      0         10.150.51.210:46979   192.168.20.108:ssh 
timer:(keepalive,119min,0) ino:136919 sk:ffff88002129d7c0


Mixing in same patch /proc pointers removal and printk() pointers
removal seems wrong to me. Very different problems.




^ permalink raw reply

* RE: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
From: Xin, Xiaohui @ 2010-11-08  7:43 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	herbert@gondor.apana.org.au, jdike@linux.intel.com
In-Reply-To: <20101029.132836.115944599.davem@davemloft.net>

I have addressed this issue in v14 patch set.

Thanks
Xiaohui

>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Saturday, October 30, 2010 4:29 AM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
>
>From: "Xin, Xiaohui" <xiaohui.xin@intel.com>
>Date: Wed, 27 Oct 2010 09:33:12 +0800
>
>> Somehow, it seems not a trivial work to support it now. Can we support it
>> later and as a todo with our current work?
>
>I would prefer the feature work properly, rather than only in specific
>cases, before being integated.

^ permalink raw reply

* Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: xiaohui.xin @ 2010-11-08  8:03 UTC (permalink / raw)
  To: eric.dumazet, netdev, kvm, linux-kernel, mst, mingo, davem,
	herbert, jdi
  Cc: Xin Xiaohui
In-Reply-To: <1288861663.2659.47.camel@edumazet-laptop>

From: Xin Xiaohui <xiaohui.xin@intel.com>

>> Hmm, I suggest you read the comment two lines above.
>>
>> If destructor_arg is now cleared each time we allocate a new skb, then,
>> please move it before dataref in shinfo structure, so that the following
>> memset() does the job efficiently...
>
>
>Something like :
>
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index e6ba898..2dca504 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -195,6 +195,9 @@ struct skb_shared_info {
> 	__be32          ip6_frag_id;
> 	__u8		tx_flags;
> 	struct sk_buff	*frag_list;
>+	/* Intermediate layers must ensure that destructor_arg
>+	 * remains valid until skb destructor */
>+	void		*destructor_arg;
> 	struct skb_shared_hwtstamps hwtstamps;
>
> 	/*
>@@ -202,9 +205,6 @@ struct skb_shared_info {
> 	 */
> 	atomic_t	dataref;
>
>-	/* Intermediate layers must ensure that destructor_arg
>-	 * remains valid until skb destructor */
>-	void *		destructor_arg;
> 	/* must be last field, see pskb_expand_head() */
> 	skb_frag_t	frags[MAX_SKB_FRAGS];
> };
>
>

Will that affect the cache line?
Or, we can move the line to clear destructor_arg to the end of __alloc_skb().
It looks like as the following, which one do you prefer?

Thanks
Xiaohui

---
 net/core/skbuff.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c83b421..df852f2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -224,6 +224,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 		child->fclone = SKB_FCLONE_UNAVAILABLE;
 	}
+	shinfo->destructor_arg = NULL;
 out:
 	return skb;
 nodata:
@@ -343,6 +344,13 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_has_frags(skb))
 			skb_drop_fraglist(skb);
 
+		if (skb->dev && dev_is_mpassthru(skb->dev)) {
+			struct skb_ext_page *ext_page =
+				skb_shinfo(skb)->destructor_arg;
+			if (ext_page && ext_page->dtor)
+				ext_page->dtor(ext_page);
+		}
+
 		kfree(skb->head);
 	}
 }
-- 
1.7.3

^ permalink raw reply related

* Re: [PATCH 0/9] Fix leaking of kernel heap addresses in net/
From: Rémi Denis-Courmont @ 2010-11-08  8:04 UTC (permalink / raw)
  To: ext Dan Rosenberg
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, netdev@vger.kernel.org, security@kernel.org,
	stable@kernel.org
In-Reply-To: <1289147492.3090.137.camel@Dan>

On Sunday 07 November 2010 18:31:32 ext Dan Rosenberg, you wrote:
> This patch series resolves the leakage of kernel heap addresses to
> userspace via network protocol /proc interfaces and public error
> messages.  Revealing this information is a bad idea from a security
> perspective for a number of reasons, the most obvious of which is it
> provides unprivileged users a mechanism by which to create a structure
> in the kernel heap containing function pointers, obtain the address of
> that structure, and overwrite those function pointers by leveraging
> other vulnerabilities.  It is my hope that by eliminating this
> information leakage, in conjunction with making statically-declared
> function pointer tables read-only (to be done in a separate patch
> series), we can at least add a small hurdle for the exploitation of a
> subset of kernel vulnerabilities.

Seems like this patch series is incomplete to me as far as /proc/net is 
concerned.

-- 
Rémi Denis-Courmont
Nokia Devices R&D, Maemo Software, Helsinki

^ permalink raw reply

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
From: Stefan Richter @ 2010-11-08  8:12 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux1394-devel, netdev
In-Reply-To: <1289180517.4318.3.camel@maxim-laptop>

Maxim Levitsky wrote:
> But why the timeout is  never set?

It is set to the default 0.1s per IEEE 1394 in core-card.c::fw_card_initialize.

If card->split_timeout_jiffies or card->split_timeout_cycles /ever/ become
zero, then only due to a memory corrupting bug.

> Also, note that I see here that if I send a TCP stream from one system
> to another then the system that recieves the packets (and sends TCP
> acks), still overflows the queue (error 10, and confirmed by printks).

OK, I'll send a stricter version of "firewire: net: throttle TX queue before
running out of tlabels".
-- 
Stefan Richter
-=====-==-=- =-== -=---
http://arcgraph.de/sr/

^ permalink raw reply

* Re: Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: Eric Dumazet @ 2010-11-08  8:24 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike
In-Reply-To: <1289203430-5935-1-git-send-email-xiaohui.xin@intel.com>

Le lundi 08 novembre 2010 à 16:03 +0800, xiaohui.xin@intel.com a écrit :
> From: Xin Xiaohui <xiaohui.xin@intel.com>
> 
> >> Hmm, I suggest you read the comment two lines above.
> >>
> >> If destructor_arg is now cleared each time we allocate a new skb, then,
> >> please move it before dataref in shinfo structure, so that the following
> >> memset() does the job efficiently...
> >
> >
> >Something like :
> >
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >index e6ba898..2dca504 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -195,6 +195,9 @@ struct skb_shared_info {
> > 	__be32          ip6_frag_id;
> > 	__u8		tx_flags;
> > 	struct sk_buff	*frag_list;
> >+	/* Intermediate layers must ensure that destructor_arg
> >+	 * remains valid until skb destructor */
> >+	void		*destructor_arg;
> > 	struct skb_shared_hwtstamps hwtstamps;
> >
> > 	/*
> >@@ -202,9 +205,6 @@ struct skb_shared_info {
> > 	 */
> > 	atomic_t	dataref;
> >
> >-	/* Intermediate layers must ensure that destructor_arg
> >-	 * remains valid until skb destructor */
> >-	void *		destructor_arg;
> > 	/* must be last field, see pskb_expand_head() */
> > 	skb_frag_t	frags[MAX_SKB_FRAGS];
> > };
> >
> >
> 
> Will that affect the cache line?

What do you mean ?

> Or, we can move the line to clear destructor_arg to the end of __alloc_skb().
> It looks like as the following, which one do you prefer?
> 
> Thanks
> Xiaohui
> 
> ---
>  net/core/skbuff.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c83b421..df852f2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -224,6 +224,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>  	}
> +	shinfo->destructor_arg = NULL;
>  out:
>  	return skb;
>  nodata:

I dont understand why you want to do this.

This adds an instruction, makes code bigger, and no obvious gain for me,
at memory transactions side.

If integrated in the existing memset(), cost is an extra iteration to
perform the clear of this field.



^ permalink raw reply

* RE: Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: Xin, Xiaohui @ 2010-11-08  8:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <1289204686.2478.375.camel@edumazet-laptop>

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Monday, November 08, 2010 4:25 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
>
>Le lundi 08 novembre 2010 à 16:03 +0800, xiaohui.xin@intel.com a écrit :
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>>
>> >> Hmm, I suggest you read the comment two lines above.
>> >>
>> >> If destructor_arg is now cleared each time we allocate a new skb, then,
>> >> please move it before dataref in shinfo structure, so that the following
>> >> memset() does the job efficiently...
>> >
>> >
>> >Something like :
>> >
>> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> >index e6ba898..2dca504 100644
>> >--- a/include/linux/skbuff.h
>> >+++ b/include/linux/skbuff.h
>> >@@ -195,6 +195,9 @@ struct skb_shared_info {
>> > 	__be32          ip6_frag_id;
>> > 	__u8		tx_flags;
>> > 	struct sk_buff	*frag_list;
>> >+	/* Intermediate layers must ensure that destructor_arg
>> >+	 * remains valid until skb destructor */
>> >+	void		*destructor_arg;
>> > 	struct skb_shared_hwtstamps hwtstamps;
>> >
>> > 	/*
>> >@@ -202,9 +205,6 @@ struct skb_shared_info {
>> > 	 */
>> > 	atomic_t	dataref;
>> >
>> >-	/* Intermediate layers must ensure that destructor_arg
>> >-	 * remains valid until skb destructor */
>> >-	void *		destructor_arg;
>> > 	/* must be last field, see pskb_expand_head() */
>> > 	skb_frag_t	frags[MAX_SKB_FRAGS];
>> > };
>> >
>> >
>>
>> Will that affect the cache line?
>
>What do you mean ?
>
>> Or, we can move the line to clear destructor_arg to the end of __alloc_skb().
>> It looks like as the following, which one do you prefer?
>>
>> Thanks
>> Xiaohui
>>
>> ---
>>  net/core/skbuff.c |    8 ++++++++
>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index c83b421..df852f2 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -224,6 +224,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>>
>>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>>  	}
>> +	shinfo->destructor_arg = NULL;
>>  out:
>>  	return skb;
>>  nodata:
>
>I dont understand why you want to do this.
>
>This adds an instruction, makes code bigger, and no obvious gain for me,
>at memory transactions side.
>
>If integrated in the existing memset(), cost is an extra iteration to
>perform the clear of this field.
>
Ok. Thanks for this explanation and will update with your solution.

Thanks
Xiaohui



^ permalink raw reply

* Re: [PATCH 1/1] UDEV - Add 'udevlom' command line param to start_udev
From: Sujit K M @ 2010-11-08  8:42 UTC (permalink / raw)
  To: Matt Domsch
  Cc: Greg KH, K, Narendra, linux-hotplug@vger.kernel.org,
	netdev@vger.kernel.org, Hargrave, Jordan, Rose, Charles
In-Reply-To: <20101105025848.GA14021@pws490.domsch.com>

> At Linux Plumbers Conference today, this problem space was discussed
> once again, and I believe concensus on approach was reached.  Here
> goes:

Was the patch a starting point for the discussion.

> * If a 70-persistent-net.rules file sets a name, honor that.  This
>  preserves existing installs.
>
> * If BIOS provides indexes for onboard devices, honor that.
> ** Rename onboard NICs "lom[1-N]" as BIOS reports (# matches chassis labels)
> ** No rename for all others "ethX" (no change for NICs in PCI slots/USB/others)
>
> * If neither are true, do not rename at all.

I would like to know what is the difference in the nomenclature for this.

>
> * Implementation will be:
> ** Udev rules to be included in upstream udev will read the index
>   value from sysfs (provided by SMBIOS 2.6 info on kernels >= 2.6.36,
>   PCI DSM info at some future point) if present, and rename LOMs
>   based on that index value.  Distros will use these rules by default
>   (Ubuntu and Fedora maintainers on board with the concept; I have
>   not spoken with other distros yet.)
> ** Legacy distros with older udev rules will invoke biosdevname on
>   kernels < 2.6.36 to get the same information, if present, and will
>   rename LOMs based on index value.

How will you manage these scenarios.
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] pktgen: correct uninitialized queue_map
From: Junchang Wang @ 2010-11-08  9:19 UTC (permalink / raw)
  To: davem, robert.olsson, eric.dumazet, joe, andy.shevchenko, backyes; +Cc: netdev


This fix a bug reported by backyes.
Right the first time pktgen's using queue_map that's not been initialized
by set_cur_queue_map(pkt_dev);

Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Backyes <backyes@mail.ustc.edu.cn>
---

 net/core/pktgen.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2c0df0f..564d9ba 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2611,8 +2611,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
+	queue_map = pkt_dev->cur_queue_map;
 
 	datalen = (odev->hard_header_len + 16) & ~0xf;
 
@@ -2975,8 +2975,8 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
+	queue_map = pkt_dev->cur_queue_map;
 
 	skb = __netdev_alloc_skb(odev,
 				 pkt_dev->cur_pkt_size + 64
--

--Junchang

^ permalink raw reply related

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Julian Anastasov @ 2010-11-08  9:36 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev, Herbert Xu
In-Reply-To: <20101108073149.GA31384@verge.net.au>


 	Hello,

On Mon, 8 Nov 2010, Simon Horman wrote:

> [ CCing Herbet Xu ]
>
>>>> This is a first attempt at allowing LVS to transmit
>>>> skbs of greater than MTU length that have been aggregated by GRO.
>>>>
>>>> I have lightly tested the ip_vs_dr_xmit() portion of this patch and
>>>> although it seems to work I am unsure that netif_needs_gso() is the correct
>>>> test to use.
>>>
>>> 	ip_forward() uses !skb_is_gso(skb), so may be it is
>>> enough to check for GRO instead of using netif_needs_gso?
>>
>> Thanks, I'll look into that.
>
> Hi Julian,
>
> just to clarify, you think that !skb_is_gso(skb) should be
> used in ip_vs_xmit.c? If so, yes I think that makes sense
> and I'll re-spin my patch accordingly.

 	Yes, I think we should check for !skb_is_gso(skb)
as it looks as correct check to avoid FRAG_NEEDED after GRO
but lets wait for confirmation from Herbert Xu.
Also, !skb->local_df check should help for local IPVS clients
that set local_df.

 	If you prefer you can create such helper in ip_vs_xmit.c:

/* Check if packet exceeds MTU */
static inline int ip_vs_mtu_exceeded(struct sk_buff *skb, unsigned int mtu)
{
 	return skb->len > mtu && !skb_is_gso(skb) && !skb->local_df;
}

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Andi Kleen @ 2010-11-08  9:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, andi, drosenberg, chas3, tytso, torvalds, kuznet,
	pekkas, jmorris, yoshfuji, kaber, remi.denis-courmont, netdev,
	security
In-Reply-To: <1289201612.2478.371.camel@edumazet-laptop>

> When a printk() happens right before a BUG(), how are we going to check
> the dumped registers are possibly close the socket involved, if we dont
> have access to the machine, and only the crashlog ?

Is that really something you do regularly? It seems highly obscure
to me.

Besides if the kernel has timestamps enabled you can easily
guess based on the timestamps if the printk and the oops
are related.

-Andi

^ permalink raw reply

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Eric Dumazet @ 2010-11-08 10:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Miller, drosenberg, chas3, tytso, torvalds, kuznet, pekkas,
	jmorris, yoshfuji, kaber, remi.denis-courmont, netdev, security
In-Reply-To: <20101108094358.GA22069@basil.fritz.box>

Le lundi 08 novembre 2010 à 10:43 +0100, Andi Kleen a écrit :
> > When a printk() happens right before a BUG(), how are we going to check
> > the dumped registers are possibly close the socket involved, if we dont
> > have access to the machine, and only the crashlog ?
> 
> Is that really something you do regularly? It seems highly obscure
> to me.

Yes, very regularly, I can find bugs thanks to every bit of information
found in kernel logs, including code around the fault.

If people now say : "I have a kernel bug, but am not able to provide you
a kernel stack trace and previous printk() messages because of security.
You cannot have an access to this machine, and the bug happens once in a
while. Kernel version is also hidden. Please help me."

Oh well, thats a challenge, maybe use this cristal ball I have somewhere
in the attic ;)




^ permalink raw reply

* [PATCH] ucc_geth: Fix hung tasks.
From: Joakim Tjernlund @ 2010-11-08 10:23 UTC (permalink / raw)
  To: linuxppc-dev, netdev, Anton Vorontsov; +Cc: Joakim Tjernlund

We noticed a few hangs like this:

INFO: task ifconfig:572 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ifconfig      D 0ff65760     0   572    369 0x00000000
Call Trace:
[c6157be0] [c6008460] 0xc6008460 (unreliable)
[c6157ca0] [c0008608] __switch_to+0x4c/0x6c
[c6157cb0] [c028fecc] schedule+0x184/0x310
[c6157ce0] [c0290e54] __mutex_lock_slowpath+0xa4/0x150
[c6157d20] [c0290c48] mutex_lock+0x44/0x48
[c6157d30] [c01aba74] phy_stop+0x20/0x70
[c6157d40] [c01aef40] ucc_geth_stop+0x30/0x98
[c6157d60] [c01b18fc] ucc_geth_close+0x9c/0xdc
[c6157d80] [c01db0cc] __dev_close+0xa0/0xd0
[c6157d90] [c01deddc] __dev_change_flags+0x8c/0x148
[c6157db0] [c01def54] dev_change_flags+0x1c/0x64
[c6157dd0] [c0237ac8] devinet_ioctl+0x678/0x784
[c6157e50] [c0239a58] inet_ioctl+0xb0/0xbc
[c6157e60] [c01cafa8] sock_ioctl+0x174/0x2a0
[c6157e80] [c009a16c] vfs_ioctl+0xcc/0xe0
[c6157ea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
[c6157f10] [c009b0b0] sys_ioctl+0x40/0x74
[c6157f40] [c00117c4] ret_from_syscall+0x0/0x38

I THINK this is due to a missing cancel_work_sync in the driver
although we cannot be sure. I found this by comparing
ucc_geth with gianfar.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 drivers/net/ucc_geth.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 97f9f7d..6647ed7 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3556,6 +3556,7 @@ static int ucc_geth_close(struct net_device *dev)
 
 	napi_disable(&ugeth->napi);
 
+	cancel_work_sync(&ugeth->timeout_work);
 	ucc_geth_stop(ugeth);
 
 	free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
-- 
1.7.2.2


^ permalink raw reply related

* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Eric Dumazet @ 2010-11-08 11:02 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, linux-kernel, Tejun Heo
In-Reply-To: <4CD58B9C.2030006@candelatech.com>

Le samedi 06 novembre 2010 à 10:08 -0700, Ben Greear a écrit :

> At least I don't see any percpu dumps in dmesg.  I vaguely remember
> someone posting some ipv6 address scalability patches some time back.
> I think they had to hack on /proc fs as well.  I'll see if I can
> dig those up.
> 
> > Make sure udev / hotplug is not the problem, if you create your devices
> > very fast.
> 
> We can create the macvlans w/out problem, though I'm sure that could
> be sped up.  The problem is when we try to add IPv6 addresses to
> them.

I see. Did you check /proc/sys/net/ipv6/ tunables ?

For example, I bet you need to make route/max_size a bigger value than
default (4096)

Following is working for me

echo 16384 >/proc/sys/net/ipv6/route/max_size
modprobe dummy numdummies=2000
for a in `seq 1 1999`
do
 ip -6 add add 4444::444:$a/24 dev dummy$a
done

ip -6 ro | wc -l
6008




^ permalink raw reply

* Re: [PATCH v4 0/2] Get and Set Feature Reports on HIDRAW (USB and Bluetooth)
From: Antonio Ospite @ 2010-11-08 11:17 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Ott, Marcel Holtmann, David S. Miller, Stefan Achatz,
	Alexey Dobriyan, Tejun Heo, Alan Stern, Greg Kroah-Hartman,
	Stephane Chatty, Michael Poole, Bastien Nocera, Eric Dumazet,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <alpine.LNX.2.00.1011011523150.15851-ztGlSCb7Y1iN3ZZ/Hiejyg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1251 bytes --]

On Mon, 1 Nov 2010 15:23:34 -0400 (EDT)
Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org> wrote:

> On Wed, 22 Sep 2010, Jiri Kosina wrote:
> 
> > > > > This is version 4. Built against 2.6.35+ revision 320b2b8de12698 .
> > > > > 
> > > > > Alan Ott (2):
> > > > >   HID: Add Support for Setting and Getting Feature Reports from hidraw
> > > > >   Bluetooth: hidp: Add support for hidraw  HIDIOCGFEATURE  and
> > > > >     HIDIOCSFEATURE
[...]
> > > ... Marcel?
> > > 
> > > I'd really like not to miss 2.6.37 merge window with this.
> > 
> > Seemingly I have not enought powers to get statement from Marcel here 
> > these days/weeks.
> > 
> > Davem, would you perhaps be able to step in here?
> 
> Marcel, any word on this patchset by chance?
> 

Hopefully Alan will manage to send a v5 sometime soon, Alan I don't want
to pressure you, just remember to CC Gustavo F. Padovan (see
MAINTAINERS) as he looks to be the most active bluetooth maintainer
these days.

Thanks,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-08 11:45 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, netdev, Herbert Xu
In-Reply-To: <alpine.LFD.2.00.1011081104310.1687@ja.ssi.bg>

On Mon, Nov 08, 2010 at 11:36:21AM +0200, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 8 Nov 2010, Simon Horman wrote:
> 
> >[ CCing Herbet Xu ]
> >
> >>>>This is a first attempt at allowing LVS to transmit
> >>>>skbs of greater than MTU length that have been aggregated by GRO.
> >>>>
> >>>>I have lightly tested the ip_vs_dr_xmit() portion of this patch and
> >>>>although it seems to work I am unsure that netif_needs_gso() is the correct
> >>>>test to use.
> >>>
> >>>	ip_forward() uses !skb_is_gso(skb), so may be it is
> >>>enough to check for GRO instead of using netif_needs_gso?
> >>
> >>Thanks, I'll look into that.
> >
> >Hi Julian,
> >
> >just to clarify, you think that !skb_is_gso(skb) should be
> >used in ip_vs_xmit.c? If so, yes I think that makes sense
> >and I'll re-spin my patch accordingly.
> 
> 	Yes, I think we should check for !skb_is_gso(skb)
> as it looks as correct check to avoid FRAG_NEEDED after GRO
> but lets wait for confirmation from Herbert Xu.

Yes, lets try and get a word from Herbert Xu on this.

> Also, !skb->local_df check should help for local IPVS clients
> that set local_df.
> 
> 	If you prefer you can create such helper in ip_vs_xmit.c:
> 
> /* Check if packet exceeds MTU */
> static inline int ip_vs_mtu_exceeded(struct sk_buff *skb, unsigned int mtu)
> {
> 	return skb->len > mtu && !skb_is_gso(skb) && !skb->local_df;
> }

Good thinking.


^ permalink raw reply

* [patch 0/2] s390: qeth bug fixes for 2.6.37 next rc
From: frank.blaschka @ 2010-11-08 13:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

here are some qeth bug fixes for 2.6.37 next rc (net-2.6)

shortlog:

Ursula Braun (1)
qeth: remove dev_queue_xmit invocation

Frank Blaschka (1)
qeth: fix race condition during device startup

Thanks,
        Frank


^ permalink raw reply

* [patch 1/2] [PATCH] qeth: remove dev_queue_xmit invocation
From: frank.blaschka @ 2010-11-08 13:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20101108130347.973708765@de.ibm.com>

[-- Attachment #1: 602-qeth-dev-queue-xmit.diff --]
[-- Type: text/plain, Size: 5644 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

For a certain Hipersockets specific error code in the xmit path, the
qeth driver tries to invoke dev_queue_xmit again.
Commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 introduces a busylock
causing locking problems in case of re-invoked dev_queue_xmit by qeth.
This patch removes the attempts to retry packet sending with
dev_queue_xmit from the qeth driver.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---

 drivers/s390/net/qeth_core.h      |    9 ------
 drivers/s390/net/qeth_core_main.c |   53 +++++---------------------------------
 2 files changed, 8 insertions(+), 54 deletions(-)

diff -urpN linux-2.6/drivers/s390/net/qeth_core.h linux-2.6-patched/drivers/s390/net/qeth_core.h
--- linux-2.6/drivers/s390/net/qeth_core.h	2010-11-08 09:40:59.000000000 +0100
+++ linux-2.6-patched/drivers/s390/net/qeth_core.h	2010-11-08 09:41:12.000000000 +0100
@@ -440,7 +440,6 @@ struct qeth_qdio_out_q {
 	 * index of buffer to be filled by driver; state EMPTY or PACKING
 	 */
 	int next_buf_to_fill;
-	int sync_iqdio_error;
 	/*
 	 * number of buffers that are currently filled (PRIMED)
 	 * -> these buffers are hardware-owned
@@ -695,14 +694,6 @@ struct qeth_mc_mac {
 	int is_vmac;
 };
 
-struct qeth_skb_data {
-	__u32 magic;
-	int count;
-};
-
-#define QETH_SKB_MAGIC 0x71657468
-#define QETH_SIGA_CC2_RETRIES 3
-
 struct qeth_rx {
 	int b_count;
 	int b_index;
diff -urpN linux-2.6/drivers/s390/net/qeth_core_main.c linux-2.6-patched/drivers/s390/net/qeth_core_main.c
--- linux-2.6/drivers/s390/net/qeth_core_main.c	2010-11-08 09:40:59.000000000 +0100
+++ linux-2.6-patched/drivers/s390/net/qeth_core_main.c	2010-11-08 09:41:12.000000000 +0100
@@ -877,8 +877,8 @@ out:
 	return;
 }
 
-static void __qeth_clear_output_buffer(struct qeth_qdio_out_q *queue,
-		 struct qeth_qdio_out_buffer *buf, unsigned int qeth_skip_skb)
+static void qeth_clear_output_buffer(struct qeth_qdio_out_q *queue,
+		struct qeth_qdio_out_buffer *buf)
 {
 	int i;
 	struct sk_buff *skb;
@@ -887,13 +887,11 @@ static void __qeth_clear_output_buffer(s
 	if (buf->buffer->element[0].flags & 0x40)
 		atomic_dec(&queue->set_pci_flags_count);
 
-	if (!qeth_skip_skb) {
+	skb = skb_dequeue(&buf->skb_list);
+	while (skb) {
+		atomic_dec(&skb->users);
+		dev_kfree_skb_any(skb);
 		skb = skb_dequeue(&buf->skb_list);
-		while (skb) {
-			atomic_dec(&skb->users);
-			dev_kfree_skb_any(skb);
-			skb = skb_dequeue(&buf->skb_list);
-		}
 	}
 	for (i = 0; i < QETH_MAX_BUFFER_ELEMENTS(queue->card); ++i) {
 		if (buf->buffer->element[i].addr && buf->is_header[i])
@@ -909,12 +907,6 @@ static void __qeth_clear_output_buffer(s
 	atomic_set(&buf->state, QETH_QDIO_BUF_EMPTY);
 }
 
-static void qeth_clear_output_buffer(struct qeth_qdio_out_q *queue,
-		struct qeth_qdio_out_buffer *buf)
-{
-	__qeth_clear_output_buffer(queue, buf, 0);
-}
-
 void qeth_clear_qdio_buffers(struct qeth_card *card)
 {
 	int i, j;
@@ -2833,7 +2825,6 @@ static void qeth_flush_buffers(struct qe
 		}
 	}
 
-	queue->sync_iqdio_error = 0;
 	queue->card->dev->trans_start = jiffies;
 	if (queue->card->options.performance_stats) {
 		queue->card->perf_stats.outbound_do_qdio_cnt++;
@@ -2849,10 +2840,6 @@ static void qeth_flush_buffers(struct qe
 		queue->card->perf_stats.outbound_do_qdio_time +=
 			qeth_get_micros() -
 			queue->card->perf_stats.outbound_do_qdio_start_time;
-	if (rc > 0) {
-		if (!(rc & QDIO_ERROR_SIGA_BUSY))
-			queue->sync_iqdio_error = rc & 3;
-	}
 	if (rc) {
 		queue->card->stats.tx_errors += count;
 		/* ignore temporary SIGA errors without busy condition */
@@ -2940,7 +2927,6 @@ void qeth_qdio_output_handler(struct ccw
 	struct qeth_qdio_out_q *queue = card->qdio.out_qs[__queue];
 	struct qeth_qdio_out_buffer *buffer;
 	int i;
-	unsigned qeth_send_err;
 
 	QETH_CARD_TEXT(card, 6, "qdouhdl");
 	if (qdio_error & QDIO_ERROR_ACTIVATE_CHECK_CONDITION) {
@@ -2956,9 +2942,8 @@ void qeth_qdio_output_handler(struct ccw
 	}
 	for (i = first_element; i < (first_element + count); ++i) {
 		buffer = &queue->bufs[i % QDIO_MAX_BUFFERS_PER_Q];
-		qeth_send_err = qeth_handle_send_error(card, buffer, qdio_error);
-		__qeth_clear_output_buffer(queue, buffer,
-			(qeth_send_err == QETH_SEND_ERROR_RETRY) ? 1 : 0);
+		qeth_handle_send_error(card, buffer, qdio_error);
+		qeth_clear_output_buffer(queue, buffer);
 	}
 	atomic_sub(count, &queue->used_buffers);
 	/* check if we need to do something on this outbound queue */
@@ -3183,10 +3168,7 @@ int qeth_do_send_packet_fast(struct qeth
 		int offset, int hd_len)
 {
 	struct qeth_qdio_out_buffer *buffer;
-	struct sk_buff *skb1;
-	struct qeth_skb_data *retry_ctrl;
 	int index;
-	int rc;
 
 	/* spin until we get the queue ... */
 	while (atomic_cmpxchg(&queue->state, QETH_OUT_Q_UNLOCKED,
@@ -3205,25 +3187,6 @@ int qeth_do_send_packet_fast(struct qeth
 	atomic_set(&queue->state, QETH_OUT_Q_UNLOCKED);
 	qeth_fill_buffer(queue, buffer, skb, hdr, offset, hd_len);
 	qeth_flush_buffers(queue, index, 1);
-	if (queue->sync_iqdio_error == 2) {
-		skb1 = skb_dequeue(&buffer->skb_list);
-		while (skb1) {
-			atomic_dec(&skb1->users);
-			skb1 = skb_dequeue(&buffer->skb_list);
-		}
-		retry_ctrl = (struct qeth_skb_data *) &skb->cb[16];
-		if (retry_ctrl->magic != QETH_SKB_MAGIC) {
-			retry_ctrl->magic = QETH_SKB_MAGIC;
-			retry_ctrl->count = 0;
-		}
-		if (retry_ctrl->count < QETH_SIGA_CC2_RETRIES) {
-			retry_ctrl->count++;
-			rc = dev_queue_xmit(skb);
-		} else {
-			dev_kfree_skb_any(skb);
-			QETH_CARD_TEXT(card, 2, "qrdrop");
-		}
-	}
 	return 0;
 out:
 	atomic_set(&queue->state, QETH_OUT_Q_UNLOCKED);


^ permalink raw reply

* [patch 2/2] [PATCH] qeth: fix race condition during device startup
From: frank.blaschka @ 2010-11-08 13:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390
In-Reply-To: <20101108130347.973708765@de.ibm.com>

[-- Attachment #1: 603-qeth-startup-race.diff --]
[-- Type: text/plain, Size: 878 bytes --]

From: Frank Blaschka <frank.blaschka@de.ibm.com>

QDIO is running independent from netdevice state. We are not
allowed to schedule NAPI in case the netdevice is not open.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---

 drivers/s390/net/qeth_core_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -urpN linux-2.6/drivers/s390/net/qeth_core_main.c linux-2.6-patched/drivers/s390/net/qeth_core_main.c
--- linux-2.6/drivers/s390/net/qeth_core_main.c	2010-11-08 09:41:13.000000000 +0100
+++ linux-2.6-patched/drivers/s390/net/qeth_core_main.c	2010-11-08 09:41:13.000000000 +0100
@@ -2903,7 +2903,7 @@ void qeth_qdio_start_poll(struct ccw_dev
 {
 	struct qeth_card *card = (struct qeth_card *)card_ptr;
 
-	if (card->dev)
+	if (card->dev && (card->dev->flags & IFF_UP))
 		napi_schedule(&card->napi);
 }
 EXPORT_SYMBOL_GPL(qeth_qdio_start_poll);


^ permalink raw reply

* Re: [PATCH 0/9] Fix leaking of kernel heap addresses in net/
From: Dan Rosenberg @ 2010-11-08 13:13 UTC (permalink / raw)
  To: Rémi Denis-Courmont
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, netdev@vger.kernel.org, security@kernel.org,
	stable@kernel.org
In-Reply-To: <201011081004.48382.remi.denis-courmont@nokia.com>


> Seems like this patch series is incomplete to me as far as /proc/net is 
> concerned.
> 

I'm aware that I missed SCTP, which I will fix in the next attempt.
Other than that (and sticking only to /proc leaks, since I realize there
are many printk examples), care to share what was missed?

-Dan


^ permalink raw reply

* Re: [PATCH 0/9] Fix leaking of kernel heap addresses in net/
From: Rémi Denis-Courmont @ 2010-11-08 13:36 UTC (permalink / raw)
  To: ext Dan Rosenberg
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, netdev@vger.kernel.org, security@kernel.org,
	stable@kernel.org
In-Reply-To: <1289221999.10229.28.camel@dan>

On Monday 08 November 2010 15:13:19 ext Dan Rosenberg, you wrote:
> > Seems like this patch series is incomplete to me as far as /proc/net is
> > concerned.
> 
> I'm aware that I missed SCTP, which I will fix in the next attempt.
> Other than that (and sticking only to /proc leaks, since I realize there
> are many printk examples), care to share what was missed?

At least Phonet seems to be missing too... which makes me wonder why I am in 
the Cc list.

-- 
Rémi Denis-Courmont
Nokia Devices R&D, Maemo Software, Helsinki

^ permalink raw reply

* Re: [PATCH 0/9] Fix leaking of kernel heap addresses in net/
From: Dan Rosenberg @ 2010-11-08 13:41 UTC (permalink / raw)
  To: Rémi Denis-Courmont
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, netdev@vger.kernel.org, security@kernel.org,
	stable@kernel.org
In-Reply-To: <201011081536.27650.remi.denis-courmont@nokia.com>


> 
> At least Phonet seems to be missing too... which makes me wonder why I am in 
> the Cc list.

I have a patch prepared for Phonet - it seems to have been lost in the
shuffle when I broke up the patch set.  I'll be sure to include it in
the next revision.  Sorry about that.

-Dan


^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-11-08 14:05 UTC (permalink / raw)
  To: Ben McKeegan
  Cc: Paul Mackerras, netdev, linux-ppp, Alan Cox,
	Alexander E. Patrakov, linux-kernel, gabriele.paoloni
In-Reply-To: <4C0670E2.1090708@netservers.co.uk>

On Wed, Jun 2, 2010 at 16:55, Ben McKeegan <ben@netservers.co.uk> wrote:

> Still working on this, updating the patch wasn't as trivial as I thought as
> it clashed with Gabriele Paoloni's ppp_mp_explode redesign.  However, while
> looking at this code I believe I have found a bug which might have been
> contributing to the poor performance the OP was experiencing.   For the case
> where channel speeds are unknown and there are more than 2 channels it would
> miscalculate the fragment sizes so they are not balanced on the channels.
>
> Patch for the bug to follow immediately.

Is there any update on this? It's been quite some time since you last updated
on this issue.


Thanks a lot,
Richard

^ permalink raw reply

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Herbert Xu @ 2010-11-08 14:19 UTC (permalink / raw)
  To: Simon Horman; +Cc: Julian Anastasov, lvs-devel, netdev
In-Reply-To: <20101108114542.GF14454@verge.net.au>

On Mon, Nov 08, 2010 at 08:45:42PM +0900, Simon Horman wrote:
>
> > 	Yes, I think we should check for !skb_is_gso(skb)
> > as it looks as correct check to avoid FRAG_NEEDED after GRO
> > but lets wait for confirmation from Herbert Xu.
> 
> Yes, lets try and get a word from Herbert Xu on this.

Yes skb_is_gso(skb) is the best for now.  It isn't 100% correct
as we should still be checking that the gso_size is less than
the MTU but we need to fix that elsewhere first.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox