Netdev List


* Re: [net-next-2.6 PATCH] ethtool: time to blink provided in seconds not jiffies
From: Ben Hutchings @ 2011-04-11 23:57 UTC (permalink / raw)
  To: Bruce Allan; +Cc: netdev
In-Reply-To: <20110411230159.6727.91380.stgit@gitlad.jf.intel.com>

On Mon, 2011-04-11 at 16:01 -0700, Bruce Allan wrote:
> When blinking for a duration set by the user, the value specified is in
> seconds but it is used as the number of jiffies in the timeout after which
> the Physical ID indicator is deactivated.  Fix by converting the timeout
> to jiffies.
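The bug is a units mismatch. With HZ=250, for example, a request to blink for 2 seconds would time out after 2 jiffies, about 8 ms. Below is a minimal userspace sketch of the conversion the fix needs. It assumes a fixed tick rate. MODEL_HZ and both helpers are hypothetical stand-ins. In the kernel, HZ is a build-time constant, and helpers such as msecs_to_jiffies() also exist.

```c
/* Hypothetical tick rate; the kernel's HZ is chosen at build time. */
#define MODEL_HZ 250UL

/*
 * Convert a blink duration given in seconds (as supplied by the user)
 * into a timeout in jiffies.
 */
static unsigned long blink_timeout_jiffies(unsigned int seconds)
{
	return (unsigned long)seconds * MODEL_HZ;
}

/* The buggy interpretation, for comparison: seconds used as jiffies. */
static unsigned long buggy_timeout_jiffies(unsigned int seconds)
{
	return seconds;
}
```

The sketch ignores any special meaning the core may give to a duration of 0.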

D'oh.

> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: [net-next-2.6 RFC PATCH] ethtool: allow custom interval for physical identification
From: Ben Hutchings @ 2011-04-12  0:00 UTC (permalink / raw)
  To: Allan, Bruce W; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <8DD2590731AB5D4C9DBF71A877482A90018A2A30A9@orsmsx509.amr.corp.intel.com>

On Mon, 2011-04-11 at 16:30 -0700, Allan, Bruce W wrote:
> >-----Original Message-----
> >From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> >Sent: Monday, April 11, 2011 4:26 PM
> >To: Allan, Bruce W
> >Cc: netdev@vger.kernel.org
> >Subject: Re: [net-next-2.6 RFC PATCH] ethtool: allow custom interval for
> >physical identification
> >
> >On Mon, 11 Apr 2011 16:16:35 -0700
> >Bruce Allan <bruce.w.allan@intel.com> wrote:
> >
> >> When physical identification of an adapter is done by toggling the
> >> mechanism on and off through software utilizing the .set_phys_id operation,
> >> it is done with a fixed duration for both on and off states.  Some drivers
> >> may want to set a custom duration for the on/off intervals.  This patch
> >> changes the API so the return code from the driver's entry point can
> >> specify the interval duration as a positive number; -EINVAL is still
> >> allowed in order to use the default single on/off interval per second.
> >>
> >> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> >
> >IMHO this is -EOVERKILL.
> 
> I realize it does seem like that, but we have OEMs that expect the LEDs to
> blink a certain way during a physical identification (twice a second vs.
> once a second per the original .set_phys_id patchset).  There may be other
> drivers from different hardware vendors that have similar but different
> requirements.

I noticed that some drivers did this.  Do you know if these OEMs expect
this of all hardware, or do they actually want different vendors'
hardware to blink in different ways?  If it's a common requirement to
blink at 2 Hz then let's use that frequency for all the drivers that
want to be called periodically.
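Here is a concrete version of the "call periodically at N Hz" idea. If a driver asks for N on/off cycles per second, each on or off phase lasts 1/(2N) s.

The sketch below makes two assumptions:
- A positive return value is read as such a frequency.
- -EINVAL selects the 1 Hz default, as the RFC describes.

phys_id_half_period_ms() is a hypothetical helper, not the ethtool core's code. It works in milliseconds to stay independent of HZ.

```c
#include <errno.h>

/* Default from the original .set_phys_id design: one on/off cycle per second. */
#define DEFAULT_BLINK_HZ 1

/*
 * Hypothetical helper: map the value a driver's .set_phys_id returns
 * to the length of each "on" (or "off") phase in milliseconds.
 *   rc > 0        -> rc on/off cycles per second
 *   rc == -EINVAL -> fall back to DEFAULT_BLINK_HZ
 *   rc == 0       -> 0: assumed to mean no periodic callbacks are needed
 *   other rc < 0  -> a real error, returned unchanged
 */
static int phys_id_half_period_ms(int rc)
{
	int hz, ms;

	if (rc == 0)
		return 0;
	if (rc < 0 && rc != -EINVAL)
		return rc;

	hz = (rc > 0) ? rc : DEFAULT_BLINK_HZ;

	/* One cycle is one "on" phase plus one "off" phase. */
	ms = 1000 / hz / 2;
	return ms > 0 ? ms : 1;  /* very high rates are clamped to 1 ms */
}
```

At 2 Hz each phase lasts 250 ms. At the 1 Hz default it lasts 500 ms.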

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [Bugme-new] [Bug 33102] New: File's copied from client->linux server only copy 1st 64K data; rest is lost
From: Steve French @ 2011-04-12  0:06 UTC (permalink / raw)
  To: Linda Walsh
  Cc: Andrew Morton, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r
In-Reply-To: <4DA3839B.1050102-gT3AUAsYRbTYtjvyW6yDsg@public.gmane.org>

On Mon, Apr 11, 2011 at 5:41 PM, Linda Walsh <lkml-gT3AUAsYRbTYtjvyW6yDsg@public.gmane.org> wrote:
>
> Andrew Morton wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Mon, 11 Apr 2011 22:12:41 GMT
>> bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org wrote:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=33102
>>>
>>>           Summary: File's copied from client->linux server only copy 1st
>>>                    64K data;rest is lost
>>>           Product: Networking
>>>           Version: 2.5
>>>    Kernel Version: 2.6.38.2
>>>          Platform: All
>>>        OS/Version: Linux
>>>              Tree: Mainline
>>>            Status: NEW
>>>          Severity: blocking
>>>          Priority: P1
>>>         Component: IPV4
>>>        AssignedTo: shemminger-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
>>>        ReportedBy: lkml-gT3AUAsYRbTYtjvyW6yDsg@public.gmane.org
>>>        Regression: Yes
>>>
>>
>> Seems to be a 2.6.37->2.6.38 regression.
>>
>> ----------
>
> Not exactly -- Please note -- I tried both 2.6.38(.0) and 2.6.38.1.
>
> They both work.

Any chance that we could get a wireshark trace of the failure?

https://wiki.samba.org/index.php/Capture_Packets

gives instructions. There may also be useful information about errors
returned by the network stack in the Samba log (often named smbd.log).



-- 
Thanks,

Steve

^ permalink raw reply

* Re: Race condition when creating multiple namespaces?
From: Eric W. Biederman @ 2011-04-12  0:27 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: netdev, Daniel Lezcano
In-Reply-To: <201104112301.46776.hans@schillstrom.com>

Hans Schillstrom <hans@schillstrom.com> writes:

> Hello
> I've been struggling with this for some time now.
>
> When creating multiple namespaces using lxc-start, uninitialized parts of the network namespace are used by the new process in the namespace,
> e.g. when using conntrack or ipvsadm too quickly (a "sleep 2" "solves" the problem).
> (From what I can see, lxc-start uses the clone() syscall, i.e. do_fork will be called later on.)
> Actually, I was debugging ip_vs when closing multiple namespaces when I ran into this one.
>
> I have a loop that creates 33 containers with lxc-start ... -- test.sh
> The first thing the new container does in test.sh is:
> #!/bin/bash
> iptables -t mangle -A PREROUTING -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
> nc -l -p1234
>
> This results in NULL ptr in ip_conntrack_net_init(struct *net)

Ouch!

> and in another test, test.sh looks like this:
> #!/bin/bash
> ipvsadm --start-daemon=master --mcast-interface=lo
> nc -l -p1234
>
> And this results in an uninitialized spinlock in ip_vs_sync.
>
> I put a printk in nsproxy: copy_namespaces() and could see dozens of them
> before anything appears from ipvs or conntrack.
>
> My feeling is that when you start up user processes in a new namespace,
> all kernel-related init should have been done (you should not need to add a sleep to get it working).
>
> All tests were made using today's net-next-2.6 (2.6.39-rc1).
>
> Note:
> Neither the conntrack nor the ip_vs modules were loaded;
> if the modules were loaded before creating new namespaces, it all works...
>
> Finally, the question:
> Should it really work to load modules that are part of netns
> from within a namespace?

From an implementation point of view, kernel modules are not in a
namespace, so there should be no difference between loading a kernel
networking module from inside a namespace and loading it from outside
one.

It does sound like you have hit a module loading race, and perhaps
a race that is confined to network namespaces.

My head is in another problem so I won't be able to look at this for
a bit.  But if you are getting into ip_conntrack_net_init with
a NULL network namespace something spectacularly bad is happening.

In particular, it looks like you must be hitting a bug in for_each_net,
which would pretty much have to be a race in adding to or removing from
net_namespace_list.

I took a quick skim through the code, and whenever we modify
net_namespace_list we hold both the net_mutex and, inside it, the
rtnl_lock, so I don't immediately see how you could be getting a NULL
net into ip_conntrack_net_init.
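The invariant being relied on can be modelled in a few lines. Registering a subsystem runs its init for every namespace already on the list. A new namespace runs every registered init before it is published on the list. If both paths hold net_mutex, no for_each_net() walker should ever see a namespace with a subsystem half set up.

This is a single-threaded userspace model under those assumptions. All model_* names are hypothetical. The locking is only described in comments, not implemented.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NETS 8
#define MAX_OPS  4

/* Model of a network namespace: one "initialized" flag per subsystem. */
struct model_net {
	bool init_done[MAX_OPS];
};

/* Model of struct pernet_operations: only the init hook. */
struct model_pernet_ops {
	void (*init)(struct model_net *net, int id);
};

static struct model_net net_list[MAX_NETS];  /* stands in for net_namespace_list */
static int nr_nets;
static struct model_pernet_ops *ops_list[MAX_OPS];
static int nr_ops;

/* Like register_pernet_subsys(): in the kernel this runs with net_mutex held. */
static int model_register_pernet(struct model_pernet_ops *ops)
{
	int id;

	if (nr_ops == MAX_OPS)
		return -1;
	id = nr_ops;
	ops_list[nr_ops++] = ops;
	for (int i = 0; i < nr_nets; i++)     /* like for_each_net() */
		ops->init(&net_list[i], id);
	return id;
}

/* Like namespace creation in copy_net_ns(): also under net_mutex. */
static struct model_net *model_copy_net(void)
{
	struct model_net *net;

	if (nr_nets == MAX_NETS)
		return NULL;
	net = &net_list[nr_nets];
	for (int id = 0; id < nr_ops; id++)
		ops_list[id]->init(net, id);
	nr_nets++;                           /* publish only after every init ran */
	return net;
}

static void mark_init(struct model_net *net, int id)
{
	net->init_done[id] = true;
}

/* Stand-in for the conntrack pernet operations. */
static struct model_pernet_ops model_conntrack_ops = { .init = mark_init };
```

A race like the one reported would correspond to one of these two paths running without the lock. It would also arise if a namespace were published before its init loop finished.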

Is there a codepath besides register_pernet_subsys that is calling
ip_conntrack_net_init?

Do you have any local modifications that could be messing up register_pernet_subsys?

Eric

^ permalink raw reply

* RE: [net-next-2.6 RFC PATCH] ethtool: allow custom interval for physical identification
From: Allan, Bruce W @ 2011-04-12  1:07 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <1302566453.5282.585.camel@localhost>

>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Monday, April 11, 2011 5:01 PM
>To: Allan, Bruce W
>Cc: Stephen Hemminger; netdev@vger.kernel.org
>Subject: RE: [net-next-2.6 RFC PATCH] ethtool: allow custom interval for
>physical identification
>
>I noticed that some drivers did this.  Do you know if these OEMs expect
>this of all hardware, or do they actually want different vendors'
>hardware to blink in different ways?  If it's a common requirement to
>blink at 2 Hz then let's use that frequency for all the drivers that
>want to be called periodically.
>
>Ben.

Sorry, I don't know.  I'll ask around, but doubt I will get a definitive
answer.

FWIW, without digging too deep into how other drivers identify their
respective ports through software, it appears it was split:
* bnx2*, cxgb3, niu, s2io, sfc, sky2, tg3 - once per second
* e100*, igb, ixgb*, pcnet32, ewrk3, cxgb4 - approx. twice per second

AFAIK for parts that can set the physical identification through hardware,
the Intel drivers set the on/off intervals to approximately twice/second;
I don't know what other drivers do in that situation.

So, I would guess it is not a common requirement to blink at a specific frequency.
I have no problem with changing the hard-coded blink frequency to what our
OEMs expect, but that might be an issue for those other vendors; I was just
trying to make it flexible.

Thanks,
Bruce.

^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12  1:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302536577.4605.1.camel@edumazet-laptop>

I was not stressing the NIC/CPU, since I only sent 290 Kpps of 400-byte packets towards eth10; the CPU was almost 100% idle.

BTW, there are some problems with the perf tool on 2.6.35.2; I will try to get you the top offenders if possible.

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, April 11, 2011 11:43 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On Monday, April 11, 2011 at 23:14 +0800, Wei Gu wrote:
> I tried ixgbe-3.3.8 (insmod ixgbe.ko RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3) from e1000.sf.net on both 2.6.35.1 and 2.6.35.2, with the same observation as with the 3.2.10 ixgbe driver. On 2.6.35.2 it has high rx errors:
> ethtool -S eth10 | grep error
>      rx_errors: 0
>      tx_errors: 0
>      rx_over_errors: 0
>      rx_crc_errors: 0
>      rx_frame_errors: 0
>      rx_fifo_errors: 0
>      rx_missed_errors: 2263088
>      tx_aborted_errors: 0
>      tx_carrier_errors: 0
>      tx_fifo_errors: 0
>      tx_heartbeat_errors: 0
>      rx_long_length_errors: 0
>      rx_short_length_errors: 0
>      rx_csum_offload_errors: 0
>      fcoe_last_errors: 0
>

It would be nice if you posted perf record / perf report results.

During your stress test, do

perf record -a -g sleep 10
perf report

And post "top offenders"

Thanks



^ permalink raw reply

* Re: Kernel panic when using bridge
From: Stephen Hemminger @ 2011-04-12  1:31 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Hiroaki SHIMODA, netdev, Sebastian Nickel, Pallai Roland
In-Reply-To: <4DA39330.2030102@scotdoyle.com>

On Mon, 11 Apr 2011 18:48:00 -0500
Scot Doyle <lkml@scotdoyle.com> wrote:

> On 04/09/2011 02:19 AM, Hiroaki SHIMODA wrote:
> >
> > It seems that the bug trap occurred in ip_options_compile() because
> > rt is NULL.
> >
> > 	8b 96 cc 00 00 00       mov    0xcc(%rsi),%edx
> > rsi is rt, and 0xcc means rt->rt_spec_dst. So I think the code below
> > hit the bug trap.
> >
> > 332	if (skb) {
> > 333		memcpy(&optptr[optptr[2]-1],&rt->rt_spec_dst, 4);<- here
> > 334		opt->is_changed = 1;
> > 335	}
> >
> > And call trace seems as follows.
> >    __netif_receive_skb()
> >      ->  br_handle_frame()
> >           ->  NF_HOOK()
> >                ->  br_nf_pre_routing()
> >                     ->  br_parse_ip_options()
> >                          ->  ip_options_compile()
> >
> > br_parse_ip_options() was introduced in 462fb2a (bridge : Sanitize
> > skb before it enters the IP stack), but ip_options_compile() and
> > ip_options_rcv_srr() appear to be called with no rt info.
> 
> Thanks to a tip from Sebastian, I can now reproduce this panic by 
> running "IP Stack Integrity Checker v0.07" from another machine on the 
> same subnet with command "icmpsic -s x.y.z.a -d x.y.z.b" where "x.y.z.a" 
> is IP address of the other machine and "x.y.z.b" is the IP address of 
> the target. When I enable iptables logging on the target machine, no 
> panic occurs. When I disable iptables logging (but otherwise leave the 
> same iptables rules) a panic occurs within a few seconds.
> 
> Thanks Hiroaki for the analysis of the kernel panic output. I've 
> confirmed that you are correct by placing a printk just before those two 
> lines. In every panic, the printk was triggered on line 333 of 
> net/ipv4/ip_options.c
> 
> The kernel panic does not occur after applying the following patch.
> 
> # diff net/ipv4/ip_options.c.original net/ipv4/ip_options.c.fix
> 332c332
> <                 if (skb) {
> ---
>  >                 if (skb && rt) {
> 374c374
> <                     if (skb) {
> ---
>  >                     if (skb && rt) {
> 
> What do you all think? Will it cause other problems?

It would help if you gave a little more context (like diff -up)
next time.

I think the correct fix is for the skb handed to ip_options_compile()
to match the layout expected by ip_options_compile().

This patch is compile tested only, please validate.


Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile

For some IP options, ip_options_compile() assumes it can find the associated
route table. The bridge-to-iptables code doesn't supply the necessary
reference, causing a NULL dereference.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Patch against net-next-2.6, but if validated should go to net-2.6
and stable.

--- a/net/bridge/br_netfilter.c	2011-04-11 18:18:22.534837859 -0700
+++ b/net/bridge/br_netfilter.c	2011-04-11 18:25:15.427244826 -0700
@@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
 	struct ip_options *opt;
 	struct iphdr *iph;
 	struct net_device *dev = skb->dev;
+	struct rtable *rt;
 	u32 len;
 
 	iph = ip_hdr(skb);
@@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
 		return 0;
 	}
 
+	/* Associate bogus bridge route table */
+	rt = bridge_parent_rtable(dev);
+	if (!rt) {
+		kfree_skb(skb);
+		return 0;
+	}
+	skb_dst_set(skb, &rt->dst);
+
 	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
 	if (ip_options_compile(dev_net(dev), opt, skb))
 		goto inhdr_error;


^ permalink raw reply

* Re: [Bugme-new] [Bug 32872] New: LLC PDU is dropped if skb is not linear
From: David Miller @ 2011-04-12  1:56 UTC (permalink / raw)
  To: akpm; +Cc: vitalyb, bugzilla-daemon, bugme-daemon, netdev
In-Reply-To: <20110411164812.8f84f995.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 11 Apr 2011 16:48:12 -0700

> --- linux-2.6.32.36/net/llc/llc_input.c.orig	2009-12-03 05:51:21.000000000 +0200
> +++ linux-2.6.32.36/net/llc/llc_input.c	2011-04-08 08:57:29.000000000 +0300
> @@ -105,6 +105,11 @@
>  	if (unlikely(!pskb_may_pull(skb, sizeof(*pdu))))
>  		return 0;
>  
> +	if (skb->data_len != 0){
> +	    if (unlikely(skb_linearize(skb)))
> +		return 0;
> +	}
> +
>  	pdu = (struct llc_pdu_un *)skb->data;
>  	if ((pdu->ctrl_1 & LLC_PDU_TYPE_MASK) == LLC_PDU_TYPE_U)
>  		llc_len = 1;
> 
> 
> 2.6.32 is a pretty old kernel - we'll need to verify if current kernels
> have the same problem.
> 
> Please don't send patches via bugzilla - it causes lots of problems
> with our usual patch management and review processes.  It's preferred
> that patches be sent via email as per Documentation/SubmittingPatches,
> and that they include a Signed-off-by:, as described in that file.

The skb_tail_pointer() check in llc_fixup_skb() is beyond wonky and
honestly the source of the problems here.

I'd suggest instead:

diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 058f1e9..9032421 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -121,8 +121,7 @@ static inline int llc_fixup_skb(struct sk_buff *skb)
 		s32 data_size = ntohs(pdulen) - llc_len;
 
 		if (data_size < 0 ||
-		    ((skb_tail_pointer(skb) -
-		      (u8 *)pdu) - llc_len) < data_size)
+		    !pskb_may_pull(skb, data_size))
 			return 0;
 		if (unlikely(pskb_trim_rcsum(skb, data_size)))
 			return 0;
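
Why does the old test reject non-linear skbs? (skb_tail_pointer(skb) - (u8 *)pdu) - llc_len counts only the payload bytes in the linear area. A PDU whose payload sits partly in paged fragments is therefore dropped, even though the bytes are present. pskb_may_pull() checks against the whole skb->len instead, and pulls the bytes into the linear area when needed.

Below is a simplified userspace model of the two checks. struct skb_lens and the helpers are hypothetical. skb->data is assumed to point at the first payload byte. The model ignores the copying, and the allocation failure, that the real pskb_may_pull() can hit.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an sk_buff's length fields. */
struct skb_lens {
	unsigned int len;       /* bytes from skb->data to the end, linear + paged */
	unsigned int data_len;  /* how many of those bytes live in paged fragments */
};

/* Like skb_headlen(): bytes in the linear area, up to the tail pointer. */
static unsigned int model_headlen(const struct skb_lens *skb)
{
	return skb->len - skb->data_len;
}

/* Old test: only payload bytes before skb_tail_pointer() count. */
static bool old_tail_check_ok(const struct skb_lens *skb, int data_size)
{
	return data_size >= 0 && (int)model_headlen(skb) >= data_size;
}

/* pskb_may_pull()-style test: fine if the bytes exist anywhere in the skb. */
static bool may_pull_check_ok(const struct skb_lens *skb, int data_size)
{
	if (data_size < 0)
		return false;
	if ((unsigned int)data_size <= model_headlen(skb))
		return true;            /* already linear, nothing to pull */
	return (unsigned int)data_size <= skb->len;
}
```

Take 100 linear bytes, 1400 paged bytes and a 1000-byte payload. The old test rejects the frame while the pskb_may_pull() form accepts it. This is consistent with the bug title: PDUs are dropped only when the skb is not linear.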

^ permalink raw reply related

* RE: [PATCH v2] net: r8169: convert to hw_features
From: hayeswang @ 2011-04-12  2:10 UTC (permalink / raw)
  To: 'François Romieu',
	'Michał Mirosław'
  Cc: netdev, 'David Dillow'
In-Reply-To: <20110411184739.GA17331@electric-eye.fr.zoreil.com>

 

> From: François Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Tuesday, April 12, 2011 2:48 AM
> To: Michał Mirosław
> Cc: netdev@vger.kernel.org; David Dillow; Hayeswang
> Subject: Re: [PATCH v2] net: r8169: convert to hw_features
> 
> 
> Hayes, I have a 8168c manual at hand. Do all 8168 have the 
> same Tx descriptors layout ?
> 

Yes, all 8168 chips have the same Tx descriptor layout except for the 8168B series.
 
Best Regards,
Hayes


^ permalink raw reply

* Re: [PATCH RESEND] uts: Set default hostname to "localhost", rather than "(none)"
From: Valdis.Kletnieks @ 2011-04-12  2:47 UTC (permalink / raw)
  To: Josh Triplett
  Cc: David Miller, netdev, Serge E. Hallyn, Andrew Morton,
	Linus Torvalds, linux-kernel
In-Reply-To: <20110411050155.GA2507@feather>


On Sun, 10 Apr 2011 22:01:59 PDT, Josh Triplett said:

> Change the default hostname to "localhost".  This removes the need for
> the standard fallback, provides a useful default for systems that never
> call sethostname, and makes minimal systems that much more useful with
> less configuration.

Seems sane enough to me.  Only possible objection I can think of is "if you're running
with 'init=/bin/sh' or similar config too crippled to run /bin/hostname, maybe your
network config *should* be intentionally toasted so you can't get further surprises".

I personally don't agree - just saying somebody might hold that position.
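For context, the "standard fallback" is presumably userspace treating the kernel's placeholder as "no hostname set". Below is a minimal sketch of that kind of check. It assumes the placeholder is exactly "(none)" and that an empty name also counts as unset. effective_hostname() is hypothetical; on a real system, raw would come from gethostname() or uname(). With the patch applied, the kernel would already report "localhost" and the check would become a no-op.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder the kernel reports before anyone calls sethostname(). */
#define KERNEL_PLACEHOLDER_HOSTNAME "(none)"

/*
 * Hypothetical helper: return the name userspace should use, falling
 * back to "localhost" when no real hostname has been set.
 */
static const char *effective_hostname(const char *raw)
{
	if (raw == NULL || raw[0] == '\0' ||
	    strcmp(raw, KERNEL_PLACEHOLDER_HOSTNAME) == 0)
		return "localhost";
	return raw;
}
```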


^ permalink raw reply

* Re: [Bugme-new] [Bug 33042] New: Marvell 88E1145 phy configured incorrectly in fiber mode
From: Alex Dubov @ 2011-04-12  3:45 UTC (permalink / raw)
  To: Andrew Morton, David Daney
  Cc: netdev, bugzilla-daemon, bugme-daemon, Grant Likely, Andy Fleming
In-Reply-To: <4DA3703B.1090802@caviumnetworks.com>

>
> How does your u-boot configure the part?  Does it write any of the
> configuration registers, or is it just the default configuration set
> via the strapping pins?

U-boot configures this phy just like any other phy - by running a set of
register assignments from phy_info_M88E1145.

Unfortunately, I don't have a datasheet for this phy and kernel does
quite a few things differently, so simply copying stuff from u-boot
does not work well (in kernel, phy initialization is broken into 3
functions, if I'm not mistaken).

Otherwise, my problem seems to be identical to the one reported some
time ago against the 88E1111 phy (which resulted in the addition of
"marvell_read_status" in the first place). The problem then, as it seems
to be now, was that the phy is always configured in "copper" mode instead
of the driver checking for the correct "fiber" mode bits.

>
> In any event, you will probably have to read the configuration before
> drivers/net/phy/marvell.c changes it.  Then compare that to what the
> driver is trying to set.  Then you will either have to override the
> configuration with the device tree "marvell,reg-init" property, or, if
> you are not using the device tree, add an 88e1145-specific flag that you
> set when calling phy_connect().
> 
> David Daney
> 
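The override David suggests amounts to a read-modify-write table applied after the driver's own setup. As we understand the marvell,reg-init binding, each entry is a (page, register, mask, value) tuple. The register ends up as (old & mask) | value, so mask selects the bits to keep.

The sketch below models that on a fake register file. Paging is ignored. The register numbers and bit values in the test are arbitrary placeholders, not 88E1145 datasheet values.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a reg-init style table (page handling omitted in this model). */
struct reg_init_entry {
	uint16_t page;   /* ignored here; a real PHY needs a page select first */
	uint16_t reg;    /* register number, 0..31 */
	uint16_t mask;   /* bits of the old value to KEEP */
	uint16_t value;  /* bits to OR in afterwards */
};

/* Fake MII register file standing in for phy_read()/phy_write(). */
static uint16_t fake_regs[32];

static void apply_reg_init(const struct reg_init_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint16_t reg = tbl[i].reg & 31;  /* stay inside the fake file */
		uint16_t val = 0;

		if (tbl[i].mask)                 /* only read when bits are kept */
			val = fake_regs[reg] & tbl[i].mask;
		val |= tbl[i].value;
		fake_regs[reg] = val;
	}
}
```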

^ permalink raw reply

* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12  3:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Hiroaki SHIMODA, netdev
In-Reply-To: <20110411183105.46e86684@nehalam>

On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
>
> It would help if you gave a little more context (like diff -up)
> next time.
>
>
Thanks for the advice on diff context, I appreciate it.  Here's the 
output from the patch:

[  422.577325] ------------[ cut here ]------------
[  422.581932] WARNING: at net/core/dst.c:278 dst_release+0x2e/0x5d()
[  422.588086] Hardware name: PowerEdge R510
[  422.592075] Modules linked in: kvm_intel kvm bridge stp loop snd_pcm 
snd_timer snd soundcore snd_page_alloc i7core_edac psmouse pcspkr 
edac_core evdev serio_raw power_meter processor ghes tpm_tis dcdbas tpm 
tpm_bios thermal_sys button hed ext2 mbcache dm_mod raid1 md_mod sd_mod 
crc_t10dif usb_storage uas uhci_hcd mpt2sas scsi_transport_sas igb 
ehci_hcd raid_class scsi_mod usbcore bnx2 dca [last unloaded: 
scsi_wait_scan]
[  422.629510] Pid: 0, comm: swapper Not tainted 2.6.39-rc2+ #10
[  422.635225] Call Trace:
[  422.637655] <IRQ>  [<ffffffff81045635>] ? warn_slowpath_common+0x78/0x8c
[  422.644425]  [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  422.650918]  [<ffffffff8127cd60>] ? dst_release+0x2e/0x5d
[  422.656290]  [<ffffffff8126c25f>] ? skb_release_head_state+0x21/0xeb
[  422.662613]  [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  422.669108]  [<ffffffff8126c06f>] ? __kfree_skb+0x9/0x77
[  422.674392]  [<ffffffff812985f7>] ? nf_hook_slow+0x93/0x114
[  422.679936]  [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  422.686431]  [<ffffffffa01cbe89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  422.692927]  [<ffffffffa01cbe6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[  422.699421]  [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[  422.705225]  [<ffffffffa01cc1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[  422.711892]  [<ffffffffa01cc050>] ? 
br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[  422.719166]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[  422.725401]  [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[  422.731289]  [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[  422.737091]  [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[  422.742809]  [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[  422.748615]  [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
[  422.755371]  [<ffffffffa003cb31>] ? scsi_io_completion+0x44c/0x4cf 
[scsi_mod]
[  422.762472]  [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[  422.768103]  [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[  422.773647]  [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[  422.779104]  [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[  422.784388]  [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[  422.789499]  [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[  422.794439]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[  422.800240] <EOI>  [<ffffffff81061348>] ? enqueue_hrtimer+0x3f/0x53
[  422.806575]  [<ffffffffa0310417>] ? arch_local_irq_enable+0x7/0x8 
[processor]
[  422.813676]  [<ffffffffa0310dab>] ? acpi_idle_enter_c1+0x86/0xa2 
[processor]
[  422.820690]  [<ffffffff8125d05d>] ? cpuidle_idle_call+0xf4/0x17e
[  422.826664]  [<ffffffff81008298>] ? cpu_idle+0xa2/0xc4
[  422.831776]  [<ffffffff8169db60>] ? start_kernel+0x3b9/0x3c4
[  422.837406]  [<ffffffff8169d3c6>] ? x86_64_start_kernel+0x102/0x10f
[  422.843640] ---[ end trace 5d4687f8472ee50c ]---


^ permalink raw reply

* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  4:09 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3CB4B.9090506@scotdoyle.com>

On Monday, April 11, 2011 at 22:47 -0500, Scot Doyle wrote:
> On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
> > +	/* Associate bogus bridge route table */
> > +	rt = bridge_parent_rtable(dev);
> > +	if (!rt) {
> > +		kfree_skb(skb);
> > +		return 0;
> > +	}
> > +	skb_dst_set(skb,&rt->dst);

Please try skb_dst_set_noref() here instead of skb_dst_set()

Or increment rt refcount.




^ permalink raw reply

* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  4:22 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302581384.3603.14.camel@edumazet-laptop>

On Tuesday, April 12, 2011 at 06:09 +0200, Eric Dumazet wrote:
> On Monday, April 11, 2011 at 22:47 -0500, Scot Doyle wrote:
> > On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
> > > +	/* Associate bogus bridge route table */
> > > +	rt = bridge_parent_rtable(dev);
> > > +	if (!rt) {
> > > +		kfree_skb(skb);
> > > +		return 0;
> > > +	}
> > > +	skb_dst_set(skb,&rt->dst);
> 
> Please try skb_dst_set_noref() here instead of skb_dst_set()
> 
> Or increment rt refcount.

Also, I would first check whether skb->dst is already set, so as not to leak a dst

if (!skb->dst) {
	rt = bridge_parent_rtable(dev);
	if (!rt) {
		kfree_skb(skb);
		return 0;
	}
	skb_dst_set_noref(skb,&rt->dst);
}





* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12  4:40 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck, Peter Zijlstra; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302536577.4605.1.camel@edumazet-laptop>

Hi,
I found that the problem was introduced by this revert patch: "2010-08-13     Peter Zijlstra  sched: Revert nohz_ratelimit() for now"

I tried removing this patch from 2.6.35.2 and rebuilding, and then the ixgbe driver works fine.
I don't know why reverting nohz_ratelimit() causes this problem for the ixgbe driver, since nohz_ratelimit was only introduced on 2010-03-11, and the earlier 2.6.32 kernel, which predates it, didn't have this problem with the ixgbe driver either.


Some log from git:
=========================================================================================
2.6.35.2
2010-08-13      Peter Zijlstra  sched: Revert nohz_ratelimit() for now
2.6.35.1
2010-08-01      Linus Torvalds  Linux 2.6.35 v2.6.35
2010-06-17      Peter Zijlstra  nohz: Fix nohz ratelimit
2.6.35-rc3
2010-03-11      Mike Galbraith  sched: Rate-limit nohz

Thanks
WeiGu

-----Original Message-----
From: Wei Gu
Sent: Tuesday, April 12, 2011 9:23 AM
To: 'Eric Dumazet'
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

I was not stressing the NIC/CPU, since I only sent 290 Kpps of 400-byte packets towards eth10; the CPUs were almost 100% idle.

BTW, there are some problems with the perf tool on 2.6.35.2; I will try to get you the top offenders if possible.

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, April 11, 2011 11:43 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On Monday, 11 April 2011 at 23:14 +0800, Wei Gu wrote:
> I tried ixgbe-3.3.8 (insmod ixgbe.ko RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3) from e1000.sf.net on both 2.6.35.1 and 2.6.35.2, with the same observation as with the 3.2.10 ixgbe driver: on 2.6.35.2 it has high rx errors:
> ethtool -S eth10 | grep error
>      rx_errors: 0
>      tx_errors: 0
>      rx_over_errors: 0
>      rx_crc_errors: 0
>      rx_frame_errors: 0
>      rx_fifo_errors: 0
>      rx_missed_errors: 2263088
>      tx_aborted_errors: 0
>      tx_carrier_errors: 0
>      tx_fifo_errors: 0
>      tx_heartbeat_errors: 0
>      rx_long_length_errors: 0
>      rx_short_length_errors: 0
>      rx_csum_offload_errors: 0
>      fcoe_last_errors: 0
>

It would be nice if you posted perf record / perf report results.

During your stress test, run

perf record -a -g sleep 10
perf report

and post the "top offenders".

Thanks




* [PATCH] net: davinci_emac: fix spinlock bug with dma channel cleanup
From: Sriramakrishnan A G @ 2011-04-12  4:42 UTC (permalink / raw)
  To: netdev; +Cc: davinci-linux-open-source, davem, Sriramakrishnan A G

The DMA cleanup function was holding the spinlock across
a busy loop where it waits for HW to indicate teardown is complete.
This generates a backtrace when DEBUG_SPINLOCK is enabled. Make the
locking more granular.

Signed-off-by: Sriramakrishnan A G <srk@ti.com>
---
 drivers/net/davinci_cpdma.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/davinci_cpdma.c b/drivers/net/davinci_cpdma.c
index ae47f23..57fd0fc 100644
--- a/drivers/net/davinci_cpdma.c
+++ b/drivers/net/davinci_cpdma.c
@@ -824,6 +824,8 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
 	/* trigger teardown */
 	dma_reg_write(ctlr, chan->td, chan->chan_num);
 
+	spin_unlock_irqrestore(&chan->lock, flags);
+
 	/* wait for teardown complete */
 	timeout = jiffies + HZ/10;	/* 100 msec */
 	while (time_before(jiffies, timeout)) {
@@ -843,6 +845,7 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
 	} while ((ret & CPDMA_DESC_TD_COMPLETE) == 0);
 
 	/* remaining packets haven't been tx/rx'ed, clean them up */
+	spin_lock_irqsave(&chan->lock, flags);
 	while (chan->head) {
 		struct cpdma_desc __iomem *desc = chan->head;
 		dma_addr_t next_dma;
-- 
1.6.2.4
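The locking change above can be illustrated outside the kernel. This is a minimal userspace sketch, assuming a pthread mutex as a stand-in for spin_lock_irqsave(), a monotonic-clock deadline as a stand-in for jiffies + HZ/10, and an atomic flag as a stand-in for the teardown-complete indication from the hardware. `struct chan`, `chan_stop()` and their fields are hypothetical and are not the davinci_cpdma code. The ordering is the one the patch uses: trigger teardown under the lock, drop the lock for the busy wait, and retake it before cleaning up the remaining descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a DMA channel; not the davinci_cpdma structures. */
struct chan {
	pthread_mutex_t lock;     /* stands in for chan->lock; protects 'queued' */
	int queued;               /* descriptors still on the channel */
	atomic_bool td_complete;  /* set asynchronously by the "hardware" */
};

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Trigger teardown under the lock, poll for completion with the lock
 * dropped, then retake the lock to clean up whatever is left.
 * Returns the number of descriptors cleaned up; *timed_out reports
 * whether completion was not signalled before the deadline. */
static int chan_stop(struct chan *c, long timeout_ms, bool *timed_out)
{
	struct timespec start;
	int cleaned;

	pthread_mutex_lock(&c->lock);
	/* ... the real driver writes the teardown register here ... */
	pthread_mutex_unlock(&c->lock);

	/* Busy-wait WITHOUT holding the lock. */
	*timed_out = false;
	clock_gettime(CLOCK_MONOTONIC, &start);
	while (!atomic_load(&c->td_complete)) {
		if (elapsed_ms(&start) >= timeout_ms) {
			*timed_out = true;
			break;
		}
		sched_yield();
	}

	/* Remaining descriptors were never processed; clean up under the lock. */
	pthread_mutex_lock(&c->lock);
	cleaned = c->queued;
	c->queued = 0;
	pthread_mutex_unlock(&c->lock);
	return cleaned;
}
```

The sketch assumes nothing else can add descriptors to the channel between the unlock and the relock; if that is not guaranteed, the cleanup step needs its own consistency check.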



* RE: [PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
From: Amit Salecha @ 2011-04-12  4:48 UTC (permalink / raw)
  To: David Miller, Stephen Hemminger
  Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty
In-Reply-To: <20110411.155517.200362844.davem@davemloft.net>

> From: Amit Kumar Salecha <amit.salecha@qlogic.com>
> Date: Mon,  4 Oct 2010 08:14:51 -0700
>
> > Some of the counters are not implemented in fw.
> > Fw returns NOT AVAILABLE VALUE as (0xffffffffffffffff).
> > Adding these counters results in invalid values.
> >
> > Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
>
> Why are patches being posted from back in October 4, 2010?

My mail server is spamming the list with old mail; please ignore all of the emails below:

[PATCH NEXT 1/2] netxen: Notify firmware of Flex-10 interface down
[PATCHv2 NEXT 3/8] qlcnic: fix diag register
[PATCHv2 NEXT 8/8] qlcnic: set mtu lower limit
[PATCH NEXT 0/2]nexten: bug fixes
[PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
[PATCH NEXT 2/2] netxen: support for GbE port settings
[PATCHv2 NEXT 6/8] qlcnic: sparse warning fixes
[PATCHv2 NEXT 7/8] qlcnic: cleanup port mode setting
[PATCHv2 NEXT 5/8] qlcnic: fix vlan TSO on big endian machine

Sorry for the inconvenience caused to all.

-Amit
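The quoted qlcnic changelog describes firmware reporting unimplemented counters as 0xffffffffffffffff, which corrupts a total if the values are simply added. A minimal sketch of sentinel-aware accumulation, assuming a running total that starts out as "not available"; `STATS_NOT_AVAIL`, `add_esw_stat()` and `sum_esw_stat()` are hypothetical names, not the qlcnic code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the firmware's "counter not implemented" value. */
#define STATS_NOT_AVAIL UINT64_C(0xffffffffffffffff)

/* Fold one port's counter into a running total without letting the
 * sentinel poison the sum. */
static uint64_t add_esw_stat(uint64_t total, uint64_t val)
{
	if (val == STATS_NOT_AVAIL)
		return total;          /* nothing to add */
	if (total == STATS_NOT_AVAIL)
		return val;            /* first real value seen */
	return total + val;
}

/* Sum one counter across n ports; stays STATS_NOT_AVAIL if no port has it. */
static uint64_t sum_esw_stat(const uint64_t *vals, size_t n)
{
	uint64_t total = STATS_NOT_AVAIL;

	for (size_t i = 0; i < n; i++)
		total = add_esw_stat(total, vals[i]);
	return total;
}
```

A counter that no port implements therefore stays at the sentinel instead of wrapping around, while ports that do report it are summed normally.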




* [PATCH] driver/e1000e: Fix default interrupt mode select
From: Prabhakar Kushwaha @ 2011-04-12  4:56 UTC (permalink / raw)
  To: linuxppc-dev, linux.nics, auke-jan.h.kok, e1000-devel, netdev
  Cc: meet2prabhu, Prabhakar, Jin Qing

From: Prabhakar <prabhakar@freescale.com>

The Intel e1000e device driver defaults to MSI-X interrupt mode, even if MSI
support (CONFIG_PCI_MSI) is not enabled.

Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git(branch master)

 added the netdev and e1000 mailing lists and the maintainer

 drivers/net/e1000e/param.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index a150e48..7b3bbec 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -390,7 +390,11 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
 			.type = range_option,
 			.name = "Interrupt Mode",
 			.err  = "defaulting to 2 (MSI-X)",
+#ifdef CONFIG_PCI_MSI
 			.def  = E1000E_INT_MODE_MSIX,
+#else
+			.def  = E1000E_INT_MODE_LEGACY,
+#endif
 			.arg  = { .r = { .min = MIN_INTMODE,
 					 .max = MAX_INTMODE } }
 		};
-- 
1.7.3
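A minimal userspace sketch of the compile-time default chosen by the patch, plus a range check in the spirit of the `range_option` entry. The constant values mirror e1000e's interrupt modes (0 legacy, 1 MSI, 2 MSI-X, consistent with the "defaulting to 2 (MSI-X)" message); `default_int_mode()`, `select_int_mode()` and the use of -1 for "not specified" are hypothetical illustrations, not the driver's own option-validation code.

```c
/* Interrupt modes as numbered by e1000e's InterruptMode parameter. */
enum {
	INT_MODE_LEGACY = 0,
	INT_MODE_MSI    = 1,
	INT_MODE_MSIX   = 2,
};

#define MIN_INTMODE  INT_MODE_LEGACY
#define MAX_INTMODE  INT_MODE_MSIX
#define OPTION_UNSET (-1)  /* hypothetical "not given on the command line" marker */

/* Build-time default, as in the patch: MSI-X only when MSI support is compiled in. */
static int default_int_mode(void)
{
#ifdef CONFIG_PCI_MSI
	return INT_MODE_MSIX;
#else
	return INT_MODE_LEGACY;
#endif
}

/* Accept an in-range request; otherwise fall back to the build-time default. */
static int select_int_mode(int requested)
{
	if (requested == OPTION_UNSET ||
	    requested < MIN_INTMODE || requested > MAX_INTMODE)
		return default_int_mode();
	return requested;
}
```

Built without `-DCONFIG_PCI_MSI`, as an ordinary userspace program is, `select_int_mode(OPTION_UNSET)` yields legacy mode; with it defined, the default becomes MSI-X.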




* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-12  4:56 UTC (permalink / raw)
  To: Wei Gu; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E490995D@ESGSCCMS0001.eapac.ericsson.se>

On Tuesday, 12 April 2011 at 12:40 +0800, Wei Gu wrote:
> Hi,
> I found the problem was introduced by this revert patch "2010-08-13
> Peter Zijlstra  sched: Revert nohz_ratelimit() for now"
> 
> I tried the remove this patch from 2.6.35.2 and then build the
> application again, then the ixgbe driver looks works fine.
> I don't know why this time revert the  nohz_ratelimit() will cause the
> problem on ixgbe driver, since this  nohz_ratelimit was first
> introduced "2010-03-11". And before that time with 2.6.32 kernel it
> also doesn't have this problem with ixgbe driver.
> 
> 
> Some log from git:
> =========================================================================================
> 2.6.35.2
> 2010-08-13      Peter Zijlstra  sched: Revert nohz_ratelimit() for now
> 2.6.35.1
> 2010-08-01      Linus Torvalds  Linux 2.6.35 v2.6.35
> 2010-06-17      Peter Zijlstra  nohz: Fix nohz ratelimit
> 2.6.35-rc3
> 2010-03-11      Mike Galbraith  sched: Rate-limit nohz
> 
> Thanks
> WeiGu
> 

Hmm...

Could you try to add "processor.max_cstate=1" to boot parameters ?





* Re: Race condition when creating multiple namespaces?
From: Hans Schillstrom @ 2011-04-12  4:56 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Daniel Lezcano
In-Reply-To: <m1ei58co08.fsf@fess.ebiederm.org>


On Tuesday, April 12, 2011 02:27:35 Eric W. Biederman wrote:
> Hans Schillstrom <hans@schillstrom.com> writes:
> 
> > Hello
> > I've been struggling with this for some time now.
> >
> > When creating multiple namespaces using lxc-start, uninitialized network namespace parts get called by the new process in the namespace,
> > e.g. when using conntrack or ipvsadm too quickly (a "sleep 2" "solves" the problem).
> > (From what I can see, the clone() syscall is used in lxc-start, i.e. do_fork() will be called later on.)
> > I was actually debugging ip_vs when closing multiple namespaces when I ran into this one.
> >
> > I have a loop that creates 33 containers with lxc-start ... -- test.sh
> > The first thing the new container does in test.sh is:
> > #!/bin/bash
> > iptables -t mangle -A PREROUTING -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
> > nc -l -p1234
> >
> > This results in a NULL pointer in ip_conntrack_net_init(struct net *net)
> 
> Ouch!
> 
> > And in another test, test.sh looks like this:
> > #!/bin/bash
> > ipvsadm --start-daemon=master --mcast-interface=lo
> > nc -l -p1234
> >
> > And this results in an uninitialized spinlock in ip_vs_sync.
> >
> > I put a printk in nsproxy: copy_namespaces() and could see dozens of them
> > before anything appears from ipvs or conntrack.
> >
> > My feeling is that when you start up user processes in a new namespace,
> > all kernel-related init should already have been done (you should not need to add a sleep to get it working).
> >
> > All tests were made using today's net-next-2.6 (2.6.39-rc1).
> >
> > Note:
> > Neither the conntrack nor the ip_vs modules were loaded;
> > if the modules are loaded before creating the new namespaces, it all works...
> >
> > Finally, the question:
> > Should it really work to load modules that are part of netns
> > from within a namespace?
> 
> From an implementation point of view kernel modules are not in a
> namespace, so there should be no difference between being in a namespace
> and loading a kernel networking module and not being in a namespace and
> loading in a kernel module.
> 
> It does sound like you have hit a module loading race, and perhaps
> a race that is confined to network namespaces.
> 
> My head is in another problem so I won't be able to look at this for
> a bit.  But if you are getting into ip_conntrack_net_init with
> a NULL network namespace something spectacularly bad is happening.

OK I'll continue to dig into this.

> 
> In particular it looks like you must be hitting a bug in for_each_net.
> Which would pretty much have to be a race in adding or removing from
> net_namespace_list.

It was further down in proc_net_fops_create()

> 
> I took a quick skim through the code and whenever we modify the
> net_namespace we hold both the net_mutex and inside it the rtnl_lock so I
> don't immediately see how you could be getting a NULL net into
> ip_conntrack_net_init.

I have had the same problem in ip_vs a couple of times, but at that time I thought it was caused by my changes...
In the ip_vs case it looks more like a race or a missing lock: one core reaches a "not fully" initialized ipvs struct.
That could be my fault, e.g. a bad ordering when calling register_pernet_subsys...

> 
> Is there a codepath besides register_pernet_subsys that is calling
> ip_conntrack_net_init?

Not that I can see...
> 
> Do you have any local modifications that could be messing up register_pernet_subsys?

Not right now (I took them away, a clean git clone)

> 
> Eric
> 

I will continue with this today 

Thanks a lot
Hans
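If a core really is reaching a not-fully-initialized per-namespace struct, that is the classic publish-before-init ordering bug. A minimal userspace sketch of the safe ordering, assuming C11 release/acquire atomics as a stand-in for the kernel's own primitives; `struct net_model`, `struct pernet_state`, `pernet_init()` and `pernet_get()` are hypothetical and do not describe how register_pernet_subsys() actually works. Every field, including the lock, is initialized before the pointer becomes visible, and readers treat NULL as "not ready yet".

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-namespace state (think of an ipvs sync structure). */
struct pernet_state {
	pthread_mutex_t sync_lock;
	int initialized;
};

/* Hypothetical namespace: the state pointer stays NULL until fully set up. */
struct net_model {
	_Atomic(struct pernet_state *) state;
};

/* Initialize everything first, then publish with release ordering. */
static void pernet_init(struct net_model *net, struct pernet_state *st)
{
	pthread_mutex_init(&st->sync_lock, NULL);
	st->initialized = 1;
	atomic_store_explicit(&net->state, st, memory_order_release);
}

/* Acquire pairs with the release above: a non-NULL result is fully initialized. */
static struct pernet_state *pernet_get(struct net_model *net)
{
	return atomic_load_explicit(&net->state, memory_order_acquire);
}
```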


* Re: [PATCH] net: ipv4: add IPPROTO_ICMP socket kind
From: Solar Designer @ 2011-04-12  5:06 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: linux-kernel, netdev, Pavel Kankovsky, Kees Cook, Dan Rosenberg,
	Eugene Teo, Nelson Elhage, David S. Miller, Alexey Kuznetsov,
	Pekka Savola, James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <20110409101514.GA4262@albatros>

On Sat, Apr 09, 2011 at 02:15:14PM +0400, Vasiliy Kulikov wrote:
> This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
> ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
> without any special privileges.  In other words, the patch makes it
> possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
> order not to increase the kernel's attack surface (in case of
> vulnerabilities in the newly added code), the new functionality is
> disabled by default, but is enabled at bootup by supporting Linux
> distributions, optionally with restriction to a group or a group range
...
> For Openwall GNU/*/Linux it is the last step on the road to the
> setuid-less distro.

More correctly, it _was_ the last step - we've already taken it, so a
revision of the patch (against OpenVZ/RHEL5 kernels) is currently in use.

We would really like this accepted into mainline, which is why Vasiliy
spends extra effort to keep the patch updated to current mainline
kernels and re-test it.  If there are any comments/concerns/objections,
we'd be happy to hear those.

> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>

Acked-by: Solar Designer <solar@openwall.com>

>  include/net/netns/ipv4.h   |    2 +
>  include/net/ping.h         |   69 ++++
>  net/ipv4/Kconfig           |   21 +
>  net/ipv4/Makefile          |    1 +
>  net/ipv4/af_inet.c         |   36 ++
>  net/ipv4/icmp.c            |   14 +-
>  net/ipv4/ping.c            |  933 ++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/sysctl_net_ipv4.c |   90 +++++
>  8 files changed, 1165 insertions(+), 1 deletions(-)

Thanks,

Alexander
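For context on what the patch enables, here is a minimal userspace sketch of an unprivileged ping sender, assuming the interface as it was later merged into mainline: `socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP)`, with access gated by the `net.ipv4.ping_group_range` sysctl. It also assumes, as in the merged code, that the kernel accepts only ICMP_ECHO with code 0 and fills in the checksum and identifier itself, so both are left zero. `open_ping_socket()`, `build_echo_request()` and `send_echo()` are hypothetical helpers, and the revision posted in this thread may differ in details.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Open an ICMP echo ("ping") socket without CAP_NET_RAW.
 * Assumption: fails (EACCES on mainline) if the caller's group is
 * outside net.ipv4.ping_group_range, which is disabled by default. */
static int open_ping_socket(void)
{
	return socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
}

/* Build an echo request: 8-byte ICMP header followed by the payload.
 * Checksum and identifier stay zero; the kernel is assumed to fill them. */
static size_t build_echo_request(uint8_t *buf, size_t buflen, uint16_t seq,
				 const void *payload, size_t plen)
{
	struct icmphdr hdr;

	if (buflen < sizeof(hdr) + plen)
		return 0;
	memset(&hdr, 0, sizeof(hdr));
	hdr.type = ICMP_ECHO;
	hdr.code = 0;
	hdr.un.echo.sequence = htons(seq);
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), payload, plen);
	return sizeof(hdr) + plen;
}

/* Send one echo request to an IPv4 dotted-quad address. */
static ssize_t send_echo(int fd, const char *dst, uint16_t seq)
{
	uint8_t pkt[64];
	struct sockaddr_in sin;
	size_t len = build_echo_request(pkt, sizeof(pkt), seq, "ping", 4);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	if (len == 0 || inet_pton(AF_INET, dst, &sin.sin_addr) != 1) {
		errno = EINVAL;
		return -1;
	}
	return sendto(fd, pkt, len, 0, (struct sockaddr *)&sin, sizeof(sin));
}
```

Typical use is `fd = open_ping_socket()` followed by `send_echo(fd, "127.0.0.1", 1)` and a `recv()` of the ICMP_ECHOREPLY; no setuid bit or CAP_NET_RAW is involved, which is the point of the patch.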


* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12  5:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302582172.3603.18.camel@edumazet-laptop>

On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> Also, I would first check if skb->dst already set to not leak a dst
>
> if (!skb->dst) {
> 	rt = bridge_parent_rtable(dev);
> 	if (!rt) {
> 		kfree_skb(skb);
> 		return 0;
> 	}
> 	skb_dst_set_noref(skb,&rt->dst);
> }

Thank you for the idea. Here is the compiler output referring to the 
first line above.

net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member 
named 'dst'



* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12  5:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302584201.3603.20.camel@edumazet-laptop>

Hi,
Passing processor.max_cstate=1 to the kernel doesn't make things any better:

  kernel /vmlinuz-2.6.35.2 ro root=UUID=e96f9df8-c28a-4ea8-ac26-64fbf948bce2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.iso88591 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=sv-latin1 crashkernel=auto pci=bfsort rhgb quiet console=tty0 console=ttyS0,115200 processor.max_cstate=1







* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  5:51 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3E074.5090603@scotdoyle.com>

On Tuesday, 12 April 2011 at 00:17 -0500, Scot Doyle wrote:
> On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> > Also, I would first check if skb->dst already set to not leak a dst
> >
> > if (!skb->dst) {

Oh well, sorry (not enough time these days to even test patches)

	if (!skb_dst(skb)) {

> > 	rt = bridge_parent_rtable(dev);
> > 	if (!rt) {
> > 		kfree_skb(skb);
> > 		return 0;
> > 	}
> > 	skb_dst_set_noref(skb,&rt->dst);
> > }
> 
> Thank you for the idea. Here is the compiler output referring to the 
> first line above.
> 
> net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
> net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member 
> named 'dst'
> 
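The corrected check, `skb_dst(skb)` instead of `skb->dst`, can be exercised in a small userspace model. This is a sketch only: `struct sk_buff`, `struct rtable` and `struct dst_entry` below are minimal stand-ins (the `_skb_refdst` field with a low "noref" bit mirrors how kernels of this era store the dst, but is simplified), and `br_attach_fake_dst()` is a hypothetical helper rather than code from br_netfilter.c. It shows why the check matters: an existing dst is left alone instead of being overwritten and leaked, and the noref variant takes no reference that would later need dropping. In the real kernel skb_dst_set_noref() also has RCU rules that this model ignores.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel types; NOT the real definitions. */
struct dst_entry { int refcnt; };
struct rtable { struct dst_entry dst; };
struct sk_buff { unsigned long _skb_refdst; };

/* Low bit of _skb_refdst marks a dst attached without taking a reference. */
#define SKB_DST_NOREF 1UL

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	return (struct dst_entry *)(skb->_skb_refdst & ~SKB_DST_NOREF);
}

static void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)
{
	/* No refcount is taken, so nothing needs to be released later. */
	skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF;
}

/* Hypothetical helper: attach the bridge's fake rtable only when the skb has
 * no dst yet, so an existing dst is neither overwritten nor leaked.
 * Returns 0 on success, -1 when no rtable exists (caller frees the skb). */
static int br_attach_fake_dst(struct sk_buff *skb, struct rtable *rt)
{
	if (skb_dst(skb))
		return 0;
	if (!rt)
		return -1;
	skb_dst_set_noref(skb, &rt->dst);
	return 0;
}
```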




* Is __xfrm_lookup always on non-atomic context ?
From: Eduardo Panisset @ 2011-04-12  5:58 UTC (permalink / raw)
  To: netdev

Hi all,

I'm using XFRM to tunnel payload traffic in a Dual Stack Mobility application.
However, if the XFRM states corresponding to an XFRM policy's templates have
not been registered yet, it's possible for the current process to wait for
them on a wait queue.
But what if this function is being called in atomic context (e.g. softirq)?

Thanks in advance,
Eduardo Panisset.

