Netdev List
* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  4:22 UTC
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302581384.3603.14.camel@edumazet-laptop>

On Tuesday 12 April 2011 at 06:09 +0200, Eric Dumazet wrote:
> On Monday 11 April 2011 at 22:47 -0500, Scot Doyle wrote:
> > On 04/11/2011 08:31 PM, Stephen Hemminger wrote:
> > >
> > > It would help if you gave a little more context (like diff -up)
> > > next time.
> > >
> > > I think the correct fix is for the skb handed to ip_options_compile
> > > to match the layout expected by ip_options_compile.
> > >
> > > This patch is compile tested only, please validate.
> > >
> > >
> > > Subject: [PATCH] bridge: set pseudo-route table before calling ip_options_compile
> > >
> > > For some IP options, ip_options_compile assumes it can find the associated
> > > route table. The bridge to iptables code doesn't supply the necessary
> > > reference, causing a NULL dereference.
> > >
> > > Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
> > >
> > > ---
> > > Patch against net-next-2.6, but if validated should go to net-2.6
> > > and stable.
> > >
> > > --- a/net/bridge/br_netfilter.c	2011-04-11 18:18:22.534837859 -0700
> > > +++ b/net/bridge/br_netfilter.c	2011-04-11 18:25:15.427244826 -0700
> > > @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
> > >   	struct ip_options *opt;
> > >   	struct iphdr *iph;
> > >   	struct net_device *dev = skb->dev;
> > > +	struct rtable *rt;
> > >   	u32 len;
> > >
> > >   	iph = ip_hdr(skb);
> > > @@ -255,6 +256,14 @@ static int br_parse_ip_options(struct sk
> > >   		return 0;
> > >   	}
> > >
> > > +	/* Associate bogus bridge route table */
> > > +	rt = bridge_parent_rtable(dev);
> > > +	if (!rt) {
> > > +		kfree_skb(skb);
> > > +		return 0;
> > > +	}
> > > +	skb_dst_set(skb,&rt->dst);
> 
> Please try skb_dst_set_noref() here instead of skb_dst_set()
> 
> Or increment rt refcount.

Also, I would first check whether skb->dst is already set, so as not to leak a dst:

if (!skb->dst) {
	rt = bridge_parent_rtable(dev);
	if (!rt) {
		kfree_skb(skb);
		return 0;
	}
	skb_dst_set_noref(skb,&rt->dst);
}





* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12  4:40 UTC
  To: Eric Dumazet, Alexander Duyck, Peter Zijlstra; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302536577.4605.1.camel@edumazet-laptop>

Hi,
I found that the problem was introduced by this revert patch: "2010-08-13     Peter Zijlstra  sched: Revert nohz_ratelimit() for now"

I removed this patch from 2.6.35.2 and built the application again, and the ixgbe driver now appears to work fine.
I don't know why reverting nohz_ratelimit() causes the problem with the ixgbe driver, since nohz_ratelimit() was first introduced on 2010-03-11, and the 2.6.32 kernel, which predates it, does not have this problem with the ixgbe driver either.


Some log from git:
=========================================================================================
2.6.35.2
2010-08-13      Peter Zijlstra  sched: Revert nohz_ratelimit() for now
2.6.35.1
2010-08-01      Linus Torvalds  Linux 2.6.35 v2.6.35
2010-06-17      Peter Zijlstra  nohz: Fix nohz ratelimit
2.6.35-rc3
2010-03-11      Mike Galbraith  sched: Rate-limit nohz

Thanks
WeiGu

-----Original Message-----
From: Wei Gu
Sent: Tuesday, April 12, 2011 9:23 AM
To: 'Eric Dumazet'
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

I was not stressing the NIC/CPU: I only sent 290 Kpps of 400-byte packets towards eth10, and the CPU was almost 100% idle.

BTW, there are some problems with the perf tool on 2.6.35.2; I will try to get you the top offenders if possible.

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, April 11, 2011 11:43 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On Monday 11 April 2011 at 23:14 +0800, Wei Gu wrote:
> I tried ixgbe-3.3.8 (insmod ixgbe.ko RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3) from e1000.sf.net on both 2.6.35.1 and 2.6.35.2 and made the same observation as with the 3.2.10 ixgbe driver. On 2.6.35.2 it shows a high rx error count:
> ethtool -S eth10 | grep error
>      rx_errors: 0
>      tx_errors: 0
>      rx_over_errors: 0
>      rx_crc_errors: 0
>      rx_frame_errors: 0
>      rx_fifo_errors: 0
>      rx_missed_errors: 2263088
>      tx_aborted_errors: 0
>      tx_carrier_errors: 0
>      tx_fifo_errors: 0
>      tx_heartbeat_errors: 0
>      rx_long_length_errors: 0
>      rx_short_length_errors: 0
>      rx_csum_offload_errors: 0
>      fcoe_last_errors: 0
>

It would be nice if you could post perf record / perf report results.

During your stress test, run:

perf record -a -g sleep 10
perf report

and post the "top offenders".

Thanks




* [PATCH] net: davinci_emac: fix spinlock bug with dma channel cleanup
From: Sriramakrishnan A G @ 2011-04-12  4:42 UTC
  To: netdev; +Cc: davinci-linux-open-source, davem, Sriramakrishnan A G

The DMA cleanup function was holding the spinlock across
a busy loop where it waits for the HW to indicate that teardown is complete.
This generates a backtrace when DEBUG_SPINLOCK is enabled. Make the
locking more granular.

Signed-off-by: Sriramakrishnan A G <srk@ti.com>
---
 drivers/net/davinci_cpdma.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/davinci_cpdma.c b/drivers/net/davinci_cpdma.c
index ae47f23..57fd0fc 100644
--- a/drivers/net/davinci_cpdma.c
+++ b/drivers/net/davinci_cpdma.c
@@ -824,6 +824,8 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
 	/* trigger teardown */
 	dma_reg_write(ctlr, chan->td, chan->chan_num);
 
+	spin_unlock_irqrestore(&chan->lock, flags);
+
 	/* wait for teardown complete */
 	timeout = jiffies + HZ/10;	/* 100 msec */
 	while (time_before(jiffies, timeout)) {
@@ -843,6 +845,7 @@ int cpdma_chan_stop(struct cpdma_chan *chan)
 	} while ((ret & CPDMA_DESC_TD_COMPLETE) == 0);
 
 	/* remaining packets haven't been tx/rx'ed, clean them up */
+	spin_lock_irqsave(&chan->lock, flags);
 	while (chan->head) {
 		struct cpdma_desc __iomem *desc = chan->head;
 		dma_addr_t next_dma;
-- 
1.6.2.4
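
The locking change described in this patch can be illustrated with a small user-space sketch. It is not the driver code: `struct chan`, `teardown_done()` and `chan_stop()` are hypothetical stand-ins, and a pthread mutex takes the place of the channel spinlock. The point it tries to show is only the pattern of releasing the lock before polling for completion and retaking it before touching shared state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DMA channel; not the real struct cpdma_chan. */
struct chan {
	pthread_mutex_t lock;
	int pending;     /* descriptors still queued, protected by lock */
	int polls_left;  /* simulated hardware: polls until teardown completes */
};

/* Simulated register read: reports completion after a few polls. */
static bool teardown_done(struct chan *c)
{
	return --c->polls_left <= 0;
}

/* Returns the number of descriptors cleaned up, or -1 on timeout. */
static int chan_stop(struct chan *c, int max_polls)
{
	int cleaned = 0;

	pthread_mutex_lock(&c->lock);
	/* ... trigger teardown while holding the lock ... */
	pthread_mutex_unlock(&c->lock);  /* do not hold the lock while polling */

	while (!teardown_done(c)) {
		if (--max_polls <= 0)
			return -1;
	}

	pthread_mutex_lock(&c->lock);    /* retake it to touch shared state */
	while (c->pending > 0) {
		c->pending--;
		cleaned++;
	}
	pthread_mutex_unlock(&c->lock);
	return cleaned;
}
```

In this toy version the wait loop is bounded by a poll count rather than by jiffies, which is an assumption made only to keep the example self-contained.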



* RE: [PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
From: Amit Salecha @ 2011-04-12  4:48 UTC
  To: David Miller, Stephen Hemminger
  Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty
In-Reply-To: <20110411.155517.200362844.davem@davemloft.net>

> From: Amit Kumar Salecha <amit.salecha@qlogic.com>
> Date: Mon,  4 Oct 2010 08:14:51 -0700
>
> > Some of the counters are not implemented in fw.
> > Fw return NOT AVAILABLE VALUE as (0xffffffffffffffff).
> > Adding these counters, result in invalid value.
> >
> > Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
>
> Why are patches being posted from back in October 4, 2010?

My mail server is spamming mail; please ignore all of the emails below:

[PATCH NEXT 1/2] netxen: Notify firmware of Flex-10 interface down
[PATCHv2 NEXT 3/8] qlcnic: fix diag register
[PATCHv2 NEXT 8/8] qlcnic: set mtu lower limit
[PATCH NEXT 0/2]nexten: bug fixes
[PATCHv2 NEXT 2/8] qlcnic: fix eswitch stats
[PATCH NEXT 2/2] netxen: support for GbE port settings
[PATCHv2 NEXT 6/8] qlcnic: sparse warning fixes
[PATCHv2 NEXT 7/8] qlcnic: cleanup port mode setting
[PATCHv2 NEXT 5/8] qlcnic: fix vlan TSO on big endian machine

Sorry for the inconvenience caused to all.

-Amit




* [PATCH] driver/e1000e: Fix default interrupt mode select
From: Prabhakar Kushwaha @ 2011-04-12  4:56 UTC
  To: linuxppc-dev, linux.nics, auke-jan.h.kok, e1000-devel, netdev
  Cc: meet2prabhu, Prabhakar, Jin Qing

From: Prabhakar <prabhakar@freescale.com>

The Intel e1000e device driver defaults to MSI-X interrupt mode, even if MSI
support is not enabled.

Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git(branch master)

 added  netdev mail-list and e1000 mail-list & maintainer

 drivers/net/e1000e/param.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index a150e48..7b3bbec 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -390,7 +390,11 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
 			.type = range_option,
 			.name = "Interrupt Mode",
 			.err  = "defaulting to 2 (MSI-X)",
+#ifdef CONFIG_PCI_MSI
 			.def  = E1000E_INT_MODE_MSIX,
+#else
+			.def  = E1000E_INT_MODE_LEGACY,
+#endif
 			.arg  = { .r = { .min = MIN_INTMODE,
 					 .max = MAX_INTMODE } }
 		};
-- 
1.7.3
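
The compile-time selection made by this patch can be sketched in isolation as follows. The enum and `default_int_mode()` are hypothetical names for illustration; the numeric values are an assumption based on the "defaulting to 2 (MSI-X)" message in the diff, not a copy of the driver's definitions.

```c
#include <assert.h>

/* Hypothetical values for illustration; 2 matches the "2 (MSI-X)" message. */
enum int_mode {
	INT_MODE_LEGACY = 0,
	INT_MODE_MSI    = 1,
	INT_MODE_MSIX   = 2
};

/* Pick the default at build time, depending on whether MSI support exists. */
static enum int_mode default_int_mode(void)
{
#ifdef CONFIG_PCI_MSI
	return INT_MODE_MSIX;
#else
	return INT_MODE_LEGACY;
#endif
}
```

Built without `-DCONFIG_PCI_MSI`, the function returns the legacy mode; built with it, the MSI-X default is kept.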




* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-12  4:56 UTC
  To: Wei Gu; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E490995D@ESGSCCMS0001.eapac.ericsson.se>

On Tuesday 12 April 2011 at 12:40 +0800, Wei Gu wrote:
> Hi,
> I found the problem was introduced by this revert patch "2010-08-13
> Peter Zijlstra  sched: Revert nohz_ratelimit() for now"
> 
> I tried the remove this patch from 2.6.35.2 and then build the
> application again, then the ixgbe driver looks works fine.
> I don't know why this time revert the  nohz_ratelimit() will cause the
> problem on ixgbe driver, since this  nohz_ratelimit was first
> introduced "2010-03-11". And before that time with 2.6.32 kernel it
> also doesn't have this problem with ixgbe driver.
> 
> 
> Some log from git:
> =========================================================================================
> 2.6.35.2
> 2010-08-13      Peter Zijlstra  sched: Revert nohz_ratelimit() for now
> 2.6.35.1
> 2010-08-01      Linus Torvalds  Linux 2.6.35 v2.6.35
> 2010-06-17      Peter Zijlstra  nohz: Fix nohz ratelimit
> 2.6.35-rc3
> 2010-03-11      Mike Galbraith  sched: Rate-limit nohz
> 
> Thanks
> WeiGu
> 

Hmm...

Could you try to add "processor.max_cstate=1" to boot parameters ?





* Re: Race condition when creating multiple namespaces?
From: Hans Schillstrom @ 2011-04-12  4:56 UTC
  To: Eric W. Biederman; +Cc: netdev, Daniel Lezcano
In-Reply-To: <m1ei58co08.fsf@fess.ebiederm.org>


On Tuesday, April 12, 2011 02:27:35 Eric W. Biederman wrote:
> Hans Schillstrom <hans@schillstrom.com> writes:
> 
> > Hello
> > I've been struggling with this for some time now.
> >
> > When creating multiple namespaces using lxc-start, uninitialized parts of the network namespace are called by the new process in the namespace,
> > e.g. when using conntrack or ipvsadm too quickly (a "sleep 2" "solves" the problem).
> > (From what I can see, the clone() syscall is used in lxc-start, i.e. do_fork will be called later on.)
> > Actually, I was debugging ip_vs when closing multiple namespaces when I ran into this one.
> >
> > I have a loop that creates 33 containers with lxc-start ... -- test.sh
> > The first thing the new container does in test.sh is:
> > #!/bin/bash
> > iptables -t mangle -A PREROUTING -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
> > nc -l -p1234
> >
> > This results in a NULL pointer in ip_conntrack_net_init(struct net *net)
> 
> Ouch!
> 
> > and in another test, test.sh looks like this:
> > #!/bin/bash
> > ipvsadm --start-daemon=master --mcast-interface=lo
> > nc -l -p1234
> >
> > And this results in an uninitialized spinlock in ip_vs_sync
> >
> > I put a printk in nsproxy: copy_namespaces() and could see dozens of them
> > before anything appeared from ipvs or conntrack.
> >
> > My feeling is that when you start up user processes in a new namespace,
> > all kernel-related init should already have been done (you should not need to add a sleep to get it working).
> >
> > All tests were made using today's net-next-2.6 (2.6.39-rc1).
> >
> > Note:
> > Neither the conntrack nor the ip_vs modules were loaded;
> > if the modules were loaded before creating new namespaces, it all works...
> >
> > Finally, the question:
> > Should it really work to load modules within a namespace
> > that is a part of netns?
> 
> From an implementation point of view kernel modules are not in a
> namespace, so there should be no difference between being in a namespace
> and loading a kernel networking module and not being in a namespace and
> loading in a kernel module.
> 
> It does sound like you have hit a module loading race, and perhaps
> a race that is confined to network namespaces.
> 
> My head is in another problem so I won't be able to look at this for
> a bit.  But if you are getting into ip_conntrack_net_init with
> a NULL network namespace something spectacularly bad is happening.

OK I'll continue to dig into this.

> 
> In particular it looks like you must be hitting a bug in for_each_net.
> Which would pretty much have to be a race in adding or removing from
> net_namespace_list.

It was further down in proc_net_fops_create()

> 
> I took a quick skim through the code and whenever we modify the
> net_namespace we hold both the net_mutex and inside it the rtnl_lock so I
> don't immediately see how you could be getting a NULL net into
> ip_conntrack_net_init.

I have had the same problem in ip_vs a couple of times, but at the time I thought it was caused by my changes...
In the ip_vs case it looks more like a race or a missing lock: one core reaches a "not fully" initialized ipvs struct.
That could be my fault, e.g. a bad ordering when calling register_pernet_subsys...

> 
> Is there a codepath besides register_pernet_subsys that is calling
> ip_conntrack_net_init?

Not that I can see...
> 
> Do you have any local modifications that could be messing up register_pernet_subsys?

Not right now (I took them away, a clean git clone)

> 
> Eric
> 

I will continue with this today 

Thanks a lot
Hans


* Re: [PATCH] net: ipv4: add IPPROTO_ICMP socket kind
From: Solar Designer @ 2011-04-12  5:06 UTC
  To: Vasiliy Kulikov
  Cc: linux-kernel, netdev, Pavel Kankovsky, Kees Cook, Dan Rosenberg,
	Eugene Teo, Nelson Elhage, David S. Miller, Alexey Kuznetsov,
	Pekka Savola, James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <20110409101514.GA4262@albatros>

On Sat, Apr 09, 2011 at 02:15:14PM +0400, Vasiliy Kulikov wrote:
> This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
> ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
> without any special privileges.  In other words, the patch makes it
> possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
> order not to increase the kernel's attack surface (in case of
> vulnerabilities in the newly added code), the new functionality is
> disabled by default, but is enabled at bootup by supporting Linux
> distributions, optionally with restriction to a group or a group range
...
> For Openwall GNU/*/Linux it is the last step on the road to the
> setuid-less distro.

More correctly, it _was_ the last step - we've already taken it, so a
revision of the patch (against OpenVZ/RHEL5 kernels) is currently in use.

We would really like this accepted into mainline, which is why Vasiliy
spends extra effort to keep the patch updated to current mainline
kernels and re-test it.  If there are any comments/concerns/objections,
we'd be happy to hear those.

> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>

Acked-by: Solar Designer <solar@openwall.com>

>  include/net/netns/ipv4.h   |    2 +
>  include/net/ping.h         |   69 ++++
>  net/ipv4/Kconfig           |   21 +
>  net/ipv4/Makefile          |    1 +
>  net/ipv4/af_inet.c         |   36 ++
>  net/ipv4/icmp.c            |   14 +-
>  net/ipv4/ping.c            |  933 ++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/sysctl_net_ipv4.c |   90 +++++
>  8 files changed, 1165 insertions(+), 1 deletions(-)

Thanks,

Alexander


* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12  5:17 UTC
  To: Eric Dumazet; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302582172.3603.18.camel@edumazet-laptop>

On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> Also, I would first check if skb->dst already set to not leak a dst
>
> if (!skb->dst) {
> 	rt = bridge_parent_rtable(dev);
> 	if (!rt) {
> 		kfree_skb(skb);
> 		return 0;
> 	}
> 	skb_dst_set_noref(skb,&rt->dst);
> }

Thank you for the idea. Here is the compiler output referring to the 
first line above.

net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member 
named 'dst'



* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-12  5:18 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, Peter Zijlstra, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302584201.3603.20.camel@edumazet-laptop>

Hi,
It doesn't look any better after passing this parameter to the kernel:

  kernel /vmlinuz-2.6.35.2 ro root=UUID=e96f9df8-c28a-4ea8-ac26-64fbf948bce2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.iso88591 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=sv-latin1 crashkernel=auto pci=bfsort rhgb quiet console=tty0 console=ttyS0,115200 processor.max_cstate=1


-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Tuesday, April 12, 2011 12:57 PM
To: Wei Gu
Cc: Alexander Duyck; Peter Zijlstra; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On Tuesday 12 April 2011 at 12:40 +0800, Wei Gu wrote:
> Hi,
> I found the problem was introduced by this revert patch "2010-08-13
> Peter Zijlstra  sched: Revert nohz_ratelimit() for now"
>
> I tried the remove this patch from 2.6.35.2 and then build the
> application again, then the ixgbe driver looks works fine.
> I don't know why this time revert the  nohz_ratelimit() will cause the
> problem on ixgbe driver, since this  nohz_ratelimit was first
> introduced "2010-03-11". And before that time with 2.6.32 kernel it
> also doesn't have this problem with ixgbe driver.
>
>
> Some log from git:
> ======================================================================
> ===================
> 2.6.35.2
> 2010-08-13      Peter Zijlstra  sched: Revert nohz_ratelimit() for now
> 2.6.35.1
> 2010-08-01      Linus Torvalds  Linux 2.6.35 v2.6.35
> 2010-06-17      Peter Zijlstra  nohz: Fix nohz ratelimit
> 2.6.35-rc3
> 2010-03-11      Mike Galbraith  sched: Rate-limit nohz
>
> Thanks
> WeiGu
>

Hmm...

Could you try to add "processor.max_cstate=1" to boot parameters ?





* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  5:51 UTC
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3E074.5090603@scotdoyle.com>

On Tuesday 12 April 2011 at 00:17 -0500, Scot Doyle wrote:
> On 04/11/2011 11:22 PM, Eric Dumazet wrote:
> > Also, I would first check if skb->dst already set to not leak a dst
> >
> > if (!skb->dst) {

Oh well, sorry (not enough time these days to even test patches)

	if (!skb_dst(skb)) {

> > 	rt = bridge_parent_rtable(dev);
> > 	if (!rt) {
> > 		kfree_skb(skb);
> > 		return 0;
> > 	}
> > 	skb_dst_set_noref(skb,&rt->dst);
> > }
> 
> Thank you for the idea. Here is the compiler output referring to the 
> first line above.
> 
> net/bridge/br_netfilter.c: In function 'br_parse_ip_options':
> net/bridge/br_netfilter.c:260:10: error: 'struct sk_buff' has no member 
> named 'dst'
> 
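
The difference between a counted and a borrowed dst reference, and the reason the field is reached through an accessor rather than as `skb->dst`, can be sketched with a toy model. Everything below is hypothetical (`struct buf`, `struct dst`, the `buf_dst_*` helpers, the `DST_NOREF` bit); it assumes, as I understand the kernel to do, that the pointer and a "no reference held" flag share one word, and it is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DST_NOREF   ((uintptr_t)1)   /* low bit marks a borrowed pointer */
#define DST_PTRMASK (~DST_NOREF)

struct dst { int refcnt; };          /* toy stand-in for struct dst_entry */
struct buf { uintptr_t refdst; };    /* toy stand-in for struct sk_buff */

/* Accessor: strip the flag bit and return the plain pointer (or NULL). */
static struct dst *buf_dst(const struct buf *b)
{
	return (struct dst *)(b->refdst & DST_PTRMASK);
}

/* The caller hands over one reference it already holds. */
static void buf_dst_set(struct buf *b, struct dst *d)
{
	b->refdst = (uintptr_t)d;
}

/* Borrow the pointer without taking or consuming a reference. */
static void buf_dst_set_noref(struct buf *b, struct dst *d)
{
	b->refdst = (uintptr_t)d | DST_NOREF;
}

/* Detach the dst; only an owned reference is released. */
static void buf_dst_drop(struct buf *b)
{
	if (b->refdst && !(b->refdst & DST_NOREF))
		buf_dst(b)->refcnt--;
	b->refdst = 0;
}
```

In this model, attaching with `buf_dst_set()` when the caller never took a reference makes the later drop decrement a count that was never incremented, which is the imbalance the noref variant (or an explicit refcount increment) avoids.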




* Is __xfrm_lookup always on non-atomic context ?
From: Eduardo Panisset @ 2011-04-12  5:58 UTC
  To: netdev

Hi all,

I'm using XFRM for tunneling payload traffic in a Dual Stack Mobility application.
However, if the XFRM states corresponding to the XFRM policy's templates have
not been registered yet, the current process may wait for them on a
wait queue.
But what if this function is called in atomic context (e.g. softirq)?

Thanks in advance,
Eduardo Panisset.


* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-12  7:02 UTC
  To: Eric Dumazet, Stephen Hemminger; +Cc: Hiroaki SHIMODA, netdev
In-Reply-To: <1302587490.3603.22.camel@edumazet-laptop>

On 04/12/2011 12:51 AM, Eric Dumazet wrote:
>
> Oh well, sorry (not enough time these days to even test patches)
>
> 	if (!skb_dst(skb)) {

--- br_netfilter.c.a    2011-04-01 02:37:53.000000000 -0500
+++ br_netfilter.c.b    2011-04-12 00:29:00.000000000 -0500
@@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
      struct ip_options *opt;
      struct iphdr *iph;
      struct net_device *dev = skb->dev;
+    struct rtable *rt;
      u32 len;

      iph = ip_hdr(skb);
@@ -255,6 +256,16 @@ static int br_parse_ip_options(struct sk
          return 0;
      }

+    /* Associate bogus bridge route table */
+    if (!skb_dst(skb)) {
+        rt = bridge_parent_rtable(dev);
+        if (!rt) {
+            kfree_skb(skb);
+            return 0;
+        }
+        skb_dst_set_noref(skb,&rt->dst);
+    }
+
      opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
      if (ip_options_compile(dev_net(dev), opt, skb))
          goto inhdr_error;


Now we are making progress! With the patch above from Stephen and Eric, 
I cannot make the kernel panic when sending packets to the IP address of 
the bridge.

However, if a guest virtual machine is sharing the bridge with the host 
via a tap device, I can cause a host panic by targeting the IP address 
of the guest. Is this an unrelated problem?

Here are two kernel panics. The guest virtual machine was pingable 
before being attacked with IP Stack Checker's tcpsic command. Spanning 
Tree Protocol was off during the first panic and on during the second.

------------

[  606.921739] br0: port 2(tap0) entering forwarding state
[  636.058941] Kernel panic - not syncing: stack-protector: Kernel stack 
is corrupted in: ffffffff812c2781
[  636.058942]
[  636.069789] Pid: 2261, comm: kvm Tainted: G        W   2.6.39-rc2+ #11
[  636.076292] Call Trace:
[  636.078725] <IRQ>  [<ffffffff8132ad78>] ? panic+0x92/0x1a1
[  636.084287]  [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
[  636.091044]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[  636.096418]  [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
[  636.102221]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[  636.107595]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[  636.112883]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[  636.118172]  [<ffffffffa017b0d4>] ? br_flood+0xc8/0xc8 [bridge]
[  636.124065]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[  636.130302]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[  636.135850]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[  636.142089]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.148586]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
[  636.154826]  [<ffffffffa017b186>] ? NF_HOOK.clone.5+0x3c/0x56 [bridge]
[  636.161323]  [<ffffffffa017bfe1>] ? 
br_handle_frame_finish+0x158/0x1c7 [bridge]
[  636.168601]  [<ffffffffa0180689>] ? 
br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
[  636.176052]  [<ffffffffa017fc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
[  636.182463]  [<ffffffffa0180c84>] ? br_nf_pre_routing+0x3be/0x3cb 
[bridge]
[  636.189307]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[  636.194852]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[  636.200139]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.206637]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.213133]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[  636.218679]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.225177]  [<ffffffffa017bfe1>] ? 
br_handle_frame_finish+0x158/0x1c7 [bridge]
[  636.232455]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.238954]  [<ffffffffa017be6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[  636.245452]  [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[  636.251258]  [<ffffffffa017c1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[  636.257928]  [<ffffffffa017c050>] ? 
br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[  636.265204]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[  636.271443]  [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[  636.277335]  [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[  636.283139]  [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[  636.288865]  [<ffffffffa0241fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[  636.294673]  [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
[  636.301431]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  636.307930]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[  636.314168]  [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[  636.319800]  [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[  636.325346]  [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[  636.330807]  [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[  636.336092]  [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[  636.341204]  [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[  636.346146]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[  636.351949] <EOI>  [<ffffffff81271f58>] ? arch_local_irq_save+0x12/0x1b
[  636.358629]  [<ffffffff8100a9f2>] ? arch_local_irq_restore+0x2/0x8
[  636.364781]  [<ffffffff8127680d>] ? netif_rx_ni+0x1e/0x27
[  636.370154]  [<ffffffffa01557d2>] ? tun_get_user+0x3a3/0x3cb [tun]
[  636.376305]  [<ffffffffa0155bd8>] ? tun_get_socket+0x3b/0x3b [tun]
[  636.382457]  [<ffffffffa0155c36>] ? tun_chr_aio_write+0x5e/0x79 [tun]
[  636.388869]  [<ffffffff810f6b07>] ? do_sync_readv_writev+0x9a/0xd5
[  636.395021]  [<ffffffff810371f3>] ? need_resched+0x1a/0x23
[  636.400481]  [<ffffffff8132b725>] ? _cond_resched+0x9/0x20
[  636.405941]  [<ffffffff810f5f77>] ? copy_from_user+0x18/0x30
[  636.411573]  [<ffffffff8115fbf6>] ? security_file_permission+0x18/0x33
[  636.418068]  [<ffffffff810f6d55>] ? do_readv_writev+0xa4/0x11a
[  636.423873]  [<ffffffff810f7913>] ? fput+0x1a/0x1a2
[  636.428726]  [<ffffffff810f6f39>] ? sys_writev+0x45/0x90
[  636.434012]  [<ffffffff81332a52>] ? system_call_fastpath+0x16/0x1b

------------

[  110.442839] br0: port 2(tap0) entering forwarding state
[  136.948700] Kernel panic - not syncing: stack-protector: Kernel stack 
is corrupted in: ffffffff812c2781
[  136.948702]
[  136.959561] Pid: 1093, comm: md123_resync Not tainted 2.6.39-rc2+ #11
[  136.965977] Call Trace:
[  136.968408] <IRQ>  [<ffffffff8132ad78>] ? panic+0x92/0x1a1
[  136.973970]  [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
[  136.980727]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[  136.986102]  [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
[  136.991906]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
[  136.997281]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[  137.002570]  [<ffffffffa0198fe1>] ? 
br_handle_frame_finish+0x158/0x1c7 [bridge]
[  137.009847]  [<ffffffffa019d689>] ? 
br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
[  137.017297]  [<ffffffffa019cc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
[  137.023707]  [<ffffffffa019dc84>] ? br_nf_pre_routing+0x3be/0x3cb 
[bridge]
[  137.030551]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
[  137.035837]  [<ffffffff8103704d>] ? test_tsk_need_resched+0xe/0x17
[  137.041991]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  137.048488]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  137.054984]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
[  137.060531]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  137.067028]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
[  137.073526]  [<ffffffffa0198e6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
[  137.080023]  [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
[  137.085830]  [<ffffffffa01991e5>] ? br_handle_frame+0x195/0x1ac [bridge]
[  137.092500]  [<ffffffffa0199050>] ? 
br_handle_frame_finish+0x1c7/0x1c7 [bridge]
[  137.099776]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
[  137.106013]  [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
[  137.111906]  [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
[  137.117713]  [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
[  137.123438]  [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
[  137.129243]  [<ffffffff8109034f>] ? handle_irq_event+0x40/0x55
[  137.135049]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[  137.140854]  [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
[  137.146487]  [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
[  137.152034]  [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
[  137.157494]  [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
[  137.162779]  [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
[  137.167893]  [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
[  137.172833]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
[  137.178636] <EOI>  [<ffffffff8106fc1a>] ? arch_local_irq_restore+0x2/0x8
[  137.185408]  [<ffffffffa0050fca>] ? _scsih_qcmd+0x54f/0x561 [mpt2sas]
[  137.191823]  [<ffffffffa01e452f>] ? scsi_dispatch_cmd+0x180/0x219 
[scsi_mod]
[  137.198841]  [<ffffffffa01ea385>] ? scsi_request_fn+0x3e6/0x413 
[scsi_mod]
[  137.205683]  [<ffffffff81187470>] ? elv_rqhash_add.clone.15+0x26/0x4c
[  137.212095]  [<ffffffff8118bde2>] ? __blk_run_queue+0x5e/0x84
[  137.217814]  [<ffffffff8118d63c>] ? __make_request+0x273/0x28f
[  137.223619]  [<ffffffff8118b569>] ? generic_make_request+0x267/0x2e1
[  137.229943]  [<ffffffff8105eb49>] ? remove_wait_queue+0x11/0x4d
[  137.235837]  [<ffffffffa0002417>] ? raise_barrier+0x162/0x16f [raid1]
[  137.242246]  [<ffffffff8103eba4>] ? try_to_wake_up+0x17c/0x17c
[  137.248052]  [<ffffffffa0002f2f>] ? sync_request+0x567/0x583 [raid1]
[  137.254379]  [<ffffffffa00bd834>] ? md_do_sync+0x776/0xb8e [md_mod]
[  137.260617]  [<ffffffff8100e537>] ? sched_clock+0x5/0x8
[  137.265819]  [<ffffffffa00bde83>] ? md_thread+0xfa/0x118 [md_mod]
[  137.271886]  [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
[  137.278124]  [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
[  137.284362]  [<ffffffff8105e497>] ? kthread+0x7a/0x82
[  137.289390]  [<ffffffff81333b64>] ? kernel_thread_helper+0x4/0x10
[  137.295454]  [<ffffffff8105e41d>] ? kthread_worker_fn+0x149/0x149
[  137.301519]  [<ffffffff81333b60>] ? gs_change+0x13/0x13



* [PATCH NET-2.6 1/1] qlcnic: limit skb frags for non tso packet
From: Amit Kumar Salecha @ 2011-04-12  7:19 UTC
  To: davem; +Cc: netdev, anirban.chakraborty, stable, ameen.rahman
In-Reply-To: <1302592781-13881-1-git-send-email-amit.salecha@qlogic.com>

Machines deadlock in a four-node cluster environment.
All nodes are accessing (find /gfs2 -depth -print|cpio -ocv > /dev/null)
200 GB of storage on a GFS2 filesystem.
This results in memory fragmentation, and the driver receives 18 frags for
1448-byte packets.
For a non-TSO packet, the firmware drops the tx request if it has more than 14 frags.

Fix this by pulling the extra frags.

Cc: stable@kernel.org
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 +
 drivers/net/qlcnic/qlcnic_main.c |   14 ++++++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index dc44564..b0dead0 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -99,6 +99,7 @@
 #define TX_UDPV6_PKT	0x0c
 
 /* Tx defines */
+#define QLCNIC_MAX_FRAGS_PER_TX	14
 #define MAX_TSO_HEADER_DESC	2
 #define MGMT_CMD_DESC_RESV	4
 #define TX_STOP_THRESH		((MAX_SKB_FRAGS >> 2) + MAX_TSO_HEADER_DESC \
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index cd88c7e..cb1a1ef 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2099,6 +2099,7 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	struct cmd_desc_type0 *hwdesc, *first_desc;
 	struct pci_dev *pdev;
 	struct ethhdr *phdr;
+	int delta = 0;
 	int i, k;
 
 	u32 producer;
@@ -2118,6 +2119,19 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	}
 
 	frag_count = skb_shinfo(skb)->nr_frags + 1;
+	/* 14 frags supported for normal packet and
+	 * 32 frags supported for TSO packet
+	 */
+	if (!skb_is_gso(skb) && frag_count > QLCNIC_MAX_FRAGS_PER_TX) {
+
+		for (i = 0; i < (frag_count - QLCNIC_MAX_FRAGS_PER_TX); i++)
+			delta += skb_shinfo(skb)->frags[i].size;
+
+		if (!__pskb_pull_tail(skb, delta))
+			goto drop_packet;
+
+		frag_count = 1 + skb_shinfo(skb)->nr_frags;
+	}
 
 	/* 4 fragments per cmd des */
 	no_of_desc = (frag_count + 3) >> 2;
-- 
1.7.3.2
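
For readers following the arithmetic, here is a minimal userspace sketch of
the calculation in the hunk above. It is not driver code: bytes_to_pull() and
descs_needed() are hypothetical helper names, and a plain array of fragment
sizes stands in for skb_shinfo(skb)->frags[].

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS_PER_TX 14	/* mirrors QLCNIC_MAX_FRAGS_PER_TX in the patch */

/*
 * Bytes that would have to be pulled into the linear area so that a
 * non-TSO packet fits the 14-frag limit. As in the patch, frag_count is
 * the number of page fragments plus one for the linear area.
 * Returns 0 when the packet already fits.
 */
unsigned int bytes_to_pull(const unsigned int *frag_size, unsigned int nr_frags)
{
	unsigned int frag_count = nr_frags + 1;
	unsigned int delta = 0;
	unsigned int i;

	assert(frag_size != NULL || nr_frags == 0);

	if (frag_count <= MAX_FRAGS_PER_TX)
		return 0;

	/* Sum the sizes of the leading fragments that exceed the limit. */
	for (i = 0; i < frag_count - MAX_FRAGS_PER_TX; i++)
		delta += frag_size[i];

	return delta;
}

/* Four fragments fit in one command descriptor, so round up. */
unsigned int descs_needed(unsigned int frag_count)
{
	return (frag_count + 3) >> 2;
}
```

With 18 page fragments of 100 bytes each, frag_count is 19, so the first five
fragments (500 bytes) would be pulled.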

^ permalink raw reply related

* [PATCH NET-2.6 0/1]qlcnic: bug fix
From: Amit Kumar Salecha @ 2011-04-12  7:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty

David,
	Please apply this fix to the net-2.6 tree.
	This patch will cause a hunk failure when merging into the net-next tree.
	I can't avoid it: the two lines below the diff have changed in qlcnic_xmit_frame().

-Amit

^ permalink raw reply

* Re: [PATCH v2] net: bnx2x: convert to hw_features
From: Vladislav Zolotarov @ 2011-04-12  7:26 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <20110411201225.GA9249@rere.qmqm.pl>

On Mon, 2011-04-11 at 13:12 -0700, Michał Mirosław wrote:
> The v3 patch fixes missing LRO flag and ensures that netdev_update_features()
> won't be called after failed bnx2x_nic_load(). More comments below.

Since there is already a v4, I'll comment on that and skip v3. See a few
replies to your comments below. ;)

> 
> On Mon, Apr 11, 2011 at 05:10:21PM +0300, Vladislav Zolotarov wrote:
> > On Sun, 2011-04-10 at 08:35 -0700, Michał Mirosław wrote:
> > > Since ndo_fix_features callback is postponing features change when
> > > bp->recovery_state != BNX2X_RECOVERY_DONE, netdev_update_features()
> > > has to be called again when this condition changes.
> > Unfortunately, NACK again. See below, pls.
> [...]
> > > diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
> > > index e83ac6d..9691b67 100644
> > > --- a/drivers/net/bnx2x/bnx2x_cmn.c
> > > +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> > > @@ -2443,11 +2443,21 @@ alloc_err:
> > >  
> > >  }
> > >  
> > > +static int bnx2x_reload_if_running(struct net_device *dev)
> > > +{
> > > +	struct bnx2x *bp = netdev_priv(dev);
> > > +
> > > +	if (unlikely(!netif_running(dev)))
> > > +		return 0;
> > > +
> > > +	bnx2x_nic_unload(bp, UNLOAD_NORMAL);
> > > +	return bnx2x_nic_load(bp, LOAD_NORMAL);
> > > +}
> > > +
> > >  /* called with rtnl_lock */
> > >  int bnx2x_change_mtu(struct net_device *dev, int new_mtu)
> > >  {
> [...]
> > > +u32 bnx2x_fix_features(struct net_device *dev, u32 features)
> > > +{
> > > +	struct bnx2x *bp = netdev_priv(dev);
> > > +
> > > +	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
> > > +		netdev_err(dev, "Handling parity error recovery. Try again later\n");
> > > +
> > > +		/* Don't allow bnx2x_set_features() to be called now. */
> > > +		return dev->features;
> > > +	}
> > > +
> > > +	/* TPA requires Rx CSUM offloading */
> > > +	if (!(features & NETIF_F_RXCSUM) || bp->disable_tpa)
> > > +		features &= ~NETIF_F_LRO;
> > Shouldn't it be (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM) and not
> > NETIF_F_RXCSUM?
> [...]
> > In addition this function should ensure NETIF_F_IP_CSUM and
> > NETIF_F_IPV6_CSUM are changed together.
> [...]
> > > +int bnx2x_set_features(struct net_device *dev, u32 features)
> [...]
> > Since there is no set_rx_csum() anymore the above function has to handle
> > bp->rx_csum namely correlate it with (NETIF_F_IP_CSUM |
> > NETIF_F_IPV6_CSUM) bits in the 'features'. 
> 
> You seem to confuse TX checksum offloads (IP_CSUM,IPV6_CSUM) with
> RX checksum offload (RXCSUM).

You are right. My bad. However, you forgot to add RXCSUM to hw_features in v2,
but I see it is fixed in v4.

> 
> The driver doesn't touch hardware state on changes to checksum offloads
> so they are independent - there's no point in adding artificial
> dependencies here.

Regarding the Tx csum offloads you are right, but this is not true for
the Rx csum offload, and this is what I meant above. I see that v4
handles it properly now. Sorry for the confusion.

> 
> [...]
> > > diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
> > > index f3cf889..ffa0611 100644
> > > --- a/drivers/net/bnx2x/bnx2x_main.c
> > > +++ b/drivers/net/bnx2x/bnx2x_main.c
> > > @@ -7661,6 +7661,7 @@ exit_leader_reset:
> > >  	bp->is_leader = 0;
> > >  	bnx2x_release_hw_lock(bp, HW_LOCK_RESOURCE_RESERVED_08);
> > >  	smp_wmb();
> > > +	netdev_update_features(bp->dev);
> > >  	return rc;
> > >  }
> > 
> > Before I continue I'd like to clarify one thing: there is no sense to
> > call for netdev_update_features() if bnx2x_nic_load(), called right
> > before it, has failed as long as the following bnx2x_nic_load() that
> > will be called from the netdev_update_features() flow will also fail
> > (for the same reasons as the previous one). If bnx2x_nic_load() fails
> > for the certain NIC we actually shut this NIC down. So, the following
> > remarks will be based on the above statement.
> 
> In all those cases, bnx2x_reload_if_running() will be called only when
> LRO state is changed while there's a recovery in progress.

Hmmm... And what about all the other features in hw_features? What if they
have changed (in wanted_features) while recovery was in progress?
According to the __netdev_update_features() code, it will invoke
ndo_set_features() in these cases as well. Am I missing something here?

> 
> [...]
> > U shouldn't call for netdev_update_features(bp->dev) if bnx2x_nic_load()
> > has failed. It would also be nice if netdev_update_features() would
> > propagate the exit status of ndo_set_features() when ndo_set_features()
> > fails in the __netdev_update_features().
> 
> That's fixed in v3.

Not everything. See below.

> 
> > See the patch for the bnx2x below:
> > 
> > @@ -8993,7 +8995,14 @@ static int bnx2x_open(struct net_device *dev)
> >  
> >         bp->recovery_state = BNX2X_RECOVERY_DONE;
> >  
> > -       return bnx2x_nic_load(bp, LOAD_OPEN);
> > +       rc = bnx2x_nic_load(bp, LOAD_OPEN);
> > +       if (!rc)
> > +               netdev_update_features(bp->dev);
> > +
> > +       if (bp->state == BNX2X_STATE_OPEN)
> > +               return 0;
> > +       else
> > +               return -EBUSY;
> >  }
> 
> Hmm. I missed this part in the v3 patch. This clobbers bnx2x_nic_load()'s
> error return, though.

Exactly! Quoting my remark above: "It would also be nice if
netdev_update_features() would propagate the exit status of
ndo_set_features() when ndo_set_features() fails in the
__netdev_update_features()." Could you comment on this, please?

> 
> > >  /* called with rtnl_lock */
> > > @@ -9304,6 +9309,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
> > >  	.ndo_validate_addr	= eth_validate_addr,
> > >  	.ndo_do_ioctl		= bnx2x_ioctl,
> > >  	.ndo_change_mtu		= bnx2x_change_mtu,
> > > +	.ndo_fix_features	= bnx2x_fix_features,
> > > +	.ndo_set_features	= bnx2x_set_features,
> > >  	.ndo_tx_timeout		= bnx2x_tx_timeout,
> > >  #ifdef CONFIG_NET_POLL_CONTROLLER
> > >  	.ndo_poll_controller	= poll_bnx2x,
> > > @@ -9430,20 +9437,18 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
> > >  
> > >  	dev->netdev_ops = &bnx2x_netdev_ops;
> > >  	bnx2x_set_ethtool_ops(dev);
> > > -	dev->features |= NETIF_F_SG;
> > > -	dev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> > > +
> > >  	if (bp->flags & USING_DAC_FLAG)
> > >  		dev->features |= NETIF_F_HIGHDMA;
> > > -	dev->features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> > > -	dev->features |= NETIF_F_TSO6;
> > > -	dev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
> > >  
> > > -	dev->vlan_features |= NETIF_F_SG;
> > > -	dev->vlan_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> > > -	if (bp->flags & USING_DAC_FLAG)
> > > -		dev->vlan_features |= NETIF_F_HIGHDMA;
> > > -	dev->vlan_features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> > > -	dev->vlan_features |= NETIF_F_TSO6;
> > > +	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> > > +		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
> > > +		NETIF_F_HW_VLAN_TX;
> > hw_features are missing NETIF_F_GRO and NETIF_F_LRO flags that are
> > currently configured in bnx2x_init_bp(). 
> 
> GRO is enabled by core now. LRO is fixed in v3.

Got it. Thanks.

> 
> > > +	dev->features |= dev->hw_features | NETIF_F_HW_VLAN_RX;
> > > +
> > > +	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> > > +		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA;
> > I'm not sure if it's safe to set NETIF_F_HIGHDMA unconditionally. I
> > think it's better to correlate it with the USING_DAC_FLAG which is set
> > according to what is returned by 
> > dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)).
> 
> dev->vlan_features get masked with dev->features and only then applied
> to VLAN device.

OK. However, could you please add the above sentence of yours as a comment
for this code line? ;)

See my further comments for v4.

thanks,
vlad

> 
> Best Regards,
> Michał Mirosław

^ permalink raw reply

* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12  7:31 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <4DA3F909.5020609@scotdoyle.com>

On Tuesday, 12 April 2011 at 02:02 -0500, Scot Doyle wrote:
> On 04/12/2011 12:51 AM, Eric Dumazet wrote:
> >
> > Oh well, sorry (not enough time these days to even test patches)
> >
> > 	if (!skb_dst(skb)) {
> 
> --- br_netfilter.c.a    2011-04-01 02:37:53.000000000 -0500
> +++ br_netfilter.c.b    2011-04-12 00:29:00.000000000 -0500
> @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
>       struct ip_options *opt;
>       struct iphdr *iph;
>       struct net_device *dev = skb->dev;
> +    struct rtable *rt;
>       u32 len;
> 
>       iph = ip_hdr(skb);
> @@ -255,6 +256,16 @@ static int br_parse_ip_options(struct sk
>           return 0;
>       }
> 
> +    /* Associate bogus bridge route table */
> +    if (!skb_dst(skb)) {
> +        rt = bridge_parent_rtable(dev);
> +        if (!rt) {
> +            kfree_skb(skb);
> +            return 0;
> +        }
> +        skb_dst_set_noref(skb,&rt->dst);
> +    }
> +
>       opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
>       if (ip_options_compile(dev_net(dev), opt, skb))
>           goto inhdr_error;
> 
> 
> Now we are making progress! With the patch above from Stephen and Eric, 
> I cannot make the kernel panic when sending packets to the IP address of 
> the bridge.
> 
> However, if a guest virtual machine is sharing the bridge with the host 
> via a tap device, I can cause a host panic by targeting the IP address 
> of the guest. Is this an unrelated problem?
> 
> Here are two kernel panics. The guest virtual machine was pingable 
> before being attacked with IP Stack Checker's tcpsic command. Spanning 
> Tree Protocol was off during the first panic and on during the second.
> 


I wonder if you are not running out of free stack space...

And it might be because of inet_getpeer() calling cleanup_once()

# objdump64 -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x0317 cleanup_once [inetpeer.o]:			344
0x03d6 cleanup_once [inetpeer.o]:			344
0x0680 inet_getpeer [inetpeer.o]:			344
0x071d inet_getpeer [inetpeer.o]:			344
0x0004 inet_initpeers [inetpeer.o]:			112



^ permalink raw reply

* Re: [PATCH v2] net: bnx2x: convert to hw_features
From: Vladislav Zolotarov @ 2011-04-12  7:46 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <1302593208.32697.18.camel@lb-tlvb-vladz>


> > In all those cases, bnx2x_reload_if_running() will be called only when
> > LRO state is changed while there's a recovery in progress.
> 
> Hmmm... And what about all other features from hw_features? What if they
> have changed (in wanted_features) while recovery was in progress?
> According to the __netdev_update_features() code it will invoke
> ndo_set_features() in these cases either. Do I miss something here?

I think I understand what you meant. So, yes: bnx2x_nic_load() is
called only if TPA_ENABLED_FLAG in bp->flags has changed. And this can
happen if either NETIF_F_LRO has changed while NETIF_F_RXCSUM was set or
if NETIF_F_LRO was set and NETIF_F_RXCSUM is being cleared.
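
To make the two cases concrete, here is a small standalone sketch. It is only
an illustration, not the driver code: F_RXCSUM and F_LRO are made-up bit
values rather than the kernel's NETIF_F_* constants, and fix_features() and
needs_reload() are hypothetical names that mimic the rule "LRO is dropped
unless Rx csum is on and TPA is not disabled".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's NETIF_F_* values. */
#define F_RXCSUM	(1u << 0)
#define F_LRO		(1u << 1)

/* LRO (TPA) needs Rx checksum offload, so mask it out when that is missing. */
uint32_t fix_features(uint32_t wanted, int disable_tpa)
{
	assert(disable_tpa == 0 || disable_tpa == 1);

	if (!(wanted & F_RXCSUM) || disable_tpa)
		wanted &= ~F_LRO;
	return wanted;
}

/* Nonzero when the effective LRO state differs, i.e. a reload would be needed. */
int needs_reload(uint32_t old_features, uint32_t new_features)
{
	return ((old_features ^ new_features) & F_LRO) != 0;
}
```

Under these assumptions, toggling LRO while Rx csum is set changes the
effective LRO bit, and so does clearing Rx csum while LRO is set. Toggling
LRO with Rx csum off changes nothing, because the bit is masked out both
before and after.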

thanks,
vlad



^ permalink raw reply

* Re: [RFC] iproute2: Fix meta match u32 with 0xffffffff
From: Thomas Graf @ 2011-04-12  7:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110411115234.74b5936b@nehalam>

On Mon, 2011-04-11 at 11:52 -0700, Stephen Hemminger wrote: 
> The value 0xffffffff is a valid mask and bstrtoul() would return
> ULONG_MAX which was the error value. Resolve the problem by separating
> return value and error indication.
>  
> -unsigned long bstrtoul(const struct bstr *b)
> +int bstrtoul(const struct bstr *b, unsigned long *lp)
>  {
>  	char *inv = NULL;
> -	unsigned long l;
>  	char buf[b->len+1];
>  
> +	if (b->len == 0)
> +		return -EINVAL;
> +
>  	memcpy(buf, b->data, b->len);
>  	buf[b->len] = '\0';
>  
> -	l = strtoul(buf, &inv, 0);
> -	if (l == ULONG_MAX || inv == buf)
> -		return ULONG_MAX;
> +	*lp = strtoul(buf, &inv, 0);
> +	if (inv == buf)
> +		return -EINVAL;
> +
> +	if (*lp == ULONG_MAX || errno == ERANGE)
> +		return -ERANGE;
>  
> -	return l;
> +	return 0;
>  }

This is definitely much better, but we still can't parse the string
representation of ULONG_MAX. According to the glibc docs, the only way to
do it is to ignore the return value for error checking and look at errno.

So I guess we should do something like this:

	errno = 0;
	*lp = strtoul(buf, &inv, 0);
	if (*inv != '\0')
		return -EINVAL;
	else if (errno)
		return -errno;

	return 0;
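
For anyone who wants to try this outside of iproute2, here is a standalone
sketch of the same idea. parse_ulong() is a hypothetical name and it takes a
NUL-terminated string instead of a struct bstr. It also returns negative
errno values, to match the -EINVAL convention of the patch, and rejects
input with no digits at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse buf as an unsigned long (base auto-detected, as with base 0).
 * Returns 0 on success, -EINVAL if buf is not entirely a number, or
 * -ERANGE if strtoul() reported overflow through errno. The returned
 * value itself is never used for error detection, so ULONG_MAX parses
 * like any other number.
 */
int parse_ulong(const char *buf, unsigned long *lp)
{
	char *inv = NULL;

	assert(buf != NULL && lp != NULL);

	errno = 0;
	*lp = strtoul(buf, &inv, 0);
	if (inv == buf || *inv != '\0')
		return -EINVAL;
	if (errno)
		return -errno;

	return 0;
}
```

Note that strtoul() itself accepts a leading minus sign, so a caller that
must reject negative input would need an extra check.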


^ permalink raw reply

* [PATCH NET-2.6 1/1] qlcnic: limit skb frags for non tso packet
From: Amit Kumar Salecha @ 2011-04-12  8:15 UTC (permalink / raw)
  To: netdev; +Cc: stable

Machines deadlock in a four-node cluster environment.
All nodes are accessing (find /gfs2 -depth -print|cpio -ocv > /dev/null)
200 GB of storage on a GFS2 filesystem.
This results in memory fragmentation, and the driver receives 18 frags for
1448-byte packets.
For a non-TSO packet, the firmware drops the tx request if it has more than
14 frags.

Fix it by pulling the extra frags.

Cc: stable@kernel.org
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 +
 drivers/net/qlcnic/qlcnic_main.c |   14 ++++++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index dc44564..b0dead0 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -99,6 +99,7 @@
 #define TX_UDPV6_PKT	0x0c
 
 /* Tx defines */
+#define QLCNIC_MAX_FRAGS_PER_TX	14
 #define MAX_TSO_HEADER_DESC	2
 #define MGMT_CMD_DESC_RESV	4
 #define TX_STOP_THRESH		((MAX_SKB_FRAGS >> 2) + MAX_TSO_HEADER_DESC \
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index cd88c7e..cb1a1ef 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2099,6 +2099,7 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	struct cmd_desc_type0 *hwdesc, *first_desc;
 	struct pci_dev *pdev;
 	struct ethhdr *phdr;
+	int delta = 0;
 	int i, k;
 
 	u32 producer;
@@ -2118,6 +2119,19 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	}
 
 	frag_count = skb_shinfo(skb)->nr_frags + 1;
+	/* 14 frags supported for normal packet and
+	 * 32 frags supported for TSO packet
+	 */
+	if (!skb_is_gso(skb) && frag_count > QLCNIC_MAX_FRAGS_PER_TX) {
+
+		for (i = 0; i < (frag_count - QLCNIC_MAX_FRAGS_PER_TX); i++)
+			delta += skb_shinfo(skb)->frags[i].size;
+
+		if (!__pskb_pull_tail(skb, delta))
+			goto drop_packet;
+
+		frag_count = 1 + skb_shinfo(skb)->nr_frags;
+	}
 
 	/* 4 fragments per cmd des */
 	no_of_desc = (frag_count + 3) >> 2;
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH] inetpeer: reduce stack usage
From: Eric Dumazet @ 2011-04-12  8:39 UTC (permalink / raw)
  To: Scot Doyle, David Miller; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev
In-Reply-To: <1302593469.3603.44.camel@edumazet-laptop>

On 64-bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Let's share the AVL stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl 
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Scot Doyle <lkml@scotdoyle.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
---
 net/ipv4/inetpeer.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index dd1b20e..9df4e63 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -354,7 +354,8 @@ static void inetpeer_free_rcu(struct rcu_head *head)
 }
 
 /* May be called with local BH enabled. */
-static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
+static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base,
+			     struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {
 	int do_free;
 
@@ -368,7 +369,6 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
 	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
 	 */
 	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
-		struct inet_peer __rcu **stack[PEER_MAXDEPTH];
 		struct inet_peer __rcu ***stackptr, ***delp;
 		if (lookup(&p->daddr, stack, base) != p)
 			BUG();
@@ -422,7 +422,7 @@ static struct inet_peer_base *peer_to_base(struct inet_peer *p)
 }
 
 /* May be called with local BH enabled. */
-static int cleanup_once(unsigned long ttl)
+static int cleanup_once(unsigned long ttl, struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {
 	struct inet_peer *p = NULL;
 
@@ -454,7 +454,7 @@ static int cleanup_once(unsigned long ttl)
 		 * happen because of entry limits in route cache. */
 		return -1;
 
-	unlink_from_pool(p, peer_to_base(p));
+	unlink_from_pool(p, peer_to_base(p), stack);
 	return 0;
 }
 
@@ -524,7 +524,7 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 
 	if (base->total >= inet_peer_threshold)
 		/* Remove one less-recently-used entry. */
-		cleanup_once(0);
+		cleanup_once(0, stack);
 
 	return p;
 }
@@ -540,6 +540,7 @@ static void peer_check_expire(unsigned long dummy)
 {
 	unsigned long now = jiffies;
 	int ttl, total;
+	struct inet_peer __rcu **stack[PEER_MAXDEPTH];
 
 	total = compute_total();
 	if (total >= inet_peer_threshold)
@@ -548,7 +549,7 @@ static void peer_check_expire(unsigned long dummy)
 		ttl = inet_peer_maxttl
 				- (inet_peer_maxttl - inet_peer_minttl) / HZ *
 					total / inet_peer_threshold * HZ;
-	while (!cleanup_once(ttl)) {
+	while (!cleanup_once(ttl, stack)) {
 		if (jiffies != now)
 			break;
 	}
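
The pattern used in this patch, where the caller owns one scratch array and
lends it to its callees instead of each function declaring its own, can be
sketched in userspace as below. This is an illustration under assumed names
only: struct node, MAXDEPTH, lookup_path() and contains() are hypothetical
and use a plain binary search tree, not the inetpeer AVL code.

```c
#include <assert.h>
#include <stddef.h>

#define MAXDEPTH 40	/* hypothetical bound, standing in for PEER_MAXDEPTH */

struct node {
	int key;
	struct node *left, *right;
};

/*
 * Walk from *root towards key, recording every link visited in the
 * caller-supplied stack. Returns the number of links recorded, or 0 if
 * the key is absent or the tree is deeper than MAXDEPTH.
 */
size_t lookup_path(struct node **root, int key, struct node **stack[MAXDEPTH])
{
	struct node **link = root;
	size_t depth = 0;

	assert(root != NULL && stack != NULL);

	while (*link && depth < MAXDEPTH) {
		stack[depth++] = link;
		if ((*link)->key == key)
			return depth;
		link = key < (*link)->key ? &(*link)->left : &(*link)->right;
	}
	return 0;
}

/* The callee borrows the caller's scratch array rather than declaring one. */
int contains(struct node **root, int key, struct node **stack[MAXDEPTH])
{
	return lookup_path(root, key, stack) != 0;
}
```

Only the outermost caller pays for the array on its stack frame; nested calls
pass the pointer along, which is where the saving comes from.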



^ permalink raw reply related

* Re: Loopback and Nagle's algorithm
From: Alejandro Riveira Fernández @ 2011-04-12  9:42 UTC (permalink / raw)
  To: Adam McLaurin; +Cc: linux-kernel, netdev
In-Reply-To: <1302575869.13492.1440076201@webmail.messagingengine.com>

On Mon, 11 Apr 2011 22:37:49 -0400
"Adam McLaurin" <lkml@irotas.net> wrote:

 Just CCing netdev

> I understand that disabling Nagle's algorithm via TCP_NODELAY will
> generally degrade throughput. However, in my scenario (150 byte
> messages, sending as fast as possible), the actual throughput penalty
> over the network is marginal (maybe 10% at most).
> 
> However, when I disable Nagle's algorithm when connecting over loopback,
> the performance hit is *huge* - 10x reduction in throughput.
> 
> The question is, why is disabling Nagle's algorithm on loopback so much
> worse w.r.t. throughput? Is there anything I can do to reduce the
> incurred throughput penalty?
> 
> Thanks,
> Adam
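
For context, this is roughly how Nagle's algorithm is turned off on a socket.
It is a minimal sketch using the standard setsockopt()/getsockopt() calls;
set_nodelay() and get_nodelay() are hypothetical wrapper names, and it does
not attempt to reproduce or explain the loopback throughput numbers above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable (on = 1) or disable (on = 0) TCP_NODELAY; returns 0 on success. */
int set_nodelay(int fd, int on)
{
	assert(on == 0 || on == 1);
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
	int on = 0;
	socklen_t len = sizeof(on);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) < 0)
		return -1;
	return on != 0;
}
```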

^ permalink raw reply

* [PATCH] net: Do not wrap sysctl igmp_max_memberships in IP_MULTICAST
From: Joakim Tjernlund @ 2011-04-12  9:49 UTC (permalink / raw)
  To: netdev; +Cc: Joakim Tjernlund

Controlling igmp_max_memberships is useful even when IP_MULTICAST
is off.
Quagga (an OSPF daemon) uses multicast addresses for all interfaces
using a single socket and hits the igmp_max_memberships limit when
there are 20 interfaces or more.
Always export sysctl igmp_max_memberships in proc, just like
igmp_max_msf.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 net/ipv4/sysctl_net_ipv4.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d96c1da..9cc2824 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -306,7 +306,6 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_do_large_bitmap,
 	},
-#ifdef CONFIG_IP_MULTICAST
 	{
 		.procname	= "igmp_max_memberships",
 		.data		= &sysctl_igmp_max_memberships,
@@ -314,8 +313,6 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
-
-#endif
 	{
 		.procname	= "igmp_max_msf",
 		.data		= &sysctl_igmp_max_msf,
-- 
1.7.3.4


^ permalink raw reply related

* Re: Kernel panic when using bridge
From: Eric Dumazet @ 2011-04-12 11:49 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Stephen Hemminger, Hiroaki SHIMODA, netdev, Jan Luebbe
In-Reply-To: <4DA3F909.5020609@scotdoyle.com>

On Tuesday, 12 April 2011 at 02:02 -0500, Scot Doyle wrote:
> On 04/12/2011 12:51 AM, Eric Dumazet wrote:
> >
> > Oh well, sorry (not enough time these days to even test patches)
> >
> > 	if (!skb_dst(skb)) {
> 
> --- br_netfilter.c.a    2011-04-01 02:37:53.000000000 -0500
> +++ br_netfilter.c.b    2011-04-12 00:29:00.000000000 -0500
> @@ -221,6 +221,7 @@ static int br_parse_ip_options(struct sk
>       struct ip_options *opt;
>       struct iphdr *iph;
>       struct net_device *dev = skb->dev;
> +    struct rtable *rt;
>       u32 len;
> 
>       iph = ip_hdr(skb);
> @@ -255,6 +256,16 @@ static int br_parse_ip_options(struct sk
>           return 0;
>       }
> 
> +    /* Associate bogus bridge route table */
> +    if (!skb_dst(skb)) {
> +        rt = bridge_parent_rtable(dev);
> +        if (!rt) {
> +            kfree_skb(skb);
> +            return 0;
> +        }
> +        skb_dst_set_noref(skb,&rt->dst);
> +    }
> +
>       opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
>       if (ip_options_compile(dev_net(dev), opt, skb))
>           goto inhdr_error;
> 
> 
> Now we are making progress! With the patch above from Stephen and Eric, 
> I cannot make the kernel panic when sending packets to the IP address of 
> the bridge.
> 
> However, if a guest virtual machine is sharing the bridge with the host 
> via a tap device, I can cause a host panic by targeting the IP address 
> of the guest. Is this an unrelated problem?
> 
> Here are two kernel panics. The guest virtual machine was pingable 
> before being attacked with IP Stack Checker's tcpsic command. Spanning 
> Tree Protocol was off during the first panic and on during the second.
> 
> ------------
> 
> [  606.921739] br0: port 2(tap0) entering forwarding state
> [  636.058941] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff812c2781
> [  636.058942]
> [  636.069789] Pid: 2261, comm: kvm Tainted: G        W   2.6.39-rc2+ #11
> [  636.076292] Call Trace:
> [  636.078725] <IRQ>  [<ffffffff8132ad78>] ? panic+0x92/0x1a1
> [  636.084287]  [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
> [  636.091044]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
> [  636.096418]  [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
> [  636.102221]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
> [  636.107595]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
> [  636.112883]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
> [  636.118172]  [<ffffffffa017b0d4>] ? br_flood+0xc8/0xc8 [bridge]
> [  636.124065]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
> [  636.130302]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
> [  636.135850]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
> [  636.142089]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.148586]  [<ffffffffa017b250>] ? __br_deliver+0xb0/0xb0 [bridge]
> [  636.154826]  [<ffffffffa017b186>] ? NF_HOOK.clone.5+0x3c/0x56 [bridge]
> [  636.161323]  [<ffffffffa017bfe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
> [  636.168601]  [<ffffffffa0180689>] ? br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
> [  636.176052]  [<ffffffffa017fc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
> [  636.182463]  [<ffffffffa0180c84>] ? br_nf_pre_routing+0x3be/0x3cb [bridge]
> [  636.189307]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
> [  636.194852]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
> [  636.200139]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.206637]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.213133]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
> [  636.218679]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.225177]  [<ffffffffa017bfe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
> [  636.232455]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.238954]  [<ffffffffa017be6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
> [  636.245452]  [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
> [  636.251258]  [<ffffffffa017c1e5>] ? br_handle_frame+0x195/0x1ac [bridge]
> [  636.257928]  [<ffffffffa017c050>] ? br_handle_frame_finish+0x1c7/0x1c7 [bridge]
> [  636.265204]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
> [  636.271443]  [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
> [  636.277335]  [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
> [  636.283139]  [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
> [  636.288865]  [<ffffffffa0241fcd>] ? igb_poll+0x6d9/0x9ee [igb]
> [  636.294673]  [<ffffffffa003bde2>] ? scsi_run_queue+0x2ce/0x30a [scsi_mod]
> [  636.301431]  [<ffffffffa017be89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  636.307930]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
> [  636.314168]  [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
> [  636.319800]  [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
> [  636.325346]  [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
> [  636.330807]  [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
> [  636.336092]  [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
> [  636.341204]  [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
> [  636.346146]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
> [  636.351949] <EOI>  [<ffffffff81271f58>] ? arch_local_irq_save+0x12/0x1b
> [  636.358629]  [<ffffffff8100a9f2>] ? arch_local_irq_restore+0x2/0x8
> [  636.364781]  [<ffffffff8127680d>] ? netif_rx_ni+0x1e/0x27
> [  636.370154]  [<ffffffffa01557d2>] ? tun_get_user+0x3a3/0x3cb [tun]
> [  636.376305]  [<ffffffffa0155bd8>] ? tun_get_socket+0x3b/0x3b [tun]
> [  636.382457]  [<ffffffffa0155c36>] ? tun_chr_aio_write+0x5e/0x79 [tun]
> [  636.388869]  [<ffffffff810f6b07>] ? do_sync_readv_writev+0x9a/0xd5
> [  636.395021]  [<ffffffff810371f3>] ? need_resched+0x1a/0x23
> [  636.400481]  [<ffffffff8132b725>] ? _cond_resched+0x9/0x20
> [  636.405941]  [<ffffffff810f5f77>] ? copy_from_user+0x18/0x30
> [  636.411573]  [<ffffffff8115fbf6>] ? security_file_permission+0x18/0x33
> [  636.418068]  [<ffffffff810f6d55>] ? do_readv_writev+0xa4/0x11a
> [  636.423873]  [<ffffffff810f7913>] ? fput+0x1a/0x1a2
> [  636.428726]  [<ffffffff810f6f39>] ? sys_writev+0x45/0x90
> [  636.434012]  [<ffffffff81332a52>] ? system_call_fastpath+0x16/0x1b
> 
> ------------
> 
> [  110.442839] br0: port 2(tap0) entering forwarding state
> [  136.948700] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff812c2781
> [  136.948702]
> [  136.959561] Pid: 1093, comm: md123_resync Not tainted 2.6.39-rc2+ #11
> [  136.965977] Call Trace:
> [  136.968408] <IRQ>  [<ffffffff8132ad78>] ? panic+0x92/0x1a1
> [  136.973970]  [<ffffffff8104abe8>] ? _local_bh_enable_ip.clone.8+0x20/0x8c
> [  136.980727]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
> [  136.986102]  [<ffffffff810454e5>] ? __stack_chk_fail+0x17/0x17
> [  136.991906]  [<ffffffff812c2781>] ? icmp_send+0x337/0x349
> [  136.997281]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
> [  137.002570]  [<ffffffffa0198fe1>] ? br_handle_frame_finish+0x158/0x1c7 [bridge]
> [  137.009847]  [<ffffffffa019d689>] ? br_nf_pre_routing_finish+0x1d4/0x1e1 [bridge]
> [  137.017297]  [<ffffffffa019cc76>] ? NF_HOOK_THRESH+0x3b/0x55 [bridge]
> [  137.023707]  [<ffffffffa019dc84>] ? br_nf_pre_routing+0x3be/0x3cb [bridge]
> [  137.030551]  [<ffffffff81298527>] ? nf_iterate+0x41/0x7e
> [  137.035837]  [<ffffffff8103704d>] ? test_tsk_need_resched+0xe/0x17
> [  137.041991]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  137.048488]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  137.054984]  [<ffffffff812985d7>] ? nf_hook_slow+0x73/0x114
> [  137.060531]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  137.067028]  [<ffffffffa0198e89>] ? NF_HOOK.clone.4+0x56/0x56 [bridge]
> [  137.073526]  [<ffffffffa0198e6f>] ? NF_HOOK.clone.4+0x3c/0x56 [bridge]
> [  137.080023]  [<ffffffff812a7d8e>] ? tcp_gro_receive+0xa1/0x204
> [  137.085830]  [<ffffffffa01991e5>] ? br_handle_frame+0x195/0x1ac [bridge]
> [  137.092500]  [<ffffffffa0199050>] ? br_handle_frame_finish+0x1c7/0x1c7 [bridge]
> [  137.099776]  [<ffffffff812764ef>] ? __netif_receive_skb+0x2a7/0x450
> [  137.106013]  [<ffffffff81276928>] ? netif_receive_skb+0x52/0x58
> [  137.111906]  [<ffffffff81276e2a>] ? napi_gro_receive+0x1f/0x2f
> [  137.117713]  [<ffffffff812769ff>] ? napi_skb_finish+0x1c/0x31
> [  137.123438]  [<ffffffffa0226fcd>] ? igb_poll+0x6d9/0x9ee [igb]
> [  137.129243]  [<ffffffff8109034f>] ? handle_irq_event+0x40/0x55
> [  137.135049]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
> [  137.140854]  [<ffffffff81276f55>] ? net_rx_action+0xa4/0x1b1
> [  137.146487]  [<ffffffff8104ad26>] ? __do_softirq+0xb8/0x176
> [  137.152034]  [<ffffffff81333c5c>] ? call_softirq+0x1c/0x30
> [  137.157494]  [<ffffffff8100aa57>] ? do_softirq+0x3f/0x84
> [  137.162779]  [<ffffffff8104af91>] ? irq_exit+0x3f/0x8f
> [  137.167893]  [<ffffffff8100a793>] ? do_IRQ+0x85/0x9e
> [  137.172833]  [<ffffffff8132cbd3>] ? common_interrupt+0x13/0x13
> [  137.178636] <EOI>  [<ffffffff8106fc1a>] ? arch_local_irq_restore+0x2/0x8
> [  137.185408]  [<ffffffffa0050fca>] ? _scsih_qcmd+0x54f/0x561 [mpt2sas]
> [  137.191823]  [<ffffffffa01e452f>] ? scsi_dispatch_cmd+0x180/0x219 [scsi_mod]
> [  137.198841]  [<ffffffffa01ea385>] ? scsi_request_fn+0x3e6/0x413 [scsi_mod]
> [  137.205683]  [<ffffffff81187470>] ? elv_rqhash_add.clone.15+0x26/0x4c
> [  137.212095]  [<ffffffff8118bde2>] ? __blk_run_queue+0x5e/0x84
> [  137.217814]  [<ffffffff8118d63c>] ? __make_request+0x273/0x28f
> [  137.223619]  [<ffffffff8118b569>] ? generic_make_request+0x267/0x2e1
> [  137.229943]  [<ffffffff8105eb49>] ? remove_wait_queue+0x11/0x4d
> [  137.235837]  [<ffffffffa0002417>] ? raise_barrier+0x162/0x16f [raid1]
> [  137.242246]  [<ffffffff8103eba4>] ? try_to_wake_up+0x17c/0x17c
> [  137.248052]  [<ffffffffa0002f2f>] ? sync_request+0x567/0x583 [raid1]
> [  137.254379]  [<ffffffffa00bd834>] ? md_do_sync+0x776/0xb8e [md_mod]
> [  137.260617]  [<ffffffff8100e537>] ? sched_clock+0x5/0x8
> [  137.265819]  [<ffffffffa00bde83>] ? md_thread+0xfa/0x118 [md_mod]
> [  137.271886]  [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
> [  137.278124]  [<ffffffffa00bdd89>] ? md_rdev_init+0x8f/0x8f [md_mod]
> [  137.284362]  [<ffffffff8105e497>] ? kthread+0x7a/0x82
> [  137.289390]  [<ffffffff81333b64>] ? kernel_thread_helper+0x4/0x10
> [  137.295454]  [<ffffffff8105e41d>] ? kthread_worker_fn+0x149/0x149
> [  137.301519]  [<ffffffff81333b60>] ? gs_change+0x13/0x13
> 

Considering the recent changes in ip_options_echo(), I would suggest
adding the following debugging patch and/or reverting commit
8628bd8af7c4c14f40 (ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC)
handling in ip_options_echo()).

Thanks

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 28a736f..35f2bf9 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -200,6 +200,11 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
 		*dptr++ = IPOPT_END;
 		dopt->optlen++;
 	}
+	if (unlikely(dopt->optlen > 40)) {
+		pr_err("ip_options_echo() fatal error optlen=%u > 40\n", dopt->optlen);
+		print_hex_dump(KERN_ERR, "ip options: ", DUMP_PREFIX_OFFSET,
+			16, 1, dopt->__data, dopt->optlen, false);
+	}
 	return 0;
 }
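
For anyone wanting to play with the check outside the kernel, here is a
minimal userspace sketch of the same idea. The limit of 40 comes from the
IPv4 header format: the header is at most 60 bytes, of which 20 are the
fixed part. The names toy_check_optlen() and TOY_MAX_IPOPTLEN are made up
for this illustration, and fprintf() merely stands in for
pr_err()/print_hex_dump(); this is not kernel code.

```c
#include <stdio.h>
#include <stddef.h>

/* 60-byte maximum IPv4 header minus the 20-byte fixed part.
 * Hypothetical name, chosen to avoid clashing with system headers. */
#define TOY_MAX_IPOPTLEN 40

/*
 * Returns 1 if optlen fits in an IPv4 header, 0 otherwise.
 * On overflow, the option bytes are dumped to stderr, 16 per line,
 * loosely imitating what print_hex_dump() would show.
 */
static int toy_check_optlen(const unsigned char *data, size_t optlen)
{
	size_t i;

	if (optlen <= TOY_MAX_IPOPTLEN)
		return 1;

	fprintf(stderr, "fatal error optlen=%zu > %d\n",
		optlen, TOY_MAX_IPOPTLEN);
	for (i = 0; i < optlen; i++)
		fprintf(stderr, "%02x%c", data[i],
			(i % 16 == 15 || i + 1 == optlen) ? '\n' : ' ');
	return 0;
}
```

Like the patch above, this only reports the overflow after the fact; it
does not prevent the write past the end of the options buffer.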
 




* Re: [PATCH v4] net: bnx2x: convert to hw_features
From: Vladislav Zolotarov @ 2011-04-12 12:10 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <20110411202630.C079D13909@rere.qmqm.pl>

On Mon, 2011-04-11 at 13:26 -0700, Michał Mirosław wrote:
> Since ndo_fix_features callback is postponing features change when
> bp->recovery_state != BNX2X_RECOVERY_DONE, netdev_update_features()
> has to be called again when this condition changes. Previously,
> ethtool_ops->set_flags callback returned -EBUSY in that case
> (it's not possible in the new model).
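
To spell out the behaviour described above, here is a rough userspace
model of it. Everything in it is hypothetical (struct toy_dev, the TOY_F_*
bits, the toy_* functions); it only imitates the idea of
ndo_fix_features/netdev_update_features() and is not the driver or core
code. A request made during recovery is vetoed by the fix callback, so
the update has to be re-run once recovery is done.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
#define TOY_F_RXCSUM (1u << 0)
#define TOY_F_LRO    (1u << 1)

struct toy_dev {
	uint32_t features;        /* currently active features */
	uint32_t wanted_features; /* what the user asked for */
	bool     recovering;      /* stands in for recovery_state != DONE */
	bool     disable_tpa;
};

/* Same idea as the fix_features callback: veto changes during recovery. */
static uint32_t toy_fix_features(const struct toy_dev *dev, uint32_t features)
{
	if (dev->recovering)
		return dev->features;	/* postpone: keep the current set */

	/* LRO (TPA) requires Rx checksum offloading */
	if (!(features & TOY_F_RXCSUM) || dev->disable_tpa)
		features &= ~TOY_F_LRO;

	return features;
}

/* Rough stand-in for netdev_update_features(): recompute and apply. */
static void toy_update_features(struct toy_dev *dev)
{
	dev->features = toy_fix_features(dev, dev->wanted_features);
}
```

If toy_update_features() is called while recovering is set, the wanted
features are silently ignored, which is why it must be called again after
the flag is cleared.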

ACK (with reservations). ;)

Could you please just add the comment I asked for in the previous
e-mail?

The things I first thought to comment on are:
	- Removing TPA_ENABLE_FLAG the same way you removed bp->rx_csum.
	- Merging the code handling 'features' in bnx2x_init_bp() with the
	  similar code in bnx2x_init_dev().
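
To illustrate what the first item could look like, here is a small
standalone sketch that derives the TPA state from the feature bits
instead of keeping a separate flag. All names in it (struct toy_bp,
TOY_LRO_BIT, toy_tpa_enabled()) are hypothetical and only model the idea;
this is not proposed driver code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bit; the real NETIF_F_LRO value differs. */
#define TOY_LRO_BIT (1u << 1)

struct toy_bp {
	uint32_t dev_features; /* stands in for bp->dev->features */
	bool     disable_tpa;  /* module-parameter style override */
};

/* TPA is on exactly when LRO is set and TPA is not administratively off. */
static bool toy_tpa_enabled(const struct toy_bp *bp)
{
	return (bp->dev_features & TOY_LRO_BIT) && !bp->disable_tpa;
}
```

With a helper like this there is a single source of truth, the same way
the patch replaces bp->rx_csum with a test of NETIF_F_RXCSUM.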

However, I think it is only right that we clean up our own mess
ourselves, and you have already done more than enough... ;)

I've run our standard test suite (which, in particular, heavily tests
toggling of the RX_CSUM and LRO flags) on this patch, and it passed.

Thanks a lot, Michal.
vlad

> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
> v4: - complete bp->rx_csum -> NETIF_F_RXCSUM conversion
>     - add check for failed ndo_set_features in ndo_open callback
> v3: - include NETIF_F_LRO in hw_features
>     - don't call netdev_update_features() if bnx2x_nic_load() failed
> v2: - comment in ndo_fix_features callback
> ---
>  drivers/net/bnx2x/bnx2x.h         |    1 -
>  drivers/net/bnx2x/bnx2x_cmn.c     |   54 +++++++++++++++++++--
>  drivers/net/bnx2x/bnx2x_cmn.h     |    3 +
>  drivers/net/bnx2x/bnx2x_ethtool.c |   95 -------------------------------------
>  drivers/net/bnx2x/bnx2x_main.c    |   41 +++++++++-------
>  5 files changed, 75 insertions(+), 119 deletions(-)
> 
> diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
> index b7ff87b..fefd1d5 100644
> --- a/drivers/net/bnx2x/bnx2x.h
> +++ b/drivers/net/bnx2x/bnx2x.h
> @@ -918,7 +918,6 @@ struct bnx2x {
>  
>  	int			tx_ring_size;
>  
> -	u32			rx_csum;
>  /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */
>  #define ETH_OVREHEAD		(ETH_HLEN + 8 + 8)
>  #define ETH_MIN_PACKET_SIZE		60
> diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
> index e83ac6d..7f49cf4 100644
> --- a/drivers/net/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> @@ -640,7 +640,7 @@ reuse_rx:
>  
>  			skb_checksum_none_assert(skb);
>  
> -			if (bp->rx_csum) {
> +			if (bp->dev->features & NETIF_F_RXCSUM) {
>  				if (likely(BNX2X_RX_CSUM_OK(cqe)))
>  					skb->ip_summed = CHECKSUM_UNNECESSARY;
>  				else
> @@ -2443,11 +2443,21 @@ alloc_err:
>  
>  }
>  
> +static int bnx2x_reload_if_running(struct net_device *dev)
> +{
> +	struct bnx2x *bp = netdev_priv(dev);
> +
> +	if (unlikely(!netif_running(dev)))
> +		return 0;
> +
> +	bnx2x_nic_unload(bp, UNLOAD_NORMAL);
> +	return bnx2x_nic_load(bp, LOAD_NORMAL);
> +}
> +
>  /* called with rtnl_lock */
>  int bnx2x_change_mtu(struct net_device *dev, int new_mtu)
>  {
>  	struct bnx2x *bp = netdev_priv(dev);
> -	int rc = 0;
>  
>  	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
>  		printk(KERN_ERR "Handling parity error recovery. Try again later\n");
> @@ -2464,12 +2474,44 @@ int bnx2x_change_mtu(struct net_device *dev, int new_mtu)
>  	 */
>  	dev->mtu = new_mtu;
>  
> -	if (netif_running(dev)) {
> -		bnx2x_nic_unload(bp, UNLOAD_NORMAL);
> -		rc = bnx2x_nic_load(bp, LOAD_NORMAL);
> +	return bnx2x_reload_if_running(dev);
> +}
> +
> +u32 bnx2x_fix_features(struct net_device *dev, u32 features)
> +{
> +	struct bnx2x *bp = netdev_priv(dev);
> +
> +	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
> +		netdev_err(dev, "Handling parity error recovery. Try again later\n");
> +
> +		/* Don't allow bnx2x_set_features() to be called now. */
> +		return dev->features;
> +	}
> +
> +	/* TPA requires Rx CSUM offloading */
> +	if (!(features & NETIF_F_RXCSUM) || bp->disable_tpa)
> +		features &= ~NETIF_F_LRO;
> +
> +	return features;
> +}
> +
> +int bnx2x_set_features(struct net_device *dev, u32 features)
> +{
> +	struct bnx2x *bp = netdev_priv(dev);
> +	u32 flags = bp->flags;
> +
> +	if (features & NETIF_F_LRO)
> +		flags |= TPA_ENABLE_FLAG;
> +	else
> +		flags &= ~TPA_ENABLE_FLAG;
> +
> +	if (flags ^ bp->flags) {
> +		bp->flags = flags;
> +
> +		return bnx2x_reload_if_running(dev);
>  	}
>  
> -	return rc;
> +	return 0;
>  }
>  
>  void bnx2x_tx_timeout(struct net_device *dev)
> diff --git a/drivers/net/bnx2x/bnx2x_cmn.h b/drivers/net/bnx2x/bnx2x_cmn.h
> index 775fef0..1cdab69 100644
> --- a/drivers/net/bnx2x/bnx2x_cmn.h
> +++ b/drivers/net/bnx2x/bnx2x_cmn.h
> @@ -431,6 +431,9 @@ void bnx2x_free_mem_bp(struct bnx2x *bp);
>   */
>  int bnx2x_change_mtu(struct net_device *dev, int new_mtu);
>  
> +u32 bnx2x_fix_features(struct net_device *dev, u32 features);
> +int bnx2x_set_features(struct net_device *dev, u32 features);
> +
>  /**
>   * tx timeout netdev callback
>   *
> diff --git a/drivers/net/bnx2x/bnx2x_ethtool.c b/drivers/net/bnx2x/bnx2x_ethtool.c
> index 1479994..ad7d91e 100644
> --- a/drivers/net/bnx2x/bnx2x_ethtool.c
> +++ b/drivers/net/bnx2x/bnx2x_ethtool.c
> @@ -1299,91 +1299,6 @@ static int bnx2x_set_pauseparam(struct net_device *dev,
>  	return 0;
>  }
>  
> -static int bnx2x_set_flags(struct net_device *dev, u32 data)
> -{
> -	struct bnx2x *bp = netdev_priv(dev);
> -	int changed = 0;
> -	int rc = 0;
> -
> -	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
> -		printk(KERN_ERR "Handling parity error recovery. Try again later\n");
> -		return -EAGAIN;
> -	}
> -
> -	if (!(data & ETH_FLAG_RXVLAN))
> -		return -EINVAL;
> -
> -	if ((data & ETH_FLAG_LRO) && bp->rx_csum && bp->disable_tpa)
> -		return -EINVAL;
> -
> -	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_LRO | ETH_FLAG_RXVLAN |
> -					ETH_FLAG_TXVLAN | ETH_FLAG_RXHASH);
> -	if (rc)
> -		return rc;
> -
> -	/* TPA requires Rx CSUM offloading */
> -	if ((data & ETH_FLAG_LRO) && bp->rx_csum) {
> -		if (!(bp->flags & TPA_ENABLE_FLAG)) {
> -			bp->flags |= TPA_ENABLE_FLAG;
> -			changed = 1;
> -		}
> -	} else if (bp->flags & TPA_ENABLE_FLAG) {
> -		dev->features &= ~NETIF_F_LRO;
> -		bp->flags &= ~TPA_ENABLE_FLAG;
> -		changed = 1;
> -	}
> -
> -	if (changed && netif_running(dev)) {
> -		bnx2x_nic_unload(bp, UNLOAD_NORMAL);
> -		rc = bnx2x_nic_load(bp, LOAD_NORMAL);
> -	}
> -
> -	return rc;
> -}
> -
> -static u32 bnx2x_get_rx_csum(struct net_device *dev)
> -{
> -	struct bnx2x *bp = netdev_priv(dev);
> -
> -	return bp->rx_csum;
> -}
> -
> -static int bnx2x_set_rx_csum(struct net_device *dev, u32 data)
> -{
> -	struct bnx2x *bp = netdev_priv(dev);
> -	int rc = 0;
> -
> -	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
> -		printk(KERN_ERR "Handling parity error recovery. Try again later\n");
> -		return -EAGAIN;
> -	}
> -
> -	bp->rx_csum = data;
> -
> -	/* Disable TPA, when Rx CSUM is disabled. Otherwise all
> -	   TPA'ed packets will be discarded due to wrong TCP CSUM */
> -	if (!data) {
> -		u32 flags = ethtool_op_get_flags(dev);
> -
> -		rc = bnx2x_set_flags(dev, (flags & ~ETH_FLAG_LRO));
> -	}
> -
> -	return rc;
> -}
> -
> -static int bnx2x_set_tso(struct net_device *dev, u32 data)
> -{
> -	if (data) {
> -		dev->features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> -		dev->features |= NETIF_F_TSO6;
> -	} else {
> -		dev->features &= ~(NETIF_F_TSO | NETIF_F_TSO_ECN);
> -		dev->features &= ~NETIF_F_TSO6;
> -	}
> -
> -	return 0;
> -}
> -
>  static const struct {
>  	char string[ETH_GSTRING_LEN];
>  } bnx2x_tests_str_arr[BNX2X_NUM_TESTS] = {
> @@ -2207,16 +2122,6 @@ static const struct ethtool_ops bnx2x_ethtool_ops = {
>  	.set_ringparam		= bnx2x_set_ringparam,
>  	.get_pauseparam		= bnx2x_get_pauseparam,
>  	.set_pauseparam		= bnx2x_set_pauseparam,
> -	.get_rx_csum		= bnx2x_get_rx_csum,
> -	.set_rx_csum		= bnx2x_set_rx_csum,
> -	.get_tx_csum		= ethtool_op_get_tx_csum,
> -	.set_tx_csum		= ethtool_op_set_tx_hw_csum,
> -	.set_flags		= bnx2x_set_flags,
> -	.get_flags		= ethtool_op_get_flags,
> -	.get_sg			= ethtool_op_get_sg,
> -	.set_sg			= ethtool_op_set_sg,
> -	.get_tso		= ethtool_op_get_tso,
> -	.set_tso		= bnx2x_set_tso,
>  	.self_test		= bnx2x_self_test,
>  	.get_sset_count		= bnx2x_get_sset_count,
>  	.get_strings		= bnx2x_get_strings,
> diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
> index f3cf889..5fd7cbb 100644
> --- a/drivers/net/bnx2x/bnx2x_main.c
> +++ b/drivers/net/bnx2x/bnx2x_main.c
> @@ -7728,6 +7728,7 @@ static void bnx2x_parity_recover(struct bnx2x *bp)
>  						return;
>  					}
>  
> +					netdev_update_features(bp->dev);
>  					return;
>  				}
>  			} else { /* non-leader */
> @@ -7755,10 +7756,12 @@ static void bnx2x_parity_recover(struct bnx2x *bp)
>  					  * the "process kill". It's an exit
>  					  * point for a non-leader.
>  					  */
> -					bnx2x_nic_load(bp, LOAD_NORMAL);
> +					int rc = bnx2x_nic_load(bp, LOAD_NORMAL);
>  					bp->recovery_state =
>  						BNX2X_RECOVERY_DONE;
>  					smp_wmb();
> +					if (!rc)
> +						netdev_update_features(bp->dev);
>  					return;
>  				}
>  			}
> @@ -8904,8 +8907,6 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
>  	bp->multi_mode = multi_mode;
>  	bp->int_mode = int_mode;
>  
> -	bp->dev->features |= NETIF_F_GRO;
> -
>  	/* Set TPA flags */
>  	if (disable_tpa) {
>  		bp->flags &= ~TPA_ENABLE_FLAG;
> @@ -8925,8 +8926,6 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
>  
>  	bp->tx_ring_size = MAX_TX_AVAIL;
>  
> -	bp->rx_csum = 1;
> -
>  	/* make sure that the numbers are in the right granularity */
>  	bp->tx_ticks = (50 / BNX2X_BTR) * BNX2X_BTR;
>  	bp->rx_ticks = (25 / BNX2X_BTR) * BNX2X_BTR;
> @@ -8954,6 +8953,7 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
>  static int bnx2x_open(struct net_device *dev)
>  {
>  	struct bnx2x *bp = netdev_priv(dev);
> +	int rc;
>  
>  	netif_carrier_off(dev);
>  
> @@ -8993,7 +8993,14 @@ static int bnx2x_open(struct net_device *dev)
>  
>  	bp->recovery_state = BNX2X_RECOVERY_DONE;
>  
> -	return bnx2x_nic_load(bp, LOAD_OPEN);
> +	rc = bnx2x_nic_load(bp, LOAD_OPEN);
> +	if (!rc) {
> +		netdev_update_features(bp->dev);
> +		if (bp->state != BNX2X_STATE_OPEN)
> +			return -EBUSY;
> +	}
> +
> +	return rc;
>  }
>  
>  /* called with rtnl_lock */
> @@ -9304,6 +9311,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_do_ioctl		= bnx2x_ioctl,
>  	.ndo_change_mtu		= bnx2x_change_mtu,
> +	.ndo_fix_features	= bnx2x_fix_features,
> +	.ndo_set_features	= bnx2x_set_features,
>  	.ndo_tx_timeout		= bnx2x_tx_timeout,
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  	.ndo_poll_controller	= poll_bnx2x,
> @@ -9430,20 +9439,18 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
>  
>  	dev->netdev_ops = &bnx2x_netdev_ops;
>  	bnx2x_set_ethtool_ops(dev);
> -	dev->features |= NETIF_F_SG;
> -	dev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> +
>  	if (bp->flags & USING_DAC_FLAG)
>  		dev->features |= NETIF_F_HIGHDMA;
> -	dev->features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> -	dev->features |= NETIF_F_TSO6;
> -	dev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
>  
> -	dev->vlan_features |= NETIF_F_SG;
> -	dev->vlan_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> -	if (bp->flags & USING_DAC_FLAG)
> -		dev->vlan_features |= NETIF_F_HIGHDMA;
> -	dev->vlan_features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
> -	dev->vlan_features |= NETIF_F_TSO6;
> +	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
> +		NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_HW_VLAN_TX;
> +
> +	dev->features |= dev->hw_features | NETIF_F_HW_VLAN_RX;
> +
> +	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA;
>  
>  #ifdef BCM_DCBNL
>  	dev->dcbnl_ops = &bnx2x_dcbnl_ops;



