Netdev List

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:49 UTC
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029193603.GS12706@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> > 
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> > 
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> > 
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > 
> 
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.

We changed that almost 4 years ago :) What we 'just' did was to add a
prominent warning into the code.
 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>  kernel/irq/manage.c                           |  2 +-
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
>  	struct delayed_work watchdog_task;
>  	struct delayed_work fifo_stall_task;
>  	struct delayed_work phy_info_task;
> +
> +	spinlock_t irq_lock;
>  };
>  
>  enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
>  	e1000_irq_disable(adapter);
>  
>  	spin_lock_init(&adapter->stats_lock);
> +	spin_lock_init(&adapter->irq_lock);
>  
>  	set_bit(__E1000_DOWN, &adapter->flags);
>  
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
>   * @irq: interrupt number
>   * @data: pointer to a network interface device structure
>   **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
>  {
> -	struct net_device *netdev = data;
> -	struct e1000_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 icr = er32(ICR);
>  
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> +	struct net_device *netdev = data;
> +	struct e1000_adapter *adapter = netdev_priv(netdev);
> +	irqreturn_t ret;
> +
> +	spin_lock(&adapter->irq_lock);
> +	ret = __e1000_intr(irq, adapter);
> +	spin_unlock(&adapter->irq_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * e1000_clean - NAPI Rx polling callback
>   * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
>  {
>  	struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -	disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock);
>  	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock);
>  }
>  #endif
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
>   *	to complete before returning. If you use this function while
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
> - *	This function may be called - with care - from IRQ context.
> + *	This function may _NOT_ be called from IRQ context.

It can only be called from preemptible thread context.

Thanks,

	tglx

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 19:50 UTC
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292046270.5308@nanos>

On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> 
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> We changed that almost 4 years ago :) What we 'just' did was to add a
> prominent warning into the code.

You know that is the same right... they didn't know it was broken
therefore it wasn't :-), but now they need to go actually do stuff about
it, an entirely different proposition.

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:53 UTC
  To: Jeff Kirsher; +Cc: Peter Zijlstra, Sabrina Dubroca, netdev, linux-kernel
In-Reply-To: <1414611641.2420.54.camel@jtkirshe-mobl>

On Wed, 29 Oct 2014, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> Thomas- if you are fine with Peter's patch, I can get this under
> testing.

I'm fine with it except for the comment part of disable_irq(), but
that does not matter :)

One nitpick: Instead of having the lock unconditionally, I'd make it
depend on CONFIG_NET_POLL_CONTROLLER.

#ifdef CONFIG_NET_POLL_CONTROLLER
static inline void netpoll_lock(struct e1000_adapter *adapter)
{
	spin_lock(&adapter->irq_lock);
}

static inline void netpoll_unlock(struct e1000_adapter *adapter)
{
	spin_unlock(&adapter->irq_lock);
}
#else
static inline void netpoll_lock(struct e1000_adapter *adapter) { }
static inline void netpoll_unlock(struct e1000_adapter *adapter) { }
#endif

and use that instead of the unconditional spin[un]lock() invocations.

But that's up to you.

Thanks,

	tglx
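
A note on the call structure, with a small user-space model. In the draft
patch quoted earlier in this thread, e1000_netpoll() takes irq_lock and then
calls e1000_intr(), which takes the same lock again; with a non-recursive
spinlock that would spin forever. The sketch below assumes the netpoll path
calls the inner, lock-free handler instead. It is not kernel code: a plain
flag stands in for spinlock_t, MODEL_NETPOLL stands in for
CONFIG_NET_POLL_CONTROLLER, and every model_* name is hypothetical.

```c
#include <stdbool.h>

/* User-space model only; all names here are illustrative assumptions. */
struct adapter_model {
	bool irq_lock_held;	/* stands in for adapter->irq_lock */
	int events_handled;	/* counts runs of the inner handler */
};

#define MODEL_NETPOLL 1		/* stands in for CONFIG_NET_POLL_CONTROLLER */

#if MODEL_NETPOLL
/* Returns false where a real non-recursive spinlock would spin forever. */
static bool model_netpoll_lock(struct adapter_model *a)
{
	if (a->irq_lock_held)
		return false;
	a->irq_lock_held = true;
	return true;
}

static void model_netpoll_unlock(struct adapter_model *a)
{
	a->irq_lock_held = false;
}
#else
/* With netpoll compiled out, the helpers do nothing. */
static bool model_netpoll_lock(struct adapter_model *a) { (void)a; return true; }
static void model_netpoll_unlock(struct adapter_model *a) { (void)a; }
#endif

/* Inner handler: the caller is expected to hold the lock. */
static void model_intr_locked(struct adapter_model *a)
{
	a->events_handled++;
}

/* Hard interrupt entry point: lock, run the inner handler, unlock. */
static bool model_intr(struct adapter_model *a)
{
	if (!model_netpoll_lock(a))
		return false;
	model_intr_locked(a);
	model_netpoll_unlock(a);
	return true;
}

/* Netpoll entry point: same lock, but calls the inner handler directly. */
static bool model_netpoll(struct adapter_model *a)
{
	if (!model_netpoll_lock(a))
		return false;
	model_intr_locked(a);
	model_netpoll_unlock(a);
	return true;
}
```

Both entry points share one inner handler, so neither ever tries to take the
lock while already holding it.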

* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Eric Dumazet @ 2014-10-29 19:57 UTC
  To: Jeff Kirsher; +Cc: Alexander Duyck, netdev
In-Reply-To: <1414610868.2420.52.camel@jtkirshe-mobl>

On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
> > On 10/29/2014 07:49 AM, Eric Dumazet wrote:
> > > Hi Alexander
> > >
> > > The memory barriers added in commit
> > > b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> > > ("net: Add memory barriers to prevent possible race in byte queue
> > > limits")
> > >
> > > have heavy cost.
> > >
> > > It seems we could use smp_load_acquire() and smp_store_release()
> > > instead ?
> > >
> > > I'll post a patch later today. I would be interested if someone was able
> > > to test it, as your commit apparently was tested and known to fix a
> > > reproducible race.
> > >
> > > Thanks !
> 
> Eric- just CC me on the patch you post and I will see what I can do
> about getting validation eyes on it.

Thanks guys, will do, and will CC Paul as well.

Alexander, here is the following profile showing the cost of the
'mfence', in a typical rpc workload (a lot of IRQ are generated for TX
completions, because RPC tend to send small packets)

  0.11 │       je     33a
       │       mov    -0x3c(%rbp),%esi
  0.06 │       lea    0xc0(%rbx),%rdi
  0.06 │       callq  dql_completed
  0.06 │       mfence
 38.68 │       mov    0xc4(%rbx),%edx
  1.83 │       mov    0xc0(%rbx),%eax
       │       cmp    %eax,%edx
  0.22 │       js     333
  0.11 │       lock   btrl $0x1,0x98(%rbx)
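
For readers unfamiliar with the primitives named in the subject, here is a
user-space analogy using C11 atomics rather than the kernel's
smp_load_acquire()/smp_store_release(). It shows only the general
publish/consume pattern; the struct and function names are made up for the
example, and it says nothing about whether this ordering is sufficient for
the BQL race, which is the open question of this RFC.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative names only; not taken from the kernel or from BQL. */
struct msg_slot {
	int payload;		/* plain data, written before the flag */
	atomic_bool ready;	/* publication flag */
};

/* Writer: the release store keeps the payload write ahead of the flag. */
static void publish(struct msg_slot *s, int value)
{
	s->payload = value;
	atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read after the flag check. */
static bool try_consume(struct msg_slot *s, int *out)
{
	if (!atomic_load_explicit(&s->ready, memory_order_acquire))
		return false;
	*out = s->payload;
	return true;
}
```

Note that a release/acquire pair orders the writer's earlier stores before
the reader's later loads. It does not order a store followed by a load of a
different location on the same CPU, which is what a full barrier such as
mfence provides.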

* Re: [PATCHv1 0/2 net-next] xen-netback: minor cleanups
From: David Miller @ 2014-10-29 20:00 UTC
  To: david.vrabel; +Cc: netdev, xen-devel, ian.campbell, wei.liu2
In-Reply-To: <1414510171-12853-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Tue, 28 Oct 2014 15:29:29 +0000

> Two minor xen-netback cleanups originally from Zoltan.

Series applied, thanks everyone.

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 20:07 UTC
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029195054.GH10501@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > 
> > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > Yuck. No. You are just papering over the problem.
> > > > 
> > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > the interrupt line is shared with a real threaded interrupt user?
> > > > 
> > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > the hardware interrupt against netpoll instead of using
> > > > disable/enable_irq().
> > > > 
> > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > the chance of contention is pretty low. If done right it will be a
> > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > 
> > > 
> > > OK a little something like so then I suppose.. But I suspect most all
> > > the network drivers will need this and maybe more, disable_irq() is a
> > > popular little thing and we 'just' changed semantics on them.
> > 
> > We changed that almost 4 years ago :) What we 'just' did was to add a
> > prominent warning into the code.
> 
> You know that is the same right... they didn't know it was broken
> therefore it wasn't :-), but now they need to go actually do stuff about
> it, an entirely different proposition.

Right, and of course the world and some more has the very same code
there:

poll_controller()
{
	disable_irq();
	dev_interrupt_handler();
	enable_irq();
}

Trying to twist my brain to come up with a solution which avoids the
spinlock, but I have a hard time to come up with one.

The only thing I came up with so far is to avoid adding locks to every
driver incarnation and instead put it into struct net_device and
provide helper functions for the lock/unlock case.

That does not change the fact that we need to deal with that on a per
driver basis :(

Thanks,

	tglx

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: David Miller @ 2014-10-29 20:08 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1414544713.631.30.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 28 Oct 2014 18:05:13 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> napi_schedule() can be called from any context and has to mask hard
> irqs.
> 
> Add a variant that can only be called from hard interrupts handlers
> or when irqs are already masked.
> 
> Many NIC drivers can use it from their hard IRQ handler instead of
> generic variant.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

* Re: [PATCH net-next] neigh: optimize neigh_parms_release()
From: David Miller @ 2014-10-29 20:12 UTC
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <1414607371-4246-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 29 Oct 2014 19:29:31 +0100

> In neigh_parms_release() we loop over all entries to find the entry given in
> argument and being able to remove it from the list. By using a double linked
> list, we can avoid this loop.
> 
> Here are some numbers with 30 000 dummy interfaces configured:
> 
> Before the patch:
> $ time rmmod dummy
> real	2m0.118s
> user	0m0.000s
> sys	1m50.048s
> 
> After the patch:
> $ time rmmod dummy
> real	1m9.970s
> user	0m0.000s
> sys	0m47.976s
> 
> Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Looks great, applied, thanks Nicolas.
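
The patch body is not quoted in this message. As a rough illustration of why
a doubly linked list removes the loop, here is a minimal user-space circular
list; struct dnode and the dlist_* helpers are names invented for the
example, not the kernel's list API.

```c
#include <stddef.h>

/* Minimal circular doubly linked list; illustrative names only. */
struct dnode {
	struct dnode *prev, *next;
};

static void dlist_init(struct dnode *head)
{
	head->prev = head->next = head;
}

/* Insert n right after head. */
static void dlist_add(struct dnode *head, struct dnode *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n in constant time: no walk is needed to find its predecessor. */
static void dlist_del(struct dnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Count entries, only used here to check the result. */
static size_t dlist_len(const struct dnode *head)
{
	size_t n = 0;

	for (const struct dnode *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

With a singly linked list, removing one entry means scanning from the head
to find the element before it, so tearing down N entries costs on the order
of N*N steps; with back pointers each removal is constant time.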

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 20:23 UTC
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292053430.5308@nanos>

On Wed, 29 Oct 2014, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> 
> > On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > > 
> > > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > > Yuck. No. You are just papering over the problem.
> > > > > 
> > > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > > the interrupt line is shared with a real threaded interrupt user?
> > > > > 
> > > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > > the hardware interrupt against netpoll instead of using
> > > > > disable/enable_irq().
> > > > > 
> > > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > > the chance of contention is pretty low. If done right it will be a
> > > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > > 
> > > > 
> > > > OK a little something like so then I suppose.. But I suspect most all
> > > > the network drivers will need this and maybe more, disable_irq() is a
> > > > popular little thing and we 'just' changed semantics on them.
> > > 
> > > We changed that almost 4 years ago :) What we 'just' did was to add a
> > > prominent warning into the code.
> > 
> > You know that is the same right... they didn't know it was broken
> > therefore it wasn't :-), but now they need to go actually do stuff about
> > it, an entirely different proposition.
> 
> Right, and of course the world and some more has the very same code
> there:
> 
> poll_controller()
> {
> 	disable_irq();
> 	dev_interrupt_handler();
> 	enable_irq();
> }
> 
> Trying to twist my brain to come up with a solution which avoids the
> spinlock, but I have a hard time to come up with one.
> 
> The only thing I came up with so far is to avoid adding locks to every
> driver incarnation and instead put it into struct net_device and
> provide helper functions for the lock/unlock case.
> 
> That does not change the fact that we need to deal with that on a per
> driver basis :(

But at least it allows to mitigate the impact by making it conditional
at a central point.

static inline void netpoll_lock(struct net_device *nd)
{
	if (netpoll_active(nd))
		spin_lock(&nd->netpoll_lock);
}

and let the core code make sure that activation/deactivation of
netpoll on a particular interface is serialized against the interrupt
and netpoll calls.

Not sure if it's worth the trouble, but at least it allows to deal
with it in the core instead of dealing with it on a per driver base.

Thanks,

	tglx
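
The helper above shows only the lock side. A conditional lock needs its
unlock to make the same decision, which is why activation and deactivation
have to be serialized against the lock/unlock calls. The user-space model
below illustrates one other way to keep the calls balanced, by remembering
whether the lock was taken. This is an assumption made for illustration and
is not something proposed in this thread; all names are hypothetical and a
counter stands in for the spinlock.

```c
#include <stdbool.h>

/* User-space model; field and function names are illustrative only. */
struct netdev_model {
	bool netpoll_active;	/* stands in for netpoll_active(nd) */
	int lock_depth;		/* 0 = unlocked, 1 = locked */
};

/* Take the lock only if netpoll is active; report what was done. */
static bool model_lock(struct netdev_model *nd)
{
	if (!nd->netpoll_active)
		return false;
	nd->lock_depth++;
	return true;
}

/* Release only if the matching model_lock() call actually took the lock. */
static void model_unlock(struct netdev_model *nd, bool locked)
{
	if (locked)
		nd->lock_depth--;
}
```

If the unlock side re-evaluated netpoll_active instead, a toggle between the
two calls would leave the lock held or release a lock that was never taken.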

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 20:51 UTC
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292119350.5308@nanos>

On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> But at least it allows to mitigate the impact by making it conditional
> at a central point.
> 
> static inline void netpoll_lock(struct net_device *nd)
> {
> 	if (netpoll_active(nd))
> 		spin_lock(&nd->netpoll_lock);
> }

branch fail vs lock might be a toss on most machines, but if we're
hitting cold cachelines we lose big.

> and let the core code make sure that activation/deactivation of
> netpoll on a particular interface is serialized against the interrupt
> and netpoll calls.
> 
> Not sure if it's worth the trouble, but at least it allows to deal
> with it in the core instead of dealing with it on a per driver base.

Does multi-queue have one netdev per queue or does that need moar
logicz?

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 21:03 UTC
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029205131.GI10501@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> > But at least it allows to mitigate the impact by making it conditional
> > at a central point.
> > 
> > static inline void netpoll_lock(struct net_device *nd)
> > {
> > 	if (netpoll_active(nd))
> > 		spin_lock(&nd->netpoll_lock);
> > }
> 
> branch fail vs lock might be a toss on most machines, but if we're
> hitting cold cachelines we lose big.

Well, if the net_device is not cache hot on irq entry you have lost
already. The extra branch/lock is not going to add much to that.
 
Thanks,

	tglx

* Re: nf_reject_ipv4: module license 'unspecified' taints kernel
From: Benjamin Tissoires @ 2014-10-29 21:05 UTC
  To: Pablo Neira Ayuso
  Cc: Dave Young, davem, netdev, linux-kernel@vger.kernel.org,
	netfilter-devel
In-Reply-To: <20141014081109.GA5357@dhcp-16-198.nay.redhat.com>

On Tue, Oct 14, 2014 at 4:11 AM, Dave Young <dyoung@redhat.com> wrote:
> On 10/10/14 at 11:56am, Pablo Neira Ayuso wrote:
>> On Fri, Oct 10, 2014 at 05:19:04PM +0800, Dave Young wrote:
>> > Hi,
>> >
>> > With today's linus tree, I got below kmsg:
>> > [   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
>> >
>> > It could be caused by below commit:
>> >
>> > commit c8d7b98bec43faaa6583c3135030be5eb4693acb
>> > Author: Pablo Neira Ayuso <pablo@netfilter.org>
>> > Date:   Fri Sep 26 14:35:15 2014 +0200
>> >
>> >     netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
>> >
>> >     Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
>> >     nf_reject_ipv6 respectively. This code is shared by x_tables and
>> >     nf_tables.
>> >
>> >     Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>
>> Patch attached, thanks for reporting.
>
> Tested-by: Dave Young <dyoung@redhat.com>
>
>>
>> P.S: Please, Cc netfilter-devel@vger.kernel.org in future reports, so
>> we make sure things don't get lost.
>
> Sure. Thanks.
>
>> From d4358bcf64ba7a64d4de4e1dc5533c4c8f88ea82 Mon Sep 17 00:00:00 2001
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Fri, 10 Oct 2014 11:25:20 +0200
>> Subject: [PATCH] netfilter: missing module license in the nf_reject_ipvX
>>  modules
>>
>> [   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
>>
>> Reported-by: Dave Young <dyoung@redhat.com>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> ---

Hi,

What is the status of this patch? I can not find it in Pablo's trees
(or I did not look enough).

Not having it is actually bothering me quite a lot because the vanilla
v3.18-rc2 gives the following dmesg on Fedora 21:

Oct 29 16:50:01 t440s kernel: nf_reject_ipv6: module license
'unspecified' taints kernel.
Oct 29 16:50:01 t440s kernel: Disabling lock debugging due to kernel taint
Oct 29 16:50:01 t440s kernel: nf_reject_ipv6: Unknown symbol
ip6_local_out (err 0)

And unfortunately, firewalld failed after, and I can not directly ssh
to the host.
Now that I found the solution, my process improved a lot (thank you
BTW for whoever included it in Fedora), but I guess other
distributions might hit the problem.

I would say such a trivial patch could easily go in one of the v3.18 RCs.

Cheers,
Benjamin


>>  net/ipv4/netfilter/nf_reject_ipv4.c |    3 +++
>>  net/ipv6/netfilter/nf_reject_ipv6.c |    4 ++++
>>  2 files changed, 7 insertions(+)
>>
>> diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
>> index b023b4e..92b303d 100644
>> --- a/net/ipv4/netfilter/nf_reject_ipv4.c
>> +++ b/net/ipv4/netfilter/nf_reject_ipv4.c
>> @@ -6,6 +6,7 @@
>>   * published by the Free Software Foundation.
>>   */
>>
>> +#include <linux/module.h>
>>  #include <net/ip.h>
>>  #include <net/tcp.h>
>>  #include <net/route.h>
>> @@ -125,3 +126,5 @@ void nf_send_reset(struct sk_buff *oldskb, int hook)
>>       kfree_skb(nskb);
>>  }
>>  EXPORT_SYMBOL_GPL(nf_send_reset);
>> +
>> +MODULE_LICENSE("GPL");
>> diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
>> index 5f5f043..20d9def 100644
>> --- a/net/ipv6/netfilter/nf_reject_ipv6.c
>> +++ b/net/ipv6/netfilter/nf_reject_ipv6.c
>> @@ -5,6 +5,8 @@
>>   * it under the terms of the GNU General Public License version 2 as
>>   * published by the Free Software Foundation.
>>   */
>> +
>> +#include <linux/module.h>
>>  #include <net/ipv6.h>
>>  #include <net/ip6_route.h>
>>  #include <net/ip6_fib.h>
>> @@ -161,3 +163,5 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
>>               ip6_local_out(nskb);
>>  }
>>  EXPORT_SYMBOL_GPL(nf_send_reset6);
>> +
>> +MODULE_LICENSE("GPL");
>> --
>> 1.7.10.4
>>

* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Alexander Duyck @ 2014-10-29 21:13 UTC
  To: Eric Dumazet, Jeff Kirsher; +Cc: netdev
In-Reply-To: <1414612620.631.98.camel@edumazet-glaptop2.roam.corp.google.com>


On 10/29/2014 12:57 PM, Eric Dumazet wrote:
> On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
>> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
>>> On 10/29/2014 07:49 AM, Eric Dumazet wrote:
>>>> Hi Alexander
>>>>
>>>> The memory barriers added in commit
>>>> b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>>> ("net: Add memory barriers to prevent possible race in byte queue
>>>> limits")
>>>>
>>>> have heavy cost.
>>>>
>>>> It seems we could use smp_load_acquire() and smp_store_release()
>>>> instead ?
>>>>
>>>> I'll post a patch later today. I would be interested if someone was able
>>>> to test it, as your commit apparently was tested and known to fix a
>>>> reproducible race.
>>>>
>>>> Thanks !
>> Eric- just CC me on the patch you post and I will see what I can do
>> about getting validation eyes on it.
> Thanks guys, will do, and will CC Paul as well.
>
> Alexander, here is the following profile showing the cost of the
> 'mfence', in a typical rpc workload (a lot of IRQ are generated for TX
> completions, because RPC tend to send small packets)
>
>    0.11 │       je     33a
>         │       mov    -0x3c(%rbp),%esi
>    0.06 │       lea    0xc0(%rbx),%rdi
>    0.06 │       callq  dql_completed
>    0.06 │       mfence
>   38.68 │       mov    0xc4(%rbx),%edx
>    1.83 │       mov    0xc0(%rbx),%eax
>         │       cmp    %eax,%edx
>    0.22 │       js     333
>    0.11 │       lock   btrl $0x1,0x98(%rbx)

It might be worthwhile to see if it would be possible to combine BQL 
with the mechanism the drivers have for handling descriptors/packets.  
Otherwise you are going to be pulling one barrier just to hit another 
right after it.

Also depending on what driver it is that the trace is from you may want 
to check and see if you have any MMIO transactions occurring right 
before you make the call, otherwise that may be the actual cause for the 
significant cost as you are having to flush non-coherent memory before 
you can resume operation.

Thanks,

Alex

* Re: [PATCH v3 00/15] net: dsa: Fixes and enhancements
From: Guenter Roeck @ 2014-10-29 21:39 UTC
  To: Florian Fainelli; +Cc: netdev, David S. Miller, Andrew Lunn, linux-kernel
In-Reply-To: <5451305B.7010303@gmail.com>

On Wed, Oct 29, 2014 at 11:22:19AM -0700, Florian Fainelli wrote:
> On 10/29/2014 10:44 AM, Guenter Roeck wrote:
> > Patch 01/15 addresses a bug indicated by an annoying and unhelpful
> > log message.
> > 
> > Patches 02/15 and 03/15 are minor enhancements, adding support for
> > known switch revisions.
> > 
> > Patches 04/15 and 05/15 add support for MV88E6352 and MV88E6176.
> > 
> > Patch 06/15 adds support for hardware monitoring, specifically for
> > reporting the chip temperature, to the dsa subsystem.
> > 
> > Patches 07/15 and 08/15 implement hardware monitoring for MV88E6352,
> > MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > Patch 09/15 and 10/15 add support for EEPROM access to the DSA subsystem.
> > 
> > Patch 11/15 implements EEPROM access for MV88E6352 and MV88E6176.
> > 
> > Patch 12/15 adds support for reading switch registers to the DSA
> > subsystem.
> > 
> > Patches 13/15 and 14/15 implement support for reading switch registers
> > to the drivers for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > Patch 15/15 adds support for reading additional RMON registers to the drivers
> > for  MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > The series was tested on top of v3.18-rc2 in an x86 system with MV88E6352.
> > Testing in systems with 88E6131, 88E6060 and MV88E6165 was done earlier
> > (I don't have access to those systems right now). The series was also build
> > tested using my build system at http://server.roeck-us.net:8010/builders.
> > Look into the 'dsa' column for build results.
> > 
> > The series merges cleanly into net-next as of today (10/29).
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Thanks Guenter!
> 
Thanks a lot for the review!

Guenter

* [PATCH net-next] ipv4: minor spelling fixes
From: Stephen Hemminger @ 2014-10-29 23:05 UTC
  To: David Miller; +Cc: netdev



Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/net/ipv4/geneve.c	2014-10-27 21:05:31.259174957 -0700
+++ b/net/ipv4/geneve.c	2014-10-27 21:05:31.255174943 -0700
@@ -104,7 +104,7 @@ static void geneve_build_header(struct g
 	memcpy(geneveh->options, options, options_len);
 }
 
-/* Transmit a fully formated Geneve frame.
+/* Transmit a fully formatted Geneve frame.
  *
  * When calling this function. The skb->data should point
  * to the geneve header which is fully formed.
--- a/net/ipv4/tcp_input.c	2014-10-27 21:05:31.259174957 -0700
+++ b/net/ipv4/tcp_input.c	2014-10-27 21:05:31.259174957 -0700
@@ -5865,7 +5865,7 @@ static inline void pr_drop_req(struct re
  * If we receive a SYN packet with these bits set, it means a
  * network is playing bad games with TOS bits. In order to
  * avoid possible false congestion notifications, we disable
- * TCP ECN negociation.
+ * TCP ECN negotiation.
  *
  * Exception: tcp_ca wants ECN. This is required for DCTCP
  * congestion control; it requires setting ECT on all packets,

* [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Murilo Opsfelder Araujo @ 2014-10-29 23:28 UTC
  To: linux-kernel
  Cc: linux-wireless, netdev, Larry Finger, Chaoming Li,
	John W. Linville, Mike Galbraith, Thadeu Cascardo, troy_tan,
	Murilo Opsfelder Araujo

This is a complement of commit 08054200117a95afc14c3d2ed3a38bf4e345bf78
"rtlwifi: Add check for get_btc_status callback".

With this patch, next-20141029 at least does not panic with rtl8192se
device.

Signed-off-by: Murilo Opsfelder Araujo <mopsfelder@gmail.com>
---

Hello, everyone.

Some days ago, I reported [1] that next-20140930 introduced an issue
with rtl8192se devices.

Later on, Larry Finger proposed [2] a fix that did not solve the
problem thoroughly.

This patch is based on Larry's one [3].  It also does not solve the
rtl8192se issue completely but I can at least boot next-20141029
without a panic.

The remaining issue is that the rtl8192se device does not associate.
It does not even show any wifi network available.  The device is shown
by iwconfig, but I cannot do anything with it.

I need help from someone out there that could provide me guidance or
possibly investigate the issue (I'm not a kernel expert yet).

I'd not like to see this regression landing on v3.18.

[1] http://marc.info/?l=linux-wireless&m=141403434929612
[2] http://marc.info/?l=linux-wireless&m=141408165513255
[3] http://marc.info/?l=linux-wireless&m=141416876810127
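
The diff below repeats the same two-part test at every call site: first
that the optional callback pointer is set, then that it returns true. As a
sketch of the pattern only, here is a user-space version with the test
folded into one helper. The struct, the helper, and the sample callbacks are
hypothetical names for illustration and are not part of rtlwifi or of this
patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for an ops table with an optional callback. */
struct model_ops {
	bool (*get_btc_status)(void);	/* may be NULL for some devices */
};

/* Safe query: a missing callback is treated as "feature disabled". */
static bool model_btc_enabled(const struct model_ops *ops)
{
	return ops->get_btc_status && ops->get_btc_status();
}

/* Sample callbacks used only to exercise the helper. */
static bool btc_on(void) { return true; }
static bool btc_off(void) { return false; }
```

The short-circuit && guarantees the pointer is never called when it is
NULL, which is the crash the added checks are meant to avoid.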

 drivers/net/wireless/rtlwifi/base.c |  6 ++++--
 drivers/net/wireless/rtlwifi/core.c |  9 ++++++---
 drivers/net/wireless/rtlwifi/pci.c  |  3 ++-
 drivers/net/wireless/rtlwifi/ps.c   | 12 ++++++++----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/base.c b/drivers/net/wireless/rtlwifi/base.c
index 40b6d1d..1a51577 100644
--- a/drivers/net/wireless/rtlwifi/base.c
+++ b/drivers/net/wireless/rtlwifi/base.c
@@ -1234,7 +1234,8 @@ EXPORT_SYMBOL_GPL(rtl_action_proc);
 static void setup_arp_tx(struct rtl_priv *rtlpriv, struct rtl_ps_ctl *ppsc)
 {
 	rtlpriv->ra.is_special_data = true;
-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_special_packet_notify(
 					rtlpriv, 1);
 	rtlpriv->enter_ps = false;
@@ -1629,7 +1630,8 @@ void rtl_watchdog_wq_callback(void *data)
 		}
 	}

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_periodical(rtlpriv);

 	rtlpriv->link_info.bcn_rx_inperiod = 0;
diff --git a/drivers/net/wireless/rtlwifi/core.c b/drivers/net/wireless/rtlwifi/core.c
index f6179bc..686d256 100644
--- a/drivers/net/wireless/rtlwifi/core.c
+++ b/drivers/net/wireless/rtlwifi/core.c
@@ -1133,7 +1133,8 @@ static void rtl_op_bss_info_changed(struct ieee80211_hw *hw,
 		ppsc->report_linked = (mstatus == RT_MEDIA_CONNECT) ?
 				      true : false;

-		if (rtlpriv->cfg->ops->get_btc_status())
+		if (rtlpriv->cfg->ops->get_btc_status &&
+		    rtlpriv->cfg->ops->get_btc_status())
 			rtlpriv->btcoexist.btc_ops->btc_mediastatus_notify(
 							rtlpriv, mstatus);
 	}
@@ -1373,7 +1374,8 @@ static void rtl_op_sw_scan_start(struct ieee80211_hw *hw)
 		return;
 	}

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_scan_notify(rtlpriv, 1);

 	if (rtlpriv->dm.supp_phymode_switch) {
@@ -1425,7 +1427,8 @@ static void rtl_op_sw_scan_complete(struct ieee80211_hw *hw)
 	}

 	rtlpriv->cfg->ops->scan_operation_backup(hw, SCAN_OPT_RESTORE);
-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_scan_notify(rtlpriv, 0);
 }

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 25daa87..ed3364d 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1833,7 +1833,8 @@ static void rtl_pci_stop(struct ieee80211_hw *hw)
 	unsigned long flags;
 	u8 RFInProgressTimeOut = 0;

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_halt_notify();

 	/*
diff --git a/drivers/net/wireless/rtlwifi/ps.c b/drivers/net/wireless/rtlwifi/ps.c
index b69321d..2278af9 100644
--- a/drivers/net/wireless/rtlwifi/ps.c
+++ b/drivers/net/wireless/rtlwifi/ps.c
@@ -261,7 +261,8 @@ void rtl_ips_nic_off_wq_callback(void *data)
 			ppsc->in_powersavemode = true;

 			/* call before RF off */
-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_ips_notify(rtlpriv,
 									ppsc->inactive_pwrstate);

@@ -306,7 +307,8 @@ void rtl_ips_nic_on(struct ieee80211_hw *hw)
 			ppsc->in_powersavemode = false;
 			_rtl_ps_inactive_ps(hw);
 			/* call after RF on */
-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_ips_notify(rtlpriv,
 									ppsc->inactive_pwrstate);
 		}
@@ -390,14 +392,16 @@ void rtl_lps_set_psmode(struct ieee80211_hw *hw, u8 rt_psmode)
 			if (ppsc->p2p_ps_info.opp_ps)
 				rtl_p2p_ps_cmd(hw , P2P_PS_ENABLE);

-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_lps_notify(rtlpriv, rt_psmode);
 		} else {
 			if (rtl_get_fwlps_doze(hw)) {
 				RT_TRACE(rtlpriv, COMP_RF, DBG_DMESG,
 					 "FW LPS enter ps_mode:%x\n",
 					 ppsc->fwctrl_psmode);
-				if (rtlpriv->cfg->ops->get_btc_status())
+				if (rtlpriv->cfg->ops->get_btc_status &&
+				    rtlpriv->cfg->ops->get_btc_status())
 					rtlpriv->btcoexist.btc_ops->btc_lps_notify(rtlpriv, rt_psmode);
 				enter_fwlps = true;
 				ppsc->pwr_mode = ppsc->fwctrl_psmode;
--
2.1.2

^ permalink raw reply related
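
The guard repeated throughout the rtlwifi patch above (test that an optional
callback is set before calling it) can be sketched in isolation. This is a
minimal user-space illustration, not driver code: the demo_* types and
functions are hypothetical stand-ins for rtlpriv->cfg->ops and the btcoexist
notifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the rtlwifi types. */
struct demo_ops {
	bool (*get_btc_status)(void);	/* optional: may be left NULL */
};

struct demo_priv {
	const struct demo_ops *ops;
	int notify_count;		/* counts coexistence notifications */
};

bool demo_btc_enabled(void)
{
	return true;
}

/* True only when the callback exists and reports "enabled". */
bool demo_btc_active(const struct demo_priv *priv)
{
	assert(priv && priv->ops);
	return priv->ops->get_btc_status && priv->ops->get_btc_status();
}

/* Mirrors the shape of the patched call sites: notify only when active. */
void demo_scan_start(struct demo_priv *priv)
{
	if (demo_btc_active(priv))
		priv->notify_count++;
}
```

Folding the two-part condition into one helper, as demo_btc_active() does
here, is only one possible design; the patch itself open-codes the check at
each call site.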

* [PATCH net-next] mlx4: use napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29 23:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Amir Vadai

From: Eric Dumazet <edumazet@google.com>

mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.

They can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c8e75dab80553c876b195361456fb49587231055..c562c1468944f9ad4319e5faaf19bf9e66d15eaf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -878,8 +878,8 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 34c137878545fc672dad1a3d86e11c034c0ac368..5c4062921cdf46f1a7021a39705275c33ca4de77 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -479,8 +479,8 @@ void mlx4_en_tx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }

^ permalink raw reply related
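
Why the irqoff variant is sufficient in a hard interrupt handler can be shown
with a toy model. The sketch below is a user-space illustration under the
assumption that the general variant has to save and restore the interrupt
state itself while the irqoff variant relies on the caller already running
with interrupts masked. All demo_* names are hypothetical; none of this is
the kernel NAPI implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-CPU interrupt state; all names are hypothetical. */
struct demo_cpu {
	bool irqs_enabled;
	unsigned int flag_saves;	/* how often IRQ state was saved/restored */
	unsigned int scheduled;		/* poll requests queued */
};

/* General variant: callable from any context, so it masks IRQs itself. */
void demo_schedule(struct demo_cpu *cpu)
{
	bool saved = cpu->irqs_enabled;	/* plays the role of local_irq_save() */

	cpu->irqs_enabled = false;
	cpu->flag_saves++;
	cpu->scheduled++;
	cpu->irqs_enabled = saved;	/* plays the role of local_irq_restore() */
}

/* irqoff variant: the caller guarantees IRQs are already masked. */
void demo_schedule_irqoff(struct demo_cpu *cpu)
{
	assert(!cpu->irqs_enabled);
	cpu->scheduled++;
}
```

In this model both calls queue the same work, but only demo_schedule() pays
for the save/restore pair, which is the saving the patch is after.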

* [PATCH net-next] bnx2x: use napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-30  0:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ariel Elior

From: Eric Dumazet <edumazet@google.com>

bnx2x_msix_fp_int() and bnx2x_interrupt() run from hard interrupt
context.

They can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 40beef5bca88ade51dd5248550c45059b8774476..e9af4af5edbaf2e464a2250038e0505aafbbcd06 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1139,7 +1139,7 @@ static irqreturn_t bnx2x_msix_fp_int(int irq, void *fp_cookie)
 		prefetch(fp->txdata_ptr[cos]->tx_cons_sb);
 
 	prefetch(&fp->sb_running_index[SM_RX_ID]);
-	napi_schedule(&bnx2x_fp(bp, fp->index, napi));
+	napi_schedule_irqoff(&bnx2x_fp(bp, fp->index, napi));
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fbf9ea7bd878e4ee3f7d1561f4b74afe46ca54..c4bd025c74c96496a4709a4505da8a2d0f028df4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -1931,7 +1931,7 @@ irqreturn_t bnx2x_interrupt(int irq, void *dev_instance)
 			for_each_cos_in_tx_queue(fp, cos)
 				prefetch(fp->txdata_ptr[cos]->tx_cons_sb);
 			prefetch(&fp->sb_running_index[SM_RX_ID]);
-			napi_schedule(&bnx2x_fp(bp, fp->index, napi));
+			napi_schedule_irqoff(&bnx2x_fp(bp, fp->index, napi));
 			status &= ~mask;
 		}
 	}

^ permalink raw reply related

* [PATCH net] cxgb4 : Fix missing initialization of win0_lock
From: Anish Bhatt @ 2014-10-30  0:54 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, Anish Bhatt

win0_lock was being used uninitialized, which is simply wrong and produces
warning traces when lock debugging is enabled.

Fixes : fc5ab0209650 ('cxgb4: Replaced the backdoor mechanism to access the HW
 memory with PCIe Window method')

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 97683c1..8520d55 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6614,6 +6614,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->tid_release_lock);
+	spin_lock_init(&adapter->win0_lock);
 
 	INIT_WORK(&adapter->tid_release_task, process_tid_release_list);
 	INIT_WORK(&adapter->db_full_task, process_db_full);
-- 
2.1.2

^ permalink raw reply related

* [PATCH v1 0/2]  drivers: net: xgene: Fix crash for backward compatibility
From: Iyappan Subramanian @ 2014-10-30  0:56 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian

This patch set fixes the following issues that were reported during regression.

Patch 1/2: Disables 10GbE and SGMII based 1GbE by default for backward
	   compatibility with older firmware (<= 1.13.28).  Newer firmware
	   will enable these interfaces based on its configuration.
Patch 2/2: Use separate hardware resources (descriptor ring, prefetch buffer)
	   that are not shared with the firmware
---

Iyappan Subramanian (2):
  dtb: xgene: fix: Disable 10GbE and SGMII based 1GbE by default
  drivers: net: xgene: fix: Use separate resources

 arch/arm64/boot/dts/apm-mustang.dts              | 8 --------
 arch/arm64/boot/dts/apm-storm.dtsi               | 4 ++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 6 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 3 +++
 4 files changed, 8 insertions(+), 13 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH v1 1/2] dtb: xgene: fix: Disable 10GbE and SGMII based 1GbE by default
From: Iyappan Subramanian @ 2014-10-30  0:56 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian
In-Reply-To: <1414630580-24640-1-git-send-email-isubramanian@apm.com>

This patch disables 10GbE and SGMII based 1GbE interfaces by default
for backward compatibility with older firmware, which doesn't support
these interfaces.

The following kernel crash was reported when using older firmware (<= 1.13.28).

[    0.980000] libphy: APM X-Gene MDIO bus: probed
[    1.130000] Unhandled fault: synchronous external abort (0x96000010) at 0xffffff800009a17c
[    1.140000] Internal error: : 96000010 [#1] SMP
[    1.140000] Modules linked in:
[    1.140000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #21
[    1.140000] task: ffffffc3f0110000 ti: ffffffc3f0064000 task.ti: ffffffc3f0064000
[    1.140000] PC is at ioread32+0x58/0x68
[    1.140000] LR is at xgene_enet_setup_ring+0x18c/0x1cc
[    1.140000] pc : [<ffffffc0003cec68>] lr : [<ffffffc00053dad8>] pstate: a0000045
[    1.140000] sp : ffffffc3f0067b20
[    1.140000] x29: ffffffc3f0067b20 x28: ffffffc000aa8ea0
[    1.140000] x27: ffffffc000bb2000 x26: ffffffc000a64270
[    1.140000] x25: ffffffc000b05ad8 x24: ffffffc0ff99ba58
[    1.140000] x23: 0000000000004000 x22: 0000000000004000
[    1.140000] x21: 0000000000000200 x20: 0000000000200000
[    1.140000] x19: ffffffc0ff99ba18 x18: ffffffc0007a6000
[    1.140000] x17: 0000000000000007 x16: 000000000000000e
[    1.140000] x15: 0000000000000001 x14: 0000000000000000
[    1.140000] x13: ffffffbeedb71320 x12: 00000000ffffff80
[    1.140000] x11: 0000000000000002 x10: 0000000000000000
[    1.140000] x9 : 0000000000000000 x8 : ffffffc3eb2a4000
[    1.140000] x7 : 0000000000000000 x6 : 0000000000000000
[    1.140000] x5 : 0000000001080000 x4 : 000000007d654010
[    1.140000] x3 : ffffffffffffffff x2 : 000000000003ffff
[    1.140000] x1 : ffffff800009a17c x0 : ffffff800009a17c

The issue was that the older firmware does not support 10GbE and
SGMII based 1GbE interfaces.

The newer firmware (version 1.13.29) will support 10GbE and SGMII based 1GbE
and it will patch the dtb to enable these nodes on the fly.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Reported-by: Dann Frazier <dann.frazier@canonical.com>
---
 arch/arm64/boot/dts/apm-mustang.dts | 8 --------
 arch/arm64/boot/dts/apm-storm.dtsi  | 4 ++--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/boot/dts/apm-mustang.dts b/arch/arm64/boot/dts/apm-mustang.dts
index 2e25de0..f649000 100644
--- a/arch/arm64/boot/dts/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm-mustang.dts
@@ -40,11 +40,3 @@
 &menet {
 	status = "ok";
 };
-
-&sgenet0 {
-	status = "ok";
-};
-
-&xgenet {
-	status = "ok";
-};
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index 295c72d..52488c8 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -621,7 +621,7 @@
 			};
 		};
 
-		sgenet0: ethernet@1f210000 {
+		sgenet0: sgenet@1f210000 {
 			compatible = "apm,xgene-enet";
 			status = "disabled";
 			reg = <0x0 0x1f210000 0x0 0x10000>,
@@ -635,7 +635,7 @@
 			phy-connection-type = "sgmii";
 		};
 
-		xgenet: ethernet@1f610000 {
+		xgenet: xgenet@1f610000 {
 			compatible = "apm,xgene-enet";
 			status = "disabled";
 			reg = <0x0 0x1f610000 0x0 0xd100>,
-- 
1.9.1

^ permalink raw reply related

* [PATCH v1 2/2] drivers: net: xgene: fix: Use separate resources
From: Iyappan Subramanian @ 2014-10-30  0:56 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian
In-Reply-To: <1414630580-24640-1-git-send-email-isubramanian@apm.com>

This patch fixes the following kernel crash during SGMII based 1GbE probe.

	BUG: Bad page state in process swapper/0  pfn:40fe6ad
	page:ffffffbee37a75d8 count:-1 mapcount:0 mapping:          (null) index:0x0
	flags: 0x0()
	page dumped because: nonzero _count
	Modules linked in:
	CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0+ #7
	Call trace:
	[<ffffffc000087fa0>] dump_backtrace+0x0/0x12c
	[<ffffffc0000880dc>] show_stack+0x10/0x1c
	[<ffffffc0004d981c>] dump_stack+0x74/0xc4
	[<ffffffc00012fe70>] bad_page+0xd8/0x128
	[<ffffffc000133000>] get_page_from_freelist+0x4b8/0x640
	[<ffffffc000133260>] __alloc_pages_nodemask+0xd8/0x834
	[<ffffffc0004194f8>] __netdev_alloc_frag+0x124/0x1b8
	[<ffffffc00041bfdc>] __netdev_alloc_skb+0x90/0x10c
	[<ffffffc00039ff30>] xgene_enet_refill_bufpool+0x11c/0x280
	[<ffffffc0003a11a4>] xgene_enet_process_ring+0x168/0x340
	[<ffffffc0003a1498>] xgene_enet_napi+0x1c/0x50
	[<ffffffc00042b454>] net_rx_action+0xc8/0x18c
	[<ffffffc0000b0880>] __do_softirq+0x114/0x24c
	[<ffffffc0000b0c34>] irq_exit+0x94/0xc8
	[<ffffffc0000e68a0>] __handle_domain_irq+0x8c/0xf4
	[<ffffffc000081288>] gic_handle_irq+0x30/0x7c

This was due to a hardware resource sharing conflict with the firmware. This
patch fixes this crash by using resources (descriptor ring, prefetch buffer)
that are not shared.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 6 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 3c208cc..ac37e26 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -639,9 +639,9 @@ static int xgene_enet_create_desc_rings(struct net_device *ndev)
 	struct device *dev = ndev_to_dev(ndev);
 	struct xgene_enet_desc_ring *rx_ring, *tx_ring, *cp_ring;
 	struct xgene_enet_desc_ring *buf_pool = NULL;
-	u8 cpu_bufnum = 0, eth_bufnum = 0;
-	u8 bp_bufnum = 0x20;
-	u16 ring_id, ring_num = 0;
+	u8 cpu_bufnum = 0, eth_bufnum = START_ETH_BUFNUM;
+	u8 bp_bufnum = START_BP_BUFNUM;
+	u16 ring_id, ring_num = START_RING_NUM;
 	int ret;
 
 	/* allocate rx descriptor ring */
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 874e5a0..7d8b6ea 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -38,6 +38,9 @@
 #define SKB_BUFFER_SIZE		(XGENE_ENET_MAX_MTU - NET_IP_ALIGN)
 #define NUM_PKT_BUF	64
 #define NUM_BUFPOOL	32
+#define START_ETH_BUFNUM	2
+#define START_BP_BUFNUM		0x22
+#define START_RING_NUM		8
 
 #define PHY_POLL_LINK_ON	(10 * HZ)
 #define PHY_POLL_LINK_OFF	(PHY_POLL_LINK_ON / 5)
-- 
1.9.1

^ permalink raw reply related
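
The effect of the new START_* constants in the xgene patch above can be
sketched as an allocator that begins numbering at an offset instead of zero.
The three constants are taken from the patch; the demo_ids structure and its
helpers are hypothetical, and the idea that the skipped low numbers are the
ones the firmware uses is an assumption drawn from the commit message, not
something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch. */
#define START_ETH_BUFNUM	2
#define START_BP_BUFNUM		0x22
#define START_RING_NUM		8

/* Hypothetical bookkeeping for the next free identifiers. */
struct demo_ids {
	uint8_t eth_bufnum;
	uint8_t bp_bufnum;
	uint16_t ring_num;
};

/* Start every counter at its offset rather than at zero. */
void demo_ids_init(struct demo_ids *ids)
{
	assert(ids);
	ids->eth_bufnum = START_ETH_BUFNUM;
	ids->bp_bufnum = START_BP_BUFNUM;
	ids->ring_num = START_RING_NUM;
}

/* Hands out the next ring number; never one below START_RING_NUM. */
uint16_t demo_next_ring_num(struct demo_ids *ids)
{
	return ids->ring_num++;
}
```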

* [PATCH] Bluetooth: Revert "Bluetooth: rfcomm: Remove unnecessary krfcommd event"
From: Sasha Levin @ 2014-10-30  1:32 UTC (permalink / raw)
  To: marcel, gustavo, johan.hedberg
  Cc: peter, davem, linux-bluetooth, netdev, linux-kernel, Sasha Levin

This reverts commit e5842cdb0f4f2c68f6acd39e286e5d10d8c073e8.

We can't call rfcomm_process_sessions() while our task state is not
TASK_RUNNING since rfcomm_process_sessions() tries to lock mutexes
and sleep. The scheduler even complains about it:

[   21.683959] WARNING: CPU: 13 PID: 8165 at kernel/sched/core.c:7305 __might_sleep+0xe5/0x1b0()
[   21.683962] do not call blocking ops when !TASK_RUNNING; state=1 set at rfcomm_run (net/bluetooth/rfcomm/core.c:2096)
[   21.683963] Modules linked in:
[   21.683966] CPU: 13 PID: 8165 Comm: krfcommd Tainted: G        W      3.18.0-rc2-next-20141029-sasha-00035-gd14bbcb-dirty #1425
[   21.683969]  ffffffffae2b4d0e 0000000000000000 ffff8805c0b23c00 ffff8805c0b23b98
[   21.683972]  ffffffffad010b76 0000000000000000 ffff8805c0b23bf8 ffff8805c0b23be8
[   21.683975]  ffffffffa3298dd8 ffffffffad0a0910 ffffffffa3309f95 ffff8805c0b23bc8
[   21.683976] Call Trace:
[   21.683979] dump_stack (lib/dump_stack.c:52)
[   21.683982] warn_slowpath_common (kernel/panic.c:432)
[   21.683985] ? __schedule (kernel/sched/core.c:2840)
[   21.683987] ? __might_sleep (kernel/sched/core.c:7311)
[   21.683990] warn_slowpath_fmt (kernel/panic.c:446)
[   21.683993] ? rfcomm_run (net/bluetooth/rfcomm/core.c:2096)
[   21.683996] ? rfcomm_run (net/bluetooth/rfcomm/core.c:2096)
[   21.683999] __might_sleep (kernel/sched/core.c:7311)
[   21.684002] mutex_lock_nested (kernel/locking/mutex.c:508 kernel/locking/mutex.c:622)
[   21.684004] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2937 kernel/sched/core.c:2845)
[   21.684008] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[   21.684011] rfcomm_run (net/bluetooth/rfcomm/core.c:1990 net/bluetooth/rfcomm/core.c:2102)
[   21.684014] ? preempt_count_sub (kernel/sched/core.c:2641)
[   21.684017] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2937 kernel/sched/core.c:2845)
[   21.684020] ? rfcomm_process_rx (net/bluetooth/rfcomm/core.c:2088)
[   21.684023] ? rfcomm_process_rx (net/bluetooth/rfcomm/core.c:2088)
[   21.684025] kthread (kernel/kthread.c:207)
[   21.684029] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2559 kernel/locking/lockdep.c:2601)
[   21.684032] ? flush_kthread_work (kernel/kthread.c:176)
[   21.684035] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[   21.684038] ? flush_kthread_work (kernel/kthread.c:176)

Instead, just go back to the old way of tracking wakeups.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/bluetooth/rfcomm/core.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index bce9c3d..942de7765 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -48,6 +48,7 @@ static DEFINE_MUTEX(rfcomm_mutex);
 #define rfcomm_lock()	mutex_lock(&rfcomm_mutex)
 #define rfcomm_unlock()	mutex_unlock(&rfcomm_mutex)
 
+static unsigned long rfcomm_event;
 
 static LIST_HEAD(session_list);
 
@@ -105,6 +106,7 @@ static void rfcomm_schedule(void)
 {
 	if (!rfcomm_thread)
 		return;
+	set_bit(RFCOMM_SCHED_WAKEUP, &rfcomm_event);
 	wake_up_process(rfcomm_thread);
 }
 
@@ -2092,18 +2094,19 @@ static int rfcomm_run(void *unused)
 
 	rfcomm_add_listener(BDADDR_ANY);
 
-	while (1) {
+	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
-
-		if (kthread_should_stop())
-			break;
+		if (!test_bit(RFCOMM_SCHED_WAKEUP, &rfcomm_event)) {
+			/* No pending events. Let's sleep.
+			 * Incoming connections and data will wake us up. */
+			schedule();
+		}
+		set_current_state(TASK_RUNNING);
 
 		/* Process stuff */
+		clear_bit(RFCOMM_SCHED_WAKEUP, &rfcomm_event);
 		rfcomm_process_sessions();
-
-		schedule();
 	}
-	__set_current_state(TASK_RUNNING);
 
 	rfcomm_kill_listener();
 
-- 
1.7.10.4

^ permalink raw reply related
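
The loop restored by the rfcomm revert above follows a common shape: the
waker records a pending event before waking the thread, and the thread only
sleeps when no event is pending, then returns to the running state before
doing anything that may block. Below is a single-threaded user-space model of
that ordering, with no real scheduler or atomics involved. The demo_* names
are hypothetical, and wakeup_pending stands in for the RFCOMM_SCHED_WAKEUP
bit.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_state { DEMO_RUNNING, DEMO_INTERRUPTIBLE };

struct demo_worker {
	enum demo_state state;
	bool wakeup_pending;	/* plays the role of the RFCOMM_SCHED_WAKEUP bit */
	unsigned int sleeps;	/* how often the worker would have slept */
	unsigned int processed;	/* how often the work function ran */
};

/* Waker side: record the event first, then wake the worker. */
void demo_schedule_work(struct demo_worker *w)
{
	w->wakeup_pending = true;
	w->state = DEMO_RUNNING;	/* plays the role of wake_up_process() */
}

/* One pass of the worker loop body. */
void demo_worker_iteration(struct demo_worker *w)
{
	w->state = DEMO_INTERRUPTIBLE;
	if (!w->wakeup_pending)
		w->sleeps++;		/* where the real loop calls schedule() */
	w->state = DEMO_RUNNING;

	/* Work that may block runs only while the state is "running". */
	assert(w->state == DEMO_RUNNING);
	w->wakeup_pending = false;
	w->processed++;
}
```

With an event already pending, the first iteration skips the sleep and
processes immediately; a later iteration with nothing pending sleeps first.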

* [PATCH -next] syncookies: only increment SYNCOOKIESFAILED on validation error
From: Florian Westphal @ 2014-10-30  1:55 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Only count packets that failed cookie authentication.
Otherwise SYNCOOKIESFAILED can be > 0 even though we never sent a single cookie.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/syncookies.c | 7 +++++--
 net/ipv6/syncookies.c | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 32b98d0..4ac7bca 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -275,8 +275,11 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 	if (!sysctl_tcp_syncookies || !th->ack || th->rst)
 		goto out;
 
-	if (tcp_synq_no_recent_overflow(sk) ||
-	    (mss = __cookie_v4_check(ip_hdr(skb), th, cookie)) == 0) {
+	if (tcp_synq_no_recent_overflow(sk))
+		goto out;
+
+	mss = __cookie_v4_check(ip_hdr(skb), th, cookie);
+	if (mss == 0) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
 		goto out;
 	}
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 0e26e79..be291ba 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -171,8 +171,11 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	if (!sysctl_tcp_syncookies || !th->ack || th->rst)
 		goto out;
 
-	if (tcp_synq_no_recent_overflow(sk) ||
-		(mss = __cookie_v6_check(ipv6_hdr(skb), th, cookie)) == 0) {
+	if (tcp_synq_no_recent_overflow(sk))
+		goto out;
+
+	mss = __cookie_v6_check(ipv6_hdr(skb), th, cookie);
+	if (mss == 0) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
 		goto out;
 	}
-- 
2.0.4

^ permalink raw reply related
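
The control-flow change in the syncookies patch above can be sketched as
follows. This is a simplified user-space model: demo_stats and
demo_cookie_check() are hypothetical, recent_overflow is the inverse of what
tcp_synq_no_recent_overflow() reports, and cookie_mss stands in for the value
returned by the cookie validation (0 meaning the cookie is invalid).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for LINUX_MIB_SYNCOOKIESFAILED. */
struct demo_stats {
	unsigned int syncookies_failed;
};

/* Returns the decoded MSS, or 0 when the ACK is not accepted as a cookie. */
int demo_cookie_check(struct demo_stats *st, bool recent_overflow,
		      int cookie_mss)
{
	assert(st);

	/* No recent queue overflow: no cookie was sent, so nothing failed. */
	if (!recent_overflow)
		return 0;

	/* Only a cookie that fails validation bumps the counter. */
	if (cookie_mss == 0) {
		st->syncookies_failed++;
		return 0;
	}
	return cookie_mss;
}
```

Before the split, both rejection paths shared one branch, so in this model
the first case would also have incremented the counter.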

* Re: TCP NewReno and single retransmit
From: Neal Cardwell @ 2014-10-30  2:03 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, Yuchung Cheng, Eric Dumazet
In-Reply-To: <544E93BD.50202@redhat.com>

On Mon, Oct 27, 2014 at 2:49 PM, Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
> Hi,
>
> We have a report from a customer saying that on a very calm connection, like
> having only a single data packet within some minutes, if this packet gets to
> be re-transmitted, retrans_stamp is only cleared when the next acked packet
> is received. But this may make we abort the connection too soon if this next
> packet also gets lost, because the reference for the initial loss is still
> for a big while ago..
...
> @@ -2382,31 +2382,32 @@ static inline bool tcp_may_undo(const struct
> tcp_sock *tp)
>  static bool tcp_try_undo_recovery(struct sock *sk)
...
>         if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
>                 /* Hold old state until something *above* high_seq
>                  * is ACKed. For Reno it is MUST to prevent false
>                  * fast retransmits (RFC2582). SACK TCP is safe. */
>                 tcp_moderate_cwnd(tp);
> +               tp->retrans_stamp = 0;
>                 return true;
>         }
>         tcp_set_ca_state(sk, TCP_CA_Open);
>         return false;
>  }
>
> We would still hold state, at least part of it.. WDYT?

This approach sounds OK to me as long as we include a check of
tcp_any_retrans_done(), as we do in the similar code paths (for
motivation, see the comment above tcp_any_retrans_done()).

So it sounds fine to me if you change that one new line to the following 2:

+  if (!tcp_any_retrans_done(sk))
+    tp->retrans_stamp = 0;

Nice catch!

neal

^ permalink raw reply
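
The two-line variant suggested in the reply above can be modelled like this.
It is a sketch under stated assumptions: demo_tcp is a hypothetical
two-field stand-in for the socket state, and retrans_pending replaces the
real tcp_any_retrans_done() check with a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct tcp_sock. */
struct demo_tcp {
	uint32_t retrans_stamp;	/* time of the first retransmit, 0 = none */
	bool retrans_pending;	/* stand-in for tcp_any_retrans_done() */
};

/* Clear the stamp only when no retransmission is still unaccounted for. */
void demo_finish_reno_recovery(struct demo_tcp *tp)
{
	assert(tp);
	if (!tp->retrans_pending)
		tp->retrans_stamp = 0;
}
```

Keeping the stamp while a retransmission is still pending means a later
timeout is still measured from the original loss in that case.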
