Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] neigh: optimize neigh_parms_release()
From: David Miller @ 2014-10-29 20:12 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <1414607371-4246-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 29 Oct 2014 19:29:31 +0100

> In neigh_parms_release() we loop over all entries to find the entry given in
> argument and being able to remove it from the list. By using a double linked
> list, we can avoid this loop.
> 
> Here are some numbers with 30 000 dummy interfaces configured:
> 
> Before the patch:
> $ time rmmod dummy
> real	2m0.118s
> user	0m0.000s
> sys	1m50.048s
> 
> After the patch:
> $ time rmmod dummy
> real	1m9.970s
> user	0m0.000s
> sys	0m47.976s
> 
> Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Looks great, applied, thanks Nicolas.

^ permalink raw reply

* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: David Miller @ 2014-10-29 20:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1414544713.631.30.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 28 Oct 2014 18:05:13 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> napi_schedule() can be called from any context and has to mask hard
> irqs.
> 
> Add a variant that can only be called from hard interrupts handlers
> or when irqs are already masked.
> 
> Many NIC drivers can use it from their hard IRQ handler instead of
> generic variant.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 20:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029195054.GH10501@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > 
> > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > Yuck. No. You are just papering over the problem.
> > > > 
> > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > the interrupt line is shared with a real threaded interrupt user?
> > > > 
> > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > the hardware interrupt against netpoll instead of using
> > > > disable/enable_irq().
> > > > 
> > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > the chance of contention is pretty low. If done right it will be a
> > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > 
> > > 
> > > OK a little something like so then I suppose.. But I suspect most all
> > > the network drivers will need this and maybe more, disable_irq() is a
> > > popular little thing and we 'just' changed semantics on them.
> > 
> > We changed that almost 4 years ago :) What we 'just' did was to add a
> > prominent warning into the code.
> 
> You know that is the same right... they didn't know it was broken
> therefore it wasn't :-), but now they need to go actually do stuff about
> it, an entirely different proposition.

Right, and of course the world and some more has the very same code
there:

poll_controller()
{
	disable_irq();
	dev_interrupt_handler();
	enable_irq();
}

Trying to twist my brain to come up with a solution which avoids the
spinlock, but I have a hard time to come up with one.

The only thing I came up with so far is to avoid adding locks to every
driver incarnation and instead put it into struct net_device and
provide helper functions for the lock/unlock case.

That does not change the fact that we need to deal with that on a per
driver basis :(

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCHv1 0/2 net-next] xen-netback: minor cleanups
From: David Miller @ 2014-10-29 20:00 UTC (permalink / raw)
  To: david.vrabel; +Cc: netdev, xen-devel, ian.campbell, wei.liu2
In-Reply-To: <1414510171-12853-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Tue, 28 Oct 2014 15:29:29 +0000

> Two minor xen-netback cleanups originally from Zoltan.

Series applied, thanks everyone.

^ permalink raw reply

* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Eric Dumazet @ 2014-10-29 19:57 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Alexander Duyck, netdev
In-Reply-To: <1414610868.2420.52.camel@jtkirshe-mobl>

On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
> > On 10/29/2014 07:49 AM, Eric Dumazet wrote:
> > > Hi Alexander
> > >
> > > The memory barriers added in commit
> > > b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> > > ("net: Add memory barriers to prevent possible race in byte queue
> > > limits")
> > >
> > > have heavy cost.
> > >
> > > It seems we could use smp_load_acquire() and smp_store_release()
> > > instead ?
> > >
> > > I'll post a patch later today. I would be interested if someone was able
> > > to test it, as your commit apparently was tested and known to fix a
> > > reproducible race.
> > >
> > > Thanks !
> 
> Eric- just CC me on the patch you post and I will see what I can do
> about getting validation eyes on it.

Thanks guys, will do, and will CC Paul as well.

Alexander, here is the following profile showing the cost of the
'mfence', in a typical rpc workload (a lot of IRQ are generated for TX
completions, because RPC tend to send small packets)

  0.11 │       je     33a
       │       mov    -0x3c(%rbp),%esi
  0.06 │       lea    0xc0(%rbx),%rdi
  0.06 │       callq  dql_completed
  0.06 │       mfence
 38.68 │       mov    0xc4(%rbx),%edx
  1.83 │       mov    0xc0(%rbx),%eax
       │       cmp    %eax,%edx
  0.22 │       js     333
  0.11 │       lock   btrl $0x1,0x98(%rbx)

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:53 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Peter Zijlstra, Sabrina Dubroca, netdev, linux-kernel
In-Reply-To: <1414611641.2420.54.camel@jtkirshe-mobl>

On Wed, 29 Oct 2014, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> Thomas- if you are fine with Peter's patch, I can get this under
> testing.

I'm fine with it except for the comment part of disable_irq(), but
that does not matter :)

One nitpick: Instead of having the lock unconditionally, I'd make it
depend on CONFIG_NET_POLL_CONTROLLER.

#ifdef CONFIG_NET_POLL_CONTROLLER
static inline void netpoll_lock(struct e1000_adapter *adapter)
{
	spin_lock(&adapter->irq_lock);
}

static inline void netpoll_unlock(struct e1000_adapter *adapter)
{
	spin_unlock(&adapter->irq_lock);
}
#else
static inline void netpoll_lock(struct e1000_adapter *adapter) { }
static inline void netpoll_unlock(struct e1000_adapter *adapter) { }
#endif

and use that instead of the unconditional spin[un]lock() invocations.

But that's up to you.

Thanks,

	tglx

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 19:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292046270.5308@nanos>

On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> 
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> We changed that almost 4 years ago :) What we 'just' did was to add a
> prominent warning into the code.

You know that is the same right... they didn't know it was broken
therefore it wasn't :-), but now they need to go actually do stuff about
it, an entirely different proposition.

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029193603.GS12706@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> > 
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> > 
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> > 
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > 
> 
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.

We changed that almost 4 years ago :) What we 'just' did was to add a
prominent warning into the code.
 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>  kernel/irq/manage.c                           |  2 +-
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
>  	struct delayed_work watchdog_task;
>  	struct delayed_work fifo_stall_task;
>  	struct delayed_work phy_info_task;
> +
> +	spinlock_t irq_lock;
>  };
>  
>  enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
>  	e1000_irq_disable(adapter);
>  
>  	spin_lock_init(&adapter->stats_lock);
> +	spin_lock_init(&adapter->irq_lock);
>  
>  	set_bit(__E1000_DOWN, &adapter->flags);
>  
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
>   * @irq: interrupt number
>   * @data: pointer to a network interface device structure
>   **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
>  {
> -	struct net_device *netdev = data;
> -	struct e1000_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 icr = er32(ICR);
>  
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> +	struct net_device *netdev = data;
> +	struct e1000_adapter *adapter = netdev_priv(netdev);
> +	irqreturn_t ret;
> +
> +	spin_lock(&adapter->irq_lock);
> +	ret = __e1000_intr(irq, adapter);
> +	spin_unlock(&adapter->irq_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * e1000_clean - NAPI Rx polling callback
>   * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
>  {
>  	struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -	disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock)
>  	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock)
>  }
>  #endif
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
>   *	to complete before returning. If you use this function while
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
> - *	This function may be called - with care - from IRQ context.
> + *	This function may _NOT_ be called from IRQ context.

It can only be called from preemptible thread context.

Thanks,

	tglx

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Jeff Kirsher @ 2014-10-29 19:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Sabrina Dubroca, netdev, linux-kernel
In-Reply-To: <20141029193603.GS12706@worktop.programming.kicks-ass.net>

[-- Attachment #1: Type: text/plain, Size: 4206 bytes --]

On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> > 
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> > 
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> > 
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > 
> 
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.

Thomas- if you are fine with Peter's patch, I can get this under
testing.

> 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>  kernel/irq/manage.c                           |  2 +-
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
>  	struct delayed_work watchdog_task;
>  	struct delayed_work fifo_stall_task;
>  	struct delayed_work phy_info_task;
> +
> +	spinlock_t irq_lock;
>  };
>  
>  enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
>  	e1000_irq_disable(adapter);
>  
>  	spin_lock_init(&adapter->stats_lock);
> +	spin_lock_init(&adapter->irq_lock);
>  
>  	set_bit(__E1000_DOWN, &adapter->flags);
>  
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
>   * @irq: interrupt number
>   * @data: pointer to a network interface device structure
>   **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
>  {
> -	struct net_device *netdev = data;
> -	struct e1000_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 icr = er32(ICR);
>  
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> +	struct net_device *netdev = data;
> +	struct e1000_adapter *adapter = netdev_priv(netdev);
> +	irqreturn_t ret;
> +
> +	spin_lock(&adapter->irq_lock);
> +	ret = __e1000_intr(irq, adapter);
> +	spin_unlock(&adapter->irq_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * e1000_clean - NAPI Rx polling callback
>   * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
>  {
>  	struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -	disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock)
>  	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock)
>  }
>  #endif
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
>   *	to complete before returning. If you use this function while
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
> - *	This function may be called - with care - from IRQ context.
> + *	This function may _NOT_ be called from IRQ context.
>   */
>  void disable_irq(unsigned int irq)
>  {



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 19:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410291918060.5308@nanos>

On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> Yuck. No. You are just papering over the problem.
> 
> What happens if you add 'threadirqs' to the kernel command line? Or if
> the interrupt line is shared with a real threaded interrupt user?
> 
> The proper solution is to have a poll_lock for e1000 which serializes
> the hardware interrupt against netpoll instead of using
> disable/enable_irq().
> 
> In fact that's less expensive than the disable/enable_irq() dance and
> the chance of contention is pretty low. If done right it will be a
> NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> 

OK a little something like so then I suppose.. But I suspect most all
the network drivers will need this and maybe more, disable_irq() is a
popular little thing and we 'just' changed semantics on them.

---
 drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
 drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
 kernel/irq/manage.c                           |  2 +-
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 69707108d23c..3f48609f2318 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -323,6 +323,8 @@ struct e1000_adapter {
 	struct delayed_work watchdog_task;
 	struct delayed_work fifo_stall_task;
 	struct delayed_work phy_info_task;
+
+	spinlock_t irq_lock;
 };
 
 enum e1000_state_t {
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 5f6aded512f5..d12cbffe2149 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
 	e1000_irq_disable(adapter);
 
 	spin_lock_init(&adapter->stats_lock);
+	spin_lock_init(&adapter->irq_lock);
 
 	set_bit(__E1000_DOWN, &adapter->flags);
 
@@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
  * @irq: interrupt number
  * @data: pointer to a network interface device structure
  **/
-static irqreturn_t e1000_intr(int irq, void *data)
+static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
 {
-	struct net_device *netdev = data;
-	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 	u32 icr = er32(ICR);
 
@@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t e1000_intr(int irq, void *data)
+{
+	struct net_device *netdev = data;
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	irqreturn_t ret;
+
+	spin_lock(&adapter->irq_lock);
+	ret = __e1000_intr(irq, adapter);
+	spin_unlock(&adapter->irq_lock);
+
+	return ret;
+}
+
 /**
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
@@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	disable_irq(adapter->pdev->irq);
+	spin_lock(&adapter->irq_lock)
 	e1000_intr(adapter->pdev->irq, netdev);
-	enable_irq(adapter->pdev->irq);
+	spin_unlock(&adapter->irq_lock)
 }
 #endif
 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..b5a4a06bf2fd 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
  *	to complete before returning. If you use this function while
  *	holding a resource the IRQ handler may need you will deadlock.
  *
- *	This function may be called - with care - from IRQ context.
+ *	This function may _NOT_ be called from IRQ context.
  */
 void disable_irq(unsigned int irq)
 {

^ permalink raw reply related

* Re: net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: David Miller @ 2014-10-29 19:34 UTC (permalink / raw)
  To: LW
  Cc: netdev, rmk+kernel, Frank.Li, fabio.estevam, linux-kernel,
	linux-arm-kernel
In-Reply-To: <1414502584-10583-1-git-send-email-LW@KARO-electronics.de>

From: Lothar Waßmann <LW@KARO-electronics.de>
Date: Tue, 28 Oct 2014 14:22:55 +0100

> Changes wrt. v1:
> - added some cleanup patches
> - simplify handling of 'quirks' flags as suggested by Russell King.
> - remove DIV_ROUND_UP() from byte swapping loop as suggested by
>   Eric Dumazet
> 
> Changes wrt. v2:
> - rebased against next-20141028
> - added some more cleanups in fec.h
> - removed unused return value from swap_buffer()
> - fixed messed swab32s() call in swap_buffer2()
> - fixed messed up setup of fep->quirks
> 

It is not appropriate to mix cleanups and bonafide bug fixes.

I want to see only bug fixes targetted at 'net'.  You can later
submit the cleanups to 'net-next'.

Also, I don't thnk your DIV_ROUND_UP() eliminate for the loop
in swap_buffer() is valid.  The whole point is that the current
code handles buffers which have a length which is not a multiple
of 4 properly, after your change it will no longer do so.

^ permalink raw reply

* [PATCH net-next 3/3] sunvnet: Use one Tx queue per vnet_port
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


Use multple Tx netdev queues for sunvnet by supporting a one-to-one
mapping between vnet_port and Tx queue. Provide a ndo_select_queue
indirection (vnet_select_queue()) which selects the queue based
on the peer that would be selected in vnet_start_xmit()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 94 +++++++++++++++++++++++++-------------
 drivers/net/ethernet/sun/sunvnet.h |  2 +
 2 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 7ada479..e7bb63b 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -40,6 +40,8 @@ MODULE_DESCRIPTION("Sun LDOM virtual network driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
+#define	VNET_MAX_TXQS		16
+
 /* Heuristic for the number of times to exponentially backoff and
  * retry sending an LDC trigger when EAGAIN is encountered
  */
@@ -551,6 +553,8 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	struct vnet *vp;
 	u32 end;
 	struct vio_net_desc *desc;
+	struct netdev_queue *txq;
+
 	if (unlikely(pkt->tag.stype_env != VIO_DRING_DATA))
 		return 0;
 
@@ -580,7 +584,8 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	}
 	netif_tx_unlock(dev);
 
-	if (unlikely(netif_queue_stopped(dev) &&
+	txq = netdev_get_tx_queue(dev, port->q_index);
+	if (unlikely(netif_tx_queue_stopped(txq) &&
 		     vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr)))
 		return 1;
 
@@ -608,31 +613,23 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-static void maybe_tx_wakeup(struct vnet *vp)
+/* Got back a STOPPED LDC message on port. If the queue is stopped,
+ * wake it up so that we'll send out another START message at the
+ * next TX.
+ */
+static void maybe_tx_wakeup(struct vnet_port *port)
 {
-	struct net_device *dev = vp->dev;
+	struct netdev_queue *txq;
 
-	netif_tx_lock(dev);
-	if (likely(netif_queue_stopped(dev))) {
-		struct vnet_port *port;
-		int wake = 1;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu(port, &vp->port_list, list) {
-			struct vio_dring_state *dr;
-
-			dr = &port->vio.drings[VIO_DRIVER_TX_RING];
-			if (vnet_tx_dring_avail(dr) <
-			    VNET_TX_WAKEUP_THRESH(dr)) {
-				wake = 0;
-				break;
-			}
-		}
-		rcu_read_unlock();
-		if (wake)
-			netif_wake_queue(dev);
+	txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
+	__netif_tx_lock(txq, smp_processor_id());
+	if (likely(netif_tx_queue_stopped(txq))) {
+		struct vio_dring_state *dr;
+
+		dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+			netif_tx_wake_queue(txq);
 	}
-	netif_tx_unlock(dev);
+	__netif_tx_unlock(txq);
 }
 
 static inline bool port_is_up(struct vnet_port *vnet)
@@ -748,7 +745,7 @@ napi_resume:
 			break;
 	}
 	if (unlikely(tx_wakeup && err != -ECONNRESET))
-		maybe_tx_wakeup(port->vp);
+		maybe_tx_wakeup(port);
 	return npkts;
 }
 
@@ -953,6 +950,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, void **pstart,
 	return skb;
 }
 
+static u16
+vnet_select_queue(struct net_device *dev, struct sk_buff *skb,
+		  void *accel_priv, select_queue_fallback_t fallback)
+{
+	struct vnet *vp = netdev_priv(dev);
+	struct vnet_port *port = __tx_port_find(vp, skb);
+
+	return port->q_index;
+}
+
 static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vnet *vp = netdev_priv(dev);
@@ -965,6 +972,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	void *start = NULL;
 	int nlen = 0;
 	unsigned pending = 0;
+	struct netdev_queue *txq;
 
 	skb = vnet_skb_shape(skb, &start, &nlen);
 	if (unlikely(!skb))
@@ -1008,9 +1016,11 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+	i = skb_get_queue_mapping(skb);
+	txq = netdev_get_tx_queue(dev, i);
 	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
-		if (!netif_queue_stopped(dev)) {
-			netif_stop_queue(dev);
+		if (!netif_tx_queue_stopped(txq)) {
+			netif_tx_stop_queue(txq);
 
 			/* This is a hard error, log it. */
 			netdev_err(dev, "BUG! Tx Ring full when queue awake!\n");
@@ -1104,9 +1114,9 @@ ldc_start_done:
 
 	dr->prod = (dr->prod + 1) & (VNET_TX_RING_SIZE - 1);
 	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
-		netif_stop_queue(dev);
+		netif_tx_stop_queue(txq);
 		if (vnet_tx_dring_avail(dr) > VNET_TX_WAKEUP_THRESH(dr))
-			netif_wake_queue(dev);
+			netif_tx_wake_queue(txq);
 	}
 
 	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
@@ -1139,14 +1149,14 @@ static void vnet_tx_timeout(struct net_device *dev)
 static int vnet_open(struct net_device *dev)
 {
 	netif_carrier_on(dev);
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 
 	return 0;
 }
 
 static int vnet_close(struct net_device *dev)
 {
-	netif_stop_queue(dev);
+	netif_tx_stop_all_queues(dev);
 	netif_carrier_off(dev);
 
 	return 0;
@@ -1420,6 +1430,7 @@ static const struct net_device_ops vnet_ops = {
 	.ndo_tx_timeout		= vnet_tx_timeout,
 	.ndo_change_mtu		= vnet_change_mtu,
 	.ndo_start_xmit		= vnet_start_xmit,
+	.ndo_select_queue	= vnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= vnet_poll_controller,
 #endif
@@ -1431,7 +1442,7 @@ static struct vnet *vnet_new(const u64 *local_mac)
 	struct vnet *vp;
 	int err, i;
 
-	dev = alloc_etherdev(sizeof(*vp));
+	dev = alloc_etherdev_mqs(sizeof(*vp), VNET_MAX_TXQS, 1);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 	dev->needed_headroom = VNET_PACKET_SKIP + 8;
@@ -1556,6 +1567,25 @@ static void print_version(void)
 
 const char *remote_macaddr_prop = "remote-mac-address";
 
+static void
+vnet_port_add_txq(struct vnet_port *port)
+{
+	struct vnet *vp = port->vp;
+	int n;
+
+	n = vp->nports++;
+	n = n & (VNET_MAX_TXQS - 1);
+	port->q_index = n;
+	netif_tx_wake_queue(netdev_get_tx_queue(vp->dev, port->q_index));
+}
+
+static void
+vnet_port_rm_txq(struct vnet_port *port)
+{
+	port->vp->nports--;
+	netif_tx_stop_queue(netdev_get_tx_queue(port->vp->dev, port->q_index));
+}
+
 static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 {
 	struct mdesc_handle *hp;
@@ -1624,6 +1654,7 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 		list_add_tail_rcu(&port->list, &vp->port_list);
 	hlist_add_head_rcu(&port->hash,
 			   &vp->port_hash[vnet_hashfn(port->raddr)]);
+	vnet_port_add_txq(port);
 	spin_unlock_irqrestore(&vp->lock, flags);
 
 	dev_set_drvdata(&vdev->dev, port);
@@ -1668,6 +1699,7 @@ static int vnet_port_remove(struct vio_dev *vdev)
 
 		synchronize_rcu();
 		del_timer_sync(&port->clean_timer);
+		vnet_port_rm_txq(port);
 		netif_napi_del(&port->napi);
 		vnet_port_free_tx_bufs(port);
 		vio_ldc_free(&port->vio);
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index c8a862e..cd5d343 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -61,6 +61,7 @@ struct vnet_port {
 	u32			napi_stop_idx;
 	bool			napi_resume;
 	int			rx_event;
+	u16			q_index;
 };
 
 static inline struct vnet_port *to_vnet_port(struct vio_driver_state *vio)
@@ -102,6 +103,7 @@ struct vnet {
 	struct list_head	list;
 	u64			local_mac;
 
+	int			nports;
 };
 
 #endif /* _SUNVNET_H */
-- 
1.8.4.2

^ permalink raw reply related

* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Jeff Kirsher @ 2014-10-29 19:27 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Eric Dumazet, netdev
In-Reply-To: <545112E0.40106@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
> On 10/29/2014 07:49 AM, Eric Dumazet wrote:
> > Hi Alexander
> >
> > The memory barriers added in commit
> > b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> > ("net: Add memory barriers to prevent possible race in byte queue
> > limits")
> >
> > have heavy cost.
> >
> > It seems we could use smp_load_acquire() and smp_store_release()
> > instead ?
> >
> > I'll post a patch later today. I would be interested if someone was able
> > to test it, as your commit apparently was tested and known to fix a
> > reproducible race.
> >
> > Thanks !

Eric- just CC me on the patch you post and I will see what I can do
about getting validation eyes on it.

> 
> Unfortunately Stephen left Intel before I did, so we will need to find 
> someone else in the validation team to test this if possible. I have 
> added Jeff to the CC so that he can give the appropriate validation 
> people a heads up that this patch might be coming.
> 
> As I recall what was seen was random Tx hangs on systems with the 
> original BQL code when interfaces were stressed.  It has been a while so 
> I don't recall the exact set-up for all of it.  Also some less 
> used/tested architectures such as PowerPC can be more susceptible to 
> synchronization issues such as these as the memory model is more weakly 
> ordered.
> 
> I'm wondering where you are seeing the barrier show up?  In 
> netdev_tx_send_queue you should only hit the barrier if you actually are 
> triggering the XOFF condition, and in netdev_tx_completed_queue the 
> barrier should be coalesced in amongst a number of frames reducing the cost.
> 
> My concern with this would be that we are actually syncronizing multiple 
> things, the __QUEUE_STATE_STACK_XOFF flag, dql->adj_limit, and 
> dql->num_queued, and we might be trading off reducing the cost on x86 to 
> result in it being increased on other architectures as we may have to 
> actually add additional synchronization as I suspect we would need to 
> use acquire/release on both adj_limit and num_queued.
> 
> Thanks,
> 
> Alex
> 



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] tcp: allow for bigger reordering level
From: Eric Dumazet @ 2014-10-29 19:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, wygivan
In-Reply-To: <20141029.150624.878155212768758630.davem@davemloft.net>

On Wed, 2014-10-29 at 15:06 -0400, David Miller wrote:

> However in the longer term I'd say that this value, if it is to have a
> limit, then such a limit should probably be scaled based upon the
> window size.

Yuchung and othres are working on a new way to handle reorders (RACK),
and should present the concept in next IETF meeting.

A linux patch should follow shortly.

High level idea is :

Decide when and what to retransmit based on the timing, instead of
sequence, relationships. This covers both original or retransmitted
packets.

On dupacks, wait a fraction of RTT before the repair process to both
allow reordering and relieve the network

Thanks

^ permalink raw reply

* [PATCH net-next 2/3] sunvnet: Reset LDC_EVENT_DATA_READY when napi completes.
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev

When vnet_event_napi re-enables interrupts, it should
reset LDC_EVENT_DATA_READY as an optimization.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index c390a27..7ada479 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -760,6 +760,7 @@ static int vnet_poll(struct napi_struct *napi, int budget)

 	if (processed < budget) {
 		napi_complete(napi);
+		port->rx_event &= ~LDC_EVENT_DATA_READY;
 		vio_set_intr(vio->vdev->rx_ino, HV_INTR_ENABLED);
 	}
 	return processed;
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next 1/3] tcp: Correction to RFC number in comment
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


Challenge ACK is described in RFC 5961, fix typo.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a12b455..d285962 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5028,7 +5028,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	/* step 3: check security and precedence [ignored] */
 
 	/* step 4: Check for a SYN
-	 * RFC 5691 4.2 : Send a challenge ack
+	 * RFC 5961 4.2 : Send a challenge ack
 	 */
 	if (th->syn) {
 syn_challenge:
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next 0/3] sunvnet: Use multiple Tx queues.
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev

The primary objective of this patch-set is to address the suggestion from
  http://marc.info/?l=linux-netdev&m=140790778931563&w=2
With the changes in Patch 3, every vnet_port will get  packets from 
a single tx-queue, and flow-control/head-of-line-blocking is 
confined to the vnet_ports that share that tx queue (as opposed to 
flow-controlling *all* peers).

Patch 1 is a minor correction to a comment that has a typo in 
the RFC number cited for the challenge-ack generation.

Patch 2 is an optimization that resets the DATA_READY bit when
we re-enable Rx interrupts.  This optimization lets us exit quickly 
from vnet_event_napi() when new data has not triggered an interrupt.

Sowmini Varadhan (3):
  Correction to RFC number in comment
  Reset LDC_EVENT_DATA_READY when napi completes.
  Use one Tx queue per vnet_port

 drivers/net/ethernet/sun/sunvnet.c | 95 +++++++++++++++++++++++++-------------
 drivers/net/ethernet/sun/sunvnet.h |  2 +
 net/ipv4/tcp_input.c               |  2 +-
 3 files changed, 67 insertions(+), 32 deletions(-)

-- 
1.8.4.2

^ permalink raw reply

* Re: [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket
From: David Miller @ 2014-10-29 19:22 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fw, eric.dumazet, chutzpah
In-Reply-To: <1414489441-28578-1-git-send-email-nikolay@redhat.com>

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Tue, 28 Oct 2014 10:44:01 +0100

> The WARN_ON in inet_evict_bucket can be triggered by a valid case:
> inet_frag_kill and inet_evict_bucket can be running in parallel on the
> same queue which means that there has been at least one more ref added
> by a previous inet_frag_find call, but inet_frag_kill can delete the
> timer before inet_evict_bucket which will cause the WARN_ON() there to
> trigger since we'll have refcnt!=1. Now, this case is valid because the
> queue is being "killed" for some reason (removed from the chain list and
> its timer deleted) so it will get destroyed in the end by one of the
> inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
> 
> CC: Florian Westphal <fw@strlen.de>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McLean <chutzpah@gentoo.org>
> 
> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> I'm sending this as a separate patch so the race fix doesn't get blocked
> in case I'm wrong and also it's a different issue.

Also applied, thanks.

^ permalink raw reply

* Re: [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
From: David Miller @ 2014-10-29 19:21 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fw, eric.dumazet, chutzpah
In-Reply-To: <1414488634-28412-1-git-send-email-nikolay@redhat.com>

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Tue, 28 Oct 2014 10:30:34 +0100

> When the evictor is running it adds some chosen frags to a local list to
> be evicted once the chain lock has been released but at the same time
> the *frag_queue can be running for some of the same queues and it
> may call inet_frag_kill which will wait on the chain lock and
> will then delete the queue from the wrong list since it was added in the
> eviction one. The fix is simple - check if the queue has the evict flag
> set under the chain lock before deleting it, this is safe because the
> evict flag is set only under that lock and having the flag set also means
> that the queue has been detached from the chain list, so no need to delete
> it again.
> An important note to make is that we're safe w.r.t refcnt because
> inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
> where only one of the two can succeed (or if the timer is executing -
> none of them), the cases are:
> 1. inet_frag_kill succeeds in del_timer
>  - then the timer ref is removed, but inet_evict_bucket will not add
>    this queue to its expire list but will restart eviction in that chain
> 2. inet_evict_bucket succeeds in del_timer
>  - then the timer ref is kept until the evictor "expires" the queue, but
>    inet_frag_kill will remove the initial ref and will set
>    INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
>    its ref.
> In the end all of the queue users will do an inet_frag_put and the one
> that reaches 0 will free it. The refcount balance should be okay.
> 
> CC: Florian Westphal <fw@strlen.de>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McLean <chutzpah@gentoo.org>
> 
> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Tested-by: Patrick McLean <chutzpah@gentoo.org>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> A few more eyes to confirm all of this would be much appreciated.

I've applied this and tentatively scheduled it for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 6/7] can: m_can: update to support CAN FD features
From: Oliver Hartkopp @ 2014-10-29 19:21 UTC (permalink / raw)
  To: Dong Aisheng, linux-can; +Cc: mkl, wg, varkabhadram, netdev, linux-arm-kernel
In-Reply-To: <1414579527-31100-6-git-send-email-b29396@freescale.com>

Hello Dong,

thanks for your update to support CAN FD with the 3.0.x M_CAN IP core.

AFAIK from the last CAN in Automation (CiA) Plugfest which took place in
Nuremberg yesterday, there are two more IP cores on the way:

v3.0.1 / v3.0.2 (the current spec from the Bosch website)

v3.1.0 which will support per-frame CAN/CANFD switching in the tx path
(the FDF/(former)EDL bit and the BRS bit appear in the TX buffer element at
the bit position you know from reading the RX FIFO element)

v3.2.0 which will support the final(?) ISO specification with a bitstuffing
counter before the CRC in the frame on the wire.

So first I would suggest to check the core release register (CREL) to be
version 3.0.x and quit the driver initialization if it doesn't fit this
version. I would suggest to provide a separate patch for that check. What
about printing the core release version into the kernel log at driver
initialization time?

Regarding the CAN FD support in this patch I have some remarks in the text ...

On 10/29/2014 11:45 AM, Dong Aisheng wrote:

>  /* Rx Buffer Element */
> +/* R0 */
>  #define RX_BUF_ESI		BIT(31)
>  #define RX_BUF_XTD		BIT(30)
>  #define RX_BUF_RTR		BIT(29)
> +/* R1 */
> +#define RX_BUF_ANMF		BIT(31)
> +#define RX_BUF_EDL		BIT(21)
> +#define RX_BUF_BRS		BIT(20)
>  
>  /* Tx Buffer Element */
> +/* R0 */
>  #define TX_BUF_XTD		BIT(30)
>  #define TX_BUF_RTR		BIT(29)
>  
> @@ -327,11 +357,12 @@ static inline void m_can_disable_all_interrupts(const struct m_can_priv *priv)
>  	m_can_write(priv, M_CAN_ILE, 0x0);
>  }
>  
> -static void m_can_read_fifo(const struct net_device *dev, struct can_frame *cf,
> +static void m_can_read_fifo(const struct net_device *dev, struct canfd_frame *cf,
>  			    u32 rxfs)
>  {
>  	struct m_can_priv *priv = netdev_priv(dev);
>  	u32 id, fgi;
> +	int i;
>  
>  	/* calculate the fifo get index for where to read data */
>  	fgi = (rxfs & RXFS_FGI_MASK) >> RXFS_FGI_OFF;
> @@ -341,15 +372,23 @@ static void m_can_read_fifo(const struct net_device *dev, struct can_frame *cf,
>  	else
>  		cf->can_id = (id >> 18) & CAN_SFF_MASK;
>  
> +	if (id & RX_BUF_ESI) {
> +		cf->flags |= CANFD_ESI;
> +		netdev_dbg(dev, "ESI Error\n");
> +	}
> +
>  	if (id & RX_BUF_RTR) {
>  		cf->can_id |= CAN_RTR_FLAG;

When RX_BUF_EDL is set you should not check for RX_BUF_RTR as RTR is not
allowed for CAN FD.

>  	} else {
>  		id = m_can_fifo_read(priv, fgi, M_CAN_FIFO_DLC);
> -		cf->can_dlc = get_can_dlc((id >> 16) & 0x0F);
> -		*(u32 *)(cf->data + 0) = m_can_fifo_read(priv, fgi,
> -							 M_CAN_FIFO_DATA(0));
> -		*(u32 *)(cf->data + 4) = m_can_fifo_read(priv, fgi,
> -							 M_CAN_FIFO_DATA(1));
> +		cf->len = can_dlc2len(get_canfd_dlc((id >> 16) & 0x0F));
> +
> +		if (id & RX_BUF_BRS)
> +			cf->flags |= CANFD_BRS;
> +
> +		for (i = 0; i < cf->len; i += 4)
> +			*(u32 *)(cf->data + i) =
> +				m_can_fifo_read(priv, fgi, M_CAN_FIFO_DATA(i / 4));
>  	}
>  
>  	/* acknowledge rx fifo 0 */
> @@ -361,7 +400,7 @@ static int m_can_do_rx_poll(struct net_device *dev, int quota)
>  	struct m_can_priv *priv = netdev_priv(dev);
>  	struct net_device_stats *stats = &dev->stats;
>  	struct sk_buff *skb;
> -	struct can_frame *frame;
> +	struct canfd_frame *frame;
>  	u32 pkts = 0;
>  	u32 rxfs;
>  
> @@ -375,7 +414,7 @@ static int m_can_do_rx_poll(struct net_device *dev, int quota)
>  		if (rxfs & RXFS_RFL)
>  			netdev_warn(dev, "Rx FIFO 0 Message Lost\n");
>  
> -		skb = alloc_can_skb(dev, &frame);
> +		skb = alloc_canfd_skb(dev, &frame);

You are *always* allocating CAN FD frames now?

Depending on RX_BUF_EDL in the RX FIFO message you should create a CAN or CAN
FD frame.

Of course you can use the struct canfd_frame in m_can_read_fifo() as the
layout of the struct can_frame is identical when filled with 'normal' CAN
frame content.

But you need to distinguish whether it is a CAN or CAN FD frame when
allocating the skb based on the RX_BUF_EDL value.

>  		if (!skb) {
>  			stats->rx_dropped++;
>  			return pkts;
> @@ -384,7 +423,7 @@ static int m_can_do_rx_poll(struct net_device *dev, int quota)
>  		m_can_read_fifo(dev, frame, rxfs);
>  
>  		stats->rx_packets++;
> -		stats->rx_bytes += frame->can_dlc;
> +		stats->rx_bytes += frame->len;
>  
>  		netif_receive_skb(skb);
>  

The rest of your entire patch set looks very good from my perspective.

Best regards,
Oliver



^ permalink raw reply

* Re: [net 1/2] sctp: add transport state in /proc/net/sctp/remaddr
From: David Miller @ 2014-10-29 19:18 UTC (permalink / raw)
  To: nhorman; +Cc: michele, linux-sctp, vyasevich, netdev, dborkman
In-Reply-To: <20141028102741.GA18365@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 28 Oct 2014 06:27:41 -0400

> On Mon, Oct 27, 2014 at 06:55:45PM -0400, David Miller wrote:
>> From: Michele Baldessari <michele@acksyn.org>
>> Date: Thu, 23 Oct 2014 21:48:40 +0200
>> 
>> > It is often quite helpful to be able to know the state of a transport
>> > outside of the application itself (for troubleshooting purposes or for
>> > monitoring purposes). Add it under /proc/net/sctp/remaddr.
>> > 
>> > Signed-off-by: Michele Baldessari <michele@acksyn.org>
>> 
>> You can't change the layout of procfs files, applications parse
>> these files and any modification can potentially break such tools.
>> 
>> Secondly, even if this change were acceptable, targetting this
>> change at anything other than the net-next tree is not appropriate
>> because it is a new feature.
>> 
> 
> Agree on the net-next submission, though there is precident for extending this
> procfile, as we've done it a few times in the past to this, and other files in
> the sctp area (see commits f406c8b9693f2f71ef2caeb0b68521a7d22d00f0 and
> 58fbbed4fbc0094fc808a568fe99a915f85402ee)

Fair enough.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: David Miller @ 2014-10-29 19:12 UTC (permalink / raw)
  To: ek; +Cc: netdev, ben, lorenzo, hannes
In-Reply-To: <1414487474-18201-1-git-send-email-ek@google.com>

From: Erik Kline <ek@google.com>
Date: Tue, 28 Oct 2014 18:11:14 +0900

> Add a sysctl that causes an interface's optimistic addresses
> to be considered equivalent to other non-deprecated addresses
> for source address selection purposes.  Preferred addresses
> will still take precedence over optimistic addresses, subject
> to other ranking in the source address selection algorithm.
> 
> This is useful where different interfaces are connected to
> different networks from different ISPs (e.g., a cell network
> and a home wifi network).
> 
> The current behaviour complies with RFC 3484/6724, and it
> makes sense if the host has only one interface, or has
> multiple interfaces on the same network (same or cooperating
> administrative domain(s), but not in the multiple distinct
> networks case.
> 
> For example, if a mobile device has an IPv6 address on an LTE
> network and then connects to IPv6-enabled wifi, while the wifi
> IPv6 address is undergoing DAD, IPv6 connections will try use
> the wifi default route with the LTE IPv6 address, and will get
> stuck until they time out.
> 
> Also, because optimistic nodes can receive frames, issue
> an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
> flag appropriately set).  A second RTM_NEWADDR is sent if DAD
> completes (the address flags have changed), otherwise an
> RTM_DELADDR is sent.
> 
> Also: add an entry in ip-sysctl.txt for optimistic_dad.
> 
> Signed-off-by: Erik Kline <ek@google.com>

Applied, if the idev != NULL check proves to be unnecessary we can
kill it in a follow-on patch.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/2] r8152: support nway_reset
From: David Miller @ 2014-10-29 19:10 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-66-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Tue, 28 Oct 2014 14:05:50 +0800

> Fix the CHECK from checkpatch.pl and support nway_reset.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net 1/1] cnic: Update the rcu_access_pointer() usages
From: David Miller @ 2014-10-29 19:07 UTC (permalink / raw)
  To: nilesh.javali
  Cc: netdev, Dept-GELinuxNICDev, sudarsana.kalluru, vikas.chaudhary,
	giridhar.malavali, tej.parkash
In-Reply-To: <1414473495-24790-2-git-send-email-nilesh.javali@qlogic.com>

From: Nilesh Javali <nilesh.javali@qlogic.com>
Date: Tue, 28 Oct 2014 01:18:15 -0400

> From: Tej Parkash <tej.parkash@qlogic.com>
> 
> 1. Remove the rcu_read_lock/unlock around rcu_access_pointer
> 2. Replace the rcu_dereference with rcu_access_pointer
> 
> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] tcp: allow for bigger reordering level
From: David Miller @ 2014-10-29 19:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, wygivan
In-Reply-To: <1414471524.13259.14.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 27 Oct 2014 21:45:24 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> While testing upcoming Yaogong patch (converting out of order queue
> into an RB tree), I hit the max reordering level of linux TCP stack.
> 
> Reordering level was limited to 127 for no good reason, and some
> network setups [1] can easily reach this limit and get limited
> throughput.
> 
> Allow a new max limit of 300, and add a sysctl to allow admins to even
> allow bigger (or lower) values if needed.
> 
> [1] Aggregation of links, per packet load balancing, fabrics not doing
>  deep packet inspections, alternative TCP congestion modules...
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I'll apply this, thanks.

However in the longer term I'd say that this value, if it is to have a
limit, then such a limit should probably be scaled based upon the
window size.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox