Netdev List


* [PATCH net-next] mlx4: use napi_schedule_irqoff()
From: Eric Dumazet @ 2014-10-29 23:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Amir Vadai

From: Eric Dumazet <edumazet@google.com>

mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.

They can use napi_schedule_irqoff() instead of napi_schedule().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c8e75dab80553c876b195361456fb49587231055..c562c1468944f9ad4319e5faaf19bf9e66d15eaf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -878,8 +878,8 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 34c137878545fc672dad1a3d86e11c034c0ac368..5c4062921cdf46f1a7021a39705275c33ca4de77 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -479,8 +479,8 @@ void mlx4_en_tx_irq(struct mlx4_cq *mcq)
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-	if (priv->port_up)
-		napi_schedule(&cq->napi);
+	if (likely(priv->port_up))
+		napi_schedule_irqoff(&cq->napi);
 	else
 		mlx4_en_arm_cq(priv, cq);
 }


* [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Murilo Opsfelder Araujo @ 2014-10-29 23:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-wireless, netdev, Larry Finger, Chaoming Li,
	John W. Linville, Mike Galbraith, Thadeu Cascardo, troy_tan,
	Murilo Opsfelder Araujo

This is a complement of commit 08054200117a95afc14c3d2ed3a38bf4e345bf78
"rtlwifi: Add check for get_btc_status callback".

With this patch, next-20141029 at least does not panic with rtl8192se
device.

Signed-off-by: Murilo Opsfelder Araujo <mopsfelder@gmail.com>
---

Hello, everyone.

Some days ago, I reported [1] that next-20140930 introduced an issue
with rtl8192se devices.

Later on, Larry Finger proposed [2] a fix that did not solve the
problem thoroughly.

This patch is based on Larry's one [3].  It also does not solve the
rtl8192se issue completely but I can at least boot next-20141029
without a panic.

The remaining issue is that the rtl8192se device does not associate.
It does not even show any wifi network available.  The device is shown
by iwconfig, but I cannot do anything with it.

I need help from someone out there that could provide me guidance or
possibly investigate the issue (I'm not a kernel expert yet).

I'd not like to see this regression landing on v3.18.

[1] http://marc.info/?l=linux-wireless&m=141403434929612
[2] http://marc.info/?l=linux-wireless&m=141408165513255
[3] http://marc.info/?l=linux-wireless&m=141416876810127

 drivers/net/wireless/rtlwifi/base.c |  6 ++++--
 drivers/net/wireless/rtlwifi/core.c |  9 ++++++---
 drivers/net/wireless/rtlwifi/pci.c  |  3 ++-
 drivers/net/wireless/rtlwifi/ps.c   | 12 ++++++++----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/base.c b/drivers/net/wireless/rtlwifi/base.c
index 40b6d1d..1a51577 100644
--- a/drivers/net/wireless/rtlwifi/base.c
+++ b/drivers/net/wireless/rtlwifi/base.c
@@ -1234,7 +1234,8 @@ EXPORT_SYMBOL_GPL(rtl_action_proc);
 static void setup_arp_tx(struct rtl_priv *rtlpriv, struct rtl_ps_ctl *ppsc)
 {
 	rtlpriv->ra.is_special_data = true;
-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_special_packet_notify(
 					rtlpriv, 1);
 	rtlpriv->enter_ps = false;
@@ -1629,7 +1630,8 @@ void rtl_watchdog_wq_callback(void *data)
 		}
 	}

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_periodical(rtlpriv);

 	rtlpriv->link_info.bcn_rx_inperiod = 0;
diff --git a/drivers/net/wireless/rtlwifi/core.c b/drivers/net/wireless/rtlwifi/core.c
index f6179bc..686d256 100644
--- a/drivers/net/wireless/rtlwifi/core.c
+++ b/drivers/net/wireless/rtlwifi/core.c
@@ -1133,7 +1133,8 @@ static void rtl_op_bss_info_changed(struct ieee80211_hw *hw,
 		ppsc->report_linked = (mstatus == RT_MEDIA_CONNECT) ?
 				      true : false;

-		if (rtlpriv->cfg->ops->get_btc_status())
+		if (rtlpriv->cfg->ops->get_btc_status &&
+		    rtlpriv->cfg->ops->get_btc_status())
 			rtlpriv->btcoexist.btc_ops->btc_mediastatus_notify(
 							rtlpriv, mstatus);
 	}
@@ -1373,7 +1374,8 @@ static void rtl_op_sw_scan_start(struct ieee80211_hw *hw)
 		return;
 	}

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_scan_notify(rtlpriv, 1);

 	if (rtlpriv->dm.supp_phymode_switch) {
@@ -1425,7 +1427,8 @@ static void rtl_op_sw_scan_complete(struct ieee80211_hw *hw)
 	}

 	rtlpriv->cfg->ops->scan_operation_backup(hw, SCAN_OPT_RESTORE);
-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_scan_notify(rtlpriv, 0);
 }

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 25daa87..ed3364d 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1833,7 +1833,8 @@ static void rtl_pci_stop(struct ieee80211_hw *hw)
 	unsigned long flags;
 	u8 RFInProgressTimeOut = 0;

-	if (rtlpriv->cfg->ops->get_btc_status())
+	if (rtlpriv->cfg->ops->get_btc_status &&
+	    rtlpriv->cfg->ops->get_btc_status())
 		rtlpriv->btcoexist.btc_ops->btc_halt_notify();

 	/*
diff --git a/drivers/net/wireless/rtlwifi/ps.c b/drivers/net/wireless/rtlwifi/ps.c
index b69321d..2278af9 100644
--- a/drivers/net/wireless/rtlwifi/ps.c
+++ b/drivers/net/wireless/rtlwifi/ps.c
@@ -261,7 +261,8 @@ void rtl_ips_nic_off_wq_callback(void *data)
 			ppsc->in_powersavemode = true;

 			/* call before RF off */
-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_ips_notify(rtlpriv,
 									ppsc->inactive_pwrstate);

@@ -306,7 +307,8 @@ void rtl_ips_nic_on(struct ieee80211_hw *hw)
 			ppsc->in_powersavemode = false;
 			_rtl_ps_inactive_ps(hw);
 			/* call after RF on */
-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_ips_notify(rtlpriv,
 									ppsc->inactive_pwrstate);
 		}
@@ -390,14 +392,16 @@ void rtl_lps_set_psmode(struct ieee80211_hw *hw, u8 rt_psmode)
 			if (ppsc->p2p_ps_info.opp_ps)
 				rtl_p2p_ps_cmd(hw , P2P_PS_ENABLE);

-			if (rtlpriv->cfg->ops->get_btc_status())
+			if (rtlpriv->cfg->ops->get_btc_status &&
+			    rtlpriv->cfg->ops->get_btc_status())
 				rtlpriv->btcoexist.btc_ops->btc_lps_notify(rtlpriv, rt_psmode);
 		} else {
 			if (rtl_get_fwlps_doze(hw)) {
 				RT_TRACE(rtlpriv, COMP_RF, DBG_DMESG,
 					 "FW LPS enter ps_mode:%x\n",
 					 ppsc->fwctrl_psmode);
-				if (rtlpriv->cfg->ops->get_btc_status())
+				if (rtlpriv->cfg->ops->get_btc_status &&
+				    rtlpriv->cfg->ops->get_btc_status())
 					rtlpriv->btcoexist.btc_ops->btc_lps_notify(rtlpriv, rt_psmode);
 				enter_fwlps = true;
 				ppsc->pwr_mode = ppsc->fwctrl_psmode;
--
2.1.2
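
The guard repeated throughout the patch above (test the callback pointer, then call it) can be sketched in isolation. This is a minimal userspace illustration, not driver code: `struct toy_ops`, `toy_btc_enabled()` and the two status functions are hypothetical names, and only the `get_btc_status` member name is borrowed from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table with one optional callback (may be NULL). */
struct toy_ops {
	bool (*get_btc_status)(void);
};

/* Example callbacks, standing in for a driver that does provide one. */
static bool toy_status_on(void)
{
	return true;
}

static bool toy_status_off(void)
{
	return false;
}

/*
 * True only when the callback exists AND reports true. The && operator
 * short-circuits, so a NULL pointer is never called.
 */
static bool toy_btc_enabled(const struct toy_ops *ops)
{
	assert(ops != NULL);
	return ops->get_btc_status && ops->get_btc_status();
}
```

A driver that leaves the member unset is then treated the same way as one whose callback returns false.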


* [PATCH net-next] ipv4: minor spelling fixes
From: Stephen Hemminger @ 2014-10-29 23:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev



Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/net/ipv4/geneve.c	2014-10-27 21:05:31.259174957 -0700
+++ b/net/ipv4/geneve.c	2014-10-27 21:05:31.255174943 -0700
@@ -104,7 +104,7 @@ static void geneve_build_header(struct g
 	memcpy(geneveh->options, options, options_len);
 }
 
-/* Transmit a fully formated Geneve frame.
+/* Transmit a fully formatted Geneve frame.
  *
  * When calling this function. The skb->data should point
  * to the geneve header which is fully formed.
--- a/net/ipv4/tcp_input.c	2014-10-27 21:05:31.259174957 -0700
+++ b/net/ipv4/tcp_input.c	2014-10-27 21:05:31.259174957 -0700
@@ -5865,7 +5865,7 @@ static inline void pr_drop_req(struct re
  * If we receive a SYN packet with these bits set, it means a
  * network is playing bad games with TOS bits. In order to
  * avoid possible false congestion notifications, we disable
- * TCP ECN negociation.
+ * TCP ECN negotiation.
  *
  * Exception: tcp_ca wants ECN. This is required for DCTCP
  * congestion control; it requires setting ECT on all packets,


* Re: [PATCH v3 00/15] net: dsa: Fixes and enhancements
From: Guenter Roeck @ 2014-10-29 21:39 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, David S. Miller, Andrew Lunn, linux-kernel
In-Reply-To: <5451305B.7010303@gmail.com>

On Wed, Oct 29, 2014 at 11:22:19AM -0700, Florian Fainelli wrote:
> On 10/29/2014 10:44 AM, Guenter Roeck wrote:
> > Patch 01/15 addresses a bug indicated by an annoying and unhelpful
> > log message.
> > 
> > Patches 02/15 and 03/15 are minor enhancements, adding support for
> > known switch revisions.
> > 
> > Patches 04/15 and 05/15 add support for MV88E6352 and MV88E6176.
> > 
> > Patch 06/15 adds support for hardware monitoring, specifically for
> > reporting the chip temperature, to the dsa subsystem.
> > 
> > Patches 07/15 and 08/15 implement hardware monitoring for MV88E6352,
> > MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > Patches 09/15 and 10/15 add support for EEPROM access to the DSA subsystem.
> > 
> > Patch 11/15 implements EEPROM access for MV88E6352 and MV88E6176.
> > 
> > Patch 12/15 adds support for reading switch registers to the DSA
> > subsystem.
> > 
> > Patches 13/15 and 14/15 implement support for reading switch registers
> > to the drivers for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > Patch 15/15 adds support for reading additional RMON registers to the drivers
> > for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
> > 
> > The series was tested on top of v3.18-rc2 in an x86 system with MV88E6352.
> > Testing in systems with 88E6131, 88E6060 and MV88E6165 was done earlier
> > (I don't have access to those systems right now). The series was also build
> > tested using my build system at http://server.roeck-us.net:8010/builders.
> > Look into the 'dsa' column for build results.
> > 
> > The series merges cleanly into net-next as of today (10/29).
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Thanks Guenter!
> 
Thanks a lot for the review!

Guenter


* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Alexander Duyck @ 2014-10-29 21:13 UTC (permalink / raw)
  To: Eric Dumazet, Jeff Kirsher; +Cc: netdev
In-Reply-To: <1414612620.631.98.camel@edumazet-glaptop2.roam.corp.google.com>


On 10/29/2014 12:57 PM, Eric Dumazet wrote:
> On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
>> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
>>> On 10/29/2014 07:49 AM, Eric Dumazet wrote:
>>>> Hi Alexander
>>>>
>>>> The memory barriers added in commit
>>>> b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
>>>> ("net: Add memory barriers to prevent possible race in byte queue
>>>> limits")
>>>>
>>>> have heavy cost.
>>>>
>>>> It seems we could use smp_load_acquire() and smp_store_release()
>>>> instead ?
>>>>
>>>> I'll post a patch later today. I would be interested if someone was able
>>>> to test it, as your commit apparently was tested and known to fix a
>>>> reproducible race.
>>>>
>>>> Thanks !
>> Eric- just CC me on the patch you post and I will see what I can do
>> about getting validation eyes on it.
> Thanks guys, will do, and will CC Paul as well.
>
> Alexander, here is the following profile showing the cost of the
> 'mfence', in a typical RPC workload (a lot of IRQs are generated for TX
> completions, because RPCs tend to send small packets)
>
>    0.11 │       je     33a
>         │       mov    -0x3c(%rbp),%esi
>    0.06 │       lea    0xc0(%rbx),%rdi
>    0.06 │       callq  dql_completed
>    0.06 │       mfence
>   38.68 │       mov    0xc4(%rbx),%edx
>    1.83 │       mov    0xc0(%rbx),%eax
>         │       cmp    %eax,%edx
>    0.22 │       js     333
>    0.11 │       lock   btrl $0x1,0x98(%rbx)

It might be worthwhile to see if it would be possible to combine BQL 
with the mechanism the drivers have for handling descriptors/packets.  
Otherwise you are going to be pulling one barrier just to hit another 
right after it.

Also depending on what driver it is that the trace is from you may want 
to check and see if you have any MMIO transactions occurring right 
before you make the call, otherwise that may be the actual cause for the 
significant cost as you are having to flush non-coherent memory before 
you can resume operation.

Thanks,

Alex


* Re: nf_reject_ipv4: module license 'unspecified' taints kernel
From: Benjamin Tissoires @ 2014-10-29 21:05 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Dave Young, davem, netdev, linux-kernel@vger.kernel.org,
	netfilter-devel
In-Reply-To: <20141014081109.GA5357@dhcp-16-198.nay.redhat.com>

On Tue, Oct 14, 2014 at 4:11 AM, Dave Young <dyoung@redhat.com> wrote:
> On 10/10/14 at 11:56am, Pablo Neira Ayuso wrote:
>> On Fri, Oct 10, 2014 at 05:19:04PM +0800, Dave Young wrote:
>> > Hi,
>> >
>> > With today's linus tree, I got below kmsg:
>> > [   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
>> >
>> > It could be caused by below commit:
>> >
>> > commit c8d7b98bec43faaa6583c3135030be5eb4693acb
>> > Author: Pablo Neira Ayuso <pablo@netfilter.org>
>> > Date:   Fri Sep 26 14:35:15 2014 +0200
>> >
>> >     netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
>> >
>> >     Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
>> >     nf_reject_ipv6 respectively. This code is shared by x_tables and
>> >     nf_tables.
>> >
>> >     Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>
>> Patch attached, thanks for reporting.
>
> Tested-by: Dave Young <dyoung@redhat.com>
>
>>
>> P.S: Please, Cc netfilter-devel@vger.kernel.org in future reports, so
>> we make sure things don't get lost.
>
> Sure. Thanks.
>
>> From d4358bcf64ba7a64d4de4e1dc5533c4c8f88ea82 Mon Sep 17 00:00:00 2001
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Fri, 10 Oct 2014 11:25:20 +0200
>> Subject: [PATCH] netfilter: missing module license in the nf_reject_ipvX
>>  modules
>>
>> [   23.545204] nf_reject_ipv4: module license 'unspecified' taints kernel.
>>
>> Reported-by: Dave Young <dyoung@redhat.com>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> ---

Hi,

What is the status of this patch? I can not find it in Pablo's trees
(or I did not look enough).

Not having it is actually bothering me quite a lot because the vanilla
v3.18-rc2 gives the following dmesg on Fedora 21:

Oct 29 16:50:01 t440s kernel: nf_reject_ipv6: module license
'unspecified' taints kernel.
Oct 29 16:50:01 t440s kernel: Disabling lock debugging due to kernel taint
Oct 29 16:50:01 t440s kernel: nf_reject_ipv6: Unknown symbol
ip6_local_out (err 0)

And unfortunately, firewalld failed after, and I can not directly ssh
to the host.
Now that I found the solution, my process improved a lot (thank you
BTW for whoever included it in Fedora), but I guess other
distributions might hit the problem.

I would say such a trivial patch could easily go in one of the v3.18 RCs.

Cheers,
Benjamin


>>  net/ipv4/netfilter/nf_reject_ipv4.c |    3 +++
>>  net/ipv6/netfilter/nf_reject_ipv6.c |    4 ++++
>>  2 files changed, 7 insertions(+)
>>
>> diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
>> index b023b4e..92b303d 100644
>> --- a/net/ipv4/netfilter/nf_reject_ipv4.c
>> +++ b/net/ipv4/netfilter/nf_reject_ipv4.c
>> @@ -6,6 +6,7 @@
>>   * published by the Free Software Foundation.
>>   */
>>
>> +#include <linux/module.h>
>>  #include <net/ip.h>
>>  #include <net/tcp.h>
>>  #include <net/route.h>
>> @@ -125,3 +126,5 @@ void nf_send_reset(struct sk_buff *oldskb, int hook)
>>       kfree_skb(nskb);
>>  }
>>  EXPORT_SYMBOL_GPL(nf_send_reset);
>> +
>> +MODULE_LICENSE("GPL");
>> diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
>> index 5f5f043..20d9def 100644
>> --- a/net/ipv6/netfilter/nf_reject_ipv6.c
>> +++ b/net/ipv6/netfilter/nf_reject_ipv6.c
>> @@ -5,6 +5,8 @@
>>   * it under the terms of the GNU General Public License version 2 as
>>   * published by the Free Software Foundation.
>>   */
>> +
>> +#include <linux/module.h>
>>  #include <net/ipv6.h>
>>  #include <net/ip6_route.h>
>>  #include <net/ip6_fib.h>
>> @@ -161,3 +163,5 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
>>               ip6_local_out(nskb);
>>  }
>>  EXPORT_SYMBOL_GPL(nf_send_reset6);
>> +
>> +MODULE_LICENSE("GPL");
>> --
>> 1.7.10.4
>>
>


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 21:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029205131.GI10501@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> > But at least it allows to mitigate the impact by making it conditional
> > at a central point.
> > 
> > static inline void netpoll_lock(struct net_device *nd)
> > {
> > 	if (netpoll_active(nd))
> > 		spin_lock(&nd->netpoll_lock);
> > }
> 
> branch fail vs lock might be a toss on most machines, but if we're
> hitting cold cachelines we lose big.

Well, if the net_device is not cache hot on irq entry you have lost
already. The extra branch/lock is not going to add much to that.
 
Thanks,

	tglx


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 20:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292119350.5308@nanos>

On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> But at least it allows to mitigate the impact by making it conditional
> at a central point.
> 
> static inline void netpoll_lock(struct net_device *nd)
> {
> 	if (netpoll_active(nd))
> 		spin_lock(&nd->netpoll_lock);
> }

branch fail vs lock might be a toss on most machines, but if we're
hitting cold cachelines we lose big.

> and let the core code make sure that activation/deactivation of
> netpoll on a particular interface is serialized against the interrupt
> and netpoll calls.
> 
> Not sure if it's worth the trouble, but at least it allows to deal
> with it in the core instead of dealing with it on a per driver base.

Does multi-queue have one netdev per queue or does that need moar
logicz?


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 20:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292053430.5308@nanos>

On Wed, 29 Oct 2014, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> 
> > On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > > 
> > > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > > Yuck. No. You are just papering over the problem.
> > > > > 
> > > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > > the interrupt line is shared with a real threaded interrupt user?
> > > > > 
> > > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > > the hardware interrupt against netpoll instead of using
> > > > > disable/enable_irq().
> > > > > 
> > > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > > the chance of contention is pretty low. If done right it will be a
> > > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > > 
> > > > 
> > > > OK a little something like so then I suppose.. But I suspect most all
> > > > the network drivers will need this and maybe more, disable_irq() is a
> > > > popular little thing and we 'just' changed semantics on them.
> > > 
> > > We changed that almost 4 years ago :) What we 'just' did was to add a
> > > prominent warning into the code.
> > 
> > You know that is the same right... they didn't know it was broken
> > therefore it wasn't :-), but now they need to go actually do stuff about
> > it, an entirely different proposition.
> 
> Right, and of course the world and some more has the very same code
> there:
> 
> poll_controller()
> {
> 	disable_irq();
> 	dev_interrupt_handler();
> 	enable_irq();
> }
> 
> Trying to twist my brain to come up with a solution which avoids the
> spinlock, but I have a hard time to come up with one.
> 
> The only thing I came up with so far is to avoid adding locks to every
> driver incarnation and instead put it into struct net_device and
> provide helper functions for the lock/unlock case.
> 
> That does not change the fact that we need to deal with that on a per
> driver basis :(

But at least it allows to mitigate the impact by making it conditional
at a central point.

static inline void netpoll_lock(struct net_device *nd)
{
	if (netpoll_active(nd))
		spin_lock(&nd->netpoll_lock);
}

and let the core code make sure that activation/deactivation of
netpoll on a particular interface is serialized against the interrupt
and netpoll calls.

Not sure if it's worth the trouble, but at least it allows to deal
with it in the core instead of dealing with it on a per driver base.

Thanks,

	tglx
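
The conditional-lock idea above can be sketched in userspace with C11 atomics. Everything here is an assumption for illustration: `struct fake_netdev`, its fields, and the `sketch_` helpers are hypothetical, and a spinning `atomic_flag` stands in for the kernel spinlock. Returning whether the lock was taken is a choice of this sketch, not part of the proposal; the mail instead leaves it to the core code to serialize activation and deactivation against the interrupt and netpoll paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct fake_netdev {
	atomic_bool netpoll_active;	/* is netpoll enabled on this device? */
	atomic_flag lock;		/* spinning flag in place of a spinlock */
};

static void fake_netdev_init(struct fake_netdev *nd)
{
	atomic_init(&nd->netpoll_active, false);
	atomic_flag_clear(&nd->lock);
}

/*
 * Take the lock only when netpoll is active. The return value records
 * whether the lock was taken, so the matching unlock stays balanced
 * (assumption of this sketch).
 */
static bool sketch_netpoll_lock(struct fake_netdev *nd)
{
	if (!atomic_load_explicit(&nd->netpoll_active, memory_order_acquire))
		return false;

	while (atomic_flag_test_and_set_explicit(&nd->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
	return true;
}

/* Release the lock only if the paired lock call actually took it. */
static void sketch_netpoll_unlock(struct fake_netdev *nd, bool locked)
{
	if (locked)
		atomic_flag_clear_explicit(&nd->lock, memory_order_release);
}
```

With netpoll inactive, the fast path costs one load and one branch; whether that beats an unconditional lock on cold cache lines is the trade-off debated in the follow-ups.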


* Re: [PATCH net-next] neigh: optimize neigh_parms_release()
From: David Miller @ 2014-10-29 20:12 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <1414607371-4246-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 29 Oct 2014 19:29:31 +0100

> In neigh_parms_release() we loop over all entries to find the entry given in
> argument and being able to remove it from the list. By using a double linked
> list, we can avoid this loop.
> 
> Here are some numbers with 30 000 dummy interfaces configured:
> 
> Before the patch:
> $ time rmmod dummy
> real	2m0.118s
> user	0m0.000s
> sys	1m50.048s
> 
> After the patch:
> $ time rmmod dummy
> real	1m9.970s
> user	0m0.000s
> sys	0m47.976s
> 
> Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Looks great, applied, thanks Nicolas.
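
The reasoning in the quoted commit message (a doubly linked list removes the need to loop to find the entry) can be illustrated with a small userspace sketch. The `struct dnode` type and the `dlist_` and `scan_steps` helpers are hypothetical names chosen for this example; this is not the code of the applied patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular list node with links in both directions. */
struct dnode {
	struct dnode *prev;
	struct dnode *next;
};

/* An empty circular list is a head that points at itself. */
static void dlist_init(struct dnode *head)
{
	head->prev = head;
	head->next = head;
}

/* Append n just before the head, i.e. at the tail of the list. */
static void dlist_add_tail(struct dnode *head, struct dnode *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n in O(1): both neighbours are reachable from n itself. */
static void dlist_del(struct dnode *n)
{
	assert(n != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n;
	n->next = n;
}

/* For contrast: steps a forward scan needs before it reaches n. */
static size_t scan_steps(const struct dnode *head, const struct dnode *n)
{
	size_t steps = 0;
	const struct dnode *p;

	for (p = head->next; p != head; p = p->next) {
		steps++;
		if (p == n)
			break;
	}
	return steps;
}
```

With a forward-only list, removal has to pay `scan_steps()` to find the predecessor; `dlist_del()` touches only the two neighbours regardless of list length.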


* Re: [PATCH net-next] net: introduce napi_schedule_irqoff()
From: David Miller @ 2014-10-29 20:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1414544713.631.30.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 28 Oct 2014 18:05:13 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> napi_schedule() can be called from any context and has to mask hard
> irqs.
> 
> Add a variant that can only be called from hard interrupts handlers
> or when irqs are already masked.
> 
> Many NIC drivers can use it from their hard IRQ handler instead of
> generic variant.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
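
The distinction drawn in the quoted commit message can be modelled with a toy in userspace. This is only an illustration under stated assumptions: `struct toy_cpu` and the `toy_` functions are hypothetical and are not kernel APIs; the model merely counts how often the interrupt mask is touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; every name here is hypothetical. */
struct toy_cpu {
	bool irqs_enabled;
	unsigned int mask_ops;	/* times the irq mask was changed */
	unsigned int scheduled;	/* polls queued so far */
};

/* Queueing a poll is only safe while interrupts are masked. */
static void toy_queue_poll(struct toy_cpu *cpu)
{
	assert(!cpu->irqs_enabled);
	cpu->scheduled++;
}

/* Generic variant: callable from any context, so it masks and restores. */
static void toy_schedule(struct toy_cpu *cpu)
{
	bool saved = cpu->irqs_enabled;

	cpu->irqs_enabled = false;	/* mask */
	cpu->mask_ops++;
	toy_queue_poll(cpu);
	cpu->irqs_enabled = saved;	/* restore previous state */
	cpu->mask_ops++;
}

/* irqoff variant: the caller guarantees interrupts are already masked. */
static void toy_schedule_irqoff(struct toy_cpu *cpu)
{
	toy_queue_poll(cpu);
}
```

In this model the irqoff variant saves two mask operations per call, at the price of being valid only where interrupts are already masked, such as a hard interrupt handler.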


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 20:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029195054.GH10501@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > 
> > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > Yuck. No. You are just papering over the problem.
> > > > 
> > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > the interrupt line is shared with a real threaded interrupt user?
> > > > 
> > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > the hardware interrupt against netpoll instead of using
> > > > disable/enable_irq().
> > > > 
> > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > the chance of contention is pretty low. If done right it will be a
> > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > 
> > > 
> > > OK a little something like so then I suppose.. But I suspect most all
> > > the network drivers will need this and maybe more, disable_irq() is a
> > > popular little thing and we 'just' changed semantics on them.
> > 
> > We changed that almost 4 years ago :) What we 'just' did was to add a
> > prominent warning into the code.
> 
> You know that is the same right... they didn't know it was broken
> therefore it wasn't :-), but now they need to go actually do stuff about
> it, an entirely different proposition.

Right, and of course the world and some more has the very same code
there:

poll_controller()
{
	disable_irq();
	dev_interrupt_handler();
	enable_irq();
}

Trying to twist my brain to come up with a solution which avoids the
spinlock, but I have a hard time to come up with one.

The only thing I came up with so far is to avoid adding locks to every
driver incarnation and instead put it into struct net_device and
provide helper functions for the lock/unlock case.

That does not change the fact that we need to deal with that on a per
driver basis :(

Thanks,

	tglx


* Re: [PATCHv1 0/2 net-next] xen-netback: minor cleanups
From: David Miller @ 2014-10-29 20:00 UTC (permalink / raw)
  To: david.vrabel; +Cc: netdev, xen-devel, ian.campbell, wei.liu2
In-Reply-To: <1414510171-12853-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Tue, 28 Oct 2014 15:29:29 +0000

> Two minor xen-netback cleanups originally from Zoltan.

Series applied, thanks everyone.


* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Eric Dumazet @ 2014-10-29 19:57 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Alexander Duyck, netdev
In-Reply-To: <1414610868.2420.52.camel@jtkirshe-mobl>

On Wed, 2014-10-29 at 12:27 -0700, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
> > On 10/29/2014 07:49 AM, Eric Dumazet wrote:
> > > Hi Alexander
> > >
> > > The memory barriers added in commit
> > > b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> > > ("net: Add memory barriers to prevent possible race in byte queue
> > > limits")
> > >
> > > have heavy cost.
> > >
> > > It seems we could use smp_load_acquire() and smp_store_release()
> > > instead ?
> > >
> > > I'll post a patch later today. I would be interested if someone was able
> > > to test it, as your commit apparently was tested and known to fix a
> > > reproducible race.
> > >
> > > Thanks !
> 
> Eric- just CC me on the patch you post and I will see what I can do
> about getting validation eyes on it.

Thanks guys, will do, and will CC Paul as well.

Alexander, here is the following profile showing the cost of the
'mfence', in a typical RPC workload (a lot of IRQs are generated for TX
completions, because RPCs tend to send small packets)

  0.11 │       je     33a
       │       mov    -0x3c(%rbp),%esi
  0.06 │       lea    0xc0(%rbx),%rdi
  0.06 │       callq  dql_completed
  0.06 │       mfence
 38.68 │       mov    0xc4(%rbx),%edx
  1.83 │       mov    0xc0(%rbx),%eax
       │       cmp    %eax,%edx
  0.22 │       js     333
  0.11 │       lock   btrl $0x1,0x98(%rbx)
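
For readers unfamiliar with the acquire/release pairing raised in this thread, here is a minimal userspace sketch using C11 atomics, which offer a similar pairing to smp_store_release() and smp_load_acquire(). The `struct mailbox` type and its helpers are hypothetical. The sketch shows only the generic publish/consume pattern; it does not show that this ordering is sufficient for the byte queue limits race, which is the open question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: plain data guarded by an atomic flag. */
struct mailbox {
	int payload;		/* ordinary, non-atomic data */
	atomic_bool ready;	/* set once the payload is valid */
};

/*
 * Writer side: the release store orders the payload write before the
 * flag write, so a reader that sees the flag also sees the payload.
 */
static void mailbox_publish(struct mailbox *mb, int value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/*
 * Reader side: the acquire load orders the flag read before the payload
 * read. Returns false if nothing has been published yet.
 */
static bool mailbox_try_read(struct mailbox *mb, int *out)
{
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return false;
	*out = mb->payload;
	return true;
}
```

Unlike a full barrier such as the mfence in the profile, a release/acquire pair does not order an earlier store against a later load of a different location, so patterns that depend on that ordering need separate analysis.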


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:53 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Peter Zijlstra, Sabrina Dubroca, netdev, linux-kernel
In-Reply-To: <1414611641.2420.54.camel@jtkirshe-mobl>

On Wed, 29 Oct 2014, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> Thomas- if you are fine with Peter's patch, I can get this under
> testing.

I'm fine with it except for the comment part of disable_irq(), but
that does not matter :)

One nitpick: Instead of having the lock unconditionally, I'd make it
depend on CONFIG_NET_POLL_CONTROLLER.

#ifdef CONFIG_NET_POLL_CONTROLLER
static inline void netpoll_lock(struct e1000_adapter *adapter)
{
	spin_lock(&adapter->irq_lock);
}

static inline void netpoll_unlock(struct e1000_adapter *adapter)
{
	spin_unlock(&adapter->irq_lock);
}
#else
static inline void netpoll_lock(struct e1000_adapter *adapter) { }
static inline void netpoll_unlock(struct e1000_adapter *adapter) { }
#endif

and use that instead of the unconditional spin[un]lock() invocations.

But that's up to you.

Thanks,

	tglx


* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 19:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410292046270.5308@nanos>

On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> 
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > > 
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > > 
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > > 
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > 
> > 
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
> 
> We changed that almost 4 years ago :) What we 'just' did was to add a
> prominent warning into the code.

You know that is the same right... they didn't know it was broken
therefore it wasn't :-), but now they need to go actually do stuff about
it, an entirely different proposition.

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Thomas Gleixner @ 2014-10-29 19:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20141029193603.GS12706@worktop.programming.kicks-ass.net>

On Wed, 29 Oct 2014, Peter Zijlstra wrote:

> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> > 
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> > 
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> > 
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > 
> 
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.

We changed that almost 4 years ago :) What we 'just' did was to add a
prominent warning into the code.
 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>  kernel/irq/manage.c                           |  2 +-
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
>  	struct delayed_work watchdog_task;
>  	struct delayed_work fifo_stall_task;
>  	struct delayed_work phy_info_task;
> +
> +	spinlock_t irq_lock;
>  };
>  
>  enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
>  	e1000_irq_disable(adapter);
>  
>  	spin_lock_init(&adapter->stats_lock);
> +	spin_lock_init(&adapter->irq_lock);
>  
>  	set_bit(__E1000_DOWN, &adapter->flags);
>  
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
>   * @irq: interrupt number
>   * @data: pointer to a network interface device structure
>   **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
>  {
> -	struct net_device *netdev = data;
> -	struct e1000_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 icr = er32(ICR);
>  
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> +	struct net_device *netdev = data;
> +	struct e1000_adapter *adapter = netdev_priv(netdev);
> +	irqreturn_t ret;
> +
> +	spin_lock(&adapter->irq_lock);
> +	ret = __e1000_intr(irq, adapter);
> +	spin_unlock(&adapter->irq_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * e1000_clean - NAPI Rx polling callback
>   * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
>  {
>  	struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -	disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock);
>  	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock);
>  }
>  #endif
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
>   *	to complete before returning. If you use this function while
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
> - *	This function may be called - with care - from IRQ context.
> + *	This function may _NOT_ be called from IRQ context.

It can only be called from preemptible thread context.

Thanks,

	tglx

^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Jeff Kirsher @ 2014-10-29 19:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Sabrina Dubroca, netdev, linux-kernel
In-Reply-To: <20141029193603.GS12706@worktop.programming.kicks-ass.net>

On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> > 
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> > 
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> > 
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > 
> 
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.

Thomas- if you are fine with Peter's patch, I can get this under
testing.

> 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>  kernel/irq/manage.c                           |  2 +-
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
>  	struct delayed_work watchdog_task;
>  	struct delayed_work fifo_stall_task;
>  	struct delayed_work phy_info_task;
> +
> +	spinlock_t irq_lock;
>  };
>  
>  enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
>  	e1000_irq_disable(adapter);
>  
>  	spin_lock_init(&adapter->stats_lock);
> +	spin_lock_init(&adapter->irq_lock);
>  
>  	set_bit(__E1000_DOWN, &adapter->flags);
>  
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
>   * @irq: interrupt number
>   * @data: pointer to a network interface device structure
>   **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
>  {
> -	struct net_device *netdev = data;
> -	struct e1000_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 icr = er32(ICR);
>  
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> +	struct net_device *netdev = data;
> +	struct e1000_adapter *adapter = netdev_priv(netdev);
> +	irqreturn_t ret;
> +
> +	spin_lock(&adapter->irq_lock);
> +	ret = __e1000_intr(irq, adapter);
> +	spin_unlock(&adapter->irq_lock);
> +
> +	return ret;
> +}
> +
>  /**
>   * e1000_clean - NAPI Rx polling callback
>   * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
>  {
>  	struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -	disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock);
>  	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock);
>  }
>  #endif
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
>   *	to complete before returning. If you use this function while
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
> - *	This function may be called - with care - from IRQ context.
> + *	This function may _NOT_ be called from IRQ context.
>   */
>  void disable_irq(unsigned int irq)
>  {



^ permalink raw reply

* Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next
From: Peter Zijlstra @ 2014-10-29 19:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sabrina Dubroca, netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <alpine.DEB.2.11.1410291918060.5308@nanos>

On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> Yuck. No. You are just papering over the problem.
> 
> What happens if you add 'threadirqs' to the kernel command line? Or if
> the interrupt line is shared with a real threaded interrupt user?
> 
> The proper solution is to have a poll_lock for e1000 which serializes
> the hardware interrupt against netpoll instead of using
> disable/enable_irq().
> 
> In fact that's less expensive than the disable/enable_irq() dance and
> the chance of contention is pretty low. If done right it will be a
> NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> 

OK a little something like so then I suppose.. But I suspect most all
the network drivers will need this and maybe more, disable_irq() is a
popular little thing and we 'just' changed semantics on them.

---
 drivers/net/ethernet/intel/e1000/e1000.h      |  2 ++
 drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
 kernel/irq/manage.c                           |  2 +-
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 69707108d23c..3f48609f2318 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -323,6 +323,8 @@ struct e1000_adapter {
 	struct delayed_work watchdog_task;
 	struct delayed_work fifo_stall_task;
 	struct delayed_work phy_info_task;
+
+	spinlock_t irq_lock;
 };
 
 enum e1000_state_t {
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 5f6aded512f5..d12cbffe2149 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
 	e1000_irq_disable(adapter);
 
 	spin_lock_init(&adapter->stats_lock);
+	spin_lock_init(&adapter->irq_lock);
 
 	set_bit(__E1000_DOWN, &adapter->flags);
 
@@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
  * @irq: interrupt number
  * @data: pointer to a network interface device structure
  **/
-static irqreturn_t e1000_intr(int irq, void *data)
+static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
 {
-	struct net_device *netdev = data;
-	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 	u32 icr = er32(ICR);
 
@@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t e1000_intr(int irq, void *data)
+{
+	struct net_device *netdev = data;
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	irqreturn_t ret;
+
+	spin_lock(&adapter->irq_lock);
+	ret = __e1000_intr(irq, adapter);
+	spin_unlock(&adapter->irq_lock);
+
+	return ret;
+}
+
 /**
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
@@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	disable_irq(adapter->pdev->irq);
+	spin_lock(&adapter->irq_lock);
 	e1000_intr(adapter->pdev->irq, netdev);
-	enable_irq(adapter->pdev->irq);
+	spin_unlock(&adapter->irq_lock);
 }
 #endif
 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..b5a4a06bf2fd 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
  *	to complete before returning. If you use this function while
  *	holding a resource the IRQ handler may need you will deadlock.
  *
- *	This function may be called - with care - from IRQ context.
+ *	This function may _NOT_ be called from IRQ context.
  */
 void disable_irq(unsigned int irq)
 {

^ permalink raw reply related

* Re: net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: David Miller @ 2014-10-29 19:34 UTC (permalink / raw)
  To: LW
  Cc: netdev, rmk+kernel, Frank.Li, fabio.estevam, linux-kernel,
	linux-arm-kernel
In-Reply-To: <1414502584-10583-1-git-send-email-LW@KARO-electronics.de>

From: Lothar Waßmann <LW@KARO-electronics.de>
Date: Tue, 28 Oct 2014 14:22:55 +0100

> Changes wrt. v1:
> - added some cleanup patches
> - simplify handling of 'quirks' flags as suggested by Russell King.
> - remove DIV_ROUND_UP() from byte swapping loop as suggested by
>   Eric Dumazet
> 
> Changes wrt. v2:
> - rebased against next-20141028
> - added some more cleanups in fec.h
> - removed unused return value from swap_buffer()
> - fixed messed swab32s() call in swap_buffer2()
> - fixed messed up setup of fep->quirks
> 

It is not appropriate to mix cleanups and bona fide bug fixes.

I want to see only bug fixes targeted at 'net'.  You can later
submit the cleanups to 'net-next'.

Also, I don't think your DIV_ROUND_UP() elimination for the loop
in swap_buffer() is valid.  The whole point is that the current
code handles buffers which have a length which is not a multiple
of 4 properly; after your change it will no longer do so.
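
To make the rounding point concrete, here is a small standalone sketch.
It is not the fec driver's swap_buffer(): demo_swab32(),
demo_swap_buffer() and DEMO_DIV_ROUND_UP() are hypothetical stand-ins,
and the buffer is assumed to have room for the rounded-up word count.
It only shows that a loop bounded by len / 4 skips the trailing partial
word, while one bounded by the rounded-up count covers it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DIV_ROUND_UP(), renamed for the demo. */
#define DEMO_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Byte-swap one 32-bit word. */
static uint32_t demo_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Swap the words covering 'len' bytes and return how many words were
 * touched. With round_up == 0 a trailing partial word is left alone.
 */
static size_t demo_swap_buffer(uint32_t *buf, size_t len, int round_up)
{
	size_t words = round_up ? DEMO_DIV_ROUND_UP(len, 4) : len / 4;
	size_t i;

	for (i = 0; i < words; i++)
		buf[i] = demo_swab32(buf[i]);

	return words;
}
```

For a 6-byte payload the truncating variant swaps one word and leaves
bytes 4 and 5 in their original order; the rounded-up variant swaps two
words.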

^ permalink raw reply

* [PATCH net-next 3/3] sunvnet: Use one Tx queue per vnet_port
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


Use multiple Tx netdev queues for sunvnet by supporting a one-to-one
mapping between vnet_port and Tx queue. Provide a ndo_select_queue
indirection (vnet_select_queue()) which selects the queue based
on the peer that would be selected in vnet_start_xmit().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 94 +++++++++++++++++++++++++-------------
 drivers/net/ethernet/sun/sunvnet.h |  2 +
 2 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 7ada479..e7bb63b 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -40,6 +40,8 @@ MODULE_DESCRIPTION("Sun LDOM virtual network driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
+#define	VNET_MAX_TXQS		16
+
 /* Heuristic for the number of times to exponentially backoff and
  * retry sending an LDC trigger when EAGAIN is encountered
  */
@@ -551,6 +553,8 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	struct vnet *vp;
 	u32 end;
 	struct vio_net_desc *desc;
+	struct netdev_queue *txq;
+
 	if (unlikely(pkt->tag.stype_env != VIO_DRING_DATA))
 		return 0;
 
@@ -580,7 +584,8 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	}
 	netif_tx_unlock(dev);
 
-	if (unlikely(netif_queue_stopped(dev) &&
+	txq = netdev_get_tx_queue(dev, port->q_index);
+	if (unlikely(netif_tx_queue_stopped(txq) &&
 		     vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr)))
 		return 1;
 
@@ -608,31 +613,23 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-static void maybe_tx_wakeup(struct vnet *vp)
+/* Got back a STOPPED LDC message on port. If the queue is stopped,
+ * wake it up so that we'll send out another START message at the
+ * next TX.
+ */
+static void maybe_tx_wakeup(struct vnet_port *port)
 {
-	struct net_device *dev = vp->dev;
+	struct netdev_queue *txq;
 
-	netif_tx_lock(dev);
-	if (likely(netif_queue_stopped(dev))) {
-		struct vnet_port *port;
-		int wake = 1;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu(port, &vp->port_list, list) {
-			struct vio_dring_state *dr;
-
-			dr = &port->vio.drings[VIO_DRIVER_TX_RING];
-			if (vnet_tx_dring_avail(dr) <
-			    VNET_TX_WAKEUP_THRESH(dr)) {
-				wake = 0;
-				break;
-			}
-		}
-		rcu_read_unlock();
-		if (wake)
-			netif_wake_queue(dev);
+	txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
+	__netif_tx_lock(txq, smp_processor_id());
+	if (likely(netif_tx_queue_stopped(txq))) {
+		struct vio_dring_state *dr;
+
+		dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+			netif_tx_wake_queue(txq);
 	}
-	netif_tx_unlock(dev);
+	__netif_tx_unlock(txq);
 }
 
 static inline bool port_is_up(struct vnet_port *vnet)
@@ -748,7 +745,7 @@ napi_resume:
 			break;
 	}
 	if (unlikely(tx_wakeup && err != -ECONNRESET))
-		maybe_tx_wakeup(port->vp);
+		maybe_tx_wakeup(port);
 	return npkts;
 }
 
@@ -953,6 +950,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, void **pstart,
 	return skb;
 }
 
+static u16
+vnet_select_queue(struct net_device *dev, struct sk_buff *skb,
+		  void *accel_priv, select_queue_fallback_t fallback)
+{
+	struct vnet *vp = netdev_priv(dev);
+	struct vnet_port *port = __tx_port_find(vp, skb);
+
+	return port->q_index;
+}
+
 static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vnet *vp = netdev_priv(dev);
@@ -965,6 +972,7 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	void *start = NULL;
 	int nlen = 0;
 	unsigned pending = 0;
+	struct netdev_queue *txq;
 
 	skb = vnet_skb_shape(skb, &start, &nlen);
 	if (unlikely(!skb))
@@ -1008,9 +1016,11 @@ static int vnet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+	i = skb_get_queue_mapping(skb);
+	txq = netdev_get_tx_queue(dev, i);
 	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
-		if (!netif_queue_stopped(dev)) {
-			netif_stop_queue(dev);
+		if (!netif_tx_queue_stopped(txq)) {
+			netif_tx_stop_queue(txq);
 
 			/* This is a hard error, log it. */
 			netdev_err(dev, "BUG! Tx Ring full when queue awake!\n");
@@ -1104,9 +1114,9 @@ ldc_start_done:
 
 	dr->prod = (dr->prod + 1) & (VNET_TX_RING_SIZE - 1);
 	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
-		netif_stop_queue(dev);
+		netif_tx_stop_queue(txq);
 		if (vnet_tx_dring_avail(dr) > VNET_TX_WAKEUP_THRESH(dr))
-			netif_wake_queue(dev);
+			netif_tx_wake_queue(txq);
 	}
 
 	(void)mod_timer(&port->clean_timer, jiffies + VNET_CLEAN_TIMEOUT);
@@ -1139,14 +1149,14 @@ static void vnet_tx_timeout(struct net_device *dev)
 static int vnet_open(struct net_device *dev)
 {
 	netif_carrier_on(dev);
-	netif_start_queue(dev);
+	netif_tx_start_all_queues(dev);
 
 	return 0;
 }
 
 static int vnet_close(struct net_device *dev)
 {
-	netif_stop_queue(dev);
+	netif_tx_stop_all_queues(dev);
 	netif_carrier_off(dev);
 
 	return 0;
@@ -1420,6 +1430,7 @@ static const struct net_device_ops vnet_ops = {
 	.ndo_tx_timeout		= vnet_tx_timeout,
 	.ndo_change_mtu		= vnet_change_mtu,
 	.ndo_start_xmit		= vnet_start_xmit,
+	.ndo_select_queue	= vnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= vnet_poll_controller,
 #endif
@@ -1431,7 +1442,7 @@ static struct vnet *vnet_new(const u64 *local_mac)
 	struct vnet *vp;
 	int err, i;
 
-	dev = alloc_etherdev(sizeof(*vp));
+	dev = alloc_etherdev_mqs(sizeof(*vp), VNET_MAX_TXQS, 1);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 	dev->needed_headroom = VNET_PACKET_SKIP + 8;
@@ -1556,6 +1567,25 @@ static void print_version(void)
 
 const char *remote_macaddr_prop = "remote-mac-address";
 
+static void
+vnet_port_add_txq(struct vnet_port *port)
+{
+	struct vnet *vp = port->vp;
+	int n;
+
+	n = vp->nports++;
+	n = n & (VNET_MAX_TXQS - 1);
+	port->q_index = n;
+	netif_tx_wake_queue(netdev_get_tx_queue(vp->dev, port->q_index));
+}
+
+static void
+vnet_port_rm_txq(struct vnet_port *port)
+{
+	port->vp->nports--;
+	netif_tx_stop_queue(netdev_get_tx_queue(port->vp->dev, port->q_index));
+}
+
 static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 {
 	struct mdesc_handle *hp;
@@ -1624,6 +1654,7 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 		list_add_tail_rcu(&port->list, &vp->port_list);
 	hlist_add_head_rcu(&port->hash,
 			   &vp->port_hash[vnet_hashfn(port->raddr)]);
+	vnet_port_add_txq(port);
 	spin_unlock_irqrestore(&vp->lock, flags);
 
 	dev_set_drvdata(&vdev->dev, port);
@@ -1668,6 +1699,7 @@ static int vnet_port_remove(struct vio_dev *vdev)
 
 		synchronize_rcu();
 		del_timer_sync(&port->clean_timer);
+		vnet_port_rm_txq(port);
 		netif_napi_del(&port->napi);
 		vnet_port_free_tx_bufs(port);
 		vio_ldc_free(&port->vio);
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index c8a862e..cd5d343 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -61,6 +61,7 @@ struct vnet_port {
 	u32			napi_stop_idx;
 	bool			napi_resume;
 	int			rx_event;
+	u16			q_index;
 };
 
 static inline struct vnet_port *to_vnet_port(struct vio_driver_state *vio)
@@ -102,6 +103,7 @@ struct vnet {
 	struct list_head	list;
 	u64			local_mac;
 
+	int			nports;
 };
 
 #endif /* _SUNVNET_H */
-- 
1.8.4.2
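
The queue index assignment done by vnet_port_add_txq() in the patch
above can be sketched in isolation as follows. This is a userspace
illustration only: struct demo_vnet, DEMO_MAX_TXQS and the demo_*
functions are hypothetical names mirroring the patch, not driver code.
It shows that the mask keeps the index inside the queue range, and that
the mapping is one-to-one only while no more than DEMO_MAX_TXQS ports
have been added.

```c
/* Mirrors VNET_MAX_TXQS; must be a power of two for the mask to work. */
#define DEMO_MAX_TXQS	16

struct demo_vnet {
	int nports;	/* number of ports currently attached */
};

/* Pick a Tx queue index for a newly added port. */
static unsigned short demo_port_add_txq(struct demo_vnet *vp)
{
	int n = vp->nports++;

	return (unsigned short)(n & (DEMO_MAX_TXQS - 1));
}

/* Account for a removed port. */
static void demo_port_rm_txq(struct demo_vnet *vp)
{
	vp->nports--;
}
```

Because the index is derived from the running port count, a port added
after an earlier port was removed can receive an index that a still
attached port already uses; whether that matters for the driver is not
something this sketch tries to answer.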

^ permalink raw reply related

* Re: [RFC] use smp_load_acquire()/smp_store_release()
From: Jeff Kirsher @ 2014-10-29 19:27 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Eric Dumazet, netdev
In-Reply-To: <545112E0.40106@redhat.com>

On Wed, 2014-10-29 at 09:16 -0700, Alexander Duyck wrote:
> On 10/29/2014 07:49 AM, Eric Dumazet wrote:
> > Hi Alexander
> >
> > The memory barriers added in commit
> > b37c0fbe3f6dfba1f8ad2aed47fb40578a254635
> > ("net: Add memory barriers to prevent possible race in byte queue
> > limits")
> >
> > have heavy cost.
> >
> > It seems we could use smp_load_acquire() and smp_store_release()
> > instead ?
> >
> > I'll post a patch later today. I would be interested if someone was able
> > to test it, as your commit apparently was tested and known to fix a
> > reproducible race.
> >
> > Thanks !

Eric- just CC me on the patch you post and I will see what I can do
about getting validation eyes on it.

> 
> Unfortunately Stephen left Intel before I did, so we will need to find 
> someone else in the validation team to test this if possible. I have 
> added Jeff to the CC so that he can give the appropriate validation 
> people a heads up that this patch might be coming.
> 
> As I recall what was seen was random Tx hangs on systems with the 
> original BQL code when interfaces were stressed.  It has been a while so 
> I don't recall the exact set-up for all of it.  Also some less 
> used/tested architectures such as PowerPC can be more susceptible to 
> synchronization issues such as these as the memory model is more weakly 
> ordered.
> 
> I'm wondering where you are seeing the barrier show up?  In 
> netdev_tx_sent_queue you should only hit the barrier if you actually are 
> triggering the XOFF condition, and in netdev_tx_completed_queue the 
> barrier should be coalesced in amongst a number of frames reducing the cost.
> 
> My concern with this would be that we are actually synchronizing multiple 
> things, the __QUEUE_STATE_STACK_XOFF flag, dql->adj_limit, and 
> dql->num_queued, and we might be trading off reducing the cost on x86 to 
> result in it being increased on other architectures as we may have to 
> actually add additional synchronization as I suspect we would need to 
> use acquire/release on both adj_limit and num_queued.
> 
> Thanks,
> 
> Alex
> 
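
For readers unfamiliar with the primitives under discussion, here is a
userspace sketch of a release store paired with an acquire load, using
C11 atomics as an analogue of smp_store_release() and
smp_load_acquire(). It is not the BQL code: struct demo_queue and the
demo_* functions are invented for illustration. It shows only what the
pairing guarantees (data written before the release store is visible
after the matching acquire load); whether that is enough for the race
described in the thread is exactly the open question.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_queue {
	unsigned int num_queued;	/* plain data, published via 'stopped' */
	atomic_bool stopped;		/* stand-in for an XOFF-style flag */
};

/* Writer: update the data, then publish it with release semantics. */
static void demo_stop(struct demo_queue *q, unsigned int queued)
{
	q->num_queued = queued;
	atomic_store_explicit(&q->stopped, true, memory_order_release);
}

/* Reader: if the flag is seen set with acquire semantics, the data
 * written before the release store is guaranteed to be visible.
 */
static bool demo_observe(struct demo_queue *q, unsigned int *queued)
{
	if (!atomic_load_explicit(&q->stopped, memory_order_acquire))
		return false;

	*queued = q->num_queued;
	return true;
}
```

Note that acquire/release orders accesses around one flag; it does not
by itself order a store to one variable against a later load of a
different one, which is the kind of case a full barrier covers.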



^ permalink raw reply

* Re: [PATCH net-next] tcp: allow for bigger reordering level
From: Eric Dumazet @ 2014-10-29 19:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, wygivan
In-Reply-To: <20141029.150624.878155212768758630.davem@davemloft.net>

On Wed, 2014-10-29 at 15:06 -0400, David Miller wrote:

> However in the longer term I'd say that this value, if it is to have a
> limit, then such a limit should probably be scaled based upon the
> window size.

Yuchung and others are working on a new way to handle reorders (RACK),
and should present the concept at the next IETF meeting.

A Linux patch should follow shortly.

The high-level idea is:

Decide when and what to retransmit based on timing relationships
instead of sequence relationships. This covers both original and
retransmitted packets.

On dupacks, wait a fraction of an RTT before starting the repair
process, both to allow for reordering and to relieve the network.
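
One possible reading of the timing-based idea, as a standalone sketch:
a packet is considered lost only if some packet sent sufficiently later
has already been delivered. Everything here is an assumption made for
illustration (the function name, the microsecond timestamps, and the
choice of RTT / 4 as the "fraction of RTT"); it is not the RACK patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* pkt_sent_us:              when the packet in question was (re)sent
 * newest_delivered_sent_us: send time of the most recently sent packet
 *                           that is known to have been delivered
 * rtt_us:                   current RTT estimate
 */
static bool demo_rack_is_lost(uint64_t pkt_sent_us,
			      uint64_t newest_delivered_sent_us,
			      uint64_t rtt_us)
{
	uint64_t reo_wnd_us = rtt_us / 4;	/* assumed reordering window */

	/* Nothing sent after this packet has been delivered yet. */
	if (newest_delivered_sent_us <= pkt_sent_us)
		return false;

	return newest_delivered_sent_us - pkt_sent_us > reo_wnd_us;
}
```

The check works the same way for original and retransmitted packets,
since it looks only at send times and never at sequence numbers.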

Thanks

^ permalink raw reply

* [PATCH net-next 2/3] sunvnet: Reset LDC_EVENT_DATA_READY when napi completes.
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev

When vnet_event_napi re-enables interrupts, it should
reset LDC_EVENT_DATA_READY as an optimization.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index c390a27..7ada479 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -760,6 +760,7 @@ static int vnet_poll(struct napi_struct *napi, int budget)

 	if (processed < budget) {
 		napi_complete(napi);
+		port->rx_event &= ~LDC_EVENT_DATA_READY;
 		vio_set_intr(vio->vdev->rx_ino, HV_INTR_ENABLED);
 	}
 	return processed;
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next 1/3] tcp: Correction to RFC number in comment
From: Sowmini Varadhan @ 2014-10-29 19:27 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


Challenge ACK is described in RFC 5961, fix typo.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a12b455..d285962 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5028,7 +5028,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	/* step 3: check security and precedence [ignored] */
 
 	/* step 4: Check for a SYN
-	 * RFC 5691 4.2 : Send a challenge ack
+	 * RFC 5961 4.2 : Send a challenge ack
 	 */
 	if (th->syn) {
 syn_challenge:
-- 
1.8.4.2

^ permalink raw reply related
