Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [Intel-wired-lan] [PATCH] PCI: Check/Set ARI capability before setting numVFs
From: Bjorn Helgaas @ 2017-10-06 18:19 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Tony Nguyen, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan, Netdev,
	Bjorn Helgaas
In-Reply-To: <20171005195959.GW25517@bhelgaas-glaptop.roam.corp.google.com>

On Thu, Oct 05, 2017 at 04:07:41PM -0500, Bjorn Helgaas wrote:
> On Wed, Oct 04, 2017 at 04:29:14PM -0700, Alexander Duyck wrote:
> > On Wed, Oct 4, 2017 at 4:01 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Oct 04, 2017 at 08:52:58AM -0700, Tony Nguyen wrote:
> > >> This fixes a bug that can occur if an AER error is encountered while SRIOV
> > >> devices are present.

I applied the patch below with Alex's ack to pci/virtualization for v4.15.

> commit 95594dedd443e42ab0c16b9fba0109e955e7be13
> Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> Date:   Wed Oct 4 08:52:58 2017 -0700
> 
>     PCI: Restore "ARI Capable Hierarchy" before setting numVFs
>     
>     In the restore path, we previously read PCI_SRIOV_VF_OFFSET and
>     PCI_SRIOV_VF_STRIDE before restoring PCI_SRIOV_CTRL_ARI, which
>     affects the offset and stride:
>     
>       pci_restore_state
>         pci_restore_iov_state
>           sriov_restore_state
>             pci_iov_set_numvfs
>               pci_read_config_word(... PCI_SRIOV_VF_OFFSET, &iov->offset)
>             pci_write_config_word(... PCI_SRIOV_CTRL, iov->ctrl)
>     
>     The effect is that suspend/resume and AER recovery, which use
>     pci_restore_state(), may corrupt iov->offset and iov->stride.  The iov
>     state is associated with the device, not the driver, so if we reload the
>     driver, it will use the the corrupted data, which may cause crashes like
>     this:
>     
>       kernel BUG at drivers/pci/iov.c:157!
>       RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
>       Call Trace:
>        pci_enable_sriov+0x353/0x440
>        ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
>        sriov_numvfs_store+0xf7/0x170
>        dev_attr_store+0x18/0x30
>        sysfs_kf_write+0x37/0x40
>        kernfs_fop_write+0x120/0x1b0
>        vfs_write+0xb5/0x1a0
>        SyS_write+0x55/0xc0
>     
>     The occurs since during AER recovery the ARI Capable Hierarchy bit, which
>     can affect the values for First VF Offset and VF Stride, is not set until
>     after pci_iov_set_numvfs() is called.  This can cause the iov structure to
>     be populated with values that are incorrect if the bit is later set.
>     Check and set this bit, if needed, before calling pci_iov_set_numvfs() so
>     that the values being populated properly take the ARI bit into account.
>     
>     Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>     [bhelgaas: changelog, add comment, also clear ARI if necessary]
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     CC: Alexander Duyck <alexander.h.duyck@intel.com>
>     CC: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index ce24cf235f01..6bacb8995e96 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -498,6 +498,14 @@ static void sriov_restore_state(struct pci_dev *dev)
>  	if (ctrl & PCI_SRIOV_CTRL_VFE)
>  		return;
>  
> +	/*
> +	 * Restore PCI_SRIOV_CTRL_ARI before pci_iov_set_numvfs() because
> +	 * it reads offset & stride, which depend on PCI_SRIOV_CTRL_ARI.
> +	 */
> +	ctrl &= ~PCI_SRIOV_CTRL_ARI;
> +	ctrl |= iov->ctrl & PCI_SRIOV_CTRL_ARI;
> +	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, ctrl);
> +
>  	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
>  		pci_update_resource(dev, i);
>  

^ permalink raw reply

* Re: [PATCH net] net: enable interface alias removal via rtnl
From: David Ahern @ 2017-10-06 18:18 UTC (permalink / raw)
  To: Nicolas Dichtel, davem; +Cc: netdev, Oliver Hartkopp, Stephen Hemminger
In-Reply-To: <20171005101940.28550-1-nicolas.dichtel@6wind.com>

On 10/5/17 4:19 AM, Nicolas Dichtel wrote:
> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of
> the attribute is 1 ("\0"). However, to remove an alias, the attribute
> length must be 0 (see dev_set_alias()).

why not add a check in dev_set_alias that if len is 1 and the 1
character is '\0' it means remove the alias?

> 
> Let's define the type to NLA_BINARY, so that the alias can be removed.

that changes the uapi

^ permalink raw reply

* Re: [PATCH net-next 1/1] [net] bonding: Add NUMA notice
From: Eric Dumazet @ 2017-10-06 18:08 UTC (permalink / raw)
  To: Patrick Talbert; +Cc: Andrew Lunn, netdev
In-Reply-To: <CAPRrrxUGMpOqEA8MxzY91RyMTQp+OisO8xB5jVR+QA+dqLeSww@mail.gmail.com>

On Fri, 2017-10-06 at 13:54 -0400, Patrick Talbert wrote:

> 
> My goal with the patch is not to prevent some one from bonding
> whichever interfaces they want, only to notify them that what they are
> doing is *likely* to be less than ideal from a performance
> perspective. Even if some theoretical load balancing bonding mode was
> intelligent enough to consider NUMA when choosing a transmit
> interface, it never has control over the interface traffic is received
> on (excluding the strange balance-alb mode).

Note that following the NUMA node of current process would lead to
reordering for TCP flows.

XPS makes sure we stick to one TXQ to avoid reorders.

I am not sure your patch will really help, since you do not warn if a
process from another NUMA node tries to send packet to a bonding driver
using slaves on another NUMA node.

^ permalink raw reply

* [net-next 12/15] i40e: do not enter PHY debug mode while setting LEDs behaviour
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

Previous implementation of LED set/get functions required to enter
PHY debug mode, in order to prevent access to it from FW and SW at
the same time. Reset of all ports was a unwanted side effect.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 6203d362438c..de0dfe340494 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2010,7 +2010,9 @@ static int i40e_set_phys_id(struct net_device *netdev,
 		if (!(pf->hw_features & I40E_HW_PHY_CONTROLS_LEDS)) {
 			pf->led_status = i40e_led_get(hw);
 		} else {
-			i40e_aq_set_phy_debug(hw, I40E_PHY_DEBUG_ALL, NULL);
+			if (!(hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE))
+				i40e_aq_set_phy_debug(hw, I40E_PHY_DEBUG_ALL,
+						      NULL);
 			ret = i40e_led_get_phy(hw, &temp_status,
 					       &pf->phy_led_val);
 			pf->led_status = temp_status;
@@ -2035,7 +2037,8 @@ static int i40e_set_phys_id(struct net_device *netdev,
 			ret = i40e_led_set_phy(hw, false, pf->led_status,
 					       (pf->phy_led_val |
 					       I40E_PHY_LED_MODE_ORIG));
-			i40e_aq_set_phy_debug(hw, 0, NULL);
+			if (!(hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE))
+				i40e_aq_set_phy_debug(hw, 0, NULL);
 		}
 		break;
 	default:
-- 
2.14.2

^ permalink raw reply related

* [net-next 14/15] i40e: ignore skb->xmit_more when deciding to set RS bit
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Since commit 6a7fded776a7 ("i40e: Fix RS bit update in Tx path and
disable force WB workaround") we've tried to "optimize" setting the
RS bit based around skb->xmit_more. This same logic was refactored
in commit 1dc8b538795f ("i40e: Reorder logic for coalescing RS bits"),
but ultimately was not functionally changed.

Using skb->xmit_more in this way is incorrect, because in certain
circumstances we may see a large number of skbs in sequence with
xmit_more set. This leads to a performance loss as the hardware does not
writeback anything for those packets, which delays the time it takes for
us to respond to the stack transmit requests. This significantly impacts
UDP performance, especially when layered with multiple devices, such as
bonding, VLANs, and vnet setups.

This was not noticed until now because it is difficult to create a setup
which reproduces the issue. It was discovered in a UDP_STREAM test in
a VM, connected using a vnet device to a bridge, which is connected to
a bonded pair of X710 ports in active-backup mode with a VLAN. These
layered devices seem to compound the number of skbs transmitted at once
by the qdisc. Additionally, the problem can be masked by reducing the
ITR value.

Since the original commit does not provide strong justification for this
RS bit "optimization", revert to the previous behavior of setting the RS
bit every 4th packet.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 34 ++++-------------------------
 1 file changed, 4 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d9fdf69bbc6e..3bd176606c09 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3167,38 +3167,12 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* write last descriptor with EOP bit */
 	td_cmd |= I40E_TX_DESC_CMD_EOP;

-	/* We can OR these values together as they both are checked against
-	 * 4 below and at this point desc_count will be used as a boolean value
-	 * after this if/else block.
+	/* We OR these values together to check both against 4 (WB_STRIDE)
+	 * below. This is safe since we don't re-use desc_count afterwards.
 	 */
 	desc_count |= ++tx_ring->packet_stride;

-	/* Algorithm to optimize tail and RS bit setting:
-	 * if queue is stopped
-	 *	mark RS bit
-	 *	reset packet counter
-	 * else if xmit_more is supported and is true
-	 *	advance packet counter to 4
-	 *	reset desc_count to 0
-	 *
-	 * if desc_count >= 4
-	 *	mark RS bit
-	 *	reset packet counter
-	 * if desc_count > 0
-	 *	update tail
-	 *
-	 * Note: If there are less than 4 descriptors
-	 * pending and interrupts were disabled the service task will
-	 * trigger a force WB.
-	 */
-	if (netif_xmit_stopped(txring_txq(tx_ring))) {
-		goto do_rs;
-	} else if (skb->xmit_more) {
-		/* set stride to arm on next packet and reset desc_count */
-		tx_ring->packet_stride = WB_STRIDE;
-		desc_count = 0;
-	} else if (desc_count >= WB_STRIDE) {
-do_rs:
+	if (desc_count >= WB_STRIDE) {
 		/* write last descriptor with RS bit set */
 		td_cmd |= I40E_TX_DESC_CMD_RS;
 		tx_ring->packet_stride = 0;
@@ -3219,7 +3193,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	first->next_to_watch = tx_desc;

 	/* notify HW of packet */
-	if (desc_count) {
+	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
 		writel(i, tx_ring->tail);

 		/* we need this if more than one processor can write to our tail
-- 
2.14.2

^ permalink raw reply related

* [net-next 13/15] i40evf: enable support for VF VLAN tag stripping control
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

A recent commit 809481484e5d ("i40e/i40evf: support for VF VLAN tag
stripping control") added support for VFs to negotiate the control of
VLAN tag stripping. This should have allowed VFs to disable the feature.
Unfortunately, the flag was set only in netdev->feature flags and not in
netdev->hw_features.

This ultimately causes the stack to assume that it cannot change the
flag, so it was unchangeable and marked as [fixed] in the ethtool -k
output.

Fix this by setting the feature in hw_features first, just as we do for
the PF code. This enables ethtool -K to disable the feature correctly,
and fully enables user control of the VLAN tag stripping feature.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index bc76378a71e2..1d2fc898b664 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2423,10 +2423,6 @@ static netdev_features_t i40evf_features_check(struct sk_buff *skb,
 	return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
 }
 
-#define I40EVF_VLAN_FEATURES (NETIF_F_HW_VLAN_CTAG_TX |\
-			      NETIF_F_HW_VLAN_CTAG_RX |\
-			      NETIF_F_HW_VLAN_CTAG_FILTER)
-
 /**
  * i40evf_fix_features - fix up the netdev feature bits
  * @netdev: our net device
@@ -2439,9 +2435,11 @@ static netdev_features_t i40evf_fix_features(struct net_device *netdev,
 {
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
 
-	features &= ~I40EVF_VLAN_FEATURES;
-	if (adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
-		features |= I40EVF_VLAN_FEATURES;
+	if (!(adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN))
+		features &= ~(NETIF_F_HW_VLAN_CTAG_TX |
+			      NETIF_F_HW_VLAN_CTAG_RX |
+			      NETIF_F_HW_VLAN_CTAG_FILTER);
+
 	return features;
 }
 
@@ -2572,9 +2570,17 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 	 */
 	hw_features = hw_enc_features;
 
+	/* Enable VLAN features if supported */
+	if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
+		hw_features |= (NETIF_F_HW_VLAN_CTAG_TX |
+				NETIF_F_HW_VLAN_CTAG_RX);
+
 	netdev->hw_features |= hw_features;
 
-	netdev->features |= hw_features | I40EVF_VLAN_FEATURES;
+	netdev->features |= hw_features;
+
+	if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
+		netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	adapter->vsi.id = adapter->vsi_res->vsi_id;
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 15/15] i40e/i40evf: organize and re-number feature flags
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Now that we've reduced the number of flags, organize similar flags
together and re-number them accordingly.

Since we don't yet have more than 32 flags, we'll use a u32 for both the
hw_features and flag field. Should we gain more flags in the future, we
may need to convert to a u64 or separate flags out into two fields.

One alternative approach considered, but not implemented here, was to
use an enumeration for the flag variables, and create a macro
I40E_FLAG() which used string concatenation to generate BIT_ULL values.
This has the advantage of making the actual bit values compile-time
dynamic so that we do not need to worry about matching the order to the
bit value. However, this does produce a high level of code churn, and
makes it more difficult to read a dumped flags value when debugging.

Change-ID: I8653fff69453cd547d6fe98d29dfa9d8710387d1
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         | 98 +++++++++++++-------------
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  6 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h     | 32 ++++-----
 3 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 4dc6d43f8812..18c453a3e728 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -403,57 +403,57 @@ struct i40e_pf {
 	struct timer_list service_timer;
 	struct work_struct service_task;
 
-	u64 hw_features;
-#define I40E_HW_RSS_AQ_CAPABLE			BIT_ULL(0)
-#define I40E_HW_128_QP_RSS_CAPABLE		BIT_ULL(1)
-#define I40E_HW_ATR_EVICT_CAPABLE		BIT_ULL(2)
-#define I40E_HW_WB_ON_ITR_CAPABLE		BIT_ULL(3)
-#define I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE	BIT_ULL(4)
-#define I40E_HW_NO_PCI_LINK_CHECK		BIT_ULL(5)
-#define I40E_HW_100M_SGMII_CAPABLE		BIT_ULL(6)
-#define I40E_HW_NO_DCB_SUPPORT			BIT_ULL(7)
-#define I40E_HW_USE_SET_LLDP_MIB		BIT_ULL(8)
-#define I40E_HW_GENEVE_OFFLOAD_CAPABLE		BIT_ULL(9)
-#define I40E_HW_PTP_L4_CAPABLE			BIT_ULL(10)
-#define I40E_HW_WOL_MC_MAGIC_PKT_WAKE		BIT_ULL(11)
-#define I40E_HW_MPLS_HDR_OFFLOAD_CAPABLE	BIT_ULL(12)
-#define I40E_HW_HAVE_CRT_RETIMER		BIT_ULL(13)
-#define I40E_HW_OUTER_UDP_CSUM_CAPABLE		BIT_ULL(14)
-#define I40E_HW_PHY_CONTROLS_LEDS		BIT_ULL(15)
-#define I40E_HW_STOP_FW_LLDP			BIT_ULL(16)
-#define I40E_HW_PORT_ID_VALID			BIT_ULL(17)
-#define I40E_HW_RESTART_AUTONEG			BIT_ULL(18)
+	u32 hw_features;
+#define I40E_HW_RSS_AQ_CAPABLE			BIT(0)
+#define I40E_HW_128_QP_RSS_CAPABLE		BIT(1)
+#define I40E_HW_ATR_EVICT_CAPABLE		BIT(2)
+#define I40E_HW_WB_ON_ITR_CAPABLE		BIT(3)
+#define I40E_HW_MULTIPLE_TCP_UDP_RSS_PCTYPE	BIT(4)
+#define I40E_HW_NO_PCI_LINK_CHECK		BIT(5)
+#define I40E_HW_100M_SGMII_CAPABLE		BIT(6)
+#define I40E_HW_NO_DCB_SUPPORT			BIT(7)
+#define I40E_HW_USE_SET_LLDP_MIB		BIT(8)
+#define I40E_HW_GENEVE_OFFLOAD_CAPABLE		BIT(9)
+#define I40E_HW_PTP_L4_CAPABLE			BIT(10)
+#define I40E_HW_WOL_MC_MAGIC_PKT_WAKE		BIT(11)
+#define I40E_HW_MPLS_HDR_OFFLOAD_CAPABLE	BIT(12)
+#define I40E_HW_HAVE_CRT_RETIMER		BIT(13)
+#define I40E_HW_OUTER_UDP_CSUM_CAPABLE		BIT(14)
+#define I40E_HW_PHY_CONTROLS_LEDS		BIT(15)
+#define I40E_HW_STOP_FW_LLDP			BIT(16)
+#define I40E_HW_PORT_ID_VALID			BIT(17)
+#define I40E_HW_RESTART_AUTONEG			BIT(18)
 
 	u64 flags;
-#define I40E_FLAG_RX_CSUM_ENABLED		BIT_ULL(1)
-#define I40E_FLAG_MSI_ENABLED			BIT_ULL(2)
-#define I40E_FLAG_MSIX_ENABLED			BIT_ULL(3)
-#define I40E_FLAG_HW_ATR_EVICT_ENABLED		BIT_ULL(4)
-#define I40E_FLAG_RSS_ENABLED			BIT_ULL(6)
-#define I40E_FLAG_VMDQ_ENABLED			BIT_ULL(7)
-#define I40E_FLAG_IWARP_ENABLED			BIT_ULL(10)
-#define I40E_FLAG_FILTER_SYNC			BIT_ULL(15)
-#define I40E_FLAG_SERVICE_CLIENT_REQUESTED	BIT_ULL(16)
-#define I40E_FLAG_SRIOV_ENABLED			BIT_ULL(19)
-#define I40E_FLAG_DCB_ENABLED			BIT_ULL(20)
-#define I40E_FLAG_FD_SB_ENABLED			BIT_ULL(21)
-#define I40E_FLAG_FD_ATR_ENABLED		BIT_ULL(22)
-#define I40E_FLAG_FD_SB_AUTO_DISABLED		BIT_ULL(23)
-#define I40E_FLAG_FD_ATR_AUTO_DISABLED		BIT_ULL(24)
-#define I40E_FLAG_PTP				BIT_ULL(25)
-#define I40E_FLAG_MFP_ENABLED			BIT_ULL(26)
-#define I40E_FLAG_UDP_FILTER_SYNC		BIT_ULL(27)
-#define I40E_FLAG_DCB_CAPABLE			BIT_ULL(29)
-#define I40E_FLAG_VEB_STATS_ENABLED		BIT_ULL(37)
-#define I40E_FLAG_LINK_POLLING_ENABLED		BIT_ULL(39)
-#define I40E_FLAG_VEB_MODE_ENABLED		BIT_ULL(40)
-#define I40E_FLAG_TRUE_PROMISC_SUPPORT		BIT_ULL(51)
-#define I40E_FLAG_CLIENT_RESET			BIT_ULL(54)
-#define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
-#define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
-#define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT_ULL(57)
-#define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
-#define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT_ULL(59)
+#define I40E_FLAG_RX_CSUM_ENABLED		BIT(0)
+#define I40E_FLAG_MSI_ENABLED			BIT(1)
+#define I40E_FLAG_MSIX_ENABLED			BIT(2)
+#define I40E_FLAG_RSS_ENABLED			BIT(3)
+#define I40E_FLAG_VMDQ_ENABLED			BIT(4)
+#define I40E_FLAG_FILTER_SYNC			BIT(5)
+#define I40E_FLAG_SRIOV_ENABLED			BIT(6)
+#define I40E_FLAG_DCB_CAPABLE			BIT(7)
+#define I40E_FLAG_DCB_ENABLED			BIT(8)
+#define I40E_FLAG_FD_SB_ENABLED			BIT(9)
+#define I40E_FLAG_FD_ATR_ENABLED		BIT(10)
+#define I40E_FLAG_FD_SB_AUTO_DISABLED		BIT(11)
+#define I40E_FLAG_FD_ATR_AUTO_DISABLED		BIT(12)
+#define I40E_FLAG_MFP_ENABLED			BIT(13)
+#define I40E_FLAG_UDP_FILTER_SYNC		BIT(14)
+#define I40E_FLAG_HW_ATR_EVICT_ENABLED		BIT(15)
+#define I40E_FLAG_VEB_MODE_ENABLED		BIT(16)
+#define I40E_FLAG_VEB_STATS_ENABLED		BIT(17)
+#define I40E_FLAG_LINK_POLLING_ENABLED		BIT(18)
+#define I40E_FLAG_TRUE_PROMISC_SUPPORT		BIT(19)
+#define I40E_FLAG_TEMP_LINK_POLLING		BIT(20)
+#define I40E_FLAG_LEGACY_RX			BIT(21)
+#define I40E_FLAG_PTP				BIT(22)
+#define I40E_FLAG_IWARP_ENABLED			BIT(23)
+#define I40E_FLAG_SERVICE_CLIENT_REQUESTED	BIT(24)
+#define I40E_FLAG_CLIENT_L2_CHANGE		BIT(25)
+#define I40E_FLAG_CLIENT_RESET			BIT(26)
+#define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT(27)
+#define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT(28)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index de0dfe340494..afd3ca8d9851 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4095,7 +4095,7 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u64 orig_flags, new_flags, changed_flags;
+	u32 orig_flags, new_flags, changed_flags;
 	u32 i, j;
 
 	orig_flags = READ_ONCE(pf->flags);
@@ -4147,12 +4147,12 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 		return -EOPNOTSUPP;
 
 	/* Compare and exchange the new flags into place. If we failed, that
-	 * is if cmpxchg64 returns anything but the old value, this means that
+	 * is if cmpxchg returns anything but the old value, this means that
 	 * something else has modified the flags variable since we copied it
 	 * originally. We'll just punt with an error and log something in the
 	 * message buffer.
 	 */
-	if (cmpxchg64(&pf->flags, orig_flags, new_flags) != orig_flags) {
+	if (cmpxchg(&pf->flags, orig_flags, new_flags) != orig_flags) {
 		dev_warn(&pf->pdev->dev,
 			 "Unable to update pf->flags as it was modified by another thread...\n");
 		return -EAGAIN;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 5982362c5643..de0af521d602 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -222,22 +222,22 @@ struct i40evf_adapter {
 
 	u32 flags;
 #define I40EVF_FLAG_RX_CSUM_ENABLED		BIT(0)
-#define I40EVF_FLAG_IMIR_ENABLED		BIT(5)
-#define I40EVF_FLAG_MQ_CAPABLE			BIT(6)
-#define I40EVF_FLAG_PF_COMMS_FAILED		BIT(8)
-#define I40EVF_FLAG_RESET_PENDING		BIT(9)
-#define I40EVF_FLAG_RESET_NEEDED		BIT(10)
-#define I40EVF_FLAG_WB_ON_ITR_CAPABLE		BIT(11)
-#define I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE	BIT(12)
-#define I40EVF_FLAG_ADDR_SET_BY_PF		BIT(13)
-#define I40EVF_FLAG_SERVICE_CLIENT_REQUESTED	BIT(14)
-#define I40EVF_FLAG_CLIENT_NEEDS_OPEN		BIT(15)
-#define I40EVF_FLAG_CLIENT_NEEDS_CLOSE		BIT(16)
-#define I40EVF_FLAG_CLIENT_NEEDS_L2_PARAMS	BIT(17)
-#define I40EVF_FLAG_PROMISC_ON			BIT(18)
-#define I40EVF_FLAG_ALLMULTI_ON			BIT(19)
-#define I40EVF_FLAG_LEGACY_RX			BIT(20)
-#define I40EVF_FLAG_REINIT_ITR_NEEDED		BIT(21)
+#define I40EVF_FLAG_IMIR_ENABLED		BIT(1)
+#define I40EVF_FLAG_MQ_CAPABLE			BIT(2)
+#define I40EVF_FLAG_PF_COMMS_FAILED		BIT(3)
+#define I40EVF_FLAG_RESET_PENDING		BIT(4)
+#define I40EVF_FLAG_RESET_NEEDED		BIT(5)
+#define I40EVF_FLAG_WB_ON_ITR_CAPABLE		BIT(6)
+#define I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE	BIT(7)
+#define I40EVF_FLAG_ADDR_SET_BY_PF		BIT(8)
+#define I40EVF_FLAG_SERVICE_CLIENT_REQUESTED	BIT(9)
+#define I40EVF_FLAG_CLIENT_NEEDS_OPEN		BIT(10)
+#define I40EVF_FLAG_CLIENT_NEEDS_CLOSE		BIT(11)
+#define I40EVF_FLAG_CLIENT_NEEDS_L2_PARAMS	BIT(12)
+#define I40EVF_FLAG_PROMISC_ON			BIT(13)
+#define I40EVF_FLAG_ALLMULTI_ON			BIT(14)
+#define I40EVF_FLAG_LEGACY_RX			BIT(15)
+#define I40EVF_FLAG_REINIT_ITR_NEEDED		BIT(16)
 /* duplicates for common code */
 #define I40E_FLAG_DCB_ENABLED			0
 #define I40E_FLAG_RX_CSUM_ENABLED		I40EVF_FLAG_RX_CSUM_ENABLED
-- 
2.14.2

^ permalink raw reply related

* [net-next 11/15] i40e: implement split PCI error reset handler
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

This patch implements the PCI error handler reset_prepare and reset_done.
This allows us to handle function level reset.  Without this patch we
are unable to perform and recover from an FLR correctly and this will cause
VFs to be unable to recover from an FLR on the PF.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 9704cfef2f05..60b11fdeca2d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12045,6 +12045,28 @@ static pci_ers_result_t i40e_pci_error_slot_reset(struct pci_dev *pdev)
 	return result;
 }
 
+/**
+ * i40e_pci_error_reset_prepare - prepare device driver for pci reset
+ * @pdev: PCI device information struct
+ */
+static void i40e_pci_error_reset_prepare(struct pci_dev *pdev)
+{
+	struct i40e_pf *pf = pci_get_drvdata(pdev);
+
+	i40e_prep_for_reset(pf, false);
+}
+
+/**
+ * i40e_pci_error_reset_done - pci reset done, device driver reset can begin
+ * @pdev: PCI device information struct
+ */
+static void i40e_pci_error_reset_done(struct pci_dev *pdev)
+{
+	struct i40e_pf *pf = pci_get_drvdata(pdev);
+
+	i40e_reset_and_rebuild(pf, false, false);
+}
+
 /**
  * i40e_pci_error_resume - restart operations after PCI error recovery
  * @pdev: PCI device information struct
@@ -12235,6 +12257,8 @@ static int i40e_resume(struct device *dev)
 static const struct pci_error_handlers i40e_err_handler = {
 	.error_detected = i40e_pci_error_detected,
 	.slot_reset = i40e_pci_error_slot_reset,
+	.reset_prepare = i40e_pci_error_reset_prepare,
+	.reset_done = i40e_pci_error_reset_done,
 	.resume = i40e_pci_error_resume,
 };
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 09/15] i40e: Display error message if module does not meet thermal requirements
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

This patch causes error message to be displayed when NIC detects
insertion of module that does not meet thermal requirements.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  1 +
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 24 +++++++++++++++++-----
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |  1 +
 4 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c78448daa7a1..4dc6d43f8812 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -451,6 +451,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(54)
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
+#define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT_ULL(57)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
 #define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT_ULL(59)
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 50c5a4c630b8..a8f65aed5421 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1772,6 +1772,7 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
 	I40E_PHY_TYPE_MAX,
+	I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP	= 0xFD,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 628101bb08d4..3d6d6a283327 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6558,12 +6558,26 @@ static void i40e_handle_link_event(struct i40e_pf *pf,
 	 */
 	i40e_link_event(pf);
 
-	/* check for unqualified module, if link is down */
-	if ((status->link_info & I40E_AQ_MEDIA_AVAILABLE) &&
-	    (!(status->an_info & I40E_AQ_QUALIFIED_MODULE)) &&
-	    (!(status->link_info & I40E_AQ_LINK_UP)))
+	/* Check if module meets thermal requirements */
+	if (status->phy_type == I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP) {
 		dev_err(&pf->pdev->dev,
-			"The driver failed to link because an unqualified module was detected.\n");
+			"Rx/Tx is disabled on this device because the module does not meet thermal requirements.\n");
+		dev_err(&pf->pdev->dev,
+			"Refer to the Intel(R) Ethernet Adapters and Devices User Guide for a list of supported modules.\n");
+	} else {
+		/* check for unqualified module, if link is down, suppress
+		 * the message if link was forced to be down.
+		 */
+		if ((status->link_info & I40E_AQ_MEDIA_AVAILABLE) &&
+		    (!(status->an_info & I40E_AQ_QUALIFIED_MODULE)) &&
+		    (!(status->link_info & I40E_AQ_LINK_UP)) &&
+		    (!(pf->flags & I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED))) {
+			dev_err(&pf->pdev->dev,
+				"Rx/Tx is disabled on this device because an unsupported SFP module type was detected.\n");
+			dev_err(&pf->pdev->dev,
+				"Refer to the Intel(R) Ethernet Adapters and Devices User Guide for a list of supported modules.\n");
+		}
+	}
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index dc6fc8b1bc79..60c892f559b9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1768,6 +1768,7 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
 	I40E_PHY_TYPE_MAX,
+	I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP	= 0xFD,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 };
-- 
2.14.2

^ permalink raw reply related

* [net-next 06/15] i40e: fix incorrect register definition
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

This register was defined incorrectly. Fix the increment value to 8, and
replace the iterator with _i to make the definition consistent with
other statistics registers.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_register.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 86ca27f72f02..c234758dad15 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -2794,7 +2794,7 @@
 #define I40E_GLV_RUPP_MAX_INDEX 383
 #define I40E_GLV_RUPP_RUPP_SHIFT 0
 #define I40E_GLV_RUPP_RUPP_MASK I40E_MASK(0xFFFFFFFF, I40E_GLV_RUPP_RUPP_SHIFT)
-#define I40E_GLV_TEPC(_VSI) (0x00344000 + ((_VSI) * 4)) /* _i=0...383 */ /* Reset: CORER */
+#define I40E_GLV_TEPC(_i) (0x00344000 + ((_i) * 8)) /* _i=0...383 */ /* Reset: CORER */
 #define I40E_GLV_TEPC_MAX_INDEX 383
 #define I40E_GLV_TEPC_TEPC_SHIFT 0
 #define I40E_GLV_TEPC_TEPC_MASK I40E_MASK(0xFFFFFFFF, I40E_GLV_TEPC_TEPC_SHIFT)
-- 
2.14.2

^ permalink raw reply related

* [net-next 05/15] i40e: redfine I40E_PHY_TYPE_MAX
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Since I40E_PHY_TYPE_MAX is used as an iterator, usually combined with
some sort of bit-shifting, it should only include actual PHY types and
not error cases. Move it up in the enum declaration so that loops only
iterate across valid PHY types.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 4c85ea9cd89a..50c5a4c630b8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1771,9 +1771,9 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_MAX,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
-	I40E_PHY_TYPE_MAX
 };
 
 #define I40E_LINK_SPEED_100MB_SHIFT	0x1
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index ed5602f4bbcd..dc6fc8b1bc79 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1767,9 +1767,9 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_MAX,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
-	I40E_PHY_TYPE_MAX
 };
 
 #define I40E_LINK_SPEED_100MB_SHIFT	0x1
-- 
2.14.2

^ permalink raw reply related

* [net-next 10/15] i40e: Properly maintain flow director filters list
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

When there is no space for more flow director filters and user requested to
add a new one it is rejected by firmware and automatically removed from the
filter list maintained by driver. This behaviour is correct. Afterwards
existing filter can be removed making free slot for the new one. This
however causes the newly added filter to be accepted by firmware but
removed from driver filter list resulting in not showing after issuing
'ethtool -n <dev_name>'.

This happened due to not clearing the variable pf->fd_inv which stores
filter number to be removed from the list when firmware refused to add the
requested filter. It caused the filter with this specific ID to be
constantly removed once it was added to the list although it has been
accepted by firmware and effectively applied to the NIC.
It was fixed by clearing pf->fd_inv variable after removal of the filter
from the list when it was rejected by firmware.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3d6d6a283327..9704cfef2f05 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6232,6 +6232,7 @@ void i40e_fdir_check_and_reenable(struct i40e_pf *pf)
 				hlist_del(&filter->fdir_node);
 				kfree(filter);
 				pf->fdir_pf_active_filters--;
+				pf->fd_inv = 0;
 			}
 		}
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 07/15] i40e/i40evf: use DECLARE_BITMAP for state
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jesse Brandeburg, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

When using set_bit and friends, we should be using actual
bitmaps, and fix all the locations where we might access
it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    | 3 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  | 3 ++-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8f326f87a815..6f2725fc50a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -278,8 +278,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 rx_ring->netdev,
 			 rx_ring->rx_bi);
 		dev_info(&pf->pdev->dev,
-			 "    rx_rings[%i]: state = %li, queue_index = %d, reg_idx = %d\n",
-			 i, rx_ring->state,
+			 "    rx_rings[%i]: state = %lu, queue_index = %d, reg_idx = %d\n",
+			 i, *rx_ring->state,
 			 rx_ring->queue_index,
 			 rx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
@@ -334,8 +334,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 tx_ring->netdev,
 			 tx_ring->tx_bi);
 		dev_info(&pf->pdev->dev,
-			 "    tx_rings[%i]: state = %li, queue_index = %d, reg_idx = %d\n",
-			 i, tx_ring->state,
+			 "    tx_rings[%i]: state = %lu, queue_index = %d, reg_idx = %d\n",
+			 i, *tx_ring->state,
 			 tx_ring->queue_index,
 			 tx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 85132eee9f64..49401be7a2f4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2891,7 +2891,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 		return;
 
 	if ((vsi->tc_config.numtc <= 1) &&
-	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
+	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state)) {
 		cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
 		netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
 				    ring->queue_index);
@@ -3010,7 +3010,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	struct i40e_hmc_obj_rxq rx_ctx;
 	i40e_status err = 0;
 
-	ring->state = 0;
+	bitmap_zero(ring->state, __I40E_RING_STATE_NBITS);
 
 	/* clear the context structure first */
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 2f848bc5e391..a4e3e665a1a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -342,6 +342,7 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
+	__I40E_RING_STATE_NBITS /* must be last */
 };
 
 /* some useful defines for virtchannel interface, which
@@ -366,7 +367,7 @@ struct i40e_ring {
 		struct i40e_tx_buffer *tx_bi;
 		struct i40e_rx_buffer *rx_bi;
 	};
-	unsigned long state;
+	DECLARE_BITMAP(state, __I40E_RING_STATE_NBITS);
 	u16 queue_index;		/* Queue number of ring */
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 __iomem *tail;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 0d9f98bc07bd..d8ca802a71a9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -325,6 +325,7 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
+	__I40E_RING_STATE_NBITS /* must be last */
 };
 
 /* some useful defines for virtchannel interface, which
@@ -348,7 +349,7 @@ struct i40e_ring {
 		struct i40e_tx_buffer *tx_bi;
 		struct i40e_rx_buffer *rx_bi;
 	};
-	unsigned long state;
+	DECLARE_BITMAP(state, __I40E_RING_STATE_NBITS);
 	u16 queue_index;		/* Queue number of ring */
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 __iomem *tail;
-- 
2.14.2

^ permalink raw reply related

* [net-next 08/15] i40e: fix merge error
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Alice Michael, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Alice Michael <alice.michael@intel.com>

This patch removes some code that was accidentally added to
the wrong function with a merge error.  Fixes: c53934c6d1b1
("i40e: fix: do not sleep in netdev_ops")

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 49401be7a2f4..628101bb08d4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1776,11 +1776,6 @@ static void i40e_set_rx_mode(struct net_device *netdev)
 		vsi->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
 		vsi->back->flags |= I40E_FLAG_FILTER_SYNC;
 	}
-
-	/* schedule our worker thread which will take care of
-	 * applying the new filter changes
-	 */
-	i40e_service_event_schedule(vsi->back);
 }
 
 /**
-- 
2.14.2

^ permalink raw reply related

* [net-next 03/15] i40e/i40evf: spread CPU affinity hints across online CPUs only
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Currently, when setting up the IRQ for a q_vector, we set an affinity
hint based on the v_idx of that q_vector. Meaning a loop iterates on
v_idx, which is an incremental value, and the cpumask is created based
on this value.

This is a problem in systems with multiple logical CPUs per core (like in
simultaneous multithreading (SMT) scenarios). If we disable some logical
CPUs, by turning SMT off for example, we will end up with a sparse
cpu_online_mask, i.e., only the first CPU in a core is online, and
incremental filling in q_vector cpumask might lead to multiple offline
CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1)  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

Instead, we should only assign hints for CPUs which are online. Even
better, the kernel already provides a function, cpumask_local_spread()
which takes an index and returns a CPU, spreading the interrupts across
local NUMA nodes first, and then remote ones if necessary.

Since we generally have a 1:1 mapping between vectors and CPUs, there
is no real advantage to spreading vectors to local CPUs first. In order
to avoid mismatch of the default XPS hints, we'll pass -1 so that it
spreads across all CPUs without regard to the node locality.

Note that we don't need to change the q_vector->affinity_mask as this is
initialized to cpu_possible_mask, until an actual affinity is set and
then notified back to us.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 16 +++++++++++-----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  9 ++++++---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b539469f576f..d2bb4f17c89e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2885,14 +2885,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi)
 static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 {
 	struct i40e_vsi *vsi = ring->vsi;
+	int cpu;
 
 	if (!ring->q_vector || !ring->netdev)
 		return;
 
 	if ((vsi->tc_config.numtc <= 1) &&
 	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
-		netif_set_xps_queue(ring->netdev,
-				    get_cpu_mask(ring->q_vector->v_idx),
+		cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
+		netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
 				    ring->queue_index);
 	}
 
@@ -3482,6 +3483,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 	int tx_int_idx = 0;
 	int vector, err;
 	int irq_num;
+	int cpu;
 
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct i40e_q_vector *q_vector = vsi->q_vectors[vector];
@@ -3517,10 +3519,14 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* get_cpu_mask returns a static constant mask with
-		 * a permanent lifetime so it's ok to use here.
+		/* Spread affinity hints out across online CPUs.
+		 *
+		 * get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to pass to
+		 * irq_set_affinity_hint without making a copy.
 		 */
-		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
+		cpu = cpumask_local_spread(q_vector->v_idx, -1);
+		irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
 	}
 
 	vsi->irqs_ready = true;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index f2f1e754c2ce..bc76378a71e2 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -515,6 +515,7 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 	unsigned int vector, q_vectors;
 	unsigned int rx_int_idx = 0, tx_int_idx = 0;
 	int irq_num, err;
+	int cpu;
 
 	i40evf_irq_disable(adapter);
 	/* Decrement for Other and TCP Timer vectors */
@@ -553,10 +554,12 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		q_vector->affinity_notify.release =
 						   i40evf_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* get_cpu_mask returns a static constant mask with
-		 * a permanent lifetime so it's ok to use here.
+		/* Spread the IRQ affinity hints across online CPUs. Note that
+		 * get_cpu_mask returns a mask with a permanent lifetime so
+		 * it's safe to use as a hint for irq_set_affinity_hint.
 		 */
-		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
+		cpu = cpumask_local_spread(q_vector->v_idx, -1);
+		irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
 	}
 
 	return 0;
-- 
2.14.2

^ permalink raw reply related

* [net-next 02/15] i40e: add private flag to control source pruning
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

By default, our devices do source pruning, that is, they drop receive
packets that have the source MAC matching one of the receive filters.
Unfortunately, this breaks ARP monitoring in channel bonding, as the
bonding driver expects devices to receive ARPs containing their own
source address.

Add an ethtool private flag to control this feature.

Also, remove the netif_running() check when we process our private
flags. It's OK to reset when the device is closed and in most cases we
need the reset the apply these changes.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  7 +++++--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 25 +++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2bc4dd0dbbf1..c78448daa7a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -452,6 +452,7 @@ struct i40e_pf {
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
+#define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT_ULL(59)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1136d02e2e95..6203d362438c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -227,6 +227,8 @@ static const struct i40e_priv_flags i40e_gstrings_priv_flags[] = {
 	I40E_PRIV_FLAG("veb-stats", I40E_FLAG_VEB_STATS_ENABLED, 0),
 	I40E_PRIV_FLAG("hw-atr-eviction", I40E_FLAG_HW_ATR_EVICT_ENABLED, 0),
 	I40E_PRIV_FLAG("legacy-rx", I40E_FLAG_LEGACY_RX, 0),
+	I40E_PRIV_FLAG("disable-source-pruning",
+		       I40E_FLAG_SOURCE_PRUNING_DISABLED, 0),
 };
 
 #define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gstrings_priv_flags)
@@ -4189,8 +4191,9 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	/* Issue reset to cause things to take effect, as additional bits
 	 * are added we will need to create a mask of bits requiring reset
 	 */
-	if ((changed_flags & I40E_FLAG_VEB_STATS_ENABLED) ||
-	    ((changed_flags & I40E_FLAG_LEGACY_RX) && netif_running(dev)))
+	if (changed_flags & (I40E_FLAG_VEB_STATS_ENABLED |
+			     I40E_FLAG_LEGACY_RX |
+			     I40E_FLAG_SOURCE_PRUNING_DISABLED))
 		i40e_do_reset(pf, BIT(__I40E_PF_RESET_REQUESTED), true);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3f9e89b054ec..b539469f576f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9903,6 +9903,31 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
 
 		enabled_tc = i40e_pf_get_tc_map(pf);
 
+		/* Source pruning is enabled by default, so the flag is
+		 * negative logic - if it's set, we need to fiddle with
+		 * the VSI to disable source pruning.
+		 */
+		if (pf->flags & I40E_FLAG_SOURCE_PRUNING_DISABLED) {
+			memset(&ctxt, 0, sizeof(ctxt));
+			ctxt.seid = pf->main_vsi_seid;
+			ctxt.pf_num = pf->hw.pf_id;
+			ctxt.vf_num = 0;
+			ctxt.info.valid_sections |=
+				     cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+			ctxt.info.switch_id =
+				   cpu_to_le16(I40E_AQ_VSI_SW_ID_FLAG_LOCAL_LB);
+			ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
+			if (ret) {
+				dev_info(&pf->pdev->dev,
+					 "update vsi failed, err %s aq_err %s\n",
+					 i40e_stat_str(&pf->hw, ret),
+					 i40e_aq_str(&pf->hw,
+						     pf->hw.aq.asq_last_status));
+				ret = -ENOENT;
+				goto err;
+			}
+		}
+
 		/* MFP mode setup queue map and update VSI */
 		if ((pf->flags & I40E_FLAG_MFP_ENABLED) &&
 		    !(pf->hw.func_caps.iscsi)) { /* NIC type PF */
-- 
2.14.2

^ permalink raw reply related

* [net-next 04/15] i40e: re-enable PTP L4 capabilities for XL710 if FW >6.0
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Starting with XL710 FW 5.3 PTP L4 was disabled for XL710 due to a bug.  The
bug has since been resolved in XL710 FW >6.0 and PTP L4 can now be
re-enabled on those devices with updated firmware.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d2bb4f17c89e..85132eee9f64 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9074,6 +9074,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	    (pf->hw.aq.fw_maj_ver >= 5)))
 		pf->hw_features |= I40E_HW_USE_SET_LLDP_MIB;
 
+	/* Enable PTP L4 if FW > v6.0 */
+	if (pf->hw.mac.type == I40E_MAC_XL710 &&
+	    pf->hw.aq.fw_maj_ver >= 6)
+		pf->hw_features |= I40E_HW_PTP_L4_CAPABLE;
+
 	if (pf->hw.func_caps.vmdq) {
 		pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
 		pf->flags |= I40E_FLAG_VMDQ_ENABLED;
-- 
2.14.2

^ permalink raw reply related

* [net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-06
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf only.

Rami fixes a typo in the code comments.

Mitch adds an ethtool private flag to control source pruning to resolve an
issue where our default behavior is to enable source pruning which breaks ARP
monitoring in channel bonding.  Fixes a couple of register definitions, which
were incorrect.

Jake fixes an issue with multiple logical CPUs per core (simultaneous
multithreading - SMT) and how we set an affinity hint based on the v_idx of
that q_vector, which is an incremental value and might lead to multiple
offline CPUs being assigned to a q_vector.  Instead, we should only assign
hints for CPUs which are online, so look to use cpumask_local_spread().
Also fixed a VF VLAN tag stripping issue, where the flag created to change
this feature was seen as unchangeable.  Lastly, organized and re-numbered
the feature flags.

Alan re-enables PTP L4 for XL710 devices with firmware version 6.0 or
greater, now that the previous bug in the older firmware is fixed.
Implements the PCI error handlers for reset_prepare() and reset_done() to
allow us to handle function level resets.

Alice cleans up code that was added to the incorrect function during a
merge.

Filip adds a change to display an error message when a module is inserted
that does not meet the thermal requirements, Talking Heads "Burning Down
the House" comes to mind.  Also fixed a flow director filter issue where
a variable was not being cleared which stores the filter number to be
removed from the list when the firmware refused to add the requested
filter.

The following are changes since commit cc71b7b071192ac1c288e272fdc3f3877eb96663:
  net/ipv6: remove unused err variable on icmpv6_push_pending_frames
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40e: re-enable PTP L4 capabilities for XL710 if FW >6.0
  i40e: implement split PCI error reset handler

Alice Michael (1):
  i40e: fix merge error

Filip Sadowski (2):
  i40e: Display error message if module does not meet thermal
    requirements
  i40e: Properly maintain flow director filters list

Jacob Keller (4):
  i40e/i40evf: spread CPU affinity hints across online CPUs only
  i40evf: enable support for VF VLAN tag stripping control
  i40e: ignore skb->xmit_more when deciding to set RS bit
  i40e/i40evf: organize and re-number feature flags

Jesse Brandeburg (1):
  i40e/i40evf: use DECLARE_BITMAP for state

Mariusz Stachura (1):
  i40e: do not enter PHY debug mode while setting LEDs behaviour

Mitch Williams (3):
  i40e: add private flag to control source pruning
  i40e: redfine I40E_PHY_TYPE_MAX
  i40e: fix incorrect register definition

Rami Rosen (1):
  i40e: fix a typo in i40e_pf documentation

 drivers/net/ethernet/intel/i40e/i40e.h             |  98 +++++++++----------
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   3 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |   8 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  20 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 104 +++++++++++++++++----
 drivers/net/ethernet/intel/i40e/i40e_register.h    |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  34 +------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   3 +-
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |   3 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |   3 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  32 +++----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  31 +++---
 12 files changed, 203 insertions(+), 138 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net-next 01/15] i40e: fix a typo in i40e_pf documentation
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Rami Rosen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Rami Rosen <rami.rosen@intel.com>

This patch fixes a typo in i40e_pf object documentation; num_req_vfs
refers to the number of VFs requested for the PF.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 439c63cb2a0c..2bc4dd0dbbf1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -350,7 +350,7 @@ struct i40e_pf {
 	u16 num_vmdq_vsis;         /* num vmdq vsis this PF has set up */
 	u16 num_vmdq_qps;          /* num queue pairs per vmdq pool */
 	u16 num_vmdq_msix;         /* num queue vectors per vmdq pool */
-	u16 num_req_vfs;           /* num VFs requested for this VF */
+	u16 num_req_vfs;           /* num VFs requested for this PF */
 	u16 num_vf_qps;            /* num queue pairs per VF */
 	u16 num_lan_qps;           /* num lan queues this PF has set up */
 	u16 num_lan_msix;          /* num queue vectors for the base PF vsi */
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH net-next 1/1] [net] bonding: Add NUMA notice
From: Patrick Talbert @ 2017-10-06 17:54 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20171005214636.GM13247@lunn.ch>

On Thu, Oct 5, 2017 at 5:46 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Thu, Oct 05, 2017 at 04:23:45PM -0400, Patrick Talbert wrote:
>> Network performance can suffer when a load balancing bond uses slave
>> interfaces which are in different NUMA domains.
>>
>> This compares the NUMA domain of a newly enslaved interface against any
>> existing enslaved interfaces and prints a warning if they do not match.
>
> Hi Patrick
>
> Is there a bonding mode which might actually want to do this? Send on
> the local domain, unless it is overloaded, in which case send it to
> the other domain?
>

I suppose there could theoretically be a bonding mode that could do
that, but currently no such mode exists.

> There is also this talk for netdev:
>
> https://netdevconf.org/2.2/session.html?shochat-devicemgmt-talk

>From reading the abstract there, it sounds like such a device driver
would want to abstract away the numa location of the underlying
devices from the "unified" net device it presents to the kernel.

>
>         Andrew

My goal with the patch is not to prevent some one from bonding
whichever interfaces they want, only to notify them that what they are
doing is *likely* to be less than ideal from a performance
perspective. Even if some theoretical load balancing bonding mode was
intelligent enough to consider NUMA when choosing a transmit
interface, it never has control over the interface traffic is received
on (excluding the strange balance-alb mode).

Patrick

^ permalink raw reply

* Re: [PATCH] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Willem de Bruijn @ 2017-10-06 17:40 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Pablo Neira Ayuso, netfilter-devel, Willem de Bruijn,
	Network Development, Daniel Borkmann, Rafael Buchbinder,
	Shmulik Ladkani
In-Reply-To: <20171006160242.4403-1-shmulik@nsof.io>

On Fri, Oct 6, 2017 at 12:02 PM, Shmulik Ladkani <shmulik@nsof.io> wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> support for attaching an eBPF object by an fd, with the
> 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> IPT_SO_SET_REPLACE call.
>
> However this breaks subsequent iptables calls:
>
>  # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
>  # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
>  iptables: Invalid argument. Run `dmesg' for more information.
>
> That's because iptables works by loading exising rules using
> IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
> the replacement set.
>
> However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
> (from the initial "iptables -m bpf" invocation) - so when 2nd invocation
> occurs, userspace passes a bogus fd number, which leads to
> 'bpf_mt_check_v1' to fail.
>
> One suggested solution [1] was to hack iptables userspace, to perform a
> "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
> process-local fd per every 'xt_bpf_info_v1' entry seen.
>
> However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
> depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
>
> This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
> '.fd' and instead perform an in-kernel lookup for the bpf object given
> the provided '.path'.
>
> It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
> XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
> expected to provide the path of the pinned object.
>
> Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

I suppose that we have the same problem here. As a matter of fact, the
implementation is the same for both FD_ELF and FD_PINNED, and checks

  f.file->f_op == &bpf_prog_fops

so a file descriptor to a random open ELF file outside a bpf fs would not be
accepted as is.

Anyway, that is outside the scope of this fix.

>
> References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
>             [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks a lot for fixing this.

> ---
>  include/uapi/linux/netfilter/xt_bpf.h |  1 +
>  kernel/bpf/inode.c                    |  1 +
>  net/netfilter/xt_bpf.c                | 22 ++++++++++++++++++++--
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
> index b97725af2ac0..da161b56c79e 100644
> --- a/include/uapi/linux/netfilter/xt_bpf.h
> +++ b/include/uapi/linux/netfilter/xt_bpf.h
> @@ -23,6 +23,7 @@ enum xt_bpf_modes {
>         XT_BPF_MODE_FD_PINNED,
>         XT_BPF_MODE_FD_ELF,
>  };
> +#define XT_BPF_MODE_PATH_PINNED XT_BPF_MODE_FD_PINNED
>
>  struct xt_bpf_info_v1 {
>         __u16 mode;
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index e833ed914358..be1dde967208 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -363,6 +363,7 @@ int bpf_obj_get_user(const char __user *pathname)
>         putname(pname);
>         return ret;
>  }
> +EXPORT_SYMBOL_GPL(bpf_obj_get_user);
>
>  static void bpf_evict_inode(struct inode *inode)
>  {
> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
> index 38986a95216c..29123934887b 100644
> --- a/net/netfilter/xt_bpf.c
> +++ b/net/netfilter/xt_bpf.c
> @@ -8,6 +8,7 @@
>   */
>
>  #include <linux/module.h>
> +#include <linux/syscalls.h>
>  #include <linux/skbuff.h>
>  #include <linux/filter.h>
>  #include <linux/bpf.h>
> @@ -49,6 +50,22 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
>         return 0;
>  }
>
> +static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
> +{
> +       mm_segment_t oldfs = get_fs();
> +       int retval, fd;
> +
> +       set_fs(KERNEL_DS);
> +       fd = bpf_obj_get_user(path);
> +       set_fs(oldfs);
> +       if (fd < 0)
> +               return fd;
> +
> +       retval = __bpf_mt_check_fd(fd, ret);
> +       sys_close(fd);
> +       return retval;
> +}
> +
>  static int bpf_mt_check(const struct xt_mtchk_param *par)
>  {
>         struct xt_bpf_info *info = par->matchinfo;
> @@ -66,9 +83,10 @@ static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
>                 return __bpf_mt_check_bytecode(info->bpf_program,
>                                                info->bpf_program_num_elem,
>                                                &info->filter);
> -       else if (info->mode == XT_BPF_MODE_FD_PINNED ||
> -                info->mode == XT_BPF_MODE_FD_ELF)
> +       else if (info->mode == XT_BPF_MODE_FD_ELF)
>                 return __bpf_mt_check_fd(info->fd, &info->filter);
> +       else if (info->mode == XT_BPF_MODE_PATH_PINNED)
> +               return __bpf_mt_check_path(info->path, &info->filter);
>         else
>                 return -EINVAL;
>  }
> --
> 2.14.2
>

^ permalink raw reply

* Linux bridge of hsr and Ethernet interface. Is this valid?
From: Murali Karicheri @ 2017-10-06 17:28 UTC (permalink / raw)
  To: open list:TI NETCP ETHERNET DRIVER

Hello Linux netdev experts,

I have a board that has multiple Ethernet intefaces. two, eth0 and eth1 and 1G interfaces and two are 100M interfaces (eth2 and eth3). I create  hsr interface, hsr0 using eth2 and eth3 using ip link command. Now I want to create a linux bridge as follows:-

brctl addbr my_bridge
brctl addif eth0
brctl addif hsr0

I connect a PC to eth0 interface and another hsr compliant device to eth2 or eth3. Is it a valid scenario? 

I see following description at https://wiki.linuxfoundation.org/networking/bridge where is mentioned that 

=============================
Adding devices to a bridge

The command

 brctl addif //bridgename// //device//

adds the network device device to take part in the bridging of “bridgename.” All the devices contained in a bridge act as one big network. It is not possible to add a device to multiple bridges or bridge a bridge device, because it just wouldn't make any sense! The bridge will take a short amount of time when a device is added to learn the Ethernet addresses on the segment before starting to forward. 

=============================

In this case hsr is already a 802.1D bridge and we are trying to bridge a bridge.

So your expert opinion is needed. Thanks.
-- 
Murali Karicheri
Linux Kernel, Keystone

^ permalink raw reply

* [PATCH net-next v2] vhost_net: do not stall on zerocopy depletion
From: Willem de Bruijn @ 2017-10-06 17:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, mst, jasowang, den, virtualization, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two udp flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with 1MBps rate limit
and deep 1000 entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.

Changes
  v1 -> v2
    - replaced min with typed min_t
    - avoid unnecessary whitespace change

Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/vhost/net.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec8699e..68677d930e20 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;

-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
-		== nvq->done_idx;
+	return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
+	       min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
 }

 /* Expects to be always run from workqueue - which acts as
@@ -480,11 +480,6 @@ static void handle_tx(struct vhost_net *net)
 		if (zcopy)
 			vhost_zerocopy_signal_used(net, vq);

-		/* If more outstanding DMAs, queue the work.
-		 * Handle upend_idx wrap around
-		 */
-		if (unlikely(vhost_exceeds_maxpend(net)))
-			break;

 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
 						ARRAY_SIZE(vq->iov),
@@ -519,8 +514,7 @@ static void handle_tx(struct vhost_net *net)
 		len = msg_data_left(&msg);

 		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
-				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
-				      nvq->done_idx
+				   && !vhost_exceeds_maxpend(net)
 				   && vhost_net_tx_select_zcopy(net);

 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* Re: [PATCH net] ppp: fix race in ppp device destruction
From: David Miller @ 2017-10-06 17:17 UTC (permalink / raw)
  To: g.nault; +Cc: netdev, bgalvani, linux-ppp, paulus, dsahern, gfree.wind
In-Reply-To: <f197b94ec7e16902a9601fdf840d5a8e94c8e91a.1507302309.git.g.nault@alphalink.fr>

From: Guillaume Nault <g.nault@alphalink.fr>
Date: Fri, 6 Oct 2017 17:05:49 +0200

> ppp_release() tries to ensure that netdevices are unregistered before
> decrementing the unit refcount and running ppp_destroy_interface().
> 
> This is all fine as long as the the device is unregistered by
> ppp_release(): the unregister_netdevice() call, followed by
> rtnl_unlock(), guarantee that the unregistration process completes
> before rtnl_unlock() returns.
> 
> However, the device may be unregistered by other means (like
> ppp_nl_dellink()). If this happens right before ppp_release() calling
> rtnl_lock(), then ppp_release() has to wait for the concurrent
> unregistration code to release the lock.
> But rtnl_unlock() releases the lock before completing the device
> unregistration process. This allows ppp_release() to proceed and
> eventually call ppp_destroy_interface() before the unregistration
> process completes. Calling free_netdev() on this partially unregistered
> device will BUG():
 ...
> We could set the ->needs_free_netdev flag on PPP devices and move the
> ppp_destroy_interface() logic in the ->priv_destructor() callback. But
> that'd be quite intrusive as we'd first need to unlink from the other
> channels and units that depend on the device (the ones that used the
> PPPIOCCONNECT and PPPIOCATTACH ioctls).
> 
> Instead, we can just let the netdevice hold a reference on its
> ppp_file. This reference is dropped in ->priv_destructor(), at the very
> end of the unregistration process, so that neither ppp_release() nor
> ppp_disconnect_channel() can call ppp_destroy_interface() in the interim.
> 
> Reported-by: Beniamino Galvani <bgalvani@redhat.com>
> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 0/4] pull request for net-next: batman-adv 2017-10-06
From: David Miller @ 2017-10-06 17:13 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20171006135437.26736-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Fri,  6 Oct 2017 15:54:33 +0200

> here is a small cleanup pull request of batman-adv to go into net-next.
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox