Netdev List


* [PATCH net-next v2 2/2] i40e: fix setting debug parameter early
From: Stefan Assmann @ 2016-09-23 13:30 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, davem, jeffrey.t.kirsher, carolyn.wyborny, sassmann
In-Reply-To: <1474637458-5255-1-git-send-email-sassmann@kpanic.de>

pf->msg_enable is a bitmask, therefore assigning the value of the
"debug" parameter is wrong. It is initialized again later in
i40e_sw_init() so it didn't cause any problem, except that we missed
early debug messages. Moved the initialization and assigned
pf->hw.debug_mask the bitmask as that's what the driver actually uses
in i40e_debug(). Otherwise the debug parameter is just a noop.

Fixes: 5b5faa4 ("i40e: enable debug earlier")

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 56369761..f972f0d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8498,11 +8498,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	int err = 0;
 	int size;
 
-	pf->msg_enable = netif_msg_init(debug,
-					NETIF_MSG_DRV    |
-					NETIF_MSG_PROBE  |
-					NETIF_MSG_LINK);
-
 	/* Set default capability flags */
 	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
 		    I40E_FLAG_MSI_ENABLED     |
@@ -10812,10 +10807,13 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	mutex_init(&hw->aq.asq_mutex);
 	mutex_init(&hw->aq.arq_mutex);
 
-	if (debug != -1) {
-		pf->msg_enable = pf->hw.debug_mask;
-		pf->msg_enable = debug;
-	}
+	/* enable debug prints if requested */
+	pf->msg_enable = netif_msg_init(debug,
+					NETIF_MSG_DRV   |
+					NETIF_MSG_PROBE |
+					NETIF_MSG_LINK);
+	if (debug != -1)
+		pf->hw.debug_mask = pf->msg_enable;
 
 	/* do a special CORER for clearing PXE mode once at init */
 	if (hw->revision_id == 0 &&
-- 
2.7.4
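
Illustration (not part of the patch): a minimal userspace sketch of why
assigning the raw "debug" level to a bitmask loses information.
mock_netif_msg_init() is a hypothetical stand-in modelled on how I
understand netif_msg_init() to behave (an out-of-range level selects the
defaults, 0 disables output, N sets the low N bits). The MOCK_MSG_*
values are illustrative assumptions, not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative bit values, assumed to play the role of
 * NETIF_MSG_DRV / NETIF_MSG_PROBE / NETIF_MSG_LINK. */
#define MOCK_MSG_DRV   0x0001u
#define MOCK_MSG_PROBE 0x0002u
#define MOCK_MSG_LINK  0x0004u

/* Hypothetical model of the helper: turn a debug *level* into a *bitmask*. */
static uint32_t mock_netif_msg_init(int debug_value, uint32_t default_bits)
{
	/* negative or too large: fall back to the driver defaults */
	if (debug_value < 0 || debug_value >= 32)
		return default_bits;
	/* level 0: no output at all */
	if (debug_value == 0)
		return 0;
	/* level N: set the low N bits */
	return (UINT32_C(1) << debug_value) - 1;
}

/* What the removed code effectively did: use the level as if it were a mask. */
static uint32_t mock_raw_assignment(int debug_value)
{
	return (uint32_t)debug_value;
}
```

With debug=4 the raw assignment yields 0x4 (a single bit), while the
helper yields 0xf, which is why the early assignment did not enable the
intended message classes.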


* [PATCH net-next v2 1/2] i40e: remove superfluous I40E_DEBUG_USER statement
From: Stefan Assmann @ 2016-09-23 13:30 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, davem, jeffrey.t.kirsher, carolyn.wyborny, sassmann
In-Reply-To: <1474637458-5255-1-git-send-email-sassmann@kpanic.de>

This debug statement is confusing and never set in the code. Any debug
output should be guarded by the proper I40E_DEBUG_* statement which can
be enabled via the debug module parameter or ethtool.
Remove or convert the I40E_DEBUG_USER cases to I40E_DEBUG_INIT.

v2: re-add setting the debug_mask in i40e_set_msglevel() so that the
debug level can still be altered via ethtool msglvl.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 ---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 -----
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  3 +--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 35 +++++++++++++-------------
 drivers/net/ethernet/intel/i40e/i40e_type.h    |  2 --
 5 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 2154a34..8ccb09c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -3207,9 +3207,6 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
 			break;
 		case I40E_AQ_CAP_ID_MSIX:
 			p->num_msix_vectors = number;
-			i40e_debug(hw, I40E_DEBUG_INIT,
-				   "HW Capability: MSIX vector count = %d\n",
-				   p->num_msix_vectors);
 			break;
 		case I40E_AQ_CAP_ID_VF_MSIX:
 			p->num_msix_vectors_vf = number;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 05cf9a7..e9c6f1c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1210,12 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 		u32 level;
 		cnt = sscanf(&cmd_buf[10], "%i", &level);
 		if (cnt) {
-			if (I40E_DEBUG_USER & level) {
-				pf->hw.debug_mask = level;
-				dev_info(&pf->pdev->dev,
-					 "set hw.debug_mask = 0x%08x\n",
-					 pf->hw.debug_mask);
-			}
 			pf->msg_enable = level;
 			dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
 				 pf->msg_enable);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1835186..02f55ab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -987,8 +987,7 @@ static void i40e_set_msglevel(struct net_device *netdev, u32 data)
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_pf *pf = np->vsi->back;
 
-	if (I40E_DEBUG_USER & data)
-		pf->hw.debug_mask = data;
+	pf->hw.debug_mask = data;
 	pf->msg_enable = data;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 61b0fc4..56369761 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6665,16 +6665,19 @@ static int i40e_get_capabilities(struct i40e_pf *pf)
 		}
 	} while (err);
 
-	if (pf->hw.debug_mask & I40E_DEBUG_USER)
-		dev_info(&pf->pdev->dev,
-			 "pf=%d, num_vfs=%d, msix_pf=%d, msix_vf=%d, fd_g=%d, fd_b=%d, pf_max_q=%d num_vsi=%d\n",
-			 pf->hw.pf_id, pf->hw.func_caps.num_vfs,
-			 pf->hw.func_caps.num_msix_vectors,
-			 pf->hw.func_caps.num_msix_vectors_vf,
-			 pf->hw.func_caps.fd_filters_guaranteed,
-			 pf->hw.func_caps.fd_filters_best_effort,
-			 pf->hw.func_caps.num_tx_qp,
-			 pf->hw.func_caps.num_vsis);
+	i40e_debug(&pf->hw, I40E_DEBUG_INIT,
+		   "HW Capabilities: PF-id[%d] num_vfs=%d, msix_pf=%d, msix_vf=%d\n",
+		   pf->hw.pf_id,
+		   pf->hw.func_caps.num_vfs,
+		   pf->hw.func_caps.num_msix_vectors,
+		   pf->hw.func_caps.num_msix_vectors_vf);
+	i40e_debug(&pf->hw, I40E_DEBUG_INIT,
+		   "HW Capabilities: PF-id[%d] fd_g=%d, fd_b=%d, pf_max_qp=%d num_vsis=%d\n",
+		   pf->hw.pf_id,
+		   pf->hw.func_caps.fd_filters_guaranteed,
+		   pf->hw.func_caps.fd_filters_best_effort,
+		   pf->hw.func_caps.num_tx_qp,
+		   pf->hw.func_caps.num_vsis);
 
 #define DEF_NUM_VSI (1 + (pf->hw.func_caps.fcoe ? 1 : 0) \
 		       + pf->hw.func_caps.num_vfs)
@@ -8495,14 +8498,10 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	int err = 0;
 	int size;
 
-	pf->msg_enable = netif_msg_init(I40E_DEFAULT_MSG_ENABLE,
-				(NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK));
-	if (debug != -1 && debug != I40E_DEFAULT_MSG_ENABLE) {
-		if (I40E_DEBUG_USER & debug)
-			pf->hw.debug_mask = debug;
-		pf->msg_enable = netif_msg_init((debug & ~I40E_DEBUG_USER),
-						I40E_DEFAULT_MSG_ENABLE);
-	}
+	pf->msg_enable = netif_msg_init(debug,
+					NETIF_MSG_DRV    |
+					NETIF_MSG_PROBE  |
+					NETIF_MSG_LINK);
 
 	/* Set default capability flags */
 	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index bd5f13b..7e88e35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -85,8 +85,6 @@ enum i40e_debug_mask {
 	I40E_DEBUG_AQ_COMMAND		= 0x06000000,
 	I40E_DEBUG_AQ			= 0x0F000000,
 
-	I40E_DEBUG_USER			= 0xF0000000,
-
 	I40E_DEBUG_ALL			= 0xFFFFFFFF
 };
 
-- 
2.7.4


* [PATCH net-next v2 0/2] i40e: clean-up and fix for the i40e debug code
From: Stefan Assmann @ 2016-09-23 13:30 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, davem, jeffrey.t.kirsher, carolyn.wyborny, sassmann

v2 fixes setting the debug_mask in i40e_set_msglevel().

Stefan Assmann (2):
  i40e: remove superfluous I40E_DEBUG_USER statement
  i40e: fix setting debug parameter early

 drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 --
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 ----
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 43 ++++++++++++--------------
 drivers/net/ethernet/intel/i40e/i40e_type.h    |  2 --
 5 files changed, 21 insertions(+), 36 deletions(-)

-- 
2.7.4


* [PATCH] brcmfmac: drop unused fields from struct brcmf_pub
From: Rafał Miłecki @ 2016-09-23 13:27 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Rafał Miłecki, Arend van Spriel, Franky Lin,
	Hante Meuleman, Pieter-Paul Giesberts, Franky (Zhenhui) Lin,
	Colin Ian King,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	open list:NETWORKING DRIVERS, open list

From: Rafał Miłecki <rafal@milecki.pl>

They seem to be there from the first day. We calculate these values but
never use them.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c     | 3 ---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h     | 4 ----
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c | 2 --
 3 files changed, 9 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 65e8c87..27cd50a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -519,9 +519,6 @@ int brcmf_net_attach(struct brcmf_if *ifp, bool rtnl_locked)
 	ndev->needed_headroom += drvr->hdrlen;
 	ndev->ethtool_ops = &brcmf_ethtool_ops;
 
-	drvr->rxsz = ndev->mtu + ndev->hard_header_len +
-			      drvr->hdrlen;
-
 	/* set the mac address */
 	memcpy(ndev->dev_addr, ifp->mac_addr, ETH_ALEN);
 
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
index 8fa34ca..f16cfc9 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
@@ -112,15 +112,11 @@ struct brcmf_pub {
 
 	/* Internal brcmf items */
 	uint hdrlen;		/* Total BRCMF header length (proto + bus) */
-	uint rxsz;		/* Rx buffer size bus module should use */
 
 	/* Dongle media info */
 	char fwver[BRCMF_DRIVER_FIRMWARE_VERSION_LEN];
 	u8 mac[ETH_ALEN];		/* MAC address obtained from dongle */
 
-	/* Multicast data packets sent to dongle */
-	unsigned long tx_multicast;
-
 	struct mac_address addresses[BRCMF_MAX_IFS];
 
 	struct brcmf_if *iflist[BRCMF_MAX_IFS];
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
index 9f9024a..a190f53 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
@@ -2104,8 +2104,6 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp, struct sk_buff *skb)
 	if ((skb->priority == 0) || (skb->priority > 7))
 		skb->priority = cfg80211_classify8021d(skb, NULL);
 
-	drvr->tx_multicast += !!multicast;
-
 	if (fws->avoid_queueing) {
 		rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
 		if (rc < 0)
-- 
2.9.3


* RE: [PATCH RFC 05/11] skbuff: Extend gso_type to unsigned int.
From: David Laight @ 2016-09-23 13:19 UTC (permalink / raw)
  To: 'Steffen Klassert', netdev@vger.kernel.org
  Cc: Sowmini Varadhan, Ilan Tayari, Boris Pismenny, Ariel Levkovich,
	Hay, Joshua A
In-Reply-To: <1474617228-26103-6-git-send-email-steffen.klassert@secunet.com>

From: Steffen Klassert
> Sent: 23 September 2016 08:54
> All available gso_type flags are currently in use,
> so extend gso_type to be able to add further flags.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  include/linux/skbuff.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index f21da42..c1fd854 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -417,7 +417,7 @@ struct skb_shared_info {
>  	unsigned short	gso_size;
>  	/* Warning: this field is not always filled in (UFO)! */
>  	unsigned short	gso_segs;
> -	unsigned short  gso_type;
> +	unsigned int	gso_type;
>  	struct sk_buff	*frag_list;
>  	struct skb_shared_hwtstamps hwtstamps;
>  	u32		tskey;

That adds a lot of padding.
I'm not even sure DM will like this structure being extended.
If ktime_t is 64 bit I think there is already some padding later on.

	David
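
Illustration (not part of the thread): a small sketch of the padding
concern, using two hypothetical stand-in structs. The fields from
gso_size to frag_list follow the quoted hunk; the two leading byte-sized
fields are my assumption about what precedes them, and the rest of the
real structure is omitted.

```c
#include <stddef.h>

/* Stand-in for the layout before the change (leading fields assumed). */
struct shinfo_before {
	unsigned char  nr_frags;
	unsigned char  tx_flags;
	unsigned short gso_size;
	unsigned short gso_segs;
	unsigned short gso_type;
	void          *frag_list;
};

/* Same layout with gso_type widened to unsigned int. */
struct shinfo_after {
	unsigned char  nr_frags;
	unsigned char  tx_flags;
	unsigned short gso_size;
	unsigned short gso_segs;
	unsigned int   gso_type;   /* needs stricter alignment than short */
	void          *frag_list;  /* may need realignment after gso_type */
};

/* Number of bytes by which frag_list moves once gso_type is widened. */
static size_t frag_list_shift(void)
{
	return offsetof(struct shinfo_after, frag_list) -
	       offsetof(struct shinfo_before, frag_list);
}
```

Under these assumptions, on a typical 64-bit ABI I would expect holes
both before gso_type and before frag_list, so the shift is larger than
the two bytes the field itself gained.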


* [PATCH] net: bcmgenet: Fix EPHY reset in power up
From: Jaedon Shin @ 2016-09-23 13:20 UTC (permalink / raw)
  To: Florian Fainelli, David S . Miller; +Cc: Philippe Reynes, netdev, Jaedon Shin

bcmgenet_mii_reset() is never run in the power-up sequence since
'commit 62469c76007e ("net: ethernet: bcmgenet: use phydev from
struct net_device")'. This shows up as extremely high latency and
duplicate packets when the interface is repeatedly brought down and up.

For now, add a private phydev again so that the MII reset runs when
powering up to open the interface.

Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.h | 1 +
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 0f0868c56f05..1e2dc34d331a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -597,6 +597,7 @@ struct bcmgenet_priv {
 
 	/* MDIO bus variables */
 	wait_queue_head_t wq;
+	struct phy_device *phydev;
 	bool internal_phy;
 	struct device_node *phy_dn;
 	struct device_node *mdio_dn;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index e907acd81da9..b2bd5302c478 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -183,9 +183,9 @@ void bcmgenet_mii_reset(struct net_device *dev)
 	if (GENET_IS_V4(priv))
 		return;
 
-	if (dev->phydev) {
-		phy_init_hw(dev->phydev);
-		phy_start_aneg(dev->phydev);
+	if (priv->phydev) {
+		phy_init_hw(priv->phydev);
+		phy_start_aneg(priv->phydev);
 	}
 }
 
@@ -383,6 +383,8 @@ int bcmgenet_mii_probe(struct net_device *dev)
 		}
 	}
 
+	priv->phydev = phydev;
+
 	/* Configure port multiplexer based on what the probed PHY device since
 	 * reading the 'max-speed' property determines the maximum supported
 	 * PHY speed which is needed for bcmgenet_mii_config() to configure
@@ -605,6 +607,7 @@ static int bcmgenet_mii_pd_init(struct bcmgenet_priv *priv)
 
 	}
 
+	priv->phydev = phydev;
 	priv->phy_interface = pd->phy_interface;
 
 	return 0;
-- 
2.10.0


* Re: [net-next 5/5] PCI: disable FLR for 82579 device
From: Alex Williamson @ 2016-09-23 13:19 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, bhelgaas, Sasha Neftin, netdev, nhorman, sassmann,
	jogreene, guru.anbalagane, linux-pci
In-Reply-To: <1474612741-75681-6-git-send-email-jeffrey.t.kirsher@intel.com>

On Thu, 22 Sep 2016 23:39:01 -0700
Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:

> From: Sasha Neftin <sasha.neftin@intel.com>
> 
> 82579 has a problem reattaching itself after the device is detached.
> The bug was reported by Redhat. The suggested fix is to disable
> FLR capability in PCIe configuration space.
> 
> Reproduction:
> Attach the device to a VM, then detach and try to attach again.
> 
> Fix:
> Disable FLR capability to prevent the 82579 from hanging.
> 
> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/pci/quirks.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..59fba6e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
>  	}
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +/*
> + * Workaround FLR issues for 82579
> + * This code disables the FLR (Function Level Reset) via PCIe, in order
> + * to workaround a bug found while using device passthrough, where the
> + * interface would become non-responsive.
> + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> + * the BIOS or kernel writes this register * then this workaround will
> + * not work.
> + */
> +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> +{
> +	int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> +	if (pos) {
> +		u8 cap;
> +		pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> +		cap = cap & (~PCI_AF_CAP_FLR);
> +		pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> +	}
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_flr_cap_dis);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_flr_cap_dis);

This seems like a pretty fragile quirk since we're just hoping that the
BIOS hasn't already written this byte.  Should we at least re-read and
warn if the write didn't take?  What about using dev_flags or a device
specific reset to make this less fragile?  A device specific reset
could pick the best reset mechanism for the device, ignoring AF FLR.
Thanks,

Alex
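
Illustration (not part of the thread): a userspace sketch of the
"re-read and warn if the write didn't take" idea. Everything here is a
hypothetical model: mock_rwo_reg imitates a write-once config byte, and
MOCK_AF_CAP_FLR is an illustrative bit, not taken from the PCI headers.
A real quirk would use the PCI config accessors instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_AF_CAP_FLR 0x02u   /* illustrative FLR capability bit */

/* Hypothetical model of a Read/Write Once (RWO) config byte. */
struct mock_rwo_reg {
	uint8_t value;
	bool    locked;   /* true once anything has written the byte */
};

static uint8_t mock_read(const struct mock_rwo_reg *r)
{
	return r->value;
}

static void mock_write(struct mock_rwo_reg *r, uint8_t v)
{
	if (r->locked)
		return;       /* later writes are silently ignored */
	r->value = v;
	r->locked = true;
}

/* Try to clear FLR, then re-read; true only if the bit is really gone. */
static bool try_disable_flr(struct mock_rwo_reg *r)
{
	uint8_t cap = mock_read(r);

	mock_write(r, cap & (uint8_t)~MOCK_AF_CAP_FLR);
	return (mock_read(r) & MOCK_AF_CAP_FLR) == 0;
}

/* Run the quirk against a byte that firmware may already have written. */
static bool flr_cleared_after_quirk(bool firmware_wrote_first)
{
	struct mock_rwo_reg reg = { .value = 0x03, .locked = firmware_wrote_first };

	return try_disable_flr(&reg);
}
```

A caller could log a warning whenever try_disable_flr() returns false,
which makes the fragile case visible instead of silent.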


* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf egress programs
From: Pablo Neira Ayuso @ 2016-09-23 13:17 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Thomas Graf, Daniel Mack, htejun-b10kYP2dOMg, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <57E3F4F9.70300-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>

On Thu, Sep 22, 2016 at 05:12:57PM +0200, Daniel Borkmann wrote:
> On 09/22/2016 02:05 PM, Pablo Neira Ayuso wrote:
[...]
> >Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
> >In your case, you have to add a new chain type:
> >
> >static const struct nf_chain_type nft_chain_bpf = {
> >         .name           = "bpf",
> >         .type           = NFT_CHAIN_T_bpf,
> >         ...
> >         .hooks          = {
> >                 [NF_INET_LOCAL_IN]      = nft_do_bpf,
> >                 [NF_INET_LOCAL_OUT]     = nft_do_bpf,
> >                 [NF_INET_FORWARD]       = nft_do_bpf,
> >                 [NF_INET_PRE_ROUTING]   = nft_do_bpf,
> >                 [NF_INET_POST_ROUTING]  = nft_do_bpf,
> >         },
> >};
> >
> >nft_do_bpf() is the raw netfilter hook that you register, this hook
> >will just execute to iterate over the list of bpf filters and run
> >them.
> >
> >This new chain is created on demand, so no overhead if not needed, eg.
> >
> >nft add table bpf
> >nft add chain bpf input { type bpf hook output priority 0\; }
> >
> >Then, you add a rule for each bpf program you want to run, just like
> >tc+bpf.
> 
> But from a high-level point of view, this sounds like a huge hack to me,
> in the sense that nft as a bytecode engine (from whole architecture I
> mean) calls into another bytecode engine such as bpf as an extension.

nft is not only a bytecode engine; it also provides a netlink socket
interface to register hooks (from the user's perspective, these are
called basechains). It provides the infrastructure that you're lacking
and addresses the concerns I mentioned about the visibility of
the global policy that you want to apply on the packet path.

As I explained you can potentially add any basechain type with
specific semantics. Proposed semantics for this bpf chain would be:

1) You can use any of the existing netfilter hooks.
2) You can only run bpf programs from there. There is no way for the
   user to mix nftables with the bpf VM.

> And bpf code from there isn't using any of the features from nft
> besides being invoked from the hook

I think there's a misunderstanding here.

You will not run nft_do_chain(), you don't waste cycles to run what is
specific to nftables. You will just run nft_do_bpf() which will just
do what you want to run for each packet. Thus, you have control over
what nft_do_bpf() does and decide what that function spends cycles
on.

> [...] I was hoping that nft would try to avoid some of those exotic
> modules we have from xt, I would consider xt_bpf (no offense ;))

This has nothing to do with it. In xt_bpf you waste cycles running
code that is specific to iptables, what I propose would not, just the
generic hook code and then your code.

[...]
> >Benefits are, rewording previous email:
> >
> >* You get access to all of the existing netfilter hooks in one go
> >   to run bpf programs. No need for specific redundant hooks. This
> >   provides raw access to the netfilter hook, you define the little
> >   code that your hook runs before you bpf run invocation. So there
> >   is *no need to bloat the stack with more hooks, we use what we
> >   have.*
> 
> But also this doesn't really address the fundamental underlying problem
> that is discussed here. nft doesn't even have cgroups v2 support.

You don't need native cgroups v2 support in nft, you just run bpf
programs from the native bpf basechain type. So whatever bpf supports,
you can do it.

Instead, if you take this approach, you will get access to all of the
existing hooks to run bpf programs, this includes arp, bridge and
potentially run filters for both ip and ip6 through our inet family.

[...]
> Or would the idea be that the current netfilter hooks would be redone in
> a way that they are generic enough so that any other user could make use
> of it independent of netfilter?

Redone? Why? What do you need, a rename?

Dependencies are very few: CONFIG_NETFILTER for the hooks,
CONFIG_NF_TABLES to obtain the netlink interface to load the bpf
programs and CONFIG_NF_TABLES_BPF to define the bpf basechain type
semantics to run bpf programs from there. It's actually very little
boilerplate code.

Other than that, I can predict where you're going: You will end up
adding a hook just before/after each of the existing netfilter hooks,
and that is really nonsense to me. Why bloat the stack with more
hooks? Use what is already available.
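
Illustration (not part of the thread): a userspace sketch of the
dispatch model described above, where one per-hook function just walks
the list of attached programs. All names (mock_hook, mock_attach,
mock_do_bpf) are hypothetical, the fixed-size array stands in for
whatever list the kernel would use, and "first drop wins" is my
assumption about the verdict semantics.

```c
#include <assert.h>
#include <stddef.h>

enum mock_hook {
	MOCK_PRE_ROUTING,
	MOCK_LOCAL_IN,
	MOCK_FORWARD,
	MOCK_LOCAL_OUT,
	MOCK_POST_ROUTING,
	MOCK_NUM_HOOKS
};

enum mock_verdict { MOCK_DROP = 0, MOCK_ACCEPT = 1 };

/* A "program" is modelled as a plain function over the packet bytes. */
typedef enum mock_verdict (*mock_prog_fn)(const unsigned char *pkt, size_t len);

#define MOCK_MAX_PROGS 4

/* One basechain per hook; empty chains cost only the loop check. */
struct mock_basechain {
	mock_prog_fn progs[MOCK_MAX_PROGS];
	size_t       nprogs;
};

static struct mock_basechain mock_chains[MOCK_NUM_HOOKS];

/* Attach a program to a hook; returns 0 on success, -1 if the chain is full. */
static int mock_attach(enum mock_hook hook, mock_prog_fn fn)
{
	struct mock_basechain *c;

	assert(hook < MOCK_NUM_HOOKS);
	c = &mock_chains[hook];
	if (c->nprogs == MOCK_MAX_PROGS)
		return -1;
	c->progs[c->nprogs++] = fn;
	return 0;
}

/* The per-hook entry point: run each attached program, stop at the first drop. */
static enum mock_verdict mock_do_bpf(enum mock_hook hook,
				     const unsigned char *pkt, size_t len)
{
	const struct mock_basechain *c = &mock_chains[hook];

	for (size_t i = 0; i < c->nprogs; i++)
		if (c->progs[i](pkt, len) == MOCK_DROP)
			return MOCK_DROP;
	return MOCK_ACCEPT;
}

/* Example program: drop anything shorter than 20 bytes. */
static enum mock_verdict mock_drop_short(const unsigned char *pkt, size_t len)
{
	(void)pkt;
	return len < 20 ? MOCK_DROP : MOCK_ACCEPT;
}
```

Attaching mock_drop_short to MOCK_LOCAL_OUT affects only that hook; the
other hooks keep an empty chain and accept everything.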


* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Jesper Dangaard Brouer @ 2016-09-23 13:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Tom Herbert, davem, netdev, kernel-team, tariqt, bblanco,
	alexei.starovoitov, eric.dumazet, brouer, Thomas Graf
In-Reply-To: <06280af8-5451-c423-d295-a5f3f51e63cf@mojatatu.com>

On Fri, 23 Sep 2016 07:13:30 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On 16-09-20 06:00 PM, Tom Herbert wrote:
> > This patch creates an infrastructure for registering and running code at
> > XDP hooks in drivers. This is based on the original XDP?BPF and borrows
> > heavily from the techniques used by netfilter to make generic nfhooks.
> >
> > An XDP hook is defined by the  xdp_hook_ops. This structure contains the
> > ops of an XDP hook. A pointer to this structure is passed into the XDP
> > register function to set up a hook. The XDP register function mallocs
> > its own xdp_hook_ops structure and copies the values from the
> > xdp_hook_ops passed in. The register function also stores the pointer
> > value of the xdp_hook_ops argument; this pointer is used in subsequent
> > calls to XDP to identify the registered hook.
> >
> > The interface is defined in net/xdp.h. This includes the definition of
> > xdp_hook_ops, functions to register and unregister hook ops on a device
> > or individual instances of napi, and xdp_hook_run that is called by
> > drivers to run the hooks.
> >  
> 
> Tom,
> perused the thread and it seems you are serious ;->
> Are we heading towards Frankenstein Avenue?
> The whole point behind letting in XDP is so that _small programs_
> can be written to do some quick thing. eBPF 4K program limit was
> touted as the check and bound. eBPF sounded fine.
> This sounds like a huge contradiction.
> 
> cheers,
> jamal

Hi Jamal,

I don't understand why you think this is so controversial. The way I
see it (after reading the thread): This is about allowing kernel
components to _also_ use the XDP hook.

I believe Tom has a valid use-case in ILA. The NVE (Network
Virtualization Edge) component very naturally fits in the XDP hook, as
it only needs to look at the IPv6 address and do a table lookup (in an
ILA specific data structure) to see if this packet is for local stack
delivery or forward.  For forward it does not need to take the performance
hit of allocating SKBs etc.

You can see it as a way to accelerate the NVE component. I can imagine
it could be done in approx 10-20 lines of code, as it would use the
existing ILA lookup function calls.

AFAIK Thomas Graf also sees XDP as an acceleration for bpf_cls, but he
is lucky because his code is already eBPF based.

To support Tom's case, with eBPF, I think we would have to implement
specific eBPF helper functions that can do the ILA table lookups.  And
then need a userspace component to load the eBPF program.  Why add all
this, when we could contain all this in the kernel, and simply call
this as native C-code via the XDP hook?

Notice, this is not about throwing eBPF out.  Using eBPF is _very_
essential for XDP.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC] net: store port/representative id in metadata_dst
From: Jakub Kicinski @ 2016-09-23 12:55 UTC (permalink / raw)
  To: Jiri Benc, Jiri Pirko
  Cc: netdev, Thomas Graf, Roopa Prabhu, ogerlitz, John Fastabend,
	sridhar.samudrala, ast, daniel, simon.horman, Paolo Abeni,
	Pravin B Shelar, hannes, kubakici
In-Reply-To: <20160923110609.2f221f99@griffin>

On Fri, 23 Sep 2016 11:06:09 +0200, Jiri Benc wrote:
> On Fri, 23 Sep 2016 08:34:29 +0200, Jiri Pirko wrote:
> > So if I understand that correctly, this would need some "shared netdev"
> > which would effectively serve only as a sink for all port netdevices to
> > tx packets to. On RX, this would be completely avoided. This lower
> > device looks like half zombie to me.  
> 
> Looks more like a quarter zombie. Even tx would not be allowed unless
> going through one of the ports, as all skbs without
> METADATA_HW_PORT_MUX metadata_dst would be dropped. But it would be
> possible to attach qdisc to the "lower" netdevice and it would actually
> have an effect. On rx this netdevice would be ignored completely. This
> is very weird behavior.
> 
> > I don't like it :( I wonder if the
> > solution would not be possible without this lower netdev.  
> 
> I agree. This approach doesn't sound correct. The skbs should not be
> requeued.

Thanks for the responses!

I think SR-IOV NICs are coming at this problem from a different angle,
we already have big, full-featured per-port netdevs and now we want to
spawn representors for VFs to handle fallback traffic.  This patch
would help us mux VFR traffic on all the queues of the physical port
netdevs (the ones which were already present in legacy mode, that's the
lower device).

I read the mlxsw code when I was thinking about this and I wasn't
100% comfortable with returning NETDEV_TX_BUSY, I thought this
behaviour should be generally avoided.  (BTW a very lame question - does
mlxsw ever stop the queues?  AFAICS it only returns BUSY, isn't that
confusing to the stack?)

FWIW the switchdev SR-IOV model we have now seems to be to treat the
existing netdevs as "MAC ports" and spawn representatives for VFs but
not represent PFs in any way.  This makes it impossible to install
VF-PF flow rules.  I worry this can bite us later but that's slightly
different discussion :)  For the purpose of this patch please assume
the lower dev is the MAC/physical/external port.


* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Jamal Hadi Salim @ 2016-09-23 12:48 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev
In-Reply-To: <20160923081106.73fb48df@halley>

On 16-09-23 01:11 AM, Shmulik Ladkani wrote:
> Hi,
>
> On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
>>> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>>>
>>> Up until now, 'action mirred' supported only egress actions (either
>>> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>>>
>>> This patch implements the corresponding ingress actions
>>> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>>>
>>> This allows attaching filters whose target is to hand matching skbs into
>>> the rx processing of a specified device.
>>
>> Thank you for doing this. There was something that made me remove
>> initial support for this feature - I am blanking out right now but
>> will find my notes and give more details.
>
> Thanks Jamal, appreciate any details.
>
> Was wondering why it's missing, googled a bit with no meaningful
> results, so speculated the following:
>
> Some time long ago, initial 'mirred' purpose was to facilitate ifb.
> Therefore 'egress redirect' was implemented. Jamal probably left the
> 'ingress' support for a later time :)
>

History is mirror/redirect were first introduced to do just those
plain vanilla-free features. IFB came later. Up until recently there
were still some bits to support the ingress features that were removed
by Florian W. to save some skb bits.

> One interesting usecase for 'ingress redirect' is creating "rx bouncing"
> construct (like macvlan/macvtap/ipvlan) but applied according to custom
> logic.
>

I thought that was the motivation as well.

>> It may be around preventing loops maybe.
>
> Could be, but personally, I treat these constructs as (powerful)
> building blocks, and "with great power comes great responsibility".
>

Amen.
I am a believer in let-the-user-shoot-their-big-toe-if-they-want.

> Even today, one may create loops using existing 'egress redirect',
> e.g. this ridiculously erroneous construct:
>
>  # ip l add v0 type veth peer name v0p
>  # tc filter add dev v0p parent ffff: basic \
>     action mirred egress redirect dev v0
>

I think we actually recover from this one by eventually
dropping (there's a ttl field). We should at least not lock
the kernel forever.
The other question is what to set skb->dev and skb->iif to?
Some information will be lost if you move around netdevs a
bit.

BTW: You have motivated me to start looking again at redirect
to socket that was also left out. I am getting tired of redirecting
to tuntap with all its bells and whistles.

cheers,
jamal


* Re: [PATCH net-next] tcp: add tcp_add_backlog()
From: Marcelo Ricardo Leitner @ 2016-09-23 12:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1474586490.28155.10.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Sep 22, 2016 at 04:21:30PM -0700, Eric Dumazet wrote:
> On Thu, 2016-09-22 at 19:34 -0300, Marcelo Ricardo Leitner wrote:
> > On Sat, Aug 27, 2016 at 07:37:54AM -0700, Eric Dumazet wrote:
> > > +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> > > +{
> > > +	u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
> >                                  ^^^
> > ...
> > > +	if (!skb->data_len)
> > > +		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> > > +
> > > +	if (unlikely(sk_add_backlog(sk, skb, limit))) {
> > ...
> > > -	} else if (unlikely(sk_add_backlog(sk, skb,
> > > -					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
> > 	                                                 ^---- [1]
> > > -		bh_unlock_sock(sk);
> > > -		__NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> > > +	} else if (tcp_add_backlog(sk, skb)) {
> > 
> > Hi Eric, after this patch, do you think we still need to add sk_sndbuf
> > as a stretching factor to the backlog here?
> > 
> > It was added by [1] and it was justified that the (s)ack packets were
> > just too big for the rx buf size. Maybe this new patch alone is enough
> > already, as such packets will have a very small truesize then.
> > 
> >   Marcelo
> > 
> > [1] da882c1f2eca ("tcp: sk_add_backlog() is too agressive for TCP")
> > 
> 
> Hi Marcelo
> 
> Yes, it is still needed, some drivers provide linear skbs, so the
> skb->truesize of ack packets will likely be the same (skb->head points
> to a full size frame allocated by the driver)

Aye. In that case, what about using tail instead of end? Because
accounting for something that we have to tweak the limits to accept is
like adding a constant to both sides of the equation.
But perhaps that would cut out too much of the fat which could be used
later by the stack.

^ permalink raw reply

* RE: [PATCH net] act_ife: Add support for machines with hard_header_len != mac_len
From: Yotam Gigi @ 2016-09-23 12:28 UTC (permalink / raw)
  To: David Miller; +Cc: jhs@mojatatu.com, netdev@vger.kernel.org, mlxsw
In-Reply-To: <20160923.070536.1880911510074184857.davem@davemloft.net>


>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Friday, September 23, 2016 2:06 PM
>To: Yotam Gigi <yotamg@mellanox.com>
>Cc: jhs@mojatatu.com; netdev@vger.kernel.org
>Subject: Re: [PATCH net] act_ife: Add support for machines with hard_header_len
>!= mac_len
>
>From: Yotam Gigi <yotamg@mellanox.com>
>Date: Wed, 21 Sep 2016 15:54:13 +0300
>
>> Without that fix, the following could occur:
> ...
>
>I don't think what you are doing in mlxsw is valid.
>
>You can't set hard_header_len arbitrarily, it's the MAC length.
>
>If you need to prepend special headers or whatever, set
>->needed_headroom which is designed for this purpose.

Ok, we will fix that.

Thanks for the comment!

>
>Thanks.

^ permalink raw reply

* Re: [PATCH net] tcp: fix a compile error in DBGUNDO()
From: David Miller @ 2016-09-23 12:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1474592040.28155.15.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Sep 2016 17:54:00 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> If DBGUNDO() is enabled (FASTRETRANS_DEBUG > 1), a compile
> error will happen, since inet6_sk(sk)->daddr became sk->sk_v6_daddr
> 
> Fixes: efe4208f47f9 ("ipv6: make lookups simpler and faster")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/3] Few minor BPF helper improvements
From: David Miller @ 2016-09-23 12:24 UTC (permalink / raw)
  To: daniel; +Cc: alexei.starovoitov, netdev
In-Reply-To: <cover.1474586162.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri, 23 Sep 2016 01:28:34 +0200

> Just a few minor improvements around BPF helpers, first one is a
> fix but given this late stage and that it's not really a critical
> one, I think net-next is just fine. For details please see the
> individual patches.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] drivers: net: xgene: Fix MSS programming
From: David Miller @ 2016-09-23 12:21 UTC (permalink / raw)
  To: isubramanian; +Cc: netdev, linux-arm-kernel, patches, toanle
In-Reply-To: <1474584453-9071-1-git-send-email-isubramanian@apm.com>

From: Iyappan Subramanian <isubramanian@apm.com>
Date: Thu, 22 Sep 2016 15:47:33 -0700

> The current driver programs a static MSS value into the hardware
> register used by the TSO offload engine to segment the TCP payload,
> regardless of the MSS value provided by the network stack.
> 
> This patch fixes this by programming hardware registers with the
> stack provided MSS value.
> 
> Since the hardware is limited to only 4 MSS registers, this patch
> keeps a reference count for each MSS value in use.
> 
> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
> Signed-off-by: Toan Le <toanle@apm.com>

Applied.
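
For readers following along, the reference-counting idea in the commit
message can be sketched in user space. This is only an illustration
under stated assumptions, not the xgene driver code: every name below
(mss_slot_demo, mss_get_demo(), mss_put_demo(), mss_reset_demo()) is
hypothetical, and the "register write" is reduced to a table store.

```c
#include <string.h>

#define NUM_MSS_REG_DEMO 4	/* the commit message mentions 4 MSS registers */

/* Hypothetical shadow of one hardware MSS register. */
struct mss_slot_demo {
	unsigned int mss;	/* value that would be programmed */
	unsigned int refcnt;	/* number of users of this value */
};

static struct mss_slot_demo mss_table_demo[NUM_MSS_REG_DEMO];

/* Forget all slots (stands in for driver init). */
static void mss_reset_demo(void)
{
	memset(mss_table_demo, 0, sizeof(mss_table_demo));
}

/*
 * Take a reference on a slot holding @mss. Reuses a slot that already
 * holds the value, otherwise claims the first free one. Returns the
 * slot index, or -1 when all slots hold other in-use values.
 */
static int mss_get_demo(unsigned int mss)
{
	int i, free_slot = -1;

	for (i = 0; i < NUM_MSS_REG_DEMO; i++) {
		if (mss_table_demo[i].refcnt && mss_table_demo[i].mss == mss) {
			mss_table_demo[i].refcnt++;
			return i;
		}
		if (!mss_table_demo[i].refcnt && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;

	/* A real driver would write the hardware register here. */
	mss_table_demo[free_slot].mss = mss;
	mss_table_demo[free_slot].refcnt = 1;
	return free_slot;
}

/* Drop a reference; the slot becomes reusable when the count hits 0. */
static void mss_put_demo(int slot)
{
	if (slot >= 0 && slot < NUM_MSS_REG_DEMO && mss_table_demo[slot].refcnt)
		mss_table_demo[slot].refcnt--;
}
```

In this sketch a caller that gets -1 has to fall back to something
else (for example, software segmentation); what the real driver does
in that case is not stated in this thread.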

^ permalink raw reply

* Re: [PATCH net-next 0/7] hv_netvsc changes
From: David Miller @ 2016-09-23 12:21 UTC (permalink / raw)
  To: sthemmin, sthemmin; +Cc: kys, haiyangz, netdev
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: sthemmin@exchange.microsoft.com
Date: Thu, 22 Sep 2016 16:56:28 -0700

> These are mostly about improving the handling of interaction between
> the virtual network device (netvsc) and the SR-IOV VF network device.

Series applied, thanks Stephen.

^ permalink raw reply

* Re: [PATCH] mlxsw: spectrum: remove redundant check if err is zero
From: Jiri Pirko @ 2016-09-23 12:14 UTC (permalink / raw)
  To: Colin King; +Cc: Jiri Pirko, Ido Schimmel, netdev, linux-kernel
In-Reply-To: <20160923110245.18977-1-colin.king@canonical.com>

Fri, Sep 23, 2016 at 01:02:45PM CEST, colin.king@canonical.com wrote:
>From: Colin Ian King <colin.king@canonical.com>
>
>There is an earlier check and return if err is non-zero, so
>the check to see if it is zero is redundant in every iteration
>of the loop and hence the check can be removed.
>
>Signed-off-by: Colin Ian King <colin.king@canonical.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: David Miller @ 2016-09-23 12:06 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: jbenc, netdev, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160922213010.GA32052@oracle.com>

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 22 Sep 2016 17:30:10 -0400

> On (09/22/16 01:52), David Miller wrote:
>> Alternatively we can do Alexander Duyck's trick, by pushing
>> the headers into the frag list, forcing a pull and realignment
>> by the next protocol layer.
> 
> What is the "Alexander Duyck trick" (hints about module or commit id,
> where this can be found, please)?
> 
> Is this basically about, e.g., putting the vxlanhdr in its own
> skb_frag_t, or something else?

Yes, and this way skb_header_pointer() is forced to do a memcpy.

^ permalink raw reply

* Re: [PATCH net-next 0/4] net: dsa: add port fast ageing
From: David Miller @ 2016-09-23 12:01 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev, linux-kernel, kernel, f.fainelli, andrew, john
In-Reply-To: <20160922204924.16229-1-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Thu, 22 Sep 2016 16:49:20 -0400

> Today the DSA drivers are in charge of flushing the MAC addresses
> associated with a port when its STP state changes from Learning or
> Forwarding to Disabled, Blocking or Listening.
> 
> This makes the drivers more complex and hides this generic switch logic.
> 
> This patchset introduces a new optional port_fast_age operation to
> dsa_switch_ops, to move this logic to the DSA layer and keep drivers
> simple. b53 and mv88e6xxx are updated accordingly.

Series applied, thanks.
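
The transition rule quoted in the cover letter can be written down as
a small predicate. This is a user-space sketch, assuming a
hypothetical enum and helper names (stp_state_demo, stp_learns_demo(),
port_needs_fast_age_demo()); it is not the DSA layer code from the
series.

```c
/* Hypothetical STP port states, named after the ones in the cover letter. */
enum stp_state_demo {
	STP_DISABLED_DEMO,
	STP_LISTENING_DEMO,
	STP_LEARNING_DEMO,
	STP_FORWARDING_DEMO,
	STP_BLOCKING_DEMO,
};

/* True for the states in which a port may hold learned addresses. */
static int stp_learns_demo(enum stp_state_demo s)
{
	return s == STP_LEARNING_DEMO || s == STP_FORWARDING_DEMO;
}

/*
 * Flush the learned MAC addresses only when leaving Learning or
 * Forwarding for Disabled, Blocking or Listening.
 */
static int port_needs_fast_age_demo(enum stp_state_demo old_state,
				    enum stp_state_demo new_state)
{
	return stp_learns_demo(old_state) && !stp_learns_demo(new_state);
}
```

With a predicate like this in common code, a driver would only need to
supply the flush operation itself, which is the split the cover letter
describes.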

^ permalink raw reply

* Re: [PATCH v3] tcp: fix wrong checksum calculation on MTU probing
From: David Miller @ 2016-09-23 11:58 UTC (permalink / raw)
  To: douglascs; +Cc: sergei.shtylyov, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <f920ce68-c4e9-7772-809c-01f7319fa78e@taghos.com.br>

From: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Date: Thu, 22 Sep 2016 15:52:04 -0300

> With TCP MTU probing enabled and offload TX checksumming disabled,
> tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
> into the probe's SKB had an odd length. This was caused by the direct use
> of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
> fragment being copied, if needed. When this fragment was not the last, a
> subsequent call used the previous checksum without considering this
> padding.
> 
> The effect was a connection stalled in one direction, as even
> retransmissions wouldn't solve the problem, because the checksum was
> never recalculated for the full SKB length.
> 
> Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>

Applied and queued up for -stable, thanks.
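
The odd-length problem described above can be reproduced in user space
with a plain Internet checksum. The sketch below is not taken from the
patch; all *_demo names are hypothetical. It shows that adding
per-fragment sums directly goes wrong once a fragment starts at an odd
offset, and that byte-swapping the later sum before adding it gives
the right result. The kernel's csum_block_add() helper exists for this
kind of odd-offset combination.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample payload used in the usage note below. */
static const uint8_t demo_buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/*
 * Unfolded ones'-complement sum of big-endian 16-bit words, computed
 * as if the block started on an even offset. An odd trailing byte is
 * padded with a zero low byte.
 */
static uint32_t csum_partial_demo(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t csum_fold_demo(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Add the sum of a block that starts at @offset in the full buffer.
 * At an odd offset every byte of the block sits in the other half of
 * its 16-bit word, so the block's sum is byte-swapped first.
 */
static uint16_t csum_block_add_demo(uint16_t csum, uint16_t csum2,
				    size_t offset)
{
	if (offset & 1)
		csum2 = (uint16_t)((csum2 << 8) | (csum2 >> 8));
	return csum_fold_demo((uint32_t)csum + csum2);
}

/* Checksum @p as two fragments split at @split; @fix selects the swap. */
static uint16_t csum_split_demo(const uint8_t *p, size_t len, size_t split,
				int fix)
{
	uint16_t a = csum_fold_demo(csum_partial_demo(p, split));
	uint16_t b = csum_fold_demo(csum_partial_demo(p + split, len - split));

	if (fix)
		return csum_block_add_demo(a, b, split);
	return csum_fold_demo((uint32_t)a + b);
}
```

For demo_buf, the sum over all 8 bytes is 0x1014. Splitting at offset
3 and adding the two sums directly gives 0x160e, while the
offset-aware combination gives 0x1014 again.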

^ permalink raw reply

* Re: [PATCH] softirq: let ksoftirqd do its job
From: Peter Zijlstra @ 2016-09-23 11:53 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, eric.dumazet, riel, pabeni, hannes, jbrouer,
	linux-kernel, netdev, corbet, Ingo Molnar
In-Reply-To: <57E5139F.5040708@iogearbox.net>

On Fri, Sep 23, 2016 at 01:35:59PM +0200, Daniel Borkmann wrote:
> On 09/02/2016 08:39 AM, David Miller wrote:
> >
> >I'm just kind of assuming this won't go through my tree, but I can take
> >it if that's what everyone agrees to.
> 
> Was this actually picked up somewhere in the meantime?

I can queue it for tip. In fact, I've just done so to avoid losing it.
If anybody else wants it, holler.

^ permalink raw reply

* Re: [net-next 5/5] PCI: disable FLR for 82579 device
From: Sergei Shtylyov @ 2016-09-23 11:52 UTC (permalink / raw)
  To: Jeff Kirsher, davem, bhelgaas
  Cc: Sasha Neftin, netdev, nhorman, sassmann, jogreene,
	guru.anbalagane, linux-pci
In-Reply-To: <1474612741-75681-6-git-send-email-jeffrey.t.kirsher@intel.com>

Hello.

On 9/23/2016 9:39 AM, Jeff Kirsher wrote:

> From: Sasha Neftin <sasha.neftin@intel.com>
>
> 82579 has a problem reattaching itself after the device is detached.
> The bug was reported by Redhat. The suggested fix is to disable
> FLR capability in PCIe configuration space.
>
> Reproduction:
> Attach the device to a VM, then detach and try to attach again.
>
> Fix:
> Disable FLR capability to prevent the 82579 from hanging.
>
> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/pci/quirks.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..59fba6e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
>  	}
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +/*
> + * Workaround FLR issues for 82579
> + * This code disables the FLR (Function Level Reset) via PCIe, in order
> + * to workaround a bug found while using device passthrough, where the
> + * interface would become non-responsive.
> + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> + * the BIOS or kernel writes this register * then this workaround will
                                               ^
    That asterisk shouldn't be there.

> + * not work.
> + */
> +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> +{
> +	int pos = pci_find_capability(dev, PCI_CAP_ID_AF);

    Should be an empty line here...

> +	if (pos) {
> +		u8 cap;

    And here...

> +		pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> +		cap = cap & (~PCI_AF_CAP_FLR);

    () not needed.

> +		pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> +	}
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_flr_cap_dis);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_flr_cap_dis);

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] softirq: let ksoftirqd do its job
From: Daniel Borkmann @ 2016-09-23 11:35 UTC (permalink / raw)
  To: David Miller, eric.dumazet
  Cc: peterz, riel, pabeni, hannes, jbrouer, linux-kernel, netdev,
	corbet
In-Reply-To: <20160901.233913.237544263411665891.davem@davemloft.net>

On 09/02/2016 08:39 AM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 31 Aug 2016 10:42:29 -0700
>
>> From: Eric Dumazet <edumazet@google.com>
>>
>> A while back, Paolo and Hannes sent an RFC patch adding threaded-able
>> napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)
>>
>> The problem seems to be that softirqs are very aggressive and are often
>> handled by the current process, even if we are under stress and that
>> ksoftirqd was scheduled, so that innocent threads would have more chance
>> to make progress.
>>
>> This patch makes sure that if ksoftirq is running, we let it
>> perform the softirq work.
>>
>> Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/
>>
>> Tested:
>>
>>   - NIC receiving traffic handled by CPU 0
>>   - UDP receiver running on CPU 0, using a single UDP socket.
>>   - Incoming flood of UDP packets targeting the UDP socket.
>>
>> Before the patch, the UDP receiver could almost never get cpu cycles and
>> could only receive ~2,000 packets per second.
>>
>> After the patch, cpu cycles are split 50/50 between user application and
>> ksoftirqd/0, and we can effectively read ~900,000 packets per second,
>> a huge improvement in DOS situation. (Note that more packets are now
>> dropped by the NIC itself, since the BH handlers get less cpu cycles to
>> drain RX ring buffer)
>>
>> Since the load runs in well identified threads context, an admin can
>> more easily tune process scheduling parameters if needed.
>>
>> Reported-by: Paolo Abeni <pabeni@redhat.com>
>> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> I'm just kind of assuming this won't go through my tree, but I can take
> it if that's what everyone agrees to.

Was this actually picked up somewhere in the meantime?

^ permalink raw reply

* Re: [PATCH][V2] cxgb4: fix signed wrap around when decrementing index idx
From: David Miller @ 2016-09-23 11:25 UTC (permalink / raw)
  To: colin.king; +Cc: hariprasad, netdev, linux-kernel
In-Reply-To: <20160922174858.31922-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 22 Sep 2016 18:48:58 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Change predecrement compare to post decrement compare to avoid an
> unsigned integer wrap-around comparison when decrementing idx in
> the while loop.
> 
> For example, when idx is zero, the current situation will
> predecrement idx in the while loop, wrapping idx to the maximum
> signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next, thanks.

^ permalink raw reply
