Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 09/18] net: mac80211.h: fix a bad comment line
From: Johannes Berg @ 2018-05-09 12:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Kalle Valo, Linux Doc Mailing List, Mauro Carvalho Chehab,
	linux-kernel, Jonathan Corbet, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <20180509090400.0156c0c8@vento.lan>

On Wed, 2018-05-09 at 09:04 -0300, Mauro Carvalho Chehab wrote:
> Em Mon, 07 May 2018 14:38:26 +0200
> Johannes Berg <johannes@sipsolutions.net> escreveu:
> 
> > On Mon, 2018-05-07 at 15:37 +0300, Kalle Valo wrote:
> > > Mauro Carvalho Chehab <mchehab+samsung@kernel.org> writes:
> > >   
> > > > Sphinx produces a lot of errors like this:
> > > > 	./include/net/mac80211.h:2083: warning: bad line:  >
> > > > 
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  
> > > 
> > > Randy already submitted a similar patch:
> > > 
> > > https://patchwork.kernel.org/patch/10367275/
> > > 
> > > But it seems Johannes has not applied that yet.  
> > 
> > Yeah, I've been super busy preparing for the plugfest.
> > 
> > I'll make a pass over all the patches as soon as I can, hopefully today
> > or tomorrow.
> 
> Thanks. I'll drop it from my patchset, assuming that you'll
> be applying Randy's version or mine via your tree.

Right, I did, just need to send a pull request.

johannes

^ permalink raw reply

* [PATCH] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
From: Dan Murphy @ 2018-05-09 12:05 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, linux-kernel, Dan Murphy

Add support for the DP83811 phy.

The DP83811 supports both rgmii and sgmii interfaces.
There are 2 part numbers for this the DP83TC811R does not
reliably support the SGMII interface but the DP83TC811S will.

There is not a way to differentiate these parts from the
hardware or register set.  So this is controlled via the DT
to indicate which phy mode is required.  Or the part can be
strapped to a certain interface.

Data sheet can be found here:
http://www.ti.com/product/DP83TC811S-Q1/description
http://www.ti.com/product/DP83TC811R-Q1/description

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---
 drivers/net/phy/Kconfig     |   5 +
 drivers/net/phy/Makefile    |   1 +
 drivers/net/phy/dp83tc811.c | 350 ++++++++++++++++++++++++++++++++++++
 3 files changed, 356 insertions(+)
 create mode 100644 drivers/net/phy/dp83tc811.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb86ee0..810140a9e114 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -285,6 +285,11 @@ config DP83822_PHY
 	---help---
 	  Supports the DP83822 PHY.
 
+config DP83TC811_PHY
+	tristate "Texas Instruments DP83TC822 PHY"
+	---help---
+	  Supports the DP83TC822 PHY.
+
 config DP83848_PHY
 	tristate "Texas Instruments DP83848 PHY"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb2c798..00445b61a9a8 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
 obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
+obj-$(CONFIG_DP83TC811_PHY)	+= dp83tc811.o
 obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
diff --git a/drivers/net/phy/dp83tc811.c b/drivers/net/phy/dp83tc811.c
new file mode 100644
index 000000000000..01cb0e246449
--- /dev/null
+++ b/drivers/net/phy/dp83tc811.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for the Texas Instruments DP83TC811 PHY
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy.h>
+#include <linux/netdevice.h>
+
+#define DP83TC811_PHY_ID	0x2000a253
+#define DP83811_DEVADDR		0x1f
+
+#define MII_DP83811_SGMII_CTRL	0x09
+#define MII_DP83811_INT_STAT1	0x12
+#define MII_DP83811_INT_STAT2	0x13
+#define MII_DP83811_RESET_CTRL	0x1f
+
+#define DP83811_HW_RESET	BIT(15)
+#define DP83811_SW_RESET	BIT(14)
+
+/* INT_STAT1 bits */
+#define DP83811_RX_ERR_HF_INT_EN	BIT(0)
+#define DP83811_MS_TRAINING_INT_EN	BIT(1)
+#define DP83811_ANEG_COMPLETE_INT_EN	BIT(2)
+#define DP83811_ESD_EVENT_INT_EN	BIT(3)
+#define DP83811_WOL_INT_EN		BIT(4)
+#define DP83811_LINK_STAT_INT_EN	BIT(5)
+#define DP83811_ENERGY_DET_INT_EN	BIT(6)
+#define DP83811_LINK_QUAL_INT_EN	BIT(7)
+
+/* INT_STAT2 bits */
+#define DP83811_JABBER_DET_INT_EN	BIT(0)
+#define DP83811_POLARITY_INT_EN		BIT(1)
+#define DP83811_SLEEP_MODE_INT_EN	BIT(2)
+#define DP83811_OVERTEMP_INT_EN		BIT(3)
+#define DP83811_OVERVOLTAGE_INT_EN	BIT(6)
+#define DP83811_UNDERVOLTAGE_INT_EN	BIT(7)
+
+#define MII_DP83811_RXSOP1	0x04a5
+#define MII_DP83811_RXSOP2	0x04a6
+#define MII_DP83811_RXSOP3	0x04a7
+
+/* WoL Registers */
+#define MII_DP83811_WOL_CFG	0x04a0
+#define MII_DP83811_WOL_STAT	0x04a1
+#define MII_DP83811_WOL_DA1	0x04a2
+#define MII_DP83811_WOL_DA2	0x04a3
+#define MII_DP83811_WOL_DA3	0x04a4
+
+/* WoL bits */
+#define DP83811_WOL_MAGIC_EN	BIT(0)
+#define DP83811_WOL_SECURE_ON	BIT(5)
+#define DP83811_WOL_EN		BIT(7)
+#define DP83811_WOL_INDICATION_SEL BIT(8)
+#define DP83811_WOL_CLR_INDICATION BIT(11)
+
+/* SGMII CTRL bits */
+#define DP83811_TDR_AUTO		BIT(8)
+#define DP83811_SGMII_EN		BIT(12)
+#define DP83811_SGMII_AUTO_NEG_EN	BIT(13)
+#define DP83811_SGMII_TX_ERR_DIS	BIT(14)
+#define DP83811_SGMII_SOFT_RESET	BIT(15)
+
+static int dp83811_ack_interrupt(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT1);
+	if (err < 0)
+		return err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT2);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83811_set_wol(struct phy_device *phydev,
+			   struct ethtool_wolinfo *wol)
+{
+	struct net_device *ndev = phydev->attached_dev;
+	u16 value;
+	const u8 *mac;
+
+	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
+		mac = (const u8 *)ndev->dev_addr;
+
+		if (!is_valid_ether_addr(mac))
+			return -EINVAL;
+
+		/* MAC addresses start with byte 5, but stored in mac[0].
+		 * 811 PHYs store bytes 4|5, 2|3, 0|1
+		 */
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA1,
+			      (mac[1] << 8) | mac[0]);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA2,
+			      (mac[3] << 8) | mac[2]);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA3,
+			      (mac[5] << 8) | mac[4]);
+
+		value = phy_read_mmd(phydev, DP83811_DEVADDR,
+				     MII_DP83811_WOL_CFG);
+		if (wol->wolopts & WAKE_MAGIC)
+			value |= DP83811_WOL_MAGIC_EN;
+		else
+			value &= ~DP83811_WOL_MAGIC_EN;
+
+		if (wol->wolopts & WAKE_MAGICSECURE) {
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP1,
+				      (wol->sopass[1] << 8) | wol->sopass[0]);
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP2,
+				      (wol->sopass[3] << 8) | wol->sopass[2]);
+			phy_write_mmd(phydev, DP83811_DEVADDR,
+				      MII_DP83811_RXSOP3,
+				      (wol->sopass[5] << 8) | wol->sopass[4]);
+			value |= DP83811_WOL_SECURE_ON;
+		} else {
+			value &= ~DP83811_WOL_SECURE_ON;
+		}
+
+		value |= (DP83811_WOL_EN | DP83811_WOL_INDICATION_SEL |
+			  DP83811_WOL_CLR_INDICATION);
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+			      value);
+	} else {
+		value = phy_read_mmd(phydev, DP83811_DEVADDR,
+				     MII_DP83811_WOL_CFG);
+		value &= ~DP83811_WOL_EN;
+		phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+			      value);
+	}
+
+	return 0;
+}
+
+static void dp83811_get_wol(struct phy_device *phydev,
+			    struct ethtool_wolinfo *wol)
+{
+	int value;
+	u16 sopass_val;
+
+	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
+	wol->wolopts = 0;
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	if (value & DP83811_WOL_MAGIC_EN)
+		wol->wolopts |= WAKE_MAGIC;
+
+	if (value & DP83811_WOL_SECURE_ON) {
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP1);
+		wol->sopass[0] = (sopass_val & 0xff);
+		wol->sopass[1] = (sopass_val >> 8);
+
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP2);
+		wol->sopass[2] = (sopass_val & 0xff);
+		wol->sopass[3] = (sopass_val >> 8);
+
+		sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+					  MII_DP83811_RXSOP3);
+		wol->sopass[4] = (sopass_val & 0xff);
+		wol->sopass[5] = (sopass_val >> 8);
+
+		wol->wolopts |= WAKE_MAGICSECURE;
+	}
+
+	/* WoL is not enabled so set wolopts to 0 */
+	if (!(value & DP83811_WOL_EN))
+		wol->wolopts = 0;
+}
+
+static int dp83811_config_intr(struct phy_device *phydev)
+{
+	int misr_status;
+	int err;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		misr_status = phy_read(phydev, MII_DP83811_INT_STAT1);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83811_RX_ERR_HF_INT_EN |
+				DP83811_MS_TRAINING_INT_EN |
+				DP83811_ANEG_COMPLETE_INT_EN |
+				DP83811_ESD_EVENT_INT_EN |
+				DP83811_WOL_INT_EN |
+				DP83811_LINK_STAT_INT_EN |
+				DP83811_ENERGY_DET_INT_EN |
+				DP83811_LINK_QUAL_INT_EN);
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT1, misr_status);
+		if (err < 0)
+			return err;
+
+		misr_status = phy_read(phydev, MII_DP83811_INT_STAT2);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83811_JABBER_DET_INT_EN |
+				DP83811_POLARITY_INT_EN |
+				DP83811_SLEEP_MODE_INT_EN |
+				DP83811_OVERTEMP_INT_EN |
+				DP83811_OVERVOLTAGE_INT_EN |
+				DP83811_UNDERVOLTAGE_INT_EN);
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT2, misr_status);
+
+	} else {
+		err = phy_write(phydev, MII_DP83811_INT_STAT1, 0);
+		if (err < 0)
+			return err;
+
+		err = phy_write(phydev, MII_DP83811_INT_STAT1, 0);
+	}
+
+	return err;
+}
+
+static int dp83811_config_aneg(struct phy_device *phydev)
+{
+	int err;
+	int value;
+
+	value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+	if (phydev->autoneg == AUTONEG_ENABLE) {
+		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(DP83811_SGMII_AUTO_NEG_EN | value));
+		if (err < 0)
+			return err;
+	} else {
+		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(~DP83811_SGMII_AUTO_NEG_EN & value));
+		if (err < 0)
+			return err;
+	}
+
+	return genphy_config_aneg(phydev);
+}
+
+static int dp83811_config_init(struct phy_device *phydev)
+{
+	int err;
+	int value;
+
+	err = genphy_config_init(phydev);
+	if (err < 0)
+		return err;
+
+	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+		value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+		if (!(value & DP83811_SGMII_EN)) {
+			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+					(DP83811_SGMII_EN | value));
+			if (err < 0)
+				return err;
+		} else {
+			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+				(~DP83811_SGMII_EN & value));
+			if (err < 0)
+				return err;
+		}
+	}
+
+	value = DP83811_WOL_MAGIC_EN | DP83811_WOL_SECURE_ON | DP83811_WOL_EN;
+
+	return phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+	      value);
+}
+
+static int dp83811_phy_reset(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_write(phydev, MII_DP83811_RESET_CTRL, DP83811_HW_RESET);
+	if (err < 0)
+		return err;
+
+	dp83811_config_init(phydev);
+
+	return 0;
+}
+
+static int dp83811_suspend(struct phy_device *phydev)
+{
+	int value;
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	if (!(value & DP83811_WOL_EN))
+		genphy_suspend(phydev);
+
+	return 0;
+}
+
+static int dp83811_resume(struct phy_device *phydev)
+{
+	int value;
+
+	genphy_resume(phydev);
+
+	value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+	phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG, value |
+		      DP83811_WOL_CLR_INDICATION);
+
+	return 0;
+}
+
+static struct phy_driver dp83811_driver[] = {
+	{
+		.phy_id = DP83TC811_PHY_ID,
+		.phy_id_mask = 0xfffffff0,
+		.name = "TI DP83TC811",
+		.features = PHY_BASIC_FEATURES,
+		.flags = PHY_HAS_INTERRUPT,
+		.config_init = genphy_config_init,
+		.config_aneg = dp83811_config_aneg,
+		.soft_reset = dp83811_phy_reset,
+		.get_wol = dp83811_get_wol,
+		.set_wol = dp83811_set_wol,
+		.ack_interrupt = dp83811_ack_interrupt,
+		.config_intr = dp83811_config_intr,
+		.suspend = dp83811_suspend,
+		.resume = dp83811_resume,
+	 },
+};
+module_phy_driver(dp83811_driver);
+
+static struct mdio_device_id __maybe_unused dp83811_tbl[] = {
+	{ DP83TC811_PHY_ID, 0xfffffff0 },
+	{ },
+};
+MODULE_DEVICE_TABLE(mdio, dp83811_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments DP83TC811 PHY driver");
+MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
+MODULE_LICENSE("GPL");
-- 
2.17.0.582.gccdcbd54c

^ permalink raw reply related

* Re: [PATCH ghak81 RFC V1 4/5] audit: use inline function to set audit context
From: Richard Guy Briggs @ 2018-05-09 12:09 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: Linux-Audit Mailing List, LKML,
	Linux NetDev Upstream Mailing List, Netfilter Devel List,
	Linux Security Module list, Integrity Measurement Architecture,
	SElinux list, Eric Paris, Paul Moore, Steve Grubb, Ingo Molnar,
	David Howells
In-Reply-To: <20180509020700.GE7517@eros>

On 2018-05-09 12:07, Tobin C. Harding wrote:
> On Fri, May 04, 2018 at 04:54:37PM -0400, Richard Guy Briggs wrote:
> > Recognizing that the audit context is an internal audit value, use an
> > access function to set the audit context pointer for the task
> > rather than reaching directly into the task struct to set it.
> > 
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> >  include/linux/audit.h | 8 ++++++++
> >  kernel/auditsc.c      | 6 +++---
> >  kernel/fork.c         | 2 +-
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 93e4c61..dba0d45 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -235,6 +235,10 @@ extern void __audit_inode_child(struct inode *parent,
> >  extern void __audit_seccomp(unsigned long syscall, long signr, int code);
> >  extern void __audit_ptrace(struct task_struct *t);
> >  
> > +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> > +{
> > +	task->audit_context = ctx;
> > +}
> >  static inline struct audit_context *audit_context(struct task_struct *task)
> >  {
> >  	return task->audit_context;
> > @@ -472,6 +476,10 @@ static inline bool audit_dummy_context(void)
> >  {
> >  	return true;
> >  }
> > +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> > +{
> > +	task->audit_context = ctx;
> > +}
> 
> If audit_context is an internal audit value why do we set it when
> CONFIG_AUDITSYSCALL is not set?

Agreed, that is unnecessary, but harmless since it won't be called, or
will be called with a value of NULL.  That has been fixed in my dynamic
allocation patchset since not even the audit_task_info struct is
available to assign the value.  It is now an empty function like the
rest.

> Tobin.

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH v15 net-next 00/12] Chelsio Inline TLS
From: Atul Gupta @ 2018-05-09 12:15 UTC (permalink / raw)
  To: Boris Pismenny, David Miller
  Cc: herbert, davejwatson, sd, sbrivio, linux-crypto, netdev, werner,
	leedom, swise, indranil, ganeshgr
In-Reply-To: <ee9bfb6a-887b-a0fb-8880-dc709a217b50@mellanox.com>



On 4/1/2018 6:27 PM, Boris Pismenny wrote:
> Hi,
>
> On 4/1/2018 6:37 AM, David Miller wrote:
>> From: Atul Gupta <atul.gupta@chelsio.com>
>> Date: Sat, 31 Mar 2018 21:41:51 +0530
>>
>>> Series for Chelsio Inline TLS driver (chtls)
>>
>> Series applied, thank you.
>>
>
> Sorry for being late to the party, could you please help answer a few questions to help me understand better.
Going over these points and addressing some of them in follow-up patches:
>
> 1. What happens if someone attempts to set a TCP socket option for a socket whose TCP stack resides in the TCP offload engine(TOE)? Do you emulate all Linux socket options? What about IP socket options?
HW offloaded options are handled while rest shall be redirected to host.
>
> If I follow the code correctly, then the original TCP/IP setsockopt is called. But, it doesn't change any of the parameters of the TCP/IP offload engine in hardware.
>
> 2. I can't find where is the TLS record sequence number pushed to hardware. Is that on purpose?
seq-nos is pushed to HW in cpl_tx_tls_sfo->scmd1
>
> FYI, ignoring this parameter might cause a record sequence number reuse which breaks the integrity of the AES-GCM TLS ciphersuite.
>
> 3. How does a TOE handle Tx only or Rx only?
Driver does not differentiate/isolate the tx and rx path for Inline Processing
>
> 4. What happens when there is a routing change that redirects traffic to a different netdev? Is there a software fallback?
The case we think is handling the next hop change, is there any other case?
>
> 5. The TLS socket option is set in the middle of a TCP connection. What happens to the existing TCP connection and the data that is currently queued in the TCP write queue?
I believe this behave same as SW. If by TLS options you mean re-keying then outstanding data on Tx is flushed before new key takes effect. For Rx user should be careful else it will result in MAC error.

Thanks
Atul
>
> Thanks,
> Boris.
>

^ permalink raw reply

* [PATCH] netfilter: nf_tables: fix memory leak on error exit return
From: Colin King @ 2018-05-09 12:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S . Miller, netfilter-devel, coreteam, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Currently the -EBUSY error return path is not free'ing resources
allocated earlier, leaving a memory leak. Fix this by exiting via the
error exit label err5 that performs the necessary resource clean
up.

Detected by CoverityScan, CID#1432975 ("Resource leak")

Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/netfilter/nf_tables_api.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6422eba367cf..a3d77aa0f752 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4098,8 +4098,10 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
 			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
 			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
-			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF))
-				return -EBUSY;
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) {
+				err = -EBUSY;
+				goto err5;
+			}
 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
 			     memcmp(nft_set_ext_data(ext),
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] netfilter: nf_tables: fix memory leak on error exit return
From: Pablo Neira Ayuso @ 2018-05-09 12:32 UTC (permalink / raw)
  To: Colin King
  Cc: Jozsef Kadlecsik, Florian Westphal, David S . Miller,
	netfilter-devel, coreteam, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180509122256.16859-1-colin.king@canonical.com>

On Wed, May 09, 2018 at 01:22:56PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Currently the -EBUSY error return path is not free'ing resources
> allocated earlier, leaving a memory leak. Fix this by exiting via the
> error exit label err5 that performs the necessary resource clean
> up.
> 
> Detected by CoverityScan, CID#1432975 ("Resource leak")

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
From: Stephen Smalley @ 2018-05-09 12:37 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexey Kodanev, netdev,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	selinux-+05T5uksL2qpZYMLLGbcSA
In-Reply-To: <CAHC9VhT1+-ch1Ncv5YCNgu7tPnUj1Qx8S=a=q=Fn=Dwx4SnTKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 05/08/2018 08:25 PM, Paul Moore wrote:
> On Tue, May 8, 2018 at 2:40 PM, Stephen Smalley <sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org> wrote:
>> On 05/08/2018 01:05 PM, Paul Moore wrote:
>>> On Tue, May 8, 2018 at 10:05 AM, Alexey Kodanev
>>> <alexey.kodanev-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>> Commit d452930fd3b9 ("selinux: Add SCTP support") breaks compatibility
>>>> with the old programs that can pass sockaddr_in with AF_UNSPEC and
>>>> INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT error.
>>>> It was found with LTP/asapi_01 test.
>>>>
>>>> Similar to commit 29c486df6a20 ("net: ipv4: relax AF_INET check in
>>>> bind()"), which relaxed AF_INET check for compatibility, add AF_UNSPEC
>>>> case to AF_INET and make sure that the address is INADDR_ANY.
>>>>
>>>> Also, in the end of selinux_socket_bind(), instead of adding AF_UNSPEC
>>>> to 'address->sa_family == AF_INET', verify AF_INET6 first.
>>>>
>>>> Fixes: d452930fd3b9 ("selinux: Add SCTP support")
>>>> Signed-off-by: Alexey Kodanev <alexey.kodanev-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  security/selinux/hooks.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> Thanks for finding and reporting this regression.
>>>
>>> I think I would prefer to avoid having to duplicate the
>>> AF_UNSPEC/INADDR_ANY checking logic in the SELinux hook, even though
>>> it is a small bit of code and unlikely to change.  I'm wondering if it
>>> would be better to check both the socket and sockaddr address family
>>> in the main if conditional inside selinux_socket_bind(), what do you
>>> think?  Another option would be to go back to just checking the
>>> soackaddr address family; we moved away from that for a reason which
>>> escapes at the moment (code cleanliness?), but perhaps that was a
>>> mistake.
>>
>> We've always used the sk->sk_family there; it was only the recent code from Richard that started
>> using the socket address family.
> 
> Yes I know, I thought I was the one that suggested it at some point
> (I'll take the blame) ... although now that I'm looking at the git
> log, maybe I'm confusing it with something else.
> 
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 4cafe6a19167..a3789b167667 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -4577,6 +4577,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>> {
>>>        struct sock *sk = sock->sk;
>>>        u16 family;
>>> +       u16 family_sa;
>>>        int err;
>>>
>>>        err = sock_has_perm(sk, SOCKET__BIND);
>>> @@ -4585,7 +4586,9 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>>
>>>        /* If PF_INET or PF_INET6, check name_bind permission for the port. */
>>>        family = sk->sk_family;
>>> -       if (family == PF_INET || family == PF_INET6) {
>>> +       family_sa = address->sa_family;
>>> +       if ((family == PF_INET || family == PF_INET6) &&
>>> +           (family_sa == PF_INET || family_sa == PF_INET6)) {
>>
>> Wouldn't this allow bypassing the name_bind permission check by passing in AF_UNSPEC?
> 
> I believe these name_bind permission checkis skipped for AF_UNSPEC
> already, isn't it?  The only way the name_bind check would be
> triggered is if the source port, snum, was non-zero and I didn't think
> that was really legal for AF_UNSPEC/INADDR_ANY, is it?

1) What in inet_bind() prevents that from occurring?
2) Regardless, what about the node_bind check?

> 
>>>                char *addrp;
>>>                struct sk_security_struct *sksec = sk->sk_security;
>>>                struct common_audit_data ad;
>>> @@ -4601,7 +4604,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>>                 * need to check address->sa_family as it is possible to have
>>>                 * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>>                 */
>>> -               switch (address->sa_family) {
>>> +               switch (family_sa) {
>>>                case AF_INET:
>>>                        if (addrlen < sizeof(struct sockaddr_in))
>>>                                return -EINVAL;
>>>
>>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>>> index 4cafe6a..649a3be 100644
>>>> --- a/security/selinux/hooks.c
>>>> +++ b/security/selinux/hooks.c
>>>> @@ -4602,10 +4602,16 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>>                  * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>>>                  */
>>>>                 switch (address->sa_family) {
>>>> +               case AF_UNSPEC:
>>>>                 case AF_INET:
>>>>                         if (addrlen < sizeof(struct sockaddr_in))
>>>>                                 return -EINVAL;
>>>>                         addr4 = (struct sockaddr_in *)address;
>>>> +
>>>> +                       if (address->sa_family == AF_UNSPEC &&
>>>> +                           addr4->sin_addr.s_addr != htonl(INADDR_ANY))
>>>> +                               return -EAFNOSUPPORT;
>>>> +
>>>>                         snum = ntohs(addr4->sin_port);
>>>>                         addrp = (char *)&addr4->sin_addr.s_addr;
>>>>                         break;
>>>> @@ -4681,10 +4687,10 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>>                 ad.u.net->sport = htons(snum);
>>>>                 ad.u.net->family = family;
>>>>
>>>> -               if (address->sa_family == AF_INET)
>>>> -                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>>> -               else
>>>> +               if (address->sa_family == AF_INET6)
>>>>                         ad.u.net->v6info.saddr = addr6->sin6_addr;
>>>> +               else
>>>> +                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>>>
>>>>                 err = avc_has_perm(&selinux_state,
>>>>                                    sksec->sid, sid,
>>>> --
>>>> 1.8.3.1
>>>>
>>>
>>
> 
> 
> 

^ permalink raw reply

* [PATCH] can: hi311x: Acquire SPI lock on ->do_get_berr_counter
From: Lukas Wunner @ 2018-05-09 12:38 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev
  Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick, Stef Walter,
	Karel Zak

hi3110_get_berr_counter() may run concurrently to the rest of the driver
but neglects to acquire the lock protecting access to the SPI device.
As a result, it and the rest of the driver may clobber each other's tx
and rx buffers.

We became aware of this issue because transmission of packets with
"cangen -g 0 -i -x" frequently hung.  It turns out that agetty executes
->do_get_berr_counter every few seconds via the following call stack:

    CPU: 2 PID: 1605 Comm: agetty
    [<7f3f7500>] (hi3110_get_berr_counter [hi311x])
    [<7f130204>] (can_fill_info [can_dev])
    [<80693bc0>] (rtnl_fill_ifinfo)
    [<806949ec>] (rtnl_dump_ifinfo)
    [<806b4834>] (netlink_dump)
    [<806b4bc8>] (netlink_recvmsg)
    [<8065f180>] (sock_recvmsg)
    [<80660f90>] (___sys_recvmsg)
    [<80661e7c>] (__sys_recvmsg)
    [<80661ec0>] (SyS_recvmsg)
    [<80108b20>] (ret_fast_syscall+0x0/0x1c)

agetty listens to netlink messages in order to update the login prompt
when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e36deb6424e8

It's a useful feature, though it seems questionable that it causes CAN
bit error statistics to be queried.

Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
occur:

    => 12 00             (hi3110_get_berr_counter() wanted to transmit
                          EC 00 to query the transmit error counter,
                          but the first byte was overwritten by
                          hi3110_hw_tx_frame())

    => EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
                          frame, but the first byte was overwritten by
                          hi3110_get_berr_counter() because it wanted
                          to query the receive error counter)

This sequence hangs the transmission because the driver believes it has
sent a frame and waits for the interrupt signaling completion, but in
reality the chip has never sent away the frame since the commands it
received were malformed.

Fix by acquiring the SPI lock in hi3110_get_berr_counter().

I've scrutinized the entire driver for further unlocked SPI accesses but
found no others.

Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: Stef Walter <stefw@redhat.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/can/spi/hi311x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index 5590c559a8ca..c2cf254e4e95 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct net_device *net,
 	struct hi3110_priv *priv = netdev_priv(net);
 	struct spi_device *spi = priv->spi;
 
+	mutex_lock(&priv->hi3110_lock);
 	bec->txerr = hi3110_read(spi, HI3110_READ_TEC);
 	bec->rxerr = hi3110_read(spi, HI3110_READ_REC);
+	mutex_unlock(&priv->hi3110_lock);
 
 	return 0;
 }
-- 
2.17.0

^ permalink raw reply related

* [PATCH] can: hi311x: Work around TX complete interrupt erratum
From: Lukas Wunner @ 2018-05-09 12:43 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev
  Cc: Mathias Duckeck, Akshay Bhat, Casey Fitzpatrick

When sending packets as fast as possible using "cangen -g 0 -i -x", the
HI-3110 occasionally latches the interrupt pin high on completion of a
packet, but doesn't set the TXCPLT bit in the INTF register.  The INTF
register contains 0x00 as if no interrupt has occurred.  Even waiting
for a few milliseconds after the interrupt doesn't help.

Work around this apparent erratum by instead checking the TXMTY bit in
the STATF register ("TX FIFO empty").  We know that we've queued up a
packet for transmission if priv->tx_len is nonzero.  If the TX FIFO is
empty, transmission of that packet must have completed.

Note that this is congruent with our handling of received packets, which
likewise gleans from the STATF register whether a packet is waiting in
the RX FIFO, instead of looking at the INTF register.

Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/can/spi/hi311x.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index c2cf254e4e95..53e320c92a8b 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -91,6 +91,7 @@
 #define HI3110_STAT_BUSOFF BIT(2)
 #define HI3110_STAT_ERRP BIT(3)
 #define HI3110_STAT_ERRW BIT(4)
+#define HI3110_STAT_TXMTY BIT(7)
 
 #define HI3110_BTR0_SJW_SHIFT 6
 #define HI3110_BTR0_BRP_SHIFT 0
@@ -737,10 +738,7 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 		}
 
-		if (intf == 0)
-			break;
-
-		if (intf & HI3110_INT_TXCPLT) {
+		if (priv->tx_len && statf & HI3110_STAT_TXMTY) {
 			net->stats.tx_packets++;
 			net->stats.tx_bytes += priv->tx_len - 1;
 			can_led_event(net, CAN_LED_EVENT_TX);
@@ -750,6 +748,9 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 			netif_wake_queue(net);
 		}
+
+		if (intf == 0)
+			break;
 	}
 	mutex_unlock(&priv->hi3110_lock);
 	return IRQ_HANDLED;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH rdma-next] MAINTAINERS: Remove bouncing @mellanox.com addresses
From: Doug Ledford @ 2018-05-09 13:08 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, netdev
In-Reply-To: <20180503183746.7629-1-leon@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On Thu, 2018-05-03 at 21:37 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Delete non-existent @mellanox.com addresses from MAINTAINERS file.
> 
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Thanks, applied to for-rc.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net] cxgb4: zero the HMA memory
From: Ganesh Goudar @ 2018-05-09 13:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar

firmware expects HMA memory to be zeroed, use __GFP_ZERO
for HMA memory allocation.

Fixes: 8b4e6b3ca2ed ("cxgb4: Add HMA support")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 24d2865..c3ae575 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3433,8 +3433,8 @@ static int adap_config_hma(struct adapter *adapter)
 	sgl = adapter->hma.sgt->sgl;
 	node = dev_to_node(adapter->pdev_dev);
 	for_each_sg(sgl, iter, sgt->orig_nents, i) {
-		newpage = alloc_pages_node(node, __GFP_NOWARN | GFP_KERNEL,
-					   page_order);
+		newpage = alloc_pages_node(node, __GFP_NOWARN | GFP_KERNEL |
+					   __GFP_ZERO, page_order);
 		if (!newpage) {
 			dev_err(adapter->pdev_dev,
 				"Not enough memory for HMA page allocation\n");
-- 
2.1.0

^ permalink raw reply related

* [bpf-next PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
From: Jesper Dangaard Brouer @ 2018-05-09 13:12 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson

This patchset change ndo_xdp_xmit API to take a bulk of xdp frames.

When kernel is compiled with CONFIG_RETPOLINE, every indirect function
pointer (branch) call hurts performance. For XDP this have a huge
negative performance impact.

This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but
also prepares for further optimizations.  The DMA APIs use of indirect
function pointer calls is the primary source the regression.  It is
left for a followup patchset, to use bulking calls towards the DMA API
(via the scatter-gatter calls).

The other advantage of this API change is that drivers can easier
amortize the cost of any sync/locking scheme, over the bulk of
packets.  The assumption of the current API is that the driver
implemementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system.  Which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues.  E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. tun driver chooses to use a per XDP frame producer lock
modulo smp_processor_id over avail queues.

---

Jesper Dangaard Brouer (4):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      xdp: change ndo_xdp_xmit API to support bulking


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++-
 drivers/net/tun.c                             |   37 ++++---
 drivers/net/virtio_net.c                      |   66 +++++++++---
 include/linux/bpf.h                           |   15 ++-
 include/linux/netdevice.h                     |   14 ++-
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   50 +++++++++
 kernel/bpf/devmap.c                           |  134 ++++++++++++++++++++++++-
 net/core/filter.c                             |   19 +---
 net/core/xdp.c                                |   14 ++-
 samples/bpf/xdp_monitor_kern.c                |   49 +++++++++
 samples/bpf/xdp_monitor_user.c                |   69 +++++++++++++
 14 files changed, 436 insertions(+), 81 deletions(-)

^ permalink raw reply

* [bpf-next PATCH 1/4] bpf: devmap introduce dev_map_enqueue
From: Jesper Dangaard Brouer @ 2018-05-09 13:12 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Functionality is the same, but the ndo_xdp_xmit call is now
simply invoked from inside the devmap.c code.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h        |   13 ++++++++++---
 include/trace/events/xdp.h |    9 ++++++++-
 kernel/bpf/devmap.c        |   37 +++++++++++++++++++++++++++++++------
 net/core/filter.c          |   15 ++-------------
 4 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 321969da67b7..6457343f6dfa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -485,14 +485,15 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 
 /* Map specifics */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
+struct xdp_buff;
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
 void __cpu_map_flush(struct bpf_map *map);
-struct xdp_buff;
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 
@@ -571,6 +572,13 @@ static inline void __dev_map_flush(struct bpf_map *map)
 {
 }
 
+struct xdp_buff;
+static inline
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	return 0;
+}
+
 static inline
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -585,7 +593,6 @@ static inline void __cpu_map_flush(struct bpf_map *map)
 {
 }
 
-struct xdp_buff;
 static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
 				  struct xdp_buff *xdp,
 				  struct net_device *dev_rx)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 8989a92c571a..96104610d40e 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -138,11 +138,18 @@ DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
 		  __entry->map_id, __entry->map_index)
 );
 
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+	struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
 #define devmap_ifindex(fwd, map)				\
 	(!fwd ? 0 :						\
 	 (!map ? 0 :						\
 	  ((map->map_type == BPF_MAP_TYPE_DEVMAP) ?		\
-	   ((struct net_device *)fwd)->ifindex : 0)))
+	   ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)))
 
 #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)		\
 	 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map),	\
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 565f9ece9115..808808bf2bf2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -48,18 +48,21 @@
  * calls will fail at this point.
  */
 #include <linux/bpf.h>
+#include <net/xdp.h>
 #include <linux/filter.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+/* objects in the map */
 struct bpf_dtab_netdev {
-	struct net_device *dev;
+	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
 	struct rcu_head rcu;
 };
 
+/* bpf map container */
 struct bpf_dtab {
 	struct bpf_map map;
 	struct bpf_dtab_netdev **netdev_map;
@@ -240,21 +243,43 @@ void __dev_map_flush(struct bpf_map *map)
  * update happens in parallel here a dev_put wont happen until after reading the
  * ifindex.
  */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_dtab_netdev *dev;
+	struct bpf_dtab_netdev *obj;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	dev = READ_ONCE(dtab->netdev_map[key]);
-	return dev ? dev->dev : NULL;
+	obj = READ_ONCE(dtab->netdev_map[key]);
+	return obj;
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	struct net_device *dev = dst->dev;
+	struct xdp_frame *xdpf;
+	int err;
+
+	if (!dev->netdev_ops->ndo_xdp_xmit)
+		return -EOPNOTSUPP;
+
+	xdpf = convert_to_xdp_frame(xdp);
+	if (unlikely(!xdpf))
+		return -EOVERFLOW;
+
+	/* TODO: implement a bulking/enqueue step later */
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct net_device *dev = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct net_device *dev = dev = obj ? obj->dev : NULL;
 
 	return dev ? &dev->ifindex : NULL;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 6877426c23a6..258d243099e1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3020,20 +3020,9 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP: {
-		struct net_device *dev = fwd;
-		struct xdp_frame *xdpf;
+		struct bpf_dtab_netdev *dst = fwd;
 
-		if (!dev->netdev_ops->ndo_xdp_xmit)
-			return -EOPNOTSUPP;
-
-		xdpf = convert_to_xdp_frame(xdp);
-		if (unlikely(!xdpf))
-			return -EOVERFLOW;
-
-		/* TODO: move to inside map code instead, for bulk support
-		 * err = dev_map_enqueue(dev, xdp);
-		 */
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		err = dev_map_enqueue(dst, xdp);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);

^ permalink raw reply related

* [bpf-next PATCH 2/4] bpf: devmap prepare xdp frames for bulking
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Like cpumap create queue for xdp frames that will be bulked.  For now,
this patch simply invoke ndo_xdp_xmit foreach frame.  This happens,
either when the map flush operation is envoked, or when the limit
DEV_MAP_BULK_SIZE is reached.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/devmap.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 808808bf2bf2..cab72c100bb5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -54,11 +54,18 @@
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+#define DEV_MAP_BULK_SIZE 16
+struct xdp_bulk_queue {
+	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	unsigned int count;
+};
+
 /* objects in the map */
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
+	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
 };
 
@@ -209,6 +216,38 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 	__set_bit(bit, bitmap);
 }
 
+static int bq_xmit_all(struct bpf_dtab_netdev *obj,
+			 struct xdp_bulk_queue *bq)
+{
+	unsigned int processed = 0, drops = 0;
+	struct net_device *dev = obj->dev;
+	int i;
+
+	if (unlikely(!bq->count))
+		return 0;
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		prefetch(xdpf);
+	}
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+		int err;
+
+		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		if (err) {
+			drops++;
+			xdp_return_frame(xdpf);
+		}
+		processed++;
+	}
+	bq->count = 0;
+
+	return 0;
+}
+
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
  * from the driver before returning from its napi->poll() routine. The poll()
  * routine is called either from busy_poll context or net_rx_action signaled
@@ -224,6 +263,7 @@ void __dev_map_flush(struct bpf_map *map)
 
 	for_each_set_bit(bit, bitmap, map->max_entries) {
 		struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
+		struct xdp_bulk_queue *bq;
 		struct net_device *netdev;
 
 		/* This is possible if the dev entry is removed by user space
@@ -233,6 +273,9 @@ void __dev_map_flush(struct bpf_map *map)
 			continue;
 
 		__clear_bit(bit, bitmap);
+
+		bq = this_cpu_ptr(dev->bulkq);
+		bq_xmit_all(dev, bq);
 		netdev = dev->dev;
 		if (likely(netdev->netdev_ops->ndo_xdp_flush))
 			netdev->netdev_ops->ndo_xdp_flush(netdev);
@@ -255,6 +298,20 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	return obj;
 }
 
+/* Runs under RCU-read-side, plus in softirq under NAPI protection.
+ * Thus, safe percpu variable access.
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+{
+	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+		bq_xmit_all(obj, bq);
+
+	bq->q[bq->count++] = xdpf;
+	return 0;
+}
+
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 {
 	struct net_device *dev = dst->dev;
@@ -268,8 +325,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	/* TODO: implement a bulking/enqueue step later */
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	err = bq_enqueue(dst, xdpf);
 	if (err)
 		return err;
 
@@ -288,13 +344,18 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
 {
 	if (dev->dev->netdev_ops->ndo_xdp_flush) {
 		struct net_device *fl = dev->dev;
+		struct xdp_bulk_queue *bq;
 		unsigned long *bitmap;
+
 		int cpu;
 
 		for_each_online_cpu(cpu) {
 			bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
 			__clear_bit(dev->bit, bitmap);
 
+			bq = per_cpu_ptr(dev->bulkq, cpu);
+			bq_xmit_all(dev, bq);
+
 			fl->netdev_ops->ndo_xdp_flush(dev->dev);
 		}
 	}
@@ -306,6 +367,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
 	dev_map_flush_old(dev);
+	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -338,6 +400,7 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct net *net = current->nsproxy->net_ns;
+	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev, *old_dev;
 	u32 i = *(u32 *)key;
 	u32 ifindex = *(u32 *)value;
@@ -352,11 +415,17 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (!ifindex) {
 		dev = NULL;
 	} else {
-		dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
-				   map->numa_node);
+		dev = kmalloc_node(sizeof(*dev), gfp, map->numa_node);
 		if (!dev)
 			return -ENOMEM;
 
+		dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
+						sizeof(void *), gfp);
+		if (!dev->bulkq) {
+			kfree(dev);
+			return -ENOMEM;
+		}
+
 		dev->dev = dev_get_by_index(net, ifindex);
 		if (!dev->dev) {
 			kfree(dev);

^ permalink raw reply related

* [bpf-next PATCH 3/4] xdp: add tracepoint for devmap like cpumap have
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

Notice how this allow us get XDP statistic without affecting the XDP
performance, as tracepoint is no-longer activated on a per packet basis.

The xdp_monitor sample/tool is updated to use this new tracepoint.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h            |    6 ++++-
 include/trace/events/xdp.h     |   39 +++++++++++++++++++++++++++++++++++
 kernel/bpf/devmap.c            |   25 ++++++++++++++++++-----
 net/core/filter.c              |    2 +-
 samples/bpf/xdp_monitor_kern.c |   39 +++++++++++++++++++++++++++++++++++
 samples/bpf/xdp_monitor_user.c |   44 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6457343f6dfa..066b5c67c71f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -489,7 +489,8 @@ struct xdp_buff;
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
@@ -574,7 +575,8 @@ static inline void __dev_map_flush(struct bpf_map *map)
 
 struct xdp_buff;
 static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	return 0;
 }
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 96104610d40e..2e9ef0650144 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -229,6 +229,45 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 		  __entry->to_cpu)
 );
 
+TRACE_EVENT(xdp_devmap_xmit,
+
+	TP_PROTO(const struct bpf_map *map, u32 map_index,
+		 int sent, int drops,
+		 const struct net_device *from_dev,
+		 const struct net_device *to_dev),
+
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+
+	TP_STRUCT__entry(
+		__field(int, map_id)
+		__field(u32, act)
+		__field(u32, map_index)
+		__field(int, drops)
+		__field(int, sent)
+		__field(int, from_ifindex)
+		__field(int, to_ifindex)
+	),
+
+	TP_fast_assign(
+		__entry->map_id		= map->id;
+		__entry->act		= XDP_REDIRECT;
+		__entry->map_index	= map_index;
+		__entry->drops		= drops;
+		__entry->sent		= sent;
+		__entry->from_ifindex	= from_dev->ifindex;
+		__entry->to_ifindex	= to_dev->ifindex;
+	),
+
+	TP_printk("ndo_xdp_xmit"
+		  " map_id=%d map_index=%d action=%s"
+		  " sent=%d drops=%d"
+		  " from_ifindex=%d to_ifindex=%d",
+		  __entry->map_id, __entry->map_index,
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->sent, __entry->drops,
+		  __entry->from_ifindex, __entry->to_ifindex)
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include <trace/define_trace.h>
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cab72c100bb5..6f84100723b0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,7 @@
 #include <linux/bpf.h>
 #include <net/xdp.h>
 #include <linux/filter.h>
+#include <trace/events/xdp.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -57,6 +58,7 @@
 #define DEV_MAP_BULK_SIZE 16
 struct xdp_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	struct net_device *dev_rx;
 	unsigned int count;
 };
 
@@ -219,8 +221,8 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
-	unsigned int processed = 0, drops = 0;
 	struct net_device *dev = obj->dev;
+	int sent = 0, drops = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -241,10 +243,13 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			drops++;
 			xdp_return_frame(xdpf);
 		}
-		processed++;
+		sent++;
 	}
 	bq->count = 0;
 
+	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
+			      sent, drops, bq->dev_rx, dev);
+	bq->dev_rx = NULL;
 	return 0;
 }
 
@@ -301,18 +306,28 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+		      struct net_device *dev_rx)
+
 {
 	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(obj, bq);
 
+	/* Ingress dev_rx will be the same for all xdp_frame's in
+	 * bulk_queue, because bq stored per-CPU and must be flushed
+	 * from net_device drivers NAPI func end.
+	 */
+	if (!bq->dev_rx)
+		bq->dev_rx = dev_rx;
+
 	bq->q[bq->count++] = xdpf;
 	return 0;
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	struct net_device *dev = dst->dev;
 	struct xdp_frame *xdpf;
@@ -325,7 +340,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = bq_enqueue(dst, xdpf);
+	err = bq_enqueue(dst, xdpf, dev_rx);
 	if (err)
 		return err;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 258d243099e1..8b7924368dc1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3022,7 +3022,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	case BPF_MAP_TYPE_DEVMAP: {
 		struct bpf_dtab_netdev *dst = fwd;
 
-		err = dev_map_enqueue(dst, xdp);
+		err = dev_map_enqueue(dst, xdp, dev_rx);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 211db8ded0de..2854aa0665ea 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -208,3 +208,42 @@ int trace_xdp_cpumap_kthread(struct cpumap_kthread_ctx *ctx)
 
 	return 0;
 }
+
+struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_devmap_xmit/format
+ * Code in:         kernel/include/trace/events/xdp.h
+ */
+struct devmap_xmit_ctx {
+	u64 __pad;		// First 8 bytes are not accessible by bpf code
+	int map_id;		//	offset:8;  size:4; signed:1;
+	u32 act;		//	offset:12; size:4; signed:0;
+	u32 map_index;		//	offset:16; size:4; signed:0;
+	int drops;		//	offset:20; size:4; signed:1;
+	int sent;		//	offset:24; size:4; signed:1;
+	int from_ifindex;	//	offset:28; size:4; signed:1;
+	int to_ifindex;		//	offset:32; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_devmap_xmit")
+int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
+{
+	struct datarec *rec;
+	u32 key = 0;
+
+	rec = bpf_map_lookup_elem(&devmap_xmit_cnt, &key);
+	if (!rec)
+		return 0;
+	rec->processed += ctx->sent;
+	rec->dropped   += ctx->drops;
+
+	/* Record bulk events, then userspace can calc average bulk size */
+	rec->info += 1;
+
+	return 1;
+}
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..4aaf1ab1927d 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -141,6 +141,7 @@ struct stats_record {
 	struct record_u64 xdp_exception[XDP_ACTION_MAX];
 	struct record xdp_cpumap_kthread;
 	struct record xdp_cpumap_enqueue[MAX_CPUS];
+	struct record xdp_devmap_xmit;
 };
 
 static bool map_collect_record(int fd, __u32 key, struct record *rec)
@@ -397,7 +398,7 @@ static void stats_print(struct stats_record *stats_rec,
 			info = calc_info(r, p, t);
 			if (info > 0)
 				i_str = "sched";
-			if (pps > 0)
+			if (pps > 0 || drop > 0)
 				printf(fmt1, "cpumap-kthread",
 				       i, pps, drop, info, i_str);
 		}
@@ -409,6 +410,42 @@ static void stats_print(struct stats_record *stats_rec,
 		printf(fmt2, "cpumap-kthread", "total", pps, drop, info, i_str);
 	}
 
+	/* devmap ndo_xdp_xmit stats */
+	{
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		struct record *rec, *prev;
+		double drop, info;
+		char *i_str = "";
+
+		rec  =  &stats_rec->xdp_devmap_xmit;
+		prev = &stats_prev->xdp_devmap_xmit;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop(r, p, t);
+			info = calc_info(r, p, t);
+			if (info > 0) {
+				i_str = "bulk-average";
+				info = (pps+drop) / info; /* calc avg bulk */
+			}
+			if (pps > 0 || drop > 0)
+				printf(fmt1, "devmap-xmit",
+				       i, pps, drop, info, i_str);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop(&rec->total, &prev->total, t);
+		info = calc_info(&rec->total, &prev->total, t);
+		if (info > 0) {
+			i_str = "bulk-average";
+			info = (pps+drop) / info; /* calc avg bulk */
+		}
+		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+	}
+
 	printf("\n");
 }
 
@@ -437,6 +474,9 @@ static bool stats_collect(struct stats_record *rec)
 	fd = map_data[3].fd; /* map3: cpumap_kthread_cnt */
 	map_collect_record(fd, 0, &rec->xdp_cpumap_kthread);
 
+	fd = map_data[4].fd; /* map4: devmap_xmit_cnt */
+	map_collect_record(fd, 0, &rec->xdp_devmap_xmit);
+
 	return true;
 }
 
@@ -480,6 +520,7 @@ static struct stats_record *alloc_stats_record(void)
 
 	rec_sz = sizeof(struct datarec);
 	rec->xdp_cpumap_kthread.cpu = alloc_rec_per_cpu(rec_sz);
+	rec->xdp_devmap_xmit.cpu    = alloc_rec_per_cpu(rec_sz);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		rec->xdp_cpumap_enqueue[i].cpu = alloc_rec_per_cpu(rec_sz);
@@ -498,6 +539,7 @@ static void free_stats_record(struct stats_record *r)
 		free(r->xdp_exception[i].cpu);
 
 	free(r->xdp_cpumap_kthread.cpu);
+	free(r->xdp_devmap_xmit.cpu);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		free(r->xdp_cpumap_enqueue[i].cpu);

^ permalink raw reply related

* [bpf-next PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: Jesper Dangaard Brouer @ 2018-05-09 13:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson
In-Reply-To: <152587152136.20423.14493673928480468024.stgit@firesoul>

This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.

When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.

Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
 drivers/net/tun.c                             |   37 +++++++++-----
 drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
 include/linux/netdevice.h                     |   14 +++--
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   10 ++--
 kernel/bpf/devmap.c                           |   33 ++++++++-----
 net/core/filter.c                             |    4 +-
 net/core/xdp.c                                |   14 ++++-
 samples/bpf/xdp_monitor_kern.c                |   10 ++++
 samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
 13 files changed, 197 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 87fb27ab9c24..27e69fdd0a11 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3686,14 +3686,19 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
  * @dev: netdev
  * @xdp: XDP buffer
  *
- * Returns Zero if sent, else an error code
+ * Returns number of frames successfully sent. Frames that fail are
+ * free'ed via XDP return API.
+ *
+ * For error cases, a negative errno code is returned and no-frames
+ * are transmitted (caller must handle freeing frames).
  **/
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	unsigned int queue_index = smp_processor_id();
 	struct i40e_vsi *vsi = np->vsi;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (test_bit(__I40E_VSI_DOWN, vsi->state))
 		return -ENETDOWN;
@@ -3701,11 +3706,18 @@ int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
 		return -ENXIO;
 
-	err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
-	if (err != I40E_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
+		if (err != I40E_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 4bf318b8be85..da6697c0407f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -511,7 +511,7 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw);
 void i40e_detect_recover_hung(struct i40e_vsi *vsi);
 int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
 bool __i40e_chk_linearize(struct sk_buff *skb);
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf);
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames);
 void i40e_xdp_flush(struct net_device *dev);
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b6e5cea84949..e64f71bc04c2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10042,11 +10042,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
-static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int ixgbe_xdp_xmit(struct net_device *dev, int n,
+			  struct xdp_frame **frames)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ring *ring;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
 		return -ENETDOWN;
@@ -10058,11 +10060,18 @@ static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (unlikely(!ring))
 		return -ENXIO;
 
-	err = ixgbe_xmit_xdp_ring(adapter, xdpf);
-	if (err != IXGBE_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		if (err != IXGBE_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 static void ixgbe_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d3c04ab9752a..4fe0c75c5e0b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,6 +70,7 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
+#include <net/xdp.h>
 #include <linux/seq_file.h>
 #include <linux/uio.h>
 #include <linux/skb_array.h>
@@ -1290,34 +1291,44 @@ static const struct net_device_ops tun_netdev_ops = {
 	.ndo_get_stats64	= tun_net_get_stats64,
 };
 
-static int tun_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
+static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	u32 numqueues;
-	int ret = 0;
+	int drops = 0;
+	int cnt = n;
+	int i;
 
 	rcu_read_lock();
 
 	numqueues = READ_ONCE(tun->numqueues);
 	if (!numqueues) {
-		ret = -ENOSPC;
-		goto out;
+		rcu_read_unlock();
+		return -ENXIO; /* Caller will free/return all frames */
 	}
 
 	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
 					    numqueues]);
-	/* Encode the XDP flag into lowest bit for consumer to differ
-	 * XDP buffer from sk_buff.
-	 */
-	if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(frame))) {
-		this_cpu_inc(tun->pcpu_stats->tx_dropped);
-		ret = -ENOSPC;
+
+	spin_lock(&tfile->tx_ring.producer_lock);
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdp = frames[i];
+		/* Encode the XDP flag into lowest bit for consumer to differ
+		 * XDP buffer from sk_buff.
+		 */
+		void *frame = tun_xdp_to_ptr(xdp);
+
+		if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
+			this_cpu_inc(tun->pcpu_stats->tx_dropped);
+			xdp_return_frame_rx_napi(xdp);
+			drops++;
+		}
 	}
+	spin_unlock(&tfile->tx_ring.producer_lock);
 
-out:
 	rcu_read_unlock();
-	return ret;
+	return cnt - drops;
 }
 
 static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
@@ -1327,7 +1338,7 @@ static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
 	if (unlikely(!frame))
 		return -EOVERFLOW;
 
-	return tun_xdp_xmit(dev, frame);
+	return tun_xdp_xmit(dev, 1, &frame);
 }
 
 static void tun_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f34794a76c4d..39a0783d1cde 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -419,23 +419,13 @@ static void virtnet_xdp_flush(struct net_device *dev)
 	virtqueue_kick(sq->vq);
 }
 
-static int __virtnet_xdp_xmit(struct virtnet_info *vi,
-			       struct xdp_frame *xdpf)
+static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
+				   struct send_queue *sq,
+				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	struct xdp_frame *xdpf_sent;
-	struct send_queue *sq;
-	unsigned int len;
-	unsigned int qp;
 	int err;
 
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
-	sq = &vi->sq[qp];
-
-	/* Free up any pending old buffers before queueing new ones. */
-	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
-		xdp_return_frame(xdpf_sent);
-
 	/* virtqueue want to use data area in-front of packet */
 	if (unlikely(xdpf->metasize > 0))
 		return -EOPNOTSUPP;
@@ -459,11 +449,40 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi,
 	return 0;
 }
 
-static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int __virtnet_xdp_tx_xmit(struct virtnet_info *vi,
+				   struct xdp_frame *xdpf)
+{
+	struct xdp_frame *xdpf_sent;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	return __virtnet_xdp_xmit_one(vi, sq, xdpf);
+}
+
+static int virtnet_xdp_xmit(struct net_device *dev,
+			    int n, struct xdp_frame **frames)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct receive_queue *rq = vi->rq;
+	struct xdp_frame *xdpf_sent;
 	struct bpf_prog *xdp_prog;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+	int drops = 0;
+	int err;
+	int i;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
 
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
 	 * indicate XDP resources have been successfully allocated.
@@ -472,7 +491,20 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!xdp_prog)
 		return -ENXIO;
 
-	return __virtnet_xdp_xmit(vi, xdpf);
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+
+		err = __virtnet_xdp_xmit_one(vi, sq, xdpf);
+		if (err) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+	return n - drops;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -616,7 +648,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
@@ -779,7 +811,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				if (unlikely(xdp_page != page))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a30435118530..1c92f3b63c70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1165,9 +1165,13 @@ struct dev_ifalias {
  *	This function is used to set or query state related to XDP on the
  *	netdevice and manage BPF offload. See definition of
  *	enum bpf_netdev_command for details.
- * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp);
- *	This function is used to submit a XDP packet for transmit on a
- *	netdevice.
+ * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
+ *	This function is used to submit @n XDP packets for transmit on a
+ *	netdevice. Returns number of frames successfully transmitted, frames
+ *	that got dropped are freed/returned via xdp_return_frame().
+ *	Returns negative number, means general error invoking ndo, meaning
+ *	no frames were xmit'ed and core-caller will free all frames.
+ *	TODO: Consider add flag to allow sending flush operation.
  * void (*ndo_xdp_flush)(struct net_device *dev);
  *	This function is used to inform the driver to flush a particular
  *	xdp tx queue. Must be called on same CPU as xdp_xmit.
@@ -1355,8 +1359,8 @@ struct net_device_ops {
 						       int needed_headroom);
 	int			(*ndo_bpf)(struct net_device *dev,
 					   struct netdev_bpf *bpf);
-	int			(*ndo_xdp_xmit)(struct net_device *dev,
-						struct xdp_frame *xdp);
+	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
+						struct xdp_frame **xdp);
 	void			(*ndo_xdp_flush)(struct net_device *dev);
 };
 
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0b689cf561c7..7ad779237ae8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 }
 
 void xdp_return_frame(struct xdp_frame *xdpf);
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 2e9ef0650144..1ecf4c67fcf7 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -234,9 +234,9 @@ TRACE_EVENT(xdp_devmap_xmit,
 	TP_PROTO(const struct bpf_map *map, u32 map_index,
 		 int sent, int drops,
 		 const struct net_device *from_dev,
-		 const struct net_device *to_dev),
+		 const struct net_device *to_dev, int err),
 
-	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
 
 	TP_STRUCT__entry(
 		__field(int, map_id)
@@ -246,6 +246,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__field(int, sent)
 		__field(int, from_ifindex)
 		__field(int, to_ifindex)
+		__field(int, err)
 	),
 
 	TP_fast_assign(
@@ -256,16 +257,17 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__entry->sent		= sent;
 		__entry->from_ifindex	= from_dev->ifindex;
 		__entry->to_ifindex	= to_dev->ifindex;
+		__entry->err		= err;
 	),
 
 	TP_printk("ndo_xdp_xmit"
 		  " map_id=%d map_index=%d action=%s"
 		  " sent=%d drops=%d"
-		  " from_ifindex=%d to_ifindex=%d",
+		  " from_ifindex=%d to_ifindex=%d err=%d",
 		  __entry->map_id, __entry->map_index,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->sent, __entry->drops,
-		  __entry->from_ifindex, __entry->to_ifindex)
+		  __entry->from_ifindex, __entry->to_ifindex, __entry->err)
 );
 
 #endif /* _TRACE_XDP_H */
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 6f84100723b0..4dd8f0e3a8d9 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -222,7 +222,7 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
 	struct net_device *dev = obj->dev;
-	int sent = 0, drops = 0;
+	int sent = 0, drops = 0, err = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -234,23 +234,32 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 		prefetch(xdpf);
 	}
 
-	for (i = 0; i < bq->count; i++) {
-		struct xdp_frame *xdpf = bq->q[i];
-		int err;
-
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-		if (err) {
-			drops++;
-			xdp_return_frame(xdpf);
-		}
-		sent++;
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
+	if (sent < 0) {
+		err = sent;
+		sent = 0;
+		goto error;
 	}
+	drops = bq->count - sent;
+out:
 	bq->count = 0;
 
 	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
-			      sent, drops, bq->dev_rx, dev);
+			      sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	return 0;
+error:
+	/* If ndo_xdp_xmit fails with an errno, no frames have been
+	 * xmit'ed and it's our responsibility to them free all.
+	 */
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		/* RX path under NAPI protection, can return frames faster */
+		xdp_return_frame_rx_napi(xdpf);
+		drops++;
+	}
+	goto out;
 }
 
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
diff --git a/net/core/filter.c b/net/core/filter.c
index 8b7924368dc1..547174d37f66 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3004,8 +3004,8 @@ static int __bpf_tx_xdp(struct net_device *dev,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-	if (err)
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf);
+	if (err <= 0)
 		return err;
 	dev->netdev_ops->ndo_xdp_flush(dev);
 	return 0;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bf6758f74339..237d374e6ef7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -308,7 +308,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
 
-static void xdp_return(void *data, struct xdp_mem_info *mem)
+static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -320,7 +320,7 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 		page = virt_to_head_page(data);
 		if (xa)
-			page_pool_put_page(xa->page_pool, page);
+			__page_pool_put_page(xa->page_pool, page, napi_direct);
 		else
 			put_page(page);
 		rcu_read_unlock();
@@ -340,12 +340,18 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	xdp_return(xdpf->data, &xdpf->mem);
+	__xdp_return(xdpf->data, &xdpf->mem, false);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
+{
+	__xdp_return(xdpf->data, &xdpf->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
+
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	xdp_return(xdp->data, &xdp->rxq->mem);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 2854aa0665ea..ad10fe700d7d 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -125,6 +125,7 @@ struct datarec {
 	u64 processed;
 	u64 dropped;
 	u64 info;
+	u64 err;
 };
 #define MAX_CPUS 64
 
@@ -228,6 +229,7 @@ struct devmap_xmit_ctx {
 	int sent;		//	offset:24; size:4; signed:1;
 	int from_ifindex;	//	offset:28; size:4; signed:1;
 	int to_ifindex;		//	offset:32; size:4; signed:1;
+	int err;		//	offset:36; size:4; signed:1;
 };
 
 SEC("tracepoint/xdp/xdp_devmap_xmit")
@@ -245,5 +247,13 @@ int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
 	/* Record bulk events, then userspace can calc average bulk size */
 	rec->info += 1;
 
+	/* Record error cases, where no frame were sent */
+	if (ctx->err)
+		rec->err++;
+
+	/* Catch API error of drv ndo_xdp_xmit sent more than count */
+	if (ctx->drops < 0)
+		rec->err++;
+
 	return 1;
 }
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 4aaf1ab1927d..0ec7967dd32a 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -117,6 +117,7 @@ struct datarec {
 	__u64 processed;
 	__u64 dropped;
 	__u64 info;
+	__u64 err;
 };
 #define MAX_CPUS 64
 
@@ -152,6 +153,7 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 	__u64 sum_processed = 0;
 	__u64 sum_dropped = 0;
 	__u64 sum_info = 0;
+	__u64 sum_err = 0;
 	int i;
 
 	if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
@@ -170,10 +172,13 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 		sum_dropped        += values[i].dropped;
 		rec->cpu[i].info = values[i].info;
 		sum_info        += values[i].info;
+		rec->cpu[i].err = values[i].err;
+		sum_err        += values[i].err;
 	}
 	rec->total.processed = sum_processed;
 	rec->total.dropped   = sum_dropped;
 	rec->total.info      = sum_info;
+	rec->total.err       = sum_err;
 	return true;
 }
 
@@ -274,6 +279,18 @@ static double calc_info(struct datarec *r, struct datarec *p, double period)
 	return pps;
 }
 
+static double calc_err(struct datarec *r, struct datarec *p, double period)
+{
+	__u64 packets = 0;
+	double pps = 0;
+
+	if (period > 0) {
+		packets = r->err - p->err;
+		pps = packets / period;
+	}
+	return pps;
+}
+
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
 			bool err_only)
@@ -412,11 +429,12 @@ static void stats_print(struct stats_record *stats_rec,
 
 	/* devmap ndo_xdp_xmit stats */
 	{
-		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
-		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s %s\n";
 		struct record *rec, *prev;
-		double drop, info;
+		double drop, info, err;
 		char *i_str = "";
+		char *err_str = "";
 
 		rec  =  &stats_rec->xdp_devmap_xmit;
 		prev = &stats_prev->xdp_devmap_xmit;
@@ -428,22 +446,29 @@ static void stats_print(struct stats_record *stats_rec,
 			pps  = calc_pps(r, p, t);
 			drop = calc_drop(r, p, t);
 			info = calc_info(r, p, t);
+			err  = calc_err(r, p, t);
 			if (info > 0) {
 				i_str = "bulk-average";
 				info = (pps+drop) / info; /* calc avg bulk */
 			}
+			if (err > 0)
+				err_str = "drv-err";
 			if (pps > 0 || drop > 0)
 				printf(fmt1, "devmap-xmit",
-				       i, pps, drop, info, i_str);
+				       i, pps, drop, info, i_str, err_str);
 		}
 		pps = calc_pps(&rec->total, &prev->total, t);
 		drop = calc_drop(&rec->total, &prev->total, t);
 		info = calc_info(&rec->total, &prev->total, t);
+		err  = calc_err(&rec->total, &prev->total, t);
 		if (info > 0) {
 			i_str = "bulk-average";
 			info = (pps+drop) / info; /* calc avg bulk */
 		}
-		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+		if (err > 0)
+			err_str = "drv-err";
+		printf(fmt2, "devmap-xmit", "total", pps, drop,
+		       info, i_str, err_str);
 	}
 
 	printf("\n");

^ permalink raw reply related

* [PATCH v2 01/11] docs: can.rst: fix a footnote reference
From: Mauro Carvalho Chehab @ 2018-05-09 13:18 UTC (permalink / raw)
  To: Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, linux-can, netdev
In-Reply-To: <cover.1525870886.git.mchehab+samsung@kernel.org>

As stated at:
	http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes

A footnote should contain either a number, a reference or
an auto number, e. g.:
        [1], [#f1] or [#].

While using [*] accidentaly works for html, it fails for other
document outputs. In particular, it causes an error with LaTeX
output, causing all books after networking to not be built.

So, replace it by a valid syntax.

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/networking/can.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst
index d23c51abf8c6..2fd0b51a8c52 100644
--- a/Documentation/networking/can.rst
+++ b/Documentation/networking/can.rst
@@ -164,7 +164,7 @@ The Linux network devices (by default) just can handle the
 transmission and reception of media dependent frames. Due to the
 arbitration on the CAN bus the transmission of a low prio CAN-ID
 may be delayed by the reception of a high prio CAN frame. To
-reflect the correct [*]_ traffic on the node the loopback of the sent
+reflect the correct [#f1]_ traffic on the node the loopback of the sent
 data has to be performed right after a successful transmission. If
 the CAN network interface is not capable of performing the loopback for
 some reason the SocketCAN core can do this task as a fallback solution.
@@ -175,7 +175,7 @@ networking behaviour for CAN applications. Due to some requests from
 the RT-SocketCAN group the loopback optionally may be disabled for each
 separate socket. See sockopts from the CAN RAW sockets in :ref:`socketcan-raw-sockets`.
 
-.. [*] you really like to have this when you're running analyser
+.. [#f1] you really like to have this when you're running analyser
        tools like 'candump' or 'cansniffer' on the (same) node.
 
 
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next] tun: Do SIOCGSKNS out of rtnl_lock()
From: Jason Wang @ 2018-05-09 13:19 UTC (permalink / raw)
  To: Kirill Tkhai, davem, edumazet, mst, brouer, peterpenkov96, sd,
	netdev
In-Reply-To: <44b2f8b6-7152-8ba6-f26d-d08ed820f980@virtuozzo.com>



On 2018年05月09日 17:00, Kirill Tkhai wrote:
> Hi, Jason,
>
> On 09.05.2018 10:18, Jason Wang wrote:
>>
>> On 2018年05月09日 00:21, Kirill Tkhai wrote:
>>> Since net ns of tun device is assigned on the device creation,
>>> and it never changes, we do not need to use any lock to get it
>>> from alive tun.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> ---
>>>    drivers/net/tun.c |   18 +++++++-----------
>>>    1 file changed, 7 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index d3c04ab9752a..44d4f3d25350 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -2850,10 +2850,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>                    unsigned long arg, int ifreq_len)
>>>    {
>>>        struct tun_file *tfile = file->private_data;
>>> +    struct net *net = sock_net(&tfile->sk);
>>>        struct tun_struct *tun;
>>>        void __user* argp = (void __user*)arg;
>>>        struct ifreq ifr;
>>> -    struct net *net;
>>>        kuid_t owner;
>>>        kgid_t group;
>>>        int sndbuf;
>>> @@ -2877,14 +2877,18 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>             */
>>>            return put_user(IFF_TUN | IFF_TAP | TUN_FEATURES,
>>>                    (unsigned int __user*)argp);
>>> -    } else if (cmd == TUNSETQUEUE)
>>> +    } else if (cmd == TUNSETQUEUE) {
>>>            return tun_set_queue(file, &ifr);
>>> +    } else if (cmd == SIOCGSKNS) {
>> Not for this patch, reusing socket ioctl cmd is probably not good though they were probably not intersected (see ioctl-number.txt). We probably need to introduce TUN specific ioctls for SIOCGSKNS and SIOCGIFHWADDR and warn for socket ones.
> The most of socket ioctl cmds use 0x8900 type:
>
> #define SOCK_IOC_TYPE   0x89
>
> while tun cmd is 5400 ('T'). They should not intersect.
>
> The only exceptions are
>
> #define SIOCINQ         FIONREAD
> #define SIOCOUTQ        TIOCOUTQ
>
> #define TIOCOUTQ        0x5411
> #define FIONREAD        0x541B
>
> But they can't intersect even with exceptions, since tun nr starts from 200:
>
> #define TUNSETNOCSUM  _IOW('T', 200, int)
>
> and 200 > 0x1b (==27).
>
> I implemented SIOCGSKNS cmd in the same style as older socket cmds were used.
> I'm not sure, we can remove existing SIOCGIFHWADDR, since they are already used.

I think it's too late to remove it.

> If we add a warn, which time will we able to remove them? Some old software may
> use it, and in case of the program isn't developed any more, nobody can fix this
> warnings, even if he/she sees them..

I think this give a chance to push new wrote userspace to use new ioctl cmd.

Thanks

>
> Kirill
>
>>> +        if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
>>> +            return -EPERM;
>>> +        return open_related_ns(&net->ns, get_net_ns);
>>> +    }
>>>          ret = 0;
>>>        rtnl_lock();
>>>          tun = tun_get(tfile);
>>> -    net = sock_net(&tfile->sk);
>>>        if (cmd == TUNSETIFF) {
>>>            ret = -EEXIST;
>>>            if (tun)
>>> @@ -2914,14 +2918,6 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>            tfile->ifindex = ifindex;
>>>            goto unlock;
>>>        }
>>> -    if (cmd == SIOCGSKNS) {
>>> -        ret = -EPERM;
>>> -        if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
>>> -            goto unlock;
>>> -
>>> -        ret = open_related_ns(&net->ns, get_net_ns);
>>> -        goto unlock;
>>> -    }
>>>          ret = -EBADFD;
>>>        if (!tun)
>>>

^ permalink raw reply

* Re: [PATCH v6 13/13] Documentation: clarify firmware_class provenance and why we can't rename the module
From: Mauro Carvalho Chehab @ 2018-05-09 13:21 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, akpm, keescook, josh, maco, andy.gross, david.brown,
	bjorn.andersson, teg, wagi, hdegoede, andresx7, zohar, kubakici,
	shuah, mfuzzey, dhowells, pali.rohar, tiwai, kvalo,
	arend.vanspriel, zajec5, nbroeking, markivx, broonie,
	dmitry.torokhov, dwmw2, torvalds, Abhay_Salunke, jewalt, oneukum,
	cantabile.desu, ast, hare, jejb, martin.petersen, khc
In-Reply-To: <20180508181247.19431-14-mcgrof@kernel.org>

Em Tue,  8 May 2018 11:12:47 -0700
"Luis R. Rodriguez" <mcgrof@kernel.org> escreveu:

> Clarify the provenance of the firmware loader firmware_class module name
> and why we cannot rename the module in the future.
> 
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
>  .../driver-api/firmware/fallback-mechanisms.rst          | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/driver-api/firmware/fallback-mechanisms.rst b/Documentation/driver-api/firmware/fallback-mechanisms.rst
> index a39323ef7d29..a8047be4a96e 100644
> --- a/Documentation/driver-api/firmware/fallback-mechanisms.rst
> +++ b/Documentation/driver-api/firmware/fallback-mechanisms.rst
> @@ -72,9 +72,12 @@ the firmware requested, and establishes it in the device hierarchy by
>  associating the device used to make the request as the device's parent.
>  The sysfs directory's file attributes are defined and controlled through
>  the new device's class (firmware_class) and group (fw_dev_attr_groups).
> -This is actually where the original firmware_class.c file name comes from,
> -as originally the only firmware loading mechanism available was the
> -mechanism we now use as a fallback mechanism.
> +This is actually where the original firmware_class module name came from,
> +given that originally the only firmware loading mechanism available was the
> +mechanism we now use as a fallback mechanism, which which registers a
> +struct class firmware_class. Because the attributes exposed are part of the
> +module name, the module name firmware_class cannot be renamed in the future, to
> +ensure backward compatibilty with old userspace.

Ah, now the explanation makes a lot more sense to me :-)

Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

>  
>  To load firmware using the sysfs interface we expose a loading indicator,
>  and a file upload firmware into:



Thanks,
Mauro

^ permalink raw reply

* Please apply 3f1e53abff84c (" netfilter: ebtables: don't attempt to allocate 0-sized compat array") to v4.14.y and later
From: Guenter Roeck @ 2018-05-09 13:26 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: dvyukov@google.com, David S. Miller, Network Development

Hi Greg,

Commit 3f1e53abff84c fixes commit 7d7d7e02111e9 ("("netfilter: compat: reject huge
allocation requests"), which has found its way into 4.14.y and 4.16.y.
This causes syzcaller hiccups at least in 4.14.y (more specifically chromeos-4.14).

Please apply 3f1e53abff84c to v4.14.y and v4.16.y to fix the problem. Copying Dave
and netdev in case he wants to handle it.

Sorry for the noise if this has already been queued.

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH v2 net-next] drivers: net: davinci_mdio: prevent spurious timeout
From: Andrew Lunn @ 2018-05-09 13:30 UTC (permalink / raw)
  To: Sekhar Nori; +Cc: David S . Miller, Grygorii Strashko, linux-omap, netdev
In-Reply-To: <20180509110024.1772-1-nsekhar@ti.com>

On Wed, May 09, 2018 at 04:30:24PM +0530, Sekhar Nori wrote:
> A well timed kernel preemption in the time_after() loop
> in wait_for_idle() can result in a spurious timeout
> error to be returned.
> 
> Fix it by using readl_poll_timeout() which takes care of
> this issue.
> 
> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
> ---
> v2: use readl_poll_timeout() per suggestion from Andrew.
> 
> The issue has not been personally observed by me, but has
> been reported by users. Sending for next-next given the
> non-critical nature. There is seems to be no easy way to
> reproduce this.
> 
>  drivers/net/ethernet/ti/davinci_mdio.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
> index 3c33f4504d8e..d073432a5dbe 100644
> --- a/drivers/net/ethernet/ti/davinci_mdio.c
> +++ b/drivers/net/ethernet/ti/davinci_mdio.c
> @@ -34,6 +34,7 @@
>  #include <linux/clk.h>
>  #include <linux/err.h>
>  #include <linux/io.h>
> +#include <linux/iopoll.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/davinci_emac.h>
>  #include <linux/of.h>
> @@ -227,14 +228,16 @@ static inline int wait_for_user_access(struct davinci_mdio_data *data)
>  static inline int wait_for_idle(struct davinci_mdio_data *data)
>  {
>  	struct davinci_mdio_regs __iomem *regs = data->regs;
> -	unsigned long timeout = jiffies + msecs_to_jiffies(MDIO_TIMEOUT);
> +	u32 val, ret;
>  
> -	while (time_after(timeout, jiffies)) {
> -		if (__raw_readl(&regs->control) & CONTROL_IDLE)
> -			return 0;
> +	ret = readl_poll_timeout(&regs->control, val, val & CONTROL_IDLE,
> +				 0, MDIO_TIMEOUT * 1000);
> +	if (ret) {
> +		dev_err(data->dev, "timed out waiting for idle\n");
> +		return ret;
>  	}
> -	dev_err(data->dev, "timed out waiting for idle\n");
> -	return -ETIMEDOUT;
> +
> +	return 0;
>  }

Hi Sekhar

You could simplify this to:

> +	if (ret)
> +		dev_err(data->dev, "timed out waiting for idle\n");
> +	return ret;

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net] cxgb4: copy mbox log size to PF0-3 adap instances
From: Ganesh Goudar @ 2018-05-09 13:30 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar, Casey Leedom

copy mbox size to adapter instances of PF0-3 to avoid
mbox log overflow. This fixes the possible protection
fault.

Fixes: baf5086840ab ("cxgb4: restructure VF mgmt code")
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 8b34a7b..3f6d204 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5474,6 +5474,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 	spin_lock_init(&adapter->mbox_lock);
 	INIT_LIST_HEAD(&adapter->mlist.list);
+	adapter->mbox_log->size = T4_OS_LOG_MBOX_CMDS;
 	pci_set_drvdata(pdev, adapter);
 
 	if (func != ent->driver_data) {
@@ -5508,8 +5509,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto out_free_adapter;
 	}
 
-	adapter->mbox_log->size = T4_OS_LOG_MBOX_CMDS;
-
 	/* PCI device has been enabled */
 	adapter->flags |= DEV_ENABLED;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
From: Andrew Lunn @ 2018-05-09 13:43 UTC (permalink / raw)
  To: Dan Murphy; +Cc: f.fainelli, netdev, linux-kernel
In-Reply-To: <20180509120559.12725-1-dmurphy@ti.com>

> +static int dp83811_config_aneg(struct phy_device *phydev)
> +{
> +	int err;
> +	int value;
> +
> +	value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
> +	if (phydev->autoneg == AUTONEG_ENABLE) {
> +		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
> +				(DP83811_SGMII_AUTO_NEG_EN | value));
> +		if (err < 0)
> +			return err;
> +	} else {
> +		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
> +				(~DP83811_SGMII_AUTO_NEG_EN & value));
> +		if (err < 0)
> +			return err;
> +	}
> +

Hi Dan

You say SGMII is unreliable on one of these devices. Should you check
phydev->interface before enabling SGMII autoneg?

> +	return genphy_config_aneg(phydev);
> +}
> +
> +static int dp83811_config_init(struct phy_device *phydev)
> +{
> +	int err;
> +	int value;
> +
> +	err = genphy_config_init(phydev);
> +	if (err < 0)
> +		return err;
> +
> +	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
> +		value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
> +		if (!(value & DP83811_SGMII_EN)) {
> +			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
> +					(DP83811_SGMII_EN | value));
> +			if (err < 0)
> +				return err;
> +		} else {
> +			err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
> +				(~DP83811_SGMII_EN & value));
> +			if (err < 0)
> +				return err;
> +		}

This looks to be a duplicate of dp83811_config_aneg()?

> +	}
> +
> +	value = DP83811_WOL_MAGIC_EN | DP83811_WOL_SECURE_ON | DP83811_WOL_EN;
> +
> +	return phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
> +	      value);
> +}
> +
> +static int dp83811_phy_reset(struct phy_device *phydev)
> +{
> +	int err;
> +
> +	err = phy_write(phydev, MII_DP83811_RESET_CTRL, DP83811_HW_RESET);
> +	if (err < 0)
> +		return err;
> +
> +	dp83811_config_init(phydev);

I don't think you need to initialize it here. phylib should call that
soon after the reset.

> +
> +	return 0;
> +}
> +

> +static struct phy_driver dp83811_driver[] = {
> +	{
> +		.phy_id = DP83TC811_PHY_ID,
> +		.phy_id_mask = 0xfffffff0,
> +		.name = "TI DP83TC811",
> +		.features = PHY_BASIC_FEATURES,
> +		.flags = PHY_HAS_INTERRUPT,
> +		.config_init = genphy_config_init,

????

	Andrew

^ permalink raw reply

* Re: [PATCH 2/2] alx: add disable_wol paramenter
From: Andrew Lunn @ 2018-05-09 13:45 UTC (permalink / raw)
  To: AceLan Kao
  Cc: David Miller, James Cliburn, Chris Snook, rakesh, netdev,
	Linux-Kernel@Vger. Kernel. Org, Emily Chien
In-Reply-To: <CAFv23Qn1-5nHcFsk3egcL+OpKtujv4rkyXxa-Qht4eV4z6cbCA@mail.gmail.com>

On Wed, May 09, 2018 at 10:40:13AM +0800, AceLan Kao wrote:
> Hi,
> 
> I didn't get any response around one month.

I didn't notice you posting an results of searching for the true
problem? Please can you point me at an email archive.

	 Andrew

^ permalink raw reply

* Re: Please apply 3f1e53abff84c (" netfilter: ebtables: don't attempt to allocate 0-sized compat array") to v4.14.y and later
From: Greg Kroah-Hartman @ 2018-05-09 13:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: stable, dvyukov@google.com, David S. Miller, Network Development
In-Reply-To: <6a55fa40-1304-dae4-e01a-9cfb1897aa8e@roeck-us.net>

On Wed, May 09, 2018 at 06:26:29AM -0700, Guenter Roeck wrote:
> Hi Greg,
> 
> Commit 3f1e53abff84c fixes commit 7d7d7e02111e9 ("("netfilter: compat: reject huge
> allocation requests"), which has found its way into 4.14.y and 4.16.y.
> This causes syzcaller hiccups at least in 4.14.y (more specifically chromeos-4.14).
> 
> Please apply 3f1e53abff84c to v4.14.y and v4.16.y to fix the problem. Copying Dave
> and netdev in case he wants to handle it.
> 
> Sorry for the noise if this has already been queued.

I queued it up 2 hours ago :)

But thanks for asking, it's better to make sure it gets added.

thanks,

greg k-h

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox