Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Joe Perches @ 2017-10-04 18:07 UTC (permalink / raw)
  To: Matthias Kaehlcke, Jakub Kicinski, David S . Miller, Simon Horman,
	Dirk van der Merwe
  Cc: oss-drivers, netdev, linux-kernel, Renato Golin, Manoj Gupta,
	Guenter Roeck, Doug Anderson
In-Reply-To: <20171003200546.165731-1-mka@chromium.org>

On Tue, 2017-10-03 at 13:05 -0700, Matthias Kaehlcke wrote:
> nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
> identify the 'mask' parameter as known to be constant at compile time,
> which is required to use the FIELD_GET() macro.
> 
> The forced inlining does the trick for gcc, but for kernel builds with
> clang it results in undefined symbols:

Can't you use local different FIELD_PREP/FIELD_GET macros
with a different name without the BUILD_BUG tests?

i.e.:

#define NFP_FIELD_PREP(_mask, _val)				\
({								\
	((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask);	\
})

#define NFP_FIELD_GET(_mask, _reg)				\
({								\
	(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask));	\
})

Then the __always_inline can be removed from
nfp_eth_set_bit_config too.

^ permalink raw reply

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Jakub Kicinski @ 2017-10-04 18:17 UTC (permalink / raw)
  To: Manoj Gupta
  Cc: Matthias Kaehlcke, David S . Miller, Simon Horman,
	Dirk van der Merwe, oss-drivers, netdev, linux-kernel,
	Renato Golin, Guenter Roeck, Doug Anderson
In-Reply-To: <CAAMbb05G=HBQweiWqYva_9zTnQqAcwMhJ0yYBUi26T04YA4CxQ@mail.gmail.com>

On Wed, 4 Oct 2017 10:42:42 -0700, Manoj Gupta wrote:
> Hi Jakub,
> 
> I had discussed about supporting this code with some clang developers.
> However, the consensus was this code relies on a specific GCC optimizer
> behavior and Clang does not share the same behavior by design.

Hm.  I find surprising that constant propagation to inlined functions
is considered something GCC-specific rather than obvious/basic.

> Note that even GCC can't compile this code at -O0.

Yes, that makes me feel uncomfortable...  Could you try this?

-------8<-------

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index f6f7c085f8e0..47251396fcae 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -469,10 +469,10 @@ int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed)
 	return nfp_eth_config_commit_end(nsp);
 }
 
-/* Force inline, FIELD_* macroes require masks to be compilation-time known */
-static __always_inline int
+static int
 nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
-		       const u64 mask, unsigned int val, const u64 ctrl_bit)
+		       const u64 mask, const unsigned int shift,
+		       unsigned int val, const u64 ctrl_bit)
 {
 	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
 	unsigned int idx = nfp_nsp_config_idx(nsp);
@@ -489,11 +489,11 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
 
 	/* Check if we are already in requested state */
 	reg = le64_to_cpu(entries[idx].raw[raw_idx]);
-	if (val == FIELD_GET(mask, reg))
+	if (val == (reg & mask) >> shift)
 		return 0;
 
 	reg &= ~mask;
-	reg |= FIELD_PREP(mask, val);
+	reg |= (val << shift) & mask;
 	entries[idx].raw[raw_idx] = cpu_to_le64(reg);
 
 	entries[idx].control |= cpu_to_le64(ctrl_bit);
@@ -503,6 +503,13 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
 	return 0;
 }
 
+#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)	\
+	({								\
+		__BF_FIELD_CHECK(mask, 0ULL, val, "NFP_ETH_SET_BIT_CONFIG: "); \
+		nfp_eth_set_bit_config(nsp, raw_idx, mask, __bf_shf(mask), \
+				       val, ctrl_bit);			\
+	})
+
 /**
  * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
  * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
@@ -515,7 +522,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
  */
 int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
 {
-	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
 				      NSP_ETH_STATE_ANEG, mode,
 				      NSP_ETH_CTRL_SET_ANEG);
 }
@@ -544,7 +551,7 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
 		return -EINVAL;
 	}
 
-	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
 				      NSP_ETH_STATE_RATE, rate,
 				      NSP_ETH_CTRL_SET_RATE);
 }
@@ -561,6 +568,6 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
  */
 int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
 {
-	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
 				      lanes, NSP_ETH_CTRL_SET_LANES);
 }
-- 
2.14.1

^ permalink raw reply related

* [PATCH 1/3 v2] net: phy: Remove TI DP83822 from DP83848 driver
From: Dan Murphy @ 2017-10-04 18:20 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, afd, Dan Murphy

Removing the DP83822 device from the DP83848 to
support the TI DP83822 dedicated driver that will
initially support WoL settings.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v2 - There was no v1 on this patch this is new.

 drivers/net/phy/dp83848.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index 3de4fe4dda77..3966d43c5146 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -20,7 +20,6 @@
 #define TI_DP83620_PHY_ID		0x20005ce0
 #define NS_DP83848C_PHY_ID		0x20005c90
 #define TLK10X_PHY_ID			0x2000a210
-#define TI_DP83822_PHY_ID		0x2000a240
 
 /* Registers */
 #define DP83848_MICR			0x11 /* MII Interrupt Control Register */
@@ -80,7 +79,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
 	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
 	{ TI_DP83620_PHY_ID, 0xfffffff0 },
 	{ TLK10X_PHY_ID, 0xfffffff0 },
-	{ TI_DP83822_PHY_ID, 0xfffffff0 },
 	{ }
 };
 MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
@@ -110,7 +108,6 @@ static struct phy_driver dp83848_driver[] = {
 	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
-	DP83848_PHY_DRIVER(TI_DP83822_PHY_ID, "TI DP83822 10/100 Mbps PHY"),
 };
 module_phy_driver(dp83848_driver);
 
-- 
2.14.0

^ permalink raw reply related

* [PATCH 2/3 v2] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-04 18:20 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, afd, Dan Murphy
In-Reply-To: <20171004182031.13794-1-dmurphy@ti.com>

Add support for the TI  DP83822 10/100Mbit ethernet phy.

The DP83822 provides flexibility to connect to a MAC through a
standard MII, RMII or RGMII interface.

Datasheet:
http://www.ti.com/product/DP83822I/datasheet

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v2 - Updated per comments.  Removed unnessary parantheis, called genphy_suspend/
resume routines and then performing WoL changes, reworked sopass storage and reduced
the number of phy reads, and moved WOL_SECURE_ON - 
https://www.mail-archive.com/netdev@vger.kernel.org/msg191392.html

 drivers/net/phy/Kconfig   |   5 +
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/dp83822.c | 306 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 312 insertions(+)
 create mode 100644 drivers/net/phy/dp83822.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cd931cf9dcc2..8e78a482e09e 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -277,6 +277,11 @@ config DAVICOM_PHY
 	---help---
 	  Currently supports dm9161e and dm9131
 
+config DP83822_PHY
+	tristate "Texas Instruments DP83822 PHY"
+	---help---
+	  Supports the DP83822 PHY.
+
 config DP83848_PHY
 	tristate "Texas Instruments DP83848 PHY"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 416df92fbf4f..df3b82ba8550 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CICADA_PHY)	+= cicada.o
 obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
+obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
 obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c
new file mode 100644
index 000000000000..200a0e39756e
--- /dev/null
+++ b/drivers/net/phy/dp83822.c
@@ -0,0 +1,306 @@
+/*
+ * Driver for the Texas Instruments DP83822 PHY
+ *
+ * Copyright (C) 2017 Texas Instruments Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy.h>
+#include <linux/netdevice.h>
+
+#define DP83822_PHY_ID	        0x2000a240
+#define DP83822_DEVADDR		0x1f
+
+#define MII_DP83822_MISR1	0x12
+#define MII_DP83822_MISR2	0x13
+#define MII_DP83822_RESET_CTRL	0x1f
+
+#define DP83822_HW_RESET	BIT(15)
+#define DP83822_SW_RESET	BIT(14)
+
+/* MISR1 bits */
+#define DP83822_RX_ERR_HF_INT_EN	BIT(0)
+#define DP83822_FALSE_CARRIER_HF_INT_EN	BIT(1)
+#define DP83822_ANEG_COMPLETE_INT_EN	BIT(2)
+#define DP83822_DUP_MODE_CHANGE_INT_EN	BIT(3)
+#define DP83822_SPEED_CHANGED_INT_EN	BIT(4)
+#define DP83822_LINK_STAT_INT_EN	BIT(5)
+#define DP83822_ENERGY_DET_INT_EN	BIT(6)
+#define DP83822_LINK_QUAL_INT_EN	BIT(7)
+
+/* MISR2 bits */
+#define DP83822_JABBER_DET_INT_EN	BIT(0)
+#define DP83822_WOL_PKT_INT_EN		BIT(1)
+#define DP83822_SLEEP_MODE_INT_EN	BIT(2)
+#define DP83822_MDI_XOVER_INT_EN	BIT(3)
+#define DP83822_LB_FIFO_INT_EN		BIT(4)
+#define DP83822_PAGE_RX_INT_EN		BIT(5)
+#define DP83822_ANEG_ERR_INT_EN		BIT(6)
+#define DP83822_EEE_ERROR_CHANGE_INT_EN	BIT(7)
+
+/* INT_STAT1 bits */
+#define DP83822_WOL_INT_EN	BIT(4)
+#define DP83822_WOL_INT_STAT	BIT(12)
+
+#define MII_DP83822_RXSOP1	0x04a5
+#define	MII_DP83822_RXSOP2	0x04a6
+#define	MII_DP83822_RXSOP3	0x04a7
+
+/* WoL Registers */
+#define	MII_DP83822_WOL_CFG	0x04a0
+#define	MII_DP83822_WOL_STAT	0x04a1
+#define	MII_DP83822_WOL_DA1	0x04a2
+#define	MII_DP83822_WOL_DA2	0x04a3
+#define	MII_DP83822_WOL_DA3	0x04a4
+
+/* WoL bits */
+#define DP83822_WOL_MAGIC_EN	BIT(1)
+#define DP83822_WOL_SECURE_ON	BIT(5)
+#define DP83822_WOL_EN		BIT(7)
+#define DP83822_WOL_INDICATION_SEL BIT(8)
+#define DP83822_WOL_CLR_INDICATION BIT(11)
+
+static int dp83822_ack_interrupt(struct phy_device *phydev)
+{
+	int err = phy_read(phydev, MII_DP83822_MISR1);
+
+	if (err < 0)
+		return err;
+
+	err = phy_read(phydev, MII_DP83822_MISR2);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83822_set_wol(struct phy_device *phydev,
+			   struct ethtool_wolinfo *wol)
+{
+	struct net_device *ndev = phydev->attached_dev;
+	u16 value;
+	const u8 *mac;
+
+	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
+		mac = (const u8 *)ndev->dev_addr;
+
+		if (!is_valid_ether_addr(mac))
+			return -EINVAL;
+
+		/* MAC addresses start with byte 5, but stored in mac[0].
+		 * 822 PHYs store bytes 4|5, 2|3, 0|1
+		 */
+		phy_write_mmd(phydev, DP83822_DEVADDR,
+			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
+		phy_write_mmd(phydev, DP83822_DEVADDR,
+			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
+			      (mac[5] << 8) | mac[4]);
+
+		value = phy_read_mmd(phydev, DP83822_DEVADDR,
+				     MII_DP83822_WOL_CFG);
+		if (wol->wolopts & WAKE_MAGIC)
+			value |= DP83822_WOL_MAGIC_EN;
+		else
+			value &= ~DP83822_WOL_MAGIC_EN;
+
+		if (wol->wolopts & WAKE_MAGICSECURE) {
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP1,
+				      (wol->sopass[1] << 8) | wol->sopass[0]);
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP2,
+				      (wol->sopass[3] << 8) | wol->sopass[2]);
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP3,
+				      (wol->sopass[5] << 8) | wol->sopass[4]);
+			value |= DP83822_WOL_SECURE_ON;
+		} else {
+			value &= ~DP83822_WOL_SECURE_ON;
+		}
+
+		value |= (DP83822_WOL_EN | DP83822_WOL_CLR_INDICATION |
+			  DP83822_WOL_CLR_INDICATION);
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
+			      value);
+	} else {
+		value = phy_read_mmd(phydev, DP83822_DEVADDR,
+				     MII_DP83822_WOL_CFG);
+		value &= ~DP83822_WOL_EN;
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
+			      value);
+	}
+
+	return 0;
+}
+
+static void dp83822_get_wol(struct phy_device *phydev,
+			    struct ethtool_wolinfo *wol)
+{
+	int value;
+	u16 sopass_val;
+
+	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
+	wol->wolopts = 0;
+
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+	if (value & DP83822_WOL_MAGIC_EN)
+		wol->wolopts |= WAKE_MAGIC;
+
+	if (~value & DP83822_WOL_CLR_INDICATION)
+		wol->wolopts = 0;
+
+	if (value & DP83822_WOL_SECURE_ON) {
+		wol->wolopts |= WAKE_MAGICSECURE;
+	} else {
+		wol->wolopts &= ~WAKE_MAGICSECURE;
+		return;
+	}
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP1);
+	wol->sopass[0] = (sopass_val & 0xff);
+	wol->sopass[1] = (sopass_val >> 8);
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2);
+	wol->sopass[2] = (sopass_val & 0xff);
+	wol->sopass[3] = (sopass_val >> 8);
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3);
+	wol->sopass[4] = (sopass_val & 0xff);
+	wol->sopass[5] = (sopass_val >> 8);
+}
+
+static int dp83822_config_intr(struct phy_device *phydev)
+{
+	int misr_status;
+	int err;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		misr_status = phy_read(phydev, MII_DP83822_MISR1);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83822_RX_ERR_HF_INT_EN |
+				DP83822_FALSE_CARRIER_HF_INT_EN |
+				DP83822_ANEG_COMPLETE_INT_EN |
+				DP83822_DUP_MODE_CHANGE_INT_EN |
+				DP83822_SPEED_CHANGED_INT_EN |
+				DP83822_LINK_STAT_INT_EN |
+				DP83822_ENERGY_DET_INT_EN |
+				DP83822_LINK_QUAL_INT_EN);
+
+		err = phy_write(phydev, MII_DP83822_MISR1, misr_status);
+		if (err < 0)
+			return err;
+
+		misr_status = phy_read(phydev, MII_DP83822_MISR2);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83822_JABBER_DET_INT_EN |
+				DP83822_WOL_PKT_INT_EN |
+				DP83822_SLEEP_MODE_INT_EN |
+				DP83822_MDI_XOVER_INT_EN |
+				DP83822_LB_FIFO_INT_EN |
+				DP83822_PAGE_RX_INT_EN |
+				DP83822_ANEG_ERR_INT_EN |
+				DP83822_EEE_ERROR_CHANGE_INT_EN);
+
+		err = phy_write(phydev, MII_DP83822_MISR2, misr_status);
+	} else {
+		err = phy_write(phydev, MII_DP83822_MISR1, 0);
+		if (err < 0)
+			return err;
+
+		err = phy_write(phydev, MII_DP83822_MISR1, 0);
+	}
+
+	return err;
+}
+
+static int dp83822_phy_reset(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83822_suspend(struct phy_device *phydev)
+{
+	int value;
+
+	mutex_lock(&phydev->lock);
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+	mutex_unlock(&phydev->lock);
+
+	if (!(value & DP83822_WOL_EN))
+		genphy_suspend(phydev);
+
+	return 0;
+}
+
+static int dp83822_resume(struct phy_device *phydev)
+{
+	int value;
+
+	genphy_resume(phydev);
+
+	mutex_lock(&phydev->lock);
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+
+	phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG, value |
+		      DP83822_WOL_CLR_INDICATION);
+
+	mutex_unlock(&phydev->lock);
+
+	return 0;
+}
+
+static struct phy_driver dp83822_driver[] = {
+	{
+	 .phy_id = DP83822_PHY_ID,
+	 .phy_id_mask = 0xfffffff0,
+	 .name = "TI DP83822",
+	 .features = PHY_BASIC_FEATURES,
+	 .flags = PHY_HAS_INTERRUPT,
+	 .config_init = genphy_config_init,
+	 .soft_reset = dp83822_phy_reset,
+	 .get_wol = dp83822_get_wol,
+	 .set_wol = dp83822_set_wol,
+	 .ack_interrupt = dp83822_ack_interrupt,
+	 .config_intr = dp83822_config_intr,
+	 .config_aneg = genphy_config_aneg,
+	 .read_status = genphy_read_status,
+	 .suspend = dp83822_suspend,
+	 .resume = dp83822_resume,
+	 },
+};
+module_phy_driver(dp83822_driver);
+
+static struct mdio_device_id __maybe_unused dp83822_tbl[] = {
+	{ DP83822_PHY_ID, 0xfffffff0 },
+	{ },
+};
+MODULE_DEVICE_TABLE(mdio, dp83822_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments DP83822 PHY driver");
+MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
+MODULE_LICENSE("GPL");
-- 
2.14.0

^ permalink raw reply related

* [PATCH 3/3 v2] net: phy: Change error to EINVAL for invalid MAC
From: Dan Murphy @ 2017-10-04 18:20 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, afd, Dan Murphy
In-Reply-To: <20171004182031.13794-1-dmurphy@ti.com>

Change the return error code to EINVAL if the MAC
address is not valid in the set_wol function.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v2 - There was no v1 on this patch this is new.

 drivers/net/phy/at803x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index c1e52b9dc58d..5f93e6add563 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -167,7 +167,7 @@ static int at803x_set_wol(struct phy_device *phydev,
 		mac = (const u8 *) ndev->dev_addr;
 
 		if (!is_valid_ether_addr(mac))
-			return -EFAULT;
+			return -EINVAL;
 
 		for (i = 0; i < 3; i++) {
 			phy_write(phydev, AT803X_MMD_ACCESS_CONTROL,
-- 
2.14.0

^ permalink raw reply related

* [PATCH] mwifiex: Use put_unaligned_le32
From: Himanshu Jha @ 2017-10-04 18:28 UTC (permalink / raw)
  To: amitkarwar
  Cc: nishants, gbhat, huxm, kvalo, linux-wireless, netdev,
	linux-kernel, Himanshu Jha

Use put_unaligned_le32 rather than using byte ordering function and
memcpy which makes code clear.
Also, add the header file where it is declared.

Done using Coccinelle and semantic patch used is :

@ rule1 @
identifier tmp; expression ptr,x; type T;
@@

- tmp = cpu_to_le32(x);

  <+... when != tmp
- memcpy(ptr, (T)&tmp, ...);
+ put_unaligned_le32(x,ptr);
  ...+>

@ depends on rule1 @
type j; identifier tmp;
@@

- j tmp;
  ...when != tmp

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
---
 drivers/net/wireless/marvell/mwifiex/cmdevt.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cmdevt.c b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
index 0edc5d6..e28e119 100644
--- a/drivers/net/wireless/marvell/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
@@ -17,6 +17,7 @@
  * this warranty disclaimer.
  */
 
+#include <linux/unaligned/access_ok.h>
 #include "decl.h"
 #include "ioctl.h"
 #include "util.h"
@@ -183,7 +184,6 @@ static int mwifiex_dnld_cmd_to_fw(struct mwifiex_private *priv,
 	uint16_t cmd_code;
 	uint16_t cmd_size;
 	unsigned long flags;
-	__le32 tmp;
 
 	if (!adapter || !cmd_node)
 		return -1;
@@ -249,9 +249,9 @@ static int mwifiex_dnld_cmd_to_fw(struct mwifiex_private *priv,
 	mwifiex_dbg_dump(adapter, CMD_D, "cmd buffer:", host_cmd, cmd_size);
 
 	if (adapter->iface_type == MWIFIEX_USB) {
-		tmp = cpu_to_le32(MWIFIEX_USB_TYPE_CMD);
 		skb_push(cmd_node->cmd_skb, MWIFIEX_TYPE_LEN);
-		memcpy(cmd_node->cmd_skb->data, &tmp, MWIFIEX_TYPE_LEN);
+		put_unaligned_le32(MWIFIEX_USB_TYPE_CMD,
+				   cmd_node->cmd_skb->data);
 		adapter->cmd_sent = true;
 		ret = adapter->if_ops.host_to_card(adapter,
 						   MWIFIEX_USB_EP_CMD_EVENT,
@@ -317,7 +317,6 @@ static int mwifiex_dnld_sleep_confirm_cmd(struct mwifiex_adapter *adapter)
 				(struct mwifiex_opt_sleep_confirm *)
 						adapter->sleep_cfm->data;
 	struct sk_buff *sleep_cfm_tmp;
-	__le32 tmp;
 
 	priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
 
@@ -342,8 +341,7 @@ static int mwifiex_dnld_sleep_confirm_cmd(struct mwifiex_adapter *adapter)
 				      + MWIFIEX_TYPE_LEN);
 		skb_put(sleep_cfm_tmp, sizeof(struct mwifiex_opt_sleep_confirm)
 			+ MWIFIEX_TYPE_LEN);
-		tmp = cpu_to_le32(MWIFIEX_USB_TYPE_CMD);
-		memcpy(sleep_cfm_tmp->data, &tmp, MWIFIEX_TYPE_LEN);
+		put_unaligned_le32(MWIFIEX_USB_TYPE_CMD, sleep_cfm_tmp->data);
 		memcpy(sleep_cfm_tmp->data + MWIFIEX_TYPE_LEN,
 		       adapter->sleep_cfm->data,
 		       sizeof(struct mwifiex_opt_sleep_confirm));
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/4] net-next: security: New file mode and LSM hooks for eBPF object permission control
From: Chenbo Feng @ 2017-10-04 18:29 UTC (permalink / raw)
  To: netdev, SELinux, linux-security-module
  Cc: Jeffrey Vander Stoep, Lorenzo Colitti, Alexei Starovoitov,
	Daniel Borkmann, Chenbo Feng

From: Chenbo Feng <fengc@google.com>

Much like files and sockets, eBPF objects are accessed, controlled, and
shared via a file descriptor (FD). Unlike files and sockets, the
existing mechanism for eBPF object access control is very limited.
Currently there are two options for granting accessing to eBPF
operations: grant access to all processes, or only CAP_SYS_ADMIN
processes. The CAP_SYS_ADMIN-only mode is not ideal because most users
do not have this capability and granting a user CAP_SYS_ADMIN grants too
many other security-sensitive permissions. It also unnecessarily allows
all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all
processes to access to eBPF objects is also undesirable since it has
potential to allow unprivileged processes to consume kernel memory, and
opens up attack surface to the kernel.

Adding LSM hooks maintains the status quo for systems which do not use
an LSM, preserving compatibility with userspace, while allowing security
modules to choose how best to handle permissions on eBPF objects. Here
is a possible use case for the lsm hooks with selinux module:

The network-control daemon (netd) creates and loads an eBPF object for
network packet filtering and analysis. It passes the object FD to an
unprivileged network monitor app (netmonitor), which is not allowed to
create, modify or load eBPF objects, but is allowed to read the traffic
stats from the map.

Selinux could use these hooks to grant the following permissions:
allow netd self:bpf_map { create read write};
allow netmonitor netd:fd use;
allow netmonitor netd:bpf_map read;

In this patch series, A file mode is added to bpf map to store the
accessing mode. With this file mode flags, the map can be obtained read
only, write only or read and write. With the help of this file mode,
several security hooks can be added to the eBPF syscall implementations
to do permissions checks. These LSM hooks are mainly focused on checking
the process privileges before it obtains the fd for a specific bpf
object. No matter from a file location or from a eBPF id. Besides that,
a general check hook is also implemented at the start of bpf syscalls so
that each security module can have their own implementation on the reset
of bpf object related functionalities.

In order to store the ownership and security information about eBPF
maps, a security field pointer is added to the struct bpf_map. And the
last two patch set are implementation of selinux check on these hooks
introduced, plus an additional check when eBPF object is passed between
processes using unix socket as well as binder IPC.

Chenbo Feng (4):
  Add file mode configuration into bpf maps
  Add LSM hooks for bpf object related syscall
  Add selinux check for eBPF syscall operations
  Add addtional check for bpf object file receive

 include/linux/bpf.h                 |  15 +++-
 include/linux/lsm_hooks.h           |  54 ++++++++++++
 include/linux/security.h            |  45 ++++++++++
 include/uapi/linux/bpf.h            |   6 ++
 kernel/bpf/inode.c                  |  15 ++--
 kernel/bpf/syscall.c                | 112 +++++++++++++++++++++---
 security/security.c                 |  32 +++++++
 security/selinux/hooks.c            | 168 +++++++++++++++++++++++++++++++++++-
 security/selinux/include/classmap.h |   2 +
 security/selinux/include/objsec.h   |   4 +
 10 files changed, 433 insertions(+), 20 deletions(-)

-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply

* [PATCH net-next 1/4] bpf: Add file mode configuration into bpf maps
From: Chenbo Feng @ 2017-10-04 18:29 UTC (permalink / raw)
  To: netdev, SELinux, linux-security-module
  Cc: Jeffrey Vander Stoep, Lorenzo Colitti, Alexei Starovoitov,
	Daniel Borkmann, Chenbo Feng
In-Reply-To: <20171004182932.140028-1-chenbofeng.kernel@gmail.com>

From: Chenbo Feng <fengc@google.com>

Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h      |  6 ++--
 include/uapi/linux/bpf.h |  6 ++++
 kernel/bpf/inode.c       | 15 ++++++---
 kernel/bpf/syscall.c     | 80 +++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 252f4bc9eb25..d826be859589 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -273,11 +273,11 @@ void bpf_map_area_free(void *base);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
-int bpf_map_new_fd(struct bpf_map *map);
+int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);
 
 int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
-int bpf_obj_get_user(const char __user *pathname);
+int bpf_obj_get_user(const char __user *pathname, int flags);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
@@ -296,6 +296,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 				void *key, void *value, u64 map_flags);
 int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
 
+int bpf_get_file_flag(int flags);
+
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
  * forced to use 'long' read/writes to try to atomically copy long counters.
  * Best-effort only.  No barriers here, since it _will_ race with concurrent
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6d2137b4cf38..e812be5946a6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -177,6 +177,10 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* Flags for accessing BPF object */
+#define BPF_F_RDONLY		(1U << 3)
+#define BPF_F_WRONLY		(1U << 4)
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -219,6 +223,7 @@ union bpf_attr {
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
 		__aligned_u64	pathname;
 		__u32		bpf_fd;
+		__u32		file_flags;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
@@ -246,6 +251,7 @@ union bpf_attr {
 			__u32		map_id;
 		};
 		__u32		next_id;
+		__u32		open_flags;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index e833ed914358..7d8c6dd8dd5d 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -295,7 +295,7 @@ int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
 }
 
 static void *bpf_obj_do_get(const struct filename *pathname,
-			    enum bpf_type *type)
+			    enum bpf_type *type, int flags)
 {
 	struct inode *inode;
 	struct path path;
@@ -307,7 +307,7 @@ static void *bpf_obj_do_get(const struct filename *pathname,
 		return ERR_PTR(ret);
 
 	inode = d_backing_inode(path.dentry);
-	ret = inode_permission(inode, MAY_WRITE);
+	ret = inode_permission(inode, ACC_MODE(flags));
 	if (ret)
 		goto out;
 
@@ -326,18 +326,23 @@ static void *bpf_obj_do_get(const struct filename *pathname,
 	return ERR_PTR(ret);
 }
 
-int bpf_obj_get_user(const char __user *pathname)
+int bpf_obj_get_user(const char __user *pathname, int flags)
 {
 	enum bpf_type type = BPF_TYPE_UNSPEC;
 	struct filename *pname;
 	int ret = -ENOENT;
+	int f_flags;
 	void *raw;
 
+	f_flags = bpf_get_file_flag(flags);
+	if (f_flags < 0)
+		return f_flags;
+
 	pname = getname(pathname);
 	if (IS_ERR(pname))
 		return PTR_ERR(pname);
 
-	raw = bpf_obj_do_get(pname, &type);
+	raw = bpf_obj_do_get(pname, &type, f_flags);
 	if (IS_ERR(raw)) {
 		ret = PTR_ERR(raw);
 		goto out;
@@ -346,7 +351,7 @@ int bpf_obj_get_user(const char __user *pathname)
 	if (type == BPF_TYPE_PROG)
 		ret = bpf_prog_new_fd(raw);
 	else if (type == BPF_TYPE_MAP)
-		ret = bpf_map_new_fd(raw);
+		ret = bpf_map_new_fd(raw, f_flags);
 	else
 		goto out;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b927da66f653..7c8a9964a41d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -294,17 +294,48 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 }
 #endif
 
+static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
+			      loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_READ.
+	 */
+	return -EINVAL;
+}
+
+static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
+			       size_t siz, loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_WRITE.
+	 */
+	return -EINVAL;
+}
+
 static const struct file_operations bpf_map_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo	= bpf_map_show_fdinfo,
 #endif
 	.release	= bpf_map_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
 };
 
-int bpf_map_new_fd(struct bpf_map *map)
+int bpf_map_new_fd(struct bpf_map *map, int flags)
 {
 	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
-				O_RDWR | O_CLOEXEC);
+				flags | O_CLOEXEC);
+}
+
+int bpf_get_file_flag(int flags)
+{
+	if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
+		return -EINVAL;
+	if (flags & BPF_F_RDONLY)
+		return O_RDONLY;
+	if (flags & BPF_F_WRONLY)
+		return O_WRONLY;
+	return O_RDWR;
 }
 
 /* helper macro to check that unused fields 'union bpf_attr' are zero */
@@ -345,12 +376,17 @@ static int map_create(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_map *map;
+	int f_flags;
 	int err;
 
 	err = CHECK_ATTR(BPF_MAP_CREATE);
 	if (err)
 		return -EINVAL;
 
+	f_flags = bpf_get_file_flag(attr->map_flags);
+	if (f_flags < 0)
+		return f_flags;
+
 	if (numa_node != NUMA_NO_NODE &&
 	    ((unsigned int)numa_node >= nr_node_ids ||
 	     !node_online(numa_node)))
@@ -376,7 +412,7 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map;
 
-	err = bpf_map_new_fd(map);
+	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
 		/* failed to allocate fd.
 		 * bpf_map_put() is needed because the above
@@ -491,6 +527,11 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (!(f.file->f_mode & FMODE_CAN_READ)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	key = memdup_user(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
@@ -571,6 +612,11 @@ static int map_update_elem(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	key = memdup_user(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
@@ -654,6 +700,11 @@ static int map_delete_elem(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	key = memdup_user(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
@@ -697,6 +748,11 @@ static int map_get_next_key(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (!(f.file->f_mode & FMODE_CAN_READ)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
 	if (ukey) {
 		key = memdup_user(ukey, map->key_size);
 		if (IS_ERR(key)) {
@@ -903,6 +959,8 @@ static const struct file_operations bpf_prog_fops = {
 	.show_fdinfo	= bpf_prog_show_fdinfo,
 #endif
 	.release	= bpf_prog_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
 };
 
 int bpf_prog_new_fd(struct bpf_prog *prog)
@@ -1112,11 +1170,11 @@ static int bpf_prog_load(union bpf_attr *attr)
 	return err;
 }
 
-#define BPF_OBJ_LAST_FIELD bpf_fd
+#define BPF_OBJ_LAST_FIELD file_flags
 
 static int bpf_obj_pin(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_OBJ))
+	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
 		return -EINVAL;
 
 	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
@@ -1127,7 +1185,8 @@ static int bpf_obj_get(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0)
 		return -EINVAL;
 
-	return bpf_obj_get_user(u64_to_user_ptr(attr->pathname));
+	return bpf_obj_get_user(u64_to_user_ptr(attr->pathname),
+				attr->file_flags);
 }
 
 #ifdef CONFIG_CGROUP_BPF
@@ -1341,12 +1400,13 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
-#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id
+#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags
 
 static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 {
 	struct bpf_map *map;
 	u32 id = attr->map_id;
+	int f_flags;
 	int fd;
 
 	if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID))
@@ -1355,6 +1415,10 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	f_flags = bpf_get_file_flag(attr->open_flags);
+	if (f_flags < 0)
+		return f_flags;
+
 	spin_lock_bh(&map_idr_lock);
 	map = idr_find(&map_idr, id);
 	if (map)
@@ -1366,7 +1430,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
-	fd = bpf_map_new_fd(map);
+	fd = bpf_map_new_fd(map, f_flags);
 	if (fd < 0)
 		bpf_map_put(map);
 
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related

* [PATCH net-next 2/4] security: bpf: Add LSM hooks for bpf object related syscall
From: Chenbo Feng @ 2017-10-04 18:29 UTC (permalink / raw)
  To: netdev, SELinux, linux-security-module
  Cc: Jeffrey Vander Stoep, Lorenzo Colitti, Alexei Starovoitov,
	Daniel Borkmann, Chenbo Feng
In-Reply-To: <20171004182932.140028-1-chenbofeng.kernel@gmail.com>

From: Chenbo Feng <fengc@google.com>

Introduce several LSM hooks for the syscalls that will allow the
userspace to access to eBPF object such as eBPF programs and eBPF maps.
The security check is aimed to enforce a per object security protection
for eBPF object so only processes with the right priviliges can
read/write to a specific map or use a specific eBPF program. Besides
that, a general security hook is added before the multiplexer of bpf
syscall to check the cmd and the attribute used for the command. The
actual security module can decide which command need to be checked and
how the cmd should be checked.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h       |  6 ++++++
 include/linux/lsm_hooks.h | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/security.h  | 45 +++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c      | 28 ++++++++++++++++++++++--
 security/security.c       | 32 ++++++++++++++++++++++++++++
 5 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d826be859589..d757ea3f2228 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -57,6 +57,9 @@ struct bpf_map {
 	atomic_t usercnt;
 	struct bpf_map *inner_map_meta;
 	u8 name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 };
 
 /* function argument constraints */
@@ -190,6 +193,9 @@ struct bpf_prog_aux {
 	struct user_struct *user;
 	u64 load_time; /* ns since boottime */
 	u8 name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c9258124e417..7161d8e7ee79 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1351,6 +1351,40 @@
  *	@inode we wish to get the security context of.
  *	@ctx is a pointer in which to place the allocated security context.
  *	@ctxlen points to the place to put the length of @ctx.
+ *
+ * Security hooks for using the eBPF maps and programs functionalities through
+ * eBPF syscalls.
+ *
+ * @bpf:
+ *	Do a initial check for all bpf syscalls after the attribute is copied
+ *	into the kernel. The actual security module can implement their own
+ *	rules to check the specific cmd they need.
+ *
+ * @bpf_map:
+ *	Do a check when the kernel generate and return a file descriptor for
+ *	eBPF maps.
+ *
+ *	@map: bpf map that we want to access
+ *	@mask: the access flags
+ *
+ * @bpf_prog:
+ *	Do a check when the kernel generate and return a file descriptor for
+ *	eBPF programs.
+ *
+ *	@prog: bpf prog that userspace want to use.
+ *
+ * @bpf_map_alloc_security:
+ *	Initialize the security field inside bpf map.
+ *
+ * @bpf_map_free_security:
+ *	Clean up the security information stored inside bpf map.
+ *
+ * @bpf_prog_alloc_security:
+ *	Initialize the security field inside bpf program.
+ *
+ * @bpf_prog_free_security:
+ *	Clean up the security information stored inside bpf prog.
+ *
  */
 union security_list_options {
 	int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1682,6 +1716,17 @@ union security_list_options {
 				struct audit_context *actx);
 	void (*audit_rule_free)(void *lsmrule);
 #endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+	int (*bpf)(int cmd, union bpf_attr *attr,
+				 unsigned int size);
+	int (*bpf_map)(struct bpf_map *map, fmode_t fmode);
+	int (*bpf_prog)(struct bpf_prog *prog);
+	int (*bpf_map_alloc_security)(struct bpf_map *map);
+	void (*bpf_map_free_security)(struct bpf_map *map);
+	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
+	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+#endif /* CONFIG_BPF_SYSCALL */
 };
 
 struct security_hook_heads {
@@ -1901,6 +1946,15 @@ struct security_hook_heads {
 	struct list_head audit_rule_match;
 	struct list_head audit_rule_free;
 #endif /* CONFIG_AUDIT */
+#ifdef CONFIG_BPF_SYSCALL
+	struct list_head bpf;
+	struct list_head bpf_map;
+	struct list_head bpf_prog;
+	struct list_head bpf_map_alloc_security;
+	struct list_head bpf_map_free_security;
+	struct list_head bpf_prog_alloc_security;
+	struct list_head bpf_prog_free_security;
+#endif /* CONFIG_BPF_SYSCALL */
 } __randomize_layout;
 
 /*
diff --git a/include/linux/security.h b/include/linux/security.h
index ce6265960d6c..18800b0911e5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -31,6 +31,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/bpf.h>
 
 struct linux_binprm;
 struct cred;
@@ -1730,6 +1731,50 @@ static inline void securityfs_remove(struct dentry *dentry)
 
 #endif
 
+#ifdef CONFIG_BPF_SYSCALL
+#ifdef CONFIG_SECURITY
+extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
+extern int security_bpf_prog(struct bpf_prog *prog);
+extern int security_bpf_map_alloc(struct bpf_map *map);
+extern void security_bpf_map_free(struct bpf_map *map);
+extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
+extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+#else
+static inline int security_bpf(int cmd, union bpf_attr *attr,
+					     unsigned int size)
+{
+	return 0;
+}
+
+static inline int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+	return 0;
+}
+
+static inline int security_bpf_prog(struct bpf_prog *prog)
+{
+	return 0;
+}
+
+static inline int security_bpf_map_alloc(struct bpf_map *map)
+{
+	return 0;
+}
+
+static inline void security_bpf_map_free(struct bpf_map *map)
+{ }
+
+static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+	return 0;
+}
+
+static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{ }
+#endif /* CONFIG_SECURITY */
+#endif /* CONFIG_BPF_SYSCALL */
+
 #ifdef CONFIG_SECURITY
 
 static inline char *alloc_secdata(void)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7c8a9964a41d..58ff769d58ab 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -210,6 +210,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	bpf_map_uncharge_memlock(map);
+	security_bpf_map_free(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 }
@@ -323,6 +324,9 @@ static const struct file_operations bpf_map_fops = {
 
 int bpf_map_new_fd(struct bpf_map *map, int flags)
 {
+	if (security_bpf_map(map, OPEN_FMODE(flags)))
+		return -EPERM;
+
 	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
 				flags | O_CLOEXEC);
 }
@@ -404,10 +408,14 @@ static int map_create(union bpf_attr *attr)
 	atomic_set(&map->refcnt, 1);
 	atomic_set(&map->usercnt, 1);
 
-	err = bpf_map_charge_memlock(map);
+	err = security_bpf_map_alloc(map);
 	if (err)
 		goto free_map_nouncharge;
 
+	err = bpf_map_charge_memlock(map);
+	if (err)
+		goto free_map_sec;
+
 	err = bpf_map_alloc_id(map);
 	if (err)
 		goto free_map;
@@ -429,6 +437,8 @@ static int map_create(union bpf_attr *attr)
 
 free_map:
 	bpf_map_uncharge_memlock(map);
+free_map_sec:
+	security_bpf_map_free(map);
 free_map_nouncharge:
 	map->ops->map_free(map);
 	return err;
@@ -907,6 +917,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 
 	free_used_maps(aux);
 	bpf_prog_uncharge_memlock(aux->prog);
+	security_bpf_prog_free(aux);
 	bpf_prog_free(aux->prog);
 }
 
@@ -965,6 +976,9 @@ static const struct file_operations bpf_prog_fops = {
 
 int bpf_prog_new_fd(struct bpf_prog *prog)
 {
+	if (security_bpf_prog(prog))
+		return -EPERM;
+
 	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
 				O_RDWR | O_CLOEXEC);
 }
@@ -1104,10 +1118,14 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (!prog)
 		return -ENOMEM;
 
-	err = bpf_prog_charge_memlock(prog);
+	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog_nouncharge;
 
+	err = bpf_prog_charge_memlock(prog);
+	if (err)
+		goto free_prog_sec;
+
 	prog->len = attr->insn_cnt;
 
 	err = -EFAULT;
@@ -1165,6 +1183,8 @@ static int bpf_prog_load(union bpf_attr *attr)
 	free_used_maps(prog->aux);
 free_prog:
 	bpf_prog_uncharge_memlock(prog);
+free_prog_sec:
+	security_bpf_prog_free(prog->aux);
 free_prog_nouncharge:
 	bpf_prog_free(prog);
 	return err;
@@ -1585,6 +1605,10 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	if (copy_from_user(&attr, uattr, size) != 0)
 		return -EFAULT;
 
+	err = security_bpf(cmd, &attr, size);
+	if (err)
+		return -EPERM;
+
 	switch (cmd) {
 	case BPF_MAP_CREATE:
 		err = map_create(&attr);
diff --git a/security/security.c b/security/security.c
index 4bf0f571b4ef..1cd8526cb0b7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -12,6 +12,7 @@
  *	(at your option) any later version.
  */
 
+#include <linux/bpf.h>
 #include <linux/capability.h>
 #include <linux/dcache.h>
 #include <linux/module.h>
@@ -1703,3 +1704,34 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule,
 				actx);
 }
 #endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+int security_bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+	return call_int_hook(bpf, 0, cmd, attr, size);
+}
+int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+	return call_int_hook(bpf_map, 0, map, fmode);
+}
+int security_bpf_prog(struct bpf_prog *prog)
+{
+	return call_int_hook(bpf_prog, 0, prog);
+}
+int security_bpf_map_alloc(struct bpf_map *map)
+{
+	return call_int_hook(bpf_map_alloc_security, 0, map);
+}
+int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+	return call_int_hook(bpf_prog_alloc_security, 0, aux);
+}
+void security_bpf_map_free(struct bpf_map *map)
+{
+	call_void_hook(bpf_map_free_security, map);
+}
+void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+	call_void_hook(bpf_prog_free_security, aux);
+}
+#endif /* CONFIG_BPF_SYSCALL */
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related

* [PATCH net-next 3/4] selinux: bpf: Add selinux check for eBPF syscall operations
From: Chenbo Feng @ 2017-10-04 18:29 UTC (permalink / raw)
  To: netdev, SELinux, linux-security-module
  Cc: Jeffrey Vander Stoep, Lorenzo Colitti, Alexei Starovoitov,
	Daniel Borkmann, Chenbo Feng
In-Reply-To: <20171004182932.140028-1-chenbofeng.kernel@gmail.com>

From: Chenbo Feng <fengc@google.com>

Implement the actual checks introduced to eBPF related syscalls. This
implementation use the security field inside bpf object to store a sid that
identify the bpf object. And when processes try to access the object,
selinux will check if processes have the right privileges. The creation
of eBPF object are also checked at the general bpf check hook and new
cmd introduced to eBPF domain can also be checked there.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 security/selinux/hooks.c            | 111 ++++++++++++++++++++++++++++++++++++
 security/selinux/include/classmap.h |   2 +
 security/selinux/include/objsec.h   |   4 ++
 3 files changed, 117 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f5d304736852..41aba4e3d57c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -85,6 +85,7 @@
 #include <linux/export.h>
 #include <linux/msg.h>
 #include <linux/shm.h>
+#include <linux/bpf.h>
 
 #include "avc.h"
 #include "objsec.h"
@@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void *ib_sec)
 }
 #endif
 
+#ifdef CONFIG_BPF_SYSCALL
+static int selinux_bpf(int cmd, union bpf_attr *attr,
+				     unsigned int size)
+{
+	u32 sid = current_sid();
+	int ret;
+
+	switch (cmd) {
+	case BPF_MAP_CREATE:
+		ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP, BPF_MAP__CREATE,
+				   NULL);
+		break;
+	case BPF_PROG_LOAD:
+		ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG, BPF_PROG__LOAD,
+				   NULL);
+		break;
+	default:
+		ret = 0;
+		break;
+	}
+
+	return ret;
+}
+
+static u32 bpf_map_fmode_to_av(fmode_t fmode)
+{
+	u32 av = 0;
+
+	if (f_mode & FMODE_READ)
+		av |= BPF_MAP__READ;
+	if (f_mode & FMODE_WRITE)
+		av |= BPF_MAP__WRITE;
+	return av;
+}
+
+static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+	u32 sid = current_sid();
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = map->security;
+	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
+			    bpf_map_fmode_to_av(fmode), NULL);
+}
+
+static int selinux_bpf_prog(struct bpf_prog *prog)
+{
+	u32 sid = current_sid();
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = prog->aux->security;
+	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
+			    BPF_PROG__USE, NULL);
+}
+
+static int selinux_bpf_map_alloc(struct bpf_map *map)
+{
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+	if (!bpfsec)
+		return -ENOMEM;
+
+	bpfsec->sid = current_sid();
+	map->security = bpfsec;
+
+	return 0;
+}
+
+static void selinux_bpf_map_free(struct bpf_map *map)
+{
+	struct bpf_security_struct *bpfsec = map->security;
+
+	map->security = NULL;
+	kfree(bpfsec);
+}
+
+static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+	struct bpf_security_struct *bpfsec;
+
+	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+	if (!bpfsec)
+		return -ENOMEM;
+
+	bpfsec->sid = current_sid();
+	aux->security = bpfsec;
+
+	return 0;
+}
+
+static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+	struct bpf_security_struct *bpfsec = aux->security;
+
+	aux->security = NULL;
+	kfree(bpfsec);
+}
+#endif
+
 static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(binder_set_context_mgr, selinux_binder_set_context_mgr),
 	LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction),
@@ -6471,6 +6572,16 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
 	LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
 #endif
+
+#ifdef CONFIG_BPF_SYSCALL
+	LSM_HOOK_INIT(bpf, selinux_bpf),
+	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
+	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
+	LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+	LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+#endif
 };
 
 static __init int selinux_init(void)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 35ffb29a69cb..7253c5eea59c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] = {
 	  { "access", NULL } },
 	{ "infiniband_endport",
 	  { "manage_subnet", NULL } },
+	{ "bpf_map", {"create", "read", "write"} },
+	{ "bpf_prog", {"load", "use"} },
 	{ NULL }
   };
 
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 1649cd18eb0b..3d54468ce334 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -150,6 +150,10 @@ struct pkey_security_struct {
 	u32	sid;	/* SID of pkey */
 };
 
+struct bpf_security_struct {
+	u32 sid;  /*SID of bpf obj creater*/
+};
+
 extern unsigned int selinux_checkreqprot;
 
 #endif /* _SELINUX_OBJSEC_H_ */
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* [PATCH net-next 4/4] selinux: bpf: Add addtional check for bpf object file receive
From: Chenbo Feng @ 2017-10-04 18:29 UTC (permalink / raw)
  To: netdev, SELinux, linux-security-module
  Cc: Jeffrey Vander Stoep, Lorenzo Colitti, Alexei Starovoitov,
	Daniel Borkmann, Chenbo Feng
In-Reply-To: <20171004182932.140028-1-chenbofeng.kernel@gmail.com>

From: Chenbo Feng <fengc@google.com>

Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.

Signed-off-by: Chenbo Feng <fengc@google.com>
---
 include/linux/bpf.h      |  3 +++
 kernel/bpf/syscall.c     |  4 ++--
 security/selinux/hooks.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d757ea3f2228..ac8428a36d56 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -250,6 +250,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 
+extern const struct file_operations bpf_map_fops;
+extern const struct file_operations bpf_prog_fops;
+
 #define BPF_PROG_TYPE(_id, _ops) \
 	extern const struct bpf_verifier_ops _ops;
 #define BPF_MAP_TYPE(_id, _ops) \
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 58ff769d58ab..5789a5359f0a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
 	return -EINVAL;
 }
 
-static const struct file_operations bpf_map_fops = {
+const struct file_operations bpf_map_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo	= bpf_map_show_fdinfo,
 #endif
@@ -965,7 +965,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 }
 #endif
 
-static const struct file_operations bpf_prog_fops = {
+const struct file_operations bpf_prog_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo	= bpf_prog_show_fdinfo,
 #endif
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 41aba4e3d57c..381474ce3216 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1847,6 +1847,7 @@ static int file_has_perm(const struct cred *cred,
 
 	/* av is zero if only checking access to the descriptor. */
 	rc = 0;
+
 	if (av)
 		rc = inode_has_perm(cred, inode, av, &ad);
 
@@ -2142,6 +2143,10 @@ static int selinux_binder_transfer_binder(struct task_struct *from,
 			    NULL);
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_fd_pass(struct file *file, u32 sid);
+#endif
+
 static int selinux_binder_transfer_file(struct task_struct *from,
 					struct task_struct *to,
 					struct file *file)
@@ -2165,6 +2170,12 @@ static int selinux_binder_transfer_file(struct task_struct *from,
 			return rc;
 	}
 
+#ifdef CONFIG_BPF_SYSCALL
+	rc = bpf_fd_pass(file, sid);
+	if (rc)
+		return rc;
+#endif
+
 	if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
 		return 0;
 
@@ -3735,8 +3746,18 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk,
 static int selinux_file_receive(struct file *file)
 {
 	const struct cred *cred = current_cred();
+	int rc;
+
+	rc = file_has_perm(cred, file, file_to_av(file));
+	if (rc)
+		goto out;
+
+#ifdef CONFIG_BPF_SYSCALL
+	rc = bpf_fd_pass(file, cred_sid(sid));
+#endif
 
-	return file_has_perm(cred, file, file_to_av(file));
+out:
+	return rc;
 }
 
 static int selinux_file_open(struct file *file, const struct cred *cred)
@@ -6288,6 +6309,40 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
 	return av;
 }
 
+/* This function will check the file pass through unix socket or binder to see
+ * if it is a bpf related object. And apply correspinding checks on the bpf
+ * object based on the type. The bpf maps and programs, not like other files and
+ * socket, are using a shared anonymous inode inside the kernel as their inode.
+ * So checking that inode cannot identify if the process have privilege to
+ * access the bpf object and that's why we have to add this additional check in
+ * selinux_file_receive and selinux_binder_transfer_files.
+ */
+static int bpf_fd_pass(struct file *file, u32 sid)
+{
+	struct bpf_security_struct *bpfsec;
+	u32 sid = cred_sid(cred);
+	struct bpf_prog *prog;
+	struct bpf_map *map;
+	int ret;
+
+	if (file->f_op == &bpf_map_fops) {
+		map = file->private_data;
+		bpfsec = map->security;
+		ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
+				   bpf_map_fmode_to_av(file->f_mode), NULL);
+		if (ret)
+			return ret;
+	} else if (file->f_op == &bpf_prog_fops) {
+		prog = file->private_data;
+		bpfsec = prog->aux->security;
+		ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
+				   BPF_PROG__USE, NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
 {
 	u32 sid = current_sid();
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* Re: [PATCH net-next] net: cache skb_shinfo() in skb_try_coalesce()
From: David Miller @ 2017-10-04 18:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1507139315.14419.5.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Oct 2017 10:48:35 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Compiler does not really know that skb_shinfo(to|from) are constants
> in skb_try_coalesce(), lets cache their values to shrink code.
> 
> We might even take care of skb_zcopy() calls later.
> 
> $ size net/core/skbuff.o.before net/core/skbuff.o
>    text	   data	    bss	    dec	    hex	filename
>   40727	   1298	      0	  42025	   a429	net/core/skbuff.o.before
>   40631	   1298	      0	  41929	   a3c9	net/core/skbuff.o
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v3 3/3] tools: bpftool: add documentation
From: Jesper Dangaard Brouer @ 2017-10-04 18:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: brouer, netdev, alexei.starovoitov, daniel, dsahern, oss-drivers,
	David Beckett, linux-doc@vger.kernel.org
In-Reply-To: <20171004154032.11413-4-jakub.kicinski@netronome.com>

On Wed,  4 Oct 2017 08:40:32 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> Add documentation for bpftool.  Separate files for each subcommand.
> Use rst format.  Documentation is compiled into man pages using
> rst2man.
> 
> Signed-off-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/bpf/bpftool/Documentation/Makefile         |  34 +++++++
>  tools/bpf/bpftool/Documentation/bpftool-map.txt  | 110 +++++++++++++++++++++++
>  tools/bpf/bpftool/Documentation/bpftool-prog.txt |  79 ++++++++++++++++
>  tools/bpf/bpftool/Documentation/bpftool.txt      |  34 +++++++

RST-format files are usually called .rst and not .txt

This is useful when people happen to browse the code via github, then they get formatted nicely e.g.:
 https://github.com/torvalds/linux/blob/master/samples/bpf/README.rst
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next 5/7] net: bonding: Add extack messages for some enslave failures
From: Jiri Pirko @ 2017-10-04 18:39 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, j.vosburgh, vfalico, andy, jiri, idosch, davem, bridge
In-Reply-To: <1d18c43d-0604-3d38-2b68-ca3c7d1ab754@gmail.com>

Wed, Oct 04, 2017 at 08:06:16PM CEST, dsahern@gmail.com wrote:
>On 10/4/17 11:04 AM, Jiri Pirko wrote:
>> Wed, Oct 04, 2017 at 05:35:46PM CEST, dsahern@gmail.com wrote:
>>> On 10/3/17 11:38 PM, Jiri Pirko wrote:
>>>> Wed, Oct 04, 2017 at 06:58:52AM CEST, dsahern@gmail.com wrote:
>>>>> A number of bond_enslave errors are logged using the netdev_err API.
>>>>> Return those messages to userspace via the extack facility.
>>>>>
>>>>> Signed-off-by: David Ahern <dsahern@gmail.com>
>>>>> ---
>>>>> drivers/net/bonding/bond_main.c | 12 ++++++++++++
>>>>> 1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>> index bc92307c2082..6688dc9154e0 100644
>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>> @@ -1348,12 +1348,15 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>>>>
>>>>> 	/* already in-use? */
>>>>> 	if (netdev_is_rx_handler_busy(slave_dev)) {
>>>>> +		NL_SET_ERR_MSG(extack,
>>>>> +			       "Device is in use and cannot be enslaved");
>>>>
>>>> Please don't do this kind of wrapping. Just let the string be on the
>>>> same line.
>>>>
>>>
>>> Ok, I will do that for bonding only since it is the existing style.
>> 
>> I don't believe you need to do this wrap for any code. Just don't wrap.
>> General code stype says no wrap for strings :)
>> 
>
>I do not break / wrap strings; they need to be searchable. I assumed you
>meant this is preferred for bonding:
>
>NL_SET_ERR_MSG(extack, "Device is in use and cannot be enslaved");

Yep.

>
>
>over what I have done:
>
>NL_SET_ERR_MSG(extack,
>	       "Device is in use and cannot be enslaved");

^ permalink raw reply

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Matthias Kaehlcke @ 2017-10-04 18:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Manoj Gupta, David S . Miller, Simon Horman, Dirk van der Merwe,
	oss-drivers, netdev, linux-kernel, Renato Golin, Guenter Roeck,
	Doug Anderson
In-Reply-To: <20171004111724.29a4d7ff@cakuba.netronome.com>

El Wed, Oct 04, 2017 at 11:17:24AM -0700 Jakub Kicinski ha dit:

> On Wed, 4 Oct 2017 10:42:42 -0700, Manoj Gupta wrote:
> > Hi Jakub,
> > 
> > I had discussed about supporting this code with some clang developers.
> > However, the consensus was this code relies on a specific GCC optimizer
> > behavior and Clang does not share the same behavior by design.
> 
> Hm.  I find surprising that constant propagation to inlined functions
> is considered something GCC-specific rather than obvious/basic.
> 
> > Note that even GCC can't compile this code at -O0.
> 
> Yes, that makes me feel uncomfortable...  Could you try this?

The kernel as a whole does not build with -O0, so I couldn't try
that. I know Manoj (who isn't a kernel dev) isolated the
compiletime_assert macros and encountered the same 'undefined
reference' errors with gcc -O0 that we are seeing with clang.
This is due to:

 "if you use it in an inlined function and pass an argument of the
 function as the argument to the built-in, GCC never returns 1 when
 you call the inline function with a string constant or compound
 literal (see Compound Literals) and does not return 1 when you pass a
 constant numeric value to the inline function unless you specify the
 -O option."

https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Other-Builtins.html

> diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> index f6f7c085f8e0..47251396fcae 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> @@ -469,10 +469,10 @@ int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed)
>  	return nfp_eth_config_commit_end(nsp);
>  }
>  
> -/* Force inline, FIELD_* macroes require masks to be compilation-time known */
> -static __always_inline int
> +static int
>  nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
> -		       const u64 mask, unsigned int val, const u64 ctrl_bit)
> +		       const u64 mask, const unsigned int shift,
> +		       unsigned int val, const u64 ctrl_bit)
>  {
>  	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
>  	unsigned int idx = nfp_nsp_config_idx(nsp);
> @@ -489,11 +489,11 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>  
>  	/* Check if we are already in requested state */
>  	reg = le64_to_cpu(entries[idx].raw[raw_idx]);
> -	if (val == FIELD_GET(mask, reg))
> +	if (val == (reg & mask) >> shift)
>  		return 0;
>  
>  	reg &= ~mask;
> -	reg |= FIELD_PREP(mask, val);
> +	reg |= (val << shift) & mask;
>  	entries[idx].raw[raw_idx] = cpu_to_le64(reg);
>  
>  	entries[idx].control |= cpu_to_le64(ctrl_bit);
> @@ -503,6 +503,13 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>  	return 0;
>  }
>  
> +#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)	\
> +	({								\
> +		__BF_FIELD_CHECK(mask, 0ULL, val, "NFP_ETH_SET_BIT_CONFIG: "); \
> +		nfp_eth_set_bit_config(nsp, raw_idx, mask, __bf_shf(mask), \
> +				       val, ctrl_bit);			\
> +	})
> +
>  /**
>   * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
>   * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
> @@ -515,7 +522,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>   */
>  int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
>  {
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
>  				      NSP_ETH_STATE_ANEG, mode,
>  				      NSP_ETH_CTRL_SET_ANEG);
>  }
> @@ -544,7 +551,7 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
>  		return -EINVAL;
>  	}
>  
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
>  				      NSP_ETH_STATE_RATE, rate,
>  				      NSP_ETH_CTRL_SET_RATE);
>  }
> @@ -561,6 +568,6 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
>   */
>  int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
>  {
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
>  				      lanes, NSP_ETH_CTRL_SET_LANES);
>  }

^ permalink raw reply

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Matthias Kaehlcke @ 2017-10-04 18:49 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jakub Kicinski, David S . Miller, Simon Horman,
	Dirk van der Merwe, oss-drivers, netdev, linux-kernel,
	Renato Golin, Manoj Gupta, Guenter Roeck, Doug Anderson
In-Reply-To: <1507140439.4434.14.camel@perches.com>

Hi Joe,

El Wed, Oct 04, 2017 at 11:07:19AM -0700 Joe Perches ha dit:

> On Tue, 2017-10-03 at 13:05 -0700, Matthias Kaehlcke wrote:
> > nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
> > identify the 'mask' parameter as known to be constant at compile time,
> > which is required to use the FIELD_GET() macro.
> > 
> > The forced inlining does the trick for gcc, but for kernel builds with
> > clang it results in undefined symbols:
> 
> Can't you use local different FIELD_PREP/FIELD_GET macros
> with a different name without the BUILD_BUG tests?
> 
> i.e.:
> 
> #define NFP_FIELD_PREP(_mask, _val)				\
> ({								\
> 	((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask);	\
> })
> 
> #define NFP_FIELD_GET(_mask, _reg)				\
> ({								\
> 	(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask));	\
> })
> 
> Then the __always_inline can be removed from
> nfp_eth_set_bit_config too.

Thanks for the suggestion. This seems a viable alternative if David
and the NFP owners can live without the extra checking provided by
__BF_FIELD_CHECK.

^ permalink raw reply

* Re: [PATCH v2 net-next] ravb: RX checksum offload
From: Sergei Shtylyov @ 2017-10-04 18:53 UTC (permalink / raw)
  To: Simon Horman, David Miller; +Cc: Magnus Damm, netdev, linux-renesas-soc
In-Reply-To: <1507103667-6084-1-git-send-email-horms+renesas@verge.net.au>

On 10/04/2017 10:54 AM, Simon Horman wrote:

> Add support for RX checksum offload. This is enabled by default and
> may be disabled and re-enabled using ethtool:
> 
>   # ethtool -K eth0 rx off
>   # ethtool -K eth0 rx on
> 
> The RAVB provides a simple checksumming scheme which appears to be
> completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
> the L2 header is appended to packet data; this may be trivially read by the
> driver and used to update the skb accordingly.
> 
> In terms of performance throughput is close to gigabit line-rate both with
> and without RX checksum offload enabled. Perf output, however, appears to
> indicate that significantly less time is spent in do_csum(). This is as
> expected.
> 
> Test results with RX checksum offload enabled:
>   # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
>   MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
>   enable_enobufs failed: getprotobyname
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
> 
>    87380  16384  16384    10.00     937.54
> 
>   Summary of output of perf report:
>      18.28%      ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      10.34%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy
>       9.83%      ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll
>       7.89%      ksoftirqd/0  [kernel.kallsyms]  [k] skb_put
>       4.01%      ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
>       3.37%          netperf  [kernel.kallsyms]  [k] __arch_copy_to_user
>       3.17%          swapper  [kernel.kallsyms]  [k] arch_cpu_idle
>       2.55%          swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter
>       2.04%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_dcache_area
>       2.03%          swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
>       1.96%      ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb
>       1.59%      ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.83
> 
> Test results without RX checksum offload enabled:
>   # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
>   MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
>   enable_enobufs failed: getprotobyname
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
> 
>    87380  16384  16384    10.00     940.20
> 
>   Summary of output of perf report:
>      17.10%    ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      10.99%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy
>       8.87%    ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll
>       8.16%    ksoftirqd/0  [kernel.kallsyms]  [k] skb_put
>       7.42%    ksoftirqd/0  [kernel.kallsyms]  [k] do_csum
>       3.91%    ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
>       2.31%        swapper  [kernel.kallsyms]  [k] arch_cpu_idle
>       2.16%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_dcache_area
>       2.14%    ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb
>       1.93%        netperf  [kernel.kallsyms]  [k] __arch_copy_to_user
>       1.79%        swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter
>       1.63%    ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.83
> 
> Above results collected on an R-Car Gen 3 Salvator-X/r8a7796 ES1.0.
> Also tested on a R-Car Gen 3 Salvator-X/r8a7795 ES1.0.
> 
> By inspection this also appears to be compatible with the ravb found
> on R-Car Gen 2 SoCs, however, this patch is currently untested on such
> hardware.
> 
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
> 
> ---
> v2
> Address review of v1 by Sergei Shtylyov
> * set features rather than oring them with (zero) existing values:
> * Set/unset using a single call to ravb_modify()

   Already belated but still:

Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 0/3] A own subdirectory for shared TCP code
From: Richard Siegfried @ 2017-10-04 18:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20171003.160338.1370433279322133749.davem@davemloft.net>

On 04/10/17 01:03, David Miller wrote:
> As someone who has to do backports regularly to -stable, there is no way
> I am applying this.
> 
> Sorry.
Okay, I see.

Is grouping files into subdirectories something generally
unwanted/unlikely to be applied or is this specific to TCP / networking?

Because there are several other places in the source tree where I would
like to group things.

^ permalink raw reply

* Re: [PATCH v6 05/11] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY
From: Corentin Labbe @ 2017-10-04 19:00 UTC (permalink / raw)
  To: Andrew Lunn, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	maxime.ripard-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, wens-jdAy2FN1RRM
  Cc: mark.rutland-5wv7dgnIgG8, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20170928073708.GB32676@Red>

On Thu, Sep 28, 2017 at 09:37:08AM +0200, Corentin Labbe wrote:
> On Wed, Sep 27, 2017 at 04:02:10PM +0200, Andrew Lunn wrote:
> > Hi Corentin
> > 
> > > +Required properties for the mdio-mux node:
> > > +  - compatible = "mdio-mux"
> > 
> > This is too generic. Please add a more specific compatible for this
> > particular mux. You can keep "mdio-mux", since that is what the MDIO
> > subsystem will look for.
> > 
> 
> I will add allwinner,sun8i-h3-mdio-mux
> 
> > > +Required properties of the integrated phy node:
> > >  - clocks: a phandle to the reference clock for the EPHY
> > >  - resets: a phandle to the reset control for the EPHY
> > > +- phy-is-integrated
> > 
> > So the last thing you said is that the mux is not the problem
> > here. Something else is locking up. Did you discover what?
> > 
> > I really would like phy-is-integrated to go away.
> > 
> 
> I have found the problem: by enabling ephy clk/reset the timeout does not occur anymore.
> So we could remove phy-is-integrated by:
> Moving internal phy clk/reset handling in mdio_mux_syscon_switch_fn()
> But this means:
> - getting internalphy node always by manually get internal_mdio/internal_phy (and not by the given phyhandle)
> - doing some unnecessary tasks (enable/scan/disable) when external_phy is needed
> 

Hello

I have get rid of phy-is-integrated, but mdio_mux_syscon_switch_fn need to enable/disable ephy clk/reset.
And so access to internal PHY node.
But current DT made this ugly: (need to find mdio-mux then internalmdio then internal PHY)

Since MAC cannot reset/choose internal MDIO without ephy clk/rst, could we interpret this as thoses clk/rst must be set in emac node.
This will simplify a lot the code.

Regards

^ permalink raw reply

* Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Alexei Starovoitov @ 2017-10-04 19:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Andy Gospodarek
In-Reply-To: <150711862505.9499.15042356194495111353.stgit@firesoul>

On Wed, Oct 04, 2017 at 02:03:45PM +0200, Jesper Dangaard Brouer wrote:
> The 'cpumap' is primary used as a backend map for XDP BPF helper
> call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
> 
> This patch implement the main part of the map.  It is not connected to
> the XDP redirect system yet, and no SKB allocation are done yet.
> 
> The main concern in this patch is to ensure the datapath can run
> without any locking.  This adds complexity to the setup and tear-down
> procedure, which assumptions are extra carefully documented in the
> code comments.
> 
> V2:
>  - make sure array isn't larger than NR_CPUS
>  - make sure CPUs added is a valid possible CPU
> 
> V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
...
> +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
> +{
> +	struct bpf_cpu_map *cmap;
> +	u64 cost;
> +	int err;
> +
> +	/* check sanity of attributes */
> +	if (attr->max_entries == 0 || attr->key_size != 4 ||
> +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
> +		return ERR_PTR(-EINVAL);
> +
> +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
> +	if (!cmap)
> +		return ERR_PTR(-ENOMEM);

just noticed that there is nothing here nor in DEVMAP/SOCKMAP
that prevents unpriv user to create them.
I'm not sure it was intentional for DEVMAP/SOCKMAP.
For CPUMAP I'd suggest to restrict it to root, since it
suppose to operate with XDP only which is root anyway.
Note, lpm and lru maps are cap_sys_admin only already.

^ permalink raw reply

* Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Simon Horman @ 2017-10-04 19:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tom Herbert, David Miller, Jiri Pirko, Jamal Hadi Salim,
	Cong Wang, Linux Kernel Network Developers, oss-drivers
In-Reply-To: <20171004180715.GH1895@nanopsycho>

On Wed, Oct 04, 2017 at 08:07:15PM +0200, Jiri Pirko wrote:
> Wed, Oct 04, 2017 at 05:52:54PM CEST, tom@herbertland.com wrote:
> >On Wed, Oct 4, 2017 at 1:15 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >> Wed, Oct 04, 2017 at 10:08:57AM CEST, simon.horman@netronome.com wrote:
> >>>On Tue, Oct 03, 2017 at 11:17:46AM -0700, Tom Herbert wrote:
> >>>> On Tue, Oct 3, 2017 at 2:40 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>>> > On Mon, Oct 02, 2017 at 01:37:55PM -0700, Tom Herbert wrote:
> >>>> >> On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>>> >> > Move dissection of tunnel info from the flower classifier to the flow
> >>>> >> > dissector where all other dissection occurs.  This should not have any
> >>>> >> > behavioural affect on other users of the flow dissector.
> >>>> >
> >>>> > ...
> >>>>
> >>>> > I feel that we are circling back the perennial issue of flower using the
> >>>> > flow dissector in a somewhat broader/different way than many/all other
> >>>> > users of the flow dissector.
> >>>> >
> >>>> Simon,
> >>>>
> >>>> It's more like __skb_flow_dissect is already an incredibly complex
> >>>> function and because of that it's difficult to maintain. We need to
> >>>> measure changes against that fact. For this patch, there is precisely
> >>>> one user (cls_flower.c) and it's not at all clear to me if there will
> >>>> be ever any more (e.g. for hashing we don't need tunnel info). IMO, it
> >>>> should be just as easy and less convolution for everyone to have
> >>>> flower call __skb_flow_dissect_tunnel_info directly and not call if
> >>>> from __skb_flow_dissect.
> >>>
> >>>Hi Tom,
> >>>
> >>>my original suggestion was just that, but Jiri indicated a strong preference
> >>>for the approach taken by this patch. I think we need to widen the
> >>>participants in this discussion.
> >>
> >> I like the __skb_flow_dissect to be the function to call and it will do
> >> the job according to the configuration. I don't like to split in
> >> multiple calls.
> >
> >Those are not technical arguments. As I already mentioned, I don't
> >like it when we add stuff for the benefit of a 1% use case that
> >negatively impacts the rest of the 99% cases which is what I believe
> >is happening here.
> 
> Yeah. I just wanted the flow dissector to stay compact. But if needed,
> could be split. I just fear that it will become a mess that's all.
> 
> 
> >
> >> Does not make sense in the most of the cases as the
> >> dissection state would have to be carried in between calls.
> >
> >Please elaborate. This code is being moved into __skb_flow_dissect, so
> >the functionality was already there. I don't see any description in
> >this discussion that things were broken and that this patch is a
> >necessary fix.
> 
> Yeah, you are right.

Hi Tom, Hi Jiri,

I'm happy to make a patch to move the call to
__skb_flow_dissect_tunnel_info() from __skb_flow_dissect() to
fl_classify(). It seems that approach has been agreed on above.

^ permalink raw reply

* [PATCH net v2] RDS: IB: Limit the scope of has_fr/has_fmr variables
From: Avinash Repaka @ 2017-10-04 19:10 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Avinash Repaka

This patch fixes the scope of has_fr and has_fmr variables as they are
needed only in rds_ib_add_one().

Signed-off-by: Avinash Repaka <avinash.repaka-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

---
 net/rds/ib.c | 11 ++++++-----
 net/rds/ib.h |  2 --
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index a0954ac..36dd209 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -126,6 +126,7 @@ void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 static void rds_ib_add_one(struct ib_device *device)
 {
 	struct rds_ib_device *rds_ibdev;
+	bool has_fr, has_fmr;
 
 	/* Only handle IB (no iWARP) devices */
 	if (device->node_type != RDMA_NODE_IB_CA)
@@ -143,11 +144,11 @@ static void rds_ib_add_one(struct ib_device *device)
 	rds_ibdev->max_wrs = device->attrs.max_qp_wr;
 	rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);
 
-	rds_ibdev->has_fr = (device->attrs.device_cap_flags &
-				  IB_DEVICE_MEM_MGT_EXTENSIONS);
-	rds_ibdev->has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
-			    device->map_phys_fmr && device->unmap_fmr);
-	rds_ibdev->use_fastreg = (rds_ibdev->has_fr && !rds_ibdev->has_fmr);
+	has_fr = (device->attrs.device_cap_flags &
+		  IB_DEVICE_MEM_MGT_EXTENSIONS);
+	has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
+		   device->map_phys_fmr && device->unmap_fmr);
+	rds_ibdev->use_fastreg = (has_fr && !has_fmr);
 
 	rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
 	rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
diff --git a/net/rds/ib.h b/net/rds/ib.h
index bf48224..6ea6a27 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -215,8 +215,6 @@ struct rds_ib_device {
 	struct list_head	conn_list;
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
-	bool                    has_fmr;
-	bool                    has_fr;
 	bool                    use_fastreg;
 
 	unsigned int		max_mrs;
-- 
2.4.11

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net v2] RDS: IB: Initialize max_items based on underlying device attributes
From: Avinash Repaka @ 2017-10-04 19:11 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, netdev, linux-rdma, rds-devel,
	linux-kernel
  Cc: Avinash Repaka

Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
variables are set based on the underlying device attributes.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
v1->v2
Fixed a typo in commit description

 net/rds/ib_rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 9a3c54e..e678699 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -601,11 +601,11 @@ struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_ibdev,
 	if (pool_type == RDS_IB_MR_1M_POOL) {
 		/* +1 allows for unaligned MRs */
 		pool->fmr_attr.max_pages = RDS_MR_1M_MSG_SIZE + 1;
-		pool->max_items = RDS_MR_1M_POOL_SIZE;
+		pool->max_items = rds_ibdev->max_1m_mrs;
 	} else {
 		/* pool_type == RDS_IB_MR_8K_POOL */
 		pool->fmr_attr.max_pages = RDS_MR_8K_MSG_SIZE + 1;
-		pool->max_items = RDS_MR_8K_POOL_SIZE;
+		pool->max_items = rds_ibdev->max_8k_mrs;
 	}
 
 	pool->max_free_pinned = pool->max_items * pool->fmr_attr.max_pages / 4;
-- 
2.4.11

^ permalink raw reply related

* Re: [RFC] bpf: remove global verifier state
From: Daniel Borkmann @ 2017-10-04 19:13 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: Jakub Kicinski, dsahern, netdev, oss-drivers, david.beckett
In-Reply-To: <20171004034311.t3uba7vqcvahly2q@ast-mbp>

On 10/04/2017 05:43 AM, Alexei Starovoitov wrote:
> On Tue, Oct 03, 2017 at 08:24:06PM -0700, Eric Dumazet wrote:
>> On Tue, 2017-10-03 at 19:52 -0700, Alexei Starovoitov wrote:
>>
>>> yep. looks great.
>>> Please test it and submit officially :)
>>> The commit aafe6ae9cee3 ("bpf: dynamically allocate digest scratch buffer")
>>> fixed the other case where we were relying on the above mutex.
>>> The only other spot to be adjusted is to add spin_lock/mutex or DO_ONCE() to
>>> bpf_get_skb_set_tunnel_proto() to protect md_dst init.
>>> imo that would be it.
>>> Daniel, anything else comes to mind?

Yes, this should be all. DO_ONCE() for the tunnel proto seems a
good choice.

>> 16 MB of log (unswappable kernel memory) per active checker.
>>
>> We might offer a way to oom hosts.
>
> right. good point!
> we need to switch to continuous copy_to_user() after a page or so.
> Can even do it after every vscnprintf()
> but page at a time is probably faster.

Also worst case upper limits on verification side for holding state
aside from the log would need to be checked in terms of how much mem
we end up holding that is not accounted against any process (and not
really "rate-limited" anymore once we drop the mutex).

^ permalink raw reply

* Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Jiri Pirko @ 2017-10-04 19:17 UTC (permalink / raw)
  To: Simon Horman
  Cc: Tom Herbert, David Miller, Jiri Pirko, Jamal Hadi Salim,
	Cong Wang, Linux Kernel Network Developers, oss-drivers
In-Reply-To: <20171004190716.GA22135@netronome.com>

Wed, Oct 04, 2017 at 09:07:17PM CEST, simon.horman@netronome.com wrote:
>On Wed, Oct 04, 2017 at 08:07:15PM +0200, Jiri Pirko wrote:
>> Wed, Oct 04, 2017 at 05:52:54PM CEST, tom@herbertland.com wrote:
>> >On Wed, Oct 4, 2017 at 1:15 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> >> Wed, Oct 04, 2017 at 10:08:57AM CEST, simon.horman@netronome.com wrote:
>> >>>On Tue, Oct 03, 2017 at 11:17:46AM -0700, Tom Herbert wrote:
>> >>>> On Tue, Oct 3, 2017 at 2:40 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >>>> > On Mon, Oct 02, 2017 at 01:37:55PM -0700, Tom Herbert wrote:
>> >>>> >> On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >>>> >> > Move dissection of tunnel info from the flower classifier to the flow
>> >>>> >> > dissector where all other dissection occurs.  This should not have any
>> >>>> >> > behavioural affect on other users of the flow dissector.
>> >>>> >
>> >>>> > ...
>> >>>>
>> >>>> > I feel that we are circling back the perennial issue of flower using the
>> >>>> > flow dissector in a somewhat broader/different way than many/all other
>> >>>> > users of the flow dissector.
>> >>>> >
>> >>>> Simon,
>> >>>>
>> >>>> It's more like __skb_flow_dissect is already an incredibly complex
>> >>>> function and because of that it's difficult to maintain. We need to
>> >>>> measure changes against that fact. For this patch, there is precisely
>> >>>> one user (cls_flower.c) and it's not at all clear to me if there will
>> >>>> be ever any more (e.g. for hashing we don't need tunnel info). IMO, it
>> >>>> should be just as easy and less convolution for everyone to have
>> >>>> flower call __skb_flow_dissect_tunnel_info directly and not call if
>> >>>> from __skb_flow_dissect.
>> >>>
>> >>>Hi Tom,
>> >>>
>> >>>my original suggestion was just that, but Jiri indicated a strong preference
>> >>>for the approach taken by this patch. I think we need to widen the
>> >>>participants in this discussion.
>> >>
>> >> I like the __skb_flow_dissect to be the function to call and it will do
>> >> the job according to the configuration. I don't like to split in
>> >> multiple calls.
>> >
>> >Those are not technical arguments. As I already mentioned, I don't
>> >like it when we add stuff for the benefit of a 1% use case that
>> >negatively impacts the rest of the 99% cases which is what I believe
>> >is happening here.
>> 
>> Yeah. I just wanted the flow dissector to stay compact. But if needed,
>> could be split. I just fear that it will become a mess that's all.
>> 
>> 
>> >
>> >> Does not make sense in the most of the cases as the
>> >> dissection state would have to be carried in between calls.
>> >
>> >Please elaborate. This code is being moved into __skb_flow_dissect, so
>> >the functionality was already there. I don't see any description in
>> >this discussion that things were broken and that this patch is a
>> >necessary fix.
>> 
>> Yeah, you are right.
>
>Hi Tom, Hi Jiri,
>
>I'm happy to make a patch to move the call to
>__skb_flow_dissect_tunnel_info() from __skb_flow_dissect() to
>fl_classify(). It seems that approach has been agreed on above.

If the consensus is that the right way is to cut-out flow dissector,
so be it. But first, I believe it is reasonable to request to see some
numbers that would indicate that it actually resolves anything.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox