Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: High CPU load by native_queued_spin_lock_slowpath
From: Eric Dumazet @ 2017-10-10 14:07 UTC (permalink / raw)
  To: Sergey K.; +Cc: netdev
In-Reply-To: <CAGv8E+xZOweXEedymRuLt4idQmCM8Mk1TZOM=0p3YmnXbh-n+g@mail.gmail.com>

On Tue, 2017-10-10 at 18:00 +0600, Sergey K. wrote:
> I'm using Debian 9(stretch edition) kernel 4.9., hp dl385 g7 server
> with 32 cpu cores. NIC queues are tied to processor cores. Server is
> shaping traffic (iproute2 and htb discipline + skbinfo + ipset + ifb)
> and filtering some rules by iptables.
> 
> At that moment, when traffic goes up about 1gbit/s cpu is very high
> loaded. Perf tool tells me that kernel module
> native_queued_spin_lock_slowpath loading cpu about 40%.
> 
> After several hours of searching, I found that if I remove the htb
> discipline from ifb0, the high load goes down.
> Well, I think that problem with classify and shaping by htb.
> 
> Who knows how to solve?

You use a single ifb0 on the whole (multiqueue) device for ingress ?

What about multiple ifb instead, one per RX queue ?

Alternative is to reduce contention and use a single RX queue.

^ permalink raw reply

* [PATCH 0/4] adapt DPAA drivers for DSA
From: Madalin Bucur @ 2017-10-10 14:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: f.fainelli, andrew, vivien.didelot, junote, linux-kernel

Junote Cai reported that he was not able to get a DSA setup involving the
DPAA/FMAN driver to work and narrowed it down to of_find_net_device_by_node()
call in DSA setup. The initial attempt to fix this by adding of_node to the
platform device results in a second, failed, probing of the FMan MAC driver
against the new platform device created for the DPAA Ethernet driver.
Solve these issues by removing the of_node pointer from the platform device
and changing the net_dev dev to the of_device dev to ensure the DSA init
will be able to find the DPAA net_dev using of_find_net_device_by_node().
Several changes were required to enable this solution: refactoring the
adjust_link (also resulted in lesser, cleaner code) and renaming the fman
kernel modules to keep the legacy udev rules happy.

Madalin Bucur (4):
  fsl/fman: remove of_node
  dpaa_eth: move of_phy_connect() to the eth driver
  dpaa_eth: change device used
  fsl/fman: add dpaa in module names

 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  89 +++++++++-------
 drivers/net/ethernet/freescale/fman/Makefile   |  12 +--
 drivers/net/ethernet/freescale/fman/mac.c      | 135 +++++++------------------
 drivers/net/ethernet/freescale/fman/mac.h      |   6 +-
 4 files changed, 99 insertions(+), 143 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH 1/4] fsl/fman: remove of_node
From: Madalin Bucur @ 2017-10-10 14:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: f.fainelli, andrew, vivien.didelot, junote, linux-kernel
In-Reply-To: <1507644618-32006-1-git-send-email-madalin.bucur@nxp.com>

The FMan MAC driver allocates a platform device for the Ethernet
driver to probe on. Setting pdev->dev.of_node with the MAC node
triggers the MAC driver probing of the new platform device. While
this fails quickly and does not affect the functionality of the
drivers, it is incorrect and must be removed. This was added to
address a report that DSA code using of_find_net_device_by_node()
is unable to use the DPAA interfaces. Error message seen before
this fix:

fsl_mac dpaa-ethernet.0: __devm_request_mem_region(mac) failed
fsl_mac: probe of dpaa-ethernet.0 failed with error -16

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/fman/mac.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index 387eb4a..9a265f8 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -623,7 +623,6 @@ static struct platform_device *dpaa_eth_add_device(int fman_id,
 		goto no_mem;
 	}

-	pdev->dev.of_node = node;
 	pdev->dev.parent = priv->dev;
 	set_dma_ops(&pdev->dev, get_dma_ops(priv->dev));

-- 
2.1.0

^ permalink raw reply related

* [PATCH 2/4] dpaa_eth: move of_phy_connect() to the eth driver
From: Madalin Bucur @ 2017-10-10 14:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: f.fainelli, andrew, vivien.didelot, junote, linux-kernel
In-Reply-To: <1507644618-32006-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 48 +++++++++++--
 drivers/net/ethernet/freescale/fman/mac.c      | 97 ++++++--------------------
 drivers/net/ethernet/freescale/fman/mac.h      |  5 +-
 3 files changed, 66 insertions(+), 84 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 4225806..7cf61d6 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2435,6 +2435,48 @@ static void dpaa_eth_napi_disable(struct dpaa_priv *priv)
 	}
 }
 
+static void dpaa_adjust_link(struct net_device *net_dev)
+{
+	struct mac_device *mac_dev;
+	struct dpaa_priv *priv;
+
+	priv = netdev_priv(net_dev);
+	mac_dev = priv->mac_dev;
+	mac_dev->adjust_link(mac_dev);
+}
+
+static int dpaa_phy_init(struct net_device *net_dev)
+{
+	struct mac_device *mac_dev;
+	struct phy_device *phy_dev;
+	struct dpaa_priv *priv;
+
+	priv = netdev_priv(net_dev);
+	mac_dev = priv->mac_dev;
+
+	phy_dev = of_phy_connect(net_dev, mac_dev->phy_node,
+				 &dpaa_adjust_link, 0,
+				 mac_dev->phy_if);
+	if (!phy_dev) {
+		netif_err(priv, ifup, net_dev, "init_phy() failed\n");
+		return -ENODEV;
+	}
+
+	/* Remove any features not supported by the controller */
+	phy_dev->supported &= mac_dev->if_support;
+
+	/* Enable the symmetric and asymmetric PAUSE frame advertisements,
+	 * as most of the PHY drivers do not enable them by default.
+	 */
+	phy_dev->supported |= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+	phy_dev->advertising = phy_dev->supported;
+
+	mac_dev->phy_dev = phy_dev;
+	net_dev->phydev = phy_dev;
+
+	return 0;
+}
+
 static int dpaa_open(struct net_device *net_dev)
 {
 	struct mac_device *mac_dev;
@@ -2445,12 +2487,8 @@ static int dpaa_open(struct net_device *net_dev)
 	mac_dev = priv->mac_dev;
 	dpaa_eth_napi_enable(priv);
 
-	net_dev->phydev = mac_dev->init_phy(net_dev, priv->mac_dev);
-	if (!net_dev->phydev) {
-		netif_err(priv, ifup, net_dev, "init_phy() failed\n");
-		err = -ENODEV;
+	if (dpaa_phy_init(net_dev))
 		goto phy_init_failed;
-	}
 
 	for (i = 0; i < ARRAY_SIZE(mac_dev->port); i++) {
 		err = fman_port_enable(mac_dev->port[i]);
diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index 9a265f8..a0a3107 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -57,9 +57,7 @@ struct mac_priv_s {
 	struct device			*dev;
 	void __iomem			*vaddr;
 	u8				cell_index;
-	phy_interface_t			phy_if;
 	struct fman			*fman;
-	struct device_node		*phy_node;
 	struct device_node		*internal_phy_node;
 	/* List of multicast addresses */
 	struct list_head		mc_addr_list;
@@ -106,7 +104,7 @@ static void set_fman_mac_params(struct mac_device *mac_dev,
 			     resource_size(mac_dev->res));
 	memcpy(&params->addr, mac_dev->addr, sizeof(mac_dev->addr));
 	params->max_speed	= priv->max_speed;
-	params->phy_if		= priv->phy_if;
+	params->phy_if		= mac_dev->phy_if;
 	params->basex_if	= false;
 	params->mac_id		= priv->cell_index;
 	params->fm		= (void *)priv->fman;
@@ -419,15 +417,12 @@ void fman_get_pause_cfg(struct mac_device *mac_dev, bool *rx_pause,
 }
 EXPORT_SYMBOL(fman_get_pause_cfg);
 
-static void adjust_link_void(struct net_device *net_dev)
+static void adjust_link_void(struct mac_device *mac_dev)
 {
 }
 
-static void adjust_link_dtsec(struct net_device *net_dev)
+static void adjust_link_dtsec(struct mac_device *mac_dev)
 {
-	struct device *dev = net_dev->dev.parent;
-	struct dpaa_eth_data *eth_data = dev->platform_data;
-	struct mac_device *mac_dev = eth_data->mac_dev;
 	struct phy_device *phy_dev = mac_dev->phy_dev;
 	struct fman_mac *fman_mac;
 	bool rx_pause, tx_pause;
@@ -444,14 +439,12 @@ static void adjust_link_dtsec(struct net_device *net_dev)
 	fman_get_pause_cfg(mac_dev, &rx_pause, &tx_pause);
 	err = fman_set_mac_active_pause(mac_dev, rx_pause, tx_pause);
 	if (err < 0)
-		netdev_err(net_dev, "fman_set_mac_active_pause() = %d\n", err);
+		dev_err(mac_dev->priv->dev, "fman_set_mac_active_pause() = %d\n",
+			err);
 }
 
-static void adjust_link_memac(struct net_device *net_dev)
+static void adjust_link_memac(struct mac_device *mac_dev)
 {
-	struct device *dev = net_dev->dev.parent;
-	struct dpaa_eth_data *eth_data = dev->platform_data;
-	struct mac_device *mac_dev = eth_data->mac_dev;
 	struct phy_device *phy_dev = mac_dev->phy_dev;
 	struct fman_mac *fman_mac;
 	bool rx_pause, tx_pause;
@@ -463,60 +456,12 @@ static void adjust_link_memac(struct net_device *net_dev)
 	fman_get_pause_cfg(mac_dev, &rx_pause, &tx_pause);
 	err = fman_set_mac_active_pause(mac_dev, rx_pause, tx_pause);
 	if (err < 0)
-		netdev_err(net_dev, "fman_set_mac_active_pause() = %d\n", err);
-}
-
-/* Initializes driver's PHY state, and attaches to the PHY.
- * Returns 0 on success.
- */
-static struct phy_device *init_phy(struct net_device *net_dev,
-				   struct mac_device *mac_dev,
-				   void (*adj_lnk)(struct net_device *))
-{
-	struct phy_device	*phy_dev;
-	struct mac_priv_s	*priv = mac_dev->priv;
-
-	phy_dev = of_phy_connect(net_dev, priv->phy_node, adj_lnk, 0,
-				 priv->phy_if);
-	if (!phy_dev) {
-		netdev_err(net_dev, "Could not connect to PHY\n");
-		return NULL;
-	}
-
-	/* Remove any features not supported by the controller */
-	phy_dev->supported &= mac_dev->if_support;
-	/* Enable the symmetric and asymmetric PAUSE frame advertisements,
-	 * as most of the PHY drivers do not enable them by default.
-	 */
-	phy_dev->supported |= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
-	phy_dev->advertising = phy_dev->supported;
-
-	mac_dev->phy_dev = phy_dev;
-
-	return phy_dev;
-}
-
-static struct phy_device *dtsec_init_phy(struct net_device *net_dev,
-					 struct mac_device *mac_dev)
-{
-	return init_phy(net_dev, mac_dev, &adjust_link_dtsec);
-}
-
-static struct phy_device *tgec_init_phy(struct net_device *net_dev,
-					struct mac_device *mac_dev)
-{
-	return init_phy(net_dev, mac_dev, adjust_link_void);
-}
-
-static struct phy_device *memac_init_phy(struct net_device *net_dev,
-					 struct mac_device *mac_dev)
-{
-	return init_phy(net_dev, mac_dev, &adjust_link_memac);
+		dev_err(mac_dev->priv->dev, "fman_set_mac_active_pause() = %d\n",
+			err);
 }
 
 static void setup_dtsec(struct mac_device *mac_dev)
 {
-	mac_dev->init_phy		= dtsec_init_phy;
 	mac_dev->init			= dtsec_initialization;
 	mac_dev->set_promisc		= dtsec_set_promiscuous;
 	mac_dev->change_addr		= dtsec_modify_mac_address;
@@ -528,14 +473,13 @@ static void setup_dtsec(struct mac_device *mac_dev)
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
-
+	mac_dev->adjust_link            = adjust_link_dtsec;
 	mac_dev->priv->enable		= dtsec_enable;
 	mac_dev->priv->disable		= dtsec_disable;
 }
 
 static void setup_tgec(struct mac_device *mac_dev)
 {
-	mac_dev->init_phy		= tgec_init_phy;
 	mac_dev->init			= tgec_initialization;
 	mac_dev->set_promisc		= tgec_set_promiscuous;
 	mac_dev->change_addr		= tgec_modify_mac_address;
@@ -547,14 +491,13 @@ static void setup_tgec(struct mac_device *mac_dev)
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
-
+	mac_dev->adjust_link            = adjust_link_void;
 	mac_dev->priv->enable		= tgec_enable;
 	mac_dev->priv->disable		= tgec_disable;
 }
 
 static void setup_memac(struct mac_device *mac_dev)
 {
-	mac_dev->init_phy		= memac_init_phy;
 	mac_dev->init			= memac_initialization;
 	mac_dev->set_promisc		= memac_set_promiscuous;
 	mac_dev->change_addr		= memac_modify_mac_address;
@@ -566,7 +509,7 @@ static void setup_memac(struct mac_device *mac_dev)
 	mac_dev->set_multi		= set_multi;
 	mac_dev->start			= start;
 	mac_dev->stop			= stop;
-
+	mac_dev->adjust_link            = adjust_link_memac;
 	mac_dev->priv->enable		= memac_enable;
 	mac_dev->priv->disable		= memac_disable;
 }
@@ -850,13 +793,13 @@ static int mac_probe(struct platform_device *_of_dev)
 			 mac_node);
 		phy_if = PHY_INTERFACE_MODE_SGMII;
 	}
-	priv->phy_if = phy_if;
+	mac_dev->phy_if = phy_if;
 
-	priv->speed		= phy2speed[priv->phy_if];
+	priv->speed		= phy2speed[mac_dev->phy_if];
 	priv->max_speed		= priv->speed;
 	mac_dev->if_support	= DTSEC_SUPPORTED;
 	/* We don't support half-duplex in SGMII mode */
-	if (priv->phy_if == PHY_INTERFACE_MODE_SGMII)
+	if (mac_dev->phy_if == PHY_INTERFACE_MODE_SGMII)
 		mac_dev->if_support &= ~(SUPPORTED_10baseT_Half |
 					SUPPORTED_100baseT_Half);
 
@@ -865,12 +808,12 @@ static int mac_probe(struct platform_device *_of_dev)
 		mac_dev->if_support |= SUPPORTED_1000baseT_Full;
 
 	/* The 10G interface only supports one mode */
-	if (priv->phy_if == PHY_INTERFACE_MODE_XGMII)
+	if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
 		mac_dev->if_support = SUPPORTED_10000baseT_Full;
 
 	/* Get the rest of the PHY information */
-	priv->phy_node = of_parse_phandle(mac_node, "phy-handle", 0);
-	if (!priv->phy_node && of_phy_is_fixed_link(mac_node)) {
+	mac_dev->phy_node = of_parse_phandle(mac_node, "phy-handle", 0);
+	if (!mac_dev->phy_node && of_phy_is_fixed_link(mac_node)) {
 		struct phy_device *phy;
 
 		err = of_phy_register_fixed_link(mac_node);
@@ -884,8 +827,8 @@ static int mac_probe(struct platform_device *_of_dev)
 			goto _return_dev_set_drvdata;
 		}
 
-		priv->phy_node = of_node_get(mac_node);
-		phy = of_phy_find_device(priv->phy_node);
+		mac_dev->phy_node = of_node_get(mac_node);
+		phy = of_phy_find_device(mac_dev->phy_node);
 		if (!phy) {
 			err = -EINVAL;
 			goto _return_dev_set_drvdata;
@@ -903,7 +846,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	err = mac_dev->init(mac_dev);
 	if (err < 0) {
 		dev_err(dev, "mac_dev->init() = %d\n", err);
-		of_node_put(priv->phy_node);
+		of_node_put(mac_dev->phy_node);
 		goto _return_dev_set_drvdata;
 	}
 
diff --git a/drivers/net/ethernet/freescale/fman/mac.h b/drivers/net/ethernet/freescale/fman/mac.h
index d7313f0..1ca85a1 100644
--- a/drivers/net/ethernet/freescale/fman/mac.h
+++ b/drivers/net/ethernet/freescale/fman/mac.h
@@ -50,6 +50,8 @@ struct mac_device {
 	struct fman_port	*port[2];
 	u32			 if_support;
 	struct phy_device	*phy_dev;
+	phy_interface_t		phy_if;
+	struct device_node	*phy_node;
 
 	bool autoneg_pause;
 	bool rx_pause_req;
@@ -58,11 +60,10 @@ struct mac_device {
 	bool tx_pause_active;
 	bool promisc;
 
-	struct phy_device *(*init_phy)(struct net_device *net_dev,
-				       struct mac_device *mac_dev);
 	int (*init)(struct mac_device *mac_dev);
 	int (*start)(struct mac_device *mac_dev);
 	int (*stop)(struct mac_device *mac_dev);
+	void (*adjust_link)(struct mac_device *mac_dev);
 	int (*set_promisc)(struct fman_mac *mac_dev, bool enable);
 	int (*change_addr)(struct fman_mac *mac_dev, enet_addr_t *enet_addr);
 	int (*set_multi)(struct net_device *net_dev,
-- 
2.1.0

^ permalink raw reply related

* [PATCH 3/4] dpaa_eth: change device used
From: Madalin Bucur @ 2017-10-10 14:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: f.fainelli, andrew, vivien.didelot, junote, linux-kernel
In-Reply-To: <1507644618-32006-1-git-send-email-madalin.bucur@nxp.com>

Change device used for DMA mapping to the MAC device that is an
of_device, with proper DMA ops. Using this device for the netdevice
should also address the issue with DSA scenarios that need the
netdevice to be backed by an of_device.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 41 ++++++++------------------
 drivers/net/ethernet/freescale/fman/mac.c      | 37 ++++++++++-------------
 drivers/net/ethernet/freescale/fman/mac.h      |  1 -
 3 files changed, 27 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 7cf61d6..428ef2b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -385,34 +385,19 @@ static int dpaa_setup_tc(struct net_device *net_dev, enum tc_setup_type type,
 
 static struct mac_device *dpaa_mac_dev_get(struct platform_device *pdev)
 {
-	struct platform_device *of_dev;
 	struct dpaa_eth_data *eth_data;
-	struct device *dpaa_dev, *dev;
-	struct device_node *mac_node;
+	struct device *dpaa_dev;
 	struct mac_device *mac_dev;
 
 	dpaa_dev = &pdev->dev;
 	eth_data = dpaa_dev->platform_data;
-	if (!eth_data)
+	if (!eth_data) {
+		dev_err(dpaa_dev, "eth_data missing\n");
 		return ERR_PTR(-ENODEV);
-
-	mac_node = eth_data->mac_node;
-
-	of_dev = of_find_device_by_node(mac_node);
-	if (!of_dev) {
-		dev_err(dpaa_dev, "of_find_device_by_node(%pOF) failed\n",
-			mac_node);
-		of_node_put(mac_node);
-		return ERR_PTR(-EINVAL);
 	}
-	of_node_put(mac_node);
-
-	dev = &of_dev->dev;
-
-	mac_dev = dev_get_drvdata(dev);
+	mac_dev = eth_data->mac_dev;
 	if (!mac_dev) {
-		dev_err(dpaa_dev, "dev_get_drvdata(%s) failed\n",
-			dev_name(dev));
+		dev_err(dpaa_dev, "mac_dev missing\n");
 		return ERR_PTR(-EINVAL);
 	}
 
@@ -2696,7 +2681,13 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	int err = 0, i, channel;
 	struct device *dev;
 
-	dev = &pdev->dev;
+	/* device used for DMA mapping */
+	dev = pdev->dev.parent;
+	err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
+	if (err) {
+		dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
+		goto dev_mask_failed;
+	}
 
 	/* Allocate this early, so we can store relevant information in
 	 * the private area
@@ -2738,14 +2729,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	priv->buf_layout[RX].priv_data_size = DPAA_RX_PRIV_DATA_SIZE; /* Rx */
 	priv->buf_layout[TX].priv_data_size = DPAA_TX_PRIV_DATA_SIZE; /* Tx */
 
-	/* device used for DMA mapping */
-	set_dma_ops(dev, get_dma_ops(&pdev->dev));
-	err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
-	if (err) {
-		dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
-		goto dev_mask_failed;
-	}
-
 	/* bp init */
 	for (i = 0; i < DPAA_BPS_NUM; i++) {
 		int err;
diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index a0a3107..1d6da1e 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -542,8 +542,7 @@ static const u16 phy2speed[] = {
 };
 
 static struct platform_device *dpaa_eth_add_device(int fman_id,
-						   struct mac_device *mac_dev,
-						   struct device_node *node)
+						   struct mac_device *mac_dev)
 {
 	struct platform_device *pdev;
 	struct dpaa_eth_data data;
@@ -556,10 +555,8 @@ static struct platform_device *dpaa_eth_add_device(int fman_id,
 	data.mac_dev = mac_dev;
 	data.mac_hw_id = priv->cell_index;
 	data.fman_hw_id = fman_id;
-	data.mac_node = node;
 
 	mutex_lock(&eth_lock);
-
 	pdev = platform_device_alloc("dpaa-ethernet", dpaa_eth_dev_cnt);
 	if (!pdev) {
 		ret = -ENOMEM;
@@ -648,9 +645,6 @@ static int mac_probe(struct platform_device *_of_dev)
 		goto _return;
 	}
 
-	/* Register mac_dev */
-	dev_set_drvdata(dev, mac_dev);
-
 	INIT_LIST_HEAD(&priv->mc_addr_list);
 
 	/* Get the FM node */
@@ -659,7 +653,7 @@ static int mac_probe(struct platform_device *_of_dev)
 		dev_err(dev, "of_get_parent(%pOF) failed\n",
 			mac_node);
 		err = -EINVAL;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	of_dev = of_find_device_by_node(dev_node);
@@ -693,7 +687,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (err < 0) {
 		dev_err(dev, "of_address_to_resource(%pOF) = %d\n",
 			mac_node, err);
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	mac_dev->res = __devm_request_region(dev,
@@ -703,7 +697,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (!mac_dev->res) {
 		dev_err(dev, "__devm_request_mem_region(mac) failed\n");
 		err = -EBUSY;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	priv->vaddr = devm_ioremap(dev, mac_dev->res->start,
@@ -711,7 +705,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (!priv->vaddr) {
 		dev_err(dev, "devm_ioremap() failed\n");
 		err = -EIO;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	if (!of_device_is_available(mac_node)) {
@@ -728,7 +722,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (err) {
 		dev_err(dev, "failed to read cell-index for %pOF\n", mac_node);
 		err = -EINVAL;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 	priv->cell_index = (u8)val;
 
@@ -737,7 +731,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (!mac_addr) {
 		dev_err(dev, "of_get_mac_address(%pOF) failed\n", mac_node);
 		err = -EINVAL;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 	memcpy(mac_dev->addr, mac_addr, sizeof(mac_dev->addr));
 
@@ -747,14 +741,14 @@ static int mac_probe(struct platform_device *_of_dev)
 		dev_err(dev, "of_count_phandle_with_args(%pOF, fsl,fman-ports) failed\n",
 			mac_node);
 		err = nph;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	if (nph != ARRAY_SIZE(mac_dev->port)) {
 		dev_err(dev, "Not supported number of fman-ports handles of mac node %pOF from device tree\n",
 			mac_node);
 		err = -EINVAL;
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	for (i = 0; i < ARRAY_SIZE(mac_dev->port); i++) {
@@ -818,20 +812,20 @@ static int mac_probe(struct platform_device *_of_dev)
 
 		err = of_phy_register_fixed_link(mac_node);
 		if (err)
-			goto _return_dev_set_drvdata;
+			goto _return_of_get_parent;
 
 		priv->fixed_link = kzalloc(sizeof(*priv->fixed_link),
 					   GFP_KERNEL);
 		if (!priv->fixed_link) {
 			err = -ENOMEM;
-			goto _return_dev_set_drvdata;
+			goto _return_of_get_parent;
 		}
 
 		mac_dev->phy_node = of_node_get(mac_node);
 		phy = of_phy_find_device(mac_dev->phy_node);
 		if (!phy) {
 			err = -EINVAL;
-			goto _return_dev_set_drvdata;
+			goto _return_of_get_parent;
 		}
 
 		priv->fixed_link->link = phy->link;
@@ -847,7 +841,7 @@ static int mac_probe(struct platform_device *_of_dev)
 	if (err < 0) {
 		dev_err(dev, "mac_dev->init() = %d\n", err);
 		of_node_put(mac_dev->phy_node);
-		goto _return_dev_set_drvdata;
+		goto _return_of_get_parent;
 	}
 
 	/* pause frame autonegotiation enabled */
@@ -868,7 +862,7 @@ static int mac_probe(struct platform_device *_of_dev)
 		 mac_dev->addr[0], mac_dev->addr[1], mac_dev->addr[2],
 		 mac_dev->addr[3], mac_dev->addr[4], mac_dev->addr[5]);
 
-	priv->eth_dev = dpaa_eth_add_device(fman_id, mac_dev, mac_node);
+	priv->eth_dev = dpaa_eth_add_device(fman_id, mac_dev);
 	if (IS_ERR(priv->eth_dev)) {
 		dev_err(dev, "failed to add Ethernet platform device for MAC %d\n",
 			priv->cell_index);
@@ -879,9 +873,8 @@ static int mac_probe(struct platform_device *_of_dev)
 
 _return_of_node_put:
 	of_node_put(dev_node);
-_return_dev_set_drvdata:
+_return_of_get_parent:
 	kfree(priv->fixed_link);
-	dev_set_drvdata(dev, NULL);
 _return:
 	return err;
 }
diff --git a/drivers/net/ethernet/freescale/fman/mac.h b/drivers/net/ethernet/freescale/fman/mac.h
index 1ca85a1..eefb335 100644
--- a/drivers/net/ethernet/freescale/fman/mac.h
+++ b/drivers/net/ethernet/freescale/fman/mac.h
@@ -83,7 +83,6 @@ struct mac_device {
 };
 
 struct dpaa_eth_data {
-	struct device_node *mac_node;
 	struct mac_device *mac_dev;
 	int mac_hw_id;
 	int fman_hw_id;
-- 
2.1.0

^ permalink raw reply related

* [PATCH 4/4] fsl/fman: add dpaa in module names
From: Madalin Bucur @ 2017-10-10 14:10 UTC (permalink / raw)
  To: netdev, davem; +Cc: f.fainelli, andrew, vivien.didelot, junote, linux-kernel
In-Reply-To: <1507644618-32006-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/fman/Makefile | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 2c38119..4ae524a 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -1,9 +1,9 @@
 subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
 
-obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
-obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
-obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
+obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_fman.o
+obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_fman_port.o
+obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_mac.o
 
-fsl_fman-objs	:= fman_muram.o fman.o fman_sp.o fman_keygen.o
-fsl_fman_port-objs := fman_port.o
-fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
+fsl_dpaa_fman-objs	:= fman_muram.o fman.o fman_sp.o fman_keygen.o
+fsl_dpaa_fman_port-objs := fman_port.o
+fsl_dpaa_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
-- 
2.1.0

^ permalink raw reply related

* Re: [RFC net 1/1] net: sched: act: fix rcu race in dump
From: Eric Dumazet @ 2017-10-10 14:12 UTC (permalink / raw)
  To: Alexander Aring; +Cc: jhs, xiyou.wangcong, jiri, netdev, kurup.manish, bjb
In-Reply-To: <20171010123218.5251-2-aring@mojatatu.com>

On Tue, 2017-10-10 at 08:32 -0400, Alexander Aring wrote:
> This patch fixes an issue with kfree_rcu which is not protected by RTNL
> lock. It could be that the current assigned rcu pointer will be freed by
> kfree_rcu while dump callback is running.
> 
> To prevent this, we call rcu_synchronize at first. Then we are sure all
> latest rcu functions e.g. rcu_assign_pointer and kfree_rcu in init are
> done. After rcu_synchronize we dereference under RTNL lock which is also
> held in init function, which means no other rcu_assign_pointer or
> kfree_rcu will occur.
> 
> To call rcu_synchronize will also prevent weird behaviours by doing over
> netlink:
> 
>  - set params A
>  - set params B
>  - dump params
>   \--> will dump params A
> 
> This could be a unlikely case that the last rcu_assign_pointer was not
> happened before dump callback.
> 
> Signed-off-by: Alexander Aring <aring@mojatatu.com>
> ---
>  net/sched/act_skbmod.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index b642ad3d39dd..231e07bca384 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -198,7 +198,7 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  {
>  	struct tcf_skbmod *d = to_skbmod(a);
>  	unsigned char *b = skb_tail_pointer(skb);
> -	struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
> +	struct tcf_skbmod_params  *p;
>  	struct tc_skbmod opt = {
>  		.index   = d->tcf_index,
>  		.refcnt  = d->tcf_refcnt - ref,
> @@ -207,6 +207,11 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
>  	};
>  	struct tcf_t t;
>  
> +	/* wait until last rcu_assign_pointer/kfree_rcu is done */
> +	rcu_synchronize();
> +	/* RTNL lock prevents another rcu_assign_pointer/kfree_rcu call */
> +	p = rtnl_dereference(d->skbmod_p);
> +
>  	opt.flags  = p->flags;
>  	if (nla_put(skb, TCA_SKBMOD_PARMS, sizeof(opt), &opt))
>  		goto nla_put_failure;

Sorry but no. This is plainly wrong.

We need to fix this without adding a _very_ expensive rcu_synchronize()
on a path which does not need such thing.

I am confused by this patch, please tell us more what the problem is.

I suspect rcu_read_lock() is what you need, but isn't a writer supposed
to hold RTNL in net/sched/* ???

^ permalink raw reply

* [PATCH net 0/2] net/smc: ib_query_gid() patches
From: Ursula Braun @ 2017-10-10 14:13 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA,
	jwi-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	heiko.carstens-tA70FqPdS9bQT0dZR+AlfA,
	raspl-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	parav-VPRAkNaXOzVWk0Htik3J/w

Dave,

triggered by Parav Pandit here are 2 cleanup patches for usage of
ib_query_gid() in the smc-code.

Thanks, Ursula

Ursula Braun (2):
  net/smc: replace function pointer get_netdev()
  net/smc: dev_put for netdev after usage of ib_query_gid()

 net/smc/smc_core.c | 11 +++++++----
 net/smc/smc_ib.c   | 26 +++++++++-----------------
 2 files changed, 16 insertions(+), 21 deletions(-)

-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net 1/2] net/smc: replace function pointer get_netdev()
From: Ursula Braun @ 2017-10-10 14:13 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA,
	jwi-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	heiko.carstens-tA70FqPdS9bQT0dZR+AlfA,
	raspl-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	parav-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <20171010141351.87700-1-ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

SMC should not open code the function pointer get_netdev of the
IB device. Replacing ib_query_gid(..., NULL) with
ib_query_gid(..., gid_attr) allows access to the netdev.

Signed-off-by: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Suggested-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/smc/smc_ib.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 0b5852299158..b428c0f6c782 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -369,26 +369,17 @@ void smc_ib_buf_unmap_sg(struct smc_ib_device *smcibdev,
 
 static int smc_ib_fill_gid_and_mac(struct smc_ib_device *smcibdev, u8 ibport)
 {
-	struct net_device *ndev;
+	struct ib_gid_attr gattr;
 	int rc;
 
 	rc = ib_query_gid(smcibdev->ibdev, ibport, 0,
-			  &smcibdev->gid[ibport - 1], NULL);
-	/* the SMC protocol requires specification of the roce MAC address;
-	 * if net_device cannot be determined, it can be derived from gid 0
-	 */
-	ndev = smcibdev->ibdev->get_netdev(smcibdev->ibdev, ibport);
-	if (ndev) {
-		memcpy(&smcibdev->mac, ndev->dev_addr, ETH_ALEN);
-		dev_put(ndev);
-	} else if (!rc) {
-		memcpy(&smcibdev->mac[ibport - 1][0],
-		       &smcibdev->gid[ibport - 1].raw[8], 3);
-		memcpy(&smcibdev->mac[ibport - 1][3],
-		       &smcibdev->gid[ibport - 1].raw[13], 3);
-		smcibdev->mac[ibport - 1][0] &= ~0x02;
-	}
-	return rc;
+			  &smcibdev->gid[ibport - 1], &gattr);
+	if (rc || !gattr.ndev)
+		return -ENODEV;
+
+	memcpy(smcibdev->mac[ibport - 1], gattr.ndev->dev_addr, ETH_ALEN);
+	dev_put(gattr.ndev);
+	return 0;
 }
 
 /* Create an identifier unique for this instance of SMC-R.
@@ -419,6 +410,7 @@ int smc_ib_remember_port_attr(struct smc_ib_device *smcibdev, u8 ibport)
 			   &smcibdev->pattr[ibport - 1]);
 	if (rc)
 		goto out;
+	/* the SMC protocol requires specification of the roce MAC address */
 	rc = smc_ib_fill_gid_and_mac(smcibdev, ibport);
 	if (rc)
 		goto out;
-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net 2/2] net/smc: dev_put for netdev after usage of ib_query_gid()
From: Ursula Braun @ 2017-10-10 14:13 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, linux-s390, jwi, schwidefsky, heiko.carstens,
	raspl, ubraun, parav
In-Reply-To: <20171010141351.87700-1-ubraun@linux.vnet.ibm.com>

For ROCEs ib_query_gid() takes a reference count on the net_device.
This reference count must be decreased by the caller.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/smc_core.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 20b66e79c5d6..e93a31ec3cc2 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -380,10 +380,13 @@ static int smc_link_determine_gid(struct smc_link_group *lgr)
 		if (ib_query_gid(lnk->smcibdev->ibdev, lnk->ibport, i, &gid,
 				 &gattr))
 			continue;
-		if (gattr.ndev &&
-		    (vlan_dev_vlan_id(gattr.ndev) == lgr->vlan_id)) {
-			lnk->gid = gid;
-			return 0;
+		if (gattr.ndev) {
+			if (vlan_dev_vlan_id(gattr.ndev) == lgr->vlan_id) {
+				lnk->gid = gid;
+				dev_put(gattr.ndev);
+				return 0;
+			}
+			dev_put(gattr.ndev);
 		}
 	}
 	return -ENODEV;
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 1/1] net/smc: add SMC rendezvous protocol
From: Ursula Braun @ 2017-10-10 14:14 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-s390, jwi, schwidefsky, heiko.carstens, raspl,
	hwippel, ubraun

From: Ursula Braun <ubraun@linux.vnet.ibm.com>

The SMC protocol [1] uses a rendezvous protocol to negotiate SMC
capability between peers. The current Linux implementation does not use
this rendezvous protocol and, thus, is not compliant to RFC7609 and
incompatible with other SMC implementations like in zOS. This patch adds
support for the SMC rendezvous protocol.

Details:

The SMC rendezvous protocol relies on the use of a new TCP experimental
option. With this option, SMC capabilities are exchanged between the
peers during the TCP three way handshake.

The goal of this patch is to leave common TCP code unmodified. Thus,
it uses netfilter hooks to intercept TCP SYN and SYN/ACK packets. For
outgoing packets originating from SMC sockets, the experimental option
is added. For inbound packets destined for SMC sockets, the experimental
option is checked.

Another goal was to minimize the performance impact on non-SMC traffic
(when SMC is enabled). The netfilter hooks used for SMC client
connections are active only during TCP connection establishment.
The netfilter hooks used for SMC servers are active as long as there are
listening SMC sockets.

When the hooks are active, the following additional operations are
performed on incoming and outgoing packets:
  (1) call SMC netfilter hook (all IPv4 packets)
  (2) check if TCP SYN or SYN/ACK packet (all IPv4 packets)
  (3) check if packet goes to/comes from SMC socket (SYN & SYN/ACK
      packets only)
  (4) check/add SMC experimental option (SMC sockets' SYN & SYN/ACK
      packets only)

References:
  [1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/Kconfig  |   2 +-
 net/smc/Makefile |   2 +-
 net/smc/af_smc.c |  66 ++++++-
 net/smc/smc.h    |  10 +-
 net/smc/smc_rv.c | 543 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_rv.h |  31 ++++
 6 files changed, 646 insertions(+), 8 deletions(-)
 create mode 100644 net/smc/smc_rv.c
 create mode 100644 net/smc/smc_rv.h

diff --git a/net/smc/Kconfig b/net/smc/Kconfig
index c717ef0896aa..ad49086e8ed7 100644
--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -1,6 +1,6 @@
 config SMC
 	tristate "SMC socket protocol family"
-	depends on INET && INFINIBAND
+	depends on INET && INFINIBAND && NETFILTER
 	---help---
 	  SMC-R provides a "sockets over RDMA" solution making use of
 	  RDMA over Converged Ethernet (RoCE) technology to upgrade
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 188104654b54..2155a7eff41d 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_SMC)	+= smc.o
 obj-$(CONFIG_SMC_DIAG)	+= smc_diag.o
 smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
-smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o
+smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_rv.o
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 745f145d4c4d..290b9ff06e01 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -34,6 +34,7 @@
 #include <net/smc.h>
 
 #include "smc.h"
+#include "smc_rv.h"
 #include "smc_clc.h"
 #include "smc_llc.h"
 #include "smc_cdc.h"
@@ -109,6 +110,7 @@ static int smc_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
+	int old_state;
 	int rc = 0;
 
 	if (!sk)
@@ -123,6 +125,7 @@ static int smc_release(struct socket *sock)
 		lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
 	else
 		lock_sock(sk);
+	old_state = sk->sk_state;
 
 	if (smc->use_fallback) {
 		sk->sk_state = SMC_CLOSED;
@@ -132,6 +135,10 @@ static int smc_release(struct socket *sock)
 		sock_set_flag(sk, SOCK_DEAD);
 		sk->sk_shutdown |= SHUTDOWN_MASK;
 	}
+	if (old_state == SMC_LISTEN) {
+		smc_rv_nf_unregister_hook(sock_net(sk), &smc_nfho_serv);
+		kfree(smc->listen_pends);
+	}
 	if (smc->clcsock) {
 		sock_release(smc->clcsock);
 		smc->clcsock = NULL;
@@ -178,6 +185,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock)
 	sk->sk_destruct = smc_destruct;
 	sk->sk_protocol = SMCPROTO_SMC;
 	smc = smc_sk(sk);
+	smc->use_fallback = true; /* default: not SMC-capable */
 	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
 	INIT_LIST_HEAD(&smc->accept_q);
 	spin_lock_init(&smc->accept_q_lock);
@@ -390,6 +398,10 @@ static int smc_connect_rdma(struct smc_sock *smc)
 	int rc = 0;
 	u8 ibport;
 
+	if (smc->use_fallback)
+		/* peer has not signalled SMC-capability */
+		goto out_connected;
+
 	/* IPSec connections opt out of SMC-R optimizations */
 	if (using_ipsec(smc)) {
 		reason_code = SMC_CLC_DECL_IPSEC;
@@ -500,7 +512,6 @@ static int smc_connect_rdma(struct smc_sock *smc)
 	smc_tx_init(smc);
 
 out_connected:
-	smc_copy_sock_settings_to_clc(smc);
 	if (smc->sk.sk_state == SMC_INIT)
 		smc->sk.sk_state = SMC_ACTIVE;
 
@@ -555,7 +566,11 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
 	}
 
 	smc_copy_sock_settings_to_clc(smc);
+	smc_rv_nf_register_hook(sock_net(sk), &smc_nfho_clnt);
+
 	rc = kernel_connect(smc->clcsock, addr, alen, flags);
+	if (rc != -EINPROGRESS)
+		smc_rv_nf_unregister_hook(sock_net(sk), &smc_nfho_clnt);
 	if (rc)
 		goto out;
 
@@ -574,10 +589,12 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
 
 static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
 {
+	struct smc_listen_pending *pnd;
 	struct sock *sk = &lsmc->sk;
 	struct socket *new_clcsock;
 	struct sock *new_sk;
-	int rc;
+	unsigned long flags;
+	int i, rc;
 
 	release_sock(&lsmc->sk);
 	new_sk = smc_sock_alloc(sock_net(sk), NULL);
@@ -613,6 +630,25 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
 	}
 
 	(*new_smc)->clcsock = new_clcsock;
+
+	/* enable SMC-capability if an SMC-capable connecting socket is
+	 * contained in listen_pends; invalidate this entry
+	 */
+	spin_lock_irqsave(&lsmc->listen_pends_lock, flags);
+	for (i = 0; i < 2 * lsmc->sk.sk_max_ack_backlog; i++) {
+		pnd = lsmc->listen_pends + i;
+		if (pnd->used &&
+		    pnd->addr == new_clcsock->sk->sk_daddr &&
+		    pnd->port == new_clcsock->sk->sk_dport &&
+		    jiffies_to_msecs(get_jiffies_64() - pnd->time) <=
+						SMC_LISTEN_PEND_VALID_TIME) {
+			(*new_smc)->use_fallback = false;
+			pnd->used = false;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&lsmc->listen_pends_lock, flags);
+
 out:
 	return rc;
 }
@@ -759,6 +795,10 @@ static void smc_listen_work(struct work_struct *work)
 	u8 prefix_len;
 	u8 ibport;
 
+	if (new_smc->use_fallback)
+		/* peer has not signalled SMC-capability */
+		goto out_connected;
+
 	/* do inband token exchange -
 	 *wait for and receive SMC Proposal CLC message
 	 */
@@ -929,7 +969,6 @@ static void smc_tcp_listen_work(struct work_struct *work)
 			continue;
 
 		new_smc->listen_smc = lsmc;
-		new_smc->use_fallback = false; /* assume rdma capability first*/
 		sock_hold(&lsmc->sk); /* sock_put in smc_listen_work */
 		INIT_WORK(&new_smc->smc_listen_work, smc_listen_work);
 		smc_copy_sock_settings_to_smc(new_smc);
@@ -954,16 +993,32 @@ static int smc_listen(struct socket *sock, int backlog)
 	if ((sk->sk_state != SMC_INIT) && (sk->sk_state != SMC_LISTEN))
 		goto out;
 
+	rc = -ENOMEM;
+	/* Addresses and ports of incoming SYN packets with experimental option
+	 * SMC are saved, but TCP might decide to drop them. Thus more slots
+	 * than the backlog value are allocated for pending connecting sockets
+	 */
+	smc->listen_pends = kzalloc(
+			2 * backlog * sizeof(struct smc_listen_pending),
+			GFP_KERNEL);
+	if (!smc->listen_pends)
+		goto out;
+	spin_lock_init(&smc->listen_pends_lock);
+
 	rc = 0;
 	if (sk->sk_state == SMC_LISTEN) {
 		sk->sk_max_ack_backlog = backlog;
 		goto out;
 	}
+
+	smc->use_fallback = false; /* listen sockets are SMC-capable */
 	/* some socket options are handled in core, so we could not apply
 	 * them to the clc socket -- copy smc socket options to clc socket
 	 */
 	smc_copy_sock_settings_to_clc(smc);
 
+	smc_rv_nf_register_hook(sock_net(sk), &smc_nfho_serv);
+
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc)
 		goto out;
@@ -1114,7 +1169,7 @@ static unsigned int smc_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	unsigned int mask = 0;
 	struct smc_sock *smc;
-	int rc;
+	int rc = 0;
 
 	smc = smc_sk(sock->sk);
 	if ((sk->sk_state == SMC_INIT) || smc->use_fallback) {
@@ -1123,6 +1178,7 @@ static unsigned int smc_poll(struct file *file, struct socket *sock,
 		/* if non-blocking connect finished ... */
 		lock_sock(sk);
 		if ((sk->sk_state == SMC_INIT) && (mask & POLLOUT)) {
+			smc_rv_nf_unregister_hook(sock_net(sk), &smc_nfho_clnt);
 			sk->sk_err = smc->clcsock->sk->sk_err;
 			if (sk->sk_err) {
 				mask |= POLLERR;
@@ -1348,7 +1404,6 @@ static int smc_create(struct net *net, struct socket *sock, int protocol,
 
 	/* create internal TCP socket for CLC handshake and fallback */
 	smc = smc_sk(sk);
-	smc->use_fallback = false; /* assume rdma capability first */
 	rc = sock_create_kern(net, PF_INET, SOCK_STREAM,
 			      IPPROTO_TCP, &smc->clcsock);
 	if (rc)
@@ -1370,6 +1425,7 @@ static int __init smc_init(void)
 {
 	int rc;
 
+	smc_rv_init();
 	rc = smc_pnet_init();
 	if (rc)
 		return rc;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 0ccd6fa387ad..96d7a20ba7db 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -167,6 +167,13 @@ struct smc_connection {
 	struct work_struct	close_work;	/* peer sent some closing */
 };
 
+struct smc_listen_pending {
+	u64		time;			/* time when entry was created*/
+	bool		used;			/* true if entry is in use */
+	__be32		addr;			/* address of a listen socket */
+	__be16		port;			/* port of a listen socket */
+};
+
 struct smc_sock {				/* smc sock container */
 	struct sock		sk;
 	struct socket		*clcsock;	/* internal tcp socket */
@@ -175,6 +182,8 @@ struct smc_sock {				/* smc sock container */
 	struct smc_sock		*listen_smc;	/* listen parent */
 	struct work_struct	tcp_listen_work;/* handle tcp socket accepts */
 	struct work_struct	smc_listen_work;/* prepare new accept socket */
+	struct smc_listen_pending *listen_pends;/* listen pending SYNs */
+	spinlock_t		listen_pends_lock; /* protects listen_pends */
 	struct list_head	accept_q;	/* sockets to be accepted */
 	spinlock_t		accept_q_lock;	/* protects accept_q */
 	struct delayed_work	sock_put_work;	/* final socket freeing */
@@ -271,5 +280,4 @@ int smc_conn_create(struct smc_sock *smc, __be32 peer_in_addr,
 		    struct smc_clc_msg_local *lcl, int srv_first_contact);
 struct sock *smc_accept_dequeue(struct sock *parent, struct socket *new_sock);
 void smc_close_non_accepted(struct sock *sk);
-
 #endif	/* __SMC_H */
diff --git a/net/smc/smc_rv.c b/net/smc/smc_rv.c
new file mode 100644
index 000000000000..4ce01dec808f
--- /dev/null
+++ b/net/smc/smc_rv.c
@@ -0,0 +1,543 @@
+/*
+ *  Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  SMC Rendezvous to determine SMC-capability of the peer
+ *
+ *  Copyright IBM Corp. 2017
+ *
+ *  Author(s):	Hans Wippel <hwippel@linux.vnet.ibm.com>
+ *		Ursula Braun <ubraun@linux.vnet.ibm.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <net/tcp.h>
+
+#include "smc.h"
+#include "smc_rv.h"
+
+#define TCPOLEN_SMC			8
+#define TCPOLEN_SMC_BASE		6
+#define TCPOLEN_SMC_ALIGNED		2
+
+static const char TCPOPT_SMC_MAGIC[4] = {'\xe2', '\xd4', '\xc3', '\xd9'};
+
+/* in TCP header, replace EOL option and remaining header bytes with NOPs */
+static bool smc_rv_replace_eol_option(struct sk_buff *skb)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	int opt_bytes = tcp_optlen(skb);
+	unsigned char *buf;
+	int i = 0;
+
+	buf = (unsigned char *)(tcph + 1);
+	/* Parse TCP options. Based on tcp_parse_options in tcp_input.c */
+	while (i < opt_bytes) {
+		switch (buf[i]) {
+		/* one byte options */
+		case TCPOPT_EOL:
+			/* replace remaining bytes with NOPs */
+			while (i < opt_bytes) {
+				buf[i] = TCPOPT_NOP;
+				i++;
+			}
+			return true;
+		case TCPOPT_NOP:
+			i++;
+			continue;
+		default:
+			/* multi-byte options */
+			if (buf[i + 1] < 2 || i + buf[i + 1] > opt_bytes)
+				return false; /* bad option */
+			i += buf[i + 1];
+			continue;
+		}
+	}
+	return true;
+}
+
+/* check if TCP header contains SMC option */
+static bool smc_rv_has_smc_option(struct sk_buff *skb)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	int opt_bytes = tcp_optlen(skb);
+	unsigned char *buf;
+	int i = 0;
+
+	buf = (unsigned char *)(tcph + 1);
+	/* Parse TCP options. Based on tcp_parse_options in tcp_input.c */
+	while (i < opt_bytes) {
+		switch (buf[i]) {
+		/* one byte options */
+		case TCPOPT_EOL:
+			return false;
+		case TCPOPT_NOP:
+			i++;
+			continue;
+		default:
+			/* multi-byte options */
+			if (buf[i + 1] < 2)
+				return false; /* bad option */
+			/* check for SMC rendezvous option */
+			if (buf[i] == TCPOPT_EXP &&
+			    buf[i + 1] == TCPOLEN_SMC_BASE &&
+			    (opt_bytes - i >= TCPOLEN_SMC_BASE) &&
+			    !memcmp(&buf[i + 2], TCPOPT_SMC_MAGIC,
+						sizeof(TCPOPT_SMC_MAGIC)))
+				return true;
+			i += buf[i + 1];
+			continue;
+		}
+	}
+
+	return false;
+}
+
+/* Add SMC option to TCP header */
+static int smc_rv_add_smc_option(struct sk_buff *skb)
+{
+	unsigned char smc_opt[] = {TCPOPT_NOP, TCPOPT_NOP,
+				   TCPOPT_EXP, TCPOLEN_SMC_BASE,
+				   TCPOPT_SMC_MAGIC[0], TCPOPT_SMC_MAGIC[1],
+				   TCPOPT_SMC_MAGIC[2], TCPOPT_SMC_MAGIC[3]};
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph = ip_hdr(skb);
+	int tcplen = 0;
+
+	if (skb_availroom(skb) < TCPOLEN_SMC)
+		return -EFAULT;
+
+	if (tcp_optlen(skb) + TCPOLEN_SMC > MAX_TCP_OPTION_SPACE)
+		return -EFAULT;
+
+	/* give up if there is data after the TCP header */
+	if (skb_headlen(skb) > ip_hdrlen(skb) + tcp_hdrlen(skb))
+		return -EFAULT;
+
+	if (smc_rv_has_smc_option(skb))
+		return -EFAULT;
+
+	if (!smc_rv_replace_eol_option(skb))
+		return -EFAULT;
+
+	iph->tot_len = cpu_to_be16(be16_to_cpu(iph->tot_len) + TCPOLEN_SMC);
+	iph->check = 0;
+	iph->check = ip_fast_csum(iph, iph->ihl);
+	skb_put_data(skb, smc_opt, TCPOLEN_SMC);
+	tcph->doff += TCPOLEN_SMC_ALIGNED;
+	tcplen = (skb->len - ip_hdrlen(skb));
+	tcph->check = 0;
+	tcph->check = tcp_v4_check(tcplen, iph->saddr, iph->daddr,
+				   csum_partial(tcph, tcplen, 0));
+	skb->ip_summed = CHECKSUM_NONE;
+	return 0;
+}
+
+/* return an smc socket with certain source and destination */
+static struct smc_sock *smc_rv_lookup_connecting_smc(struct net *net,
+						     __be32 dest_addr,
+						     __be16 dest_port,
+						     __be32 source_addr,
+						     __be16 source_port)
+{
+	struct smc_sock *smc = NULL;
+	struct hlist_head *head;
+	struct socket *clcsock;
+	struct sock *sk;
+
+	read_lock(&smc_proto.h.smc_hash->lock);
+	head = &smc_proto.h.smc_hash->ht;
+
+	if (hlist_empty(head))
+		goto out;
+
+	sk_for_each(sk, head) {
+		if (!net_eq(sock_net(sk), net))
+			continue;
+		if (sk->sk_state != SMC_INIT)
+			continue;
+		clcsock = smc_sk(sk)->clcsock;
+		if (!clcsock)
+			continue;
+		if (source_port != htons(clcsock->sk->sk_num))
+			continue;
+		if (source_addr != clcsock->sk->sk_rcv_saddr)
+			continue;
+		if (dest_port != clcsock->sk->sk_dport)
+			continue;
+		if (dest_addr == clcsock->sk->sk_daddr) {
+			smc = smc_sk(sk);
+			break;
+		}
+	}
+
+out:
+	read_unlock(&smc_proto.h.smc_hash->lock);
+	return smc;
+}
+
+/* for netfilter smc_rv_hook_out_clnt (outgoing SYN):
+ * check if there exists a connecting smc socket with certain source and
+ * destination
+ */
+static bool smc_rv_exists_connecting_smc(struct net *net,
+					 __be32 dest_addr,
+					 __be16 dest_port,
+					 __be32 source_addr,
+					 __be16 source_port)
+{
+	return (smc_rv_lookup_connecting_smc(net, dest_addr, dest_port,
+					     source_addr, source_port) ?
+								true : false);
+}
+
+/* for netfilter smc_rv_hook_in_clnt (incoming SYN ACK):
+ * enable SMC-capability for the corresponding socket
+ */
+static void smc_rv_accepting_smc_peer(struct net *net,
+				      __be32 dest_addr,
+				      __be16 dest_port,
+				      __be32 source_addr,
+				      __be16 source_port)
+{
+	struct smc_sock *smc;
+
+	smc = smc_rv_lookup_connecting_smc(net, dest_addr, dest_port,
+					   source_addr, source_port);
+	if (smc)
+		/* connection is SMC-capable */
+		smc->use_fallback = false;
+}
+
+/* return an smc socket listening on a certain port */
+static struct smc_sock *smc_rv_lookup_listen_socket(struct net *net,
+						    __be32 listen_addr,
+						    __be16 listen_port)
+{
+	struct smc_sock *smc = NULL;
+	struct hlist_head *head;
+	struct socket *clcsock;
+	struct sock *sk;
+
+	read_lock(&smc_proto.h.smc_hash->lock);
+	head = &smc_proto.h.smc_hash->ht;
+
+	if (hlist_empty(head))
+		goto out;
+
+	sk_for_each(sk, head) {
+		if (!net_eq(sock_net(sk), net))
+			continue;
+		if (sk->sk_state != SMC_LISTEN)
+			continue;
+		clcsock = smc_sk(sk)->clcsock;
+		if (listen_port != htons(clcsock->sk->sk_num))
+			continue;
+		if (!listen_addr || !clcsock->sk->sk_rcv_saddr ||
+		    listen_addr == clcsock->sk->sk_rcv_saddr) {
+			smc = smc_sk(sk);
+			break;
+		}
+	}
+
+out:
+	read_unlock(&smc_proto.h.smc_hash->lock);
+	return smc;
+}
+
+/* for netfilter smc_rv_hook_in_serv (incoming SYN):
+ * save addr and port of connecting smc peer
+ */
+static void smc_rv_connecting_smc_peer(struct net *net,
+				       __be32 listen_addr,
+				       __be16 listen_port,
+				       __be32 peer_addr,
+				       __be16 peer_port)
+{
+	struct smc_listen_pending *pnd;
+	struct smc_sock *lsmc;
+	unsigned long flags;
+	int i;
+
+	lsmc = smc_rv_lookup_listen_socket(net, listen_addr, listen_port);
+	if (!lsmc)
+		return;
+
+	spin_lock_irqsave(&lsmc->listen_pends_lock, flags);
+	for (i = 0; i < 2 * lsmc->sk.sk_max_ack_backlog; i++) {
+		pnd = lsmc->listen_pends + i;
+		/* either use an unused entry or reuse an outdated entry */
+		if (!pnd->used ||
+		    jiffies_to_msecs(get_jiffies_64() - pnd->time) >
+						SMC_LISTEN_PEND_VALID_TIME) {
+			pnd->used = true;
+			pnd->addr = peer_addr;
+			pnd->port = peer_port;
+			pnd->time = get_jiffies_64();
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&lsmc->listen_pends_lock, flags);
+}
+
+/* for netfilter smc_rv_hook_out_serv (outgoing SYN/ACK):
+ * remove listen_pends entry of connecting smc peer in case of a problem
+ */
+static void smc_rv_remove_smc_peer(struct net *net,
+				   __be32 listen_addr,
+				   __be16 listen_port,
+				   __be32 peer_addr,
+				   __be16 peer_port)
+{
+	struct smc_listen_pending *pnd;
+	struct smc_sock *lsmc;
+	unsigned long flags;
+	int i;
+
+	lsmc = smc_rv_lookup_listen_socket(net, listen_addr, listen_port);
+	if (!lsmc)
+		return;
+
+	spin_lock_irqsave(&lsmc->listen_pends_lock, flags);
+	for (i = 0; i < 2 * lsmc->sk.sk_max_ack_backlog; i++) {
+		pnd = lsmc->listen_pends + i;
+		if (pnd->used &&
+		    pnd->addr == peer_addr &&
+		    pnd->port == peer_port &&
+		    jiffies_to_msecs(get_jiffies_64() - pnd->time) <=
+						SMC_LISTEN_PEND_VALID_TIME) {
+			pnd->used = false;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&lsmc->listen_pends_lock, flags);
+}
+
+/* for netfilter smc_rv_hook_out_serv (outgoing SYN ACK):
+ * check if there has been a connecting smc peer
+ */
+static bool smc_rv_exists_connecting_smc_peer(struct net *net,
+					      __be32 listen_addr,
+					      __be16 listen_port,
+					      __be32 peer_addr,
+					      __be16 peer_port)
+{
+	struct smc_listen_pending *pnd;
+	struct smc_sock *lsmc;
+	unsigned long flags;
+	int i;
+
+	lsmc = smc_rv_lookup_listen_socket(net, listen_addr, listen_port);
+	if (!lsmc)
+		return false;
+
+	spin_lock_irqsave(&lsmc->listen_pends_lock, flags);
+	for (i = 0; i < 2 * lsmc->sk.sk_max_ack_backlog; i++) {
+		pnd = lsmc->listen_pends + i;
+		if (pnd->used &&
+		    pnd->addr == peer_addr &&
+		    pnd->port == peer_port &&
+		    jiffies_to_msecs(get_jiffies_64() - pnd->time) <=
+						SMC_LISTEN_PEND_VALID_TIME) {
+			spin_unlock_irqrestore(&lsmc->listen_pends_lock, flags);
+			return true;
+		}
+	}
+	spin_unlock_irqrestore(&lsmc->listen_pends_lock, flags);
+	return false;
+}
+
+/* Netfilter hooks */
+
+/* netfilter hook for incoming packets (client) */
+static unsigned int smc_rv_hook_in_clnt(void *priv, struct sk_buff *skb,
+					const struct nf_hook_state *state)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph;
+
+	if (skb_headlen(skb) - sizeof(*iph) < sizeof(*tcph))
+		return NF_ACCEPT;
+
+	iph = ip_hdr(skb);
+	if (iph->protocol != IPPROTO_TCP)
+		return NF_ACCEPT;
+
+	/* Local SMC client, incoming SYN,ACK from server
+	 * check if there really is a local SMC client
+	 * and tell the client connection if the server is SMC capable
+	 */
+	if (tcph->syn == 1 && tcph->ack == 1) {
+		/* check for experimental option */
+		if (!smc_rv_has_smc_option(skb))
+			return NF_ACCEPT;
+		/* add info about server SMC capability */
+		smc_rv_accepting_smc_peer(state->net, iph->saddr, tcph->source,
+					  iph->daddr, tcph->dest);
+	}
+	return NF_ACCEPT;
+}
+
+/* netfilter hook for incoming packets (server) */
+static unsigned int smc_rv_hook_in_serv(void *priv, struct sk_buff *skb,
+					const struct nf_hook_state *state)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph;
+
+	if (skb_headlen(skb) - sizeof(*iph) < sizeof(*tcph))
+		return NF_ACCEPT;
+
+	iph = ip_hdr(skb);
+	if (iph->protocol != IPPROTO_TCP)
+		return NF_ACCEPT;
+
+	/* Local SMC Server, incoming SYN request from client
+	 * check if there is a local SMC server
+	 * and tell the server if there is a new SMC capable client
+	 */
+	if (tcph->syn == 1 && tcph->ack == 0) {
+		/* check for experimental option */
+		if (!smc_rv_has_smc_option(skb))
+			return NF_ACCEPT;
+		/* add info about new client SMC capability */
+		smc_rv_connecting_smc_peer(state->net, iph->daddr, tcph->dest,
+					   iph->saddr, tcph->source);
+	}
+	return NF_ACCEPT;
+}
+
+/* netfilter hook for outgoing packets (client) */
+static unsigned int smc_rv_hook_out_clnt(void *priv, struct sk_buff *skb,
+					 const struct nf_hook_state *state)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph;
+
+	if (skb_headlen(skb) - sizeof(*iph) < sizeof(*tcph))
+		return NF_ACCEPT;
+
+	iph = ip_hdr(skb);
+	if (iph->protocol != IPPROTO_TCP)
+		return NF_ACCEPT;
+
+	/* Local SMC client, outgoing SYN request to server
+	 * add TCP experimental option if there really is a local SMC client
+	 */
+	if (tcph->syn == 1 && tcph->ack == 0) {
+		/* check for local SMC client */
+		if (!smc_rv_exists_connecting_smc(state->net,
+						  iph->daddr, tcph->dest,
+						  iph->saddr, tcph->source))
+			return NF_ACCEPT;
+		/* add experimental option */
+		smc_rv_add_smc_option(skb);
+	}
+	return NF_ACCEPT;
+}
+
+/* netfilter hook for outgoing packets (server) */
+static unsigned int smc_rv_hook_out_serv(void *priv, struct sk_buff *skb,
+					 const struct nf_hook_state *state)
+{
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph;
+
+	if (skb_headlen(skb) - sizeof(*iph) < sizeof(*tcph))
+		return NF_ACCEPT;
+
+	iph = ip_hdr(skb);
+	if (iph->protocol != IPPROTO_TCP)
+		return NF_ACCEPT;
+
+	/* Local SMC server, outgoing SYN,ACK to client
+	 * add TCP experimental option if there really is a local SMC server
+	 */
+	if (tcph->syn == 1 && tcph->ack == 1) {
+		/* check if client's SYN contained the experimental option */
+		if (!smc_rv_exists_connecting_smc_peer(state->net,
+						       iph->saddr, tcph->source,
+						       iph->daddr, tcph->dest))
+			return NF_ACCEPT;
+		/* add experimental option */
+		if (smc_rv_add_smc_option(skb) < 0)
+			smc_rv_remove_smc_peer(state->net,
+					       iph->saddr, tcph->source,
+					       iph->daddr, tcph->dest);
+	}
+	return NF_ACCEPT;
+}
+
+static struct nf_hook_ops smc_nfho_ops_clnt[] = {
+	{
+		.hook = smc_rv_hook_in_clnt,
+		.hooknum = NF_INET_PRE_ROUTING,
+		.pf = PF_INET,
+		.priority = NF_IP_PRI_FIRST,
+	},
+	{
+		.hook = smc_rv_hook_out_clnt,
+		.hooknum = NF_INET_POST_ROUTING,
+		.pf = PF_INET,
+		.priority = NF_IP_PRI_FIRST,
+	},
+};
+
+static struct nf_hook_ops smc_nfho_ops_serv[] = {
+	{
+		.hook = smc_rv_hook_in_serv,
+		.hooknum = NF_INET_PRE_ROUTING,
+		.pf = PF_INET,
+		.priority = NF_IP_PRI_FIRST,
+	},
+	{
+		.hook = smc_rv_hook_out_serv,
+		.hooknum = NF_INET_POST_ROUTING,
+		.pf = PF_INET,
+		.priority = NF_IP_PRI_FIRST,
+	},
+};
+
+struct smc_nf_hook smc_nfho_clnt = {
+	.refcount = 0,
+	.hook = &smc_nfho_ops_clnt[0],
+};
+
+struct smc_nf_hook smc_nfho_serv = {
+	.refcount = 0,
+	.hook = &smc_nfho_ops_serv[0],
+};
+
+int smc_rv_nf_register_hook(struct net *net, struct smc_nf_hook *nfho)
+{
+	int rc = 0;
+
+	mutex_lock(&nfho->nf_hook_mutex);
+	if (!(nfho->refcount++)) {
+		rc = nf_register_net_hooks(net, nfho->hook, 2);
+		if (rc)
+			nfho->refcount--;
+	}
+	mutex_unlock(&nfho->nf_hook_mutex);
+	return rc;
+}
+
+void smc_rv_nf_unregister_hook(struct net *net, struct smc_nf_hook *nfho)
+{
+	mutex_lock(&nfho->nf_hook_mutex);
+	if (!(--nfho->refcount))
+		nf_unregister_net_hooks(net, nfho->hook, 2);
+	mutex_unlock(&nfho->nf_hook_mutex);
+}
+
+void __init smc_rv_init(void)
+{
+	mutex_init(&smc_nfho_clnt.nf_hook_mutex);
+	mutex_init(&smc_nfho_serv.nf_hook_mutex);
+}
diff --git a/net/smc/smc_rv.h b/net/smc/smc_rv.h
new file mode 100644
index 000000000000..c3bdf4c0a5cb
--- /dev/null
+++ b/net/smc/smc_rv.h
@@ -0,0 +1,31 @@
+/*
+ * Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  Definitions for SMC Rendezvous - SMC capability checking
+ *
+ *  Copyright IBM Corp. 2017
+ *
+ *  Author(s):	Hans Wippel <hwippel@linux.vnet.ibm.com>
+ *		Ursula Braun <ubraun@linux.vnet.ibm.com>
+ */
+
+#ifndef _SMC_RV_H
+#define _SMC_RV_H
+
+#include <linux/netfilter.h>
+
+#define SMC_LISTEN_PEND_VALID_TIME	(600 * HZ)
+
+struct smc_nf_hook {
+	struct mutex		nf_hook_mutex;	/* serialize nf register ops */
+	int			refcount;
+	struct nf_hook_ops	*hook;
+};
+
+extern struct smc_nf_hook smc_nfho_clnt;
+extern struct smc_nf_hook smc_nfho_serv;
+
+int smc_rv_nf_register_hook(struct net *net, struct smc_nf_hook *nfho);
+void smc_rv_nf_unregister_hook(struct net *net, struct smc_nf_hook *nfho);
+void smc_rv_init(void) __init;
+#endif
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next] selftests: rtnetlink: test RTM_GETNETCONF
From: Florian Westphal @ 2017-10-10 14:18 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

exercise RTM_GETNETCONF call path for unspec, inet and inet6
families, they are DOIT_UNLOCKED candidates.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 tools/testing/selftests/net/rtnetlink.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index e8c86c416ed0..a8a8cdf726b2 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -37,6 +37,26 @@ kci_del_dummy()
 	check_err $?
 }
 
+kci_test_netconf()
+{
+	dev="$1"
+	r=$ret
+
+	ip netconf show dev "$dev" > /dev/null
+	check_err $?
+
+	for f in 4 6; do
+		ip -$f netconf show dev "$dev" > /dev/null
+		check_err $?
+	done
+
+	if [ $ret -ne 0 ] ;then
+		echo "FAIL: ip netconf show $dev"
+		test $r -eq 0 && ret=0
+		return 1
+	fi
+}
+
 # add a bridge with vlans on top
 kci_test_bridge()
 {
@@ -63,6 +83,11 @@ kci_test_bridge()
 	check_err $?
 	ip r s t all > /dev/null
 	check_err $?
+
+	for name in "$devbr" "$vlandev" "$devdummy" ; do
+		kci_test_netconf "$name"
+	done
+
 	ip -6 addr del dev "$vlandev" dead:42::1234/64
 	check_err $?
 
@@ -100,6 +125,9 @@ kci_test_gre()
 	check_err $?
 	ip addr > /dev/null
 	check_err $?
+
+	kci_test_netconf "$gredev"
+
 	ip addr del dev "$devdummy" 10.23.7.11/24
 	check_err $?
 
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Stephen Smalley @ 2017-10-10 14:18 UTC (permalink / raw)
  To: Chenbo Feng, linux-security-module, netdev, SELinux
  Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
	Daniel Borkmann, Chenbo Feng
In-Reply-To: <20171009222028.13096-5-chenbofeng.kernel@gmail.com>

On Mon, 2017-10-09 at 15:20 -0700, Chenbo Feng wrote:
> From: Chenbo Feng <fengc@google.com>
> 
> Implement the actual checks introduced to eBPF related syscalls. This
> implementation use the security field inside bpf object to store a
> sid that
> identify the bpf object. And when processes try to access the object,
> selinux will check if processes have the right privileges. The
> creation
> of eBPF object are also checked at the general bpf check hook and new
> cmd introduced to eBPF domain can also be checked there.
> 
> Signed-off-by: Chenbo Feng <fengc@google.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  security/selinux/hooks.c            | 111
> ++++++++++++++++++++++++++++++++++++
>  security/selinux/include/classmap.h |   2 +
>  security/selinux/include/objsec.h   |   4 ++
>  3 files changed, 117 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index f5d304736852..41aba4e3d57c 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -85,6 +85,7 @@
>  #include <linux/export.h>
>  #include <linux/msg.h>
>  #include <linux/shm.h>
> +#include <linux/bpf.h>
>  
>  #include "avc.h"
>  #include "objsec.h"
> @@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void
> *ib_sec)
>  }
>  #endif
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +static int selinux_bpf(int cmd, union bpf_attr *attr,
> +				     unsigned int size)
> +{
> +	u32 sid = current_sid();
> +	int ret;
> +
> +	switch (cmd) {
> +	case BPF_MAP_CREATE:
> +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP,
> BPF_MAP__CREATE,
> +				   NULL);
> +		break;
> +	case BPF_PROG_LOAD:
> +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG,
> BPF_PROG__LOAD,
> +				   NULL);
> +		break;
> +	default:
> +		ret = 0;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +static u32 bpf_map_fmode_to_av(fmode_t fmode)
> +{
> +	u32 av = 0;
> +
> +	if (f_mode & FMODE_READ)
> +		av |= BPF_MAP__READ;
> +	if (f_mode & FMODE_WRITE)
> +		av |= BPF_MAP__WRITE;
> +	return av;
> +}
> +
> +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> +{
> +	u32 sid = current_sid();
> +	struct bpf_security_struct *bpfsec;
> +
> +	bpfsec = map->security;
> +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> +			    bpf_map_fmode_to_av(fmode), NULL);
> +}
> +
> +static int selinux_bpf_prog(struct bpf_prog *prog)
> +{
> +	u32 sid = current_sid();
> +	struct bpf_security_struct *bpfsec;
> +
> +	bpfsec = prog->aux->security;
> +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> +			    BPF_PROG__USE, NULL);
> +}
> +
> +static int selinux_bpf_map_alloc(struct bpf_map *map)
> +{
> +	struct bpf_security_struct *bpfsec;
> +
> +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> +	if (!bpfsec)
> +		return -ENOMEM;
> +
> +	bpfsec->sid = current_sid();
> +	map->security = bpfsec;
> +
> +	return 0;
> +}
> +
> +static void selinux_bpf_map_free(struct bpf_map *map)
> +{
> +	struct bpf_security_struct *bpfsec = map->security;
> +
> +	map->security = NULL;
> +	kfree(bpfsec);
> +}
> +
> +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
> +{
> +	struct bpf_security_struct *bpfsec;
> +
> +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> +	if (!bpfsec)
> +		return -ENOMEM;
> +
> +	bpfsec->sid = current_sid();
> +	aux->security = bpfsec;
> +
> +	return 0;
> +}
> +
> +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> +{
> +	struct bpf_security_struct *bpfsec = aux->security;
> +
> +	aux->security = NULL;
> +	kfree(bpfsec);
> +}
> +#endif
> +
>  static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
> = {
>  	LSM_HOOK_INIT(binder_set_context_mgr,
> selinux_binder_set_context_mgr),
>  	LSM_HOOK_INIT(binder_transaction,
> selinux_binder_transaction),
> @@ -6471,6 +6572,16 @@ static struct security_hook_list
> selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
>  	LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
>  #endif
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +	LSM_HOOK_INIT(bpf, selinux_bpf),
> +	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
> +	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> +	LSM_HOOK_INIT(bpf_map_alloc_security,
> selinux_bpf_map_alloc),
> +	LSM_HOOK_INIT(bpf_prog_alloc_security,
> selinux_bpf_prog_alloc),
> +	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
> +	LSM_HOOK_INIT(bpf_prog_free_security,
> selinux_bpf_prog_free),
> +#endif
>  };
>  
>  static __init int selinux_init(void)
> diff --git a/security/selinux/include/classmap.h
> b/security/selinux/include/classmap.h
> index 35ffb29a69cb..7253c5eea59c 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] = {
>  	  { "access", NULL } },
>  	{ "infiniband_endport",
>  	  { "manage_subnet", NULL } },
> +	{ "bpf_map", {"create", "read", "write"} },
> +	{ "bpf_prog", {"load", "use"} },

Again I have to ask: do you truly need/want two separate classes, or
would a single class with distinct permissions suffice, ala:
        { "bpf", { "create_map", "read_map", "write_map", "prog_load",
"prog_use" } },

and then allow A self:bpf { create_map read_map write_map prog_load
prog_use }; would be stored in a single policy avtab rule, and be
cached in a single AVC entry.

>  	{ NULL }
>    };
>  
> diff --git a/security/selinux/include/objsec.h
> b/security/selinux/include/objsec.h
> index 1649cd18eb0b..3d54468ce334 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -150,6 +150,10 @@ struct pkey_security_struct {
>  	u32	sid;	/* SID of pkey */
>  };
>  
> +struct bpf_security_struct {
> +	u32 sid;  /*SID of bpf obj creater*/
> +};
> +
>  extern unsigned int selinux_checkreqprot;
>  
>  #endif /* _SELINUX_OBJSEC_H_ */

^ permalink raw reply

* Re: [PATCH v2 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: David Ahern @ 2017-10-10 14:18 UTC (permalink / raw)
  To: Florian Westphal, netdev
In-Reply-To: <20171010113236.8889-1-fw@strlen.de>

On 10/10/17 5:32 AM, Florian Westphal wrote:
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index e84d108cfee4..19ea53a5210f 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3066,21 +3066,22 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
>  }
>  EXPORT_SYMBOL(ndo_dflt_fdb_add);
>  
> -static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid)
> +static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid,
> +			 struct netlink_ext_ack *exta)
>  {
>  	u16 vid = 0;
>  
>  	if (vlan_attr) {
>  		if (nla_len(vlan_attr) != sizeof(u16)) {
> -			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan\n");
> +			NL_SET_ERR_MSG(exta, "RTM_NEWNEIGH with invalid vlan");

I realize you are keeping the existing wording, but the messages are
moving from out of line pr_info to inline message in response to a user
command.  From a user's perspective the RTM_NEWNEIGH and DELNEIGH do not
add much value, and the add and del in the bridge command tells which it
is. So in this case just emit "Invalid vlan id".

Although this failure is an invalid vlan attribute as opposed to an
invalid vlan id which is what the next message checks. So the message
needs to be updated as well.


>  			return -EINVAL;
>  		}
>  
>  		vid = nla_get_u16(vlan_attr);
>  
>  		if (!vid || vid >= VLAN_VID_MASK) {
> -			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan id %d\n",
> -				vid);
> +			NL_SET_ERR_MSG(exta,
> +				       "RTM_NEWNEIGH with invalid vlan id");
>  			return -EINVAL;
>  		}
>  	}

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Stephen Smalley @ 2017-10-10 14:24 UTC (permalink / raw)
  To: Chenbo Feng, linux-security-module, netdev, SELinux
  Cc: Daniel Borkmann, Chenbo Feng, Alexei Starovoitov, lorenzo
In-Reply-To: <20171009222028.13096-6-chenbofeng.kernel@gmail.com>

On Mon, 2017-10-09 at 15:20 -0700, Chenbo Feng wrote:
> From: Chenbo Feng <fengc@google.com>
> 
> Introduce a bpf object related check when sending and receiving files
> through unix domain socket as well as binder. It checks if the
> receiving
> process have privilege to read/write the bpf map or use the bpf
> program.
> This check is necessary because the bpf maps and programs are using a
> anonymous inode as their shared inode so the normal way of checking
> the
> files and sockets when passing between processes cannot work properly
> on
> eBPF object. This check only works when the BPF_SYSCALL is
> configured.
> The information stored inside the file security struct is the same as
> the information in bpf object security struct.
> 
> Signed-off-by: Chenbo Feng <fengc@google.com>
> ---
>  include/linux/bpf.h       |  3 +++
>  include/linux/lsm_hooks.h | 17 +++++++++++++
>  include/linux/security.h  |  9 +++++++
>  kernel/bpf/syscall.c      |  4 ++--
>  security/security.c       |  8 +++++++
>  security/selinux/hooks.c  | 61
> +++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 100 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 225740688ab7..81d6c01b8825 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -285,6 +285,9 @@ int bpf_prog_array_copy_to_user(struct
> bpf_prog_array __rcu *progs,
>  #ifdef CONFIG_BPF_SYSCALL
>  DECLARE_PER_CPU(int, bpf_prog_active);
>  
> +extern const struct file_operations bpf_map_fops;
> +extern const struct file_operations bpf_prog_fops;
> +
>  #define BPF_PROG_TYPE(_id, _ops) \
>  	extern const struct bpf_verifier_ops _ops;
>  #define BPF_MAP_TYPE(_id, _ops) \
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 7161d8e7ee79..517dea60b87b 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1385,6 +1385,19 @@
>   * @bpf_prog_free_security:
>   *	Clean up the security information stored inside bpf prog.
>   *
> + * @bpf_map_file:
> + *	When creating a bpf map fd, set up the file security
> information with
> + *	the bpf security information stored in the map struct. So
> when the map
> + *	fd is passed between processes, the security module can
> directly read
> + *	the security information from file security struct rather
> than the bpf
> + *	security struct.
> + *
> + * @bpf_prog_file:
> + *	When creating a bpf prog fd, set up the file security
> information with
> + *	the bpf security information stored in the prog struct. So
> when the prog
> + *	fd is passed between processes, the security module can
> directly read
> + *	the security information from file security struct rather
> than the bpf
> + *	security struct.
>   */
>  union security_list_options {
>  	int (*binder_set_context_mgr)(struct task_struct *mgr);
> @@ -1726,6 +1739,8 @@ union security_list_options {
>  	void (*bpf_map_free_security)(struct bpf_map *map);
>  	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>  	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
> +	void (*bpf_map_file)(struct bpf_map *map, struct file
> *file);
> +	void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file
> *file);
>  #endif /* CONFIG_BPF_SYSCALL */
>  };
>  
> @@ -1954,6 +1969,8 @@ struct security_hook_heads {
>  	struct list_head bpf_map_free_security;
>  	struct list_head bpf_prog_alloc_security;
>  	struct list_head bpf_prog_free_security;
> +	struct list_head bpf_map_file;
> +	struct list_head bpf_prog_file;
>  #endif /* CONFIG_BPF_SYSCALL */
>  } __randomize_layout;
>  
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 18800b0911e5..57573b794e2d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct
> bpf_map *map);
>  extern void security_bpf_map_free(struct bpf_map *map);
>  extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
>  extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
> +extern void security_bpf_map_file(struct bpf_map *map, struct file
> *file);
> +extern void security_bpf_prog_file(struct bpf_prog_aux *aux, struct
> file *file);
>  #else
>  static inline int security_bpf(int cmd, union bpf_attr *attr,
>  					     unsigned int size)
> @@ -1772,6 +1774,13 @@ static inline int
> security_bpf_prog_alloc(struct bpf_prog_aux *aux)
>  
>  static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
>  { }
> +
> +static inline void security_bpf_map_file(struct bpf_map *map, struct
> file *file)
> +{ }
> +
> +static inline void security_bpf_prog_file(struct bpf_prog_aux *aux,
> +					  struct file *file)
> +{ }
>  #endif /* CONFIG_SECURITY */
>  #endif /* CONFIG_BPF_SYSCALL */
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1cf31ddd7616..b144181d3f3a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file *filp,
> const char __user *buf,
>  	return -EINVAL;
>  }
>  
> -static const struct file_operations bpf_map_fops = {
> +const struct file_operations bpf_map_fops = {
>  #ifdef CONFIG_PROC_FS
>  	.show_fdinfo	= bpf_map_show_fdinfo,
>  #endif
> @@ -964,7 +964,7 @@ static void bpf_prog_show_fdinfo(struct seq_file
> *m, struct file *filp)
>  }
>  #endif
>  
> -static const struct file_operations bpf_prog_fops = {
> +const struct file_operations bpf_prog_fops = {
>  #ifdef CONFIG_PROC_FS
>  	.show_fdinfo	= bpf_prog_show_fdinfo,
>  #endif
> diff --git a/security/security.c b/security/security.c
> index 1cd8526cb0b7..dacf649b8cfa 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct
> bpf_prog_aux *aux)
>  {
>  	call_void_hook(bpf_prog_free_security, aux);
>  }
> +void security_bpf_map_file(struct bpf_map *map, struct file *file)
> +{
> +	call_void_hook(bpf_map_file, map, file);
> +}
> +void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file
> *file)
> +{
> +	call_void_hook(bpf_prog_file, aux, file);
> +}
>  #endif /* CONFIG_BPF_SYSCALL */
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 41aba4e3d57c..fea88655e0ee 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1815,6 +1815,10 @@ static inline int file_path_has_perm(const
> struct cred *cred,
>  	return inode_has_perm(cred, file_inode(file), av, &ad);
>  }
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_file_check(struct file *file, u32 sid);
> +#endif
> +
>  /* Check whether a task can use an open file descriptor to
>     access an inode in a given way.  Check access to the
>     descriptor itself, and then use dentry_has_perm to
> @@ -1845,6 +1849,12 @@ static int file_has_perm(const struct cred
> *cred,
>  			goto out;
>  	}
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +	rc = bpf_file_check(file, cred_sid(cred));
> +	if (rc)
> +		goto out;
> +#endif
> +
>  	/* av is zero if only checking access to the descriptor. */
>  	rc = 0;
>  	if (av)
> @@ -2165,6 +2175,12 @@ static int selinux_binder_transfer_file(struct
> task_struct *from,
>  			return rc;
>  	}
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +	rc = bpf_file_check(file, sid);
> +	if (rc)
> +		return rc;
> +#endif
> +
>  	if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
>  		return 0;
>  
> @@ -6288,6 +6304,33 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
>  	return av;
>  }
>  
> +/* This function will check the file pass through unix socket or
> binder to see
> + * if it is a bpf related object. And apply correspinding checks on
> the bpf
> + * object based on the type. The bpf maps and programs, not like
> other files and
> + * socket, are using a shared anonymous inode inside the kernel as
> their inode.
> + * So checking that inode cannot identify if the process have
> privilege to
> + * access the bpf object and that's why we have to add this
> additional check in
> + * selinux_file_receive and selinux_binder_transfer_files.
> + */
> +static int bpf_file_check(struct file *file, u32 sid)
> +{
> +	struct file_security_struct *fsec = file->f_security;
> +	int ret;
> +
> +	if (file->f_op == &bpf_map_fops) {
> +		ret = avc_has_perm(sid, fsec->sid, SECCLASS_BPF_MAP,
> +				   bpf_map_fmode_to_av(file-
> >f_mode), NULL);
> +		if (ret)
> +			return ret;
> +	} else if (file->f_op == &bpf_prog_fops) {
> +		ret = avc_has_perm(sid, fsec->sid,
> SECCLASS_BPF_PROG,
> +				   BPF_PROG__USE, NULL);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>  static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
>  {
>  	u32 sid = current_sid();
> @@ -6351,6 +6394,22 @@ static void selinux_bpf_prog_free(struct
> bpf_prog_aux *aux)
>  	aux->security = NULL;
>  	kfree(bpfsec);
>  }
> +
> +static void selinux_bpf_map_file(struct bpf_map *map, struct file
> *file)
> +{
> +	struct bpf_security_struct *bpfsec = map->security;
> +	struct file_security_struct *fsec = file->f_security;
> +
> +	fsec->sid = bpfsec->sid;
> +}
> +
> +static void selinux_bpf_prog_file(struct bpf_prog_aux *aux, struct
> file *file)
> +{
> +	struct bpf_security_struct *bpfsec = aux->security;
> +	struct file_security_struct *fsec = file->f_security;
> +
> +	fsec->sid = bpfsec->sid;

I could be wrong, but isn't it the case that fsec->sid already will
equal bpfsec->sid, because they are both created by the same thread
during the same system call, and they each inherit the SID of the
current task?

What I expected you to do was to add and set a flags field in the
file_security_struct to indicate that this is a bpf map or prog, and
then test for that in your bpf_file_check() function instead of having
to export and test the fops structures.


> +}
>  #endif
>  
>  static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
> = {
> @@ -6581,6 +6640,8 @@ static struct security_hook_list
> selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(bpf_prog_alloc_security,
> selinux_bpf_prog_alloc),
>  	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>  	LSM_HOOK_INIT(bpf_prog_free_security,
> selinux_bpf_prog_free),
> +	LSM_HOOK_INIT(bpf_map_file, selinux_bpf_map_file),
> +	LSM_HOOK_INIT(bpf_prog_file, selinux_bpf_prog_file),
>  #endif
>  };
>  

^ permalink raw reply

* Re: net/wireless/ray_cs: Convert timers to use
From: Kees Cook @ 2017-10-10 14:22 UTC (permalink / raw)
  To: Kalle Valo; +Cc: LKML, linux-wireless, Network Development, Thomas Gleixner
In-Reply-To: <20171010082603.8E283607C5@smtp.codeaurora.org>

On Tue, Oct 10, 2017 at 1:26 AM, Kalle Valo <kvalo@codeaurora.org> wrote:
> Kees Cook <keescook@chromium.org> wrote:
>
>> In preparation for unconditionally passing the struct timer_list pointer to
>> all timer callbacks, switch to using the new timer_setup() and from_timer()
>> to pass the timer pointer explicitly.
>>
>> Cc: Kalle Valo <kvalo@codeaurora.org>
>> Cc: linux-wireless@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>
> I'll apply this once I have fast forwarded wireless-drivers-next to
> -rc3. I'll also fix the title, what was it supposed to say?

It was truncated from "net/wireless/ray_cs: Convert timers to use
timer_setup()"; I've fixed that glitch in my workflow now.

> Patch set to Awaiting Upstream.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* Re: [PATCH v2 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: Florian Westphal @ 2017-10-10 14:28 UTC (permalink / raw)
  To: David Ahern; +Cc: Florian Westphal, netdev
In-Reply-To: <9c01905a-92e1-1246-35ee-2a60ac11733e@gmail.com>

David Ahern <dsahern@gmail.com> wrote:
> On 10/10/17 5:32 AM, Florian Westphal wrote:
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index e84d108cfee4..19ea53a5210f 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -3066,21 +3066,22 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
> >  }
> >  EXPORT_SYMBOL(ndo_dflt_fdb_add);
> >  
> > -static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid)
> > +static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid,
> > +			 struct netlink_ext_ack *exta)
> >  {
> >  	u16 vid = 0;
> >  
> >  	if (vlan_attr) {
> >  		if (nla_len(vlan_attr) != sizeof(u16)) {
> > -			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan\n");
> > +			NL_SET_ERR_MSG(exta, "RTM_NEWNEIGH with invalid vlan");
> 
> I realize you are keeping the existing wording, but the messages are
> moving from out of line pr_info to inline message in response to a user
> command.  From a user's perspective the RTM_NEWNEIGH and DELNEIGH do not
> add much value, and the add and del in the bridge command tells which it
> is. So in this case just emit "Invalid vlan id".

Right, makes sense.

> Although this failure is an invalid vlan attribute as opposed to an
> invalid vlan id which is what the next message checks. So the message
> needs to be updated as well.

Indeed, I'll send a v2, thanks!

^ permalink raw reply

* Re: [patch net-next 2/4] net: sched: introduce per-egress action device callbacks
From: Jiri Pirko @ 2017-10-10 14:31 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, saeedm@mellanox.com,
	matanb@mellanox.com, leonro@mellanox.com, mlxsw@mellanox.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD008ED67@AcuExch.aculab.com>

Tue, Oct 10, 2017 at 03:31:59PM CEST, David.Laight@ACULAB.COM wrote:
>From: Jiri Pirko
>> Sent: 10 October 2017 08:30
>> Introduce infrastructure that allows drivers to register callbacks that
>> are called whenever tc would offload inserted rule and specified device
>> acts as tc action egress device.
>
>How does a driver safely unregister a callback?
>(to avoid a race with the callback being called.)
>
>Usually this requires a callback in the context that makes the
>notification callbacks indicating that no more such callbacks
>will be made.

rtnl is your answer. It is being held during register/unregister/cb

^ permalink raw reply

* [PATCH v3 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: Florian Westphal @ 2017-10-10 14:44 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, David Ahern

We can now piggyback error strings to userspace via extended acks
rather than using printk.

Before:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
RTNETLINK answers: Invalid argument

After:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
Error: invalid vlan id.

v3: drop 'RTM_' prefixes, suggested by David Ahern, they
are not useful, the add/del in bridge command line is enough.

Also reword error in response to malformed/bad vlan id attribute
size.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/core/rtnetlink.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e84d108cfee4..af2dea45df33 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3066,21 +3066,21 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_add);
 
-static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid)
+static int fdb_vid_parse(struct nlattr *vlan_attr, u16 *p_vid,
+			 struct netlink_ext_ack *extack)
 {
 	u16 vid = 0;
 
 	if (vlan_attr) {
 		if (nla_len(vlan_attr) != sizeof(u16)) {
-			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan\n");
+			NL_SET_ERR_MSG(extack, "invalid vlan attribute size");
 			return -EINVAL;
 		}
 
 		vid = nla_get_u16(vlan_attr);
 
 		if (!vid || vid >= VLAN_VID_MASK) {
-			pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid vlan id %d\n",
-				vid);
+			NL_SET_ERR_MSG(extack, "invalid vlan id");
 			return -EINVAL;
 		}
 	}
@@ -3105,24 +3105,24 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	ndm = nlmsg_data(nlh);
 	if (ndm->ndm_ifindex == 0) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ifindex\n");
+		NL_SET_ERR_MSG(extack, "invalid ifindex");
 		return -EINVAL;
 	}
 
 	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
 	if (dev == NULL) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
 	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid address\n");
+		NL_SET_ERR_MSG(extack, "invalid address");
 		return -EINVAL;
 	}
 
 	addr = nla_data(tb[NDA_LLADDR]);
 
-	err = fdb_vid_parse(tb[NDA_VLAN], &vid);
+	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
 	if (err)
 		return err;
 
@@ -3209,24 +3209,24 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	ndm = nlmsg_data(nlh);
 	if (ndm->ndm_ifindex == 0) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid ifindex\n");
+		NL_SET_ERR_MSG(extack, "invalid ifindex");
 		return -EINVAL;
 	}
 
 	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
 	if (dev == NULL) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "unknown ifindex");
 		return -ENODEV;
 	}
 
 	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid address\n");
+		NL_SET_ERR_MSG(extack, "invalid address");
 		return -EINVAL;
 	}
 
 	addr = nla_data(tb[NDA_LLADDR]);
 
-	err = fdb_vid_parse(tb[NDA_VLAN], &vid);
+	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
 	if (err)
 		return err;
 
@@ -3666,7 +3666,7 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	dev = __dev_get_by_index(net, ifm->ifi_index);
 	if (!dev) {
-		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
 		return -ENODEV;
 	}
 
@@ -3741,7 +3741,7 @@ static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	dev = __dev_get_by_index(net, ifm->ifi_index);
 	if (!dev) {
-		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
 		return -ENODEV;
 	}
 
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH v4 1/2] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-10 14:45 UTC (permalink / raw)
  To: Andrew F. Davis, andrew, f.fainelli; +Cc: netdev, Woojung.Huh
In-Reply-To: <17264c76-92d5-9dc8-b5ec-9dc09cf38ec0@ti.com>

Andrew

Thanks for the review

On 10/09/2017 02:12 PM, Andrew F. Davis wrote:
> On 10/09/2017 11:59 AM, Dan Murphy wrote:
>> Add support for the TI  DP83822 10/100Mbit ethernet phy.
>>
>> The DP83822 provides flexibility to connect to a MAC through a
>> standard MII, RMII or RGMII interface.
>>
>> In addition the DP83822 needs to be removed from the DP83848 driver
>> as the WoL support is added here for this device.
>>
>> Datasheet:
>> http://www.ti.com/product/DP83822I/datasheet
>>
>> Signed-off-by: Dan Murphy <dmurphy@ti.com>> ---
>>
>> v4 - Squash DP83822 removal patch into this patch -
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg192424.html
>>
>> v3 - Fixed WoL indication bit and removed mutex for suspend/resume - 
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg191891.html and
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg191665.html
>>
>> v2 - Updated per comments.  Removed unnessary parantheis, called genphy_suspend/
>> resume routines and then performing WoL changes, reworked sopass storage and reduced
>> the number of phy reads, and moved WOL_SECURE_ON - 
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg191392.html
>>
>>  drivers/net/phy/Kconfig   |   5 +
>>  drivers/net/phy/Makefile  |   1 +
>>  drivers/net/phy/dp83822.c | 302 ++++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/net/phy/dp83848.c |   3 -
>>  4 files changed, 308 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/net/phy/dp83822.c
>>
>> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
>> index cd931cf9dcc2..8e78a482e09e 100644
>> --- a/drivers/net/phy/Kconfig
>> +++ b/drivers/net/phy/Kconfig
>> @@ -277,6 +277,11 @@ config DAVICOM_PHY
>>  	---help---
>>  	  Currently supports dm9161e and dm9131
>>  
>> +config DP83822_PHY
>> +	tristate "Texas Instruments DP83822 PHY"
>> +	---help---
>> +	  Supports the DP83822 PHY.
>> +
>>  config DP83848_PHY
>>  	tristate "Texas Instruments DP83848 PHY"
>>  	---help---
>> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
>> index 416df92fbf4f..df3b82ba8550 100644
>> --- a/drivers/net/phy/Makefile
>> +++ b/drivers/net/phy/Makefile
>> @@ -55,6 +55,7 @@ obj-$(CONFIG_CICADA_PHY)	+= cicada.o
>>  obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
>>  obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
>>  obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
>> +obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
>>  obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
>>  obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
>>  obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
>> diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c
>> new file mode 100644
>> index 000000000000..de196dbc46cd
>> --- /dev/null
>> +++ b/drivers/net/phy/dp83822.c
>> @@ -0,0 +1,302 @@
>> +/*
>> + * Driver for the Texas Instruments DP83822 PHY
>> + *
>> + * Copyright (C) 2017 Texas Instruments Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include <linux/ethtool.h>
>> +#include <linux/etherdevice.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mii.h>
>> +#include <linux/module.h>
>> +#include <linux/of.h>
>> +#include <linux/phy.h>
>> +#include <linux/netdevice.h>
>> +
>> +#define DP83822_PHY_ID	        0x2000a240
>> +#define DP83822_DEVADDR		0x1f
>> +
>> +#define MII_DP83822_MISR1	0x12
>> +#define MII_DP83822_MISR2	0x13
>> +#define MII_DP83822_RESET_CTRL	0x1f
>> +
>> +#define DP83822_HW_RESET	BIT(15)
>> +#define DP83822_SW_RESET	BIT(14)
>> +
>> +/* MISR1 bits */
>> +#define DP83822_RX_ERR_HF_INT_EN	BIT(0)
>> +#define DP83822_FALSE_CARRIER_HF_INT_EN	BIT(1)
>> +#define DP83822_ANEG_COMPLETE_INT_EN	BIT(2)
>> +#define DP83822_DUP_MODE_CHANGE_INT_EN	BIT(3)
>> +#define DP83822_SPEED_CHANGED_INT_EN	BIT(4)
>> +#define DP83822_LINK_STAT_INT_EN	BIT(5)
>> +#define DP83822_ENERGY_DET_INT_EN	BIT(6)
>> +#define DP83822_LINK_QUAL_INT_EN	BIT(7)
>> +
>> +/* MISR2 bits */
>> +#define DP83822_JABBER_DET_INT_EN	BIT(0)
>> +#define DP83822_WOL_PKT_INT_EN		BIT(1)
>> +#define DP83822_SLEEP_MODE_INT_EN	BIT(2)
>> +#define DP83822_MDI_XOVER_INT_EN	BIT(3)
>> +#define DP83822_LB_FIFO_INT_EN		BIT(4)
>> +#define DP83822_PAGE_RX_INT_EN		BIT(5)
>> +#define DP83822_ANEG_ERR_INT_EN		BIT(6)
>> +#define DP83822_EEE_ERROR_CHANGE_INT_EN	BIT(7)
>> +
>> +/* INT_STAT1 bits */
>> +#define DP83822_WOL_INT_EN	BIT(4)
>> +#define DP83822_WOL_INT_STAT	BIT(12)
>> +
>> +#define MII_DP83822_RXSOP1	0x04a5
>> +#define	MII_DP83822_RXSOP2	0x04a6
>> +#define	MII_DP83822_RXSOP3	0x04a7
>> +
>> +/* WoL Registers */
>> +#define	MII_DP83822_WOL_CFG	0x04a0
>> +#define	MII_DP83822_WOL_STAT	0x04a1
>> +#define	MII_DP83822_WOL_DA1	0x04a2
>> +#define	MII_DP83822_WOL_DA2	0x04a3
>> +#define	MII_DP83822_WOL_DA3	0x04a4
>> +
>> +/* WoL bits */
>> +#define DP83822_WOL_MAGIC_EN	BIT(1)
> 
> Datasheet seems to indicate MAGIC_EN is bit 0, not 1.

OK

> 
>> +#define DP83822_WOL_SECURE_ON	BIT(5)
>> +#define DP83822_WOL_EN		BIT(7)
>> +#define DP83822_WOL_INDICATION_SEL BIT(8)
>> +#define DP83822_WOL_CLR_INDICATION BIT(11)
>> +
>> +static int dp83822_ack_interrupt(struct phy_device *phydev)
>> +{
>> +	int err = phy_read(phydev, MII_DP83822_MISR1);
>> +
>> +	if (err < 0)
>> +		return err;
>> +
> 
> The above could also be written:
> 
> int err;
> 
> err = phy_read(phydev, MII_DP83822_MISR1);
> if (err < 0)
> 	return err;
> 
> This matches the below better and is more clear to me.

OK

> 
>> +	err = phy_read(phydev, MII_DP83822_MISR2);
>> +	if (err < 0)
>> +		return err;
>> +
>> +	return 0;
>> +}
>> +
>> +static int dp83822_set_wol(struct phy_device *phydev,
>> +			   struct ethtool_wolinfo *wol)
>> +{
>> +	struct net_device *ndev = phydev->attached_dev;
>> +	u16 value;
>> +	const u8 *mac;
>> +
>> +	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
>> +		mac = (const u8 *)ndev->dev_addr;
>> +
>> +		if (!is_valid_ether_addr(mac))
>> +			return -EINVAL;
>> +
>> +		/* MAC addresses start with byte 5, but stored in mac[0].
>> +		 * 822 PHYs store bytes 4|5, 2|3, 0|1
>> +		 */
>> +		phy_write_mmd(phydev, DP83822_DEVADDR,
>> +			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
>> +		phy_write_mmd(phydev, DP83822_DEVADDR,
>> +			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
>> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
>> +			      (mac[5] << 8) | mac[4]);
> 
> This 'phy_write_mmd' doesn't match the others, 'MII_DP83822_WOL_DAx'
> should be on the next line, or all others should be on same.

ok

> 
>> +
>> +		value = phy_read_mmd(phydev, DP83822_DEVADDR,
>> +				     MII_DP83822_WOL_CFG);
>> +		if (wol->wolopts & WAKE_MAGIC)
>> +			value |= DP83822_WOL_MAGIC_EN;
>> +		else
>> +			value &= ~DP83822_WOL_MAGIC_EN;
>> +
>> +		if (wol->wolopts & WAKE_MAGICSECURE) {
>> +			phy_write_mmd(phydev, DP83822_DEVADDR,
>> +				      MII_DP83822_RXSOP1,
>> +				      (wol->sopass[1] << 8) | wol->sopass[0]);
>> +			phy_write_mmd(phydev, DP83822_DEVADDR,
>> +				      MII_DP83822_RXSOP2,
>> +				      (wol->sopass[3] << 8) | wol->sopass[2]);
>> +			phy_write_mmd(phydev, DP83822_DEVADDR,
>> +				      MII_DP83822_RXSOP3,
>> +				      (wol->sopass[5] << 8) | wol->sopass[4]);
>> +			value |= DP83822_WOL_SECURE_ON;
>> +		} else {
>> +			value &= ~DP83822_WOL_SECURE_ON;
>> +		}
>> +
>> +		value |= (DP83822_WOL_EN | DP83822_WOL_INDICATION_SEL |
>> +			  DP83822_WOL_CLR_INDICATION);
>> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
>> +			      value);
>> +	} else {
>> +		value = phy_read_mmd(phydev, DP83822_DEVADDR,
>> +				     MII_DP83822_WOL_CFG);
>> +		value &= ~DP83822_WOL_EN;
>> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
>> +			      value);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void dp83822_get_wol(struct phy_device *phydev,
>> +			    struct ethtool_wolinfo *wol)
>> +{
>> +	int value;
>> +	u16 sopass_val;
>> +
>> +	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
>> +	wol->wolopts = 0;
>> +
>> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
>> +	if (value & DP83822_WOL_MAGIC_EN)
>> +		wol->wolopts |= WAKE_MAGIC;
>> +
>> +	if (~value & DP83822_WOL_CLR_INDICATION)
>> +		wol->wolopts = 0;
> 
> I'm not sure I understand the logic here, why do we clear all other
> wolopts if this is not set?

Actually this needs to be WOL_ENABLE bit check and if the WoL enable bit
is not set it should just return to indicate that WoL is disabled.  And the
rest of the opts should not matter.

> 
>> +
>> +	if (value & DP83822_WOL_SECURE_ON) {
>> +		wol->wolopts |= WAKE_MAGICSECURE;
>> +	} else {
>> +		wol->wolopts &= ~WAKE_MAGICSECURE;
> 
> wol->wolopts is set to 0 at the start, and nothing else sets it, why
> clear it here?

The above should fix this

> 
>> +		return;
>> +	}
>> +
>> +	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP1);
>> +	wol->sopass[0] = (sopass_val & 0xff);
>> +	wol->sopass[1] = (sopass_val >> 8);
>> +
>> +	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2);
>> +	wol->sopass[2] = (sopass_val & 0xff);
>> +	wol->sopass[3] = (sopass_val >> 8);
>> +
>> +	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3);
>> +	wol->sopass[4] = (sopass_val & 0xff);
>> +	wol->sopass[5] = (sopass_val >> 8);
> 
> Why not encase the above password lines in the 'if (value &
> DP83822_WOL_SECURE_ON)' block above, then you can drop the whole else block.

Moved

> 
>> +}
>> +
>> +static int dp83822_config_intr(struct phy_device *phydev)
>> +{
>> +	int misr_status;
>> +	int err;
>> +
>> +	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
>> +		misr_status = phy_read(phydev, MII_DP83822_MISR1);
>> +		if (misr_status < 0)
>> +			return misr_status;
>> +
>> +		misr_status |= (DP83822_RX_ERR_HF_INT_EN |
>> +				DP83822_FALSE_CARRIER_HF_INT_EN |
>> +				DP83822_ANEG_COMPLETE_INT_EN |
>> +				DP83822_DUP_MODE_CHANGE_INT_EN |
>> +				DP83822_SPEED_CHANGED_INT_EN |
>> +				DP83822_LINK_STAT_INT_EN |
>> +				DP83822_ENERGY_DET_INT_EN |
>> +				DP83822_LINK_QUAL_INT_EN);
>> +
>> +		err = phy_write(phydev, MII_DP83822_MISR1, misr_status);
>> +		if (err < 0)
>> +			return err;
>> +
>> +		misr_status = phy_read(phydev, MII_DP83822_MISR2);
>> +		if (misr_status < 0)
>> +			return misr_status;
>> +
>> +		misr_status |= (DP83822_JABBER_DET_INT_EN |
>> +				DP83822_WOL_PKT_INT_EN |
>> +				DP83822_SLEEP_MODE_INT_EN |
>> +				DP83822_MDI_XOVER_INT_EN |
>> +				DP83822_LB_FIFO_INT_EN |
>> +				DP83822_PAGE_RX_INT_EN |
>> +				DP83822_ANEG_ERR_INT_EN |
>> +				DP83822_EEE_ERROR_CHANGE_INT_EN);
>> +
>> +		err = phy_write(phydev, MII_DP83822_MISR2, misr_status);
>> +	} else {
>> +		err = phy_write(phydev, MII_DP83822_MISR1, 0);
> 
> You should only clear the ones you set, I know it is all of them plus
> the other registers are read-only, but for clarity you could have a
> define with the mask you are using for each register and then ~MASK when
> clearing, like the dp83848.c driver.

The dp83848 only creates a define for setting the interrupts in the MISR register.
In that drivers ack_interrupt routine it just reads the MISR register and returns.  The mask is
not used anywhere else.  IMO it's a little over kill to create a define that is used once.

> 
>> +		if (err < 0)
>> +			return err;
>> +
>> +		err = phy_write(phydev, MII_DP83822_MISR1, 0);
>> +	}
>> +
>> +	return err;
>> +}
>> +
>> +static int dp83822_phy_reset(struct phy_device *phydev)
>> +{
>> +	int err;
>> +
>> +	err = phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);
>> +	if (err < 0)
>> +		return err;
>> +
>> +	return 0;
>> +}
>> +
>> +static int dp83822_suspend(struct phy_device *phydev)
>> +{
>> +	int value;
>> +
>> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
>> +
>> +	if (!(value & DP83822_WOL_EN))
>> +		genphy_suspend(phydev);
>> +
>> +	return 0;
>> +}
>> +
>> +static int dp83822_resume(struct phy_device *phydev)
>> +{
>> +	int value;
>> +
>> +	genphy_resume(phydev);
>> +
>> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
>> +
>> +	phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG, value |
>> +		      DP83822_WOL_CLR_INDICATION);
>> +
>> +
> 
> Extra newline.

Removed

> 
>> +	return 0;
>> +}
>> +
>> +static struct phy_driver dp83822_driver[] = {
>> +	{
>> +	 .phy_id = DP83822_PHY_ID,
>> +	 .phy_id_mask = 0xfffffff0,
>> +	 .name = "TI DP83822",
>> +	 .features = PHY_BASIC_FEATURES,
>> +	 .flags = PHY_HAS_INTERRUPT,
>> +	 .config_init = genphy_config_init,
>> +	 .soft_reset = dp83822_phy_reset,
>> +	 .get_wol = dp83822_get_wol,
>> +	 .set_wol = dp83822_set_wol,
>> +	 .ack_interrupt = dp83822_ack_interrupt,
>> +	 .config_intr = dp83822_config_intr,
>> +	 .config_aneg = genphy_config_aneg,
>> +	 .read_status = genphy_read_status,
>> +	 .suspend = dp83822_suspend,
>> +	 .resume = dp83822_resume,
>> +	 },
> 
> Something is not right about the indenting here, tab then space?
> 

Fixed

>> +};
>> +module_phy_driver(dp83822_driver);
>> +
>> +static struct mdio_device_id __maybe_unused dp83822_tbl[] = {
>> +	{ DP83822_PHY_ID, 0xfffffff0 },
>> +	{ },
>> +};
>> +MODULE_DEVICE_TABLE(mdio, dp83822_tbl);
>> +
>> +MODULE_DESCRIPTION("Texas Instruments DP83822 PHY driver");
>> +MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
>> +MODULE_LICENSE("GPL");
>> diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
>> index 3de4fe4dda77..3966d43c5146 100644
>> --- a/drivers/net/phy/dp83848.c
>> +++ b/drivers/net/phy/dp83848.c
>> @@ -20,7 +20,6 @@
>>  #define TI_DP83620_PHY_ID		0x20005ce0
>>  #define NS_DP83848C_PHY_ID		0x20005c90
>>  #define TLK10X_PHY_ID			0x2000a210
>> -#define TI_DP83822_PHY_ID		0x2000a240
>>  
>>  /* Registers */
>>  #define DP83848_MICR			0x11 /* MII Interrupt Control Register */
>> @@ -80,7 +79,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
>>  	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
>>  	{ TI_DP83620_PHY_ID, 0xfffffff0 },
>>  	{ TLK10X_PHY_ID, 0xfffffff0 },
>> -	{ TI_DP83822_PHY_ID, 0xfffffff0 },
>>  	{ }
>>  };
>>  MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
>> @@ -110,7 +108,6 @@ static struct phy_driver dp83848_driver[] = {
>>  	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
>>  	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
>>  	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
>> -	DP83848_PHY_DRIVER(TI_DP83822_PHY_ID, "TI DP83822 10/100 Mbps PHY"),
>>  };
>>  module_phy_driver(dp83848_driver);
>>  
>>


-- 
------------------
Dan Murphy

^ permalink raw reply

* Re: [PATCH v3 net-next] rtnetlink: bridge: use ext_ack instead of printk
From: David Ahern @ 2017-10-10 14:47 UTC (permalink / raw)
  To: Florian Westphal, netdev
In-Reply-To: <20171010144427.8341-1-fw@strlen.de>

On 10/10/17 8:44 AM, Florian Westphal wrote:
> @@ -3666,7 +3666,7 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	dev = __dev_get_by_index(net, ifm->ifi_index);
>  	if (!dev) {
> -		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
> +		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
>  		return -ENODEV;
>  	}
>  
> @@ -3741,7 +3741,7 @@ static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	dev = __dev_get_by_index(net, ifm->ifi_index);
>  	if (!dev) {
> -		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
> +		NL_SET_ERR_MSG(extack, "RTM_SETLINK with unknown ifindex");
>  		return -ENODEV;
>  	}

missed a couple of 'RTM_* with' strings

^ permalink raw reply

* Re: [PATCH net v2] net: enable interface alias removal via rtnl
From: David Ahern @ 2017-10-10 14:50 UTC (permalink / raw)
  To: Nicolas Dichtel, davem; +Cc: netdev, oliver, Stephen Hemminger
In-Reply-To: <20171010124138.27342-1-nicolas.dichtel@6wind.com>

On 10/10/17 6:41 AM, Nicolas Dichtel wrote:
> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of
> the attribute is 1 ("\0"). However, to remove an alias, the attribute
> length must be 0 (see dev_set_alias()).
> 
> Let's define the type to NLA_BINARY, so that the alias can be removed.

not to be pedantic, but we need to be clear that the type is changed
only for policy validation.

> 
> Example:
> $ ip l s dummy0 alias foo
> $ ip l l dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff
>     alias foo
> 
> Before the patch:
> $ ip l s dummy0 alias ""
> RTNETLINK answers: Numerical result out of range
> 
> After the patch:
> $ ip l s dummy0 alias ""
> $ ip l l dev dummy0
> 5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether ae:20:30:4f:a7:f3 brd ff:ff:ff:ff:ff:ff
> 
> CC: Oliver Hartkopp <oliver@hartkopp.net>
> CC: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 96ca4a2cc145 ("net: remove ifalias on empty given alias")
> Reported-by: Julien FLoret <julien.floret@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> v1 -> v2: add the comment
> 
>  net/core/rtnetlink.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index d4bcdcc68e92..5343565d88b7 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1483,7 +1483,10 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  	[IFLA_LINKINFO]		= { .type = NLA_NESTED },
>  	[IFLA_NET_NS_PID]	= { .type = NLA_U32 },
>  	[IFLA_NET_NS_FD]	= { .type = NLA_U32 },
> -	[IFLA_IFALIAS]	        = { .type = NLA_STRING, .len = IFALIASZ-1 },
> +	/* IFLA_IFALIAS is a string, but policy is set to NLA_BINARY to
> +	 * allow 0-length string (needed to remove an alias).
> +	 */
> +	[IFLA_IFALIAS]	        = { .type = NLA_BINARY, .len = IFALIASZ - 1 },
>  	[IFLA_VFINFO_LIST]	= {. type = NLA_NESTED },
>  	[IFLA_VF_PORTS]		= { .type = NLA_NESTED },
>  	[IFLA_PORT_SELF]	= { .type = NLA_NESTED },
> 

Seems like a reasonable solution.

Acked-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Stephen Smalley @ 2017-10-10 14:52 UTC (permalink / raw)
  To: Chenbo Feng, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, SELinux
  Cc: Chenbo Feng, Alexei Starovoitov, Daniel Borkmann,
	lorenzo-hpIqsD4AKlfQT0dZR+AlfA
In-Reply-To: <1507645097.30616.6.camel-+05T5uksL2qpZYMLLGbcSA@public.gmane.org>

On Tue, 2017-10-10 at 10:18 -0400, Stephen Smalley wrote:
> On Mon, 2017-10-09 at 15:20 -0700, Chenbo Feng wrote:
> > From: Chenbo Feng <fengc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > Implement the actual checks introduced to eBPF related syscalls.
> > This
> > implementation use the security field inside bpf object to store a
> > sid that
> > identify the bpf object. And when processes try to access the
> > object,
> > selinux will check if processes have the right privileges. The
> > creation
> > of eBPF object are also checked at the general bpf check hook and
> > new
> > cmd introduced to eBPF domain can also be checked there.
> > 
> > Signed-off-by: Chenbo Feng <fengc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> >  security/selinux/hooks.c            | 111
> > ++++++++++++++++++++++++++++++++++++
> >  security/selinux/include/classmap.h |   2 +
> >  security/selinux/include/objsec.h   |   4 ++
> >  3 files changed, 117 insertions(+)
> > 
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index f5d304736852..41aba4e3d57c 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -85,6 +85,7 @@
> >  #include <linux/export.h>
> >  #include <linux/msg.h>
> >  #include <linux/shm.h>
> > +#include <linux/bpf.h>
> >  
> >  #include "avc.h"
> >  #include "objsec.h"
> > @@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void
> > *ib_sec)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int selinux_bpf(int cmd, union bpf_attr *attr,
> > +				     unsigned int size)
> > +{
> > +	u32 sid = current_sid();
> > +	int ret;
> > +
> > +	switch (cmd) {
> > +	case BPF_MAP_CREATE:
> > +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP,
> > BPF_MAP__CREATE,
> > +				   NULL);
> > +		break;
> > +	case BPF_PROG_LOAD:
> > +		ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG,
> > BPF_PROG__LOAD,
> > +				   NULL);
> > +		break;
> > +	default:
> > +		ret = 0;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static u32 bpf_map_fmode_to_av(fmode_t fmode)
> > +{
> > +	u32 av = 0;
> > +
> > +	if (f_mode & FMODE_READ)
> > +		av |= BPF_MAP__READ;
> > +	if (f_mode & FMODE_WRITE)
> > +		av |= BPF_MAP__WRITE;
> > +	return av;
> > +}
> > +
> > +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> > +{
> > +	u32 sid = current_sid();
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = map->security;
> > +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> > +			    bpf_map_fmode_to_av(fmode), NULL);
> > +}
> > +
> > +static int selinux_bpf_prog(struct bpf_prog *prog)
> > +{
> > +	u32 sid = current_sid();
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = prog->aux->security;
> > +	return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> > +			    BPF_PROG__USE, NULL);
> > +}
> > +
> > +static int selinux_bpf_map_alloc(struct bpf_map *map)
> > +{
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > +	if (!bpfsec)
> > +		return -ENOMEM;
> > +
> > +	bpfsec->sid = current_sid();
> > +	map->security = bpfsec;
> > +
> > +	return 0;
> > +}
> > +
> > +static void selinux_bpf_map_free(struct bpf_map *map)
> > +{
> > +	struct bpf_security_struct *bpfsec = map->security;
> > +
> > +	map->security = NULL;
> > +	kfree(bpfsec);
> > +}
> > +
> > +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
> > +{
> > +	struct bpf_security_struct *bpfsec;
> > +
> > +	bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > +	if (!bpfsec)
> > +		return -ENOMEM;
> > +
> > +	bpfsec->sid = current_sid();
> > +	aux->security = bpfsec;
> > +
> > +	return 0;
> > +}
> > +
> > +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> > +{
> > +	struct bpf_security_struct *bpfsec = aux->security;
> > +
> > +	aux->security = NULL;
> > +	kfree(bpfsec);
> > +}
> > +#endif
> > +
> >  static struct security_hook_list selinux_hooks[]
> > __lsm_ro_after_init
> > = {
> >  	LSM_HOOK_INIT(binder_set_context_mgr,
> > selinux_binder_set_context_mgr),
> >  	LSM_HOOK_INIT(binder_transaction,
> > selinux_binder_transaction),
> > @@ -6471,6 +6572,16 @@ static struct security_hook_list
> > selinux_hooks[] __lsm_ro_after_init = {
> >  	LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
> >  	LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
> >  #endif
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +	LSM_HOOK_INIT(bpf, selinux_bpf),
> > +	LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
> > +	LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> > +	LSM_HOOK_INIT(bpf_map_alloc_security,
> > selinux_bpf_map_alloc),
> > +	LSM_HOOK_INIT(bpf_prog_alloc_security,
> > selinux_bpf_prog_alloc),
> > +	LSM_HOOK_INIT(bpf_map_free_security,
> > selinux_bpf_map_free),
> > +	LSM_HOOK_INIT(bpf_prog_free_security,
> > selinux_bpf_prog_free),
> > +#endif
> >  };
> >  
> >  static __init int selinux_init(void)
> > diff --git a/security/selinux/include/classmap.h
> > b/security/selinux/include/classmap.h
> > index 35ffb29a69cb..7253c5eea59c 100644
> > --- a/security/selinux/include/classmap.h
> > +++ b/security/selinux/include/classmap.h
> > @@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] =
> > {
> >  	  { "access", NULL } },
> >  	{ "infiniband_endport",
> >  	  { "manage_subnet", NULL } },
> > +	{ "bpf_map", {"create", "read", "write"} },
> > +	{ "bpf_prog", {"load", "use"} },
> 
> Again I have to ask: do you truly need/want two separate classes, or
> would a single class with distinct permissions suffice, ala:
>         { "bpf", { "create_map", "read_map", "write_map",
> "prog_load",
> "prog_use" } },
> 
> and then allow A self:bpf { create_map read_map write_map prog_load
> prog_use }; would be stored in a single policy avtab rule, and be
> cached in a single AVC entry.

I guess for consistency in naming it should be either:
        { "bpf", { "map_create", "map_read", "map_write", "prog_load",
"prog_use" } },
 
or:

        { "bpf", { "create_map", "read_map", "write_map", "load_prog",
"use_prog" } },
 

> >  	{ NULL }
> >    };
> >  
> > diff --git a/security/selinux/include/objsec.h
> > b/security/selinux/include/objsec.h
> > index 1649cd18eb0b..3d54468ce334 100644
> > --- a/security/selinux/include/objsec.h
> > +++ b/security/selinux/include/objsec.h
> > @@ -150,6 +150,10 @@ struct pkey_security_struct {
> >  	u32	sid;	/* SID of pkey */
> >  };
> >  
> > +struct bpf_security_struct {
> > +	u32 sid;  /*SID of bpf obj creater*/
> > +};
> > +
> >  extern unsigned int selinux_checkreqprot;
> >  
> >  #endif /* _SELINUX_OBJSEC_H_ */

^ permalink raw reply

* [PATCH net] macsec: fix memory leaks when skb_to_sgvec fails
From: Sabrina Dubroca @ 2017-10-10 15:07 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Jason A . Donenfeld

Fixes: cda7ea690350 ("macsec: check return value of skb_to_sgvec always")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 drivers/net/macsec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 98e4deaa3a6a..5ab1b8849c30 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -742,6 +742,7 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
 	sg_init_table(sg, ret);
 	ret = skb_to_sgvec(skb, sg, 0, skb->len);
 	if (unlikely(ret < 0)) {
+		aead_request_free(req);
 		macsec_txsa_put(tx_sa);
 		kfree_skb(skb);
 		return ERR_PTR(ret);
@@ -954,6 +955,7 @@ static struct sk_buff *macsec_decrypt(struct sk_buff *skb,
 	sg_init_table(sg, ret);
 	ret = skb_to_sgvec(skb, sg, 0, skb->len);
 	if (unlikely(ret < 0)) {
+		aead_request_free(req);
 		kfree_skb(skb);
 		return ERR_PTR(ret);
 	}
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH net-next] openvswitch: add ct_clear action
From: Eric Garver @ 2017-10-10 15:09 UTC (permalink / raw)
  To: Joe Stringer; +Cc: Pravin Shelar, Linux Kernel Network Developers, ovs dev
In-Reply-To: <CAPWQB7FgKYe4Ax08NzW97-WGmriC7j9YhxQF9QtuQZwMjA00bQ@mail.gmail.com>

On Tue, Oct 10, 2017 at 05:33:48AM -0700, Joe Stringer wrote:
> On 9 October 2017 at 21:41, Pravin Shelar <pshelar@ovn.org> wrote:
> > On Fri, Oct 6, 2017 at 9:44 AM, Eric Garver <e@erig.me> wrote:
> >> This adds a ct_clear action for clearing conntrack state. ct_clear is
> >> currently implemented in OVS userspace, but is not backed by an action
> >> in the kernel datapath. This is useful for flows that may modify a
> >> packet tuple after a ct lookup has already occurred.
> >>
> >> Signed-off-by: Eric Garver <e@erig.me>
> > Patch mostly looks good. I have following comments.
> >
> >> ---
> >>  include/uapi/linux/openvswitch.h |  2 ++
> >>  net/openvswitch/actions.c        |  5 +++++
> >>  net/openvswitch/conntrack.c      | 12 ++++++++++++
> >>  net/openvswitch/conntrack.h      |  7 +++++++
> >>  net/openvswitch/flow_netlink.c   |  5 +++++
> >>  5 files changed, 31 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> >> index 156ee4cab82e..1b6e510e2cc6 100644
> >> --- a/include/uapi/linux/openvswitch.h
> >> +++ b/include/uapi/linux/openvswitch.h
> >> @@ -806,6 +806,7 @@ struct ovs_action_push_eth {
> >>   * packet.
> >>   * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
> >>   * packet.
> >> + * @OVS_ACTION_ATTR_CT_CLEAR: Clear conntrack state from the packet.
> >>   *
> >>   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
> >>   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
> >> @@ -835,6 +836,7 @@ enum ovs_action_attr {
> >>         OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
> >>         OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
> >>         OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
> >> +       OVS_ACTION_ATTR_CT_CLEAR,     /* No argument. */
> >>
> >>         __OVS_ACTION_ATTR_MAX,        /* Nothing past this will be accepted
> >>                                        * from userspace. */
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index a54a556fcdb5..db9c7f2e662b 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -1203,6 +1203,10 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>                                 return err == -EINPROGRESS ? 0 : err;
> >>                         break;
> >>
> >> +               case OVS_ACTION_ATTR_CT_CLEAR:
> >> +                       err = ovs_ct_clear(skb, key);
> >> +                       break;
> >> +
> >>                 case OVS_ACTION_ATTR_PUSH_ETH:
> >>                         err = push_eth(skb, key, nla_data(a));
> >>                         break;
> >> @@ -1210,6 +1214,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >>                 case OVS_ACTION_ATTR_POP_ETH:
> >>                         err = pop_eth(skb, key);
> >>                         break;
> >> +
> >>                 }
> > Unrelated change.
> >
> >>
> >>                 if (unlikely(err)) {
> >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> >> index d558e882ca0c..f9b73c726ad7 100644
> >> --- a/net/openvswitch/conntrack.c
> >> +++ b/net/openvswitch/conntrack.c
> >> @@ -1129,6 +1129,18 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
> >>         return err;
> >>  }
> >>
> >> +int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key)
> >> +{
> >> +       if (skb_nfct(skb)) {
> >> +               nf_conntrack_put(skb_nfct(skb));
> >> +               nf_ct_set(skb, NULL, 0);
> > Can the new conntract state be appropriate? may be IP_CT_UNTRACKED?
> >
> >> +       }
> >> +
> >> +       ovs_ct_fill_key(skb, key);
> >> +
> > I do not see need to refill the key if there is no skb-nf-ct.
> 
> Really this is trying to just zero the CT key fields, but reuses
> existing functions, right? This means that subsequent upcalls, for

Right.

> instance, won't have the outdated view of the CT state from the
> previous lookup (that was prior to the ct_clear). I'd expect these key
> fields to be cleared.

I assumed Pravin was saying that we don't need to clear them if there is
no conntrack state. They should already be zero.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox