Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net 0/7][pull request] Intel Wired LAN Driver Fixes 2018-10-24
From: David Miller @ 2018-10-24 23:28 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Oct 2018 14:47:24 -0700

> This series contains fixes for the ice driver.
> 
> Anirudh fixes a namespace issue which was introduced with a previous
> patch to remove ice_netpoll.  Fixed up the device ID define names to
> align with the branding string names.  Use the capability count returned
> by the firmware, instead of calculating the count.  Introduced driver
> workarounds due to current firmware limitations.  Fixed the queue
> mapping for a VF, which needs to be set in the config and scatter queue
> modes.  Fixed the driver which is setup to handle link status events
> (LSE), even though the firmware does not have this feature yet, so add
> the ability to poll for link status changes while we wait for updated
> firmware.
> 
> The following are changes since commit 44adbac8f7217040be97928cd19998259d9d4418:
>   Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH bpf 5/7] bpf: fix direct packet write into pop/peek helpers
From: Mauricio Vasquez @ 2018-10-24 22:30 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev
In-Reply-To: <20181024200549.8516-6-daniel@iogearbox.net>


On 10/24/18 3:05 PM, Daniel Borkmann wrote:
> Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just
> copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but
> this is buggy in this context since it would allow writes into cloned
> skbs which is invalid. Therefore, disable .pkt_access for the two.
>
> Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it>


Thanks for this as well.

Acked-by: Mauricio Vasquez B<mauricio.vasquez@polito.it>

^ permalink raw reply

* [PATCH net-next 4/4] net: ethernet: ti: cpsw: fix vlan configuration while down/up
From: Ivan Khoronzhuk @ 2018-10-24 22:10 UTC (permalink / raw)
  To: grygorii.strashko, davem
  Cc: linux-omap, netdev, linux-kernel, alexander.h.duyck, bjorn,
	Ivan Khoronzhuk
In-Reply-To: <20181024221059.21834-1-ivan.khoronzhuk@linaro.org>

The vlan configuration is not restored after interface donw/up sequence
(if dual-emac - both interfaces). Tested on am572x EVM.

Steps to check:
~# ip link add link eth1 name eth1.100 type vlan id 100
~# ifconfig eth0 down
~# ifconfig eth1 down

Try to remove vid and observe warning:
~# ip link del eth1.100
[  739.526757] net eth1: removing vlanid 100 from vlan filter
[  739.533322] failed to kill vid 0081/100 for device eth1

This patch fixes it, restoring only vlan ALE entries and all other
unicast/multicast entries are restored by system calling rx_mode ndo.

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 27e0a5d5ccf9..a061a12e3022 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -565,6 +565,9 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 				(func)(slave++, ##arg);			\
 	} while (0)
 
+static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
+				    __be16 proto, u16 vid);
+
 static inline int cpsw_get_slave_port(u32 slave_num)
 {
 	return slave_num + 1;
@@ -1977,9 +1980,23 @@ static void cpsw_mqprio_resume(struct cpsw_slave *slave, struct cpsw_priv *priv)
 	slave_write(slave, tx_prio_map, tx_prio_rg);
 }
 
+static int cpsw_restore_vlans(struct net_device *vdev, int vid, void *arg)
+{
+	struct cpsw_priv *priv = arg;
+
+	if (!vdev)
+		return 0;
+
+	cpsw_ndo_vlan_rx_add_vid(priv->ndev, 0, vid);
+	return 0;
+}
+
 /* restore resources after port reset */
 static void cpsw_restore(struct cpsw_priv *priv)
 {
+	/* restore vlan configurations */
+	vlan_for_each(priv->ndev, cpsw_restore_vlans, priv);
+
 	/* restore MQPRIO offload */
 	for_each_slave(priv, cpsw_mqprio_resume, priv);
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 3/4] net: ethernet: ti: cpsw: fix vlan mcast
From: Ivan Khoronzhuk @ 2018-10-24 22:10 UTC (permalink / raw)
  To: grygorii.strashko, davem
  Cc: linux-omap, netdev, linux-kernel, alexander.h.duyck, bjorn,
	Ivan Khoronzhuk
In-Reply-To: <20181024221059.21834-1-ivan.khoronzhuk@linaro.org>

At this moment, mcast addresses are added for real device only
(reserved vlans for dual-emac mode), even if a mcast address was added
for some vlan only, thus ALE doesn't have corresponding vlan mcast
entries after vlan socket joined multicast group. So ALE drops vlan
frames with mcast addresses intended for vlans and potentially can
receive mcast frames for base ndev. That's not correct. So, fix it by
creating only vlan/mcast entries as requested. Patch doesn't use any
additional lists and is based on device mc address list and cpsw ALE
table entries.

In legacy switch mode ALE table can have untracked vlan addresses, it
can conflict with method used to delete vlan mcast addresses, so for
switch mode, do syncing as it is, leaving ability for a user to modify
table in usual for him sequence.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c | 195 +++++++++++++++++++++++++++------
 1 file changed, 164 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 500f7ed8c58c..27e0a5d5ccf9 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -570,21 +570,6 @@ static inline int cpsw_get_slave_port(u32 slave_num)
 	return slave_num + 1;
 }
 
-static void cpsw_add_mcast(struct cpsw_priv *priv, const u8 *addr)
-{
-	struct cpsw_common *cpsw = priv->cpsw;
-
-	if (cpsw->data.dual_emac) {
-		struct cpsw_slave *slave = cpsw->slaves + priv->emac_port;
-
-		cpsw_ale_add_mcast(cpsw->ale, addr, ALE_PORT_HOST,
-				   ALE_VLAN, slave->port_vlan, 0);
-		return;
-	}
-
-	cpsw_ale_add_mcast(cpsw->ale, addr, ALE_ALL_PORTS, 0, 0, 0);
-}
-
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 {
 	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
@@ -640,7 +625,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 
 			/* Clear all mcast from ALE */
 			cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
-			__dev_mc_unsync(ndev, NULL);
+			__hw_addr_ref_unsync_dev(&ndev->mc, ndev, NULL);
 
 			/* Flood All Unicast Packets to Host port */
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
@@ -661,29 +646,174 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 	}
 }
 
-static int cpsw_add_mc_addr(struct net_device *ndev, const u8 *addr)
+static int cpsw_switch_mode(struct net_device *ndev)
 {
-	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 
-	cpsw_add_mcast(priv, addr);
-	return 0;
+	return !(cpsw->data.dual_emac || cpsw->data.slaves == 1);
 }
 
-static int cpsw_del_mc_addr(struct net_device *ndev, const u8 *addr)
+struct addr_sync_ctx {
+	struct net_device *ndev;
+	const u8 *addr;		/* address to be synched */
+	int consumed;		/* number of address instances */
+	int flush;		/* flush flag */
+};
+
+/**
+ * cpsw_set_mc - adds multicast entry to the table if it's not added or deletes
+ * if it's not deleted
+ * @ndev: device to sync
+ * @addr: address to be added or deleted
+ * @vid: vlan id, if vid < 0 set/unset address for real device
+ * @add: add address if the flag is set or remove otherwise
+ */
+static int cpsw_set_mc(struct net_device *ndev, const u8 *addr,
+		       int vid, int add)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int vid, flags;
+	int mask, flags, ret;
 
-	if (cpsw->data.dual_emac) {
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
-		flags = ALE_VLAN;
-	} else {
-		vid = 0;
-		flags = 0;
+	if (vid < 0) {
+		if (cpsw->data.dual_emac)
+			vid = cpsw->slaves[priv->emac_port].port_vlan;
+		else
+			vid = 0;
+	}
+
+	mask = cpsw->data.dual_emac ? ALE_PORT_HOST : ALE_ALL_PORTS;
+	flags = vid ? ALE_VLAN : 0;
+
+	if (add)
+		ret = cpsw_ale_add_mcast(cpsw->ale, addr, mask, flags, vid, 0);
+	else
+		ret = cpsw_ale_del_mcast(cpsw->ale, addr, 0, flags, vid);
+
+	return ret;
+}
+
+static int cpsw_update_vlan_mc(struct net_device *vdev, int vid, void *ctx)
+{
+	struct addr_sync_ctx *sync_ctx = ctx;
+	struct netdev_hw_addr *ha;
+	int found = 0, ret = 0;
+
+	if (!vdev || !(vdev->flags & IFF_UP))
+		return 0;
+
+	/* vlan address is relevant if its sync_cnt != 0 */
+	netdev_for_each_mc_addr(ha, vdev) {
+		if (ether_addr_equal(ha->addr, sync_ctx->addr)) {
+			found = ha->sync_cnt;
+			break;
+		}
+	}
+
+	if (found)
+		sync_ctx->consumed++;
+
+	if (sync_ctx->flush) {
+		if (!found)
+			cpsw_set_mc(sync_ctx->ndev, sync_ctx->addr, vid, 0);
+		return 0;
+	}
+
+	if (found)
+		ret = cpsw_set_mc(sync_ctx->ndev, sync_ctx->addr, vid, 1);
+
+	return ret;
+}
+
+static int cpsw_add_mc_addr(struct net_device *ndev, const u8 *addr, int num)
+{
+	struct addr_sync_ctx sync_ctx;
+	int ret;
+
+	/* leave for legacy switch mode untracked ALE entries */
+	if (cpsw_switch_mode(ndev)) {
+		ret = cpsw_set_mc(ndev, addr, -1, 1);
+		return ret;
+	}
+
+	sync_ctx.consumed = 0;
+	sync_ctx.addr = addr;
+	sync_ctx.ndev = ndev;
+	sync_ctx.flush = 0;
+
+	ret = vlan_for_each(ndev, cpsw_update_vlan_mc, &sync_ctx);
+	if (sync_ctx.consumed < num && !ret)
+		ret = cpsw_set_mc(ndev, addr, -1, 1);
+
+	return ret;
+}
+
+static int cpsw_del_mc_addr(struct net_device *ndev, const u8 *addr, int num)
+{
+	struct addr_sync_ctx sync_ctx;
+
+	/* leave for legacy switch mode untracked ALE entries */
+	if (cpsw_switch_mode(ndev)) {
+		if (!num)
+			cpsw_set_mc(ndev, addr, -1, 0);
+		return 0;
+	}
+
+	sync_ctx.consumed = 0;
+	sync_ctx.addr = addr;
+	sync_ctx.ndev = ndev;
+	sync_ctx.flush = 1;
+
+	vlan_for_each(ndev, cpsw_update_vlan_mc, &sync_ctx);
+	if (sync_ctx.consumed == num)
+		cpsw_set_mc(ndev, addr, -1, 0);
+
+	return 0;
+}
+
+static int cpsw_purge_vlan_mc(struct net_device *vdev, int vid, void *ctx)
+{
+	struct addr_sync_ctx *sync_ctx = ctx;
+	struct netdev_hw_addr *ha;
+	int found = 0;
+
+	if (!vdev || !(vdev->flags & IFF_UP))
+		return 0;
+
+	/* vlan address is relevant if its sync_cnt != 0 */
+	netdev_for_each_mc_addr(ha, vdev) {
+		if (ether_addr_equal(ha->addr, sync_ctx->addr)) {
+			found = ha->sync_cnt;
+			break;
+		}
 	}
 
-	cpsw_ale_del_mcast(cpsw->ale, addr, 0, flags, vid);
+	if (!found)
+		return 0;
+
+	sync_ctx->consumed++;
+	cpsw_set_mc(sync_ctx->ndev, sync_ctx->addr, vid, 0);
+	return 0;
+}
+
+static int cpsw_purge_all_mc(struct net_device *ndev, const u8 *addr, int num)
+{
+	struct addr_sync_ctx sync_ctx;
+
+	/* leave for legacy switch mode untracked ALE entries */
+	if (cpsw_switch_mode(ndev)) {
+		cpsw_set_mc(ndev, addr, -1, 0);
+		return 0;
+	}
+
+	sync_ctx.addr = addr;
+	sync_ctx.ndev = ndev;
+	sync_ctx.consumed = 0;
+
+	vlan_for_each(ndev, cpsw_purge_vlan_mc, &sync_ctx);
+	if (sync_ctx.consumed < num)
+		cpsw_set_mc(ndev, addr, -1, 0);
+
 	return 0;
 }
 
@@ -704,7 +834,9 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 	/* Restore allmulti on vlans if necessary */
 	cpsw_ale_set_allmulti(cpsw->ale, ndev->flags & IFF_ALLMULTI);
 
-	__dev_mc_sync(ndev, cpsw_add_mc_addr, cpsw_del_mc_addr);
+	/* add/remove mcast address either for real netdev or for vlan */
+	__hw_addr_ref_sync_dev(&ndev->mc, ndev, cpsw_add_mc_addr,
+			       cpsw_del_mc_addr);
 }
 
 static void cpsw_intr_enable(struct cpsw_common *cpsw)
@@ -1964,7 +2096,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 	struct cpsw_common *cpsw = priv->cpsw;
 
 	cpsw_info(priv, ifdown, "shutting down cpsw device\n");
-	__dev_mc_unsync(priv->ndev, cpsw_del_mc_addr);
+	__hw_addr_ref_unsync_dev(&ndev->mc, ndev, cpsw_purge_all_mc);
 	netif_tx_stop_all_queues(priv->ndev);
 	netif_carrier_off(priv->ndev);
 
@@ -2415,6 +2547,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 				  HOST_PORT_NUM, ALE_VLAN, vid);
 	ret |= cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
 				  0, ALE_VLAN, vid);
+	ret |= cpsw_ale_flush_multicast(cpsw->ale, 0, vid);
 err:
 	pm_runtime_put(cpsw->dev);
 	return ret;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 2/4] net: 8021q: vlan_core: allow use list of vlans for real device
From: Ivan Khoronzhuk @ 2018-10-24 22:10 UTC (permalink / raw)
  To: grygorii.strashko, davem
  Cc: linux-omap, netdev, linux-kernel, alexander.h.duyck, bjorn,
	Ivan Khoronzhuk
In-Reply-To: <20181024221059.21834-1-ivan.khoronzhuk@linaro.org>

It's redundancy for the drivers to hold the list of vlans when
absolutely the same list exists in vlan core. In most cases it's
needed only to traverse the vlan devices, their vids and sync some
settings with h/w, so add API to simplify this.

At least some of these drivers also can benefit:
grep "for_each.*vid" -r drivers/net/ethernet/

drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:
drivers/net/ethernet/qlogic/qlge/qlge_main.c:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:
drivers/net/ethernet/via/via-rhine.c:
drivers/net/ethernet/via/via-velocity.c:
drivers/net/ethernet/intel/igb/igb_main.c:
drivers/net/ethernet/intel/ice/ice_main.c:
drivers/net/ethernet/intel/e1000/e1000_main.c:
drivers/net/ethernet/intel/i40e/i40e_main.c:
drivers/net/ethernet/intel/e1000e/netdev.c:
drivers/net/ethernet/intel/igbvf/netdev.c:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:
drivers/net/ethernet/intel/ixgb/ixgb_main.c:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:
drivers/net/ethernet/amd/xgbe/xgbe-dev.c:
drivers/net/ethernet/emulex/benet/be_main.c:
drivers/net/ethernet/neterion/vxge/vxge-main.c:
drivers/net/ethernet/adaptec/starfire.c:
drivers/net/ethernet/brocade/bna/bnad.c:

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 include/linux/if_vlan.h | 11 +++++++++++
 net/8021q/vlan_core.c   | 27 +++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 83ea4df6ab81..410a14cd856c 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -133,6 +133,9 @@ struct vlan_pcpu_stats {
 
 extern struct net_device *__vlan_find_dev_deep_rcu(struct net_device *real_dev,
 					       __be16 vlan_proto, u16 vlan_id);
+extern int vlan_for_each(struct net_device *dev,
+			 int (*action)(struct net_device *dev, int vid,
+				       void *arg), void *arg);
 extern struct net_device *vlan_dev_real_dev(const struct net_device *dev);
 extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 extern __be16 vlan_dev_vlan_proto(const struct net_device *dev);
@@ -236,6 +239,14 @@ __vlan_find_dev_deep_rcu(struct net_device *real_dev,
 	return NULL;
 }
 
+static inline int
+vlan_for_each(struct net_device *dev,
+	      int (*action)(struct net_device *dev, int vid, void *arg),
+	      void *arg)
+{
+	return 0;
+}
+
 static inline struct net_device *vlan_dev_real_dev(const struct net_device *dev)
 {
 	BUG();
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 4f60e86f4b8d..6308b5427a66 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -223,6 +223,33 @@ static int vlan_kill_rx_filter_info(struct net_device *dev, __be16 proto, u16 vi
 		return -ENODEV;
 }
 
+int vlan_for_each(struct net_device *dev,
+		  int (*action)(struct net_device *dev, int vid, void *arg),
+		  void *arg)
+{
+	struct vlan_vid_info *vid_info;
+	struct vlan_info *vlan_info;
+	struct net_device *vdev;
+	int ret;
+
+	ASSERT_RTNL();
+
+	vlan_info = rtnl_dereference(dev->vlan_info);
+	if (!vlan_info)
+		return 0;
+
+	list_for_each_entry(vid_info, &vlan_info->vid_list, list) {
+		vdev = vlan_group_get_device(&vlan_info->grp, vid_info->proto,
+					     vid_info->vid);
+		ret = action(vdev, vid_info->vid, arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(vlan_for_each);
+
 int vlan_filter_push_vids(struct vlan_info *vlan_info, __be16 proto)
 {
 	struct net_device *real_dev = vlan_info->real_dev;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 1/4] net: core: dev_addr_lists: add auxiliary func to handle reference address updates
From: Ivan Khoronzhuk @ 2018-10-24 22:10 UTC (permalink / raw)
  To: grygorii.strashko, davem
  Cc: linux-omap, netdev, linux-kernel, alexander.h.duyck, bjorn,
	Ivan Khoronzhuk
In-Reply-To: <20181024221059.21834-1-ivan.khoronzhuk@linaro.org>

In order to avoid all table update, and only remove or add new
address, the auxiliary function exists, named __hw_addr_sync_dev().
It allows end driver do nothing when nothing changed and add/rm when
concrete address is firstly added or lastly removed. But it doesn't
include cases when an address of real device or vlan was reused by
other vlans or vlan/macval devices.

For handaling events when address was reused/unreused the patch adds
new auxiliary routine - __hw_addr_ref_sync_dev(). It allows to do
nothing when nothing was changed and do updates only for an address
being added/reused/deleted/unreused. Thus, clone address changes for
vlans can be mirrored in the table. The function is exclusive with
__hw_addr_sync_dev(). It's responsibility of the end driver to
identify address vlan device, if it needs so.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 include/linux/netdevice.h | 10 ++++
 net/core/dev_addr_lists.c | 97 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc1d9ed33b31..de95f96a6352 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4048,6 +4048,16 @@ int __hw_addr_sync_dev(struct netdev_hw_addr_list *list,
 		       int (*sync)(struct net_device *, const unsigned char *),
 		       int (*unsync)(struct net_device *,
 				     const unsigned char *));
+int __hw_addr_ref_sync_dev(struct netdev_hw_addr_list *list,
+			   struct net_device *dev,
+			   int (*sync)(struct net_device *,
+				       const unsigned char *, int),
+			   int (*unsync)(struct net_device *,
+					 const unsigned char *, int));
+void __hw_addr_ref_unsync_dev(struct netdev_hw_addr_list *list,
+			      struct net_device *dev,
+			      int (*unsync)(struct net_device *,
+					    const unsigned char *, int));
 void __hw_addr_unsync_dev(struct netdev_hw_addr_list *list,
 			  struct net_device *dev,
 			  int (*unsync)(struct net_device *,
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index d884d8f5f0e5..81a8cd4ea3bd 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -277,6 +277,103 @@ int __hw_addr_sync_dev(struct netdev_hw_addr_list *list,
 }
 EXPORT_SYMBOL(__hw_addr_sync_dev);
 
+/**
+ *  __hw_addr_ref_sync_dev - Synchronize device's multicast address list taking
+ *  into account references
+ *  @list: address list to synchronize
+ *  @dev:  device to sync
+ *  @sync: function to call if address or reference on it should be added
+ *  @unsync: function to call if address or some reference on it should removed
+ *
+ *  This function is intended to be called from the ndo_set_rx_mode
+ *  function of devices that require explicit address or references on it
+ *  add/remove notifications. The unsync function may be NULL in which case
+ *  the addresses or references on it requiring removal will simply be
+ *  removed without any notification to the device. That is responsibility of
+ *  the driver to identify and distribute address or references on it between
+ *  internal address tables.
+ **/
+int __hw_addr_ref_sync_dev(struct netdev_hw_addr_list *list,
+			   struct net_device *dev,
+			   int (*sync)(struct net_device *,
+				       const unsigned char *, int),
+			   int (*unsync)(struct net_device *,
+					 const unsigned char *, int))
+{
+	struct netdev_hw_addr *ha, *tmp;
+	int err, ref_cnt;
+
+	/* first go through and flush out any unsynced/stale entries */
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		/* sync if address is not used */
+		if ((ha->sync_cnt << 1) <= ha->refcount)
+			continue;
+
+		/* if fails defer unsyncing address */
+		ref_cnt = ha->refcount - ha->sync_cnt;
+		if (unsync && unsync(dev, ha->addr, ref_cnt))
+			continue;
+
+		ha->refcount = (ref_cnt << 1) + 1;
+		ha->sync_cnt = ref_cnt;
+		__hw_addr_del_entry(list, ha, false, false);
+	}
+
+	/* go through and sync updated/new entries to the list */
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		/* sync if address added or reused */
+		if ((ha->sync_cnt << 1) >= ha->refcount)
+			continue;
+
+		ref_cnt = ha->refcount - ha->sync_cnt;
+		err = sync(dev, ha->addr, ref_cnt);
+		if (err)
+			return err;
+
+		ha->refcount = ref_cnt << 1;
+		ha->sync_cnt = ref_cnt;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(__hw_addr_ref_sync_dev);
+
+/**
+ *  __hw_addr_ref_unsync_dev - Remove synchronized addresses and references on
+ *  it from device
+ *  @list: address list to remove synchronized addresses (references on it) from
+ *  @dev:  device to sync
+ *  @unsync: function to call if address and references on it should be removed
+ *
+ *  Remove all addresses that were added to the device by
+ *  __hw_addr_ref_sync_dev(). This function is intended to be called from the
+ *  ndo_stop or ndo_open functions on devices that require explicit address (or
+ *  references on it) add/remove notifications. If the unsync function pointer
+ *  is NULL then this function can be used to just reset the sync_cnt for the
+ *  addresses in the list.
+ **/
+void __hw_addr_ref_unsync_dev(struct netdev_hw_addr_list *list,
+			      struct net_device *dev,
+			      int (*unsync)(struct net_device *,
+					    const unsigned char *, int))
+{
+	struct netdev_hw_addr *ha, *tmp;
+
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		if (!ha->sync_cnt)
+			continue;
+
+		/* if fails defer unsyncing address */
+		if (unsync && unsync(dev, ha->addr, ha->sync_cnt))
+			continue;
+
+		ha->refcount -= ha->sync_cnt - 1;
+		ha->sync_cnt = 0;
+		__hw_addr_del_entry(list, ha, false, false);
+	}
+}
+EXPORT_SYMBOL(__hw_addr_ref_unsync_dev);
+
 /**
  *  __hw_addr_unsync_dev - Remove synchronized addresses from device
  *  @list: address list to remove synchronized addresses from
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf 6/7] bpf: fix leaking uninitialized memory on pop/peek helpers
From: Mauricio Vasquez @ 2018-10-24 22:08 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev
In-Reply-To: <20181024200549.8516-7-daniel@iogearbox.net>

On 10/24/18 3:05 PM, Daniel Borkmann wrote:
> Commit f1a2e44a3aec ("bpf: add queue and stack maps") added helpers
> with ARG_PTR_TO_UNINIT_MAP_VALUE. Meaning, the helper is supposed to
> fill the map value buffer with data instead of reading from it like
> in other helpers such as map update. However, given the buffer is
> allowed to be uninitialized (since we fill it in the helper anyway),
> it also means that the helper is obliged to wipe the memory in case
> of an error in order to not allow for leaking uninitialized memory.
> Given pop/peek is both handled inside __{stack,queue}_map_get(),
> lets wipe it there on error case, that is, empty stack/queue.
>
> Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
> Signed-off-by: Daniel Borkmann<daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov<ast@kernel.org>
> Cc: Mauricio Vasquez B<mauricio.vasquez@polito.it>

Thanks for the fix Daniel.

Acked-by: Mauricio Vasquez B<mauricio.vasquez@polito.it>

^ permalink raw reply

* Re: [PATCH bpf 7/7] bpf: make direct packet write unclone more robust
From: Daniel Borkmann @ 2018-10-24 22:08 UTC (permalink / raw)
  To: Song Liu; +Cc: Alexei Starovoitov, Networking
In-Reply-To: <CAPhsuW4V46Hi=q_fSma3mgji_JQxhjjpa1hPEtEGjE_WvEamhg@mail.gmail.com>

On 10/24/2018 11:42 PM, Song Liu wrote:
> On Wed, Oct 24, 2018 at 1:06 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Given this seems to be quite fragile and can easily slip through the
>> cracks, lets make direct packet write more robust by requiring that
>> future program types which allow for such write must provide a prologue
>> callback. In case of XDP and sk_msg it's noop, thus add a generic noop
>> handler there. The latter starts out with NULL data/data_end unconditionally
>> when sg pages are shared.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Acked-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>  kernel/bpf/verifier.c |  6 +++++-
>>  net/core/filter.c     | 11 +++++++++++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 5fc9a65..171a2c8 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5709,7 +5709,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>         bool is_narrower_load;
>>         u32 target_size;
>>
>> -       if (ops->gen_prologue) {
>> +       if (ops->gen_prologue || env->seen_direct_write) {
>> +               if (!ops->gen_prologue) {
>> +                       verbose(env, "bpf verifier is misconfigured\n");
>> +                       return -EINVAL;
>> +               }
> 
> nit: how about this?
> 
> diff --git i/kernel/bpf/verifier.c w/kernel/bpf/verifier.c
> index 6fbe7a8afed7..d35078024e35 100644
> --- i/kernel/bpf/verifier.c
> +++ w/kernel/bpf/verifier.c
> @@ -5286,6 +5286,11 @@ static int convert_ctx_accesses(struct
> bpf_verifier_env *env)
>         bool is_narrower_load;
>         u32 target_size;
> 
> +       if (!ops->gen_prologue && env->seen_direct_write) {
> +               verbose(env, "bpf verifier is misconfigured\n");
> +               return -EINVAL;
> +       }
> +
>         if (ops->gen_prologue) {
>                 cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
>                                         env->prog);
> 

Hm, probably matter of different style preference, personally I'd prefer
the one as is though.

Thanks,
Daniel

^ permalink raw reply

* [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Tristram.Ha @ 2018-10-24 21:51 UTC (permalink / raw)
  To: David S. Miller, Nicolas Ferre; +Cc: Tristram Ha, UNGLinuxDriver, netdev

From: Tristram Ha <Tristram.Ha@microchip.com>

Socket buffer is not re-created when headroom is 2 and tailroom is 1.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 8f5bf91..1d86b4d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1684,7 +1684,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
 			padlen = 0;
 		/* No room for FCS, need to reallocate skb. */
 		else
-			padlen = ETH_FCS_LEN - tailroom;
+			padlen = ETH_FCS_LEN;
 	} else {
 		/* Add room for FCS. */
 		padlen += ETH_FCS_LEN;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] r8169: Add new device ID support
From: Heiner Kallweit @ 2018-10-24 21:44 UTC (permalink / raw)
  To: David Miller, shawn.lin; +Cc: nic_swsd, netdev
In-Reply-To: <20181024.142234.2286757776208469261.davem@davemloft.net>

On 24.10.2018 23:22, David Miller wrote:
> From: Shawn Lin <shawn.lin@rock-chips.com>
> Date: Wed, 24 Oct 2018 09:46:47 +0800
> 
>> It's found my r8169 ethernet card at hand has a device ID
>> of 0x0000 which wasn't on the list of rtl8169_pci_tbl. Add
>> a new entry to make it work:
>>
>> [2.165785] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>> [2.165863] r8169 0000:01:00.0: enabling device (0000 -> 0003)
>> [2.167110] r8169 0000:01:00.0 eth0: RTL8168c/8111c at 0xffffff80089be000,
>> 00:e0:4c:21:00:17, XID 1c4000c0 IRQ 208
>> [2.167128] r8169 0000:01:00.0 eth0: jumbo features [frames: 6128
>> bytes, tx checksumming: ko]
>>
>> [root@rk1808:/]# lspci
>> 00:00.0 Class 0604: 1d87:1808
>> 01:00.0 Class 0200: 10ec:0000
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> 
> I'm stil not terribly confident in this change, a device ID of zero is
> really unusual.
> 
> Heiner, what do you think?
> 
A PCI device ID of zero definitely is a mistake of the card vendor.
Or maybe the card was just a sample and not meant to retail?
If some vendor of cards with a different Realtek network chip makes
the same mistake, then we're in trouble. I don't think we should
accept this risk just to support a broken ancient card.
This card most likely is at least 10 years old, and that we get the
first report only now seems to indicate that it's not something
affecting a lot of people.
The reporter found a way to make the card work on his system,
so I don't see a need for any further action.

^ permalink raw reply

* [net 7/7] ice: Poll for link status change
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

When the physical link goes up or down, the driver is supposed to
receive a link status event (LSE). The driver currently has the code
to handle LSEs but there is no firmware support for this feature yet.
So this patch adds the ability for the driver to poll for link status
changes. The polling itself is done in ice_watchdog_subtask.

For namespace cleanliness, this patch also removes code that handles
LSE. This code will be reintroduced once the feature is officially
supported.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c |  29 +-----
 drivers/net/ethernet/intel/ice/ice_common.h |   6 --
 drivers/net/ethernet/intel/ice/ice_main.c   | 110 +++++---------------
 3 files changed, 25 insertions(+), 120 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 5a91a9087d1e..8cd6a2401fd9 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -235,7 +235,7 @@ static enum ice_media_type ice_get_media_type(struct ice_port_info *pi)
  *
  * Get Link Status (0x607). Returns the link status of the adapter.
  */
-enum ice_status
+static enum ice_status
 ice_aq_get_link_info(struct ice_port_info *pi, bool ena_lse,
 		     struct ice_link_status *link, struct ice_sq_cd *cd)
 {
@@ -2004,33 +2004,6 @@ ice_aq_set_link_restart_an(struct ice_port_info *pi, bool ena_link,
 	return ice_aq_send_cmd(pi->hw, &desc, NULL, 0, cd);
 }
 
-/**
- * ice_aq_set_event_mask
- * @hw: pointer to the hw struct
- * @port_num: port number of the physical function
- * @mask: event mask to be set
- * @cd: pointer to command details structure or NULL
- *
- * Set event mask (0x0613)
- */
-enum ice_status
-ice_aq_set_event_mask(struct ice_hw *hw, u8 port_num, u16 mask,
-		      struct ice_sq_cd *cd)
-{
-	struct ice_aqc_set_event_mask *cmd;
-	struct ice_aq_desc desc;
-
-	cmd = &desc.params.set_event_mask;
-
-	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_event_mask);
-
-	cmd->lport_num = port_num;
-
-	cmd->event_mask = cpu_to_le16(mask);
-
-	return ice_aq_send_cmd(hw, &desc, NULL, 0, cd);
-}
-
 /**
  * __ice_aq_get_set_rss_lut
  * @hw: pointer to the hardware structure
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 876347e32b6f..cf760c24a6aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -86,12 +86,6 @@ enum ice_status
 ice_aq_set_link_restart_an(struct ice_port_info *pi, bool ena_link,
 			   struct ice_sq_cd *cd);
 enum ice_status
-ice_aq_get_link_info(struct ice_port_info *pi, bool ena_lse,
-		     struct ice_link_status *link, struct ice_sq_cd *cd);
-enum ice_status
-ice_aq_set_event_mask(struct ice_hw *hw, u8 port_num, u16 mask,
-		      struct ice_sq_cd *cd);
-enum ice_status
 ice_dis_vsi_txq(struct ice_port_info *pi, u8 num_queues, u16 *q_ids,
 		u32 *q_teids, enum ice_disq_rst_src rst_src, u16 vmvf_num,
 		struct ice_sq_cd *cmd_details);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0084b7290b2b..05993451147a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -456,35 +456,6 @@ static void ice_reset_subtask(struct ice_pf *pf)
 	}
 }
 
-/**
- * ice_watchdog_subtask - periodic tasks not using event driven scheduling
- * @pf: board private structure
- */
-static void ice_watchdog_subtask(struct ice_pf *pf)
-{
-	int i;
-
-	/* if interface is down do nothing */
-	if (test_bit(__ICE_DOWN, pf->state) ||
-	    test_bit(__ICE_CFG_BUSY, pf->state))
-		return;
-
-	/* make sure we don't do these things too often */
-	if (time_before(jiffies,
-			pf->serv_tmr_prev + pf->serv_tmr_period))
-		return;
-
-	pf->serv_tmr_prev = jiffies;
-
-	/* Update the stats for active netdevs so the network stack
-	 * can look at updated numbers whenever it cares to
-	 */
-	ice_update_pf_stats(pf);
-	for (i = 0; i < pf->num_alloc_vsi; i++)
-		if (pf->vsi[i] && pf->vsi[i]->netdev)
-			ice_update_vsi_stats(pf->vsi[i]);
-}
-
 /**
  * ice_print_link_msg - print link up or down message
  * @vsi: the VSI whose link status is being queried
@@ -554,36 +525,6 @@ void ice_print_link_msg(struct ice_vsi *vsi, bool isup)
 		    speed, fc);
 }
 
-/**
- * ice_init_link_events - enable/initialize link events
- * @pi: pointer to the port_info instance
- *
- * Returns -EIO on failure, 0 on success
- */
-static int ice_init_link_events(struct ice_port_info *pi)
-{
-	u16 mask;
-
-	mask = ~((u16)(ICE_AQ_LINK_EVENT_UPDOWN | ICE_AQ_LINK_EVENT_MEDIA_NA |
-		       ICE_AQ_LINK_EVENT_MODULE_QUAL_FAIL));
-
-	if (ice_aq_set_event_mask(pi->hw, pi->lport, mask, NULL)) {
-		dev_dbg(ice_hw_to_dev(pi->hw),
-			"Failed to set link event mask for port %d\n",
-			pi->lport);
-		return -EIO;
-	}
-
-	if (ice_aq_get_link_info(pi, true, NULL, NULL)) {
-		dev_dbg(ice_hw_to_dev(pi->hw),
-			"Failed to enable link events for port %d\n",
-			pi->lport);
-		return -EIO;
-	}
-
-	return 0;
-}
-
 /**
  * ice_vsi_link_event - update the vsi's netdev
  * @vsi: the vsi on which the link event occurred
@@ -671,27 +612,35 @@ ice_link_event(struct ice_pf *pf, struct ice_port_info *pi)
 }
 
 /**
- * ice_handle_link_event - handle link event via ARQ
- * @pf: pf that the link event is associated with
- *
- * Return -EINVAL if port_info is null
- * Return status on succes
+ * ice_watchdog_subtask - periodic tasks not using event driven scheduling
+ * @pf: board private structure
  */
-static int ice_handle_link_event(struct ice_pf *pf)
+static void ice_watchdog_subtask(struct ice_pf *pf)
 {
-	struct ice_port_info *port_info;
-	int status;
+	int i;
 
-	port_info = pf->hw.port_info;
-	if (!port_info)
-		return -EINVAL;
+	/* if interface is down do nothing */
+	if (test_bit(__ICE_DOWN, pf->state) ||
+	    test_bit(__ICE_CFG_BUSY, pf->state))
+		return;
 
-	status = ice_link_event(pf, port_info);
-	if (status)
-		dev_dbg(&pf->pdev->dev,
-			"Could not process link event, error %d\n", status);
+	/* make sure we don't do these things too often */
+	if (time_before(jiffies,
+			pf->serv_tmr_prev + pf->serv_tmr_period))
+		return;
 
-	return status;
+	pf->serv_tmr_prev = jiffies;
+
+	if (ice_link_event(pf, pf->hw.port_info))
+		dev_dbg(&pf->pdev->dev, "ice_link_event failed\n");
+
+	/* Update the stats for active netdevs so the network stack
+	 * can look at updated numbers whenever it cares to
+	 */
+	ice_update_pf_stats(pf);
+	for (i = 0; i < pf->num_alloc_vsi; i++)
+		if (pf->vsi[i] && pf->vsi[i]->netdev)
+			ice_update_vsi_stats(pf->vsi[i]);
 }
 
 /**
@@ -797,11 +746,6 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
 		opcode = le16_to_cpu(event.desc.opcode);
 
 		switch (opcode) {
-		case ice_aqc_opc_get_link_status:
-			if (ice_handle_link_event(pf))
-				dev_err(&pf->pdev->dev,
-					"Could not handle link event\n");
-			break;
 		case ice_mbx_opc_send_msg_to_pf:
 			ice_vc_process_vf_msg(pf, &event);
 			break;
@@ -2207,12 +2151,6 @@ static int ice_probe(struct pci_dev *pdev,
 	/* since everything is good, start the service timer */
 	mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
 
-	err = ice_init_link_events(pf->hw.port_info);
-	if (err) {
-		dev_err(&pdev->dev, "ice_init_link_events failed: %d\n", err);
-		goto err_alloc_sw_unroll;
-	}
-
 	return 0;
 
 err_alloc_sw_unroll:
-- 
2.17.2

^ permalink raw reply related

* [net 4/7] ice: Use capability count returned by the firmware
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

The firmware now returns the capability count in the command buffer.
Use it.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index c52f450f2c0d..78df54b25bf1 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1531,9 +1531,7 @@ ice_aq_discover_caps(struct ice_hw *hw, void *buf, u16 buf_size, u32 *cap_count,
 	if (!status)
 		ice_parse_caps(hw, buf, le32_to_cpu(cmd->count), opc);
 	else if (hw->adminq.sq_last_status == ICE_AQ_RC_ENOMEM)
-		*cap_count =
-			DIV_ROUND_UP(le16_to_cpu(desc.datalen),
-				     sizeof(struct ice_aqc_list_caps_elem));
+		*cap_count = le32_to_cpu(cmd->count);
 	return status;
 }
 
-- 
2.17.2

^ permalink raw reply related

* [net 5/7] ice: Introduce ice_dev_onetime_setup
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice_dev_onetime_setup contains a couple of driver workarounds for current
firmware limitations. These workarounds are expected to go away once
these limitations are fixed in the firmware.

On a firmware release that has these issues addressed, these workarounds
(while unnecessary) will not break anything.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c   | 19 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.h   |  3 +++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  2 ++
 drivers/net/ethernet/intel/ice/ice_lib.c      |  1 +
 4 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 78df54b25bf1..5a91a9087d1e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -42,6 +42,23 @@ static enum ice_status ice_set_mac_type(struct ice_hw *hw)
 	return 0;
 }
 
+/**
+ * ice_dev_onetime_setup - Temporary HW/FW workarounds
+ * @hw: pointer to the HW structure
+ *
+ * This function provides temporary workarounds for certain issues
+ * that are expected to be fixed in the HW/FW.
+ */
+void ice_dev_onetime_setup(struct ice_hw *hw)
+{
+	/* configure Rx - set non pxe mode */
+	wr32(hw, GLLAN_RCTL_0, 0x1);
+
+#define MBX_PF_VT_PFALLOC	0x00231E80
+	/* set VFs per PF */
+	wr32(hw, MBX_PF_VT_PFALLOC, rd32(hw, PF_VT_PFALLOC_HIF));
+}
+
 /**
  * ice_clear_pf_cfg - Clear PF configuration
  * @hw: pointer to the hardware structure
@@ -740,6 +757,8 @@ enum ice_status ice_init_hw(struct ice_hw *hw)
 	if (status)
 		goto err_unroll_sched;
 
+	ice_dev_onetime_setup(hw);
+
 	/* Get MAC information */
 	/* A single port can report up to two (LAN and WoL) addresses */
 	mac_buf = devm_kcalloc(ice_hw_to_dev(hw), 2,
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 1900681289a4..876347e32b6f 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -34,6 +34,9 @@ ice_sq_send_cmd(struct ice_hw *hw, struct ice_ctl_q_info *cq,
 		struct ice_sq_cd *cd);
 void ice_clear_pxe_mode(struct ice_hw *hw);
 enum ice_status ice_get_caps(struct ice_hw *hw);
+
+void ice_dev_onetime_setup(struct ice_hw *hw);
+
 enum ice_status
 ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 		  u32 rxq_index);
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index a6679a9bfd3a..228afcad6fc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -157,6 +157,7 @@
 #define VPINT_ALLOC_LAST_S			12
 #define VPINT_ALLOC_LAST_M			ICE_M(0x7FF, 12)
 #define VPINT_ALLOC_VALID_M			BIT(31)
+#define GLLAN_RCTL_0				0x002941F8
 #define QRX_CONTEXT(_i, _QRX)			(0x00280000 + ((_i) * 8192 + (_QRX) * 4))
 #define QRX_CTRL(_QRX)				(0x00120000 + ((_QRX) * 4))
 #define QRX_CTRL_MAX_INDEX			2047
@@ -320,6 +321,7 @@
 #define GLV_UPRCL(_i)				(0x003B2000 + ((_i) * 8))
 #define GLV_UPTCH(_i)				(0x0030A004 + ((_i) * 8))
 #define GLV_UPTCL(_i)				(0x0030A000 + ((_i) * 8))
+#define PF_VT_PFALLOC_HIF			0x0009DD80
 #define VSIQF_HKEY_MAX_INDEX			12
 #define VSIQF_HLUT_MAX_INDEX			15
 #define VFINT_DYN_CTLN(_i)			(0x00003800 + ((_i) * 4))
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index e750702bcdce..5bacad01f0c9 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2529,6 +2529,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi)
 	vsi->hw_base_vector = 0;
 	ice_vsi_clear_rings(vsi);
 	ice_vsi_free_arrays(vsi, false);
+	ice_dev_onetime_setup(&vsi->back->hw);
 	ice_vsi_set_num_qs(vsi);
 
 	/* Initialize VSI struct elements and create VSI in FW */
-- 
2.17.2

^ permalink raw reply related

* [net 3/7] ice: Update expected FW version
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

Update to the current firmware major and minor version which are
1 and 3 respectively.

Also remove an empty comment line.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_controlq.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.h b/drivers/net/ethernet/intel/ice/ice_controlq.h
index 437f832fd7c4..0038a4109c99 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.h
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.h
@@ -19,11 +19,10 @@
 
 /* Defines that help manage the driver vs FW API checks.
  * Take a look at ice_aq_ver_check in ice_controlq.c for actual usage.
- *
  */
 #define EXP_FW_API_VER_BRANCH		0x00
-#define EXP_FW_API_VER_MAJOR		0x00
-#define EXP_FW_API_VER_MINOR		0x01
+#define EXP_FW_API_VER_MAJOR		0x01
+#define EXP_FW_API_VER_MINOR		0x03
 
 /* Different control queue types: These are mainly for SW consumption. */
 enum ice_ctl_q {
-- 
2.17.2

^ permalink raw reply related

* [net 2/7] ice: Change device ID define names to align with branding string
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

Basically remove references to C810 and use E810C (from the branding
string) instead.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_devids.h | 6 +++---
 drivers/net/ethernet/intel/ice/ice_main.c   | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_devids.h b/drivers/net/ethernet/intel/ice/ice_devids.h
index a6f0a5c0c305..f8d5c661d0ba 100644
--- a/drivers/net/ethernet/intel/ice/ice_devids.h
+++ b/drivers/net/ethernet/intel/ice/ice_devids.h
@@ -6,10 +6,10 @@
 
 /* Device IDs */
 /* Intel(R) Ethernet Controller E810-C for backplane */
-#define ICE_DEV_ID_C810_BACKPLANE	0x1591
+#define ICE_DEV_ID_E810C_BACKPLANE	0x1591
 /* Intel(R) Ethernet Controller E810-C for QSFP */
-#define ICE_DEV_ID_C810_QSFP		0x1592
+#define ICE_DEV_ID_E810C_QSFP		0x1592
 /* Intel(R) Ethernet Controller E810-C for SFP */
-#define ICE_DEV_ID_C810_SFP		0x1593
+#define ICE_DEV_ID_E810C_SFP		0x1593
 
 #endif /* _ICE_DEVIDS_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8f61b375e768..0084b7290b2b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2271,9 +2271,9 @@ static void ice_remove(struct pci_dev *pdev)
  *   Class, Class Mask, private data (not used) }
  */
 static const struct pci_device_id ice_pci_tbl[] = {
-	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_C810_BACKPLANE), 0 },
-	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_C810_QSFP), 0 },
-	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_C810_SFP), 0 },
+	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_E810C_BACKPLANE), 0 },
+	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_E810C_QSFP), 0 },
+	{ PCI_VDEVICE(INTEL, ICE_DEV_ID_E810C_SFP), 0 },
 	/* required last entry */
 	{ 0, }
 };
-- 
2.17.2

^ permalink raw reply related

* [net 6/7] ice: Allocate VF interrupts and set queue map
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

Allocate VF interrupts using VPINT_ALLOC_PCI. Multiple interrupts are
specified as a range from "first" to "last".

Also, according to the spec, the queue mapping for a VF needs to be set
in both contig and scatter queue modes. So make this change as well.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_hw_autogen.h  |  6 ++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 15 +++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 228afcad6fc3..5fdea6ec7675 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -157,6 +157,12 @@
 #define VPINT_ALLOC_LAST_S			12
 #define VPINT_ALLOC_LAST_M			ICE_M(0x7FF, 12)
 #define VPINT_ALLOC_VALID_M			BIT(31)
+#define VPINT_ALLOC_PCI(_VF)			(0x0009D000 + ((_VF) * 4))
+#define VPINT_ALLOC_PCI_FIRST_S			0
+#define VPINT_ALLOC_PCI_FIRST_M			ICE_M(0x7FF, 0)
+#define VPINT_ALLOC_PCI_LAST_S			12
+#define VPINT_ALLOC_PCI_LAST_M			ICE_M(0x7FF, 12)
+#define VPINT_ALLOC_PCI_VALID_M			BIT(31)
 #define GLLAN_RCTL_0				0x002941F8
 #define QRX_CONTEXT(_i, _QRX)			(0x00280000 + ((_i) * 8192 + (_QRX) * 4))
 #define QRX_CTRL(_QRX)				(0x00120000 + ((_QRX) * 4))
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index c25e486706f3..45f10f8f01dc 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -173,6 +173,7 @@ static void ice_dis_vf_mappings(struct ice_vf *vf)
 	vsi = pf->vsi[vf->lan_vsi_idx];
 
 	wr32(hw, VPINT_ALLOC(vf->vf_id), 0);
+	wr32(hw, VPINT_ALLOC_PCI(vf->vf_id), 0);
 
 	first = vf->first_vector_idx;
 	last = first + pf->num_vf_msix - 1;
@@ -519,6 +520,10 @@ static void ice_ena_vf_mappings(struct ice_vf *vf)
 	       VPINT_ALLOC_VALID_M);
 	wr32(hw, VPINT_ALLOC(vf->vf_id), reg);
 
+	reg = (((first << VPINT_ALLOC_PCI_FIRST_S) & VPINT_ALLOC_PCI_FIRST_M) |
+	       ((last << VPINT_ALLOC_PCI_LAST_S) & VPINT_ALLOC_PCI_LAST_M) |
+	       VPINT_ALLOC_PCI_VALID_M);
+	wr32(hw, VPINT_ALLOC_PCI(vf->vf_id), reg);
 	/* map the interrupts to its functions */
 	for (v = first; v <= last; v++) {
 		reg = (((abs_vf_id << GLINT_VECT2FUNC_VF_NUM_S) &
@@ -528,10 +533,11 @@ static void ice_ena_vf_mappings(struct ice_vf *vf)
 		wr32(hw, GLINT_VECT2FUNC(v), reg);
 	}
 
+	/* set regardless of mapping mode */
+	wr32(hw, VPLAN_TXQ_MAPENA(vf->vf_id), VPLAN_TXQ_MAPENA_TX_ENA_M);
+
 	/* VF Tx queues allocation */
 	if (vsi->tx_mapping_mode == ICE_VSI_MAP_CONTIG) {
-		wr32(hw, VPLAN_TXQ_MAPENA(vf->vf_id),
-		     VPLAN_TXQ_MAPENA_TX_ENA_M);
 		/* set the VF PF Tx queue range
 		 * VFNUMQ value should be set to (number of queues - 1). A value
 		 * of 0 means 1 queue and a value of 255 means 256 queues
@@ -546,10 +552,11 @@ static void ice_ena_vf_mappings(struct ice_vf *vf)
 			"Scattered mode for VF Tx queues is not yet implemented\n");
 	}
 
+	/* set regardless of mapping mode */
+	wr32(hw, VPLAN_RXQ_MAPENA(vf->vf_id), VPLAN_RXQ_MAPENA_RX_ENA_M);
+
 	/* VF Rx queues allocation */
 	if (vsi->rx_mapping_mode == ICE_VSI_MAP_CONTIG) {
-		wr32(hw, VPLAN_RXQ_MAPENA(vf->vf_id),
-		     VPLAN_RXQ_MAPENA_RX_ENA_M);
 		/* set the VF PF Rx queue range
 		 * VFNUMQ value should be set to (number of queues - 1). A value
 		 * of 0 means 1 queue and a value of 255 means 256 queues
-- 
2.17.2

^ permalink raw reply related

* [net 0/7][pull request] Intel Wired LAN Driver Fixes 2018-10-24
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains fixes for the ice driver.

Anirudh fixes a namespace issue which was introduced with a previous
patch to remove ice_netpoll.  Fixed up the device ID define names to
align with the branding string names.  Use the capability count returned
by the firmware, instead of calculating the count.  Introduced driver
workarounds due to current firmware limitations.  Fixed the queue
mapping for a VF, which needs to be set in the config and scatter queue
modes.  Fixed the driver which is setup to handle link status events
(LSE), even though the firmware does not have this feature yet, so add
the ability to poll for link status changes while we wait for updated
firmware.

The following are changes since commit 44adbac8f7217040be97928cd19998259d9d4418:
  Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE

Anirudh Venkataramanan (7):
  ice: Make ice_msix_clean_rings static
  ice: Change device ID define names to align with branding string
  ice: Update expected FW version
  ice: Use capability count returned by the firmware
  ice: Introduce ice_dev_onetime_setup
  ice: Allocate VF interrupts and set queue map
  ice: Poll for link status change

 drivers/net/ethernet/intel/ice/ice_common.c   |  52 ++++----
 drivers/net/ethernet/intel/ice/ice_common.h   |   9 +-
 drivers/net/ethernet/intel/ice/ice_controlq.h |   5 +-
 drivers/net/ethernet/intel/ice/ice_devids.h   |   6 +-
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 ++
 drivers/net/ethernet/intel/ice/ice_lib.c      |   3 +-
 drivers/net/ethernet/intel/ice/ice_lib.h      |   1 -
 drivers/net/ethernet/intel/ice/ice_main.c     | 116 ++++--------------
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  15 ++-
 9 files changed, 77 insertions(+), 138 deletions(-)

-- 
2.17.2

^ permalink raw reply

* [net 1/7] ice: Make ice_msix_clean_rings static
From: Jeff Kirsher @ 2018-10-24 21:47 UTC (permalink / raw)
  To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

commit 158a08a694c4e ("ice: remove ndo_poll_controller") removed
ice_netpoll and introduced a namespace warning for ice_msix_clean_rings.
Fix the namespace warning by making ice_msix_clean_rings static.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 2 +-
 drivers/net/ethernet/intel/ice/ice_lib.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 49f1940772ed..e750702bcdce 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -433,7 +433,7 @@ int ice_vsi_clear(struct ice_vsi *vsi)
  * @irq: interrupt number
  * @data: pointer to a q_vector
  */
-irqreturn_t ice_msix_clean_rings(int __always_unused irq, void *data)
+static irqreturn_t ice_msix_clean_rings(int __always_unused irq, void *data)
 {
 	struct ice_q_vector *q_vector = (struct ice_q_vector *)data;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 677db40338f5..3831b4f0960a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -73,5 +73,4 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc);
 
 int ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
 
-irqreturn_t ice_msix_clean_rings(int __always_unused irq, void *data);
 #endif /* !_ICE_LIB_H_ */
-- 
2.17.2

^ permalink raw reply related

* Re: [PATCH bpf 0/7] Batch of direct packet access fixes for BPF
From: Song Liu @ 2018-10-24 21:43 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, Networking
In-Reply-To: <20181024200549.8516-1-daniel@iogearbox.net>

On Wed, Oct 24, 2018 at 1:08 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Several fixes to get direct packet access in order from verifier
> side. Also test suite fix to run cg_skb as unpriv and an improvement
> to make direct packet write less error prone in future.
>
> Thanks!
>
> Daniel Borkmann (7):
>   bpf: fix test suite to enable all unpriv program types
>   bpf: disallow direct packet access for unpriv in cg_skb
>   bpf: fix direct packet access for flow dissector progs
>   bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data
>   bpf: fix direct packet write into pop/peek helpers
>   bpf: fix leaking uninitialized memory on pop/peek helpers
>   bpf: make direct packet write unclone more robust
>
>  kernel/bpf/helpers.c                        |  2 --
>  kernel/bpf/queue_stack_maps.c               |  2 ++
>  kernel/bpf/verifier.c                       | 13 ++++++++++---
>  net/core/filter.c                           | 17 +++++++++++++++++
>  tools/testing/selftests/bpf/test_verifier.c | 15 +++++++++++++--
>  5 files changed, 42 insertions(+), 7 deletions(-)
>
> --
> 2.9.5
>

Other than the nitpick on 7/7, for the series:

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [PATCH bpf 7/7] bpf: make direct packet write unclone more robust
From: Song Liu @ 2018-10-24 21:42 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, Networking
In-Reply-To: <20181024200549.8516-8-daniel@iogearbox.net>

On Wed, Oct 24, 2018 at 1:06 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Given this seems to be quite fragile and can easily slip through the
> cracks, lets make direct packet write more robust by requiring that
> future program types which allow for such write must provide a prologue
> callback. In case of XDP and sk_msg it's noop, thus add a generic noop
> handler there. The latter starts out with NULL data/data_end unconditionally
> when sg pages are shared.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/bpf/verifier.c |  6 +++++-
>  net/core/filter.c     | 11 +++++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 5fc9a65..171a2c8 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5709,7 +5709,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>         bool is_narrower_load;
>         u32 target_size;
>
> -       if (ops->gen_prologue) {
> +       if (ops->gen_prologue || env->seen_direct_write) {
> +               if (!ops->gen_prologue) {
> +                       verbose(env, "bpf verifier is misconfigured\n");
> +                       return -EINVAL;
> +               }

nit: how about this?

diff --git i/kernel/bpf/verifier.c w/kernel/bpf/verifier.c
index 6fbe7a8afed7..d35078024e35 100644
--- i/kernel/bpf/verifier.c
+++ w/kernel/bpf/verifier.c
@@ -5286,6 +5286,11 @@ static int convert_ctx_accesses(struct
bpf_verifier_env *env)
        bool is_narrower_load;
        u32 target_size;

+       if (!ops->gen_prologue && env->seen_direct_write) {
+               verbose(env, "bpf verifier is misconfigured\n");
+               return -EINVAL;
+       }
+
        if (ops->gen_prologue) {
                cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
                                        env->prog);

^ permalink raw reply related

* Re: [PATCH net] net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
From: David Miller @ 2018-10-24 21:37 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181024205839.24689-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Wed, 24 Oct 2018 13:58:39 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> The intent of ip6_route_check_nh_onlink is to make sure the gateway
> given for an onlink route is not actually on a connected route for
> a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then
> an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway
> lookup hits the default route then it most likely will be a different
> interface than the onlink route which is ok.
> 
> Update ip6_route_check_nh_onlink to disregard the device mismatch
> if the gateway lookup hits the default route. Turns out the existing
> onlink tests are passing because there is no default route or it is
> an unreachable default, so update the onlink tests to have a default
> route other than unreachable.
> 
> Fixes: fc1e64e1092f6 ("net/ipv6: Add support for onlink flag")
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Steve Grubb @ 2018-10-25  6:06 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Paul Moore, simo, carlos, linux-api, containers, linux-kernel,
	dhowells, linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <20181025004255.zl7p7j6gztouh2hh@madcap2.tricolour.ca>

On Wed, 24 Oct 2018 20:42:55 -0400
Richard Guy Briggs <rgb@redhat.com> wrote:

> On 2018-10-24 16:55, Paul Moore wrote:
> > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > <rgb@redhat.com> wrote:  
> > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > <rgb@redhat.com> wrote:  
> > > > > Create a new audit record AUDIT_CONTAINER to document the
> > > > > audit container identifier of a process if it is present.
> > > > >
> > > > > Called from audit_log_exit(), syscalls are covered.
> > > > >
> > > > > A sample raw event:
> > > > > type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e
> > > > > syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30
> > > > > a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0
> > > > > euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3
> > > > > comm="bash" exe="/usr/bin/bash"
> > > > > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> > > > > key="tmpcontainerid" type=CWD msg=audit(1519924845.499:257):
> > > > > cwd="/root" type=PATH msg=audit(1519924845.499:257): item=0
> > > > > name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0
> > > > > rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT
> > > > > cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0
> > > > > cap_fver=0 type=PATH msg=audit(1519924845.499:257): item=1
> > > > > name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644
> > > > > ouid=0 ogid=0 rdev=00:00
> > > > > obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> > > > > cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0
> > > > > cap_fver=0 type=PROCTITLE msg=audit(1519924845.499:257):
> > > > > proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> > > > > type=CONTAINER msg=audit(1519924845.499:257): op=task
> > > > > contid=123458
> > > > >
> > > > > See: https://github.com/linux-audit/audit-kernel/issues/90
> > > > > See: https://github.com/linux-audit/audit-userspace/issues/51
> > > > > See: https://github.com/linux-audit/audit-testsuite/issues/64
> > > > > See:
> > > > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by:
> > > > > Serge Hallyn <serge@hallyn.com> Acked-by: Steve Grubb
> > > > > <sgrubb@redhat.com> ---
> > > > >  include/linux/audit.h      |  7 +++++++
> > > > >  include/uapi/linux/audit.h |  1 +
> > > > >  kernel/audit.c             | 24 ++++++++++++++++++++++++
> > > > >  kernel/auditsc.c           |  3 +++
> > > > >  4 files changed, 35 insertions(+)  
> > > >
> > > > ...
> > > >  
> > > > > @@ -2045,6 +2045,30 @@ void audit_log_session_info(struct
> > > > > audit_buffer *ab) audit_log_format(ab, " auid=%u ses=%u",
> > > > > auid, sessionid); }
> > > > >
> > > > > +/*
> > > > > + * audit_log_contid - report container info
> > > > > + * @tsk: task to be recorded
> > > > > + * @context: task or local context for record
> > > > > + * @op: contid string description
> > > > > + */
> > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > +                            struct audit_context *context,
> > > > > char *op) +{
> > > > > +       struct audit_buffer *ab;
> > > > > +
> > > > > +       if (!audit_contid_set(tsk))
> > > > > +               return 0;
> > > > > +       /* Generate AUDIT_CONTAINER record with container ID
> > > > > */
> > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > AUDIT_CONTAINER);
> > > > > +       if (!ab)
> > > > > +               return -ENOMEM;
> > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > +                        op, audit_get_contid(tsk));
> > > > > +       audit_log_end(ab);
> > > > > +       return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(audit_log_contid);  
> > > >
> > > > As discussed in the previous iteration of the patch, I prefer
> > > > AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If you feel
> > > > strongly about keeping it as-is with AUDIT_CONTAINER I suppose
> > > > I could live with that, but it is isn't my first choice.  
> > >
> > > I don't have a strong opinion on this one, mildly preferring the
> > > shorter one only because it is shorter.  
> > 
> > We already have multiple AUDIT_CONTAINER* record types, so it seems
> > as though we should use "AUDIT_CONTAINER" as a prefix of sorts,
> > rather than a type itself.  
> 
> I'm fine with that.  I'd still like to hear Steve's input.  He had
> stronger opinions than me.

The creation event should be separate and distinct from the continuing
use when its used as a supplemental record. IOW, binding the ID to a
container is part of the lifecycle and needs to be kept distinct.

-Steve

> > > > However, I do care about the "op" field in this record.  It just
> > > > doesn't make any sense; the way you are using it it is more of a
> > > > context field than an operations field, and even then why is the
> > > > context important from a logging and/or security perspective?
> > > > Drop it please.  
> > >
> > > I'll rename it to whatever you like.  I'd suggest "ref=".  The
> > > reason I think it is important is there are multiple sources that
> > > aren't always obvious from the other records to which it is
> > > associated.  In the case of ptrace and signals, there can be many
> > > target tasks listed (OBJ_PID) with no other way to distinguish
> > > the matching audit container identifier records all for one
> > > event.  This is in addition to the default syscall container
> > > identifier record.  I'm not currently happy with the text content
> > > to link the two, but that should be solvable (most obvious is
> > > taret PID).  Throwing away this information seems shortsighted.  
> > 
> > It would be helpful if you could generate real audit events
> > demonstrating the problems you are describing, as well as a more
> > standard syscall event, so we can discuss some possible solutions.  
> 
> If the auditted process is in a container and it ptraces or signals
> another process in a container, there will be two AUDIT_CONTAINER
> records for the same event that won't be identified as to which record
> belongs to which process or other record (SYSCALL vs 1+ OBJ_PID
> records).  There could be many signals recorded, each with their own
> OBJ_PID record.  The first is stored in the audit context and
> additional ones are stored in a chained struct that can accommodate
> 16 entries each.
> 
> (See audit_signal_info(), __audit_ptrace().)
> 
> (As a side note, on code inspection it appears that a signal target
> would get overwritten by a ptrace action if they were to happen in
> that order.)
> 
> > paul moore  
> 
> - RGB
> 
> --
> Richard Guy Briggs <rgb@redhat.com>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
> 
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply

* Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy
From: David Miller @ 2018-10-24 21:35 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, pupilla, dsahern
In-Reply-To: <20181024153249.15374-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Wed, 24 Oct 2018 08:32:49 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Marco reported an error with hfsc:
> root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
> Error: Attribute failed policy validation.
> 
> Apparently a few implementations pass TCA_OPTIONS as a binary instead
> of nested attribute, so drop TCA_OPTIONS from the policy.
> 
> Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
> Reported-by: Marco Berizzi <pupilla@libero.it>
> Signed-off-by: David Ahern <dsahern@gmail.com>

That's unfortunate... applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] octeontx2-af: Copy the right amount of memory
From: David Miller @ 2018-10-24 21:25 UTC (permalink / raw)
  To: dan.carpenter; +Cc: sgoutham, lcherian, gakula, jerinj, netdev, kernel-janitors
In-Reply-To: <20181024083221.humvwh2pefovptcd@kili.mountain>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 24 Oct 2018 11:32:21 +0300

> This is a copy and paste bug where we copied the sizeof() from the chunk
> before.  We're copying more data than intended but the destination is a
> union so it doesn't cause memory corruption.
> 
> Fixes: ffb0abd7e9cb ("octeontx2-af: NIX AQ instruction enqueue support")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied, thanks Dan.

^ permalink raw reply

* Re: [PATCH net-next 1/3] net/sock: factor out dequeue/peek with offset code
From: Alexei Starovoitov @ 2018-10-24 21:23 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet, kafai, daniel
In-Reply-To: <4c94ee8fe77a51d61927bfff46441abc15172193.camel@redhat.com>

On Tue, Oct 23, 2018 at 09:28:03AM +0200, Paolo Abeni wrote:
> Hi,
> 
> On Mon, 2018-10-22 at 21:49 -0700, Alexei Starovoitov wrote:
> > On Mon, May 15, 2017 at 11:01:42AM +0200, Paolo Abeni wrote:
> > > And update __sk_queue_drop_skb() to work on the specified queue.
> > > This will help the udp protocol to use an additional private
> > > rx queue in a later patch.
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/skbuff.h |  7 ++++
> > >  include/net/sock.h     |  4 +--
> > >  net/core/datagram.c    | 90 ++++++++++++++++++++++++++++----------------------
> > >  3 files changed, 60 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index a098d95..bfc7892 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -3056,6 +3056,13 @@ static inline void skb_frag_list_init(struct sk_buff *skb)
> > >  
> > >  int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
> > >  				const struct sk_buff *skb);
> > > +struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
> > > +					  struct sk_buff_head *queue,
> > > +					  unsigned int flags,
> > > +					  void (*destructor)(struct sock *sk,
> > > +							   struct sk_buff *skb),
> > > +					  int *peeked, int *off, int *err,
> > > +					  struct sk_buff **last);
> > >  struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags,
> > >  					void (*destructor)(struct sock *sk,
> > >  							   struct sk_buff *skb),
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 66349e4..49d226f 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -2035,8 +2035,8 @@ void sk_reset_timer(struct sock *sk, struct timer_list *timer,
> > >  
> > >  void sk_stop_timer(struct sock *sk, struct timer_list *timer);
> > >  
> > > -int __sk_queue_drop_skb(struct sock *sk, struct sk_buff *skb,
> > > -			unsigned int flags,
> > > +int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue,
> > > +			struct sk_buff *skb, unsigned int flags,
> > >  			void (*destructor)(struct sock *sk,
> > >  					   struct sk_buff *skb));
> > >  int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > > index db1866f2..a4592b4 100644
> > > --- a/net/core/datagram.c
> > > +++ b/net/core/datagram.c
> > > @@ -161,6 +161,43 @@ static struct sk_buff *skb_set_peeked(struct sk_buff *skb)
> > >  	return skb;
> > >  }
> > >  
> > > +struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
> > > +					  struct sk_buff_head *queue,
> > > +					  unsigned int flags,
> > > +					  void (*destructor)(struct sock *sk,
> > > +							   struct sk_buff *skb),
> > > +					  int *peeked, int *off, int *err,
> > > +					  struct sk_buff **last)
> > > +{
> > > +	struct sk_buff *skb;
> > > +
> > > +	*last = queue->prev;
> > 
> > this refactoring changed the behavior.
> > Now queue->prev is returned as last.
> > Whereas it was *last = queue before.
> > 
> > > +	skb_queue_walk(queue, skb) {
> > 
> > and *last = skb assignment is gone too.
> > 
> > Was this intentional ? 
> 
> Yes.
> 
> > Is this the right behavior?
> 
> I think so. queue->prev is the last skb in the queue. With the old
> code,   __skb_try_recv_datagram(), when returning NULL, used the
> instructions you quoted to overall set 'last' to the last skb in the
> queue. We did not use 'last' elsewhere. So overall this just reduce the
> number of instructions inside the loop. (unless I'm missing something).

Right. On the second glance it does appear to be correct.

> Are you experiencing any specific issues due to the mentioned commit?

yes.

Just like what Baoyou Xie reported https://lore.kernel.org/patchwork/patch/962802/
we're hitting infinite loop in __skb_recv_datagram() on 4.11 kernel.
and different, but also buggy, behavior on the net-next.

In particular __skb_try_recv_datagram() returns immediately,
because skb_queue_empty() is true (sk->sk_receive_queue.next == &sk->sk_receive_queue)

But __skb_wait_for_more_packets() also returns immediately
because if (sk->sk_receive_queue.prev != skb) is also true.

There is a link list corruption in sk_receive_queue.

list->next == list, but list->prev still points to valid skb.
Before your commit we had
*last = queue;
and we had this infinite loop I described above.
After your commit
*last = queue->next;
which assigns buggy pointer into *last, but that accidentally
makes if (sk->sk_receive_queue.prev != skb) to be false
and __skb_wait_for_more_packets() goes into schedule_timeout().
Eventually bad things happen too, but in the different spot.

The corruption is somehow related to netlink_recvmsg() just like in that
Baoyou Xie report.

The typical stack trace is
__skb_wait_for_more_packets+0x64/0x140
? skb_gro_receive+0x310/0x310
__skb_recv_datagram+0x5c/0xa0
skb_recv_datagram+0x31/0x40
netlink_recvmsg+0x51/0x3c0
? sock_write_iter+0xf8/0x110
SYSC_recvfrom+0x116/0x190

We didn't figure out a way to reproduce it yet.
kasan didn't help.
The way netlink socket pushes skbs into sk_receive_queue and drains it
all looks correct. We thought it could be MSG_PEAK related, but
skb->users refcnting also looks correct.

If anyone have any ideas what things to try, I'm all ears.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox