Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH bpf-next v2 1/3] libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk
In-Reply-To: <20190815121356.8848-1-ivan.khoronzhuk@linaro.org>

Drop __NR_mmap2 fork in flavor of LFS, that is _FILE_OFFSET_BITS=64
(glibc & bionic) / LARGEFILE64_SOURCE (for musl) decision. It allows
mmap() to use 64bit offset that is passed to mmap2 syscall. As result
pgoff is not truncated and no need to use direct access to mmap2 for
32 bits systems.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 tools/lib/bpf/Makefile |  1 +
 tools/lib/bpf/xsk.c    | 49 ++++++++++++------------------------------
 2 files changed, 15 insertions(+), 35 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 9312066a1ae3..844f6cd79c03 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -113,6 +113,7 @@ override CFLAGS += -Werror -Wall
 override CFLAGS += -fPIC
 override CFLAGS += $(INCLUDES)
 override CFLAGS += -fvisibility=hidden
+override CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
 
 ifeq ($(VERBOSE),1)
   Q =
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 680e63066cf3..7392f428c07b 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -74,23 +74,6 @@ struct xsk_nl_info {
 	int fd;
 };
 
-/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
- * Unfortunately, it is not part of glibc.
- */
-static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
-			     int fd, __u64 offset)
-{
-#ifdef __NR_mmap2
-	unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
-	long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
-			   (off_t)(offset >> page_shift));
-
-	return (void *)ret;
-#else
-	return mmap(addr, length, prot, flags, fd, offset);
-#endif
-}
-
 int xsk_umem__fd(const struct xsk_umem *umem)
 {
 	return umem ? umem->fd : -EINVAL;
@@ -210,10 +193,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 		goto out_socket;
 	}
 
-	map = xsk_mmap(NULL, off.fr.desc +
-		       umem->config.fill_size * sizeof(__u64),
-		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-		       umem->fd, XDP_UMEM_PGOFF_FILL_RING);
+	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
+		   XDP_UMEM_PGOFF_FILL_RING);
 	if (map == MAP_FAILED) {
 		err = -errno;
 		goto out_socket;
@@ -227,10 +209,9 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 	fill->ring = map + off.fr.desc;
 	fill->cached_cons = umem->config.fill_size;
 
-	map = xsk_mmap(NULL,
-		       off.cr.desc + umem->config.comp_size * sizeof(__u64),
-		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-		       umem->fd, XDP_UMEM_PGOFF_COMPLETION_RING);
+	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
+		   XDP_UMEM_PGOFF_COMPLETION_RING);
 	if (map == MAP_FAILED) {
 		err = -errno;
 		goto out_mmap;
@@ -550,11 +531,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	}
 
 	if (rx) {
-		rx_map = xsk_mmap(NULL, off.rx.desc +
-				  xsk->config.rx_size * sizeof(struct xdp_desc),
-				  PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_POPULATE,
-				  xsk->fd, XDP_PGOFF_RX_RING);
+		rx_map = mmap(NULL, off.rx.desc +
+			      xsk->config.rx_size * sizeof(struct xdp_desc),
+			      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			      xsk->fd, XDP_PGOFF_RX_RING);
 		if (rx_map == MAP_FAILED) {
 			err = -errno;
 			goto out_socket;
@@ -569,11 +549,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	xsk->rx = rx;
 
 	if (tx) {
-		tx_map = xsk_mmap(NULL, off.tx.desc +
-				  xsk->config.tx_size * sizeof(struct xdp_desc),
-				  PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_POPULATE,
-				  xsk->fd, XDP_PGOFF_TX_RING);
+		tx_map = mmap(NULL, off.tx.desc +
+			      xsk->config.tx_size * sizeof(struct xdp_desc),
+			      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			      xsk->fd, XDP_PGOFF_TX_RING);
 		if (tx_map == MAP_FAILED) {
 			err = -errno;
 			goto out_mmap_rx;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v2 2/2] r8169: use the generic EEE management functions
From: Heiner Kallweit @ 2019-08-15 12:14 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <88a71ee7-a17d-ac9c-c998-d0ea35e5c566@gmail.com>

Now that the Realtek PHY driver maps the vendor-specific EEE registers
to the standard MMD registers, we can remove all special handling and
use the generic functions phy_ethtool_get/set_eee.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
v2.
- rebased
---
 drivers/net/ethernet/realtek/r8169_main.c | 172 +++-------------------
 1 file changed, 24 insertions(+), 148 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 34d072f0c..c9550b4f9 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -733,6 +733,13 @@ static bool rtl_is_8168evl_up(struct rtl8169_private *tp)
 	       tp->mac_version != RTL_GIGA_MAC_VER_39;
 }
 
+static bool rtl_supports_eee(struct rtl8169_private *tp)
+{
+	return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
+	       tp->mac_version != RTL_GIGA_MAC_VER_37 &&
+	       tp->mac_version != RTL_GIGA_MAC_VER_39;
+}
+
 struct rtl_cond {
 	bool (*check)(struct rtl8169_private *);
 	const char *msg;
@@ -1945,144 +1952,40 @@ static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
 	return 0;
 }
 
-static int rtl_get_eee_supp(struct rtl8169_private *tp)
-{
-	struct phy_device *phydev = tp->phydev;
-	int ret;
-
-	switch (tp->mac_version) {
-	case RTL_GIGA_MAC_VER_34:
-	case RTL_GIGA_MAC_VER_35:
-	case RTL_GIGA_MAC_VER_36:
-	case RTL_GIGA_MAC_VER_38:
-		ret = phy_read_mmd(phydev, MDIO_MMD_PCS, MDIO_PCS_EEE_ABLE);
-		break;
-	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-		ret = phy_read_paged(phydev, 0x0a5c, 0x12);
-		break;
-	default:
-		ret = -EPROTONOSUPPORT;
-		break;
-	}
-
-	return ret;
-}
-
-static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
-{
-	struct phy_device *phydev = tp->phydev;
-	int ret;
-
-	switch (tp->mac_version) {
-	case RTL_GIGA_MAC_VER_34:
-	case RTL_GIGA_MAC_VER_35:
-	case RTL_GIGA_MAC_VER_36:
-	case RTL_GIGA_MAC_VER_38:
-		ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_LPABLE);
-		break;
-	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-		ret = phy_read_paged(phydev, 0x0a5d, 0x11);
-		break;
-	default:
-		ret = -EPROTONOSUPPORT;
-		break;
-	}
-
-	return ret;
-}
-
-static int rtl_get_eee_adv(struct rtl8169_private *tp)
-{
-	struct phy_device *phydev = tp->phydev;
-	int ret;
-
-	switch (tp->mac_version) {
-	case RTL_GIGA_MAC_VER_34:
-	case RTL_GIGA_MAC_VER_35:
-	case RTL_GIGA_MAC_VER_36:
-	case RTL_GIGA_MAC_VER_38:
-		ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV);
-		break;
-	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-		ret = phy_read_paged(phydev, 0x0a5d, 0x10);
-		break;
-	default:
-		ret = -EPROTONOSUPPORT;
-		break;
-	}
-
-	return ret;
-}
-
-static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
-{
-	struct phy_device *phydev = tp->phydev;
-	int ret = 0;
-
-	switch (tp->mac_version) {
-	case RTL_GIGA_MAC_VER_34:
-	case RTL_GIGA_MAC_VER_35:
-	case RTL_GIGA_MAC_VER_36:
-	case RTL_GIGA_MAC_VER_38:
-		ret = phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, val);
-		break;
-	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-		phy_write_paged(phydev, 0x0a5d, 0x10, val);
-		break;
-	default:
-		ret = -EPROTONOSUPPORT;
-		break;
-	}
-
-	return ret;
-}
-
 static int rtl8169_get_eee(struct net_device *dev, struct ethtool_eee *data)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct device *d = tp_to_dev(tp);
 	int ret;
 
+	if (!rtl_supports_eee(tp))
+		return -EOPNOTSUPP;
+
 	pm_runtime_get_noresume(d);
 
 	if (!pm_runtime_active(d)) {
 		ret = -EOPNOTSUPP;
-		goto out;
+	} else {
+		ret = phy_ethtool_get_eee(tp->phydev, data);
 	}
 
-	/* Get Supported EEE */
-	ret = rtl_get_eee_supp(tp);
-	if (ret < 0)
-		goto out;
-	data->supported = mmd_eee_cap_to_ethtool_sup_t(ret);
-
-	/* Get advertisement EEE */
-	ret = rtl_get_eee_adv(tp);
-	if (ret < 0)
-		goto out;
-	data->advertised = mmd_eee_adv_to_ethtool_adv_t(ret);
-	data->eee_enabled = !!data->advertised;
-
-	/* Get LP advertisement EEE */
-	ret = rtl_get_eee_lpadv(tp);
-	if (ret < 0)
-		goto out;
-	data->lp_advertised = mmd_eee_adv_to_ethtool_adv_t(ret);
-	data->eee_active = !!(data->advertised & data->lp_advertised);
-out:
 	pm_runtime_put_noidle(d);
-	return ret < 0 ? ret : 0;
+
+	return ret;
 }
 
 static int rtl8169_set_eee(struct net_device *dev, struct ethtool_eee *data)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct device *d = tp_to_dev(tp);
-	int old_adv, adv = 0, cap, ret;
+	int ret;
+
+	if (!rtl_supports_eee(tp))
+		return -EOPNOTSUPP;
 
 	pm_runtime_get_noresume(d);
 
-	if (!dev->phydev || !pm_runtime_active(d)) {
+	if (!pm_runtime_active(d)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
@@ -2093,38 +1996,10 @@ static int rtl8169_set_eee(struct net_device *dev, struct ethtool_eee *data)
 		goto out;
 	}
 
-	/* Get Supported EEE */
-	ret = rtl_get_eee_supp(tp);
-	if (ret < 0)
-		goto out;
-	cap = ret;
-
-	ret = rtl_get_eee_adv(tp);
-	if (ret < 0)
-		goto out;
-	old_adv = ret;
-
-	if (data->eee_enabled) {
-		adv = !data->advertised ? cap :
-		      ethtool_adv_to_mmd_eee_adv_t(data->advertised) & cap;
-		/* Mask prohibited EEE modes */
-		adv &= ~dev->phydev->eee_broken_modes;
-	}
-
-	if (old_adv != adv) {
-		ret = rtl_set_eee_adv(tp, adv);
-		if (ret < 0)
-			goto out;
-
-		/* Restart autonegotiation so the new modes get sent to the
-		 * link partner.
-		 */
-		ret = phy_restart_aneg(dev->phydev);
-	}
-
+	ret = phy_ethtool_set_eee(tp->phydev, data);
 out:
 	pm_runtime_put_noidle(d);
-	return ret < 0 ? ret : 0;
+	return ret;
 }
 
 static const struct ethtool_ops rtl8169_ethtool_ops = {
@@ -2151,10 +2026,11 @@ static const struct ethtool_ops rtl8169_ethtool_ops = {
 
 static void rtl_enable_eee(struct rtl8169_private *tp)
 {
-	int supported = rtl_get_eee_supp(tp);
+	struct phy_device *phydev = tp->phydev;
+	int supported = phy_read_mmd(phydev, MDIO_MMD_PCS, MDIO_PCS_EEE_ABLE);
 
 	if (supported > 0)
-		rtl_set_eee_adv(tp, supported);
+		phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, supported);
 }
 
 static void rtl8169_get_mac_version(struct rtl8169_private *tp)
-- 
2.22.1



^ permalink raw reply related

* [PATCH bpf-next v2 3/3] samples: bpf: syscal_nrs: use mmap2 if defined
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk
In-Reply-To: <20190815121356.8848-1-ivan.khoronzhuk@linaro.org>

For arm32 xdp sockets mmap2 is preferred, so use it if it's defined.
Declaration of __NR_mmap can be skipped and it breaks build.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/syscall_nrs.c  |  6 ++++++
 samples/bpf/tracex5_kern.c | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/samples/bpf/syscall_nrs.c b/samples/bpf/syscall_nrs.c
index 516e255cbe8f..88f940052450 100644
--- a/samples/bpf/syscall_nrs.c
+++ b/samples/bpf/syscall_nrs.c
@@ -9,5 +9,11 @@ void syscall_defines(void)
 	COMMENT("Linux system call numbers.");
 	SYSNR(__NR_write);
 	SYSNR(__NR_read);
+#ifdef __NR_mmap2
+	SYSNR(__NR_mmap2);
+#endif
+#ifdef __NR_mmap
 	SYSNR(__NR_mmap);
+#endif
+
 }
diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
index f57f4e1ea1ec..35cb0eed3be5 100644
--- a/samples/bpf/tracex5_kern.c
+++ b/samples/bpf/tracex5_kern.c
@@ -68,12 +68,25 @@ PROG(SYS__NR_read)(struct pt_regs *ctx)
 	return 0;
 }
 
+#ifdef __NR_mmap2
+PROG(SYS__NR_mmap2)(struct pt_regs *ctx)
+{
+	char fmt[] = "mmap2\n";
+
+	bpf_trace_printk(fmt, sizeof(fmt));
+	return 0;
+}
+#endif
+
+#ifdef __NR_mmap
 PROG(SYS__NR_mmap)(struct pt_regs *ctx)
 {
 	char fmt[] = "mmap\n";
+
 	bpf_trace_printk(fmt, sizeof(fmt));
 	return 0;
 }
+#endif
 
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v2 1/2] net: phy: realtek: add support for EEE registers on integrated PHY's
From: Heiner Kallweit @ 2019-08-15 12:12 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <88a71ee7-a17d-ac9c-c998-d0ea35e5c566@gmail.com>

EEE-related registers on newer integrated PHY's have the standard
layout, but are accessible not via MMD but via vendor-specific
registers. Emulating the standard MMD registers allows to use the
generic functions for EEE control.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/realtek.c | 43 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index c49a1fb13..2635ad1ff 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -266,6 +266,45 @@ static int rtl8366rb_config_init(struct phy_device *phydev)
 	return ret;
 }
 
+static int rtlgen_read_mmd(struct phy_device *phydev, int devnum, u16 regnum)
+{
+	int ret;
+
+	if (devnum == MDIO_MMD_PCS && regnum == MDIO_PCS_EEE_ABLE) {
+		rtl821x_write_page(phydev, 0xa5c);
+		ret = __phy_read(phydev, 0x12);
+		rtl821x_write_page(phydev, 0);
+	} else if (devnum == MDIO_MMD_AN && regnum == MDIO_AN_EEE_ADV) {
+		rtl821x_write_page(phydev, 0xa5d);
+		ret = __phy_read(phydev, 0x10);
+		rtl821x_write_page(phydev, 0);
+	} else if (devnum == MDIO_MMD_AN && regnum == MDIO_AN_EEE_LPABLE) {
+		rtl821x_write_page(phydev, 0xa5d);
+		ret = __phy_read(phydev, 0x11);
+		rtl821x_write_page(phydev, 0);
+	} else {
+		ret = -EOPNOTSUPP;
+	}
+
+	return ret;
+}
+
+static int rtlgen_write_mmd(struct phy_device *phydev, int devnum, u16 regnum,
+			    u16 val)
+{
+	int ret;
+
+	if (devnum == MDIO_MMD_AN && regnum == MDIO_AN_EEE_ADV) {
+		rtl821x_write_page(phydev, 0xa5d);
+		ret = __phy_write(phydev, 0x10, val);
+		rtl821x_write_page(phydev, 0);
+	} else {
+		ret = -EOPNOTSUPP;
+	}
+
+	return ret;
+}
+
 static int rtl8125_get_features(struct phy_device *phydev)
 {
 	int val;
@@ -422,6 +461,8 @@ static struct phy_driver realtek_drvs[] = {
 		.resume		= genphy_resume,
 		.read_page	= rtl821x_read_page,
 		.write_page	= rtl821x_write_page,
+		.read_mmd	= rtlgen_read_mmd,
+		.write_mmd	= rtlgen_write_mmd,
 	}, {
 		.name		= "RTL8125 2.5Gbps internal",
 		.match_phy_device = rtl8125_match_phy_device,
@@ -432,6 +473,8 @@ static struct phy_driver realtek_drvs[] = {
 		.resume		= genphy_resume,
 		.read_page	= rtl821x_read_page,
 		.write_page	= rtl821x_write_page,
+		.read_mmd	= rtlgen_read_mmd,
+		.write_mmd	= rtlgen_write_mmd,
 	}, {
 		PHY_ID_MATCH_EXACT(0x001cc961),
 		.name		= "RTL8366RB Gigabit Ethernet",
-- 
2.22.1




^ permalink raw reply related

* [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Ivan Khoronzhuk @ 2019-08-15 12:13 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, daniel, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko,
	Ivan Khoronzhuk
In-Reply-To: <20190815121356.8848-1-ivan.khoronzhuk@linaro.org>

For 64-bit there is no reason to use vmap/vunmap, so use page_address
as it was initially. For 32 bits, in some apps, like in samples
xdpsock_user.c when number of pgs in use is quite big, the kmap
memory can be not enough, despite on this, kmap looks like is
deprecated in such cases as it can block and should be used rather
for dynamic mm.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index a0607969f8c0..d740c4f8810c 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -14,7 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/idr.h>
-#include <linux/highmem.h>
+#include <linux/vmalloc.h>
 
 #include "xdp_umem.h"
 #include "xsk_queue.h"
@@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct xdp_umem *umem)
 	unsigned int i;
 
 	for (i = 0; i < umem->npgs; i++)
-		kunmap(umem->pgs[i]);
+		if (PageHighMem(umem->pgs[i]))
+			vunmap(umem->pages[i].addr);
+}
+
+static int xdp_umem_map_pages(struct xdp_umem *umem)
+{
+	unsigned int i;
+	void *addr;
+
+	for (i = 0; i < umem->npgs; i++) {
+		if (PageHighMem(umem->pgs[i]))
+			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
+		else
+			addr = page_address(umem->pgs[i]);
+
+		if (!addr) {
+			xdp_umem_unmap_pages(umem);
+			return -ENOMEM;
+		}
+
+		umem->pages[i].addr = addr;
+	}
+
+	return 0;
 }
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
@@ -312,7 +335,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
 	unsigned int chunks, chunks_per_page;
 	u64 addr = mr->addr, size = mr->len;
-	int size_chk, err, i;
+	int size_chk, err;
 
 	if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
 		/* Strictly speaking we could support this, if:
@@ -378,10 +401,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 		goto out_account;
 	}
 
-	for (i = 0; i < umem->npgs; i++)
-		umem->pages[i].addr = kmap(umem->pgs[i]);
+	err = xdp_umem_map_pages(umem);
+	if (!err)
+		return 0;
 
-	return 0;
+	kfree(umem->pages);
 
 out_account:
 	xdp_umem_unaccount_pages(umem);
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v2 0/2] net: phy: realtek: map vendor-specific EEE registers to standard MMD registers
From: Heiner Kallweit @ 2019-08-15 12:12 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

EEE-related registers on newer integrated PHY's have the standard
layout, but are accessible not via MMD but via vendor-specific
registers. Emulating the standard MMD registers allows to use the
generic functions for EEE control and to significantly simplify
the r8169 driver.

v2:
- rebase patch 2

Heiner Kallweit (2):
  net: phy: realtek: add support for EEE registers on integrated PHY's
  r8169: use the generic EEE management functions

 drivers/net/ethernet/realtek/r8169_main.c | 172 +++-------------------
 drivers/net/phy/realtek.c                 |  43 ++++++
 2 files changed, 67 insertions(+), 148 deletions(-)

-- 
2.22.1

^ permalink raw reply

* RE: [v2, 4/4] ocelot: add VCAP IS2 rule to trap PTP Ethernet frames
From: Y.b. Lu @ 2019-08-15 12:08 UTC (permalink / raw)
  To: Allan W . Nielsen
  Cc: Andrew Lunn, netdev@vger.kernel.org, David S . Miller,
	Alexandre Belloni, Microchip Linux Driver Support
In-Reply-To: <20190814091645.dwo7c36xan2ttln2@lx-anielsen.microsemi.net>

Hi Allan,

> -----Original Message-----
> From: Allan W . Nielsen <allan.nielsen@microchip.com>
> Sent: Wednesday, August 14, 2019 5:17 PM
> To: Y.b. Lu <yangbo.lu@nxp.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; netdev@vger.kernel.org; David S . Miller
> <davem@davemloft.net>; Alexandre Belloni <alexandre.belloni@bootlin.com>;
> Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
> Subject: Re: [v2, 4/4] ocelot: add VCAP IS2 rule to trap PTP Ethernet frames
> 
> The 08/14/2019 04:56, Y.b. Lu wrote:
> > > -----Original Message-----
> > > From: Allan W . Nielsen <allan.nielsen@microchip.com>
> > > Sent: Tuesday, August 13, 2019 2:25 PM
> > > To: Y.b. Lu <yangbo.lu@nxp.com>
> > > Cc: netdev@vger.kernel.org; David S . Miller <davem@davemloft.net>;
> > > Alexandre Belloni <alexandre.belloni@bootlin.com>; Microchip Linux
> > > Driver Support <UNGLinuxDriver@microchip.com>
> > > Subject: Re: [v2, 4/4] ocelot: add VCAP IS2 rule to trap PTP
> > > Ethernet frames
> > >
> > > The 08/13/2019 10:52, Yangbo Lu wrote:
> > > > All the PTP messages over Ethernet have etype 0x88f7 on them.
> > > > Use etype as the key to trap PTP messages.
> > > >
> > > > Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
[...]
> Can we continue this discussion in the other thread where I listed the 3
> scenarios?

[Y.b. Lu] Sure. Let's discuss in that thread.
https://patchwork.ozlabs.org/patch/1145627/

[...]
> > > What if do not want this on all ports?
> > [Y.b. Lu] Actually I don’t think there should be difference of handling PTP
> messages on each port.
> > You don’t need to run PTP protocol application on the specific port if you
> don’t want.
> What if you want some vlans or some ports to be PTP unaware, and other to
> be PTP aware.

[Y.b. Lu] Actually I couldn’t find reasons why make some ports PTP unaware, if there is software stack for PTP aware...

> 
> > > If you do not have an application behind this implementing a
> > > boundary or transparent clock, then you are breaking PTP on the network.
> > [Y.b. Lu] You're right. But actually for PTP network, all PTP devices should run
> PTP protocol on it.
> > Of course, it's better to have a way to configure it as non-aware PTP switch.
> I think we agree.
> 
> In my point of view, it is the PTP daemon who should configure frames to be
> trapped. Then the switch will be PTP unaware until the PTP daemon starts up
> and is ready to make it aware.
> 
> If we put it in the init function, then it will be of PTP broken until the PTP
> daemon starts.
> 
> /Allan


^ permalink raw reply

* [PATCH net-next 3/3] net: phy: remove genphy_config_init
From: Heiner Kallweit @ 2019-08-15 12:04 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Kevin Hilman,
	Vivien Didelot
  Cc: netdev@vger.kernel.org, open list:ARM/Amlogic Meson...
In-Reply-To: <95dfdb55-415c-c995-cba3-1902bdd46aec@gmail.com>

Now that all users have been removed we can remove genphy_config_init.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 51 ------------------------------------
 include/linux/phy.h          |  1 -
 2 files changed, 52 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7e7393f3c..d347ddcac 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1895,57 +1895,6 @@ int genphy_soft_reset(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_soft_reset);
 
-int genphy_config_init(struct phy_device *phydev)
-{
-	int val;
-	__ETHTOOL_DECLARE_LINK_MODE_MASK(features) = { 0, };
-
-	linkmode_set_bit_array(phy_basic_ports_array,
-			       ARRAY_SIZE(phy_basic_ports_array),
-			       features);
-	linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, features);
-	linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, features);
-
-	/* Do we support autonegotiation? */
-	val = phy_read(phydev, MII_BMSR);
-	if (val < 0)
-		return val;
-
-	if (val & BMSR_ANEGCAPABLE)
-		linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, features);
-
-	if (val & BMSR_100FULL)
-		linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, features);
-	if (val & BMSR_100HALF)
-		linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, features);
-	if (val & BMSR_10FULL)
-		linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, features);
-	if (val & BMSR_10HALF)
-		linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, features);
-
-	if (val & BMSR_ESTATEN) {
-		val = phy_read(phydev, MII_ESTATUS);
-		if (val < 0)
-			return val;
-
-		if (val & ESTATUS_1000_TFULL)
-			linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
-					 features);
-		if (val & ESTATUS_1000_THALF)
-			linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
-					 features);
-		if (val & ESTATUS_1000_XFULL)
-			linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
-					 features);
-	}
-
-	linkmode_and(phydev->supported, phydev->supported, features);
-	linkmode_and(phydev->advertising, phydev->advertising, features);
-
-	return 0;
-}
-EXPORT_SYMBOL(genphy_config_init);
-
 /**
  * genphy_read_abilities - read PHY abilities from Clause 22 registers
  * @phydev: target phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5ac7d2137..d26779f1f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1069,7 +1069,6 @@ void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
 void phy_attached_info(struct phy_device *phydev);
 
 /* Clause 22 PHY */
-int genphy_config_init(struct phy_device *phydev);
 int genphy_read_abilities(struct phy_device *phydev);
 int genphy_setup_forced(struct phy_device *phydev);
 int genphy_restart_aneg(struct phy_device *phydev);
-- 
2.22.1



^ permalink raw reply related

* [PATCH net-next 2/3] net: dsa: remove calls to genphy_config_init
From: Heiner Kallweit @ 2019-08-15 12:03 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Kevin Hilman,
	Vivien Didelot
  Cc: netdev@vger.kernel.org, open list:ARM/Amlogic Meson...
In-Reply-To: <95dfdb55-415c-c995-cba3-1902bdd46aec@gmail.com>

Supported PHY features are either auto-detected or explicitly set.
In both cases calling genphy_config_init isn't needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/dsa/port.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index f071acf28..f75301456 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -538,10 +538,6 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
 		return PTR_ERR(phydev);
 
 	if (enable) {
-		err = genphy_config_init(phydev);
-		if (err < 0)
-			goto err_put_dev;
-
 		err = genphy_resume(phydev);
 		if (err < 0)
 			goto err_put_dev;
@@ -589,7 +585,6 @@ static int dsa_port_fixed_link_register_of(struct dsa_port *dp)
 		mode = PHY_INTERFACE_MODE_NA;
 	phydev->interface = mode;
 
-	genphy_config_init(phydev);
 	genphy_read_status(phydev);
 
 	if (ds->ops->adjust_link)
-- 
2.22.1



^ permalink raw reply related

* [PATCH net-next 1/3] net: phy: remove calls to genphy_config_init
From: Heiner Kallweit @ 2019-08-15 12:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Kevin Hilman,
	Vivien Didelot
  Cc: netdev@vger.kernel.org, open list:ARM/Amlogic Meson...
In-Reply-To: <95dfdb55-415c-c995-cba3-1902bdd46aec@gmail.com>

Supported PHY features are either auto-detected or explicitly set.
In both cases calling genphy_config_init isn't needed. All that
genphy_config_init does is removing features that are set as
supported but can't be auto-detected. Basically it duplicates the
code in genphy_read_abilities. Therefore remove such calls from
all PHY drivers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/at803x.c       |  4 ----
 drivers/net/phy/dp83822.c      |  5 -----
 drivers/net/phy/dp83848.c      | 16 ++++++++--------
 drivers/net/phy/dp83tc811.c    |  4 ----
 drivers/net/phy/meson-gxl.c    |  2 +-
 drivers/net/phy/microchip.c    |  1 -
 drivers/net/phy/microchip_t1.c |  1 -
 drivers/net/phy/mscc.c         |  4 ++--
 drivers/net/phy/vitesse.c      |  6 +++---
 9 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 6ad8b1c63..2aa7b2e60 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -249,10 +249,6 @@ static int at803x_config_init(struct phy_device *phydev)
 {
 	int ret;
 
-	ret = genphy_config_init(phydev);
-	if (ret < 0)
-		return ret;
-
 	/* The RX and TX delay default is:
 	 *   after HW reset: RX delay enabled and TX delay disabled
 	 *   after SW reset: RX delay enabled, while TX delay retains the
diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c
index 7ed4760fb..8a4b1d167 100644
--- a/drivers/net/phy/dp83822.c
+++ b/drivers/net/phy/dp83822.c
@@ -254,13 +254,8 @@ static int dp83822_config_intr(struct phy_device *phydev)
 
 static int dp83822_config_init(struct phy_device *phydev)
 {
-	int err;
 	int value;
 
-	err = genphy_config_init(phydev);
-	if (err < 0)
-		return err;
-
 	value = DP83822_WOL_MAGIC_EN | DP83822_WOL_SECURE_ON | DP83822_WOL_EN;
 
 	return phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index 6f9bc7d91..11644579a 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -68,13 +68,8 @@ static int dp83848_config_intr(struct phy_device *phydev)
 
 static int dp83848_config_init(struct phy_device *phydev)
 {
-	int err;
 	int val;
 
-	err = genphy_config_init(phydev);
-	if (err < 0)
-		return err;
-
 	/* DP83620 always reports Auto Negotiation Ability on BMSR. Instead,
 	 * we check initial value of BMCR Auto negotiation enable bit
 	 */
@@ -85,6 +80,11 @@ static int dp83848_config_init(struct phy_device *phydev)
 	return 0;
 }
 
+static int dummy_config_init(struct phy_device *phydev)
+{
+	return 0;
+}
+
 static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
 	{ TI_DP83848C_PHY_ID, 0xfffffff0 },
 	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
@@ -113,13 +113,13 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
 
 static struct phy_driver dp83848_driver[] = {
 	DP83848_PHY_DRIVER(TI_DP83848C_PHY_ID, "TI DP83848C 10/100 Mbps PHY",
-			   genphy_config_init),
+			   dummy_config_init),
 	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY",
-			   genphy_config_init),
+			   dummy_config_init),
 	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY",
 			   dp83848_config_init),
 	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY",
-			   genphy_config_init),
+			   dummy_config_init),
 };
 module_phy_driver(dp83848_driver);
 
diff --git a/drivers/net/phy/dp83tc811.c b/drivers/net/phy/dp83tc811.c
index ac27da168..06f08832e 100644
--- a/drivers/net/phy/dp83tc811.c
+++ b/drivers/net/phy/dp83tc811.c
@@ -277,10 +277,6 @@ static int dp83811_config_init(struct phy_device *phydev)
 {
 	int value, err;
 
-	err = genphy_config_init(phydev);
-	if (err < 0)
-		return err;
-
 	value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
 	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
 		err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index fa80d6dce..e8f2ca625 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -136,7 +136,7 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
-	return genphy_config_init(phydev);
+	return 0;
 }
 
 /* This function is provided to cope with the possible failures of this phy
diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index eb1b3287f..a644e8e50 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -305,7 +305,6 @@ static int lan88xx_config_init(struct phy_device *phydev)
 {
 	int val;
 
-	genphy_config_init(phydev);
 	/*Zerodetect delay enable */
 	val = phy_read_mmd(phydev, MDIO_MMD_PCS,
 			   PHY_ARDENNES_MMD_DEV_3_PHY_CFG);
diff --git a/drivers/net/phy/microchip_t1.c b/drivers/net/phy/microchip_t1.c
index 3d09b4716..001def450 100644
--- a/drivers/net/phy/microchip_t1.c
+++ b/drivers/net/phy/microchip_t1.c
@@ -48,7 +48,6 @@ static struct phy_driver microchip_t1_phy_driver[] = {
 
 		.features       = PHY_BASIC_T1_FEATURES,
 
-		.config_init    = genphy_config_init,
 		.config_aneg    = genphy_config_aneg,
 
 		.ack_interrupt  = lan87xx_phy_ack_interrupt,
diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
index 645d354ff..7ada1fd9c 100644
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -1725,7 +1725,7 @@ static int vsc8584_config_init(struct phy_device *phydev)
 			return ret;
 	}
 
-	return genphy_config_init(phydev);
+	return 0;
 
 err:
 	mutex_unlock(&phydev->mdio.bus->mdio_lock);
@@ -1767,7 +1767,7 @@ static int vsc85xx_config_init(struct phy_device *phydev)
 			return rc;
 	}
 
-	return genphy_config_init(phydev);
+	return 0;
 }
 
 static int vsc8584_did_interrupt(struct phy_device *phydev)
diff --git a/drivers/net/phy/vitesse.c b/drivers/net/phy/vitesse.c
index 43691b1ac..bb6803527 100644
--- a/drivers/net/phy/vitesse.c
+++ b/drivers/net/phy/vitesse.c
@@ -197,7 +197,7 @@ static int vsc738x_config_init(struct phy_device *phydev)
 
 	vsc73xx_config_init(phydev);
 
-	return genphy_config_init(phydev);
+	return 0;
 }
 
 static int vsc739x_config_init(struct phy_device *phydev)
@@ -229,7 +229,7 @@ static int vsc739x_config_init(struct phy_device *phydev)
 
 	vsc73xx_config_init(phydev);
 
-	return genphy_config_init(phydev);
+	return 0;
 }
 
 static int vsc73xx_config_aneg(struct phy_device *phydev)
@@ -267,7 +267,7 @@ static int vsc8601_config_init(struct phy_device *phydev)
 	if (ret < 0)
 		return ret;
 
-	return genphy_config_init(phydev);
+	return 0;
 }
 
 static int vsc824x_ack_interrupt(struct phy_device *phydev)
-- 
2.22.1



^ permalink raw reply related

* [PATCH net-next 0/3] net: phy: remove genphy_config_init
From: Heiner Kallweit @ 2019-08-15 12:01 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Kevin Hilman,
	Vivien Didelot
  Cc: netdev@vger.kernel.org, open list:ARM/Amlogic Meson...

Supported PHY features are either auto-detected or explicitly set.
In both cases calling genphy_config_init isn't needed. All that
genphy_config_init does is removing features that are set as
supported but can't be auto-detected. Basically it duplicates the
code in genphy_read_abilities. Therefore remove genphy_config_init.

Heiner Kallweit (3):
  net: phy: remove calls to genphy_config_init
  net: dsa: remove calls to genphy_config_init
  net: phy: remove genphy_config_init

 drivers/net/phy/at803x.c       |  4 ---
 drivers/net/phy/dp83822.c      |  5 ----
 drivers/net/phy/dp83848.c      | 16 +++++------
 drivers/net/phy/dp83tc811.c    |  4 ---
 drivers/net/phy/meson-gxl.c    |  2 +-
 drivers/net/phy/microchip.c    |  1 -
 drivers/net/phy/microchip_t1.c |  1 -
 drivers/net/phy/mscc.c         |  4 +--
 drivers/net/phy/phy_device.c   | 51 ----------------------------------
 drivers/net/phy/vitesse.c      |  6 ++--
 include/linux/phy.h            |  1 -
 net/dsa/port.c                 |  5 ----
 12 files changed, 14 insertions(+), 86 deletions(-)

-- 
2.22.1


^ permalink raw reply

* Re: [PATCH 1/2] usb: serial: option: Add the BroadMobi BM818 card
From: Johan Hovold @ 2019-08-15 11:49 UTC (permalink / raw)
  To: Bob Ham
  Cc: Johan Hovold, Angus Ainslie (Purism), kernel, Bjørn Mork,
	David S. Miller, Greg Kroah-Hartman, netdev, linux-usb,
	linux-kernel
In-Reply-To: <5fb96703-b174-eef1-5ad1-693e2bbce32f@puri.sm>

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Mon, Aug 05, 2019 at 03:44:30PM +0100, Bob Ham wrote:
> On 05/08/2019 12:47, Johan Hovold wrote:
> > On Wed, Jul 24, 2019 at 07:52:26AM -0700, Angus Ainslie (Purism) wrote:
> >> From: Bob Ham <bob.ham@puri.sm>
> >>
> >> Add a VID:PID for the BroadModi BM818 M.2 card
> > 
> > Would you mind posting the output of usb-devices (or lsusb -v) for this
> > device?
> 
> T:  Bus=01 Lev=03 Prnt=40 Port=03 Cnt=01 Dev#= 44 Spd=480 MxCh= 0
> D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
> P:  Vendor=2020 ProdID=2060 Rev=00.00
> S:  Manufacturer=Qualcomm, Incorporated
> S:  Product=Qualcomm CDMA Technologies MSM
> C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
> I:  If#=0x0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
> I:  If#=0x1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
> I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
> I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=(none)
> I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)

I amended the commit message with the above, switched to
USB_DEVICE_INTERFACE_CLASS(), fixed the comment and moved the entry
to the other 0x2020 entries before applying.

Johan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space
From: Adrian Hunter @ 2019-08-15 11:45 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, linux-kernel, netdev,
	bpf, clang-built-linux, Mathieu Poirier, Peter Zijlstra,
	Suzuki Poulouse, coresight, linux-arm-kernel
In-Reply-To: <20190815113242.GA28881@leoy-ThinkPad-X240s>

On 15/08/19 2:32 PM, Leo Yan wrote:
> Hi Adrian,
> 
> On Thu, Aug 15, 2019 at 11:54:54AM +0300, Adrian Hunter wrote:
> 
> [...]
> 
>>> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
>>> index e4988f49ea79..d7ff839d8b20 100644
>>> --- a/tools/perf/Makefile.config
>>> +++ b/tools/perf/Makefile.config
>>> @@ -48,9 +48,20 @@ ifeq ($(SRCARCH),x86)
>>>    NO_PERF_REGS := 0
>>>  endif
>>>  
>>> +ARM_PRE_START_SIZE := 0
>>> +
>>>  ifeq ($(SRCARCH),arm)
>>>    NO_PERF_REGS := 0
>>>    LIBUNWIND_LIBS = -lunwind -lunwind-arm
>>> +  ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
>>> +    # Extract info from lds:
>>> +    #   . = ((0xC0000000)) + 0x00208000;
>>> +    # ARM_PRE_START_SIZE := 0x00208000
>>> +    ARM_PRE_START_SIZE := $(shell egrep ' \. \= \({2}0x[0-9a-fA-F]+\){2}' \
>>> +      $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
>>> +      sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
>>> +      awk -F' ' '{printf "0x%x", $$2}' 2>/dev/null)
>>> +  endif
>>>  endif
>>>  
>>>  ifeq ($(SRCARCH),arm64)
>>> @@ -58,8 +69,19 @@ ifeq ($(SRCARCH),arm64)
>>>    NO_SYSCALL_TABLE := 0
>>>    CFLAGS += -I$(OUTPUT)arch/arm64/include/generated
>>>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>>> +  ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
>>> +    # Extract info from lds:
>>> +    #  . = ((((((((0xffffffffffffffff)) - (((1)) << (48)) + 1) + (0)) + (0x08000000))) + (0x08000000))) + 0x00080000;
>>> +    # ARM_PRE_START_SIZE := (0x08000000 + 0x08000000 + 0x00080000) = 0x10080000
>>> +    ARM_PRE_START_SIZE := $(shell egrep ' \. \= \({8}0x[0-9a-fA-F]+\){2}' \
>>> +      $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
>>> +      sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
>>> +      awk -F' ' '{printf "0x%x", $$6+$$7+$$8}' 2>/dev/null)
>>> +  endif
>>
>> So, that is not going to work if you take a perf.data file to a non-arm machine?
> 
> Yeah, this patch will only allow perf to work correctly when perf
> run natively on arm/arm64, so it can resolve partial of the issue.
> 
>> How come you cannot use kallsyms to get the information?
> 
> Thanks for pointing out this.  Sorry I skipped your comment "I don't
> know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
> the patch v3, I should use that chance to elaborate the detailed idea
> and so can get more feedback/guidance before procceed.
> 
> Actually, I have considered to use kallsyms when worked on the previous
> patch set.
> 
> As mentioned in patch set v4's cover letter, I tried to implement
> machine__create_extra_kernel_maps() for arm/arm64, the purpose is to
> parse kallsyms so can find more kernel maps and thus also can fixup
> the kernel start address.  But I found the 'perf script' tool directly
> calls machine__get_kernel_start() instead of running into the flow for
> machine__create_extra_kernel_maps();

Doesn't it just need to loop through each kernel map to find the lowest
start address?

>                                      so I finally gave up to use
> machine__create_extra_kernel_maps() for tweaking kernel start address
> and went back to use this patch's approach by parsing lds files.
> 
> So for next step, I want to get some guidances:
> 
> - One method is to add a new weak function, e.g.
>   arch__fix_kernel_text_start(), then every arch can implement its own
>   function to fixup the kernel start address;
> 
>   For arm/arm64, can use kallsyms to find the symbols with least
>   address and fixup for kernel start address.
> 
> - Another method is to directly parse kallsyms in the function
>   machine__get_kernel_start(), thus the change can be used for all
>   archs;
> 
> Seems to me the second method is to address this issue as a common
> issue crossing all archs.  But not sure if this is the requirement for
> all archs or just this is only required for arm/arm64.  Please let me
> know what's your preference or other thoughts.  Thanks a lot!
> 
> Leo.
> 
>>>  endif
>>>  
>>> +CFLAGS += -DARM_PRE_START_SIZE=$(ARM_PRE_START_SIZE)
>>> +
>>>  ifeq ($(SRCARCH),csky)
>>>    NO_PERF_REGS := 0
>>>  endif
>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>> index f6ee7fbad3e4..e993f891bb82 100644
>>> --- a/tools/perf/util/machine.c
>>> +++ b/tools/perf/util/machine.c
>>> @@ -2687,13 +2687,26 @@ int machine__get_kernel_start(struct machine *machine)
>>>  	machine->kernel_start = 1ULL << 63;
>>>  	if (map) {
>>>  		err = map__load(map);
>>> +		if (err)
>>> +			return err;
>>> +
>>>  		/*
>>>  		 * On x86_64, PTI entry trampolines are less than the
>>>  		 * start of kernel text, but still above 2^63. So leave
>>>  		 * kernel_start = 1ULL << 63 for x86_64.
>>>  		 */
>>> -		if (!err && !machine__is(machine, "x86_64"))
>>> +		if (!machine__is(machine, "x86_64"))
>>>  			machine->kernel_start = map->start;
>>> +
>>> +		/*
>>> +		 * On arm/arm64, the kernel uses some memory regions which are
>>> +		 * prior to '_stext' symbol; to reflect the complete kernel
>>> +		 * address space, compensate these pre-defined regions for
>>> +		 * kernel start address.
>>> +		 */
>>> +		if (!strcmp(perf_env__arch(machine->env), "arm") ||
>>> +		    !strcmp(perf_env__arch(machine->env), "arm64"))
>>> +			machine->kernel_start -= ARM_PRE_START_SIZE;
>>>  	}
>>>  	return err;
>>>  }
>>>
>>
> 


^ permalink raw reply

* Re: [PATCH net-next, 5/6] net/mlx5: Add HV VHCA control agent
From: Eran Ben Elisha @ 2019-08-15 11:35 UTC (permalink / raw)
  To: Mark Bloch, haiyangz, sashal@kernel.org, davem@davemloft.net,
	Saeed Mahameed, leon@kernel.org, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
  Cc: kys, Stephen Hemminger, linux-kernel@vger.kernel.org
In-Reply-To: <745f663e-0c56-84d0-a02b-106f788e3e8f@mellanox.com>



On 8/14/2019 11:41 PM, Mark Bloch wrote:
> 
> 
> On 8/14/19 12:09 PM, Haiyang Zhang wrote:
>> From: Eran Ben Elisha <eranbe@mellanox.com>
>>
>> Control agent is responsible over of the control block (ID 0). It should
>> update the PF via this block about every capability change. In addition,
>> upon block 0 invalidate, it should activate all other supported agents
>> with data requests from the PF.
>>
>> Upon agent create/destroy, the invalidate callback of the control agent
>> is being called in order to update the PF driver about this change.
>>
>> The control agent is an integral part of HV VHCA and will be created
>> and destroy as part of the HV VHCA init/cleanup flow.
>>
>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 122 ++++++++++++++++++++-
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  |   1 +
>>   2 files changed, 121 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> index b2eebdf..3c7fffa 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> @@ -110,22 +110,131 @@ void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
>>   	queue_work(hv_vhca->work_queue, &work->invalidate_work);
>>   }
>>   
>> +#define AGENT_MASK(type) (type ? BIT(type - 1) : 0 /* control */)
>> +
>> +static void mlx5_hv_vhca_agents_control(struct mlx5_hv_vhca *hv_vhca,
>> +					struct mlx5_hv_vhca_control_block *block)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
>> +		struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
>> +
>> +		if (!agent || !agent->control)
>> +			continue;
>> +
>> +		if (!(AGENT_MASK(agent->type) & block->control))
>> +			continue;
>> +
>> +		agent->control(agent, block);
>> +	}
>> +}
>> +
>> +static void mlx5_hv_vhca_capabilities(struct mlx5_hv_vhca *hv_vhca,
>> +				      u32 *capabilities)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
>> +		struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
>> +
>> +		if (agent)
>> +			*capabilities |= AGENT_MASK(agent->type);
>> +	}
>> +}
>> +
>> +static void
>> +mlx5_hv_vhca_control_agent_invalidate(struct mlx5_hv_vhca_agent *agent,
>> +				      u64 block_mask)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
>> +	struct mlx5_core_dev *dev = hv_vhca->dev;
>> +	struct mlx5_hv_vhca_control_block *block;
>> +	u32 capabilities = 0;
>> +	int err;
>> +
>> +	block = kzalloc(sizeof(*block), GFP_KERNEL);
>> +	if (!block)
>> +		return;
>> +
>> +	err = mlx5_hv_read_config(dev, block, sizeof(*block), 0);
>> +	if (err)
>> +		goto free_block;
>> +
>> +	mlx5_hv_vhca_capabilities(hv_vhca, &capabilities);
>> +
>> +	/* In case no capabilities, send empty block in return */
>> +	if (!capabilities) {
>> +		memset(block, 0, sizeof(*block));
>> +		goto write;
>> +	}
>> +
>> +	if (block->capabilities != capabilities)
>> +		block->capabilities = capabilities;
>> +
>> +	if (block->control & ~capabilities)
>> +		goto free_block;
>> +
>> +	mlx5_hv_vhca_agents_control(hv_vhca, block);
>> +	block->command_ack = block->command;
>> +
>> +write:
>> +	mlx5_hv_write_config(dev, block, sizeof(*block), 0);
>> +
>> +free_block:
>> +	kfree(block);
>> +}
>> +
>> +static struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_control_agent_create(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	return mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_CONTROL,
>> +					 NULL,
>> +					 mlx5_hv_vhca_control_agent_invalidate,
>> +					 NULL, NULL);
>> +}
>> +
>> +static void mlx5_hv_vhca_control_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	mlx5_hv_vhca_agent_destroy(agent);
>> +}
>> +
>>   int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
>>   {
>> +	struct mlx5_hv_vhca_agent *agent;
>> +	int err;
>> +
>>   	if (IS_ERR_OR_NULL(hv_vhca))
>>   		return IS_ERR_OR_NULL(hv_vhca);
>>   
>> -	return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
>> -					   mlx5_hv_vhca_invalidate);
>> +	err = mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
>> +					  mlx5_hv_vhca_invalidate);
>> +	if (err)
>> +		return err;
>> +
>> +	agent = mlx5_hv_vhca_control_agent_create(hv_vhca);
>> +	if (IS_ERR_OR_NULL(agent)) {
>> +		mlx5_hv_unregister_invalidate(hv_vhca->dev);
>> +		return IS_ERR_OR_NULL(agent);
>> +	}
>> +
>> +	hv_vhca->agents[MLX5_HV_VHCA_AGENT_CONTROL] = agent;
>> +
>> +	return 0;
>>   }
>>   
>>   void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>>   {
>> +	struct mlx5_hv_vhca_agent *agent;
>>   	int i;
>>   
>>   	if (IS_ERR_OR_NULL(hv_vhca))
>>   		return;
>>   
>> +	agent = hv_vhca->agents[MLX5_HV_VHCA_AGENT_CONTROL];
>> +	if (!IS_ERR_OR_NULL(agent))
>> +		mlx5_hv_vhca_control_agent_destroy(agent);
> 
> Can the agent be err ptr here?

Only NULL, will fix.

> 
>> +
>>   	mutex_lock(&hv_vhca->agents_lock);
>>   	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
>>   		WARN_ON(hv_vhca->agents[i]);
> 
> With the comment above in mind, here you check only for not null

Comment above was right... after fixing it, all is aligned here.

> 
>> @@ -135,6 +244,11 @@ void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>>   	mlx5_hv_unregister_invalidate(hv_vhca->dev);
>>   }
>>   
>> +static void mlx5_hv_vhca_agents_update(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	mlx5_hv_vhca_invalidate(hv_vhca, BIT(MLX5_HV_VHCA_AGENT_CONTROL));
>> +}
>> +
>>   struct mlx5_hv_vhca_agent *
>>   mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>>   			  enum mlx5_hv_vhca_agent_type type,
>> @@ -168,6 +282,8 @@ struct mlx5_hv_vhca_agent *
>>   	hv_vhca->agents[type] = agent;
>>   	mutex_unlock(&hv_vhca->agents_lock);
>>   
>> +	mlx5_hv_vhca_agents_update(hv_vhca);
>> +
>>   	return agent;
>>   }
>>   
>> @@ -189,6 +305,8 @@ void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>>   		agent->cleanup(agent);
>>   
>>   	kfree(agent);
>> +
>> +	mlx5_hv_vhca_agents_update(hv_vhca);
>>   }
>>   
>>   static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> index fa7ee85..6f4bfb1 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> @@ -12,6 +12,7 @@
>>   struct mlx5_hv_vhca_control_block;
>>   
>>   enum mlx5_hv_vhca_agent_type {
>> +	MLX5_HV_VHCA_AGENT_CONTROL = 0,
> 
> No need to start value

I find it more easy to read when having the value explicitly.
If you or Saeed has a strong opinion against it, this can be easily fixed.

> 
>>   	MLX5_HV_VHCA_AGENT_MAX = 32,
>>   };
>>   
>>
> 
> Mark
> 

^ permalink raw reply

* Re: [PATCH net-next, 4/6] net/mlx5: Add HV VHCA infrastructure
From: Eran Ben Elisha @ 2019-08-15 11:35 UTC (permalink / raw)
  To: Mark Bloch, haiyangz, sashal@kernel.org, davem@davemloft.net,
	Saeed Mahameed, leon@kernel.org, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
  Cc: kys, Stephen Hemminger, linux-kernel@vger.kernel.org
In-Reply-To: <65561e09-6fa5-b223-b547-36736c4a9d83@mellanox.com>



On 8/14/2019 11:41 PM, Mark Bloch wrote:
> 
> 
> On 8/14/19 12:08 PM, Haiyang Zhang wrote:
>> From: Eran Ben Elisha <eranbe@mellanox.com>
>>
>> HV VHCA is a layer which provides PF to VF communication channel based on
>> HyperV PCI config channel. It implements Mellanox's Inter VHCA control
>> communication protocol. The protocol contains control block in order to
>> pass messages between the PF and VF drivers, and data blocks in order to
>> pass actual data.
>>
>> The infrastructure is agent based. Each agent will be responsible of
>> contiguous buffer blocks in the VHCA config space. This infrastructure will
>> bind agents to their blocks, and those agents can only access read/write
>> the buffer blocks assigned to them. Each agent will provide three
>> callbacks (control, invalidate, cleanup). Control will be invoked when
>> block-0 is invalidated with a command that concerns this agent. Invalidate
>> callback will be invoked if one of the blocks assigned to this agent was
>> invalidated. Cleanup will be invoked before the agent is being freed in
>> order to clean all of its open resources or deferred works.
>>
>> Block-0 serves as the control block. All execution commands from the PF
>> will be written by the PF over this block. VF will ack on those by
>> writing on block-0 as well. Its format is described by struct
>> mlx5_hv_vhca_control_block layout.
>>
>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 247 +++++++++++++++++++++
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  | 102 +++++++++
>>   drivers/net/ethernet/mellanox/mlx5/core/main.c     |   7 +
>>   include/linux/mlx5/driver.h                        |   2 +
>>   5 files changed, 359 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> index a8950b1..e0a1056 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> @@ -45,7 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offlo
>>   mlx5_core-$(CONFIG_MLX5_MPFS)      += lib/mpfs.o
>>   mlx5_core-$(CONFIG_VXLAN)          += lib/vxlan.o
>>   mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
>> -mlx5_core-$(CONFIG_PCI_HYPERV_MINI)     += lib/hv.o
>> +mlx5_core-$(CONFIG_PCI_HYPERV_MINI)+= lib/hv.o lib/hv_vhca.o
>>   
>>   #
>>   # Ipoib netdev
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> new file mode 100644
>> index 0000000..b2eebdf
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> @@ -0,0 +1,247 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +// Copyright (c) 2018 Mellanox Technologies
>> +
>> +#include <linux/hyperv.h>
>> +#include "mlx5_core.h"
>> +#include "lib/hv.h"
>> +#include "lib/hv_vhca.h"
>> +
>> +struct mlx5_hv_vhca {
>> +	struct mlx5_core_dev       *dev;
>> +	struct workqueue_struct    *work_queue;
>> +	struct mlx5_hv_vhca_agent  *agents[MLX5_HV_VHCA_AGENT_MAX];
>> +	struct mutex                agents_lock; /* Protect agents array */
>> +};
>> +
>> +struct mlx5_hv_vhca_work {
>> +	struct work_struct     invalidate_work;
>> +	struct mlx5_hv_vhca   *hv_vhca;
>> +	u64                    block_mask;
>> +};
>> +
>> +struct mlx5_hv_vhca_data_block {
>> +	u16     sequence;
>> +	u16     offset;
>> +	u8      reserved[4];
>> +	u64     data[15];
>> +};
>> +
>> +struct mlx5_hv_vhca_agent {
>> +	enum mlx5_hv_vhca_agent_type	 type;
>> +	struct mlx5_hv_vhca		*hv_vhca;
>> +	void				*priv;
>> +	int                              seq;
> Why is this int? and in data block is u16?

No good reason. Should be changed to u16.

> 
>> +	void (*control)(struct mlx5_hv_vhca_agent *agent,
>> +			struct mlx5_hv_vhca_control_block *block);
>> +	void (*invalidate)(struct mlx5_hv_vhca_agent *agent,
>> +			   u64 block_mask);
>> +	void (*cleanup)(struct mlx5_hv_vhca_agent *agent);
>> +};
>> +
>> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = NULL;
>> +
>> +	hv_vhca = kzalloc(sizeof(*hv_vhca), GFP_KERNEL);
>> +	if (!hv_vhca)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	hv_vhca->work_queue = create_singlethread_workqueue("mlx5_hv_vhca");
>> +	if (!hv_vhca->work_queue) {
>> +		kfree(hv_vhca);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	hv_vhca->dev = dev;
>> +	mutex_init(&hv_vhca->agents_lock);
>> +
>> +	return hv_vhca;
>> +}
>> +
>> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return;
>> +
>> +	flush_workqueue(hv_vhca->work_queue);
>> +	destroy_workqueue(hv_vhca->work_queue);
> 
> Why not just destroy?

Will fix.

> 
>> +	kfree(hv_vhca);
>> +}
>> +
>> +static void mlx5_hv_vhca_invalidate_work(struct work_struct *work)
>> +{
>> +	struct mlx5_hv_vhca_work *hwork;
>> +	struct mlx5_hv_vhca *hv_vhca;
>> +	int i;
>> +
>> +	hwork = container_of(work, struct mlx5_hv_vhca_work, invalidate_work);
>> +	hv_vhca = hwork->hv_vhca;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
>> +		struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
>> +
>> +		if (!agent || !agent->invalidate)
>> +			continue;
>> +
>> +		if (!(BIT(agent->type) & hwork->block_mask))
>> +			continue;
>> +
>> +		agent->invalidate(agent, hwork->block_mask);
>> +	}
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	kfree(hwork);
>> +}
>> +
>> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = (struct mlx5_hv_vhca *)context;
>> +	struct mlx5_hv_vhca_work *work;
>> +
>> +	work = kzalloc(sizeof(*work), GFP_ATOMIC);
>> +	if (!work)
>> +		return;
>> +
>> +	INIT_WORK(&work->invalidate_work, mlx5_hv_vhca_invalidate_work);
>> +	work->hv_vhca    = hv_vhca;
>> +	work->block_mask = block_mask;
>> +
>> +	queue_work(hv_vhca->work_queue, &work->invalidate_work);
>> +}
>> +
>> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return IS_ERR_OR_NULL(hv_vhca);
>> +
>> +	return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
>> +					   mlx5_hv_vhca_invalidate);
>> +}
>> +
>> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	int i;
>> +
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
>> +		WARN_ON(hv_vhca->agents[i]);
>> +
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	mlx5_hv_unregister_invalidate(hv_vhca->dev);
>> +}
>> +
>> +struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *priv)
>> +{
>> +	struct mlx5_hv_vhca_agent *agent;
>> +
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	if (hv_vhca->agents[type])
>> +		return ERR_PTR(-EINVAL);> +
>> +	agent = kzalloc(sizeof(*agent), GFP_KERNEL);
>> +	if (!agent)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	agent->type      = type;
>> +	agent->hv_vhca   = hv_vhca;
>> +	agent->priv      = priv;
>> +	agent->control   = control;
>> +	agent->invalidate = invalidate;
>> +	agent->cleanup   = cleaup;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	hv_vhca->agents[type] = agent;
>> +	mutex_unlock(&hv_vhca->agents_lock);
> 
> You have a check for this not under a lock a few lines up,
> but assign under a lock?

good point, will add.

> 
> Mark
> 
>> +
>> +	return agent;
>> +}
>> +
>> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +
>> +	if (WARN_ON(agent != hv_vhca->agents[agent->type])) {
>> +		mutex_unlock(&hv_vhca->agents_lock);
>> +		return;
>> +	}
>> +
>> +	hv_vhca->agents[agent->type] = NULL;
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	if (agent->cleanup)
>> +		agent->cleanup(agent);
>> +
>> +	kfree(agent);
>> +}
>> +
>> +static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
>> +					   struct mlx5_hv_vhca_data_block *data_block,
>> +					   void *src, int len, int *offset)
>> +{
>> +	int bytes = min_t(int, (int)sizeof(data_block->data), len);
>> +
>> +	data_block->sequence = agent->seq;
>> +	data_block->offset   = (*offset)++;
>> +	memcpy(data_block->data, src, bytes);
>> +
>> +	return bytes;
>> +}
>> +
>> +static void mlx5_hv_vhca_agent_seq_update(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	agent->seq++;
>> +}
>> +
>> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
>> +			     void *buf, int len)
>> +{
>> +	int offset = agent->type * HV_CONFIG_BLOCK_SIZE_MAX;
>> +	int block_offset = 0;
>> +	int total = 0;
>> +	int err;
>> +
>> +	while (len) {
>> +		struct mlx5_hv_vhca_data_block data_block = {0};
>> +		int bytes;
>> +
>> +		bytes = mlx5_hv_vhca_data_block_prepare(agent, &data_block,
>> +							buf + total,
>> +							len, &block_offset);
>> +		if (!bytes)
>> +			return -ENOMEM;
>> +
>> +		err = mlx5_hv_write_config(agent->hv_vhca->dev, &data_block,
>> +					   sizeof(data_block), offset);
>> +		if (err)
>> +			return err;
>> +
>> +		total += bytes;
>> +		len   -= bytes;
>> +	}
>> +
>> +	mlx5_hv_vhca_agent_seq_update(agent);
>> +
>> +	return 0;
>> +}
>> +
>> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	return agent->priv;
>> +}
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> new file mode 100644
>> index 0000000..fa7ee85
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> @@ -0,0 +1,102 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>> +/* Copyright (c) 2019 Mellanox Technologies. */
>> +
>> +#ifndef __LIB_HV_VHCA_H__
>> +#define __LIB_HV_VHCA_H__
>> +
>> +#include "en.h"
>> +#include "lib/hv.h"
>> +
>> +struct mlx5_hv_vhca_agent;
>> +struct mlx5_hv_vhca;
>> +struct mlx5_hv_vhca_control_block;
>> +
>> +enum mlx5_hv_vhca_agent_type {
>> +	MLX5_HV_VHCA_AGENT_MAX = 32,
>> +};
>> +
>> +#if IS_ENABLED(CONFIG_PCI_HYPERV_MINI)
>> +
>> +struct mlx5_hv_vhca_control_block {
>> +	u32     capabilities;
>> +	u32     control;
>> +	u16     command;
>> +	u16     command_ack;
>> +	u16     version;
>> +	u16     rings;
>> +	u32     reserved1[28];
>> +};
>> +
>> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev);
>> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca);
>> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca);
>> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca);
>> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask);
>> +
>> +struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *context);
>> +
>> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
>> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
>> +			     void *buf, int len);
>> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent);
>> +
>> +#else
>> +
>> +static inline struct mlx5_hv_vhca *
>> +mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +}
>> +
>> +static inline int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +}
>> +
>> +static inline void mlx5_hv_vhca_invalidate(void *context,
>> +					   u64 block_mask)
>> +{
>> +}
>> +
>> +static inline struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *context)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +}
>> +
>> +static inline int
>> +mlx5_hv_vhca_write_agent(struct mlx5_hv_vhca_agent *agent,
>> +			 void *buf, int len)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#endif /* __LIB_HV_VHCA_H__ */
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> index 4cc90eb..50ee38b 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> @@ -69,6 +69,7 @@
>>   #include "lib/pci_vsc.h"
>>   #include "diag/fw_tracer.h"
>>   #include "ecpf.h"
>> +#include "lib/hv_vhca.h"
>>   
>>   MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
>>   MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
>> @@ -872,6 +873,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>>   	}
>>   
>>   	dev->tracer = mlx5_fw_tracer_create(dev);
>> +	dev->hv_vhca = mlx5_hv_vhca_create(dev);
>>   
>>   	return 0;
>>   
>> @@ -902,6 +904,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>>   
>>   static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
>>   {
>> +	mlx5_hv_vhca_destroy(dev->hv_vhca);
>>   	mlx5_fw_tracer_destroy(dev->tracer);
>>   	mlx5_fpga_cleanup(dev);
>>   	mlx5_eswitch_cleanup(dev->priv.eswitch);
>> @@ -1068,6 +1071,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>>   		goto err_fw_tracer;
>>   	}
>>   
>> +	mlx5_hv_vhca_init(dev->hv_vhca);
>> +
>>   	err = mlx5_fpga_device_start(dev);
>>   	if (err) {
>>   		mlx5_core_err(dev, "fpga device start failed %d\n", err);
>> @@ -1123,6 +1128,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>>   err_ipsec_start:
>>   	mlx5_fpga_device_stop(dev);
>>   err_fpga_start:
>> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>>   	mlx5_fw_tracer_cleanup(dev->tracer);
>>   err_fw_tracer:
>>   	mlx5_eq_table_destroy(dev);
>> @@ -1143,6 +1149,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
>>   	mlx5_accel_ipsec_cleanup(dev);
>>   	mlx5_accel_tls_cleanup(dev);
>>   	mlx5_fpga_device_stop(dev);
>> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>>   	mlx5_fw_tracer_cleanup(dev->tracer);
>>   	mlx5_eq_table_destroy(dev);
>>   	mlx5_irq_table_destroy(dev);
>> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
>> index 2b84ee9..97bb98c 100644
>> --- a/include/linux/mlx5/driver.h
>> +++ b/include/linux/mlx5/driver.h
>> @@ -646,6 +646,7 @@ struct mlx5_clock {
>>   struct mlx5_fw_tracer;
>>   struct mlx5_vxlan;
>>   struct mlx5_geneve;
>> +struct mlx5_hv_vhca;
>>   
>>   struct mlx5_core_dev {
>>   	struct device *device;
>> @@ -693,6 +694,7 @@ struct mlx5_core_dev {
>>   	struct mlx5_ib_clock_info  *clock_info;
>>   	struct mlx5_fw_tracer   *tracer;
>>   	u32                      vsc_addr;
>> +	struct mlx5_hv_vhca	*hv_vhca;
>>   };
>>   
>>   struct mlx5_db {
>>

^ permalink raw reply

* Re: [PATCH net-next, 3/6] net/mlx5: Add wrappers for HyperV PCIe operations
From: Eran Ben Elisha @ 2019-08-15 11:34 UTC (permalink / raw)
  To: Mark Bloch, haiyangz, sashal@kernel.org, davem@davemloft.net,
	Saeed Mahameed, leon@kernel.org, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
  Cc: kys, Stephen Hemminger, linux-kernel@vger.kernel.org
In-Reply-To: <e2a38f24-5a63-ef89-9d69-6a0f2770a9e4@mellanox.com>



On 8/14/2019 11:41 PM, Mark Bloch wrote:
> 
> 
> On 8/14/19 12:08 PM, Haiyang Zhang wrote:
>> From: Eran Ben Elisha <eranbe@mellanox.com>
>>
>> Add wrapper functions for HyperV PCIe read / write /
>> block_invalidate_register operations.  This will be used as an
>> infrastructure in the downstream patch for software communication.
>>
>> This will be enabled by default if CONFIG_PCI_HYPERV_MINI is set.
>>
>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/Makefile |  1 +
>>   drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 ++++++++++++++++++++++++
>>   drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 ++++++++
>>   3 files changed, 87 insertions(+)
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> index 8b7edaa..a8950b1 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> @@ -45,6 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offlo
>>   mlx5_core-$(CONFIG_MLX5_MPFS)      += lib/mpfs.o
>>   mlx5_core-$(CONFIG_VXLAN)          += lib/vxlan.o
>>   mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
>> +mlx5_core-$(CONFIG_PCI_HYPERV_MINI)     += lib/hv.o
>>   
>>   #
>>   # Ipoib netdev
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
>> new file mode 100644
>> index 0000000..cf08d02
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +// Copyright (c) 2018 Mellanox Technologies
>> +
>> +#include <linux/hyperv.h>
>> +#include "mlx5_core.h"
>> +#include "lib/hv.h"
>> +
>> +static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
>> +				 int offset, bool read)
>> +{
>> +	int rc = -EOPNOTSUPP;
>> +	int bytes_returned;
>> +	int block_id;
>> +
>> +	if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len % HV_CONFIG_BLOCK_SIZE_MAX)
>> +		return -EINVAL;
>> +
>> +	block_id = offset / HV_CONFIG_BLOCK_SIZE_MAX;
>> +
>> +	rc = read ?
>> +	     hyperv_read_cfg_blk(dev->pdev, buf,
>> +				 HV_CONFIG_BLOCK_SIZE_MAX, block_id,
>> +				 &bytes_returned) :
>> +	     hyperv_write_cfg_blk(dev->pdev, buf,
>> +				  HV_CONFIG_BLOCK_SIZE_MAX, block_id);
>> +
>> +	/* Make sure len bytes were read successfully  */
>> +	if (read)
>> +		rc |= !(len == bytes_returned);
>> +
>> +	if (rc) {
>> +		mlx5_core_err(dev, "Failed to %s hv config, err = %d, len = %d, offset = %d\n",
>> +			      read ? "read" : "write", rc, len,
>> +			      offset);
>> +		return rc;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> This seems out of place why not expose this function as part of hyperv and mlx5
> will just pass the pdev.
> 
The HV driver works with block chunks. I found it less convenience to do 
so directly, so I add a small wrapper for mlx5 core.

Haiyangz,
Do you see a reason to export this callback style from the HYPERV level 
instead?

>> +
>> +int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
>> +			int offset)
>> +{
>> +	return mlx5_hv_config_common(dev, buf, len, offset, true);
>> +}
>> +
>> +int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
>> +			 int offset)
>> +{
>> +	return mlx5_hv_config_common(dev, buf, len, offset, false);
>> +}
>> +
>> +int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
>> +				void (*block_invalidate)(void *context,
>> +							 u64 block_mask))
>> +{
>> +	return hyperv_reg_block_invalidate(dev->pdev, context,
>> +					   block_invalidate);
>> +}
>> +
>> +void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev)
>> +{
>> +	hyperv_reg_block_invalidate(dev->pdev, NULL, NULL);
>> +}
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
>> new file mode 100644
>> index 0000000..7f69771
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
>> @@ -0,0 +1,22 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>> +/* Copyright (c) 2019 Mellanox Technologies. */
>> +
>> +#ifndef __LIB_HV_H__
>> +#define __LIB_HV_H__
>> +
>> +#if IS_ENABLED(CONFIG_PCI_HYPERV_MINI)
>> +
>> +#include <linux/hyperv.h>
>> +#include <linux/mlx5/driver.h>
>> +
>> +int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
>> +			int offset);
>> +int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
>> +			 int offset);
>> +int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
>> +				void (*block_invalidate)(void *context,
>> +							 u64 block_mask));
>> +void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev);
>> +#endif
>> +
>> +#endif /* __LIB_HV_H__ */
>>
> 
> Mark
> 

^ permalink raw reply

* Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space
From: Leo Yan @ 2019-08-15 11:32 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, linux-kernel, netdev,
	bpf, clang-built-linux, Mathieu Poirier, Peter Zijlstra,
	Suzuki Poulouse, coresight, linux-arm-kernel
In-Reply-To: <d874e6b3-c115-6c8c-bb12-160cfd600505@intel.com>

Hi Adrian,

On Thu, Aug 15, 2019 at 11:54:54AM +0300, Adrian Hunter wrote:

[...]

> > diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> > index e4988f49ea79..d7ff839d8b20 100644
> > --- a/tools/perf/Makefile.config
> > +++ b/tools/perf/Makefile.config
> > @@ -48,9 +48,20 @@ ifeq ($(SRCARCH),x86)
> >    NO_PERF_REGS := 0
> >  endif
> >  
> > +ARM_PRE_START_SIZE := 0
> > +
> >  ifeq ($(SRCARCH),arm)
> >    NO_PERF_REGS := 0
> >    LIBUNWIND_LIBS = -lunwind -lunwind-arm
> > +  ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
> > +    # Extract info from lds:
> > +    #   . = ((0xC0000000)) + 0x00208000;
> > +    # ARM_PRE_START_SIZE := 0x00208000
> > +    ARM_PRE_START_SIZE := $(shell egrep ' \. \= \({2}0x[0-9a-fA-F]+\){2}' \
> > +      $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
> > +      sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
> > +      awk -F' ' '{printf "0x%x", $$2}' 2>/dev/null)
> > +  endif
> >  endif
> >  
> >  ifeq ($(SRCARCH),arm64)
> > @@ -58,8 +69,19 @@ ifeq ($(SRCARCH),arm64)
> >    NO_SYSCALL_TABLE := 0
> >    CFLAGS += -I$(OUTPUT)arch/arm64/include/generated
> >    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
> > +  ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
> > +    # Extract info from lds:
> > +    #  . = ((((((((0xffffffffffffffff)) - (((1)) << (48)) + 1) + (0)) + (0x08000000))) + (0x08000000))) + 0x00080000;
> > +    # ARM_PRE_START_SIZE := (0x08000000 + 0x08000000 + 0x00080000) = 0x10080000
> > +    ARM_PRE_START_SIZE := $(shell egrep ' \. \= \({8}0x[0-9a-fA-F]+\){2}' \
> > +      $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
> > +      sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
> > +      awk -F' ' '{printf "0x%x", $$6+$$7+$$8}' 2>/dev/null)
> > +  endif
> 
> So, that is not going to work if you take a perf.data file to a non-arm machine?

Yeah, this patch will only allow perf to work correctly when perf
run natively on arm/arm64, so it can resolve partial of the issue.

> How come you cannot use kallsyms to get the information?

Thanks for pointing out this.  Sorry I skipped your comment "I don't
know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
the patch v3, I should use that chance to elaborate the detailed idea
and so can get more feedback/guidance before procceed.

Actually, I have considered to use kallsyms when worked on the previous
patch set.

As mentioned in patch set v4's cover letter, I tried to implement
machine__create_extra_kernel_maps() for arm/arm64, the purpose is to
parse kallsyms so can find more kernel maps and thus also can fixup
the kernel start address.  But I found the 'perf script' tool directly
calls machine__get_kernel_start() instead of running into the flow for
machine__create_extra_kernel_maps(); so I finally gave up to use
machine__create_extra_kernel_maps() for tweaking kernel start address
and went back to use this patch's approach by parsing lds files.

So for next step, I want to get some guidances:

- One method is to add a new weak function, e.g.
  arch__fix_kernel_text_start(), then every arch can implement its own
  function to fixup the kernel start address;

  For arm/arm64, can use kallsyms to find the symbols with least
  address and fixup for kernel start address.

- Another method is to directly parse kallsyms in the function
  machine__get_kernel_start(), thus the change can be used for all
  archs;

Seems to me the second method is to address this issue as a common
issue crossing all archs.  But not sure if this is the requirement for
all archs or just this is only required for arm/arm64.  Please let me
know what's your preference or other thoughts.  Thanks a lot!

Leo.

> >  endif
> >  
> > +CFLAGS += -DARM_PRE_START_SIZE=$(ARM_PRE_START_SIZE)
> > +
> >  ifeq ($(SRCARCH),csky)
> >    NO_PERF_REGS := 0
> >  endif
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index f6ee7fbad3e4..e993f891bb82 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2687,13 +2687,26 @@ int machine__get_kernel_start(struct machine *machine)
> >  	machine->kernel_start = 1ULL << 63;
> >  	if (map) {
> >  		err = map__load(map);
> > +		if (err)
> > +			return err;
> > +
> >  		/*
> >  		 * On x86_64, PTI entry trampolines are less than the
> >  		 * start of kernel text, but still above 2^63. So leave
> >  		 * kernel_start = 1ULL << 63 for x86_64.
> >  		 */
> > -		if (!err && !machine__is(machine, "x86_64"))
> > +		if (!machine__is(machine, "x86_64"))
> >  			machine->kernel_start = map->start;
> > +
> > +		/*
> > +		 * On arm/arm64, the kernel uses some memory regions which are
> > +		 * prior to '_stext' symbol; to reflect the complete kernel
> > +		 * address space, compensate these pre-defined regions for
> > +		 * kernel start address.
> > +		 */
> > +		if (!strcmp(perf_env__arch(machine->env), "arm") ||
> > +		    !strcmp(perf_env__arch(machine->env), "arm64"))
> > +			machine->kernel_start -= ARM_PRE_START_SIZE;
> >  	}
> >  	return err;
> >  }
> > 
> 

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Jordan Glover @ 2019-08-15 11:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190814220545.co5pucyo5jk3weiv@ast-mbp.dhcp.thefacebook.com>

On Wednesday, August 14, 2019 10:05 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
>
> > If eBPF is genuinely not usable by programs that are not fully trusted
> > by the admin, then no kernel changes at all are needed. Programs that
> > want to reduce their own privileges can easily fork() a privileged
> > subprocess or run a little helper to which they delegate BPF
> > operations. This is far more flexible than anything that will ever be
> > in the kernel because it allows the helper to verify that the rest of
> > the program is doing exactly what it's supposed to and restrict eBPF
> > operations to exactly the subset that is needed. So a container
> > manager or network manager that drops some provilege could have a
> > little bpf-helper that manages its BPF XDP, firewalling, etc
> > configuration. The two processes would talk over a socketpair.
>
> there were three projects that tried to delegate bpf operations.
> All of them failed.
> bpf operational workflow is much more complex than you're imagining.
> fork() also doesn't work for all cases.
> I gave this example before: consider multiple systemd-like deamons
> that need to do bpf operations that want to pass this 'bpf capability'
> to other deamons written by other teams. Some of them will start
> non-root, but still need to do bpf. They will be rpm installed
> and live upgraded while running.
> We considered to make systemd such centralized bpf delegation
> authority too. It didn't work. bpf in kernel grows quickly.
> libbpf part grows independently. llvm keeps evolving.
> All of them are being changed while system overall has to stay
> operational. Centralized approach breaks apart.
>
> > The interesting cases you're talking about really do involved
> > unprivileged or less privileged eBPF, though. Let's see:
> > systemd --user: systemd --user is not privileged at all. There's no
> > issue of reducing privilege, since systemd --user doesn't have any
> > privilege to begin with. But systemd supports some eBPF features, and
> > presumably it would like to support them in the systemd --user case.
> > This is unprivileged eBPF.
>
> Let's disambiguate the terminology.
> This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
> I think that was a mistake.
> Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
> This is not unprivileged.
> 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
>
> There is a huge difference between the two.
> I'm against extending 'unprivileged bpf' even a bit more than what it is
> today for many reasons mentioned earlier.
> The /dev/bpf is about 'less privileged'.
> Less privileged than root. We need to split part of full root capability
> into bpf capability. So that most of the root can be dropped.
> This is very similar to what cap_net_admin does.
> cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
> cap_net_admin is very much privileged. Just 'less privileged' than root.
> Same thing for cap_bpf.
>
> May be we should do both cap_bpf and /dev/bpf to make it clear that
> this is the same thing. Two interfaces to achieve the same result.
>

systemd --user processes aren't "less privileged". The are COMPLETELY unprivileged.
Granting them cap_bpf is the same as granting it to every other unprivileged user
process. Also unprivileged user process can start systemd --user process with any
command they like.

Jordan

^ permalink raw reply

* [PATCH net-next] net: phy: swphy: emulate register MII_ESTATUS
From: Heiner Kallweit @ 2019-08-15 11:19 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

When the genphy driver binds to a swphy it will call
genphy_read_abilites that will try to read MII_ESTATUS if BMSR_ESTATEN
is set in MII_BMSR. So far this would read the default value 0xffff
and 1000FD and 1000HD are reported as supported just by chance.
Better add explicit support for emulating MII_ESTATUS.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/swphy.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index dad22481d..53c214a22 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -22,6 +22,7 @@ struct swmii_regs {
 	u16 bmsr;
 	u16 lpa;
 	u16 lpagb;
+	u16 estat;
 };
 
 enum {
@@ -48,6 +49,7 @@ static const struct swmii_regs speed[] = {
 	[SWMII_SPEED_1000] = {
 		.bmsr  = BMSR_ESTATEN,
 		.lpagb = LPA_1000FULL | LPA_1000HALF,
+		.estat = ESTATUS_1000_TFULL | ESTATUS_1000_THALF,
 	},
 };
 
@@ -56,11 +58,13 @@ static const struct swmii_regs duplex[] = {
 		.bmsr  = BMSR_ESTATEN | BMSR_100HALF,
 		.lpa   = LPA_10HALF | LPA_100HALF,
 		.lpagb = LPA_1000HALF,
+		.estat = ESTATUS_1000_THALF,
 	},
 	[SWMII_DUPLEX_FULL] = {
 		.bmsr  = BMSR_ESTATEN | BMSR_100FULL,
 		.lpa   = LPA_10FULL | LPA_100FULL,
 		.lpagb = LPA_1000FULL,
+		.estat = ESTATUS_1000_TFULL,
 	},
 };
 
@@ -112,6 +116,7 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 {
 	int speed_index, duplex_index;
 	u16 bmsr = BMSR_ANEGCAPABLE;
+	u16 estat = 0;
 	u16 lpagb = 0;
 	u16 lpa = 0;
 
@@ -125,6 +130,7 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 	duplex_index = state->duplex ? SWMII_DUPLEX_FULL : SWMII_DUPLEX_HALF;
 
 	bmsr |= speed[speed_index].bmsr & duplex[duplex_index].bmsr;
+	estat |= speed[speed_index].estat & duplex[duplex_index].estat;
 
 	if (state->link) {
 		bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
@@ -151,6 +157,8 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 		return lpa;
 	case MII_STAT1000:
 		return lpagb;
+	case MII_ESTATUS:
+		return estat;
 
 	/*
 	 * We do not support emulating Clause 45 over Clause 22 register
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next] net: phy: read MII_CTRL1000 in genphy_read_status only if needed
From: Heiner Kallweit @ 2019-08-15 11:15 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

Value of MII_CTRL1000 is needed only if LPA_1000MSFAIL is set.
Therefore move reading this register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 54f80af31..4f4ddc05c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1794,7 +1794,7 @@ EXPORT_SYMBOL(genphy_update_link);
  */
 int genphy_read_status(struct phy_device *phydev)
 {
-	int adv, lpa, lpagb, err, old_link = phydev->link;
+	int lpa, lpagb, err, old_link = phydev->link;
 
 	/* Update the link, but return if there was an error */
 	err = genphy_update_link(phydev);
@@ -1816,11 +1816,12 @@ int genphy_read_status(struct phy_device *phydev)
 			if (lpagb < 0)
 				return lpagb;
 
-			adv = phy_read(phydev, MII_CTRL1000);
-			if (adv < 0)
-				return adv;
-
 			if (lpagb & LPA_1000MSFAIL) {
+				int adv = phy_read(phydev, MII_CTRL1000);
+
+				if (adv < 0)
+					return adv;
+
 				if (adv & CTL1000_ENABLE_MASTER)
 					phydev_err(phydev, "Master/Slave resolution failed, maybe conflicting manual settings?\n");
 				else
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Toke Høiland-Jørgensen @ 2019-08-15 11:12 UTC (permalink / raw)
  To: Sridhar Samudrala, magnus.karlsson, bjorn.topel, netdev, bpf,
	sridhar.samudrala, intel-wired-lan, maciej.fijalkowski,
	tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

Sridhar Samudrala <sridhar.samudrala@intel.com> writes:

> This patch series introduces XDP_SKIP_BPF flag that can be specified
> during the bind() call of an AF_XDP socket to skip calling the BPF 
> program in the receive path and pass the buffer directly to the socket.
>
> When a single AF_XDP socket is associated with a queue and a HW
> filter is used to redirect the packets and the app is interested in
> receiving all the packets on that queue, we don't need an additional 
> BPF program to do further filtering or lookup/redirect to a socket.
>
> Here are some performance numbers collected on 
>   - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>   - Intel 40Gb Ethernet NIC (i40e)
>
> All tests use 2 cores and the results are in Mpps.
>
> turbo on (default)
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           21.9         38.5 
> l2fwd  zerocopy           17.0         20.5
> rxdrop copy               11.1         13.3
> l2fwd  copy                1.9          2.0
>
> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           15.4         29.0
> l2fwd  zerocopy           11.8         18.2
> rxdrop copy                8.2         10.5
> l2fwd  copy                1.7          1.7
> ---------------------------------------------

You're getting this performance boost by adding more code in the fast
path for every XDP program; so what's the performance impact of that for
cases where we do run an eBPF program?

Also, this is basically a special-casing of a particular deployment
scenario. Without a way to control RX queue assignment and traffic
steering, you're basically hard-coding a particular app's takeover of
the network interface; I'm not sure that is such a good idea...

-Toke

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Toshiaki Makita @ 2019-08-15 10:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <f6160572-8fa8-0199-8d81-6159dd4cd5ff@gmail.com>

On 2019/08/14 16:33, Toshiaki Makita wrote:
>>>    bpf, hashtab: Compare keys in long
>>
>> 3Mpps vs 4Mpps just from this patch ?
>> or combined with i40 prefech patch ?
> 
> Combined.
> 
>>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c  |    1 +
>>
>> Could you share "perf report" for just hash tab optimization
>> and for i40 ?
> 
> Sure, I'll get some more data and post them.

Here are perf report and performance numbers.
This time for some reason the performance is better than before.
Something in my env may have changed but could not identify that.

I cut and paste top 10 functions from perf report with drop rate for each case.
perf report is run with --no-child option, so does not include child functions load.
It looks like the hottest function is always xdp_flow BPF program for XDP,
but the shown function name is some meaningless one, like __this_module+0x800000007446.

- No prefetch, no long-compare

   3.3 Mpps

     25.22%  ksoftirqd/4      [kernel.kallsyms]             [k] __this_module+0x800000007446
     21.64%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     14.93%  ksoftirqd/4      [kernel.kallsyms]             [k] memcmp
      7.07%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.57%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.60%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.44%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      2.69%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      2.29%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      1.51%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

- With prefetch, no long-compare

   3.7 Mpps

     25.02%  ksoftirqd/4      [kernel.kallsyms]             [k] mirred_list_lock+0x800000008052
     21.52%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     13.20%  ksoftirqd/4      [kernel.kallsyms]             [k] memcmp
      7.38%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.09%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.57%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.50%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      2.86%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      2.84%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      1.79%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

- No prefetch, with long-compare

   4.2 Mpps

     24.64%  ksoftirqd/4      [kernel.kallsyms]             [k] __this_module+0x800000008f47
     24.42%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
      6.91%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.04%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      3.53%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.14%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      3.13%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      2.34%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      1.76%  ksoftirqd/4      [kernel.kallsyms]             [k] key_equal
      1.37%  ksoftirqd/4      [kernel.kallsyms]             [k] zero_key+0x800000010e93

   NOTE: key_equal is called in place of memcmp.

- With prefetch, with long-compare

   4.6 Mpps

     26.68%  ksoftirqd/4      [kernel.kallsyms]             [k] mirred_list_lock+0x80000000a109
     22.37%  ksoftirqd/4      [kernel.kallsyms]             [k] __htab_map_lookup_elem
     10.79%  ksoftirqd/4      [kernel.kallsyms]             [k] i40e_clean_rx_irq
      4.74%  ksoftirqd/4      [kernel.kallsyms]             [k] page_frag_free
      4.09%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_rcv
      3.97%  ksoftirqd/4      [kernel.kallsyms]             [k] dev_map_enqueue
      3.79%  ksoftirqd/4      [kernel.kallsyms]             [k] lookup_nulls_elem_raw
      3.09%  ksoftirqd/4      [kernel.kallsyms]             [k] xdp_do_redirect
      2.45%  ksoftirqd/4      [kernel.kallsyms]             [k] key_equal
      1.91%  ksoftirqd/4      [kernel.kallsyms]             [k] veth_xdp_xmit

Toshiaki Makita

^ permalink raw reply

* INFO: task hung in tls_sw_release_resources_tx
From: syzbot @ 2019-08-15 10:54 UTC (permalink / raw)
  To: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
	jakub.kicinski, john.fastabend, kafai, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    6d5afe20 sctp: fix memleak in sctp_send_reset_streams
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=16e5536a600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a4c9e9f08e9e8960
dashboard link: https://syzkaller.appspot.com/bug?extid=6a9ff159672dfbb41c95
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17cb0502600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14d5dc22600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6a9ff159672dfbb41c95@syzkaller.appspotmail.com

INFO: task syz-executor153:10198 blocked for more than 143 seconds.
       Not tainted 5.3.0-rc3+ #162
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor153 D27672 10198  10179 0x80000002
Call Trace:
  context_switch kernel/sched/core.c:3254 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3880
  schedule+0xa8/0x270 kernel/sched/core.c:3944
  schedule_timeout+0x717/0xc50 kernel/time/timer.c:1783
  do_wait_for_common kernel/sched/completion.c:83 [inline]
  __wait_for_common kernel/sched/completion.c:104 [inline]
  wait_for_common kernel/sched/completion.c:115 [inline]
  wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
  crypto_wait_req include/linux/crypto.h:685 [inline]
  crypto_wait_req include/linux/crypto.h:680 [inline]
  tls_sw_release_resources_tx+0x4ee/0x6b0 net/tls/tls_sw.c:2075
  tls_sk_proto_cleanup net/tls/tls_main.c:275 [inline]
  tls_sk_proto_close+0x686/0x970 net/tls/tls_main.c:305
  inet_release+0xed/0x200 net/ipv4/af_inet.c:427
  inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
  __sock_release+0xce/0x280 net/socket.c:590
  sock_close+0x1e/0x30 net/socket.c:1268
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x92f/0x2e50 kernel/exit.c:879
  do_group_exit+0x135/0x360 kernel/exit.c:983
  __do_sys_exit_group kernel/exit.c:994 [inline]
  __se_sys_exit_group kernel/exit.c:992 [inline]
  __x64_sys_exit_group+0x44/0x50 kernel/exit.c:992
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43ff88
Code: 00 00 be 3c 00 00 00 eb 19 66 0f 1f 84 00 00 00 00 00 48 89 d7 89 f0  
0f 05 48 3d 00 f0 ff ff 77 21 f4 48 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff  
ff 76 e0 f7 d8 64 41 89 01 eb d8 0f 1f 84 00 00 00
RSP: 002b:00007ffd1c2d0f78 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043ff88
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf890 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 1057 Comm: khungtaskd Not tainted 5.3.0-rc3+ #162
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
  nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
  watchdog+0x9d0/0xef0 kernel/hung_task.c:289
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10  
arch/x86/include/asm/irqflags.h:60


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH] net: usbnet: fix a memory leak bug
From: Oliver Neukum @ 2019-08-15 10:42 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: David S. Miller, open list, open list:USB NETWORKING DRIVERS,
	open list:USB USBNET DRIVER FRAMEWORK
In-Reply-To: <1565804493-7758-1-git-send-email-wenwen@cs.uga.edu>

Am Mittwoch, den 14.08.2019, 12:41 -0500 schrieb Wenwen Wang:
> In usbnet_start_xmit(), 'urb->sg' is allocated through kmalloc_array() by
> invoking build_dma_sg(). Later on, if 'CONFIG_PM' is defined and the if
> branch is taken, the execution will go to the label 'deferred'. However,
> 'urb->sg' is not deallocated on this execution path, leading to a memory
> leak bug.

Just to make this clear:

> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
NACK

For the reason Jack explained. Deferral is not a failure.

	Regards
		Oliver


^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Toshiaki Makita @ 2019-08-15 10:26 UTC (permalink / raw)
  To: Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <20190814170715.GJ2820@mini-arch>

On 2019/08/15 2:07, Stanislav Fomichev wrote:
> On 08/13, Toshiaki Makita wrote:
>> * Implementation
>>
>> xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
>> bpfilter. The difference is that xdp_flow does not generate the eBPF
>> program dynamically but a prebuilt program is embedded in UMH. This is
>> mainly because flow insertion is considerably frequent. If we generate
>> and load an eBPF program on each insertion of a flow, the latency of the
>> first packet of ping in above test will incease, which I want to avoid.
> Can this be instead implemented with a new hook that will be called
> for TC events? This hook can write to perf event buffer and control
> plane will insert/remove/modify flow tables in the BPF maps (contol
> plane will also install xdp program).
> 
> Why do we need UMH? What am I missing?

So you suggest doing everything in xdp_flow kmod?
I also thought about that. There are two phases so let's think about them separately.

1) TC block (qdisc) creation / eBPF load

I saw eBPF maintainers repeatedly saying eBPF program loading needs to be
done from userland, not from kernel, to run the verifier for safety.
However xdp_flow eBPF program is prebuilt and embedded in kernel so we may
allow such programs to be loaded from kernel? I currently don't have the will
to make such an API as loading can be done with current UMH mechanism.

2) flow insertion / eBPF map update

Not sure if this needs to be done from userland. One concern is that eBPF maps can
be modified by unrelated processes and we need to handle all unexpected state of maps.
Such handling tends to be difficult and may cause unexpected kernel behavior.
OTOH updating maps from kmod may reduces the latency of flow insertion drastically.

Alexei, Daniel, what do you think?

Toshiaki Makita

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox