Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2 2/2] xdp: refine xdp api with regards to generic xdp
From: Daniel Borkmann @ 2017-05-11 23:04 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, kubakici, john.fastabend, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1494542162.git.daniel@iogearbox.net>

While working on the iproute2 generic XDP frontend, I noticed that
as of right now it's possible to have native *and* generic XDP
programs loaded both at the same time for the case when a driver
supports native XDP.

The intended model for generic XDP from b5cdae3291f7 ("net: Generic
XDP") is, however, that only one out of the two can be present at
once which is also indicated as such in the XDP netlink dump part.
The main rationale for generic XDP is to ease accessibility (in
case a driver does not yet have XDP support) and to generically
provide a semantical model as an example for driver developers
wanting to add XDP support. The generic XDP option for an XDP
aware driver can still be useful for comparing and testing both
implementations.

However, it is not intended to have a second XDP processing stage
or layer with exactly the same functionality of the first native
stage. Only reason could be to have a partial fallback for future
XDP features that are not supported yet in the native implementation
and we probably also shouldn't strive for such fallback and instead
encourage native feature support in the first place. Given there's
currently no such fallback issue or use case, lets not go there yet
if we don't need to.

Therefore, change semantics for loading XDP and bail out if the
user tries to load a generic XDP program when a native one is
present and vice versa. Another alternative to bailing out would
be to handle the transition from one flavor to another gracefully,
but that would require to bring the device down, exchange both
types of programs, and bring it up again in order to avoid a tiny
window where a packet could hit both hooks. Given this complicates
the logic for just a debugging feature in the native case, I went
with the simpler variant.

For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae3291f7
and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
or just a subset of flags that were used for loading the XDP prog
is suboptimal in the long run since not all flags are useful for
dumping and if we start to reuse the same flag definitions for
load and dump, then we'll waste bit space. What we really just
want is to dump the mode for now.

Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
a program is running at the native driver layer (1). Thus, add a
mode that says that a program is running at generic XDP layer (2).
Applications will handle this fine in that older binaries will
just indicate that something is attached at XDP layer, effectively
this is similar to IFLA_XDP_FLAGS attr that we would have had
modulo the redundancy.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/netdevice.h    |  8 +++++--
 include/uapi/linux/if_link.h |  7 ++++++
 net/core/dev.c               | 55 +++++++++++++++++++++++++++++---------------
 net/core/rtnetlink.c         | 40 +++++++++++++++-----------------
 4 files changed, 67 insertions(+), 43 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2..3f39d27 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3296,11 +3296,15 @@ int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
 			   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
-int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
+
+typedef int (*xdp_op_t)(struct net_device *dev, struct netdev_xdp *xdp);
+int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
+		      int fd, u32 flags);
+bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op);
+
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 bool is_skb_forwardable(const struct net_device *dev,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 549ac8a..15ac203 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -894,6 +894,13 @@ enum {
 					 XDP_FLAGS_SKB_MODE | \
 					 XDP_FLAGS_DRV_MODE)
 
+/* These are stored into IFLA_XDP_ATTACHED on dump. */
+enum {
+	XDP_ATTACHED_NONE = 0,
+	XDP_ATTACHED_DRV,
+	XDP_ATTACHED_SKB,
+};
+
 enum {
 	IFLA_XDP_UNSPEC,
 	IFLA_XDP_FD,
diff --git a/net/core/dev.c b/net/core/dev.c
index e56cb71..fca407b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6852,6 +6852,32 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
+bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op)
+{
+	struct netdev_xdp xdp;
+
+	memset(&xdp, 0, sizeof(xdp));
+	xdp.command = XDP_QUERY_PROG;
+
+	/* Query must always succeed. */
+	WARN_ON(xdp_op(dev, &xdp) < 0);
+	return xdp.prog_attached;
+}
+
+static int dev_xdp_install(struct net_device *dev, xdp_op_t xdp_op,
+			   struct netlink_ext_ack *extack,
+			   struct bpf_prog *prog)
+{
+	struct netdev_xdp xdp;
+
+	memset(&xdp, 0, sizeof(xdp));
+	xdp.command = XDP_SETUP_PROG;
+	xdp.extack = extack;
+	xdp.prog = prog;
+
+	return xdp_op(dev, &xdp);
+}
+
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
@@ -6864,43 +6890,34 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, u32 flags)
 {
-	int (*xdp_op)(struct net_device *dev, struct netdev_xdp *xdp);
 	const struct net_device_ops *ops = dev->netdev_ops;
 	struct bpf_prog *prog = NULL;
-	struct netdev_xdp xdp;
+	xdp_op_t xdp_op, xdp_chk;
 	int err;
 
 	ASSERT_RTNL();
 
-	xdp_op = ops->ndo_xdp;
+	xdp_op = xdp_chk = ops->ndo_xdp;
 	if (!xdp_op && (flags & XDP_FLAGS_DRV_MODE))
 		return -EOPNOTSUPP;
 	if (!xdp_op || (flags & XDP_FLAGS_SKB_MODE))
 		xdp_op = generic_xdp_install;
+	if (xdp_op == xdp_chk)
+		xdp_chk = generic_xdp_install;
 
 	if (fd >= 0) {
-		if (flags & XDP_FLAGS_UPDATE_IF_NOEXIST) {
-			memset(&xdp, 0, sizeof(xdp));
-			xdp.command = XDP_QUERY_PROG;
-
-			err = xdp_op(dev, &xdp);
-			if (err < 0)
-				return err;
-			if (xdp.prog_attached)
-				return -EBUSY;
-		}
+		if (xdp_chk && __dev_xdp_attached(dev, xdp_chk))
+			return -EEXIST;
+		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
+		    __dev_xdp_attached(dev, xdp_op))
+			return -EBUSY;
 
 		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 	}
 
-	memset(&xdp, 0, sizeof(xdp));
-	xdp.command = XDP_SETUP_PROG;
-	xdp.extack = extack;
-	xdp.prog = prog;
-
-	err = xdp_op(dev, &xdp);
+	err = dev_xdp_install(dev, xdp_op, extack, prog);
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index dda9f16..d7f82c3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -899,8 +899,7 @@ static size_t rtnl_port_size(const struct net_device *dev,
 static size_t rtnl_xdp_size(void)
 {
 	size_t xdp_size = nla_total_size(0) +	/* nest IFLA_XDP */
-			  nla_total_size(1) +	/* XDP_ATTACHED */
-			  nla_total_size(4);	/* XDP_FLAGS */
+			  nla_total_size(1);	/* XDP_ATTACHED */
 
 	return xdp_size;
 }
@@ -1247,37 +1246,34 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u8 rtnl_xdp_attached_mode(struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	ASSERT_RTNL();
+
+	if (rcu_access_pointer(dev->xdp_prog))
+		return XDP_ATTACHED_SKB;
+	if (ops->ndo_xdp && __dev_xdp_attached(dev, ops->ndo_xdp))
+		return XDP_ATTACHED_DRV;
+
+	return XDP_ATTACHED_NONE;
+}
+
 static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 {
 	struct nlattr *xdp;
-	u32 xdp_flags = 0;
-	u8 val = 0;
 	int err;
 
 	xdp = nla_nest_start(skb, IFLA_XDP);
 	if (!xdp)
 		return -EMSGSIZE;
-	if (rcu_access_pointer(dev->xdp_prog)) {
-		xdp_flags = XDP_FLAGS_SKB_MODE;
-		val = 1;
-	} else if (dev->netdev_ops->ndo_xdp) {
-		struct netdev_xdp xdp_op = {};
-
-		xdp_op.command = XDP_QUERY_PROG;
-		err = dev->netdev_ops->ndo_xdp(dev, &xdp_op);
-		if (err)
-			goto err_cancel;
-		val = xdp_op.prog_attached;
-	}
-	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, val);
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED,
+			 rtnl_xdp_attached_mode(dev));
 	if (err)
 		goto err_cancel;
 
-	if (xdp_flags) {
-		err = nla_put_u32(skb, IFLA_XDP_FLAGS, xdp_flags);
-		if (err)
-			goto err_cancel;
-	}
 	nla_nest_end(skb, xdp);
 	return 0;
 
-- 
1.9.3

^ permalink raw reply related

* [PATCH net v2 0/2] Two generic xdp related follow-ups
From: Daniel Borkmann @ 2017-05-11 23:04 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, kubakici, john.fastabend, netdev,
	Daniel Borkmann

Two follow-ups for the generic XDP API, would be great if
both could still be considered, since the XDP API is not
frozen yet. For details please see individual patches.

v1 -> v2:
  - Implemented feedback from Jakub Kicinski (reusing
    attribute on dump), thanks!
  - Rest as is.

Thanks!

Daniel Borkmann (2):
  xdp: add flag to enforce driver mode
  xdp: refine xdp api with regards to generic xdp

 include/linux/netdevice.h          |  8 ++++--
 include/uapi/linux/if_link.h       | 13 +++++++--
 net/core/dev.c                     | 57 +++++++++++++++++++++++++-------------
 net/core/rtnetlink.c               | 45 +++++++++++++++---------------
 samples/bpf/xdp1_user.c            |  8 ++++--
 samples/bpf/xdp_tx_iptunnel_user.c |  7 ++++-
 6 files changed, 90 insertions(+), 48 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [PATCH net v2 1/2] xdp: add flag to enforce driver mode
From: Daniel Borkmann @ 2017-05-11 23:04 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, kubakici, john.fastabend, netdev,
	Daniel Borkmann
In-Reply-To: <cover.1494542162.git.daniel@iogearbox.net>

After commit b5cdae3291f7 ("net: Generic XDP") we automatically fall
back to a generic XDP variant if the driver does not support native
XDP. Allow for an option where the user can specify that always the
native XDP variant should be selected and in case it's not supported
by a driver, just bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/uapi/linux/if_link.h       | 6 ++++--
 net/core/dev.c                     | 2 ++
 net/core/rtnetlink.c               | 5 +++++
 samples/bpf/xdp1_user.c            | 8 ++++++--
 samples/bpf/xdp_tx_iptunnel_user.c | 7 ++++++-
 5 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8e56ac7..549ac8a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -888,9 +888,11 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
-#define XDP_FLAGS_SKB_MODE		(2U << 0)
+#define XDP_FLAGS_SKB_MODE		(1U << 1)
+#define XDP_FLAGS_DRV_MODE		(1U << 2)
 #define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
-					 XDP_FLAGS_SKB_MODE)
+					 XDP_FLAGS_SKB_MODE | \
+					 XDP_FLAGS_DRV_MODE)
 
 enum {
 	IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index 96cf83d..e56cb71 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6873,6 +6873,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	xdp_op = ops->ndo_xdp;
+	if (!xdp_op && (flags & XDP_FLAGS_DRV_MODE))
+		return -EOPNOTSUPP;
 	if (!xdp_op || (flags & XDP_FLAGS_SKB_MODE))
 		xdp_op = generic_xdp_install;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bcb0f610..dda9f16 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2199,6 +2199,11 @@ static int do_setlink(const struct sk_buff *skb,
 				err = -EINVAL;
 				goto errout;
 			}
+			if ((xdp_flags & XDP_FLAGS_SKB_MODE) &&
+			    (xdp_flags & XDP_FLAGS_DRV_MODE)) {
+				err = -EINVAL;
+				goto errout;
+			}
 		}
 
 		if (xdp[IFLA_XDP_FD]) {
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 378850c..17be9ea 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -62,13 +62,14 @@ static void usage(const char *prog)
 	fprintf(stderr,
 		"usage: %s [OPTS] IFINDEX\n\n"
 		"OPTS:\n"
-		"    -S    use skb-mode\n",
+		"    -S    use skb-mode\n"
+		"    -N    enforce native mode\n",
 		prog);
 }
 
 int main(int argc, char **argv)
 {
-	const char *optstr = "S";
+	const char *optstr = "SN";
 	char filename[256];
 	int opt;
 
@@ -77,6 +78,9 @@ int main(int argc, char **argv)
 		case 'S':
 			xdp_flags |= XDP_FLAGS_SKB_MODE;
 			break;
+		case 'N':
+			xdp_flags |= XDP_FLAGS_DRV_MODE;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index 92b8bde..631cdcc 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -79,6 +79,8 @@ static void usage(const char *cmd)
 	printf("    -m <dest-MAC> Used in sending the IP Tunneled pkt\n");
 	printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
 	printf("    -P <IP-Protocol> Default is TCP\n");
+	printf("    -S use skb-mode\n");
+	printf("    -N enforce native mode\n");
 	printf("    -h Display this help\n");
 }
 
@@ -138,7 +140,7 @@ int main(int argc, char **argv)
 {
 	unsigned char opt_flags[256] = {};
 	unsigned int kill_after_s = 0;
-	const char *optstr = "i:a:p:s:d:m:T:P:Sh";
+	const char *optstr = "i:a:p:s:d:m:T:P:SNh";
 	int min_port = 0, max_port = 0;
 	struct iptnl_info tnl = {};
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
@@ -206,6 +208,9 @@ int main(int argc, char **argv)
 		case 'S':
 			xdp_flags |= XDP_FLAGS_SKB_MODE;
 			break;
+		case 'N':
+			xdp_flags |= XDP_FLAGS_DRV_MODE;
+			break;
 		default:
 			usage(argv[0]);
 			return 1;
-- 
1.9.3

^ permalink raw reply related

* Re: BPF relocations
From: Alexei Starovoitov @ 2017-05-11 23:10 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170511.153118.2082025756694634804.davem@davemloft.net>

On 5/11/17 12:31 PM, David Miller wrote:
>
> I haven't done more work on bintuils BPF support because we
> need to figure out exactly what to do with relocations.  So
> I've been trying to spend time thinking about this.
>
> As far as I can tell the 64-bit BPF relocation llvm uses
> is used in two situations:
>
> 1) 64-bit relocations against data
>
> 2) 64-bit relocations against ldimm64 instructions
>
> If this is true it's a very bad decision that has ramifications for us
> right now.
>
> One must always explicitly define relocations as being against data or
> instruction fields.  You cannot use the same relocation for both kinds
> of transformations, somehow trying to figure out what to do
> "contextually".  That doesn't work.

why it doesn't work?
as far as i can see x86 doesn't care where the relo applies.
afaik relocations are divided into absolute, pc relative and pic 
relative and it doesn't matter whether they're against .text,
.eh_frame or .debug_* sections.

We have just two so far: absolute 32-bit and absolute 64-bit relocation.
I don't see what we would use pc-relative relo for.

If I remember correctly, the x64 and other cpus use pc-relative for
'call foo' insns since this is how those instructions work.
We don't have such calls yet.

^ permalink raw reply

* Re: [PATCH net] samples/bpf: run cleanup routines when receiving SIGTERM
From: Alexei Starovoitov @ 2017-05-11 23:16 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, shahid.habib
In-Reply-To: <1494532350-23945-1-git-send-email-andy@greyhouse.net>

On Thu, May 11, 2017 at 03:52:30PM -0400, Andy Gospodarek wrote:
> Shahid Habib noticed that when xdp1 was killed from a different console the xdp
> program was not cleaned-up properly in the kernel and it continued to forward
> traffic.
> 
> Most of the applications in samples/bpf cleanup properly, but only when getting
> SIGINT.  Since kill defaults to using SIGTERM, add support to cleanup when the
> application receives either SIGINT or SIGTERM.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> Reported-by: Shahid Habib <shahid.habib@broadcom.com>

Thanks for the fix.
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: [PATCH net] samples/bpf: run cleanup routines when receiving SIGTERM
From: Daniel Borkmann @ 2017-05-11 23:34 UTC (permalink / raw)
  To: Andy Gospodarek, netdev; +Cc: shahid.habib
In-Reply-To: <1494532350-23945-1-git-send-email-andy@greyhouse.net>

On 05/11/2017 09:52 PM, Andy Gospodarek wrote:
> Shahid Habib noticed that when xdp1 was killed from a different console the xdp
> program was not cleaned-up properly in the kernel and it continued to forward
> traffic.
>
> Most of the applications in samples/bpf cleanup properly, but only when getting
> SIGINT.  Since kill defaults to using SIGTERM, add support to cleanup when the
> application receives either SIGINT or SIGTERM.
>
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> Reported-by: Shahid Habib <shahid.habib@broadcom.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-12  0:07 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers,
	Andrey Konovalov, Eric Dumazet
In-Reply-To: <alpine.LFD.2.20.1705102141320.1928@ja.home.ssi.bg>

On Wed, May 10, 2017 at 12:51 PM, Julian Anastasov <ja@ssi.bg> wrote:
>         Oh, well, the sockets can hold cached dst.
> But if the promise is that rt->fi is used only as
> reference to metrics we have to find a way to drop
> the dev references at NETDEV_UNREGISTER time. If you
> set nh_dev to NULL then all lookups should check it
> for != NULL. The sockets will not walk NHs via rt->fi,
> i.e. the route lookups will get valid res.fi from trees,
> so it may work in this way.
>

So, if I understand you correctly it is safe to NULL'ing
nh_dev in NETDEV_UNREGISTER_FINAL, right?

If still not, how about transfer nh_dev's to loopback_dev
too in NETDEV_UNREGISTER? Like we transfer dst->dev.

I don't want to touch the fast path to check for NULL, as
it will change more code and slow down performance.

Thanks.

^ permalink raw reply

* Re: [PATCH net v2 2/2] xdp: refine xdp api with regards to generic xdp
From: Jakub Kicinski @ 2017-05-12  0:13 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, alexei.starovoitov, john.fastabend, netdev
In-Reply-To: <fe08e3c98d6c473d77b14a323abbd41c1e7d8f6d.1494542162.git.daniel@iogearbox.net>

On Fri, 12 May 2017 01:04:46 +0200, Daniel Borkmann wrote:
> For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae3291f7
> and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
> or just a subset of flags that were used for loading the XDP prog
> is suboptimal in the long run since not all flags are useful for
> dumping and if we start to reuse the same flag definitions for
> load and dump, then we'll waste bit space. What we really just
> want is to dump the mode for now.
> 
> Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
> a program is running at the native driver layer (1). Thus, add a
> mode that says that a program is running at generic XDP layer (2).
> Applications will handle this fine in that older binaries will
> just indicate that something is attached at XDP layer, effectively
> this is similar to IFLA_XDP_FLAGS attr that we would have had
> modulo the redundancy.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Looks good to me, thanks!

^ permalink raw reply

* Re: [PATCH v2 1/7] bpf: Track alignment of register values in the verifier.
From: David Miller @ 2017-05-12  1:14 UTC (permalink / raw)
  To: ast; +Cc: daniel, alexei.starovoitov, netdev
In-Reply-To: <06d46f08-2c62-d167-1fe5-a49284c614bb@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 11 May 2017 15:53:06 -0700

> On 5/11/17 9:05 AM, David Miller wrote:
>> +		had_id = (dst_reg->id != 0);
>> +
>>  		/* dst_reg stays as pkt_ptr type and since some positive
>>  		 * integer value was added to the pointer, increment its 'id'
>>  		 */
>>  		dst_reg->id = ++env->id_gen;
>>
>> -		/* something was added to pkt_ptr, set range and off to zero */
>> +		/* something was added to pkt_ptr, set range to zero */
>> +		dst_reg->aux_off = dst_reg->off;
> 
> what about 2nd addition of a variable to pkt_ptr ?
> aux_off sort-of remembers already accumulated offset in pkt_ptr, but
> above line will hard assign it which doesn't seem right for the 2nd
> addition.
> Ex:
> before first add, reg->off == 14
> after first add, aux_off = 14, off = 0
> then imm4 added, now we have reg->off=4, aux_off=14
> now we do 2nd add of variable and
> reg->aux_off becomes 4
> and if we later do u64 load from the packet it will be rejected
> due to (net_ip_align + 4) whereas it should have been ok
> due to (net_ip_align + 14 + 4).

Indeed, we have to accumulate.  I was just thinking about this earlier
today.

Thanks for pointing this out, I'll work on a fix and write some test
cases.

^ permalink raw reply

* Re: BPF relocations
From: David Miller @ 2017-05-12  1:19 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev
In-Reply-To: <b39b79d6-4fe2-f887-dc7e-03b62d6eb3e4@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 11 May 2017 16:10:35 -0700

> I don't see what we would use pc-relative relo for.

We must have them at least for banches.

Internally, we have to emit some kind of relocation as GAS makes it's
first pass over the instructions.

Afterwards, it walks the relocations and resolves all that can be done
at assembly time, and preserves as relocations in the object file for
those which it cannot.

Thinking further down the line many other kinds of PC relative
relocations are possible, even if you don't allow calls.  For example:

	ldimm64	R1, 24 + (. - external_label)

This would be a 64-bit PC relative reloc, with the value 24 in the
addend.

And eventually we want full linking.

The example above may seem silly, but every other full CPU ELF
specification handles these things completely and we should seek to be
complete as well.

^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-12  1:22 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers,
	Andrey Konovalov, Eric Dumazet
In-Reply-To: <CAM_iQpW0fJJXPS=FoTf-o4f5-mOC=mZYNP8=uqiUOmvX+CDwWw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> So, if I understand you correctly it is safe to NULL'ing
> nh_dev in NETDEV_UNREGISTER_FINAL, right?
>
> If still not, how about transfer nh_dev's to loopback_dev
> too in NETDEV_UNREGISTER? Like we transfer dst->dev.
>
> I don't want to touch the fast path to check for NULL, as
> it will change more code and slow down performance.

Finally I come up with the attached patch. Please let me know if
I still miss anything.

[-- Attachment #2: ipv4-rt-fi-ref-count.diff --]
[-- Type: text/plain, Size: 4364 bytes --]

commit edc38ecab7101487b35fc9152e166a2670467e49
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue May 9 14:35:10 2017 -0700

    ipv4: restore rt->fi for reference counting, try #2
    
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 6692c57..0f04f4d 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -392,6 +392,7 @@ int fib_unmerge(struct net *net);
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
+void fib_put_nh_devs(struct net_device *dev);
 int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
diff --git a/include/net/route.h b/include/net/route.h
index 2cc0e14..4335eb7 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -69,6 +69,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct fib_info		*fi; /* for refcnt to shared metrics */
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 39bd1ed..59b0a1d 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1178,6 +1178,9 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		fib_disable_ip(dev, event, true);
 		rt_flush_dev(dev);
 		return NOTIFY_DONE;
+	} else if (event == NETDEV_UNREGISTER_FINAL) {
+		fib_put_nh_devs(dev);
+		return NOTIFY_DONE;
 	}
 
 	in_dev = __in_dev_get_rtnl(dev);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index da449dd..b85c5bb 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1453,6 +1453,36 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
 	return ret;
 }
 
+/* We have to release these nh_dev here because a dst could still hold a
+ * fib_info via rt->fi, we can't wait for GC, a socket could hold the dst
+ * for a long time.
+ */
+void fib_put_nh_devs(struct net_device *dev)
+{
+	struct fib_info *prev_fi = NULL;
+	unsigned int hash = fib_devindex_hashfn(dev->ifindex);
+	struct hlist_head *head = &fib_info_devhash[hash];
+	struct fib_nh *nh;
+
+	hlist_for_each_entry(nh, head, nh_hash) {
+		struct fib_info *fi = nh->nh_parent;
+
+		if (nh->nh_dev != dev || fi == prev_fi)
+			continue;
+		prev_fi = fi;
+		change_nexthops(fi) {
+			if (nexthop_nh->nh_dev == dev) {
+				/* This should be safe, we are on unregister
+				 * path, after synchronize_net() and
+				 * rcu_barrier(), no one could see this.
+				 */
+				RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
+				dev_put(dev);
+			}
+		} endfor_nexthops(fi)
+	}
+}
+
 /* Must be invoked inside of an RCU protected region.  */
 static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 655d9ee..f647310 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 {
 	struct rtable *rt = (struct rtable *) dst;
 
+	if (rt->fi) {
+		fib_info_put(rt->fi);
+		rt->fi = NULL;
+	}
+
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt)
 		!rt_is_expired(rt);
 }
 
+static void rt_init_metrics(struct rtable *rt, struct fib_info *fi)
+{
+	if (fi->fib_metrics != (u32 *)dst_default_metrics) {
+		fib_info_hold(fi);
+		rt->fi = fi;
+	}
+
+	dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+}
+
 static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			   const struct fib_result *res,
 			   struct fib_nh_exception *fnhe,
@@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			rt->rt_gateway = nh->nh_gw;
 			rt->rt_uses_gateway = 1;
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		rt_init_metrics(rt, fi);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif
@@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
 		rt->rt_gateway = 0;
 		rt->rt_uses_gateway = 0;
 		rt->rt_table_id = 0;
+		rt->fi = NULL;
 		INIT_LIST_HEAD(&rt->rt_uncached);
 
 		rt->dst.output = ip_output;

^ permalink raw reply related

* Re: [PATCH net-next] selftests/bpf: get rid of -D__x86_64__
From: David Miller @ 2017-05-12  1:29 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev
In-Reply-To: <9afe8b73-806c-1511-dd26-f09e67b85107@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 11 May 2017 15:58:33 -0700

> Can we than move gnu/stubs.h into include/uapi as well and remove
> the first -I. ?
> Or keep them separate, since this linux/types.h is bpf's arch types.h
> whereas gnu/stubs.h is a hack for glibc /usr/include/features.h ?
> I'm fine whichever way including keeping this patch as-is.

Let's keep it like this for now, and perhaps in the long term we
can have a better more organized piece of infrastructure for this.

So I'll commit this fix for now and happily sparc now works out of the
box for all the selftests and samples as far as I can tell. :-)

This whole thing go me thinking however.  What do you expect to happen
on 32-bit architectures implementing an eBPF JIT?  That's going to
create some serious conflicts and consternation wrt.  tracing which is
going to want to use headers which are for sizeof(void *)==4 whereas
for eBPF natively it's 8.

^ permalink raw reply

* Re: [PATCH net v2 0/2] Two generic xdp related follow-ups
From: David Miller @ 2017-05-12  1:31 UTC (permalink / raw)
  To: daniel; +Cc: alexei.starovoitov, kubakici, john.fastabend, netdev
In-Reply-To: <cover.1494542162.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri, 12 May 2017 01:04:44 +0200

> Two follow-ups for the generic XDP API, would be great if
> both could still be considered, since the XDP API is not
> frozen yet. For details please see individual patches.
> 
> v1 -> v2:
>   - Implemented feedback from Jakub Kicinski (reusing
>     attribute on dump), thanks!
>   - Rest as is.

Series applied, thanks for working on this.

^ permalink raw reply

* Re: [PATCH net] netem: fix skb_orphan_partial()
From: David Miller @ 2017-05-12  1:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mkm, stephen, edumazet, netdev
In-Reply-To: <1494541481.7796.124.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 11 May 2017 15:24:41 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> I should have known that lowering skb->truesize was dangerous :/
> 
> In case packets are not leaving the host via a standard Ethernet device,
> but looped back to local sockets, bad things can happen, as reported
> by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )
> 
> So instead of tweaking skb->truesize, lets change skb->destructor
> and keep a reference on the owner socket via its sk_refcnt.
> 
> Fixes: f2f872f9272a ("netem: Introduce skb_orphan_partial() helper")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Madsen <mkm@nabto.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net] tcp: avoid fragmenting peculiar skbs in SACK
From: David Miller @ 2017-05-12  1:36 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, ncardwell, edumazet, soheil
In-Reply-To: <20170511000127.4249-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 10 May 2017 17:01:27 -0700

> This patch fixes a bug in splitting an SKB during SACK
> processing. Specifically if an skb contains multiple
> packets and is only partially sacked in the higher sequences,
> tcp_match_sack_to_skb() splits the skb and marks the second fragment
> as SACKed.
> 
> The current code further attempts rounding up the first fragment
> to MSS boundaries. But it misses a boundary condition when the
> rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
> such an skb is pointless and causses a kernel warning and aborts
> the SACK processing. This patch universally checks such over-split
> before calling tcp_fragment to prevent these unnecessary warnings.
> 
> Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>

Hehe, a 2.6.29 bug, funny that it lived for so long.

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH net] net: sched: optimize class dumps
From: David Miller @ 2017-05-12  1:38 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, jkosina, jhs, xiyou.wangcong, jiri
In-Reply-To: <1494478768.7796.108.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 10 May 2017 21:59:28 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> In commit 59cc1f61f09c ("net: sched: convert qdisc linked list to
> hashtable") we missed the opportunity to considerably speed up
> tc_dump_tclass_root() if a qdisc handle is provided by user.
> 
> Instead of iterating all the qdiscs, use qdisc_match_from_root()
> to directly get the one we look for.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH v2] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
From: David Miller @ 2017-05-12  1:39 UTC (permalink / raw)
  To: vkuznets; +Cc: xen-devel, netdev, linux-kernel, boris.ostrovsky, jgross
In-Reply-To: <20170511115806.25322-1-vkuznets@redhat.com>

From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 11 May 2017 13:58:06 +0200

> Unavoidable crashes in netfront_resume() and netback_changed() after a
> previous fail in talk_to_netback() (e.g. when we fail to read MAC from
> xenstore) were discovered. The failure path in talk_to_netback() does
> unregister/free for netdev but we don't reset drvdata and we try accessing
> it after resume.
> 
> Fix the bug by removing the whole xen device completely with
> device_unregister(), this guarantees we won't have any calls into netfront
> after a failure.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> Changes since v1: instead of cleaning drvdata and checking for it in
> netfront_resume() and netback_changed() remove the device completely with
> device_unregister() [David Miller]

This looks a lot better, applied, thanks!

^ permalink raw reply

* Re: [PATCH net 0/2] qlcnic: Bug fix and update version
From: David Miller @ 2017-05-12  1:40 UTC (permalink / raw)
  To: manish.chopra; +Cc: netdev, Dept-GELinuxNICDev
In-Reply-To: <20170511141248.996-1-manish.chopra@cavium.com>

From: Manish Chopra <manish.chopra@cavium.com>
Date: Thu, 11 May 2017 07:12:46 -0700

> This series has one fix and bumps up driver version.
> Please consider applying to "net"

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH][V2] ethernet: aquantia: remove redundant checks on error status
From: David Miller @ 2017-05-12  1:42 UTC (permalink / raw)
  To: colin.king
  Cc: pavel.belous, vomlehn, Alexander.Loktionov, Dmitry.Bezrukov,
	Dmitrii.Tarakanov, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20170511182940.18774-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 11 May 2017 19:29:40 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> The error status err is initialized as zero and then being checked
> several times to see if it is less than zero even when it has not
> been updated.  It may seem that the err should be assigned to the
> return code of the call to the various *offload_en_set calls and
> then we check for failure, however, these functions are void and
> never actually return any status.  
> 
> Since these error checks are redundant we can remove these
> as well as err and the error exit label err_exit.
> 
> Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
> dead code")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net] samples/bpf: run cleanup routines when receiving SIGTERM
From: David Miller @ 2017-05-12  1:43 UTC (permalink / raw)
  To: andy; +Cc: netdev, shahid.habib
In-Reply-To: <1494532350-23945-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <andy@greyhouse.net>
Date: Thu, 11 May 2017 15:52:30 -0400

> Shahid Habib noticed that when xdp1 was killed from a different console the xdp
> program was not cleaned-up properly in the kernel and it continued to forward
> traffic.
> 
> Most of the applications in samples/bpf cleanup properly, but only when getting
> SIGINT.  Since kill defaults to using SIGTERM, add support to cleanup when the
> application receives either SIGINT or SIGTERM.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> Reported-by: Shahid Habib <shahid.habib@broadcom.com>

Applied, thanks Andy.

^ permalink raw reply

* Re: BPF relocations
From: Alexei Starovoitov @ 2017-05-12  2:17 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170511.211920.2220259143750653022.davem@davemloft.net>

On 5/11/17 6:19 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Thu, 11 May 2017 16:10:35 -0700
>
>> I don't see what we would use pc-relative relo for.
>
> We must have them at least for banches.
>
> Internally, we have to emit some kind of relocation as GAS makes it's
> first pass over the instructions.
>
> Afterwards, it walks the relocations and resolves all that can be done
> at assembly time, and preserves as relocations in the object file for
> those which it cannot.

got it. yes.
llvm has this thing as well. It calls it FK_PCRel_2
(FK stands for Fixup Kind) which is 16-bit pc relative relo
into 'off' bits of insn.
Fortunately for LLVM this fixup is never converted to actual relo
and it is applied before final .o is emitted.

So, indeed, defining 16-bit pc-relative relo for branches is
definitely useful.
I can imagine someone combining two .o files with 'goto's
crossing two functions or something.
Let's pick a code for it.

> Thinking further down the line many other kinds of PC relative
> relocations are possible, even if you don't allow calls.  For example:
>
> 	ldimm64	R1, 24 + (. - external_label)
>
> This would be a 64-bit PC relative reloc, with the value 24 in the
> addend.

yes. that would be useful as well.
I don't mind defining it for our semi-official bpf relo spec :)

> And eventually we want full linking.
>
> The example above may seem silly, but every other full CPU ELF
> specification handles these things completely and we should seek to be
> complete as well.

I agree that the further we go the more complete it will become.
My concern that if we define too many things because other archs
have them we may end up not using them when time comes, since we
didn't think of some minor detail.
So today I would only add 16-bit pc-relative and 64-bit pc-relative.
The former is clearly useful for branches and the latter for calls
and jmp tables.
Right now we don't have simple 'switch()'. llvm has to convert it
into a sequence of branches. It's ok-ish. Ideally we need jmp-table
like normal cpus do and ldimm64 + relo should work for such purpose.

^ permalink raw reply

* Re: [PATCH net 1/1] tipc: make macro tipc_wait_for_cond() smp safe
From: David Miller @ 2017-05-12  2:20 UTC (permalink / raw)
  To: jon.maloy; +Cc: netdev, parthasarathy.bhuvaragan, ying.xue, tipc-discussion
In-Reply-To: <1494527295-20646-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 11 May 2017 20:28:15 +0200

> The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
> to fulfil its task. The latter, in turn, is evaluating the stated
> condition outside the socket lock context. This is problematic if
> the condition is accessing non-trivial data structures which may be
> altered by incoming interrupts, as is the case with the cong_links()
> linked list, used by socket to keep track of the current set of
> congested links. We sometimes see crashes when this list is accessed
> by a condition function at the same time as a SOCK_WAKEUP interrupt
> is removing an element from the list.
> 
> We fix this by expanding selected parts of sk_wait_event() into the
> outer macro, while ensuring that all evaluations of a given condition
> are performed under socket lock protection.
> 
> Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation
> during link congestion")
> 
> Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied and queued up for -stable, thanks Jon.

^ permalink raw reply

* Re: [PATCH v2 1/7] bpf: Track alignment of register values in the verifier.
From: David Miller @ 2017-05-12  2:31 UTC (permalink / raw)
  To: ast; +Cc: daniel, alexei.starovoitov, netdev
In-Reply-To: <20170511.211406.1201174799693258613.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 11 May 2017 21:14:06 -0400 (EDT)

> Indeed, we have to accumulate.  I was just thinking about this earlier
> today.

I've just pushed the following fix, thanks a lot.

====================
[PATCH] bpf: Handle multiple variable additions into packet pointers in verifier.

We must accumulate into reg->aux_off rather than use a plain assignment.

Add a test for this situation to test_align.

Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 kernel/bpf/verifier.c                    |  2 +-
 tools/testing/selftests/bpf/test_align.c | 37 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e74fb1b..39f2dcb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1531,7 +1531,7 @@ static int check_packet_ptr_add(struct bpf_verifier_env *env,
 		dst_reg->id = ++env->id_gen;
 
 		/* something was added to pkt_ptr, set range to zero */
-		dst_reg->aux_off = dst_reg->off;
+		dst_reg->aux_off += dst_reg->off;
 		dst_reg->off = 0;
 		dst_reg->range = 0;
 		if (had_id)
diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c
index ed24255..9644d4e 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -273,6 +273,20 @@ static struct bpf_align_test tests[] = {
 			BPF_EXIT_INSN(),
 			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_5, 0),
 
+			/* Test multiple accumulations of unknown values
+			 * into a packet pointer.
+			 */
+			BPF_MOV64_REG(BPF_REG_5, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, 14),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_5, BPF_REG_6),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, 4),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_5, BPF_REG_6),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_5),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, 4),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_4, 1),
+			BPF_EXIT_INSN(),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_5, 0),
+
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
@@ -314,6 +328,29 @@ static struct bpf_align_test tests[] = {
 			 * requirements.
 			 */
 			"23: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=pkt(id=2,off=18,r=18),aux_off_align=4 R5=pkt(id=2,off=14,r=18),aux_off_align=4 R6=inv54,min_align=4 R10=fp",
+
+			/* Constant offset is added to R5 packet pointer,
+			 * resulting in reg->off value of 14.
+			 */
+			"26: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=inv,aux_off_align=4 R5=pkt(id=0,off=14,r=8) R6=inv54,min_align=4 R10=fp",
+			/* Variable offset is added to R5, resulting in an
+			 * auxiliary offset of 14, and an auxiliary alignment of 4.
+			 */
+			"27: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=inv,aux_off_align=4 R5=pkt(id=3,off=0,r=0),aux_off=14,aux_off_align=4 R6=inv54,min_align=4 R10=fp",
+			/* Constant is added to R5 again, setting reg->off to 4. */
+			"28: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=inv,aux_off_align=4 R5=pkt(id=3,off=4,r=0),aux_off=14,aux_off_align=4 R6=inv54,min_align=4 R10=fp",
+			/* And once more we add a variable, which causes an accumulation
+			 * of reg->off into reg->aux_off_align, with resulting value of
+			 * 18.  The auxiliary alignment stays at 4.
+			 */
+			"29: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=inv,aux_off_align=4 R5=pkt(id=4,off=0,r=0),aux_off=18,aux_off_align=4 R6=inv54,min_align=4 R10=fp",
+			/* At the time the word size load is performed from R5,
+			 * it's total offset is NET_IP_ALIGN + reg->off (0) +
+			 * reg->aux_off (18) which is 20.  Then the variable offset
+			 * is considered using reg->aux_off_align which is 4 and meets
+			 * the load's requirements.
+			 */
+			"33: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R4=pkt(id=4,off=4,r=4),aux_off=18,aux_off_align=4 R5=pkt(id=4,off=0,r=4),aux_off=18,aux_off_align=4 R6=inv54,min_align=4 R10=fp",
 		},
 	},
 };
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* Re: [net-dsa-mv88e6xxx] question about potential use of uninitialized variable
From: Andrew Lunn @ 2017-05-12  2:33 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Vivien Didelot, Florian Fainelli, netdev, linux-kernel
In-Reply-To: <20170511163537.Horde.CH1Q_6Lo2SOIbQB6MQJFV3g@gator4166.hostgator.com>

On Thu, May 11, 2017 at 04:35:37PM -0500, Gustavo A. R. Silva wrote:
> 
> Hello everybody,
> 
> While looking into Coverity ID 1398130 I ran into the following
> piece of code at drivers/net/dsa/mv88e6xxx/chip.c:849:
> 
>  849static uint64_t _mv88e6xxx_get_ethtool_stat(struct mv88e6xxx_chip *chip,
>  850                                            struct mv88e6xxx_hw_stat *s,
>  851                                            int port, u16 bank1_select,
>  852                                            u16 histogram)
>  853{
>  854        u32 low;
>  855        u32 high = 0;
>  856        u16 reg = 0;
>  857        int err;
>  858        u64 value;
>  859
>  860        switch (s->type) {
>  861        case STATS_TYPE_PORT:
>  862                err = mv88e6xxx_port_read(chip, port, s->reg, &reg);
>  863                if (err)
>  864                        return UINT64_MAX;
>  865
>  866                low = reg;
>  867                if (s->sizeof_stat == 4) {
>  868                        err = mv88e6xxx_port_read(chip, port,
> s->reg + 1, &reg);
>  869                        if (err)
>  870                                return UINT64_MAX;
>  871                        high = reg;
>  872                }
>  873                break;
>  874        case STATS_TYPE_BANK1:
>  875                reg = bank1_select;
>  876                /* fall through */
>  877        case STATS_TYPE_BANK0:
>  878                reg |= s->reg | histogram;
>  879                mv88e6xxx_g1_stats_read(chip, reg, &low);
>  880                if (s->sizeof_stat == 8)
>  881                        mv88e6xxx_g1_stats_read(chip, reg + 1, &high);
>  882        }
>  883        value = (((u64)high) << 16) | low;
>  884        return value;
>  885}
> 
> My question here is if there is any chance for the execution path to
> directly jump from line 860 to line 883, hence ending up using the
> uninitialized variable _low_?

Hi Gustavo

It would require that s->type not have one of the listed case values.
Currently all members of mv88e6xxx_hw_stats due use expected values.
However, it would not hurt to add a

	 default:
		return UINT64_MAX;

Do you want to submit a patch?

   Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH v5 15/17] dt-bindings: qca7000: append UART interface to binding
From: Jakub Kicinski @ 2017-05-12  2:45 UTC (permalink / raw)
  To: Michael Heimpold
  Cc: Rob Herring, Stefan Wahren, Mark Rutland, Greg Kroah-Hartman,
	Jiri Slaby, linux-serial, linux-kernel, netdev, devicetree
In-Reply-To: <1702237.66ccflAQAJ@kerker>

On Thu, 11 May 2017 21:12:22 +0200, Michael Heimpold wrote:
> Am Mittwoch, 10. Mai 2017, 10:53:26 CEST schrieb Stefan Wahren:
> > This merges the serdev binding for the QCA7000 UART driver (Ethernet over
> > UART) into the existing document.
> > 
> > Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
> > ---
> >  .../devicetree/bindings/net/qca-qca7000.txt        | 32
> > ++++++++++++++++++++++ 1 file changed, 32 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/qca-qca7000.txt
> > b/Documentation/devicetree/bindings/net/qca-qca7000.txt index
> > a37f656..08364c3 100644
> > --- a/Documentation/devicetree/bindings/net/qca-qca7000.txt
> > +++ b/Documentation/devicetree/bindings/net/qca-qca7000.txt
> > @@ -54,3 +54,35 @@ ssp2: spi@80014000 {
> >  		local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
> >  	};
> >  };
> > +
> > +(b) Ethernet over UART
> > +
> > +In order to use the QCA7000 as UART slave it must be defined as a child of
> > a +UART master in the device tree. It is possible to preconfigure the UART
> > +settings of the QCA7000 firmware, but it's not possible to change them
> > during +runtime.
> > +
> > +Required properties:
> > +- compatible        : Should be "qca,qca7000-uart"  
> 
> I already discussed this with Stefan off-list a little bit, but I would like
> to bring this to a broader audience: I'm not sure whether the compatible 
> should contain the "-uart" suffix, because the hardware chip is the very same
> QCA7000 chip which can also be used with SPI protocol.
> The only difference is the loaded firmware within the chip which can either
> speak SPI or UART protocol (but not both at the same time - due to shared 
> pins). So the hardware design decides which interface type is used.
> 
> At the moment, this patch series adds a dedicated driver for the UART 
> protocol, in parallel to the existing SPI driver. So a different compatible
> string is needed here to match against the new driver.
> 
> An alternative approach would be to re-use the existing compatible string
> "qca,qca7000" for both, the SPI and UART protocol, because a "smarter" 
> (combined) driver would detect which protocol to use. For example the driver 
> could check for spi-cpha and/or spi-cpol which are required for SPI protocol: 
> if these exists the driver could assume that SPI must be used, if both are 
> missing then UART protocol should be used.
> (This way it would not be necessary to check whether the node is a child of
> a SPI or UART master node - but maybe this is even easier - I don't know)
> 
> Or in shorter words: my concern is that while "qca7000-uart" describes the 
> hardware, it's too closely coupled to the driver implementation. Having
> some feedback of the experts would be nice :-)

I'm no expert, but devices which can do both I2C and SPI are quite
common, and they usually have the same compatible string for both
buses.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox