Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v2 0/3] Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 17:23 UTC (permalink / raw)
  Cc: Saeed Mahameed, Leon Romanovsky, David S . Miller, Doug Ledford,
	Jason Gunthorpe, netdev, linux-rdma, linux-kernel, Chuhong Yuan

Reference counters are preferred to use refcount_t instead of
atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.

First convert the refcount field to refcount_t in mlx5/driver.h.
Then convert the uses to refcount_() APIs.

Changelog:

v1 -> v2:
  - Add #include in include/linux/mlx5/driver.h.

Chuhong Yuan (3):
  mlx5: Use refcount_t for refcount
  net/mlx5: Use refcount_() APIs
  IB/mlx5: Use refcount_() APIs

 drivers/infiniband/hw/mlx5/srq_cmd.c         | 6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 6 +++---
 include/linux/mlx5/driver.h                  | 3 ++-
 3 files changed, 8 insertions(+), 7 deletions(-)

-- 
2.20.1


^ permalink raw reply

* [PATCH v2 1/3] mlx5: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 17:23 UTC (permalink / raw)
  Cc: Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Add #include.

 include/linux/mlx5/driver.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0e6da1840c7d..5b56f343ce87 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -47,6 +47,7 @@
 #include <linux/interrupt.h>
 #include <linux/idr.h>
 #include <linux/notifier.h>
+#include <linux/refcount.h>
 
 #include <linux/mlx5/device.h>
 #include <linux/mlx5/doorbell.h>
@@ -398,7 +399,7 @@ enum mlx5_res_type {
 
 struct mlx5_core_rsc_common {
 	enum mlx5_res_type	res;
-	atomic_t		refcount;
+	refcount_t		refcount;
 	struct completion	free;
 };
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 2/3] net/mlx5: Use refcount_() APIs
From: Chuhong Yuan @ 2019-08-02 17:23 UTC (permalink / raw)
  Cc: Saeed Mahameed, Leon Romanovsky, David S . Miller, netdev,
	linux-rdma, linux-kernel, Chuhong Yuan

This patch depends on PATCH 1/3.

After converting refcount to refcount_t, use
refcount_() APIs to operate it.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index b8ba74de9555..7b44d1e49604 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -53,7 +53,7 @@ mlx5_get_rsc(struct mlx5_qp_table *table, u32 rsn)
 
 	common = radix_tree_lookup(&table->tree, rsn);
 	if (common)
-		atomic_inc(&common->refcount);
+		refcount_inc(&common->refcount);
 
 	spin_unlock_irqrestore(&table->lock, flags);
 
@@ -62,7 +62,7 @@ mlx5_get_rsc(struct mlx5_qp_table *table, u32 rsn)
 
 void mlx5_core_put_rsc(struct mlx5_core_rsc_common *common)
 {
-	if (atomic_dec_and_test(&common->refcount))
+	if (refcount_dec_and_test(&common->refcount))
 		complete(&common->free);
 }
 
@@ -209,7 +209,7 @@ static int create_resource_common(struct mlx5_core_dev *dev,
 	if (err)
 		return err;
 
-	atomic_set(&qp->common.refcount, 1);
+	refcount_set(&qp->common.refcount, 1);
 	init_completion(&qp->common.free);
 	qp->pid = current->pid;
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
From: Jason Gunthorpe @ 2019-08-02 17:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190802100414-mutt-send-email-mst@kernel.org>

On Fri, Aug 02, 2019 at 10:27:21AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2019 at 09:46:13AM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 02, 2019 at 05:40:07PM +0800, Jason Wang wrote:
> > > > This must be a proper barrier, like a spinlock, mutex, or
> > > > synchronize_rcu.
> > > 
> > > 
> > > I start with synchronize_rcu() but both you and Michael raise some
> > > concern.
> > 
> > I've also idly wondered if calling synchronize_rcu() under the various
> > mm locks is a deadlock situation.
> > 
> > > Then I try spinlock and mutex:
> > > 
> > > 1) spinlock: add lots of overhead on datapath, this leads 0 performance
> > > improvement.
> > 
> > I think the topic here is correctness not performance improvement
> 
> The topic is whether we should revert
> commit 7f466032dc9 ("vhost: access vq metadata through kernel virtual address")
> 
> or keep it in. The only reason to keep it is performance.

Yikes, I'm not sure you can ever win against copy_from_user using
mmu_notifiers?  The synchronization requirements are likely always
more expensive unless large and scattered copies are being done..

The rcu is about the only simple approach that could be less
expensive, and that gets back to the question if you can block an
invalidate_start_range in synchronize_rcu or not..

So, frankly, I'd revert it until someone could prove the rcu solution is
OK..

BTW, how do you get copy_from_user to work outside a syscall?

Also, why can't this just permanently GUP the pages? In fact, where
does it put_page them anyhow? Worrying that 7f466 adds a get_user page
but does not add a put_page??

Jason

^ permalink raw reply

* Re: [PATCH iproute2-next] ip tunnel: add json output
From: Andrea Claudi @ 2019-08-02 17:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-netdev, David Ahern
In-Reply-To: <20190802084933.10992569@hermes.lan>

On Fri, Aug 2, 2019 at 5:49 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 2 Aug 2019 13:14:15 +0200
> Andrea Claudi <aclaudi@redhat.com> wrote:
>
> > On Thu, Aug 1, 2019 at 5:16 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Thu,  1 Aug 2019 12:12:58 +0200
> > > Andrea Claudi <aclaudi@redhat.com> wrote:
> > >
> > > > Add json support on iptunnel and ip6tunnel.
> > > > The plain text output format should remain the same.
> > > >
> > > > Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
> > > > ---
> > > >  ip/ip6tunnel.c | 82 +++++++++++++++++++++++++++++++--------------
> > > >  ip/iptunnel.c  | 90 +++++++++++++++++++++++++++++++++-----------------
> > > >  ip/tunnel.c    | 42 +++++++++++++++++------
> > > >  3 files changed, 148 insertions(+), 66 deletions(-)
> > > >
> > > > diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> > > > index d7684a673fdc4..f2b9710c1320f 100644
> > > > --- a/ip/ip6tunnel.c
> > > > +++ b/ip/ip6tunnel.c
> > > > @@ -71,57 +71,90 @@ static void usage(void)
> > > >  static void print_tunnel(const void *t)
> > > >  {
> > > >       const struct ip6_tnl_parm2 *p = t;
> > > > -     char s1[1024];
> > > > -     char s2[1024];
> > > > +     SPRINT_BUF(b1);
> > > >
> > > >       /* Do not use format_host() for local addr,
> > > >        * symbolic name will not be useful.
> > > >        */
> > > > -     printf("%s: %s/ipv6 remote %s local %s",
> > > > -            p->name,
> > > > -            tnl_strproto(p->proto),
> > > > -            format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
> > > > -            rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
> > > > +     open_json_object(NULL);
> > > > +     print_string(PRINT_ANY, "ifname", "%s: ", p->name);
> > >
> > > Print this using color for interface name?
> >
> > Thanks for the suggestion, I can do the same also for remote and local
> > addresses?
> >
> > >
> > >
> > > > +     snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
> > > > +     print_string(PRINT_ANY, "mode", "%s ", b1);
> > > > +     print_string(PRINT_ANY,
> > > > +                  "remote",
> > > > +                  "remote %s ",
> > > > +                  format_host_r(AF_INET6, 16, &p->raddr, b1, sizeof(b1)));
> > > > +     print_string(PRINT_ANY,
> > > > +                  "local",
> > > > +                  "local %s",
> > > > +                  rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, sizeof(b1)));
> > > > +
> > > >       if (p->link) {
> > > >               const char *n = ll_index_to_name(p->link);
> > > >
> > > >               if (n)
> > > > -                     printf(" dev %s", n);
> > > > +                     print_string(PRINT_ANY, "link", " dev %s", n);
> > > >       }
> > > >
> > > >       if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
> > > > -             printf(" encaplimit none");
> > > > +             print_bool(PRINT_ANY,
> > > > +                        "ip6_tnl_f_ign_encap_limit",
> > > > +                        " encaplimit none",
> > > > +                        true);
> > >
> > > For flags like this, print_null is more typical JSON than a boolean
> > > value. Null is better for presence flag. Bool is better if both true and
> > > false are printed.
> >
> > Using print_null json JSON output becomes:
> >
> >   {
> >     "ifname": "gre2",
> >     "mode": "gre/ipv6",
> >     "remote": "fd::1",
> >     "local": "::",
> >     "ip6_tnl_f_ign_encap_limit": null,
> >     "hoplimit": 64,
> >     "tclass": "0x00",
> >     "flowlabel": "0x00000",
> >     "flowinfo": "0x00000000",
> >     "flags": []
> >   }
> >
> > Which seems a bit confusing to me (what does the "null" value means?).
> > Using print_null we also introduce an inconsistency with the JSON
> > output of ip/link_gre6.c and ip/link_ip6tnl.c.
> > I would prefer to use print_bool and print out
> > ip6_tnl_f_ign_encap_limit both when true and false, patching both
> > files above to do the same. WDYT?
>
> JSON has several ways to represent the same type of flag value.
> Since JSON always has key/value. Null is used to indicate something is present and
> in that case the value is unnecessary, which is what the null field was meant for.
>

Thanks for the clarification, Stephen.
I'll submit a v2 with print_null and I'll fix cases similar to the one
you highlighted above in subsequent patches.

^ permalink raw reply

* [PATCH iproute2-next v2] ip tunnel: add json output
From: Andrea Claudi @ 2019-08-02 17:38 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

Add json support on iptunnel and ip6tunnel.
The plain text output format should remain the same.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
---
Changes since v1:
 * Use print_color_* for ifname and ip addresses;
 * Use print_null() instead of print_bool() where appropriate;
 * Reduce indentation level on tnl_print_gre_flags()
---
 ip/ip6tunnel.c | 66 ++++++++++++++++++++++++---------------
 ip/iptunnel.c  | 84 ++++++++++++++++++++++++++++++++------------------
 ip/tunnel.c    | 37 +++++++++++++++++-----
 3 files changed, 125 insertions(+), 62 deletions(-)

diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index d7684a673fdc4..80f16c35f0618 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -71,57 +71,76 @@ static void usage(void)
 static void print_tunnel(const void *t)
 {
 	const struct ip6_tnl_parm2 *p = t;
-	char s1[1024];
-	char s2[1024];
+	SPRINT_BUF(b1);
 
 	/* Do not use format_host() for local addr,
 	 * symbolic name will not be useful.
 	 */
-	printf("%s: %s/ipv6 remote %s local %s",
-	       p->name,
-	       tnl_strproto(p->proto),
-	       format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
-	       rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
+	open_json_object(NULL);
+	print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s: ", p->name);
+	snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
+	print_string(PRINT_ANY, "mode", "%s ", b1);
+	print_string(PRINT_FP, NULL, "%s", "remote ");
+	print_color_string(PRINT_ANY, COLOR_INET6, "remote", "%s ",
+			   format_host_r(AF_INET6, 16, &p->raddr, b1, sizeof(b1)));
+	print_string(PRINT_FP, NULL, "%s", "local ");
+	print_color_string(PRINT_ANY, COLOR_INET6, "local", "%s",
+			   rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, sizeof(b1)));
+
 	if (p->link) {
 		const char *n = ll_index_to_name(p->link);
 
 		if (n)
-			printf(" dev %s", n);
+			print_string(PRINT_ANY, "link", " dev %s", n);
 	}
 
 	if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
-		printf(" encaplimit none");
+		print_null(PRINT_ANY, "ip6_tnl_f_ign_encap_limit",
+			   " encaplimit none", NULL);
 	else
-		printf(" encaplimit %u", p->encap_limit);
+		print_uint(PRINT_ANY, "encap_limit", " encaplimit %u",
+			   p->encap_limit);
 
 	if (p->hop_limit)
-		printf(" hoplimit %u", p->hop_limit);
+		print_uint(PRINT_ANY, "hoplimit", " hoplimit %u", p->hop_limit);
 	else
-		printf(" hoplimit inherit");
+		print_string(PRINT_FP, "hoplimit", " hoplimit %s", "inherit");
 
-	if (p->flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		printf(" tclass inherit");
-	else {
+	if (p->flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+		print_null(PRINT_ANY, "ip6_tnl_f_use_orig_tclass",
+			   " tclass inherit", NULL);
+	} else {
 		__u32 val = ntohl(p->flowinfo & IP6_FLOWINFO_TCLASS);
 
-		printf(" tclass 0x%02x", (__u8)(val >> 20));
+		snprintf(b1, sizeof(b1), "0x%02x", (__u8)(val >> 20));
+		print_string(PRINT_ANY, "tclass", " tclass %s", b1);
 	}
 
-	if (p->flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
-		printf(" flowlabel inherit");
-	else
-		printf(" flowlabel 0x%05x", ntohl(p->flowinfo & IP6_FLOWINFO_FLOWLABEL));
+	if (p->flags & IP6_TNL_F_USE_ORIG_FLOWLABEL) {
+		print_null(PRINT_ANY, "ip6_tnl_f_use_orig_flowlabel",
+			   " flowlabel inherit", NULL);
+	} else {
+		__u32 val = ntohl(p->flowinfo & IP6_FLOWINFO_FLOWLABEL);
 
-	printf(" (flowinfo 0x%08x)", ntohl(p->flowinfo));
+		snprintf(b1, sizeof(b1), "0x%05x", val);
+		print_string(PRINT_ANY, "flowlabel", " flowlabel %s", b1);
+	}
+
+	snprintf(b1, sizeof(b1), "0x%08x", ntohl(p->flowinfo));
+	print_string(PRINT_ANY, "flowinfo", " (flowinfo %s)", b1);
 
 	if (p->flags & IP6_TNL_F_RCV_DSCP_COPY)
-		printf(" dscp inherit");
+		print_null(PRINT_ANY, "ip6_tnl_f_rcv_dscp_copy",
+			   " dscp inherit", NULL);
 
 	if (p->flags & IP6_TNL_F_ALLOW_LOCAL_REMOTE)
-		printf(" allow-localremote");
+		print_null(PRINT_ANY, "ip6_tnl_f_allow_local_remote",
+			   " allow-localremote", NULL);
 
 	tnl_print_gre_flags(p->proto, p->i_flags, p->o_flags,
 			    p->i_key, p->o_key);
+
+	close_json_object();
 }
 
 static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
@@ -357,7 +376,6 @@ static int do_show(int argc, char **argv)
 		return -1;
 
 	print_tunnel(&p);
-	fputc('\n', stdout);
 	return 0;
 }
 
diff --git a/ip/iptunnel.c b/ip/iptunnel.c
index 66929e75c7448..696f3b92bff55 100644
--- a/ip/iptunnel.c
+++ b/ip/iptunnel.c
@@ -289,17 +289,25 @@ static void print_tunnel(const void *t)
 {
 	const struct ip_tunnel_parm *p = t;
 	struct ip_tunnel_6rd ip6rd = {};
-	char s1[1024];
-	char s2[1024];
+	SPRINT_BUF(b1);
 
 	/* Do not use format_host() for local addr,
 	 * symbolic name will not be useful.
 	 */
-	printf("%s: %s/ip remote %s local %s",
-	       p->name,
-	       tnl_strproto(p->iph.protocol),
-	       p->iph.daddr ? format_host_r(AF_INET, 4, &p->iph.daddr, s1, sizeof(s1)) : "any",
-	       p->iph.saddr ? rt_addr_n2a_r(AF_INET, 4, &p->iph.saddr, s2, sizeof(s2)) : "any");
+	open_json_object(NULL);
+	print_color_string(PRINT_ANY, COLOR_IFNAME, "ifname", "%s: ", p->name);
+	snprintf(b1, sizeof(b1), "%s/ip", tnl_strproto(p->iph.protocol));
+	print_string(PRINT_ANY, "mode", "%s ", b1);
+	print_null(PRINT_FP, NULL, "remote ", NULL);
+	print_color_string(PRINT_ANY, COLOR_INET, "remote", "%s ",
+			   p->iph.daddr || is_json_context()
+				? format_host_r(AF_INET, 4, &p->iph.daddr, b1, sizeof(b1))
+				: "any");
+	print_null(PRINT_FP, NULL, "local ", NULL);
+	print_color_string(PRINT_ANY, COLOR_INET, "local", "%s",
+			   p->iph.saddr || is_json_context()
+				? rt_addr_n2a_r(AF_INET, 4, &p->iph.saddr, b1, sizeof(b1))
+				: "any");
 
 	if (p->iph.protocol == IPPROTO_IPV6 && (p->i_flags & SIT_ISATAP)) {
 		struct ip_tunnel_prl prl[16] = {};
@@ -308,54 +316,70 @@ static void print_tunnel(const void *t)
 		prl[0].datalen = sizeof(prl) - sizeof(prl[0]);
 		prl[0].addr = htonl(INADDR_ANY);
 
-		if (!tnl_prl_ioctl(SIOCGETPRL, p->name, prl))
+		if (!tnl_prl_ioctl(SIOCGETPRL, p->name, prl)) {
 			for (i = 1; i < ARRAY_SIZE(prl); i++) {
-				if (prl[i].addr != htonl(INADDR_ANY)) {
-					printf(" %s %s ",
-					       (prl[i].flags & PRL_DEFAULT) ? "pdr" : "pr",
-					       format_host(AF_INET, 4, &prl[i].addr));
-				}
+				if (prl[i].addr == htonl(INADDR_ANY))
+					continue;
+				if (prl[i].flags & PRL_DEFAULT)
+					print_string(PRINT_ANY, "pdr",
+						     " pdr %s",
+						     format_host(AF_INET, 4, &prl[i].addr));
+				else
+					print_string(PRINT_ANY, "pr", " pr %s",
+						     format_host(AF_INET, 4, &prl[i].addr));
 			}
+		}
 	}
 
 	if (p->link) {
 		const char *n = ll_index_to_name(p->link);
 
 		if (n)
-			printf(" dev %s", n);
+			print_string(PRINT_ANY, "dev", " dev %s", n);
 	}
 
 	if (p->iph.ttl)
-		printf(" ttl %u", p->iph.ttl);
+		print_uint(PRINT_ANY, "ttl", " ttl %u", p->iph.ttl);
 	else
-		printf(" ttl inherit");
+		print_string(PRINT_FP, "ttl", " ttl %s", "inherit");
 
 	if (p->iph.tos) {
-		SPRINT_BUF(b1);
-		printf(" tos");
-		if (p->iph.tos & 1)
-			printf(" inherit");
-		if (p->iph.tos & ~1)
-			printf("%c%s ", p->iph.tos & 1 ? '/' : ' ',
-			       rtnl_dsfield_n2a(p->iph.tos & ~1, b1, sizeof(b1)));
+		SPRINT_BUF(b2);
+
+		if (p->iph.tos != 1) {
+			if (!is_json_context() && p->iph.tos & 1)
+				snprintf(b2, sizeof(b2), "%s%s",
+					 p->iph.tos & 1 ? "inherit/" : "",
+					 rtnl_dsfield_n2a(p->iph.tos & ~1, b1, sizeof(b1)));
+			else
+				snprintf(b2, sizeof(b2), "%s",
+					 rtnl_dsfield_n2a(p->iph.tos, b1, sizeof(b1)));
+			print_string(PRINT_ANY, "tos", " tos %s", b2);
+		} else {
+			print_string(PRINT_FP, NULL, " tos %s", "inherit");
+		}
 	}
 
 	if (!(p->iph.frag_off & htons(IP_DF)))
-		printf(" nopmtudisc");
+		print_null(PRINT_ANY, "nopmtudisc", " nopmtudisc", NULL);
 
 	if (p->iph.protocol == IPPROTO_IPV6 && !tnl_ioctl_get_6rd(p->name, &ip6rd) && ip6rd.prefixlen) {
-		printf(" 6rd-prefix %s/%u",
-		       inet_ntop(AF_INET6, &ip6rd.prefix, s1, sizeof(s1)),
-		       ip6rd.prefixlen);
+		print_string(PRINT_ANY, "6rd-prefix", " 6rd-prefix %s",
+			     inet_ntop(AF_INET6, &ip6rd.prefix, b1, sizeof(b1)));
+		print_uint(PRINT_ANY, "6rd-prefixlen", "/%u", ip6rd.prefixlen);
 		if (ip6rd.relay_prefix) {
-			printf(" 6rd-relay_prefix %s/%u",
-			       format_host(AF_INET, 4, &ip6rd.relay_prefix),
-			       ip6rd.relay_prefixlen);
+			print_string(PRINT_ANY, "6rd-relay_prefix",
+				     " 6rd-relay_prefix %s",
+				     format_host(AF_INET, 4, &ip6rd.relay_prefix));
+			print_uint(PRINT_ANY, "6rd-relay_prefixlen", "/%u",
+				   ip6rd.relay_prefixlen);
 		}
 	}
 
 	tnl_print_gre_flags(p->iph.protocol, p->i_flags, p->o_flags,
 			    p->i_key, p->o_key);
+
+	close_json_object();
 }
 
 
diff --git a/ip/tunnel.c b/ip/tunnel.c
index 41b0ef3165fdc..88585cf3177b5 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -314,24 +314,43 @@ void tnl_print_gre_flags(__u8 proto,
 {
 	if ((i_flags & GRE_KEY) && (o_flags & GRE_KEY) &&
 	    o_key == i_key) {
-		printf(" key %u", ntohl(i_key));
+		print_uint(PRINT_ANY, "key", " key %u", ntohl(i_key));
 	} else {
 		if (i_flags & GRE_KEY)
-			printf(" ikey %u", ntohl(i_key));
+			print_uint(PRINT_ANY, "ikey", " ikey %u", ntohl(i_key));
 		if (o_flags & GRE_KEY)
-			printf(" okey %u", ntohl(o_key));
+			print_uint(PRINT_ANY, "okey", " okey %u", ntohl(o_key));
 	}
 
-	if (proto == IPPROTO_GRE) {
-		if (i_flags & GRE_SEQ)
+	if (proto != IPPROTO_GRE)
+		return;
+
+	open_json_array(PRINT_JSON, "flags");
+	if (i_flags & GRE_SEQ) {
+		if (is_json_context())
+			print_string(PRINT_JSON, NULL, "%s", "rx_drop_ooseq");
+		else
 			printf("%s  Drop packets out of sequence.", _SL_);
-		if (i_flags & GRE_CSUM)
+	}
+	if (i_flags & GRE_CSUM) {
+		if (is_json_context())
+			print_string(PRINT_JSON, NULL, "%s", "rx_csum");
+		else
 			printf("%s  Checksum in received packet is required.", _SL_);
-		if (o_flags & GRE_SEQ)
+	}
+	if (o_flags & GRE_SEQ) {
+		if (is_json_context())
+			print_string(PRINT_JSON, NULL, "%s", "tx_seq");
+		else
 			printf("%s  Sequence packets on output.", _SL_);
-		if (o_flags & GRE_CSUM)
+	}
+	if (o_flags & GRE_CSUM) {
+		if (is_json_context())
+			print_string(PRINT_JSON, NULL, "%s", "tx_csum");
+		else
 			printf("%s  Checksum output packets.", _SL_);
 	}
+	close_json_array(PRINT_JSON, NULL);
 }
 
 static void tnl_print_stats(const struct rtnl_link_stats64 *s)
@@ -417,6 +436,7 @@ static int print_nlmsg_tunnel(struct nlmsghdr *n, void *arg)
 
 int do_tunnels_list(struct tnl_print_nlmsg_info *info)
 {
+	new_json_obj(json);
 	if (rtnl_linkdump_req(&rth, preferred_family) < 0) {
 		perror("Cannot send dump request\n");
 		return -1;
@@ -426,6 +446,7 @@ int do_tunnels_list(struct tnl_print_nlmsg_info *info)
 		fprintf(stderr, "Dump terminated\n");
 		return -1;
 	}
+	delete_json_obj();
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [net-next 01/12] net/mlx5: E-Switch, add ingress rate support
From: Alexei Starovoitov @ 2019-08-02 17:37 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev@vger.kernel.org, Eli Cohen, Paul Blakey
In-Reply-To: <20190801195620.26180-2-saeedm@mellanox.com>

On Thu, Aug 1, 2019 at 6:30 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> From: Eli Cohen <eli@mellanox.com>
>
> Use the scheduling elements to implement ingress rate limiter on an
> eswitch ports ingress traffic. Since the ingress of eswitch port is the
> egress of VF port, we control eswitch ingress by controlling VF egress.

Looks like the patch is only passing args to firmware which is doing the magic.
Can you please describe what is the algorithm there?
Is it configurable?

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-08-02 17:47 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190731121646.GD9823@hmswarspite.think-freely.org>

On Wed, 2019-07-31 at 08:16 -0400, Neil Horman wrote:
> On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> > On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > > fallthrough may become a pseudo reserved keyword so this only use of
> > > > fallthrough is better renamed to allow it.

Can you or any other maintainer apply this patch
or ack it so David Miller can apply it?

Thanks.


^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-08-02 17:50 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <e0dd3af448e38e342c1ac6e7c0c802696eb77fd6.1564549413.git.joe@perches.com>

On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> fallthrough may become a pseudo reserved keyword so this only use of
> fallthrough is better renamed to allow it.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  net/sctp/sm_make_chunk.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 36bd8a6e82df..3fdcaa2fbf12 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_SET_PRIMARY:
>  		if (net->sctp.addip_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_HOST_NAME_ADDRESS:
>  		/* Tell the peer, we won't support this param.  */
> @@ -2163,11 +2163,11 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_FWD_TSN_SUPPORT:
>  		if (ep->prsctp_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_RANDOM:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Secion 6.1
>  		 * If the random number is not 32 byte long the association
> @@ -2184,7 +2184,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_CHUNKS:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Section 3.2
>  		 * The CHUNKS parameter MUST be included once in the INIT or
> @@ -2200,7 +2200,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_HMAC_ALGO:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		hmacs = (struct sctp_hmac_algo_param *)param.p;
>  		n_elt = (ntohs(param.p->length) -
> @@ -2223,7 +2223,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  			retval = SCTP_IERROR_ABORT;
>  		}
>  		break;
> -fallthrough:
> +unhandled:
>  	default:
>  		pr_debug("%s: unrecognized param:%d for chunk:%d\n",
>  			 __func__, ntohs(param.p->type), cid);
> -- 
> 2.15.0
> 
> 
Yeah, it seems reasonable to me, though I'm still not comfortable with defining
fallthrough as a macrotized keyword, but thats a debate for another thread

Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Print error when ovs_execute_actions() fails
From: Gregory Rose @ 2019-08-02 17:51 UTC (permalink / raw)
  To: Yifeng Sun, netdev, pshelar
In-Reply-To: <1564694047-4859-1-git-send-email-pkusunyifeng@gmail.com>

On 8/1/2019 2:14 PM, Yifeng Sun wrote:
> Currently in function ovs_dp_process_packet(), return values of
> ovs_execute_actions() are silently discarded. This patch prints out
> an error message when error happens so as to provide helpful hints
> for debugging.
>
> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> ---
>   net/openvswitch/datapath.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 892287d..603c533 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -222,6 +222,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
>   	struct dp_stats_percpu *stats;
>   	u64 *stats_counter;
>   	u32 n_mask_hit;
> +	int error;
>   
>   	stats = this_cpu_ptr(dp->stats_percpu);
>   
> @@ -229,7 +230,6 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
>   	flow = ovs_flow_tbl_lookup_stats(&dp->table, key, &n_mask_hit);
>   	if (unlikely(!flow)) {
>   		struct dp_upcall_info upcall;
> -		int error;
>   
>   		memset(&upcall, 0, sizeof(upcall));
>   		upcall.cmd = OVS_PACKET_CMD_MISS;
> @@ -246,7 +246,10 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
>   
>   	ovs_flow_stats_update(flow, key->tp.flags, skb);
>   	sf_acts = rcu_dereference(flow->sf_acts);
> -	ovs_execute_actions(dp, skb, sf_acts, key);
> +	error = ovs_execute_actions(dp, skb, sf_acts, key);
> +	if (unlikely(error))
> +		net_err_ratelimited("ovs: action execution error on datapath %s: %d\n",
> +							ovs_dp_name(dp), error);
>   
>   	stats_counter = &stats->n_hit;
>   

Thanks Yifeng.

Reviewed-by: Greg Rose <gvrose8192@gmail.com>


^ permalink raw reply

* Re: [PATCH v4 2/4] net: phy: Add function to retrieve LED configuration from the DT
From: Matthias Kaehlcke @ 2019-08-02 17:59 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190802163810.GL2099@lunn.ch>

On Fri, Aug 02, 2019 at 06:38:10PM +0200, Andrew Lunn wrote:
> On Thu, Aug 01, 2019 at 12:07:57PM -0700, Matthias Kaehlcke wrote:
> > Add a phylib function for retrieving PHY LED configuration that
> > is specified in the device tree using the generic binding. LEDs
> > can be configured to be 'on' for a certain link speed or to blink
> > when there is TX/RX activity.
> > 
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > Changes in v4:
> > - patch added to the series
> > ---
> >  drivers/net/phy/phy_device.c | 50 ++++++++++++++++++++++++++++++++++++
> >  include/linux/phy.h          | 15 +++++++++++
> >  2 files changed, 65 insertions(+)
> > 
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 6b5cb87f3866..b4b48de45712 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -2188,6 +2188,56 @@ static bool phy_drv_supports_irq(struct phy_driver *phydrv)
> >  	return phydrv->config_intr && phydrv->ack_interrupt;
> >  }
> >  
> > +int of_get_phy_led_cfg(struct phy_device *phydev, int led,
> > +		       struct phy_led_config *cfg)
> > +{
> > +	struct device_node *np, *child;
> > +	const char *trigger;
> > +	int ret;
> > +
> > +	if (!IS_ENABLED(CONFIG_OF_MDIO))
> > +		return -ENOENT;
> > +
> > +	np = of_find_node_by_name(phydev->mdio.dev.of_node, "leds");
> > +	if (!np)
> > +		return -ENOENT;
> > +
> > +	for_each_child_of_node(np, child) {
> > +		u32 val;
> > +
> > +		if (!of_property_read_u32(child, "reg", &val)) {
> > +			if (val == (u32)led)
> > +				break;
> > +		}
> > +	}
> 
> Hi Matthias
> 
> This is leaking references to np and child. In the past we have not
> cared about this too much, but we are now getting patches adding the
> missing releases. So it would be good to fix this.

Good point, I'll fix it in the next revision.

Thanks

Matthias

^ permalink raw reply

* Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: Jakub Kicinski @ 2019-08-02 18:02 UTC (permalink / raw)
  To: wenxu; +Cc: jiri, pablo, fw, netfilter-devel, netdev, John Hurley
In-Reply-To: <55850b13-991f-97bd-b452-efacd0f39aa4@ucloud.cn>

On Fri, 2 Aug 2019 21:09:03 +0800, wenxu wrote:
> >> We'd have something like the loop in flow_get_default_block():
> >>
> >> 	for each (subsystem)
> >> 		subsystem->handle_new_indir_cb(indr_dev, cb);
> >>
> >> And then per-subsystem logic would actually call the cb. Or:
> >>
> >> 	for each (subsystem)
> >> 		block = get_default_block(indir_dev)
> >> 		indr_dev->ing_cmd_cb(...)  
> >             nft dev chian is also based on register_netdevice_notifier, So for unregister case,
> >
> > the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. is right?
> >
> > So maybe we can cache the block as a list of all the subsystem in  indr_dev ?  
> 
> 
> when the device is unregister the nft netdev chain related to this device will also be delete through netdevice_notifier
> 
> . So for unregister case,the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister.

Hm, but I don't think that should be an issue. The ordering should be
like one of the following two:

device unregister:
  - driver notifier callback
    - unregister flow cb
      - UNBIND cb
        - free driver's block state
    - free driver's device state
  - nft block destroy
    # doesn't see driver's CB any more

Or:

device unregister:
  - nft block destroy
    - UNBIND cb
      - free driver's block state
  - driver notifier callback
    - free driver's state

No?

> cache for the block is not work because the chain already be delete and free. Maybe it improve the prio of
> 
> rep_netdev_event can help this?

In theory the cache should work in a similar way as drivers, because
once the indr_dev is created and the initial block is found, the cached
value is just recorded in BIND/UNBIND calls. So if BIND/UNBIND works for
drivers it will also put the right info in the cache.

^ permalink raw reply

* Re: [PATCH V2] mlx5: Fix formats with line continuation whitespace
From: Joe Perches @ 2019-08-02 18:09 UTC (permalink / raw)
  To: Doug Ledford, Leon Romanovsky
  Cc: Saeed Mahameed, David S. Miller, netdev, linux-rdma, linux-kernel
In-Reply-To: <ac8361beee5dd80ad6546328dd7457bb6ee1ca5a.camel@redhat.com>

On Tue, 2018-11-06 at 16:34 -0500, Doug Ledford wrote:
> On Thu, 2018-11-01 at 09:34 +0200, Leon Romanovsky wrote:
> > On Thu, Nov 01, 2018 at 12:24:08AM -0700, Joe Perches wrote:
> > > The line continuations unintentionally add whitespace so
> > > instead use coalesced formats to remove the whitespace.
> > > 
> > > Signed-off-by: Joe Perches <joe@perches.com>
> > > ---
> > > 
> > > v2: Remove excess space after %u
> > > 
> > >  drivers/net/ethernet/mellanox/mlx5/core/rl.c | 6 ++----
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > 
> > 
> > Thanks,
> > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> 
> Applied, thanks.

Still not upstream.  How long does it take?

^ permalink raw reply

* Re: [PATCH v4 4/4] net: phy: realtek: configure RTL8211E LEDs
From: Andrew Lunn @ 2019-08-02 18:18 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190801190759.28201-5-mka@chromium.org>

On Thu, Aug 01, 2019 at 12:07:59PM -0700, Matthias Kaehlcke wrote:
> Configure the RTL8211E LEDs behavior when the device tree property
> 'realtek,led-modes' is specified.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>

Hi Matthias

I was more thinking of adding a new driver call to the PHY driver API,
to configure an LED. Something like

rtl8211e_config_leds(phydev, int led, struct phy_led_config cfg);

It would be called by the phylib core after config_init(). But also,
thinking ahead to generic linux LED support, it could be called later
to reconfigure the LEDs to use a different trigger. The standard LED
sysfs interface would be used.

      Andrew

^ permalink raw reply

* Re: [PATCH v4 1/4] dt-bindings: net: phy: Add subnode for LED configuration
From: Matthias Kaehlcke @ 2019-08-02 18:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190802165755.GM2099@lunn.ch>

On Fri, Aug 02, 2019 at 06:57:55PM +0200, Andrew Lunn wrote:
> On Thu, Aug 01, 2019 at 12:07:56PM -0700, Matthias Kaehlcke wrote:
> > The LED behavior of some Ethernet PHYs is configurable. Add an
> > optional 'leds' subnode with a child node for each LED to be
> > configured. The binding aims to be compatible with the common
> > LED binding (see devicetree/bindings/leds/common.txt).
> > 
> > A LED can be configured to be 'on' when a link with a certain speed
> > is active, or to blink on RX/TX activity. For the configuration to
> > be effective it needs to be supported by the hardware and the
> > corresponding PHY driver.
> > 
> > Suggested-by: Andrew Lunn <andrew@lunn.ch>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > Changes in v4:
> > - patch added to the series
> > ---
> >  .../devicetree/bindings/net/ethernet-phy.yaml | 47 +++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> > index f70f18ff821f..81c5aacc89a5 100644
> > --- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> > +++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> > @@ -153,6 +153,38 @@ properties:
> >        Delay after the reset was deasserted in microseconds. If
> >        this property is missing the delay will be skipped.
> >  
> > +patternProperties:
> > +  "^leds$":
> > +    type: object
> > +    description:
> > +      Subnode with configuration of the PHY LEDs.
> > +
> > +    patternProperties:
> > +      "^led@[0-9]+$":
> > +        type: object
> > +        description:
> > +          Subnode with the configuration of a single PHY LED.
> > +
> > +    properties:
> > +      reg:
> > +        description:
> > +          The ID number of the LED, typically corresponds to a hardware ID.
> > +        $ref: "/schemas/types.yaml#/definitions/uint32"
> > +
> > +      linux,default-trigger:
> > +        description:
> > +          This parameter, if present, is a string specifying the trigger
> > +          assigned to the LED. Supported triggers are:
> > +            "phy_link_10m_active" - LED will be on when a 10Mb/s link is active
> > +            "phy_link_100m_active" - LED will be on when a 100Mb/s link is active
> > +            "phy_link_1g_active" - LED will be on when a 1Gb/s link is active
> > +            "phy_link_10g_active" - LED will be on when a 10Gb/s link is active
> > +            "phy_activity" - LED will blink when data is received or transmitted
> 
> Matthias
> 
> We should think a bit more about these names.
> 
> I can see in future needing 1G link, but it blinks off when there is
> active traffic? So phy_link_1g_active could be confusing, and very similar to
> phy_link_1g_activity?

agreed, the 'active' vs' 'activity' can be confusing, let's avoid that.

> So maybe 

> > +            "phy_link_10m" - LED will be solid on when a 10Mb/s link is active
> > +            "phy_link_100m" - LED will be solid on when a 100Mb/s link is active
> > +            "phy_link_1g" - LED will be solid on when a 1Gb/s link is active
> 
> etc.
>
> And then in the future we can have
> 
>                "phy_link_1g_activity' - LED will be on when 1Gbp/s
>                                         link is active and blink off
>                                         with activity.

sounds good to me

> What other use cases do we have? I don't want to support everything,
> but we should be able to represent the most common modes without the
> names getting too confusing.

Initially I planned to support to configure a LED to be solid for
multiple link speeds, however that could become a bit messy with the
string based triggers, unless we limit the possible combinations. My
expertise in network land is limited, so I'm not sure if that's an
important/realistic use case.

^ permalink raw reply

* Re: [PATCH V2] mlx5: Fix formats with line continuation whitespace
From: Saeed Mahameed @ 2019-08-02 18:32 UTC (permalink / raw)
  To: joe@perches.com, dledford@redhat.com, leon@kernel.org
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <f2b2559865e8bd59202e14b837a522a801d498e2.camel@perches.com>

On Fri, 2019-08-02 at 11:09 -0700, Joe Perches wrote:
> On Tue, 2018-11-06 at 16:34 -0500, Doug Ledford wrote:
> > On Thu, 2018-11-01 at 09:34 +0200, Leon Romanovsky wrote:
> > > On Thu, Nov 01, 2018 at 12:24:08AM -0700, Joe Perches wrote:
> > > > The line continuations unintentionally add whitespace so
> > > > instead use coalesced formats to remove the whitespace.
> > > > 
> > > > Signed-off-by: Joe Perches <joe@perches.com>
> > > > ---
> > > > 
> > > > v2: Remove excess space after %u
> > > > 
> > > >  drivers/net/ethernet/mellanox/mlx5/core/rl.c | 6 ++----
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > 
> > > 
> > > Thanks,
> > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > 
> > Applied, thanks.
> 
> Still not upstream.  How long does it take?
> 

Doug, Leon, this patch still apply, let me know what happened here ?
and if you want me to apply it to one of my branches.

Thanks,
Saeed.


^ permalink raw reply

* Re: [PATCH bpf-next v4 2/2] xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP
From: Jonathan Lemon @ 2019-08-02 18:28 UTC (permalink / raw)
  To: Björn Töpel
  Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
	bruce.richardson, songliubraving, bpf
In-Reply-To: <20190802081154.30962-3-bjorn.topel@gmail.com>



On 2 Aug 2019, at 1:11, Björn Töpel wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flags when updating
> an entry. This patch addresses that.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Reviewed-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply

* Re: [PATCH bpf-next v4 1/2] xsk: remove AF_XDP socket from map when the socket is released
From: Jonathan Lemon @ 2019-08-02 18:28 UTC (permalink / raw)
  To: Björn Töpel
  Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
	bruce.richardson, songliubraving, bpf
In-Reply-To: <20190802081154.30962-2-bjorn.topel@gmail.com>



On 2 Aug 2019, at 1:11, Björn Töpel wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> When an AF_XDP socket is released/closed the XSKMAP still holds a
> reference to the socket in a "released" state. The socket will still
> use the netdev queue resource, and block newly created sockets from
> attaching to that queue, but no user application can access the
> fill/complete/rx/tx queues. This results in that all applications need
> to explicitly clear the map entry from the old "zombie state"
> socket. This should be done automatically.
>
> In this patch, the sockets tracks, and have a reference to, which maps
> it resides in. When the socket is released, it will remove itself from
> all maps.
>
> Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Reviewed-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply

* Re: [PATCH] net/mlx4_core: Use refcount_t for refcount
From: Saeed Mahameed @ 2019-08-02 18:38 UTC (permalink / raw)
  To: hslester96@gmail.com
  Cc: linux-kernel@vger.kernel.org, davem@davemloft.net, Tariq Toukan,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CANhBUQ1chO0Q6wHJwbKMvp6LkD7qLBRw57xwf1QkBAKaewHs5w@mail.gmail.com>

On Sat, 2019-08-03 at 00:10 +0800, Chuhong Yuan wrote:
> Chuhong Yuan <hslester96@gmail.com> 于2019年8月2日周五 下午8:10写道：
> > refcount_t is better for reference counters since its
> > implementation can prevent overflows.
> > So convert atomic_t ref counters to refcount_t.
> > 
> > Also convert refcount from 0-based to 1-based.
> > 
> 
> It seems that directly converting refcount from 0-based
> to 1-based is infeasible.
> I am sorry for this mistake.

Just curious, why not keep it 0 based and use refcout_t ?

refcount API should have the same semantics as atomic_t API .. no ?

^ permalink raw reply

* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Jakub Kicinski @ 2019-08-02 18:39 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAEKGpzhsjMuf+DtN3pDVYMxJa5o2e=-3AeWbHFiFoMoXCkgsNg@mail.gmail.com>

On Fri, 2 Aug 2019 14:02:29 +0900, Daniel T. Lee wrote:
> On Fri, Aug 2, 2019 at 8:36 AM Jakub Kicinski  wrote:
> > On Thu,  1 Aug 2019 17:11:32 +0900, Daniel T. Lee wrote:  
> > > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > > interface. New type of enum 'net_attach_type' has been made, as stated at
> > > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> > >
> > > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> > >
> > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > ---
> > > Changes in v2:
> > >   - command 'load' changed to 'attach' for the consistency
> > >   - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
> > >
> > >  tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 106 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > > index 67e99c56bc88..f3b57660b303 100644
> > > --- a/tools/bpf/bpftool/net.c
> > > +++ b/tools/bpf/bpftool/net.c
> > > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> > >       __u32 flow_dissector_id;
> > >  };
> > >
> > > +enum net_attach_type {
> > > +     NET_ATTACH_TYPE_XDP,
> > > +     NET_ATTACH_TYPE_XDP_GENERIC,
> > > +     NET_ATTACH_TYPE_XDP_DRIVER,
> > > +     NET_ATTACH_TYPE_XDP_OFFLOAD,
> > > +     __MAX_NET_ATTACH_TYPE
> > > +};
> > > +
> > > +static const char * const attach_type_strings[] = {
> > > +     [NET_ATTACH_TYPE_XDP] = "xdp",
> > > +     [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> > > +     [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> > > +     [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> > > +     [__MAX_NET_ATTACH_TYPE] = NULL,  
> >
> > Not sure if the terminator is necessary,
> > ARRAY_SIZE(attach_type_strings) should suffice?  
> 
> Yes, ARRAY_SIZE is fine though. But I was just trying to make below
> 'parse_attach_type' consistent with 'parse_attach_type' from the 'prog.c'.
> At 'prog.c', It has same terminator at 'attach_type_strings'.
> 
> Should I change it or keep it?

Oh well, I guess there is some precedent for that :S

Quick grep for const char * const reveals we have around 7 non-NULL
terminated arrays, and 2 NULL terminated. Plus the NULL-terminated
don't align the '=' sign, while most do.

it's not a big deal, my preference is for not NULL terminating here,
and aligning '='.

> > > +     NEXT_ARG();
> > > +     if (!REQ_ARGS(1))
> > > +             return -EINVAL;  
> >
> > Error message needed here.
> >  
> 
> Actually it provides error message like:
> Error: 'xdp' needs at least 1 arguments, 0 found
> 
> are you suggesting that any additional error message is necessary?

Ah, sorry, I missed REQ_ARGS() there!

> > > +             return -EINVAL;
> > > +     }  
> >
> > Please require the dev keyword before the interface name.
> > That'll make it feel closer to prog load syntax.  
> 
> If adding the dev keyword before interface name, will it be too long to type in?

I think it's probably muscle memory for most. Plus we have excellent
bash completions.

> and also `bpftool prog` use extra keyword (such as dev) when it is
> optional keyword.
> 
>        bpftool prog dump jited  PROG [{ file FILE | opcodes | linum }]
>        bpftool prog pin   PROG FILE
>        bpftool prog { load | loadall } OBJ  PATH \
> 
> as you can see here, FILE uses optional keyword 'file' when the
> argument is optional.

Not sure I follow 🤔

>        bpftool prog { load | loadall } OBJ  PATH \
>                          [type TYPE] [dev NAME] \
>                          [map { idx IDX | name NAME } MAP]\
>                          [pinmaps MAP_DIR]
> 
> Yes, bpftool prog load has dev keyword with it,
> 
> but first, like previous, the argument is optional so i think it is
> unnecessary to use optional keyword 'dev'.

The keyword should not be optional if device name is specified.
Maybe lack of coffee on my side..

> and secondly, 'bpftool net attach' isn't really related to 'bpftool prog load'.
>
> At previous version patch, I was using word 'load' instead of
> 'attach', since XDP program is not
> considered as 'BPF_PROG_ATTACH', so it might give a confusion. However
> by the last patch discussion,
> word 'load' has been replaced to 'attach'.
> 
> Keeping the consistency is very important, but I was just wandering
> about making command
> similar to 'bpftool prog load' syntax.

In case of TC the device argument is optional. You may specify it, or
you can refer to TC blocks instead. So for that reason alone I think
it'll be much cleaner to require dev before the interface name.

> > > +     return 0;
> > > +}
> > > +
> > > +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> > > +                             int *ifindex)
> > > +{
> > > +     __u32 flags;
> > > +     int err;
> > > +
> > > +     flags = XDP_FLAGS_UPDATE_IF_NOEXIST;  
> >
> > Please add this as an option so that user can decide whether overwrite
> > is allowed or not.  
> 
> Adding force flag to bpftool seems necessary.
> I will add an optional argument for this.

Right, I was wondering if we want to call it force, though? force is
sort of a reuse of iproute2 concept. But it's kind of hard to come up
with names.

Just to be sure - I mean something like:

bpftool net attach xdp id xyz dev ethN noreplace

Rather than:

bpftool -f net attach xdp id xyz dev ethN

> > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > > +             flags |= XDP_FLAGS_SKB_MODE;
> > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > > +             flags |= XDP_FLAGS_DRV_MODE;
> > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > > +             flags |= XDP_FLAGS_HW_MODE;
> > > +
> > > +     err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> > > +
> > > +     return err;  
> >
> > no need for the err variable here.  
> 
> My apologies, but I'm not sure why err variable isn't needed at here.
> AFAIK, 'bpf_set_link_xdp_fd' from libbpf returns the netlink_recv result,
> and in order to propagate error, err variable is necessary, I guess?

	return bpf_set_link_xdp_fd(*ifindex, *progfd, flags);

Is what I meant.

> > > +}
> > > +
> > > +static int do_attach(int argc, char **argv)
> > > +{
> > > +     enum net_attach_type attach_type;
> > > +     int err, progfd, ifindex;
> > > +
> > > +     err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> > > +     if (err)
> > > +             return err;  
> >
> > Probably not the best idea to move this out into a helper.  
> 
> Again, just trying to make consistent with 'prog.c'.
> 
> But clearly it has differences with do_attach/detach from 'prog.c'.
> From it, it uses the same parse logic 'parse_attach_detach_args' since
> the two command 'bpftool prog attach/detach' uses the same argument format.
> 
> However, in here, 'bpftool net' attach and detach requires different number of
> argument, so function for parse argument has been defined separately.
> The situation is little bit different, but keeping argument parse logic as an
> helper, I think it's better in terms of consistency.

Well they won't share the same arguments if you add the keyword for
controlling IF_NOEXIST :(

> About the moving parse logic to a helper, I was trying to keep command
> entry (do_attach)
> as simple as possible. Parse all the argument in command entry will
> make function longer
> and might make harder to understand what it does.
> 
> And I'm not pretty sure that argument parse logic will stays the same
> after other attachment
> type comes in. What I mean is, the argument count or type might be
> added and to fulfill
> all that specific cases, the code might grow larger.
> 
> So for the consistency, simplicity and extensibility, I prefer to keep
> it as a helper.
> 
> > > +     if (is_prefix("xdp", attach_type_strings[attach_type]))
> > > +             err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);  
> >
> > Hm. We either need an error to be reported if it's not xdp or since we
> > only accept XDP now perhaps the if() is superfluous?  
> 
> Well, if the attach_type isn't xdp, the error will be occurred from
> the argument parse,
> Will it be necessary to reinforce with error logic to make it more secure?

Hm. it should already be fine, no? For non-xdp parse_attach_type() will
return __MAX_NET_ATTACH_TYPE, then parsing returns EINVAL and we exit.
Not sure I follow.

^ permalink raw reply

* Re: [PATCH 06/34] drm/i915: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-08-02 18:48 UTC (permalink / raw)
  To: Joonas Lahtinen, Andrew Morton, john.hubbard
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, Jani Nikula, Rodrigo Vivi, David Airlie
In-Reply-To: <156473756254.19842.12384378926183716632@jlahtine-desk.ger.corp.intel.com>

On 8/2/19 2:19 AM, Joonas Lahtinen wrote:
> Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> For pages that were retained via get_user_pages*(), release those pages
>> via the new put_user_page*() routines, instead of via put_page() or
>> release_pages().
>>
>> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
>> ("mm: introduce put_user_page*(), placeholder versions").
>>
>> Note that this effectively changes the code's behavior in
>> i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(),
>> instead of set_page_dirty(). This is probably more accurate.
> 
> We've already fixed this in drm-tip where the current code uses
> set_page_dirty_lock().
> 
> This would conflict with our tree. Rodrigo is handling
> drm-intel-next for 5.4, so you guys want to coordinate how
> to merge.
> 

Hi Joonas, Rodrigo,

First of all, I apologize for the API breakage: put_user_pages_dirty_lock()
has an additional "dirty" parameter.

In order to deal with the merge problem, I'll drop this patch from my series,
and I'd recommend that the drm-intel-next take the following approach:

1) For now, s/put_page/put_user_page/ in i915_gem_userptr_put_pages(),
and fix up the set_page_dirty() --> set_page_dirty_lock() issue, like this
(based against linux.git):

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..94721cc0093b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -664,10 +664,10 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,

         for_each_sgt_page(page, sgt_iter, pages) {
                 if (obj->mm.dirty)
-                       set_page_dirty(page);
+                       set_page_dirty_lock(page);

                 mark_page_accessed(page);
-               put_page(page);
+               put_user_page(page);
         }
         obj->mm.dirty = false;


That will leave you with your original set_page_dirty_lock() calls
and everything works properly.

2) Next cycle, move to the new put_user_pages_dirty_lock().

thanks,
-- 
John Hubbard
NVIDIA


> Regards, Joonas
> 

^ permalink raw reply related

* Re: [PATCH 16/34] drivers/tee: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-08-02 18:51 UTC (permalink / raw)
  To: Jens Wiklander, john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, Linux ARM, linux-block,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel
In-Reply-To: <CAHUa44G++iiwU62jj7QH=V3sr4z26sf007xrwWLPw6AAeMLAEw@mail.gmail.com>

On 8/1/19 11:29 PM, Jens Wiklander wrote:
> On Fri, Aug 2, 2019 at 4:20 AM <john.hubbard@gmail.com> wrote:
>>
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> For pages that were retained via get_user_pages*(), release those pages
>> via the new put_user_page*() routines, instead of via put_page() or
>> release_pages().
>>
>> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
>> ("mm: introduce put_user_page*(), placeholder versions").
>>
>> Cc: Jens Wiklander <jens.wiklander@linaro.org>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>   drivers/tee/tee_shm.c | 10 ++--------
>>   1 file changed, 2 insertions(+), 8 deletions(-)
> 
> Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
> 
> I suppose you're taking this via your own tree or such.
> 

Hi Jens,

Thanks for the ACK! I'm expecting that Andrew will take this through his
-mm tree, unless he pops up and says otherwise.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Bernd @ 2019-08-02 19:02 UTC (permalink / raw)
  To: netdev

Hello,

While analyzing a aborted upload packet capture I came across a odd
trace where a sender was not responding to a duplicate SACK but
sending further segments until it stalled.

Took me some time until I remembered this fix, and actually the
problems started since the security fix was applied.

I see a high counter for TCPWqueueTooBig - and I don’t think that’s an
actual attack.

Is there a probability for triggering the limit with connections with
big windows and large send buffers and dropped segments? If so what
would be the plan? It does not look like it is configurable. The trace
seem to have 100 (filled) inflight segments.

Gruss
Bernd
-- 
http://bernd.eckenfels.net

^ permalink raw reply

* Re: [PATCH V2] mlx5: Fix formats with line continuation whitespace
From: Doug Ledford @ 2019-08-02 19:08 UTC (permalink / raw)
  To: Saeed Mahameed, joe@perches.com, leon@kernel.org
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <910f77ed7f2923206adc8927204c6d759ec18d20.camel@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]

On Fri, 2019-08-02 at 18:32 +0000, Saeed Mahameed wrote:
> On Fri, 2019-08-02 at 11:09 -0700, Joe Perches wrote:
> > On Tue, 2018-11-06 at 16:34 -0500, Doug Ledford wrote:
> > > On Thu, 2018-11-01 at 09:34 +0200, Leon Romanovsky wrote:
> > > > On Thu, Nov 01, 2018 at 12:24:08AM -0700, Joe Perches wrote:
> > > > > The line continuations unintentionally add whitespace so
> > > > > instead use coalesced formats to remove the whitespace.
> > > > > 
> > > > > Signed-off-by: Joe Perches <joe@perches.com>
> > > > > ---
> > > > > 
> > > > > v2: Remove excess space after %u
> > > > > 
> > > > >  drivers/net/ethernet/mellanox/mlx5/core/rl.c | 6 ++----
> > > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > 
> > > > 
> > > > Thanks,
> > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > 
> > > Applied, thanks.
> > 
> > Still not upstream.  How long does it take?
> > 
> 
> Doug, Leon, this patch still apply, let me know what happened here ?
> and if you want me to apply it to one of my branches.

I'm not entirely sure what happened here.  Obviously I said I had taken
it, which I don't do under my normal workflow until I've actually
applied and build tested the patch.  For it to not make it into the tree
means that I probably applied it to my wip/dl-for-next branch, but prior
to moving it to for-next, I might have had a rebase and it got lost in
the shuffle or something like that.  My apologies for letting it slip
through the cracks.  Anyway, I pulled the patch from patchworks, applied
it, and pushed it to k.o.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Neal Cardwell @ 2019-08-02 19:14 UTC (permalink / raw)
  To: Bernd; +Cc: netdev, Eric Dumazet
In-Reply-To: <CABOR3+yUiu1BzCojFQFADUKc5BT2-Ew_j7KFNpjP8WoMYZ+SMA@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:03 PM Bernd <ecki@zusammenkunft.net> wrote:
>
> Hello,
>
> While analyzing a aborted upload packet capture I came across a odd
> trace where a sender was not responding to a duplicate SACK but
> sending further segments until it stalled.
>
> Took me some time until I remembered this fix, and actually the
> problems started since the security fix was applied.
>
> I see a high counter for TCPWqueueTooBig - and I don’t think that’s an
> actual attack.
>
> Is there a probability for triggering the limit with connections with
> big windows and large send buffers and dropped segments? If so what
> would be the plan? It does not look like it is configurable. The trace
> seem to have 100 (filled) inflight segments.
>
> Gruss
> Bernd
> --
> http://bernd.eckenfels.net

What's the exact kernel version you are using?

Eric submitted a patch recently that may address your issue:
   tcp: be more careful in tcp_fragment()
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=b617158dc096709d8600c53b6052144d12b89fab

Would you be able to test your workload with that commit
cherry-picked, and see if the issue still occurs?

That commit was targeted to many stable releases, so you may be able
to pick up that fix from a stable branch.

cheers,
neal

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox