Netdev List
 help / color / mirror / Atom feed
* [PATCH net 1/2] l2tp: ensure sessions are freed after their PPPOL2TP socket
From: Guillaume Nault @ 2017-09-22 13:39 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman, Tom Parkin, Sabrina Dubroca
In-Reply-To: <cover.1506086081.git.g.nault@alphalink.fr>

If l2tp_tunnel_delete() or l2tp_tunnel_closeall() deletes a session
right after pppol2tp_release() orphaned its socket, then the 'sock'
variable of the pppol2tp_session_close() callback is NULL. Yet the
session is still used by pppol2tp_release().

Therefore we need to take an extra reference in any case, to prevent
l2tp_tunnel_delete() or l2tp_tunnel_closeall() from freeing the session.

Since the pppol2tp_session_close() callback is only set if the session
is associated to a PPPOL2TP socket and that both l2tp_tunnel_delete()
and l2tp_tunnel_closeall() hold the PPPOL2TP socket before calling
pppol2tp_session_close(), we're sure that pppol2tp_session_close() and
pppol2tp_session_destruct() are paired and called in the right order.
So the reference taken by the former will be released by the later.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_ppp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 50e3ee9a9d61..bc6e8bfc5be4 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -437,11 +437,11 @@ static void pppol2tp_session_close(struct l2tp_session *session)
 
 	BUG_ON(session->magic != L2TP_SESSION_MAGIC);
 
-	if (sock) {
+	if (sock)
 		inet_shutdown(sock, SEND_SHUTDOWN);
-		/* Don't let the session go away before our socket does */
-		l2tp_session_inc_refcount(session);
-	}
+
+	/* Don't let the session go away before our socket does */
+	l2tp_session_inc_refcount(session);
 }
 
 /* Really kill the session socket. (Called from sock_put() if
-- 
2.14.1

^ permalink raw reply related

* [PATCH net 2/2] l2tp: fix race between l2tp_session_delete() and l2tp_tunnel_closeall()
From: Guillaume Nault @ 2017-09-22 13:39 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman, Tom Parkin, Sabrina Dubroca
In-Reply-To: <cover.1506086081.git.g.nault@alphalink.fr>

There are several ways to remove L2TP sessions:

  * deleting a session explicitly using the netlink interface (with
    L2TP_CMD_SESSION_DELETE),
  * deleting the session's parent tunnel (either by closing the
    tunnel's file descriptor or using the netlink interface),
  * closing the PPPOL2TP file descriptor of a PPP pseudo-wire.

In some cases, when these methods are used concurrently on the same
session, the session can be removed twice, leading to use-after-free
bugs.

This patch adds a 'dead' flag, used by l2tp_session_delete() and
l2tp_tunnel_closeall() to prevent them from stepping on each other's
toes.

The session deletion path used when closing a PPPOL2TP file descriptor
doesn't need to be adapted. It already has to ensure that a session
remains valid for the lifetime of its PPPOL2TP file descriptor.
So it takes an extra reference on the session in the ->session_close()
callback (pppol2tp_session_close()), which is eventually dropped
in the ->sk_destruct() callback of the PPPOL2TP socket
(pppol2tp_session_destruct()).
Still, __l2tp_session_unhash() and l2tp_session_queue_purge() can be
called twice and even concurrently for a given session, but thanks to
proper locking and re-initialisation of list fields, this is not an
issue.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 6 ++++++
 net/l2tp/l2tp_core.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index ee485df73ccd..d8c2a89a76e1 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1314,6 +1314,9 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 
 			hlist_del_init(&session->hlist);
 
+			if (test_and_set_bit(0, &session->dead))
+				goto again;
+
 			if (session->ref != NULL)
 				(*session->ref)(session);
 
@@ -1750,6 +1753,9 @@ EXPORT_SYMBOL_GPL(__l2tp_session_unhash);
  */
 int l2tp_session_delete(struct l2tp_session *session)
 {
+	if (test_and_set_bit(0, &session->dead))
+		return 0;
+
 	if (session->ref)
 		(*session->ref)(session);
 	__l2tp_session_unhash(session);
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index a305e0c5925a..70a12df40a5f 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -76,6 +76,7 @@ struct l2tp_session_cfg {
 struct l2tp_session {
 	int			magic;		/* should be
 						 * L2TP_SESSION_MAGIC */
+	long			dead;
 
 	struct l2tp_tunnel	*tunnel;	/* back pointer to tunnel
 						 * context */
-- 
2.14.1

^ permalink raw reply related

* [PATCH][V2] e1000: avoid null pointer dereference on invalid stat type
From: Colin King @ 2017-09-22 13:41 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously.  Fix this by nullify pointer p if a stat type is
invalid and only setting data if p is not null.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index ec8aa4562cc9..724c93a6ea92 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	int i;
-	char *p = NULL;
+	char *p;
 	const struct e1000_stats *stat = e1000_gstrings_stats;
 
 	e1000_update_stats(adapter);
@@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 			p = (char *)adapter + stat->stat_offset;
 			break;
 		default:
+			p = NULL;
 			WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
 				  stat->type, i);
 			break;
 		}
 
-		if (stat->sizeof_stat == sizeof(u64))
-			data[i] = *(u64 *)p;
-		else
-			data[i] = *(u32 *)p;
-
+		if (p) {
+			if (stat->sizeof_stat == sizeof(u64))
+				data[i] = *(u64 *)p;
+			else
+				data[i] = *(u32 *)p;
+		}
 		stat++;
 	}
 /* BUG_ON(i != E1000_STATS_LEN); */
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Edward Cree @ 2017-09-22 13:46 UTC (permalink / raw)
  To: Y Song; +Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, netdev
In-Reply-To: <CAH3MdRVK-=pL8YCz44BPCmHhuZE_CRfih0QdZPLa=mpE_uoA_Q@mail.gmail.com>

On 22/09/17 00:11, Y Song wrote:
> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ecree@solarflare.com> wrote:
>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>>> More intuitive, but agree on the from_be/le. Maybe we should
>>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>>> it's not surrounded by braces, it's also not a cast but rather an op.
>> That works for me.
>>> 'be16 r4' is ambiguous regarding upper bits.
>>>
>>> what about my earlier suggestion:
>>> r4 = (be16) (u16) r4
>>> r4 = (le64) (u64) r4
>>>
>>> It will be pretty clear what instruction is doing (that upper bits become zero).
>> Trouble with that is that's very *not* what C will do with those casts
>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>  but that's quite an ugly mongrel.
>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>  cleanest and clearest.  Should it be
>>     r4 = be16 r4
>>  or just
>>     be16 r4
>> ?  Personally I incline towards the latter, but admit it doesn't really
>>  match the syntax of other opcodes.
> I did some quick prototyping in llvm to make sure we have a syntax
> llvm is happy. Apparently, llvm does not like the syntax
>    r4 = be16 r4    or    r4 = (be16) (u16) r4.
>
> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>
>     // Verify that any operand is only mentioned once.
Wait, how do you deal with (totally legal) r4 += r4?
Or r4 = *(r4 +0)?
Even jumps can have src_reg == dst_reg, though it doesn't seem useful.

^ permalink raw reply

* [PATCH v4 1/3] ipv4: Namespaceify tcp_fastopen knob
From: Haishuang Yan @ 2017-09-22 13:48 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Eric Dumazet, Wei Wang,
	Luca BRUNO
  Cc: netdev, linux-kernel, Haishuang Yan

Different namespace application might require enable TCP Fast Open
feature independently of the host.

This patch series continues making more of the TCP Fast Open related
sysctl knobs be per net-namespace.

Reported-by: Luca BRUNO <lucab@debian.org>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Change since v4:
  * Fix potential leak issue
  * Fix typo error
  * Split the patches to be per-sysctl
  * Remove unrelated change by mistake
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  1 -
 net/ipv4/af_inet.c         |  7 ++++---
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
 net/ipv4/tcp.c             |  4 ++--
 net/ipv4/tcp_fastopen.c    | 11 +++++------
 net/ipv4/tcp_ipv4.c        |  2 ++
 7 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..ce6dde0 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_fastopen;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..f628967 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -240,7 +240,6 @@
 
 
 /* sysctl variables for tcp */
-extern int sysctl_tcp_fastopen;
 extern int sysctl_tcp_retrans_collapse;
 extern int sysctl_tcp_stdurg;
 extern int sysctl_tcp_rfc1337;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e31108e..ddd126d 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -195,7 +195,7 @@ int inet_listen(struct socket *sock, int backlog)
 {
 	struct sock *sk = sock->sk;
 	unsigned char old_state;
-	int err;
+	int err, tcp_fastopen;
 
 	lock_sock(sk);
 
@@ -217,8 +217,9 @@ int inet_listen(struct socket *sock, int backlog)
 		 * because the socket was in TCP_LISTEN state previously but
 		 * was shutdown() rather than close().
 		 */
-		if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
-		    (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
+		tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
+		if ((tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
+		    (tcp_fastopen & TFO_SERVER_ENABLE) &&
 		    !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {
 			fastopen_queue_tune(sk, backlog);
 			tcp_fastopen_init_key_once(true);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..e31e853c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -401,13 +401,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_fastopen",
-		.data		= &sysctl_tcp_fastopen,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
 		.procname	= "tcp_fastopen_key",
 		.mode		= 0600,
 		.maxlen		= ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_fastopen",
+		.data		= &init_net.ipv4.sysctl_tcp_fastopen,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..dac56c4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1126,7 +1126,7 @@ static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 	struct sockaddr *uaddr = msg->msg_name;
 	int err, flags;
 
-	if (!(sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
+	if (!(sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) ||
 	    (uaddr && msg->msg_namelen >= sizeof(uaddr->sa_family) &&
 	     uaddr->sa_family == AF_UNSPEC))
 		return -EOPNOTSUPP;
@@ -2759,7 +2759,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_FASTOPEN_CONNECT:
 		if (val > 1 || val < 0) {
 			err = -EINVAL;
-		} else if (sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
+		} else if (net->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_ENABLE) {
 			if (sk->sk_state == TCP_CLOSE)
 				tp->fastopen_connect = val;
 			else
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index e3c3322..31b08ec 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -9,8 +9,6 @@
 #include <net/inetpeer.h>
 #include <net/tcp.h>
 
-int sysctl_tcp_fastopen __read_mostly = TFO_CLIENT_ENABLE;
-
 struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 
 static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock);
@@ -279,21 +277,22 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
 			      struct request_sock *req,
 			      struct tcp_fastopen_cookie *foc)
 {
-	struct tcp_fastopen_cookie valid_foc = { .len = -1 };
 	bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1;
+	int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
+	struct tcp_fastopen_cookie valid_foc = { .len = -1 };
 	struct sock *child;
 
 	if (foc->len == 0) /* Client requests a cookie */
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENCOOKIEREQD);
 
-	if (!((sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
+	if (!((tcp_fastopen & TFO_SERVER_ENABLE) &&
 	      (syn_data || foc->len >= 0) &&
 	      tcp_fastopen_queue_check(sk))) {
 		foc->len = -1;
 		return NULL;
 	}
 
-	if (syn_data && (sysctl_tcp_fastopen & TFO_SERVER_COOKIE_NOT_REQD))
+	if (syn_data && (tcp_fastopen & TFO_SERVER_COOKIE_NOT_REQD))
 		goto fastopen;
 
 	if (foc->len >= 0 &&  /* Client presents or requests a cookie */
@@ -347,7 +346,7 @@ bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
 		return false;
 	}
 
-	if (sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) {
+	if (sock_net(sk)->ipv4.sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) {
 		cookie->len = -1;
 		return true;
 	}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d9416b5..88409b1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2472,6 +2472,8 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;
 
+	net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE;
+
 	return 0;
 fail:
 	tcp_sk_exit(net);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 2/3] ipv4: Namespaceify tcp_fastopen_key knob
From: Haishuang Yan @ 2017-09-22 13:48 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Eric Dumazet, Wei Wang,
	Luca BRUNO
  Cc: netdev, linux-kernel, Haishuang Yan
In-Reply-To: <1506088124-12650-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

Different namespace application might require different tcp_fastopen_key
independently of the host.

David Miller pointed out there is a leak without releasing the context
of tcp_fastopen_key during netns teardown. So add the release action in
exit_batch path.

Tested:
1. Container namespace:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
2817fff2-f803cf97-eadfd1f3-78c0992b

cookie key in tcp syn packets:
Fast Open Cookie
    Kind: TCP Fast Open Cookie (34)
    Length: 10
    Fast Open Cookie: 1e5dd82a8c492ca9

2. Host:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
107d7c5f-68eb2ac7-02fb06e6-ed341702

cookie key in tcp syn packets:
Fast Open Cookie
    Kind: TCP Fast Open Cookie (34)
    Length: 10
    Fast Open Cookie: e213c02bf0afbc8a

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/netns/ipv4.h   |  4 +++
 include/net/tcp.h          |  6 ++---
 net/ipv4/af_inet.c         |  2 +-
 net/ipv4/sysctl_net_ipv4.c | 26 +++++++++----------
 net/ipv4/tcp.c             |  2 +-
 net/ipv4/tcp_fastopen.c    | 64 +++++++++++++++++++++++++++++++---------------
 net/ipv4/tcp_ipv4.c        |  6 +++++
 7 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index ce6dde0..66b8335 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -36,6 +36,8 @@ struct inet_timewait_death_row {
 	int			sysctl_max_tw_buckets;
 };
 
+struct tcp_fastopen_context;
+
 struct netns_ipv4 {
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*forw_hdr;
@@ -128,6 +130,8 @@ struct netns_ipv4 {
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
 	int sysctl_tcp_fastopen;
+	struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
+	spinlock_t tcp_fastopen_ctx_lock;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f628967..e27bd18 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1556,13 +1556,13 @@ struct tcp_fastopen_request {
 };
 void tcp_free_fastopen_req(struct tcp_sock *tp);
 
-extern struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
-int tcp_fastopen_reset_cipher(void *key, unsigned int len);
+void tcp_fastopen_ctx_destroy(struct net *net);
+int tcp_fastopen_reset_cipher(struct net *net, void *key, unsigned int len);
 void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb);
 struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
 			      struct request_sock *req,
 			      struct tcp_fastopen_cookie *foc);
-void tcp_fastopen_init_key_once(bool publish);
+void tcp_fastopen_init_key_once(struct net *net);
 bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss,
 			     struct tcp_fastopen_cookie *cookie);
 bool tcp_fastopen_defer_connect(struct sock *sk, int *err);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ddd126d..43a1bbe 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -222,7 +222,7 @@ int inet_listen(struct socket *sock, int backlog)
 		    (tcp_fastopen & TFO_SERVER_ENABLE) &&
 		    !inet_csk(sk)->icsk_accept_queue.fastopenq.max_qlen) {
 			fastopen_queue_tune(sk, backlog);
-			tcp_fastopen_init_key_once(true);
+			tcp_fastopen_init_key_once(sock_net(sk));
 		}
 
 		err = inet_csk_listen_start(sk, backlog);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e31e853c..20e19fe 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -251,10 +251,12 @@ static int proc_allowed_congestion_control(struct ctl_table *ctl,
 	return ret;
 }
 
-static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
+static int proc_tcp_fastopen_key(struct ctl_table *table, int write,
 				 void __user *buffer, size_t *lenp,
 				 loff_t *ppos)
 {
+	struct net *net = container_of(table->data, struct net,
+	    ipv4.sysctl_tcp_fastopen);
 	struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) };
 	struct tcp_fastopen_context *ctxt;
 	int ret;
@@ -265,7 +267,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 		return -ENOMEM;
 
 	rcu_read_lock();
-	ctxt = rcu_dereference(tcp_fastopen_ctx);
+	ctxt = rcu_dereference(net->ipv4.tcp_fastopen_ctx);
 	if (ctxt)
 		memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH);
 	else
@@ -282,12 +284,7 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 			ret = -EINVAL;
 			goto bad_key;
 		}
-		/* Generate a dummy secret but don't publish it. This
-		 * is needed so we don't regenerate a new key on the
-		 * first invocation of tcp_fastopen_cookie_gen
-		 */
-		tcp_fastopen_init_key_once(false);
-		tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH);
+		tcp_fastopen_reset_cipher(net, user_key, TCP_FASTOPEN_KEY_LENGTH);
 	}
 
 bad_key:
@@ -401,12 +398,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_fastopen_key",
-		.mode		= 0600,
-		.maxlen		= ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
-		.proc_handler	= proc_tcp_fastopen_key,
-	},
-	{
 		.procname	= "tcp_fastopen_blackhole_timeout_sec",
 		.data		= &sysctl_tcp_fastopen_blackhole_timeout,
 		.maxlen		= sizeof(int),
@@ -1085,6 +1076,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "tcp_fastopen_key",
+		.mode		= 0600,
+		.data		= &init_net.ipv4.sysctl_tcp_fastopen,
+		.maxlen		= ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
+		.proc_handler	= proc_tcp_fastopen_key,
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dac56c4..23225c9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2749,7 +2749,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_FASTOPEN:
 		if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
 		    TCPF_LISTEN))) {
-			tcp_fastopen_init_key_once(true);
+			tcp_fastopen_init_key_once(net);
 
 			fastopen_queue_tune(sk, val);
 		} else {
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 31b08ec..4eae44a 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -9,13 +9,18 @@
 #include <net/inetpeer.h>
 #include <net/tcp.h>
 
-struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
-
-static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock);
-
-void tcp_fastopen_init_key_once(bool publish)
+void tcp_fastopen_init_key_once(struct net *net)
 {
-	static u8 key[TCP_FASTOPEN_KEY_LENGTH];
+	u8 key[TCP_FASTOPEN_KEY_LENGTH];
+	struct tcp_fastopen_context *ctxt;
+
+	rcu_read_lock();
+	ctxt = rcu_dereference(net->ipv4.tcp_fastopen_ctx);
+	if (ctxt) {
+		rcu_read_unlock();
+		return;
+	}
+	rcu_read_unlock();
 
 	/* tcp_fastopen_reset_cipher publishes the new context
 	 * atomically, so we allow this race happening here.
@@ -23,8 +28,8 @@ void tcp_fastopen_init_key_once(bool publish)
 	 * All call sites of tcp_fastopen_cookie_gen also check
 	 * for a valid cookie, so this is an acceptable risk.
 	 */
-	if (net_get_random_once(key, sizeof(key)) && publish)
-		tcp_fastopen_reset_cipher(key, sizeof(key));
+	get_random_bytes(key, sizeof(key));
+	tcp_fastopen_reset_cipher(net, key, sizeof(key));
 }
 
 static void tcp_fastopen_ctx_free(struct rcu_head *head)
@@ -35,7 +40,22 @@ static void tcp_fastopen_ctx_free(struct rcu_head *head)
 	kfree(ctx);
 }
 
-int tcp_fastopen_reset_cipher(void *key, unsigned int len)
+void tcp_fastopen_ctx_destroy(struct net *net)
+{
+	struct tcp_fastopen_context *ctxt;
+
+	spin_lock(&net->ipv4.tcp_fastopen_ctx_lock);
+
+	ctxt = rcu_dereference_protected(net->ipv4.tcp_fastopen_ctx,
+				lockdep_is_held(&net->ipv4.tcp_fastopen_ctx_lock));
+	rcu_assign_pointer(net->ipv4.tcp_fastopen_ctx, NULL);
+	spin_unlock(&net->ipv4.tcp_fastopen_ctx_lock);
+
+	if (ctxt)
+		call_rcu(&ctxt->rcu, tcp_fastopen_ctx_free);
+}
+
+int tcp_fastopen_reset_cipher(struct net *net, void *key, unsigned int len)
 {
 	int err;
 	struct tcp_fastopen_context *ctx, *octx;
@@ -59,26 +79,27 @@ int tcp_fastopen_reset_cipher(void *key, unsigned int len)
 	}
 	memcpy(ctx->key, key, len);
 
-	spin_lock(&tcp_fastopen_ctx_lock);
+	spin_lock(&net->ipv4.tcp_fastopen_ctx_lock);
 
-	octx = rcu_dereference_protected(tcp_fastopen_ctx,
-				lockdep_is_held(&tcp_fastopen_ctx_lock));
-	rcu_assign_pointer(tcp_fastopen_ctx, ctx);
-	spin_unlock(&tcp_fastopen_ctx_lock);
+	octx = rcu_dereference_protected(net->ipv4.tcp_fastopen_ctx,
+				lockdep_is_held(&net->ipv4.tcp_fastopen_ctx_lock));
+	rcu_assign_pointer(net->ipv4.tcp_fastopen_ctx, ctx);
+	spin_unlock(&net->ipv4.tcp_fastopen_ctx_lock);
 
 	if (octx)
 		call_rcu(&octx->rcu, tcp_fastopen_ctx_free);
 	return err;
 }
 
-static bool __tcp_fastopen_cookie_gen(const void *path,
+static bool __tcp_fastopen_cookie_gen(struct net *net,
+				      const void *path,
 				      struct tcp_fastopen_cookie *foc)
 {
 	struct tcp_fastopen_context *ctx;
 	bool ok = false;
 
 	rcu_read_lock();
-	ctx = rcu_dereference(tcp_fastopen_ctx);
+	ctx = rcu_dereference(net->ipv4.tcp_fastopen_ctx);
 	if (ctx) {
 		crypto_cipher_encrypt_one(ctx->tfm, foc->val, path);
 		foc->len = TCP_FASTOPEN_COOKIE_SIZE;
@@ -94,7 +115,8 @@ static bool __tcp_fastopen_cookie_gen(const void *path,
  *
  * XXX (TFO) - refactor when TCP_FASTOPEN_COOKIE_SIZE != AES_BLOCK_SIZE.
  */
-static bool tcp_fastopen_cookie_gen(struct request_sock *req,
+static bool tcp_fastopen_cookie_gen(struct net *net,
+				    struct request_sock *req,
 				    struct sk_buff *syn,
 				    struct tcp_fastopen_cookie *foc)
 {
@@ -102,7 +124,7 @@ static bool tcp_fastopen_cookie_gen(struct request_sock *req,
 		const struct iphdr *iph = ip_hdr(syn);
 
 		__be32 path[4] = { iph->saddr, iph->daddr, 0, 0 };
-		return __tcp_fastopen_cookie_gen(path, foc);
+		return __tcp_fastopen_cookie_gen(net, path, foc);
 	}
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -110,13 +132,13 @@ static bool tcp_fastopen_cookie_gen(struct request_sock *req,
 		const struct ipv6hdr *ip6h = ipv6_hdr(syn);
 		struct tcp_fastopen_cookie tmp;
 
-		if (__tcp_fastopen_cookie_gen(&ip6h->saddr, &tmp)) {
+		if (__tcp_fastopen_cookie_gen(net, &ip6h->saddr, &tmp)) {
 			struct in6_addr *buf = &tmp.addr;
 			int i;
 
 			for (i = 0; i < 4; i++)
 				buf->s6_addr32[i] ^= ip6h->daddr.s6_addr32[i];
-			return __tcp_fastopen_cookie_gen(buf, foc);
+			return __tcp_fastopen_cookie_gen(net, buf, foc);
 		}
 	}
 #endif
@@ -296,7 +318,7 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
 		goto fastopen;
 
 	if (foc->len >= 0 &&  /* Client presents or requests a cookie */
-	    tcp_fastopen_cookie_gen(req, skb, &valid_foc) &&
+	    tcp_fastopen_cookie_gen(sock_net(sk), req, skb, &valid_foc) &&
 	    foc->len == TCP_FASTOPEN_COOKIE_SIZE &&
 	    foc->len == valid_foc.len &&
 	    !memcmp(foc->val, valid_foc.val, foc->len)) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 88409b1..49c74c0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2473,6 +2473,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_timestamps = 1;
 
 	net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE;
+	spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock);
 
 	return 0;
 fail:
@@ -2483,7 +2484,12 @@ static int __net_init tcp_sk_init(struct net *net)
 
 static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list)
 {
+	struct net *net;
+
 	inet_twsk_purge(&tcp_hashinfo, AF_INET);
+
+	list_for_each_entry(net, net_exit_list, exit_list)
+		tcp_fastopen_ctx_destroy(net);
 }
 
 static struct pernet_operations __net_initdata tcp_sk_ops = {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 3/3] ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob
From: Haishuang Yan @ 2017-09-22 13:48 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Eric Dumazet, Wei Wang,
	Luca BRUNO
  Cc: netdev, linux-kernel, Haishuang Yan
In-Reply-To: <1506088124-12650-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

Different namespace application might require different time period in
second to disable Fastopen on active TCP sockets.

Tested:
Simulate following similar situation that the server's data gets dropped
after 3WHS.
C ---- syn-data ---> S
C <--- syn/ack ----- S
C ---- ack --------> S
S (accept & write)
C?  X <- data ------ S
	[retry and timeout]

And then print netstat of TCPFastOpenBlackhole, the counter increased as
expected when the firewall blackhole issue is detected and active TFO is
disabled.
# cat /proc/net/netstat | awk '{print $91}'
TCPFastOpenBlackhole
1

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/netns/ipv4.h   |  3 +++
 net/ipv4/sysctl_net_ipv4.c | 20 +++++++++++---------
 net/ipv4/tcp_fastopen.c    | 30 +++++++++++-------------------
 net/ipv4/tcp_ipv4.c        |  2 ++
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 66b8335..d76edde 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -132,6 +132,9 @@ struct netns_ipv4 {
 	int sysctl_tcp_fastopen;
 	struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 	spinlock_t tcp_fastopen_ctx_lock;
+	unsigned int sysctl_tcp_fastopen_blackhole_timeout;
+	atomic_t tfo_active_disable_times;
+	unsigned long tfo_active_disable_stamp;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 20e19fe..cac8dd3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -355,11 +355,13 @@ static int proc_tfo_blackhole_detect_timeout(struct ctl_table *table,
 					     void __user *buffer,
 					     size_t *lenp, loff_t *ppos)
 {
+	struct net *net = container_of(table->data, struct net,
+	    ipv4.sysctl_tcp_fastopen_blackhole_timeout);
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (write && ret == 0)
-		tcp_fastopen_active_timeout_reset();
+		atomic_set(&net->ipv4.tfo_active_disable_times, 0);
 
 	return ret;
 }
@@ -398,14 +400,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_fastopen_blackhole_timeout_sec",
-		.data		= &sysctl_tcp_fastopen_blackhole_timeout,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_tfo_blackhole_detect_timeout,
-		.extra1		= &zero,
-	},
-	{
 		.procname	= "tcp_abort_on_overflow",
 		.data		= &sysctl_tcp_abort_on_overflow,
 		.maxlen		= sizeof(int),
@@ -1083,6 +1077,14 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.maxlen		= ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
 		.proc_handler	= proc_tcp_fastopen_key,
 	},
+	{
+		.procname	= "tcp_fastopen_blackhole_timeout_sec",
+		.data		= &init_net.ipv4.sysctl_tcp_fastopen_blackhole_timeout,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_tfo_blackhole_detect_timeout,
+		.extra1		= &zero,
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4eae44a..de470e7 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -422,25 +422,16 @@ bool tcp_fastopen_defer_connect(struct sock *sk, int *err)
  * TFO connection with data exchanges.
  */
 
-/* Default to 1hr */
-unsigned int sysctl_tcp_fastopen_blackhole_timeout __read_mostly = 60 * 60;
-static atomic_t tfo_active_disable_times __read_mostly = ATOMIC_INIT(0);
-static unsigned long tfo_active_disable_stamp __read_mostly;
-
 /* Disable active TFO and record current jiffies and
  * tfo_active_disable_times
  */
 void tcp_fastopen_active_disable(struct sock *sk)
 {
-	atomic_inc(&tfo_active_disable_times);
-	tfo_active_disable_stamp = jiffies;
-	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENBLACKHOLE);
-}
+	struct net *net = sock_net(sk);
 
-/* Reset tfo_active_disable_times to 0 */
-void tcp_fastopen_active_timeout_reset(void)
-{
-	atomic_set(&tfo_active_disable_times, 0);
+	atomic_inc(&net->ipv4.tfo_active_disable_times);
+	net->ipv4.tfo_active_disable_stamp = jiffies;
+	NET_INC_STATS(net, LINUX_MIB_TCPFASTOPENBLACKHOLE);
 }
 
 /* Calculate timeout for tfo active disable
@@ -449,17 +440,18 @@ void tcp_fastopen_active_timeout_reset(void)
  */
 bool tcp_fastopen_active_should_disable(struct sock *sk)
 {
-	int tfo_da_times = atomic_read(&tfo_active_disable_times);
-	int multiplier;
+	unsigned int tfo_bh_timeout = sock_net(sk)->ipv4.sysctl_tcp_fastopen_blackhole_timeout;
+	int tfo_da_times = atomic_read(&sock_net(sk)->ipv4.tfo_active_disable_times);
 	unsigned long timeout;
+	int multiplier;
 
 	if (!tfo_da_times)
 		return false;
 
 	/* Limit timout to max: 2^6 * initial timeout */
 	multiplier = 1 << min(tfo_da_times - 1, 6);
-	timeout = multiplier * sysctl_tcp_fastopen_blackhole_timeout * HZ;
-	if (time_before(jiffies, tfo_active_disable_stamp + timeout))
+	timeout = multiplier * tfo_bh_timeout * HZ;
+	if (time_before(jiffies, sock_net(sk)->ipv4.tfo_active_disable_stamp + timeout))
 		return true;
 
 	/* Mark check bit so we can check for successful active TFO
@@ -495,10 +487,10 @@ void tcp_fastopen_active_disable_ofo_check(struct sock *sk)
 			}
 		}
 	} else if (tp->syn_fastopen_ch &&
-		   atomic_read(&tfo_active_disable_times)) {
+		   atomic_read(&sock_net(sk)->ipv4.tfo_active_disable_times)) {
 		dst = sk_dst_get(sk);
 		if (!(dst && dst->dev && (dst->dev->flags & IFF_LOOPBACK)))
-			tcp_fastopen_active_timeout_reset();
+			atomic_set(&sock_net(sk)->ipv4.tfo_active_disable_times, 0);
 		dst_release(dst);
 	}
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 49c74c0..ad3b5bb 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2474,6 +2474,8 @@ static int __net_init tcp_sk_init(struct net *net)
 
 	net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE;
 	spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock);
+	net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 60 * 60;
+	atomic_set(&net->ipv4.tfo_active_disable_times, 0);
 
 	return 0;
 fail:
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size
From: Colin King @ 2017-09-22 14:03 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, linux-wireless, brcm80211-dev-list.pdl,
	brcm80211-dev-list, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Don't populate const array ucode_ofdm_rates on the stack, instead make it
static. Makes the object code smaller by 100 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  39482	    564	      0	  40046	   9c6e	phy_cmn.o

After
   text	   data	    bss	    dec	    hex	filename
  39326	    620	      0	  39946	   9c0a	phy_cmn.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
index 1c4e9dd57960..3a13d176b221 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
@@ -1916,7 +1916,7 @@ void wlc_phy_txpower_update_shm(struct brcms_phy *pi)
 				     pi->hwpwr_txcur);
 
 		for (j = TXP_FIRST_OFDM; j <= TXP_LAST_OFDM; j++) {
-			const u8 ucode_ofdm_rates[] = {
+			static const u8 ucode_ofdm_rates[] = {
 				0x0c, 0x12, 0x18, 0x24, 0x30, 0x48, 0x60, 0x6c
 			};
 			offset = wlapi_bmac_rate_shm_offset(
-- 
2.14.1

^ permalink raw reply related

* Re: [pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness
From: Jesper Dangaard Brouer @ 2017-09-22 14:06 UTC (permalink / raw)
  To: Robert Hoo; +Cc: davem, tariqt, kyle.leet, netdev, robert.hu, brouer
In-Reply-To: <20170918110621.355dc215@redhat.com>

On Mon, 18 Sep 2017 11:06:21 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Sun, 17 Sep 2017 20:36:36 +0800 Robert Hoo <robert.hu@linux.intel.com> wrote:
> 
> > Change log
> > v2:
> >  Rebased to https://github.com/netoptimizer/network-testing/tree/master/pktgen  
> 
> Hi Robert,
> 
> Thank you for submitting this against my git tree[1]. I skimmed the
> patches and they looked okay.  I'll give them a test run, before I
> accept them into my tree.
> 
> Later I'll synchronize my pktgen scripts/git-tree with the kernel via
> regular patches against DaveM's net-next tree[2] (and I'll try to
> remember to give you author credit).
> 
> [1] https://github.com/netoptimizer/network-testing/tree/master/pktgen
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/samples/pktgen

FYI, I've applied and pushed these patches to my tree.
 https://github.com/netoptimizer/network-testing/commits?author=robert-hoo
 https://github.com/netoptimizer/network-testing/commit/1b9b4b797a4f112
 https://github.com/netoptimizer/network-testing/commit/65efc2352f63dde
 https://github.com/netoptimizer/network-testing/commit/54eb5178aaf4031

I fixed the description a bit, and only made one simple change:
 https://github.com/netoptimizer/network-testing/commit/9ff58568b3f8c91

Thanks for working on improving the pktgen scripts :-)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH,v2,net-next 2/2] tun: enable napi_gro_frags() for TUN/TAP driver
From: Willem de Bruijn @ 2017-09-22 14:06 UTC (permalink / raw)
  To: Petar Penkov
  Cc: Network Development, Eric Dumazet, Mahesh Bandewar,
	Willem de Bruijn, David Miller, ppenkov
In-Reply-To: <20170922021715.2618-3-peterpenkov96@gmail.com>

> @@ -2061,6 +2174,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>         if (tfile->detached)
>                 return -EINVAL;
>
> +       if ((ifr->ifr_flags & IFF_NAPI_FRAGS) && !capable(CAP_NET_ADMIN))
> +               return -EPERM;
> +

This should perhaps be moved into the !dev branch, directly below the
ns_capable check.

>         dev = __dev_get_by_name(net, ifr->ifr_name);
>         if (dev) {
>                 if (ifr->ifr_flags & IFF_TUN_EXCL)
> @@ -2185,6 +2301,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>         tun->flags = (tun->flags & ~TUN_FEATURES) |
>                 (ifr->ifr_flags & TUN_FEATURES);
>
> +       if (!(tun->flags & IFF_NAPI) || (tun->flags & TUN_TYPE_MASK) != IFF_TAP)
> +               tun->flags = tun->flags & ~IFF_NAPI_FRAGS;
> +

Similarly, this check only need to be performed in that branch.
Instead of reverting to non-frags mode, a tun_set_iff with the wrong
set of flags should probably fail hard.

^ permalink raw reply

* Re: [PATCH,v2,net-next 1/2] tun: enable NAPI for TUN/TAP driver
From: Willem de Bruijn @ 2017-09-22 14:06 UTC (permalink / raw)
  To: Petar Penkov
  Cc: Network Development, Eric Dumazet, Mahesh Bandewar,
	Willem de Bruijn, David Miller, ppenkov
In-Reply-To: <20170922021715.2618-2-peterpenkov96@gmail.com>

On Thu, Sep 21, 2017 at 10:17 PM, Petar Penkov <peterpenkov96@gmail.com> wrote:
> Changes TUN driver to use napi_gro_receive() upon receiving packets
> rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these
> changes and operation is not affected if the flag is disabled.  SKBs
> are constructed upon packet arrival and are queued to be processed
> later.
>
> The new path was evaluated with a benchmark with the following setup:
> Open two tap devices and a receiver thread that reads in a loop for
> each device. Start one sender thread and pin all threads to different
> CPUs. Send 1M minimum UDP packets to each device and measure sending
> time for each of the sending methods:
>         napi_gro_receive():     4.90s
>         netif_rx_ni():          4.90s
>         netif_receive_skb():    7.20s
>
> Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: davem@davemloft.net
> Cc: ppenkov@stanford.edu

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks, Petar.

^ permalink raw reply

* [PATCH] i40e: make const array patterns static, reduces object code size
From: Colin King @ 2017-09-22 14:11 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Don't populate const array patterns on the stack, instead make it
static. Makes the object code smaller by over 60 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
   1953	    496	      0	   2449	    991	i40e_diag.o

After:
   text	   data	    bss	    dec	    hex	filename
   1798	    584	      0	   2382	    94e	i40e_diag.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/i40e/i40e_diag.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.c b/drivers/net/ethernet/intel/i40e/i40e_diag.c
index f141e78d409e..76ed56641864 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_diag.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_diag.c
@@ -36,7 +36,9 @@
 static i40e_status i40e_diag_reg_pattern_test(struct i40e_hw *hw,
 							u32 reg, u32 mask)
 {
-	const u32 patterns[] = {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};
+	static const u32 patterns[] = {
+		0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF
+	};
 	u32 pat, val, orig_val;
 	int i;
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-22 14:11 UTC (permalink / raw)
  To: Edward Cree; +Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, netdev
In-Reply-To: <ec7d0485-11f4-4fdb-0b87-1fb0afdb4f11@solarflare.com>

On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 22/09/17 00:11, Y Song wrote:
>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ecree@solarflare.com> wrote:
>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>>>> More intuitive, but agree on the from_be/le. Maybe we should
>>>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>>>> it's not surrounded by braces, it's also not a cast but rather an op.
>>> That works for me.
>>>> 'be16 r4' is ambiguous regarding upper bits.
>>>>
>>>> what about my earlier suggestion:
>>>> r4 = (be16) (u16) r4
>>>> r4 = (le64) (u64) r4
>>>>
>>>> It will be pretty clear what instruction is doing (that upper bits become zero).
>>> Trouble with that is that's very *not* what C will do with those casts
>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>>  but that's quite an ugly mongrel.
>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>>  cleanest and clearest.  Should it be
>>>     r4 = be16 r4
>>>  or just
>>>     be16 r4
>>> ?  Personally I incline towards the latter, but admit it doesn't really
>>>  match the syntax of other opcodes.
>> I did some quick prototyping in llvm to make sure we have a syntax
>> llvm is happy. Apparently, llvm does not like the syntax
>>    r4 = be16 r4    or    r4 = (be16) (u16) r4.
>>
>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>
>>     // Verify that any operand is only mentioned once.
> Wait, how do you deal with (totally legal) r4 += r4?
> Or r4 = *(r4 +0)?
> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.

We are talking about dag node here. The above "r4", although using the same
register, will be different dag nodes. So it will be okay.

The "r4 = be16 r4" tries to use the *same* dag node as both source and
destination
in the asm output which is prohibited.

^ permalink raw reply

* [PATCH 1/1] forcedeth: optimize the xmit/rx with unlikely
From: Zhu Yanjun @ 2017-09-22 14:20 UTC (permalink / raw)
  To: davem, jarod, netdev

In the xmit/rx fastpath, the function dma_map_single rarely fails.
Therefore, add an unlikely() optimization to this error check
conditional.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 0a7ba3a..63a9e1e 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1816,8 +1816,8 @@ static int nv_alloc_rx(struct net_device *dev)
 							     skb->data,
 							     skb_tailroom(skb),
 							     DMA_FROM_DEVICE);
-			if (dma_mapping_error(&np->pci_dev->dev,
-					      np->put_rx_ctx->dma)) {
+			if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+						       np->put_rx_ctx->dma))) {
 				kfree_skb(skb);
 				goto packet_dropped;
 			}
@@ -1857,8 +1857,8 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 							     skb->data,
 							     skb_tailroom(skb),
 							     DMA_FROM_DEVICE);
-			if (dma_mapping_error(&np->pci_dev->dev,
-					      np->put_rx_ctx->dma)) {
+			if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+						       np->put_rx_ctx->dma))) {
 				kfree_skb(skb);
 				goto packet_dropped;
 			}
@@ -2224,8 +2224,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev,
 						     skb->data + offset, bcnt,
 						     DMA_TO_DEVICE);
-		if (dma_mapping_error(&np->pci_dev->dev,
-				      np->put_tx_ctx->dma)) {
+		if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+					       np->put_tx_ctx->dma))) {
 			/* on DMA mapping error - drop the packet */
 			dev_kfree_skb_any(skb);
 			u64_stats_update_begin(&np->swstats_tx_syncp);
@@ -2265,7 +2265,8 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 							frag, offset,
 							bcnt,
 							DMA_TO_DEVICE);
-			if (dma_mapping_error(&np->pci_dev->dev, np->put_tx_ctx->dma)) {
+			if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+						       np->put_tx_ctx->dma))) {
 
 				/* Unwind the mapped fragments */
 				do {
@@ -2373,8 +2374,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 		np->put_tx_ctx->dma = dma_map_single(&np->pci_dev->dev,
 						     skb->data + offset, bcnt,
 						     DMA_TO_DEVICE);
-		if (dma_mapping_error(&np->pci_dev->dev,
-				      np->put_tx_ctx->dma)) {
+		if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+					       np->put_tx_ctx->dma))) {
 			/* on DMA mapping error - drop the packet */
 			dev_kfree_skb_any(skb);
 			u64_stats_update_begin(&np->swstats_tx_syncp);
@@ -2415,7 +2416,8 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 							bcnt,
 							DMA_TO_DEVICE);
 
-			if (dma_mapping_error(&np->pci_dev->dev, np->put_tx_ctx->dma)) {
+			if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+						       np->put_tx_ctx->dma))) {
 
 				/* Unwind the mapped fragments */
 				do {
@@ -5070,8 +5072,8 @@ static int nv_loopback_test(struct net_device *dev)
 	test_dma_addr = dma_map_single(&np->pci_dev->dev, tx_skb->data,
 				       skb_tailroom(tx_skb),
 				       DMA_FROM_DEVICE);
-	if (dma_mapping_error(&np->pci_dev->dev,
-			      test_dma_addr)) {
+	if (unlikely(dma_mapping_error(&np->pci_dev->dev,
+				       test_dma_addr))) {
 		dev_kfree_skb_any(tx_skb);
 		goto out;
 	}
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-22 14:27 UTC (permalink / raw)
  To: Edward Cree; +Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, netdev
In-Reply-To: <CAH3MdRVCf_wLoC7WOfaL2L2px3aYVWUhWxarMsyJdD7_E9fv_Q@mail.gmail.com>

On Fri, Sep 22, 2017 at 7:11 AM, Y Song <ys114321@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree <ecree@solarflare.com> wrote:
>> On 22/09/17 00:11, Y Song wrote:
>>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ecree@solarflare.com> wrote:
>>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>>>>> More intuitive, but agree on the from_be/le. Maybe we should
>>>>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>>>>> it's not surrounded by braces, it's also not a cast but rather an op.
>>>> That works for me.
>>>>> 'be16 r4' is ambiguous regarding upper bits.
>>>>>
>>>>> what about my earlier suggestion:
>>>>> r4 = (be16) (u16) r4
>>>>> r4 = (le64) (u64) r4
>>>>>
>>>>> It will be pretty clear what instruction is doing (that upper bits become zero).
>>>> Trouble with that is that's very *not* what C will do with those casts
>>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>>>  but that's quite an ugly mongrel.
>>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>>>  cleanest and clearest.  Should it be
>>>>     r4 = be16 r4
>>>>  or just
>>>>     be16 r4
>>>> ?  Personally I incline towards the latter, but admit it doesn't really
>>>>  match the syntax of other opcodes.
>>> I did some quick prototyping in llvm to make sure we have a syntax
>>> llvm is happy. Apparently, llvm does not like the syntax
>>>    r4 = be16 r4    or    r4 = (be16) (u16) r4.
>>>
>>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>>
>>>     // Verify that any operand is only mentioned once.
>> Wait, how do you deal with (totally legal) r4 += r4?
>> Or r4 = *(r4 +0)?
>> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
>
> We are talking about dag node here. The above "r4", although using the same
> register, will be different dag nodes. So it will be okay.
>
> The "r4 = be16 r4" tries to use the *same* dag node as both source and
> destination
> in the asm output which is prohibited.

With second thought, we may allow "r4 = be16 r4" by using different dag nodes.
(I need to do experiment for this.) But we do have constraints that
the two "r4" must
be the same register.  "r5 = be16 r4"  is not allowed. So from that
perspective, referencing
"r4" only once is a good idea and less confusing.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH][V2] e1000: avoid null pointer dereference on invalid stat type
From: Alexander Duyck @ 2017-09-22 14:47 UTC (permalink / raw)
  To: Colin King
  Cc: Jeff Kirsher, intel-wired-lan, Netdev, kernel-janitors,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170922134131.12283-1-colin.king@canonical.com>

On Fri, Sep 22, 2017 at 6:41 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Currently if the stat type is invalid then data[i] is being set
> either by dereferencing a null pointer p, or it is reading from
> an incorrect previous location if we had a valid stat type
> previously.  Fix this by nullify pointer p if a stat type is
> invalid and only setting data if p is not null.
>
> Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> index ec8aa4562cc9..724c93a6ea92 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
> @@ -1824,7 +1824,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
>  {
>         struct e1000_adapter *adapter = netdev_priv(netdev);
>         int i;
> -       char *p = NULL;
> +       char *p;
>         const struct e1000_stats *stat = e1000_gstrings_stats;
>
>         e1000_update_stats(adapter);

I was honestly happier with the portion of the first patch that moved
this into the loop.

> @@ -1837,16 +1837,18 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
>                         p = (char *)adapter + stat->stat_offset;
>                         break;
>                 default:
> +                       p = NULL;
>                         WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
>                                   stat->type, i);
>                         break;
>                 }
>
> -               if (stat->sizeof_stat == sizeof(u64))
> -                       data[i] = *(u64 *)p;
> -               else
> -                       data[i] = *(u32 *)p;
> -
> +               if (p) {
> +                       if (stat->sizeof_stat == sizeof(u64))
> +                               data[i] = *(u64 *)p;
> +                       else
> +                               data[i] = *(u32 *)p;
> +               }
>                 stat++;
>         }

Instead of doing all this why not just call out a "continue;" instead
of a "break" if the type isn't recognized, and move the stat++ into
the loop declaration and process it at the same time as i++? That
would clean most of this up and we don't have to worry about any loop
carried variables and the like.

>  /* BUG_ON(i != E1000_STATS_LEN); */
> --
> 2.14.1
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Alexei Starovoitov @ 2017-09-22 15:16 UTC (permalink / raw)
  To: Y Song; +Cc: Edward Cree, Daniel Borkmann, David Miller, netdev
In-Reply-To: <CAH3MdRXgVtVjhYQkRopPKvL+4+NaTRRJoJq6XD_S-Dra69j0hA@mail.gmail.com>

On Fri, Sep 22, 2017 at 07:27:29AM -0700, Y Song wrote:
> On Fri, Sep 22, 2017 at 7:11 AM, Y Song <ys114321@gmail.com> wrote:
> > On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree <ecree@solarflare.com> wrote:
> >> On 22/09/17 00:11, Y Song wrote:
> >>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ecree@solarflare.com> wrote:
> >>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
> >>>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
> >>>>>> More intuitive, but agree on the from_be/le. Maybe we should
> >>>>>> just drop the "to_" prefix altogether, and leave the rest as is since
> >>>>>> it's not surrounded by braces, it's also not a cast but rather an op.
> >>>> That works for me.
> >>>>> 'be16 r4' is ambiguous regarding upper bits.
> >>>>>
> >>>>> what about my earlier suggestion:
> >>>>> r4 = (be16) (u16) r4
> >>>>> r4 = (le64) (u64) r4
> >>>>>
> >>>>> It will be pretty clear what instruction is doing (that upper bits become zero).
> >>>> Trouble with that is that's very *not* what C will do with those casts
> >>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
> >>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
> >>>>  but that's quite an ugly mongrel.
> >>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
> >>>>  cleanest and clearest.  Should it be
> >>>>     r4 = be16 r4
> >>>>  or just
> >>>>     be16 r4
> >>>> ?  Personally I incline towards the latter, but admit it doesn't really
> >>>>  match the syntax of other opcodes.
> >>> I did some quick prototyping in llvm to make sure we have a syntax
> >>> llvm is happy. Apparently, llvm does not like the syntax
> >>>    r4 = be16 r4    or    r4 = (be16) (u16) r4.
> >>>
> >>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
> >>>
> >>>     // Verify that any operand is only mentioned once.
> >> Wait, how do you deal with (totally legal) r4 += r4?
> >> Or r4 = *(r4 +0)?
> >> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
> >
> > We are talking about dag node here. The above "r4", although using the same
> > register, will be different dag nodes. So it will be okay.
> >
> > The "r4 = be16 r4" tries to use the *same* dag node as both source and
> > destination
> > in the asm output which is prohibited.
> 
> With second thought, we may allow "r4 = be16 r4" by using different dag nodes.
> (I need to do experiment for this.) But we do have constraints that
> the two "r4" must
> be the same register.  "r5 = be16 r4"  is not allowed. So from that
> perspective, referencing
> "r4" only once is a good idea and less confusing.

looks like we're converging on
"be16/be32/be64/le16/le32/le64 #register" for BPF_END.
I guess it can live with that. I would prefer more C like syntax
to match the rest, but llvm parsing point is a strong one.

For BPG_NEG I prefer to do it in C syntax like interpreter does:
        ALU_NEG:
                DST = (u32) -DST;
        ALU64_NEG:
                DST = -DST;
Yonghong, does it mean that asmparser will equally suffer?

^ permalink raw reply

* Re: tg3 pxe weirdness
From: Berend De Schouwer @ 2017-09-22 15:34 UTC (permalink / raw)
  To: Siva Reddy Kallam; +Cc: Linux Netdev List
In-Reply-To: <CAMet4B4Jdgjs42KAWQ4HJr1t_Grn1_+Hv8sipeLb_tvZ_SFrMw@mail.gmail.com>

On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> 
> 
> Can you please share below details?
> 1) Model and Manufacturer of the system
> 2) Linux distro/kernel used?

4.13.3 gets a little further, but after some more data is transferred
the tg3 driver still crashes.  This is unfortunately before I've got a
writeable filesystem.

The last line is:
tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2

I've got some ideas to get the full dmesg.

As with the other kernels it works OK on 1Gbps, but not slower
switches.

^ permalink raw reply

* [PATCH] b43: make const arrays static, reduces object code size
From: Colin King @ 2017-09-22 15:39 UTC (permalink / raw)
  To: Kalle Valo, linux-wireless, b43-dev, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Don't populate const arrays on the stack, instead make them static.
Makes the object code smaller by over 60 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  14816	   1296	      0	  16112	   3ef0	b43/phy_ht.o

After:
   text	   data	    bss	    dec	    hex	filename
  14551	   1496	      0	  16047	   3eaf	b43/phy_ht.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/b43/phy_ht.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/b43/phy_ht.c b/drivers/net/wireless/broadcom/b43/phy_ht.c
index 718c90e81696..c3158d085c2b 100644
--- a/drivers/net/wireless/broadcom/b43/phy_ht.c
+++ b/drivers/net/wireless/broadcom/b43/phy_ht.c
@@ -119,7 +119,7 @@ static void b43_radio_2059_rcal(struct b43_wldev *dev)
 /* Calibrate the internal RC oscillator? */
 static void b43_radio_2057_rccal(struct b43_wldev *dev)
 {
-	const u16 radio_values[3][2] = {
+	static const u16 radio_values[3][2] = {
 		{ 0x61, 0xE9 }, { 0x69, 0xD5 }, { 0x73, 0x99 },
 	};
 	int i;
@@ -154,7 +154,7 @@ static void b43_radio_2059_init_pre(struct b43_wldev *dev)
 
 static void b43_radio_2059_init(struct b43_wldev *dev)
 {
-	const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 };
+	static const u16 routing[] = { R2059_C1, R2059_C2, R2059_C3 };
 	int i;
 
 	/* Prepare (reset?) radio */
@@ -263,7 +263,7 @@ static void b43_phy_ht_reset_cca(struct b43_wldev *dev)
 static void b43_phy_ht_zero_extg(struct b43_wldev *dev)
 {
 	u8 i, j;
-	u16 base[] = { 0x40, 0x60, 0x80 };
+	static const u16 base[] = { 0x40, 0x60, 0x80 };
 
 	for (i = 0; i < ARRAY_SIZE(base); i++) {
 		for (j = 0; j < 4; j++)
-- 
2.14.1

^ permalink raw reply related

* [PATCH] hv_netvsc:  make const array ver_list static, reduces object code size
From: Colin King @ 2017-09-22 15:50 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, devel,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Don't populate const array ver_list on the stack, instead make it
static. Makes the object code smaller by over 400 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  18444	   3168	    320	  21932	   55ac	drivers/net/hyperv/netvsc.o

After:
   text	   data	    bss	    dec	    hex	filename
  17950	   3224	    320	  21494	   53f6	drivers/net/hyperv/netvsc.o

(gcc 6.3.0, x86-64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/hyperv/netvsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8d5077fb0492..b0d323e24978 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -484,7 +484,7 @@ static int netvsc_connect_vsp(struct hv_device *device,
 			      struct netvsc_device *net_device,
 			      const struct netvsc_device_info *device_info)
 {
-	const u32 ver_list[] = {
+	static const u32 ver_list[] = {
 		NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
 		NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
 	};
-- 
2.14.1

^ permalink raw reply related

* [net-next] net: remove MTU limits for dummy and ifb device
From: Zhang Shengju @ 2017-09-22 15:57 UTC (permalink / raw)
  To: davem, sd, edumazet, netdev

These two drivers (dummy and ifb) call ether_setup(), after commit
61e84623ace3 ("net: centralize net_device min/max MTU checking"), the
range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.

These two devices should not have limits on MTU. This patch set their
min_mtu/max_mtu to 0. So that dev_set_mtu() will not check the mtu range,
and can be set with any value.

CC: Eric Dumazet <edumazet@google.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 drivers/net/dummy.c | 2 +-
 drivers/net/ifb.c   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index e31ab3b..58483af 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -353,7 +353,7 @@ static void dummy_setup(struct net_device *dev)
 	eth_hw_addr_random(dev);
 
 	dev->min_mtu = 0;
-	dev->max_mtu = ETH_MAX_MTU;
+	dev->max_mtu = 0;
 }
 
 static int dummy_validate(struct nlattr *tb[], struct nlattr *data[],
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 8870bd2..0008da7 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -231,6 +231,9 @@ static void ifb_setup(struct net_device *dev)
 	eth_hw_addr_random(dev);
 	dev->needs_free_netdev = true;
 	dev->priv_destructor = ifb_dev_free;
+
+	dev->min_mtu = 0;
+	dev->max_mtu = 0;
 }
 
 static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
1.8.3.1

^ permalink raw reply related

* RE: [PATCH] hv_netvsc:  make const array ver_list static, reduces object code size
From: Haiyang Zhang @ 2017-09-22 16:00 UTC (permalink / raw)
  To: Colin King, KY Srinivasan, Stephen Hemminger,
	devel@linuxdriverproject.org, netdev@vger.kernel.org
  Cc: kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20170922155023.15821-1-colin.king@canonical.com>



> -----Original Message-----
> From: Colin King [mailto:colin.king@canonical.com]
> Sent: Friday, September 22, 2017 8:50 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> devel@linuxdriverproject.org; netdev@vger.kernel.org
> Cc: kernel-janitors@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] hv_netvsc: make const array ver_list static, reduces
> object code size
> 
> From: Colin Ian King <colin.king@canonical.com>
> 
> Don't populate const array ver_list on the stack, instead make it
> static. Makes the object code smaller by over 400 bytes:
> 
> Before:
>    text	   data	    bss	    dec	    hex	filename
>   18444	   3168	    320	  21932	   55ac
> 	drivers/net/hyperv/netvsc.o
> 
> After:
>    text	   data	    bss	    dec	    hex	filename
>   17950	   3224	    320	  21494	   53f6
> 	drivers/net/hyperv/netvsc.o
> 
> (gcc 6.3.0, x86-64)
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>



^ permalink raw reply

* Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack
From: Jiri Pirko @ 2017-09-22 16:03 UTC (permalink / raw)
  To: linyunsheng
  Cc: davem@davemloft.net, huangdaode, xuwei (O), Liguozhu (Kenneth),
	Zhuangyuzeng (Yisen), Gabriele Paoloni, John Garry, Linuxarm,
	Salil Mehta, lipeng (Y), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <59c51a37.a1c4df0a.ac4e2.8df0SMTPIN_ADDED_BROKEN@mx.google.com>

Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsheng@huawei.com wrote:
>Hi, Jiri
>
>>>- if (!tc) {
>>>+ if (if_running) {
>>>+ (void)hns3_nic_net_stop(netdev);
>>>+ msleep(100);
>>>+ }
>>>+
>>>+ ret = (kinfo->dcb_ops && kinfo->dcb_ops->>setup_tc) ?
>>>+ kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : ->EOPNOTSUPP;
>
>>This is most odd. Why do you call dcb_ops from >ndo_setup_tc callback?
>>Why are you mixing this together? prio->tc mapping >can be done
>>directly in dcbnl
>
>Here is what we do in dcb_ops->setup_tc:
>Firstly, if current tc num is different from the tc num
>that user provide, then we setup the queues for each
>tc.
>
>Secondly, we tell hardware the pri to tc mapping that
>the stack is using. In rx direction, our hardware need
>that mapping to put different packet into different tc'
>queues according to the priority of the packet, then
>rss decides which specific queue in the tc should the
>packet goto.
>
>By mixing, I suppose you meant why we need the
>pri to tc infomation?

by mixing, I mean what I wrote. You are calling dcb_ops callback from
ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
all?



>I hope I did not misunderstand your question, thanks
>for your time reviewing.

^ permalink raw reply

* URGENT RESPOND NEEDED FROM YOU BY DANIEL MMINELE THE DEPUTY GOVERNOR OF SOUTH AFRICAN RESERVE BANK
From: Mrs, Suran Yoda @ 2017-09-22 16:06 UTC (permalink / raw)


Hello Dear,

My Name is Dr. Daniel Mminele the deputy governor of south African
reserve bank, you can also check  on our Bank website here  for your
assurance , http://www.whoswhosa.co.za/south-african-reserve-bank-20991

I am a citizen of South African. I am a computer analyst long time
working with reserve Bank of  South African. (R.B.) My Dear, I came
across your payment file which was Marked X and your Fund Release Hard
Disk Painted Red, I took my time to study your payment file and found
out that you have really attempted to receive the funds without
success till now. But I am using another person email address to send
you this because I don’t want my bank to notify  me alright

The most annoying thing is that the directors and top officials of
this nation could not tell you the truth that on no account will they
ever release the fund to you; instead they will let you spend money
unnecessarily. I swear with my life, I can release this funds to you
if you can certify me of my security and how I can run away from this
country to meet with you. This is like a Mafia Setting here,  if I do
this for you, I do not intend to work here all the days of my life and
I cannot run back to my country  as this Bank has my home address ,
and if I don't run away after I making the transfer to your bank
account, I may be in trouble or my life will be in danger.

The only thing I will need to release this fund is a Special Hard
Disk, we call it HD120 GIG. I will buy two of it, recopy your
information, destroy the previous one, and punch the computer to make
the fund reflect in your bank account within 24 banking hours. I will
clean up the tracer and destroy your file, after which I will run away
from here to meet with you in your country if you are interested. But
you must assure me the absolute confidentiality of this deal before I
can do anything further. Do get in touch with me immediately, You
should send to me your convenient
Tell phone  number /fax numbers for easy communications and also re
confirm your banking
details, so that there won't be any mistake.
For phone conversation, please call me on +27011745039 or
+27843464227, And you must contact me though  this my Email address
which is, danielmminele1@yahoo.com

SPECIAL INFORMATION:


YOU WILL SEND THE FEE FOR THE HARD DISK FIRST BEFORE I MAKE YOUR
TRANSFER OR YOU BUY
THE HARD DISK IN YOUR COUNTRY AND SEND IT TO ME,DON'T CONTACT ME IF
YOU CAN NOT SEND
THE HARD DISK FEE FIRST OR THE HARD DISK. AS SOON AS I RECEIVED YOUR
EMAIL I WILL LET
YOU KNOW HOW MUCH THE DISK WILL COST YOU.

Regards, Dr. Daniel Mminele the deputy governor of south African reserve bank

^ permalink raw reply

* Re: [PATCH net v2] l2tp: fix race condition in l2tp_tunnel_delete
From: Sabrina Dubroca @ 2017-09-22 16:16 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: netdev, Xin Long, Tom Parkin
In-Reply-To: <20170919164337.zn3p6ikwsgnw7ba6@alphalink.fr>

2017-09-19, 18:43:37 +0200, Guillaume Nault wrote:
> On Tue, Sep 19, 2017 at 03:40:40PM +0200, Sabrina Dubroca wrote:
> > If we try to delete the same tunnel twice, the first delete operation
> > does a lookup (l2tp_tunnel_get), finds the tunnel, calls
> > l2tp_tunnel_delete, which queues it for deletion by
> > l2tp_tunnel_del_work.
> > 
> > The second delete operation also finds the tunnel and calls
> > l2tp_tunnel_delete. If the workqueue has already fired and started
> > running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
> > same tunnel a second time, and try to free the socket again.
> > 
> > Add a dead flag to prevent firing the workqueue twice. Then we can
> > remove the check of queue_work's result that was meant to prevent that
> > race but doesn't.
> > 
> > Also check the flag in the tunnel lookup functions, to avoid returning a
> > tunnel that is already scheduled for destruction.
> > 
> > Reproducer:
> > 
> >     ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
> >     ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
> >     ip link set l2tp1 up
> >     ip l2tp del tunnel tunnel_id 3000
> >     ip l2tp del tunnel tunnel_id 3000
> > 
> > Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue")
> > Reported-by: Jianlin Shi <jishi@redhat.com>
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> > v2: as Tom Parkin explained, we can't remove the tunnel from the
> >     per-net list from netlink. v2 uses only a dead flag, and adds
> >     corresponding checks during lookups
> > 
> >  net/l2tp/l2tp_core.c | 18 +++++++++---------
> >  net/l2tp/l2tp_core.h |  5 ++++-
> >  2 files changed, 13 insertions(+), 10 deletions(-)
> > 
> > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> > index ee485df73ccd..3891f0260f2b 100644
> > --- a/net/l2tp/l2tp_core.c
> > +++ b/net/l2tp/l2tp_core.c
> > @@ -203,7 +203,8 @@ struct l2tp_tunnel *l2tp_tunnel_get(const struct net *net, u32 tunnel_id)
> >  
> >  	rcu_read_lock_bh();
> >  	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -		if (tunnel->tunnel_id == tunnel_id) {
> > +		if (tunnel->tunnel_id == tunnel_id &&
> > +		    !test_bit(0, &tunnel->dead)) {
> >  			l2tp_tunnel_inc_refcount(tunnel);
> >  			rcu_read_unlock_bh();
> >  
> > @@ -390,7 +391,8 @@ struct l2tp_tunnel *l2tp_tunnel_find(const struct net *net, u32 tunnel_id)
> >  
> >  	rcu_read_lock_bh();
> >  	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -		if (tunnel->tunnel_id == tunnel_id) {
> > +		if (tunnel->tunnel_id == tunnel_id &&
> > +		    !test_bit(0, &tunnel->dead)) {
> >  			rcu_read_unlock_bh();
> >  			return tunnel;
> >  		}
> > @@ -409,7 +411,7 @@ struct l2tp_tunnel *l2tp_tunnel_find_nth(const struct net *net, int nth)
> >  
> >  	rcu_read_lock_bh();
> >  	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
> > -		if (++count > nth) {
> > +		if (++count > nth && !test_bit(0, &tunnel->dead)) {
> >  			rcu_read_unlock_bh();
> >  			return tunnel;
> >  		}
> > 
> I don't get why you're checking the dead flag in l2tp_tunnel_{get,find}*().
> Since it can be set concurrently right after test_bit(), it doesn't
> protect the caller from getting a tunnel that is being removed by
> l2tp_tunnel_delete().
> Or have I missed something?

You're right.

Then I would try going back to essentially v1, but keeping code to
remove the tunnel from the list in l2tp_tunnel_destruct if it's not
dead yet.

What do you think?


-------- 8< --------

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index ee485df73ccd..63cd1f30ac7d 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1234,6 +1234,23 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len
 }
 EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
 
+static bool __l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
+{
+	struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
+	bool ret = false;
+
+	spin_lock_bh(&pn->l2tp_tunnel_list_lock);
+	if (!tunnel->dead) {
+		tunnel->dead = 1;
+		list_del_rcu(&tunnel->list);
+		atomic_dec(&l2tp_tunnel_count);
+		ret = true;
+	}
+	spin_unlock_bh(&pn->l2tp_tunnel_list_lock);
+
+	return ret;
+}
+
 /*****************************************************************************
  * Tinnel and session create/destroy.
  *****************************************************************************/
@@ -1245,7 +1262,6 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
 static void l2tp_tunnel_destruct(struct sock *sk)
 {
 	struct l2tp_tunnel *tunnel = l2tp_tunnel(sk);
-	struct l2tp_net *pn;
 
 	if (tunnel == NULL)
 		goto end;
@@ -1270,11 +1286,7 @@ static void l2tp_tunnel_destruct(struct sock *sk)
 	sk->sk_user_data = NULL;
 
 	/* Remove the tunnel struct from the tunnel list */
-	pn = l2tp_pernet(tunnel->l2tp_net);
-	spin_lock_bh(&pn->l2tp_tunnel_list_lock);
-	list_del_rcu(&tunnel->list);
-	spin_unlock_bh(&pn->l2tp_tunnel_list_lock);
-	atomic_dec(&l2tp_tunnel_count);
+	__l2tp_tunnel_delete(tunnel);
 
 	l2tp_tunnel_closeall(tunnel);
 
@@ -1685,14 +1697,12 @@ EXPORT_SYMBOL_GPL(l2tp_tunnel_create);
 
 /* This function is used by the netlink TUNNEL_DELETE command.
  */
-int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
+void l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
 {
-	l2tp_tunnel_inc_refcount(tunnel);
-	if (false == queue_work(l2tp_wq, &tunnel->del_work)) {
-		l2tp_tunnel_dec_refcount(tunnel);
-		return 1;
+	if (__l2tp_tunnel_delete(tunnel)) {
+		l2tp_tunnel_inc_refcount(tunnel);
+		queue_work(l2tp_wq, &tunnel->del_work);
 	}
-	return 0;
 }
 EXPORT_SYMBOL_GPL(l2tp_tunnel_delete);
 
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index a305e0c5925a..173e68bb8119 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -160,6 +160,8 @@ struct l2tp_tunnel_cfg {
 
 struct l2tp_tunnel {
 	int			magic;		/* Should be L2TP_TUNNEL_MAGIC */
+	int			dead;
+
 	struct rcu_head rcu;
 	rwlock_t		hlist_lock;	/* protect session_hlist */
 	bool			acpt_newsess;	/* Indicates whether this
@@ -254,7 +256,7 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id,
 		       u32 peer_tunnel_id, struct l2tp_tunnel_cfg *cfg,
 		       struct l2tp_tunnel **tunnelp);
 void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel);
-int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
+void l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
 struct l2tp_session *l2tp_session_create(int priv_size,
 					 struct l2tp_tunnel *tunnel,
 					 u32 session_id, u32 peer_session_id,


-- 
Sabrina

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox