Netdev List

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Jason Wang @ 2016-11-30 13:07 UTC (permalink / raw)
  To: Yunjian Wang, mst, netdev, linux-kernel; +Cc: caihe
In-Reply-To: <1480507857-22976-1-git-send-email-wangyunjian@huawei.com>



On 2016-11-30 20:10, Yunjian Wang wrote:
> When recvmsg returns an error (err=-EBADFD), the error handling in vhost
> handle_rx() continues the loop. This causes a soft CPU lockup in the vhost thread.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>   drivers/vhost/net.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>   			pr_debug("Discarded rx packet: "
>   				 " len %d, expected %zd\n", err, sock_len);
>   			vhost_discard_vq_desc(vq, headcount);
> +			/* Don't continue to do, when meet errors. */
> +			if (err < 0)
> +				goto out;
>   			continue;
>   		}
>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */

Acked-by: Jason Wang <jasowang@redhat.com>

We may want to rename vhost_discard_vq_desc() in the future, since it
does not actually discard the descriptor.
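
For readers following along, the failure mode can be sketched in plain
userspace C. This is only an illustration, not the vhost code: recv_fn,
drain_rx(), always_fail() and the budget parameter are made-up names, and
the real loop in handle_rx() is driven by the socket queue rather than a
call budget. The point is that a length mismatch can be retried, while a
negative return has to end the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for a receive call: returns the number of bytes
 * read, or a negative errno-style value on failure. */
typedef int (*recv_fn)(void *ctx);

/* Drain packets until the budget is used up. A length mismatch is
 * discarded and the loop moves on, but a negative return ends the loop,
 * because retrying a persistent error would otherwise spin.
 * Returns the number of receive calls made. */
static int drain_rx(recv_fn recv, void *ctx, int expected_len, int budget)
{
	int calls = 0;

	while (calls < budget) {
		int err = recv(ctx);

		calls++;
		if (err != expected_len) {
			if (err < 0)
				break;		/* hard error: stop, do not retry */
			continue;		/* length mismatch: discard, try next */
		}
		/* ... deliver the packet ... */
	}
	return calls;
}

/* Example receiver that always fails, modelling a persistent -EBADFD. */
static int always_fail(void *ctx)
{
	(void)ctx;
	return -77;	/* EBADFD is 77 on Linux; hard-coded for portability */
}
```

With always_fail() as the receiver, drain_rx() returns after a single
call. Without the "err < 0" check it would keep calling until the budget
ran out, which is the userspace analogue of the lockup described above.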

* RE: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
From: David Laight @ 2016-11-30 12:32 UTC (permalink / raw)
  To: 'Florian Fainelli', Timur Tabi, netdev@vger.kernel.org
  Cc: davem@davemloft.net, andrew@lunn.ch, sf84@laposte.net,
	martin.blumenstingl@googlemail.com, mans@mansr.com,
	alexandre.torgue@st.com, peppe.cavallaro@st.com,
	jbrunet@baylibre.com
In-Reply-To: <a06d903f-5b80-4683-965f-9a6a1d5fe044@gmail.com>

From: Florian Fainelli
> Sent: 27 November 2016 23:03
> On 27/11/2016 at 14:24, Timur Tabi wrote:
> >> + * PHY device drivers in PHYLIB being reusable by nature, being able to
> >> +   configure correctly a specified delay enables more designs with
> >> similar delay
> >> +   requirements to be operate correctly
> >
> > Ok, this one I don't know how to fix.  I'm not really sure what you're
> > trying to say.
> 
> What I am trying to say is that once a PHY driver properly configures a
> delay that you have specified, there is no reason why this is not
> applicable to other platforms using this same PHY driver.

As has been stated earlier it can depend on the track lengths on the
board itself.
(Although 1ns is about 1 foot - so track delays of that length are unlikely.)

> >> +Common problems with RGMII delay mismatch
> >> +
> >> + When there is a RGMII delay mismatch between the Ethernet MAC and
> >> the PHY, this
> >> + will most likely result in the clock and data line sampling to
> >> capture unstable
> >
> > I'm not sure what "sampling to capture unstable" is supposed to mean.
> 
> When the PHY device takes a "snapshot" of the state of the data lines,
> after a clock edge, if the delay is improperly configured, these data
> lines are going to still be floating, or show some kind of
> capacitance/inductance effect, so the logical level which is going to be
> read may be incorrect.

No, the problem is that the data lines are being changed at much the same time
as the clock.
Quite possibly on both the rising and falling edges of the clock.

The actual latching of the data requires the data to be stable for the 'setup'
and 'hold' times of the latch (ie before and after the clock edge).
If the data and clock change at the same time it will be indeterminate whether
the old or new data is latched (the latch output might even oscillate).
The delay is there to ensure that the data isn't changing at the same time as
it is sampled.

At lower speed I suspect that the data only changes on one clock edge and is
sampled on the other.
(FWIW the latest DDR has an additional change in the data half way between
the clock edges!)
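
As a rough sketch of the setup/hold argument above (not taken from any
driver; struct latch_timing and sample_is_unsafe() are made-up names, and
any numbers fed to them are illustrative rather than datasheet values),
the check is just whether a data transition lands inside the window
around the sampling clock edge:

```c
#include <stdbool.h>

/* All times in picoseconds, measured on one shared time axis. */
struct latch_timing {
	long setup_ps;	/* data must be stable this long before the edge */
	long hold_ps;	/* ... and this long after the edge */
};

/* True if a data transition at data_change_ps falls strictly inside the
 * window (clock_edge_ps - setup, clock_edge_ps + hold), i.e. the latch
 * may capture either the old or the new value. */
static bool sample_is_unsafe(const struct latch_timing *t,
			     long clock_edge_ps, long data_change_ps)
{
	return data_change_ps > clock_edge_ps - t->setup_ps &&
	       data_change_ps < clock_edge_ps + t->hold_ps;
}
```

With an assumed 1000 ps setup and 800 ps hold, data and clock changing
together at t = 4000 ps is unsafe. Delaying the clock edge by 2000 ps,
with the data still changing at 4000 ps, moves the transition out of the
window.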

	David

* [PATCH net-next v2 2/2] tcp: allow to turn tcp timestamp randomization off
From: Florian Westphal @ 2016-11-30 12:28 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal
In-Reply-To: <1480508930-24406-1-git-send-email-fw@strlen.de>

Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."

Random per-flow (or per-host) offsets make that impossible, so add
a way to turn randomization off.
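
The resulting behaviour can be sketched in userspace C as follows. The
enum and pick_ts_offset() are illustrative names only; the patch itself
tests sysctl_tcp_timestamps == 2 inline, as the diff below shows.

```c
#include <stdint.h>

/* Mirrors the three documented values of the tcp_timestamps sysctl. */
enum ts_mode { TS_OFF = 0, TS_ON = 1, TS_ON_RANDOM_OFFSET = 2 };

/* Pick the per-connection timestamp offset: the hashed value only in
 * mode 2, otherwise 0 so that TS val tracks the clock directly. */
static uint32_t pick_ts_offset(enum ts_mode mode, uint32_t hashed)
{
	return mode == TS_ON_RANDOM_OFFSET ? hashed : 0;
}
```

With mode 1 every flow gets offset 0, so TS val from different flows can
be compared directly in tcpdump again.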

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Change since v1: do the check in secure_tcpv4/6_sequence_number so outgoing
 SYN packets won't have a random offset either if randomization is off.

 Tested:
 sysctl_tcp_timestamps==1, tcpdump on lo, both ends have same values.

 Documentation/networking/ip-sysctl.txt | 9 +++++++--
 net/core/secure_seq.c                  | 5 +++--
 net/ipv4/tcp_input.c                   | 3 ++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5af48dd7c5fc..de2448313799 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -610,8 +610,13 @@ tcp_syn_retries - INTEGER
 	with the current initial RTO of 1second. With this the final timeout
 	for an active TCP connection attempt will happen after 127seconds.
 
-tcp_timestamps - BOOLEAN
-	Enable timestamps as defined in RFC1323.
+tcp_timestamps - INTEGER
+Enable timestamps as defined in RFC1323.
+	0: Disabled.
+	1: Enable timestamps as defined in RFC1323.
+	2: Like 1, but also use a random offset for each connection
+	rather than only using the current time.
+	Default: 2
 
 tcp_min_tso_segs - INTEGER
 	Minimal number of segments per TSO frame.
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index a8d6062cbb4a..36addd3d9633 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -12,6 +12,7 @@
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <net/tcp.h>
 #define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
 static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
@@ -58,7 +59,7 @@ u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 
 	md5_transform(hash, secret);
 
-	*tsoff = hash[1];
+	*tsoff = sysctl_tcp_timestamps == 2 ? hash[1] : 0;
 	return seq_scale(hash[0]);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
@@ -100,7 +101,7 @@ u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 
 	md5_transform(hash, net_secret);
 
-	*tsoff = hash[1];
+	*tsoff = sysctl_tcp_timestamps == 2 ? hash[1] : 0;
 	return seq_scale(hash[0]);
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1b1921c71f7c..5f6d4efd2551 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -76,7 +76,7 @@
 #include <asm/unaligned.h>
 #include <linux/errqueue.h>
 
-int sysctl_tcp_timestamps __read_mostly = 1;
+int sysctl_tcp_timestamps __read_mostly = 2;
 int sysctl_tcp_window_scaling __read_mostly = 1;
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
@@ -85,6 +85,7 @@ int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
 int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
+EXPORT_SYMBOL(sysctl_tcp_timestamps);
 
 /* rfc5961 challenge ack rate limiting */
 int sysctl_tcp_challenge_ack_limit = 1000;
-- 
2.7.3

* [PATCH net-next v2 1/2] tcp: randomize tcp timestamp offsets for each connection
From: Florian Westphal @ 2016-11-30 12:28 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

jiffies-based timestamps allow easy inference of the number of devices
behind NAT translators and also make tracking of hosts simpler.

commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization; in particular, writing/reading the on-wire tcp header
format takes the offset into account so the rest of the stack can use
the normal tcp_time_stamp (jiffies).

So only two items are left:
 - add a tsoffset for request sockets
 - extend the tcp isn generator to also return another 32bit number
   in addition to the ISN.

Reusing the ISN generator also means timestamps are still monotonically
increasing for the same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.
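
The offset handling boils down to modular 32-bit arithmetic. A minimal
userspace sketch, with ts_to_wire() and ts_from_wire() as made-up helper
names (the kernel does this inline, e.g. "tcp_time_stamp + ts_off" and
"rcv_tsecr -= ts_off" in the diff below):

```c
#include <stdint.h>

/* Value placed in the TS val field: local clock plus the per-connection
 * offset. Unsigned 32-bit arithmetic wraps modulo 2^32, which is
 * intended here. */
static uint32_t ts_to_wire(uint32_t now, uint32_t ts_off)
{
	return now + ts_off;
}

/* Recover the local clock value from an echoed timestamp (TS ecr). */
static uint32_t ts_from_wire(uint32_t tsecr, uint32_t ts_off)
{
	return tsecr - ts_off;
}
```

Because the offset is constant for a connection, consecutive on-wire
values still differ by exactly the elapsed ticks, even across the 32-bit
wrap.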

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 No changes since v1, preserved Eric's ack.

 include/linux/tcp.h      |  1 +
 include/net/secure_seq.h |  8 ++++----
 include/net/tcp.h        |  2 +-
 net/core/secure_seq.c    | 10 ++++++----
 net/ipv4/syncookies.c    |  1 +
 net/ipv4/tcp_input.c     |  7 ++++++-
 net/ipv4/tcp_ipv4.c      |  9 +++++----
 net/ipv4/tcp_minisocks.c |  4 +++-
 net/ipv4/tcp_output.c    |  2 +-
 net/ipv6/syncookies.c    |  1 +
 net/ipv6/tcp_ipv6.c      | 10 ++++++----
 11 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e35b71..2408bcc579f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -123,6 +123,7 @@ struct tcp_request_sock {
 	u32				txhash;
 	u32				rcv_isn;
 	u32				snt_isn;
+	u32				ts_off;
 	u32				last_oow_ack_time; /* last SYNACK */
 	u32				rcv_nxt; /* the ack # by SYNACK. For
 						  * FastOpen it's the seq#
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 3f36d45b714a..0caee631a836 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -6,10 +6,10 @@
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport);
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport);
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff);
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de80739adab..1c09d909bd43 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1809,7 +1809,7 @@ struct tcp_request_sock_ops {
 	struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
 				       const struct request_sock *req,
 				       bool *strict);
-	__u32 (*init_seq)(const struct sk_buff *skb);
+	__u32 (*init_seq)(const struct sk_buff *skb, u32 *tsoff);
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
 			   struct flowi *fl, struct request_sock *req,
 			   struct tcp_fastopen_cookie *foc,
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index fd3ce461fbe6..a8d6062cbb4a 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -40,8 +40,8 @@ static u32 seq_scale(u32 seq)
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport)
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 secret[MD5_MESSAGE_BYTES / 4];
 	u32 hash[MD5_DIGEST_WORDS];
@@ -58,6 +58,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 
 	md5_transform(hash, secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
@@ -86,8 +87,8 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 
 #ifdef CONFIG_INET
 
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport)
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
@@ -99,6 +100,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 
 	md5_transform(hash, net_secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 0dc6286272aa..3e88467d70ee 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -334,6 +334,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 	treq = tcp_rsk(req);
 	treq->rcv_isn		= ntohl(th->seq) - 1;
 	treq->snt_isn		= cookie;
+	treq->ts_off		= 0;
 	req->mss		= mss;
 	ireq->ir_num		= ntohs(th->dest);
 	ireq->ir_rmt_port	= th->source;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a2097ff6..1b1921c71f7c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6301,6 +6301,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		goto drop;
 
 	tcp_rsk(req)->af_specific = af_ops;
+	tcp_rsk(req)->ts_off = 0;
 
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = af_ops->mss_clamp;
@@ -6322,6 +6323,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
+	if (isn && tmp_opt.tstamp_ok)
+		af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
+
 	if (!want_cookie && !isn) {
 		/* VJ's idea. We save last timestamp seen
 		 * from the destination in peer table, when entering
@@ -6362,7 +6366,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 			goto drop_and_release;
 		}
 
-		isn = af_ops->init_seq(skb);
+		isn = af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
 	}
 	if (!dst) {
 		dst = af_ops->route_req(sk, &fl, req, NULL);
@@ -6374,6 +6378,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
 	if (want_cookie) {
 		isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
+		tcp_rsk(req)->ts_off = 0;
 		req->cookie_ts = tmp_opt.tstamp_ok;
 		if (!tmp_opt.tstamp_ok)
 			inet_rsk(req)->ecn_ok = 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5555eb86e549..b50f05905ced 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -95,12 +95,12 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);
 
-static  __u32 tcp_v4_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v4_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcp_sequence_number(ip_hdr(skb)->daddr,
 					  ip_hdr(skb)->saddr,
 					  tcp_hdr(skb)->dest,
-					  tcp_hdr(skb)->source);
+					  tcp_hdr(skb)->source, tsoff);
 }
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
@@ -237,7 +237,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		tp->write_seq = secure_tcp_sequence_number(inet->inet_saddr,
 							   inet->inet_daddr,
 							   inet->inet_sport,
-							   usin->sin_port);
+							   usin->sin_port,
+							   &tp->tsoffset);
 
 	inet->inet_id = tp->write_seq ^ jiffies;
 
@@ -824,7 +825,7 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 	tcp_v4_send_ack(sk, skb, seq,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
 			req->ts_recent,
 			0,
 			tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6234ebaa7db1..28ce5ee831f5 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -532,7 +532,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 			newtp->rx_opt.ts_recent_stamp = 0;
 			newtp->tcp_header_len = sizeof(struct tcphdr);
 		}
-		newtp->tsoffset = 0;
+		newtp->tsoffset = treq->ts_off;
 #ifdef CONFIG_TCP_MD5SIG
 		newtp->md5sig_info = NULL;	/*XXX*/
 		if (newtp->af_specific->md5_lookup(sk, newsk))
@@ -581,6 +581,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
+			if (tmp_opt.rcv_tsecr)
+				tmp_opt.rcv_tsecr -= tcp_rsk(req)->ts_off;
 			/* We do not store true stamp, but it is not required,
 			 * it can be estimated (approximately)
 			 * from another data.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b46a304..1b6d5f34bf45 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -640,7 +640,7 @@ static unsigned int tcp_synack_options(struct request_sock *req,
 	}
 	if (likely(ireq->tstamp_ok)) {
 		opts->options |= OPTION_TS;
-		opts->tsval = tcp_skb_timestamp(skb);
+		opts->tsval = tcp_skb_timestamp(skb) + tcp_rsk(req)->ts_off;
 		opts->tsecr = req->ts_recent;
 		remaining -= TCPOLEN_TSTAMP_ALIGNED;
 	}
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 97830a6a9cbb..a4d49760bf43 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -209,6 +209,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	treq->snt_synack.v64	= 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
+	treq->ts_off = 0;
 
 	/*
 	 * We need to lookup the dst_entry to get the correct window size.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 28ec0a2e7b72..a2185a214abc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -101,12 +101,12 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
-static __u32 tcp_v6_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v6_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcpv6_sequence_number(ipv6_hdr(skb)->daddr.s6_addr32,
 					    ipv6_hdr(skb)->saddr.s6_addr32,
 					    tcp_hdr(skb)->dest,
-					    tcp_hdr(skb)->source);
+					    tcp_hdr(skb)->source, tsoff);
 }
 
 static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
@@ -283,7 +283,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
 							     sk->sk_v6_daddr.s6_addr32,
 							     inet->inet_sport,
-							     inet->inet_dport);
+							     inet->inet_dport,
+							     &tp->tsoffset);
 
 	err = tcp_connect(sk);
 	if (err)
@@ -956,7 +957,8 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
+			req->ts_recent, sk->sk_bound_dev_if,
 			tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
 			0, 0);
 }
-- 
2.7.3

* Re: [PATCH net 2/2] esp6: Fix integrity verification when ESN are used
From: Steffen Klassert @ 2016-11-30 12:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Tobias Brunner, David S. Miller, netdev
In-Reply-To: <20161130095837.GB3138@gondor.apana.org.au>

On Wed, Nov 30, 2016 at 05:58:38PM +0800, Herbert Xu wrote:
> On Tue, Nov 29, 2016 at 05:05:25PM +0100, Tobias Brunner wrote:
> > When handling inbound packets, the two halves of the sequence number
> > stored on the skb are already in network order.
> > 
> > Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
> > Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Also applied to the ipsec tree, thanks a lot everyone!

* Re: [PATCH net 1/2] esp4: Fix integrity verification when ESN are used
From: Steffen Klassert @ 2016-11-30 12:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Tobias Brunner, David S. Miller, netdev
In-Reply-To: <20161130095827.GA3138@gondor.apana.org.au>

On Wed, Nov 30, 2016 at 05:58:27PM +0800, Herbert Xu wrote:
> On Tue, Nov 29, 2016 at 05:05:20PM +0100, Tobias Brunner wrote:
> > When handling inbound packets, the two halves of the sequence number
> > stored on the skb are already in network order.
> > 
> > Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
> > Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied to the ipsec tree, thanks!

* Re: [PATCH] xfrm_user: fix return value from xfrm_user_rcv_msg
From: Steffen Klassert @ 2016-11-30 12:15 UTC (permalink / raw)
  To: Yi Zhao; +Cc: netdev, fan.du
In-Reply-To: <1480414141-17801-1-git-send-email-yi.zhao@windriver.com>

On Tue, Nov 29, 2016 at 06:09:01PM +0800, Yi Zhao wrote:
> Running a 32-bit 'ip' to set xfrm objects on a 64-bit host is not supported.
> But the returned error code is unknown to user programs:
> 
> ip xfrm policy list
> RTNETLINK answers: Unknown error 524
> 
> Replace ENOTSUPP with EOPNOTSUPP:
> 
> ip xfrm policy list
> RTNETLINK answers: Operation not supported
> 
> Signed-off-by: Yi Zhao <yi.zhao@windriver.com>

Applied to the ipsec tree, thanks!
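
For anyone hitting the same "Unknown error 524" elsewhere: ENOTSUPP is a
kernel-internal code with no userspace definition, so libc cannot decode
it. A hedged userspace sketch of the distinction (KERNEL_ENOTSUPP and
to_user_errno() are illustrative names; the patch simply returns
-EOPNOTSUPP directly instead of translating):

```c
#include <errno.h>

/* ENOTSUPP is a kernel-internal code (524) that userspace headers do not
 * define; the value is hard-coded here for illustration only. */
#define KERNEL_ENOTSUPP 524

/* Map the kernel-internal code to one that userspace can decode;
 * everything else is passed through unchanged. */
static int to_user_errno(int err)
{
	return err == KERNEL_ENOTSUPP ? EOPNOTSUPP : err;
}
```

Passing the mapped value to strerror() gives the "Operation not
supported" text seen in the second 'ip' run quoted above.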

* [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Yunjian Wang @ 2016-11-30 12:10 UTC (permalink / raw)
  To: mst, jasowang, netdev, linux-kernel; +Cc: caihe, wangyunjian

When recvmsg returns an error (err=-EBADFD), the error handling in vhost
handle_rx() continues the loop. This causes a soft CPU lockup in the vhost thread.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..edc470b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
 			vhost_discard_vq_desc(vq, headcount);
+			/* Don't continue to do, when meet errors. */
+			if (err < 0)
+				goto out;
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
-- 
1.9.5.msysgit.1

* [PATCH 2/2] net: rfkill: Add rfkill-any LED trigger
From: Michał Kępień @ 2016-11-30 12:03 UTC (permalink / raw)
  To: Johannes Berg, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20161130120317.11851-1-kernel-ePNcKBjznIDVItvQsEIGlw@public.gmane.org>

This patch adds a new "global" (i.e. not per-rfkill device) LED trigger,
rfkill-any, which may be useful for laptops with a single "radio LED"
and multiple radio transmitters.  The trigger is meant to turn an LED on
whenever there is at least one radio transmitter active and turn it off
otherwise.
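
The intended semantics can be sketched in userspace C like this. struct
radio and any_radio_active() are simplified, made-up stand-ins; the patch
walks rfkill_list and tests RFKILL_BLOCK_ANY instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an rfkill device. */
struct radio {
	bool blocked;	/* true if soft- or hard-blocked */
};

/* LED state for an "any radio active" trigger: on if at least one radio
 * is unblocked, off otherwise (including when there are no radios). */
static bool any_radio_active(const struct radio *radios, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!radios[i].blocked)
			return true;
	}
	return false;
}
```

The scan stops at the first unblocked radio, so the cost is linear in the
number of devices per state change.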

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
---
Note that the search for any active radio will have quadratic complexity
whenever __rfkill_switch_all() is used (as it calls rfkill_set_block()
for every affected rfkill device), but I intentionally refrained from
implementing rfkill_any_led_trigger_event() using struct work_struct to
keep things simple, given the average number of rfkill devices in
hardware these days.  Please let me know in case this should be
reworked.

 net/rfkill/core.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index f28e441..5275f2f 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -176,6 +176,47 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 	led_trigger_unregister(&rfkill->led_trigger);
 }
+
+static struct led_trigger rfkill_any_led_trigger;
+
+static void __rfkill_any_led_trigger_event(void)
+{
+	enum led_brightness brightness = LED_OFF;
+	struct rfkill *rfkill;
+
+	list_for_each_entry(rfkill, &rfkill_list, node) {
+		if (!(rfkill->state & RFKILL_BLOCK_ANY)) {
+			brightness = LED_FULL;
+			break;
+		}
+	}
+
+	led_trigger_event(&rfkill_any_led_trigger, brightness);
+}
+
+static void rfkill_any_led_trigger_event(void)
+{
+	mutex_lock(&rfkill_global_mutex);
+	__rfkill_any_led_trigger_event();
+	mutex_unlock(&rfkill_global_mutex);
+}
+
+static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+{
+	rfkill_any_led_trigger_event();
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+	rfkill_any_led_trigger.name = "rfkill-any";
+	rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
+	return led_trigger_register(&rfkill_any_led_trigger);
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+	led_trigger_unregister(&rfkill_any_led_trigger);
+}
 #else
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
 {
@@ -189,6 +230,19 @@ static inline int rfkill_led_trigger_register(struct rfkill *rfkill)
 static inline void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 }
+
+static void rfkill_any_led_trigger_event(void)
+{
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+	return 0;
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+}
 #endif /* CONFIG_RFKILL_LEDS */
 
 static void rfkill_fill_event(struct rfkill_event *ev, struct rfkill *rfkill,
@@ -297,6 +351,7 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
+	__rfkill_any_led_trigger_event();
 
 	if (prev != curr)
 		rfkill_event(rfkill);
@@ -477,6 +532,7 @@ bool rfkill_set_hw_state(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
+	rfkill_any_led_trigger_event();
 
 	if (!rfkill->registered)
 		return ret;
@@ -523,6 +579,7 @@ bool rfkill_set_sw_state(struct rfkill *rfkill, bool blocked)
 		schedule_work(&rfkill->uevent_work);
 
 	rfkill_led_trigger_event(rfkill);
+	rfkill_any_led_trigger_event();
 
 	return blocked;
 }
@@ -572,6 +629,7 @@ void rfkill_set_states(struct rfkill *rfkill, bool sw, bool hw)
 			schedule_work(&rfkill->uevent_work);
 
 		rfkill_led_trigger_event(rfkill);
+		rfkill_any_led_trigger_event();
 	}
 }
 EXPORT_SYMBOL(rfkill_set_states);
@@ -988,6 +1046,7 @@ int __must_check rfkill_register(struct rfkill *rfkill)
 #endif
 	}
 
+	__rfkill_any_led_trigger_event();
 	rfkill_send_events(rfkill, RFKILL_OP_ADD);
 
 	mutex_unlock(&rfkill_global_mutex);
@@ -1020,6 +1079,7 @@ void rfkill_unregister(struct rfkill *rfkill)
 	mutex_lock(&rfkill_global_mutex);
 	rfkill_send_events(rfkill, RFKILL_OP_DEL);
 	list_del_init(&rfkill->node);
+	__rfkill_any_led_trigger_event();
 	mutex_unlock(&rfkill_global_mutex);
 
 	rfkill_led_trigger_unregister(rfkill);
@@ -1278,8 +1338,18 @@ static int __init rfkill_init(void)
 		goto error_input;
 #endif
 
+#ifdef CONFIG_RFKILL_LEDS
+	error = rfkill_any_led_trigger_register();
+	if (error)
+		goto error_led_trigger;
+#endif
+
 	return 0;
 
+error_led_trigger:
+#ifdef CONFIG_RFKILL_INPUT
+	rfkill_handler_exit();
+#endif
 error_input:
 	misc_deregister(&rfkill_miscdev);
 error_misc:
@@ -1291,6 +1361,9 @@ subsys_initcall(rfkill_init);
 
 static void __exit rfkill_exit(void)
 {
+#ifdef CONFIG_RFKILL_LEDS
+	rfkill_any_led_trigger_unregister();
+#endif
 #ifdef CONFIG_RFKILL_INPUT
 	rfkill_handler_exit();
 #endif
-- 
2.10.2

* [PATCH 1/2] net: rfkill: Cleanup error handling in rfkill_init()
From: Michał Kępień @ 2016-11-30 12:03 UTC (permalink / raw)
  To: Johannes Berg, David S . Miller; +Cc: linux-wireless, netdev, linux-kernel

Use a separate label per error condition in rfkill_init() to make it a
bit cleaner and easier to extend.
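
For reference, the label-per-error unwind pattern looks like this in a
self-contained userspace sketch. step_init(), step_exit(), fail_at and
live are made-up names used only to make the example testable; they do
not correspond to anything in rfkill.

```c
/* Hypothetical setup/teardown steps. Each setup returns 0 or a negative
 * error; fail_at selects which step fails, for demonstration. */
static int fail_at;
static int live;	/* number of steps currently set up */

static int step_init(int id)
{
	if (id == fail_at)
		return -1;
	live++;
	return 0;
}

static void step_exit(void)
{
	live--;
}

static int subsystem_init(void)
{
	int error;

	error = step_init(1);
	if (error)
		goto error_first;

	error = step_init(2);
	if (error)
		goto error_second;

	error = step_init(3);
	if (error)
		goto error_third;

	return 0;

	/* Labels are named after the step that failed; falling through
	 * undoes the earlier steps in reverse order. */
error_third:
	step_exit();	/* undo step 2 */
error_second:
	step_exit();	/* undo step 1 */
error_first:
	return error;
}
```

Adding a fourth step then only needs one new label and one new teardown
line, which is what makes the pattern easy to extend.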

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
---
 net/rfkill/core.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 884027f..f28e441 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1266,24 +1266,25 @@ static int __init rfkill_init(void)
 
 	error = class_register(&rfkill_class);
 	if (error)
-		goto out;
+		goto error_class;
 
 	error = misc_register(&rfkill_miscdev);
-	if (error) {
-		class_unregister(&rfkill_class);
-		goto out;
-	}
+	if (error)
+		goto error_misc;
 
 #ifdef CONFIG_RFKILL_INPUT
 	error = rfkill_handler_init();
-	if (error) {
-		misc_deregister(&rfkill_miscdev);
-		class_unregister(&rfkill_class);
-		goto out;
-	}
+	if (error)
+		goto error_input;
 #endif
 
- out:
+	return 0;
+
+error_input:
+	misc_deregister(&rfkill_miscdev);
+error_misc:
+	class_unregister(&rfkill_class);
+error_class:
 	return error;
 }
 subsys_initcall(rfkill_init);
-- 
2.10.2

* RE: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Hayes Wang @ 2016-11-30 11:58 UTC (permalink / raw)
  To: David Miller, mlord-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org
  Cc: greg-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org,
	romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, nic_swsd,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161125.115827.2014848246966159357.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Mark Lord <mlord-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>
[...]
> > Not sure why, because there really is no other way for the data to
> > appear where it does at the beginning of that URB buffer.
> >
> > This does seem a rather unexpected burden to place upon someone
> > reporting a regression in a USB network driver that corrupts user data.
> 
> If you are the only person who can actively reproduce this, which
> seems to be the case right now, this is unfortunately the only way to
> reach a proper analysis and fix.

I have tested it with iperf for more than five days without any error.
I am wondering whether there is any other way to reproduce it.

Best Regards,
Hayes

* [PATCH] stmmac: simplify flag assignment
From: Pavel Machek @ 2016-11-30 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161124.110416.198867271899443489.davem@davemloft.net>


Simplify flag assignment.
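
A small userspace sketch of the equivalence, for readers unfamiliar with
the "!!" idiom. FEATURE_TSO is a made-up flag value standing in for
NETIF_F_TSO, and tso_long()/tso_short() are illustrative names.

```c
#include <stdbool.h>

#define FEATURE_TSO 0x10000UL	/* hypothetical flag bit for illustration */

/* Long form: branch on the masked bit. */
static bool tso_long(unsigned long features)
{
	if (features & FEATURE_TSO)
		return true;
	else
		return false;
}

/* Short form: !! collapses any non-zero value to 1. */
static bool tso_short(unsigned long features)
{
	return !!(features & FEATURE_TSO);
}
```

Both forms give the same result for every input. The "!!" also keeps the
result correct if the destination were a narrow integer type rather than
a C99 bool, where assigning the masked value directly could truncate.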
    
Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ed20668..0b706a7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_CSUM_MASK;
 
 	/* Disable tso if asked by ethtool */
-	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
-		if (features & NETIF_F_TSO)
-			priv->tso = true;
-		else
-			priv->tso = false;
-	}
+	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen))
+		priv->tso = !!(features & NETIF_F_TSO);
 
 	return features;
 }


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [WIP] net+mlx4: auto doorbell
From: Jesper Dangaard Brouer @ 2016-11-30 11:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan, brouer,
	Achiad Shochat
In-Reply-To: <1480402716.18162.124.camel@edumazet-glaptop3.roam.corp.google.com>


I've played with a somewhat similar patch (from Achiad Shochat) for
mlx5 (attached).  While it gives huge improvements, the problem I ran
into was that TX performance became a function of the TX completion
time/interrupt and could easily be throttled if configured too
high/slow.

Can your patch be affected by this too?

Adjustable via:
 ethtool -C mlx5p2 tx-usecs 16 tx-frames 32
 

On Mon, 28 Nov 2016 22:58:36 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> I have a WIP, that increases pktgen rate by 75 % on mlx4 when bulking is
> not used.
> 
> lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt 
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      9.50 5707925.60      0.99 585285.69      0.00      0.00      0.50
> lpaa23:~# echo 1 >/sys/class/net/eth0/doorbell_opt 
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      2.40 9985214.90      0.31 1023874.60      0.00      0.00      0.50

This +75% number is from pktgen without "burst", and it definitely shows
that your patch activates xmit_more.
What is the pps performance number when using the pktgen "burst" option?


> And about 11 % improvement on an mono-flow UDP_STREAM test.
> 
> skb_set_owner_w() is now the most consuming function.
> 
> 
> lpaa23:~# ./udpsnd -4 -H 10.246.7.152 -d 2 &
> [1] 13696
> lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      9.00 1307380.50      1.00 308356.18      0.00      0.00      0.50
> lpaa23:~# echo 3 >/sys/class/net/eth0/doorbell_opt
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      3.10 1459558.20      0.44 344267.57      0.00      0.00      0.50

The +11% number seems consistent with my perf observations that approx
12% was "falsely" spent on the xmit spin_lock.


[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 4b597dca5c52..affebb435679 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
[...]
> -static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
> +static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
>  {
> -	return ring->prod - ring->cons > ring->full_size;
> +	return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > ring->full_size;
>  }
[...]
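
As a side note on the ring-full test quoted above, here is a minimal userspace sketch of why the unsigned subtraction stays correct when the producer counter wraps. The struct and function names are hypothetical, and READ_ONCE() is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified ring: prod and cons are free-running 32-bit counters. */
struct example_ring {
	uint32_t prod;      /* advanced by the transmit path */
	uint32_t cons;      /* advanced by the completion path */
	uint32_t full_size; /* occupancy threshold */
};

static bool example_ring_is_full(const struct example_ring *ring)
{
	/* Unsigned subtraction wraps modulo 2^32, so the difference is the
	 * number of in-flight entries even after prod has overflowed.
	 */
	return ring->prod - ring->cons > ring->full_size;
}
```

With prod = 5 and cons = 0xFFFFFFFB the difference is 10, so the ring counts as full for full_size = 8 and as not full for full_size = 16.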

> @@ -1033,6 +1058,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>  	}
>  	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
>  
> +	/* Doorbell avoidance : We can omit doorbell if we know a TX completion
> +	 * will happen shortly.
> +	 */
> +	if (send_doorbell &&
> +	    dev->doorbell_opt &&
> +	    (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0)

It would be nice to have a function call with an appropriate name, instead
of an open-coded queue size check.  I'm also confused by the "ncons" name.

> +		send_doorbell = false;
> +
[...]
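
To make the naming suggestion concrete, a sketch of what such a helper could look like in plain C. The helper name is hypothetical, and it assumes (my reading of the quoted comment, not something the patch states) that a positive signed distance from ncons to prod_bell means a TX completion is still expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when prod_bell is ahead of ncons.
 * The cast to a signed type keeps the comparison valid across
 * 32-bit counter wrap-around, as in the open-coded version.
 */
static bool example_tx_completion_pending(uint32_t prod_bell, uint32_t ncons)
{
	return (int32_t)(prod_bell - ncons) > 0;
}
```

The caller would then read roughly as "if (send_doorbell && dev->doorbell_opt && example_tx_completion_pending(...))", which keeps the intent visible at the call site.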

> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 574bcbb1b38f..c3fd0deda198 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
>  	 */
>  	u32			last_nr_txbb;
>  	u32			cons;
> +	u32			ncons;

Maybe we can find a better name than "ncons" ?

>  	unsigned long		wake_queue;
>  	struct netdev_queue	*tx_queue;
>  	u32			(*free_tx_desc)(struct mlx4_en_priv *priv,
> @@ -290,6 +291,7 @@ struct mlx4_en_tx_ring {
>  
>  	/* cache line used and dirtied in mlx4_en_xmit() */
>  	u32			prod ____cacheline_aligned_in_smp;
> +	u32			prod_bell;

Good descriptive variable name.

>  	unsigned int		tx_dropped;
>  	unsigned long		bytes;
>  	unsigned long		packets;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[-- Attachment #2: net_mlx5e__force_tx_skb_bulking.patch --]
[-- Type: text/x-patch, Size: 5079 bytes --]

From: Tariq Toukan <tariqt@mellanox.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>,
        Achiad Shochat <achiad@mellanox.com>,
        Rana Shahout <ranas@mellanox.com>,
        Saeed Mahameed <saeedm@mellanox.com>
Subject: [PATCH] net/mlx5e: force tx skb bulking
Date: Wed, 17 Aug 2016 12:14:51 +0300
Message-Id: <1471425291-1782-1-git-send-email-tariqt@mellanox.com>

From: Achiad Shochat <achiad@mellanox.com>

Improve the SW message rate in cases where HW is faster.
Heuristically detect cases where the message rate is high but there
is still no skb bulking and, if so, stop the txq for a while to try
to force bulking.

Change-Id: Icb925135e69b030943cb4666117c47d1cc04da97
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 74edd01..78a0661 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -394,6 +394,10 @@ enum {
 	MLX5E_SQ_STATE_TX_TIMEOUT,
 };
 
+enum {
+	MLX5E_SQ_STOP_ONCE,
+};
+
 struct mlx5e_ico_wqe_info {
 	u8  opcode;
 	u8  num_wqebbs;
@@ -403,6 +407,7 @@ struct mlx5e_sq {
 	/* data path */
 
 	/* dirtied @completion */
+	unsigned long              flags;
 	u16                        cc;
 	u32                        dma_fifo_cc;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index e073bf59..034eef0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -351,8 +351,10 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
 		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
-	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
+	if (test_bit(MLX5E_SQ_STOP_ONCE, &sq->flags) ||
+	    unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
 		netif_tx_stop_queue(sq->txq);
+		clear_bit(MLX5E_SQ_STOP_ONCE, &sq->flags);
 		sq->stats.stopped++;
 	}
 
@@ -429,6 +431,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 	u32 dma_fifo_cc;
 	u32 nbytes;
 	u16 npkts;
+	u16 ncqes;
 	u16 sqcc;
 	int i;
 
@@ -439,6 +442,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 
 	npkts = 0;
 	nbytes = 0;
+	ncqes = 0;
 
 	/* sq->cc must be updated only after mlx5_cqwq_update_db_record(),
 	 * otherwise a cq overrun may occur
@@ -458,6 +462,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 			break;
 
 		mlx5_cqwq_pop(&cq->wq);
+		ncqes++;
 
 		wqe_counter = be16_to_cpu(cqe->wqe_counter);
 
@@ -508,6 +513,8 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 
 	sq->dma_fifo_cc = dma_fifo_cc;
 	sq->cc = sqcc;
+	if ((npkts > 7) && ((npkts >> (ilog2(ncqes))) < 8))
+		set_bit(MLX5E_SQ_STOP_ONCE, &sq->flags);
 
 	netdev_tx_completed_queue(sq->txq, npkts, nbytes);
 
-- 
1.8.3.1
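
The stop-once condition at the end of mlx5e_poll_tx_cq() in the patch above can be tried out in userspace with the sketch below. example_ilog2() is a stand-in for the kernel's ilog2(), and the ncqes == 0 guard is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static unsigned int example_ilog2(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Mirrors: (npkts > 7) && ((npkts >> ilog2(ncqes)) < 8) */
static bool example_should_stop_once(uint16_t npkts, uint16_t ncqes)
{
	/* Guard added for the sketch: ilog2(0) is not meaningful. */
	if (npkts <= 7 || ncqes == 0)
		return false;

	/* The shift approximates npkts / ncqes, i.e. packets per CQE. */
	return (npkts >> example_ilog2(ncqes)) < 8;
}
```

For example, 64 packets completed by a single CQE do not trigger the flag, while 63 packets spread over 8 CQEs do, since 63 >> 3 is 7.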


^ permalink raw reply related

* DSA vs. SWITCHDEV?
From: Joakim Tjernlund @ 2016-11-30  8:50 UTC (permalink / raw)
  To: netdev@vger.kernel.org

I am trying to wrap my head around these two "devices" and have a hard time telling them apart.
We are looking at adding a fairly large switch (over PCIe) to our board and from what I can tell
switchdev is the new way to do it but DSA is still there. Is it possible to just list
how they differ?

What can switchdev do that DSA cannot?

What can DSA do that switchdev cannot?

Can one enable switchdev and dsa for the same switch device?

 Jocke 

^ permalink raw reply

* Re: net: GPF in rt6_get_cookie
From: Andrey Konovalov @ 2016-11-30 11:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: syzkaller, David Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
In-Reply-To: <29124960-9002-cfd0-c6b9-8986d7e8c875@stressinduktion.org>

On Wed, Nov 30, 2016 at 12:00 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi
>
> On 30.11.2016 11:39, Andrey Konovalov wrote:
>> On Sat, Nov 26, 2016 at 5:23 PM, 'Dmitry Vyukov' via syzkaller
>> <syzkaller@googlegroups.com> wrote:
>>> Hello,
>>>
>>> I got several GPFs in rt6_get_cookie while running syzkaller:
>>>
>>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Dumping ftrace buffer:
>>>    (ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> task: ffff880016f40480 task.stack: ffff88000fc00000
>>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>>> include/net/ip6_fib.h:174
>>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>> RSP: 0018:ffff88000fc07298  EFLAGS: 00010202
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900029f5000
>>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>>> RBP: ffff88000fc07580 R08: 0000000000000000 R09: 0000000000000001
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066cd0068
>>> R13: 1ffff10001f80e92 R14: ffff880066cd0040 R15: ffff88005f2d2808
>>> FS:  00007f52c41f7700(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000020016000 CR3: 0000000065dd7000 CR4: 00000000000006e0
>>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> Stack:
>>>  ffffffff87a210f6 ffffffff8701ad45 ffff88006768ec20 ffff88006768ec20
>>>  0000000000000000 0000000016f40480 ffff88000fc07450 1ffff1000cd9a017
>>>  ffff88006768ec00 ffff880066fc0730 ffff880066cd0068 1ffff10001f80e66
>>> Call Trace:
>>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>>  [<ffffffff879e8911>] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
>>>  [<ffffffff8701ad45>] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>>>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>>>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>>>  [<ffffffff86a6ede0>] SYSC_sendto+0x660/0x810 net/socket.c:1656
>>>  [<ffffffff86a71dd5>] SyS_sendto+0x45/0x60 net/socket.c:1624
>>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>>  RSP <ffff88000fc07298>
>>> ---[ end trace b8d1354fa571700d ]---
>>>
>>>
>>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Dumping ftrace buffer:
>>>    (ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> task: ffff88006b92a840 task.stack: ffff88006a730000
>>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>>> include/net/ip6_fib.h:174
>>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>> RSP: 0018:ffff88006a736b88  EFLAGS: 00010202
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90003c4f000
>>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>>> RBP: ffff88006a736e68 R08: 0000000000000000 R09: 0000000000000001
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880064cff268
>>> R13: 1ffff1000d4e6db0 R14: ffff880064cff240 R15: ffff88006a4b6808
>>> FS:  00007f74f4ec9700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000000002070effc CR3: 000000003bd2f000 CR4: 00000000000006e0
>>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> Stack:
>>>  ffffffff87a210f6 ffffffff000bbd2d ffff88006c2cd5a0 ffff88006c2cd5a0
>>>  0000000000000000 000000006ccb46c0 ffff88006a736d40 1ffff1000c99fe57
>>>  ffff88006c2cd500 ffff8800658b1f30 ffff880064cff268 1ffff1000d4e6d84
>>> Call Trace:
>>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>>  [<ffffffff879e4358>] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
>>>  [<ffffffff879e4f0b>] __sctp_setsockopt_connectx+0x1ab/0x200
>>> net/sctp/socket.c:1332
>>>  [<     inline     >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
>>>  [<ffffffff879fd2bd>] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
>>>  [<ffffffff86a76c0a>] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
>>>  [<     inline     >] SYSC_getsockopt net/socket.c:1788
>>>  [<ffffffff86a724d7>] SyS_getsockopt+0x257/0x390 net/socket.c:1770
>>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>>  RSP <ffff88006a736b88>
>>> ---[ end trace f42d1c14cb6d2835 ]---
>>>
>>> This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).
>>>
>>> Unfortunately this is not reproducible.
>>>
>>> The line is:
>>>
>>>     return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
>>>
>>> Can it be a data race? rt->rt6i_node != NULL, but the next moment it
>>> is already NULL? That would explain the crash and non-reproducibility
>>> (need ThreadSanitizer!).
>>>
>>> This always happened when called from sctp code, but I don't know if
>>> it is relevant or not. It happened only 3 times.
>>
>> I'm seeing similar crashes from ipv6 and dccp code, reports below.
>>
>> [...]
>
> Thanks for the report.
>
> Do you have a thread running that concurrently mutates the routing table?

Hi Hannes,

We're running a fuzzer which calls random system calls from multiple
processes simultaneously, so it's quite possible.

Thanks!

>
> Bye,
> Hannes
>

^ permalink raw reply

* Re: [PATCH 3/6] net: ethernet: ti: cpts: add support of cpts HW_TS_PUSH
From: Jan Lübbe @ 2016-11-30 11:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, Rob Herring, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-4-grygorii.strashko@ti.com>

On Mo, 2016-11-28 at 17:04 -0600, Grygorii Strashko wrote:
> This patch adds support of the CPTS HW_TS_PUSH events which are generated
> by external low frequency time stamp channels on TI's OMAP CPSW and
> Keystone 2 platforms. It supports up to 8 external time stamp channels for
> HW_TS_PUSH input pins (the number of supported channels is different for
> different SoCs and CPTS versions, check the corresponding Data Manual before
> enabling it). Therefore, new DT property "cpts-ext-ts-inputs" is introduced
> for specifying number of available external timestamp channels.

If this only depends on the SoC and CPTS version, it should be possible to
derive the correct value from the compatible value and possibly a CPTS
version register? If the existing compatible strings are not specific
enough, possibly a new one should be added.
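
A rough userspace sketch of what I mean by deriving the value from the compatible string. The compatible strings and channel counts below are made-up placeholders, not values from any data manual or existing binding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-compatible data; strings and counts are placeholders. */
struct example_cpts_match {
	const char *compatible;
	unsigned int ext_ts_inputs;
};

static const struct example_cpts_match example_matches[] = {
	{ "example,soc-a-cpts", 4 },
	{ "example,soc-b-cpts", 8 },
};

/* Returns the channel count for a compatible string, or -1 if unknown. */
static int example_ext_ts_inputs(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(example_matches) / sizeof(example_matches[0]); i++) {
		if (strcmp(example_matches[i].compatible, compatible) == 0)
			return (int)example_matches[i].ext_ts_inputs;
	}
	return -1;
}
```

In a driver the same idea would normally be carried by the match data attached to each compatible entry, so no extra DT property would be needed.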

Regards,
Jan
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH 4/6] net: ethernet: ti: cpts: add ptp pps support
From: Jan Lübbe @ 2016-11-30 11:01 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, Rob Herring, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-5-grygorii.strashko@ti.com>

On Mo, 2016-11-28 at 17:04 -0600, Grygorii Strashko wrote:
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -127,6 +127,16 @@ Optional properties:
>                 The number of external time stamp channels.
>                 The different CPTS versions might support up 8
>                 external time stamp channels. if absent - unsupported.
> +       - cpts-ts-comp-length:
> +               Enable time stamp comparison event and TS_COMP signal output
> +               generation when CPTS counter reaches a value written to
> +               the TS_COMP_VAL register.
> +               The generated pulse width is 3 refclk cycles if this property
> +               has no value (empty) or, otherwise, it should specify desired
> +               pulse width in number of refclk periods - max value 2^16.
> +               TS_COMP functionality will be disabled if not present.
> +       - cpts-ts-comp-polarity-low:
> +               Set polarity of TS_COMP signal to low. Default is high.

Why is this configured via DT? Are the values fixed for a given board,
depending on external components? Couldn't this be configured somewhere
else?

Regards,
Jan
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: net: GPF in rt6_get_cookie
From: Hannes Frederic Sowa @ 2016-11-30 11:00 UTC (permalink / raw)
  To: Andrey Konovalov, syzkaller
  Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, LKML, Eric Dumazet
In-Reply-To: <CAAeHK+wvAZByn7-fONWYk1P8fXA9wNdkVLGtXfQsdFb-NSdn+g@mail.gmail.com>

Hi

On 30.11.2016 11:39, Andrey Konovalov wrote:
> On Sat, Nov 26, 2016 at 5:23 PM, 'Dmitry Vyukov' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>> Hello,
>>
>> I got several GPFs in rt6_get_cookie while running syzkaller:
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: ffff880016f40480 task.stack: ffff88000fc00000
>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>> include/net/ip6_fib.h:174
>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>> RSP: 0018:ffff88000fc07298  EFLAGS: 00010202
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900029f5000
>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>> RBP: ffff88000fc07580 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066cd0068
>> R13: 1ffff10001f80e92 R14: ffff880066cd0040 R15: ffff88005f2d2808
>> FS:  00007f52c41f7700(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000020016000 CR3: 0000000065dd7000 CR4: 00000000000006e0
>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>>  ffffffff87a210f6 ffffffff8701ad45 ffff88006768ec20 ffff88006768ec20
>>  0000000000000000 0000000016f40480 ffff88000fc07450 1ffff1000cd9a017
>>  ffff88006768ec00 ffff880066fc0730 ffff880066cd0068 1ffff10001f80e66
>> Call Trace:
>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>  [<ffffffff879e8911>] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
>>  [<ffffffff8701ad45>] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>>  [<ffffffff86a6ede0>] SYSC_sendto+0x660/0x810 net/socket.c:1656
>>  [<ffffffff86a71dd5>] SyS_sendto+0x45/0x60 net/socket.c:1624
>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>  RSP <ffff88000fc07298>
>> ---[ end trace b8d1354fa571700d ]---
>>
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: ffff88006b92a840 task.stack: ffff88006a730000
>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>> include/net/ip6_fib.h:174
>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>> RSP: 0018:ffff88006a736b88  EFLAGS: 00010202
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90003c4f000
>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>> RBP: ffff88006a736e68 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880064cff268
>> R13: 1ffff1000d4e6db0 R14: ffff880064cff240 R15: ffff88006a4b6808
>> FS:  00007f74f4ec9700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000002070effc CR3: 000000003bd2f000 CR4: 00000000000006e0
>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>>  ffffffff87a210f6 ffffffff000bbd2d ffff88006c2cd5a0 ffff88006c2cd5a0
>>  0000000000000000 000000006ccb46c0 ffff88006a736d40 1ffff1000c99fe57
>>  ffff88006c2cd500 ffff8800658b1f30 ffff880064cff268 1ffff1000d4e6d84
>> Call Trace:
>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>  [<ffffffff879e4358>] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
>>  [<ffffffff879e4f0b>] __sctp_setsockopt_connectx+0x1ab/0x200
>> net/sctp/socket.c:1332
>>  [<     inline     >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
>>  [<ffffffff879fd2bd>] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
>>  [<ffffffff86a76c0a>] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
>>  [<     inline     >] SYSC_getsockopt net/socket.c:1788
>>  [<ffffffff86a724d7>] SyS_getsockopt+0x257/0x390 net/socket.c:1770
>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>  RSP <ffff88006a736b88>
>> ---[ end trace f42d1c14cb6d2835 ]---
>>
>> This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).
>>
>> Unfortunately this is not reproducible.
>>
>> The line is:
>>
>>     return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
>>
>> Can it be a data race? rt->rt6i_node != NULL, but the next moment it
>> is already NULL? That would explain the crash and non-reproducibility
>> (need ThreadSanitizer!).
>>
>> This always happened when called from sctp code, but I don't know if
>> it is relevant or not. It happened only 3 times.
> 
> I'm seeing similar crashes from ipv6 and dccp code, reports below.
> 
> [...]

Thanks for the report.

Do you have a thread running that concurrently mutates the routing table?
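
To illustrate the double-read pattern in the quoted line, a userspace sketch using C11 atomics. This only models the race hypothesis from the report; it is not a proposed or confirmed fix, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct example_node {
	uint32_t sernum;
};

struct example_route {
	_Atomic(struct example_node *) node;
};

/* Load the pointer once and use only the local copy, so the NULL test
 * and the dereference cannot observe two different values.
 */
static uint32_t example_get_cookie(struct example_route *rt)
{
	struct example_node *node =
		atomic_load_explicit(&rt->node, memory_order_acquire);

	return node ? node->sernum : 0;
}
```

A single load only removes the check-then-reread window. Keeping the node alive while it is dereferenced is a separate problem that the sketch does not address.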

Bye,
Hannes

^ permalink raw reply

* net/ipv6: null-ptr-deref in ip6_rt_cache_alloc
From: Andrey Konovalov @ 2016-11-30 10:58 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Eric Dumazet, Kostya Serebryany, syzkaller

Hi!

I've got the following error report while running the syzkaller fuzzer.

On commit d8e435f3ab6fea2ea324dce72b51dd7761747523 (Nov 26).

This might be related to the crash in rt6_get_cookie that Dmitry
reported, since it also happens when accessing ort->dst:
https://groups.google.com/forum/#!msg/syzkaller/3uDn6P5bwzA/gdzgPxeYAgAJ

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 5315 Comm: syz-executor6 Not tainted 4.9.0-rc6+ #468
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003b729700 task.stack: ffff880038be8000
RIP: 0010:[<ffffffff83442c35>]  [<ffffffff83442c35>]
ip6_rt_cache_alloc+0xa5/0x580 net/ipv6/route.c:953
RSP: 0018:ffff880038bef168  EFLAGS: 00010206
RAX: ffff88003b729700 RBX: 0000000000000007 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc90001aa7000 RDI: 0000000000000018
RBP: ffff880038bef198 R08: 0000000000004000 R09: 0000000000000003
R10: dffffc0000000000 R11: dffffc0000000000 R12: 0000000000000000
R13: ffff880038befa60 R14: 0000000000000000 R15: ffff880069ee1a40
FS:  00007fedfbb9f700(0000) GS:ffff88006e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000003109cb8 CR3: 000000006c633000 CR4: 00000000000006e0
Stack:
 ffffffff8125141d ffff880069ee1a40 00000000fffd635a 1ffffffff0981200
 0000000000000000 ffff880069ee1a40 ffff880038bef310 ffffffff8344f233
 ffff880038befa60 1ffff1000717de49 ffff880038befa4f ffffffff850a0a68
Call Trace:
 [<ffffffff8344f233>] ip6_pol_route+0x13c3/0x1b20 net/ipv6/route.c:1106
 [<ffffffff8344fa4d>] ip6_pol_route_output+0x4d/0x60 net/ipv6/route.c:1190
 [<ffffffff834f606d>] fib6_rule_action+0x23d/0x740 net/ipv6/fib6_rules.c:100
 [<ffffffff82d82c36>] fib_rules_lookup+0x2b6/0x850 net/core/fib_rules.c:227
 [<ffffffff834f6b46>] fib6_rule_lookup+0xd6/0x260 net/ipv6/fib6_rules.c:44
 [<ffffffff83443426>] ip6_route_output_flags+0x276/0x310 net/ipv6/route.c:1218
 [<ffffffff83408f8d>] ip6_dst_lookup_tail+0xf9d/0x1410 net/ipv6/ip6_output.c:965
 [<ffffffff83409501>] ip6_dst_lookup_flow+0xa1/0x200 net/ipv6/ip6_output.c:1061
 [<ffffffff83488a3c>] rawv6_sendmsg+0xc0c/0x2c20 net/ipv6/raw.c:893
 [<ffffffff832a1037>] inet_sendmsg+0x317/0x4e0 net/ipv4/af_inet.c:734
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff82c9d76c>] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [<ffffffff82c9f651>] ___sys_sendmsg+0x771/0x8b0 net/socket.c:1954
 [<ffffffff82ca163e>] __sys_sendmsg+0xce/0x170 net/socket.c:1988
 [<     inline     >] SYSC_sendmsg net/socket.c:1999
 [<ffffffff82ca170d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff840f2d81>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Code: 42 80 3c 06 00 0f 85 54 04 00 00 4d 8b 64 24 40 e8 11 11 01 fe
49 8d 7c 24 18 49 ba 00 00 00 00 00 fc ff df 49 89 f9 49 c1 e9 03 <43>
80 3c 11 00 0f 85 77 04 00 00 49 8b 74 24 18 49 bf 00 00 00
RIP  [<ffffffff83442c35>] ip6_rt_cache_alloc+0xa5/0x580 net/ipv6/route.c:953
 RSP <ffff880038bef168>
---[ end trace fefbac32da74ad88 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-30 10:43 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev, brouer
In-Reply-To: <20730b37-d218-a1bb-d0fb-0f838e2a77b5@hpe.com>


On Mon, 28 Nov 2016 10:33:49 -0800 Rick Jones <rick.jones2@hpe.com> wrote:

> On 11/17/2016 12:16 AM, Jesper Dangaard Brouer wrote:
> >> time to try IP_MTU_DISCOVER ;)  
> >
> > To Rick, maybe you can find a good solution or option with Eric's hint,
> > to send appropriate sized UDP packets with Don't Fragment (DF).  
> 
> Jesper -
> 
> Top of trunk has a change adding an omni, test-specific -f option which 
> will set IP_MTU_DISCOVER:IP_PMTUDISC_DO on the data socket.  Is that 
> sufficient to your needs?

The "-- -f" option makes the __ip_select_ident lookup go away.  So,
confirming your new option works.

Notice the "fib_lookup" cost is still present, even when I use
option "-- -n -N" to create a connected socket.  As Eric taught us,
this is because we should use syscalls "send" or "write" on a connected
socket.
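
For reference, a minimal sketch of the connected-socket pattern: connect() once, then send() without a per-call destination address. The function name is hypothetical and error handling is kept to a minimum; this is not code from udp_flood or netperf.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one datagram with send() on a connected UDP socket.
 * Returns the number of bytes sent, or -1 on error.
 */
static ssize_t example_send_connected(const char *ip, uint16_t port,
				      const void *buf, size_t len)
{
	struct sockaddr_in dst;
	ssize_t sent;
	int fd;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* connect() fixes the destination for this socket once ... */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}

	/* ... so send() carries no address, unlike sendto()/sendmsg(). */
	sent = send(fd, buf, len, 0);
	close(fd);
	return sent;
}
```

A real benchmark would of course keep the socket open and call send() in a loop rather than reconnecting per packet.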

My udp_flood tool[1] cycles through the different syscalls:

taskset -c 2 ~/git/network-testing/src/udp_flood 198.18.50.1 --count $((10**7)) --pmtu 2
             	ns/pkt	pps		cycles/pkt
send      	473.08	2113816.28	1891
sendto    	558.58	1790265.84	2233
sendmsg   	587.24	1702873.80	2348
sendMmsg/32  	547.57	1826265.90	2189
write     	518.36	1929175.52	2072

Using "send" seems to be the fastest option.
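
The three columns in the table are related by simple arithmetic, sketched below. The function names are hypothetical. The clock frequency is not stated anywhere in this mail; a value of about 4 GHz is only inferred from the ratio of the cycles/pkt and ns/pkt columns (for example 1891 / 473.08).

```c
#include <assert.h>

/* Packets per second from nanoseconds per packet. */
static double example_pps_from_ns(double ns_per_pkt)
{
	return 1e9 / ns_per_pkt;
}

/* CPU cycles per packet, given a clock frequency in GHz (cycles per ns). */
static double example_cycles_per_pkt(double ns_per_pkt, double cpu_ghz)
{
	return ns_per_pkt * cpu_ghz;
}
```

For instance, 500 ns/pkt corresponds to 2,000,000 pps, and at an assumed 4 GHz to 2000 cycles per packet.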

Some notes on the test: I've forced TX completions to happen on another CPU
(CPU0) and pinned the udp_flood program (to CPU2), as I want to avoid the CPU
scheduler moving udp_flood around, which causes fluctuations in the
results (as it stresses the memory allocations more).

My udp_flood --pmtu option is documented in the --help usage text (see below signature)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

$ uname -a
Linux canyon 4.9.0-rc6-page_pool07-baseline+ #185 SMP PREEMPT Wed Nov 30 10:07:51 CET 2016 x86_64

[1] udp_flood tool:
 https://github.com/netoptimizer/network-testing/blob/master/src/udp_flood.c

Quick commands used to verify that __ip_select_ident is gone:

 # First run benchmark
 sudo perf record -g -a ~/tools/netperf2-svn/src/netperf -H 198.18.50.1 \
  -t UDP_STREAM -l 3 -- -m 1472 -f

 # Second grep in perf output for functions
 sudo perf report --no-children --call-graph none --stdio |\
  egrep -e '__ip_select_ident|fib_table_lookup'


$ ./udp_flood --help

DOCUMENTATION:
 This tool is a UDP flood that measures the outgoing packet rate.
 Default cycles through tests with different send system calls.
 What function-call to invoke can also be specified as a command
 line option (see below).

 Default transmit 1000000 packets per test, adjust via --count

 Usage: ./udp_flood (options-see-below) IPADDR
 Listing options:
 --help         short-option: -h
 --ipv4         short-option: -4
 --ipv6         short-option: -6
 --sendmsg      short-option: -u
 --sendmmsg     short-option: -U
 --sendto       short-option: -t
 --write        short-option: -T
 --send         short-option: -S
 --batch        short-option: -b
 --count        short-option: -c
 --port         short-option: -p
 --payload      short-option: -m
 --pmtu         short-option: -d
 --verbose      short-option: -v

 Multiple tests can be selected:
     default: all tests
     -u -U -t -T -S: run any combination of sendmsg/sendmmsg/sendto/write/send

Option --pmtu <N>  for Path MTU discover socket option IP_MTU_DISCOVER
 This affects the DF(Don't-Fragment) bit setting.
 Following values are selectable:
  0 = IP_PMTUDISC_DONT
  1 = IP_PMTUDISC_WANT
  2 = IP_PMTUDISC_DO
  3 = IP_PMTUDISC_PROBE
  4 = IP_PMTUDISC_INTERFACE
  5 = IP_PMTUDISC_OMIT
 Documentation see under IP_MTU_DISCOVER in 'man 7 ip'
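
As a rough sketch of how a numeric --pmtu value could be applied to a
socket: the value-to-name table below is copied from the help text, while
the helper names pmtu_name() and set_pmtu_mode() are assumptions made for
this example and are not taken from udp_flood.c.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Names for --pmtu values 0..5, in the order listed in the help text. */
static const char *const pmtu_names[] = {
	"IP_PMTUDISC_DONT",
	"IP_PMTUDISC_WANT",
	"IP_PMTUDISC_DO",
	"IP_PMTUDISC_PROBE",
	"IP_PMTUDISC_INTERFACE",
	"IP_PMTUDISC_OMIT",
};

/* Hypothetical helper: return the option name, or NULL if out of range. */
const char *pmtu_name(int n)
{
	if (n < 0 || n >= (int)(sizeof(pmtu_names) / sizeof(pmtu_names[0])))
		return NULL;
	return pmtu_names[n];
}

/* Hypothetical helper: apply the numeric mode to an IPv4 socket. */
int set_pmtu_mode(int fd, int mode)
{
	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
			  &mode, sizeof(mode));
}
```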

^ permalink raw reply

* Re: net: GPF in rt6_get_cookie
From: Andrey Konovalov @ 2016-11-30 10:39 UTC (permalink / raw)
  To: syzkaller
  Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, LKML, Eric Dumazet
In-Reply-To: <CACT4Y+Y-B_GCDGpcLHt1CtXs3u9MBFD82MSFWTcL_v4Vi3+=HQ@mail.gmail.com>

On Sat, Nov 26, 2016 at 5:23 PM, 'Dmitry Vyukov' via syzkaller
<syzkaller@googlegroups.com> wrote:
> Hello,
>
> I got several GPFs in rt6_get_cookie while running syzkaller:
>
> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff880016f40480 task.stack: ffff88000fc00000
> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
> include/net/ip6_fib.h:174
> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
> RSP: 0018:ffff88000fc07298  EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900029f5000
> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
> RBP: ffff88000fc07580 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066cd0068
> R13: 1ffff10001f80e92 R14: ffff880066cd0040 R15: ffff88005f2d2808
> FS:  00007f52c41f7700(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020016000 CR3: 0000000065dd7000 CR4: 00000000000006e0
> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff87a210f6 ffffffff8701ad45 ffff88006768ec20 ffff88006768ec20
>  0000000000000000 0000000016f40480 ffff88000fc07450 1ffff1000cd9a017
>  ffff88006768ec00 ffff880066fc0730 ffff880066cd0068 1ffff10001f80e66
> Call Trace:
>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>  [<ffffffff879e8911>] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
>  [<ffffffff8701ad45>] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [<ffffffff86a6ede0>] SYSC_sendto+0x660/0x810 net/socket.c:1656
>  [<ffffffff86a71dd5>] SyS_sendto+0x45/0x60 net/socket.c:1624
>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>  RSP <ffff88000fc07298>
> ---[ end trace b8d1354fa571700d ]---
>
>
> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88006b92a840 task.stack: ffff88006a730000
> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
> include/net/ip6_fib.h:174
> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
> RSP: 0018:ffff88006a736b88  EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90003c4f000
> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
> RBP: ffff88006a736e68 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880064cff268
> R13: 1ffff1000d4e6db0 R14: ffff880064cff240 R15: ffff88006a4b6808
> FS:  00007f74f4ec9700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000002070effc CR3: 000000003bd2f000 CR4: 00000000000006e0
> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff87a210f6 ffffffff000bbd2d ffff88006c2cd5a0 ffff88006c2cd5a0
>  0000000000000000 000000006ccb46c0 ffff88006a736d40 1ffff1000c99fe57
>  ffff88006c2cd500 ffff8800658b1f30 ffff880064cff268 1ffff1000d4e6d84
> Call Trace:
>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>  [<ffffffff879e4358>] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
>  [<ffffffff879e4f0b>] __sctp_setsockopt_connectx+0x1ab/0x200
> net/sctp/socket.c:1332
>  [<     inline     >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
>  [<ffffffff879fd2bd>] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
>  [<ffffffff86a76c0a>] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
>  [<     inline     >] SYSC_getsockopt net/socket.c:1788
>  [<ffffffff86a724d7>] SyS_getsockopt+0x257/0x390 net/socket.c:1770
>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>  RSP <ffff88006a736b88>
> ---[ end trace f42d1c14cb6d2835 ]---
>
> This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).
>
> Unfortunately this is not reproducible.
>
> The line is:
>
>     return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
>
> Can it be a data race? rt->rt6i_node != NULL, but the next moment it
> is already NULL? That would explain the crash and non-reproducibility
> (need ThreadSanitizer!).
>
> This always happened when called from sctp code, but I don't know if
> it is relevant or not. It happened only 3 times.
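
One way to read the register dumps, offered as an assumption rather than
a diagnosis: in the first report RDI is 0xa8, RDX is 0x15 (0xa8 >> 3),
RAX holds dffffc0000000000 and RBX is 0, which looks like a KASAN shadow
check for an access at offset 0xa8 from a NULL base pointer.  The sketch
below reproduces that address arithmetic in userspace; the shadow offset
is the value seen in RAX, and the helper names are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shadow offset as it appears in the register dump (assumption: this
 * is the x86_64 KASAN shadow offset used by the reported kernel). */
#define KASAN_SHADOW_OFFSET	0xdffffc0000000000ULL
#define KASAN_SHADOW_SHIFT	3

/* Shadow byte address that a KASAN check would read for addr. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* True if addr is canonical with 48-bit virtual addresses, i.e. bits
 * 63..47 are all zero or all one. */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Under these assumptions the shadow address for 0xa8 is not canonical, so
reading it would raise a general protection fault instead of a page
fault, which matches the "general protection fault" line in the reports.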

I'm seeing similar crashes from ipv6 and dccp code, reports below.

===

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 30320 Comm: syz-executor0 Not tainted 4.9.0-rc6+ #462
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069c10040 task.stack: ffff880069f20000
RIP: 0010:[<ffffffff839eec82>]  [<     inline     >] rt6_get_cookie
include/net/ip6_fib.h:174
RIP: 0010:[<ffffffff839eec82>]  [<     inline     >] ip6_dst_store
include/net/ip6_route.h:174
RIP: 0010:[<ffffffff839eec82>]  [<ffffffff839eec82>]
dccp_v6_connect+0x762/0x14e0 net/dccp/ipv6.c:899
RSP: 0018:ffff880069f27ab0  EFLAGS: 00010202
RAX: ffff880069c10040 RBX: ffff88003b5b0040 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: ffffc90000e6c000 RDI: 00000000000000a8
RBP: ffff880069f27c08 R08: 0000000000000015 R09: ffffffff839eec65
R10: 1ffff1000d2d78e0 R11: dffffc0000000000 R12: ffff880069f27e00
R13: ffff8800696bc6c0 R14: ffff88003b5b08e8 R15: ffff88003b5b08e8
FS:  00007f04712ef700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004ad010 CR3: 0000000037c6a000 CR4: 00000000000006e0
Stack:
 ffff880069f27c58 ffff88003b5b04a0 ffff88003b5b0088 ffff88003b5b0078
 ffff880069f27e02 ffff88003b5b0052 00000000ffffffff 0000000000000000
 1ffff1000d3e4f60 0000000000000000 0000000041b58ab3 ffffffff84a3599d
Call Trace:
 [<ffffffff8329b087>] __inet_stream_connect+0x2a7/0xb30 net/ipv4/af_inet.c:594
 [<ffffffff8329b965>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:655
 [<ffffffff82c9e5f4>] SYSC_connect+0x244/0x2f0 net/socket.c:1548
 [<ffffffff82ca0de4>] SyS_connect+0x24/0x30 net/socket.c:1529
 [<ffffffff81006485>] do_syscall_64+0x195/0x490 arch/x86/entry/common.c:280
 [<ffffffff840f2d09>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 49 8b 7d 40 48 89 7c 24 38 e8 cb 50 a6 fd 48 8b 4c 24 38 48 ba
00 00 00 00 00 fc ff df 48 8d b9 a8 00 00 00 49 89 f8 49 c1 e8 03 <41>
80 3c 10 00 0f 85 52 0d 00 00 48 8b 81 a8 00 00 00 48 85 c0
RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
RIP  [<     inline     >] ip6_dst_store include/net/ip6_route.h:174
RIP  [<ffffffff839eec82>] dccp_v6_connect+0x762/0x14e0 net/dccp/ipv6.c:899
 RSP <ffff880069f27ab0>
---[ end trace e7d9d916f3bf26c5 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

===

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 21865 Comm: syz-executor0 Not tainted 4.9.0-rc6+ #462
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003bacc480 task.stack: ffff88003a038000
RIP: 0010:[<ffffffff834c9c7f>]  [<     inline     >] rt6_get_cookie
include/net/ip6_fib.h:174
RIP: 0010:[<ffffffff834c9c7f>]  [<     inline     >] ip6_dst_store
include/net/ip6_route.h:174
RIP: 0010:[<ffffffff834c9c7f>]  [<ffffffff834c9c7f>]
ip6_datagram_dst_update+0x75f/0xe70 net/ipv6/datagram.c:108
RSP: 0018:ffff88003a03fb48  EFLAGS: 00010202
RAX: ffff88003bacc480 RBX: ffff880068e887c0 RCX: 0000000000000001
RDX: 0000000000000015 RSI: ffffc90000de4000 RDI: 00000000000000a8
RBP: ffff88003a03fc88 R08: 0000000000004000 R09: ffffffff834c9c67
R10: dffffc0000000000 R11: dffffc0000000000 R12: ffff880068e88cf0
R13: ffff88006ba75a40 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f1d4c9f3700(0000) GS:ffff88006e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004ad010 CR3: 000000006cde5000 CR4: 00000000000006e0
Stack:
 ffffffff834c997b 0000000041b58ab3 ffffffff849dee58 1ffff10007407f70
 ffff880068e887f8 ffff880068e887f8 ffff880068e88cf0 0000000041b58ab3
 ffffffff84a35071 ffffffff834c9520 0000000000000007 ffff88003bacc480
Call Trace:
 [<     inline     >] __ip6_datagram_connect net/ipv6/datagram.c:246
 [<ffffffff834cab95>] ip6_datagram_connect+0x375/0xcc0 net/ipv6/datagram.c:261
 [<ffffffff834cb53f>] ip6_datagram_connect_v6_only+0x5f/0x80
net/ipv6/datagram.c:273
 [<ffffffff8329a3ab>] inet_dgram_connect+0x11b/0x200 net/ipv4/af_inet.c:530
 [<ffffffff82c9e5f4>] SYSC_connect+0x244/0x2f0 net/socket.c:1548
 [<ffffffff82ca0de4>] SyS_connect+0x24/0x30 net/socket.c:1529
 [<ffffffff840f2c41>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Code: 80 3c 08 00 0f 85 96 05 00 00 4d 8b 7d 40 e8 c9 a0 f8 fd 49 8d
bf a8 00 00 00 49 ba 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <42>
80 3c 12 00 0f 85 74 05 00 00 4d 8b bf a8 00 00 00 4d 85 ff
RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
RIP  [<     inline     >] ip6_dst_store include/net/ip6_route.h:174
RIP  [<ffffffff834c9c7f>] ip6_datagram_dst_update+0x75f/0xe70
net/ipv6/datagram.c:108
 RSP <ffff88003a03fb48>
---[ end trace 148fc8ac80034c6f ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

===

>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: Pavel Machek @ 2016-11-30 10:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, lsanfil, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <1480347103.18162.58.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon 2016-11-28 07:31:43, Eric Dumazet wrote:
> On Mon, 2016-11-28 at 09:54 -0500, David Miller wrote:
> > From: Lino Sanfilippo <lsanfil@marvell.com>
> > Date: Mon, 28 Nov 2016 14:07:51 +0100
> > 
> > > Calling skb_orphan() in the xmit handler made this issue disappear.
> > 
> > This is not the way to handle this problem.
> > 
> > The solution is to free the SKBs in a timely manner after the
> > chip has transmitted the frame.
> 
> Note that the 'pauses' described by Pavel are also caused by a too small
> SO_SNDBUF value on the UDP socket.
> 
> An immediate fix, with no kernel change is to increase it.
> 
> echo 1000000 >/proc/sys/net/core/wmem_default

Thanks a lot. For the record, that works around the problem, too. (Or
at least it helps a lot; if I read the sources correctly, the problem
may still remain if a continuous stream of packets triggers it.)
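
The same workaround can also be applied per socket instead of system-wide.
A minimal sketch, with helper names invented for this example: note that
Linux doubles the requested value and caps it at
/proc/sys/net/core/wmem_max, so the value read back usually differs from
the one requested.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of the given size. */
int set_sndbuf(int fd, int bytes)
{
	return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}

/* Hypothetical helper: read back the effective send buffer size,
 * or -1 on error. */
int get_sndbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
		return -1;
	return val;
}
```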

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply

* Re: [PATCH net] tipc: check minimum bearer MTU
From: Ying Xue @ 2016-11-30 10:28 UTC (permalink / raw)
  To: Michal Kubecek, Jon Maloy
  Cc: Qian, netdev, Zhang, linux-kernel, tipc-discussion, Ben Hutchings,
	David S. Miller
In-Reply-To: <20161130095702.DD033A0F14@unicorn.suse.cz>

On 11/30/2016 05:57 PM, Michal Kubecek wrote:
> Qian Zhang (张谦) reported a potential socket buffer overflow in
> tipc_msg_build() which is also known as CVE-2016-8632: due to
> insufficient checks, a buffer overflow can occur if MTU is too short for
> even tipc headers. As anyone can set device MTU in a user/net namespace,
> this issue can be abused by a regular user.
>
> As agreed in the discussion on Ben Hutchings' original patch, we should
> check the MTU at the moment a bearer is attached rather than for each
> processed packet. We also need to repeat the check when bearer MTU is
> adjusted to new device MTU. UDP case also needs a check to avoid
> overflow when calculating bearer MTU.
>
> Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
> ---
>  net/tipc/bearer.c    |  9 +++++++--
>  net/tipc/bearer.h    | 13 +++++++++++++
>  net/tipc/udp_media.c |  5 +++++
>  3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 975dbeb60ab0..dd4b19e8bb43 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -421,6 +421,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
>  	dev = dev_get_by_name(net, driver_name);
>  	if (!dev)
>  		return -ENODEV;
> +	if (tipc_check_mtu(dev, 0)) {
> +		dev_put(dev);
> +		return -EINVAL;
> +	}
>
>  	/* Associate TIPC bearer with L2 bearer */
>  	rcu_assign_pointer(b->media_ptr, dev);
> @@ -610,8 +614,6 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
>  	if (!b)
>  		return NOTIFY_DONE;
>
> -	b->mtu = dev->mtu;
> -
>  	switch (evt) {
>  	case NETDEV_CHANGE:
>  		if (netif_carrier_ok(dev))
> @@ -624,6 +626,9 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
>  		tipc_reset_bearer(net, b);
>  		break;
>  	case NETDEV_CHANGEMTU:
> +		if (tipc_check_mtu(dev, 0))
> +			return -EINVAL;
> +		b->mtu = dev->mtu;
>  		tipc_reset_bearer(net, b);
>  		break;
>  	case NETDEV_CHANGEADDR:
> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
> index 78892e2f53e3..1a0b7434ec24 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -39,6 +39,7 @@
>
>  #include "netlink.h"
>  #include "core.h"
> +#include "msg.h"
>  #include <net/genetlink.h>
>
>  #define MAX_MEDIA	3
> @@ -59,6 +60,9 @@
>  #define TIPC_MEDIA_TYPE_IB	2
>  #define TIPC_MEDIA_TYPE_UDP	3
>
> +/* minimum bearer MTU */
> +#define TIPC_MIN_BEARER_MTU	(MAX_H_SIZE + INT_H_SIZE)
> +
>  /**
>   * struct tipc_media_addr - destination address used by TIPC bearers
>   * @value: address info (format defined by media)
> @@ -215,4 +219,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
>  void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
>  			 struct sk_buff_head *xmitq);
>
> +/* check if device MTU is sufficient for tipc headers */
> +inline bool tipc_check_mtu(struct net_device *dev, unsigned int reserve)

It's unnecessary to explicitly declare a function as inline; please
let GCC decide this instead.

> +{
> +	if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve)
> +		return false;
> +	netdev_warn(dev, "MTU too low for tipc bearer\n");
> +	return true;
> +}
> +
>  #endif	/* _TIPC_BEARER_H */
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index 78cab9c5a445..376ed3e3ed46 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -697,6 +697,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
>  		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
>  		udp_conf.use_udp_checksums = false;
>  		ub->ifindex = dev->ifindex;
> +		if (tipc_check_mtu(dev, sizeof(struct iphdr) +
> +					sizeof(struct udphdr))) {
> +			err = -EINVAL;
> +			goto err;
> +		}

For a UDP bearer, it seems insufficient to check the MTU size only when
the bearer is enabled. We should also update the UDP bearer's MTU via
Path MTU discovery whenever the MTU changes after the bearer is
enabled.

Regards,
Ying

>  		b->mtu = dev->mtu - sizeof(struct iphdr)
>  			- sizeof(struct udphdr);
>  #if IS_ENABLED(CONFIG_IPV6)
>
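
To illustrate why the UDP case needs a check before the subtraction, here
is a userspace sketch of the arithmetic.  It is not the patch's code: the
names are invented, the header sizes are the fixed IPv4 (20) and UDP (8)
header lengths, and the minimum bearer MTU of 100 is an assumed stand-in
for MAX_H_SIZE + INT_H_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_HDR_LEN	20u	/* sizeof(struct iphdr) */
#define UDP_HDR_LEN	8u	/* sizeof(struct udphdr) */
#define MIN_BEARER_MTU	100u	/* assumed value, for illustration only */

/* True if the device MTU cannot hold the minimum bearer MTU plus the
 * reserved encapsulation overhead (same sense as tipc_check_mtu()). */
bool mtu_too_low(uint32_t dev_mtu, uint32_t reserve)
{
	return dev_mtu < MIN_BEARER_MTU + reserve;
}

/* Bearer MTU computed without any check; wraps around when the device
 * MTU is smaller than the IPv4 + UDP headers. */
uint32_t udp_bearer_mtu_unchecked(uint32_t dev_mtu)
{
	return dev_mtu - IPV4_HDR_LEN - UDP_HDR_LEN;
}
```

With a device MTU of 20, the unchecked subtraction wraps to a value close
to 2^32, which is the kind of bogus bearer MTU the added check rejects.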


^ permalink raw reply

* [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: Mao Wenan @ 2016-11-30 10:22 UTC (permalink / raw)
  To: netdev, f.fainelli, dingtianhong

The NIC on my board uses a PHY device from Marvell, and the system loads
the marvell PHY driver automatically. But when I remove the PHY driver,
the system panics immediately:
Call trace:
[ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438
[ 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428
[ 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0
[ 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110

There should be proper reference counting in place to avoid that.
I found that phy_attach_direct() forgets to take a reference on the PHY
device driver module, and phy_detach() forgets to drop it.
This patch fixes the bug; with it applied, the panic no longer occurs
when removing marvell.ko.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 drivers/net/phy/phy_device.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1a4bf8a..a7ec7c2 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 		return -EIO;
 	}
 
+	if (!try_module_get(d->driver->owner)) {
+		dev_err(&dev->dev, "failed to get the device driver module\n");
+		return -EIO;
+	}
+
 	get_device(d);
 
 	/* Assume that if there is no driver, that it doesn't
@@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 error:
 	put_device(d);
+	module_put(d->driver->owner);
 	module_put(bus->owner);
 	return err;
 }
@@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
 	bus = phydev->mdio.bus;
 
 	put_device(&phydev->mdio.dev);
+	module_put(phydev->mdio.dev.driver->owner);
 	module_put(bus->owner);
 }
 EXPORT_SYMBOL(phy_detach);
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net] tipc: check minimum bearer MTU
From: Michal Kubecek @ 2016-11-30 10:24 UTC (permalink / raw)
  To: Jon Maloy, Ying Xue
  Cc: David S. Miller, tipc-discussion, netdev, linux-kernel,
	Ben Hutchings, Qian Zhang
In-Reply-To: <20161130095702.DD033A0F14@unicorn.suse.cz>

On Wed, Nov 30, 2016 at 10:57:02AM +0100, Michal Kubecek wrote:
> Qian Zhang (张谦) reported a potential socket buffer overflow in
> tipc_msg_build() which is also known as CVE-2016-8632: due to
> insufficient checks, a buffer overflow can occur if MTU is too short for
> even tipc headers. As anyone can set device MTU in a user/net namespace,
> this issue can be abused by a regular user.
> 
> As agreed in the discussion on Ben Hutchings' original patch, we should
> check the MTU at the moment a bearer is attached rather than for each
> processed packet. We also need to repeat the check when bearer MTU is
> adjusted to new device MTU. UDP case also needs a check to avoid
> overflow when calculating bearer MTU.
> 
> Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>

Self-NACK.

I'm sorry, while testing this, I overlooked that an attempt to change
the MTU of an underlying device to a low value issues a warning but
succeeds anyway.

> @@ -624,6 +626,9 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
>  		tipc_reset_bearer(net, b);
>  		break;
>  	case NETDEV_CHANGEMTU:
> +		if (tipc_check_mtu(dev, 0))
> +			return -EINVAL;
> +		b->mtu = dev->mtu;
>  		tipc_reset_bearer(net, b);
>  		break;
>  	case NETDEV_CHANGEADDR:

This is a notifier, so the error value needs to be encoded as a notifier
error. I'll send v2 after retesting.

Michal Kubecek
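
The encoding issue can be illustrated with userspace copies of the
notifier helpers.  The two functions below are modeled on
notifier_from_errno() and notifier_to_errno() from
include/linux/notifier.h; they are reproduced here as a sketch for
illustration and say nothing about what v2 of the patch will contain.

```c
#include <errno.h>

#define NOTIFY_DONE		0x0000	/* don't care */
#define NOTIFY_OK		0x0001	/* suits me */
#define NOTIFY_STOP_MASK	0x8000	/* don't call further */

/* Encode a negative errno as a notifier return value. */
int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back to 0 or a negative errno. */
int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

In this model, a raw -EINVAL passed through notifier_to_errno() decodes
to 0, which would explain why the MTU change still succeeded, whereas
notifier_from_errno(-EINVAL) round-trips back to -EINVAL.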

^ permalink raw reply
