Netdev List
* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Michael S. Tsirkin @ 2016-11-30 13:40 UTC
  To: Yunjian Wang; +Cc: jasowang, netdev, linux-kernel, caihe
In-Reply-To: <1480507857-22976-1-git-send-email-wangyunjian@huawei.com>

On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> When recvmsg() returns an error (err=-EBADFD),

How do you get EBADFD? Won't vhost_net_rx_peek_head_len
return 0 in this case, breaking the loop?

> the error handling in vhost
> handle_rx() continues the loop. This causes a soft CPU lockup in the vhost thread.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>  			pr_debug("Discarded rx packet: "
>  				 " len %d, expected %zd\n", err, sock_len);
>  			vhost_discard_vq_desc(vq, headcount);
> +			/* Stop processing the rx queue when recvmsg() fails. */
> +			if (err < 0)
> +				goto out;

You might get e.g. EAGAIN and I think you need to retry
in this case.

>  			continue;
>  		}
>  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> -- 
> 1.9.5.msysgit.1
> 

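Michael's objection is that recvmsg() failures are not all alike. A minimal userspace sketch of the policy he suggests, under two assumptions (only -EAGAIN is treated as transient, and a scripted array of return values stands in for the socket): transient errors are retried, while any other error leaves the loop, as the patch's "goto out" does, instead of spinning on a socket that will keep failing. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of one simulated pass over the rx queue. */
struct demo_rx_summary {
	int delivered;   /* receives that returned a length (>= 0) */
	int retried;     /* transient failures that were skipped over */
	int fatal_err;   /* first non-transient errno (negative), or 0 */
};

/* Assumption for this sketch: only -EAGAIN counts as transient. */
static bool demo_rx_err_is_transient(int err)
{
	return err == -EAGAIN;
}

/*
 * Walk a scripted sequence of recvmsg() return values. Transient errors
 * are retried ("continue"); any other error ends the pass, mirroring the
 * "goto out" in the patch, so a persistently failing socket cannot keep
 * the worker thread spinning.
 */
static struct demo_rx_summary demo_rx_pass(const int *recv_rets, int n)
{
	struct demo_rx_summary s = { 0, 0, 0 };

	assert(recv_rets != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		int err = recv_rets[i];

		if (err >= 0) {
			s.delivered++;
			continue;
		}
		if (demo_rx_err_is_transient(err)) {
			s.retried++;
			continue;
		}
		s.fatal_err = err;
		break;
	}
	return s;
}
```

A real worker loop would normally also cap the work done per invocation; that budget is omitted here.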

* Re: [PATCH net-next v3 2/2] net: phy: bcm7xxx: Plug in support for reading PHY error counters
From: Andrew Lunn @ 2016-11-30 13:37 UTC
  To: Florian Fainelli
  Cc: netdev, davem, bcm-kernel-feedback-list, allan.nielsen,
	raju.lakkaraju
In-Reply-To: <20161129175718.20213-3-f.fainelli@gmail.com>

On Tue, Nov 29, 2016 at 09:57:18AM -0800, Florian Fainelli wrote:
> Broadcom BCM7xxx internal PHYs can leverage the Broadcom PHY library
> module PHY error counters helper functions, just implement the
> appropriate PHY driver function calls to do so. We need to allocate some
> storage space for our PHY statistics, and provide it to the Broadcom PHY
> library, so do this in a specific probe function, and slightly wrap the
> get_stats function call.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Hi Florian

Nice to see another PHY driver make use of this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* [PATCH] net/rtnetlink: fix attribute name in nlmsg_size() comments
From: Tobias Klauser @ 2016-11-30 13:30 UTC
  To: David S. Miller; +Cc: netdev, Eric Dumazet

Use the correct attribute constant names IFLA_GSO_MAX_{SEGS,SIZE}
instead of IFLA_MAX_GSO_{SEGS,SIZE} in the comments in if_nlmsg_size().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 net/core/rtnetlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index deb35acbefd0..a6196cf844f6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -931,8 +931,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_PROMISCUITY */
 	       + nla_total_size(4) /* IFLA_NUM_TX_QUEUES */
 	       + nla_total_size(4) /* IFLA_NUM_RX_QUEUES */
-	       + nla_total_size(4) /* IFLA_MAX_GSO_SEGS */
-	       + nla_total_size(4) /* IFLA_MAX_GSO_SIZE */
+	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
+	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
-- 
2.11.0

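Each term summed in if_nlmsg_size() is nla_total_size(payload): a 4-byte attribute header plus the payload, rounded up to 4-byte alignment. The sketch below restates those rules locally under demo_ names (an assumption made so the example needs no kernel headers) and shows why a 1-byte attribute such as IFLA_OPERSTATE costs as much space as a 4-byte one.

```c
#include <assert.h>
#include <stddef.h>

/* Local restatement of the netlink attribute layout rules. */
#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) (((len) + DEMO_NLA_ALIGNTO - 1) & ~(DEMO_NLA_ALIGNTO - 1))
/* struct nlattr is a u16 length plus a u16 type: 4 bytes, already aligned. */
#define DEMO_NLA_HDRLEN 4

/* Header plus payload, padded up to the next 4-byte boundary. */
static size_t demo_nla_total_size(size_t payload)
{
	/* nla_len is a u16 and covers the header as well as the payload */
	assert(payload <= 0xffff - DEMO_NLA_HDRLEN);
	return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}

/* The attributes visible in the hunk: six 4-byte ones (IFLA_PROMISCUITY,
 * IFLA_NUM_TX_QUEUES, IFLA_NUM_RX_QUEUES, IFLA_GSO_MAX_SEGS,
 * IFLA_GSO_MAX_SIZE, IFLA_CARRIER_CHANGES) and two 1-byte ones
 * (IFLA_OPERSTATE, IFLA_LINKMODE). */
static size_t demo_hunk_attrs_size(void)
{
	return 6 * demo_nla_total_size(4) + 2 * demo_nla_total_size(1);
}
```

With these rules each of the eight attributes in the hunk takes 8 bytes, so together they contribute 8 * 8 = 64 bytes to the estimate.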

* Re: [PATCH net] tipc: check minimum bearer MTU
From: Michal Kubecek @ 2016-11-30 13:23 UTC
  To: Ying Xue
  Cc: Jon Maloy, David S. Miller, tipc-discussion, netdev, linux-kernel,
	Ben Hutchings, Qian Zhang
In-Reply-To: <4a2da388-d798-11cf-bf2c-84207cae6159@windriver.com>

On Wed, Nov 30, 2016 at 06:28:14PM +0800, Ying Xue wrote:
...
> >diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
> >index 78892e2f53e3..1a0b7434ec24 100644
> >--- a/net/tipc/bearer.h
> >+++ b/net/tipc/bearer.h
> >@@ -39,6 +39,7 @@
> >
> > #include "netlink.h"
> > #include "core.h"
> >+#include "msg.h"
> > #include <net/genetlink.h>
> >
> > #define MAX_MEDIA	3
> >@@ -59,6 +60,9 @@
> > #define TIPC_MEDIA_TYPE_IB	2
> > #define TIPC_MEDIA_TYPE_UDP	3
> >
> >+/* minimum bearer MTU */
> >+#define TIPC_MIN_BEARER_MTU	(MAX_H_SIZE + INT_H_SIZE)
> >+
> > /**
> >  * struct tipc_media_addr - destination address used by TIPC bearers
> >  * @value: address info (format defined by media)
> >@@ -215,4 +219,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
> > void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
> > 			 struct sk_buff_head *xmitq);
> >
> >+/* check if device MTU is sufficient for tipc headers */
> >+inline bool tipc_check_mtu(struct net_device *dev, unsigned int reserve)
> 
> It's unnecessary to explicitly declare a function as inline, instead
> please let GCC smartly decide this.

This is a header file. But I just noticed the last change (adding
missing "static" keyword) is missing in the version I sent.

> 
> >+{
> >+	if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve)
> >+		return false;
> >+	netdev_warn(dev, "MTU too low for tipc bearer\n");
> >+	return true;
> >+}
> >+
> > #endif	/* _TIPC_BEARER_H */
> >diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> >index 78cab9c5a445..376ed3e3ed46 100644
> >--- a/net/tipc/udp_media.c
> >+++ b/net/tipc/udp_media.c
> >@@ -697,6 +697,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
> > 		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
> > 		udp_conf.use_udp_checksums = false;
> > 		ub->ifindex = dev->ifindex;
> >+		if (tipc_check_mtu(dev, sizeof(struct iphdr) +
> >+					sizeof(struct udphdr))) {
> >+			err = -EINVAL;
> >+			goto err;
> >+		}
> 
> For UDP bearer, it seems insufficient for us to check MTU size only
> when UDP bearer is enabled. Meanwhile, we should update MTU size for
> UDP bearer with Path MTU discovery protocol once MTU size is changed
> after bearer is enabled.

I should admit I'm not that familiar with tipc. Do you mean updating
b->mtu in response to PMTU updates of the route used for ub->ubsock?
As I understand it, that would certainly be useful, but it's not
directly related to the security issue this patch addresses: if there
are no updates, b->mtu cannot get too low and there is no risk of
a buffer overflow. In other words, reflecting PMTU updates is something
that can IMHO be left for later.

                                                      Michal Kubecek

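The check itself is small enough to model outside the kernel. This sketch keeps the patch's convention (true means the MTU is too low) and includes the "static" that Michal mentions was missing from the posted version. The header sizes (60 bytes for MAX_H_SIZE, 40 for INT_H_SIZE), the stand-in device struct and all demo_ names are assumptions, not copied from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed TIPC header sizes (believed to match net/tipc/msg.h). */
#define DEMO_MAX_H_SIZE 60
#define DEMO_INT_H_SIZE 40
#define DEMO_TIPC_MIN_BEARER_MTU (DEMO_MAX_H_SIZE + DEMO_INT_H_SIZE)

/* Hypothetical stand-in for the two struct net_device fields used. */
struct demo_dev {
	const char *name;
	unsigned int mtu;
};

/*
 * Same convention as tipc_check_mtu() in the patch: returns true when the
 * MTU is too low. "static inline" so that including the header from
 * several .c files does not produce conflicting external definitions.
 */
static inline bool demo_tipc_mtu_too_low(const struct demo_dev *dev,
					 unsigned int reserve)
{
	assert(dev != NULL);
	if (dev->mtu >= DEMO_TIPC_MIN_BEARER_MTU + reserve)
		return false;
	fprintf(stderr, "%s: MTU too low for tipc bearer\n", dev->name);
	return true;
}
```

For a UDP bearer the reserve is 20 + 8 = 28 bytes (IPv4 header without options plus UDP header), so under these assumed sizes an MTU of 127 is rejected and 128 is the smallest accepted value.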

* Re: [PATCH net-next v2 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
From: Sebastian Frias @ 2016-11-30 13:20 UTC
  To: Florian Fainelli, netdev
  Cc: davem, andrew, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet
In-Reply-To: <0a923150-582c-16dc-d14d-11d8a2620871@gmail.com>

On 28/11/16 18:33, Florian Fainelli wrote:
> On 11/28/2016 02:38 AM, Sebastian Frias wrote:
>> On 27/11/16 19:44, Florian Fainelli wrote:
>>> Describe that the Ethernet MAC controller is ultimately responsible for
>>> dealing with proper pause frames/flow control advertisement and
>>> enabling, and that it is therefore allowed to have it change
>>> phydev->supported/advertising with SUPPORTED_Pause and
>>> SUPPORTED_AsymPause.
>>>
>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>> ---
>>>  Documentation/networking/phy.txt | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
>>> index 4b25c0f24201..9a42a9414cea 100644
>>> --- a/Documentation/networking/phy.txt
>>> +++ b/Documentation/networking/phy.txt
>>> @@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
>>>   values pruned from them which don't make sense for your controller (a 10/100
>>>   controller may be connected to a gigabit capable PHY, so you would need to
>>>   mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
>>> - for these bitfields. Note that you should not SET any bits, or the PHY may
>>> - get put into an unsupported state.
>>> + for these bitfields. Note that you should not SET any bits, except the
>>> + SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
>>> + put into an unsupported state.
>>>  
>>>   Lastly, once the controller is ready to handle network traffic, you call
>>>   phy_start(phydev).  This tells the PAL that you are ready, and configures the
>>> @@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
>>>   When you want to disconnect from the network (even if just briefly), you call
>>>   phy_stop(phydev).
>>>  
>>> +Pause frames / flow control
>>> +
>>> + The PHY does not participate directly in flow control/pause frames except by
>>> + making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
>>> + MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
>>> + controller supports such a thing. Since flow control/pause frames generation
>>> + involves the Ethernet MAC driver, it is recommended that this driver takes care
>>> + of properly indicating advertisement and support for such features by setting
>>> + the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
>>> + either before or after phy_connect() 
>>
>> If the bits are set after phy_connect(), how does the PHY framework know there's
>> an update to the bits? Should some call be made?
> 
> You would most likely either call phy_start() to start the PHY state
> machine (again) or have to re-negotiate the link with e.g:
> genphy_restart_aneg().
> 

Thanks, I think that would be worth adding to the documentation, right?


* [PATCH net] tcp: warn on bogus MSS and try to amend it
From: Marcelo Ricardo Leitner @ 2016-11-30 13:14 UTC
  To: netdev
  Cc: Jon Maxwell, Alex Sidorenko, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, tlfalcon, Brian King,
	Eric Dumazet, davem

There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on the
rx path. This causes TCP to assume that the MSS is actually the size of
the aggregated packet, which is invalid.

Although the proper fix belongs in each driver, it's often hard and
cumbersome to debug, arrive at such a root cause, and report or fix it.

This patch amends this situation in two ways. First, it adds a warning
when this situation occurs, giving a hint to those trying to debug it.
Second, it limits the maximum probed MSS to the advertised MSS, as it
should never be any higher than that.

The result is that the connection may not have the best performance ever
but it shouldn't stall, and the admin will have a hint on what to look
for.

Tested with virtio by forcing gso_size to 0.

Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/ipv4/tcp_input.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..ecc86105eb479de9b80db71af6a16a5af612a61c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -144,7 +144,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 	 */
 	len = skb_shinfo(skb)->gso_size ? : skb->len;
 	if (len >= icsk->icsk_ack.rcv_mss) {
-		icsk->icsk_ack.rcv_mss = len;
+		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
+					       tcp_sk(sk)->advmss);
+		if (icsk->icsk_ack.rcv_mss != len)
+			pr_warn_once("Seems your NIC driver is doing bad RX acceleration. TCP performance may be compromised.\n");
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.
-- 
2.9.3


* Re: [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing
From: Hannes Frederic Sowa @ 2016-11-30 13:12 UTC
  To: David Miller, david.lebrun; +Cc: netdev
In-Reply-To: <20161128.153209.2135257061368558724.davem@davemloft.net>

On Mon, Nov 28, 2016, at 21:32, David Miller wrote:
> From: David Lebrun <david.lebrun@uclouvain.be>
> Date: Mon, 28 Nov 2016 21:16:19 +0100
> 
> > The advantage of my solution over RFC2992 is lowest possible disruption
> > and equal rebalancing of affected flows. The disadvantage is the lookup
> > complexity of O(log n) vs O(1). Although from a theoretical viewpoint
> > O(1) is obviously better, would O(log n) have an effectively measurable
> > negative impact on scalability ? If we consider 32 next-hops for a route
> > and 100 pseudo-random numbers generated per next-hop, the lookup
> algorithm would have to perform in the worst case log2 3200 ~ 11.6, i.e. 12
> comparisons to select a next-hop for that route.
> 
> When I was working on the routing cache removal in ipv4 I compared
> using a stupid O(1) hash lookup of the FIB entries vs. the O(log n)
> fib_trie stuff actually in use.
> 
> It did make a difference.
> 
> This is a lookup that can be invoked 20 million times per second or
> more.
> 
> Every cycle matters.
> 
> We already have a lot of trouble getting under the cycle budget one
> has for routing at wire speed for very high link rates, please don't
> make it worse.

David, one question: do you remember if you measured with linked lists
at that time or also with arrays? I would expect small arrays that
entirely fit into cachelines to be faster than our current approach,
which also walks a linked list, probably the best algorithm to thrash
cache lines. I ask because I currently prefer this approach over having
large allocations in the O(1) case, because the code and its management
are simpler.

Thanks,
Hannes

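To make the O(log n) side of the comparison concrete, here is a sketch of generic consistent hashing over a sorted array (the layout Hannes asks about): each next-hop contributes a fixed set of pseudo-random points, and a flow hash is mapped with a binary search. The mixer, the point count and all demo_ names are assumptions; this is not David Lebrun's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One point on the hash ring: a position and the next-hop that owns it. */
struct demo_point {
	uint32_t pos;
	int nh;
};

/* 32-bit integer mixer (the MurmurHash3 finalizer), a stand-in for a real
 * flow hash. It is a bijection, so distinct inputs give distinct points. */
static uint32_t demo_mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x85ebca6bu;
	x ^= x >> 13;
	x *= 0xc2b2ae35u;
	x ^= x >> 16;
	return x;
}

static int demo_point_cmp(const void *a, const void *b)
{
	uint32_t pa = ((const struct demo_point *)a)->pos;
	uint32_t pb = ((const struct demo_point *)b)->pos;

	return (pa > pb) - (pa < pb);
}

/*
 * Build a sorted ring with per_nh points for every live next-hop.
 * A point's position depends only on (next-hop, k), never on which other
 * next-hops are alive. `out` must hold n_nh * per_nh entries.
 */
static size_t demo_ring_build(struct demo_point *out, const int *alive,
			      int n_nh, int per_nh)
{
	size_t n = 0;

	for (int nh = 0; nh < n_nh; nh++) {
		if (!alive[nh])
			continue;
		for (int k = 0; k < per_nh; k++) {
			out[n].pos = demo_mix32((uint32_t)nh * 0x9e3779b9u + (uint32_t)k);
			out[n].nh = nh;
			n++;
		}
	}
	qsort(out, n, sizeof(*out), demo_point_cmp);
	return n;
}

/* O(log n): first point with pos >= hash, wrapping around to the start. */
static int demo_ring_lookup(const struct demo_point *ring, size_t n,
			    uint32_t hash)
{
	size_t lo = 0, hi = n;  /* search in [lo, hi) */

	assert(n > 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ring[mid].pos < hash)
			lo = mid + 1;
		else
			hi = mid;
	}
	return ring[lo == n ? 0 : lo].nh;
}
```

Because a next-hop's points do not depend on which other next-hops are alive, removing one only remaps the flows that were using it, which is the "lowest possible disruption" property quoted above; the price is the binary search on every lookup that David Miller objects to.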

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Jason Wang @ 2016-11-30 13:07 UTC
  To: Yunjian Wang, mst, netdev, linux-kernel; +Cc: caihe
In-Reply-To: <1480507857-22976-1-git-send-email-wangyunjian@huawei.com>



On 30/11/2016 20:10, Yunjian Wang wrote:
> When recvmsg() returns an error (err=-EBADFD), the error handling in vhost
> handle_rx() continues the loop. This causes a soft CPU lockup in the vhost thread.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>   drivers/vhost/net.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>   			pr_debug("Discarded rx packet: "
>   				 " len %d, expected %zd\n", err, sock_len);
>   			vhost_discard_vq_desc(vq, headcount);
> +			/* Stop processing the rx queue when recvmsg() fails. */
> +			if (err < 0)
> +				goto out;
>   			continue;
>   		}
>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */

Acked-by: Jason Wang <jasowang@redhat.com>

We may want to rename vhost_discard_vq_desc() in the future, since it
does not actually discard the descriptors.

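Jason's remark is easier to see with a model of the ring indices. This sketch assumes that vhost_discard_vq_desc() does nothing more than rewind last_avail_idx by the given count; the struct and the demo_ names are hypothetical. Under that assumption, "discarding" hands the same descriptors out again on the next get rather than dropping them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice of virtqueue state: the guest publishes avail_idx,
 * the host consumes entries up to it and tracks its own position.
 * Both are free-running 16-bit counters. */
struct demo_vq {
	uint16_t avail_idx;       /* written by the guest */
	uint16_t last_avail_idx;  /* next entry the host will consume */
};

/* Take up to n available entries; returns how many were taken. */
static int demo_get_desc(struct demo_vq *vq, int n)
{
	uint16_t ready = (uint16_t)(vq->avail_idx - vq->last_avail_idx);
	int taken = n < ready ? n : ready;

	vq->last_avail_idx += (uint16_t)taken;
	return taken;
}

/* "Discard": only rewind the host's position, so the same entries are
 * returned by the next demo_get_desc(); nothing is dropped. */
static void demo_discard_desc(struct demo_vq *vq, int n)
{
	assert(n >= 0);
	vq->last_avail_idx -= (uint16_t)n;
}
```

The 16-bit arithmetic also covers the index wrapping past 65535, since both counters wrap the same way.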

* RE: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
From: David Laight @ 2016-11-30 12:32 UTC
  To: 'Florian Fainelli', Timur Tabi, netdev@vger.kernel.org
  Cc: davem@davemloft.net, andrew@lunn.ch, sf84@laposte.net,
	martin.blumenstingl@googlemail.com, mans@mansr.com,
	alexandre.torgue@st.com, peppe.cavallaro@st.com,
	jbrunet@baylibre.com
In-Reply-To: <a06d903f-5b80-4683-965f-9a6a1d5fe044@gmail.com>

From: Florian Fainelli
> Sent: 27 November 2016 23:03
> On 27/11/2016 14:24, Timur Tabi wrote:
> >> + * PHY device drivers in PHYLIB being reusable by nature, being able to
> >> +   configure correctly a specified delay enables more designs with
> >> +   similar delay requirements to be operate correctly
> >
> > Ok, this one I don't know how to fix.  I'm not really sure what you're
> > trying to say.
> 
> What I am trying to say is that once a PHY driver properly configures a
> delay that you have specified, there is no reason why this is not
> applicable to other platforms using this same PHY driver.

As has been stated earlier it can depend on the track lengths on the
board itself.
(Although 1ns is about 1 foot - so track delays of that length are unlikely.)

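David's rule of thumb is easy to check numerically: trace length = delay * c * velocity factor. The sketch takes c as 29.98 cm/ns; the velocity factor is an assumed input (1.0 for free space, with something near 0.5 being a commonly used rough figure for PCB traces). The function name is hypothetical.

```c
#include <assert.h>

/* Speed of light in vacuum, in cm per ns. */
#define DEMO_C_CM_PER_NS 29.9792458

/*
 * Trace length needed for a given propagation delay. velocity_factor is
 * the fraction of c at which the signal travels: 1.0 in vacuum; for a PCB
 * trace it is an assumed, board-specific value.
 */
static double demo_trace_cm(double delay_ns, double velocity_factor)
{
	assert(delay_ns >= 0.0);
	assert(velocity_factor > 0.0 && velocity_factor <= 1.0);
	return delay_ns * DEMO_C_CM_PER_NS * velocity_factor;
}
```

At a velocity factor of 1.0, 1 ns comes out at about 30 cm (roughly a foot); at an assumed 0.5, an RGMII-style delay of about 2 ns would still need around 30 cm of extra trace.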
> >> +Common problems with RGMII delay mismatch
> >> +
> >> + When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
> >> + will most likely result in the clock and data line sampling to capture unstable
> >
> > I'm not sure what "sampling to capture unstable" is supposed to mean.
> 
> When the PHY device takes a "snapshot" of the state of the data lines,
> after a clock edge, if the delay is improperly configured, these data
> lines are going to still be floating, or show some kind of
> capacitance/inductance effect, so the logical level which is going to be
> read may be incorrect.

No, the problem is that the data lines are being changed at much the same time
as the clock.
Quite possibly on both the rising and falling edges of the clock.

The actual latching of the data requires the data to be stable for the 'setup'
and 'hold' times of the latch (ie before and after the clock edge).
If the data and clock change at the same time it will be indeterminate whether
the old or new data is latched (the latch output might even oscillate).
The delay is there to ensure that the data isn't changing at the same time as
it is sampled.

At lower speed I suspect that the data only changes on one clock edge and is
sampled on the other.
(FWIW the latest DDR has an additional change in the data half way between
the clock edges!)

	David



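David's explanation reduces to a window check. A sketch, assuming data changes at t = 0 and again one bit time later, and treating the clock delay, setup and hold times as free parameters (the numbers used with it are illustrative, not taken from the RGMII specification; the function name is hypothetical): sampling is safe only if the whole setup/hold window around the delayed clock edge fits between the two data transitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Data launched at t = 0 stays stable until the next transition at
 * t = bit_time_ns (half the clock period on a double-data-rate link).
 * The receiver samples on a clock edge at t = clk_delay_ns and needs the
 * data stable setup_ns before and hold_ns after that edge.
 */
static bool demo_sample_is_safe(double clk_delay_ns, double bit_time_ns,
				double setup_ns, double hold_ns)
{
	assert(bit_time_ns > 0.0 && setup_ns >= 0.0 && hold_ns >= 0.0);
	return clk_delay_ns - setup_ns >= 0.0 &&
	       clk_delay_ns + hold_ns <= bit_time_ns;
}
```

For a 125 MHz double-data-rate clock (4 ns between data transitions) and illustrative 1 ns setup and hold times, demo_sample_is_safe(0.0, 4.0, 1.0, 1.0) is false because data and clock change together, while a 2 ns delay puts the edge in the middle of the data eye and returns true.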

* [PATCH net-next v2 2/2] tcp: allow to turn tcp timestamp randomization off
From: Florian Westphal @ 2016-11-30 12:28 UTC
  To: netdev; +Cc: Florian Westphal
In-Reply-To: <1480508930-24406-1-git-send-email-fw@strlen.de>

Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduce the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."

Random per-flow (or per-host) offsets no longer allow that, so add
a way to turn them off.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Change in v2: do the check in secure_tcp_sequence_number() and
 secure_tcpv6_sequence_number() so outgoing SYN packets won't have a
 random offset either if randomization is off.

 Tested:
 sysctl_tcp_timestamps==1, tcpdump on lo, both ends have same values.

 Documentation/networking/ip-sysctl.txt | 9 +++++++--
 net/core/secure_seq.c                  | 5 +++--
 net/ipv4/tcp_input.c                   | 3 ++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5af48dd7c5fc..de2448313799 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -610,8 +610,13 @@ tcp_syn_retries - INTEGER
 	with the current initial RTO of 1second. With this the final timeout
 	for an active TCP connection attempt will happen after 127seconds.
 
-tcp_timestamps - BOOLEAN
-	Enable timestamps as defined in RFC1323.
+tcp_timestamps - INTEGER
+	Enable timestamps as defined in RFC1323.
+	0: Disabled.
+	1: Enable timestamps as defined in RFC1323.
+	2: Like 1, but also use a random offset for each connection
+	rather than only using the current time.
+	Default: 2
 
 tcp_min_tso_segs - INTEGER
 	Minimal number of segments per TSO frame.
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index a8d6062cbb4a..36addd3d9633 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -12,6 +12,7 @@
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <net/tcp.h>
 #define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
 static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
@@ -58,7 +59,7 @@ u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 
 	md5_transform(hash, secret);
 
-	*tsoff = hash[1];
+	*tsoff = sysctl_tcp_timestamps == 2 ? hash[1] : 0;
 	return seq_scale(hash[0]);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
@@ -100,7 +101,7 @@ u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 
 	md5_transform(hash, net_secret);
 
-	*tsoff = hash[1];
+	*tsoff = sysctl_tcp_timestamps == 2 ? hash[1] : 0;
 	return seq_scale(hash[0]);
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1b1921c71f7c..5f6d4efd2551 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -76,7 +76,7 @@
 #include <asm/unaligned.h>
 #include <linux/errqueue.h>
 
-int sysctl_tcp_timestamps __read_mostly = 1;
+int sysctl_tcp_timestamps __read_mostly = 2;
 int sysctl_tcp_window_scaling __read_mostly = 1;
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
@@ -85,6 +85,7 @@ int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
 int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
+EXPORT_SYMBOL(sysctl_tcp_timestamps);
 
 /* rfc5961 challenge ack rate limiting */
 int sysctl_tcp_challenge_ack_limit = 1000;
-- 
2.7.3

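The sysctl only changes which offset the ISN hash hands back. A userspace sketch of the patched expression, with hypothetical names (hash_word1 stands in for hash[1]): in mode 1 every connection gets offset 0, so TS values of different flows share one clock and can be compared as Eric describes; in mode 2 each 4-tuple gets its own offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the patched line
 *     *tsoff = sysctl_tcp_timestamps == 2 ? hash[1] : 0;
 * hash_word1 is the second word of the keyed per-4-tuple hash that also
 * produces the ISN.
 */
static uint32_t demo_pick_tsoff(int sysctl_tcp_timestamps, uint32_t hash_word1)
{
	assert(sysctl_tcp_timestamps >= 0 && sysctl_tcp_timestamps <= 2);
	return sysctl_tcp_timestamps == 2 ? hash_word1 : 0;
}

/* TS value put on the wire for a local clock reading (wraps modulo 2^32). */
static uint32_t demo_wire_tsval(uint32_t tcp_time_stamp, uint32_t tsoff)
{
	return tcp_time_stamp + tsoff;
}
```

With mode 1, two flows sampled at the same local time carry the same TS value regardless of their hashes; with mode 2 they generally differ.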

* [PATCH net-next v2 1/2] tcp: randomize tcp timestamp offsets for each connection
From: Florian Westphal @ 2016-11-30 12:28 UTC
  To: netdev; +Cc: Florian Westphal

Jiffies-based timestamps allow easy inference of the number of devices
behind NAT translators and also make tracking of hosts simpler.

Commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure needed for per-connection ts
randomization; in particular, writing/reading the on-wire tcp header
format takes the offset into account, so the rest of the stack can use
the normal tcp_time_stamp (jiffies).

So only two items are left:
 - add a tsoffset for request sockets
 - extend the tcp isn generator to also return another 32bit number
   in addition to the ISN.

Reusing the ISN generator also means timestamps still increase
monotonically for the same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 No changes since v1, preserved Eric's ack.

 include/linux/tcp.h      |  1 +
 include/net/secure_seq.h |  8 ++++----
 include/net/tcp.h        |  2 +-
 net/core/secure_seq.c    | 10 ++++++----
 net/ipv4/syncookies.c    |  1 +
 net/ipv4/tcp_input.c     |  7 ++++++-
 net/ipv4/tcp_ipv4.c      |  9 +++++----
 net/ipv4/tcp_minisocks.c |  4 +++-
 net/ipv4/tcp_output.c    |  2 +-
 net/ipv6/syncookies.c    |  1 +
 net/ipv6/tcp_ipv6.c      | 10 ++++++----
 11 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e35b71..2408bcc579f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -123,6 +123,7 @@ struct tcp_request_sock {
 	u32				txhash;
 	u32				rcv_isn;
 	u32				snt_isn;
+	u32				ts_off;
 	u32				last_oow_ack_time; /* last SYNACK */
 	u32				rcv_nxt; /* the ack # by SYNACK. For
 						  * FastOpen it's the seq#
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 3f36d45b714a..0caee631a836 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -6,10 +6,10 @@
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport);
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport);
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff);
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de80739adab..1c09d909bd43 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1809,7 +1809,7 @@ struct tcp_request_sock_ops {
 	struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
 				       const struct request_sock *req,
 				       bool *strict);
-	__u32 (*init_seq)(const struct sk_buff *skb);
+	__u32 (*init_seq)(const struct sk_buff *skb, u32 *tsoff);
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
 			   struct flowi *fl, struct request_sock *req,
 			   struct tcp_fastopen_cookie *foc,
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index fd3ce461fbe6..a8d6062cbb4a 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -40,8 +40,8 @@ static u32 seq_scale(u32 seq)
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport)
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 secret[MD5_MESSAGE_BYTES / 4];
 	u32 hash[MD5_DIGEST_WORDS];
@@ -58,6 +58,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 
 	md5_transform(hash, secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
@@ -86,8 +87,8 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 
 #ifdef CONFIG_INET
 
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport)
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
@@ -99,6 +100,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 
 	md5_transform(hash, net_secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 0dc6286272aa..3e88467d70ee 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -334,6 +334,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 	treq = tcp_rsk(req);
 	treq->rcv_isn		= ntohl(th->seq) - 1;
 	treq->snt_isn		= cookie;
+	treq->ts_off		= 0;
 	req->mss		= mss;
 	ireq->ir_num		= ntohs(th->dest);
 	ireq->ir_rmt_port	= th->source;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a2097ff6..1b1921c71f7c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6301,6 +6301,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		goto drop;
 
 	tcp_rsk(req)->af_specific = af_ops;
+	tcp_rsk(req)->ts_off = 0;
 
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = af_ops->mss_clamp;
@@ -6322,6 +6323,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
+	if (isn && tmp_opt.tstamp_ok)
+		af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
+
 	if (!want_cookie && !isn) {
 		/* VJ's idea. We save last timestamp seen
 		 * from the destination in peer table, when entering
@@ -6362,7 +6366,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 			goto drop_and_release;
 		}
 
-		isn = af_ops->init_seq(skb);
+		isn = af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
 	}
 	if (!dst) {
 		dst = af_ops->route_req(sk, &fl, req, NULL);
@@ -6374,6 +6378,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
 	if (want_cookie) {
 		isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
+		tcp_rsk(req)->ts_off = 0;
 		req->cookie_ts = tmp_opt.tstamp_ok;
 		if (!tmp_opt.tstamp_ok)
 			inet_rsk(req)->ecn_ok = 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5555eb86e549..b50f05905ced 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -95,12 +95,12 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);
 
-static  __u32 tcp_v4_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v4_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcp_sequence_number(ip_hdr(skb)->daddr,
 					  ip_hdr(skb)->saddr,
 					  tcp_hdr(skb)->dest,
-					  tcp_hdr(skb)->source);
+					  tcp_hdr(skb)->source, tsoff);
 }
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
@@ -237,7 +237,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		tp->write_seq = secure_tcp_sequence_number(inet->inet_saddr,
 							   inet->inet_daddr,
 							   inet->inet_sport,
-							   usin->sin_port);
+							   usin->sin_port,
+							   &tp->tsoffset);
 
 	inet->inet_id = tp->write_seq ^ jiffies;
 
@@ -824,7 +825,7 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 	tcp_v4_send_ack(sk, skb, seq,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
 			req->ts_recent,
 			0,
 			tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6234ebaa7db1..28ce5ee831f5 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -532,7 +532,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 			newtp->rx_opt.ts_recent_stamp = 0;
 			newtp->tcp_header_len = sizeof(struct tcphdr);
 		}
-		newtp->tsoffset = 0;
+		newtp->tsoffset = treq->ts_off;
 #ifdef CONFIG_TCP_MD5SIG
 		newtp->md5sig_info = NULL;	/*XXX*/
 		if (newtp->af_specific->md5_lookup(sk, newsk))
@@ -581,6 +581,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
+			if (tmp_opt.rcv_tsecr)
+				tmp_opt.rcv_tsecr -= tcp_rsk(req)->ts_off;
 			/* We do not store true stamp, but it is not required,
 			 * it can be estimated (approximately)
 			 * from another data.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b46a304..1b6d5f34bf45 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -640,7 +640,7 @@ static unsigned int tcp_synack_options(struct request_sock *req,
 	}
 	if (likely(ireq->tstamp_ok)) {
 		opts->options |= OPTION_TS;
-		opts->tsval = tcp_skb_timestamp(skb);
+		opts->tsval = tcp_skb_timestamp(skb) + tcp_rsk(req)->ts_off;
 		opts->tsecr = req->ts_recent;
 		remaining -= TCPOLEN_TSTAMP_ALIGNED;
 	}
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 97830a6a9cbb..a4d49760bf43 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -209,6 +209,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	treq->snt_synack.v64	= 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
+	treq->ts_off = 0;
 
 	/*
 	 * We need to lookup the dst_entry to get the correct window size.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 28ec0a2e7b72..a2185a214abc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -101,12 +101,12 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
-static __u32 tcp_v6_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v6_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcpv6_sequence_number(ipv6_hdr(skb)->daddr.s6_addr32,
 					    ipv6_hdr(skb)->saddr.s6_addr32,
 					    tcp_hdr(skb)->dest,
-					    tcp_hdr(skb)->source);
+					    tcp_hdr(skb)->source, tsoff);
 }
 
 static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
@@ -283,7 +283,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
 							     sk->sk_v6_daddr.s6_addr32,
 							     inet->inet_sport,
-							     inet->inet_dport);
+							     inet->inet_dport,
+							     &tp->tsoffset);
 
 	err = tcp_connect(sk);
 	if (err)
@@ -956,7 +957,8 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
+			req->ts_recent, sk->sk_bound_dev_if,
 			tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
 			0, 0);
 }
-- 
2.7.3
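The hunks above add tcp_rsk(req)->ts_off to the TSval sent on the SYN-ACK and on request-socket ACKs, and subtract it from a non-zero echoed TSecr in tcp_check_req(). Below is a minimal user-space sketch of that round trip. It assumes plain uint32_t arithmetic in place of the kernel's u32, and the helper names are hypothetical, not kernel functions:

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the ts_off round trip; not kernel API. */

/* Sender side: the value placed in the TSval option, in the spirit of
 * tcp_synack_options(). Addition wraps modulo 2^32, like u32 in the kernel. */
uint32_t ts_to_wire(uint32_t local_clock, uint32_t ts_off)
{
	return local_clock + ts_off;
}

/* Receiver side: undo the offset on the peer's echoed TSecr, as the
 * tcp_check_req() hunk does. A zero TSecr is left untouched, mirroring
 * the "if (tmp_opt.rcv_tsecr)" test in the patch. */
uint32_t tsecr_from_wire(uint32_t rcv_tsecr, uint32_t ts_off)
{
	if (rcv_tsecr)
		rcv_tsecr -= ts_off;
	return rcv_tsecr;
}
```

Because both directions use modular arithmetic, the local clock value is recovered even when clock + offset wraps past 2^32.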

^ permalink raw reply related

* Re: [PATCH net 2/2] esp6: Fix integrity verification when ESN are used
From: Steffen Klassert @ 2016-11-30 12:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Tobias Brunner, David S. Miller, netdev
In-Reply-To: <20161130095837.GB3138@gondor.apana.org.au>

On Wed, Nov 30, 2016 at 05:58:38PM +0800, Herbert Xu wrote:
> On Tue, Nov 29, 2016 at 05:05:25PM +0100, Tobias Brunner wrote:
> > When handling inbound packets, the two halves of the sequence number
> > stored on the skb are already in network order.
> > 
> > Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
> > Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Also applied to the ipsec tree, thanks a lot everyone!

^ permalink raw reply

* Re: [PATCH net 1/2] esp4: Fix integrity verification when ESN are used
From: Steffen Klassert @ 2016-11-30 12:17 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Tobias Brunner, David S. Miller, netdev
In-Reply-To: <20161130095827.GA3138@gondor.apana.org.au>

On Wed, Nov 30, 2016 at 05:58:27PM +0800, Herbert Xu wrote:
> On Tue, Nov 29, 2016 at 05:05:20PM +0100, Tobias Brunner wrote:
> > When handling inbound packets, the two halves of the sequence number
> > stored on the skb are already in network order.
> > 
> > Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
> > Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied to the ipsec tree, thanks!
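Both ESN fixes rest on the same point from the commit messages: the two halves of the sequence number are already stored in network byte order, so converting them again is wrong. The following is a hypothetical sketch of how a second htonl() changes the bytes fed to the integrity check on a little-endian host. The struct and helpers are illustrative only, not the esp4/esp6 code:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the two 32-bit halves of an ESN sequence number,
 * already converted to network (big-endian) byte order. */
struct esn_halves {
	uint32_t hi;	/* network byte order */
	uint32_t lo;	/* network byte order */
};

/* Correct: copy the bytes as they are into the associated-data buffer. */
void esn_fill_aad(unsigned char out[8], const struct esn_halves *s)
{
	memcpy(out, &s->hi, 4);
	memcpy(out + 4, &s->lo, 4);
}

/* Buggy variant: htonl() on a value that is already big-endian swaps it
 * back to host order on little-endian machines (it is a no-op on
 * big-endian ones, which is how such a bug can go unnoticed). */
void esn_fill_aad_double_swap(unsigned char out[8], const struct esn_halves *s)
{
	uint32_t hi = htonl(s->hi);
	uint32_t lo = htonl(s->lo);

	memcpy(out, &hi, 4);
	memcpy(out + 4, &lo, 4);
}
```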

^ permalink raw reply

* Re: [PATCH] xfrm_user: fix return value from xfrm_user_rcv_msg
From: Steffen Klassert @ 2016-11-30 12:15 UTC (permalink / raw)
  To: Yi Zhao; +Cc: netdev, fan.du
In-Reply-To: <1480414141-17801-1-git-send-email-yi.zhao@windriver.com>

On Tue, Nov 29, 2016 at 06:09:01PM +0800, Yi Zhao wrote:
> Running a 32bit 'ip' to set xfrm objects on a 64bit host is not supported.
> But the returned error value is unknown to the user program:
> 
> ip xfrm policy list
> RTNETLINK answers: Unknown error 524
> 
> Replace ENOTSUPP with EOPNOTSUPP:
> 
> ip xfrm policy list
> RTNETLINK answers: Operation not supported
> 
> Signed-off-by: Yi Zhao <yi.zhao@windriver.com>

Applied to the ipsec tree, thanks!
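The "Unknown error 524" output comes from the fact that ENOTSUPP (524) is a kernel-internal value from include/linux/errno.h. It is not part of the userspace errno set, so the C library has no message for it, while EOPNOTSUPP is a standard errno. The sketch below is a minimal illustration. The 524 constant is redefined locally for illustration, and the report function only approximates how a netlink tool prints a negative error:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* ENOTSUPP exists only inside the kernel; userspace headers do not define
 * it, so the value is redefined here purely for illustration. */
#define KERNEL_ENOTSUPP 524

/* Roughly how a netlink tool reports a negative error code it received. */
void report_netlink_error(int err)
{
	printf("RTNETLINK answers: %s\n", strerror(-err));
}
```

With glibc, report_netlink_error(-KERNEL_ENOTSUPP) gives the "Unknown error 524" line quoted above, and report_netlink_error(-EOPNOTSUPP) gives "Operation not supported".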

^ permalink raw reply

* [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Yunjian Wang @ 2016-11-30 12:10 UTC (permalink / raw)
  To: mst, jasowang, netdev, linux-kernel; +Cc: caihe, wangyunjian

When recvmsg() returns an error (err=-EBADFD), the error handling in vhost
handle_rx() keeps looping. This causes a soft CPU lockup in the vhost thread.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..edc470b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
 			vhost_discard_vq_desc(vq, headcount);
+			/* Don't continue to do, when meet errors. */
+			if (err < 0)
+				goto out;
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
-- 
1.9.5.msysgit.1
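In his reply to this patch, Michael S. Tsirkin points out that a transient error such as -EAGAIN should be retried, so bailing out on every negative value is too broad. The following hypothetical sketch shows how one recvmsg() result could be classified with that distinction. The enum and function are illustrative names, not vhost API, and the choice to treat only -EAGAIN as transient is an assumption:

```c
#include <errno.h>

/* Hypothetical outcome of one recvmsg() call inside an rx loop like
 * handle_rx(); these names are illustrative, not vhost API. */
enum rx_action {
	RX_DELIVER,	/* full packet received: hand it to the guest */
	RX_DISCARD,	/* short read: drop this packet, keep looping */
	RX_RETRY,	/* transient error: give descriptors back, retry later */
	RX_STOP,	/* persistent error: leave the loop instead of spinning */
};

enum rx_action classify_rx_result(int err, long expected_len)
{
	if (err == expected_len)
		return RX_DELIVER;
	if (err >= 0)
		return RX_DISCARD;
	if (err == -EAGAIN)
		return RX_RETRY;
	return RX_STOP;	/* e.g. -EBADFD */
}
```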

^ permalink raw reply related

* [PATCH 2/2] net: rfkill: Add rfkill-any LED trigger
From: Michał Kępień @ 2016-11-30 12:03 UTC (permalink / raw)
  To: Johannes Berg, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20161130120317.11851-1-kernel@kempniu.pl>

This patch adds a new "global" (i.e. not per-rfkill device) LED trigger,
rfkill-any, which may be useful for laptops with a single "radio LED"
and multiple radio transmitters.  The trigger is meant to turn a LED on
whenever there is at least one radio transmitter active and turn it off
otherwise.

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
---
Note that the search for any active radio will have quadratic complexity
whenever __rfkill_switch_all() is used (as it calls rfkill_set_block()
for every affected rfkill device), but I intentionally refrained from
implementing rfkill_any_led_trigger_event() using struct work_struct to
keep things simple, given the average number of rfkill devices in
hardware these days.  Please let me know in case this should be
reworked.

 net/rfkill/core.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index f28e441..5275f2f 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -176,6 +176,47 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 	led_trigger_unregister(&rfkill->led_trigger);
 }
+
+static struct led_trigger rfkill_any_led_trigger;
+
+static void __rfkill_any_led_trigger_event(void)
+{
+	enum led_brightness brightness = LED_OFF;
+	struct rfkill *rfkill;
+
+	list_for_each_entry(rfkill, &rfkill_list, node) {
+		if (!(rfkill->state & RFKILL_BLOCK_ANY)) {
+			brightness = LED_FULL;
+			break;
+		}
+	}
+
+	led_trigger_event(&rfkill_any_led_trigger, brightness);
+}
+
+static void rfkill_any_led_trigger_event(void)
+{
+	mutex_lock(&rfkill_global_mutex);
+	__rfkill_any_led_trigger_event();
+	mutex_unlock(&rfkill_global_mutex);
+}
+
+static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+{
+	rfkill_any_led_trigger_event();
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+	rfkill_any_led_trigger.name = "rfkill-any";
+	rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
+	return led_trigger_register(&rfkill_any_led_trigger);
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+	led_trigger_unregister(&rfkill_any_led_trigger);
+}
 #else
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
 {
@@ -189,6 +230,19 @@ static inline int rfkill_led_trigger_register(struct rfkill *rfkill)
 static inline void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 }
+
+static void rfkill_any_led_trigger_event(void)
+{
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+	return 0;
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+}
 #endif /* CONFIG_RFKILL_LEDS */
 
 static void rfkill_fill_event(struct rfkill_event *ev, struct rfkill *rfkill,
@@ -297,6 +351,7 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
+	__rfkill_any_led_trigger_event();
 
 	if (prev != curr)
 		rfkill_event(rfkill);
@@ -477,6 +532,7 @@ bool rfkill_set_hw_state(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
+	rfkill_any_led_trigger_event();
 
 	if (!rfkill->registered)
 		return ret;
@@ -523,6 +579,7 @@ bool rfkill_set_sw_state(struct rfkill *rfkill, bool blocked)
 		schedule_work(&rfkill->uevent_work);
 
 	rfkill_led_trigger_event(rfkill);
+	rfkill_any_led_trigger_event();
 
 	return blocked;
 }
@@ -572,6 +629,7 @@ void rfkill_set_states(struct rfkill *rfkill, bool sw, bool hw)
 			schedule_work(&rfkill->uevent_work);
 
 		rfkill_led_trigger_event(rfkill);
+		rfkill_any_led_trigger_event();
 	}
 }
 EXPORT_SYMBOL(rfkill_set_states);
@@ -988,6 +1046,7 @@ int __must_check rfkill_register(struct rfkill *rfkill)
 #endif
 	}
 
+	__rfkill_any_led_trigger_event();
 	rfkill_send_events(rfkill, RFKILL_OP_ADD);
 
 	mutex_unlock(&rfkill_global_mutex);
@@ -1020,6 +1079,7 @@ void rfkill_unregister(struct rfkill *rfkill)
 	mutex_lock(&rfkill_global_mutex);
 	rfkill_send_events(rfkill, RFKILL_OP_DEL);
 	list_del_init(&rfkill->node);
+	__rfkill_any_led_trigger_event();
 	mutex_unlock(&rfkill_global_mutex);
 
 	rfkill_led_trigger_unregister(rfkill);
@@ -1278,8 +1338,18 @@ static int __init rfkill_init(void)
 		goto error_input;
 #endif
 
+#ifdef CONFIG_RFKILL_LEDS
+	error = rfkill_any_led_trigger_register();
+	if (error)
+		goto error_led_trigger;
+#endif
+
 	return 0;
 
+error_led_trigger:
+#ifdef CONFIG_RFKILL_INPUT
+	rfkill_handler_exit();
+#endif
 error_input:
 	misc_deregister(&rfkill_miscdev);
 error_misc:
@@ -1291,6 +1361,9 @@ subsys_initcall(rfkill_init);
 
 static void __exit rfkill_exit(void)
 {
+#ifdef CONFIG_RFKILL_LEDS
+	rfkill_any_led_trigger_unregister();
+#endif
 #ifdef CONFIG_RFKILL_INPUT
 	rfkill_handler_exit();
 #endif
-- 
2.10.2
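The trigger logic in __rfkill_any_led_trigger_event() is a linear scan that stops at the first device that is neither soft- nor hard-blocked. The note about quadratic complexity follows from calling that scan once per device while switching all of them. The following user-space sketch counts the checks under those assumptions. The block bits and helpers are hypothetical stand-ins, not the rfkill internals:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rfkill's per-device block state. */
#define BLOCK_SW  1u
#define BLOCK_HW  2u
#define BLOCK_ANY (BLOCK_SW | BLOCK_HW)

/* Mirrors the scan in __rfkill_any_led_trigger_event(): the LED is on as
 * soon as one device is unblocked. *visited counts inspected devices. */
bool any_radio_active(const unsigned int *states, size_t n, size_t *visited)
{
	for (size_t i = 0; i < n; i++) {
		(*visited)++;
		if (!(states[i] & BLOCK_ANY))
			return true;
	}
	return false;
}

/* Blocking n devices one at a time, rescanning after each change (as
 * happens when __rfkill_switch_all() calls rfkill_set_block() per device),
 * costs roughly n*n/2 checks. */
size_t checks_for_switch_all_off(unsigned int *states, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		states[i] |= BLOCK_SW;
		any_radio_active(states, n, &visited);
	}
	return visited;
}
```

For four initially active devices this performs 13 checks. That is negligible for the handful of radios a laptop has, which is consistent with the author's choice not to defer the scan to a work item.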

^ permalink raw reply related

* [PATCH 1/2] net: rfkill: Cleanup error handling in rfkill_init()
From: Michał Kępień @ 2016-11-30 12:03 UTC (permalink / raw)
  To: Johannes Berg, David S . Miller; +Cc: linux-wireless, netdev, linux-kernel

Use a separate label per error condition in rfkill_init() to make it a
bit cleaner and easier to extend.

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
---
 net/rfkill/core.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 884027f..f28e441 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1266,24 +1266,25 @@ static int __init rfkill_init(void)
 
 	error = class_register(&rfkill_class);
 	if (error)
-		goto out;
+		goto error_class;
 
 	error = misc_register(&rfkill_miscdev);
-	if (error) {
-		class_unregister(&rfkill_class);
-		goto out;
-	}
+	if (error)
+		goto error_misc;
 
 #ifdef CONFIG_RFKILL_INPUT
 	error = rfkill_handler_init();
-	if (error) {
-		misc_deregister(&rfkill_miscdev);
-		class_unregister(&rfkill_class);
-		goto out;
-	}
+	if (error)
+		goto error_input;
 #endif
 
- out:
+	return 0;
+
+error_input:
+	misc_deregister(&rfkill_miscdev);
+error_misc:
+	class_unregister(&rfkill_class);
+error_class:
 	return error;
 }
 subsys_initcall(rfkill_init);
-- 
2.10.2
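With one label per step, the error path unwinds in reverse order. Each label undoes only what succeeded before the failing step, then falls through to the next label. The following self-contained sketch shows the same pattern. The hypothetical init_step()/undo_step() helpers stand in for class_register(), misc_register() and rfkill_handler_init():

```c
/* Hypothetical test knobs: which step fails (0 = none) and how many steps
 * are currently registered. */
int fail_at;
int registered;

static int init_step(int n)
{
	if (n == fail_at)
		return -1;
	registered++;
	return 0;
}

static void undo_step(void)
{
	registered--;
}

int init_all(void)
{
	int error;

	error = init_step(1);		/* cf. class_register() */
	if (error)
		goto error_class;
	error = init_step(2);		/* cf. misc_register() */
	if (error)
		goto error_misc;
	error = init_step(3);		/* cf. rfkill_handler_init() */
	if (error)
		goto error_input;
	return 0;

error_input:
	undo_step();			/* undo step 2 */
error_misc:
	undo_step();			/* undo step 1 */
error_class:
	return error;
}
```

Adding a further step, as patch 2/2 does for the LED trigger, only needs one more label placed above error_input.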

^ permalink raw reply related

* RE: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Hayes Wang @ 2016-11-30 11:58 UTC (permalink / raw)
  To: David Miller, Mark Lord
  Cc: greg, romieu, netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <20161125.115827.2014848246966159357.davem@davemloft.net>

Mark Lord
[...]
> > Not sure why, because there really is no other way for the data to
> > appear where it does at the beginning of that URB buffer.
> >
> > This does seem a rather unexpected burden to place upon someone
> > reporting a regression in a USB network driver that corrupts user data.
> 
> If you are the only person who can actively reproduce this, which
> seems to be the case right now, this is unfortunately the only way to
> reach a proper analysis and fix.

I have tested it with iperf for more than five days without any errors.
I will think about whether there is any other way to reproduce it.

Best Regards,
Hayes


^ permalink raw reply

* [PATCH] stmmac: simplify flag assignment
From: Pavel Machek @ 2016-11-30 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161124.110416.198867271899443489.davem@davemloft.net>


Simplify flag assignment.
    
Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ed20668..0b706a7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_CSUM_MASK;
 
 	/* Disable tso if asked by ethtool */
-	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
-		if (features & NETIF_F_TSO)
-			priv->tso = true;
-		else
-			priv->tso = false;
-	}
+	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen))
+		priv->tso = !!(features & NETIF_F_TSO);
 
 	return features;
 }


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
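In the stmmac change above, !!(features & NETIF_F_TSO) collapses the masked bit, which is either 0 or the bit's own value, to exactly 0 or 1. The following small sketch uses a hypothetical feature bit. It is not the kernel's actual NETIF_F_TSO value, which lives in the 64-bit netdev_features_t:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, for illustration only. */
#define FEAT_TSO ((uint64_t)1 << 16)

int tso_flag(uint64_t features)
{
	/* features & FEAT_TSO is 0 or 65536; !! maps it to 0 or 1. */
	return !!(features & FEAT_TSO);
}
```

If priv->tso is declared bool, the plain assignment would already turn any non-zero value into true, so the !! mainly documents intent. It matters when the destination is a narrower integer type: (uint8_t)FEAT_TSO is 0, which silently loses the flag.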


^ permalink raw reply related

* Re: [WIP] net+mlx4: auto doorbell
From: Jesper Dangaard Brouer @ 2016-11-30 11:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan, brouer,
	Achiad Shochat
In-Reply-To: <1480402716.18162.124.camel@edumazet-glaptop3.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 3942 bytes --]


I've played with a somewhat similar patch (from Achiad Shochat) for
mlx5 (attached).  While it gives huge improvements, the problem I ran
into was that TX performance became a function of the TX completion
time/interrupt, and could easily be throttled if that was configured
too high/slow.

Can your patch be affected by this too?

Adjustable via:
 ethtool -C mlx5p2 tx-usecs 16 tx-frames 32
 

On Mon, 28 Nov 2016 22:58:36 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> I have a WIP, that increases pktgen rate by 75 % on mlx4 when bulking is
> not used.
> 
> lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt 
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      9.50 5707925.60      0.99 585285.69      0.00      0.00      0.50
> lpaa23:~# echo 1 >/sys/class/net/eth0/doorbell_opt 
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      2.40 9985214.90      0.31 1023874.60      0.00      0.00      0.50

This +75% number is for pktgen without "burst", and definitely shows that
your patch activates xmit_more.
What is the pps performance number when using the pktgen "burst" option?


> And about 11 % improvement on a mono-flow UDP_STREAM test.
> 
> skb_set_owner_w() is now the most consuming function.
> 
> 
> lpaa23:~# ./udpsnd -4 -H 10.246.7.152 -d 2 &
> [1] 13696
> lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      9.00 1307380.50      1.00 308356.18      0.00      0.00      0.50
> lpaa23:~# echo 3 >/sys/class/net/eth0/doorbell_opt
> lpaa23:~# sar -n DEV 1 10|grep eth0
[...]
> Average:         eth0      3.10 1459558.20      0.44 344267.57      0.00      0.00      0.50

The +11% number seems consistent with my perf observations that approx.
12% was "fakely" spent on the xmit spin_lock.


[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 4b597dca5c52..affebb435679 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
[...]
> -static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
> +static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
>  {
> -	return ring->prod - ring->cons > ring->full_size;
> +	return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > ring->full_size;
>  }
[...]

> @@ -1033,6 +1058,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>  	}
>  	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
>  
> +	/* Doorbell avoidance : We can omit doorbell if we know a TX completion
> +	 * will happen shortly.
> +	 */
> +	if (send_doorbell &&
> +	    dev->doorbell_opt &&
> +	    (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0)

It would be nice with a function call with an appropriate name, instead
of an open-coded queue size check.  I'm also confused by the "ncons" name.

> +		send_doorbell = false;
> +
[...]
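Following the request for a named helper instead of the open-coded check, the following sketch shows how both ring tests could read as functions. It uses a simplified, hypothetical struct whose field names follow the WIP patch. The meaning given to ncons is an assumption (completion progress), since the name is unclear. READ_ONCE() is dropped because this is single-threaded user-space code, and the int32_t cast relies on two's-complement conversion, as the kernel's s32 cast does:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring state; field names follow the WIP patch. */
struct tx_ring {
	uint32_t prod;		/* descriptors posted by the xmit path */
	uint32_t cons;		/* descriptors reclaimed by completions */
	uint32_t ncons;		/* assumed: completion progress seen so far */
	uint32_t prod_bell;	/* prod value at the last doorbell */
	uint32_t full_size;
};

/* Occupancy via unsigned subtraction stays correct across u32 wraparound. */
bool tx_ring_full(const struct tx_ring *r)
{
	return r->prod - r->cons > r->full_size;
}

/* Hypothetical named form of the doorbell-avoidance test: true while the
 * last doorbell position is still ahead of completion progress, i.e. a TX
 * completion is expected shortly. The signed cast orders the two counters
 * modulo 2^32. */
bool tx_completion_pending(const struct tx_ring *r)
{
	return (int32_t)(r->prod_bell - r->ncons) > 0;
}
```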

> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 574bcbb1b38f..c3fd0deda198 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
>  	 */
>  	u32			last_nr_txbb;
>  	u32			cons;
> +	u32			ncons;

Maybe we can find a better name than "ncons" ?

>  	unsigned long		wake_queue;
>  	struct netdev_queue	*tx_queue;
>  	u32			(*free_tx_desc)(struct mlx4_en_priv *priv,
> @@ -290,6 +291,7 @@ struct mlx4_en_tx_ring {
>  
>  	/* cache line used and dirtied in mlx4_en_xmit() */
>  	u32			prod ____cacheline_aligned_in_smp;
> +	u32			prod_bell;

Good descriptive variable name.

>  	unsigned int		tx_dropped;
>  	unsigned long		bytes;
>  	unsigned long		packets;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[-- Attachment #2: net_mlx5e__force_tx_skb_bulking.patch --]
[-- Type: text/x-patch, Size: 5079 bytes --]

From: Tariq Toukan <tariqt@mellanox.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>,
        Achiad Shochat <achiad@mellanox.com>,
        Rana Shahout <ranas@mellanox.com>,
        Saeed Mahameed <saeedm@mellanox.com>
Subject: [PATCH] net/mlx5e: force tx skb bulking
Date: Wed, 17 Aug 2016 12:14:51 +0300
Message-Id: <1471425291-1782-1-git-send-email-tariqt@mellanox.com>

From: Achiad Shochat <achiad@mellanox.com>

This improves the SW message rate in cases where the HW is faster.
Heuristically detect cases where the message rate is high but there
is still no skb bulking, and if so, stop the txq for a while to try
to force bulking.

Change-Id: Icb925135e69b030943cb4666117c47d1cc04da97
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 74edd01..78a0661 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -394,6 +394,10 @@ enum {
 	MLX5E_SQ_STATE_TX_TIMEOUT,
 };
 
+enum {
+	MLX5E_SQ_STOP_ONCE,
+};
+
 struct mlx5e_ico_wqe_info {
 	u8  opcode;
 	u8  num_wqebbs;
@@ -403,6 +407,7 @@ struct mlx5e_sq {
 	/* data path */
 
 	/* dirtied @completion */
+	unsigned long              flags;
 	u16                        cc;
 	u32                        dma_fifo_cc;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index e073bf59..034eef0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -351,8 +351,10 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
 		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
-	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
+	if (test_bit(MLX5E_SQ_STOP_ONCE, &sq->flags) ||
+	    unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
 		netif_tx_stop_queue(sq->txq);
+		clear_bit(MLX5E_SQ_STOP_ONCE, &sq->flags);
 		sq->stats.stopped++;
 	}
 
@@ -429,6 +431,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 	u32 dma_fifo_cc;
 	u32 nbytes;
 	u16 npkts;
+	u16 ncqes;
 	u16 sqcc;
 	int i;
 
@@ -439,6 +442,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 
 	npkts = 0;
 	nbytes = 0;
+	ncqes = 0;
 
 	/* sq->cc must be updated only after mlx5_cqwq_update_db_record(),
 	 * otherwise a cq overrun may occur
@@ -458,6 +462,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 			break;
 
 		mlx5_cqwq_pop(&cq->wq);
+		ncqes++;
 
 		wqe_counter = be16_to_cpu(cqe->wqe_counter);
 
@@ -508,6 +513,8 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 
 	sq->dma_fifo_cc = dma_fifo_cc;
 	sq->cc = sqcc;
+	if ((npkts > 7) && ((npkts >> (ilog2(ncqes))) < 8))
+		set_bit(MLX5E_SQ_STOP_ONCE, &sq->flags);
 
 	netdev_tx_completed_queue(sq->txq, npkts, nbytes);
 
-- 
1.8.3.1
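The heuristic added to mlx5e_poll_tx_cq() sets MLX5E_SQ_STOP_ONCE when a poll completed more than 7 packets and npkts >> ilog2(ncqes) is below 8. Since 2^ilog2(ncqes) <= ncqes, that shift is an upper-bound estimate, within a factor of two, of the average packets per CQE. A low value presumably means transmissions are not being bulked. The following is a minimal sketch of the condition. The zero-CQE guard is an addition here, because ilog2(0) is undefined:

```c
#include <stdbool.h>

/* Floor of log2(v) for v > 0, like the kernel's ilog2(). */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Sketch of the STOP_ONCE condition: enough packets in this poll, but an
 * estimated fewer than 8 packets per completion entry. */
bool should_force_bulking(unsigned int npkts, unsigned int ncqes)
{
	if (ncqes == 0)		/* guard added in this sketch only */
		return false;
	return npkts > 7 && (npkts >> ilog2_u32(ncqes)) < 8;
}
```

For example, 64 packets over 16 CQEs (about 4 per CQE) triggers the one-shot queue stop, while 64 packets over 4 CQEs does not.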


^ permalink raw reply related

DSA vs. SWITCHDEV?
From: Joakim Tjernlund @ 2016-11-30  8:50 UTC (permalink / raw)
  To: netdev@vger.kernel.org

I am trying to wrap my head around these two "devices" and have a hard time telling them apart.
We are looking at adding a fairly large switch (over PCIe) to our board, and from what I can tell
switchdev is the new way to do it, but DSA is still there. Is it possible to just list
how they differ?

What can switchdev do that DSA cannot?

What can DSA do that switchdev cannot?


Can one enable switchdev and dsa for the same switch device?

 Jocke 

^ permalink raw reply

* Re: net: GPF in rt6_get_cookie
From: Andrey Konovalov @ 2016-11-30 11:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: syzkaller, David Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet
In-Reply-To: <29124960-9002-cfd0-c6b9-8986d7e8c875@stressinduktion.org>

On Wed, Nov 30, 2016 at 12:00 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi
>
> On 30.11.2016 11:39, Andrey Konovalov wrote:
>> On Sat, Nov 26, 2016 at 5:23 PM, 'Dmitry Vyukov' via syzkaller
>> <syzkaller@googlegroups.com> wrote:
>>> Hello,
>>>
>>> I got several GPFs in rt6_get_cookie while running syzkaller:
>>>
>>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Dumping ftrace buffer:
>>>    (ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> task: ffff880016f40480 task.stack: ffff88000fc00000
>>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>>> include/net/ip6_fib.h:174
>>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>> RSP: 0018:ffff88000fc07298  EFLAGS: 00010202
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900029f5000
>>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>>> RBP: ffff88000fc07580 R08: 0000000000000000 R09: 0000000000000001
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066cd0068
>>> R13: 1ffff10001f80e92 R14: ffff880066cd0040 R15: ffff88005f2d2808
>>> FS:  00007f52c41f7700(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000020016000 CR3: 0000000065dd7000 CR4: 00000000000006e0
>>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> Stack:
>>>  ffffffff87a210f6 ffffffff8701ad45 ffff88006768ec20 ffff88006768ec20
>>>  0000000000000000 0000000016f40480 ffff88000fc07450 1ffff1000cd9a017
>>>  ffff88006768ec00 ffff880066fc0730 ffff880066cd0068 1ffff10001f80e66
>>> Call Trace:
>>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>>  [<ffffffff879e8911>] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
>>>  [<ffffffff8701ad45>] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>>>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>>>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>>>  [<ffffffff86a6ede0>] SYSC_sendto+0x660/0x810 net/socket.c:1656
>>>  [<ffffffff86a71dd5>] SyS_sendto+0x45/0x60 net/socket.c:1624
>>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>>  RSP <ffff88000fc07298>
>>> ---[ end trace b8d1354fa571700d ]---
>>>
>>>
>>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>> Dumping ftrace buffer:
>>>    (ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> task: ffff88006b92a840 task.stack: ffff88006a730000
>>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>>> include/net/ip6_fib.h:174
>>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>> RSP: 0018:ffff88006a736b88  EFLAGS: 00010202
>>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90003c4f000
>>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>>> RBP: ffff88006a736e68 R08: 0000000000000000 R09: 0000000000000001
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880064cff268
>>> R13: 1ffff1000d4e6db0 R14: ffff880064cff240 R15: ffff88006a4b6808
>>> FS:  00007f74f4ec9700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000000002070effc CR3: 000000003bd2f000 CR4: 00000000000006e0
>>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> Stack:
>>>  ffffffff87a210f6 ffffffff000bbd2d ffff88006c2cd5a0 ffff88006c2cd5a0
>>>  0000000000000000 000000006ccb46c0 ffff88006a736d40 1ffff1000c99fe57
>>>  ffff88006c2cd500 ffff8800658b1f30 ffff880064cff268 1ffff1000d4e6d84
>>> Call Trace:
>>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>>  [<ffffffff879e4358>] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
>>>  [<ffffffff879e4f0b>] __sctp_setsockopt_connectx+0x1ab/0x200
>>> net/sctp/socket.c:1332
>>>  [<     inline     >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
>>>  [<ffffffff879fd2bd>] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
>>>  [<ffffffff86a76c0a>] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
>>>  [<     inline     >] SYSC_getsockopt net/socket.c:1788
>>>  [<ffffffff86a724d7>] SyS_getsockopt+0x257/0x390 net/socket.c:1770
>>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>>  RSP <ffff88006a736b88>
>>> ---[ end trace f42d1c14cb6d2835 ]---
>>>
>>> This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).
>>>
>>> Unfortunately this is not reproducible.
>>>
>>> The line is:
>>>
>>>     return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
>>>
>>> Can it be a data race? rt->rt6i_node != NULL, but the next moment it
>>> is already NULL? That would explain the crash and non-reproducibility
>>> (need ThreadSanitizer!).
>>>
>>> This always happened when called from sctp code, but I don't know if
>>> it is relevant or not. It happened only 3 times.
>>
>> I'm seeing similar crashes from ipv6 and dccp code, reports below.
>>
>> [...]
>
> Thanks for the report.
>
> Do you have a thread running that concurrently mutates the routing table?

Hi Hannes,

We're running a fuzzer which calls random system calls from multiple
processes simultaneously, so it's quite possible.

Thanks!

>
> Bye,
> Hannes
>
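If Dmitry's data-race hypothesis is right, the problem is that rt->rt6i_node is read twice: once for the NULL test and once for the dereference. The following user-space C11 sketch contrasts the two shapes, using simplified stand-in structs rather than the kernel's types. Taking a single snapshot only closes the NULL-dereference window. Keeping the node itself alive while it is used would still need something like RCU in the kernel:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for struct fib6_node / struct rt6_info. */
struct fib_node {
	uint32_t fn_sernum;
};

struct route {
	struct fib_node *_Atomic node;	/* may be cleared concurrently */
};

/* Same shape as the quoted line: the pointer is loaded twice, so another
 * thread can clear it between the test and the dereference. */
uint32_t get_cookie_racy(struct route *rt)
{
	return rt->node ? rt->node->fn_sernum : 0;
}

/* Snapshot shape: load the pointer once, then test and use the local. */
uint32_t get_cookie_snapshot(struct route *rt)
{
	struct fib_node *fn = atomic_load_explicit(&rt->node,
						   memory_order_acquire);

	return fn ? fn->fn_sernum : 0;
}
```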

^ permalink raw reply

* Re: [PATCH 3/6] net: ethernet: ti: cpts: add support of cpts HW_TS_PUSH
From: Jan Lübbe @ 2016-11-30 11:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, Rob Herring, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-4-grygorii.strashko@ti.com>

On Mo, 2016-11-28 at 17:04 -0600, Grygorii Strashko wrote:
> This patch adds support of the CPTS HW_TS_PUSH events which are generated
> by external low frequency time stamp channels on TI's OMAP CPSW and
> Keystone 2 platforms. It supports up to 8 external time stamp channels for
> HW_TS_PUSH input pins (the number of supported channels is different for
> different SoCs and CPTS versions; check the corresponding data manual before
> enabling it). Therefore, new DT property "cpts-ext-ts-inputs" is introduced
> for specifying number of available external timestamp channels.

If this only depends on the SoC and CPTS version, it should be possible to
derive the correct value from the compatible string and possibly a CPTS
version register? If the existing compatible strings are not specific
enough, possibly a new one should be added.

Regards,
Jan
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
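Jan's suggestion amounts to keeping a per-SoC table keyed by compatible string instead of adding a separate DT property. In the kernel this role is usually played by the .data field of struct of_device_id, retrieved with of_device_get_match_data(). The following user-space sketch shows the lookup only. The compatible strings and channel counts are placeholders, not values from any TI data manual:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-SoC data; strings and counts are placeholders. */
struct cpts_soc_data {
	const char *compatible;
	unsigned int ext_ts_inputs;
};

static const struct cpts_soc_data cpts_socs[] = {
	{ "vendor,soc-a-cpts", 4 },
	{ "vendor,soc-b-cpts", 8 },
};

/* Derive the channel count from the compatible string; 0 means the
 * feature is unsupported, matching "if absent - unsupported". */
unsigned int cpts_ext_ts_inputs(const char *compatible)
{
	for (size_t i = 0; i < sizeof(cpts_socs) / sizeof(cpts_socs[0]); i++)
		if (strcmp(cpts_socs[i].compatible, compatible) == 0)
			return cpts_socs[i].ext_ts_inputs;
	return 0;
}
```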

^ permalink raw reply

* Re: [PATCH 4/6] net: ethernet: ti: cpts: add ptp pps support
From: Jan Lübbe @ 2016-11-30 11:01 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, Rob Herring, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-5-grygorii.strashko@ti.com>

On Mon, 2016-11-28 at 17:04 -0600, Grygorii Strashko wrote:
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -127,6 +127,16 @@ Optional properties:
>                 The number of external time stamp channels.
>                 The different CPTS versions might support up to 8
>                 external time stamp channels. if absent - unsupported.
> +       - cpts-ts-comp-length:
> +               Enable time stamp comparison event and TS_COMP signal output
> +               generation when CPTS counter reaches a value written to
> +               the TS_COMP_VAL register.
> +               The generated pulse width is 3 refclk cycles if this property
> +               has no value (empty) or, otherwise, it should specify desired
> +               pulse width in number of refclk periods - max value 2^16.
> +               TS_COMP functionality will be disabled if not present.
> +       - cpts-ts-comp-polarity-low:
> +               Set polarity of TS_COMP signal to low. Default is high.

Why is this configured via DT? Are the values fixed for a given board,
depending on external components? Couldn't this be configured somewhere
else?
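
For reference, the quoted binding gives `cpts-ts-comp-length` three cases:
- Absent: TS_COMP is disabled.
- Present but empty: the pulse is 3 refclk cycles.
- Present with a value: the width in refclk periods, capped at 2^16.

A userspace C model of that parsing follows. `struct dt_prop_model` is a hypothetical stand-in for what a driver would get from `of_find_property()` (an empty property has length 0) and `of_property_read_u32()`. The cap is taken literally from the binding text.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPTS_TS_COMP_DEFAULT_WIDTH 3L         /* refclk cycles, empty property */
#define CPTS_TS_COMP_MAX_WIDTH     (1L << 16) /* "max value 2^16", read literally */

/* Hypothetical model of the parsed cpts-ts-comp-length property. */
struct dt_prop_model {
	bool present;   /* property exists in the node */
	bool has_value; /* property carries a u32 cell */
	uint32_t value; /* pulse width in refclk periods */
};

/* Returns the pulse width in refclk periods, 0 if TS_COMP is disabled,
 * or -1 if the value exceeds the cap. A literal 0, which the binding
 * does not address, is passed through unchanged. */
long cpts_ts_comp_width(const struct dt_prop_model *p)
{
	if (!p->present)
		return 0;
	if (!p->has_value)
		return CPTS_TS_COMP_DEFAULT_WIDTH;
	if ((long)p->value > CPTS_TS_COMP_MAX_WIDTH)
		return -1;
	return (long)p->value;
}
```

The validation only sees the parsed value, so the same check would apply unchanged if the setting came from somewhere other than DT.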

Regards,
Jan

* Re: net: GPF in rt6_get_cookie
From: Hannes Frederic Sowa @ 2016-11-30 11:00 UTC (permalink / raw)
  To: Andrey Konovalov, syzkaller
  Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, LKML, Eric Dumazet
In-Reply-To: <CAAeHK+wvAZByn7-fONWYk1P8fXA9wNdkVLGtXfQsdFb-NSdn+g@mail.gmail.com>

Hi

On 30.11.2016 11:39, Andrey Konovalov wrote:
> On Sat, Nov 26, 2016 at 5:23 PM, 'Dmitry Vyukov' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>> Hello,
>>
>> I got several GPFs in rt6_get_cookie while running syzkaller:
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: ffff880016f40480 task.stack: ffff88000fc00000
>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>> include/net/ip6_fib.h:174
>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>> RSP: 0018:ffff88000fc07298  EFLAGS: 00010202
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900029f5000
>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>> RBP: ffff88000fc07580 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066cd0068
>> R13: 1ffff10001f80e92 R14: ffff880066cd0040 R15: ffff88005f2d2808
>> FS:  00007f52c41f7700(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000020016000 CR3: 0000000065dd7000 CR4: 00000000000006e0
>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>>  ffffffff87a210f6 ffffffff8701ad45 ffff88006768ec20 ffff88006768ec20
>>  0000000000000000 0000000016f40480 ffff88000fc07450 1ffff1000cd9a017
>>  ffff88006768ec00 ffff880066fc0730 ffff880066cd0068 1ffff10001f80e66
>> Call Trace:
>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>  [<ffffffff879e8911>] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
>>  [<ffffffff8701ad45>] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>>  [<ffffffff86a6ede0>] SYSC_sendto+0x660/0x810 net/socket.c:1656
>>  [<ffffffff86a71dd5>] SyS_sendto+0x45/0x60 net/socket.c:1624
>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>  RSP <ffff88000fc07298>
>> ---[ end trace b8d1354fa571700d ]---
>>
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: ffff88006b92a840 task.stack: ffff88006a730000
>> RIP: 0010:[<ffffffff87a209e8>]  [<     inline     >] rt6_get_cookie
>> include/net/ip6_fib.h:174
>> RIP: 0010:[<ffffffff87a209e8>]  [<ffffffff87a209e8>]
>> sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>> RSP: 0018:ffff88006a736b88  EFLAGS: 00010202
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90003c4f000
>> RDX: 0000000000000015 RSI: 0000000000000001 RDI: 00000000000000a8
>> RBP: ffff88006a736e68 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880064cff268
>> R13: 1ffff1000d4e6db0 R14: ffff880064cff240 R15: ffff88006a4b6808
>> FS:  00007f74f4ec9700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000002070effc CR3: 000000003bd2f000 CR4: 00000000000006e0
>> DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>>  ffffffff87a210f6 ffffffff000bbd2d ffff88006c2cd5a0 ffff88006c2cd5a0
>>  0000000000000000 000000006ccb46c0 ffff88006a736d40 1ffff1000c99fe57
>>  ffff88006c2cd500 ffff8800658b1f30 ffff880064cff268 1ffff1000d4e6d84
>> Call Trace:
>>  [<ffffffff879a313d>] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
>>  [<ffffffff8799b106>] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
>>  [<ffffffff879e4358>] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
>>  [<ffffffff879e4f0b>] __sctp_setsockopt_connectx+0x1ab/0x200
>> net/sctp/socket.c:1332
>>  [<     inline     >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
>>  [<ffffffff879fd2bd>] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
>>  [<ffffffff86a76c0a>] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
>>  [<     inline     >] SYSC_getsockopt net/socket.c:1788
>>  [<ffffffff86a724d7>] SyS_getsockopt+0x257/0x390 net/socket.c:1770
>>  [<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
>> Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
>> bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
>> RIP  [<     inline     >] rt6_get_cookie include/net/ip6_fib.h:174
>> RIP  [<ffffffff87a209e8>] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
>>  RSP <ffff88006a736b88>
>> ---[ end trace f42d1c14cb6d2835 ]---
>>
>> This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).
>>
>> Unfortunately this is not reproducible.
>>
>> The line is:
>>
>>     return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
>>
>> Can it be a data race? rt->rt6i_node != NULL, but the next moment it
>> is already NULL? That would explain the crash and non-reproducibility
>> (need ThreadSanitizer!).
>>
>> This always happened when called from sctp code, but I don't know if
>> it is relevant or not. It happened only 3 times.
> 
> I'm seeing similar crashes from ipv6 and dccp code, reports below.
> 
> [...]

Thanks for the report.

Do you have a thread running that concurrently mutates the routing table?

Bye,
Hannes
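
A note on reading the quoted dumps: both show RBX = 0, RDI = 0xa8, RDX = 0x15 and RAX = 0xdffffc0000000000. The Code: bytes leading up to the faulting `<80>` decode to `lea rdi,[rbx+0xa8]`, `movabs rax,0xdffffc0000000000`, `mov rdx,rdi`, `shr rdx,3` and `cmp byte ptr [rdx+rax],0`. That sequence is the generic KASAN shadow check on x86_64 (shadow = (addr >> 3) + 0xdffffc0000000000). The access being checked is therefore offset 0xa8 from a NULL base pointer. Its shadow address is non-canonical, which is why these crashes report as a general protection fault rather than a NULL pointer dereference. The C sketch below only redoes that arithmetic. It does not identify which field sits at offset 0xa8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic KASAN parameters on x86_64 (RAX in the dumps holds the offset). */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte KASAN inspects before accessing addr. */
uint64_t kasan_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal;
 * touching any other address raises #GP instead of a page fault. */
bool is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47; /* the top 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

For addr = 0xa8 this gives 0x15 + 0xdffffc0000000000 = 0xdffffc0000000015, matching RDX and RAX in both traces.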
