Netdev List
 help / color / mirror / Atom feed
* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: Vishwanathapura, Niranjana @ 2016-12-16  2:30 UTC (permalink / raw)
  To: ira.weiny
  Cc: Jason Gunthorpe, Doug Ledford, Leon Romanovsky, Jeff Kirsher,
	David S. Miller, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161216012404.GD3785-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>

On Thu, Dec 15, 2016 at 08:24:05PM -0500, ira.weiny wrote:
>On Thu, Dec 15, 2016 at 11:48:37AM -0700, Jason Gunthorpe wrote:
>> On Thu, Dec 15, 2016 at 01:19:18PM -0500, Doug Ledford wrote:
>> > On 12/15/2016 12:07 PM, Jason Gunthorpe wrote:
>> > > On Thu, Dec 15, 2016 at 11:28:06AM -0500, Doug Ledford wrote:
>> > >
>> > >> 1) Since your intent is to make this work with multiple versions of the
>> > >> hfi drivers, I disagree with Jason that just because there is only one
>> > >> driver today that we should keep it simple.  Design it right from the
>> > >> beginning of multi driver is your intent is, IMO, a better way to go.
>> > >> You'll work out the bugs in the initial implementation and when it comes
>> > >> time to add the second driver, things will go much more smoothly.
>> > >
>> > > If that is your position then this should be a straight up IB ULP that
>> > > works with any IB hardware.
>> >
>> > Yes, see my comments in point #3 of my previous email...
>>
>> Well, I'm not opposed to the vnic idea - Mellanox had (has?) a similar
>> IB driver. There are lots of good reasons to strictly maintain the
>> ethernet presentation.
>
>Agreed.  I'm pretty worried about the idea of putting VNIC into IPoIB.  It
>seems like a force fit at best.
>

Just to add what Jason, Ira already mentioned,
1) This isn't much common code between hfi_vnic and ipoib.
Besides we expect both ipoib and hfi_vnic to function parallely.
Registering with the network stack is also different.
hfi_vnic exchanges encapsulation information via IB MAD interface from OPA
EM which is not the case with ipoib.
We needed minimal set of interfaces (defined in include/rdma/opa_hfi.h in this 
path series) that represents HW.

2) The design is very different. There are no path record queries, QPs etc in 
hfi_vnic.

3) hfi_vnic also does the encapsulation with fabric (OPA) header, so bottom 
driver only puts it on the wire.
Whereas in ipoib, bottom ib device driver does the encapsulation for ipoib.

4) hfi_vnic do not need ib work request/completion structures.
hfi_vnic supports multiple TX/RX queues.

>>
>> There is much more going on here than just changing the LLADDR,
>> essentially everything MAD focused is different compared to ipoib, and
>> it looks like the required datastructures are different too. This is
>> more of a map a mac to a OPA_LRH approach with SA mediated discovery,
>> by my eye.
>>
>> The main share is the 'skb send' part, we've talked about hoisting
>> that out of ipoib in the past anyhow. A generic verb along those lines
>> would probably allow the sdma optimization for hfi for both this new
>> ulp and ipoib without creating such an ugly HFI1 specific interface.
>
>I'm not sure what you mean about "skb send" being used by ipoib.  Right now
>IPoIB already supplies a "generic skb send" for _Verbs_ in ipoib_send.
>
>I don't know what other devices would do to implement ipoib_send?  To me, it
>seems like the abstraction for IPoIB is at the proper layer now.
>
>For OPA, the hfi driver supports both IPoIB and VNIC.  So expecting IPoIB and
>VNIC to use a generic "skb send" in ib_device is going to make hfi1 do a lot of
>work to determine which ULP is calling it or make the interface kind of ugly.
>Either way I don't see how this is better than a separate set of functions.
>
>IMO the cleanest way to "clean up the ugly HFI1 interface" is to just  put the
>VNIC operations into ib_device similar to the iWarp specific structure
>"iw_cm_verbs" which is there today.
>
>If a device supports the VNIC operations then it can set the pointer and if not
>it will be NULL.  VNIC will look for that pointer for the support it needs.  If
>in the future other devices need modifications to that interface we can modify
>it then.
>
>Ira

Yes, I agree. The interface defined in include/rdma/opa_hfi.h in this patch 
series is pretty simple and generic interface that represents the HW.
If we include this file and put the hfi_vnic_ctrl_ops directly in ib_device 
structure, then it will simplify lot of stuff. We don't need to abstract
out hfi_ibdev and define any ib device capability flag for VNIC support.

>
>>
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: Vishwanathapura, Niranjana @ 2016-12-16  2:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, ira.weiny, Leon Romanovsky, Jeff Kirsher,
	David S. Miller, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161215170713.GD3264-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Dec 15, 2016 at 10:07:13AM -0700, Jason Gunthorpe wrote:
>On Thu, Dec 15, 2016 at 11:28:06AM -0500, Doug Ledford wrote:
>
>> 1) Since your intent is to make this work with multiple versions of the
>> hfi drivers, I disagree with Jason that just because there is only one
>> driver today that we should keep it simple.  Design it right from the
>> beginning of multi driver is your intent is, IMO, a better way to go.
>> You'll work out the bugs in the initial implementation and when it comes
>> time to add the second driver, things will go much more smoothly.
>
>If that is your position then this should be a straight up IB ULP that
>works with any IB hardware.
>
>There is nothing HFI specific about it except for the
>micro-optimization of pushing packets via SDMA instead of post_send,
>and that same micro optimization probably applies to ipoib.
>

Responded on the other thread. As mentioned, there are differences between 
ipoib and hfi_vnic interface. For hfi_vnic, we need simple interface as defined 
in the include/rdma/opa_hfi.h that represents HW to put/get already 
encapsulated OPA packets.

>In other words, lets see the first version as a straight ULP with no
>special HFI hooks, then we can discuss how best to micro optimize it
>for HFI SDMA.
>

As mentioned in other thread, that would be putting hfi_vnic_ctrl_ops in 
ib_device structure.

Niranjana

>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC v2 03/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) netdev
From: Vishwanathapura, Niranjana @ 2016-12-16  2:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w,
	ira.weiny-ral2JQCrhuEAvxtiuMwx3w, Sadanand Warrier, Sudeep Dutt,
	Tanya K Jajodia, Andrzej Kacprowski
In-Reply-To: <20161215170109.GC3264-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Dec 15, 2016 at 10:01:09AM -0700, Jason Gunthorpe wrote:
>On Wed, Dec 14, 2016 at 11:59:35PM -0800, Vishwanathapura, Niranjana wrote:
>> +/**
>> + * union hfi_vnic_bypass_hdr - VNIC bypass header
>> + * @slid: source lid
>> + * @length: length of packet
>> + * @becn: backward explicit congestion notification
>> + * @dlid: destination lid
>> + * @sc: service class
>> + * @fecn: forward explicit congestion notification
>> + * @l2: L2 type (2=16B)
>> + * @lt: link transfer field
>> + * @l4: L4 type
>> + * @slid_high: upper 4 bits of source lid
>> + * @dlid_high: upper 4 bits of destination lid
>> + * @pkey: partition key
>> + * @entropy: entropy
>> + * @age: packet age
>> + * @l4_hdr: L4 header
>> + */
>> +union hfi_vnic_bypass_hdr {
>> +	struct {
>> +	struct {
>> +		uint64_t slid   : 20;
>> +		uint64_t length : 11;
>> +		uint64_t becn   : 1;
>> +		uint64_t dlid   : 20;
>> +		uint64_t sc     : 5;
>> +		uint64_t rsvd   : 3;
>> +		uint64_t fecn   : 1;
>> +		uint64_t l2     : 2;
>> +		uint64_t lt     : 1;
>> +	};
>> +	struct {
>> +		uint64_t l4        : 8;
>> +		uint64_t slid_high : 4;
>> +		uint64_t dlid_high : 4;
>> +		uint64_t pkey      : 16;
>> +		uint64_t entropy   : 16;
>> +		uint64_t age       : 8;
>> +		uint64_t rsvd1     : 8;
>> +	};
>> +	struct {
>> +		uint32_t rsvd2  : 16;
>> +		uint32_t l4_hdr : 16;
>> +	};
>> +	} __packed;
>> +	u32 dw[5];
>> +};
>
>This isn't going to work on BE, please fix it.
>

We have made the hfi_vnic driver dependent on CONFIG_X86_64.
But I agree with all the feedback here. I will remove bitfields
and instead use bit operations in the next revision.

>> +/**
>> + * struct __hfi_vesw_info - HFI vnic virtual switch info
>> + */
>> +struct __hfi_vesw_info {
>> +	u16  fabric_id;
>> +	u16  vesw_id;
>> +
>> +	u8   rsvd0[6];
>> +	u16  def_port_mask;
>> +
>> +	u8   rsvd1[2];
>> +	u16  pkey;
>> +
>> +	u8   rsvd2[4];
>> +	u32  u_mcast_dlid;
>> +	u32  u_ucast_dlid[HFI_VESW_MAX_NUM_DEF_PORT];
>> +
>> +	u8   rsvd3[44];
>> +	u16  eth_mtu[HFI_VNIC_MAX_NUM_PCP];
>> +	u16  eth_mtu_non_vlan;
>> +	u8   rsvd4[2];
>> +} __packed;
>
>This goes on the network too? Also looks like it has endian problems.
>
>Ditto for all the __packed structures.
>

This is in CPU format. There is a separate big endian version of this structure 
defined in hfi_vnic_encap.h in below patch (which gets sent on wire).
https://www.spinics.net/lists/linux-rdma/msg44111.html

>> +#define v_dbg(format, arg...) \
>> +	netdev_dbg(adapter->netdev, format, ## arg)
>> +#define v_err(format, arg...) \
>> +	netdev_err(adapter->netdev, format, ## arg)
>> +#define v_info(format, arg...) \
>> +	netdev_info(adapter->netdev, format, ## arg)
>> +#define v_warn(format, arg...) \
>> +	netdev_warn(adapter->netdev, format, ## arg)
>
>Relies on an 'adapter' local varable?? Ugly.
>

I am using the same approach as Intel NIC driver like e1000e and ixgbe.

>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v6 0/5] The SipHash Patchset
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld
In-Reply-To: <20161215203003.31989-1-Jason@zx2c4.com>

Hey again,

This keeps getting more ambitious, which is good I suppose. If the frequency
of new versioned patchsets is too high for LKML and not customary, please let
me know. Otherwise, read on to see what's new this time...

With Hannes' suggestion, there is now only one siphash() function, which will
use the faster aligned version by compile-time constant folding. Additionally,
I now use constant folding to optionally switch to the helper siphash_Nu64
functions that are a bit faster for data of length 8, 16, 24, and 32. So, the
result is that you use siphash(data, len, key) if you have a buffer of sorts,
and then everything is taken care of for you. Or, if you have a series of
integers, you can opt to use siphash_Nu{32,64} functions instead. The basic
API is now complete.

After replacing MD5 in secure sequence number generation and the RNG, it
turned out that md5_transform wasn't used any place else in the tree, so
finally -- this is something to rejoice over -- lib/md5.c has been deleted and
now that function lives as a static function in crypto/md5.c where it belongs.

Meanwhile, it seems that sha_transform is used in places where SipHash would
be more fitting, so the IPv4 and IPv6 syncookies implementation now uses
SipHash, which should speed up TCP performance. Some BSDs already do this.

I'd like to replace sha_transform in addrconf, but that code is a bit gnarley,
so I don't want to be too meddlesome. I'm not entirely convinced either that
SipHash is a good choice for it. But I'm open to discussion here, so if you
have an opinion, please speak up.

If you've been following the evolution of this patchset, and think that
certain patches in it are fine, please do lend me your Reviewed-by to carry
into any subsequent versions, so that in case you disappear your useful
reviews will still keep the ball moving.

Thanks for all the great feedback thus far.

Jason

Jason A. Donenfeld (5):
  siphash: add cryptographically secure PRF
  secure_seq: use SipHash in place of MD5
  random: use SipHash in place of MD5
  md5: remove from lib and only live in crypto
  syncookies: use SipHash in place of SHA1

 MAINTAINERS             |   7 ++
 crypto/md5.c            |  95 +++++++++++++++++++++-
 drivers/char/random.c   |  32 +++-----
 include/linux/siphash.h |  86 ++++++++++++++++++++
 lib/Kconfig.debug       |   6 +-
 lib/Makefile            |   7 +-
 lib/md5.c               |  95 ----------------------
 lib/siphash.c           | 210 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 101 +++++++++++++++++++++++
 net/core/secure_seq.c   | 133 ++++++++++++------------------
 net/ipv4/syncookies.c   |  20 +----
 net/ipv6/syncookies.c   |  37 ++++-----
 12 files changed, 590 insertions(+), 239 deletions(-)
 create mode 100644 include/linux/siphash.h
 delete mode 100644 lib/md5.c
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

-- 
2.11.0

^ permalink raw reply

* [PATCH v6 2/5] secure_seq: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld
In-Reply-To: <20161216030328.11602-1-Jason@zx2c4.com>

This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, for IPv6, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code. For IPv4, we pass the values directly to the
short input convenience functions.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/core/secure_seq.c | 133 ++++++++++++++++++++------------------------------
 1 file changed, 52 insertions(+), 81 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..c80583bf3213 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,14 +10,14 @@
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static siphash_key_t net_secret;
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,42 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		u32 padding;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash(&combined, sizeof(combined), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+		u16 padding1;
+		u32 padding2;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash(&combined, sizeof(combined), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +91,17 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash_4u32(saddr, daddr, sport, dport, net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return siphash_4u32(saddr, daddr, dport, 0, net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +110,11 @@ EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash_4u32(saddr, daddr, sport, dport, net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +123,23 @@ EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		u32 padding;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash(&combined, sizeof(combined), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v6 3/5] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson
In-Reply-To: <20161216030328.11602-1-Jason@zx2c4.com>

This duplicates the current algorithm for get_random_int/long, but uses
siphash instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
separates hashed fields into three values instead of one, in order to
increase diffusion.

The previous MD5 algorithm used a per-cpu MD5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte MD5 state, there is
instead a per-cpu previously returned value for chaining.

The speed benefits are substantial:

                | siphash | md5    | speedup |
		------------------------------
get_random_long | 137130  | 415983 | 3.03x   |
get_random_int  | 86384   | 343323 | 3.97x   |

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
 drivers/char/random.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..a51f0ff43f00 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static siphash_key_t random_int_secret;
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,16 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
 	unsigned int ret;
+	u64 *chaining;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
+				       current->pid, random_int_secret);
+	put_cpu_var(get_random_int_chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2080,16 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
 	unsigned long ret;
+	u64 *chaining;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
+				       current->pid, random_int_secret);
+	put_cpu_var(get_random_int_chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v6 4/5] md5: remove from lib and only live in crypto
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld
In-Reply-To: <20161216030328.11602-1-Jason@zx2c4.com>

The md5_transform function is no longer used any where in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 crypto/md5.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/Makefile |  2 +-
 lib/md5.c    | 95 ------------------------------------------------------------
 3 files changed, 95 insertions(+), 97 deletions(-)
 delete mode 100644 lib/md5.c

diff --git a/crypto/md5.c b/crypto/md5.c
index 2355a7c25c45..f7ae1a48225b 100644
--- a/crypto/md5.c
+++ b/crypto/md5.c
@@ -21,9 +21,11 @@
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/types.h>
-#include <linux/cryptohash.h>
 #include <asm/byteorder.h>
 
+#define MD5_DIGEST_WORDS 4
+#define MD5_MESSAGE_BYTES 64
+
 const u8 md5_zero_message_hash[MD5_DIGEST_SIZE] = {
 	0xd4, 0x1d, 0x8c, 0xd9, 0x8f, 0x00, 0xb2, 0x04,
 	0xe9, 0x80, 0x09, 0x98, 0xec, 0xf8, 0x42, 0x7e,
@@ -47,6 +49,97 @@ static inline void cpu_to_le32_array(u32 *buf, unsigned int words)
 	}
 }
 
+#define F1(x, y, z)	(z ^ (x & (y ^ z)))
+#define F2(x, y, z)	F1(z, x, y)
+#define F3(x, y, z)	(x ^ y ^ z)
+#define F4(x, y, z)	(y ^ (x | ~z))
+
+#define MD5STEP(f, w, x, y, z, in, s) \
+	(w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x)
+
+static void md5_transform(__u32 *hash, __u32 const *in)
+{
+	u32 a, b, c, d;
+
+	a = hash[0];
+	b = hash[1];
+	c = hash[2];
+	d = hash[3];
+
+	MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
+	MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
+	MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
+	MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
+	MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
+	MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
+	MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
+	MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
+	MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
+	MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
+	MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
+	MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
+	MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
+	MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
+	MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
+	MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
+
+	MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
+	MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
+	MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
+	MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
+	MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
+	MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
+	MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
+	MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
+	MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
+	MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
+	MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
+	MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
+	MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
+	MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
+	MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
+	MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
+
+	MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
+	MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
+	MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
+	MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
+	MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
+	MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
+	MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
+	MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
+	MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
+	MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
+	MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
+	MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
+	MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
+	MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
+	MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
+	MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
+
+	MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
+	MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
+	MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
+	MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
+	MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6);
+	MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10);
+	MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15);
+	MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21);
+	MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6);
+	MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10);
+	MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15);
+	MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21);
+	MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6);
+	MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10);
+	MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15);
+	MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21);
+
+	hash[0] += a;
+	hash[1] += b;
+	hash[2] += c;
+	hash[3] += d;
+}
+
 static inline void md5_transform_helper(struct md5_state *ctx)
 {
 	le32_to_cpu_array(ctx->block, sizeof(ctx->block) / sizeof(u32));
diff --git a/lib/Makefile b/lib/Makefile
index 71d398b04a74..1079152607e0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -19,7 +19,7 @@ KCOV_INSTRUMENT_dynamic_debug.o := n
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o timerqueue.o\
 	 idr.o int_sqrt.o extable.o \
-	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
+	 sha1.o chacha20.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o \
diff --git a/lib/md5.c b/lib/md5.c
deleted file mode 100644
index bb0cd01d356d..000000000000
--- a/lib/md5.c
+++ /dev/null
@@ -1,95 +0,0 @@
-#include <linux/compiler.h>
-#include <linux/export.h>
-#include <linux/cryptohash.h>
-
-#define F1(x, y, z)	(z ^ (x & (y ^ z)))
-#define F2(x, y, z)	F1(z, x, y)
-#define F3(x, y, z)	(x ^ y ^ z)
-#define F4(x, y, z)	(y ^ (x | ~z))
-
-#define MD5STEP(f, w, x, y, z, in, s) \
-	(w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x)
-
-void md5_transform(__u32 *hash, __u32 const *in)
-{
-	u32 a, b, c, d;
-
-	a = hash[0];
-	b = hash[1];
-	c = hash[2];
-	d = hash[3];
-
-	MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
-	MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
-	MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
-	MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
-	MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
-	MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
-	MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
-	MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
-	MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
-	MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
-	MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
-	MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
-	MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
-	MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
-	MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
-	MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
-
-	MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
-	MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
-	MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
-	MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
-	MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
-	MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
-	MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
-	MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
-	MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
-	MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
-	MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
-	MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
-	MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
-	MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
-	MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
-	MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
-
-	MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
-	MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
-	MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
-	MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
-	MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
-	MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
-	MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
-	MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
-	MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
-	MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
-	MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
-	MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
-	MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
-	MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
-	MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
-	MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
-
-	MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
-	MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
-	MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
-	MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
-	MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6);
-	MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10);
-	MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15);
-	MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21);
-	MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6);
-	MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10);
-	MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15);
-	MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21);
-	MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6);
-	MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10);
-	MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15);
-	MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21);
-
-	hash[0] += a;
-	hash[1] += b;
-	hash[2] += c;
-	hash[3] += d;
-}
-EXPORT_SYMBOL(md5_transform);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v6 5/5] syncookies: use SipHash in place of SHA1
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld
In-Reply-To: <20161216030328.11602-1-Jason@zx2c4.com>

SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 net/ipv4/syncookies.c | 20 ++++----------------
 net/ipv6/syncookies.c | 37 ++++++++++++++++---------------------
 2 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 3e88467d70ee..03bb068f8888 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -13,13 +13,13 @@
 #include <linux/tcp.h>
 #include <linux/slab.h>
 #include <linux/random.h>
-#include <linux/cryptohash.h>
+#include <linux/siphash.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <net/tcp.h>
 #include <net/route.h>
 
-static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
+static siphash_key_t syncookie_secret[2] __read_mostly;
 
 #define COOKIEBITS 24	/* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
@@ -48,24 +48,12 @@ static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
 #define TSBITS	6
 #define TSMASK	(((__u32)1 << TSBITS) - 1)
 
-static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], ipv4_cookie_scratch);
-
 static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
 		       u32 count, int c)
 {
-	__u32 *tmp;
-
 	net_get_random_once(syncookie_secret, sizeof(syncookie_secret));
-
-	tmp  = this_cpu_ptr(ipv4_cookie_scratch);
-	memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c]));
-	tmp[0] = (__force u32)saddr;
-	tmp[1] = (__force u32)daddr;
-	tmp[2] = ((__force u32)sport << 16) + (__force u32)dport;
-	tmp[3] = count;
-	sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
-	return tmp[17];
+	return siphash_4u32(saddr, daddr, (u32)sport << 16 | dport, count,
+			    syncookie_secret[c]);
 }
 
 
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index a4d49760bf43..04d19e89a3e0 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -16,7 +16,7 @@
 
 #include <linux/tcp.h>
 #include <linux/random.h>
-#include <linux/cryptohash.h>
+#include <linux/siphash.h>
 #include <linux/kernel.h>
 #include <net/ipv6.h>
 #include <net/tcp.h>
@@ -24,7 +24,7 @@
 #define COOKIEBITS 24	/* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
 
-static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
+static siphash_key_t syncookie6_secret[2] __read_mostly;
 
 /* RFC 2460, Section 8.3:
  * [ipv6 tcp] MSS must be computed as the maximum packet size minus 60 [..]
@@ -41,30 +41,25 @@ static __u16 const msstab[] = {
 	9000 - 60,
 };
 
-static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], ipv6_cookie_scratch);
-
 static u32 cookie_hash(const struct in6_addr *saddr, const struct in6_addr *daddr,
 		       __be16 sport, __be16 dport, u32 count, int c)
 {
-	__u32 *tmp;
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		u32 count;
+		u16 sport;
+		u16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *saddr,
+		.daddr = *daddr,
+		.count = count,
+		.sport = sport,
+		.dport = dport
+	};
 
 	net_get_random_once(syncookie6_secret, sizeof(syncookie6_secret));
-
-	tmp  = this_cpu_ptr(ipv6_cookie_scratch);
-
-	/*
-	 * we have 320 bits of information to hash, copy in the remaining
-	 * 192 bits required for sha_transform, from the syncookie6_secret
-	 * and overwrite the digest with the secret
-	 */
-	memcpy(tmp + 10, syncookie6_secret[c], 44);
-	memcpy(tmp, saddr, 16);
-	memcpy(tmp + 4, daddr, 16);
-	tmp[8] = ((__force u32)sport << 16) + (__force u32)dport;
-	tmp[9] = count;
-	sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
-	return tmp[17];
+	return siphash(&combined, sizeof(combined), syncookie6_secret[c]);
 }
 
 static __u32 secure_tcp_syn_cookie(const struct in6_addr *saddr,
-- 
2.11.0

^ permalink raw reply related

* [PATCH v6 1/5] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-16  3:03 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson
In-Reply-To: <20161216030328.11602-1-Jason@zx2c4.com>

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function, or as a
general PRF for short input use cases, such as sequence numbers or RNG
chaining.

For the first usage:

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector. Currently
hashtables use jhash, which is fast but not secure, and some kind of
rotating key scheme (or none at all, which isn't good). SipHash is meant
as a replacement for jhash in these cases.

There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

While SipHash is extremely fast for a cryptographically secure function,
it is likely a bit slower than the insecure jhash, and so replacements
will be evaluated on a case-by-case basis based on whether or not the
difference in speed is negligible and whether or not the current jhash usage
poses a real security risk.

For the second usage:

A few places in the kernel are using MD5 or SHA1 for creating secure
sequence numbers, syn cookies, port numbers, or fast random numbers.
SipHash is a faster and more fitting, and more secure replacement for MD5
in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
obvious and straight-forward, and so is submitted along with this patch
series. There shouldn't be much of a debate over its efficacy.

Dozens of languages are already using this internally for their hash
tables and PRFs. Some of the BSDs already use this in their kernels.
SipHash is a widely known high-speed solution to a widely known set of
problems, and it's time we catch-up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
---
 MAINTAINERS             |   7 ++
 include/linux/siphash.h |  86 ++++++++++++++++++++
 lib/Kconfig.debug       |   6 +-
 lib/Makefile            |   5 +-
 lib/siphash.c           | 210 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 101 +++++++++++++++++++++++
 6 files changed, 410 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 59c9895d73d5..5d87a8c1056a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11231,6 +11231,13 @@ F:	arch/arm/mach-s3c24xx/mach-bast.c
 F:	arch/arm/mach-s3c24xx/bast-ide.c
 F:	arch/arm/mach-s3c24xx/bast-irq.c
 
+SIPHASH PRF ROUTINES
+M:	Jason A. Donenfeld <Jason@zx2c4.com>
+S:	Maintained
+F:	lib/siphash.c
+F:	lib/test_siphash.c
+F:	include/linux/siphash.h
+
 TI DAVINCI MACHINE SUPPORT
 M:	Sekhar Nori <nsekhar@ti.com>
 M:	Kevin Hilman <khilman@kernel.org>
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..e82fce48a0f6
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,86 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ *
+ * This implementation is specifically for SipHash2-4.
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+
+#define SIPHASH_ALIGNMENT 8
+typedef u64 siphash_key_t[2];
+
+u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t key);
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t key);
+#endif
+
+u64 siphash_1u64(const u64 a, const siphash_key_t key);
+u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t key);
+u64 siphash_3u64(const u64 a, const u64 b, const u64 c,
+		 const siphash_key_t key);
+u64 siphash_4u64(const u64 a, const u64 b, const u64 c, const u64 d,
+		 const siphash_key_t key);
+
+static inline u64 ___siphash_aligned(const u64 *data, size_t len, const siphash_key_t key)
+{
+	if (__builtin_constant_p(len) && len == 8)
+		return siphash_1u64(data[0], key);
+	if (__builtin_constant_p(len) && len == 16)
+		return siphash_2u64(data[0], data[1], key);
+	if (__builtin_constant_p(len) && len == 24)
+		return siphash_3u64(data[0], data[1], data[2], key);
+	if (__builtin_constant_p(len) && len == 32)
+		return siphash_4u64(data[0], data[1], data[2], data[3], key);
+	return __siphash_aligned(data, len, key);
+}
+
+/**
+ * siphash - compute 64-bit siphash PRF value
+ * @data: buffer to hash
+ * @size: size of @data
+ * @key: the siphash key
+ */
+static inline u64 siphash(const void *data, size_t len, const siphash_key_t key)
+{
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
+		return __siphash_unaligned(data, len, key);
+#endif
+	return ___siphash_aligned(data, len, key);
+}
+
+static inline u64 siphash_2u32(const u32 a, const u32 b, const siphash_key_t key)
+{
+	return siphash_1u64((u64)b << 32 | a, key);
+}
+
+static inline u64 siphash_4u32(const u32 a, const u32 b, const u32 c, const u32 d,
+			       const siphash_key_t key)
+{
+	return siphash_2u64((u64)b << 32 | a, (u64)d << 32 | c, key);
+}
+
+static inline u64 siphash_6u32(const u32 a, const u32 b, const u32 c, const u32 d,
+			       const u32 e, const u32 f, const siphash_key_t key)
+{
+	return siphash_3u64((u64)b << 32 | a, (u64)d << 32 | c, (u64)f << 32 | e,
+			    key);
+}
+
+static inline u64 siphash_8u32(const u32 a, const u32 b, const u32 c, const u32 d,
+			       const u32 e, const u32 f, const u32 g, const u32 h,
+			       const siphash_key_t key)
+{
+	return siphash_4u64((u64)b << 32 | a, (u64)d << 32 | c, (u64)f << 32 | e,
+			    (u64)h << 32 | g, key);
+}
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7446097f72bd..86254ea99b45 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1843,9 +1843,9 @@ config TEST_HASH
 	tristate "Perform selftest on hash functions"
 	default n
 	help
-	  Enable this option to test the kernel's integer (<linux/hash,h>)
-	  and string (<linux/stringhash.h>) hash functions on boot
-	  (or module load).
+	  Enable this option to test the kernel's integer (<linux/hash.h>),
+	  string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+	  hash functions on boot (or module load).
 
 	  This is intended to help people writing architecture-specific
 	  optimized versions.  If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..7efc273de5d0
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,210 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ *
+ * This implementation is specifically for SipHash2-4.
+ */
+
+#include <linux/siphash.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+#define PREAMBLE(len) \
+	u64 v0 = 0x736f6d6570736575ULL; \
+	u64 v1 = 0x646f72616e646f6dULL; \
+	u64 v2 = 0x6c7967656e657261ULL; \
+	u64 v3 = 0x7465646279746573ULL; \
+	u64 b = ((u64)len) << 56; \
+	v3 ^= key[1]; \
+	v2 ^= key[0]; \
+	v1 ^= key[1]; \
+	v0 ^= key[0];
+
+#define POSTAMBLE \
+	v3 ^= b; \
+	SIPROUND; \
+	SIPROUND; \
+	v0 ^= b; \
+	v2 ^= 0xff; \
+	SIPROUND; \
+	SIPROUND; \
+	SIPROUND; \
+	SIPROUND; \
+	return (v0 ^ v1) ^ (v2 ^ v3);
+
+u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t key)
+{
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	u64 m;
+	PREAMBLE(len)
+	for (; data != end; data += sizeof(u64)) {
+		m = le64_to_cpup(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) &
+						  bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)end[6]) << 48;
+	case 6: b |= ((u64)end[5]) << 40;
+	case 5: b |= ((u64)end[4]) << 32;
+	case 4: b |= le32_to_cpup(data); break;
+	case 3: b |= ((u64)end[2]) << 16;
+	case 2: b |= le16_to_cpup(data); break;
+	case 1: b |= end[0];
+	}
+#endif
+	POSTAMBLE
+}
+EXPORT_SYMBOL(__siphash_aligned);
+
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t key)
+{
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	u64 m;
+	PREAMBLE(len)
+	for (; data != end; data += sizeof(u64)) {
+		m = get_unaligned_le64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) &
+						  bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)end[6]) << 48;
+	case 6: b |= ((u64)end[5]) << 40;
+	case 5: b |= ((u64)end[4]) << 32;
+	case 4: b |= get_unaligned_le32(end); break;
+	case 3: b |= ((u64)end[2]) << 16;
+	case 2: b |= get_unaligned_le16(end); break;
+	case 1: b |= end[0];
+	}
+#endif
+	POSTAMBLE
+}
+EXPORT_SYMBOL(__siphash_unaligned);
+#endif
+
+/**
+ * siphash_1u64 - compute 64-bit siphash PRF value of a u64
+ * @first: first u64
+ * @key: the siphash key
+ */
+u64 siphash_1u64(const u64 first, const siphash_key_t key)
+{
+	PREAMBLE(8)
+	v3 ^= first;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= first;
+	POSTAMBLE
+}
+EXPORT_SYMBOL(siphash_1u64);
+
+/**
+ * siphash_2u64 - compute 64-bit siphash PRF value of 2 u64
+ * @first: first u64
+ * @second: second u64
+ * @key: the siphash key
+ */
+u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t key)
+{
+	PREAMBLE(16)
+	v3 ^= first;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= first;
+	v3 ^= second;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= second;
+	POSTAMBLE
+}
+EXPORT_SYMBOL(siphash_2u64);
+
+/**
+ * siphash_3u64 - compute 64-bit siphash PRF value of 3 u64
+ * @first: first u64
+ * @second: second u64
+ * @third: third u64
+ * @key: the siphash key
+ */
+u64 siphash_3u64(const u64 first, const u64 second, const u64 third,
+		 const siphash_key_t key)
+{
+	PREAMBLE(24)
+	v3 ^= first;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= first;
+	v3 ^= second;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= second;
+	v3 ^= third;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= third;
+	POSTAMBLE
+}
+EXPORT_SYMBOL(siphash_3u64);
+
+/**
+ * siphash_4u64 - compute 64-bit siphash PRF value of 4 u64
+ * @first: first u64
+ * @second: second u64
+ * @third: third u64
+ * @forth: forth u64
+ * @key: the siphash key
+ */
+u64 siphash_4u64(const u64 first, const u64 second, const u64 third,
+		 const u64 forth, const siphash_key_t key)
+{
+	PREAMBLE(32)
+	v3 ^= first;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= first;
+	v3 ^= second;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= second;
+	v3 ^= third;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= third;
+	v3 ^= forth;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= forth;
+	POSTAMBLE
+}
+EXPORT_SYMBOL(siphash_4u64);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..906e58a2c946
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,101 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ *
+ * This implementation is specifically for SipHash2-4.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ *     https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+	0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+	0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+	0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+	0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+	0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+	0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+	0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+	0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+	0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+	0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+	0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+	0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+	0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+	0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+	0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+	0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+	0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+	0x958a324ceb064572ULL
+};
+static const siphash_key_t test_key =
+	{ 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64] __aligned(SIPHASH_ALIGNMENT);
+	u8 in_unaligned[65];
+	u8 i;
+	int ret = 0;
+
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		in_unaligned[i + 1] = i;
+		if (siphash(in, i, test_key) != test_vectors[i]) {
+			pr_info("self-test aligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+		if (siphash(in_unaligned + 1, i, test_key) != test_vectors[i]) {
+			pr_info("self-test unaligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (siphash_1u64(0x0706050403020100ULL, test_key) != test_vectors[8]) {
+		pr_info("self-test 1u64: FAIL\n");
+		ret = -EINVAL;
+	}
+	if (siphash_2u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL, test_key) != test_vectors[16]) {
+		pr_info("self-test 2u64: FAIL\n");
+		ret = -EINVAL;
+	}
+	if (siphash_3u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL,
+			 0x1716151413121110ULL, test_key) != test_vectors[24]) {
+		pr_info("self-test 3u64: FAIL\n");
+		ret = -EINVAL;
+	}
+	if (siphash_4u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL,
+			 0x1716151413121110ULL, 0x1f1e1d1c1b1a1918ULL, test_key) != test_vectors[32]) {
+		pr_info("self-test 4u64: FAIL\n");
+		ret = -EINVAL;
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH] net: ipv4: tcp_offload: check segs for NULL
From: Eric Dumazet @ 2016-12-16  3:28 UTC (permalink / raw)
  To: shakya.das; +Cc: davem, netdev, shakya89.das, vidushi.koul
In-Reply-To: <1481791661-21670-1-git-send-email-shakya.das@samsung.com>

On Thu, 2016-12-15 at 14:17 +0530, shakya.das@samsung.com wrote:
> From: Shakya Sundar Das <shakya.das@samsung.com>
> 
> This patch will check segs for being NULL in tcp_gso_segment()
> before calling skb_shinfo(segs) from skb_is_gso(segs), otherwise
> kernel can run into a NULL-pointer dereference.
> 
> Signed-off-by: Shakya Sundar Das <shakya.das@samsung.com>
> ---
>  net/ipv4/tcp_offload.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index bc68da3..93feefd 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -96,7 +96,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  	skb->ooo_okay = 0;
>  
>  	segs = skb_segment(skb, features);
> -	if (IS_ERR(segs))
> +	if (IS_ERR_OR_NULL(segs))
>  		goto out;
>  
>  	/* Only first segment might have ooo_okay set */

I do not see how this can ever happen.

How did you come up with this patch exactly ???

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-16  3:46 UTC (permalink / raw)
  To: ak, davem, David.Laight, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
	vegard.nossum
  Cc: djb
In-Reply-To: <CAGiyFdfmiCMyHvAg=5sGh8KjBBrF0Wb4Qf=JLzJqUAx4yFSS3Q@mail.gmail.com>

Jean-Philippe Aumasson wrote:
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

It would be fairly significant, a 2x speed benefit on a lot of 32-bit
machines.

First is the fact that a 64-bit SipHash round on a generic 32-bit machine
requires not twice as many instructions, but more than three.

Consider the core SipHash quarter-round operation:
	a += b;
	b = rotate_left(b, k)
	b ^= a

The add and xor are equivalent between 32- and 64-bit rounds; twice the
instructions do twice the work.  (There's a dependency via the carry
bit between the two halves of the add, but it ends up not being on the
critical path even in a superscalar implementation.)

The problem is the rotates.  Although some particularly nice code is
possible on 32-bit ARM due to its support for shift-and-xor operations,
on a generic 32-bit CPU the rotate grows to 6 instructions with a 2-cycle
dependency chain (more in practice because barrel shifters are large and
even quad-issue CPUs can't do 4 shifts per cycle):

	temp_lo = b_lo >> (32-k)
	temp_hi = b_hi >> (32-k)
	b_lo <<= k
	b_hi <<= k
	b_lo ^= temp_hi
	b_hi ^= temp_lo

The resultant instruction counts and (assuming wide issue)
latencies are:

	64-bit SipHash	"Half" SipHash
	Inst.	Latency	Inst.	Latency
	 10	 3	  3	 2	Quarter round
	 40	 6	 12	 4	Full round
	 80	12	 24	 8	Two rounds
	 82	13	 26	 9	Mix in one word
	 82	13	 52	18	Mix in 64 bits
	166	26	 61	18	Four round finalization + final XOR
	248	39	113	36	Hash 64 bits
	330	52	165	54	Hash 128 bits
	412	65	217	72	Hash 192 bits

While the ideal latencies are actually better for the 64-bit algorithm,
that requires an unrealistic 6+-wide superscalar implementation that's
more than twice as wide as the 64-bit code requires (which is already
optimized for quad-issue).  For a 1- or 2-wide processor, the instruction
counts dominate, and not only does the 64-bit algorithm take 60% more
time to mix in the same number of bytes, but the finalization rounds
bring the ratio to 2:1 for small inputs.

(And I haven't included the possible savings if the input size is an odd
number of 32-bit words, such as networking applications which include
the source/dest port numbers.)


Notes on particular processors:
- x86 can do a 64-bit rotate in 3 instructions and 2 cycles using
  the SHLD/SHRD instructions instead:
	movl	%b_hi, %temp
	shldl	$k, %b_lo, %b_hi
	shldl	$k, %temp, %b_lo
  ... but as I mentioned the problem is registers.  SipHash needs 8 32-bit
  words plus at least one temporary, and 32-bit x86 has only 7 available.
  (And compilers can rarely manage to keep more than 6 of them busy.)
- 64-bit SipHash is particularly efficient on 32-bit ARM due to its
  support for shift-and-op instructions.  The 64-bit shift and following
  xor can be done in 4 instructions.  So the only benefit is from the
  reduced finalization.
- Double-width adds cost a little more on CPUs like MIPS and RISC-V without
  condition codes.
- Certain particularly crappy uClinux processors with slow shifts
  (68000, anyone?) really suffer from extra shifts.

One *weakly* requested feature: It might simplify some programming
interfaces if we could use the same key for multiple hash tables with a
1-word "tweak" (e.g. pointer to the hash table, so it could be assumed
non-zero if that helped) to make distinct functions.  That would let us
more safely use a global key for multiple small hash tables without the
need to add code to generate and store key material for each place that
an unkeyed hash is replaced.

^ permalink raw reply

* [PATCH v3 net] r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
From: Manuel Bessler @ 2016-12-16  3:55 UTC (permalink / raw)
  To: netdev; +Cc: f.fainelli, Manuel Bessler
In-Reply-To: <1481826099-12840-1-git-send-email-manuel.bessler@sensus.com>

'ifconfig eth0 down' makes r6040_close() trigger:
 INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

Fixed by moving calls to phy_stop(), napi_disable(), netif_stop_queue()
to outside of the module's private spin_lock_irq block.

Found on a Versalogic Tomcat SBC with a Vortex86 SoC

s1660e_5150:~# sudo ifconfig eth0 down
[   61.306415] ======================================================
[   61.306415] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[   61.306415] 4.9.0-gb898d2d-manuel #1 Not tainted
[   61.306415] ------------------------------------------------------
[   61.306415] ifconfig/449 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[   61.306415]  (&dev->lock){+.+...}, at: [<c1336276>] phy_stop+0x16/0x80

[   61.306415] and this task is already holding:
[   61.306415]  (&(&lp->lock)->rlock){+.-...}, at: [<d0934c84>] r6040_close+0x24/0x230 [r6040]
which would create a new lock dependency:
[   61.306415]  (&(&lp->lock)->rlock){+.-...} -> (&dev->lock){+.+...}

[   61.306415] but this new dependency connects a SOFTIRQ-irq-safe lock:
[   61.306415]  (&(&lp->lock)->rlock){+.-...}
[   61.306415] ... which became SOFTIRQ-irq-safe at:
[   61.306415]   [   61.306415] [<c1075bc5>] __lock_acquire+0x555/0x1770
[   61.306415]   [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]   [   61.306415] [<c14bb334>] _raw_spin_lock_irqsave+0x24/0x40
[   61.306415]   [   61.306415] [<d0934ac0>] r6040_start_xmit+0x30/0x1d0 [r6040]
[   61.306415]   [   61.306415] [<c13a7d4d>] dev_hard_start_xmit+0x9d/0x2d0
[   61.306415]   [   61.306415] [<c13c8a38>] sch_direct_xmit+0xa8/0x140
[   61.306415]   [   61.306415] [<c13a8436>] __dev_queue_xmit+0x416/0x780
[   61.306415]   [   61.306415] [<c13a87aa>] dev_queue_xmit+0xa/0x10
[   61.306415]   [   61.306415] [<c13b4837>] neigh_resolve_output+0x147/0x220
[   61.306415]   [   61.306415] [<c144541b>] ip6_finish_output2+0x2fb/0x910
[   61.306415]   [   61.306415] [<c14494e6>] ip6_finish_output+0xa6/0x1a0
[   61.306415]   [   61.306415] [<c1449635>] ip6_output+0x55/0x320
[   61.306415]   [   61.306415] [<c146f4d2>] mld_sendpack+0x352/0x560
[   61.306415]   [   61.306415] [<c146fe55>] mld_ifc_timer_expire+0x155/0x280
[   61.306415]   [   61.306415] [<c108b081>] call_timer_fn+0x81/0x270
[   61.306415]   [   61.306415] [<c108b331>] expire_timers+0xc1/0x180
[   61.306415]   [   61.306415] [<c108b4f7>] run_timer_softirq+0x77/0x150
[   61.306415]   [   61.306415] [<c1043d04>] __do_softirq+0xb4/0x3d0
[   61.306415]   [   61.306415] [<c101a15c>] do_softirq_own_stack+0x1c/0x30
[   61.306415]   [   61.306415] [<c104416e>] irq_exit+0x8e/0xa0
[   61.306415]   [   61.306415] [<c1019d31>] do_IRQ+0x51/0x100
[   61.306415]   [   61.306415] [<c14bc176>] common_interrupt+0x36/0x40
[   61.306415]   [   61.306415] [<c1134928>] set_root+0x68/0xf0
[   61.306415]   [   61.306415] [<c1136120>] path_init+0x400/0x640
[   61.306415]   [   61.306415] [<c11386bf>] path_lookupat+0xf/0xe0
[   61.306415]   [   61.306415] [<c1139ebc>] filename_lookup+0x6c/0x100
[   61.306415]   [   61.306415] [<c1139fd5>] user_path_at_empty+0x25/0x30
[   61.306415]   [   61.306415] [<c11298c6>] SyS_faccessat+0x86/0x1e0
[   61.306415]   [   61.306415] [<c1129a30>] SyS_access+0x10/0x20
[   61.306415]   [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]   [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]
[   61.306415] to a SOFTIRQ-irq-unsafe lock:
[   61.306415]  (&dev->lock){+.+...}
[   61.306415] ... which became SOFTIRQ-irq-unsafe at:
[   61.306415] ...[   61.306415]
[   61.306415] [<c1075c0c>] __lock_acquire+0x59c/0x1770
[   61.306415]   [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]   [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]   [   61.306415] [<c133747d>] phy_probe+0x4d/0xc0
[   61.306415]   [   61.306415] [<c1338afe>] phy_attach_direct+0xbe/0x190
[   61.306415]   [   61.306415] [<c1338ca7>] phy_connect_direct+0x17/0x60
[   61.306415]   [   61.306415] [<c1338d23>] phy_connect+0x33/0x70
[   61.306415]   [   61.306415] [<d09357a0>] r6040_init_one+0x3a0/0x500 [r6040]
[   61.306415]   [   61.306415] [<c12a78c7>] pci_device_probe+0x77/0xd0
[   61.306415]   [   61.306415] [<c12f5e15>] driver_probe_device+0x145/0x280
[   61.306415]   [   61.306415] [<c12f5fd9>] __driver_attach+0x89/0x90
[   61.306415]   [   61.306415] [<c12f43ef>] bus_for_each_dev+0x4f/0x80
[   61.306415]   [   61.306415] [<c12f5954>] driver_attach+0x14/0x20
[   61.306415]   [   61.306415] [<c12f55b7>] bus_add_driver+0x197/0x210
[   61.306415]   [   61.306415] [<c12f6a21>] driver_register+0x51/0xd0
[   61.306415]   [   61.306415] [<c12a6955>] __pci_register_driver+0x45/0x50
[   61.306415]   [   61.306415] [<d0938017>] 0xd0938017
[   61.306415]   [   61.306415] [<c100043f>] do_one_initcall+0x2f/0x140
[   61.306415]   [   61.306415] [<c10e48c0>] do_init_module+0x4a/0x19b
[   61.306415]   [   61.306415] [<c10a680e>] load_module+0x1b2e/0x2070
[   61.306415]   [   61.306415] [<c10a6eb9>] SyS_finit_module+0x69/0x80
[   61.306415]   [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]   [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]
[   61.306415] other info that might help us debug this:
[   61.306415]
[   61.306415]  Possible interrupt unsafe locking scenario:
[   61.306415]
[   61.306415]        CPU0                    CPU1
[   61.306415]        ----                    ----
[   61.306415]   lock(&dev->lock);
[   61.306415]                                local_irq_disable();
[   61.306415]                                lock(&(&lp->lock)->rlock);
[   61.306415]                                lock(&dev->lock);
[   61.306415]   <Interrupt>
[   61.306415]     lock(&(&lp->lock)->rlock);
[   61.306415]
[   61.306415]  *** DEADLOCK ***
[   61.306415]
[   61.306415] 2 locks held by ifconfig/449:
[   61.306415]  #0:  (rtnl_mutex){+.+.+.}, at: [<c13b68ef>] rtnl_lock+0xf/0x20
[   61.306415]  #1:  (&(&lp->lock)->rlock){+.-...}, at: [<d0934c84>] r6040_close+0x24/0x230 [r6040]
[   61.306415]
[   61.306415] the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[   61.306415] -> (&(&lp->lock)->rlock){+.-...} ops: 3049 {
[   61.306415]    HARDIRQ-ON-W at:
[   61.306415]                     [   61.306415] [<c1075be7>] __lock_acquire+0x577/0x1770
[   61.306415]                     [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                     [   61.306415] [<c14bb21b>] _raw_spin_lock+0x1b/0x30
[   61.306415]                     [   61.306415] [<d09343cc>] r6040_poll+0x2c/0x330 [r6040]
[   61.306415]                     [   61.306415] [<c13a5577>] net_rx_action+0x197/0x340
[   61.306415]                     [   61.306415] [<c1043d04>] __do_softirq+0xb4/0x3d0
[   61.306415]                     [   61.306415] [<c1044037>] run_ksoftirqd+0x17/0x40
[   61.306415]                     [   61.306415] [<c105fe91>] smpboot_thread_fn+0x141/0x180
[   61.306415]                     [   61.306415] [<c105c84e>] kthread+0xde/0x110
[   61.306415]                     [   61.306415] [<c14bb949>] ret_from_fork+0x19/0x30
[   61.306415]    IN-SOFTIRQ-W at:
[   61.306415]                     [   61.306415] [<c1075bc5>] __lock_acquire+0x555/0x1770
[   61.306415]                     [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                     [   61.306415] [<c14bb334>] _raw_spin_lock_irqsave+0x24/0x40
[   61.306415]                     [   61.306415] [<d0934ac0>] r6040_start_xmit+0x30/0x1d0 [r6040]
[   61.306415]                     [   61.306415] [<c13a7d4d>] dev_hard_start_xmit+0x9d/0x2d0
[   61.306415]                     [   61.306415] [<c13c8a38>] sch_direct_xmit+0xa8/0x140
[   61.306415]                     [   61.306415] [<c13a8436>] __dev_queue_xmit+0x416/0x780
[   61.306415]                     [   61.306415] [<c13a87aa>] dev_queue_xmit+0xa/0x10
[   61.306415]                     [   61.306415] [<c13b4837>] neigh_resolve_output+0x147/0x220
[   61.306415]                     [   61.306415] [<c144541b>] ip6_finish_output2+0x2fb/0x910
[   61.306415]                     [   61.306415] [<c14494e6>] ip6_finish_output+0xa6/0x1a0
[   61.306415]                     [   61.306415] [<c1449635>] ip6_output+0x55/0x320
[   61.306415]                     [   61.306415] [<c146f4d2>] mld_sendpack+0x352/0x560
[   61.306415]                     [   61.306415] [<c146fe55>] mld_ifc_timer_expire+0x155/0x280
[   61.306415]                     [   61.306415] [<c108b081>] call_timer_fn+0x81/0x270
[   61.306415]                     [   61.306415] [<c108b331>] expire_timers+0xc1/0x180
[   61.306415]                     [   61.306415] [<c108b4f7>] run_timer_softirq+0x77/0x150
[   61.306415]                     [   61.306415] [<c1043d04>] __do_softirq+0xb4/0x3d0
[   61.306415]                     [   61.306415] [<c101a15c>] do_softirq_own_stack+0x1c/0x30
[   61.306415]                     [   61.306415] [<c104416e>] irq_exit+0x8e/0xa0
[   61.306415]                     [   61.306415] [<c1019d31>] do_IRQ+0x51/0x100
[   61.306415]                     [   61.306415] [<c14bc176>] common_interrupt+0x36/0x40
[   61.306415]                     [   61.306415] [<c1134928>] set_root+0x68/0xf0
[   61.306415]                     [   61.306415] [<c1136120>] path_init+0x400/0x640
[   61.306415]                     [   61.306415] [<c11386bf>] path_lookupat+0xf/0xe0
[   61.306415]                     [   61.306415] [<c1139ebc>] filename_lookup+0x6c/0x100
[   61.306415]                     [   61.306415] [<c1139fd5>] user_path_at_empty+0x25/0x30
[   61.306415]                     [   61.306415] [<c11298c6>] SyS_faccessat+0x86/0x1e0
[   61.306415]                     [   61.306415] [<c1129a30>] SyS_access+0x10/0x20
[   61.306415]                     [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]                     [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]    INITIAL USE at:
[   61.306415]                    [   61.306415] [<c107586e>] __lock_acquire+0x1fe/0x1770
[   61.306415]                    [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                    [   61.306415] [<c14bb334>] _raw_spin_lock_irqsave+0x24/0x40
[   61.306415]                    [   61.306415] [<d093474e>] r6040_get_stats+0x1e/0x60 [r6040]
[   61.306415]                    [   61.306415] [<c139fb16>] dev_get_stats+0x96/0xc0
[   61.306415]                    [   61.306415] [<c14b416e>] rtnl_fill_stats+0x36/0xfd
[   61.306415]                    [   61.306415] [<c13b7b3c>] rtnl_fill_ifinfo+0x47c/0xce0
[   61.306415]                    [   61.306415] [<c13bc08e>] rtmsg_ifinfo_build_skb+0x4e/0xd0
[   61.306415]                    [   61.306415] [<c13bc120>] rtmsg_ifinfo.part.20+0x10/0x40
[   61.306415]                    [   61.306415] [<c13bc16b>] rtmsg_ifinfo+0x1b/0x20
[   61.306415]                    [   61.306415] [<c13a9d19>] register_netdevice+0x409/0x550
[   61.306415]                    [   61.306415] [<c13a9e72>] register_netdev+0x12/0x20
[   61.306415]                    [   61.306415] [<d09357e8>] r6040_init_one+0x3e8/0x500 [r6040]
[   61.306415]                    [   61.306415] [<c12a78c7>] pci_device_probe+0x77/0xd0
[   61.306415]                    [   61.306415] [<c12f5e15>] driver_probe_device+0x145/0x280
[   61.306415]                    [   61.306415] [<c12f5fd9>] __driver_attach+0x89/0x90
[   61.306415]                    [   61.306415] [<c12f43ef>] bus_for_each_dev+0x4f/0x80
[   61.306415]                    [   61.306415] [<c12f5954>] driver_attach+0x14/0x20
[   61.306415]                    [   61.306415] [<c12f55b7>] bus_add_driver+0x197/0x210
[   61.306415]                    [   61.306415] [<c12f6a21>] driver_register+0x51/0xd0
[   61.306415]                    [   61.306415] [<c12a6955>] __pci_register_driver+0x45/0x50
[   61.306415]                    [   61.306415] [<d0938017>] 0xd0938017
[   61.306415]                    [   61.306415] [<c100043f>] do_one_initcall+0x2f/0x140
[   61.306415]                    [   61.306415] [<c10e48c0>] do_init_module+0x4a/0x19b
[   61.306415]                    [   61.306415] [<c10a680e>] load_module+0x1b2e/0x2070
[   61.306415]                    [   61.306415] [<c10a6eb9>] SyS_finit_module+0x69/0x80
[   61.306415]                    [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]                    [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]  }
[   61.306415]  ... key      at: [<d0936280>] __key.45893+0x0/0xfffff739 [r6040]
[   61.306415]  ... acquired at:
[   61.306415]    [   61.306415] [<c1074a32>] check_irq_usage+0x42/0xb0
[   61.306415]    [   61.306415] [<c107677c>] __lock_acquire+0x110c/0x1770
[   61.306415]    [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]    [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]    [   61.306415] [<c1336276>] phy_stop+0x16/0x80
[   61.306415]    [   61.306415] [<d0934ce9>] r6040_close+0x89/0x230 [r6040]
[   61.306415]    [   61.306415] [<c13a0a91>] __dev_close_many+0x61/0xa0
[   61.306415]    [   61.306415] [<c13a0bbf>] __dev_close+0x1f/0x30
[   61.306415]    [   61.306415] [<c13a9127>] __dev_change_flags+0x87/0x150
[   61.306415]    [   61.306415] [<c13a9213>] dev_change_flags+0x23/0x60
[   61.306415]    [   61.306415] [<c1416238>] devinet_ioctl+0x5f8/0x6f0
[   61.306415]    [   61.306415] [<c1417f75>] inet_ioctl+0x65/0x90
[   61.306415]    [   61.306415] [<c1389b54>] sock_ioctl+0x124/0x2b0
[   61.306415]    [   61.306415] [<c113cf7c>] do_vfs_ioctl+0x7c/0x790
[   61.306415]    [   61.306415] [<c113d6b8>] SyS_ioctl+0x28/0x50
[   61.306415]    [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]    [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]
[   61.306415]
the dependencies between the lock to be acquired[   61.306415]  and SOFTIRQ-irq-unsafe lock:
[   61.306415] -> (&dev->lock){+.+...} ops: 56 {
[   61.306415]    HARDIRQ-ON-W at:
[   61.306415]                     [   61.306415] [<c1075be7>] __lock_acquire+0x577/0x1770
[   61.306415]                     [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                     [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]                     [   61.306415] [<c133747d>] phy_probe+0x4d/0xc0
[   61.306415]                     [   61.306415] [<c1338afe>] phy_attach_direct+0xbe/0x190
[   61.306415]                     [   61.306415] [<c1338ca7>] phy_connect_direct+0x17/0x60
[   61.306415]                     [   61.306415] [<c1338d23>] phy_connect+0x33/0x70
[   61.306415]                     [   61.306415] [<d09357a0>] r6040_init_one+0x3a0/0x500 [r6040]
[   61.306415]                     [   61.306415] [<c12a78c7>] pci_device_probe+0x77/0xd0
[   61.306415]                     [   61.306415] [<c12f5e15>] driver_probe_device+0x145/0x280
[   61.306415]                     [   61.306415] [<c12f5fd9>] __driver_attach+0x89/0x90
[   61.306415]                     [   61.306415] [<c12f43ef>] bus_for_each_dev+0x4f/0x80
[   61.306415]                     [   61.306415] [<c12f5954>] driver_attach+0x14/0x20
[   61.306415]                     [   61.306415] [<c12f55b7>] bus_add_driver+0x197/0x210
[   61.306415]                     [   61.306415] [<c12f6a21>] driver_register+0x51/0xd0
[   61.306415]                     [   61.306415] [<c12a6955>] __pci_register_driver+0x45/0x50
[   61.306415]                     [   61.306415] [<d0938017>] 0xd0938017
[   61.306415]                     [   61.306415] [<c100043f>] do_one_initcall+0x2f/0x140
[   61.306415]                     [   61.306415] [<c10e48c0>] do_init_module+0x4a/0x19b
[   61.306415]                     [   61.306415] [<c10a680e>] load_module+0x1b2e/0x2070
[   61.306415]                     [   61.306415] [<c10a6eb9>] SyS_finit_module+0x69/0x80
[   61.306415]                     [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]                     [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]    SOFTIRQ-ON-W at:
[   61.306415]                     [   61.306415] [<c1075c0c>] __lock_acquire+0x59c/0x1770
[   61.306415]                     [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                     [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]                     [   61.306415] [<c133747d>] phy_probe+0x4d/0xc0
[   61.306415]                     [   61.306415] [<c1338afe>] phy_attach_direct+0xbe/0x190
[   61.306415]                     [   61.306415] [<c1338ca7>] phy_connect_direct+0x17/0x60
[   61.306415]                     [   61.306415] [<c1338d23>] phy_connect+0x33/0x70
[   61.306415]                     [   61.306415] [<d09357a0>] r6040_init_one+0x3a0/0x500 [r6040]
[   61.306415]                     [   61.306415] [<c12a78c7>] pci_device_probe+0x77/0xd0
[   61.306415]                     [   61.306415] [<c12f5e15>] driver_probe_device+0x145/0x280
[   61.306415]                     [   61.306415] [<c12f5fd9>] __driver_attach+0x89/0x90
[   61.306415]                     [   61.306415] [<c12f43ef>] bus_for_each_dev+0x4f/0x80
[   61.306415]                     [   61.306415] [<c12f5954>] driver_attach+0x14/0x20
[   61.306415]                     [   61.306415] [<c12f55b7>] bus_add_driver+0x197/0x210
[   61.306415]                     [   61.306415] [<c12f6a21>] driver_register+0x51/0xd0
[   61.306415]                     [   61.306415] [<c12a6955>] __pci_register_driver+0x45/0x50
[   61.306415]                     [   61.306415] [<d0938017>] 0xd0938017
[   61.306415]                     [   61.306415] [<c100043f>] do_one_initcall+0x2f/0x140
[   61.306415]                     [   61.306415] [<c10e48c0>] do_init_module+0x4a/0x19b
[   61.306415]                     [   61.306415] [<c10a680e>] load_module+0x1b2e/0x2070
[   61.306415]                     [   61.306415] [<c10a6eb9>] SyS_finit_module+0x69/0x80
[   61.306415]                     [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]                     [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]    INITIAL USE at:
[   61.306415]                    [   61.306415] [<c107586e>] __lock_acquire+0x1fe/0x1770
[   61.306415]                    [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]                    [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]                    [   61.306415] [<c133747d>] phy_probe+0x4d/0xc0
[   61.306415]                    [   61.306415] [<c1338afe>] phy_attach_direct+0xbe/0x190
[   61.306415]                    [   61.306415] [<c1338ca7>] phy_connect_direct+0x17/0x60
[   61.306415]                    [   61.306415] [<c1338d23>] phy_connect+0x33/0x70
[   61.306415]                    [   61.306415] [<d09357a0>] r6040_init_one+0x3a0/0x500 [r6040]
[   61.306415]                    [   61.306415] [<c12a78c7>] pci_device_probe+0x77/0xd0
[   61.306415]                    [   61.306415] [<c12f5e15>] driver_probe_device+0x145/0x280
[   61.306415]                    [   61.306415] [<c12f5fd9>] __driver_attach+0x89/0x90
[   61.306415]                    [   61.306415] [<c12f43ef>] bus_for_each_dev+0x4f/0x80
[   61.306415]                    [   61.306415] [<c12f5954>] driver_attach+0x14/0x20
[   61.306415]                    [   61.306415] [<c12f55b7>] bus_add_driver+0x197/0x210
[   61.306415]                    [   61.306415] [<c12f6a21>] driver_register+0x51/0xd0
[   61.306415]                    [   61.306415] [<c12a6955>] __pci_register_driver+0x45/0x50
[   61.306415]                    [   61.306415] [<d0938017>] 0xd0938017
[   61.306415]                    [   61.306415] [<c100043f>] do_one_initcall+0x2f/0x140
[   61.306415]                    [   61.306415] [<c10e48c0>] do_init_module+0x4a/0x19b
[   61.306415]                    [   61.306415] [<c10a680e>] load_module+0x1b2e/0x2070
[   61.306415]                    [   61.306415] [<c10a6eb9>] SyS_finit_module+0x69/0x80
[   61.306415]                    [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]                    [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]  }
[   61.306415]  ... key      at: [<c1f28f39>] __key.43998+0x0/0x8
[   61.306415]  ... acquired at:
[   61.306415]    [   61.306415] [<c1074a32>] check_irq_usage+0x42/0xb0
[   61.306415]    [   61.306415] [<c107677c>] __lock_acquire+0x110c/0x1770
[   61.306415]    [   61.306415] [<c107717c>] lock_acquire+0x7c/0x150
[   61.306415]    [   61.306415] [<c14b7add>] mutex_lock_nested+0x2d/0x4a0
[   61.306415]    [   61.306415] [<c1336276>] phy_stop+0x16/0x80
[   61.306415]    [   61.306415] [<d0934ce9>] r6040_close+0x89/0x230 [r6040]
[   61.306415]    [   61.306415] [<c13a0a91>] __dev_close_many+0x61/0xa0
[   61.306415]    [   61.306415] [<c13a0bbf>] __dev_close+0x1f/0x30
[   61.306415]    [   61.306415] [<c13a9127>] __dev_change_flags+0x87/0x150
[   61.306415]    [   61.306415] [<c13a9213>] dev_change_flags+0x23/0x60
[   61.306415]    [   61.306415] [<c1416238>] devinet_ioctl+0x5f8/0x6f0
[   61.306415]    [   61.306415] [<c1417f75>] inet_ioctl+0x65/0x90
[   61.306415]    [   61.306415] [<c1389b54>] sock_ioctl+0x124/0x2b0
[   61.306415]    [   61.306415] [<c113cf7c>] do_vfs_ioctl+0x7c/0x790
[   61.306415]    [   61.306415] [<c113d6b8>] SyS_ioctl+0x28/0x50
[   61.306415]    [   61.306415] [<c100179f>] do_int80_syscall_32+0x3f/0x110
[   61.306415]    [   61.306415] [<c14bba3f>] restore_all+0x0/0x61
[   61.306415]
[   61.306415]
[   61.306415] stack backtrace:
[   61.306415] CPU: 0 PID: 449 Comm: ifconfig Not tainted 4.9.0-gb898d2d-manuel #1
[   61.306415] Call Trace:
[   61.306415]  dump_stack+0x16/0x19
[   61.306415]  check_usage+0x3f6/0x550
[   61.306415]  ? check_usage+0x4d/0x550
[   61.306415]  check_irq_usage+0x42/0xb0
[   61.306415]  __lock_acquire+0x110c/0x1770
[   61.306415]  lock_acquire+0x7c/0x150
[   61.306415]  ? phy_stop+0x16/0x80
[   61.306415]  mutex_lock_nested+0x2d/0x4a0
[   61.306415]  ? phy_stop+0x16/0x80
[   61.306415]  ? r6040_close+0x24/0x230 [r6040]
[   61.306415]  ? __delay+0x9/0x10
[   61.306415]  phy_stop+0x16/0x80
[   61.306415]  r6040_close+0x89/0x230 [r6040]
[   61.306415]  __dev_close_many+0x61/0xa0
[   61.306415]  __dev_close+0x1f/0x30
[   61.306415]  __dev_change_flags+0x87/0x150
[   61.306415]  dev_change_flags+0x23/0x60
[   61.306415]  devinet_ioctl+0x5f8/0x6f0
[   61.306415]  inet_ioctl+0x65/0x90
[   61.306415]  sock_ioctl+0x124/0x2b0
[   61.306415]  ? dlci_ioctl_set+0x30/0x30
[   61.306415]  do_vfs_ioctl+0x7c/0x790
[   61.306415]  ? trace_hardirqs_on+0xb/0x10
[   61.306415]  ? call_rcu_sched+0xd/0x10
[   61.306415]  ? __put_cred+0x32/0x50
[   61.306415]  ? SyS_faccessat+0x178/0x1e0
[   61.306415]  SyS_ioctl+0x28/0x50
[   61.306415]  do_int80_syscall_32+0x3f/0x110
[   61.306415]  entry_INT80_32+0x2f/0x2f
[   61.306415] EIP: 0xb764d364
[   61.306415] EFLAGS: 00000286 CPU: 0
[   61.306415] EAX: ffffffda EBX: 00000004 ECX: 00008914 EDX: bfa99d7c
[   61.306415] ESI: bfa99e4c EDI: fffffffe EBP: 00000004 ESP: bfa99d58
[   61.306415]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[   63.836607] r6040 0000:00:08.0 eth0: Link is Down

Signed-off-by: Manuel Bessler <manuel.bessler@sensus.com>
---
 drivers/net/ethernet/rdc/r6040.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c
index 4ff4e04..aa11b70 100644
--- a/drivers/net/ethernet/rdc/r6040.c
+++ b/drivers/net/ethernet/rdc/r6040.c
@@ -472,8 +472,6 @@ static void r6040_down(struct net_device *dev)
 	iowrite16(adrp[0], ioaddr + MID_0L);
 	iowrite16(adrp[1], ioaddr + MID_0M);
 	iowrite16(adrp[2], ioaddr + MID_0H);
-
-	phy_stop(dev->phydev);
 }
 
 static int r6040_close(struct net_device *dev)
@@ -481,12 +479,12 @@ static int r6040_close(struct net_device *dev)
 	struct r6040_private *lp = netdev_priv(dev);
 	struct pci_dev *pdev = lp->pdev;
 
-	spin_lock_irq(&lp->lock);
+	phy_stop(dev->phydev);
 	napi_disable(&lp->napi);
 	netif_stop_queue(dev);
-	r6040_down(dev);
 
-	free_irq(dev->irq, dev);
+	spin_lock_irq(&lp->lock);
+	r6040_down(dev);
 
 	/* Free RX buffer */
 	r6040_free_rxbufs(dev);
@@ -496,6 +494,8 @@ static int r6040_close(struct net_device *dev)
 
 	spin_unlock_irq(&lp->lock);
 
+	free_irq(dev->irq, dev);
+
 	/* Free Descriptor memory */
 	if (lp->rx_ring) {
 		pci_free_consistent(pdev,
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: Vishwanathapura, Niranjana @ 2016-12-16  4:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w,
	ira.weiny-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161215165611.GB3264-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Dec 15, 2016 at 09:56:11AM -0700, Jason Gunthorpe wrote:
>On Wed, Dec 14, 2016 at 11:59:32PM -0800, Vishwanathapura, Niranjana wrote:
>>  create mode 100644 drivers/infiniband/sw/intel/hfi_vnic/Kconfig
>>  create mode 100644 drivers/infiniband/sw/intel/hfi_vnic/Makefile
>
>Stil NAK on these paths, I already explained why 'sw' is totally
>unsuitable. Put it in drivers/net or drivers/infiniband/ulp
>

I understand. I did not want to change dirver location until we concenses
on where it belongs.
In next revision, I will move it under drivers/infiniband/ulp/hfi_vnic.
If anybody thinks it should be in a different folder, let me know.

Niranjana

>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
From: Jason Gunthorpe @ 2016-12-16  4:17 UTC (permalink / raw)
  To: ira.weiny
  Cc: Doug Ledford, Leon Romanovsky, Jeff Kirsher, David S. Miller,
	Vishwanathapura, Niranjana, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20161216012404.GD3785-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>

On Thu, Dec 15, 2016 at 08:24:05PM -0500, ira.weiny wrote:
> > The main share is the 'skb send' part, we've talked about hoisting
> > that out of ipoib in the past anyhow. A generic verb along those lines
> > would probably allow the sdma optimization for hfi for both this new
> > ulp and ipoib without creating such an ugly HFI1 specific interface.
> 
> I'm not sure what you mean about "skb send" being used by ipoib.  Right now
> IPoIB already supplies a "generic skb send" for _Verbs_ in ipoib_send.

Sending a skb is very hard, the boring standard verbs implementation
is slow. Mellanox extended that with some simple offloads, but it is not
enough to get the full performance out of the hardware.

The suggestion is to add a 'skb qp', for lack of a better name, that is
perfectly optimized to delegate working with skbs to the driver. The
driver will then optmize with all possible offloads and bypass the
verbs qp API between netdev and driver.

Look at what is already in patch #2:

+struct hfi_vnic_ops {
+	int (*open)(struct hfi_vnic_port *vport, hfi_vnic_evt_cb_fn cb);
+	void (*close)(struct hfi_vnic_port *vport);
+	int (*put_skb)(struct hfi_vnic_port *vport,
+	           u8 q_idx, struct sk_buff *skb);
+       struct sk_buff *(*get_skb)(struct hfi_vnic_port *vport, u8 q_idx);
+	      u16 (*get_read_avail)(struct hfi_vnic_port *vport, u8 q_idx);
+	bool (*get_write_avail)(struct hfi_vnic_port *vport, u8 q_idx);
+	u8 (*select_queue)(struct hfi_vnic_port *vport, u8 vl, u8 entropy);
+	void (*config_notify)(struct hfi_vnic_port *vport,
+	     			           u8 evt, bool enable);

That is almost what I'm talking about.

A 'vport' is a 'skb qp' that has been made overly specific.

So, clean it up to get rid of all the hfi specific stuff, stop calling
it a port. Get feedback from Mellanox. Refactor ipoib to use it to
show that it works sanely with both drivers.

> I don't know what other devices would do to implement ipoib_send?  To me, it
> seems like the abstraction for IPoIB is at the proper layer now.

An example is multi queue tx with QPN sharing. We don't have anything
like that in verbs. At some point it just doesn't make sense to twist
verbs into knots to do this stuff - a skb qp is much cleaner and
powerful.

> For OPA, te hfi driver supports both IPoIB and VNIC.  So expecting IPoIB and
> VNIC to use a generic "skb send" in ib_device is going to make hfi1 do a lot of
> work to determine which ULP is calling it or make the interface kind
> of ugly.

I don' think it will be that bad at all, the tx path is actually the
same execpt for the header construction. Handling that should not be
ugly, IMHO.

Think of it this way, if you do this you can probably boost the ipoib
performance on hfi as well.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC v2 03/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) netdev
From: Jason Gunthorpe @ 2016-12-16  4:24 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: dledford, linux-rdma, netdev, dennis.dalessandro, ira.weiny,
	Sadanand Warrier, Sudeep Dutt, Tanya K Jajodia,
	Andrzej Kacprowski
In-Reply-To: <20161216025947.GC90951@knc-06.sc.intel.com>

On Thu, Dec 15, 2016 at 06:59:47PM -0800, Vishwanathapura, Niranjana wrote:
> We have made the hfi_vnic driver dependent on CONFIG_X86_64.

Er, don't do that either?

> >>+struct __hfi_vesw_info {
> >>+	u16  fabric_id;
> >>+	u16  vesw_id;
> >>+
> >>+	u8   rsvd0[6];
> >>+	u16  def_port_mask;
> >>+
> >>+	u8   rsvd1[2];
> >>+	u16  pkey;
> >>+
> >>+	u8   rsvd2[4];
> >>+	u32  u_mcast_dlid;
> >>+	u32  u_ucast_dlid[HFI_VESW_MAX_NUM_DEF_PORT];
> >>+
> >>+	u8   rsvd3[44];
> >>+	u16  eth_mtu[HFI_VNIC_MAX_NUM_PCP];
> >>+	u16  eth_mtu_non_vlan;
> >>+	u8   rsvd4[2];
> >>+} __packed;
> >
> >This goes on the network too? Also looks like it has endian problems.
> >
> >Ditto for all the __packed structures.
> >
> 
> This is in CPU format. There is a separate big endian version of
> this

Why are CPU handled structures packed and full of reserved fields?
Don't pack them if they are not pushed out to the network..

There were lots of __packed structures, any that go on the network
need be/le annoations.

Jason

^ permalink raw reply

* Re: [PATCH] net: wan: Use dma_pool_zalloc
From: Joe Perches @ 2016-12-16  6:10 UTC (permalink / raw)
  To: Souptick Joarder; +Cc: Krzysztof Hałasa, netdev, Rameshwar Sahu
In-Reply-To: <CAFqt6zaPWuc+gZTMqk-gh8weZvAjvVwm0kyDT5ub=iah=y9skA@mail.gmail.com>

On Fri, 2016-12-16 at 11:33 +0530, Souptick Joarder wrote:
> On Thu, Dec 15, 2016 at 10:18 PM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2016-12-15 at 10:41 +0530, Souptick Joarder wrote:
> > > On Mon, Dec 12, 2016 at 10:12 AM, Souptick Joarder <jrdr.linux@gmail.com> wrote:
> > > > On Fri, Dec 9, 2016 at 6:33 PM, Krzysztof Hałasa <khalasa@piap.pl> wrote:
> > > > > Souptick Joarder <jrdr.linux@gmail.com> writes:
> > > > > 
> > > > > > We should use dma_pool_zalloc instead of dma_pool_alloc/memset
> > 
> > []
> > > > > > diff --git a/drivers/net/wan/ixp4xx_hss.c b/drivers/net/wan/ixp4xx_hss.c
> > 
> > []
> > > > > > @@ -976,10 +976,9 @@ static int init_hdlc_queues(struct port *port)
> > > > > >                       return -ENOMEM;
> > > > > >       }
> > > > > > 
> > > > > > -     if (!(port->desc_tab = dma_pool_alloc(dma_pool, GFP_KERNEL,
> > > > > > -                                           &port->desc_tab_phys)))
> > > > > > +     if (!(port->desc_tab = dma_pool_zalloc(dma_pool, GFP_KERNEL,
> > > > > > +                                            &port->desc_tab_phys)))
> > > > > >               return -ENOMEM;
> > > > > > -     memset(port->desc_tab, 0, POOL_ALLOC_SIZE);
> > > > > >       memset(port->rx_buff_tab, 0, sizeof(port->rx_buff_tab)); /* tables */
> > > > > >       memset(port->tx_buff_tab, 0, sizeof(port->tx_buff_tab));
> > > > > 
> > > > > This look fine, feel free to send it to the netdev mailing list for
> > > > > inclusion.
> > > > 
> > > > Including netdev mailing list based as requested.
> > > > > Acked-by: Krzysztof Halasa <khalasa@piap.pl>
> > 
> > []
> > > Any comment on this patch ?
> > 
> > Shouldn't the one in drivers/net/ethernet/xscale/ixp4xx_eth.c
> > also be changed?
> 
> Yes, you are right.   Do you want me to include it in same patch?

Your choice.  I would use a single patch.

^ permalink raw reply

* Re: [PATCH] net: wan: Use dma_pool_zalloc
From: Souptick Joarder @ 2016-12-16  6:03 UTC (permalink / raw)
  To: Joe Perches; +Cc: Krzysztof Hałasa, netdev, Rameshwar Sahu
In-Reply-To: <1481820487.29291.66.camel@perches.com>

On Thu, Dec 15, 2016 at 10:18 PM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2016-12-15 at 10:41 +0530, Souptick Joarder wrote:
>> On Mon, Dec 12, 2016 at 10:12 AM, Souptick Joarder <jrdr.linux@gmail.com> wrote:
>> > On Fri, Dec 9, 2016 at 6:33 PM, Krzysztof Hałasa <khalasa@piap.pl> wrote:
>> > > Souptick Joarder <jrdr.linux@gmail.com> writes:
>> > >
>> > > > We should use dma_pool_zalloc instead of dma_pool_alloc/memset
> []
>> > > > diff --git a/drivers/net/wan/ixp4xx_hss.c b/drivers/net/wan/ixp4xx_hss.c
> []
>> > > > @@ -976,10 +976,9 @@ static int init_hdlc_queues(struct port *port)
>> > > >                       return -ENOMEM;
>> > > >       }
>> > > >
>> > > > -     if (!(port->desc_tab = dma_pool_alloc(dma_pool, GFP_KERNEL,
>> > > > -                                           &port->desc_tab_phys)))
>> > > > +     if (!(port->desc_tab = dma_pool_zalloc(dma_pool, GFP_KERNEL,
>> > > > +                                            &port->desc_tab_phys)))
>> > > >               return -ENOMEM;
>> > > > -     memset(port->desc_tab, 0, POOL_ALLOC_SIZE);
>> > > >       memset(port->rx_buff_tab, 0, sizeof(port->rx_buff_tab)); /* tables */
>> > > >       memset(port->tx_buff_tab, 0, sizeof(port->tx_buff_tab));
>> > >
>> > > This look fine, feel free to send it to the netdev mailing list for
>> > > inclusion.
>> >
>> > Including netdev mailing list based as requested.
>> > > Acked-by: Krzysztof Halasa <khalasa@piap.pl>
> []
>> Any comment on this patch ?
>
> Shouldn't the one in drivers/net/ethernet/xscale/ixp4xx_eth.c
> also be changed?

Yes, you are right.   Do you want me to include it in same patch?

Regards
Souptick

^ permalink raw reply

* [PATCH RFC] liquidio: make timeout HZ independent
From: Nicholas Mc Guire @ 2016-12-16  6:57 UTC (permalink / raw)
  To: Derek Chickles
  Cc: Satanand Burla, Felix Manlunas, Raghu Vatsavayi, netdev,
	linux-kernel, Nicholas Mc Guire

schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.

Fixes: commit b0d66369edcd ("liquidio VF error handling")
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
---

Problem found by coccinelle spatch

The current wait time can vary by a factor 10 depending on the HZ
setting chose, which does not seem reasonable here.

The below patch sets the timeout to 100ms - it is though not clear
if this is the intent or if it should be longer/shorter as it is not
clear what HZ setting was assumed during design and used for testing.

This needs an ack by someone who knows the device and can confirm that
100ms is reasonable to wait for completion of in-flight requests.

Patch was compile tested with: x86_64_defconfig + CONFIG_LIQUIDIO_VF=m

Patch is against 4.9.0 (localversion-next is -next-20161216)

 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 70d96c1..eca469e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -189,7 +189,7 @@ static void pcierror_quiesce_device(struct octeon_device *oct)
 	 */
 
 	/* To allow for in-flight requests */
-	schedule_timeout_uninterruptible(100);
+	schedule_timeout_uninterruptible(msecs_to_jiffies(100));
 
 	if (wait_for_pending_requests(oct))
 		dev_err(&oct->pci_dev->dev, "There were pending requests\n");
-- 
2.1.4

^ permalink raw reply related

* Re: wl1251 & mac address & calibration data
From: Daniel Wagner @ 2016-12-16  7:25 UTC (permalink / raw)
  To: Luis R. Rodriguez, Arend Van Spriel, Tom Gundersen, Johannes Berg,
	Ming Lei, Mimi Zohar, Bjorn Andersson, Rafał Miłecki
  Cc: Pali Rohár, Kalle Valo, Sebastian Reichel, Pavel Machek,
	Michal Kazior, Ivaylo Dimitrov, Aaro Koskinen, Tony Lindgren,
	linux-wireless, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	David Woodhouse, Takashi Iwai, Josh Boyer, Dmitry Torokhov
In-Reply-To: <CAB=NE6VHF+U7DzapENvLMKtN4Dgi9J0qMoeUJ8o4vSwc2-rRFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
> For the new API a solution for "fallback mechanisms" should be clean
> though and I am looking to stay as far as possible from the existing
> mess. A solution to help both the old API and new API is possible for
> the "fallback mechanism" though -- but for that I can only refer you
> at this point to some of Daniel Wagner and Tom Gunderson's firmwared
> deamon prospect. It should help pave the way for a clean solution and
> help address other stupid issues.

The firmwared project is hosted here

https://github.com/teg/firmwared

As Luis pointed out, firmwared relies on FW_LOADER_USER_HELPER_FALLBACK, 
which is not enabled by default. I don't see any reason why firmwared 
should not also support loading calibration data. If we find a sound way 
to do this.

As you can see from the commit history it is a pretty young project and 
more ore less reanimation of the old udev firmware loader feature.  We 
are getting int into shape, adding integration tests etc.

The main motivation for this project is the get movement back in stuck 
discussion on the firmware loader API. Luis was very busy writing up all 
the details on the current situation and purely from the amount of 
documentation need to describe the API you can tell something is awry.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jean-Philippe Aumasson @ 2016-12-16  8:08 UTC (permalink / raw)
  To: George Spelvin, ak, davem, David.Laight, ebiggers3, hannes, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	torvalds, tytso, vegard.nossum
  Cc: djb
In-Reply-To: <20161216034618.28276.qmail@ns.sciencehorizons.net>

[-- Attachment #1: Type: text/plain, Size: 4380 bytes --]

Here's a tentative HalfSipHash:
https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c

Haven't computed the cycle count nor measured its speed.

On Fri, Dec 16, 2016 at 4:46 AM George Spelvin <linux@sciencehorizons.net>
wrote:

> Jean-Philippe Aumasson wrote:
> > If a halved version of SipHash can bring significant performance boost
> > (with 32b words instead of 64b words) with an acceptable security level
> > (64-bit enough?) then we may design such a version.
>
> It would be fairly significant, a 2x speed benefit on a lot of 32-bit
> machines.
>
> First is the fact that a 64-bit SipHash round on a generic 32-bit machine
> requires not twice as many instructions, but more than three.
>
> Consider the core SipHash quarter-round operation:
>         a += b;
>         b = rotate_left(b, k)
>         b ^= a
>
> The add and xor are equivalent between 32- and 64-bit rounds; twice the
> instructions do twice the work.  (There's a dependency via the carry
> bit between the two halves of the add, but it ends up not being on the
> critical path even in a superscalar implementation.)
>
> The problem is the rotates.  Although some particularly nice code is
> possible on 32-bit ARM due to its support for shift-and-xor operations,
> on a generic 32-bit CPU the rotate grows to 6 instructions with a 2-cycle
> dependency chain (more in practice because barrel shifters are large and
> even quad-issue CPUs can't do 4 shifts per cycle):
>
>         temp_lo = b_lo >> (32-k)
>         temp_hi = b_hi >> (32-k)
>         b_lo <<= k
>         b_hi <<= k
>         b_lo ^= temp_hi
>         b_hi ^= temp_lo
>
> The resultant instruction counts and (assuming wide issue)
> latencies are:
>
>         64-bit SipHash  "Half" SipHash
>         Inst.   Latency Inst.   Latency
>          10      3        3      2      Quarter round
>          40      6       12      4      Full round
>          80     12       24      8      Two rounds
>          82     13       26      9      Mix in one word
>          82     13       52     18      Mix in 64 bits
>         166     26       61     18      Four round finalization + final XOR
>         248     39      113     36      Hash 64 bits
>         330     52      165     54      Hash 128 bits
>         412     65      217     72      Hash 192 bits
>
> While the ideal latencies are actually better for the 64-bit algorithm,
> that requires an unrealistic 6+-wide superscalar implementation that's
> more than twice as wide as the 64-bit code requires (which is already
> optimized for quad-issue).  For a 1- or 2-wide processor, the instruction
> counts dominate, and not only does the 64-bit algorithm take 60% more
> time to mix in the same number of bytes, but the finalization rounds
> bring the ratio to 2:1 for small inputs.
>
> (And I haven't included the possible savings if the input size is an odd
> number of 32-bit words, such as networking applications which include
> the source/dest port numbers.)
>
>
> Notes on particular processors:
> - x86 can do a 64-bit rotate in 3 instructions and 2 cycles using
>   the SHLD/SHRD instructions instead:
>         movl    %b_hi, %temp
>         shldl   $k, %b_lo, %b_hi
>         shldl   $k, %temp, %b_lo
>   ... but as I mentioned the problem is registers.  SipHash needs 8 32-bit
>   words plus at least one temporary, and 32-bit x86 has only 7 available.
>   (And compilers can rarely manage to keep more than 6 of them busy.)
> - 64-bit SipHash is particularly efficient on 32-bit ARM due to its
>   support for shift-and-op instructions.  The 64-bit shift and following
>   xor can be done in 4 instructions.  So the only benefit is from the
>   reduced finalization.
> - Double-width adds cost a little more on CPUs like MIPS and RISC-V without
>   condition codes.
> - Certain particularly crappy uClinux processors with slow shifts
>   (68000, anyone?) really suffer from extra shifts.
>
> One *weakly* requested feature: It might simplify some programming
> interfaces if we could use the same key for multiple hash tables with a
> 1-word "tweak" (e.g. pointer to the hash table, so it could be assumed
> non-zero if that helped) to make distinct functions.  That would let us
> more safely use a global key for multiple small hash tables without the
> need to add code to generate and store key material for each place that
> an unkeyed hash is replaced.
>

[-- Attachment #2: Type: text/html, Size: 6883 bytes --]

^ permalink raw reply

* [PATCH] liquidio CN23XX: make timeout HZ independent
From: Nicholas Mc Guire @ 2016-12-16  8:10 UTC (permalink / raw)
  To: Derek Chickles
  Cc: Satanand Burla, Felix Manlunas, Raghu Vatsavayi, netdev,
	linux-kernel, Nicholas Mc Guire

schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
---

Problem found by coccinelle spatch

The current delay can vary by a factor 10 depending on the HZ
setting chose, which does not seem reasonable here.

The below patch sets the timeout to 10ms - it is though not clear
if this is the intent or if it should be longer/shorter as it is not
clear what HZ setting was assumed during design and used for testing.

This needs an ack by someone who knows the device and can confirm that
10ms is reasonable to wait for completion of queuing.

Patch was compile tested with: x86_64_defconfig + CONFIG_LIQUIDIO_VF=m
Note that this driver has a large number of sparse warnings !

Patch is against 4.9.0 (localversion-next is -next-20161216)

 drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 2 ++-
 1 file changed, 2 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index 73696b42..1a0fcbe 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -171,7 +171,8 @@ int octeon_mbox_write(struct octeon_device *oct,
 			count = 0;
 			while (readq(mbox->mbox_write_reg) !=
 			       OCTEON_PFVFACK) {
-				schedule_timeout_uninterruptible(10);
+				schedule_timeout_uninterruptible(
+							msecs_to_jiffies(10));
 				if (count++ == LIO_MBOX_WRITE_WAIT_CNT) {
 					ret = OCTEON_MBOX_STATUS_FAILED;
 					break;
-- 
2.1.4

^ permalink raw reply related

* [PATCH] net: ipv6: check route protocol when deleting routes
From: Mantas Mikulėnas @ 2016-12-16  8:30 UTC (permalink / raw)
  To: netdev, linux-kernel, davem; +Cc: Mantas Mikulėnas

The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).

This can be verified using `ip -6 route del <prefix> proto something`.

Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
---
 net/ipv6/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 947ed1ded026..ef6de8f9e5a2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2163,6 +2163,8 @@ static int ip6_route_del(struct fib6_config *cfg)
 				continue;
 			if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric)
 				continue;
+			if (cfg->fc_protocol && cfg->fc_protocol != rt->rt6i_protocol)
+				continue;
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH mac80211-next] rfkill: hide unused goto label
From: Arnd Bergmann @ 2016-12-16  8:44 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Arnd Bergmann, David S. Miller, João Paulo Rechi Vita,
	Michał Kępień, linux-wireless, netdev,
	linux-kernel

A cleanup introduced a harmless warning in some configurations:

net/rfkill/core.c: In function 'rfkill_init':
net/rfkill/core.c:1350:1: warning: label 'error_input' defined but not used [-Wunused-label]

This adds another #ifdef around the label to match that around the
caller.

Fixes: 6124c53edeea ("rfkill: Cleanup error handling in rfkill_init()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/rfkill/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index f6b01f3616e5..b7adaee69756 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1347,8 +1347,10 @@ static int __init rfkill_init(void)
 
 	return 0;
 
+#ifdef CONFIG_RFKILL_INPUT
 error_input:
 	rfkill_any_led_trigger_unregister();
+#endif
 error_led_trigger:
 	misc_deregister(&rfkill_miscdev);
 error_misc:
-- 
2.9.0

^ permalink raw reply related

* [PATCH net] qed: fix old-style function definition
From: Arnd Bergmann @ 2016-12-16  8:47 UTC (permalink / raw)
  To: Yuval Mintz, Ariel Elior, everest-linux-l2
  Cc: Arnd Bergmann, Arun Easi, Hannes Reinecke, David S. Miller,
	Johannes Thumshirn, netdev, linux-kernel

The newly added file causes a harmless warning, with "make W=1":

drivers/net/ethernet/qlogic/qed/qed_iscsi.c: In function 'qed_get_iscsi_ops':
drivers/net/ethernet/qlogic/qed/qed_iscsi.c:1268:29: warning: old-style function definition [-Wold-style-definition]

This makes it a proper prototype.

Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I only saw this after it was already merged for v4.10

 drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
index 00efb1c4c57e..17a70122df05 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
@@ -1265,7 +1265,7 @@ static const struct qed_iscsi_ops qed_iscsi_ops_pass = {
 	.get_stats = &qed_iscsi_stats,
 };
 
-const struct qed_iscsi_ops *qed_get_iscsi_ops()
+const struct qed_iscsi_ops *qed_get_iscsi_ops(void)
 {
 	return &qed_iscsi_ops_pass;
 }
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net] qed: fix old-style function definition
From: Johannes Thumshirn @ 2016-12-16  8:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yuval Mintz, Ariel Elior, everest-linux-l2, Arun Easi,
	Hannes Reinecke, David S. Miller, netdev, linux-kernel
In-Reply-To: <20161216084808.1815139-1-arnd@arndb.de>

On Fri, Dec 16, 2016 at 09:47:41AM +0100, Arnd Bergmann wrote:
> The newly added file causes a harmless warning, with "make W=1":
> 
> drivers/net/ethernet/qlogic/qed/qed_iscsi.c: In function 'qed_get_iscsi_ops':
> drivers/net/ethernet/qlogic/qed/qed_iscsi.c:1268:29: warning: old-style function definition [-Wold-style-definition]
> 
> This makes it a proper prototype.
> 
> Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI.")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox