Netdev List
* Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Theodore Ts'o @ 2016-12-14 16:37 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, David Miller, Linus Torvalds,
	kernel-hardening@lists.openwall.com, LKML, George Spelvin,
	Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	linux-crypto, Jean-Philippe Aumasson
In-Reply-To: <20161214031037.25498-1-Jason@zx2c4.com>

On Wed, Dec 14, 2016 at 04:10:37AM +0100, Jason A. Donenfeld wrote:
> This duplicates the current algorithm for get_random_int/long, but uses
> siphash24 instead. This comes with several benefits. It's certainly
> faster and more cryptographically secure than MD5. This patch also
> hashes the pid, entropy, and timestamp as fixed width fields, in order
> to increase diffusion.
> 
> The previous md5 algorithm used a per-cpu md5 state, which caused
> successive calls to the function to chain upon each other. While it's
> not entirely clear that this kind of chaining is absolutely necessary
> when using a secure PRF like siphash24, it can't hurt, and the timing of
> the call chain does add a degree of natural entropy. So, in keeping with
> this design, instead of the massive per-cpu 64-byte md5 state, there is
> instead a per-cpu previously returned value for chaining.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>

The original reason for get_random_int was because the original
urandom algorithms were too slow.  When we moved to chacha20, which is
much faster, I didn't think to revisit get_random_int() and
get_random_long().

One somewhat undesirable aspect of the current algorithm is that we
never change random_int_secret.  So I've been toying with the
following, which is 4 times faster than md5.  (I haven't tried
benchmarking against siphash yet.)

[    3.606139] random benchmark!!
[    3.606276] get_random_int # cycles: 326578
[    3.606317] get_random_int_new # cycles: 95438
[    3.607423] get_random_bytes # cycles: 2653388

     	       			  	  - Ted

P.S.  It's interesting to note that siphash24 and chacha20 are both
add-rotate-xor based algorithms.

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..be172ea75799 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1681,6 +1681,38 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+
+static int rand_benchmark(void)
+{
+	cycles_t start,finish;
+	int i, out;
+
+	pr_crit("random benchmark!!\n");
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_int_new();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int_new # cycles: %llu\n", finish - start);
+
+	start = get_cycles();
+	for (i = 0; i < 1000; i++) {
+		get_random_bytes(&out, sizeof(out));
+	}
+	finish = get_cycles();
+	pr_err("get_random_bytes # cycles: %llu\n", finish - start);
+	return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2096,10 @@ unsigned int get_random_int(void)
 	__u32 *hash;
 	unsigned int ret;
 
+#if 0	// force slow path
 	if (arch_get_random_int(&ret))
 		return ret;
+#endif
 
 	hash = get_cpu_var(get_random_int_hash);
 
@@ -2100,6 +2134,38 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+	__u8 buf[CHACHA20_BLOCK_SIZE];
+	int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+	struct random_buf *p;
+
+	p = &get_cpu_var(batched_entropy);
+
+	if ((p->ptr == 0) ||
+	    (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+		extract_crng(p->buf);
+		p->ptr = 0;
+	}
+	BUG_ON(n > CHACHA20_BLOCK_SIZE);
+	memcpy(buf, p->buf + p->ptr, n);
+	p->ptr += n;
+	put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+	int	ret;
+
+	get_batched_entropy(&ret, sizeof(ret));
+	return ret;
+}
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:	The smallest acceptable address the caller will take.

* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Theodore Ts'o @ 2016-12-14 16:42 UTC (permalink / raw)
  To: Wei Xu; +Cc: jasowang, netdev, mst, nhorman, davem
In-Reply-To: <95568017-e7b1-df95-ec49-f38ae8eb1b14@redhat.com>

On Wed, Dec 14, 2016 at 12:24:43PM +0800, Wei Xu wrote:
> 
> BTW, although this is a guest issue, is there any way to view the GCE
> host kernel or qemu (if it is) version?

No, there isn't, as far as I know.

    	  	    	     - Ted

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Alexander Duyck @ 2016-12-14 16:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jesper Dangaard Brouer, David Miller, Christoph Lameter, rppt,
	Netdev, linux-mm, willemdebruijn.kernel, Björn Töpel,
	magnus.karlsson, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Brandeburg, Jesse, METH,
	Vlad Yasevich
In-Reply-To: <5851740A.2080806@gmail.com>

On Wed, Dec 14, 2016 at 8:32 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 16-12-14 01:39 AM, Jesper Dangaard Brouer wrote:
>> On Tue, 13 Dec 2016 12:08:21 -0800
>> John Fastabend <john.fastabend@gmail.com> wrote:
>>
>>> On 16-12-13 11:53 AM, David Miller wrote:
>>>> From: John Fastabend <john.fastabend@gmail.com>
>>>> Date: Tue, 13 Dec 2016 09:43:59 -0800
>>>>
>>>>> What does "zero-copy send packet-pages to the application/socket that
>>>>> requested this" mean? At the moment on x86 page-flipping appears to be
>>>>> more expensive than memcpy (I can post some data shortly) and shared
>>>>> memory was proposed and rejected for security reasons when we were
>>>>> working on bifurcated driver.
>>>>
>>>> The whole idea is that we map all the active RX ring pages into
>>>> userspace from the start.
>>>>
>>>> And just how Jesper's page pool work will avoid DMA map/unmap,
>>>> it will also avoid changing the userspace mapping of the pages
>>>> as well.
>>>>
>>>> Thus avoiding the TLB/VM overhead altogether.
>>>>
>>
>> Exactly.  It is worth mentioning that pages entering the page pool need
>> to be cleared (measured cost 143 cycles), in order to not leak any
>> kernel info.  The primary focus of this design is to make sure not to
>> leak kernel info to userspace, but with an "exclusive" mode also
>> support isolation between applications.
>>
>>
>>> I get this but it requires applications to be isolated. The pages from
>>> a queue can not be shared between multiple applications in different
>>> trust domains. And the application has to be cooperative meaning it
>>> can't "look" at data that has not been marked by the stack as OK. In
>>> these schemes we tend to end up with something like virtio/vhost or
>>> af_packet.
>>
>> I expect 3 modes, when enabling RX-zero-copy on a page_pool. The first
>> two would require CAP_NET_ADMIN privileges.  All modes have a trust
>> domain id, which needs to match, e.g. when a page reaches the socket.
>
> Even mode 3 should require CAP_NET_ADMIN; we don't want userspace to
> grab queues off the NIC without it, IMO.
>
>>
>> Mode-1 "Shared": Application chooses the lowest isolation level, allowing
>>  multiple applications to mmap the VMA area.
>
> My only point here is that applications can read each other's data, and all
> applications need to cooperate. For example, one app could try to write
> continuously to read-only pages, causing faults and whatnot. This is
> all non-standard and doesn't play well with cgroups and "normal"
> applications. It requires a new orchestration model.
>
> I'm a bit skeptical of the use case but I know of a handful of reasons
> to use this model. Maybe take a look at the ivshmem implementation in
> DPDK.
>
> Also this still requires a hardware filter to push "application" traffic
> onto reserved queues/pages as far as I can tell.
>
>>
>> Mode-2 "Single-user": Application requests to be the only user
>>  of the RX queue.  This blocks other applications from mmapping the VMA area.
>>
>
> Assuming data is read-only, sharing with the stack is possibly OK :/. I
> guess you would need two pools of memory, for data and skb, so you don't
> leak skb into user space.
>
> The devil is in the details here. There are lots of hooks in the kernel
> that can, for example, push the packet with a 'redirect' tc action.
> And letting an app "read" data or impact the performance of an
> unrelated application is wrong IMO. Stacked devices also provide another
> set of details that are a bit difficult to track down; see all the
> hardware offload efforts.
>
> I assume all these concerns are shared between mode-1 and mode-2
>
>> Mode-3 "Exclusive": Application requests to own the RX queue.  Packets are
>>  no longer allowed for normal netstack delivery.
>>
>
> I have patches for this mode already but haven't pushed them due to
> an alternative solution using VFIO.
>
>> Notice mode-2 still requires CAP_NET_ADMIN, because packets/pages are
>> still allowed to travel netstack and thus can contain packet data from
>> other normal applications.  This is part of the design, to share the
>> NIC between netstack and an accelerated userspace application using RX
>> zero-copy delivery.
>>
>
> I don't think this is acceptable to be honest. Letting an application
> potentially read/impact other arbitrary applications on the system
> seems like a non-starter even with CAP_NET_ADMIN. At least this was
> the conclusion from bifurcated driver work some time ago.

I agree.  This is a no-go from the performance perspective as well.
At a minimum you would have to be zeroing out the page between uses to
avoid leaking data, and that assumes that the program we are sending
the pages to is slightly well behaved.  If we think zeroing out an
sk_buff is expensive wait until we are trying to do an entire 4K page.

I think we are stuck with having to use a HW filter to split off
application traffic to a specific ring, and then having to share the
memory between the application and the kernel on that ring only.  Any
other approach just opens us up to all sorts of security concerns
since it would be possible for the application to try to read and
possibly write any data it wants into the buffers.

- Alex

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Christoph Lameter @ 2016-12-14 17:00 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jesper Dangaard Brouer, John Fastabend, Mike Rapoport,
	netdev@vger.kernel.org, linux-mm, Willem de Bruijn,
	Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Mel Gorman, Tom Herbert, Brenden Blanco, Tariq Toukan,
	Saeed Mahameed, Jesse Brandeburg, Kalman Meth, Vladislav Yasevich
In-Reply-To: <8aea213f-2739-9bd3-3a6a-668b759336ae@stressinduktion.org>

On Tue, 13 Dec 2016, Hannes Frederic Sowa wrote:

> > Interesting.  So you even imagine sockets registering memory regions
> > with the NIC.  If we had a proper NIC HW filter API across the drivers,
> > to register the steering rule (like ibv_create_flow), this would be
> > doable, but we don't (DPDK actually have an interesting proposal[1])
>
> On a side note, this is what windows does with RIO ("registered I/O").
> Maybe you want to look at the API to get some ideas: allocating and
> pinning down memory in user space and registering that with sockets to
> get zero-copy IO.

Yup that is also what I think. Regarding the memory registration and flow
steering for user space RX/TX ring please look at the qpair model
implemented by the RDMA subsystem in the kernel. The memory semantics are
clearly established there and have been in use for more than a decade.

* [PATCH net 0/2] net/sched: cls_flower: Fix mask handling
From: Paul Blakey @ 2016-12-14 17:00 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Jiri Pirko, Or Gerlitz, Roi Dayan, Shahar Klein, Hadar Hen Zion,
	Paul Blakey

Hi,
This series fixes how the mask is handled.
Thanks.

Paul Blakey (2):
  net/sched: cls_flower: Use mask for addr_type
  net/sched: cls_flower: Use masked key when calling HW offloads

 net/sched/cls_flower.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
1.8.3.1

* [PATCH net 1/2] net/sched: cls_flower: Use mask for addr_type
From: Paul Blakey @ 2016-12-14 17:00 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Jiri Pirko, Or Gerlitz, Roi Dayan, Shahar Klein, Hadar Hen Zion,
	Paul Blakey
In-Reply-To: <1481734858-37474-1-git-send-email-paulb@mellanox.com>

When addr_type is set, mask should also be set.

Fixes: 66530bdf85eb ('sched,cls_flower: set key address type when present')
Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e040c51..9758f5a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -509,6 +509,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
 	if (tb[TCA_FLOWER_KEY_IPV4_SRC] || tb[TCA_FLOWER_KEY_IPV4_DST]) {
 		key->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+		mask->control.addr_type = ~0;
 		fl_set_key_val(tb, &key->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC,
 			       &mask->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC_MASK,
 			       sizeof(key->ipv4.src));
@@ -517,6 +518,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			       sizeof(key->ipv4.dst));
 	} else if (tb[TCA_FLOWER_KEY_IPV6_SRC] || tb[TCA_FLOWER_KEY_IPV6_DST]) {
 		key->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+		mask->control.addr_type = ~0;
 		fl_set_key_val(tb, &key->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC,
 			       &mask->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC_MASK,
 			       sizeof(key->ipv6.src));
@@ -571,6 +573,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
 	    tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
 		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+		mask->enc_control.addr_type = ~0;
 		fl_set_key_val(tb, &key->enc_ipv4.src,
 			       TCA_FLOWER_KEY_ENC_IPV4_SRC,
 			       &mask->enc_ipv4.src,
@@ -586,6 +589,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 	if (tb[TCA_FLOWER_KEY_ENC_IPV6_SRC] ||
 	    tb[TCA_FLOWER_KEY_ENC_IPV6_DST]) {
 		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+		mask->enc_control.addr_type = ~0;
 		fl_set_key_val(tb, &key->enc_ipv6.src,
 			       TCA_FLOWER_KEY_ENC_IPV6_SRC,
 			       &mask->enc_ipv6.src,
-- 
1.8.3.1

* [PATCH net 2/2] net/sched: cls_flower: Use masked key when calling HW offloads
From: Paul Blakey @ 2016-12-14 17:00 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: Jiri Pirko, Or Gerlitz, Roi Dayan, Shahar Klein, Hadar Hen Zion,
	Paul Blakey
In-Reply-To: <1481734858-37474-1-git-send-email-paulb@mellanox.com>

Zero bits on the mask signify a "don't care" on the corresponding bits
in key. Some HWs require those bits on the key to be zero. Since these
bits are masked anyway, it's okay to provide the masked key to all
drivers.

Fixes: 5b33f48842fa ('net/flower: Introduce hardware offload support')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9758f5a..35ac28d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -252,7 +252,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	offload.cookie = (unsigned long)f;
 	offload.dissector = dissector;
 	offload.mask = mask;
-	offload.key = &f->key;
+	offload.key = &f->mkey;
 	offload.exts = &f->exts;
 
 	tc->type = TC_SETUP_CLSFLOWER;
-- 
1.8.3.1
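
For readers following along, the "don't care" semantics from the commit
message can be sketched in userspace C. This is only an illustration and
not the cls_flower code: apply_mask() and masked_match() are hypothetical
names, and keys are treated as plain byte arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mkey[i] = key[i] & mask[i], so every "don't care"
 * bit (mask bit 0) is forced to zero in the masked key. */
static void apply_mask(uint8_t *mkey, const uint8_t *key,
		       const uint8_t *mask, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		mkey[i] = key[i] & mask[i];
}

/* Hypothetical matcher: a packet field matches when its masked value
 * equals the masked key.  Returns 1 on match, 0 otherwise. */
static int masked_match(const uint8_t *pkt, const uint8_t *mkey,
			const uint8_t *mask, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((pkt[i] & mask[i]) != mkey[i])
			return 0;
	return 1;
}
```

Because matching only ever looks at masked bits, handing the masked key to
a driver does not change which packets match; it only guarantees that the
ignored bits are zero for hardware that insists on that.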

* Re: [PATCH] IB/mlx4: avoid a -Wmaybe-uninitialize warning
From: Doug Ledford @ 2016-12-14 17:13 UTC (permalink / raw)
  To: Arnd Bergmann, Yishai Hadas
  Cc: David S. Miller, Jack Morgenstein, Or Gerlitz, Eran Ben Elisha,
	Moshe Shemesh, Christophe Jaillet, Moni Shoua,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161025161632.411899-1-arnd-r2nGTMty4D4@public.gmane.org>


On 10/25/2016 12:16 PM, Arnd Bergmann wrote:
> There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:
> 
> ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
> ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
> 
> The problem here is that gcc won't track the state of the variable
> across a spin_unlock. Moving the assignment out of the lock is
> safe here and avoids the warning.
> 
> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>

Thanks, applied.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
From: John Fastabend @ 2016-12-14 17:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161214152924-mutt-send-email-mst@kernel.org>

On 16-12-14 05:31 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
>> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
>>>> This adds support for dynamically setting the LRO feature flag. The
>>>> message to control guest features in the backend uses the
>>>> CTRL_GUEST_OFFLOADS msg type.
>>>>
>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>> ---

[...]

>>>>  
>>>>  static void virtnet_config_changed_work(struct work_struct *work)
>>>> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>>>  		dev->features |= NETIF_F_RXCSUM;
>>>>  
>>>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
>>>> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
>>>> +		dev->features |= NETIF_F_LRO;
>>>> +		dev->hw_features |= NETIF_F_LRO;
>>>
>>> So the issue is I think that the virtio "LRO" isn't really
>>> LRO, it's typically just GRO forwarded to guests.
>>> So these are easily re-split along MTU boundaries,
>>> which makes it ok to forward these across bridges.
>>>
>>> It's not nice that we don't document this in the spec,
>>> but it's the reality and people rely on this.
>>>
>>> For now, how about doing a custom thing and just disable/enable
>>> it as XDP is attached/detached?
>>
>> The annoying part about doing this is ethtool will say that it is fixed
>> yet it will be changed by seemingly unrelated operation. I'm not sure I
>> like the idea to start automatically configuring the link via xdp_set.
> 
> I really don't like the idea of dropping performance
> by a factor of 3 for people bridging two virtio net
> interfaces.
> 
> So how about a simple approach for now, just disable
> XDP if GUEST_TSO is enabled?
> 
> We can discuss better approaches in next version.
> 

So the proposal is to add a check in XDP setup so that

  if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO{4|6}))
	return -EOPNOTSUPP;

Or whatever is the most appropriate return code? Then we can
disable TSO via qemu-system with guest_tso4=off,guest_tso6=off for
XDP use cases.
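
Spelled out as a userspace sketch (illustration only, not the virtio_net
driver code): the feature bit numbers are the ones from the virtio-net
specification, while xdp_check_features() and the plain 64-bit feature
word are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_TSO4	7
#define VIRTIO_NET_F_GUEST_TSO6	8

/* Hypothetical stand-in for the check in XDP setup: refuse to attach a
 * program while either guest TSO feature has been negotiated. */
static int xdp_check_features(uint64_t features)
{
	const uint64_t tso = (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
			     (1ULL << VIRTIO_NET_F_GUEST_TSO6);

	if (features & tso)
		return -EOPNOTSUPP;
	return 0;
}
```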

Sounds like a reasonable start to me. I'll make the change. Should this
go through DaveM's net-next tree, or do you want it on the virtio tree?
Either is fine with me.

Thanks,
John

* Re: [Query] Delayed vxlan socket creation?
From: Cong Wang @ 2016-12-14 17:24 UTC (permalink / raw)
  To: Du, Fan; +Cc: netdev@vger.kernel.org, mrjana@gmail.com
In-Reply-To: <5A90DA2E42F8AE43BC4A093BF06788481A9457F1@SHSMSX103.ccr.corp.intel.com>

On Tue, Dec 13, 2016 at 11:49 PM, Du, Fan <fan.du@intel.com> wrote:
> Hi
>
> I'm interested in a Docker issue[1] which looks related to kernel vxlan socket creation,
> as described in the thread. From my limited knowledge here, socket creation is synchronous,
> and after the *socket* syscall, the sock handle will be valid and ready to link up.

You need to read the code. A vxlan tunnel is a UDP tunnel; it needs a kernel
socket (and a port) to set up UDP communication, unlike a GRE tunnel etc.


>
> I'm not sure about the detailed scenario here, or which commit might fix it.
> Thanks!
>
> Quoted analysis:
> --------------------------------------------------------------------------
> (Found in kernel 3.13)
> The issue happens because in older kernels when a vxlan interface is created,
> the socket creation is queued up in a worker thread which actually creates
> the socket. But this needs to happen before we bring up the link on the vxlan interface.
> If for some chance, the worker thread hasn't completed the creation of the socket
> before we did link up then when we do link up the kernel checks if the socket was
> created and if not it will return ENOTCONN. This was a bug in the kernel which got fixed
> in later kernels. That is why retrying with a timer fixes the issue.


This was introduced by commit 1c51a9159ddefa5119724a4c7da3fd3ef44b68d5
and later fixed by commit 56ef9c909b40483d2c8cb63fcbf83865f162d5ec.

* Re: [v2] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: David Daney @ 2016-12-14 17:32 UTC (permalink / raw)
  To: Arvind Yadav, peter.chen, fw, david.daney; +Cc: netdev, linux-kernel
In-Reply-To: <1481732732-6892-1-git-send-email-arvind.yadav.cs@gmail.com>

On 12/14/2016 08:25 AM, Arvind Yadav wrote:
> If devm_ioremap() fails, it returns NULL, and the
> kernel can then run into a NULL-pointer dereference.
> This error check avoids the NULL-pointer dereference.
>

Have you ever seen this failure in the wild?

How was the patch tested?

Thanks,
David Daney


> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
> ---
>  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> index 4ab404f..33c2fec 100644
> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> @@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct platform_device *pdev)
>  	p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
>  	p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, p->agl_prt_ctl_phys,
>  					   p->agl_prt_ctl_size);
> +	if (!p->mix || !p->agl || !p->agl_prt_ctl) {
> +		dev_err(&pdev->dev, "failed to map I/O memory\n");
> +		result = -ENOMEM;
> +		goto err;
> +	}
> +
>  	spin_lock_init(&p->lock);
>
>  	skb_queue_head_init(&p->tx_list);
>

* RE: Designing a safe RX-zero-copy Memory Model for Networking
From: David Laight @ 2016-12-14 17:37 UTC (permalink / raw)
  To: 'Christoph Lameter', Hannes Frederic Sowa
  Cc: Jesper Dangaard Brouer, John Fastabend, Mike Rapoport,
	netdev@vger.kernel.org, linux-mm, Willem de Bruijn,
	Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Mel Gorman, Tom Herbert, Brenden Blanco, Tariq Toukan,
	Saeed Mahameed, Jesse Brandeburg, Kalman Meth, Vladislav Yasevich
In-Reply-To: <alpine.DEB.2.20.1612141059020.20959@east.gentwo.org>

From: Christoph Lameter
> Sent: 14 December 2016 17:00
> On Tue, 13 Dec 2016, Hannes Frederic Sowa wrote:
> 
> > > Interesting.  So you even imagine sockets registering memory regions
> > > with the NIC.  If we had a proper NIC HW filter API across the drivers,
> > > to register the steering rule (like ibv_create_flow), this would be
> > > doable, but we don't (DPDK actually have an interesting proposal[1])
> >
> > On a side note, this is what windows does with RIO ("registered I/O").
> > Maybe you want to look at the API to get some ideas: allocating and
> > pinning down memory in user space and registering that with sockets to
> > get zero-copy IO.
> 
> Yup that is also what I think. Regarding the memory registration and flow
> steering for user space RX/TX ring please look at the qpair model
> implemented by the RDMA subsystem in the kernel. The memory semantics are
> clearly established there and have been in use for more than a decade.

Isn't there a bigger problem for transmit?
If the kernel is doing ANY validation on the frames it must copy the
data to memory the application cannot modify before doing the validation.
Otherwise the application could change the data afterwards.
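
A minimal userspace sketch of that copy-then-validate ordering, for
illustration only. The frame limit, the "only IPv4 ethertype" policy and
the names frame_ok() and tx_validate() are all assumptions made for this
example, not an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME 1514	/* assumed Ethernet frame limit for this sketch */
#define ETH_HDR   14

/* Hypothetical policy check: only allow the IPv4 ethertype (0x0800). */
static int frame_ok(const uint8_t *frame, size_t len)
{
	return len >= ETH_HDR && frame[12] == 0x08 && frame[13] == 0x00;
}

/* Copy the frame out of application-writable memory first, then validate
 * and transmit only the private copy.  Returns 0 if accepted, -1 if not. */
static int tx_validate(const uint8_t *shared, size_t len, uint8_t *priv)
{
	if (len > MAX_FRAME)
		return -1;
	memcpy(priv, shared, len);	/* snapshot; later writes cannot affect it */
	return frame_ok(priv, len) ? 0 : -1;
}
```

Validating the shared buffer in place and transmitting from it afterwards
would leave a window in which the application can rewrite the frame, which
is exactly the concern raised above; the copy closes that window at the
cost of the memcpy that zero-copy was meant to avoid.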

	David


* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 17:49 UTC (permalink / raw)
  To: David Laight
  Cc: Hannes Frederic Sowa, Netdev, kernel-hardening@lists.openwall.com,
	Andi Kleen, LKML, Linux Crypto Mailing List
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023F7BD@AcuExch.aculab.com>

On Wed, Dec 14, 2016 at 3:47 PM, David Laight <David.Laight@aculab.com> wrote:
> Just remove the __packed and ensure that the structure is 'nice'.
> This includes ensuring there is no 'tail padding'.
> In some cases you'll need to put the port number into a 32bit field.

I'd rather not. There's no point in wasting extra cycles on hashing
useless data, just for some particular syntactic improvement. __packed
__aligned(8) will do what we want perfectly, I think.

> I'd also require that the key be aligned.

Yep, I'll certainly do this for the siphash24_aligned version in the v3.

* Re: [PATCH] infiniband: nes: nes_nic: use new api ethtool_{get|set}_link_ksettings
From: Doug Ledford @ 2016-12-14 17:52 UTC (permalink / raw)
  To: Philippe Reynes, faisal.latif-ral2JQCrhuEAvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477409387-25056-1-git-send-email-tremyfr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>


On 10/25/2016 11:29 AM, Philippe Reynes wrote:
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Thanks, applied.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: David Miller @ 2016-12-14 17:56 UTC (permalink / raw)
  To: Jason
  Cc: David.Laight, netdev, kernel-hardening, ak, linux-kernel,
	linux-crypto
In-Reply-To: <CAHmME9pEM=cDC5S=j1BU2oCF8-WdnbRfiVojcet4rXcRLcpJRw@mail.gmail.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Wed, 14 Dec 2016 13:53:10 +0100

> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between.

Just marking the structure __packed, whether necessary or not, makes
the compiler assume that the members are not aligned and causes
byte-by-byte accesses to be performed for words.

Never, _ever_, use __packed unless absolutely necessary, it pessimizes
the code on cpus that require proper alignment of types.
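
One way to get the safety net without __packed is to lay the structure out
naturally and let the compiler verify that no padding exists. The sketch
below is an illustration under assumptions: struct hash_input is a made-up
example rather than a structure from the patch, and it uses C11
_Static_assert where kernel code would more likely use BUILD_BUG_ON().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash input: members in descending size order, each offset a
 * multiple of the member's size, and a total size that is a multiple of 8. */
struct hash_input {
	uint64_t a;
	uint32_t b;
	uint32_t c;
};

/* Compile-time checks instead of __packed: the build fails if someone adds
 * a member that introduces interior or tail padding. */
_Static_assert(offsetof(struct hash_input, b) == 8, "hole before b");
_Static_assert(offsetof(struct hash_input, c) == 12, "hole before c");
_Static_assert(sizeof(struct hash_input) == 16, "unexpected padding");
```

With this layout the compiler keeps using ordinary aligned word loads, and
a stray u8 inserted in the middle breaks the build instead of silently
changing what gets hashed.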

* Re: [kernel-hardening] Re: [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14 17:58 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld, Netdev,
	David Miller, Linus Torvalds, LKML, George Spelvin, Scott Bauer,
	Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	Linux Crypto Mailing List, Jean-Philippe Aumasson
In-Reply-To: <20161214163731.luj2dzmnihcuhn5p@thunk.org>

Hey Ted,

On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.

Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can read arbitrary kernel
memory, there are much much bigger issues to be concerned about. As
well, the "chaining" variable I introduce ensures that the random
numbers are, per-cpu, related to the uniqueness of timing of
subsequent calls.

> So I've been toying with the
> following, which is 4 times faster than md5.  (I haven't tried
> benchmarking against siphash yet.)
>
> [    3.606139] random benchmark!!
> [    3.606276] get_random_int # cycles: 326578
> [    3.606317] get_random_int_new # cycles: 95438
> [    3.607423] get_random_bytes # cycles: 2653388

Cool, I'll benchmark it against the siphash implementation. I like
what you did with batching up lots of chacha output, and doling it out
bit by bit. I suspect this will be quite fast, because with chacha20
you get an entire block.
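
A minimal userspace sketch of the batching idea, for illustration only. It
assumes a single buffer rather than per-cpu state, and refill_block() is a
deterministic placeholder (a byte counter) standing in for the real
generator; it is not random and is not extract_crng().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* same size as one ChaCha20 block */

struct random_buf {
	uint8_t buf[BLOCK_SIZE];
	int ptr;	/* bytes of buf already handed out */
	int valid;	/* 0 until the first refill */
};

/* Placeholder for the real generator: fills the block from a counter so the
 * sketch is deterministic.  This is NOT random and not extract_crng(). */
static void refill_block(uint8_t *block)
{
	static uint8_t counter;
	int i;

	for (i = 0; i < BLOCK_SIZE; i++)
		block[i] = counter++;
}

/* Hand out n bytes, refilling only when the current block is used up. */
static void get_batched(struct random_buf *p, void *out, int n)
{
	assert(n > 0 && n <= BLOCK_SIZE);
	if (!p->valid || p->ptr + n > BLOCK_SIZE) {
		refill_block(p->buf);
		p->ptr = 0;
		p->valid = 1;
	}
	memcpy(out, p->buf + p->ptr, n);	/* copy from the current offset */
	p->ptr += n;
}
```

With 4-byte requests, one refill serves sixteen callers, which is where
the speedup over generating a fresh block per call would come from.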

> P.S.  It's interesting to note that siphash24 and chacha20 are both
> add-rotate-xor based algorithms.

Quite! Lots of nice shiny things are turning out to be ARX -- ChaCha,
BLAKE2, Siphash, NORX. The simplicity is really appealing.
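
To make the ARX structure concrete, here is a sketch of the core mixing
step of each algorithm in portable C: the ChaCha quarter round as given in
RFC 7539 and one SipRound as given in the SipHash paper. Only the function
names are my own; neither function is taken from the kernel sources.

```c
#include <stdint.h>

static inline uint32_t rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

static inline uint64_t rotl64(uint64_t x, int r)
{
	return (x << r) | (x >> (64 - r));
}

/* ChaCha quarter round on four 32-bit words (RFC 7539, section 2.1). */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
				 uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* One SipRound on the four 64-bit state words of SipHash. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0];
	v[0] = rotl64(v[0], 32);
	v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2];
	v[2] = rotl64(v[2], 32);
}
```

Both consist of nothing but additions, fixed rotations and XORs, so they
need no lookup tables and no data-dependent branches.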

Jason

* Re: [v2] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: arvind Yadav @ 2016-12-14 18:06 UTC (permalink / raw)
  To: David Daney, peter.chen, fw, david.daney; +Cc: netdev, linux-kernel
In-Reply-To: <3c3d5968-bc41-e1b7-6fda-78e92e7a9d56@caviumnetworks.com>

Yes, I have seen this error. We have a device with very little memory.
Basically, it's an OMAP2 board. We have to port Android L to it.
It has a 3.10 kernel. On this device, we were getting page
allocation failures.
The vmalloc size was not enough to run all applications, so we decided to
increase the vmalloc reserve space. Once we increased the vmalloc space,
we started getting ioremap failures, and the kernel hit a NULL-pointer
dereference.

Here, it's just a check to avoid a kernel crash caused by an ioremap failure.
We can keep this check to guard against this kind of scenario.

Thanks
-Arvind


On Wednesday 14 December 2016 11:02 PM, David Daney wrote:
> On 12/14/2016 08:25 AM, Arvind Yadav wrote:
>> If devm_ioremap() fails, it returns NULL, and the
>> kernel can then run into a NULL-pointer dereference.
>> This error check avoids the NULL-pointer dereference.
>>
>
> Have you ever seen this failure in the wild?
>
> How was the patch tested?
>
> Thanks,
> David Daney
>
>
>> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
>> ---
>>  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c 
>> b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> index 4ab404f..33c2fec 100644
>> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> @@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct 
>> platform_device *pdev)
>>      p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
>>      p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, p->agl_prt_ctl_phys,
>>                         p->agl_prt_ctl_size);
>> +    if (!p->mix || !p->agl || !p->agl_prt_ctl) {
>> +        dev_err(&pdev->dev, "failed to map I/O memory\n");
>> +        result = -ENOMEM;
>> +        goto err;
>> +    }
>> +
>>      spin_lock_init(&p->lock);
>>
>>      skb_queue_head_init(&p->tx_list);
>>
>

* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 18:06 UTC (permalink / raw)
  To: David Miller
  Cc: David Laight, Netdev, kernel-hardening, Andi Kleen, LKML,
	Linux Crypto Mailing List
In-Reply-To: <20161214.125612.1361254098267633173.davem@davemloft.net>

Hi David,

On Wed, Dec 14, 2016 at 6:56 PM, David Miller <davem@davemloft.net> wrote:
> Just marking the structure __packed, whether necessary or not, makes
> the compiler assume that the members are not aligned and causes
> byte-by-byte accesses to be performed for words.
> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
> the code on cpus that require proper alignment of types.

Oh, jimminy cricket, I did not realize that it made assignments
byte-by-byte *always*. So what options am I left with? What
immediately comes to mind are:

1)

struct {
    u64 a;
    u32 b;
    u32 c;
    u16 d;
    u8 end[];
} combined = {
    .a = a,
    .b = b,
    .c = c,
    .d = d
};
siphash24((const u8 *)&combined, offsetof(typeof(combined), end), key);

2)

u8 bytes[sizeof(u64) + sizeof(u32) * 2 + sizeof(u16)];
*(u64 *)&bytes[0] = a;
*(u32 *)&bytes[sizeof(u64)] = b;
*(u32 *)&bytes[sizeof(u64) + sizeof(u32)] = c;
*(u16 *)&bytes[sizeof(u64) + sizeof(u32) * 2] = d;
siphash24(bytes, sizeof(bytes), key);


Personally I find (1) a bit neater than (2). What's your opinion?
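
For reference, a small userspace sketch of what option (1) relies on. The names `struct seq_input` and `seq_input_pack` are invented for this example, and the layout claims assume a common ABI where `uint64_t`, `uint32_t` and `uint16_t` are naturally sized. Because the members are ordered widest first, there is no padding between them, and `offsetof(..., end)` gives the number of meaningful bytes (8 + 4 + 4 + 2 = 18) while `sizeof` also counts the tail padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order mirrors option (1): widest members first, so no padding
 * appears between fields; only tail padding follows the last one. */
struct seq_input {
	uint64_t a;
	uint32_t b;
	uint32_t c;
	uint16_t d;
	uint8_t end[]; /* marker only: its offset is the payload length */
};

/* Copy the payload bytes (without tail padding) into out.
 * Returns the number of bytes written. Hypothetical helper. */
static size_t seq_input_pack(uint64_t a, uint32_t b, uint32_t c, uint16_t d,
			     uint8_t *out, size_t cap)
{
	const struct seq_input in = { .a = a, .b = b, .c = c, .d = d };
	const size_t len = offsetof(struct seq_input, end);

	assert(out != NULL && cap >= len);
	memcpy(out, &in, len);
	return len;
}
```

Note that the bytes are in host byte order, so the hashed input differs between little- and big-endian machines. That should not matter when the result is only compared against hashes computed on the same machine with the same key.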

Jason

* Re: [v2] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: David Daney @ 2016-12-14 18:14 UTC (permalink / raw)
  To: arvind Yadav, peter.chen, fw, david.daney; +Cc: netdev, linux-kernel
In-Reply-To: <0be48286-2656-84b4-4cd8-93bea5fbc6f0@gmail.com>

On 12/14/2016 10:06 AM, arvind Yadav wrote:
> Yes, I have seen this error. We have a device with very less memory.
> Basically it's OMAP2 board. We have to port Android L on this.
> It's has 3.10 kernel version. In this device, we were getting Page
> allocation failure.

This makes absolutely no sense to me.  OCTEON is a mips64 SoC with a ton 
of memory where ioremap can never fail, and it doesn't run Android, and 
you are talking about OMAP2.

Q1: Have you observed a failure on the device for which you are 
modifying the driver?

Q2: Have you tested the patch on hardware that uses the driver you are 
modifying by running network traffic through the Ethernet interface this 
driver controls?

If you cannot answer yes to both of those questions, then you should 
probably note in the changelog that the patch is untested.

David.


> Vmalloc size was not enough to run all application. So we have decide to
> increase vmalloc reserve space. once we increases Vmalloc space.
> We start getting ioremap falilure. Kernel is getting NULL-pointer
> dereference error.
>
> Here, It's just check to avoid any kernel crash because of ioremap failure.
> We can keep this check to avoid this kind of scenario.
>
> Thanks
> -Arvind
>
>
> On Wednesday 14 December 2016 11:02 PM, David Daney wrote:
>> On 12/14/2016 08:25 AM, Arvind Yadav wrote:
>>> Here, If devm_ioremap will fail. It will return NULL.
>>> Kernel can run into a NULL-pointer dereference.
>>> This error check will avoid NULL pointer dereference.
>>>
>> i
>> Have you ever seen this failure in the wild?
>>
>> How was the patch tested?
>>
>> Thanks,
>> David Daney
>>
>>
>>> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
>>> ---
>>>  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>> b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>> index 4ab404f..33c2fec 100644
>>> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>> @@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct
>>> platform_device *pdev)
>>>      p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
>>>      p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, p->agl_prt_ctl_phys,
>>>                         p->agl_prt_ctl_size);
>>> +    if (!p->mix || !p->agl || !p->agl_prt_ctl) {
>>> +        dev_err(&pdev->dev, "failed to map I/O memory\n");
>>> +        result = -ENOMEM;
>>> +        goto err;
>>> +    }
>>> +
>>>      spin_lock_init(&p->lock);
>>>
>>>      skb_queue_head_init(&p->tx_list);
>>>
>>
>

* Re: [v2] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: arvind Yadav @ 2016-12-14 18:39 UTC (permalink / raw)
  To: David Daney, peter.chen, fw, david.daney; +Cc: netdev, linux-kernel
In-Reply-To: <f113cad3-e368-679a-c56b-3c8c57e1a07b@caviumnetworks.com>

Hi David,

I have given my comments below.

Thanks
Arvind

On Wednesday 14 December 2016 11:44 PM, David Daney wrote:
> On 12/14/2016 10:06 AM, arvind Yadav wrote:
>> Yes, I have seen this error. We have a device with very less memory.
>> Basically it's OMAP2 board. We have to port Android L on this.
>> It's has 3.10 kernel version. In this device, we were getting Page
>> allocation failure.
>
> This makes absolutely no sense to me.  OCTEON is a mips64 SoC with a 
> ton of memory where ioremap can never fail, and it doesn't run 
> Android, and you are talking about OMAP2.
           -I just gave that as an example of where I have seen an ioremap
issue. Please don't relate the two. I know that it will not fail here right
now, but ioremap will return NULL on failure, and we should catch this error.
Other MIPS SoC drivers have the same check. It's just a check that will not
impact any functionality or performance of this driver, and it will avoid a
NULL pointer error. If a function can return an error, we should catch it.
>
> Q1: Have you observed a failure on the device for which you are 
> modifying the driver?
          -No, I did not observe this error.
>
> Q2: Have you tested the patch on hardware that uses the driver you are 
> modifying by running network traffic through the Ethernet interface 
> this driver controls?
         -Right now we cannot test this kind of failure.
>
> If you cannot answer yes to both of those questions, then you should 
> probably note in the changelog that the patch is untested.
>

> David.
>
>
>> Vmalloc size was not enough to run all application. So we have decide to
>> increase vmalloc reserve space. once we increases Vmalloc space.
>> We start getting ioremap falilure. Kernel is getting NULL-pointer
>> dereference error.
>>
>> Here, It's just check to avoid any kernel crash because of ioremap 
>> failure.
>> We can keep this check to avoid this kind of scenario.
>>
>> Thanks
>> -Arvind
>>
>>
>> On Wednesday 14 December 2016 11:02 PM, David Daney wrote:
>>> On 12/14/2016 08:25 AM, Arvind Yadav wrote:
>>>> Here, If devm_ioremap will fail. It will return NULL.
>>>> Kernel can run into a NULL-pointer dereference.
>>>> This error check will avoid NULL pointer dereference.
>>>>
>>> i
>>> Have you ever seen this failure in the wild?
>>>
>>> How was the patch tested?
>>>
>>> Thanks,
>>> David Daney
>>>
>>>
>>>> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>>> b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>>> index 4ab404f..33c2fec 100644
>>>> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>>> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>>>> @@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct
>>>> platform_device *pdev)
>>>>      p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
>>>>      p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, 
>>>> p->agl_prt_ctl_phys,
>>>>                         p->agl_prt_ctl_size);
>>>> +    if (!p->mix || !p->agl || !p->agl_prt_ctl) {
>>>> +        dev_err(&pdev->dev, "failed to map I/O memory\n");
>>>> +        result = -ENOMEM;
>>>> +        goto err;
>>>> +    }
>>>> +
>>>>      spin_lock_init(&p->lock);
>>>>
>>>>      skb_queue_head_init(&p->tx_list);
>>>>
>>>
>>

* [PATCH v3 1/3] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Daniel J . Bernstein,
	Linus Torvalds, Eric Biggers, David Laight
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then proceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.
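
As an illustration of the attack (not of any hash actually used in the kernel), here is a userspace sketch with an unkeyed multiplicative string hash. `hash31` and `bucket_of` are names made up for this example. Since the function has no secret key, anyone can find colliding inputs offline: "Aa" and "BB" hash to the same value, and so does every string built by concatenating those two blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unkeyed multiplicative string hash (h = h * 31 + c), used here only as an
 * example of a function whose collisions anyone can compute in advance. */
static uint32_t hash31(const char *s)
{
	uint32_t h = 0;

	assert(s != NULL);
	while (*s)
		h = h * 31u + (uint8_t)*s++;
	return h;
}

/* Bucket index for a table with nbuckets entries. */
static size_t bucket_of(const char *s, size_t nbuckets)
{
	assert(nbuckets > 0);
	return hash31(s) % nbuckets;
}
```

With n such two-character blocks an attacker gets 2^n distinct keys that all land in one bucket, turning lookups in that bucket into a linear scan. A keyed PRF is meant to prevent this, because the collisions then depend on a key the attacker does not know.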

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist modern cryptanalysis.)

There are a number of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Secondly, a few places are using MD5 for creating secure sequence
numbers, port numbers, or fast random numbers. SipHash is a faster, more
fitting, and more secure replacement for MD5 in those situations.

Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
---
Changes from v2->v3:

  - There is now a fast aligned version of the function and a not-as-fast
    unaligned version. The requirements for each have been documented in
    a docbook-style comment. As well, the header now contains a constant
    for the expected alignment.

  - The test suite has been updated to check both the unaligned and aligned
    version of the function.
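
The aligned/unaligned split above comes down to how each 64-bit word is loaded. As a rough userspace sketch (this is not the kernel code in this patch), the version below loads every word one byte at a time, which works at any alignment and on either endianness at the cost of speed. `siphash24_bytes`, `load_le64`, `sipround` and `rotl64` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* Assemble a little-endian u64 byte by byte: valid for any alignment. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* One SipRound: only additions, rotations and xors. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
	v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* SipHash-2-4 over an arbitrarily aligned buffer (illustrative only). */
static uint64_t siphash24_bytes(const uint8_t *data, size_t len,
				const uint8_t key[16])
{
	uint64_t v[4], k0, k1, m;
	uint64_t b = (uint64_t)len << 56;
	size_t i, left = len & 7;
	const uint8_t *end;

	assert(data != NULL && key != NULL);
	k0 = load_le64(key);
	k1 = load_le64(key + 8);
	v[0] = 0x736f6d6570736575ULL ^ k0;
	v[1] = 0x646f72616e646f6dULL ^ k1;
	v[2] = 0x6c7967656e657261ULL ^ k0;
	v[3] = 0x7465646279746573ULL ^ k1;
	end = data + (len - left);
	for (; data != end; data += 8) {
		m = load_le64(data);
		v[3] ^= m;
		sipround(v);
		sipround(v);
		v[0] ^= m;
	}
	/* Fold the 0..7 trailing bytes into the final word next to the length. */
	for (i = 0; i < left; i++)
		b |= (uint64_t)data[i] << (8 * i);
	v[3] ^= b;
	sipround(v);
	sipround(v);
	v[0] ^= b;
	v[2] ^= 0xff;
	sipround(v);
	sipround(v);
	sipround(v);
	sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

Feeding it the key 00..0f and inputs 00, 01, ... of increasing length should reproduce the entries of test_vectors[] in lib/test_siphash.c, including when the input buffer starts at an odd address.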

 include/linux/siphash.h |  30 ++++++++++
 lib/Kconfig.debug       |   6 +-
 lib/Makefile            |   5 +-
 lib/siphash.c           | 153 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      |  85 +++++++++++++++++++++++++++
 5 files changed, 274 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..82dc1a911a2e
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,30 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16,
+	SIPHASH24_ALIGNMENT = 8
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+static inline u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	return siphash24(data, len, key);
+}
+#else
+u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+#endif
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e6327d102184..32bbf689fc46 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1843,9 +1843,9 @@ config TEST_HASH
 	tristate "Perform selftest on hash functions"
 	default n
 	help
-	  Enable this option to test the kernel's integer (<linux/hash,h>)
-	  and string (<linux/stringhash.h>) hash functions on boot
-	  (or module load).
+	  Enable this option to test the kernel's integer (<linux/hash.h>),
+	  string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+	  hash functions on boot (or module load).
 
 	  This is intended to help people writing architecture-specific
 	  optimized versions.  If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..32acdc26234f
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,153 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+static inline u16 le16_to_cpuvp(const void *p)
+{
+	return le16_to_cpup(p);
+}
+static inline u32 le32_to_cpuvp(const void *p)
+{
+	return le32_to_cpup(p);
+}
+static inline u64 le64_to_cpuvp(const void *p)
+{
+	return le64_to_cpup(p);
+}
+
+/**
+ * siphash24 - compute 64-bit siphash24 PRF value
+ * @data: buffer to hash, must be aligned to SIPHASH24_ALIGNMENT
+ * @len: size of @data
+ * @key: key buffer of size SIPHASH24_KEY_LEN, must be aligned to SIPHASH24_ALIGNMENT
+ */
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = le64_to_cpuvp(key);
+	u64 k1 = le64_to_cpuvp(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = le64_to_cpuvp(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= le32_to_cpuvp(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= le16_to_cpuvp(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24);
+
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+/**
+ * siphash24_unaligned - compute 64-bit siphash24 PRF value, without alignment requirements
+ * @data: buffer to hash
+ * @len: size of @data
+ * @key: key buffer of size SIPHASH24_KEY_LEN
+ */
+u64 siphash24_unaligned(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = get_unaligned_le64(key);
+	u64 k1 = get_unaligned_le64(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = get_unaligned_le64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & bytemask_from_count(left)));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= get_unaligned_le32(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= get_unaligned_le16(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24_unaligned);
+#endif
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..69ac94dec366
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,85 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ *     https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+	0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+	0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+	0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+	0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+	0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+	0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+	0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+	0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+	0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+	0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+	0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+	0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+	0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+	0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+	0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+	0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+	0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+	0x958a324ceb064572ULL
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64] __aligned(SIPHASH24_ALIGNMENT);
+	u8 k[16] __aligned(SIPHASH24_ALIGNMENT);
+	u8 in_unaligned[65];
+	u8 k_unaligned[65];
+	u8 i;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i) {
+		k[i] = i;
+		k_unaligned[i + 1] = i;
+	}
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		in_unaligned[i + 1] = i;
+		if (siphash24(in, i, k) != test_vectors[i]) {
+			pr_info("self-test aligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+		if (siphash24_unaligned(in_unaligned + 1, i, k_unaligned + 1) != test_vectors[i]) {
+			pr_info("self-test unaligned %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.11.0

* [PATCH v3 2/3] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Andi Kleen, David Miller, David Laight
In-Reply-To: <20161214184605.24006-1-Jason@zx2c4.com>

This gives a clear speed and security improvement. SipHash is both
faster and more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, we simply lay out the
input in an anonymous struct, for which gcc generates
rather efficient code.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
---
Changes from v2->v3:

  - Structs are no longer packed, to mitigate slow byte-by-byte assignment.
  - A typo has been fixed in the port number assignment.

 net/core/secure_seq.c | 166 ++++++++++++++++++++++++++------------------------
 1 file changed, 85 insertions(+), 81 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..00eb141c981b 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,14 +10,14 @@
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static u8 net_secret[SIPHASH24_KEY_LEN] __aligned(SIPHASH24_ALIGNMENT);
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,41 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +90,39 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +131,23 @@ EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +156,23 @@ EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, offsetof(typeof(combined), end), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);
-- 
2.11.0

* [PATCH v3 3/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14 18:46 UTC (permalink / raw)
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Ted Tso
In-Reply-To: <20161214184605.24006-1-Jason@zx2c4.com>

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.
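
To show the chaining idea in isolation, here is a userspace sketch. It is not the code in this patch: `toy_prf` is a stand-in 64-bit mixer (NOT SipHash and not cryptographically secure), and `struct chain_state` and `chained_next` are names invented for the example. The point is only that the previously returned value is fed back in, so two callers with identical timestamp and pid still get different results if their histories differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in keyed mixer. NOT SipHash and NOT secure; it is a bijective
 * 64-bit mix used only so that the example runs. */
static uint64_t toy_prf(uint64_t key, uint64_t x)
{
	uint64_t z = x ^ key;

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Hypothetical stand-in for the per-cpu chaining value plus the secret. */
struct chain_state {
	uint64_t key;
	uint64_t chaining; /* previously returned value */
};

/* Mix the previous output with the current inputs, then remember the
 * result so the next call depends on this one. */
static uint64_t chained_next(struct chain_state *st, uint64_t ts, uint64_t pid)
{
	uint64_t out;

	assert(st != NULL);
	out = toy_prf(st->key, st->chaining);
	out = toy_prf(st->key, out ^ ts);
	out = toy_prf(st->key, out ^ pid);
	st->chaining = out;
	return out;
}
```

The state carried between calls is a single 64-bit word, compared with the 64-byte md5 state mentioned above.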

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
Changes from v2->v3:

  - Structs are no longer packed, to mitigate slow byte-by-byte assignment.

 drivers/char/random.c | 52 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..b1c2e3b26430 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN] __aligned(SIPHASH24_ALIGNMENT);
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,26 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
 	unsigned int ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined;
+	u64 *chaining;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, offsetof(typeof(combined), end), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2090,26 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
 	unsigned long ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+		char end[];
+	} __aligned(SIPHASH24_ALIGNMENT) combined;
+	u64 *chaining;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, offsetof(typeof(combined), end), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0

* Re: [v2] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: Florian Fainelli @ 2016-12-14 18:54 UTC (permalink / raw)
  To: arvind Yadav; +Cc: netdev, linux-kernel
In-Reply-To: <a4c42b7c-89bb-efa7-e6e6-86e620ee1897@gmail.com>

On 12/14/2016 10:39 AM, arvind Yadav wrote:
> Hi David,
> 
> I have given my comment.
> 
> Thanks
> Arvind
> 
> On Wednesday 14 December 2016 11:44 PM, David Daney wrote:
>> On 12/14/2016 10:06 AM, arvind Yadav wrote:
>>> Yes, I have seen this error. We have a device with very little memory.
>>> Basically it's an OMAP2 board. We have to port Android L to it.
>>> It has a 3.10 kernel. On this device, we were getting page
>>> allocation failures.
>>
>> This makes absolutely no sense to me.  OCTEON is a mips64 SoC with a
>> ton of memory where ioremap can never fail, and it doesn't run
>> Android, and you are talking about OMAP2.
>           -I just gave that as an example of where I have seen an ioremap
> issue. Please don't relate the two. I know it will not fail now. ioremap
> returns NULL on failure, and we should catch this error. Other MIPS SoC
> drivers have the same check. It's just a check that will not impact any
> functionality or performance of this driver. It will avoid a NULL pointer
> error. If a function can return an error, we should catch it.

Your patch subject should also be changed to insert a space after each
colon, so it would be:

net: ethernet: cavium: octeon: octeon_mgmt:
-- 
Florian

* stmmac: lockups (was Re: Synopsys Ethernet QoS)
From: Pavel Machek @ 2016-12-14 19:01 UTC (permalink / raw)
  To: Joao Pinto
  Cc: Niklas Cassel, Giuseppe CAVALLARO, Florian Fainelli,
	Andy Shevchenko, David Miller, larper, rabinv, netdev,
	CARLOS.PALMINHA, Jie.Deng1, Stephen Warren
In-Reply-To: <79642215-95ce-7f04-3db7-121c585e2f2a@synopsys.com>

Hi!

> I know that this is completely off topic, but I am facing a difficulty with
> stmmac. I have interrupts and the MAC well configured, and RX packets are
> being received successfully, but TX is not working, resulting in
> Tx errors = Total TX packets. I have done a lot of debugging, and my
> conclusion is that for some reason, when using stmmac, the hw state machine
> enters a dead-end state after the TX DMA is started, resulting in those
> errors. Has anyone faced this problem?

Well.... what I'm debugging are lockups after many packets transmitted
(followed by netdev watchdog; stmmac_tx_err() does not work for me, it
kills the device even when run from working state; ifconfig down/up
helps). 4.4 locks up in minutes to hours, 4.9 seems to work better
(but I believe I have seen a lockup there, too; once).

So... probably a different problem?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
