Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 binutils] Add BPF support to binutils...
From: Aaron Conole @ 2017-04-28 17:09 UTC (permalink / raw)
  To: David Miller; +Cc: ast, daniel, netdev, xdp-newbies
In-Reply-To: <20170428.120457.408834081300892396.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Aaron Conole <aconole@bytheb.org>
> Date: Fri, 28 Apr 2017 11:57:36 -0400
>
>> I'll get an arm board up and running to do some testing there.  As a
>> teaser:
>
> Great.
>
> I started working on some more relocation stuff, so more of the
> generic gas tests pass.
>
> For example, stuff like this now works properly:
>
> [davem@dhcp-10-15-49-210 build-bpf]$ cat gas/y.s
> 	.data
> 	.globl	foo
> foo:	.xword	bar
> [davem@dhcp-10-15-49-210 build-bpf]$ gas/as-new -o gas/y.o gas/y.s
> [davem@dhcp-10-15-49-210 build-bpf]$ binutils/objdump -r gas/y.o
>
> gas/y.o:     file format elf64-bpfle
>
> RELOCATION RECORDS FOR [.data]:
> OFFSET           TYPE              VALUE 
> 0000000000000000 R_BPF_DATA_64     bar
>
>
> [davem@dhcp-10-15-49-210 build-bpf]$
>
> It turned out that I needed to separate the R_BPF_* relocations into
> data vs. insn ones.
>
> Another idea I am thinking about pursuing is adding BPF simulator
> support under sim/ so that people can use gdb to step through BPF
> programs.
>
> I hope we can make it work in a way that we can even step through
> XDP programs and feed them simple test packets, stuff like that.
>
> Anyways, quick relative live patch against v2 from my tree for the
> reloc stuff:
>
> diff --git a/bfd/elf64-bpf.c b/bfd/elf64-bpf.c
> index 9944bb4..1be285d 100644
> --- a/bfd/elf64-bpf.c
> +++ b/bfd/elf64-bpf.c
> @@ -1,8 +1,89 @@
>  #include "sysdep.h"
>  #include "bfd.h"
> +#include "bfdlink.h"
>  #include "libbfd.h"
> +#include "libiberty.h"
>  #include "elf-bfd.h"
> +#include "elf/bpf.h"
>  #include "opcode/bpf.h"
> +#include "objalloc.h"
> +#include "elf64-bpf.h"

I get a compile error here.  I guess this file wasn't included.

^ permalink raw reply

* [Patch net-next v2] ipv4: get rid of ip_ra_lock
From: Cong Wang @ 2017-04-28 17:04 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
we always take RTNL lock for ip_ra_control() which is the only place
we update the list ip_ra_chain, so the ip_ra_lock is no longer needed.

As Eric points out, BH does not need to disable either, RCU readers
don't care.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv4/ip_sockglue.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 1d46d05..4c25458 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -330,7 +330,6 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
    sent to multicast group to reach destination designated router.
  */
 struct ip_ra_chain __rcu *ip_ra_chain;
-static DEFINE_SPINLOCK(ip_ra_lock);
 
 
 static void ip_ra_destroy_rcu(struct rcu_head *head)
@@ -352,21 +351,17 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
-	spin_lock_bh(&ip_ra_lock);
 	for (rap = &ip_ra_chain;
-	     (ra = rcu_dereference_protected(*rap,
-			lockdep_is_held(&ip_ra_lock))) != NULL;
+	     (ra = rtnl_dereference(*rap)) != NULL;
 	     rap = &ra->next) {
 		if (ra->sk == sk) {
 			if (on) {
-				spin_unlock_bh(&ip_ra_lock);
 				kfree(new_ra);
 				return -EADDRINUSE;
 			}
 			/* dont let ip_call_ra_chain() use sk again */
 			ra->sk = NULL;
 			RCU_INIT_POINTER(*rap, ra->next);
-			spin_unlock_bh(&ip_ra_lock);
 
 			if (ra->destructor)
 				ra->destructor(sk);
@@ -381,7 +376,6 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 		}
 	}
 	if (!new_ra) {
-		spin_unlock_bh(&ip_ra_lock);
 		return -ENOBUFS;
 	}
 	new_ra->sk = sk;
@@ -390,7 +384,6 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 	RCU_INIT_POINTER(new_ra->next, ra);
 	rcu_assign_pointer(*rap, new_ra);
 	sock_hold(sk);
-	spin_unlock_bh(&ip_ra_lock);
 
 	return 0;
 }
-- 
2.5.5

^ permalink raw reply related

* Re: [Patch net-next] ipv4: get rid of ip_ra_lock
From: Cong Wang @ 2017-04-28 16:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers
In-Reply-To: <1493337257.31837.7.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Apr 27, 2017 at 4:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-04-27 at 16:46 -0700, Cong Wang wrote:
>> On Thu, Apr 27, 2017 at 5:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Wed, 2017-04-26 at 13:55 -0700, Cong Wang wrote:
>> >> After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
>> >> we always take RTNL lock for ip_ra_control() which is the only place
>> >> we update the list ip_ra_chain, so the ip_ra_lock is no longer needed,
>> >> we just need to disable BH there.
>> >>
>> >> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> >> ---
>> >
>> > Looks great, but reading again this code, I believe we do not need to
>> > disable BH at all ?
>> >
>>
>> Hmm, if we don't disable BH here, a reader in BH could jump in and
>> break this critical section? Or that is fine for RCU?
>
> It should be fine for RCU.
>
> The spinlock (or mutex if this is RTNL) is protecting writers among
> themselves. Here it should run in process context, with no specific
> rules to disable preemption, hard or soft irqs.
>
> The reader(s) do not care of how writer(s) enforce their mutual
> protection, and if writer(s) disable hard or soft irqs.

Fair enough! This refreshes my understanding of RCU.

I will send V2. The ASSERT_RTNL() is unnecessary too, as
we already have one in rcu_dereference_protected().

Thanks!

^ permalink raw reply

* Re: [PATCH v2 07/21] crypto: shash, caam: Make use of the new sg_map helper function
From: Logan Gunthorpe @ 2017-04-28 16:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	David S. Miller
In-Reply-To: <20170428063039.GB6817-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>



On 28/04/17 12:30 AM, Herbert Xu wrote:
> You are right.  Indeed the existing code looks buggy as they
> don't take sg->offset into account when doing the kmap.  Could
> you send me some patches that fix these problems first so that
> they can be easily backported?

Ok, I think the only buggy one in crypto is hifn_795x. Shash and caam
both do have the sg->offset accounted for. I'll send a patch for the
buggy one shortly.

Logan

^ permalink raw reply

* Re: ipsec doesn't route TCP with 4.11 kernel
From: Eric Dumazet @ 2017-04-28 16:46 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Don Bowman, Cong Wang, linux-kernel@vger.kernel.org, Herbert Xu,
	Linux Kernel Network Developers
In-Reply-To: <20170428071337.GG2649@secunet.com>

On Fri, 2017-04-28 at 09:13 +0200, Steffen Klassert wrote:
>          encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
> 
> Ok, this is espinudp. This information was important.

> This is not a GRO issue as I thought, the TX side is already broken.
> 
> Could you please try the patch below?
> 
> Subject: [PATCH] esp4: Fix udpencap for local TCP packets.
> 
> Locally generated TCP packets are usually cloned, so we
> do skb_cow_data() on this packets. After that we need to
> reload the pointer to the esp header. On udpencap this
> header has an offset to skb_transport_header, so take this
> offset into account.


It looks like locally generated TCP packets could avoid the
skb_cow_data(), if you were using skb_header_cloned() instead of
skb_cloned()  ?

^ permalink raw reply

* Re: llvm-objdump...
From: David Miller @ 2017-04-28 16:38 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev
In-Reply-To: <c9e9eaa7-0fa8-440a-dc75-2d4020b8c429@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Fri, 28 Apr 2017 09:22:32 -0700

> On 4/28/17 9:17 AM, David Miller wrote:
>> Even if I give it -triple=bpfeb it emits immediates incorrectly.
>>
>> The bug is certainly in the insn field fetcher of the disassembler.
> 
> got it. so the binary looks correct, but disasm output is wrong?

F.e.

      12:       07 70 00 00 00 00 00 06         r0 += 100663296

Should be "r0 += 6"

Yes.

^ permalink raw reply

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Jiri Pirko @ 2017-04-28 16:38 UTC (permalink / raw)
  To: David Ahern; +Cc: vyasevic, Vladislav Yasevich, netdev, roopa
In-Reply-To: <760db30a-859a-a365-e0f3-69a4433ef9bf@cumulusnetworks.com>

Thu, Apr 27, 2017 at 09:59:38PM CEST, dsa@cumulusnetworks.com wrote:
>On 4/27/17 1:43 PM, Vlad Yasevich wrote:
>>> For example, NETDEV_CHANGEINFODATA is only for bonds though nothing
>>> about the name suggests it is a bonding notification. This one was added
>>> specifically to notify userspace (d4261e5650004), yet seems to happen
>>> only during a changelink and that already generates a RTM_NEWLINK
>>> message via do_setlink. Since the rtnetlink_event message does not
>>> contain anything "NETDEV_CHANGEINFODATA" related what purpose does it
>>> really serve besides duplicating netlink messages to userspace.
>>>
>> 
>> I am not sure about this one, but if you have an app trying to monitor
>> for this event, it can't really since there is no info in the netlink message.
>
>I cc'ed Jiri on this thread hoping he would explain the intent.
>
>I propose it gets removed.

Hmm, I don't really recall. But looking at it now, I agree it is
redundant.

^ permalink raw reply

* [PATCH net-next v2] net: bridge: Fix improper taking over HW learned FDB
From: Arkadi Sharshevsky @ 2017-04-28 16:31 UTC (permalink / raw)
  To: netdev; +Cc: Ido Schimmel, Nikolay Aleksandrov, bridge, Arkadi Sharshevsky,
	davem

Commit 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb
entries") added the ability to "take over an entry which was previously
learned via HW when it shows up from a SW port".

However, if an entry was learned via HW and then a control packet
(e.g., ARP request) was trapped to the CPU, the bridge driver will
update the entry and remove the externally learned flag, although the
entry is still present in HW. Instead, only clear the externally learned
flag in case of roaming.

Fixes: 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Arkadi Sharashevsky <arkadis@mellanox.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v1->v2
- net-next rebase.
---
 net/bridge/br_fdb.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index de7988b..5905eb7 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -589,16 +589,16 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
 			if (unlikely(source != fdb->dst)) {
 				fdb->dst = source;
 				fdb_modified = true;
+				/* Take over HW learned entry */
+				if (unlikely(fdb->added_by_external_learn)) {
+					fdb->added_by_external_learn = 0;
+					fdb_modified = true;
+				}
 			}
 			if (now != fdb->updated)
 				fdb->updated = now;
 			if (unlikely(added_by_user))
 				fdb->added_by_user = 1;
-			/* Take over HW learned entry */
-			if (unlikely(fdb->added_by_external_learn)) {
-				fdb->added_by_external_learn = 0;
-				fdb_modified = true;
-			}
 			if (unlikely(fdb_modified))
 				fdb_notify(br, fdb, RTM_NEWNEIGH);
 		}
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Miroslav Lichvar @ 2017-04-28 16:23 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+DYwBbhU0fOMOj4XO+diG4-GLn6Ruh9ZMoXVv5ZS4PgQ@mail.gmail.com>

On Fri, Apr 28, 2017 at 11:50:28AM -0400, Willem de Bruijn wrote:
> On Fri, Apr 28, 2017 at 4:54 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> >>   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> >> -        !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> >> +      (!(skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)) ||
> >> +      (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)
> >
> > I'm not sure if this can work. sk_buff.h would need to include sock.h
> > in order to get the definition of struct sock. Any suggestions?
> 
> A more elegant solution would be to not set SKBTX_IN_PROGRESS
> at all if SOF_TIMESTAMPING_OPT_TX_SWHW is set on the socket.
> But the patch to do so is not elegant, having to update callsites in many
> device drivers.

Also, it would change the meaning of the flag as it seems some drivers
actually use the SKBTX_IN_PROGRESS flag to check if they expect a
timestamp.

How about allocating the last bit of tx_flags for SKBTX_SWHW_TSTAMP?

> Otherwise you may indeed have to call skb_tstamp_tx for every packet
> that has SKBTX_SW_TSTAMP set, as you do. We can at least move
> the skb->sk != NULL check into skb_tx_timestamp in skbuff.h.
> 
> By the way, if changing this code, I think that it's time to get rid of
> sw_tx_timestamp. It is only called from skb_tx_timestamp. Let's
> just move the condition in there.

Ok. I assume that should be a separate patch.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: llvm-objdump...
From: Alexei Starovoitov @ 2017-04-28 16:22 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170428.121710.1780087759147659481.davem@davemloft.net>

On 4/28/17 9:17 AM, David Miller wrote:
> Even if I give it -triple=bpfeb it emits immediates incorrectly.
>
> The bug is certainly in the insn field fetcher of the disassembler.

got it. so the binary looks correct, but disasm output is wrong?
Thanks! Looks like there is a bug there indeed.

^ permalink raw reply

* Re: [PATCH iproute2] routel: fix infinite loop in line parser
From: Michal Kubecek @ 2017-04-28 16:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170427164647.3f0e09bf@xeon-e3>

On Thu, Apr 27, 2017 at 04:46:47PM -0700, Stephen Hemminger wrote:
> 
> Appled, but this really needs to be done better.
> Either as a simplified output of route command. See ip -br link
> Or ip route should have a json output option and use python/perl/xss
> to reformat.

Agreed, parsing the output of ip command is certainly not the right way.

I must admit that I had no idea this script exists until the bug report
landed on my table (and I had to use "rpm -qf" to learn that it's part
of iproute2 package). And I suspect I may not be the only one.

Michal Kubecek

^ permalink raw reply

* Re: [PATCH net v1 0/3] tipc: fix hanging socket connections
From: David Miller @ 2017-04-28 16:20 UTC (permalink / raw)
  To: parthasarathy.bhuvaragan; +Cc: netdev, tipc-discussion
In-Reply-To: <1493193902-18332-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Date: Wed, 26 Apr 2017 10:04:59 +0200

> This patch series contains fixes for the socket layer to
> prevent hanging / stale connections.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH v6 1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
From: Sabrina Dubroca @ 2017-04-28 16:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: netdev, linux-kernel, David.Laight, kernel-hardening, davem
In-Reply-To: <20170425184734.26563-1-Jason@zx2c4.com>

2017-04-25, 20:47:30 +0200, Jason A. Donenfeld wrote:
> This is a defense-in-depth measure in response to bugs like
> 4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). While
> we're at it, we also limit the amount of recursion this function is
> allowed to do. Not actually providing a bounded base case is a future
> diaster that we can easily avoid here.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v5->v6:
>   * Use unlikely() for the rare overflow conditions.
>   * Also bound recursion, since this is a potential disaster we can avert.
> 
>  net/core/skbuff.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f86bf69cfb8d..24fb53f8534e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3489,16 +3489,22 @@ void __init skb_init(void)
>   *	@len: Length of buffer space to be mapped
>   *
>   *	Fill the specified scatter-gather list with mappings/pointers into a
> - *	region of the buffer space attached to a socket buffer.
> + *	region of the buffer space attached to a socket buffer. Returns either
> + *	the number of scatterlist items used, or -EMSGSIZE if the contents
> + *	could not fit.
>   */

One small thing here: since you're touching this comment, could you
move it next to skb_to_sgvec, since that's the function it's supposed
to document?

Thanks!

>  static int
> -__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
> +__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
> +	       unsigned int recursion_level)

-- 
Sabrina

^ permalink raw reply

* Re: llvm-objdump...
From: David Miller @ 2017-04-28 16:17 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev
In-Reply-To: <612f0df6-711c-b6b1-4bd7-596a7f329737@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Tue, 25 Apr 2017 20:48:52 -0700

> On 4/25/17 10:13 AM, David Miller wrote:
>>
>> I think there are some endianness issues ;-)
>>
>> davem@patience:~/src/GIT/net-next/tools/testing/selftests/bpf$
>> llvm-objdump -S x.o
> 
> nice host name ;)
> 
>> x.o:    file format ELF64-BPF
>>
>> Disassembly of section test1:
>> process:
>>        0:       b7 00 00 00 00 00 00 02         r0 = 33554432
>>        1:       61 21 00 50 00 00 00 00         r1 = *(u32 *)(r2 + 20480)
>>
>> That first instruction should be "r0 = 2"
> 
> hmm. I haven't tested it on big endian.
> When last time s390 folks tested samples/bpf with llvm we didn't even
> have automatic -march=bpf in llvm, so they used -march=bpfeb.
> There was no llvm-objdump support either.
> 
> llvm side does this:
> tatic Triple::ArchType parseBPFArch(StringRef ArchName) {
>   if (ArchName.equals("bpf")) {
>     if (sys::IsLittleEndianHost)
>       return Triple::bpfel;
>     else
>       return Triple::bpfeb;
>   } else if (ArchName.equals("bpf_be") || ArchName.equals("bpfeb")) {
>     return Triple::bpfeb;
>   } else if (ArchName.equals("bpf_le") || ArchName.equals("bpfel")) {
>     return Triple::bpfel;
> 
> It works for clang and for llvm.
> I thought llvm-objdump should infer triple from elf file
> and do the 'right thing'... hmm
> 
> could you please test it with -g and see whether dwarf is still
> correct in .o ?
> llvm-objdump -S should print original C code next to asm.
> Hope bpf dwarf is not broken on big-endian...

Even if I give it -triple=bpfeb it emits immediates incorrectly.

The bug is certainly in the insn field fetcher of the disassembler.

^ permalink raw reply

* [patch net-next] net: sched: add helpers to handle extended actions
From: Jiri Pirko @ 2017-04-28 16:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Jump is now the only one using value action opcode. This is going to
change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/pkt_cls.h | 15 ++++++++++++++-
 net/sched/act_api.c          |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index f1129e3..d613be3 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -37,7 +37,20 @@ enum {
 #define TC_ACT_QUEUED		5
 #define TC_ACT_REPEAT		6
 #define TC_ACT_REDIRECT		7
-#define TC_ACT_JUMP		0x10000000
+
+/* There is a special kind of actions called "extended actions",
+ * which need a value parameter. These have a local opcode located in
+ * the highest nibble, starting from 1. The rest of the bits
+ * are used to carry the value. These two parts together make
+ * a combined opcode.
+ */
+#define __TC_ACT_EXT_SHIFT 28
+#define __TC_ACT_EXT(local) ((local) << __TC_ACT_EXT_SHIFT)
+#define TC_ACT_EXT_VAL_MASK ((1 << __TC_ACT_EXT_SHIFT) - 1)
+#define TC_ACT_EXT_CMP(combined, opcode) \
+	(((combined) & (~TC_ACT_EXT_VAL_MASK)) == opcode)
+
+#define TC_ACT_JUMP __TC_ACT_EXT(1)
 
 /* Action type identifiers*/
 enum {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 7f2cd70..a90e8f3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -453,7 +453,7 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 		if (ret == TC_ACT_REPEAT)
 			goto repeat;	/* we need a ttl - JHS */
 
-		if (ret & TC_ACT_JUMP) {
+		if (TC_ACT_EXT_CMP(ret, TC_ACT_JUMP)) {
 			jmp_prgcnt = ret & TCA_ACT_MAX_PRIO_MASK;
 			if (!jmp_prgcnt || (jmp_prgcnt > nr_actions)) {
 				/* faulty opcode, stop pipeline */
-- 
2.7.4

^ permalink raw reply related

* [PATCH 6/7] crypto: aesni: make AVX2 AES-GCM work with all valid auth_tag_len
From: Sabrina Dubroca @ 2017-04-28 16:12 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index 7230808a7cef..faecb1518bf8 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -2804,19 +2804,36 @@ ENDPROC(aesni_gcm_dec_avx_gen2)
         cmp     $16, %r11
         je      _T_16\@
 
-        cmp     $12, %r11
-        je      _T_12\@
+        cmp     $8, %r11
+        jl      _T_4\@
 
 _T_8\@:
         vmovq   %xmm9, %rax
         mov     %rax, (%r10)
-        jmp     _return_T_done\@
-_T_12\@:
-        vmovq   %xmm9, %rax
-        mov     %rax, (%r10)
+        add     $8, %r10
+        sub     $8, %r11
         vpsrldq $8, %xmm9, %xmm9
+        cmp     $0, %r11
+        je     _return_T_done\@
+_T_4\@:
         vmovd   %xmm9, %eax
-        mov     %eax, 8(%r10)
+        mov     %eax, (%r10)
+        add     $4, %r10
+        sub     $4, %r11
+        vpsrldq     $4, %xmm9, %xmm9
+        cmp     $0, %r11
+        je     _return_T_done\@
+_T_123\@:
+        vmovd     %xmm9, %eax
+        cmp     $2, %r11
+        jl     _T_1\@
+        mov     %ax, (%r10)
+        cmp     $2, %r11
+        je     _return_T_done\@
+        add     $2, %r10
+        sar     $16, %eax
+_T_1\@:
+        mov     %al, (%r10)
         jmp     _return_T_done\@
 
 _T_16\@:
-- 
2.12.2

^ permalink raw reply related

* [PATCH 3/7] crypto: aesni: make AVX AES-GCM work with any aadlen
From: Sabrina Dubroca @ 2017-04-28 16:11 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

This is the first step to make the aesni AES-GCM implementation
generic. The current code was written for rfc4106, so it handles
only some specific sizes of associated data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 122 ++++++++++++++++++++++---------
 1 file changed, 88 insertions(+), 34 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index d664382c6e56..a73117c84904 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -155,6 +155,30 @@ SHIFT_MASK:      .octa     0x0f0e0d0c0b0a09080706050403020100
 ALL_F:           .octa     0xffffffffffffffffffffffffffffffff
                  .octa     0x00000000000000000000000000000000
 
+.section .rodata
+.align 16
+.type aad_shift_arr, @object
+.size aad_shift_arr, 272
+aad_shift_arr:
+        .octa     0xffffffffffffffffffffffffffffffff
+        .octa     0xffffffffffffffffffffffffffffff0C
+        .octa     0xffffffffffffffffffffffffffff0D0C
+        .octa     0xffffffffffffffffffffffffff0E0D0C
+        .octa     0xffffffffffffffffffffffff0F0E0D0C
+        .octa     0xffffffffffffffffffffff0C0B0A0908
+        .octa     0xffffffffffffffffffff0D0C0B0A0908
+        .octa     0xffffffffffffffffff0E0D0C0B0A0908
+        .octa     0xffffffffffffffff0F0E0D0C0B0A0908
+        .octa     0xffffffffffffff0C0B0A090807060504
+        .octa     0xffffffffffff0D0C0B0A090807060504
+        .octa     0xffffffffff0E0D0C0B0A090807060504
+        .octa     0xffffffff0F0E0D0C0B0A090807060504
+        .octa     0xffffff0C0B0A09080706050403020100
+        .octa     0xffff0D0C0B0A09080706050403020100
+        .octa     0xff0E0D0C0B0A09080706050403020100
+        .octa     0x0F0E0D0C0B0A09080706050403020100
+
+
 .text
 
 
@@ -372,41 +396,72 @@ VARIABLE_OFFSET = 16*8
 
 .macro INITIAL_BLOCKS_AVX num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC
 	i = (8-\num_initial_blocks)
+	j = 0
 	setreg
 
-        mov     arg6, %r10                      # r10 = AAD
-        mov     arg7, %r12                      # r12 = aadLen
-
-
-        mov     %r12, %r11
-
-        vpxor   reg_i, reg_i, reg_i
-_get_AAD_loop\@:
-        vmovd   (%r10), \T1
-        vpslldq $12, \T1, \T1
-        vpsrldq $4, reg_i, reg_i
-        vpxor   \T1, reg_i, reg_i
-
-        add     $4, %r10
-        sub     $4, %r12
-        jg      _get_AAD_loop\@
-
-
-        cmp     $16, %r11
-        je      _get_AAD_loop2_done\@
-        mov     $16, %r12
-
-_get_AAD_loop2\@:
-        vpsrldq $4, reg_i, reg_i
-        sub     $4, %r12
-        cmp     %r11, %r12
-        jg      _get_AAD_loop2\@
-
-_get_AAD_loop2_done\@:
-
-        #byte-reflect the AAD data
-        vpshufb SHUF_MASK(%rip), reg_i, reg_i
-
+	mov     arg6, %r10                      # r10 = AAD
+	mov     arg7, %r12                      # r12 = aadLen
+
+
+	mov     %r12, %r11
+
+	vpxor   reg_j, reg_j, reg_j
+	vpxor   reg_i, reg_i, reg_i
+	cmp     $16, %r11
+	jl      _get_AAD_rest8\@
+_get_AAD_blocks\@:
+	vmovdqu (%r10), reg_i
+	vpshufb SHUF_MASK(%rip), reg_i, reg_i
+	vpxor   reg_i, reg_j, reg_j
+	GHASH_MUL_AVX       reg_j, \T2, \T1, \T3, \T4, \T5, \T6
+	add     $16, %r10
+	sub     $16, %r12
+	sub     $16, %r11
+	cmp     $16, %r11
+	jge     _get_AAD_blocks\@
+	vmovdqu reg_j, reg_i
+	cmp     $0, %r11
+	je      _get_AAD_done\@
+
+	vpxor   reg_i, reg_i, reg_i
+
+	/* read the last <16B of AAD. since we have at least 4B of
+	data right after the AAD (the ICV, and maybe some CT), we can
+	read 4B/8B blocks safely, and then get rid of the extra stuff */
+_get_AAD_rest8\@:
+	cmp     $4, %r11
+	jle     _get_AAD_rest4\@
+	movq    (%r10), \T1
+	add     $8, %r10
+	sub     $8, %r11
+	vpslldq $8, \T1, \T1
+	vpsrldq $8, reg_i, reg_i
+	vpxor   \T1, reg_i, reg_i
+	jmp     _get_AAD_rest8\@
+_get_AAD_rest4\@:
+	cmp     $0, %r11
+	jle      _get_AAD_rest0\@
+	mov     (%r10), %eax
+	movq    %rax, \T1
+	add     $4, %r10
+	sub     $4, %r11
+	vpslldq $12, \T1, \T1
+	vpsrldq $4, reg_i, reg_i
+	vpxor   \T1, reg_i, reg_i
+_get_AAD_rest0\@:
+	/* finalize: shift out the extra bytes we read, and align
+	left. since pslldq can only shift by an immediate, we use
+	vpshufb and an array of shuffle masks */
+	movq    %r12, %r11
+	salq    $4, %r11
+	movdqu  aad_shift_arr(%r11), \T1
+	vpshufb \T1, reg_i, reg_i
+_get_AAD_rest_final\@:
+	vpshufb SHUF_MASK(%rip), reg_i, reg_i
+	vpxor   reg_j, reg_i, reg_i
+	GHASH_MUL_AVX       reg_i, \T2, \T1, \T3, \T4, \T5, \T6
+
+_get_AAD_done\@:
 	# initialize the data pointer offset as zero
 	xor     %r11, %r11
 
@@ -480,7 +535,6 @@ VARIABLE_OFFSET = 16*8
 	i = (8-\num_initial_blocks)
 	j = (9-\num_initial_blocks)
 	setreg
-        GHASH_MUL_AVX       reg_i, \T2, \T1, \T3, \T4, \T5, \T6
 
 .rep \num_initial_blocks
         vpxor    reg_i, reg_j, reg_j
-- 
2.12.2

^ permalink raw reply related

* [PATCH 1/7] crypto: aesni: make non-AVX AES-GCM work with any aadlen
From: Sabrina Dubroca @ 2017-04-28 16:11 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

This is the first step to make the aesni AES-GCM implementation
generic. The current code was written for rfc4106, so it handles only
some specific sizes of associated data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_asm.S | 169 +++++++++++++++++++++++++++++---------
 1 file changed, 132 insertions(+), 37 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3c465184ff8a..605726aaf0a2 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -89,6 +89,29 @@ SHIFT_MASK: .octa 0x0f0e0d0c0b0a09080706050403020100
 ALL_F:      .octa 0xffffffffffffffffffffffffffffffff
             .octa 0x00000000000000000000000000000000
 
+.section .rodata
+.align 16
+.type aad_shift_arr, @object
+.size aad_shift_arr, 272
+aad_shift_arr:
+        .octa     0xffffffffffffffffffffffffffffffff
+        .octa     0xffffffffffffffffffffffffffffff0C
+        .octa     0xffffffffffffffffffffffffffff0D0C
+        .octa     0xffffffffffffffffffffffffff0E0D0C
+        .octa     0xffffffffffffffffffffffff0F0E0D0C
+        .octa     0xffffffffffffffffffffff0C0B0A0908
+        .octa     0xffffffffffffffffffff0D0C0B0A0908
+        .octa     0xffffffffffffffffff0E0D0C0B0A0908
+        .octa     0xffffffffffffffff0F0E0D0C0B0A0908
+        .octa     0xffffffffffffff0C0B0A090807060504
+        .octa     0xffffffffffff0D0C0B0A090807060504
+        .octa     0xffffffffff0E0D0C0B0A090807060504
+        .octa     0xffffffff0F0E0D0C0B0A090807060504
+        .octa     0xffffff0C0B0A09080706050403020100
+        .octa     0xffff0D0C0B0A09080706050403020100
+        .octa     0xff0E0D0C0B0A09080706050403020100
+        .octa     0x0F0E0D0C0B0A09080706050403020100
+
 
 .text
 
@@ -252,32 +275,66 @@ XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
 	mov	   arg8, %r12           # %r12 = aadLen
 	mov	   %r12, %r11
 	pxor	   %xmm\i, %xmm\i
+	pxor       \XMM2, \XMM2
 
-_get_AAD_loop\num_initial_blocks\operation:
-	movd	   (%r10), \TMP1
-	pslldq	   $12, \TMP1
-	psrldq	   $4, %xmm\i
+	cmp	   $16, %r11
+	jl	   _get_AAD_rest8\num_initial_blocks\operation
+_get_AAD_blocks\num_initial_blocks\operation:
+	movdqu	   (%r10), %xmm\i
+	PSHUFB_XMM %xmm14, %xmm\i # byte-reflect the AAD data
+	pxor	   %xmm\i, \XMM2
+	GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+	add	   $16, %r10
+	sub	   $16, %r12
+	sub	   $16, %r11
+	cmp	   $16, %r11
+	jge	   _get_AAD_blocks\num_initial_blocks\operation
+
+	movdqu	   \XMM2, %xmm\i
+	cmp	   $0, %r11
+	je	   _get_AAD_done\num_initial_blocks\operation
+
+	pxor	   %xmm\i,%xmm\i
+
+	/* read the last <16B of AAD. since we have at least 4B of
+	data right after the AAD (the ICV, and maybe some CT), we can
+	read 4B/8B blocks safely, and then get rid of the extra stuff */
+_get_AAD_rest8\num_initial_blocks\operation:
+	cmp	   $4, %r11
+	jle	   _get_AAD_rest4\num_initial_blocks\operation
+	movq	   (%r10), \TMP1
+	add	   $8, %r10
+	sub	   $8, %r11
+	pslldq	   $8, \TMP1
+	psrldq	   $8, %xmm\i
 	pxor	   \TMP1, %xmm\i
+	jmp	   _get_AAD_rest8\num_initial_blocks\operation
+_get_AAD_rest4\num_initial_blocks\operation:
+	cmp	   $0, %r11
+	jle	   _get_AAD_rest0\num_initial_blocks\operation
+	mov	   (%r10), %eax
+	movq	   %rax, \TMP1
 	add	   $4, %r10
-	sub	   $4, %r12
-	jne	   _get_AAD_loop\num_initial_blocks\operation
-
-	cmp	   $16, %r11
-	je	   _get_AAD_loop2_done\num_initial_blocks\operation
-
-	mov	   $16, %r12
-_get_AAD_loop2\num_initial_blocks\operation:
+	sub	   $4, %r10
+	pslldq	   $12, \TMP1
 	psrldq	   $4, %xmm\i
-	sub	   $4, %r12
-	cmp	   %r11, %r12
-	jne	   _get_AAD_loop2\num_initial_blocks\operation
-
-_get_AAD_loop2_done\num_initial_blocks\operation:
+	pxor	   \TMP1, %xmm\i
+_get_AAD_rest0\num_initial_blocks\operation:
+	/* finalize: shift out the extra bytes we read, and align
+	left. since pslldq can only shift by an immediate, we use
+	vpshufb and an array of shuffle masks */
+	movq	   %r12, %r11
+	salq	   $4, %r11
+	movdqu	   aad_shift_arr(%r11), \TMP1
+	PSHUFB_XMM \TMP1, %xmm\i
+_get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
+	pxor	   \XMM2, %xmm\i
+	GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
 
+_get_AAD_done\num_initial_blocks\operation:
 	xor	   %r11, %r11 # initialise the data pointer offset as zero
-
-        # start AES for num_initial_blocks blocks
+	# start AES for num_initial_blocks blocks
 
 	mov	   %arg5, %rax                      # %rax = *Y0
 	movdqu	   (%rax), \XMM0                    # XMM0 = Y0
@@ -322,7 +379,7 @@ XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
                 # prepare plaintext/ciphertext for GHASH computation
 .endr
 .endif
-	GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+
         # apply GHASH on num_initial_blocks blocks
 
 .if \i == 5
@@ -477,28 +534,66 @@ XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
 	mov	   arg8, %r12           # %r12 = aadLen
 	mov	   %r12, %r11
 	pxor	   %xmm\i, %xmm\i
-_get_AAD_loop\num_initial_blocks\operation:
-	movd	   (%r10), \TMP1
-	pslldq	   $12, \TMP1
-	psrldq	   $4, %xmm\i
+	pxor	   \XMM2, \XMM2
+
+	cmp	   $16, %r11
+	jl	   _get_AAD_rest8\num_initial_blocks\operation
+_get_AAD_blocks\num_initial_blocks\operation:
+	movdqu	   (%r10), %xmm\i
+	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
+	pxor	   %xmm\i, \XMM2
+	GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+	add	   $16, %r10
+	sub	   $16, %r12
+	sub	   $16, %r11
+	cmp	   $16, %r11
+	jge	   _get_AAD_blocks\num_initial_blocks\operation
+
+	movdqu	   \XMM2, %xmm\i
+	cmp	   $0, %r11
+	je	   _get_AAD_done\num_initial_blocks\operation
+
+	pxor	   %xmm\i,%xmm\i
+
+	/* read the last <16B of AAD. since we have at least 4B of
+	data right after the AAD (the ICV, and maybe some PT), we can
+	read 4B/8B blocks safely, and then get rid of the extra stuff */
+_get_AAD_rest8\num_initial_blocks\operation:
+	cmp	   $4, %r11
+	jle	   _get_AAD_rest4\num_initial_blocks\operation
+	movq	   (%r10), \TMP1
+	add	   $8, %r10
+	sub	   $8, %r11
+	pslldq	   $8, \TMP1
+	psrldq	   $8, %xmm\i
 	pxor	   \TMP1, %xmm\i
+	jmp	   _get_AAD_rest8\num_initial_blocks\operation
+_get_AAD_rest4\num_initial_blocks\operation:
+	cmp	   $0, %r11
+	jle	   _get_AAD_rest0\num_initial_blocks\operation
+	mov	   (%r10), %eax
+	movq	   %rax, \TMP1
 	add	   $4, %r10
-	sub	   $4, %r12
-	jne	   _get_AAD_loop\num_initial_blocks\operation
-	cmp	   $16, %r11
-	je	   _get_AAD_loop2_done\num_initial_blocks\operation
-	mov	   $16, %r12
-_get_AAD_loop2\num_initial_blocks\operation:
+	sub	   $4, %r10
+	pslldq	   $12, \TMP1
 	psrldq	   $4, %xmm\i
-	sub	   $4, %r12
-	cmp	   %r11, %r12
-	jne	   _get_AAD_loop2\num_initial_blocks\operation
-_get_AAD_loop2_done\num_initial_blocks\operation:
+	pxor	   \TMP1, %xmm\i
+_get_AAD_rest0\num_initial_blocks\operation:
+	/* finalize: shift out the extra bytes we read, and align
+	left. since pslldq can only shift by an immediate, we use
+	vpshufb and an array of shuffle masks */
+	movq	   %r12, %r11
+	salq	   $4, %r11
+	movdqu	   aad_shift_arr(%r11), \TMP1
+	PSHUFB_XMM \TMP1, %xmm\i
+_get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
+	pxor	   \XMM2, %xmm\i
+	GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
 
+_get_AAD_done\num_initial_blocks\operation:
 	xor	   %r11, %r11 # initialise the data pointer offset as zero
-
-        # start AES for num_initial_blocks blocks
+	# start AES for num_initial_blocks blocks
 
 	mov	   %arg5, %rax                      # %rax = *Y0
 	movdqu	   (%rax), \XMM0                    # XMM0 = Y0
@@ -543,7 +638,7 @@ XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
 		# prepare plaintext/ciphertext for GHASH computation
 .endr
 .endif
-	GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+
         # apply GHASH on num_initial_blocks blocks
 
 .if \i == 5
-- 
2.12.2

^ permalink raw reply related

* [PATCH 7/7] crypto: aesni: add generic gcm(aes)
From: Sabrina Dubroca @ 2017-04-28 16:12 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

Now that the asm side of things can support all the valid lengths of ICV
and all lengths of associated data, provide the glue code to expose a
generic gcm(aes) crypto algorithm.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_glue.c | 208 ++++++++++++++++++++++++++++---------
 1 file changed, 158 insertions(+), 50 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 93de8ea51548..4a55cdcdc008 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -61,6 +61,11 @@ struct aesni_rfc4106_gcm_ctx {
 	u8 nonce[4];
 };
 
+struct generic_gcmaes_ctx {
+	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+};
+
 struct aesni_xts_ctx {
 	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
 	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
@@ -102,13 +107,11 @@ asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
  * u8 *out, Ciphertext output. Encrypt in-place is allowed.
  * const u8 *in, Plaintext input
  * unsigned long plaintext_len, Length of data in bytes for encryption.
- * u8 *iv, Pre-counter block j0: 4 byte salt (from Security Association)
- *         concatenated with 8 byte Initialisation Vector (from IPSec ESP
- *         Payload) concatenated with 0x00000001. 16-byte aligned pointer.
+ * u8 *iv, Pre-counter block j0: 12 byte IV concatenated with 0x00000001.
+ *         16-byte aligned pointer.
  * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
  * const u8 *aad, Additional Authentication Data (AAD)
- * unsigned long aad_len, Length of AAD in bytes. With RFC4106 this
- *          is going to be 8 or 12 bytes
+ * unsigned long aad_len, Length of AAD in bytes.
  * u8 *auth_tag, Authenticated Tag output.
  * unsigned long auth_tag_len), Authenticated Tag Length in bytes.
  *          Valid values are 16 (most likely), 12 or 8.
@@ -123,9 +126,8 @@ asmlinkage void aesni_gcm_enc(void *ctx, u8 *out,
  * u8 *out, Plaintext output. Decrypt in-place is allowed.
  * const u8 *in, Ciphertext input
  * unsigned long ciphertext_len, Length of data in bytes for decryption.
- * u8 *iv, Pre-counter block j0: 4 byte salt (from Security Association)
- *         concatenated with 8 byte Initialisation Vector (from IPSec ESP
- *         Payload) concatenated with 0x00000001. 16-byte aligned pointer.
+ * u8 *iv, Pre-counter block j0: 12 byte IV concatenated with 0x00000001.
+ *         16-byte aligned pointer.
  * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
  * const u8 *aad, Additional Authentication Data (AAD)
  * unsigned long aad_len, Length of AAD in bytes. With RFC4106 this is going
@@ -275,6 +277,16 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 		align = 1;
 	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
 }
+
+static inline struct
+generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
+{
+	unsigned long align = AESNI_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
+}
 #endif
 
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
@@ -712,32 +724,34 @@ static int rfc4106_set_authsize(struct crypto_aead *parent,
 	return crypto_aead_setauthsize(&cryptd_tfm->base, authsize);
 }
 
-static int helper_rfc4106_encrypt(struct aead_request *req)
+static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
+				       unsigned int authsize)
+{
+	switch (authsize) {
+	case 4:
+	case 8:
+	case 12:
+	case 13:
+	case 14:
+	case 15:
+	case 16:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
+			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
 {
 	u8 one_entry_in_sg = 0;
 	u8 *src, *dst, *assoc;
-	__be32 counter = cpu_to_be32(1);
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
 	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
-	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
 	struct scatter_walk src_sg_walk;
 	struct scatter_walk dst_sg_walk = {};
-	unsigned int i;
-
-	/* Assuming we are supporting rfc4106 64-bit extended */
-	/* sequence numbers We need to have the AAD length equal */
-	/* to 16 or 20 bytes */
-	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
-		return -EINVAL;
-
-	/* IV below built */
-	for (i = 0; i < 4; i++)
-		*(iv+i) = ctx->nonce[i];
-	for (i = 0; i < 8; i++)
-		*(iv+4+i) = req->iv[i];
-	*((__be32 *)(iv+12)) = counter;
 
 	if (sg_is_last(req->src) &&
 	    (!PageHighMem(sg_page(req->src)) ||
@@ -768,7 +782,7 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 
 	kernel_fpu_begin();
 	aesni_gcm_enc_tfm(aes_ctx, dst, src, req->cryptlen, iv,
-			  ctx->hash_subkey, assoc, req->assoclen - 8,
+			  hash_subkey, assoc, assoclen,
 			  dst + req->cryptlen, auth_tag_len);
 	kernel_fpu_end();
 
@@ -791,37 +805,20 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	return 0;
 }
 
-static int helper_rfc4106_decrypt(struct aead_request *req)
+static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
+			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
 {
 	u8 one_entry_in_sg = 0;
 	u8 *src, *dst, *assoc;
 	unsigned long tempCipherLen = 0;
-	__be32 counter = cpu_to_be32(1);
-	int retval = 0;
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
-	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
-	void *aes_ctx = &(ctx->aes_key_expanded);
 	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
-	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
 	u8 authTag[16];
 	struct scatter_walk src_sg_walk;
 	struct scatter_walk dst_sg_walk = {};
-	unsigned int i;
-
-	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
-		return -EINVAL;
-
-	/* Assuming we are supporting rfc4106 64-bit extended */
-	/* sequence numbers We need to have the AAD length */
-	/* equal to 16 or 20 bytes */
+	int retval = 0;
 
 	tempCipherLen = (unsigned long)(req->cryptlen - auth_tag_len);
-	/* IV below built */
-	for (i = 0; i < 4; i++)
-		*(iv+i) = ctx->nonce[i];
-	for (i = 0; i < 8; i++)
-		*(iv+4+i) = req->iv[i];
-	*((__be32 *)(iv+12)) = counter;
 
 	if (sg_is_last(req->src) &&
 	    (!PageHighMem(sg_page(req->src)) ||
@@ -838,7 +835,6 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 			scatterwalk_start(&dst_sg_walk, req->dst);
 			dst = scatterwalk_map(&dst_sg_walk) + req->assoclen;
 		}
-
 	} else {
 		/* Allocate memory for src, dst, assoc */
 		assoc = kmalloc(req->cryptlen + req->assoclen, GFP_ATOMIC);
@@ -850,9 +846,10 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 		dst = src;
 	}
 
+
 	kernel_fpu_begin();
 	aesni_gcm_dec_tfm(aes_ctx, dst, src, tempCipherLen, iv,
-			  ctx->hash_subkey, assoc, req->assoclen - 8,
+			  hash_subkey, assoc, assoclen,
 			  authTag, auth_tag_len);
 	kernel_fpu_end();
 
@@ -875,6 +872,60 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 		kfree(assoc);
 	}
 	return retval;
+
+}
+
+static int helper_rfc4106_encrypt(struct aead_request *req)
+{
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
+	void *aes_ctx = &(ctx->aes_key_expanded);
+	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
+	unsigned int i;
+	__be32 counter = cpu_to_be32(1);
+
+	/* Assuming we are supporting rfc4106 64-bit extended */
+	/* sequence numbers We need to have the AAD length equal */
+	/* to 16 or 20 bytes */
+	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
+		return -EINVAL;
+
+	/* IV below built */
+	for (i = 0; i < 4; i++)
+		*(iv+i) = ctx->nonce[i];
+	for (i = 0; i < 8; i++)
+		*(iv+4+i) = req->iv[i];
+	*((__be32 *)(iv+12)) = counter;
+
+	return gcmaes_encrypt(req, req->assoclen - 8, ctx->hash_subkey, iv,
+			      aes_ctx);
+}
+
+static int helper_rfc4106_decrypt(struct aead_request *req)
+{
+	__be32 counter = cpu_to_be32(1);
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
+	void *aes_ctx = &(ctx->aes_key_expanded);
+	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
+	unsigned int i;
+
+	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
+		return -EINVAL;
+
+	/* Assuming we are supporting rfc4106 64-bit extended */
+	/* sequence numbers We need to have the AAD length */
+	/* equal to 16 or 20 bytes */
+
+	/* IV below built */
+	for (i = 0; i < 4; i++)
+		*(iv+i) = ctx->nonce[i];
+	for (i = 0; i < 8; i++)
+		*(iv+4+i) = req->iv[i];
+	*((__be32 *)(iv+12)) = counter;
+
+	return gcmaes_decrypt(req, req->assoclen - 8, ctx->hash_subkey, iv,
+			      aes_ctx);
 }
 
 static int rfc4106_encrypt(struct aead_request *req)
@@ -1035,6 +1086,46 @@ struct {
 };
 
 #ifdef CONFIG_X86_64
+static int generic_gcmaes_set_key(struct crypto_aead *aead, const u8 *key,
+				  unsigned int key_len)
+{
+	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(aead);
+
+	return aes_set_key_common(crypto_aead_tfm(aead),
+				  &ctx->aes_key_expanded, key, key_len) ?:
+	       rfc4106_set_hash_subkey(ctx->hash_subkey, key, key_len);
+}
+
+static int generic_gcmaes_encrypt(struct aead_request *req)
+{
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
+	void *aes_ctx = &(ctx->aes_key_expanded);
+	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
+	__be32 counter = cpu_to_be32(1);
+
+	memcpy(iv, req->iv, 12);
+	*((__be32 *)(iv+12)) = counter;
+
+	return gcmaes_encrypt(req, req->assoclen, ctx->hash_subkey, iv,
+			      aes_ctx);
+}
+
+static int generic_gcmaes_decrypt(struct aead_request *req)
+{
+	__be32 counter = cpu_to_be32(1);
+	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
+	void *aes_ctx = &(ctx->aes_key_expanded);
+	u8 iv[16] __attribute__ ((__aligned__(AESNI_ALIGN)));
+
+	memcpy(iv, req->iv, 12);
+	*((__be32 *)(iv+12)) = counter;
+
+	return gcmaes_decrypt(req, req->assoclen, ctx->hash_subkey, iv,
+			      aes_ctx);
+}
+
 static struct aead_alg aesni_aead_algs[] = { {
 	.setkey			= common_rfc4106_set_key,
 	.setauthsize		= common_rfc4106_set_authsize,
@@ -1069,6 +1160,23 @@ static struct aead_alg aesni_aead_algs[] = { {
 		.cra_ctxsize		= sizeof(struct cryptd_aead *),
 		.cra_module		= THIS_MODULE,
 	},
+}, {
+	.setkey			= generic_gcmaes_set_key,
+	.setauthsize		= generic_gcmaes_set_authsize,
+	.encrypt		= generic_gcmaes_encrypt,
+	.decrypt		= generic_gcmaes_decrypt,
+	.ivsize			= 12,
+	.maxauthsize		= 16,
+	.base = {
+		.cra_name		= "gcm(aes)",
+		.cra_driver_name	= "generic-gcm-aesni",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= 1,
+		.cra_ctxsize		= sizeof(struct generic_gcmaes_ctx),
+		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_module		= THIS_MODULE,
+	},
 } };
 #else
 static struct aead_alg aesni_aead_algs[0];
-- 
2.12.2

^ permalink raw reply related

* [PATCH 5/7] crypto: aesni: make AVX2 AES-GCM work with any aadlen
From: Sabrina Dubroca @ 2017-04-28 16:12 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

This is the first step to make the aesni AES-GCM implementation
generic. The current code was written for rfc4106, so it handles only
some specific sizes of associated data.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 85 ++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 27 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index ee6283120f83..7230808a7cef 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -1702,41 +1702,73 @@ ENDPROC(aesni_gcm_dec_avx_gen2)
 
 .macro INITIAL_BLOCKS_AVX2 num_initial_blocks T1 T2 T3 T4 T5 CTR XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 T6 T_key ENC_DEC VER
 	i = (8-\num_initial_blocks)
+	j = 0
 	setreg
 
-        mov     arg6, %r10                       # r10 = AAD
-        mov     arg7, %r12                       # r12 = aadLen
-
-
-        mov     %r12, %r11
-
-        vpxor   reg_i, reg_i, reg_i
-_get_AAD_loop\@:
-        vmovd   (%r10), \T1
-        vpslldq $12, \T1, \T1
-        vpsrldq $4, reg_i, reg_i
-        vpxor   \T1, reg_i, reg_i
+	mov     arg6, %r10                       # r10 = AAD
+	mov     arg7, %r12                       # r12 = aadLen
 
-        add     $4, %r10
-        sub     $4, %r12
-        jg      _get_AAD_loop\@
 
+	mov     %r12, %r11
 
-        cmp     $16, %r11
-        je      _get_AAD_loop2_done\@
-        mov     $16, %r12
+	vpxor   reg_j, reg_j, reg_j
+	vpxor   reg_i, reg_i, reg_i
 
-_get_AAD_loop2\@:
-        vpsrldq $4, reg_i, reg_i
-        sub     $4, %r12
-        cmp     %r11, %r12
-        jg      _get_AAD_loop2\@
+	cmp     $16, %r11
+	jl      _get_AAD_rest8\@
+_get_AAD_blocks\@:
+	vmovdqu (%r10), reg_i
+	vpshufb SHUF_MASK(%rip), reg_i, reg_i
+	vpxor   reg_i, reg_j, reg_j
+	GHASH_MUL_AVX2      reg_j, \T2, \T1, \T3, \T4, \T5, \T6
+	add     $16, %r10
+	sub     $16, %r12
+	sub     $16, %r11
+	cmp     $16, %r11
+	jge     _get_AAD_blocks\@
+	vmovdqu reg_j, reg_i
+	cmp     $0, %r11
+	je      _get_AAD_done\@
 
-_get_AAD_loop2_done\@:
+	vpxor   reg_i, reg_i, reg_i
 
-        #byte-reflect the AAD data
-        vpshufb SHUF_MASK(%rip), reg_i, reg_i
+	/* read the last <16B of AAD. since we have at least 4B of
+	data right after the AAD (the ICV, and maybe some CT), we can
+	read 4B/8B blocks safely, and then get rid of the extra stuff */
+_get_AAD_rest8\@:
+	cmp     $4, %r11
+	jle     _get_AAD_rest4\@
+	movq    (%r10), \T1
+	add     $8, %r10
+	sub     $8, %r11
+	vpslldq $8, \T1, \T1
+	vpsrldq $8, reg_i, reg_i
+	vpxor   \T1, reg_i, reg_i
+	jmp     _get_AAD_rest8\@
+_get_AAD_rest4\@:
+	cmp     $0, %r11
+	jle     _get_AAD_rest0\@
+	mov     (%r10), %eax
+	movq    %rax, \T1
+	add     $4, %r10
+	sub     $4, %r11
+	vpslldq $12, \T1, \T1
+	vpsrldq $4, reg_i, reg_i
+	vpxor   \T1, reg_i, reg_i
+_get_AAD_rest0\@:
+	/* finalize: shift out the extra bytes we read, and align
+	left. since pslldq can only shift by an immediate, we use
+	vpshufb and an array of shuffle masks */
+	movq    %r12, %r11
+	salq    $4, %r11
+	movdqu  aad_shift_arr(%r11), \T1
+	vpshufb \T1, reg_i, reg_i
+_get_AAD_rest_final\@:
+	vpshufb SHUF_MASK(%rip), reg_i, reg_i
+	vpxor   reg_j, reg_i, reg_i
+	GHASH_MUL_AVX2      reg_i, \T2, \T1, \T3, \T4, \T5, \T6
 
+_get_AAD_done\@:
 	# initialize the data pointer offset as zero
 	xor     %r11, %r11
 
@@ -1811,7 +1843,6 @@ ENDPROC(aesni_gcm_dec_avx_gen2)
 	i = (8-\num_initial_blocks)
 	j = (9-\num_initial_blocks)
 	setreg
-        GHASH_MUL_AVX2       reg_i, \T2, \T1, \T3, \T4, \T5, \T6
 
 .rep \num_initial_blocks
         vpxor    reg_i, reg_j, reg_j
-- 
2.12.2

^ permalink raw reply related

* [PATCH 4/7] crypto: aesni: make AVX AES-GCM work with all valid auth_tag_len
From: Sabrina Dubroca @ 2017-04-28 16:11 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index a73117c84904..ee6283120f83 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -1481,19 +1481,36 @@ VARIABLE_OFFSET = 16*8
         cmp     $16, %r11
         je      _T_16\@
 
-        cmp     $12, %r11
-        je      _T_12\@
+        cmp     $8, %r11
+        jl      _T_4\@
 
 _T_8\@:
         vmovq   %xmm9, %rax
         mov     %rax, (%r10)
-        jmp     _return_T_done\@
-_T_12\@:
-        vmovq   %xmm9, %rax
-        mov     %rax, (%r10)
+        add     $8, %r10
+        sub     $8, %r11
         vpsrldq $8, %xmm9, %xmm9
+        cmp     $0, %r11
+        je     _return_T_done\@
+_T_4\@:
         vmovd   %xmm9, %eax
-        mov     %eax, 8(%r10)
+        mov     %eax, (%r10)
+        add     $4, %r10
+        sub     $4, %r11
+        vpsrldq     $4, %xmm9, %xmm9
+        cmp     $0, %r11
+        je     _return_T_done\@
+_T_123\@:
+        vmovd     %xmm9, %eax
+        cmp     $2, %r11
+        jl     _T_1\@
+        mov     %ax, (%r10)
+        cmp     $2, %r11
+        je     _return_T_done\@
+        add     $2, %r10
+        sar     $16, %eax
+_T_1\@:
+        mov     %al, (%r10)
         jmp     _return_T_done\@
 
 _T_16\@:
-- 
2.12.2

^ permalink raw reply related

* [PATCH 2/7] crypto: aesni: make non-AVX AES-GCM work with all valid auth_tag_len
From: Sabrina Dubroca @ 2017-04-28 16:11 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel
In-Reply-To: <cover.1493395785.git.sd@queasysnail.net>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 arch/x86/crypto/aesni-intel_asm.S | 62 ++++++++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 605726aaf0a2..16627fec80b2 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1549,18 +1549,35 @@ ENTRY(aesni_gcm_dec)
 	mov	arg10, %r11               # %r11 = auth_tag_len
 	cmp	$16, %r11
 	je	_T_16_decrypt
-	cmp	$12, %r11
-	je	_T_12_decrypt
+	cmp	$8, %r11
+	jl	_T_4_decrypt
 _T_8_decrypt:
 	MOVQ_R64_XMM	%xmm0, %rax
 	mov	%rax, (%r10)
-	jmp	_return_T_done_decrypt
-_T_12_decrypt:
-	MOVQ_R64_XMM	%xmm0, %rax
-	mov	%rax, (%r10)
+	add	$8, %r10
+	sub	$8, %r11
 	psrldq	$8, %xmm0
+	cmp	$0, %r11
+	je	_return_T_done_decrypt
+_T_4_decrypt:
+	movd	%xmm0, %eax
+	mov	%eax, (%r10)
+	add	$4, %r10
+	sub	$4, %r11
+	psrldq	$4, %xmm0
+	cmp	$0, %r11
+	je	_return_T_done_decrypt
+_T_123_decrypt:
 	movd	%xmm0, %eax
-	mov	%eax, 8(%r10)
+	cmp	$2, %r11
+	jl	_T_1_decrypt
+	mov	%ax, (%r10)
+	cmp	$2, %r11
+	je	_return_T_done_decrypt
+	add	$2, %r10
+	sar	$16, %eax
+_T_1_decrypt:
+	mov	%al, (%r10)
 	jmp	_return_T_done_decrypt
 _T_16_decrypt:
 	movdqu	%xmm0, (%r10)
@@ -1813,18 +1830,35 @@ ENTRY(aesni_gcm_enc)
 	mov	arg10, %r11                    # %r11 = auth_tag_len
 	cmp	$16, %r11
 	je	_T_16_encrypt
-	cmp	$12, %r11
-	je	_T_12_encrypt
+	cmp	$8, %r11
+	jl	_T_4_encrypt
 _T_8_encrypt:
 	MOVQ_R64_XMM	%xmm0, %rax
 	mov	%rax, (%r10)
-	jmp	_return_T_done_encrypt
-_T_12_encrypt:
-	MOVQ_R64_XMM	%xmm0, %rax
-	mov	%rax, (%r10)
+	add	$8, %r10
+	sub	$8, %r11
 	psrldq	$8, %xmm0
+	cmp	$0, %r11
+	je	_return_T_done_encrypt
+_T_4_encrypt:
+	movd	%xmm0, %eax
+	mov	%eax, (%r10)
+	add	$4, %r10
+	sub	$4, %r11
+	psrldq	$4, %xmm0
+	cmp	$0, %r11
+	je	_return_T_done_encrypt
+_T_123_encrypt:
 	movd	%xmm0, %eax
-	mov	%eax, 8(%r10)
+	cmp	$2, %r11
+	jl	_T_1_encrypt
+	mov	%ax, (%r10)
+	cmp	$2, %r11
+	je	_return_T_done_encrypt
+	add	$2, %r10
+	sar	$16, %eax
+_T_1_encrypt:
+	mov	%al, (%r10)
 	jmp	_return_T_done_encrypt
 _T_16_encrypt:
 	movdqu	%xmm0, (%r10)
-- 
2.12.2

^ permalink raw reply related

* [PATCH 0/7] crypto: aesni: provide generic gcm(aes)
From: Sabrina Dubroca @ 2017-04-28 16:11 UTC (permalink / raw)
  To: netdev
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Herbert Xu,
	David S. Miller, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-crypto, linux-kernel

The current aesni AES-GCM implementation only offers support for
rfc4106(gcm(aes)).  This makes some things a little bit simpler
(handling of associated data and authentication tag), but it means
that non-IPsec users of gcm(aes) have to rely on
gcm_base(ctr-aes-aesni,ghash-clmulni), which is much slower.

This patchset adds handling of all valid authentication tag lengths
and of any associated data length to the assembly code, and exposes a
generic gcm(aes) AEAD algorithm to the crypto API.

With these patches, performance of MACsec on a single core increases
by 40% (from 4.5Gbps to around 6.3Gbps).

Sabrina Dubroca (7):
  crypto: aesni: make non-AVX AES-GCM work with any aadlen
  crypto: aesni: make non-AVX AES-GCM work with all valid auth_tag_len
  crypto: aesni: make AVX AES-GCM work with any aadlen
  crypto: aesni: make AVX AES-GCM work with all valid auth_tag_len
  crypto: aesni: make AVX2 AES-GCM work with any aadlen
  crypto: aesni: make AVX2 AES-GCM work with all valid auth_tag_len
  crypto: aesni: add generic gcm(aes)

 arch/x86/crypto/aesni-intel_asm.S        | 231 +++++++++++++++++++------
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 283 ++++++++++++++++++++++---------
 arch/x86/crypto/aesni-intel_glue.c       | 208 +++++++++++++++++------
 3 files changed, 539 insertions(+), 183 deletions(-)

-- 
2.12.2

^ permalink raw reply

* Re: arm64: next-20170428 hangs on boot
From: Yury Norov @ 2017-04-28 16:10 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Mark Rutland, netdev, linux-kernel, linux-arm-kernel, davem
In-Reply-To: <e65b3ec3-790d-da96-9e1f-3d344f65517b@gmail.com>

On Fri, Apr 28, 2017 at 08:40:54AM -0700, Florian Fainelli wrote:
> On 04/28/2017 08:09 AM, Yury Norov wrote:
> > On Fri, Apr 28, 2017 at 03:52:34PM +0100, Mark Rutland wrote:
> >> On Fri, Apr 28, 2017 at 04:24:29PM +0300, Yury Norov wrote:
> >>> Hi all,
> >>
> >> Hi,
> >>
> >> [adding Dave Miller, netdev, lkml]
> > 
> > thanks
> > 
> >>> On QEMU the next-20170428 hangs on boot for me due to kernel panic in 
> >>> rtnetlink_init():
> >>>
> >>> void __init rtnetlink_init(void)
> >>> {
> >>>         if (register_pernet_subsys(&rtnetlink_net_ops))
> >>>                 panic("rtnetlink_init: cannot initialize rtnetlink\n");
> >>>
> >>>         ...
> >>> }
> >>
> >> I see the same thing with a next-20170428 arm64 defconfig, on a Juno R1
> >> system:
> >>
> >> [    0.531949] Kernel panic - not syncing: rtnetlink_init: cannot initialize rtnetlink
> >> [    0.531949] 
> >> [    0.541271] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc8-next-20170428-00002-g6ee3799 #10
> >> [    0.550307] Hardware name: ARM Juno development board (r1) (DT)
> >> [    0.556332] Call trace:
> >> [    0.558833] [<ffff000008088538>] dump_backtrace+0x0/0x238
> >> [    0.564332] [<ffff000008088834>] show_stack+0x14/0x20
> >> [    0.569477] [<ffff00000839dd54>] dump_stack+0x9c/0xc0
> >> [    0.574622] [<ffff000008175344>] panic+0x11c/0x28c
> >> [    0.579505] [<ffff000008d80034>] rtnetlink_init+0x2c/0x1d0
> >> [    0.585092] [<ffff000008d8047c>] netlink_proto_init+0x14c/0x17c
> >> [    0.591119] [<ffff000008083150>] do_one_initcall+0x38/0x120
> >> [    0.596796] [<ffff000008d30d00>] kernel_init_freeable+0x1a0/0x240
> >> [    0.603003] [<ffff00000892a790>] kernel_init+0x10/0x100
> >> [    0.608324] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
> >> [    0.613736] SMP: stopping secondary CPUs
> >> [    0.617738] ---[ end Kernel panic - not syncing: rtnetlink_init: cannot initialize rtnetlink
> >>
> >> If this isn't a known issue, it would be worth trying to bisect this.
> 
> It's fixed already by this commit in net-next:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=2d2ab658d2debcb4c0e29c9e6f18e5683f3077bf

Works for me, thank you.

^ permalink raw reply

* Re: [PATCH] stmmac: Add support for SIMATIC IOT2000 platform
From: David Miller @ 2017-04-28 16:09 UTC (permalink / raw)
  To: jan.kiszka
  Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel,
	sascha.weisenberger
In-Reply-To: <8c536123-6189-e0b6-1977-dc7a521718dd@siemens.com>

From: Jan Kiszka <jan.kiszka@siemens.com>
Date: Mon, 24 Apr 2017 21:27:15 +0200

> The IOT2000 is industrial controller platform, derived from the Intel
> Galileo Gen2 board. The variant IOT2020 comes with one LAN port, the
> IOT2040 has two of them. They can be told apart based on the board asset
> tag in the DMI table.
> 
> Based on patch by Sascha Weisenberger.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Signed-off-by: Sascha Weisenberger <sascha.weisenberger@siemens.com>

It looks like there is still some discussion going on about precise
detection and disambiguation of these devices.

Once things have settled down please resubmit the final patch.

Thank you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox