Netdev List


* Re: [PATCH] net/mlx4_core: print firmware version during driver loading
From: Qing Huang @ 2018-09-14 22:36 UTC
  To: David Miller; +Cc: andrew, leon, netdev, linux-rdma, linux-kernel, tariqt
In-Reply-To: <20180914.141406.2211638662965115243.davem@davemloft.net>



On 9/14/2018 2:14 PM, David Miller wrote:
> From: Qing Huang <qing.huang@oracle.com>
> Date: Fri, 14 Sep 2018 11:33:40 -0700
>
>> On 9/14/2018 11:17 AM, Andrew Lunn wrote:
>>> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>>>> The FW version is actually a very crucial piece of information and
>>>> only printed once here when the driver is loaded. People tend to get
>>>> confused when switching multiple FW files back and forth without
>>>> running separate utility tools, especially at customer sites.
>>>> IMHO, this information is very useful and only takes up very little
>>>> log file space. :-)
>>> Why not use ethtool -i ?
>>>
>>> $ sudo ethtool -i eth0
>>> driver: r8169
>>> version: 2.3LK-NAPI
>>> firmware-version: rtl8168g-2_0.0.1 02/06/13
>>>
>>>       Andrew
>> Sure. You can also use ibstat or ibv_devinfo tool if they are
>> installed. But it's not very convenient in some cases.
>>
>> E.g.
>> A customer upgrades FW on HCAs and encounters issues. During triage,
>> it's much easier to study customer uploaded log files when remotely
>> testing different FW files.
> Not a valid argument.  You can print the ethtool output from initramfs
> if necessary for triage.
>
> I still stand by the fact that ethtool is the only fully reliable way
> to obtain this information, the kernel log is not.

This is more for InfiniBand mode, which depends more on features and
functionality provided in firmware and gets much more frequent FW bug
fixes than typical Ethernet devices. This is not meant to replace other
ways of getting the information; it is more of an enhancement for
checking log history.

This can provide valuable information when tracing through system log
history to discover what happened with a specific HCA driver version and
FW version combination in the past.

Regards,
Qing


* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
From: Ard Biesheuvel @ 2018-09-14 17:38 UTC
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, <netdev@vger.kernel.org>,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, David S. Miller,
	Greg Kroah-Hartman, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers
In-Reply-To: <20180914162240.7925-19-Jason@zx2c4.com>

On 14 September 2018 at 18:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Now that ChaCha20 is in Zinc, we can have the crypto API code simply
> call into it. The crypto API expects to have a stored key per instance
> and independent nonces, so we follow suit and store the key and
> initialize the nonce independently.
>

From our exchange re v3:

>> Then there is the performance claim. We know for instance that the
>> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
>> possess a micro-architectural property that ALU instructions are
>> essentially free when they are interleaved with SIMD instructions. But
>> we also know that a) Cortex-A7, which is a relevant target, is not one
>> of those cores, and b) that chip designers are not likely to optimize
>> for that particular usage pattern so relying on it in generic code is
>> unwise in general.
>
> That's interesting. I'll bring this up with AndyP. FWIW, if you think
> you have a real and compelling claim here, I'd be much more likely to
> accept a different ChaCha20 implementation than I would be to accept a
> different Poly1305 implementation. (It's a *lot* harder to screw up
> ChaCha20 than it is to screw up Poly1305.)
>

so could we please bring that discussion to a close before we drop the ARM code?

I am fine with dropping the arm64 code btw.
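
For readers following the register comments in the quoted NEON code
(`x0 += x1, x3 = rotl32(x3 ^ x0, 16)` and so on), here is a portable C
sketch of the ChaCha20 quarter round and double round as described in
RFC 7539. It is not the Zinc code or the kernel's generic code; the
function names are made up for illustration. The single-block NEON
routine keeps one state row per q register, so it runs the same four
steps on all four columns at once and then rotates the rows with vext
for the diagonal pass. The scalar version spells the indices out
instead.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One quarter round on state words a, b, c, d (RFC 7539, section 2.1). */
static void chacha_quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
	assert(a >= 0 && a < 16 && b >= 0 && b < 16);
	assert(c >= 0 && c < 16 && d >= 0 && d < 16);

	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* Column round followed by diagonal round. */
static void chacha_double_round(uint32_t x[16])
{
	chacha_quarter_round(x, 0, 4,  8, 12);
	chacha_quarter_round(x, 1, 5,  9, 13);
	chacha_quarter_round(x, 2, 6, 10, 14);
	chacha_quarter_round(x, 3, 7, 11, 15);

	chacha_quarter_round(x, 0, 5, 10, 15);
	chacha_quarter_round(x, 1, 6, 11, 12);
	chacha_quarter_round(x, 2, 7,  8, 13);
	chacha_quarter_round(x, 3, 4,  9, 14);
}

/* 20 rounds = 10 double rounds; the caller adds the input state after. */
static void chacha20_permute(uint32_t x[16])
{
	int i;

	for (i = 0; i < 10; i++)
		chacha_double_round(x);
}
```

The diagonal index pattern is the same one that appears in the comments
of the four-block routine (`x0 += x5, x15 = rotl32(x15 ^ x0, 16)`),
where no shuffling is needed because each q register holds one state
word from four different blocks.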

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: Eric Biggers <ebiggers@google.com>
> ---
>  arch/arm/configs/exynos_defconfig       |   1 -
>  arch/arm/configs/multi_v7_defconfig     |   1 -
>  arch/arm/configs/omap2plus_defconfig    |   1 -
>  arch/arm/crypto/Kconfig                 |   6 -
>  arch/arm/crypto/Makefile                |   2 -
>  arch/arm/crypto/chacha20-neon-core.S    | 521 --------------------
>  arch/arm/crypto/chacha20-neon-glue.c    | 127 -----
>  arch/arm64/configs/defconfig            |   1 -
>  arch/arm64/crypto/Kconfig               |   6 -
>  arch/arm64/crypto/Makefile              |   3 -
>  arch/arm64/crypto/chacha20-neon-core.S  | 450 -----------------
>  arch/arm64/crypto/chacha20-neon-glue.c  | 133 -----
>  arch/x86/crypto/Makefile                |   3 -
>  arch/x86/crypto/chacha20-avx2-x86_64.S  | 448 -----------------
>  arch/x86/crypto/chacha20-ssse3-x86_64.S | 630 ------------------------
>  arch/x86/crypto/chacha20_glue.c         | 146 ------
>  crypto/Kconfig                          |  16 -
>  crypto/Makefile                         |   2 +-
>  crypto/chacha20_generic.c               | 136 -----
>  crypto/chacha20_zinc.c                  | 100 ++++
>  crypto/chacha20poly1305.c               |   2 +-
>  include/crypto/chacha20.h               |  12 -
>  22 files changed, 102 insertions(+), 2645 deletions(-)
>  delete mode 100644 arch/arm/crypto/chacha20-neon-core.S
>  delete mode 100644 arch/arm/crypto/chacha20-neon-glue.c
>  delete mode 100644 arch/arm64/crypto/chacha20-neon-core.S
>  delete mode 100644 arch/arm64/crypto/chacha20-neon-glue.c
>  delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S
>  delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S
>  delete mode 100644 arch/x86/crypto/chacha20_glue.c
>  delete mode 100644 crypto/chacha20_generic.c
>  create mode 100644 crypto/chacha20_zinc.c
>
> diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
> index 27ea6dfcf2f2..95929b5e7b10 100644
> --- a/arch/arm/configs/exynos_defconfig
> +++ b/arch/arm/configs/exynos_defconfig
> @@ -350,7 +350,6 @@ CONFIG_CRYPTO_SHA1_ARM_NEON=m
>  CONFIG_CRYPTO_SHA256_ARM=m
>  CONFIG_CRYPTO_SHA512_ARM=m
>  CONFIG_CRYPTO_AES_ARM_BS=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRC_CCITT=y
>  CONFIG_FONTS=y
>  CONFIG_FONT_7x14=y
> diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
> index fc33444e94f0..63be07724db3 100644
> --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -1000,4 +1000,3 @@ CONFIG_CRYPTO_AES_ARM_BS=m
>  CONFIG_CRYPTO_AES_ARM_CE=m
>  CONFIG_CRYPTO_GHASH_ARM_CE=m
>  CONFIG_CRYPTO_CRC32_ARM_CE=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
> diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
> index 6491419b1dad..f585a8ecc336 100644
> --- a/arch/arm/configs/omap2plus_defconfig
> +++ b/arch/arm/configs/omap2plus_defconfig
> @@ -547,7 +547,6 @@ CONFIG_CRYPTO_SHA512_ARM=m
>  CONFIG_CRYPTO_AES_ARM=m
>  CONFIG_CRYPTO_AES_ARM_BS=m
>  CONFIG_CRYPTO_GHASH_ARM_CE=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRC_CCITT=y
>  CONFIG_CRC_T10DIF=y
>  CONFIG_CRC_ITU_T=y
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index 925d1364727a..fb80fd89f0e7 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -115,12 +115,6 @@ config CRYPTO_CRC32_ARM_CE
>         depends on KERNEL_MODE_NEON && CRC32
>         select CRYPTO_HASH
>
> -config CRYPTO_CHACHA20_NEON
> -       tristate "NEON accelerated ChaCha20 symmetric cipher"
> -       depends on KERNEL_MODE_NEON
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -
>  config CRYPTO_SPECK_NEON
>         tristate "NEON accelerated Speck cipher algorithms"
>         depends on KERNEL_MODE_NEON
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 8de542c48ade..bbfa98447063 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -9,7 +9,6 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>  obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
> -obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
>  obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>
>  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
> @@ -53,7 +52,6 @@ aes-arm-ce-y  := aes-ce-core.o aes-ce-glue.o
>  ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
>  crct10dif-arm-ce-y     := crct10dif-ce-core.o crct10dif-ce-glue.o
>  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
> -chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
>  speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
>  ifdef REGENERATE_ARM_CRYPTO
> diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
> deleted file mode 100644
> index 451a849ad518..000000000000
> --- a/arch/arm/crypto/chacha20-neon-core.S
> +++ /dev/null
> @@ -1,521 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -       .text
> -       .fpu            neon
> -       .align          5
> -
> -ENTRY(chacha20_block_xor_neon)
> -       // r0: Input state matrix, s
> -       // r1: 1 data block output, o
> -       // r2: 1 data block input, i
> -
> -       //
> -       // This function encrypts one ChaCha20 block by loading the state matrix
> -       // in four NEON registers. It performs matrix operation on four words in
> -       // parallel, but requireds shuffling to rearrange the words after each
> -       // round.
> -       //
> -
> -       // x0..3 = s0..3
> -       add             ip, r0, #0x20
> -       vld1.32         {q0-q1}, [r0]
> -       vld1.32         {q2-q3}, [ip]
> -
> -       vmov            q8, q0
> -       vmov            q9, q1
> -       vmov            q10, q2
> -       vmov            q11, q3
> -
> -       mov             r3, #10
> -
> -.Ldoubleround:
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       vadd.i32        q0, q0, q1
> -       veor            q3, q3, q0
> -       vrev32.16       q3, q3
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #12
> -       vsri.u32        q1, q4, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       vadd.i32        q0, q0, q1
> -       veor            q4, q3, q0
> -       vshl.u32        q3, q4, #8
> -       vsri.u32        q3, q4, #24
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #7
> -       vsri.u32        q1, q4, #25
> -
> -       // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       vext.8          q1, q1, q1, #4
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       vext.8          q2, q2, q2, #8
> -       // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       vext.8          q3, q3, q3, #12
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       vadd.i32        q0, q0, q1
> -       veor            q3, q3, q0
> -       vrev32.16       q3, q3
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #12
> -       vsri.u32        q1, q4, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       vadd.i32        q0, q0, q1
> -       veor            q4, q3, q0
> -       vshl.u32        q3, q4, #8
> -       vsri.u32        q3, q4, #24
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #7
> -       vsri.u32        q1, q4, #25
> -
> -       // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       vext.8          q1, q1, q1, #12
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       vext.8          q2, q2, q2, #8
> -       // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       vext.8          q3, q3, q3, #4
> -
> -       subs            r3, r3, #1
> -       bne             .Ldoubleround
> -
> -       add             ip, r2, #0x20
> -       vld1.8          {q4-q5}, [r2]
> -       vld1.8          {q6-q7}, [ip]
> -
> -       // o0 = i0 ^ (x0 + s0)
> -       vadd.i32        q0, q0, q8
> -       veor            q0, q0, q4
> -
> -       // o1 = i1 ^ (x1 + s1)
> -       vadd.i32        q1, q1, q9
> -       veor            q1, q1, q5
> -
> -       // o2 = i2 ^ (x2 + s2)
> -       vadd.i32        q2, q2, q10
> -       veor            q2, q2, q6
> -
> -       // o3 = i3 ^ (x3 + s3)
> -       vadd.i32        q3, q3, q11
> -       veor            q3, q3, q7
> -
> -       add             ip, r1, #0x20
> -       vst1.8          {q0-q1}, [r1]
> -       vst1.8          {q2-q3}, [ip]
> -
> -       bx              lr
> -ENDPROC(chacha20_block_xor_neon)
> -
> -       .align          5
> -ENTRY(chacha20_4block_xor_neon)
> -       push            {r4-r6, lr}
> -       mov             ip, sp                  // preserve the stack pointer
> -       sub             r3, sp, #0x20           // allocate a 32 byte buffer
> -       bic             r3, r3, #0x1f           // aligned to 32 bytes
> -       mov             sp, r3
> -
> -       // r0: Input state matrix, s
> -       // r1: 4 data blocks output, o
> -       // r2: 4 data blocks input, i
> -
> -       //
> -       // This function encrypts four consecutive ChaCha20 blocks by loading
> -       // the state matrix in NEON registers four times. The algorithm performs
> -       // each operation on the corresponding word of each state matrix, hence
> -       // requires no word shuffling. For final XORing step we transpose the
> -       // matrix by interleaving 32- and then 64-bit words, which allows us to
> -       // do XOR in NEON registers.
> -       //
> -
> -       // x0..15[0-3] = s0..3[0..3]
> -       add             r3, r0, #0x20
> -       vld1.32         {q0-q1}, [r0]
> -       vld1.32         {q2-q3}, [r3]
> -
> -       adr             r3, CTRINC
> -       vdup.32         q15, d7[1]
> -       vdup.32         q14, d7[0]
> -       vld1.32         {q11}, [r3, :128]
> -       vdup.32         q13, d6[1]
> -       vdup.32         q12, d6[0]
> -       vadd.i32        q12, q12, q11           // x12 += counter values 0-3
> -       vdup.32         q11, d5[1]
> -       vdup.32         q10, d5[0]
> -       vdup.32         q9, d4[1]
> -       vdup.32         q8, d4[0]
> -       vdup.32         q7, d3[1]
> -       vdup.32         q6, d3[0]
> -       vdup.32         q5, d2[1]
> -       vdup.32         q4, d2[0]
> -       vdup.32         q3, d1[1]
> -       vdup.32         q2, d1[0]
> -       vdup.32         q1, d0[1]
> -       vdup.32         q0, d0[0]
> -
> -       mov             r3, #10
> -
> -.Ldoubleround4:
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       vadd.i32        q0, q0, q4
> -       vadd.i32        q1, q1, q5
> -       vadd.i32        q2, q2, q6
> -       vadd.i32        q3, q3, q7
> -
> -       veor            q12, q12, q0
> -       veor            q13, q13, q1
> -       veor            q14, q14, q2
> -       veor            q15, q15, q3
> -
> -       vrev32.16       q12, q12
> -       vrev32.16       q13, q13
> -       vrev32.16       q14, q14
> -       vrev32.16       q15, q15
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       vadd.i32        q8, q8, q12
> -       vadd.i32        q9, q9, q13
> -       vadd.i32        q10, q10, q14
> -       vadd.i32        q11, q11, q15
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q4, q8
> -       veor            q9, q5, q9
> -       vshl.u32        q4, q8, #12
> -       vshl.u32        q5, q9, #12
> -       vsri.u32        q4, q8, #20
> -       vsri.u32        q5, q9, #20
> -
> -       veor            q8, q6, q10
> -       veor            q9, q7, q11
> -       vshl.u32        q6, q8, #12
> -       vshl.u32        q7, q9, #12
> -       vsri.u32        q6, q8, #20
> -       vsri.u32        q7, q9, #20
> -
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       vadd.i32        q0, q0, q4
> -       vadd.i32        q1, q1, q5
> -       vadd.i32        q2, q2, q6
> -       vadd.i32        q3, q3, q7
> -
> -       veor            q8, q12, q0
> -       veor            q9, q13, q1
> -       vshl.u32        q12, q8, #8
> -       vshl.u32        q13, q9, #8
> -       vsri.u32        q12, q8, #24
> -       vsri.u32        q13, q9, #24
> -
> -       veor            q8, q14, q2
> -       veor            q9, q15, q3
> -       vshl.u32        q14, q8, #8
> -       vshl.u32        q15, q9, #8
> -       vsri.u32        q14, q8, #24
> -       vsri.u32        q15, q9, #24
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       vadd.i32        q8, q8, q12
> -       vadd.i32        q9, q9, q13
> -       vadd.i32        q10, q10, q14
> -       vadd.i32        q11, q11, q15
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q4, q8
> -       veor            q9, q5, q9
> -       vshl.u32        q4, q8, #7
> -       vshl.u32        q5, q9, #7
> -       vsri.u32        q4, q8, #25
> -       vsri.u32        q5, q9, #25
> -
> -       veor            q8, q6, q10
> -       veor            q9, q7, q11
> -       vshl.u32        q6, q8, #7
> -       vshl.u32        q7, q9, #7
> -       vsri.u32        q6, q8, #25
> -       vsri.u32        q7, q9, #25
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       vadd.i32        q0, q0, q5
> -       vadd.i32        q1, q1, q6
> -       vadd.i32        q2, q2, q7
> -       vadd.i32        q3, q3, q4
> -
> -       veor            q15, q15, q0
> -       veor            q12, q12, q1
> -       veor            q13, q13, q2
> -       veor            q14, q14, q3
> -
> -       vrev32.16       q15, q15
> -       vrev32.16       q12, q12
> -       vrev32.16       q13, q13
> -       vrev32.16       q14, q14
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       vadd.i32        q10, q10, q15
> -       vadd.i32        q11, q11, q12
> -       vadd.i32        q8, q8, q13
> -       vadd.i32        q9, q9, q14
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q7, q8
> -       veor            q9, q4, q9
> -       vshl.u32        q7, q8, #12
> -       vshl.u32        q4, q9, #12
> -       vsri.u32        q7, q8, #20
> -       vsri.u32        q4, q9, #20
> -
> -       veor            q8, q5, q10
> -       veor            q9, q6, q11
> -       vshl.u32        q5, q8, #12
> -       vshl.u32        q6, q9, #12
> -       vsri.u32        q5, q8, #20
> -       vsri.u32        q6, q9, #20
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       vadd.i32        q0, q0, q5
> -       vadd.i32        q1, q1, q6
> -       vadd.i32        q2, q2, q7
> -       vadd.i32        q3, q3, q4
> -
> -       veor            q8, q15, q0
> -       veor            q9, q12, q1
> -       vshl.u32        q15, q8, #8
> -       vshl.u32        q12, q9, #8
> -       vsri.u32        q15, q8, #24
> -       vsri.u32        q12, q9, #24
> -
> -       veor            q8, q13, q2
> -       veor            q9, q14, q3
> -       vshl.u32        q13, q8, #8
> -       vshl.u32        q14, q9, #8
> -       vsri.u32        q13, q8, #24
> -       vsri.u32        q14, q9, #24
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       vadd.i32        q10, q10, q15
> -       vadd.i32        q11, q11, q12
> -       vadd.i32        q8, q8, q13
> -       vadd.i32        q9, q9, q14
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q7, q8
> -       veor            q9, q4, q9
> -       vshl.u32        q7, q8, #7
> -       vshl.u32        q4, q9, #7
> -       vsri.u32        q7, q8, #25
> -       vsri.u32        q4, q9, #25
> -
> -       veor            q8, q5, q10
> -       veor            q9, q6, q11
> -       vshl.u32        q5, q8, #7
> -       vshl.u32        q6, q9, #7
> -       vsri.u32        q5, q8, #25
> -       vsri.u32        q6, q9, #25
> -
> -       subs            r3, r3, #1
> -       beq             0f
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -       b               .Ldoubleround4
> -
> -       // x0[0-3] += s0[0]
> -       // x1[0-3] += s0[1]
> -       // x2[0-3] += s0[2]
> -       // x3[0-3] += s0[3]
> -0:     ldmia           r0!, {r3-r6}
> -       vdup.32         q8, r3
> -       vdup.32         q9, r4
> -       vadd.i32        q0, q0, q8
> -       vadd.i32        q1, q1, q9
> -       vdup.32         q8, r5
> -       vdup.32         q9, r6
> -       vadd.i32        q2, q2, q8
> -       vadd.i32        q3, q3, q9
> -
> -       // x4[0-3] += s1[0]
> -       // x5[0-3] += s1[1]
> -       // x6[0-3] += s1[2]
> -       // x7[0-3] += s1[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q8, r3
> -       vdup.32         q9, r4
> -       vadd.i32        q4, q4, q8
> -       vadd.i32        q5, q5, q9
> -       vdup.32         q8, r5
> -       vdup.32         q9, r6
> -       vadd.i32        q6, q6, q8
> -       vadd.i32        q7, q7, q9
> -
> -       // interleave 32-bit words in state n, n+1
> -       vzip.32         q0, q1
> -       vzip.32         q2, q3
> -       vzip.32         q4, q5
> -       vzip.32         q6, q7
> -
> -       // interleave 64-bit words in state n, n+2
> -       vswp            d1, d4
> -       vswp            d3, d6
> -       vswp            d9, d12
> -       vswp            d11, d14
> -
> -       // xor with corresponding input, write to output
> -       vld1.8          {q8-q9}, [r2]!
> -       veor            q8, q8, q0
> -       veor            q9, q9, q4
> -       vst1.8          {q8-q9}, [r1]!
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x8[0-3] += s2[0]
> -       // x9[0-3] += s2[1]
> -       // x10[0-3] += s2[2]
> -       // x11[0-3] += s2[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q0, r3
> -       vdup.32         q4, r4
> -       vadd.i32        q8, q8, q0
> -       vadd.i32        q9, q9, q4
> -       vdup.32         q0, r5
> -       vdup.32         q4, r6
> -       vadd.i32        q10, q10, q0
> -       vadd.i32        q11, q11, q4
> -
> -       // x12[0-3] += s3[0]
> -       // x13[0-3] += s3[1]
> -       // x14[0-3] += s3[2]
> -       // x15[0-3] += s3[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q0, r3
> -       vdup.32         q4, r4
> -       adr             r3, CTRINC
> -       vadd.i32        q12, q12, q0
> -       vld1.32         {q0}, [r3, :128]
> -       vadd.i32        q13, q13, q4
> -       vadd.i32        q12, q12, q0            // x12 += counter values 0-3
> -
> -       vdup.32         q0, r5
> -       vdup.32         q4, r6
> -       vadd.i32        q14, q14, q0
> -       vadd.i32        q15, q15, q4
> -
> -       // interleave 32-bit words in state n, n+1
> -       vzip.32         q8, q9
> -       vzip.32         q10, q11
> -       vzip.32         q12, q13
> -       vzip.32         q14, q15
> -
> -       // interleave 64-bit words in state n, n+2
> -       vswp            d17, d20
> -       vswp            d19, d22
> -       vswp            d25, d28
> -       vswp            d27, d30
> -
> -       vmov            q4, q1
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q8
> -       veor            q1, q1, q12
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q2
> -       veor            q1, q1, q6
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q10
> -       veor            q1, q1, q14
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q4
> -       veor            q1, q1, q5
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q9
> -       veor            q1, q1, q13
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q3
> -       veor            q1, q1, q7
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]
> -       veor            q0, q0, q11
> -       veor            q1, q1, q15
> -       vst1.8          {q0-q1}, [r1]
> -
> -       mov             sp, ip
> -       pop             {r4-r6, pc}
> -ENDPROC(chacha20_4block_xor_neon)
> -
> -       .align          4
> -CTRINC:        .word           0, 1, 2, 3
> diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
> deleted file mode 100644
> index 59a7be08e80c..000000000000
> --- a/arch/arm/crypto/chacha20-neon-glue.c
> +++ /dev/null
> @@ -1,127 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -
> -#include <asm/hwcap.h>
> -#include <asm/neon.h>
> -#include <asm/simd.h>
> -
> -asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -
> -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               chacha20_4block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_neon(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -}
> -
> -static int chacha20_neon(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       kernel_neon_begin();
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -       kernel_neon_end();
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-neon",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .walksize               = 4 * CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_neon,
> -       .decrypt                = chacha20_neon,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!(elf_hwcap & HWCAP_NEON))
> -               return -ENODEV;
> -
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
> -MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS_CRYPTO("chacha20");
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index db8d364f8476..6cc3c8a0ad88 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -709,5 +709,4 @@ CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
>  CONFIG_CRYPTO_CRC32_ARM64_CE=m
>  CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
>  CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRYPTO_AES_ARM64_BS=m
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index e3fdb0fd6f70..9db6d775a880 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -105,12 +105,6 @@ config CRYPTO_AES_ARM64_NEON_BLK
>         select CRYPTO_AES
>         select CRYPTO_SIMD
>
> -config CRYPTO_CHACHA20_NEON
> -       tristate "NEON accelerated ChaCha20 symmetric cipher"
> -       depends on KERNEL_MODE_NEON
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -
>  config CRYPTO_AES_ARM64_BS
>         tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
>         depends on KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index bcafd016618e..507c4bfb86e3 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -53,9 +53,6 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
>  sha512-arm64-y := sha512-glue.o sha512-core.o
>
> -obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> -chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> -
>  obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>  speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
> diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
> deleted file mode 100644
> index 13c85e272c2a..000000000000
> --- a/arch/arm64/crypto/chacha20-neon-core.S
> +++ /dev/null
> @@ -1,450 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -       .text
> -       .align          6
> -
> -ENTRY(chacha20_block_xor_neon)
> -       // x0: Input state matrix, s
> -       // x1: 1 data block output, o
> -       // x2: 1 data block input, i
> -
> -       //
> -       // This function encrypts one ChaCha20 block by loading the state matrix
> -       // in four NEON registers. It performs matrix operation on four words in
> -       // parallel, but requires shuffling to rearrange the words after each
> -       // round.
> -       //
> -
> -       // x0..3 = s0..3
> -       adr             x3, ROT8
> -       ld1             {v0.4s-v3.4s}, [x0]
> -       ld1             {v8.4s-v11.4s}, [x0]
> -       ld1             {v12.4s}, [x3]
> -
> -       mov             x3, #10
> -
> -.Ldoubleround:
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       rev32           v3.8h, v3.8h
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #12
> -       sri             v1.4s, v4.4s, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       tbl             v3.16b, {v3.16b}, v12.16b
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #7
> -       sri             v1.4s, v4.4s, #25
> -
> -       // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       ext             v1.16b, v1.16b, v1.16b, #4
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       ext             v2.16b, v2.16b, v2.16b, #8
> -       // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       ext             v3.16b, v3.16b, v3.16b, #12
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       rev32           v3.8h, v3.8h
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #12
> -       sri             v1.4s, v4.4s, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       tbl             v3.16b, {v3.16b}, v12.16b
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #7
> -       sri             v1.4s, v4.4s, #25
> -
> -       // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       ext             v1.16b, v1.16b, v1.16b, #12
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       ext             v2.16b, v2.16b, v2.16b, #8
> -       // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       ext             v3.16b, v3.16b, v3.16b, #4
> -
> -       subs            x3, x3, #1
> -       b.ne            .Ldoubleround
> -
> -       ld1             {v4.16b-v7.16b}, [x2]
> -
> -       // o0 = i0 ^ (x0 + s0)
> -       add             v0.4s, v0.4s, v8.4s
> -       eor             v0.16b, v0.16b, v4.16b
> -
> -       // o1 = i1 ^ (x1 + s1)
> -       add             v1.4s, v1.4s, v9.4s
> -       eor             v1.16b, v1.16b, v5.16b
> -
> -       // o2 = i2 ^ (x2 + s2)
> -       add             v2.4s, v2.4s, v10.4s
> -       eor             v2.16b, v2.16b, v6.16b
> -
> -       // o3 = i3 ^ (x3 + s3)
> -       add             v3.4s, v3.4s, v11.4s
> -       eor             v3.16b, v3.16b, v7.16b
> -
> -       st1             {v0.16b-v3.16b}, [x1]
> -
> -       ret
> -ENDPROC(chacha20_block_xor_neon)
> -
> -       .align          6
> -ENTRY(chacha20_4block_xor_neon)
> -       // x0: Input state matrix, s
> -       // x1: 4 data blocks output, o
> -       // x2: 4 data blocks input, i
> -
> -       //
> -       // This function encrypts four consecutive ChaCha20 blocks by loading
> -       // the state matrix in NEON registers four times. The algorithm performs
> -       // each operation on the corresponding word of each state matrix, hence
> -       // requires no word shuffling. For final XORing step we transpose the
> -       // matrix by interleaving 32- and then 64-bit words, which allows us to
> -       // do XOR in NEON registers.
> -       //
> -       adr             x3, CTRINC              // ... and ROT8
> -       ld1             {v30.4s-v31.4s}, [x3]
> -
> -       // x0..15[0-3] = s0..3[0..3]
> -       mov             x4, x0
> -       ld4r            { v0.4s- v3.4s}, [x4], #16
> -       ld4r            { v4.4s- v7.4s}, [x4], #16
> -       ld4r            { v8.4s-v11.4s}, [x4], #16
> -       ld4r            {v12.4s-v15.4s}, [x4]
> -
> -       // x12 += counter values 0-3
> -       add             v12.4s, v12.4s, v30.4s
> -
> -       mov             x3, #10
> -
> -.Ldoubleround4:
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       add             v0.4s, v0.4s, v4.4s
> -       add             v1.4s, v1.4s, v5.4s
> -       add             v2.4s, v2.4s, v6.4s
> -       add             v3.4s, v3.4s, v7.4s
> -
> -       eor             v12.16b, v12.16b, v0.16b
> -       eor             v13.16b, v13.16b, v1.16b
> -       eor             v14.16b, v14.16b, v2.16b
> -       eor             v15.16b, v15.16b, v3.16b
> -
> -       rev32           v12.8h, v12.8h
> -       rev32           v13.8h, v13.8h
> -       rev32           v14.8h, v14.8h
> -       rev32           v15.8h, v15.8h
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       add             v8.4s, v8.4s, v12.4s
> -       add             v9.4s, v9.4s, v13.4s
> -       add             v10.4s, v10.4s, v14.4s
> -       add             v11.4s, v11.4s, v15.4s
> -
> -       eor             v16.16b, v4.16b, v8.16b
> -       eor             v17.16b, v5.16b, v9.16b
> -       eor             v18.16b, v6.16b, v10.16b
> -       eor             v19.16b, v7.16b, v11.16b
> -
> -       shl             v4.4s, v16.4s, #12
> -       shl             v5.4s, v17.4s, #12
> -       shl             v6.4s, v18.4s, #12
> -       shl             v7.4s, v19.4s, #12
> -
> -       sri             v4.4s, v16.4s, #20
> -       sri             v5.4s, v17.4s, #20
> -       sri             v6.4s, v18.4s, #20
> -       sri             v7.4s, v19.4s, #20
> -
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       add             v0.4s, v0.4s, v4.4s
> -       add             v1.4s, v1.4s, v5.4s
> -       add             v2.4s, v2.4s, v6.4s
> -       add             v3.4s, v3.4s, v7.4s
> -
> -       eor             v12.16b, v12.16b, v0.16b
> -       eor             v13.16b, v13.16b, v1.16b
> -       eor             v14.16b, v14.16b, v2.16b
> -       eor             v15.16b, v15.16b, v3.16b
> -
> -       tbl             v12.16b, {v12.16b}, v31.16b
> -       tbl             v13.16b, {v13.16b}, v31.16b
> -       tbl             v14.16b, {v14.16b}, v31.16b
> -       tbl             v15.16b, {v15.16b}, v31.16b
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       add             v8.4s, v8.4s, v12.4s
> -       add             v9.4s, v9.4s, v13.4s
> -       add             v10.4s, v10.4s, v14.4s
> -       add             v11.4s, v11.4s, v15.4s
> -
> -       eor             v16.16b, v4.16b, v8.16b
> -       eor             v17.16b, v5.16b, v9.16b
> -       eor             v18.16b, v6.16b, v10.16b
> -       eor             v19.16b, v7.16b, v11.16b
> -
> -       shl             v4.4s, v16.4s, #7
> -       shl             v5.4s, v17.4s, #7
> -       shl             v6.4s, v18.4s, #7
> -       shl             v7.4s, v19.4s, #7
> -
> -       sri             v4.4s, v16.4s, #25
> -       sri             v5.4s, v17.4s, #25
> -       sri             v6.4s, v18.4s, #25
> -       sri             v7.4s, v19.4s, #25
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       add             v0.4s, v0.4s, v5.4s
> -       add             v1.4s, v1.4s, v6.4s
> -       add             v2.4s, v2.4s, v7.4s
> -       add             v3.4s, v3.4s, v4.4s
> -
> -       eor             v15.16b, v15.16b, v0.16b
> -       eor             v12.16b, v12.16b, v1.16b
> -       eor             v13.16b, v13.16b, v2.16b
> -       eor             v14.16b, v14.16b, v3.16b
> -
> -       rev32           v15.8h, v15.8h
> -       rev32           v12.8h, v12.8h
> -       rev32           v13.8h, v13.8h
> -       rev32           v14.8h, v14.8h
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       add             v10.4s, v10.4s, v15.4s
> -       add             v11.4s, v11.4s, v12.4s
> -       add             v8.4s, v8.4s, v13.4s
> -       add             v9.4s, v9.4s, v14.4s
> -
> -       eor             v16.16b, v5.16b, v10.16b
> -       eor             v17.16b, v6.16b, v11.16b
> -       eor             v18.16b, v7.16b, v8.16b
> -       eor             v19.16b, v4.16b, v9.16b
> -
> -       shl             v5.4s, v16.4s, #12
> -       shl             v6.4s, v17.4s, #12
> -       shl             v7.4s, v18.4s, #12
> -       shl             v4.4s, v19.4s, #12
> -
> -       sri             v5.4s, v16.4s, #20
> -       sri             v6.4s, v17.4s, #20
> -       sri             v7.4s, v18.4s, #20
> -       sri             v4.4s, v19.4s, #20
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       add             v0.4s, v0.4s, v5.4s
> -       add             v1.4s, v1.4s, v6.4s
> -       add             v2.4s, v2.4s, v7.4s
> -       add             v3.4s, v3.4s, v4.4s
> -
> -       eor             v15.16b, v15.16b, v0.16b
> -       eor             v12.16b, v12.16b, v1.16b
> -       eor             v13.16b, v13.16b, v2.16b
> -       eor             v14.16b, v14.16b, v3.16b
> -
> -       tbl             v15.16b, {v15.16b}, v31.16b
> -       tbl             v12.16b, {v12.16b}, v31.16b
> -       tbl             v13.16b, {v13.16b}, v31.16b
> -       tbl             v14.16b, {v14.16b}, v31.16b
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       add             v10.4s, v10.4s, v15.4s
> -       add             v11.4s, v11.4s, v12.4s
> -       add             v8.4s, v8.4s, v13.4s
> -       add             v9.4s, v9.4s, v14.4s
> -
> -       eor             v16.16b, v5.16b, v10.16b
> -       eor             v17.16b, v6.16b, v11.16b
> -       eor             v18.16b, v7.16b, v8.16b
> -       eor             v19.16b, v4.16b, v9.16b
> -
> -       shl             v5.4s, v16.4s, #7
> -       shl             v6.4s, v17.4s, #7
> -       shl             v7.4s, v18.4s, #7
> -       shl             v4.4s, v19.4s, #7
> -
> -       sri             v5.4s, v16.4s, #25
> -       sri             v6.4s, v17.4s, #25
> -       sri             v7.4s, v18.4s, #25
> -       sri             v4.4s, v19.4s, #25
> -
> -       subs            x3, x3, #1
> -       b.ne            .Ldoubleround4
> -
> -       ld4r            {v16.4s-v19.4s}, [x0], #16
> -       ld4r            {v20.4s-v23.4s}, [x0], #16
> -
> -       // x12 += counter values 0-3
> -       add             v12.4s, v12.4s, v30.4s
> -
> -       // x0[0-3] += s0[0]
> -       // x1[0-3] += s0[1]
> -       // x2[0-3] += s0[2]
> -       // x3[0-3] += s0[3]
> -       add             v0.4s, v0.4s, v16.4s
> -       add             v1.4s, v1.4s, v17.4s
> -       add             v2.4s, v2.4s, v18.4s
> -       add             v3.4s, v3.4s, v19.4s
> -
> -       ld4r            {v24.4s-v27.4s}, [x0], #16
> -       ld4r            {v28.4s-v31.4s}, [x0]
> -
> -       // x4[0-3] += s1[0]
> -       // x5[0-3] += s1[1]
> -       // x6[0-3] += s1[2]
> -       // x7[0-3] += s1[3]
> -       add             v4.4s, v4.4s, v20.4s
> -       add             v5.4s, v5.4s, v21.4s
> -       add             v6.4s, v6.4s, v22.4s
> -       add             v7.4s, v7.4s, v23.4s
> -
> -       // x8[0-3] += s2[0]
> -       // x9[0-3] += s2[1]
> -       // x10[0-3] += s2[2]
> -       // x11[0-3] += s2[3]
> -       add             v8.4s, v8.4s, v24.4s
> -       add             v9.4s, v9.4s, v25.4s
> -       add             v10.4s, v10.4s, v26.4s
> -       add             v11.4s, v11.4s, v27.4s
> -
> -       // x12[0-3] += s3[0]
> -       // x13[0-3] += s3[1]
> -       // x14[0-3] += s3[2]
> -       // x15[0-3] += s3[3]
> -       add             v12.4s, v12.4s, v28.4s
> -       add             v13.4s, v13.4s, v29.4s
> -       add             v14.4s, v14.4s, v30.4s
> -       add             v15.4s, v15.4s, v31.4s
> -
> -       // interleave 32-bit words in state n, n+1
> -       zip1            v16.4s, v0.4s, v1.4s
> -       zip2            v17.4s, v0.4s, v1.4s
> -       zip1            v18.4s, v2.4s, v3.4s
> -       zip2            v19.4s, v2.4s, v3.4s
> -       zip1            v20.4s, v4.4s, v5.4s
> -       zip2            v21.4s, v4.4s, v5.4s
> -       zip1            v22.4s, v6.4s, v7.4s
> -       zip2            v23.4s, v6.4s, v7.4s
> -       zip1            v24.4s, v8.4s, v9.4s
> -       zip2            v25.4s, v8.4s, v9.4s
> -       zip1            v26.4s, v10.4s, v11.4s
> -       zip2            v27.4s, v10.4s, v11.4s
> -       zip1            v28.4s, v12.4s, v13.4s
> -       zip2            v29.4s, v12.4s, v13.4s
> -       zip1            v30.4s, v14.4s, v15.4s
> -       zip2            v31.4s, v14.4s, v15.4s
> -
> -       // interleave 64-bit words in state n, n+2
> -       zip1            v0.2d, v16.2d, v18.2d
> -       zip2            v4.2d, v16.2d, v18.2d
> -       zip1            v8.2d, v17.2d, v19.2d
> -       zip2            v12.2d, v17.2d, v19.2d
> -       ld1             {v16.16b-v19.16b}, [x2], #64
> -
> -       zip1            v1.2d, v20.2d, v22.2d
> -       zip2            v5.2d, v20.2d, v22.2d
> -       zip1            v9.2d, v21.2d, v23.2d
> -       zip2            v13.2d, v21.2d, v23.2d
> -       ld1             {v20.16b-v23.16b}, [x2], #64
> -
> -       zip1            v2.2d, v24.2d, v26.2d
> -       zip2            v6.2d, v24.2d, v26.2d
> -       zip1            v10.2d, v25.2d, v27.2d
> -       zip2            v14.2d, v25.2d, v27.2d
> -       ld1             {v24.16b-v27.16b}, [x2], #64
> -
> -       zip1            v3.2d, v28.2d, v30.2d
> -       zip2            v7.2d, v28.2d, v30.2d
> -       zip1            v11.2d, v29.2d, v31.2d
> -       zip2            v15.2d, v29.2d, v31.2d
> -       ld1             {v28.16b-v31.16b}, [x2]
> -
> -       // xor with corresponding input, write to output
> -       eor             v16.16b, v16.16b, v0.16b
> -       eor             v17.16b, v17.16b, v1.16b
> -       eor             v18.16b, v18.16b, v2.16b
> -       eor             v19.16b, v19.16b, v3.16b
> -       eor             v20.16b, v20.16b, v4.16b
> -       eor             v21.16b, v21.16b, v5.16b
> -       st1             {v16.16b-v19.16b}, [x1], #64
> -       eor             v22.16b, v22.16b, v6.16b
> -       eor             v23.16b, v23.16b, v7.16b
> -       eor             v24.16b, v24.16b, v8.16b
> -       eor             v25.16b, v25.16b, v9.16b
> -       st1             {v20.16b-v23.16b}, [x1], #64
> -       eor             v26.16b, v26.16b, v10.16b
> -       eor             v27.16b, v27.16b, v11.16b
> -       eor             v28.16b, v28.16b, v12.16b
> -       st1             {v24.16b-v27.16b}, [x1], #64
> -       eor             v29.16b, v29.16b, v13.16b
> -       eor             v30.16b, v30.16b, v14.16b
> -       eor             v31.16b, v31.16b, v15.16b
> -       st1             {v28.16b-v31.16b}, [x1]
> -
> -       ret
> -ENDPROC(chacha20_4block_xor_neon)
> -
> -CTRINC:        .word           0, 1, 2, 3
> -ROT8:  .word           0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
> diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
> deleted file mode 100644
> index 727579c93ded..000000000000
> --- a/arch/arm64/crypto/chacha20-neon-glue.c
> +++ /dev/null
> @@ -1,133 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
> - *
> - * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -
> -#include <asm/hwcap.h>
> -#include <asm/neon.h>
> -#include <asm/simd.h>
> -
> -asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -
> -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               kernel_neon_begin();
> -               chacha20_4block_xor_neon(state, dst, src);
> -               kernel_neon_end();
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -
> -       if (!bytes)
> -               return;
> -
> -       kernel_neon_begin();
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_neon(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -       kernel_neon_end();
> -}
> -
> -static int chacha20_neon(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, false);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-neon",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .walksize               = 4 * CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_neon,
> -       .decrypt                = chacha20_neon,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!(elf_hwcap & HWCAP_ASIMD))
> -               return -ENODEV;
> -
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
> -MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS_CRYPTO("chacha20");
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index cf830219846b..419212c31246 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -23,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
>  obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
>  obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
>  obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
> -obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
>  obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
>  obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
>  obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
> @@ -76,7 +75,6 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
>  blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
>  twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
>  twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
> -chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
>  serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
>
>  aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
> @@ -99,7 +97,6 @@ endif
>
>  ifeq ($(avx2_supported),yes)
>         camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
> -       chacha20-x86_64-y += chacha20-avx2-x86_64.o
>         serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
>
>         morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
> diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2-x86_64.S
> deleted file mode 100644
> index f3cd26f48332..000000000000
> --- a/arch/x86/crypto/chacha20-avx2-x86_64.S
> +++ /dev/null
> @@ -1,448 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -.section       .rodata.cst32.ROT8, "aM", @progbits, 32
> -.align 32
> -ROT8:  .octa 0x0e0d0c0f0a09080b0605040702010003
> -       .octa 0x0e0d0c0f0a09080b0605040702010003
> -
> -.section       .rodata.cst32.ROT16, "aM", @progbits, 32
> -.align 32
> -ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
> -       .octa 0x0d0c0f0e09080b0a0504070601000302
> -
> -.section       .rodata.cst32.CTRINC, "aM", @progbits, 32
> -.align 32
> -CTRINC:        .octa 0x00000003000000020000000100000000
> -       .octa 0x00000007000000060000000500000004
> -
> -.text
> -
> -ENTRY(chacha20_8block_xor_avx2)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 8 data blocks output, o
> -       # %rdx: 8 data blocks input, i
> -
> -       # This function encrypts eight consecutive ChaCha20 blocks by loading
> -       # the state matrix in AVX registers eight times. As we need some
> -       # scratch registers, we save the first four registers on the stack. The
> -       # algorithm performs each operation on the corresponding word of each
> -       # state matrix, hence requires no word shuffling. For final XORing step
> -       # we transpose the matrix by interleaving 32-, 64- and then 128-bit
> -       # words, which allows us to do XOR in AVX registers. 8/16-bit word
> -       # rotation is done with the slightly better performing byte shuffling,
> -       # 7/12-bit word rotation uses traditional shift+OR.
> -
> -       vzeroupper
> -       # 4 * 32 byte stack, 32-byte aligned
> -       lea             8(%rsp),%r10
> -       and             $~31, %rsp
> -       sub             $0x80, %rsp
> -
> -       # x0..15[0-7] = s[0..15]
> -       vpbroadcastd    0x00(%rdi),%ymm0
> -       vpbroadcastd    0x04(%rdi),%ymm1
> -       vpbroadcastd    0x08(%rdi),%ymm2
> -       vpbroadcastd    0x0c(%rdi),%ymm3
> -       vpbroadcastd    0x10(%rdi),%ymm4
> -       vpbroadcastd    0x14(%rdi),%ymm5
> -       vpbroadcastd    0x18(%rdi),%ymm6
> -       vpbroadcastd    0x1c(%rdi),%ymm7
> -       vpbroadcastd    0x20(%rdi),%ymm8
> -       vpbroadcastd    0x24(%rdi),%ymm9
> -       vpbroadcastd    0x28(%rdi),%ymm10
> -       vpbroadcastd    0x2c(%rdi),%ymm11
> -       vpbroadcastd    0x30(%rdi),%ymm12
> -       vpbroadcastd    0x34(%rdi),%ymm13
> -       vpbroadcastd    0x38(%rdi),%ymm14
> -       vpbroadcastd    0x3c(%rdi),%ymm15
> -       # x0..3 on stack
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         %ymm3,0x60(%rsp)
> -
> -       vmovdqa         CTRINC(%rip),%ymm1
> -       vmovdqa         ROT8(%rip),%ymm2
> -       vmovdqa         ROT16(%rip),%ymm3
> -
> -       # x12 += counter values 0-3
> -       vpaddd          %ymm1,%ymm12,%ymm12
> -
> -       mov             $10,%ecx
> -
> -.Ldoubleround8:
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       vpaddd          0x00(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm3,%ymm12,%ymm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       vpaddd          0x20(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm3,%ymm13,%ymm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       vpaddd          0x40(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm3,%ymm14,%ymm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       vpaddd          0x60(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm3,%ymm15,%ymm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       vpaddd          %ymm12,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm4,%ymm4
> -       vpslld          $12,%ymm4,%ymm0
> -       vpsrld          $20,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       vpaddd          %ymm13,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm5,%ymm5
> -       vpslld          $12,%ymm5,%ymm0
> -       vpsrld          $20,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       vpaddd          %ymm14,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm6,%ymm6
> -       vpslld          $12,%ymm6,%ymm0
> -       vpsrld          $20,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       vpaddd          %ymm15,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm7,%ymm7
> -       vpslld          $12,%ymm7,%ymm0
> -       vpsrld          $20,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       vpaddd          0x00(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm2,%ymm12,%ymm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       vpaddd          0x20(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm2,%ymm13,%ymm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       vpaddd          0x40(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm2,%ymm14,%ymm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       vpaddd          0x60(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm2,%ymm15,%ymm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       vpaddd          %ymm12,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm4,%ymm4
> -       vpslld          $7,%ymm4,%ymm0
> -       vpsrld          $25,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       vpaddd          %ymm13,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm5,%ymm5
> -       vpslld          $7,%ymm5,%ymm0
> -       vpsrld          $25,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       vpaddd          %ymm14,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm6,%ymm6
> -       vpslld          $7,%ymm6,%ymm0
> -       vpsrld          $25,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       vpaddd          %ymm15,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm7,%ymm7
> -       vpslld          $7,%ymm7,%ymm0
> -       vpsrld          $25,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       vpaddd          0x00(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm3,%ymm15,%ymm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       vpaddd          0x20(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm3,%ymm12,%ymm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       vpaddd          0x40(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm3,%ymm13,%ymm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       vpaddd          0x60(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm3,%ymm14,%ymm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       vpaddd          %ymm15,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm5,%ymm5
> -       vpslld          $12,%ymm5,%ymm0
> -       vpsrld          $20,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       vpaddd          %ymm12,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm6,%ymm6
> -       vpslld          $12,%ymm6,%ymm0
> -       vpsrld          $20,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       vpaddd          %ymm13,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm7,%ymm7
> -       vpslld          $12,%ymm7,%ymm0
> -       vpsrld          $20,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       vpaddd          %ymm14,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm4,%ymm4
> -       vpslld          $12,%ymm4,%ymm0
> -       vpsrld          $20,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       vpaddd          0x00(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm2,%ymm15,%ymm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       vpaddd          0x20(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm2,%ymm12,%ymm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       vpaddd          0x40(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm2,%ymm13,%ymm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       vpaddd          0x60(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm2,%ymm14,%ymm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       vpaddd          %ymm15,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm5,%ymm5
> -       vpslld          $7,%ymm5,%ymm0
> -       vpsrld          $25,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       vpaddd          %ymm12,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm6,%ymm6
> -       vpslld          $7,%ymm6,%ymm0
> -       vpsrld          $25,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       vpaddd          %ymm13,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm7,%ymm7
> -       vpslld          $7,%ymm7,%ymm0
> -       vpsrld          $25,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       vpaddd          %ymm14,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm4,%ymm4
> -       vpslld          $7,%ymm4,%ymm0
> -       vpsrld          $25,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround8
> -
> -       # x0..15[0-3] += s[0..15]
> -       vpbroadcastd    0x00(%rdi),%ymm0
> -       vpaddd          0x00(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpbroadcastd    0x04(%rdi),%ymm0
> -       vpaddd          0x20(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpbroadcastd    0x08(%rdi),%ymm0
> -       vpaddd          0x40(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpbroadcastd    0x0c(%rdi),%ymm0
> -       vpaddd          0x60(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpbroadcastd    0x10(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm4,%ymm4
> -       vpbroadcastd    0x14(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm5,%ymm5
> -       vpbroadcastd    0x18(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm6,%ymm6
> -       vpbroadcastd    0x1c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm7,%ymm7
> -       vpbroadcastd    0x20(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm8,%ymm8
> -       vpbroadcastd    0x24(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm9,%ymm9
> -       vpbroadcastd    0x28(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm10,%ymm10
> -       vpbroadcastd    0x2c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm11,%ymm11
> -       vpbroadcastd    0x30(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm12,%ymm12
> -       vpbroadcastd    0x34(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm13,%ymm13
> -       vpbroadcastd    0x38(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm14,%ymm14
> -       vpbroadcastd    0x3c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm15,%ymm15
> -
> -       # x12 += counter values 0-3
> -       vpaddd          %ymm1,%ymm12,%ymm12
> -
> -       # interleave 32-bit words in state n, n+1
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vmovdqa         0x20(%rsp),%ymm1
> -       vpunpckldq      %ymm1,%ymm0,%ymm2
> -       vpunpckhdq      %ymm1,%ymm0,%ymm1
> -       vmovdqa         %ymm2,0x00(%rsp)
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vmovdqa         0x60(%rsp),%ymm1
> -       vpunpckldq      %ymm1,%ymm0,%ymm2
> -       vpunpckhdq      %ymm1,%ymm0,%ymm1
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         %ymm1,0x60(%rsp)
> -       vmovdqa         %ymm4,%ymm0
> -       vpunpckldq      %ymm5,%ymm0,%ymm4
> -       vpunpckhdq      %ymm5,%ymm0,%ymm5
> -       vmovdqa         %ymm6,%ymm0
> -       vpunpckldq      %ymm7,%ymm0,%ymm6
> -       vpunpckhdq      %ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm8,%ymm0
> -       vpunpckldq      %ymm9,%ymm0,%ymm8
> -       vpunpckhdq      %ymm9,%ymm0,%ymm9
> -       vmovdqa         %ymm10,%ymm0
> -       vpunpckldq      %ymm11,%ymm0,%ymm10
> -       vpunpckhdq      %ymm11,%ymm0,%ymm11
> -       vmovdqa         %ymm12,%ymm0
> -       vpunpckldq      %ymm13,%ymm0,%ymm12
> -       vpunpckhdq      %ymm13,%ymm0,%ymm13
> -       vmovdqa         %ymm14,%ymm0
> -       vpunpckldq      %ymm15,%ymm0,%ymm14
> -       vpunpckhdq      %ymm15,%ymm0,%ymm15
> -
> -       # interleave 64-bit words in state n, n+2
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vmovdqa         0x40(%rsp),%ymm2
> -       vpunpcklqdq     %ymm2,%ymm0,%ymm1
> -       vpunpckhqdq     %ymm2,%ymm0,%ymm2
> -       vmovdqa         %ymm1,0x00(%rsp)
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vmovdqa         0x60(%rsp),%ymm2
> -       vpunpcklqdq     %ymm2,%ymm0,%ymm1
> -       vpunpckhqdq     %ymm2,%ymm0,%ymm2
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         %ymm2,0x60(%rsp)
> -       vmovdqa         %ymm4,%ymm0
> -       vpunpcklqdq     %ymm6,%ymm0,%ymm4
> -       vpunpckhqdq     %ymm6,%ymm0,%ymm6
> -       vmovdqa         %ymm5,%ymm0
> -       vpunpcklqdq     %ymm7,%ymm0,%ymm5
> -       vpunpckhqdq     %ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm8,%ymm0
> -       vpunpcklqdq     %ymm10,%ymm0,%ymm8
> -       vpunpckhqdq     %ymm10,%ymm0,%ymm10
> -       vmovdqa         %ymm9,%ymm0
> -       vpunpcklqdq     %ymm11,%ymm0,%ymm9
> -       vpunpckhqdq     %ymm11,%ymm0,%ymm11
> -       vmovdqa         %ymm12,%ymm0
> -       vpunpcklqdq     %ymm14,%ymm0,%ymm12
> -       vpunpckhqdq     %ymm14,%ymm0,%ymm14
> -       vmovdqa         %ymm13,%ymm0
> -       vpunpcklqdq     %ymm15,%ymm0,%ymm13
> -       vpunpckhqdq     %ymm15,%ymm0,%ymm15
> -
> -       # interleave 128-bit words in state n, n+4
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm4,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm4,%ymm0,%ymm4
> -       vmovdqa         %ymm1,0x00(%rsp)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm5,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm5,%ymm0,%ymm5
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm6,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm6,%ymm0,%ymm6
> -       vmovdqa         %ymm1,0x40(%rsp)
> -       vmovdqa         0x60(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm7,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm1,0x60(%rsp)
> -       vperm2i128      $0x20,%ymm12,%ymm8,%ymm0
> -       vperm2i128      $0x31,%ymm12,%ymm8,%ymm12
> -       vmovdqa         %ymm0,%ymm8
> -       vperm2i128      $0x20,%ymm13,%ymm9,%ymm0
> -       vperm2i128      $0x31,%ymm13,%ymm9,%ymm13
> -       vmovdqa         %ymm0,%ymm9
> -       vperm2i128      $0x20,%ymm14,%ymm10,%ymm0
> -       vperm2i128      $0x31,%ymm14,%ymm10,%ymm14
> -       vmovdqa         %ymm0,%ymm10
> -       vperm2i128      $0x20,%ymm15,%ymm11,%ymm0
> -       vperm2i128      $0x31,%ymm15,%ymm11,%ymm15
> -       vmovdqa         %ymm0,%ymm11
> -
> -       # xor with corresponding input, write to output
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vpxor           0x0000(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0000(%rsi)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vpxor           0x0080(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0080(%rsi)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vpxor           0x0040(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0040(%rsi)
> -       vmovdqa         0x60(%rsp),%ymm0
> -       vpxor           0x00c0(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x00c0(%rsi)
> -       vpxor           0x0100(%rdx),%ymm4,%ymm4
> -       vmovdqu         %ymm4,0x0100(%rsi)
> -       vpxor           0x0180(%rdx),%ymm5,%ymm5
> -       vmovdqu         %ymm5,0x0180(%rsi)
> -       vpxor           0x0140(%rdx),%ymm6,%ymm6
> -       vmovdqu         %ymm6,0x0140(%rsi)
> -       vpxor           0x01c0(%rdx),%ymm7,%ymm7
> -       vmovdqu         %ymm7,0x01c0(%rsi)
> -       vpxor           0x0020(%rdx),%ymm8,%ymm8
> -       vmovdqu         %ymm8,0x0020(%rsi)
> -       vpxor           0x00a0(%rdx),%ymm9,%ymm9
> -       vmovdqu         %ymm9,0x00a0(%rsi)
> -       vpxor           0x0060(%rdx),%ymm10,%ymm10
> -       vmovdqu         %ymm10,0x0060(%rsi)
> -       vpxor           0x00e0(%rdx),%ymm11,%ymm11
> -       vmovdqu         %ymm11,0x00e0(%rsi)
> -       vpxor           0x0120(%rdx),%ymm12,%ymm12
> -       vmovdqu         %ymm12,0x0120(%rsi)
> -       vpxor           0x01a0(%rdx),%ymm13,%ymm13
> -       vmovdqu         %ymm13,0x01a0(%rsi)
> -       vpxor           0x0160(%rdx),%ymm14,%ymm14
> -       vmovdqu         %ymm14,0x0160(%rsi)
> -       vpxor           0x01e0(%rdx),%ymm15,%ymm15
> -       vmovdqu         %ymm15,0x01e0(%rsi)
> -
> -       vzeroupper
> -       lea             -8(%r10),%rsp
> -       ret
> -ENDPROC(chacha20_8block_xor_avx2)
> diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20-ssse3-x86_64.S
> deleted file mode 100644
> index 512a2b500fd1..000000000000
> --- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
> +++ /dev/null
> @@ -1,630 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -.section       .rodata.cst16.ROT8, "aM", @progbits, 16
> -.align 16
> -ROT8:  .octa 0x0e0d0c0f0a09080b0605040702010003
> -.section       .rodata.cst16.ROT16, "aM", @progbits, 16
> -.align 16
> -ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
> -.section       .rodata.cst16.CTRINC, "aM", @progbits, 16
> -.align 16
> -CTRINC:        .octa 0x00000003000000020000000100000000
> -
> -.text
> -
> -ENTRY(chacha20_block_xor_ssse3)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 1 data block output, o
> -       # %rdx: 1 data block input, i
> -
> -       # This function encrypts one ChaCha20 block by loading the state matrix
> -       # in four SSE registers. It performs matrix operation on four words in
> -       # parallel, but requires shuffling to rearrange the words after each
> -       # round. 8/16-bit word rotation is done with the slightly better
> -       # performing SSSE3 byte shuffling, 7/12-bit word rotation uses
> -       # traditional shift+OR.
> -
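
For anyone reading along, the byte-shuffle rotation described in the comment
above can be modelled in portable C. This is only a sketch, not the kernel
code: the function names are made up for illustration, and the two tables are
my reading of the ROT8 and ROT16 constants quoted above, listed from the least
significant byte of each .octa upward. It assumes pshufb semantics of
dst[i] = src[mask[i]] for mask bytes with the high bit clear.

```c
#include <stdint.h>

/* Byte-index tables read off the quoted ROT8/ROT16 constants (LSB first). */
static const uint8_t rot8_mask[16]  = { 3, 0, 1, 2,  7, 4, 5, 6,
                                        11, 8, 9, 10, 15, 12, 13, 14 };
static const uint8_t rot16_mask[16] = { 2, 3, 0, 1,  6, 7, 4, 5,
                                        10, 11, 8, 9, 14, 15, 12, 13 };

/* Reference rotation built from shift+OR, valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
	return (v << n) | (v >> (32 - n));
}

/* Scalar model of a 16-byte shuffle: dst[i] = src[mask[i]]. */
static void shuffle_bytes(uint8_t dst[16], const uint8_t src[16],
			  const uint8_t mask[16])
{
	for (int i = 0; i < 16; i++)
		dst[i] = src[mask[i] & 0x0f];
}

/* Hypothetical helper: rotate one word by running it through the shuffle. */
static uint32_t rotl32_by_shuffle(uint32_t v, const uint8_t mask[16])
{
	uint8_t src[16], dst[16];

	/* Replicate v into all four lanes, stored little-endian. */
	for (int lane = 0; lane < 4; lane++)
		for (int b = 0; b < 4; b++)
			src[4 * lane + b] = (uint8_t)(v >> (8 * b));

	shuffle_bytes(dst, src, mask);

	/* Read lane 0 back as a little-endian word. */
	return (uint32_t)dst[0] | (uint32_t)dst[1] << 8 |
	       (uint32_t)dst[2] << 16 | (uint32_t)dst[3] << 24;
}
```

Rotations by 8 and 16 move whole bytes, so one shuffle can replace the
shift/shift/OR sequence. Rotations by 7 and 12 cross byte boundaries, which is
why the code falls back to shift+OR for those.
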
> -       # x0..3 = s0..3
> -       movdqa          0x00(%rdi),%xmm0
> -       movdqa          0x10(%rdi),%xmm1
> -       movdqa          0x20(%rdi),%xmm2
> -       movdqa          0x30(%rdi),%xmm3
> -       movdqa          %xmm0,%xmm8
> -       movdqa          %xmm1,%xmm9
> -       movdqa          %xmm2,%xmm10
> -       movdqa          %xmm3,%xmm11
> -
> -       movdqa          ROT8(%rip),%xmm4
> -       movdqa          ROT16(%rip),%xmm5
> -
> -       mov             $10,%ecx
> -
> -.Ldoubleround:
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm5,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm6
> -       pslld           $12,%xmm6
> -       psrld           $20,%xmm1
> -       por             %xmm6,%xmm1
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm4,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm7
> -       pslld           $7,%xmm7
> -       psrld           $25,%xmm1
> -       por             %xmm7,%xmm1
> -
> -       # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       pshufd          $0x39,%xmm1,%xmm1
> -       # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       pshufd          $0x4e,%xmm2,%xmm2
> -       # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       pshufd          $0x93,%xmm3,%xmm3
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm5,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm6
> -       pslld           $12,%xmm6
> -       psrld           $20,%xmm1
> -       por             %xmm6,%xmm1
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm4,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm7
> -       pslld           $7,%xmm7
> -       psrld           $25,%xmm1
> -       por             %xmm7,%xmm1
> -
> -       # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       pshufd          $0x93,%xmm1,%xmm1
> -       # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       pshufd          $0x4e,%xmm2,%xmm2
> -       # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       pshufd          $0x39,%xmm3,%xmm3
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround
> -
> -       # o0 = i0 ^ (x0 + s0)
> -       movdqu          0x00(%rdx),%xmm4
> -       paddd           %xmm8,%xmm0
> -       pxor            %xmm4,%xmm0
> -       movdqu          %xmm0,0x00(%rsi)
> -       # o1 = i1 ^ (x1 + s1)
> -       movdqu          0x10(%rdx),%xmm5
> -       paddd           %xmm9,%xmm1
> -       pxor            %xmm5,%xmm1
> -       movdqu          %xmm1,0x10(%rsi)
> -       # o2 = i2 ^ (x2 + s2)
> -       movdqu          0x20(%rdx),%xmm6
> -       paddd           %xmm10,%xmm2
> -       pxor            %xmm6,%xmm2
> -       movdqu          %xmm2,0x20(%rsi)
> -       # o3 = i3 ^ (x3 + s3)
> -       movdqu          0x30(%rdx),%xmm7
> -       paddd           %xmm11,%xmm3
> -       pxor            %xmm7,%xmm3
> -       movdqu          %xmm3,0x30(%rsi)
> -
> -       ret
> -ENDPROC(chacha20_block_xor_ssse3)
> -
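
As a reading aid for the loop above, here is a scalar C sketch of the ChaCha
quarter round and of one double round over a 16-word state. The names
quarter_round() and double_round() are mine, not taken from the patch. The
sketch indexes the diagonals directly, where the quoted assembly rotates rows
with pshufd and reuses the column code.

```c
#include <stdint.h>

/* Rotate left, valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on four state words. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d = rotl32(*d ^ *a, 16);
	*c += *d; *b = rotl32(*b ^ *c, 12);
	*a += *b; *d = rotl32(*d ^ *a, 8);
	*c += *d; *b = rotl32(*b ^ *c, 7);
}

/* One double round: four column rounds, then four diagonal rounds. */
static void double_round(uint32_t x[16])
{
	/* Columns. */
	quarter_round(&x[0], &x[4], &x[8],  &x[12]);
	quarter_round(&x[1], &x[5], &x[9],  &x[13]);
	quarter_round(&x[2], &x[6], &x[10], &x[14]);
	quarter_round(&x[3], &x[7], &x[11], &x[15]);
	/* Diagonals. */
	quarter_round(&x[0], &x[5], &x[10], &x[15]);
	quarter_round(&x[1], &x[6], &x[11], &x[12]);
	quarter_round(&x[2], &x[7], &x[8],  &x[13]);
	quarter_round(&x[3], &x[4], &x[9],  &x[14]);
}
```

Calling double_round() ten times corresponds to the mov $10,%ecx loop counter
in the quoted function. The final step of adding the original state and XORing
with the input is not shown here.
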
> -ENTRY(chacha20_4block_xor_ssse3)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 4 data blocks output, o
> -       # %rdx: 4 data blocks input, i
> -
> -       # This function encrypts four consecutive ChaCha20 blocks by loading the
> -       # state matrix in SSE registers four times. As we need some scratch
> -       # registers, we save the first four registers on the stack. The
> -       # algorithm performs each operation on the corresponding word of each
> -       # state matrix, hence requires no word shuffling. For final XORing step
> -       # we transpose the matrix by interleaving 32- and then 64-bit words,
> -       # which allows us to do XOR in SSE registers. 8/16-bit word rotation is
> -       # done with the slightly better performing SSSE3 byte shuffling,
> -       # 7/12-bit word rotation uses traditional shift+OR.
> -
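
The transpose mentioned in the comment above (interleave 32-bit words, then
64-bit words) can be sketched in scalar C as follows. The vec4 type and the
unpack_* helpers are illustrative stand-ins for an SSE register and for the
punpckldq/punpckhdq/punpcklqdq/punpckhqdq instructions, not kernel APIs. The
sketch writes its results in block order for simplicity and does not try to
mirror the register and stack slot assignment used in the quoted code.

```c
#include <stdint.h>

/* Stand-in for one 128-bit register holding four 32-bit words. */
typedef struct { uint32_t w[4]; } vec4;

/* Models of the 32-bit interleaves (low and high halves). */
static vec4 unpack_lo32(vec4 a, vec4 b)
{
	return (vec4){{ a.w[0], b.w[0], a.w[1], b.w[1] }};
}

static vec4 unpack_hi32(vec4 a, vec4 b)
{
	return (vec4){{ a.w[2], b.w[2], a.w[3], b.w[3] }};
}

/* Models of the 64-bit interleaves (low and high halves). */
static vec4 unpack_lo64(vec4 a, vec4 b)
{
	return (vec4){{ a.w[0], a.w[1], b.w[0], b.w[1] }};
}

static vec4 unpack_hi64(vec4 a, vec4 b)
{
	return (vec4){{ a.w[2], a.w[3], b.w[2], b.w[3] }};
}

/*
 * On entry r[n].w[k] is word n of block k; on return r[k].w[n] is
 * word n of block k, so each vector holds 16 contiguous bytes of one block.
 */
static void transpose4x4(vec4 r[4])
{
	vec4 t0 = unpack_lo32(r[0], r[1]);
	vec4 t1 = unpack_hi32(r[0], r[1]);
	vec4 t2 = unpack_lo32(r[2], r[3]);
	vec4 t3 = unpack_hi32(r[2], r[3]);

	r[0] = unpack_lo64(t0, t2);
	r[1] = unpack_hi64(t0, t2);
	r[2] = unpack_lo64(t1, t3);
	r[3] = unpack_hi64(t1, t3);
}
```

After the transpose, each vector can be XORed against 16 contiguous input
bytes of a single block, which is the point of the "allows us to do XOR in SSE
registers" remark.
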
> -       lea             8(%rsp),%r10
> -       sub             $0x80,%rsp
> -       and             $~63,%rsp
> -
> -       # x0..15[0-3] = s0..3[0..3]
> -       movq            0x00(%rdi),%xmm1
> -       pshufd          $0x00,%xmm1,%xmm0
> -       pshufd          $0x55,%xmm1,%xmm1
> -       movq            0x08(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       movq            0x10(%rdi),%xmm5
> -       pshufd          $0x00,%xmm5,%xmm4
> -       pshufd          $0x55,%xmm5,%xmm5
> -       movq            0x18(%rdi),%xmm7
> -       pshufd          $0x00,%xmm7,%xmm6
> -       pshufd          $0x55,%xmm7,%xmm7
> -       movq            0x20(%rdi),%xmm9
> -       pshufd          $0x00,%xmm9,%xmm8
> -       pshufd          $0x55,%xmm9,%xmm9
> -       movq            0x28(%rdi),%xmm11
> -       pshufd          $0x00,%xmm11,%xmm10
> -       pshufd          $0x55,%xmm11,%xmm11
> -       movq            0x30(%rdi),%xmm13
> -       pshufd          $0x00,%xmm13,%xmm12
> -       pshufd          $0x55,%xmm13,%xmm13
> -       movq            0x38(%rdi),%xmm15
> -       pshufd          $0x00,%xmm15,%xmm14
> -       pshufd          $0x55,%xmm15,%xmm15
> -       # x0..3 on stack
> -       movdqa          %xmm0,0x00(%rsp)
> -       movdqa          %xmm1,0x10(%rsp)
> -       movdqa          %xmm2,0x20(%rsp)
> -       movdqa          %xmm3,0x30(%rsp)
> -
> -       movdqa          CTRINC(%rip),%xmm1
> -       movdqa          ROT8(%rip),%xmm2
> -       movdqa          ROT16(%rip),%xmm3
> -
> -       # x12 += counter values 0-3
> -       paddd           %xmm1,%xmm12
> -
> -       mov             $10,%ecx
> -
> -.Ldoubleround4:
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm3,%xmm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm3,%xmm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm3,%xmm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm3,%xmm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       paddd           %xmm12,%xmm8
> -       pxor            %xmm8,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm4
> -       por             %xmm0,%xmm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       paddd           %xmm13,%xmm9
> -       pxor            %xmm9,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm5
> -       por             %xmm0,%xmm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       paddd           %xmm14,%xmm10
> -       pxor            %xmm10,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm6
> -       por             %xmm0,%xmm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       paddd           %xmm15,%xmm11
> -       pxor            %xmm11,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm7
> -       por             %xmm0,%xmm7
> -
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm2,%xmm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm2,%xmm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm2,%xmm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm2,%xmm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       paddd           %xmm12,%xmm8
> -       pxor            %xmm8,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm4
> -       por             %xmm0,%xmm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       paddd           %xmm13,%xmm9
> -       pxor            %xmm9,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm5
> -       por             %xmm0,%xmm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       paddd           %xmm14,%xmm10
> -       pxor            %xmm10,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm6
> -       por             %xmm0,%xmm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       paddd           %xmm15,%xmm11
> -       pxor            %xmm11,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm7
> -       por             %xmm0,%xmm7
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm3,%xmm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm3,%xmm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm3,%xmm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm3,%xmm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       paddd           %xmm15,%xmm10
> -       pxor            %xmm10,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm5
> -       por             %xmm0,%xmm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       paddd           %xmm12,%xmm11
> -       pxor            %xmm11,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm6
> -       por             %xmm0,%xmm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       paddd           %xmm13,%xmm8
> -       pxor            %xmm8,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm7
> -       por             %xmm0,%xmm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       paddd           %xmm14,%xmm9
> -       pxor            %xmm9,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm4
> -       por             %xmm0,%xmm4
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm2,%xmm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm2,%xmm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm2,%xmm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm2,%xmm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       paddd           %xmm15,%xmm10
> -       pxor            %xmm10,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm5
> -       por             %xmm0,%xmm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       paddd           %xmm12,%xmm11
> -       pxor            %xmm11,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm6
> -       por             %xmm0,%xmm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       paddd           %xmm13,%xmm8
> -       pxor            %xmm8,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm7
> -       por             %xmm0,%xmm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       paddd           %xmm14,%xmm9
> -       pxor            %xmm9,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm4
> -       por             %xmm0,%xmm4
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround4
> -
> -       # x0[0-3] += s0[0]
> -       # x1[0-3] += s0[1]
> -       movq            0x00(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           0x00(%rsp),%xmm2
> -       movdqa          %xmm2,0x00(%rsp)
> -       paddd           0x10(%rsp),%xmm3
> -       movdqa          %xmm3,0x10(%rsp)
> -       # x2[0-3] += s0[2]
> -       # x3[0-3] += s0[3]
> -       movq            0x08(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           0x20(%rsp),%xmm2
> -       movdqa          %xmm2,0x20(%rsp)
> -       paddd           0x30(%rsp),%xmm3
> -       movdqa          %xmm3,0x30(%rsp)
> -
> -       # x4[0-3] += s1[0]
> -       # x5[0-3] += s1[1]
> -       movq            0x10(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm4
> -       paddd           %xmm3,%xmm5
> -       # x6[0-3] += s1[2]
> -       # x7[0-3] += s1[3]
> -       movq            0x18(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm6
> -       paddd           %xmm3,%xmm7
> -
> -       # x8[0-3] += s2[0]
> -       # x9[0-3] += s2[1]
> -       movq            0x20(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm8
> -       paddd           %xmm3,%xmm9
> -       # x10[0-3] += s2[2]
> -       # x11[0-3] += s2[3]
> -       movq            0x28(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm10
> -       paddd           %xmm3,%xmm11
> -
> -       # x12[0-3] += s3[0]
> -       # x13[0-3] += s3[1]
> -       movq            0x30(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm12
> -       paddd           %xmm3,%xmm13
> -       # x14[0-3] += s3[2]
> -       # x15[0-3] += s3[3]
> -       movq            0x38(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm14
> -       paddd           %xmm3,%xmm15
> -
> -       # x12 += counter values 0-3
> -       paddd           %xmm1,%xmm12
> -
> -       # interleave 32-bit words in state n, n+1
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqa          0x10(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpckldq       %xmm1,%xmm2
> -       punpckhdq       %xmm1,%xmm0
> -       movdqa          %xmm2,0x00(%rsp)
> -       movdqa          %xmm0,0x10(%rsp)
> -       movdqa          0x20(%rsp),%xmm0
> -       movdqa          0x30(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpckldq       %xmm1,%xmm2
> -       punpckhdq       %xmm1,%xmm0
> -       movdqa          %xmm2,0x20(%rsp)
> -       movdqa          %xmm0,0x30(%rsp)
> -       movdqa          %xmm4,%xmm0
> -       punpckldq       %xmm5,%xmm4
> -       punpckhdq       %xmm5,%xmm0
> -       movdqa          %xmm0,%xmm5
> -       movdqa          %xmm6,%xmm0
> -       punpckldq       %xmm7,%xmm6
> -       punpckhdq       %xmm7,%xmm0
> -       movdqa          %xmm0,%xmm7
> -       movdqa          %xmm8,%xmm0
> -       punpckldq       %xmm9,%xmm8
> -       punpckhdq       %xmm9,%xmm0
> -       movdqa          %xmm0,%xmm9
> -       movdqa          %xmm10,%xmm0
> -       punpckldq       %xmm11,%xmm10
> -       punpckhdq       %xmm11,%xmm0
> -       movdqa          %xmm0,%xmm11
> -       movdqa          %xmm12,%xmm0
> -       punpckldq       %xmm13,%xmm12
> -       punpckhdq       %xmm13,%xmm0
> -       movdqa          %xmm0,%xmm13
> -       movdqa          %xmm14,%xmm0
> -       punpckldq       %xmm15,%xmm14
> -       punpckhdq       %xmm15,%xmm0
> -       movdqa          %xmm0,%xmm15
> -
> -       # interleave 64-bit words in state n, n+2
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqa          0x20(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpcklqdq      %xmm1,%xmm2
> -       punpckhqdq      %xmm1,%xmm0
> -       movdqa          %xmm2,0x00(%rsp)
> -       movdqa          %xmm0,0x20(%rsp)
> -       movdqa          0x10(%rsp),%xmm0
> -       movdqa          0x30(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpcklqdq      %xmm1,%xmm2
> -       punpckhqdq      %xmm1,%xmm0
> -       movdqa          %xmm2,0x10(%rsp)
> -       movdqa          %xmm0,0x30(%rsp)
> -       movdqa          %xmm4,%xmm0
> -       punpcklqdq      %xmm6,%xmm4
> -       punpckhqdq      %xmm6,%xmm0
> -       movdqa          %xmm0,%xmm6
> -       movdqa          %xmm5,%xmm0
> -       punpcklqdq      %xmm7,%xmm5
> -       punpckhqdq      %xmm7,%xmm0
> -       movdqa          %xmm0,%xmm7
> -       movdqa          %xmm8,%xmm0
> -       punpcklqdq      %xmm10,%xmm8
> -       punpckhqdq      %xmm10,%xmm0
> -       movdqa          %xmm0,%xmm10
> -       movdqa          %xmm9,%xmm0
> -       punpcklqdq      %xmm11,%xmm9
> -       punpckhqdq      %xmm11,%xmm0
> -       movdqa          %xmm0,%xmm11
> -       movdqa          %xmm12,%xmm0
> -       punpcklqdq      %xmm14,%xmm12
> -       punpckhqdq      %xmm14,%xmm0
> -       movdqa          %xmm0,%xmm14
> -       movdqa          %xmm13,%xmm0
> -       punpcklqdq      %xmm15,%xmm13
> -       punpckhqdq      %xmm15,%xmm0
> -       movdqa          %xmm0,%xmm15
> -
> -       # xor with corresponding input, write to output
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqu          0x00(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x00(%rsi)
> -       movdqa          0x10(%rsp),%xmm0
> -       movdqu          0x80(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x80(%rsi)
> -       movdqa          0x20(%rsp),%xmm0
> -       movdqu          0x40(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x40(%rsi)
> -       movdqa          0x30(%rsp),%xmm0
> -       movdqu          0xc0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0xc0(%rsi)
> -       movdqu          0x10(%rdx),%xmm1
> -       pxor            %xmm1,%xmm4
> -       movdqu          %xmm4,0x10(%rsi)
> -       movdqu          0x90(%rdx),%xmm1
> -       pxor            %xmm1,%xmm5
> -       movdqu          %xmm5,0x90(%rsi)
> -       movdqu          0x50(%rdx),%xmm1
> -       pxor            %xmm1,%xmm6
> -       movdqu          %xmm6,0x50(%rsi)
> -       movdqu          0xd0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm7
> -       movdqu          %xmm7,0xd0(%rsi)
> -       movdqu          0x20(%rdx),%xmm1
> -       pxor            %xmm1,%xmm8
> -       movdqu          %xmm8,0x20(%rsi)
> -       movdqu          0xa0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm9
> -       movdqu          %xmm9,0xa0(%rsi)
> -       movdqu          0x60(%rdx),%xmm1
> -       pxor            %xmm1,%xmm10
> -       movdqu          %xmm10,0x60(%rsi)
> -       movdqu          0xe0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm11
> -       movdqu          %xmm11,0xe0(%rsi)
> -       movdqu          0x30(%rdx),%xmm1
> -       pxor            %xmm1,%xmm12
> -       movdqu          %xmm12,0x30(%rsi)
> -       movdqu          0xb0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm13
> -       movdqu          %xmm13,0xb0(%rsi)
> -       movdqu          0x70(%rdx),%xmm1
> -       pxor            %xmm1,%xmm14
> -       movdqu          %xmm14,0x70(%rsi)
> -       movdqu          0xf0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm15
> -       movdqu          %xmm15,0xf0(%rsi)
> -
> -       lea             -8(%r10),%rsp
> -       ret
> -ENDPROC(chacha20_4block_xor_ssse3)
> diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
> deleted file mode 100644
> index dce7c5d39c2f..000000000000
> --- a/arch/x86/crypto/chacha20_glue.c
> +++ /dev/null
> @@ -1,146 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -#include <asm/fpu/api.h>
> -#include <asm/simd.h>
> -
> -#define CHACHA20_STATE_ALIGN 16
> -
> -asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
> -#ifdef CONFIG_AS_AVX2
> -asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
> -static bool chacha20_use_avx2;
> -#endif
> -
> -static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -#ifdef CONFIG_AS_AVX2
> -       if (chacha20_use_avx2) {
> -               while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
> -                       chacha20_8block_xor_avx2(state, dst, src);
> -                       bytes -= CHACHA20_BLOCK_SIZE * 8;
> -                       src += CHACHA20_BLOCK_SIZE * 8;
> -                       dst += CHACHA20_BLOCK_SIZE * 8;
> -                       state[12] += 8;
> -               }
> -       }
> -#endif
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               chacha20_4block_xor_ssse3(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_ssse3(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_ssse3(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -}
> -
> -static int chacha20_simd(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       u32 *state, state_buf[16 + 2] __aligned(8);
> -       struct skcipher_walk walk;
> -       int err;
> -
> -       BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
> -       state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
> -
> -       if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       kernel_fpu_begin();
> -
> -       while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
> -               err = skcipher_walk_done(&walk,
> -                                        walk.nbytes % CHACHA20_BLOCK_SIZE);
> -       }
> -
> -       if (walk.nbytes) {
> -               chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               walk.nbytes);
> -               err = skcipher_walk_done(&walk, 0);
> -       }
> -
> -       kernel_fpu_end();
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-simd",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_simd,
> -       .decrypt                = chacha20_simd,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!boot_cpu_has(X86_FEATURE_SSSE3))
> -               return -ENODEV;
> -
> -#ifdef CONFIG_AS_AVX2
> -       chacha20_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
> -                           boot_cpu_has(X86_FEATURE_AVX2) &&
> -                           cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
> -#endif
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_LICENSE("GPL");
> -MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
> -MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
> -MODULE_ALIAS_CRYPTO("chacha20");
> -MODULE_ALIAS_CRYPTO("chacha20-simd");
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 47859a0f8052..93cd4d199447 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -1433,22 +1433,6 @@ config CRYPTO_CHACHA20
>
>           ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
>           Bernstein and further specified in RFC7539 for use in IETF protocols.
> -         This is the portable C implementation of ChaCha20.
> -
> -         See also:
> -         <http://cr.yp.to/chacha/chacha-20080128.pdf>
> -
> -config CRYPTO_CHACHA20_X86_64
> -       tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
> -       depends on X86 && 64BIT
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -       help
> -         ChaCha20 cipher algorithm, RFC7539.
> -
> -         ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
> -         Bernstein and further specified in RFC7539 for use in IETF protocols.
> -         This is the x86_64 assembler implementation using SIMD instructions.
>
>           See also:
>           <http://cr.yp.to/chacha/chacha-20080128.pdf>
> diff --git a/crypto/Makefile b/crypto/Makefile
> index 5e60348d02e2..587103b87890 100644
> --- a/crypto/Makefile
> +++ b/crypto/Makefile
> @@ -117,7 +117,7 @@ obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
>  obj-$(CONFIG_CRYPTO_SEED) += seed.o
>  obj-$(CONFIG_CRYPTO_SPECK) += speck.o
>  obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
> -obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
> +obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_zinc.o
>  obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_zinc.o
>  obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
>  obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
> diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
> deleted file mode 100644
> index e451c3cb6a56..000000000000
> --- a/crypto/chacha20_generic.c
> +++ /dev/null
> @@ -1,136 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <asm/unaligned.h>
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/module.h>
> -
> -static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
> -                            unsigned int bytes)
> -{
> -       u32 stream[CHACHA20_BLOCK_WORDS];
> -
> -       if (dst != src)
> -               memcpy(dst, src, bytes);
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block(state, stream);
> -               crypto_xor(dst, (const u8 *)stream, CHACHA20_BLOCK_SIZE);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -       }
> -       if (bytes) {
> -               chacha20_block(state, stream);
> -               crypto_xor(dst, (const u8 *)stream, bytes);
> -       }
> -}
> -
> -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
> -{
> -       state[0]  = 0x61707865; /* "expa" */
> -       state[1]  = 0x3320646e; /* "nd 3" */
> -       state[2]  = 0x79622d32; /* "2-by" */
> -       state[3]  = 0x6b206574; /* "te k" */
> -       state[4]  = ctx->key[0];
> -       state[5]  = ctx->key[1];
> -       state[6]  = ctx->key[2];
> -       state[7]  = ctx->key[3];
> -       state[8]  = ctx->key[4];
> -       state[9]  = ctx->key[5];
> -       state[10] = ctx->key[6];
> -       state[11] = ctx->key[7];
> -       state[12] = get_unaligned_le32(iv +  0);
> -       state[13] = get_unaligned_le32(iv +  4);
> -       state[14] = get_unaligned_le32(iv +  8);
> -       state[15] = get_unaligned_le32(iv + 12);
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_init);
> -
> -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> -                          unsigned int keysize)
> -{
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       int i;
> -
> -       if (keysize != CHACHA20_KEY_SIZE)
> -               return -EINVAL;
> -
> -       for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
> -               ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
> -
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
> -
> -int crypto_chacha20_crypt(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                                nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -
> -       return err;
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-generic",
> -       .base.cra_priority      = 100,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = crypto_chacha20_crypt,
> -       .decrypt                = crypto_chacha20_crypt,
> -};
> -
> -static int __init chacha20_generic_mod_init(void)
> -{
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_generic_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_generic_mod_init);
> -module_exit(chacha20_generic_mod_fini);
> -
> -MODULE_LICENSE("GPL");
> -MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
> -MODULE_DESCRIPTION("chacha20 cipher algorithm");
> -MODULE_ALIAS_CRYPTO("chacha20");
> -MODULE_ALIAS_CRYPTO("chacha20-generic");
> diff --git a/crypto/chacha20_zinc.c b/crypto/chacha20_zinc.c
> new file mode 100644
> index 000000000000..5df88fdee066
> --- /dev/null
> +++ b/crypto/chacha20_zinc.c
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <asm/unaligned.h>
> +#include <crypto/algapi.h>
> +#include <crypto/internal/skcipher.h>
> +#include <zinc/chacha20.h>
> +#include <linux/module.h>
> +
> +struct chacha20_key_ctx {
> +       u32 key[8];
> +};
> +
> +static int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                                 unsigned int keysize)
> +{
> +       struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
> +       int i;
> +
> +       if (keysize != CHACHA20_KEY_SIZE)
> +               return -EINVAL;
> +
> +       for (i = 0; i < ARRAY_SIZE(key_ctx->key); ++i)
> +               key_ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
> +
> +       return 0;
> +}
> +
> +static int crypto_chacha20_crypt(struct skcipher_request *req)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
> +       struct chacha20_ctx ctx;
> +       struct skcipher_walk walk;
> +       simd_context_t simd_context;
> +       int err, i;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +       if (unlikely(err))
> +               return err;
> +
> +       memcpy(ctx.key, key_ctx->key, sizeof(ctx.key));
> +       for (i = 0; i < ARRAY_SIZE(ctx.counter); ++i)
> +               ctx.counter[i] = get_unaligned_le32(walk.iv + i * sizeof(u32));
> +
> +       simd_context = simd_get();
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +
> +               if (nbytes < walk.total)
> +                       nbytes = round_down(nbytes, walk.stride);
> +
> +               chacha20(&ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes,
> +                        simd_context);
> +
> +               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> +               simd_context = simd_relax(simd_context);
> +       }
> +       simd_put(simd_context);
> +
> +       return err;
> +}
> +
> +static struct skcipher_alg alg = {
> +       .base.cra_name          = "chacha20",
> +       .base.cra_driver_name   = "chacha20-software",
> +       .base.cra_priority      = 100,
> +       .base.cra_blocksize     = 1,
> +       .base.cra_ctxsize       = sizeof(struct chacha20_key_ctx),
> +       .base.cra_module        = THIS_MODULE,
> +
> +       .min_keysize            = CHACHA20_KEY_SIZE,
> +       .max_keysize            = CHACHA20_KEY_SIZE,
> +       .ivsize                 = CHACHA20_IV_SIZE,
> +       .chunksize              = CHACHA20_BLOCK_SIZE,
> +       .setkey                 = crypto_chacha20_setkey,
> +       .encrypt                = crypto_chacha20_crypt,
> +       .decrypt                = crypto_chacha20_crypt,
> +};
> +
> +static int __init chacha20_mod_init(void)
> +{
> +       return crypto_register_skcipher(&alg);
> +}
> +
> +static void __exit chacha20_mod_exit(void)
> +{
> +       crypto_unregister_skcipher(&alg);
> +}
> +
> +module_init(chacha20_mod_init);
> +module_exit(chacha20_mod_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> +MODULE_DESCRIPTION("ChaCha20 stream cipher");
> +MODULE_ALIAS_CRYPTO("chacha20");
> +MODULE_ALIAS_CRYPTO("chacha20-software");
> diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
> index bf523797bef3..b26adb9ed898 100644
> --- a/crypto/chacha20poly1305.c
> +++ b/crypto/chacha20poly1305.c
> @@ -13,7 +13,7 @@
>  #include <crypto/internal/hash.h>
>  #include <crypto/internal/skcipher.h>
>  #include <crypto/scatterwalk.h>
> -#include <crypto/chacha20.h>
> +#include <zinc/chacha20.h>
>  #include <zinc/poly1305.h>
>  #include <linux/err.h>
>  #include <linux/init.h>
> diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
> index b83d66073db0..3b92f58f3891 100644
> --- a/include/crypto/chacha20.h
> +++ b/include/crypto/chacha20.h
> @@ -6,23 +6,11 @@
>  #ifndef _CRYPTO_CHACHA20_H
>  #define _CRYPTO_CHACHA20_H
>
> -#include <crypto/skcipher.h>
> -#include <linux/types.h>
> -#include <linux/crypto.h>
> -
>  #define CHACHA20_IV_SIZE       16
>  #define CHACHA20_KEY_SIZE      32
>  #define CHACHA20_BLOCK_SIZE    64
>  #define CHACHA20_BLOCK_WORDS   (CHACHA20_BLOCK_SIZE / sizeof(u32))
>
> -struct chacha20_ctx {
> -       u32 key[8];
> -};
> -
>  void chacha20_block(u32 *state, u32 *stream);
> -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
> -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> -                          unsigned int keysize);
> -int crypto_chacha20_crypt(struct skcipher_request *req);
>
>  #endif
> --
> 2.19.0
>


* Re: [PATCH iproute2] libnetlink: fix leak and using unused memory on error
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-09-14 17:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Mahesh Bandewar, linux-netdev
In-Reply-To: <20180913193338.20233-1-stephen@networkplumber.org>

On Thu, Sep 13, 2018 at 12:33 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> If an error happens in multi-segment message (tc only)
> then report the error and stop processing further responses.
> This also fixes referring to the buffer after free.
>
> The sequence check is not necessary here because the
> response message has already been validated to be in
> the window of the sequence number of the iov.
>
> Reported-by: Mahesh Bandewar <mahesh@bandewar.net>
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  lib/libnetlink.c | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 928de1dd16d8..586809292594 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -617,7 +617,6 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
>         msg.msg_iovlen = 1;
>         i = 0;
>         while (1) {
> -next:
>                 status = rtnl_recvmsg(rtnl->fd, &msg, &buf);
>                 ++i;
>
> @@ -660,27 +659,23 @@ next:
>
>                                 if (l < sizeof(struct nlmsgerr)) {
>                                         fprintf(stderr, "ERROR truncated\n");
> -                               } else if (!err->error) {
> +                                       free(buf);
> +                                       return -1;
> +                               }
> +
> +                               if (!err->error)
>                                         /* check messages from kernel */
>                                         nl_dump_ext_ack(h, errfn);
>
> -                                       if (answer)
> -                                               *answer = (struct nlmsghdr *)buf;
> -                                       else
> -                                               free(buf);
> -                                       if (h->nlmsg_seq == seq)
> -                                               return 0;
> -                                       else if (i < iovlen)
> -                                               goto next;
> -                                       return 0;
> -                               }
> -
>                                 if (rtnl->proto != NETLINK_SOCK_DIAG &&
>                                     show_rtnl_err)
>                                         rtnl_talk_error(h, err, errfn);
>
>                                 errno = -err->error;
> -                               free(buf);
> +                               if (answer)
> +                                       *answer = (struct nlmsghdr *)buf;
> +                               else
> +                                       free(buf);
>                                 return -i;
>                         }
>
> --
> 2.18.0
>


* Re: [PATCH] net: caif: remove redundant null check on frontpkt
From: Sergei Shtylyov @ 2018-09-14 17:54 UTC (permalink / raw)
  To: Colin King, Dmitry Tarnyagin, David S . Miller; +Cc: kernel-janitors, netdev
In-Reply-To: <20180914171916.21298-1-colin.king@canonical.com>

Hello!

On 09/14/2018 08:19 PM, Colin King wrote:

> From: Colin Ian King <colin.king@canonical.com>
> 
> It is impossible for frontpkt to be null at the point of the null
> check because it has been assigned from rearpkt and there is no
> way realpkt can be null at the point of the assignment because

   rearpkt?

> of the sanity checking and exit paths taken previously. Remove
> the redundant null check.
> 
> Detected by CoverityScan, CID#114434 ("Logically dead code")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
[...]

MBR, Sergei


* [PATCH net-next RFC 0/8] udp and configurable gro
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

This is a *very rough* draft. Mainly for discussion while we also
look at another partially overlapping approach [1].

Reduce UDP receive cost for bulk traffic by enabling datagram
coalescing with GRO.

Before adding more GRO callbacks, make GRO configurable by the
administrator to optionally reduce the attack surface of this
early receive path. See also [2].

Introduce sysctls net.(core|ipv4|ipv6).gro that expose the table of
protocols for which GRO is supported. Allow the administrator to disable
individual entries in the table.

To have a single infrastructure, convert dev_offloads to the
table-based approach already used by inet(6)_offloads. A small
additional benefit is that ipv6 no longer takes two list lookups to find.
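
As a rough illustration of the table-based approach (a user-space
sketch only, not the kernel code): registration becomes a single
compare-and-swap into a 256-entry array indexed by the low byte of the
type, and lookup is one array load. All demo_* names are hypothetical,
and C11 atomics stand in for the kernel's cmpxchg().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_offload; only the key is modeled. */
struct demo_offload {
	uint16_t type;
};

/* One slot per possible low byte of the type, all initially NULL. */
static _Atomic(const struct demo_offload *) demo_offloads[256];

/* Map a type to its slot. Collisions are not handled, as in the patch. */
static inline uint8_t demo_slot(uint16_t type)
{
	return type & 0xFF;
}

/* Claim the slot if it is empty. Returns 0 on success, -1 if occupied. */
static int demo_add_offload(const struct demo_offload *po)
{
	const struct demo_offload *expected = NULL;

	return atomic_compare_exchange_strong(
			&demo_offloads[demo_slot(po->type)],
			&expected, po) ? 0 : -1;
}

/* Clear the slot only if it still holds po. Returns 0 on success. */
static int demo_remove_offload(const struct demo_offload *po)
{
	const struct demo_offload *expected = po;

	return atomic_compare_exchange_strong(
			&demo_offloads[demo_slot(po->type)],
			&expected, NULL) ? 0 : -1;
}

/* Lookup is a single indexed load, with no list walk. */
static const struct demo_offload *demo_find_offload(uint16_t type)
{
	return atomic_load(&demo_offloads[demo_slot(type)]);
}
```

A second registration for an occupied slot fails instead of being
queued by priority, which is the trade-off against the list-based
scheme.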

Patch 1 converts dev_offloads to the infra of inet(6)_offloads
Patch 2 deduplicates gro_complete logic now that all share infra
Patch 3 does the same for gro_receive, in anticipation of adding
        a branch to check whether gro_receive is enabled
Patch 4 harmonizes ipv6 header opts, so that those, too, can be
        optionally disabled.
Patch 5 makes inet(6)_offloads non-const to allow disabling a flag
Patch 6 introduces the administrative sysctl
Patch 7 avoids udp gro cost if no udp gro callback is registered
Patch 8 introduces udp gro

[1] http://patchwork.ozlabs.org/project/netdev/list/?series=65741
[2] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

Willem de Bruijn (8):
  gro: convert device offloads to net_offload
  gro: deduplicate gro_complete
  gro: add net_gro_receive
  ipv6: remove offload exception for hopopts
  net: deconstify net_offload
  net: make gro configurable
  udp: gro behind static key
  udp: add gro

 drivers/net/geneve.c       |  11 +---
 drivers/net/vxlan.c        |   8 +++
 include/linux/netdevice.h  |  64 +++++++++++++++++++--
 include/net/protocol.h     |  19 ++-----
 include/net/udp.h          |   2 +
 include/uapi/linux/udp.h   |   1 +
 net/8021q/vlan.c           |  12 +---
 net/core/dev.c             | 112 ++++++++-----------------------------
 net/core/sysctl_net_core.c |  60 ++++++++++++++++++++
 net/ethernet/eth.c         |  13 +----
 net/ipv4/af_inet.c         |  21 ++-----
 net/ipv4/esp4_offload.c    |   2 +-
 net/ipv4/fou.c             |  41 ++++----------
 net/ipv4/gre_offload.c     |  26 ++++-----
 net/ipv4/protocol.c        |  10 ++--
 net/ipv4/sysctl_net_ipv4.c |   7 +++
 net/ipv4/tcp_offload.c     |   2 +-
 net/ipv4/udp.c             |  73 +++++++++++++++++++++++-
 net/ipv4/udp_offload.c     |  19 +++----
 net/ipv6/esp6_offload.c    |   2 +-
 net/ipv6/exthdrs_offload.c |  17 +++++-
 net/ipv6/ip6_offload.c     |  69 +++++++++--------------
 net/ipv6/protocol.c        |  10 ++--
 net/ipv6/sysctl_net_ipv6.c |   8 +++
 net/ipv6/tcpv6_offload.c   |   2 +-
 net/ipv6/udp.c             |   2 +-
 net/ipv6/udp_offload.c     |   4 +-
 net/sctp/offload.c         |   2 +-
 28 files changed, 344 insertions(+), 275 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog


* [PATCH net-next RFC 1/8] gro: convert device offloads to net_offload
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

In preparation for making GRO receive configurable, have all offloads
share the same infrastructure.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/netdevice.h |  17 +++++-
 include/net/protocol.h    |   7 ---
 net/core/dev.c            | 105 +++++++++++++-------------------------
 3 files changed, 51 insertions(+), 78 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e2b3bd750c98..7425068fa249 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2366,13 +2366,18 @@ struct offload_callbacks {
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
 };
 
-struct packet_offload {
+struct net_offload {
 	__be16			 type;	/* This is really htons(ether_type). */
 	u16			 priority;
 	struct offload_callbacks callbacks;
-	struct list_head	 list;
+	unsigned int		 flags;	/* Flags used by IPv6 for now */
 };
 
+#define packet_offload	net_offload
+
+/* This should be set for any extension header which is compatible with GSO. */
+#define INET6_PROTO_GSO_EXTHDR	0x1
+
 /* often modified stats are per-CPU, other are shared (netdev->stats) */
 struct pcpu_sw_netstats {
 	u64     rx_packets;
@@ -3554,6 +3559,14 @@ gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
 
+static inline u8 net_offload_from_type(u16 type)
+{
+	/* Do not bother handling collisions. There are none.
+	 * If they do occur with new offloads, add a mapping function here.
+	 */
+	return type & 0xFF;
+}
+
 static inline void napi_free_frags(struct napi_struct *napi)
 {
 	kfree_skb(napi->skb);
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 4fc75f7ae23b..53a0322ee545 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -69,13 +69,6 @@ struct inet6_protocol {
 #define INET6_PROTO_FINAL	0x2
 #endif
 
-struct net_offload {
-	struct offload_callbacks callbacks;
-	unsigned int		 flags;	/* Flags used by IPv6 for now */
-};
-/* This should be set for any extension header which is compatible with GSO. */
-#define INET6_PROTO_GSO_EXTHDR	0x1
-
 /* This is used to register socket interfaces for IP protocols.  */
 struct inet_protosw {
 	struct list_head list;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0b2d777e5b9e..55f86b6d3182 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,7 +154,6 @@
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
 static DEFINE_SPINLOCK(ptype_lock);
-static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
 static struct list_head offload_base __read_mostly;
@@ -467,6 +466,9 @@ void dev_remove_pack(struct packet_type *pt)
 EXPORT_SYMBOL(dev_remove_pack);
 
 
+const struct net_offload __rcu *dev_offloads[256] __read_mostly;
+EXPORT_SYMBOL(dev_offloads);
+
 /**
  *	dev_add_offload - register offload handlers
  *	@po: protocol offload declaration
@@ -481,15 +483,9 @@ EXPORT_SYMBOL(dev_remove_pack);
  */
 void dev_add_offload(struct packet_offload *po)
 {
-	struct packet_offload *elem;
-
-	spin_lock(&offload_lock);
-	list_for_each_entry(elem, &offload_base, list) {
-		if (po->priority < elem->priority)
-			break;
-	}
-	list_add_rcu(&po->list, elem->list.prev);
-	spin_unlock(&offload_lock);
+	cmpxchg((const struct net_offload **)
+		&dev_offloads[net_offload_from_type(po->type)],
+			NULL, po);
 }
 EXPORT_SYMBOL(dev_add_offload);
 
@@ -506,23 +502,11 @@ EXPORT_SYMBOL(dev_add_offload);
  *	and must not be freed until after all the CPU's have gone
  *	through a quiescent state.
  */
-static void __dev_remove_offload(struct packet_offload *po)
+static int __dev_remove_offload(struct packet_offload *po)
 {
-	struct list_head *head = &offload_base;
-	struct packet_offload *po1;
-
-	spin_lock(&offload_lock);
-
-	list_for_each_entry(po1, head, list) {
-		if (po == po1) {
-			list_del_rcu(&po->list);
-			goto out;
-		}
-	}
-
-	pr_warn("dev_remove_offload: %p not found\n", po);
-out:
-	spin_unlock(&offload_lock);
+	return (cmpxchg((const struct net_offload **)
+			&dev_offloads[net_offload_from_type(po->type)],
+		       po, NULL) == po) ? 0 : -1;
 }
 
 /**
@@ -2962,7 +2946,7 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 				    netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
-	struct packet_offload *ptype;
+	const struct net_offload *off;
 	int vlan_depth = skb->mac_len;
 	__be16 type = skb_network_protocol(skb, &vlan_depth);
 
@@ -2972,12 +2956,9 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 	__skb_pull(skb, vlan_depth);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, &offload_base, list) {
-		if (ptype->type == type && ptype->callbacks.gso_segment) {
-			segs = ptype->callbacks.gso_segment(skb, features);
-			break;
-		}
-	}
+	off = rcu_dereference(dev_offloads[net_offload_from_type(type)]);
+	if (off && off->type == type && off->callbacks.gso_segment)
+		segs = off->callbacks.gso_segment(skb, features);
 	rcu_read_unlock();
 
 	__skb_push(skb, skb->data - skb_mac_header(skb));
@@ -5254,9 +5235,8 @@ static void flush_all_backlogs(void)
 
 static int napi_gro_complete(struct sk_buff *skb)
 {
-	struct packet_offload *ptype;
+	const struct packet_offload *ptype;
 	__be16 type = skb->protocol;
-	struct list_head *head = &offload_base;
 	int err = -ENOENT;
 
 	BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
@@ -5267,17 +5247,12 @@ static int napi_gro_complete(struct sk_buff *skb)
 	}
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_complete)
-			continue;
-
+	ptype = dev_offloads[net_offload_from_type(type)];
+	if (ptype && ptype->callbacks.gro_complete)
 		err = ptype->callbacks.gro_complete(skb, 0);
-		break;
-	}
 	rcu_read_unlock();
 
 	if (err) {
-		WARN_ON(&ptype->list == head);
 		kfree_skb(skb);
 		return NET_RX_SUCCESS;
 	}
@@ -5417,8 +5392,7 @@ static void gro_flush_oldest(struct list_head *head)
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
-	struct list_head *head = &offload_base;
-	struct packet_offload *ptype;
+	const struct packet_offload *ptype;
 	__be16 type = skb->protocol;
 	struct list_head *gro_head;
 	struct sk_buff *pp = NULL;
@@ -5432,10 +5406,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	gro_head = gro_list_prepare(napi, skb);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_receive)
-			continue;
-
+	ptype = dev_offloads[net_offload_from_type(type)];
+	if (ptype && ptype->callbacks.gro_receive) {
 		skb_set_network_header(skb, skb_gro_offset(skb));
 		skb_reset_mac_len(skb);
 		NAPI_GRO_CB(skb)->same_flow = 0;
@@ -5464,12 +5436,11 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		}
 
 		pp = ptype->callbacks.gro_receive(gro_head, skb);
-		break;
-	}
-	rcu_read_unlock();
-
-	if (&ptype->list == head)
+		rcu_read_unlock();
+	} else {
+		rcu_read_unlock();
 		goto normal;
+	}
 
 	if (IS_ERR(pp) && PTR_ERR(pp) == -EINPROGRESS) {
 		ret = GRO_CONSUMED;
@@ -5524,29 +5495,25 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 
 struct packet_offload *gro_find_receive_by_type(__be16 type)
 {
-	struct list_head *offload_head = &offload_base;
-	struct packet_offload *ptype;
+	struct net_offload *off;
 
-	list_for_each_entry_rcu(ptype, offload_head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_receive)
-			continue;
-		return ptype;
-	}
-	return NULL;
+	off = (struct net_offload *) rcu_dereference(dev_offloads[type & 0xFF]);
+	if (off && off->type == type && off->callbacks.gro_receive)
+		return off;
+	else
+		return NULL;
 }
 EXPORT_SYMBOL(gro_find_receive_by_type);
 
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
-	struct list_head *offload_head = &offload_base;
-	struct packet_offload *ptype;
+	struct net_offload *off;
 
-	list_for_each_entry_rcu(ptype, offload_head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_complete)
-			continue;
-		return ptype;
-	}
-	return NULL;
+	off = (struct net_offload *) rcu_dereference(dev_offloads[type & 0xFF]);
+	if (off && off->type == type && off->callbacks.gro_complete)
+		return off;
+	else
+		return NULL;
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
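
The patch above replaces the priority-ordered offload list with a 256-entry table indexed by the low byte of the protocol type, and registers entries with cmpxchg so that an occupied slot is never overwritten. A minimal userspace sketch of that idea follows. It assumes C11 atomics in place of the kernel's cmpxchg and RCU primitives; every demo_* name and the example type values are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct net_offload. */
struct demo_offload {
	uint16_t type;		/* full 16-bit protocol type */
};

/* One slot per possible low byte of the type, as in dev_offloads[256]. */
static _Atomic(struct demo_offload *) demo_offloads[256];

static uint8_t demo_offload_from_type(uint16_t type)
{
	return type & 0xFF;
}

/* Claim the slot only if it is empty. Returns 0 on success, -1 if taken. */
static int demo_add_offload(struct demo_offload *po)
{
	struct demo_offload *expected = NULL;

	assert(po != NULL);
	return atomic_compare_exchange_strong(
			&demo_offloads[demo_offload_from_type(po->type)],
			&expected, po) ? 0 : -1;
}

/* Clear the slot only if it still holds this entry. */
static int demo_remove_offload(struct demo_offload *po)
{
	struct demo_offload *expected = po;

	assert(po != NULL);
	return atomic_compare_exchange_strong(
			&demo_offloads[demo_offload_from_type(po->type)],
			&expected, (struct demo_offload *)NULL) ? 0 : -1;
}

/* Look up by low byte, then confirm the full type to reject collisions. */
static struct demo_offload *demo_find_offload(uint16_t type)
{
	struct demo_offload *off;

	off = atomic_load(&demo_offloads[demo_offload_from_type(type)]);
	return (off && off->type == type) ? off : NULL;
}
```

In this sketch a second registration whose type shares the low byte with an existing entry fails instead of replacing it, and a lookup for the colliding type returns NULL because the full type is compared after indexing.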

* [PATCH net-next RFC 2/8] gro: deduplicate gro_complete
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

The gro completion datapath is open coded for all protocols.
Deduplicate with new helper function net_gro_complete.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/geneve.c      |  9 +--------
 include/linux/netdevice.h | 19 ++++++++++++++++++-
 net/8021q/vlan.c          | 10 +---------
 net/core/dev.c            | 24 +-----------------------
 net/ethernet/eth.c        | 11 +----------
 net/ipv4/af_inet.c        | 15 ++-------------
 net/ipv4/fou.c            | 25 +++----------------------
 net/ipv4/gre_offload.c    | 12 +++---------
 net/ipv6/ip6_offload.c    | 13 +------------
 9 files changed, 31 insertions(+), 107 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6625fabe2c88..a3a4621d9bee 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -488,7 +488,6 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 			       int nhoff)
 {
 	struct genevehdr *gh;
-	struct packet_offload *ptype;
 	__be16 type;
 	int gh_len;
 	int err = -ENOSYS;
@@ -497,13 +496,7 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 	gh_len = geneve_hlen(gh);
 	type = gh->proto_type;
 
-	rcu_read_lock();
-	ptype = gro_find_complete_by_type(type);
-	if (ptype)
-		err = ptype->callbacks.gro_complete(skb, nhoff + gh_len);
-
-	rcu_read_unlock();
-
+	err = net_gro_complete(dev_offloads, type, skb, nhoff + gh_len);
 	skb_set_inner_mac_header(skb, nhoff + gh_len);
 
 	return err;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7425068fa249..0d292ea6716e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3557,7 +3557,8 @@ void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
-struct packet_offload *gro_find_complete_by_type(__be16 type);
+
+extern const struct net_offload __rcu *dev_offloads[256];
 
 static inline u8 net_offload_from_type(u16 type)
 {
@@ -3567,6 +3568,22 @@ static inline u8 net_offload_from_type(u16 type)
 	return type & 0xFF;
 }
 
+static inline int net_gro_complete(const struct net_offload __rcu **offs,
+				   u16 type, struct sk_buff *skb, int nhoff)
+{
+	const struct net_offload *off;
+	int ret = -ENOENT;
+
+	rcu_read_lock();
+	off = rcu_dereference(offs[net_offload_from_type(type)]);
+	if (off && off->callbacks.gro_complete &&
+	    (!off->type || off->type == type))
+		ret = off->callbacks.gro_complete(skb, nhoff);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static inline void napi_free_frags(struct napi_struct *napi)
 {
 	kfree_skb(napi->skb);
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 5e9950453955..6ac27aa9f158 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -703,16 +703,8 @@ static int vlan_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + nhoff);
 	__be16 type = vhdr->h_vlan_encapsulated_proto;
-	struct packet_offload *ptype;
-	int err = -ENOENT;
 
-	rcu_read_lock();
-	ptype = gro_find_complete_by_type(type);
-	if (ptype)
-		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
-
-	rcu_read_unlock();
-	return err;
+	return net_gro_complete(dev_offloads, type, skb, nhoff + sizeof(*vhdr));
 }
 
 static struct packet_offload vlan_packet_offloads[] __read_mostly = {
diff --git a/net/core/dev.c b/net/core/dev.c
index 55f86b6d3182..2c21e507291f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5235,10 +5235,6 @@ static void flush_all_backlogs(void)
 
 static int napi_gro_complete(struct sk_buff *skb)
 {
-	const struct packet_offload *ptype;
-	__be16 type = skb->protocol;
-	int err = -ENOENT;
-
 	BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
 
 	if (NAPI_GRO_CB(skb)->count == 1) {
@@ -5246,13 +5242,7 @@ static int napi_gro_complete(struct sk_buff *skb)
 		goto out;
 	}
 
-	rcu_read_lock();
-	ptype = dev_offloads[net_offload_from_type(type)];
-	if (ptype && ptype->callbacks.gro_complete)
-		err = ptype->callbacks.gro_complete(skb, 0);
-	rcu_read_unlock();
-
-	if (err) {
+	if (net_gro_complete(dev_offloads, skb->protocol, skb, 0)) {
 		kfree_skb(skb);
 		return NET_RX_SUCCESS;
 	}
@@ -5505,18 +5495,6 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_receive_by_type);
 
-struct packet_offload *gro_find_complete_by_type(__be16 type)
-{
-	struct net_offload *off;
-
-	off = (struct net_offload *) rcu_dereference(dev_offloads[type & 0xFF]);
-	if (off && off->type == type && off->callbacks.gro_complete)
-		return off;
-	else
-		return NULL;
-}
-EXPORT_SYMBOL(gro_find_complete_by_type);
-
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index fd8faa0dfa61..fb17a13722e8 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -485,20 +485,11 @@ int eth_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
 	__be16 type = eh->h_proto;
-	struct packet_offload *ptype;
-	int err = -ENOSYS;
 
 	if (skb->encapsulation)
 		skb_set_inner_mac_header(skb, nhoff);
 
-	rcu_read_lock();
-	ptype = gro_find_complete_by_type(type);
-	if (ptype != NULL)
-		err = ptype->callbacks.gro_complete(skb, nhoff +
-						    sizeof(struct ethhdr));
-
-	rcu_read_unlock();
-	return err;
+	return net_gro_complete(dev_offloads, type, skb, nhoff + sizeof(*eh));
 }
 EXPORT_SYMBOL(eth_gro_complete);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1fbe2f815474..1b72ee4a7811 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1560,9 +1560,7 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	__be16 newlen = htons(skb->len - nhoff);
 	struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
-	const struct net_offload *ops;
 	int proto = iph->protocol;
-	int err = -ENOSYS;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IP));
@@ -1572,21 +1570,12 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff)
 	csum_replace2(&iph->check, iph->tot_len, newlen);
 	iph->tot_len = newlen;
 
-	rcu_read_lock();
-	ops = rcu_dereference(inet_offloads[proto]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-		goto out_unlock;
-
 	/* Only need to add sizeof(*iph) to get to the next hdr below
 	 * because any hdr with option will have been flushed in
 	 * inet_gro_receive().
 	 */
-	err = ops->callbacks.gro_complete(skb, nhoff + sizeof(*iph));
-
-out_unlock:
-	rcu_read_unlock();
-
-	return err;
+	return net_gro_complete(inet_offloads, proto, skb,
+				nhoff + sizeof(*iph));
 }
 EXPORT_SYMBOL(inet_gro_complete);
 
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 500a59906b87..c42a3ef17864 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -261,24 +261,14 @@ static struct sk_buff *fou_gro_receive(struct sock *sk,
 static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 			    int nhoff)
 {
-	const struct net_offload *ops;
 	u8 proto = fou_from_sock(sk)->protocol;
-	int err = -ENOSYS;
 	const struct net_offload **offloads;
+	int err;
 
-	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-		goto out_unlock;
-
-	err = ops->callbacks.gro_complete(skb, nhoff);
-
+	err = net_gro_complete(offloads, proto, skb, nhoff);
 	skb_set_inner_mac_header(skb, nhoff);
 
-out_unlock:
-	rcu_read_unlock();
-
 	return err;
 }
 
@@ -457,7 +447,6 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload **offloads;
 	struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff);
-	const struct net_offload *ops;
 	unsigned int guehlen = 0;
 	u8 proto;
 	int err = -ENOENT;
@@ -483,18 +472,10 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 		return err;
 	}
 
-	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-		goto out_unlock;
-
-	err = ops->callbacks.gro_complete(skb, nhoff + guehlen);
-
+	err = net_gro_complete(offloads, proto, skb, nhoff + guehlen);
 	skb_set_inner_mac_header(skb, nhoff + guehlen);
 
-out_unlock:
-	rcu_read_unlock();
 	return err;
 }
 
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 6c63524f598a..fc8c99e4a058 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -231,10 +231,9 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 static int gre_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	struct gre_base_hdr *greh = (struct gre_base_hdr *)(skb->data + nhoff);
-	struct packet_offload *ptype;
 	unsigned int grehlen = sizeof(*greh);
-	int err = -ENOENT;
 	__be16 type;
+	int err;
 
 	skb->encapsulation = 1;
 	skb_shinfo(skb)->gso_type = SKB_GSO_GRE;
@@ -246,13 +245,8 @@ static int gre_gro_complete(struct sk_buff *skb, int nhoff)
 	if (greh->flags & GRE_CSUM)
 		grehlen += GRE_HEADER_SECTION;
 
-	rcu_read_lock();
-	ptype = gro_find_complete_by_type(type);
-	if (ptype)
-		err = ptype->callbacks.gro_complete(skb, nhoff + grehlen);
-
-	rcu_read_unlock();
-
+	err = net_gro_complete(dev_offloads, type, skb,
+			       nhoff + grehlen);
 	skb_set_inner_mac_header(skb, nhoff + grehlen);
 
 	return err;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c7e495f12011..e8bf554ae611 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -298,7 +298,6 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
 	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
-	int err = -ENOSYS;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
@@ -307,18 +306,8 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 
 	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
 
-	rcu_read_lock();
-
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
-	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-		goto out_unlock;
-
-	err = ops->callbacks.gro_complete(skb, nhoff);
-
-out_unlock:
-	rcu_read_unlock();
-
-	return err;
+	return net_gro_complete(inet6_offloads, ops->type, skb, nhoff);
 }
 
 static int sit_gro_complete(struct sk_buff *skb, int nhoff)
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
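
The shared helper introduced above folds the lookup, the callback check, and the default error code into one place. The following is a rough userspace sketch of that control flow, assuming a plain array instead of an RCU-protected table; the demo_* names and the recording callback are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_offload. */
struct demo_offload {
	uint16_t type;			/* 0: table is not keyed by full type */
	int (*gro_complete)(int nhoff);
};

/* Example callback: remembers the offset it was given and reports success. */
static int demo_last_nhoff = -1;

static int demo_record_nhoff(int nhoff)
{
	demo_last_nhoff = nhoff;
	return 0;
}

/*
 * Mirror of the helper's logic: index by the low byte, require a callback,
 * and accept the entry if it has no type or the type matches exactly.
 * Anything else yields -ENOENT.
 */
static int demo_gro_complete(struct demo_offload *const *offs,
			     uint16_t type, int nhoff)
{
	const struct demo_offload *off = offs[type & 0xFF];
	int ret = -ENOENT;

	if (off && off->gro_complete &&
	    (!off->type || off->type == type))
		ret = off->gro_complete(nhoff);

	return ret;
}
```

The `!off->type` clause is what lets one helper serve both kinds of table: entries in the per-IP-protocol tables leave type at zero and are matched by index alone, while entries keyed by a 16-bit type must match in full.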

* [PATCH net-next RFC 3/8] gro: add net_gro_receive
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

For configurable gro_receive, all call sites need to be updated. As
with gro_complete, introduce a single shared helper, net_gro_receive.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/geneve.c      |  2 +-
 include/linux/netdevice.h | 14 +++++++++++++-
 net/8021q/vlan.c          |  2 +-
 net/core/dev.c            | 20 ++++----------------
 net/ethernet/eth.c        |  2 +-
 net/ipv4/af_inet.c        |  4 ++--
 net/ipv4/fou.c            |  8 ++++----
 net/ipv4/gre_offload.c    | 12 ++++++------
 net/ipv6/ip6_offload.c    |  8 ++++----
 9 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a3a4621d9bee..a812a774e5fd 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -467,7 +467,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
 	type = gh->proto_type;
 
 	rcu_read_lock();
-	ptype = gro_find_receive_by_type(type);
+	ptype = net_gro_receive(dev_offloads, type);
 	if (!ptype)
 		goto out_unlock;
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d292ea6716e..0be594f8d1ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3556,7 +3556,6 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
-struct packet_offload *gro_find_receive_by_type(__be16 type);
 
 extern const struct net_offload __rcu *dev_offloads[256];
 
@@ -3568,6 +3567,19 @@ static inline u8 net_offload_from_type(u16 type)
 	return type & 0xFF;
 }
 
+static inline const struct net_offload *
+net_gro_receive(const struct net_offload __rcu **offs, u16 type)
+{
+	const struct net_offload *off;
+
+	off = rcu_dereference(offs[net_offload_from_type(type)]);
+	if (off && off->callbacks.gro_receive &&
+	    (!off->type || off->type == type))
+		return off;
+	else
+		return NULL;
+}
+
 static inline int net_gro_complete(const struct net_offload __rcu **offs,
 				   u16 type, struct sk_buff *skb, int nhoff)
 {
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6ac27aa9f158..a106c5373b1d 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -670,7 +670,7 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
 	type = vhdr->h_vlan_encapsulated_proto;
 
 	rcu_read_lock();
-	ptype = gro_find_receive_by_type(type);
+	ptype = net_gro_receive(dev_offloads, type);
 	if (!ptype)
 		goto out_unlock;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 2c21e507291f..ae5fbd4114d2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5382,7 +5382,7 @@ static void gro_flush_oldest(struct list_head *head)
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
-	const struct packet_offload *ptype;
+	const struct net_offload *ops;
 	__be16 type = skb->protocol;
 	struct list_head *gro_head;
 	struct sk_buff *pp = NULL;
@@ -5396,8 +5396,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	gro_head = gro_list_prepare(napi, skb);
 
 	rcu_read_lock();
-	ptype = dev_offloads[net_offload_from_type(type)];
-	if (ptype && ptype->callbacks.gro_receive) {
+	ops = net_gro_receive(dev_offloads, type);
+	if (ops) {
 		skb_set_network_header(skb, skb_gro_offset(skb));
 		skb_reset_mac_len(skb);
 		NAPI_GRO_CB(skb)->same_flow = 0;
@@ -5425,7 +5425,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 			NAPI_GRO_CB(skb)->csum_valid = 0;
 		}
 
-		pp = ptype->callbacks.gro_receive(gro_head, skb);
+		pp = ops->callbacks.gro_receive(gro_head, skb);
 		rcu_read_unlock();
 	} else {
 		rcu_read_unlock();
@@ -5483,18 +5483,6 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	goto pull;
 }
 
-struct packet_offload *gro_find_receive_by_type(__be16 type)
-{
-	struct net_offload *off;
-
-	off = (struct net_offload *) rcu_dereference(dev_offloads[type & 0xFF]);
-	if (off && off->type == type && off->callbacks.gro_receive)
-		return off;
-	else
-		return NULL;
-}
-EXPORT_SYMBOL(gro_find_receive_by_type);
-
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index fb17a13722e8..542dbc2ec956 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -462,7 +462,7 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 	type = eh->h_proto;
 
 	rcu_read_lock();
-	ptype = gro_find_receive_by_type(type);
+	ptype = net_gro_receive(dev_offloads, type);
 	if (ptype == NULL) {
 		flush = 1;
 		goto out_unlock;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1b72ee4a7811..28b7c7671789 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1409,8 +1409,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	proto = iph->protocol;
 
 	rcu_read_lock();
-	ops = rcu_dereference(inet_offloads[proto]);
-	if (!ops || !ops->callbacks.gro_receive)
+	ops = net_gro_receive(inet_offloads, proto);
+	if (!ops)
 		goto out_unlock;
 
 	if (*(u8 *)iph != 0x45)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index c42a3ef17864..13401cb2e7a4 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -246,8 +246,8 @@ static struct sk_buff *fou_gro_receive(struct sock *sk,
 
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (!ops || !ops->callbacks.gro_receive)
+	ops = net_gro_receive(offloads, proto);
+	if (!ops)
 		goto out_unlock;
 
 	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
@@ -428,8 +428,8 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive))
+	ops = net_gro_receive(offloads, proto);
+	if (WARN_ON_ONCE(!ops))
 		goto out_unlock;
 
 	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index fc8c99e4a058..4f9237a4bea1 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -111,13 +111,13 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 static struct sk_buff *gre_gro_receive(struct list_head *head,
 				       struct sk_buff *skb)
 {
-	struct sk_buff *pp = NULL;
-	struct sk_buff *p;
 	const struct gre_base_hdr *greh;
+	const struct net_offload *ops;
 	unsigned int hlen, grehlen;
+	struct sk_buff *pp = NULL;
+	struct sk_buff *p;
 	unsigned int off;
 	int flush = 1;
-	struct packet_offload *ptype;
 	__be16 type;
 
 	if (NAPI_GRO_CB(skb)->encap_mark)
@@ -154,8 +154,8 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 	type = greh->protocol;
 
 	rcu_read_lock();
-	ptype = gro_find_receive_by_type(type);
-	if (!ptype)
+	ops = net_gro_receive(dev_offloads, type);
+	if (!ops)
 		goto out_unlock;
 
 	grehlen = GRE_HEADER_SECTION;
@@ -217,7 +217,7 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
 	skb_gro_postpull_rcsum(skb, greh, grehlen);
 
-	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
+	pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
 	flush = 0;
 
 out_unlock:
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index e8bf554ae611..9d301bef0e23 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -194,8 +194,8 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 
 	rcu_read_lock();
 	proto = iph->nexthdr;
-	ops = rcu_dereference(inet6_offloads[proto]);
-	if (!ops || !ops->callbacks.gro_receive) {
+	ops = net_gro_receive(inet6_offloads, proto);
+	if (!ops) {
 		__pskb_pull(skb, skb_gro_offset(skb));
 		skb_gro_frag0_invalidate(skb);
 		proto = ipv6_gso_pull_exthdrs(skb, proto);
@@ -203,8 +203,8 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 		skb_reset_transport_header(skb);
 		__skb_push(skb, skb_gro_offset(skb));
 
-		ops = rcu_dereference(inet6_offloads[proto]);
-		if (!ops || !ops->callbacks.gro_receive)
+		ops = net_gro_receive(inet6_offloads, proto);
+		if (!ops)
 			goto out_unlock;
 
 		iph = ipv6_hdr(skb);
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
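
Unlike the completion helper, net_gro_receive in the patch above returns the offload entry rather than invoking the callback, since callers do protocol-specific work between the lookup and the call. A small sketch of the resulting caller pattern, under the same userspace assumptions (no RCU, hypothetical demo_* names and callback):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_offload. */
struct demo_offload {
	uint16_t type;
	int (*gro_receive)(int payload);
};

/* Return the entry only if it is usable, so callers need one NULL check. */
static const struct demo_offload *
demo_gro_receive(struct demo_offload *const *offs, uint16_t type)
{
	const struct demo_offload *off = offs[type & 0xFF];

	if (off && off->gro_receive &&
	    (!off->type || off->type == type))
		return off;
	return NULL;
}

/* Example callback, purely illustrative: doubles its input. */
static int demo_double(int payload)
{
	return payload * 2;
}

/* Caller pattern: look up, bail out on NULL, otherwise call through. */
static int demo_receive_or_flush(struct demo_offload *const *offs,
				 uint16_t type, int payload)
{
	const struct demo_offload *ops = demo_gro_receive(offs, type);

	if (!ops)
		return -1;	/* stands in for the "goto out_unlock" path */

	return ops->gro_receive(payload);
}
```

Because the helper already rejects entries without a gro_receive callback, the call sites can drop their own `!ops->callbacks.gro_receive` tests, which is the simplification visible in the hunks above.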

* [PATCH net-next RFC 4/8] ipv6: remove offload exception for hopopts
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Extension headers in ipv6 are pulled without calling a callback
function. An inet6_offload signals this feature with flag
INET6_PROTO_GSO_EXTHDR.

Add net_offload_has_flag helper to hide implementation details and in
preparation for configurable gro.

Convert NEXTHDR_HOP from a special case branch to a standard
extension header offload.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/netdevice.h  |  9 +++++++++
 net/ipv6/exthdrs_offload.c | 17 ++++++++++++++---
 net/ipv6/ip6_offload.c     | 36 +++++++++++++-----------------------
 3 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0be594f8d1ce..1c97a048506f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3567,6 +3567,15 @@ static inline u8 net_offload_from_type(u16 type)
 	return type & 0xFF;
 }
 
+static inline bool net_offload_has_flag(const struct net_offload __rcu **offs,
+					u16 type, u16 flag)
+{
+	const struct net_offload *off;
+
+	off = offs ? rcu_dereference(offs[net_offload_from_type(type)]) : NULL;
+	return off && off->flags & flag;
+}
+
 static inline const struct net_offload *
 net_gro_receive(const struct net_offload __rcu **offs, u16 type)
 {
diff --git a/net/ipv6/exthdrs_offload.c b/net/ipv6/exthdrs_offload.c
index f5e2ba1c18bf..2230331c6012 100644
--- a/net/ipv6/exthdrs_offload.c
+++ b/net/ipv6/exthdrs_offload.c
@@ -12,11 +12,15 @@
 #include <net/protocol.h>
 #include "ip6_offload.h"
 
-static const struct net_offload rthdr_offload = {
+static struct net_offload hophdr_offload = {
 	.flags		=	INET6_PROTO_GSO_EXTHDR,
 };
 
-static const struct net_offload dstopt_offload = {
+static struct net_offload rthdr_offload = {
+	.flags		=	INET6_PROTO_GSO_EXTHDR,
+};
+
+static struct net_offload dstopt_offload = {
 	.flags		=	INET6_PROTO_GSO_EXTHDR,
 };
 
@@ -24,10 +28,14 @@ int __init ipv6_exthdrs_offload_init(void)
 {
 	int ret;
 
-	ret = inet6_add_offload(&rthdr_offload, IPPROTO_ROUTING);
+	ret = inet6_add_offload(&hophdr_offload, IPPROTO_HOPOPTS);
 	if (ret)
 		goto out;
 
+	ret = inet6_add_offload(&rthdr_offload, IPPROTO_ROUTING);
+	if (ret)
+		goto out_hop;
+
 	ret = inet6_add_offload(&dstopt_offload, IPPROTO_DSTOPTS);
 	if (ret)
 		goto out_rt;
@@ -37,5 +45,8 @@ int __init ipv6_exthdrs_offload_init(void)
 
 out_rt:
 	inet6_del_offload(&rthdr_offload, IPPROTO_ROUTING);
+
+out_hop:
+	inet6_del_offload(&hophdr_offload, IPPROTO_HOPOPTS);
 	goto out;
 }
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 9d301bef0e23..4854509a2c5d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -22,21 +22,13 @@
 
 static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
 {
-	const struct net_offload *ops = NULL;
-
 	for (;;) {
 		struct ipv6_opt_hdr *opth;
 		int len;
 
-		if (proto != NEXTHDR_HOP) {
-			ops = rcu_dereference(inet6_offloads[proto]);
-
-			if (unlikely(!ops))
-				break;
-
-			if (!(ops->flags & INET6_PROTO_GSO_EXTHDR))
-				break;
-		}
+		if (!net_offload_has_flag(inet6_offloads, proto,
+					  INET6_PROTO_GSO_EXTHDR))
+			break;
 
 		if (unlikely(!pskb_may_pull(skb, 8)))
 			break;
@@ -141,26 +133,24 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 /* Return the total length of all the extension hdrs, following the same
  * logic in ipv6_gso_pull_exthdrs() when parsing ext-hdrs.
  */
-static int ipv6_exthdrs_len(struct ipv6hdr *iph,
-			    const struct net_offload **opps)
+static int ipv6_exthdrs_len(struct ipv6hdr *iph, u8 *pproto)
 {
 	struct ipv6_opt_hdr *opth = (void *)iph;
 	int len = 0, proto, optlen = sizeof(*iph);
 
 	proto = iph->nexthdr;
 	for (;;) {
-		if (proto != NEXTHDR_HOP) {
-			*opps = rcu_dereference(inet6_offloads[proto]);
-			if (unlikely(!(*opps)))
-				break;
-			if (!((*opps)->flags & INET6_PROTO_GSO_EXTHDR))
-				break;
-		}
+		if (!net_offload_has_flag(inet6_offloads, proto,
+					  INET6_PROTO_GSO_EXTHDR))
+			break;
+
 		opth = (void *)opth + optlen;
 		optlen = ipv6_optlen(opth);
 		len += optlen;
 		proto = opth->nexthdr;
 	}
+
+	*pproto = proto;
 	return len;
 }
 
@@ -296,8 +286,8 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 
 static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	const struct net_offload *ops;
 	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	u8 proto;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
@@ -306,8 +296,8 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 
 	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
 
-	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
-	return net_gro_complete(inet6_offloads, ops->type, skb, nhoff);
+	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &proto);
+	return net_gro_complete(inet6_offloads, proto, skb, nhoff);
 }
 
 static int sit_gro_complete(struct sk_buff *skb, int nhoff)
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
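
With hop-by-hop options registered like any other extension header, the length walk in the patch above reduces to a loop that continues while the current protocol carries the extension-header flag. The sketch below models that loop in userspace. It assumes a plain table and a raw byte buffer, uses the standard IPv6 option header layout (next header, then length in 8-octet units not counting the first 8 octets), and adds a bounds check that the kernel function does not have; all demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GSO_EXTHDR	0x1	/* stand-in for INET6_PROTO_GSO_EXTHDR */

/* Hypothetical stand-in for struct net_offload; only flags matter here. */
struct demo_offload {
	unsigned int flags;
};

static bool demo_offload_has_flag(struct demo_offload *const *offs,
				  uint8_t proto, unsigned int flag)
{
	const struct demo_offload *off = offs ? offs[proto] : NULL;

	return off && (off->flags & flag);
}

/*
 * Sum the lengths of consecutive extension headers starting at buf.
 * first_proto is the type of the header at buf[0]. On return, *pproto
 * holds the first protocol that is not a flagged extension header.
 */
static size_t demo_exthdrs_len(struct demo_offload *const *offs,
			       const uint8_t *buf, size_t buflen,
			       uint8_t first_proto, uint8_t *pproto)
{
	uint8_t proto = first_proto;
	size_t len = 0;

	while (demo_offload_has_flag(offs, proto, DEMO_GSO_EXTHDR)) {
		const uint8_t *opth = buf + len;

		if (len + 2 > buflen)	/* demo-only safety net */
			break;

		proto = opth[0];			/* next header */
		len += ((size_t)opth[1] + 1) << 3;	/* (hdrlen + 1) * 8 */
	}

	*pproto = proto;
	return len;
}
```

For a buffer holding an 8-byte hop-by-hop header followed by a 16-byte destination options header and then TCP, this walk yields a length of 24 and a final protocol of 6, which is the value the caller then passes to the completion lookup.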

* [PATCH net-next RFC 5/8] net: deconstify net_offload
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

With configurable gro, the flags field in net_offloads may be changed.

Remove the const keyword. This is a noop otherwise.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/netdevice.h | 14 +++++++-------
 include/net/protocol.h    | 12 ++++++------
 net/core/dev.c            |  8 +++-----
 net/ipv4/af_inet.c        |  2 +-
 net/ipv4/esp4_offload.c   |  2 +-
 net/ipv4/fou.c            |  8 ++++----
 net/ipv4/gre_offload.c    |  2 +-
 net/ipv4/protocol.c       | 10 +++++-----
 net/ipv4/tcp_offload.c    |  2 +-
 net/ipv4/udp_offload.c    |  6 +++---
 net/ipv6/esp6_offload.c   |  2 +-
 net/ipv6/ip6_offload.c    |  6 +++---
 net/ipv6/protocol.c       | 10 +++++-----
 net/ipv6/tcpv6_offload.c  |  2 +-
 net/ipv6/udp_offload.c    |  2 +-
 net/sctp/offload.c        |  2 +-
 16 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c97a048506f..b9e671887fc2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3557,7 +3557,7 @@ void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 
-extern const struct net_offload __rcu *dev_offloads[256];
+extern struct net_offload __rcu *dev_offloads[256];
 
 static inline u8 net_offload_from_type(u16 type)
 {
@@ -3567,19 +3567,19 @@ static inline u8 net_offload_from_type(u16 type)
 	return type & 0xFF;
 }
 
-static inline bool net_offload_has_flag(const struct net_offload __rcu **offs,
+static inline bool net_offload_has_flag(struct net_offload __rcu **offs,
 					u16 type, u16 flag)
 {
-	const struct net_offload *off;
+	struct net_offload *off;
 
 	off = offs ? rcu_dereference(offs[net_offload_from_type(type)]) : NULL;
 	return off && off->flags & flag;
 }
 
 static inline const struct net_offload *
-net_gro_receive(const struct net_offload __rcu **offs, u16 type)
+net_gro_receive(struct net_offload __rcu **offs, u16 type)
 {
-	const struct net_offload *off;
+	struct net_offload *off;
 
 	off = rcu_dereference(offs[net_offload_from_type(type)]);
 	if (off && off->callbacks.gro_receive &&
@@ -3589,10 +3589,10 @@ net_gro_receive(const struct net_offload __rcu **offs, u16 type)
 		return NULL;
 }
 
-static inline int net_gro_complete(const struct net_offload __rcu **offs,
+static inline int net_gro_complete(struct net_offload __rcu **offs,
 				   u16 type, struct sk_buff *skb, int nhoff)
 {
-	const struct net_offload *off;
+	struct net_offload *off;
 	int ret = -ENOENT;
 
 	rcu_read_lock();
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 53a0322ee545..5e2c20b662d1 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -87,8 +87,8 @@ struct inet_protosw {
 #define INET_PROTOSW_ICSK      0x04  /* Is this an inet_connection_sock? */
 
 extern struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
-extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
-extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
+extern struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
+extern struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
 
 #if IS_ENABLED(CONFIG_IPV6)
 extern struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
@@ -96,8 +96,8 @@ extern struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
 
 int inet_add_protocol(const struct net_protocol *prot, unsigned char num);
 int inet_del_protocol(const struct net_protocol *prot, unsigned char num);
-int inet_add_offload(const struct net_offload *prot, unsigned char num);
-int inet_del_offload(const struct net_offload *prot, unsigned char num);
+int inet_add_offload(struct net_offload *prot, unsigned char num);
+int inet_del_offload(struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
 
@@ -107,7 +107,7 @@ int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
 int inet6_register_protosw(struct inet_protosw *p);
 void inet6_unregister_protosw(struct inet_protosw *p);
 #endif
-int inet6_add_offload(const struct net_offload *prot, unsigned char num);
-int inet6_del_offload(const struct net_offload *prot, unsigned char num);
+int inet6_add_offload(struct net_offload *prot, unsigned char num);
+int inet6_del_offload(struct net_offload *prot, unsigned char num);
 
 #endif	/* _PROTOCOL_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ae5fbd4114d2..20d9552afd38 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -466,7 +466,7 @@ void dev_remove_pack(struct packet_type *pt)
 EXPORT_SYMBOL(dev_remove_pack);
 
 
-const struct net_offload __rcu *dev_offloads[256] __read_mostly;
+struct net_offload __rcu *dev_offloads[256] __read_mostly;
 EXPORT_SYMBOL(dev_offloads);
 
 /**
@@ -483,8 +483,7 @@ EXPORT_SYMBOL(dev_offloads);
  */
 void dev_add_offload(struct packet_offload *po)
 {
-	cmpxchg((const struct net_offload **)
-		&dev_offloads[net_offload_from_type(po->type)],
+	cmpxchg(&dev_offloads[net_offload_from_type(po->type)],
 			NULL, po);
 }
 EXPORT_SYMBOL(dev_add_offload);
@@ -504,8 +503,7 @@ EXPORT_SYMBOL(dev_add_offload);
  */
 static int __dev_remove_offload(struct packet_offload *po)
 {
-	return (cmpxchg((const struct net_offload **)
-			&dev_offloads[net_offload_from_type(po->type)],
+	return (cmpxchg(&dev_offloads[net_offload_from_type(po->type)],
 		       po, NULL) == po) ? 0 : -1;
 }
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 28b7c7671789..f3ee6f4dfc0f 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1839,7 +1839,7 @@ static struct packet_offload ip_packet_offload __read_mostly = {
 	},
 };
 
-static const struct net_offload ipip_offload = {
+static struct net_offload ipip_offload = {
 	.callbacks = {
 		.gso_segment	= inet_gso_segment,
 		.gro_receive	= ipip_gro_receive,
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index 58834a10c0be..e6d7a9be9244 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -240,7 +240,7 @@ static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb,  netdev_features_
 	return 0;
 }
 
-static const struct net_offload esp4_offload = {
+static struct net_offload esp4_offload = {
 	.callbacks = {
 		.gro_receive = esp4_gro_receive,
 		.gso_segment = esp4_gso_segment,
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 13401cb2e7a4..52e01dcaa417 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -229,7 +229,7 @@ static struct sk_buff *fou_gro_receive(struct sock *sk,
 				       struct sk_buff *skb)
 {
 	u8 proto = fou_from_sock(sk)->protocol;
-	const struct net_offload **offloads;
+	struct net_offload **offloads;
 	const struct net_offload *ops;
 	struct sk_buff *pp = NULL;
 
@@ -262,7 +262,7 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 			    int nhoff)
 {
 	u8 proto = fou_from_sock(sk)->protocol;
-	const struct net_offload **offloads;
+	struct net_offload **offloads;
 	int err;
 
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
@@ -299,7 +299,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 				       struct list_head *head,
 				       struct sk_buff *skb)
 {
-	const struct net_offload **offloads;
+	struct net_offload **offloads;
 	const struct net_offload *ops;
 	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
@@ -445,7 +445,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 
 static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 {
-	const struct net_offload **offloads;
+	struct net_offload **offloads;
 	struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff);
 	unsigned int guehlen = 0;
 	u8 proto;
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 4f9237a4bea1..70910650d322 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -252,7 +252,7 @@ static int gre_gro_complete(struct sk_buff *skb, int nhoff)
 	return err;
 }
 
-static const struct net_offload gre_offload = {
+static struct net_offload gre_offload = {
 	.callbacks = {
 		.gso_segment = gre_gso_segment,
 		.gro_receive = gre_gro_receive,
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 32a691b7ce2c..66948d77672e 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -29,7 +29,7 @@
 #include <net/protocol.h>
 
 struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
-const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
+struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
 EXPORT_SYMBOL(inet_offloads);
 
 int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
@@ -45,9 +45,9 @@ int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
 }
 EXPORT_SYMBOL(inet_add_protocol);
 
-int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
+int inet_add_offload(struct net_offload *prot, unsigned char protocol)
 {
-	return !cmpxchg((const struct net_offload **)&inet_offloads[protocol],
+	return !cmpxchg((struct net_offload **)&inet_offloads[protocol],
 			NULL, prot) ? 0 : -1;
 }
 EXPORT_SYMBOL(inet_add_offload);
@@ -65,11 +65,11 @@ int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
 }
 EXPORT_SYMBOL(inet_del_protocol);
 
-int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
+int inet_del_offload(struct net_offload *prot, unsigned char protocol)
 {
 	int ret;
 
-	ret = (cmpxchg((const struct net_offload **)&inet_offloads[protocol],
+	ret = (cmpxchg((struct net_offload **)&inet_offloads[protocol],
 		       prot, NULL) == prot) ? 0 : -1;
 
 	synchronize_net();
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 870b0a335061..d670f2d008bc 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -333,7 +333,7 @@ static int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	return tcp_gro_complete(skb);
 }
 
-static const struct net_offload tcpv4_offload = {
+static struct net_offload tcpv4_offload = {
 	.callbacks = {
 		.gso_segment	=	tcp4_gso_segment,
 		.gro_receive	=	tcp4_gro_receive,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0c0522b79b43..4f6aa95a9b12 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -153,8 +153,8 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       bool is_ipv6)
 {
 	__be16 protocol = skb->protocol;
-	const struct net_offload **offloads;
-	const struct net_offload *ops;
+	struct net_offload **offloads;
+	struct net_offload *ops;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
 					     netdev_features_t features);
@@ -472,7 +472,7 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
-static const struct net_offload udpv4_offload = {
+static struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_segment = udp4_ufo_fragment,
 		.gro_receive  =	udp4_gro_receive,
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index 6177e2171171..169dcd5c7135 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -268,7 +268,7 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff *skb,  netdev_features
 	return 0;
 }
 
-static const struct net_offload esp6_offload = {
+static struct net_offload esp6_offload = {
 	.callbacks = {
 		.gro_receive = esp6_gro_receive,
 		.gso_segment = esp6_gso_segment,
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 4854509a2c5d..2d0ea3f453f2 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -330,7 +330,7 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
 	},
 };
 
-static const struct net_offload sit_offload = {
+static struct net_offload sit_offload = {
 	.callbacks = {
 		.gso_segment	= ipv6_gso_segment,
 		.gro_receive    = sit_ip6ip6_gro_receive,
@@ -338,7 +338,7 @@ static const struct net_offload sit_offload = {
 	},
 };
 
-static const struct net_offload ip4ip6_offload = {
+static struct net_offload ip4ip6_offload = {
 	.callbacks = {
 		.gso_segment	= inet_gso_segment,
 		.gro_receive    = ip4ip6_gro_receive,
@@ -346,7 +346,7 @@ static const struct net_offload ip4ip6_offload = {
 	},
 };
 
-static const struct net_offload ip6ip6_offload = {
+static struct net_offload ip6ip6_offload = {
 	.callbacks = {
 		.gso_segment	= ipv6_gso_segment,
 		.gro_receive    = sit_ip6ip6_gro_receive,
diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
index b5d54d4f995c..06efcfc6d02b 100644
--- a/net/ipv6/protocol.c
+++ b/net/ipv6/protocol.c
@@ -50,21 +50,21 @@ int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char protocol
 EXPORT_SYMBOL(inet6_del_protocol);
 #endif
 
-const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
+struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
 EXPORT_SYMBOL(inet6_offloads);
 
-int inet6_add_offload(const struct net_offload *prot, unsigned char protocol)
+int inet6_add_offload(struct net_offload *prot, unsigned char protocol)
 {
-	return !cmpxchg((const struct net_offload **)&inet6_offloads[protocol],
+	return !cmpxchg((struct net_offload **)&inet6_offloads[protocol],
 			NULL, prot) ? 0 : -1;
 }
 EXPORT_SYMBOL(inet6_add_offload);
 
-int inet6_del_offload(const struct net_offload *prot, unsigned char protocol)
+int inet6_del_offload(struct net_offload *prot, unsigned char protocol)
 {
 	int ret;
 
-	ret = (cmpxchg((const struct net_offload **)&inet6_offloads[protocol],
+	ret = (cmpxchg((struct net_offload **)&inet6_offloads[protocol],
 		       prot, NULL) == prot) ? 0 : -1;
 
 	synchronize_net();
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index e72947c99454..a3c5010e1361 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -67,7 +67,7 @@ static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 
 	return tcp_gso_segment(skb, features);
 }
-static const struct net_offload tcpv6_offload = {
+static struct net_offload tcpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	tcp6_gso_segment,
 		.gro_receive	=	tcp6_gro_receive,
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 95dee9ca8d22..2a41da0dd33f 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -158,7 +158,7 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
-static const struct net_offload udpv6_offload = {
+static struct net_offload udpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	udp6_ufo_fragment,
 		.gro_receive	=	udp6_gro_receive,
diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index 123e9f2dc226..ad504b83245d 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -90,7 +90,7 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 	return segs;
 }
 
-static const struct net_offload sctp_offload = {
+static struct net_offload sctp_offload = {
 	.callbacks = {
 		.gso_segment = sctp_gso_segment,
 	},
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
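
The registration helpers touched by patch 5/8 (inet_add_offload and
inet_del_offload) rely on a compare-and-exchange on the table slot. The
following is a minimal userspace sketch of that idea, assuming C11 atomics in
place of the kernel's cmpxchg(). The names `struct offload`, `slots`,
`add_offload` and `del_offload` are hypothetical, and the synchronize_net()
call that follows removal in the kernel is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PROTOS 256

/* Hypothetical stand-in for struct net_offload; non-const so flags can change. */
struct offload {
	unsigned int flags;
};

/* One slot per protocol number; static storage starts out as all NULL. */
static _Atomic(struct offload *) slots[MAX_PROTOS];

/* Install prot only if the slot is currently empty: 0 on success, -1 if taken. */
static int add_offload(struct offload *prot, unsigned char num)
{
	struct offload *expected = NULL;

	return atomic_compare_exchange_strong(&slots[num], &expected, prot) ? 0 : -1;
}

/* Clear the slot only if it still holds prot: 0 on success, -1 otherwise. */
static int del_offload(struct offload *prot, unsigned char num)
{
	struct offload *expected = prot;

	return atomic_compare_exchange_strong(&slots[num], &expected,
					      (struct offload *)NULL) ? 0 : -1;
}
```

In this model a second add_offload() on an occupied slot fails rather than
overwriting, and del_offload() with a pointer other than the registered one
leaves the slot untouched.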

* [PATCH net-next RFC 6/8] net: make gro configurable
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Add net_offload flag NET_OFF_FLAG_GRO_OFF. If set, a net_offload will
not be used for GRO receive processing.

Also add sysctl helper proc_do_net_offload, which toggles this flag, and
register sysctls net.{core,ipv4,ipv6}.gro.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/vxlan.c        |  8 +++++
 include/linux/netdevice.h  |  7 ++++-
 net/core/dev.c             |  1 +
 net/core/sysctl_net_core.c | 60 ++++++++++++++++++++++++++++++++++++++
 net/ipv4/sysctl_net_ipv4.c |  7 +++++
 net/ipv6/ip6_offload.c     | 10 +++++--
 net/ipv6/sysctl_net_ipv6.c |  8 +++++
 7 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e5d236595206..8cb8e02c8ab6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
 					 struct list_head *head,
 					 struct sk_buff *skb)
 {
+	const struct net_offload *ops;
 	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	struct vxlanhdr *vh, *vh2;
@@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
 			goto out;
 	}
 
+	rcu_read_lock();
+	ops = net_gro_receive(dev_offloads, ETH_P_TEB);
+	rcu_read_unlock();
+	if (!ops)
+		goto out;
+
 	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
 
 	list_for_each_entry(p, head, list) {
@@ -621,6 +628,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
 	}
 
 	pp = call_gro_receive(eth_gro_receive, head, skb);
+
 	flush = 0;
 
 out:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b9e671887fc2..93e8c9ade593 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2377,6 +2377,10 @@ struct net_offload {
 
 /* This should be set for any extension header which is compatible with GSO. */
 #define INET6_PROTO_GSO_EXTHDR	0x1
+#define NET_OFF_FLAG_GRO_OFF	0x2
+
+int proc_do_net_offload(struct ctl_table *ctl, int write, void __user *buffer,
+			size_t *lenp, loff_t *ppos);
 
 /* often modified stats are per-CPU, other are shared (netdev->stats) */
 struct pcpu_sw_netstats {
@@ -3583,7 +3587,8 @@ net_gro_receive(struct net_offload __rcu **offs, u16 type)
 
 	off = rcu_dereference(offs[net_offload_from_type(type)]);
 	if (off && off->callbacks.gro_receive &&
-	    (!off->type || off->type == type))
+	    (!off->type || off->type == type) &&
+	    !(off->flags & NET_OFF_FLAG_GRO_OFF))
 		return off;
 	else
 		return NULL;
diff --git a/net/core/dev.c b/net/core/dev.c
index 20d9552afd38..0fd5273bc931 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -154,6 +154,7 @@
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
 static DEFINE_SPINLOCK(ptype_lock);
+DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
 static struct list_head offload_base __read_mostly;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b1a2c5e38530..d2d72afdd9eb 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -15,6 +15,7 @@
 #include <linux/vmalloc.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/bitmap.h>
 
 #include <net/ip.h>
 #include <net/sock.h>
@@ -34,6 +35,58 @@ static int net_msg_warn;	/* Unused, but still a sysctl */
 int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
 EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
 
+extern spinlock_t offload_lock;
+
+#define NET_OFF_TBL_LEN	256
+
+int proc_do_net_offload(struct ctl_table *ctl, int write, void __user *buffer,
+			size_t *lenp, loff_t *ppos)
+{
+	unsigned long bitmap[NET_OFF_TBL_LEN / (sizeof(unsigned long) << 3)];
+	struct ctl_table tbl = { .maxlen = NET_OFF_TBL_LEN, .data = bitmap };
+	unsigned long flag = (unsigned long) ctl->extra2;
+	struct net_offload __rcu **offs = ctl->extra1;
+	struct net_offload *off;
+	int i, ret;
+
+	memset(bitmap, 0, sizeof(bitmap));
+
+	spin_lock(&offload_lock);
+
+	for (i = 0; i < tbl.maxlen; i++) {
+		off = rcu_dereference_protected(offs[i], lockdep_is_held(&offload_lock));
+		if (off && off->flags & flag) {
+			/* flag specific constraints */
+			if (flag == NET_OFF_FLAG_GRO_OFF) {
+				/* gro disable bit: only if can gro */
+				if (!off->callbacks.gro_receive &&
+				    !(off->flags & INET6_PROTO_GSO_EXTHDR))
+					continue;
+			}
+			set_bit(i, bitmap);
+		}
+	}
+
+	ret = proc_do_large_bitmap(&tbl, write, buffer, lenp, ppos);
+
+	if (write && !ret) {
+		for (i = 0; i < tbl.maxlen; i++) {
+			bool isset = test_bit(i, bitmap);
+
+			off = rcu_dereference_protected(offs[i], lockdep_is_held(&offload_lock));
+			if (off && !isset && (off->flags & flag))
+				off->flags &= ~flag;
+			else if (off && isset && !(off->flags & flag))
+				off->flags |= flag;
+		}
+	}
+
+	spin_unlock(&offload_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(proc_do_net_offload);
+
 #ifdef CONFIG_RPS
 static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -435,6 +488,13 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one
 	},
+	{
+		.procname	= "gro",
+		.mode		= 0644,
+		.proc_handler	= proc_do_net_offload,
+		.extra1		= dev_offloads,
+		.extra2		= (void *) NET_OFF_FLAG_GRO_OFF,
+	},
 #ifdef CONFIG_RPS
 	{
 		.procname	= "rps_sock_flow_entries",
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index b92f422f2fa8..7a525039afb2 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -477,6 +477,13 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "gro",
+		.mode		= 0644,
+		.proc_handler	= proc_do_net_offload,
+		.extra1		= inet_offloads,
+		.extra2		= (void *) NET_OFF_FLAG_GRO_OFF,
+	},
 #ifdef CONFIG_NETLABEL
 	{
 		.procname	= "cipso_cache_enable",
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 2d0ea3f453f2..6be5adbd2ce7 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -20,7 +20,7 @@
 
 #include "ip6_offload.h"
 
-static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
+static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto, bool is_gro)
 {
 	for (;;) {
 		struct ipv6_opt_hdr *opth;
@@ -30,6 +30,10 @@ static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
 					  INET6_PROTO_GSO_EXTHDR))
 			break;
 
+		if (is_gro && net_offload_has_flag(inet6_offloads, proto,
+						   NET_OFF_FLAG_GRO_OFF))
+			break;
+
 		if (unlikely(!pskb_may_pull(skb, 8)))
 			break;
 
@@ -76,7 +80,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	__skb_pull(skb, sizeof(*ipv6h));
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
-	proto = ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr);
+	proto = ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr, false);
 
 	if (skb->encapsulation &&
 	    skb_shinfo(skb)->gso_type & (SKB_GSO_IPXIP4 | SKB_GSO_IPXIP6))
@@ -188,7 +192,7 @@ static struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (!ops) {
 		__pskb_pull(skb, skb_gro_offset(skb));
 		skb_gro_frag0_invalidate(skb);
-		proto = ipv6_gso_pull_exthdrs(skb, proto);
+		proto = ipv6_gso_pull_exthdrs(skb, proto, true);
 		skb_gro_pull(skb, -skb_transport_offset(skb));
 		skb_reset_transport_header(skb);
 		__skb_push(skb, skb_gro_offset(skb));
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index e15cd37024fd..83f14962a909 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -20,6 +20,7 @@
 #ifdef CONFIG_NETLABEL
 #include <net/calipso.h>
 #endif
+#include <net/protocol.h>
 
 static int zero;
 static int one = 1;
@@ -178,6 +179,13 @@ static struct ctl_table ipv6_rotable[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &one
 	},
+	{
+		.procname	= "gro",
+		.mode		= 0644,
+		.proc_handler	= proc_do_net_offload,
+		.extra1		= inet6_offloads,
+		.extra2		= (void *) NET_OFF_FLAG_GRO_OFF,
+	},
 #ifdef CONFIG_NETLABEL
 	{
 		.procname	= "calipso_cache_enable",
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
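
The flag handling in patch 6/8 can be modeled in userspace. What follows is a
minimal sketch, not the kernel code: `struct offload`, `bitmap_set`,
`bitmap_test`, `apply_bitmap` and `gro_allowed` are hypothetical names, the
table is a plain array rather than an RCU-protected one, and locking is
omitted. It shows a bitmap (as written through the sysctl) being translated
into per-entry flag bits, with empty table slots skipped, and a receive-side
predicate that refuses entries whose GRO-off bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_LEN      256
#define FLAG_GRO_OFF 0x2	/* mirrors NET_OFF_FLAG_GRO_OFF from the patch */

/* Hypothetical stand-in for struct net_offload. */
struct offload {
	bool     has_gro_receive;	/* models callbacks.gro_receive != NULL */
	uint16_t flags;
};

/* Set bit i in a byte-array bitmap of TBL_LEN bits. */
static void bitmap_set(uint8_t *bitmap, unsigned int i)
{
	assert(i < TBL_LEN);
	bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Test bit i in a byte-array bitmap of TBL_LEN bits. */
static bool bitmap_test(const uint8_t *bitmap, unsigned int i)
{
	assert(i < TBL_LEN);
	return (bitmap[i / 8] & (1u << (i % 8))) != 0;
}

/* Copy each bitmap bit into the given flag of the matching table entry.
 * Slots with no registered offload are skipped.
 */
static void apply_bitmap(struct offload **tbl, const uint8_t *bitmap,
			 uint16_t flag)
{
	for (unsigned int i = 0; i < TBL_LEN; i++) {
		struct offload *off = tbl[i];

		if (!off)
			continue;
		if (bitmap_test(bitmap, i))
			off->flags |= flag;
		else
			off->flags &= (uint16_t)~flag;
	}
}

/* Receive-side check: usable only if registered, GRO-capable and not disabled. */
static bool gro_allowed(struct offload **tbl, uint8_t proto)
{
	const struct offload *off = tbl[proto];

	return off && off->has_gro_receive && !(off->flags & FLAG_GRO_OFF);
}
```

Setting a bit for a protocol number with no registered entry is silently
ignored in this model; whether the real sysctl should reject such writes is a
design question the sketch does not try to answer.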

* [PATCH net-next RFC 7/8] udp: gro behind static key
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Avoid the socket lookup cost in udp_gro_receive if no socket has a
gro callback configured.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/net/udp.h      | 2 ++
 net/ipv4/udp.c         | 2 +-
 net/ipv4/udp_offload.c | 2 +-
 net/ipv6/udp.c         | 2 +-
 net/ipv6/udp_offload.c | 2 +-
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 8482a990b0bb..9e82cb391dea 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -443,8 +443,10 @@ int udpv4_offload_init(void);
 
 void udp_init(void);
 
+DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void);
 #if IS_ENABLED(CONFIG_IPV6)
+DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void);
 #endif
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..bd873a5b8a86 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1889,7 +1889,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-static DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
+DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void)
 {
 	static_branch_enable(&udp_encap_needed_key);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 4f6aa95a9b12..f44fe328aa0f 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -405,7 +405,7 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh))
+	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 83f4c77c79d8..d84672959f10 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -548,7 +548,7 @@ static __inline__ void udpv6_err(struct sk_buff *skb,
 	__udp6_lib_err(skb, opt, type, code, offset, info, &udp_table);
 }
 
-static DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
+DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void)
 {
 	static_branch_enable(&udpv6_encap_needed_key);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 2a41da0dd33f..e00f19c4a939 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -119,7 +119,7 @@ static struct sk_buff *udp6_gro_receive(struct list_head *head,
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-	if (unlikely(!uh))
+	if (unlikely(!uh) || !static_branch_unlikely(&udpv6_encap_needed_key))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related

* [PATCH net-next RFC 8/8] udp: add gro
From: Willem de Bruijn @ 2018-09-14 17:59 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Very rough initial version of UDP GRO, for discussion purposes only at
this point.

Among other things, it
- lacks the cmsg UDP_SEGMENT to return gso_size
- probably breaks UDP tunnels
- hard breaks at 40 segments
- does not allow a last segment of unequal size

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/udp.h |  1 +
 net/ipv4/udp.c           | 71 ++++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp_offload.c   | 11 +++----
 3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 09d00f8c442b..7fda3e8c7fcf 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -33,6 +33,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_TX 101	/* Disable sending checksum for UDP6X */
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
+#define UDP_GRO		104	/* Enable GRO */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index bd873a5b8a86..ae49c08e6225 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2387,6 +2387,51 @@ void udp_destroy_sock(struct sock *sk)
 	}
 }
 
+static struct sk_buff *udp_gro_receive_cb(struct sock *sk,
+					  struct list_head *head,
+					  struct sk_buff *skb)
+{
+	struct sk_buff *p;
+	unsigned int off;
+
+	off = skb_gro_offset(skb) - sizeof(struct udphdr);
+
+	list_for_each_entry(p, head, list) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		/* TODO: for UDP_GRO: match size unless last segment */
+		if (NAPI_GRO_CB(p)->flush)
+			break;
+
+		/* TODO: look into ip id check */
+		if (skb_gro_receive(p, skb)) {
+			NAPI_GRO_CB(skb)->flush = 1;
+			break;
+		}
+
+		if (NAPI_GRO_CB(skb)->count >= 40) {
+			return p;
+		}
+
+		return NULL;
+	}
+
+	return NULL;
+}
+
+static int udp_gro_complete_cb(struct sock *sk, struct sk_buff *skb,
+			       int nhoff)
+{
+	skb->csum_start = (unsigned char *)udp_hdr(skb) - skb->head;
+	skb->csum_offset = offsetof(struct udphdr, check);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+
+	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+	return 0;
+}
+
 /*
  *	Socket option code for UDP
  */
@@ -2450,6 +2495,32 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->gso_size = val;
 		break;
 
+	case UDP_GRO:
+	{
+		if (val < 0 || val > 1)
+			return -EINVAL;
+
+		lock_sock(sk);
+		if (val) {
+
+			if (!udp_sk(sk)->gro_receive) {
+				udp_sk(sk)->gro_complete = udp_gro_complete_cb;
+				udp_sk(sk)->gro_receive = udp_gro_receive_cb;
+			} else {
+				err = -EALREADY;
+			}
+		} else {
+			if (udp_sk(sk)->gro_receive) {
+				udp_sk(sk)->gro_receive = NULL;
+				udp_sk(sk)->gro_complete = NULL;
+			} else {
+				err = -ENOENT;
+			}
+		}
+		release_sock(sk);
+		break;
+	}
+
 	/*
 	 * 	UDP-Lite's partial checksum coverage (RFC 3828).
 	 */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f44fe328aa0f..6dd3f0a28b5e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -386,6 +386,8 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
+
+		/* TODO: for UDP_GRO: match size */
 	}
 
 	skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
@@ -437,11 +439,6 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 
 	uh->len = newlen;
 
-	/* Set encapsulation before calling into inner gro_complete() functions
-	 * to make them set up the inner offsets.
-	 */
-	skb->encapsulation = 1;
-
 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
 	if (sk && udp_sk(sk)->gro_complete)
@@ -462,11 +459,11 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
 	if (uh->check) {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_L4;
 		uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
 					  iph->daddr, 0);
 	} else {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_L4;
 	}
 
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
-- 
2.19.0.397.gdd90340f6a-goog

^ permalink raw reply related
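
From userspace, the socket option added in patch 8/8 would be set like any
other UDP-level option. The following is a minimal sketch under stated
assumptions: `udp_set_gro` is a hypothetical helper, and the fallback value
104 for UDP_GRO is taken from this RFC patch, so it only has meaning on a
kernel that implements the option with that number.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef UDP_GRO
#define UDP_GRO 104	/* value proposed in this patch; assumed missing from the headers */
#endif

/* Try to switch UDP GRO on or off for a UDP socket.
 * Returns 0 on success or a negative errno value on failure.
 */
static int udp_set_gro(int fd, int enable)
{
	int val = enable ? 1 : 0;

	if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &val, sizeof(val)) < 0)
		return -errno;
	return 0;
}
```

Going by the setsockopt hunk in the patch, enabling the option twice on the
same socket should report EALREADY and disabling it when it is not enabled
should report ENOENT; a kernel without the patch is expected to reject the
option altogether.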

* Re: [PATCH] net: caif: remove redundant null check on frontpkt
From: Colin Ian King @ 2018-09-14 18:18 UTC (permalink / raw)
  To: Sergei Shtylyov, Dmitry Tarnyagin, David S . Miller
  Cc: kernel-janitors, netdev
In-Reply-To: <47b7f5a0-8b13-ead7-33b7-6e9c6ada8e61@cogentembedded.com>

On 14/09/18 18:54, Sergei Shtylyov wrote:
> Hello!
> 
> On 09/14/2018 08:19 PM, Colin King wrote:
> 
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> It is impossible for frontpkt to be null at the point of the null
>> check because it has been assigned from rearpkt and there is no
>> way realpkt can be null at the point of the assignment because
> 
>    rearpkt?

Good spot. Can this be fixed up when the patch is applied?

> 
>> of the sanity checking and exit paths taken previously. Remove
>> the redundant null check.
>>
>> Detected by CoverityScan, CID#114434 ("Logically dead code")
>>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> [...]
> 
> MBR, Sergei
> 

^ permalink raw reply

* KMSAN: uninit-value in strlcpy (2)
From: syzbot @ 2018-09-14 18:23 UTC (permalink / raw)
  To: coreteam, davem, fw, horms, ja, kadlec, linux-kernel, lvs-devel,
	netdev, netfilter-devel, pablo, syzkaller-bugs, wensong

Hello,

syzbot found the following crash on:

HEAD commit:    9822946c7fee kmsan: update .config.example to v4.17-rc5
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=169a5197800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9fa436d3ae606638
dashboard link: https://syzkaller.appspot.com/bug?extid=c86cf7903306a6c201ba
compiler:       clang version 7.0.0 (trunk 329391)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15d1b87b800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11235417800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c86cf7903306a6c201ba@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KMSAN: uninit-value in strlen lib/string.c:482 [inline]
BUG: KMSAN: uninit-value in strlcpy+0x68/0x1c0 lib/string.c:142
CPU: 0 PID: 4506 Comm: syz-executor160 Not tainted 4.17.0-rc5+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1084
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  strlen lib/string.c:482 [inline]
  strlcpy+0x68/0x1c0 lib/string.c:142
  do_ip_vs_set_ctl+0x3f1/0x2760 net/netfilter/ipvs/ip_vs_ctl.c:2384
  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
  nf_setsockopt+0x476/0x4d0 net/netfilter/nf_sockopt.c:115
  ip_setsockopt+0x24b/0x2b0 net/ipv4/ip_sockglue.c:1253
  udp_setsockopt+0x108/0x1b0 net/ipv4/udp.c:2416
  ipv6_setsockopt+0x30c/0x340 net/ipv6/ipv6_sockglue.c:917
  tcp_setsockopt+0x1bb/0x1f0 net/ipv4/tcp.c:2891
  sock_common_setsockopt+0x136/0x170 net/core/sock.c:3039
  __sys_setsockopt+0x4af/0x560 net/socket.c:1903
  __do_sys_setsockopt net/socket.c:1914 [inline]
  __se_sys_setsockopt net/socket.c:1911 [inline]
  __x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
  do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43fce9
RSP: 002b:00007ffea6b1dd08 EFLAGS: 00000213 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fce9
RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000018 R09: 00000000004002c8
R10: 00000000200001c0 R11: 0000000000000213 R12: 0000000000401610
R13: 00000000004016a0 R14: 0000000000000000 R15: 0000000000000000

Local variable description: ----arg@do_ip_vs_set_ctl
Variable was created at:
  read_pnet include/net/net_namespace.h:288 [inline]
  sock_net include/net/sock.h:2306 [inline]
  do_ip_vs_set_ctl+0x93/0x2760 net/netfilter/ipvs/ip_vs_ctl.c:2347
  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
  nf_setsockopt+0x476/0x4d0 net/netfilter/nf_sockopt.c:115
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

* Re: mlx5 driver loading failing on v4.19 / net-next / bpf-next
From: Saeed Mahameed @ 2018-09-14 18:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Moshe Shemesh, Eli Cohen, Or Gerlitz,
	Tariq Toukan, Saeed Mahameed, netdev@vger.kernel.org,
	Eran Ben Elisha
In-Reply-To: <20180914105235.65dfafcd@redhat.com>

On Fri, Sep 14, 2018 at 1:52 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Fri, 14 Sep 2018 01:22:15 -0700
> Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
>
>> On Thu, Sep 13, 2018 at 11:36 PM, Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>> > On Thu, 13 Sep 2018 15:55:29 -0700
>> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >
>> >> On Thu, Aug 30, 2018 at 1:35 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>> >> >
>> >> >
>> >> > On 29/08/2018 6:05 PM, Jesper Dangaard Brouer wrote:
>> >> >>
>> >> >> Hi Saeed,
>> >> >>
>> >> >> I'm having issues loading mlx5 driver on v4.19 kernels (tested both
>> >> >> net-next and bpf-next), while kernel v4.18 seems to work.  It happens
>> >> >> with a Mellanox ConnectX-5 NIC (and also a CX4-Lx but I removed that
>> >> >> from the system now).
>> >> >>
>> >> >
>> >> > Hi Jesper,
>> >> >
>> >> > Thanks for your report!
>> >> >
>> >> > We are working to analyze and debug the issue.
>> >>
>> >> looks like serious issue to me... while no news in 2 weeks.
>> >> any update?
>> >
>> > Mellanox took it offlist, and Sep 6th found that this is a regression
>> > introduced by commit 269d26f47f6f ("net/mlx5: Reduce command polling
>> > interval"), but only if CONFIG_PREEMPT is on.
>> >
>> > I can confirm that reverting this commit fixed the issue (and not the
>> > firmware upgrade I also did).
>> >
>> > I think Moshe (Cc) is responsible for this case, and I expect to soon
>> > see a revert or alternative solution to this!?
>> >
>> > Thanks for the kick Alexei :-)
>>
>> Thank you Alexei and Jesper for following up,
>> the fix is already being tested [1] and will be submitted tomorrow,
>> as Jesper pointed out the issue happens only with 269d26f47f6f
>> ("net/mlx5: Reduce command polling
>> interval"), and only if CONFIG_PREEMPT is on.
>> the only affected kernel is 4.19 which is not GA yet.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=net-mlx5
>
> Sounds good.
>
> I would appreciate it if you added a:
>
> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
>

Of course I will add it; the patch was simply in my review queue
before your report :).

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

* Re: [Cake] [PATCH iproute2] q_cake: Also print nonat, nowash and no-ack-filter keywords
From: Stephen Hemminger @ 2018-09-14 18:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake
In-Reply-To: <20180914135139.16369-1-toke@toke.dk>

On Fri, 14 Sep 2018 15:51:39 +0200
Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Similar to the previous patch for no-split-gso, the negative keywords for
> 'nat', 'wash' and 'ack-filter' were not printed either. Add those as well.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>

Applied, thanks.

* Re: [PATCH net-next RFC 6/8] net: make gro configurable
From: Stephen Hemminger @ 2018-09-14 18:38 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: netdev, pabeni, steffen.klassert, davem, Willem de Bruijn
In-Reply-To: <20180914175941.213950-7-willemdebruijn.kernel@gmail.com>

On Fri, 14 Sep 2018 13:59:39 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e5d236595206..8cb8e02c8ab6 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -572,6 +572,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>  					 struct list_head *head,
>  					 struct sk_buff *skb)
>  {
> +	const struct net_offload *ops;
>  	struct sk_buff *pp = NULL;
>  	struct sk_buff *p;
>  	struct vxlanhdr *vh, *vh2;
> @@ -606,6 +607,12 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>  			goto out;
>  	}
>  
> +	rcu_read_lock();
> +	ops = net_gro_receive(dev_offloads, ETH_P_TEB);
> +	rcu_read_unlock();
> +	if (!ops)
> +		goto out;

Isn't rcu_read_lock already held here?
RCU read lock is always held in the receive handler path

> +
>  	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
>  
>  	list_for_each_entry(p, head, list) {
> @@ -621,6 +628,7 @@ static struct sk_buff *vxlan_gro_receive(struct sock *sk,
>  	}
>  
>  	pp = call_gro_receive(eth_gro_receive, head, skb);
> +
>  	flush = 0;

whitespace change crept into this patch.

* Re: [RFC PATCH net-next v1 00/14] rename and shrink i40evf
From: Jesse Brandeburg @ 2018-09-14 18:55 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: netdev, intel-wired-lan, jeffrey.t.kirsher
In-Reply-To: <20180914043917.GB24996@f2>

On Fri, 14 Sep 2018 13:39:17 +0900 Benjamin wrote:
> > Jesse Brandeburg (14):
> >   intel-ethernet: rename i40evf to iavf  
> 
> Seems like patch 1 didn't make it to netdev
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20180910/014025.html

Hi Ben, Thanks for the note, I don't know why it didn't show up for
you, it's here if you want to take a look:
https://patchwork.ozlabs.org/patch/969557/

* [PATCH net] ipv6: fix possible use-after-free in ip6_xmit()
From: Eric Dumazet @ 2018-09-14 19:02 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.

Bring IPv6 in line with what we do in IPv4 to fix this.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv6/ip6_output.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 16f200f06500758c4cae84ea16229d5dbce912cb..f9f8f554d141676a7d342f85088d12d9a6815e9d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -219,12 +219,10 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 				kfree_skb(skb);
 				return -ENOBUFS;
 			}
+			if (skb->sk)
+				skb_set_owner_w(skb2, skb->sk);
 			consume_skb(skb);
 			skb = skb2;
-			/* skb_set_owner_w() changes sk->sk_wmem_alloc atomically,
-			 * it is safe to call in our context (socket lock not held)
-			 */
-			skb_set_owner_w(skb, (struct sock *)sk);
 		}
 		if (opt->opt_flen)
 			ipv6_push_frag_opts(skb, opt, &proto);
-- 
2.19.0.397.gdd90340f6a-goog

* Re: [RFC PATCH net-next v1 00/14] rename and shrink i40evf
From: Jesse Brandeburg @ 2018-09-14 19:17 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Linux Netdev List, intel-wired-lan, Jeff Kirsher, Saeed Mahameed
In-Reply-To: <CAJ3xEMj3OXwz=A5+5JQ0mXoq2OU3N7DP-TpJVMtdTWXZw5tQ9w@mail.gmail.com>

On Fri, 14 Sep 2018 12:10:45 +0300 Or wrote:
> On Fri, Sep 14, 2018 at 1:31 AM, Jesse Brandeburg
> <jesse.brandeburg@intel.com> wrote:
> on what HW ring format do you standardize? do i40e/Fortville and
> ice/what's-the-intel-code-name?  HWs can/use the same posting/completion
> descriptor?

The initial ring format is the same as used for XL710/X722 devices, and
planned to be supported for the Intel Ethernet E800 series (ice driver) and
future VF devices using SR-IOV.

> > This solves 2 issues we saw coming or were already present, the
> > first was constant code duplication happening with i40e/i40evf,
> > when much of the duplicate code in the i40evf was not used or was
> > not needed.  
> 
> could you spare a few words on the origin/nature of these duplicates? were they
> just developer C&P mistakes for functionality which is irrelevant for
> a VF? like what?
> if not, what was there?

In particular, some of the code was not used at all, but was not caught
by any automation because it was in a header file and included into
multiple file scopes.  The other big chunk of the duplicate code was for
the PF's usage of the communication channel to firmware, which for some
reason was left in the VF driver code (probably just to avoid changing
the file) - but the VF driver doesn't communicate to firmware, just to
the PF.

> > The second was to remove the future confusion of why
> > future VF devices that were not considered "40GbE" only devices
> > were supported by i40evf.  
> 
> can you elaborate further?

The name i40evf was generating customer questions, and was confusing
when you add in multiple generations of PF hardware that are no longer
using the i40e driver.

> > The thought is that iavf will be the virtual function driver for
> > all future devices, so it should have a "generic" name to properly
> > represent that it is the VF driver for multiple generations of
> > devices.  
> 
> for that end,  as I think was explained @ the netdev Tokyo AVF session,
> you would need a mechanism for feature negotiation, is it here or coming up?

The driver already has it (a feature negotiation); please see the
function called iavf_send_vf_config_msg, and follow from where it is
called.  Basically the VF driver negotiates with the PF for what it can
do, and the PF guarantees that the base set of features will always
work, with optional advanced features which the code may/may-not have
in the future.
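
As a rough illustration of that negotiation (a userspace sketch only, not
the actual message format exchanged between the VF and the PF; the
capability bits and the function name below are made up for this example),
the PF can be thought of as granting the base set plus whichever optional
features both sides support:

```c
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define SKETCH_CAP_BASE      (1u << 0)	/* always granted by the PF */
#define SKETCH_CAP_RSS       (1u << 1)
#define SKETCH_CAP_VLAN      (1u << 2)
#define SKETCH_CAP_ADVANCED  (1u << 3)	/* some optional future feature */

/*
 * The VF asks for a set of capabilities; the PF answers with the base
 * set plus the intersection of what was requested and what it supports.
 */
static uint32_t sketch_pf_grant_caps(uint32_t vf_requested,
				     uint32_t pf_supported)
{
	return SKETCH_CAP_BASE | (vf_requested & pf_supported);
}
```

A VF that asks for an advanced feature an older PF does not know about
simply does not get that bit back and keeps working with the base set.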

> >  41 files changed, 3436 insertions(+), 7581 deletions(-)  
> 
> code diet is cool!

Thanks! ~4000 lines less made me very happy too.

* Re: [bpf-next, v4 0/5] Introduce eBPF flow dissector
From: Alexei Starovoitov @ 2018-09-14 19:22 UTC (permalink / raw)
  To: Petar Penkov
  Cc: netdev, davem, ast, daniel, simon.horman, ecree, songliubraving,
	tom, Petar Penkov
In-Reply-To: <20180914144622.16436-1-peterpenkov96@gmail.com>

On Fri, Sep 14, 2018 at 07:46:17AM -0700, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> This patch series hardens the RX stack by allowing flow dissection in BPF,
> as previously discussed [1]. Because of the rigorous checks of the BPF
> verifier, this provides significant security guarantees. In particular, the
> BPF flow dissector cannot get inside of an infinite loop, as with
> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
> read outside of packet bounds, because all memory accesses are checked.
> Also, with BPF the administrator can decide which protocols to support,
> reducing potential attack surface. Rarely encountered protocols can be
> excluded from dissection and the program can be updated without kernel
> recompile or reboot if a bug is discovered.
> 
> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
> This includes a new BPF program and attach type.
> 
> Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
> 
> Patch 3 adds support for the new BPF program type to libbpf and bpftool.
> 
> Patch 4 adds a flow dissector program in BPF. This parses most protocols in
> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
> and address types).
> 
> Patch 5 adds a selftest that attaches the BPF program to the flow dissector
> and sends traffic with different levels of encapsulation.
> 
> Performance Evaluation:
> The in-kernel implementation was compared against the demo program from
> patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
> 	$perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
> 		-t 10

Looks great. Applied to bpf-next with one extra patch:
 SEC("dissect")
-int dissect(struct __sk_buff *skb)
+int _dissect(struct __sk_buff *skb)

otherwise the test doesn't build.
I'm not sure how it builds for you. Which llvm did you use?

Also, the above command works and the IPv4 test in ./test_flow_dissector.sh
passes as well, but the script still fails at the end for me:
./test_flow_dissector.sh
bpffs not mounted. Mounting...
0: IP
1: IPV6
2: IPV6OP
3: IPV6FR
4: MPLS
5: VLAN
Testing IPv4...
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=0
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10
Testing IPIP...
tunnels before test:
tunl0: any/ip remote any local any ttl inherit nopmtudisc
sit_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
ipip_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
gre_test_LV5N: gre/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
gre0: gre/ip remote any local any ttl inherit nopmtudisc
inner.dest4: 192.168.0.1
inner.source4: 1.1.1.1
encap proto:   4
outer.dest4: 127.0.0.1
outer.source4: 127.0.0.2
pkts: tx=10 rx=0
tunnels after test:
tunl0: any/ip remote any local any ttl inherit nopmtudisc
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
gre0: gre/ip remote any local any ttl inherit nopmtudisc
selftests: test_flow_dissector [FAILED]

Is it something in my setup, or is the test broken?

* Re: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
From: John Fastabend @ 2018-09-14 19:39 UTC (permalink / raw)
  To: Vakul Garg, netdev; +Cc: borisp, aviadye, davejwatson, davem
In-Reply-To: <20180829095655.31963-1-vakul.garg@nxp.com>

On 08/29/2018 02:56 AM, Vakul Garg wrote:
> When tls records are decrypted using asynchronous accelerators such as
> NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
> getting -EINPROGRESS, the tls record processing stops till the time the
> crypto accelerator finishes off and returns the result. This incurs a
> context switch and is not an efficient way of accessing the crypto
> accelerators. Crypto accelerators work efficiently when they are queued
> with multiple crypto jobs without having to wait for the previous ones
> to complete.
> 
> The patch submits multiple crypto requests without having to wait for
> previous ones to complete. This has been implemented for records
> which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
> for all the asynchronous decryption requests to complete.
> 
> The references to records which have been sent for async decryption are
> dropped. For cases where record decryption is not possible in zero-copy
> mode, asynchronous decryption is not used and we wait for decryption
> crypto api to complete.
> 
> For crypto requests executing in async fashion, the memory for
> aead_request, sglists and skb etc is freed from the decryption
> completion handler. The decryption completion handler wakes up the
> sleeping user context when recvmsg() flags that it has done sending
> all the decryption requests and there are no more decryption requests
> pending to be completed.
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
> Reviewed-by: Dave Watson <davejwatson@fb.com>
> ---

[...]


> @@ -1271,6 +1377,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
>  		goto free_aead;
>  
>  	if (sw_ctx_rx) {
> +		(*aead)->reqsize = sizeof(struct decrypt_req_ctx);
> +

This is not valid and may cause GPF or best case only a KASAN
warning. 'reqsize' should probably not be mangled outside the
internal crypto APIs but the real reason is the reqsize is used
to determine how much space is needed at the end of the aead_request
for crypto private ctx use in encrypt/decrypt. After this patch
when we submit an aead_request the crypto layer will think it
has room for its private structs at the end but now only 8B will
be there and crypto layer will happily memset some arbitrary
memory for you amongst other things.
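
To make the layout problem concrete, here is a minimal userspace sketch
(not kernel code; the struct and function names are hypothetical stand-ins
for the real request and transform types). The caller is expected to
allocate the request plus 'reqsize' trailing bytes, and the transform
treats those trailing bytes as its own:

```c
#include <stddef.h>

/* Hypothetical stand-in for an AEAD request, for illustration only. */
struct fake_request {
	void *callback_data;
	unsigned char ctx[];	/* transform-private area starts here */
};

/* Hypothetical stand-in for the transform handle. */
struct fake_tfm {
	size_t reqsize;		/* bytes the transform needs after the request */
};

/* What a well-behaved caller allocates: the request plus the private area. */
static size_t fake_request_alloc_size(const struct fake_tfm *tfm)
{
	return sizeof(struct fake_request) + tfm->reqsize;
}

/* Returns 1 if the transform's private state fits in the allocation. */
static int fake_ctx_fits(size_t allocated, size_t transform_needs)
{
	return allocated >= sizeof(struct fake_request) + transform_needs;
}
```

If the caller overwrites reqsize with the size of a single pointer, the
allocation shrinks while the transform still writes its full private
state, which is the out-of-bounds access described above.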

Anyway, I am testing a fix now and will post it shortly.

Thanks,
John

* [PATCH net] bnxt_en: Fix VF mac address regression.
From: Michael Chan @ 2018-09-14 19:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, seth.forshee, loseweigh

The recent commit to always forward the VF MAC address to the PF for
approval may not work if the PF driver or the firmware is older.  This
will cause the VF driver to fail during probe:

  bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
  bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
  bnxt_en 0000:00:03.0: Unable to initialize mac address.
  bnxt_en: probe of 0000:00:03.0 failed with error -99

We fix it by treating the error as fatal only if the VF MAC address is
locally generated by the VF.

Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Reported-by: Siwei Liu <loseweigh@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
Please queue this for stable as well.  Thanks.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 9 +++++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 9 +++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cecbb1d..177587f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8027,7 +8027,7 @@ static int bnxt_change_mac_addr(struct net_device *dev, void *p)
 	if (ether_addr_equal(addr->sa_data, dev->dev_addr))
 		return 0;
 
-	rc = bnxt_approve_mac(bp, addr->sa_data);
+	rc = bnxt_approve_mac(bp, addr->sa_data, true);
 	if (rc)
 		return rc;
 
@@ -8827,14 +8827,19 @@ static int bnxt_init_mac_addr(struct bnxt *bp)
 	} else {
 #ifdef CONFIG_BNXT_SRIOV
 		struct bnxt_vf_info *vf = &bp->vf;
+		bool strict_approval = true;
 
 		if (is_valid_ether_addr(vf->mac_addr)) {
 			/* overwrite netdev dev_addr with admin VF MAC */
 			memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
+			/* Older PF driver or firmware may not approve this
+			 * correctly.
+			 */
+			strict_approval = false;
 		} else {
 			eth_hw_addr_random(bp->dev);
 		}
-		rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
+		rc = bnxt_approve_mac(bp, bp->dev->dev_addr, strict_approval);
 #endif
 	}
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index fcd085a..3962f6f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -1104,7 +1104,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
 	mutex_unlock(&bp->hwrm_cmd_lock);
 }
 
-int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 {
 	struct hwrm_func_vf_cfg_input req = {0};
 	int rc = 0;
@@ -1122,12 +1122,13 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
 	memcpy(req.dflt_mac_addr, mac, ETH_ALEN);
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 mac_done:
-	if (rc) {
+	if (rc && strict) {
 		rc = -EADDRNOTAVAIL;
 		netdev_warn(bp->dev, "VF MAC address %pM not approved by the PF\n",
 			    mac);
+		return rc;
 	}
-	return rc;
+	return 0;
 }
 #else
 
@@ -1144,7 +1145,7 @@ void bnxt_update_vf_mac(struct bnxt *bp)
 {
 }
 
-int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 {
 	return 0;
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..2eed9ed 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -39,5 +39,5 @@ int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-int bnxt_approve_mac(struct bnxt *, u8 *);
+int bnxt_approve_mac(struct bnxt *, u8 *, bool);
 #endif
-- 
2.5.1

* [net-next PATCH] tls: async support causes out-of-bounds access in crypto APIs
From: John Fastabend @ 2018-09-14 20:01 UTC (permalink / raw)
  To: vakul.garg, davejwatson
  Cc: doronrk, netdev, alexei.starovoitov, daniel, davem

When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.

So this breaks (with general protection fault and KASAN error, if
enabled) because the request sent to decrypt is shorter than required,
causing out-of-bounds accesses in the crypto API. Also it seems unlikely
the sk is even valid by the time it gets to the callback because of the
memset in the crypto layer.

Anyways, fix this by holding the sk in the skb->sk field when the
callback is set up and because the skb is already passed through to
the callback handler via void* we can access it in the handler. Then
in the handler we need to be careful to NULL the pointer again before
kfree_skb. I added comments on both the setup (in tls_do_decryption)
and when we clear it from the crypto callback handler
tls_decrypt_done(). After this, selftests pass again and the KASAN
errors/warnings are gone.

Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tls.h |    4 ----
 net/tls/tls_sw.c  |   39 +++++++++++++++++++++++----------------
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index cd0a65b..8630d28 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -128,10 +128,6 @@ struct tls_sw_context_rx {
 	bool async_notify;
 };
 
-struct decrypt_req_ctx {
-	struct sock *sk;
-};
-
 struct tls_record_info {
 	struct list_head list;
 	u32 end_seq;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e9..cef69b6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -122,25 +122,32 @@ static int skb_nsg(struct sk_buff *skb, int offset, int len)
 static void tls_decrypt_done(struct crypto_async_request *req, int err)
 {
 	struct aead_request *aead_req = (struct aead_request *)req;
-	struct decrypt_req_ctx *req_ctx =
-			(struct decrypt_req_ctx *)(aead_req + 1);
-
 	struct scatterlist *sgout = aead_req->dst;
-
-	struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
-	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-	int pending = atomic_dec_return(&ctx->decrypt_pending);
+	struct tls_sw_context_rx *ctx;
+	struct tls_context *tls_ctx;
 	struct scatterlist *sg;
+	struct sk_buff *skb;
 	unsigned int pages;
+	int pending;
+
+	skb = (struct sk_buff *)req->data;
+	tls_ctx = tls_get_ctx(skb->sk);
+	ctx = tls_sw_ctx_rx(tls_ctx);
+	pending = atomic_dec_return(&ctx->decrypt_pending);
 
 	/* Propagate if there was an err */
 	if (err) {
 		ctx->async_wait.err = err;
-		tls_err_abort(req_ctx->sk, err);
+		tls_err_abort(skb->sk, err);
 	}
 
+	/* After using skb->sk to propagate sk through crypto async callback
+	 * we need to NULL it again.
+	 */
+	skb->sk = NULL;
+
 	/* Release the skb, pages and memory allocated for crypto req */
-	kfree_skb(req->data);
+	kfree_skb(skb);
 
 	/* Skip the first S/G entry as it points to AAD */
 	for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
@@ -175,11 +182,13 @@ static int tls_do_decryption(struct sock *sk,
 			       (u8 *)iv_recv);
 
 	if (async) {
-		struct decrypt_req_ctx *req_ctx;
-
-		req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
-		req_ctx->sk = sk;
-
+		/* Using skb->sk to push sk through to crypto async callback
+		 * handler. This allows propagating errors up to the socket
+		 * if needed. It _must_ be cleared in the async handler
+		 * before kfree_skb is called. We _know_ skb->sk is NULL
+		 * because it is a clone from strparser.
+		 */
+		skb->sk = sk;
 		aead_request_set_callback(aead_req,
 					  CRYPTO_TFM_REQ_MAY_BACKLOG,
 					  tls_decrypt_done, skb);
@@ -1455,8 +1464,6 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 		goto free_aead;
 
 	if (sw_ctx_rx) {
-		(*aead)->reqsize = sizeof(struct decrypt_req_ctx);
-
 		/* Set up strparser */
 		memset(&cb, 0, sizeof(cb));
 		cb.rcv_msg = tls_queue;
