* [PATCH 5/7] hwrng: core: Move hwrng miscdev minor number to include/linux/miscdevice.h
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>
This patch move the define for hwrng's miscdev minor number to
include/linux/miscdevice.h.
It's better that all minor number are in the same place.
Rename it to HWRNG_MINOR (from RNG_MISCDEV_MINOR) in he process since
no other miscdev define have MISCDEV in their name.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
drivers/char/hw_random/core.c | 3 +--
include/linux/miscdevice.h | 1 +
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 7a2e496..1e1e385 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -26,7 +26,6 @@
#define RNG_MODULE_NAME "hw_random"
#define PFX RNG_MODULE_NAME ": "
-#define RNG_MISCDEV_MINOR 183 /* official */
static struct hwrng *current_rng;
static struct task_struct *hwrng_fill;
@@ -283,7 +282,7 @@ static const struct file_operations rng_chrdev_ops = {
static const struct attribute_group *rng_dev_groups[];
static struct miscdevice rng_miscdev = {
- .minor = RNG_MISCDEV_MINOR,
+ .minor = HWRNG_MINOR,
.name = RNG_MODULE_NAME,
.nodename = "hwrng",
.fops = &rng_chrdev_ops,
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 722698a..659f586 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -31,6 +31,7 @@
#define SGI_MMTIMER 153
#define STORE_QUEUE_MINOR 155 /* unused */
#define I2O_MINOR 166
+#define HWRNG_MINOR 183
#define MICROCODE_MINOR 184
#define VFIO_MINOR 196
#define TUN_MINOR 200
--
2.7.3
^ permalink raw reply related
* [PATCH 3/7] hwrng: core: Rewrite the header
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>
checkpatch have lot of complaint about header.
Furthermore, the header have some offtopic/useless information.
This patch rewrite a proper header.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
drivers/char/hw_random/core.c | 38 +++++++++-----------------------------
1 file changed, 9 insertions(+), 29 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 7029246..a8e63ae 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -1,33 +1,13 @@
/*
- Added support for the AMD Geode LX RNG
- (c) Copyright 2004-2005 Advanced Micro Devices, Inc.
-
- derived from
-
- Hardware driver for the Intel/AMD/VIA Random Number Generators (RNG)
- (c) Copyright 2003 Red Hat Inc <jgarzik@redhat.com>
-
- derived from
-
- Hardware driver for the AMD 768 Random Number Generator (RNG)
- (c) Copyright 2001 Red Hat Inc <alan@redhat.com>
-
- derived from
-
- Hardware driver for Intel i810 Random Number Generator (RNG)
- Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
- Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
-
- Added generic RNG API
- Copyright 2006 Michael Buesch <m@bues.ch>
- Copyright 2005 (c) MontaVista Software, Inc.
-
- Please read Documentation/hw_random.txt for details on use.
-
- ----------------------------------------------------------
- This software may be used and distributed according to the terms
- of the GNU General Public License, incorporated herein by reference.
-
+ * hw_random/core.c: HWRNG core API
+ *
+ * Copyright 2006 Michael Buesch <m@bues.ch>
+ * Copyright 2005 (c) MontaVista Software, Inc.
+ *
+ * Please read Documentation/hw_random.txt for details on use.
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU General Public License, incorporated herein by reference.
*/
#include <linux/device.h>
--
2.7.3
^ permalink raw reply related
* [PATCH] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can
From: Ard Biesheuvel @ 2016-12-09 13:47 UTC (permalink / raw)
To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
The bit-sliced NEON implementation of AES only performs optimally if
it can process 8 blocks of input in parallel. This is due to the nature
of bit slicing, where the n-th bit of each byte of AES state of each input
block is collected into NEON register 'n', for registers q0 - q7.
This implies that the amount of work for the transform is fixed,
regardless of whether we are handling just one block or 8 in parallel.
So let's try a bit harder to iterate over the input in suitably sized
chunks, by increasing the chunksize to 8 * AES_BLOCK_SIZE, and tweaking
the loops to only process multiples of the chunk size, unless we are
handling the last chunk in the input stream.
Note that the skcipher walk API guarantees that a step in the walk never
returns less that 'chunksize' bytes if there are at least that many bytes
of input still available. However, it does *not* guarantee that those steps
produce an exact multiple of the chunk size.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/crypto/aesbs-glue.c | 68 +++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 30 deletions(-)
diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/crypto/aesbs-glue.c
index d8e06de72ef3..938d1e1bf9a3 100644
--- a/arch/arm/crypto/aesbs-glue.c
+++ b/arch/arm/crypto/aesbs-glue.c
@@ -121,39 +121,26 @@ static int aesbs_cbc_encrypt(struct skcipher_request *req)
return crypto_cbc_encrypt_walk(req, aesbs_encrypt_one);
}
-static inline void aesbs_decrypt_one(struct crypto_skcipher *tfm,
- const u8 *src, u8 *dst)
-{
- struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
-
- AES_decrypt(src, dst, &ctx->dec.rk);
-}
-
static int aesbs_cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
- unsigned int nbytes;
int err;
- for (err = skcipher_walk_virt(&walk, req, false);
- (nbytes = walk.nbytes); err = skcipher_walk_done(&walk, nbytes)) {
- u32 blocks = nbytes / AES_BLOCK_SIZE;
- u8 *dst = walk.dst.virt.addr;
- u8 *src = walk.src.virt.addr;
- u8 *iv = walk.iv;
-
- if (blocks >= 8) {
- kernel_neon_begin();
- bsaes_cbc_encrypt(src, dst, nbytes, &ctx->dec, iv);
- kernel_neon_end();
- nbytes %= AES_BLOCK_SIZE;
- continue;
- }
+ err = skcipher_walk_virt(&walk, req, false);
+
+ while (walk.nbytes) {
+ unsigned int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes = round_down(nbytes, walk.chunksize);
- nbytes = crypto_cbc_decrypt_blocks(&walk, tfm,
- aesbs_decrypt_one);
+ kernel_neon_begin();
+ bsaes_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes, &ctx->dec, walk.iv);
+ kernel_neon_end();
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
@@ -186,6 +173,12 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
__be32 *ctr = (__be32 *)walk.iv;
u32 headroom = UINT_MAX - be32_to_cpu(ctr[3]);
+ if (walk.nbytes < walk.total) {
+ blocks = round_down(blocks,
+ walk.chunksize / AES_BLOCK_SIZE);
+ tail = walk.nbytes - blocks * AES_BLOCK_SIZE;
+ }
+
/* avoid 32 bit counter overflow in the NEON code */
if (unlikely(headroom < blocks)) {
blocks = headroom + 1;
@@ -198,6 +191,9 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
kernel_neon_end();
inc_be128_ctr(ctr, blocks);
+ if (tail > 0 && tail < AES_BLOCK_SIZE)
+ break;
+
err = skcipher_walk_done(&walk, tail);
}
if (walk.nbytes) {
@@ -227,11 +223,16 @@ static int aesbs_xts_encrypt(struct skcipher_request *req)
AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
while (walk.nbytes) {
+ unsigned int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes = round_down(nbytes, walk.chunksize);
+
kernel_neon_begin();
bsaes_xts_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
- walk.nbytes, &ctx->enc, walk.iv);
+ nbytes, &ctx->enc, walk.iv);
kernel_neon_end();
- err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
@@ -249,11 +250,16 @@ static int aesbs_xts_decrypt(struct skcipher_request *req)
AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
while (walk.nbytes) {
+ unsigned int nbytes = walk.nbytes;
+
+ if (nbytes < walk.total)
+ nbytes = round_down(nbytes, walk.chunksize);
+
kernel_neon_begin();
bsaes_xts_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
- walk.nbytes, &ctx->dec, walk.iv);
+ nbytes, &ctx->dec, walk.iv);
kernel_neon_end();
- err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
@@ -272,6 +278,7 @@ static struct skcipher_alg aesbs_algs[] = { {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
.setkey = aesbs_cbc_set_key,
.encrypt = aesbs_cbc_encrypt,
.decrypt = aesbs_cbc_decrypt,
@@ -289,7 +296,7 @@ static struct skcipher_alg aesbs_algs[] = { {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
- .chunksize = AES_BLOCK_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
.setkey = aesbs_ctr_set_key,
.encrypt = aesbs_ctr_encrypt,
.decrypt = aesbs_ctr_encrypt,
@@ -307,6 +314,7 @@ static struct skcipher_alg aesbs_algs[] = { {
.min_keysize = 2 * AES_MIN_KEY_SIZE,
.max_keysize = 2 * AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
.setkey = aesbs_xts_set_key,
.encrypt = aesbs_xts_encrypt,
.decrypt = aesbs_xts_decrypt,
--
2.7.4
^ permalink raw reply related
* Re: scatterwalk_map_and_copy incorrect optimization
From: Jason A. Donenfeld @ 2016-12-09 13:26 UTC (permalink / raw)
To: linux-crypto, Herbert Xu, Eric Biggers; +Cc: LKML
In-Reply-To: <CAHmME9p24bkNPD2CfGMUOHxgOX+OuFO_rVqfhOQdDexwW53ExA@mail.gmail.com>
Hah, looks like I missed [1] by a couple weeks. Looks like it's been
settled then.
Is this a stable@ candidate?
[1] https://git.zx2c4.com/linux/commit/?id=c8467f7a3620698bf3c22f0e199b550fb611a8ae
^ permalink raw reply
* scatterwalk_map_and_copy incorrect optimization
From: Jason A. Donenfeld @ 2016-12-09 13:18 UTC (permalink / raw)
To: linux-crypto, Herbert Xu; +Cc: LKML
Hi Herbert,
The scatterwalk_map_and_copy function copies ordinary buffers to and
from scatterlists. These buffers can, of course, be on the stack, and
this remains the most popular use of this function -- getting info
between stack buffers and DMA regions. It's mostly used for adding or
checking MACs, in the majority of call sites. Its implementation is
relatively straightforward. It maps the DMA region(s) to a vaddr, and
then just calls vanilla memcpy. Pretty uncontroversial.
However, around ~4.1 an optimization was added to prevent copying when
unnecessary (when the src and dst are the same). The optimization
looks like this:
if (sg_page(sg) == virt_to_page(buf) &&
sg->offset == offset_in_page(buf))
return;
There are two problems with this:
1) If buf points to a large contiguous region, but sg points to
several smaller regions, with the first one of which being smaller
than buf, then this function will not actually copy the latter
fragments to the large contiguous buf. Maybe you don't care about
this, but it is a limitation and a potential source of bugs down the
line.
2) If buf points to the stack, this optimization is totally wrong,
since you shouldn't call virt_to_page on stack addresses.
Since this function is primarily used for copying to and from the
stack, item (2) is especially worrisome. A very quick fix for that
would be to just:
if (!object_is_on_stack(buf) &&
sg_page(sg) == virt_to_page(buf) &&
sg->offset == offset_in_page(buf))
return;
This doesn't address item (1), however. If you care about fixing item
(1), then maybe a reasonable way would be to just remove the
optimization all together.
I'll submit a patch for the !object_is_on_stack(buf) fix. But maybe
you'd prefer that I submit a patch that removes the whole optimization
entirely. Or something else -- just let me know.
Regards,
Jason
^ permalink raw reply
* Re: [PATCH v1 2/2] crypto: mediatek - add DT bindings documentation
From: Matthias Brugger @ 2016-12-09 9:45 UTC (permalink / raw)
To: Ryder Lee
Cc: Herbert Xu, David S. Miller, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-crypto-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Sean Wang,
Roy Luo
In-Reply-To: <1481188776.14860.26.camel@mtkswgap22>
On 08/12/16 10:19, Ryder Lee wrote:
> Hello,
>
> On Mon, 2016-12-05 at 11:18 +0100, Matthias Brugger wrote:
>>
>> On 05/12/16 08:01, Ryder Lee wrote:
>>> Add DT bindings documentation for the crypto driver
>>>
>>> Signed-off-by: Ryder Lee <ryder.lee-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
>>> ---
>>> .../devicetree/bindings/crypto/mediatek-crypto.txt | 32 ++++++++++++++++++++++
>>> 1 file changed, 32 insertions(+)
>>> create mode 100644 Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
>>>
>>> diff --git a/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
>>> new file mode 100644
>>> index 0000000..8b1db08
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
>>> @@ -0,0 +1,32 @@
>>> +MediaTek cryptographic accelerators
>>> +
>>> +Required properties:
>>> +- compatible: Should be "mediatek,mt7623-crypto"
>>
>> Do you know how big the difference is between the crypto engine for
>> mt7623/mt2701/mt8521p in comparison, let's say mt8173 or mt6797?
>> Do this SoCs have a crypot engine? If so and they are quite similar, we
>> might think of adding a mtk-crypto binding and add soc specific bindings.
>
> This engine is just available in mt7623/mt2701/mt8521p series SoCs and
> they have no difference.
>
> But there are still other crypto IPs in MTK, i think maybe we could use
> "mediatek,{IP name}-crypto” to distinguish them ?
>
Sounds good, thanks for the clarification.
Matthias
>> Regards,
>> Matthias
>>
>>> +- reg: Address and length of the register set for the device
>>> +- interrupts: Should contain the five crypto engines interrupts in numeric
>>> + order. These are global system and four descriptor rings.
>>> +- clocks: the clock used by the core
>>> +- clock-names: the names of the clock listed in the clocks property. These are
>>> + "ethif", "cryp"
>>> +- power-domains: Must contain a reference to the PM domain.
>>> +
>>> +
>>> +Optional properties:
>>> +- interrupt-parent: Should be the phandle for the interrupt controller
>>> + that services interrupts for this device
>>> +
>>> +
>>> +Example:
>>> + crypto: crypto@1b240000 {
>>> + compatible = "mediatek,mt7623-crypto";
>>> + reg = <0 0x1b240000 0 0x20000>;
>>> + interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_LOW>,
>>> + <GIC_SPI 83 IRQ_TYPE_LEVEL_LOW>,
>>> + <GIC_SPI 84 IRQ_TYPE_LEVEL_LOW>,
>>> + <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>,
>>> + <GIC_SPI 97 IRQ_TYPE_LEVEL_LOW>;
>>> + clocks = <&topckgen CLK_TOP_ETHIF_SEL>,
>>> + <ðsys CLK_ETHSYS_CRYPTO>;
>>> + clock-names = "ethif","cryp";
>>> + power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
>>> + };
>>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Madhani, Himanshu @ 2016-12-09 6:40 UTC (permalink / raw)
To: Michael S. Tsirkin, Bart Van Assche
Cc: kvm@vger.kernel.org, Neil Armstrong, David Airlie,
linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
virtualization@lists.linux-foundation.org,
linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
linux-scsi@vger.kernel.org, Christoph Hellwig,
v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
"linux-kbuild@vger.kernel.org" <lin
In-Reply-To: <20161208163658-mutt-send-email-mst@kernel.org>
Hi Mike/Bart,
On 12/8/16, 8:17 AM, "virtualization-bounces@lists.linux-foundation.org on behalf of Michael S. Tsirkin" <virtualization-bounces@lists.linux-foundation.org on behalf of mst@redhat.com> wrote:
>On Thu, Dec 08, 2016 at 06:38:11AM +0000, Bart Van Assche wrote:
>> On 12/07/16 21:54, Michael S. Tsirkin wrote:
>> > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
>> >> Additionally, there are notable exceptions to the rule that most drivers
>> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
>> >> would remain possible to check such drivers with sparse without enabling
>> >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
>> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
>> >
>> > The right thing is probably just to fix these, isn't it?
>> > Until then, why not just ignore the warnings?
>>
>> Neither option is realistic. With endian-checking enabled the qla2xxx
>> driver triggers so many warnings that it becomes a real challenge to
>> filter the non-endian warnings out manually:
>>
>> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
>> $f | &grep -c ': warning:'; done
>> 4
>> 752
>
>You can always revert this patch in your tree, or whatever. It does not
>look like this will get fixed otherwise.
>
>> If you think it would be easy to fix the endian warnings triggered by
>> the qla2xxx driver, you are welcome to try to fix these.
>>
>> Bart.
>
>Yea, this hardware was designed by someone who thought mixing
>LE and BE all over the place is a good idea.
>But who said it should be easy?
>
>Maybe this change will be enough to motivate the maintainers.
>
>Here's a minor buglet for you as a motivator:
>
> if (ct_rsp->header.response !=
> cpu_to_be16(CT_ACCEPT_RESPONSE)) {
> ql_dbg(ql_dbg_disc + ql_dbg_buffer, vha, 0x2077,
> "%s failed rejected request on port_id: %02x%02x%02x Compeltion status 0x%x, response 0x%x\n",
> routine, vha->d_id.b.domain,
> vha->d_id.b.area, vha->d_id.b.al_pa, comp_status, ct_rsp->header.response);
>
>
>response is BE and isn't printed correctly.
>
>another:
>
> eiter->a.max_frame_size = cpu_to_be32(eiter->a.max_frame_size);
> size += 4 + 4;
>
> ql_dbg(ql_dbg_disc, vha, 0x20bc,
> "Max_Frame_Size = %x.\n", eiter->a.max_frame_size);
>
>printed too late, it's be by that time.
>
>Here's another suspicious line
>
> ctio24->u.status1.flags = (atio->u.isp24.attr << 9) |
> cpu_to_le16(CTIO7_FLAGS_STATUS_MODE_1 |
> CTIO7_FLAGS_TERMINATE);
>
>shifting attr by 9 bits gives different results on BE and LE,
>mixing it with le16 looks rather strange.
>
>Another:
>
> ha->flags.dport_enabled =
> (mid_init_cb->init_cb.firmware_options_1 & BIT_7) != 0;
>
>BIT_7 is native endian, firmware_options_1 is LE I think.
>
>
>
>Look at qla27xx_find_valid_image as well.
>
> if (pri_image_status.signature != QLA27XX_IMG_STATUS_SIGN)
>
>qla27xx_image_status seems to be data coming from flash, but is
>somehow native-endian? Maybe ...
>
>
> lun = a->u.isp24.fcp_cmnd.lun;
>
>I think lun here is in hardware format (le?), code treats it
>as native.
>
>
>Not to speak about interface abuse all over the place.
>How about this:
>
>uint32_t *
>qla24xx_read_flash_data(scsi_qla_host_t *vha, uint32_t *dwptr, uint32_t
>faddr,
> uint32_t dwords)
>{
> uint32_t i;
> struct qla_hw_data *ha = vha->hw;
>
> /* Dword reads to flash. */
> for (i = 0; i < dwords; i++, faddr++)
> dwptr[i] = cpu_to_le32(qla24xx_read_flash_dword(ha,
> flash_data_addr(ha, faddr)));
>
> return dwptr;
>}
>
>OK so we convert to LE ...
>
> qla24xx_read_flash_data(vha, dcode, faddr, 4);
>
> risc_addr = be32_to_cpu(dcode[2]);
> *srisc_addr = *srisc_addr == 0 ? risc_addr : *srisc_addr;
> risc_size = be32_to_cpu(dcode[3]);
>
>then happily assume it's BE.
>
>And again, coming from flash, it's unlikely to actually be in the native
>endian-ness as callers seem to assume. I'm guessing it's all BE.
>
>I poked at it a bit and was able to cut down # of warnings
>from 1700 to 1400 in an hour. Someone familiar with the code
>should look at it.
We’ll take a look and send patches to resolve these warnings.
>
>--
>MST
>_______________________________________________
>Virtualization mailing list
>Virtualization@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* xts(ecb(aes-asm))
From: Stephan Müller @ 2016-12-09 6:02 UTC (permalink / raw)
To: herbert; +Cc: linux-crypto
Hi Herbert,
while testing the current cryptodev-2.6 tree, I noticed that instead of the
driver name of xts(aes-asm) (which used to be there), I now see xts(ecb(aes-
asm)).
Is that intentional?
Ciao
Stephan
^ permalink raw reply
* Re: simd ciphers
From: Stephan Müller @ 2016-12-08 18:40 UTC (permalink / raw)
To: herbert; +Cc: linux-crypto
In-Reply-To: <3279008.xAQAe6ukNz@positron.chronox.de>
Am Donnerstag, 8. Dezember 2016, 19:32:03 CET schrieb Stephan Müller:
Hi Herbert,
> Hi Herbert,
>
> with the addition of the simd cipher change I seem to be unable to use the
> AESNI ciphers. My .config contains CONFIG_CRYPTO_SIMD=y and
> CONFIG_CRYPTO_AES_NI_INTEL=y. Those options always worked to load the AES-NI
(with "always worked" I meant the CONFIG_CRYPTO_AES_NI_INTEL=y option of
course, that always ensured AES-NI to be present and usable)
> implementation. But now, I do not see the AES-NI implementations any more
> in / proc/crypto.
>
> Ciao
> Stephan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Ciao
Stephan
^ permalink raw reply
* simd ciphers
From: Stephan Müller @ 2016-12-08 18:32 UTC (permalink / raw)
To: herbert; +Cc: linux-crypto
Hi Herbert,
with the addition of the simd cipher change I seem to be unable to use the
AESNI ciphers. My .config contains CONFIG_CRYPTO_SIMD=y and
CONFIG_CRYPTO_AES_NI_INTEL=y. Those options always worked to load the AES-NI
implementation. But now, I do not see the AES-NI implementations any more in /
proc/crypto.
Ciao
Stephan
^ permalink raw reply
* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Michael S. Tsirkin @ 2016-12-08 16:17 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Christoph Hellwig,
Jason Wang, linux-kbuild@vger.kernel.org, Michal Marek,
Arnd Bergmann, Greg Kroah-Hartman, Matt Mackall, Herbert Xu,
David Airlie, Gerd Hoffmann, Ohad Ben-Cohen,
Christian Borntraeger, Cornelia Huck, James E.J. Bottomley,
David S. Miller, Jens Axboe, Neil Armstrong <narmstron
In-Reply-To: <BLUPR02MB168304F8FBA50C916A6A4E6081840@BLUPR02MB1683.namprd02.prod.outlook.com>
On Thu, Dec 08, 2016 at 06:38:11AM +0000, Bart Van Assche wrote:
> On 12/07/16 21:54, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
> >> Additionally, there are notable exceptions to the rule that most drivers
> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
> >> would remain possible to check such drivers with sparse without enabling
> >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
> >
> > The right thing is probably just to fix these, isn't it?
> > Until then, why not just ignore the warnings?
>
> Neither option is realistic. With endian-checking enabled the qla2xxx
> driver triggers so many warnings that it becomes a real challenge to
> filter the non-endian warnings out manually:
>
> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
> $f | &grep -c ': warning:'; done
> 4
> 752
You can always revert this patch in your tree, or whatever. It does not
look like this will get fixed otherwise.
> If you think it would be easy to fix the endian warnings triggered by
> the qla2xxx driver, you are welcome to try to fix these.
>
> Bart.
Yea, this hardware was designed by someone who thought mixing
LE and BE all over the place is a good idea.
But who said it should be easy?
Maybe this change will be enough to motivate the maintainers.
Here's a minor buglet for you as a motivator:
if (ct_rsp->header.response !=
cpu_to_be16(CT_ACCEPT_RESPONSE)) {
ql_dbg(ql_dbg_disc + ql_dbg_buffer, vha, 0x2077,
"%s failed rejected request on port_id: %02x%02x%02x Compeltion status 0x%x, response 0x%x\n",
routine, vha->d_id.b.domain,
vha->d_id.b.area, vha->d_id.b.al_pa, comp_status, ct_rsp->header.response);
response is BE and isn't printed correctly.
another:
eiter->a.max_frame_size = cpu_to_be32(eiter->a.max_frame_size);
size += 4 + 4;
ql_dbg(ql_dbg_disc, vha, 0x20bc,
"Max_Frame_Size = %x.\n", eiter->a.max_frame_size);
printed too late, it's be by that time.
Here's another suspicious line
ctio24->u.status1.flags = (atio->u.isp24.attr << 9) |
cpu_to_le16(CTIO7_FLAGS_STATUS_MODE_1 |
CTIO7_FLAGS_TERMINATE);
shifting attr by 9 bits gives different results on BE and LE,
mixing it with le16 looks rather strange.
Another:
ha->flags.dport_enabled =
(mid_init_cb->init_cb.firmware_options_1 & BIT_7) != 0;
BIT_7 is native endian, firmware_options_1 is LE I think.
Look at qla27xx_find_valid_image as well.
if (pri_image_status.signature != QLA27XX_IMG_STATUS_SIGN)
qla27xx_image_status seems to be data coming from flash, but is
somehow native-endian? Maybe ...
lun = a->u.isp24.fcp_cmnd.lun;
I think lun here is in hardware format (le?), code treats it
as native.
Not to speak about interface abuse all over the place.
How about this:
uint32_t *
qla24xx_read_flash_data(scsi_qla_host_t *vha, uint32_t *dwptr, uint32_t
faddr,
uint32_t dwords)
{
uint32_t i;
struct qla_hw_data *ha = vha->hw;
/* Dword reads to flash. */
for (i = 0; i < dwords; i++, faddr++)
dwptr[i] = cpu_to_le32(qla24xx_read_flash_dword(ha,
flash_data_addr(ha, faddr)));
return dwptr;
}
OK so we convert to LE ...
qla24xx_read_flash_data(vha, dcode, faddr, 4);
risc_addr = be32_to_cpu(dcode[2]);
*srisc_addr = *srisc_addr == 0 ? risc_addr : *srisc_addr;
risc_size = be32_to_cpu(dcode[3]);
then happily assume it's BE.
And again, coming from flash, it's unlikely to actually be in the native
endian-ness as callers seem to assume. I'm guessing it's all BE.
I poked at it a bit and was able to cut down # of warnings
from 1700 to 1400 in an hour. Someone familiar with the code
should look at it.
--
MST
^ permalink raw reply
* [PATCH 2/2] crypto: arm/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
In-Reply-To: <1481207339-17332-1-git-send-email-ard.biesheuvel@linaro.org>
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto/Makefile | 2 +
arch/arm/crypto/chacha20-neon-core.S | 524 ++++++++++++++++++++
arch/arm/crypto/chacha20-neon-glue.c | 136 +++++
4 files changed, 668 insertions(+)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 13f1b4c289d4..2f3339f015d3 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -130,4 +130,10 @@ config CRYPTO_CRC32_ARM_CE
depends on KERNEL_MODE_NEON && CRC32
select CRYPTO_HASH
+config CRYPTO_CHACHA20_NEON
+ tristate "NEON accelerated ChaCha20 symmetric cipher"
+ depends on KERNEL_MODE_NEON
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_CHACHA20
+
endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index b578a1820ab1..8d74e55eacd4 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -40,6 +41,7 @@ aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
quiet_cmd_perl = PERL $@
cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..9f315041f521
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-core.S
@@ -0,0 +1,524 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SNEON3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .fpu neon
+ .align 5
+
+ENTRY(chacha20_block_xor_neon)
+ // r0: Input state matrix, s
+ // r1: 1 data block output, o
+ // r2: 1 data block input, i
+
+ //
+ // This function encrypts one ChaCha20 block by loading the state matrix
+ // in four NEON registers. It performs matrix operation on four words in
+ // parallel, but requireds shuffling to rearrange the words after each
+ // round.
+ //
+
+ // x0..3 = s0..3
+ add ip, r0, #0x20
+ vld1.32 {q0-q1}, [r0]
+ vld1.32 {q2-q3}, [ip]
+
+ vmov q8, q0
+ vmov q9, q1
+ vmov q10, q2
+ vmov q11, q3
+
+ mov r3, #10
+
+.Ldoubleround:
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ vadd.i32 q0, q0, q1
+ veor q4, q3, q0
+ vshl.u32 q3, q4, #16
+ vsri.u32 q3, q4, #16
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ vadd.i32 q2, q2, q3
+ veor q4, q1, q2
+ vshl.u32 q1, q4, #12
+ vsri.u32 q1, q4, #20
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ vadd.i32 q0, q0, q1
+ veor q4, q3, q0
+ vshl.u32 q3, q4, #8
+ vsri.u32 q3, q4, #24
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ vadd.i32 q2, q2, q3
+ veor q4, q1, q2
+ vshl.u32 q1, q4, #7
+ vsri.u32 q1, q4, #25
+
+ // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+ vext.8 q1, q1, q1, #4
+ // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ vext.8 q2, q2, q2, #8
+ // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+ vext.8 q3, q3, q3, #12
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ vadd.i32 q0, q0, q1
+ veor q4, q3, q0
+ vshl.u32 q3, q4, #16
+ vsri.u32 q3, q4, #16
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ vadd.i32 q2, q2, q3
+ veor q4, q1, q2
+ vshl.u32 q1, q4, #12
+ vsri.u32 q1, q4, #20
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ vadd.i32 q0, q0, q1
+ veor q4, q3, q0
+ vshl.u32 q3, q4, #8
+ vsri.u32 q3, q4, #24
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ vadd.i32 q2, q2, q3
+ veor q4, q1, q2
+ vshl.u32 q1, q4, #7
+ vsri.u32 q1, q4, #25
+
+ // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+ vext.8 q1, q1, q1, #12
+ // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ vext.8 q2, q2, q2, #8
+ // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+ vext.8 q3, q3, q3, #4
+
+ subs r3, r3, #1
+ bne .Ldoubleround
+
+ add ip, r2, #0x20
+ vld1.8 {q4-q5}, [r2]
+ vld1.8 {q6-q7}, [ip]
+
+ // o0 = i0 ^ (x0 + s0)
+ vadd.i32 q0, q0, q8
+ veor q0, q0, q4
+
+ // o1 = i1 ^ (x1 + s1)
+ vadd.i32 q1, q1, q9
+ veor q1, q1, q5
+
+ // o2 = i2 ^ (x2 + s2)
+ vadd.i32 q2, q2, q10
+ veor q2, q2, q6
+
+ // o3 = i3 ^ (x3 + s3)
+ vadd.i32 q3, q3, q11
+ veor q3, q3, q7
+
+ add ip, r1, #0x20
+ vst1.8 {q0-q1}, [r1]
+ vst1.8 {q2-q3}, [ip]
+
+ bx lr
+ENDPROC(chacha20_block_xor_neon)
+
+ .align 5
+ENTRY(chacha20_4block_xor_neon)
+ push {r4-r6, lr}
+ mov ip, sp // preserve the stack pointer
+ sub r3, sp, #0x20 // allocate a 32 byte buffer
+ bic r3, r3, #0x1f // aligned to 32 bytes
+ mov sp, r3
+
+ // r0: Input state matrix, s
+ // r1: 4 data blocks output, o
+ // r2: 4 data blocks input, i
+
+ //
+ // This function encrypts four consecutive ChaCha20 blocks by loading
+ // the state matrix in NEON registers four times. The algorithm performs
+ // each operation on the corresponding word of each state matrix, hence
+ // requires no word shuffling. For final XORing step we transpose the
+ // matrix by interleaving 32- and then 64-bit words, which allows us to
+ // do XOR in NEON registers.
+ //
+
+ // x0..15[0-3] = s0..3[0..3]
+ add r3, r0, #0x20
+ vld1.32 {q0-q1}, [r0]
+ vld1.32 {q2-q3}, [r3]
+
+ adr r3, CTRINC
+ vdup.32 q15, d7[1]
+ vdup.32 q14, d7[0]
+ vld1.32 {q11}, [r3, :128]
+ vdup.32 q13, d6[1]
+ vdup.32 q12, d6[0]
+ vadd.i32 q12, q12, q11 // x12 += counter values 0-3
+ vdup.32 q11, d5[1]
+ vdup.32 q10, d5[0]
+ vdup.32 q9, d4[1]
+ vdup.32 q8, d4[0]
+ vdup.32 q7, d3[1]
+ vdup.32 q6, d3[0]
+ vdup.32 q5, d2[1]
+ vdup.32 q4, d2[0]
+ vdup.32 q3, d1[1]
+ vdup.32 q2, d1[0]
+ vdup.32 q1, d0[1]
+ vdup.32 q0, d0[0]
+
+ mov r3, #10
+
+.Ldoubleround4:
+ // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+ // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+ // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+ // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+ vadd.i32 q0, q0, q4
+ vadd.i32 q1, q1, q5
+ vadd.i32 q2, q2, q6
+ vadd.i32 q3, q3, q7
+
+ veor q12, q12, q0
+ veor q13, q13, q1
+ veor q14, q14, q2
+ veor q15, q15, q3
+
+ vrev32.16 q12, q12
+ vrev32.16 q13, q13
+ vrev32.16 q14, q14
+ vrev32.16 q15, q15
+
+ // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+ // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+ // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+ // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+ vadd.i32 q8, q8, q12
+ vadd.i32 q9, q9, q13
+ vadd.i32 q10, q10, q14
+ vadd.i32 q11, q11, q15
+
+ vst1.32 {q8-q9}, [sp, :256]
+
+ veor q8, q4, q8
+ veor q9, q5, q9
+ vshl.u32 q4, q8, #12
+ vshl.u32 q5, q9, #12
+ vsri.u32 q4, q8, #20
+ vsri.u32 q5, q9, #20
+
+ veor q8, q6, q10
+ veor q9, q7, q11
+ vshl.u32 q6, q8, #12
+ vshl.u32 q7, q9, #12
+ vsri.u32 q6, q8, #20
+ vsri.u32 q7, q9, #20
+
+ // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+ // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+ // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+ // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+ vadd.i32 q0, q0, q4
+ vadd.i32 q1, q1, q5
+ vadd.i32 q2, q2, q6
+ vadd.i32 q3, q3, q7
+
+ veor q8, q12, q0
+ veor q9, q13, q1
+ vshl.u32 q12, q8, #8
+ vshl.u32 q13, q9, #8
+ vsri.u32 q12, q8, #24
+ vsri.u32 q13, q9, #24
+
+ veor q8, q14, q2
+ veor q9, q15, q3
+ vshl.u32 q14, q8, #8
+ vshl.u32 q15, q9, #8
+ vsri.u32 q14, q8, #24
+ vsri.u32 q15, q9, #24
+
+ vld1.32 {q8-q9}, [sp, :256]
+
+ // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+ // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+ // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+ // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+ vadd.i32 q8, q8, q12
+ vadd.i32 q9, q9, q13
+ vadd.i32 q10, q10, q14
+ vadd.i32 q11, q11, q15
+
+ vst1.32 {q8-q9}, [sp, :256]
+
+ veor q8, q4, q8
+ veor q9, q5, q9
+ vshl.u32 q4, q8, #7
+ vshl.u32 q5, q9, #7
+ vsri.u32 q4, q8, #25
+ vsri.u32 q5, q9, #25
+
+ veor q8, q6, q10
+ veor q9, q7, q11
+ vshl.u32 q6, q8, #7
+ vshl.u32 q7, q9, #7
+ vsri.u32 q6, q8, #25
+ vsri.u32 q7, q9, #25
+
+ vld1.32 {q8-q9}, [sp, :256]
+
+ // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+ // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+ // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+ // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+ vadd.i32 q0, q0, q5
+ vadd.i32 q1, q1, q6
+ vadd.i32 q2, q2, q7
+ vadd.i32 q3, q3, q4
+
+ veor q15, q15, q0
+ veor q12, q12, q1
+ veor q13, q13, q2
+ veor q14, q14, q3
+
+ vrev32.16 q15, q15
+ vrev32.16 q12, q12
+ vrev32.16 q13, q13
+ vrev32.16 q14, q14
+
+ // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+ // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+ // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+ // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+ vadd.i32 q10, q10, q15
+ vadd.i32 q11, q11, q12
+ vadd.i32 q8, q8, q13
+ vadd.i32 q9, q9, q14
+
+ vst1.32 {q8-q9}, [sp, :256]
+
+ veor q8, q7, q8
+ veor q9, q4, q9
+ vshl.u32 q7, q8, #12
+ vshl.u32 q4, q9, #12
+ vsri.u32 q7, q8, #20
+ vsri.u32 q4, q9, #20
+
+ veor q8, q5, q10
+ veor q9, q6, q11
+ vshl.u32 q5, q8, #12
+ vshl.u32 q6, q9, #12
+ vsri.u32 q5, q8, #20
+ vsri.u32 q6, q9, #20
+
+ // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+ // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+ // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+ // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+ vadd.i32 q0, q0, q5
+ vadd.i32 q1, q1, q6
+ vadd.i32 q2, q2, q7
+ vadd.i32 q3, q3, q4
+
+ veor q8, q15, q0
+ veor q9, q12, q1
+ vshl.u32 q15, q8, #8
+ vshl.u32 q12, q9, #8
+ vsri.u32 q15, q8, #24
+ vsri.u32 q12, q9, #24
+
+ veor q8, q13, q2
+ veor q9, q14, q3
+ vshl.u32 q13, q8, #8
+ vshl.u32 q14, q9, #8
+ vsri.u32 q13, q8, #24
+ vsri.u32 q14, q9, #24
+
+ vld1.32 {q8-q9}, [sp, :256]
+
+ // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+ // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+ // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+ // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+ vadd.i32 q10, q10, q15
+ vadd.i32 q11, q11, q12
+ vadd.i32 q8, q8, q13
+ vadd.i32 q9, q9, q14
+
+ vst1.32 {q8-q9}, [sp, :256]
+
+ veor q8, q7, q8
+ veor q9, q4, q9
+ vshl.u32 q7, q8, #7
+ vshl.u32 q4, q9, #7
+ vsri.u32 q7, q8, #25
+ vsri.u32 q4, q9, #25
+
+ veor q8, q5, q10
+ veor q9, q6, q11
+ vshl.u32 q5, q8, #7
+ vshl.u32 q6, q9, #7
+ vsri.u32 q5, q8, #25
+ vsri.u32 q6, q9, #25
+
+ subs r3, r3, #1
+ beq 0f
+
+ vld1.32 {q8-q9}, [sp, :256]
+ b .Ldoubleround4
+
+ // x0[0-3] += s0[0]
+ // x1[0-3] += s0[1]
+ // x2[0-3] += s0[2]
+ // x3[0-3] += s0[3]
+0: ldmia r0!, {r3-r6}
+ vdup.32 q8, r3
+ vdup.32 q9, r4
+ vadd.i32 q0, q0, q8
+ vadd.i32 q1, q1, q9
+ vdup.32 q8, r5
+ vdup.32 q9, r6
+ vadd.i32 q2, q2, q8
+ vadd.i32 q3, q3, q9
+
+ // x4[0-3] += s1[0]
+ // x5[0-3] += s1[1]
+ // x6[0-3] += s1[2]
+ // x7[0-3] += s1[3]
+ ldmia r0!, {r3-r6}
+ vdup.32 q8, r3
+ vdup.32 q9, r4
+ vadd.i32 q4, q4, q8
+ vadd.i32 q5, q5, q9
+ vdup.32 q8, r5
+ vdup.32 q9, r6
+ vadd.i32 q6, q6, q8
+ vadd.i32 q7, q7, q9
+
+ // interleave 32-bit words in state n, n+1
+ vzip.32 q0, q1
+ vzip.32 q2, q3
+ vzip.32 q4, q5
+ vzip.32 q6, q7
+
+ // interleave 64-bit words in state n, n+2
+ vswp d1, d4
+ vswp d3, d6
+ vswp d9, d12
+ vswp d11, d14
+
+ // xor with corresponding input, write to output
+ vld1.8 {q8-q9}, [r2]!
+ veor q8, q8, q0
+ veor q9, q9, q4
+ vst1.8 {q8-q9}, [r1]!
+
+ vld1.32 {q8-q9}, [sp, :256]
+
+ // x8[0-3] += s2[0]
+ // x9[0-3] += s2[1]
+ // x10[0-3] += s2[2]
+ // x11[0-3] += s2[3]
+ ldmia r0!, {r3-r6}
+ vdup.32 q0, r3
+ vdup.32 q4, r4
+ vadd.i32 q8, q8, q0
+ vadd.i32 q9, q9, q4
+ vdup.32 q0, r5
+ vdup.32 q4, r6
+ vadd.i32 q10, q10, q0
+ vadd.i32 q11, q11, q4
+
+ // x12[0-3] += s3[0]
+ // x13[0-3] += s3[1]
+ // x14[0-3] += s3[2]
+ // x15[0-3] += s3[3]
+ ldmia r0!, {r3-r6}
+ vdup.32 q0, r3
+ vdup.32 q4, r4
+ adr r3, CTRINC
+ vadd.i32 q12, q12, q0
+ vld1.32 {q0}, [r3, :128]
+ vadd.i32 q13, q13, q4
+ vadd.i32 q12, q12, q0 // x12 += counter values 0-3
+
+ vdup.32 q0, r5
+ vdup.32 q4, r6
+ vadd.i32 q14, q14, q0
+ vadd.i32 q15, q15, q4
+
+ // interleave 32-bit words in state n, n+1
+ vzip.32 q8, q9
+ vzip.32 q10, q11
+ vzip.32 q12, q13
+ vzip.32 q14, q15
+
+ // interleave 64-bit words in state n, n+2
+ vswp d17, d20
+ vswp d19, d22
+ vswp d25, d28
+ vswp d27, d30
+
+ vmov q4, q1
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q8
+ veor q1, q1, q12
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q2
+ veor q1, q1, q6
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q10
+ veor q1, q1, q14
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q4
+ veor q1, q1, q5
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q9
+ veor q1, q1, q13
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]!
+ veor q0, q0, q3
+ veor q1, q1, q7
+ vst1.8 {q0-q1}, [r1]!
+
+ vld1.8 {q0-q1}, [r2]
+ veor q0, q0, q11
+ veor q1, q1, q15
+ vst1.8 {q0-q1}, [r1]
+
+ mov sp, ip
+ pop {r4-r6, pc}
+ENDPROC(chacha20_4block_xor_neon)
+
+ .align 4
+CTRINC: .word 0, 1, 2, 3
+
diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..554f7f6069da
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -0,0 +1,136 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+ unsigned int bytes)
+{
+ u8 buf[CHACHA20_BLOCK_SIZE];
+
+ while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+ chacha20_4block_xor_neon(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE * 4;
+ src += CHACHA20_BLOCK_SIZE * 4;
+ dst += CHACHA20_BLOCK_SIZE * 4;
+ state[12] += 4;
+ }
+ while (bytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_block_xor_neon(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE;
+ src += CHACHA20_BLOCK_SIZE;
+ dst += CHACHA20_BLOCK_SIZE;
+ state[12]++;
+ }
+ if (bytes) {
+ memcpy(buf, src, bytes);
+ chacha20_block_xor_neon(state, buf, buf);
+ memcpy(dst, buf, bytes);
+ }
+}
+
+static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes)
+{
+ struct blkcipher_walk walk;
+ u32 state[16];
+ int err;
+
+ if (nbytes <= CHACHA20_BLOCK_SIZE || !may_use_simd())
+ return crypto_chacha20_crypt(desc, dst, src, nbytes);
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
+
+ crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+
+ kernel_neon_begin();
+
+ while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
+ err = blkcipher_walk_done(desc, &walk,
+ walk.nbytes % CHACHA20_BLOCK_SIZE);
+ }
+
+ if (walk.nbytes) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes);
+ err = blkcipher_walk_done(desc, &walk, 0);
+ }
+
+ kernel_neon_end();
+
+ return err;
+}
+
+static struct crypto_alg alg = {
+ .cra_name = "chacha20",
+ .cra_driver_name = "chacha20-neon",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
+ .cra_blocksize = 1,
+ .cra_type = &crypto_blkcipher_type,
+ .cra_ctxsize = sizeof(struct chacha20_ctx),
+ .cra_alignmask = sizeof(u32) - 1,
+ .cra_module = THIS_MODULE,
+ .cra_u = {
+ .blkcipher = {
+ .min_keysize = CHACHA20_KEY_SIZE,
+ .max_keysize = CHACHA20_KEY_SIZE,
+ .ivsize = CHACHA20_IV_SIZE,
+ .geniv = "seqiv",
+ .setkey = crypto_chacha20_setkey,
+ .encrypt = chacha20_simd,
+ .decrypt = chacha20_simd,
+ },
+ },
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+ if (!(elf_hwcap & HWCAP_NEON))
+ return -ENODEV;
+
+ return crypto_register_alg(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+ crypto_unregister_alg(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
--
2.7.4
^ permalink raw reply related
* [PATCH 1/2] crypto: arm64/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
In-Reply-To: <1481207339-17332-1-git-send-email-ard.biesheuvel@linaro.org>
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/chacha20-neon-core.S | 480 ++++++++++++++++++++
arch/arm64/crypto/chacha20-neon-glue.c | 131 ++++++
4 files changed, 620 insertions(+)
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..0bf0f531f539 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
depends on ARM64
select CRYPTO_HASH
+config CRYPTO_CHACHA20_NEON
+ tristate "NEON accelerated ChaCha20 symmetric cipher"
+ depends on KERNEL_MODE_NEON
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_CHACHA20
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aa8888d7b744..9d2826c5fccf 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
sha512-arm64-y := sha512-glue.o sha512-core.o
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+
AFLAGS_aes-ce.o := -DINTERLEAVE=4
AFLAGS_aes-neon.o := -DINTERLEAVE=4
diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..e2cd65580807
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-core.S
@@ -0,0 +1,480 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .align 6
+
+ENTRY(chacha20_block_xor_neon)
+ // x0: Input state matrix, s
+ // x1: 1 data block output, o
+ // x2: 1 data block input, i
+
+ //
+ // This function encrypts one ChaCha20 block by loading the state matrix
+ // in four NEON registers. It performs matrix operation on four words in
+ // parallel, but requires shuffling to rearrange the words after each
+ // round.
+ //
+
+ // x0..3 = s0..3
+ ld1 {v0.4s-v3.4s}, [x0]
+ ld1 {v8.4s-v11.4s}, [x0]
+
+ mov x3, #10
+
+.Ldoubleround:
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ add v0.4s, v0.4s, v1.4s
+ eor v3.16b, v3.16b, v0.16b
+ rev32 v3.8h, v3.8h
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ add v2.4s, v2.4s, v3.4s
+ eor v4.16b, v1.16b, v2.16b
+ shl v1.4s, v4.4s, #12
+ sri v1.4s, v4.4s, #20
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ add v0.4s, v0.4s, v1.4s
+ eor v4.16b, v3.16b, v0.16b
+ shl v3.4s, v4.4s, #8
+ sri v3.4s, v4.4s, #24
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ add v2.4s, v2.4s, v3.4s
+ eor v4.16b, v1.16b, v2.16b
+ shl v1.4s, v4.4s, #7
+ sri v1.4s, v4.4s, #25
+
+ // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+ ext v1.16b, v1.16b, v1.16b, #4
+ // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ ext v2.16b, v2.16b, v2.16b, #8
+ // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+ ext v3.16b, v3.16b, v3.16b, #12
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+ add v0.4s, v0.4s, v1.4s
+ eor v3.16b, v3.16b, v0.16b
+ rev32 v3.8h, v3.8h
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+ add v2.4s, v2.4s, v3.4s
+ eor v4.16b, v1.16b, v2.16b
+ shl v1.4s, v4.4s, #12
+ sri v1.4s, v4.4s, #20
+
+ // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+ add v0.4s, v0.4s, v1.4s
+ eor v4.16b, v3.16b, v0.16b
+ shl v3.4s, v4.4s, #8
+ sri v3.4s, v4.4s, #24
+
+ // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+ add v2.4s, v2.4s, v3.4s
+ eor v4.16b, v1.16b, v2.16b
+ shl v1.4s, v4.4s, #7
+ sri v1.4s, v4.4s, #25
+
+ // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+ ext v1.16b, v1.16b, v1.16b, #12
+ // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+ ext v2.16b, v2.16b, v2.16b, #8
+ // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+ ext v3.16b, v3.16b, v3.16b, #4
+
+ subs x3, x3, #1
+ b.ne .Ldoubleround
+
+ ld1 {v4.16b-v7.16b}, [x2]
+
+ // o0 = i0 ^ (x0 + s0)
+ add v0.4s, v0.4s, v8.4s
+ eor v0.16b, v0.16b, v4.16b
+
+ // o1 = i1 ^ (x1 + s1)
+ add v1.4s, v1.4s, v9.4s
+ eor v1.16b, v1.16b, v5.16b
+
+ // o2 = i2 ^ (x2 + s2)
+ add v2.4s, v2.4s, v10.4s
+ eor v2.16b, v2.16b, v6.16b
+
+ // o3 = i3 ^ (x3 + s3)
+ add v3.4s, v3.4s, v11.4s
+ eor v3.16b, v3.16b, v7.16b
+
+ st1 {v0.16b-v3.16b}, [x1]
+
+ ret
+ENDPROC(chacha20_block_xor_neon)
+
+ .align 6
+ENTRY(chacha20_4block_xor_neon)
+ // x0: Input state matrix, s
+ // x1: 4 data blocks output, o
+ // x2: 4 data blocks input, i
+
+ //
+ // This function encrypts four consecutive ChaCha20 blocks by loading
+ // the state matrix in NEON registers four times. The algorithm performs
+ // each operation on the corresponding word of each state matrix, hence
+ // requires no word shuffling. For final XORing step we transpose the
+ // matrix by interleaving 32- and then 64-bit words, which allows us to
+ // do XOR in NEON registers.
+ //
+ adr x3, CTRINC
+ ld1 {v16.4s}, [x3]
+
+ // x0..15[0-3] = s0..3[0..3]
+ mov x4, x0
+ ld4r { v0.4s- v3.4s}, [x4], #16
+ ld4r { v4.4s- v7.4s}, [x4], #16
+ ld4r { v8.4s-v11.4s}, [x4], #16
+ ld4r {v12.4s-v15.4s}, [x4]
+
+ // x12 += counter values 0-3
+ add v12.4s, v12.4s, v16.4s
+
+ mov x3, #10
+
+.Ldoubleround4:
+ // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+ // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+ // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+ // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+ add v0.4s, v0.4s, v4.4s
+ add v1.4s, v1.4s, v5.4s
+ add v2.4s, v2.4s, v6.4s
+ add v3.4s, v3.4s, v7.4s
+
+ eor v12.16b, v12.16b, v0.16b
+ eor v13.16b, v13.16b, v1.16b
+ eor v14.16b, v14.16b, v2.16b
+ eor v15.16b, v15.16b, v3.16b
+
+ rev32 v12.8h, v12.8h
+ rev32 v13.8h, v13.8h
+ rev32 v14.8h, v14.8h
+ rev32 v15.8h, v15.8h
+
+ // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+ // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+ // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+ // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+ add v8.4s, v8.4s, v12.4s
+ add v9.4s, v9.4s, v13.4s
+ add v10.4s, v10.4s, v14.4s
+ add v11.4s, v11.4s, v15.4s
+
+ eor v17.16b, v4.16b, v8.16b
+ eor v18.16b, v5.16b, v9.16b
+ eor v19.16b, v6.16b, v10.16b
+ eor v20.16b, v7.16b, v11.16b
+
+ shl v4.4s, v17.4s, #12
+ shl v5.4s, v18.4s, #12
+ shl v6.4s, v19.4s, #12
+ shl v7.4s, v20.4s, #12
+
+ sri v4.4s, v17.4s, #20
+ sri v5.4s, v18.4s, #20
+ sri v6.4s, v19.4s, #20
+ sri v7.4s, v20.4s, #20
+
+ // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+ // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+ // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+ // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+ add v0.4s, v0.4s, v4.4s
+ add v1.4s, v1.4s, v5.4s
+ add v2.4s, v2.4s, v6.4s
+ add v3.4s, v3.4s, v7.4s
+
+ eor v17.16b, v12.16b, v0.16b
+ eor v18.16b, v13.16b, v1.16b
+ eor v19.16b, v14.16b, v2.16b
+ eor v20.16b, v15.16b, v3.16b
+
+ shl v12.4s, v17.4s, #8
+ shl v13.4s, v18.4s, #8
+ shl v14.4s, v19.4s, #8
+ shl v15.4s, v20.4s, #8
+
+ sri v12.4s, v17.4s, #24
+ sri v13.4s, v18.4s, #24
+ sri v14.4s, v19.4s, #24
+ sri v15.4s, v20.4s, #24
+
+ // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+ // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+ // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+ // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+ add v8.4s, v8.4s, v12.4s
+ add v9.4s, v9.4s, v13.4s
+ add v10.4s, v10.4s, v14.4s
+ add v11.4s, v11.4s, v15.4s
+
+ eor v17.16b, v4.16b, v8.16b
+ eor v18.16b, v5.16b, v9.16b
+ eor v19.16b, v6.16b, v10.16b
+ eor v20.16b, v7.16b, v11.16b
+
+ shl v4.4s, v17.4s, #7
+ shl v5.4s, v18.4s, #7
+ shl v6.4s, v19.4s, #7
+ shl v7.4s, v20.4s, #7
+
+ sri v4.4s, v17.4s, #25
+ sri v5.4s, v18.4s, #25
+ sri v6.4s, v19.4s, #25
+ sri v7.4s, v20.4s, #25
+
+ // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+ // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+ // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+ // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+ add v0.4s, v0.4s, v5.4s
+ add v1.4s, v1.4s, v6.4s
+ add v2.4s, v2.4s, v7.4s
+ add v3.4s, v3.4s, v4.4s
+
+ eor v15.16b, v15.16b, v0.16b
+ eor v12.16b, v12.16b, v1.16b
+ eor v13.16b, v13.16b, v2.16b
+ eor v14.16b, v14.16b, v3.16b
+
+ rev32 v15.8h, v15.8h
+ rev32 v12.8h, v12.8h
+ rev32 v13.8h, v13.8h
+ rev32 v14.8h, v14.8h
+
+ // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+ // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+ // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+ // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+ add v10.4s, v10.4s, v15.4s
+ add v11.4s, v11.4s, v12.4s
+ add v8.4s, v8.4s, v13.4s
+ add v9.4s, v9.4s, v14.4s
+
+ eor v17.16b, v5.16b, v10.16b
+ eor v18.16b, v6.16b, v11.16b
+ eor v19.16b, v7.16b, v8.16b
+ eor v20.16b, v4.16b, v9.16b
+
+ shl v5.4s, v17.4s, #12
+ shl v6.4s, v18.4s, #12
+ shl v7.4s, v19.4s, #12
+ shl v4.4s, v20.4s, #12
+
+ sri v5.4s, v17.4s, #20
+ sri v6.4s, v18.4s, #20
+ sri v7.4s, v19.4s, #20
+ sri v4.4s, v20.4s, #20
+
+ // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+ // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+ // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+ // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+ add v0.4s, v0.4s, v5.4s
+ add v1.4s, v1.4s, v6.4s
+ add v2.4s, v2.4s, v7.4s
+ add v3.4s, v3.4s, v4.4s
+
+ eor v17.16b, v15.16b, v0.16b
+ eor v18.16b, v12.16b, v1.16b
+ eor v19.16b, v13.16b, v2.16b
+ eor v20.16b, v14.16b, v3.16b
+
+ shl v15.4s, v17.4s, #8
+ shl v12.4s, v18.4s, #8
+ shl v13.4s, v19.4s, #8
+ shl v14.4s, v20.4s, #8
+
+ sri v15.4s, v17.4s, #24
+ sri v12.4s, v18.4s, #24
+ sri v13.4s, v19.4s, #24
+ sri v14.4s, v20.4s, #24
+
+ // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+ // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+ // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+ // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+ add v10.4s, v10.4s, v15.4s
+ add v11.4s, v11.4s, v12.4s
+ add v8.4s, v8.4s, v13.4s
+ add v9.4s, v9.4s, v14.4s
+
+ eor v17.16b, v5.16b, v10.16b
+ eor v18.16b, v6.16b, v11.16b
+ eor v19.16b, v7.16b, v8.16b
+ eor v20.16b, v4.16b, v9.16b
+
+ shl v5.4s, v17.4s, #7
+ shl v6.4s, v18.4s, #7
+ shl v7.4s, v19.4s, #7
+ shl v4.4s, v20.4s, #7
+
+ sri v5.4s, v17.4s, #25
+ sri v6.4s, v18.4s, #25
+ sri v7.4s, v19.4s, #25
+ sri v4.4s, v20.4s, #25
+
+ subs x3, x3, #1
+ b.ne .Ldoubleround4
+
+ // x0[0-3] += s0[0]
+ // x1[0-3] += s0[1]
+ // x2[0-3] += s0[2]
+ // x3[0-3] += s0[3]
+ ld4r {v17.4s-v20.4s}, [x0], #16
+ add v0.4s, v0.4s, v17.4s
+ add v1.4s, v1.4s, v18.4s
+ add v2.4s, v2.4s, v19.4s
+ add v3.4s, v3.4s, v20.4s
+
+ // x4[0-3] += s1[0]
+ // x5[0-3] += s1[1]
+ // x6[0-3] += s1[2]
+ // x7[0-3] += s1[3]
+ ld4r {v21.4s-v24.4s}, [x0], #16
+ add v4.4s, v4.4s, v21.4s
+ add v5.4s, v5.4s, v22.4s
+ add v6.4s, v6.4s, v23.4s
+ add v7.4s, v7.4s, v24.4s
+
+ // x8[0-3] += s2[0]
+ // x9[0-3] += s2[1]
+ // x10[0-3] += s2[2]
+ // x11[0-3] += s2[3]
+ ld4r {v17.4s-v20.4s}, [x0], #16
+ add v8.4s, v8.4s, v17.4s
+ add v9.4s, v9.4s, v18.4s
+ add v10.4s, v10.4s, v19.4s
+ add v11.4s, v11.4s, v20.4s
+
+ // x12[0-3] += s3[0]
+ // x13[0-3] += s3[1]
+ // x14[0-3] += s3[2]
+ // x15[0-3] += s3[3]
+ ld4r {v21.4s-v24.4s}, [x0]
+ add v12.4s, v12.4s, v21.4s
+ add v13.4s, v13.4s, v22.4s
+ add v14.4s, v14.4s, v23.4s
+ add v15.4s, v15.4s, v24.4s
+
+ // x12 += counter values 0-3
+ add v12.4s, v12.4s, v16.4s
+
+ ld1 {v16.16b-v19.16b}, [x2], #64
+ ld1 {v20.16b-v23.16b}, [x2], #64
+
+ // interleave 32-bit words in state n, n+1
+ zip1 v24.4s, v0.4s, v1.4s
+ zip1 v25.4s, v2.4s, v3.4s
+ zip1 v26.4s, v4.4s, v5.4s
+ zip1 v27.4s, v6.4s, v7.4s
+ zip1 v28.4s, v8.4s, v9.4s
+ zip1 v29.4s, v10.4s, v11.4s
+ zip1 v30.4s, v12.4s, v13.4s
+ zip1 v31.4s, v14.4s, v15.4s
+
+ zip2 v1.4s, v0.4s, v1.4s
+ zip2 v3.4s, v2.4s, v3.4s
+ zip2 v5.4s, v4.4s, v5.4s
+ zip2 v7.4s, v6.4s, v7.4s
+ zip2 v9.4s, v8.4s, v9.4s
+ zip2 v11.4s, v10.4s, v11.4s
+ zip2 v13.4s, v12.4s, v13.4s
+ zip2 v15.4s, v14.4s, v15.4s
+
+ mov v0.16b, v24.16b
+ mov v2.16b, v25.16b
+ mov v4.16b, v26.16b
+ mov v6.16b, v27.16b
+ mov v8.16b, v28.16b
+ mov v10.16b, v29.16b
+ mov v12.16b, v30.16b
+ mov v14.16b, v31.16b
+
+ // interleave 64-bit words in state n, n+2
+ zip1 v24.2d, v0.2d, v2.2d
+ zip1 v25.2d, v1.2d, v3.2d
+ zip1 v26.2d, v4.2d, v6.2d
+ zip1 v27.2d, v5.2d, v7.2d
+ zip1 v28.2d, v8.2d, v10.2d
+ zip1 v29.2d, v9.2d, v11.2d
+ zip1 v30.2d, v12.2d, v14.2d
+ zip1 v31.2d, v13.2d, v15.2d
+
+ zip2 v2.2d, v0.2d, v2.2d
+ zip2 v3.2d, v1.2d, v3.2d
+ zip2 v6.2d, v4.2d, v6.2d
+ zip2 v7.2d, v5.2d, v7.2d
+ zip2 v10.2d, v8.2d, v10.2d
+ zip2 v11.2d, v9.2d, v11.2d
+ zip2 v14.2d, v12.2d, v14.2d
+ zip2 v15.2d, v13.2d, v15.2d
+
+ mov v0.16b, v24.16b
+ mov v1.16b, v25.16b
+ mov v4.16b, v26.16b
+ mov v5.16b, v27.16b
+
+ mov v8.16b, v28.16b
+ mov v9.16b, v29.16b
+ mov v12.16b, v30.16b
+ mov v13.16b, v31.16b
+
+ ld1 {v24.16b-v27.16b}, [x2], #64
+ ld1 {v28.16b-v31.16b}, [x2]
+
+ // xor with corresponding input, write to output
+ eor v16.16b, v16.16b, v0.16b
+ eor v17.16b, v17.16b, v4.16b
+ eor v18.16b, v18.16b, v8.16b
+ eor v19.16b, v19.16b, v12.16b
+ st1 {v16.16b-v19.16b}, [x1], #64
+
+ eor v20.16b, v20.16b, v2.16b
+ eor v21.16b, v21.16b, v6.16b
+ eor v22.16b, v22.16b, v10.16b
+ eor v23.16b, v23.16b, v14.16b
+ st1 {v20.16b-v23.16b}, [x1], #64
+
+ eor v24.16b, v24.16b, v1.16b
+ eor v25.16b, v25.16b, v5.16b
+ eor v26.16b, v26.16b, v9.16b
+ eor v27.16b, v27.16b, v13.16b
+ st1 {v24.16b-v27.16b}, [x1], #64
+
+ eor v28.16b, v28.16b, v3.16b
+ eor v29.16b, v29.16b, v7.16b
+ eor v30.16b, v30.16b, v11.16b
+ eor v31.16b, v31.16b, v15.16b
+ st1 {v28.16b-v31.16b}, [x1]
+
+ ret
+ENDPROC(chacha20_4block_xor_neon)
+
+CTRINC: .word 0, 1, 2, 3
diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..705b42b06d00
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -0,0 +1,131 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/neon.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
+ unsigned int bytes)
+{
+ u8 buf[CHACHA20_BLOCK_SIZE];
+
+ while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+ chacha20_4block_xor_neon(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE * 4;
+ src += CHACHA20_BLOCK_SIZE * 4;
+ dst += CHACHA20_BLOCK_SIZE * 4;
+ state[12] += 4;
+ }
+ while (bytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_block_xor_neon(state, dst, src);
+ bytes -= CHACHA20_BLOCK_SIZE;
+ src += CHACHA20_BLOCK_SIZE;
+ dst += CHACHA20_BLOCK_SIZE;
+ state[12]++;
+ }
+ if (bytes) {
+ memcpy(buf, src, bytes);
+ chacha20_block_xor_neon(state, buf, buf);
+ memcpy(dst, buf, bytes);
+ }
+}
+
+static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes)
+{
+ struct blkcipher_walk walk;
+ u32 state[16];
+ int err;
+
+ if (nbytes <= CHACHA20_BLOCK_SIZE)
+ return crypto_chacha20_crypt(desc, dst, src, nbytes);
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
+
+ crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+
+ kernel_neon_begin();
+
+ while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
+ err = blkcipher_walk_done(desc, &walk,
+ walk.nbytes % CHACHA20_BLOCK_SIZE);
+ }
+
+ if (walk.nbytes) {
+ chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes);
+ err = blkcipher_walk_done(desc, &walk, 0);
+ }
+
+ kernel_neon_end();
+
+ return err;
+}
+
+static struct crypto_alg alg = {
+ .cra_name = "chacha20",
+ .cra_driver_name = "chacha20-neon",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
+ .cra_blocksize = 1,
+ .cra_type = &crypto_blkcipher_type,
+ .cra_ctxsize = sizeof(struct chacha20_ctx),
+ .cra_alignmask = sizeof(u32) - 1,
+ .cra_module = THIS_MODULE,
+ .cra_u = {
+ .blkcipher = {
+ .min_keysize = CHACHA20_KEY_SIZE,
+ .max_keysize = CHACHA20_KEY_SIZE,
+ .ivsize = CHACHA20_IV_SIZE,
+ .geniv = "seqiv",
+ .setkey = crypto_chacha20_setkey,
+ .encrypt = chacha20_simd,
+ .decrypt = chacha20_simd,
+ },
+ },
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+ return crypto_register_alg(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+ crypto_unregister_alg(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
--
2.7.4
^ permalink raw reply related
* [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20
From: Ard Biesheuvel @ 2016-12-08 14:28 UTC (permalink / raw)
To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
Another port of existing x86 SSE code to NEON, again both for arm64 and ARM.
ChaCha20 is a stream cipher described in RFC 7539, and is intended to be
an efficient software implementable 'standby cipher', in case AES cannot
be used.
This NEON implementation is almost 2x as fast as the generic C code
(measured on Cortex-A57 using the arm64 version)
I'm aware that blkciphers are deprecated in favor of skciphers, but this
code (like the x86 version) uses the init and setkey routines of the generic
version, so it is probably better to port all implementations at once.
Ard Biesheuvel (2):
crypto: arm64/chacha20 - implement NEON version based on SSE3 code
crypto: arm/chacha20 - implement NEON version based on SSE3 code
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto/Makefile | 2 +
arch/arm/crypto/chacha20-neon-core.S | 524 ++++++++++++++++++++
arch/arm/crypto/chacha20-neon-glue.c | 136 +++++
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/chacha20-neon-core.S | 480 ++++++++++++++++++
arch/arm64/crypto/chacha20-neon-glue.c | 131 +++++
8 files changed, 1288 insertions(+)
create mode 100644 arch/arm/crypto/chacha20-neon-core.S
create mode 100644 arch/arm/crypto/chacha20-neon-glue.c
create mode 100644 arch/arm64/crypto/chacha20-neon-core.S
create mode 100644 arch/arm64/crypto/chacha20-neon-glue.c
--
2.7.4
^ permalink raw reply
* Re: [PATCH] crypto: testmgr - fix overlap in chunked tests again
From: Herbert Xu @ 2016-12-08 12:17 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-crypto, ebiggers
In-Reply-To: <1481185432-24761-1-git-send-email-ard.biesheuvel@linaro.org>
On Thu, Dec 08, 2016 at 08:23:52AM +0000, Ard Biesheuvel wrote:
> Commit 7e4c7f17cde2 ("crypto: testmgr - avoid overlap in chunked tests")
> attempted to address a problem in the crypto testmgr code where chunked
> test cases are copied to memory in a way that results in overlap.
>
> However, the fix recreated the exact same issue for other chunked tests,
> by putting IDX3 within 492 bytes of IDX1, which causes overlap if the
> first chunk exceeds 492 bytes, which is the case for at least one of
> the xts(aes) test cases.
>
> So increase IDX3 by another 1000 bytes.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] crypto: algif_aead - fix uninitialized variable warning
From: Herbert Xu @ 2016-12-08 12:17 UTC (permalink / raw)
To: Stephan Müller; +Cc: Stephen Rothwell, linux-crypto
In-Reply-To: <1660604.GQeZumkfIP@positron.chronox.de>
On Thu, Dec 08, 2016 at 07:09:44AM +0100, Stephan Müller wrote:
> In case the user provided insufficient data, the code may return
> prematurely without any operation. In this case, the processed
> data indicated with outlen is zero.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Stephan Mueller <smueller@chronox.de>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Cornelia Huck @ 2016-12-08 11:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: kvm, Neil Armstrong, David Airlie, linux-remoteproc, dri-devel,
virtualization, linux-s390, James E.J. Bottomley, Herbert Xu,
linux-scsi, Christoph Hellwig, v9fs-developer, Asias He,
Arnd Bergmann, linux-kbuild, Jens Axboe, Michal Marek,
Stefan Hajnoczi, Matt Mackall, Greg Kroah-Hartman, linux-kernel,
linux-crypto, netdev, Linus Torvalds, David S. Miller
In-Reply-To: <1481164052-28036-1-git-send-email-mst@redhat.com>
On Thu, 8 Dec 2016 04:29:39 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> By now, linux is mostly endian-clean. Enabling endian-ness
> checks for everyone produces about 200 new sparse warnings for me -
> less than 10% over the 2000 sparse warnings already there.
Out of curiousity: Where do most of those warnings show up?
>
> Not a big deal, OTOH enabling this helps people notice
> they are introducing new bugs.
>
> So let's just drop __CHECK_ENDIAN__. Follow-up patches
> can drop distinction between __bitwise and __bitwise__.
>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Linus, could you ack this for upstream? If yes I'll
> merge through my tree as a replacement for enabling
> this just for virtio.
>
> include/uapi/linux/types.h | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/include/uapi/linux/types.h b/include/uapi/linux/types.h
> index acf0979..41e5914 100644
> --- a/include/uapi/linux/types.h
> +++ b/include/uapi/linux/types.h
> @@ -23,11 +23,7 @@
> #else
> #define __bitwise__
> #endif
> -#ifdef __CHECK_ENDIAN__
> #define __bitwise __bitwise__
> -#else
> -#define __bitwise
> -#endif
>
> typedef __u16 __bitwise __le16;
> typedef __u16 __bitwise __be16;
FWIW, I like this better than just enabling it for the virtio code.
^ permalink raw reply
* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Greg Kroah-Hartman @ 2016-12-08 11:13 UTC (permalink / raw)
To: Bart Van Assche
Cc: kvm@vger.kernel.org, Michael S. Tsirkin, David Airlie,
linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
virtualization@lists.linux-foundation.org,
linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
linux-scsi@vger.kernel.org, Neil Armstrong, Christoph Hellwig,
v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
linux-kbuild@vger.kernel.org, Jens Axboe, Michal Marek <
In-Reply-To: <BLUPR02MB168304F8FBA50C916A6A4E6081840@BLUPR02MB1683.namprd02.prod.outlook.com>
On Thu, Dec 08, 2016 at 06:38:11AM +0000, Bart Van Assche wrote:
> On 12/07/16 21:54, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 05:21:47AM +0000, Bart Van Assche wrote:
> >> Additionally, there are notable exceptions to the rule that most drivers
> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
> >> would remain possible to check such drivers with sparse without enabling
> >> endianness checks. Have you considered to change #ifdef __CHECK_ENDIAN__
> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
> >
> > The right thing is probably just to fix these, isn't it?
> > Until then, why not just ignore the warnings?
>
> Neither option is realistic. With endian-checking enabled the qla2xxx
> driver triggers so many warnings that it becomes a real challenge to
> filter the non-endian warnings out manually:
>
> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
> $f | &grep -c ': warning:'; done
> 4
> 752
>
> If you think it would be easy to fix the endian warnings triggered by
> the qla2xxx driver, you are welcome to try to fix these.
Please don't let one crappy, obviously broken, driver prevent this good
change from happening which will help ensure that the rest of the kernel
(i.e. 99% of it) can be easily tested and fixed for endian issues.
Sounds like you should just fix the driver up if it has so many
problems, and it annoys you so much that you have to filter out some
warnings to see the others :)
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH v2] crypto: sun4i-ss: support the Security System PRNG
From: PrasannaKumar Muralidharan @ 2016-12-08 11:01 UTC (permalink / raw)
To: Corentin Labbe
Cc: Herbert Xu, davem, maxime.ripard, Chen-Yu Tsai, linux-kernel,
linux-crypto, linux-arm-kernel
In-Reply-To: <20161208092400.GA9743@Red>
>> The hwrng interface was always meant to be an interface for real
>> hardware random number generators. People rely on that so we
>> should not provide bogus entropy sources through this interface.
>>
>
> Why not adding a KCONFIG HW_RANDOM_ACCEPT_ALSO_PRNG with big warning ?
> Or a HW_PRNG Kconfig which do the same than hwrandom with /dev/prng ?
> With that it will be much easier to convert in-tree PRNG that you want to remove.
I do have driver for a PRNG that I was planning to post in sometime.
Upon seeing this thread I think the code has to be changed.
I completely agree with Corentin, adding /dev/prng or /dev/hw_prng
will make it easier to move existing code. It can be made explicit
that using new device will provide only pseudo random number.
Thanks,
PrasannaKumar
^ permalink raw reply
* Re: crypto regression?
From: Herbert Xu @ 2016-12-08 10:44 UTC (permalink / raw)
To: Cyrille Pitchen
Cc: linux-crypto, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, nicolas.ferre
In-Reply-To: <10260816-d835-746e-d69e-da4cd7b6b88e@atmel.com>
On Thu, Dec 08, 2016 at 11:41:55AM +0100, Cyrille Pitchen wrote:
> Hi Herbert,
>
> Let me report a potential regression I've noticed this morning when testing
> linux-next.
>
> I've set CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n when compiling both kernel
> images.
>
>
> On 4.9.0-rc2-next-20161028, /proc/crypto displays:
> driver : atmel-xts-aes
Thanks for the report. This should be fixed once I push this out
https://patchwork.kernel.org/patch/9465979/
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* crypto regression?
From: Cyrille Pitchen @ 2016-12-08 10:41 UTC (permalink / raw)
To: Herbert Xu
Cc: linux-crypto, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, nicolas.ferre
Hi Herbert,
Let me report a potential regression I've noticed this morning when testing
linux-next.
I've set CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n when compiling both kernel
images.
On 4.9.0-rc2-next-20161028, /proc/crypto displays:
driver : atmel-xts-aes
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ablkcipher
async : yes
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
geniv : <default>
and no output from the test manager in the boot log.
Whereas on 4.9.0-rc8-next-20161208, we get:
driver : atmel-xts-aes
module : kernel
priority : 300
refcnt : 1
selftest : unknown
internal : no
type : ablkcipher
async : yes
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
geniv : <default>
Also I see the following traces during the boot:
alg: skcipher: Chunk test 1 failed on encryption at page 0 for atmel-xts-aes
00000000: 1c 3b 3a 10 2f 77 03 86 e4 83 6c 99 e3 70 cf 9b
00000010: ea 00 80 3f 5e 48 23 57 a4 ae 12 d4 14 a3 e6 3b
00000020: 5d 31 e2 76 f8 fe 4a 8d 66 b3 17 f9 ac 68 3f 44
00000030: 68 0a 86 ac 35 ad fc 33 45 be fe cb 4b b1 88 fd
00000040: 57 76 92 6c 49 a3 09 5e b1 08 fd 10 98 ba ec 70
00000050: aa a6 69 99 a7 2a 82 f2 7d 84 8b 21 d4 a7 41 b0
00000060: c5 cd 4d 5f ff 9d ac 89 ae ba 12 29 61 d0 3a 75
00000070: 71 23 e9 87 0f 8a cf 10 00 02 08 87 89 14 29 ca
00000080: 2a 3e 7a 7d 7d f7 b1 03 55 16 5c 8b 9a 6d 0a 7d
00000090: e8 b0 62 c4 50 0d c4 cd 12 0c 0f 74 18 da e3 d0
000000a0: b5 78 1c 34 80 3f a7 54 21 c7 90 df e1 de 18 34
000000b0: f2 80 d7 66 7b 32 7f 6c 8c d7 55 7e 12 ac 3a 0f
000000c0: 93 ec 05 c5 2e 04 93 ef 31 a1 2d 3d 92 60 f7 9a
000000d0: 28 9d 6a 37 9b c7 0c 50 84 14 73 d1 a8 cc 81 ec
000000e0: 58 3e 96 45 e0 7b 8d 96 70 65 5b a5 bb cf ec c6
000000f0: dc 39 66 38 0a d8 fe cb 17 b6 ba 02 46 9a 02 0a
00000100: 84 e1 8e 8f 84 25 20 70 c1 3e 9f 1f 28 9b e5 4f
00000110: bc 48 14 57 77 8f 61 60 15 e1 32 7a 02 b1 40 f1
00000120: 50 5e b3 09 32 6d 68 37 8f 83 74 59 5c 84 9d 84
00000130: f4 c3 33 ec 44 23 88 51 43 cb 47 bd 71 c5 ed ae
00000140: 9b e6 9a 2f fe ce b1 be c9 de 24 4f be 15 99 2b
00000150: 11 b7 7c 04 0f 12 bd 8f 6a 97 5a 44 a0 f9 0c 29
00000160: a9 ab c3 d4 d8 93 92 72 84 c5 87 54 cc e2 94 52
00000170: 9f 86 14 dc d2 ab a9 91 92 5f ed c4 ae 74 ff ac
00000180: 6e 33 3b 93 eb 4a ff 04 79 da 9a 41 0e 44 50 e0
00000190: dd 7a e4 c6 e2 91 09 00 57 5d a4 01 fc 07 05 9f
000001a0: 64 5e 8b 7e 9b fd ef 33 94 30 54 ff 84 01 14 93
000001b0: c2 7b 34 29 ea ed b4 ed 53 76 44 1a 77 ed 43 85
000001c0: 1a d7 7f 16 f5 41 df d2 69 d5 0d 6a 5f 14 fb 0a
000001d0: b5 32 fd 6f 01 77 3d 53 f7 a4 70 83 46 bc 5f c4
000001e0: f3 6f fd a9 fc ea 70 b9 c6 e6 93 e1
The output is a little bit long for test 1, isn't it?
When I look at aes_xts_enc_tv_template[] from crypto/testmgr.h
I see .rlen = 32 .
I didn't bisect to find out exactly since when the regression is there. I
wanted to warn you quickly since we are close to the merge window.
Also if you have already been notified about this issue, please, sorry for
the noise!
Best regards,
Cyrille
^ permalink raw reply
* Re: [PATCH v2] crypto: sun4i-ss: support the Security System PRNG
From: Corentin Labbe @ 2016-12-08 9:24 UTC (permalink / raw)
To: Herbert Xu
Cc: davem, maxime.ripard, wens, linux-kernel, linux-crypto,
linux-arm-kernel
In-Reply-To: <20161208090618.GB22932@gondor.apana.org.au>
On Thu, Dec 08, 2016 at 05:06:18PM +0800, Herbert Xu wrote:
> On Wed, Dec 07, 2016 at 01:51:27PM +0100, Corentin Labbe wrote:
> >
> > So I must expose it as a crypto_rng ?
>
> If it is to be exposed at all then algif_rng would be the best
> place.
>
I have badly said my question.
So I need to use the HW PRNG in a crypto_rng "provider" that could be thereafter used from user space via algif_rng. right ?
> > Could you explain why PRNG must not be used as hw_random ?
>
> The hwrng interface was always meant to be an interface for real
> hardware random number generators. People rely on that so we
> should not provide bogus entropy sources through this interface.
>
Why not adding a KCONFIG HW_RANDOM_ACCEPT_ALSO_PRNG with big warning ?
Or a HW_PRNG Kconfig which do the same than hwrandom with /dev/prng ?
With that it will be much easier to convert in-tree PRNG that you want to remove.
Regards
Corentin Labbe
^ permalink raw reply
* Re: [PATCH v1 2/2] crypto: mediatek - add DT bindings documentation
From: Ryder Lee @ 2016-12-08 9:19 UTC (permalink / raw)
To: Matthias Brugger
Cc: Herbert Xu, David S. Miller, devicetree, linux-mediatek,
linux-kernel, linux-crypto, linux-arm-kernel, Sean Wang, Roy Luo
In-Reply-To: <4b0df417-825f-65c6-4c02-85fe1c20ce50@gmail.com>
Hello,
On Mon, 2016-12-05 at 11:18 +0100, Matthias Brugger wrote:
>
> On 05/12/16 08:01, Ryder Lee wrote:
> > Add DT bindings documentation for the crypto driver
> >
> > Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
> > ---
> > .../devicetree/bindings/crypto/mediatek-crypto.txt | 32 ++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
> >
> > diff --git a/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
> > new file mode 100644
> > index 0000000..8b1db08
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
> > @@ -0,0 +1,32 @@
> > +MediaTek cryptographic accelerators
> > +
> > +Required properties:
> > +- compatible: Should be "mediatek,mt7623-crypto"
>
> Do you know how big the difference is between the crypto engine for
> mt7623/mt2701/mt8521p in comparison, let's say mt8173 or mt6797?
> Do this SoCs have a crypot engine? If so and they are quite similar, we
> might think of adding a mtk-crypto binding and add soc specific bindings.
This engine is just available in mt7623/mt2701/mt8521p series SoCs and
they have no difference.
But there are still other crypto IPs in MTK, i think maybe we could use
"mediatek,{IP name}-crypto” to distinguish them ?
> Regards,
> Matthias
>
> > +- reg: Address and length of the register set for the device
> > +- interrupts: Should contain the five crypto engines interrupts in numeric
> > + order. These are global system and four descriptor rings.
> > +- clocks: the clock used by the core
> > +- clock-names: the names of the clock listed in the clocks property. These are
> > + "ethif", "cryp"
> > +- power-domains: Must contain a reference to the PM domain.
> > +
> > +
> > +Optional properties:
> > +- interrupt-parent: Should be the phandle for the interrupt controller
> > + that services interrupts for this device
> > +
> > +
> > +Example:
> > + crypto: crypto@1b240000 {
> > + compatible = "mediatek,mt7623-crypto";
> > + reg = <0 0x1b240000 0 0x20000>;
> > + interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_LOW>,
> > + <GIC_SPI 83 IRQ_TYPE_LEVEL_LOW>,
> > + <GIC_SPI 84 IRQ_TYPE_LEVEL_LOW>,
> > + <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>,
> > + <GIC_SPI 97 IRQ_TYPE_LEVEL_LOW>;
> > + clocks = <&topckgen CLK_TOP_ETHIF_SEL>,
> > + <ðsys CLK_ETHSYS_CRYPTO>;
> > + clock-names = "ethif","cryp";
> > + power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
> > + };
> >
^ permalink raw reply
* Re: [PATCH v2] crypto: sun4i-ss: support the Security System PRNG
From: Herbert Xu @ 2016-12-08 9:06 UTC (permalink / raw)
To: Corentin Labbe
Cc: davem, maxime.ripard, wens, linux-kernel, linux-crypto,
linux-arm-kernel
In-Reply-To: <20161207125127.GB28218@Red>
On Wed, Dec 07, 2016 at 01:51:27PM +0100, Corentin Labbe wrote:
>
> So I must expose it as a crypto_rng ?
If it is to be exposed at all then algif_rng would be the best
place.
> Could you explain why PRNG must not be used as hw_random ?
The hwrng interface was always meant to be an interface for real
hardware random number generators. People rely on that so we
should not provide bogus entropy sources through this interface.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH v1 1/2] Add crypto driver support for some MediaTek chips
From: Ryder Lee @ 2016-12-08 9:05 UTC (permalink / raw)
To: Corentin Labbe
Cc: Herbert Xu, David S. Miller, Matthias Brugger,
devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-crypto-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Sean Wang,
Roy Luo
In-Reply-To: <20161205085220.GA333@Red>
Hello,
On Mon, 2016-12-05 at 09:52 +0100, Corentin Labbe wrote:
> Hello
>
> I have two minor comment.
>
> On Mon, Dec 05, 2016 at 03:01:23PM +0800, Ryder Lee wrote:
> > This adds support for the MediaTek hardware accelerator on
> > mt7623/mt2701/mt8521p SoC.
> >
> > This driver currently implement:
> > - SHA1 and SHA2 family(HMAC) hash alogrithms.
>
> There is a typo for algorithms.
>
> [...]
> > +/**
> > + * struct mtk_desc - DMA descriptor
> > + * @hdr: the descriptor control header
> > + * @buf: DMA address of input buffer segment
> > + * @ct: DMA address of command token that control operation flow
> > + * @ct_hdr: the command token control header
> > + * @tag: the user-defined field
> > + * @tfm: DMA address of transform state
> > + * @bound: align descriptors offset boundary
> > + *
> > + * Structure passed to the crypto engine to describe where source
> > + * data needs to be fetched and how it needs to be processed.
> > + */
> > +struct mtk_desc {
> > + u32 hdr;
> > + u32 buf;
> > + u32 ct;
> > + u32 ct_hdr;
> > + u32 tag;
> > + u32 tfm;
> > + u32 bound[2];
> > +};
>
> Do you have tested this descriptor with BE/LE kernel ?
I did not test it with BE kernel, because both CPU and accelerator in
our SoC just run on LE system.
Thanks for reminding me, i will use byteorder conversion macros and type
identifiers.
> Regards
> Corentin Labbe
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox