Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: Re: [PATCH v2 1/2] virtio: introduce little edian functions for virtio_cread/write# family
From: Michael S. Tsirkin @ 2016-11-28  2:34 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org, Luonengjun, stefanha@redhat.com,
	Huangweidong (C), Wubin (H), xin.zeng@intel.com, Claudio Fontana,
	herbert@gondor.apana.org.au, pasic@linux.vnet.ibm.com,
	davem@davemloft.net,
	"Zhoujian (jay, Euler)" <jianjay.zhou@
In-Reply-To: <33183CC9F5247A488A2544077AF19020D97A103C@SZXEMA503-MBS.china.huawei.com>

On Mon, Nov 28, 2016 at 01:56:04AM +0000, Gonglei (Arei) wrote:
> Hi Michael,
> 
> Thanks for your feedback firstly!
> 
> > -----Original Message-----
> > From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> > On Behalf Of Michael S. Tsirkin
> > Sent: Sunday, November 27, 2016 11:33 AM
> > To: Gonglei (Arei)
> > Subject: [virtio-dev] Re: [PATCH v2 1/2] virtio: introduce little edian functions for
> > virtio_cread/write# family
> > 
> > On Tue, Nov 22, 2016 at 04:10:22PM +0800, Gonglei wrote:
> > > Virtio modern devices are always little edian, let's introduce
> > > the LE functions for read/write configuration space for
> > > virtio modern devices, which avoid complaint by Sparse when
> > > we use the virtio_creaed/virtio_cwrite in VIRTIO_1 devices.
> > >
> > > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > > ---
> > >  include/linux/virtio_config.h | 45
> > +++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 45 insertions(+)
> > >
> > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > index 26c155b..de05707 100644
> > > --- a/include/linux/virtio_config.h
> > > +++ b/include/linux/virtio_config.h
> > > @@ -414,4 +414,49 @@ static inline void virtio_cwrite64(struct virtio_device
> > *vdev,
> > >  		_r;							\
> > >  	})
> > >
> > > +static inline __le16 virtio_cread16_le(struct virtio_device *vdev,
> > > +				 unsigned int offset)
> > > +{
> > > +	__le16 ret;
> > > +
> > > +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> > > +	return ret;
> > > +}
> > > +
> > > +static inline void virtio_cwrite16_le(struct virtio_device *vdev,
> > > +				   unsigned int offset, __le16 val)
> > > +{
> > > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > > +}
> > > +
> > > +static inline __le32 virtio_cread32_le(struct virtio_device *vdev,
> > > +				 unsigned int offset)
> > > +{
> > > +	__le32 ret;
> > > +
> > > +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> > > +	return ret;
> > > +}
> > > +
> > > +static inline void virtio_cwrite32_le(struct virtio_device *vdev,
> > > +				   unsigned int offset, __le32 val)
> > > +{
> > > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > > +}
> > > +
> > > +static inline __le64 virtio_cread64_le(struct virtio_device *vdev,
> > > +				 unsigned int offset)
> > > +{
> > > +	__le64 ret;
> > > +
> > > +	__virtio_cread_many(vdev, offset, &ret, 1, sizeof(ret));
> > > +	return ret;
> > > +}
> > > +
> > > +static inline void virtio_cwrite64_le(struct virtio_device *vdev,
> > > +				   unsigned int offset, __le64 val)
> > > +{
> > > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > > +}
> > > +
> > >  #endif /* _LINUX_VIRTIO_CONFIG_H */
> > 
> > Could you please better explain what is the issue you are facing?
> > virtio_cwrite/virtio_cread all accept and return native types.
> > 
> virtio_cwrite/virtio_cread are used to write/read configuration
> space for virtio devices. For virtio-crypto device, I used __le32/64 directly
> in struct virtio_crypto_config. The sparse reports warnings if I use virtio_cread()
> for virtio-crypto device.

I suspect that's because you are doing cread into an le32 variable.



> Furthermore, it means the warnings exist for all VIRTIO_1 devices because
> they are definitely LE, which it's not necessary to use virtio_to_cpu/cpu_to_virtio.
> 
> 
> PS: I googled a discussion about this topic for virtio-input device, pls see:
>  http://linux.kernel.narkive.com/3argfbWz/patch-1-1-add-virtio-input-driver
> 
> Regards,
> -Gonglei

Looks like we changed the macros since - at least ATM
virtio_console_config uses __u16 fields (which is probably a bug -
I'll look into fixing it up) and they
do not seem to trigger warnings.


> > If you want it in LE format, swap it!
> > 
> > 
> > 
> > > --
> > > 1.8.3.1
> > >
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply

* RE: [virtio-dev] Re: [PATCH v2 1/2] virtio: introduce little edian functions for virtio_cread/write# family
From: Gonglei (Arei) @ 2016-11-28  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org, Luonengjun, stefanha@redhat.com,
	Huangweidong (C), Wubin (H), xin.zeng@intel.com, Claudio Fontana,
	herbert@gondor.apana.org.au, pasic@linux.vnet.ibm.com,
	davem@davemloft.net, "Zhoujian (jay, Euler)" <jian
In-Reply-To: <20161127052802-mutt-send-email-mst@kernel.org>

Hi Michael,

Thanks for your feedback firstly!

> -----Original Message-----
> From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Sunday, November 27, 2016 11:33 AM
> To: Gonglei (Arei)
> Subject: [virtio-dev] Re: [PATCH v2 1/2] virtio: introduce little edian functions for
> virtio_cread/write# family
> 
> On Tue, Nov 22, 2016 at 04:10:22PM +0800, Gonglei wrote:
> > Virtio modern devices are always little edian, let's introduce
> > the LE functions for read/write configuration space for
> > virtio modern devices, which avoid complaint by Sparse when
> > we use the virtio_creaed/virtio_cwrite in VIRTIO_1 devices.
> >
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > ---
> >  include/linux/virtio_config.h | 45
> +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> >
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index 26c155b..de05707 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -414,4 +414,49 @@ static inline void virtio_cwrite64(struct virtio_device
> *vdev,
> >  		_r;							\
> >  	})
> >
> > +static inline __le16 virtio_cread16_le(struct virtio_device *vdev,
> > +				 unsigned int offset)
> > +{
> > +	__le16 ret;
> > +
> > +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> > +	return ret;
> > +}
> > +
> > +static inline void virtio_cwrite16_le(struct virtio_device *vdev,
> > +				   unsigned int offset, __le16 val)
> > +{
> > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > +}
> > +
> > +static inline __le32 virtio_cread32_le(struct virtio_device *vdev,
> > +				 unsigned int offset)
> > +{
> > +	__le32 ret;
> > +
> > +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> > +	return ret;
> > +}
> > +
> > +static inline void virtio_cwrite32_le(struct virtio_device *vdev,
> > +				   unsigned int offset, __le32 val)
> > +{
> > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > +}
> > +
> > +static inline __le64 virtio_cread64_le(struct virtio_device *vdev,
> > +				 unsigned int offset)
> > +{
> > +	__le64 ret;
> > +
> > +	__virtio_cread_many(vdev, offset, &ret, 1, sizeof(ret));
> > +	return ret;
> > +}
> > +
> > +static inline void virtio_cwrite64_le(struct virtio_device *vdev,
> > +				   unsigned int offset, __le64 val)
> > +{
> > +	vdev->config->set(vdev, offset, &val, sizeof(val));
> > +}
> > +
> >  #endif /* _LINUX_VIRTIO_CONFIG_H */
> 
> Could you please better explain what is the issue you are facing?
> virtio_cwrite/virtio_cread all accept and return native types.
> 
virtio_cwrite/virtio_cread are used to write/read configuration
space for virtio devices. For virtio-crypto device, I used __le32/64 directly
in struct virtio_crypto_config. The sparse reports warnings if I use virtio_cread()
for virtio-crypto device.

Furthermore, it means the warnings exist for all VIRTIO_1 devices because
they are definitely LE, which it's not necessary to use virtio_to_cpu/cpu_to_virtio.


PS: I googled a discussion about this topic for virtio-input device, pls see:
 http://linux.kernel.narkive.com/3argfbWz/patch-1-1-add-virtio-input-driver

Regards,
-Gonglei

> If you want it in LE format, swap it!
> 
> 
> 
> > --
> > 1.8.3.1
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply

* Re: [PATCH v2 1/2] virtio: introduce little edian functions for virtio_cread/write# family
From: Michael S. Tsirkin @ 2016-11-27  3:32 UTC (permalink / raw)
  To: Gonglei
  Cc: virtio-dev, weidong.huang, claudio.fontana, hanweidong,
	qemu-devel, luonengjun, linux-kernel, virtualization,
	salvatore.benedetto, xuquan8, linux-crypto, stefanha,
	jianjay.zhou, longpeng2, arei.gonglei, davem, wu.wubin, herbert
In-Reply-To: <1479802223-121104-2-git-send-email-arei.gonglei@huawei.com>

On Tue, Nov 22, 2016 at 04:10:22PM +0800, Gonglei wrote:
> Virtio modern devices are always little edian, let's introduce
> the LE functions for read/write configuration space for
> virtio modern devices, which avoid complaint by Sparse when
> we use the virtio_creaed/virtio_cwrite in VIRTIO_1 devices.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  include/linux/virtio_config.h | 45 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 26c155b..de05707 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -414,4 +414,49 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
>  		_r;							\
>  	})
>  
> +static inline __le16 virtio_cread16_le(struct virtio_device *vdev,
> +				 unsigned int offset)
> +{
> +	__le16 ret;
> +
> +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> +	return ret;
> +}
> +
> +static inline void virtio_cwrite16_le(struct virtio_device *vdev,
> +				   unsigned int offset, __le16 val)
> +{
> +	vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
> +static inline __le32 virtio_cread32_le(struct virtio_device *vdev,
> +				 unsigned int offset)
> +{
> +	__le32 ret;
> +
> +	vdev->config->get(vdev, offset, &ret, sizeof(ret));
> +	return ret;
> +}
> +
> +static inline void virtio_cwrite32_le(struct virtio_device *vdev,
> +				   unsigned int offset, __le32 val)
> +{
> +	vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
> +static inline __le64 virtio_cread64_le(struct virtio_device *vdev,
> +				 unsigned int offset)
> +{
> +	__le64 ret;
> +
> +	__virtio_cread_many(vdev, offset, &ret, 1, sizeof(ret));
> +	return ret;
> +}
> +
> +static inline void virtio_cwrite64_le(struct virtio_device *vdev,
> +				   unsigned int offset, __le64 val)
> +{
> +	vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
>  #endif /* _LINUX_VIRTIO_CONFIG_H */

Could you please better explain what is the issue you are facing?
virtio_cwrite/virtio_cread all accept and return native types.

If you want it in LE format, swap it!



> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH v2 2/2] crypto: add virtio-crypto driver
From: Michael S. Tsirkin @ 2016-11-27  3:27 UTC (permalink / raw)
  To: Gonglei
  Cc: linux-kernel, qemu-devel, virtio-dev, virtualization,
	linux-crypto, luonengjun, stefanha, weidong.huang, wu.wubin,
	xin.zeng, claudio.fontana, herbert, pasic, davem, jianjay.zhou,
	hanweidong, arei.gonglei, cornelia.huck, xuquan8, longpeng2,
	salvatore.benedetto
In-Reply-To: <1479802223-121104-3-git-send-email-arei.gonglei@huawei.com>

On Tue, Nov 22, 2016 at 04:10:23PM +0800, Gonglei wrote:
> This patch introduces virtio-crypto driver for Linux Kernel.
> 
> The virtio crypto device is a virtual cryptography device
> as well as a kind of virtual hardware accelerator for
> virtual machines. The encryption anddecryption requests
> are placed in the data queue and are ultimately handled by
> thebackend crypto accelerators. The second queue is the
> control queue used to create or destroy sessions for
> symmetric algorithms and will control some advanced features
> in the future. The virtio crypto device provides the following
> cryptoservices: CIPHER, MAC, HASH, and AEAD.
> 
> For more information about virtio-crypto device, please see:
>   http://qemu-project.org/Features/VirtioCrypto
> 
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Cornelia Huck <cornelia.huck@de.ibm.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Herbert Xu <herbert@gondor.apana.org.au>
> CC: Halil Pasic <pasic@linux.vnet.ibm.com>
> CC: David S. Miller <davem@davemloft.net>
> CC: Zeng Xin <xin.zeng@intel.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  MAINTAINERS                                  |   8 +
>  drivers/crypto/Kconfig                       |   2 +
>  drivers/crypto/Makefile                      |   1 +
>  drivers/crypto/virtio/Kconfig                |  10 +
>  drivers/crypto/virtio/Makefile               |   5 +
>  drivers/crypto/virtio/virtio_crypto.c        | 444 +++++++++++++++++++++++
>  drivers/crypto/virtio/virtio_crypto_algs.c   | 524 +++++++++++++++++++++++++++
>  drivers/crypto/virtio/virtio_crypto_common.h | 124 +++++++
>  drivers/crypto/virtio/virtio_crypto_mgr.c    | 258 +++++++++++++
>  include/uapi/linux/Kbuild                    |   1 +
>  include/uapi/linux/virtio_crypto.h           | 435 ++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h              |   1 +
>  12 files changed, 1813 insertions(+)
>  create mode 100644 drivers/crypto/virtio/Kconfig
>  create mode 100644 drivers/crypto/virtio/Makefile
>  create mode 100644 drivers/crypto/virtio/virtio_crypto.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
>  create mode 100644 include/uapi/linux/virtio_crypto.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 411e3b8..d6b18bb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12844,6 +12844,14 @@ S:	Maintained
>  F:	drivers/virtio/virtio_input.c
>  F:	include/uapi/linux/virtio_input.h
>  
> +VIRTIO CRYPTO DRIVER
> +M:  Gonglei <arei.gonglei@huawei.com>
> +L:  virtualization@lists.linux-foundation.org
> +L:  linux-crypto@vger.kernel.org
> +S:  Maintained
> +F:  drivers/crypto/virtio/
> +F:  include/uapi/linux/virtio_crypto.h
> +
>  VIA RHINE NETWORK DRIVER
>  S:	Orphan
>  F:	drivers/net/ethernet/via/via-rhine.c
> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> index 4d2b81f..7956478 100644
> --- a/drivers/crypto/Kconfig
> +++ b/drivers/crypto/Kconfig
> @@ -555,4 +555,6 @@ config CRYPTO_DEV_ROCKCHIP
>  
>  source "drivers/crypto/chelsio/Kconfig"
>  
> +source "drivers/crypto/virtio/Kconfig"
> +
>  endif # CRYPTO_HW
> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
> index ad7250f..bc53cb8 100644
> --- a/drivers/crypto/Makefile
> +++ b/drivers/crypto/Makefile
> @@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
>  obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
>  obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
>  obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio/
> diff --git a/drivers/crypto/virtio/Kconfig b/drivers/crypto/virtio/Kconfig
> new file mode 100644
> index 0000000..ceae88c
> --- /dev/null
> +++ b/drivers/crypto/virtio/Kconfig
> @@ -0,0 +1,10 @@
> +config CRYPTO_DEV_VIRTIO
> +	tristate "VirtIO crypto driver"
> +	depends on VIRTIO
> +    select CRYPTO_AEAD
> +    select CRYPTO_AUTHENC
> +    select CRYPTO_BLKCIPHER
> +	default m
> +	help
> +	  This driver provides support for virtio crypto device. If you
> +	  choose 'M' here, this module will be called virtio-crypto.
> diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
> new file mode 100644
> index 0000000..a316e92
> --- /dev/null
> +++ b/drivers/crypto/virtio/Makefile
> @@ -0,0 +1,5 @@
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio-crypto.o
> +virtio-crypto-objs := \
> +	virtio_crypto_algs.o \
> +	virtio_crypto_mgr.o \
> +	virtio_crypto.o
> diff --git a/drivers/crypto/virtio/virtio_crypto.c b/drivers/crypto/virtio/virtio_crypto.c
> new file mode 100644
> index 0000000..56fdfed
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto.c
> @@ -0,0 +1,444 @@
> + /* Driver for Virtio crypto device.
> +  *
> +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +  *
> +  * This program is free software; you can redistribute it and/or modify
> +  * it under the terms of the GNU General Public License as published by
> +  * the Free Software Foundation; either version 2 of the License, or
> +  * (at your option) any later version.
> +  *
> +  * This program is distributed in the hope that it will be useful,
> +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
> +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +  * GNU General Public License for more details.
> +  *
> +  * You should have received a copy of the GNU General Public License
> +  * along with this program; if not, see <http://www.gnu.org/licenses/>.
> +  */
> +
> +#include <linux/err.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <linux/cpu.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +
> +static void virtcrypto_dataq_callback(struct virtqueue *vq)
> +{
> +	struct virtio_crypto *vcrypto = vq->vdev->priv;
> +	struct virtio_crypto_request *vc_req;
> +	unsigned long flags;
> +	unsigned int len;
> +	struct ablkcipher_request *ablk_req;
> +	int error;
> +
> +	spin_lock_irqsave(&vcrypto->lock, flags);
> +	do {
> +		virtqueue_disable_cb(vq);
> +		while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) {
> +			if (vc_req->type == VIRTIO_CRYPTO_SYM_OP_CIPHER) {
> +				switch (vc_req->status) {
> +				case VIRTIO_CRYPTO_OK:
> +					error = 0;
> +					break;
> +				case VIRTIO_CRYPTO_INVSESS:
> +				case VIRTIO_CRYPTO_ERR:
> +					error = -EINVAL;
> +					break;
> +				case VIRTIO_CRYPTO_BADMSG:
> +					error = -EBADMSG;
> +					break;
> +				default:
> +					error = -EIO;
> +					break;
> +				}
> +				ablk_req = vc_req->ablkcipher_req;
> +				/* Finish the encrypt or decrypt process */
> +				ablk_req->base.complete(&ablk_req->base, error);
> +			}
> +
> +			kfree(vc_req->req_data);
> +			kfree(vc_req->sgs);
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +	spin_unlock_irqrestore(&vcrypto->lock, flags);
> +}
> +
> +static int virtcrypto_find_vqs(struct virtio_crypto *vi)
> +{
> +	vq_callback_t **callbacks;
> +	struct virtqueue **vqs;
> +	int ret = -ENOMEM;
> +	int i, total_vqs;
> +	const char **names;
> +
> +	/* We expect 1 data virtqueue, followed by
> +	 * possible N-1 data queues used in multiqueue mode, followed by
> +	 * control vq.
> +	 */
> +	total_vqs = vi->max_data_queues + 1;
> +
> +	/* Allocate space for find_vqs parameters */
> +	vqs = kcalloc(total_vqs, sizeof(*vqs), GFP_KERNEL);
> +	if (!vqs)
> +		goto err_vq;
> +	callbacks = kcalloc(total_vqs, sizeof(*callbacks), GFP_KERNEL);
> +	if (!callbacks)
> +		goto err_callback;
> +	names = kcalloc(total_vqs, sizeof(*names), GFP_KERNEL);
> +	if (!names)
> +		goto err_names;
> +
> +	/* Parameters for control virtqueue */
> +	callbacks[total_vqs - 1] = NULL;
> +	names[total_vqs - 1] = "controlq";
> +
> +	/* Allocate/initialize parameters for data virtqueues */
> +	for (i = 0; i < vi->max_data_queues; i++) {
> +		callbacks[i] = virtcrypto_dataq_callback;
> +		snprintf(vi->data_vq[i].name, sizeof(vi->data_vq[i].name),
> +				"dataq.%d", i);
> +		names[i] = vi->data_vq[i].name;
> +	}
> +
> +	ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> +					 names);
> +	if (ret)
> +		goto err_find;
> +
> +	vi->ctrl_vq = vqs[total_vqs - 1];
> +
> +	for (i = 0; i < vi->max_data_queues; i++)
> +		vi->data_vq[i].vq = vqs[i];
> +
> +	kfree(names);
> +	kfree(callbacks);
> +	kfree(vqs);
> +
> +	return 0;
> +
> +err_find:
> +	kfree(names);
> +err_names:
> +	kfree(callbacks);
> +err_callback:
> +	kfree(vqs);
> +err_vq:
> +	return ret;
> +}
> +
> +static int virtcrypto_alloc_queues(struct virtio_crypto *vi)
> +{
> +	vi->data_vq = kcalloc(vi->max_data_queues, sizeof(*vi->data_vq),
> +				GFP_KERNEL);
> +	if (!vi->data_vq)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void virtcrypto_clean_affinity(struct virtio_crypto *vi, long hcpu)
> +{
> +	int i;
> +
> +	if (vi->affinity_hint_set) {
> +		for (i = 0; i < vi->max_data_queues; i++)
> +			virtqueue_set_affinity(vi->data_vq[i].vq, -1);
> +
> +		vi->affinity_hint_set = false;
> +	}
> +}
> +
> +static void virtcrypto_set_affinity(struct virtio_crypto *vi)
> +{
> +	int i;
> +	int cpu;
> +
> +	/*
> +	 * In multiqueue mode, when the number of cpu is equal to the number
> +	 * of queue, we let the queue to be private to one cpu by
> +	 * setting the affinity hint to eliminate the contention.
> +	 */
> +	if (vi->curr_queue == 1 ||
> +	    vi->max_data_queues != num_online_cpus()) {
> +		virtcrypto_clean_affinity(vi, -1);
> +		return;
> +	}
> +

So how can you handle cpu hotplug then?
I would imagine making max_data_queues > num_online_cpus
and initializing just the required number would be
a saner way.

> +	i = 0;
> +	for_each_online_cpu(cpu) {
> +		virtqueue_set_affinity(vi->data_vq[i].vq, cpu);
> +		i++;
> +	}
> +
> +	vi->affinity_hint_set = true;
> +}
> +

Above is racy in prsence of cpu hotplug. Is there a userspace
interface to fix up affinity after hotplug?
If not you need to get notified after hotplug and
fix it up yourself.

> +static void virtcrypto_free_queues(struct virtio_crypto *vi)
> +{
> +	kfree(vi->data_vq);
> +}
> +
> +static int virtcrypto_init_vqs(struct virtio_crypto *vi)
> +{
> +	int ret;
> +
> +	/* Allocate send & receive queues */
> +	ret = virtcrypto_alloc_queues(vi);
> +	if (ret)
> +		goto err;
> +
> +	ret = virtcrypto_find_vqs(vi);
> +	if (ret)
> +		goto err_free;
> +
> +	get_online_cpus();
> +	virtcrypto_set_affinity(vi);
> +	put_online_cpus();
> +
> +	return 0;
> +
> +err_free:
> +	virtcrypto_free_queues(vi);
> +err:
> +	return ret;
> +}
> +
> +static int virtcrypto_update_status(struct virtio_crypto *vcrypto)
> +{
> +	__le32 status_le;
> +	u32 status;
> +	int err;
> +
> +	status_le = virtio_cread32_le(vcrypto->vdev,
> +			offsetof(struct virtio_crypto_config, status));
> +	status = le32_to_cpu(status_le);
> +
> +	/* Ignore unknown (future) status bits */
> +	status &= VIRTIO_CRYPTO_S_HW_READY;
> +
> +	if (vcrypto->status == status)
> +		return 0;
> +
> +	vcrypto->status = status;
> +
> +	if (vcrypto->status & VIRTIO_CRYPTO_S_HW_READY) {
> +		err = virtcrypto_dev_start(vcrypto);
> +		if (err) {
> +			dev_err(&vcrypto->vdev->dev,
> +				"Failed to start virtio crypto device.\n");
> +			virtcrypto_dev_stop(vcrypto);
> +			return -1;
> +		}
> +		dev_info(&vcrypto->vdev->dev, "Accelerator is ready\n");
> +	} else {
> +		virtcrypto_dev_stop(vcrypto);
> +		dev_info(&vcrypto->vdev->dev, "Accelerator is not ready\n");
> +	}
> +
> +	return 0;
> +}
> +
> +static void virtcrypto_del_vqs(struct virtio_crypto *vcrypto)
> +{
> +	struct virtio_device *vdev = vcrypto->vdev;
> +
> +	virtcrypto_clean_affinity(vcrypto, -1);
> +
> +	vdev->config->del_vqs(vdev);
> +
> +	virtcrypto_free_queues(vcrypto);
> +}
> +
> +static int virtcrypto_probe(struct virtio_device *vdev)
> +{
> +	int err = -EFAULT;
> +	struct virtio_crypto *vcrypto;
> +	__le32 max_data_queues_le = 0, max_cipher_key_len_le = 0;
> +	__le32 max_auth_key_len_le = 0;
> +	__le64 max_size_le = 0;
> +

Pls validate that it's a modern device (has VERSION_1 set).
We should find a way to do it in a central place,
but we don't have that now.

> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	if (num_possible_nodes() > 1 && dev_to_node(&vdev->dev) < 0) {
> +		/*
> +		 * If the accelerator is connected to a node with no memory
> +		 * there is no point in using the accelerator since the remote
> +		 * memory transaction will be very slow.
> +		 */
> +		dev_err(&vdev->dev, "Invalid NUMA configuration.\n");
> +		return -EINVAL;
> +	}
> +
> +	vcrypto = kzalloc_node(sizeof(*vcrypto), GFP_KERNEL,
> +					dev_to_node(&vdev->dev));
> +	if (!vcrypto)
> +		return -ENOMEM;
> +
> +	max_data_queues_le = virtio_cread32_le(vdev,
> +			offsetof(struct virtio_crypto_config, max_dataqueues));
> +	vcrypto->max_data_queues = le32_to_cpu(max_data_queues_le);
> +	if (vcrypto->max_data_queues < 1)
> +		vcrypto->max_data_queues = 1;
> +
> +	dev_info(&vdev->dev, "max_queues: %u\n", vcrypto->max_data_queues);
> +
> +	max_cipher_key_len_le = virtio_cread32_le(vdev,
> +		offsetof(struct virtio_crypto_config, max_cipher_key_len));
> +	max_auth_key_len_le = virtio_cread32_le(vdev,
> +		offsetof(struct virtio_crypto_config, max_auth_key_len));
> +	max_size_le = virtio_cread64_le(vdev,
> +		offsetof(struct virtio_crypto_config, max_size));
> +
> +	/* Add virtio crypto device to global table */
> +	err = virtcrypto_devmgr_add_dev(vcrypto);
> +	if (err) {
> +		dev_err(&vdev->dev, "Failed to add new virtio crypto device.\n");
> +		goto free;
> +	}
> +	vcrypto->owner = THIS_MODULE;
> +	vcrypto = vdev->priv = vcrypto;
> +	vcrypto->vdev = vdev;
> +	spin_lock_init(&vcrypto->lock);
> +	spin_lock_init(&vcrypto->ctrl_lock);
> +
> +	/* Use sigle data queue as default */
> +	vcrypto->curr_queue = 1;
> +	vcrypto->max_cipher_key_len = le32_to_cpu(max_cipher_key_len_le);
> +	vcrypto->max_auth_key_len = le32_to_cpu(max_auth_key_len_le);
> +	vcrypto->max_size = le64_to_cpu(max_size_le);
> +
> +	err = virtcrypto_init_vqs(vcrypto);
> +	if (err) {
> +		dev_err(&vdev->dev, "Failed to initialize vqs.\n");
> +		goto free_dev;
> +	}
> +	virtio_device_ready(vdev);
> +
> +	err = virtcrypto_update_status(vcrypto);
> +	if (err) {
> +		err = -EFAULT;

Why EFAULT?

> +		goto free_vqs;
> +	}
> +
> +	return 0;
> +
> +free_vqs:
> +	virtcrypto_del_vqs(vcrypto);

as you called virtio_device_ready this is likely unsafe
unless you first reset device or make sure there are
no interrupts in some other way.

> +free_dev:
> +	virtcrypto_devmgr_rm_dev(vcrypto);
> +free:
> +	kfree(vcrypto);
> +	return err;
> +}
> +
> +static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
> +{
> +	struct virtio_crypto_request *vc_req;
> +	int i;
> +	struct virtqueue *vq;
> +
> +	for (i = 0; i < vcrypto->max_data_queues; i++) {
> +		vq = vcrypto->data_vq[i].vq;
> +		while ((vc_req = virtqueue_detach_unused_buf(vq)) != NULL) {
> +			kfree(vc_req->req_data);
> +			kfree(vc_req->sgs);
> +		}
> +	}
> +}
> +
> +static void virtcrypto_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_crypto *vcrypto = vdev->priv;
> +
> +	dev_info(&vdev->dev, "Start virtcrypto_remove.\n");
> +
> +	if (virtcrypto_dev_started(vcrypto))
> +		virtcrypto_dev_stop(vcrypto);
> +	vdev->config->reset(vdev);
> +	virtcrypto_free_unused_reqs(vcrypto);
> +	virtcrypto_del_vqs(vcrypto);
> +	virtcrypto_devmgr_rm_dev(vcrypto);
> +	kfree(vcrypto);
> +}
> +
> +static void virtcrypto_config_changed(struct virtio_device *vdev)
> +{
> +	struct virtio_crypto *vcrypto = vdev->priv;
> +
> +	virtcrypto_update_status(vcrypto);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int virtcrypto_freeze(struct virtio_device *vdev)
> +{
> +	struct virtio_crypto *vcrypto = vdev->priv;
> +	unsigned long flags;
> +
> +	virtcrypto_free_unused_reqs(vcrypto);
> +	spin_lock_irqsave(&vcrypto->lock, flags);
> +	if (virtcrypto_dev_started(vcrypto))
> +		virtcrypto_dev_stop(vcrypto);

This is taking mutex locks under a spinlock.
Highly unlikely to work. Try this with lockdep
enabled.


> +	spin_unlock_irqrestore(&vcrypto->lock, flags);
> +
> +	virtcrypto_del_vqs(vcrypto);
> +	return 0;
> +}
> +
> +static int virtcrypto_restore(struct virtio_device *vdev)
> +{
> +	struct virtio_crypto *vcrypto = vdev->priv;
> +	int err;
> +
> +	err = virtcrypto_init_vqs(vcrypto);
> +	if (err)
> +		return err;
> +
> +	virtio_device_ready(vdev);
> +	err = virtcrypto_dev_start(vcrypto);
> +	if (err) {
> +		dev_err(&vdev->dev, "Failed to start virtio crypto device.\n");
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
> +
> +static unsigned int features[] = {
> +	/* none */
> +};
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_CRYPTO, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_crypto_driver = {
> +	.driver.name         = KBUILD_MODNAME,
> +	.driver.owner        = THIS_MODULE,
> +	.feature_table       = features,
> +	.feature_table_size  = ARRAY_SIZE(features),
> +	.id_table            = id_table,
> +	.probe               = virtcrypto_probe,
> +	.remove              = virtcrypto_remove,
> +	.config_changed = virtcrypto_config_changed,
> +#ifdef CONFIG_PM_SLEEP
> +	.freeze = virtcrypto_freeze,
> +	.restore = virtcrypto_restore,
> +#endif
> +};
> +
> +module_virtio_driver(virtio_crypto_driver);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("virtio crypto device driver");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Gonglei <arei.gonglei@huawei.com>");
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
> new file mode 100644
> index 0000000..7b57584
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -0,0 +1,524 @@
> + /* Algorithms supported by virtio crypto device
> +  *
> +  * Authors: Gonglei <arei.gonglei@huawei.com>
> +  *
> +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +  *
> +  * This program is free software; you can redistribute it and/or modify
> +  * it under the terms of the GNU General Public License as published by
> +  * the Free Software Foundation; either version 2 of the License, or
> +  * (at your option) any later version.
> +  *
> +  * This program is distributed in the hope that it will be useful,
> +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
> +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +  * GNU General Public License for more details.
> +  *
> +  * You should have received a copy of the GNU General Public License
> +  * along with this program; if not, see <http://www.gnu.org/licenses/>.
> +  */
> +
> +#include <linux/scatterlist.h>
> +#include <crypto/algapi.h>
> +#include <linux/err.h>
> +#include <crypto/scatterwalk.h>
> +#include <linux/atomic.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +static DEFINE_MUTEX(algs_lock);
> +static unsigned int virtio_crypto_active_devs;
> +
> +static u64 virtio_crypto_alg_sg_nents_length(struct scatterlist *sg)
> +{
> +	u64 total = 0;
> +
> +	for (total = 0; sg; sg = sg_next(sg))
> +		total += sg->length;
> +
> +	return total;
> +}
> +
> +static int virtio_crypto_alg_validate_key(int key_len, int *alg)
> +{
> +	switch (key_len) {
> +	case AES_KEYSIZE_128:
> +	case AES_KEYSIZE_192:
> +	case AES_KEYSIZE_256:
> +		*alg = VIRTIO_CRYPTO_CIPHER_AES_CBC;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_init_session(
> +		struct virtio_crypto_ablkcipher_ctx *ctx,
> +		int alg, const uint8_t *key,
> +		unsigned int keylen,
> +		int encrypt)
> +{
> +	struct scatterlist outhdr, key_sg, inhdr, *sgs[3];
> +	unsigned int tmp;
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +	int op = encrypt ? VIRTIO_CRYPTO_OP_ENCRYPT : VIRTIO_CRYPTO_OP_DECRYPT;
> +	int err;
> +	unsigned int num_out = 0, num_in = 0;
> +
> +	/*
> +	 * Avoid to do DMA from the stack, switch to using
> +	 * dynamically-allocated for the key
> +	 */
> +	uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);

Is all crypto using GFP_ATOMIC like this?
Is there code that recovers if this fails?


> +
> +	if (!cipher_key)
> +		return -ENOMEM;
> +
> +	memcpy(cipher_key, key, keylen);
> +
> +	spin_lock(&vcrypto->ctrl_lock);
> +	/* Pad ctrl header */
> +	vcrypto->ctrl.header.opcode =
> +		cpu_to_le32(VIRTIO_CRYPTO_CIPHER_CREATE_SESSION);
> +	vcrypto->ctrl.header.algo = cpu_to_le32((uint32_t)alg);

why cast here?

> +	/* Set the default dataqueue id to 0 */
> +	vcrypto->ctrl.header.queue_id = 0;
> +
> +	vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
> +	/* Pad cipher's parameters */
> +	vcrypto->ctrl.u.sym_create_session.op_type =
> +		cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
> +	vcrypto->ctrl.u.sym_create_session.u.cipher.para.algo =
> +		vcrypto->ctrl.header.algo;
> +	vcrypto->ctrl.u.sym_create_session.u.cipher.para.keylen =
> +		cpu_to_le32(keylen);
> +	vcrypto->ctrl.u.sym_create_session.u.cipher.para.op =
> +		cpu_to_le32(op);
> +
> +	sg_init_one(&outhdr, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
> +	sgs[num_out++] = &outhdr;
> +
> +	/* Set key */
> +	sg_init_one(&key_sg, cipher_key, keylen);
> +	sgs[num_out++] = &key_sg;
> +
> +	/* Return status and session id back */
> +	sg_init_one(&inhdr, &vcrypto->input, sizeof(vcrypto->input));
> +	sgs[num_out + num_in++] = &inhdr;
> +
> +	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out,
> +				num_in, vcrypto, GFP_ATOMIC);
> +	if (err < 0) {
> +		spin_unlock(&vcrypto->ctrl_lock);
> +		kfree(cipher_key);
> +		return err;
> +	}
> +	virtqueue_kick(vcrypto->ctrl_vq);
> +
> +	/*
> +	 * Spin for a response, the kick causes an ioport write, trapping
> +	 * into the hypervisor, so the request should be handled immediately.
> +	 */
> +	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &tmp) &&
> +	       !virtqueue_is_broken(vcrypto->ctrl_vq))
> +		cpu_relax();
> +
> +	if (le32_to_cpu(vcrypto->input.status) != VIRTIO_CRYPTO_OK) {
> +		spin_unlock(&vcrypto->ctrl_lock);
> +		pr_err("virtio_crypto: Create session failed status: %u\n",
> +			le32_to_cpu(vcrypto->input.status));
> +		kfree(cipher_key);
> +		return -EINVAL;
> +	}
> +	spin_unlock(&vcrypto->ctrl_lock);
> +
> +	spin_lock(&ctx->lock);
> +	if (encrypt)
> +		ctx->enc_sess_info.session_id =
> +			le64_to_cpu(vcrypto->input.session_id);
> +	else
> +		ctx->dec_sess_info.session_id =
> +			le64_to_cpu(vcrypto->input.session_id);
> +	spin_unlock(&ctx->lock);
> +
> +	kfree(cipher_key);
> +	return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_close_session(
> +		struct virtio_crypto_ablkcipher_ctx *ctx,
> +		int encrypt)
> +{
> +	struct scatterlist outhdr, status_sg, *sgs[2];
> +	unsigned int tmp;
> +	struct virtio_crypto_destroy_session_req *destroy_session;
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +	int err;
> +	unsigned int num_out = 0, num_in = 0;
> +
> +	spin_lock(&vcrypto->ctrl_lock);
> +	vcrypto->ctrl_status.status = VIRTIO_CRYPTO_ERR;
> +	/* Pad ctrl header */
> +	vcrypto->ctrl.header.opcode =
> +		cpu_to_le32(VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION);
> +	/* Set the default virtqueue id to 0 */
> +	vcrypto->ctrl.header.queue_id = 0;
> +
> +	destroy_session = &vcrypto->ctrl.u.destroy_session;
> +
> +	if (encrypt)
> +		destroy_session->session_id =
> +			cpu_to_le64(ctx->enc_sess_info.session_id);
> +	else
> +		destroy_session->session_id =
> +			cpu_to_le64(ctx->dec_sess_info.session_id);
> +
> +	sg_init_one(&outhdr, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
> +	sgs[num_out++] = &outhdr;
> +
> +	/* Return status and session id back */
> +	sg_init_one(&status_sg, &vcrypto->ctrl_status.status,
> +		sizeof(vcrypto->ctrl_status.status));
> +	sgs[num_out + num_in++] = &status_sg;
> +
> +	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out,
> +			num_in, vcrypto, GFP_ATOMIC);
> +	if (err < 0) {
> +		spin_unlock(&vcrypto->ctrl_lock);
> +		return err;
> +	}
> +	virtqueue_kick(vcrypto->ctrl_vq);
> +
> +	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &tmp) &&
> +	       !virtqueue_is_broken(vcrypto->ctrl_vq))
> +		cpu_relax();
> +
> +	if (vcrypto->ctrl_status.status != VIRTIO_CRYPTO_OK) {
> +		spin_unlock(&vcrypto->ctrl_lock);
> +		pr_err("virtio_crypto: Close session failed status: %u, session_id: 0x%llx\n",
> +			vcrypto->ctrl_status.status,
> +			destroy_session->session_id);
> +
> +		return -EINVAL;
> +	}
> +	spin_unlock(&vcrypto->ctrl_lock);
> +
> +	return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_init_sessions(
> +		struct virtio_crypto_ablkcipher_ctx *ctx,
> +		const uint8_t *key, unsigned int keylen)
> +{
> +	int alg;
> +	int ret;
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +
> +	if (keylen > vcrypto->max_cipher_key_len) {
> +		pr_err("virtio_crypto: the key is too long\n");
> +		goto bad_key;
> +	}
> +
> +	if (virtio_crypto_alg_validate_key(keylen, &alg))
> +		goto bad_key;
> +
> +	/* Create encryption session */
> +	ret = virtio_crypto_alg_ablkcipher_init_session(ctx,
> +			alg, key, keylen, 1);
> +	if (ret)
> +		return ret;
> +	/* Create decryption session */
> +	ret = virtio_crypto_alg_ablkcipher_init_session(ctx,
> +			alg, key, keylen, 0);
> +	if (ret) {
> +		virtio_crypto_alg_ablkcipher_close_session(ctx, 1);
> +		return ret;
> +	}
> +	return 0;
> +
> +bad_key:
> +	crypto_tfm_set_flags(ctx->tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
> +	return -EINVAL;
> +}
> +
> +/* Note: kernel crypto API realization */
> +static int virtio_crypto_ablkcipher_setkey(struct crypto_ablkcipher *tfm,
> +					 const uint8_t *key,
> +					 unsigned int keylen)
> +{
> +	struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(tfm);
> +	int ret;
> +
> +	spin_lock(&ctx->lock);
> +
> +	if (!ctx->vcrypto) {
> +		/* New key */
> +		int node = virtio_crypto_get_current_node();
> +		struct virtio_crypto *vcrypto =
> +				      virtcrypto_get_dev_node(node);
> +		if (!vcrypto) {
> +			vcrypto = virtcrypto_devmgr_get_first();
> +			if (!vcrypto) {
> +				pr_err("virtio_crypto: Could not find a virtio device in the system");
> +				spin_unlock(&ctx->lock);
> +				return -ENODEV;
> +			}
> +		}
> +
> +		ctx->vcrypto = vcrypto;
> +	}
> +	spin_unlock(&ctx->lock);
> +
> +	ret = virtio_crypto_alg_ablkcipher_init_sessions(ctx, key, keylen);
> +	if (ret) {
> +		virtcrypto_dev_put(ctx->vcrypto);
> +		ctx->vcrypto = NULL;
> +
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +__virtio_crypto_ablkcipher_do_req(struct virtio_crypto_request *vc_req,
> +		struct ablkcipher_request *req,
> +		struct data_queue *data_vq,
> +		__u8 op)
> +{
> +	struct virtio_crypto_ablkcipher_ctx *ctx = vc_req->ablkcipher_ctx;
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +	struct virtio_crypto_op_data_req *req_data;
> +	int src_nents, dst_nents;
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist outhdr, iv_sg, status_sg, **sgs;
> +	int i;
> +	u64 dst_len;
> +	unsigned int num_out = 0, num_in = 0;
> +	int sg_total;
> +
> +	src_nents = sg_nents_for_len(req->src, req->nbytes);
> +	dst_nents = sg_nents(req->dst);
> +
> +	pr_debug("virtio_crypto: Number of sgs (src_nents: %d, dst_nents: %d)\n",
> +			src_nents, dst_nents);
> +
> +	/* Why 3?  outhdr + iv + inhdr */
> +	sg_total = src_nents + dst_nents + 3;
> +	sgs = kzalloc_node(sg_total * sizeof(*sgs), GFP_ATOMIC,
> +				dev_to_node(&vcrypto->vdev->dev));
> +	if (!sgs)
> +		return -ENOMEM;
> +
> +	req_data = kzalloc_node(sizeof(*req_data), GFP_ATOMIC,
> +				dev_to_node(&vcrypto->vdev->dev));
> +	if (!req_data) {
> +		kfree(sgs);
> +		return -ENOMEM;
> +	}
> +
> +	vc_req->req_data = req_data;
> +	vc_req->type = VIRTIO_CRYPTO_SYM_OP_CIPHER;
> +	/* Head of operation */
> +	if (op) {
> +		req_data->header.session_id =
> +			cpu_to_le64(ctx->enc_sess_info.session_id);
> +		req_data->header.opcode =
> +			cpu_to_le32(VIRTIO_CRYPTO_CIPHER_ENCRYPT);
> +	} else {
> +		req_data->header.session_id =
> +			cpu_to_le64(ctx->dec_sess_info.session_id);
> +	    req_data->header.opcode =
> +			cpu_to_le32(VIRTIO_CRYPTO_CIPHER_DECRYPT);
> +	}
> +	req_data->u.sym_req.op_type = cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
> +	req_data->u.sym_req.u.cipher.para.iv_len = cpu_to_le32(AES_BLOCK_SIZE);
> +	req_data->u.sym_req.u.cipher.para.src_data_len =
> +			cpu_to_le32(req->nbytes);
> +
> +	dst_len = virtio_crypto_alg_sg_nents_length(req->dst);
> +	if (unlikely(dst_len > U32_MAX)) {
> +		pr_err("virtio_crypto: The dst_len is beyond U32_MAX\n");
> +		err = -EINVAL;
> +		goto free;
> +	}
> +
> +	pr_debug("virtio_crypto: src_len: %u, dst_len: %llu\n",
> +			req->nbytes, dst_len);
> +
> +	if (unlikely(req->nbytes + dst_len + AES_BLOCK_SIZE +
> +		sizeof(vc_req->status) > vcrypto->max_size)) {
> +		pr_err("virtio_crypto: The length is too big\n");
> +		err = -EINVAL;
> +		goto free;
> +	}
> +
> +	req_data->u.sym_req.u.cipher.para.dst_data_len =
> +			cpu_to_le32((uint32_t)dst_len);
> +
> +	/* Outhdr */
> +	sg_init_one(&outhdr, req_data, sizeof(*req_data));
> +	sgs[num_out++] = &outhdr;
> +
> +	/* IV */
> +	sg_init_one(&iv_sg, req->info, AES_BLOCK_SIZE);
> +	sgs[num_out++] = &iv_sg;
> +
> +	/* Source data */
> +	for (i = 0; i < src_nents; i++)
> +		sgs[num_out++] = &req->src[i];
> +
> +	/* Destination data */
> +	for (i = 0; i < dst_nents; i++)
> +		sgs[num_out + num_in++] = &req->dst[i];
> +
> +	/* Status */
> +	sg_init_one(&status_sg, &vc_req->status, sizeof(vc_req->status));
> +	sgs[num_out + num_in++] = &status_sg;
> +
> +	vc_req->sgs = sgs;
> +
> +	spin_lock_irqsave(&vcrypto->lock, flags);
> +	err = virtqueue_add_sgs(data_vq->vq, sgs, num_out,
> +				num_in, vc_req, GFP_ATOMIC);
> +	spin_unlock_irqrestore(&vcrypto->lock, flags);
> +	if (unlikely(err < 0))
> +		goto free;
> +
> +	return 0;
> +
> +free:
> +	kfree(req_data);
> +	kfree(sgs);
> +	return err;
> +}
> +
> +static int virtio_crypto_ablkcipher_encrypt(struct ablkcipher_request *req)
> +{
> +	struct crypto_ablkcipher *atfm = crypto_ablkcipher_reqtfm(req);
> +	struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(atfm);
> +	struct virtio_crypto_request *vc_req = ablkcipher_request_ctx(req);
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +	int ret;
> +	/* Use the first data virtqueue as default */
> +	struct data_queue *data_vq = &vcrypto->data_vq[0];
> +
> +	vc_req->ablkcipher_ctx = ctx;
> +	vc_req->ablkcipher_req = req;
> +	ret = __virtio_crypto_ablkcipher_do_req(vc_req, req, data_vq, 1);
> +	if (ret < 0) {
> +		pr_err("virtio_crypto: Encryption failed!\n");
> +		return ret;
> +	}
> +	virtqueue_kick(data_vq->vq);
> +
> +	return -EINPROGRESS;
> +}
> +
> +static int virtio_crypto_ablkcipher_decrypt(struct ablkcipher_request *req)
> +{
> +	struct crypto_ablkcipher *atfm = crypto_ablkcipher_reqtfm(req);
> +	struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(atfm);
> +	struct virtio_crypto_request *vc_req = ablkcipher_request_ctx(req);
> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
> +	int ret;
> +	/* Use the first data virtqueue as default */
> +	struct data_queue *data_vq = &vcrypto->data_vq[0];
> +
> +	vc_req->ablkcipher_ctx = ctx;
> +	vc_req->ablkcipher_req = req;
> +
> +	ret = __virtio_crypto_ablkcipher_do_req(vc_req, req, data_vq, 0);
> +	if (ret < 0) {
> +		pr_err("virtio_crypto: Decryption failed!\n");
> +		return ret;
> +	}
> +	virtqueue_kick(data_vq->vq);
> +
> +	return -EINPROGRESS;
> +}
> +
> +static int virtio_crypto_ablkcipher_init(struct crypto_tfm *tfm)
> +{
> +	struct virtio_crypto_ablkcipher_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	spin_lock_init(&ctx->lock);
> +	tfm->crt_ablkcipher.reqsize = sizeof(struct virtio_crypto_request);
> +	ctx->tfm = tfm;
> +
> +	return 0;
> +}
> +
> +static void virtio_crypto_ablkcipher_exit(struct crypto_tfm *tfm)
> +{
> +	struct virtio_crypto_ablkcipher_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (!ctx->vcrypto)
> +		return;
> +
> +	virtio_crypto_alg_ablkcipher_close_session(ctx, 1);
> +	virtio_crypto_alg_ablkcipher_close_session(ctx, 0);
> +	virtcrypto_dev_put(ctx->vcrypto);
> +	ctx->vcrypto = NULL;
> +}
> +
> +static struct crypto_alg virtio_crypto_algs[] = { {
> +	.cra_name = "cbc(aes)",
> +	.cra_driver_name = "virtio_crypto_aes_cbc",
> +	.cra_priority = 4001,
> +	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> +	.cra_blocksize = AES_BLOCK_SIZE,
> +	.cra_ctxsize  = sizeof(struct virtio_crypto_ablkcipher_ctx),
> +	.cra_alignmask = 0,
> +	.cra_module = THIS_MODULE,
> +	.cra_type = &crypto_ablkcipher_type,
> +	.cra_init = virtio_crypto_ablkcipher_init,
> +	.cra_exit = virtio_crypto_ablkcipher_exit,
> +	.cra_u = {
> +	   .ablkcipher = {
> +			.setkey = virtio_crypto_ablkcipher_setkey,
> +			.decrypt = virtio_crypto_ablkcipher_decrypt,
> +			.encrypt = virtio_crypto_ablkcipher_encrypt,
> +			.min_keysize = AES_MIN_KEY_SIZE,
> +			.max_keysize = AES_MAX_KEY_SIZE,
> +			.ivsize = AES_BLOCK_SIZE,
> +		},
> +	},
> +} };
> +
> +int virtio_crypto_algs_register(void)
> +{
> +	int ret = 0, i;
> +
> +	mutex_lock(&algs_lock);
> +	if (++virtio_crypto_active_devs != 1)
> +		goto unlock;
> +
> +	for (i = 0; i < ARRAY_SIZE(virtio_crypto_algs); i++) {
> +		virtio_crypto_algs[i].cra_flags =
> +			     CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC;
> +	}
> +
> +	ret = crypto_register_algs(virtio_crypto_algs,
> +			ARRAY_SIZE(virtio_crypto_algs));
> +
> +unlock:
> +	mutex_unlock(&algs_lock);
> +	return ret;
> +}
> +
> +void virtio_crypto_algs_unregister(void)
> +{
> +	mutex_lock(&algs_lock);
> +	if (--virtio_crypto_active_devs != 0)
> +		goto unlock;
> +
> +	crypto_unregister_algs(virtio_crypto_algs,
> +			ARRAY_SIZE(virtio_crypto_algs));
> +
> +unlock:
> +	mutex_unlock(&algs_lock);
> +}
> diff --git a/drivers/crypto/virtio/virtio_crypto_common.h b/drivers/crypto/virtio/virtio_crypto_common.h
> new file mode 100644
> index 0000000..a599733
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_common.h
> @@ -0,0 +1,124 @@
> +/* Common header for Virtio crypto device.
> + *
> + * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _VIRITO_CRYPTO_COMMON_H
> +#define _VIRITO_CRYPTO_COMMON_H
> +
> +#include <linux/virtio.h>
> +#include <linux/crypto.h>
> +#include <linux/spinlock.h>
> +#include <crypto/aead.h>
> +#include <crypto/aes.h>
> +#include <crypto/authenc.h>
> +
> +
> +/* Internal representation of a data virtqueue */
> +struct data_queue {
> +	/* Virtqueue associated with this send _queue */
> +	struct virtqueue *vq;
> +
> +	/* Name of the tx queue: dataq.$index */
> +	char name[32];
> +};
> +
> +struct virtio_crypto {
> +	struct virtio_device *vdev;
> +	struct virtqueue *ctrl_vq;
> +	struct data_queue *data_vq;
> +
> +	/* To protect the vq operations for the dataq */
> +	spinlock_t lock;
> +
> +	/* To protect the vq operations for the controlq */
> +	spinlock_t ctrl_lock;
> +
> +	/* Maximum of data queues supported by the device */
> +	u32 max_data_queues;
> +
> +	/* Number of queue currently used by the driver */
> +	u32 curr_queue;
> +
> +	/* Maximum length of cipher key */
> +	u32 max_cipher_key_len;
> +	/* Maximum length of authenticated key */
> +	u32 max_auth_key_len;
> +	/* Maximum size of per request */
> +	u64 max_size;
> +
> +	/* Control VQ buffers: protected by the ctrl_lock */
> +	struct virtio_crypto_op_ctrl_req ctrl;
> +	struct virtio_crypto_session_input input;
> +	struct virtio_crypto_inhdr ctrl_status;
> +
> +	unsigned long status;
> +	atomic_t ref_count;
> +	struct list_head list;
> +	struct module *owner;
> +	uint8_t dev_id;
> +
> +	/* Does the affinity hint is set for virtqueues? */
> +	bool affinity_hint_set;
> +};
> +
> +struct virtio_crypto_sym_session_info {
> +	/* Backend session id, which come from the host side */
> +	__u64 session_id;
> +};
> +
> +struct virtio_crypto_ablkcipher_ctx {
> +	struct virtio_crypto *vcrypto;
> +	struct crypto_tfm *tfm;
> +
> +	struct virtio_crypto_sym_session_info enc_sess_info;
> +	struct virtio_crypto_sym_session_info dec_sess_info;
> +
> +	/* Protects virtio_crypto_ablkcipher_ctx struct */
> +	spinlock_t lock;
> +};
> +
> +struct virtio_crypto_request {
> +	/* Cipher or aead */
> +	uint32_t type;
> +	uint8_t status;
> +	struct virtio_crypto_ablkcipher_ctx *ablkcipher_ctx;
> +	struct ablkcipher_request *ablkcipher_req;
> +	struct virtio_crypto_op_data_req *req_data;
> +	struct scatterlist **sgs;
> +};
> +
> +int virtcrypto_devmgr_add_dev(struct virtio_crypto *vcrypto_dev);
> +struct list_head *virtcrypto_devmgr_get_head(void);
> +void virtcrypto_devmgr_rm_dev(struct virtio_crypto *vcrypto_dev);
> +struct virtio_crypto *virtcrypto_devmgr_get_first(void);
> +int virtcrypto_dev_in_use(struct virtio_crypto *vcrypto_dev);
> +int virtcrypto_dev_get(struct virtio_crypto *vcrypto_dev);
> +void virtcrypto_dev_put(struct virtio_crypto *vcrypto_dev);
> +int virtcrypto_dev_started(struct virtio_crypto *vcrypto_dev);
> +struct virtio_crypto *virtcrypto_get_dev_node(int node);
> +int virtcrypto_dev_start(struct virtio_crypto *vcrypto);
> +void virtcrypto_dev_stop(struct virtio_crypto *vcrypto);
> +
> +static inline int virtio_crypto_get_current_node(void)
> +{
> +	return topology_physical_package_id(smp_processor_id());
> +}
> +
> +int virtio_crypto_algs_register(void);
> +void virtio_crypto_algs_unregister(void);
> +
> +#endif /* _VIRITO_CRYPTO_COMMON_H */
> diff --git a/drivers/crypto/virtio/virtio_crypto_mgr.c b/drivers/crypto/virtio/virtio_crypto_mgr.c
> new file mode 100644
> index 0000000..5b7260c
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_mgr.c
> @@ -0,0 +1,258 @@
> + /* Management for virtio crypto devices (refer to adf_dev_mgr.c)
> +  *
> +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +  *
> +  * This program is free software; you can redistribute it and/or modify
> +  * it under the terms of the GNU General Public License as published by
> +  * the Free Software Foundation; either version 2 of the License, or
> +  * (at your option) any later version.
> +  *
> +  * This program is distributed in the hope that it will be useful,
> +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
> +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +  * GNU General Public License for more details.
> +  *
> +  * You should have received a copy of the GNU General Public License
> +  * along with this program; if not, see <http://www.gnu.org/licenses/>.
> +  */
> +
> +#include <linux/mutex.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +static LIST_HEAD(virtio_crypto_table);
> +static DEFINE_MUTEX(table_lock);
> +static uint32_t num_devices;
> +
> +#define VIRTIO_CRYPTO_MAX_DEVICES 32
> +
> +
> +/*
> + * virtcrypto_devmgr_add_dev() - Add vcrypto_dev to the acceleration
> + * framework.
> + * @vcrypto_dev:  Pointer to virtio crypto device.
> + *
> + * Function adds virtio crypto device to the global list.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 on success, error code othewise.
> + */
> +int virtcrypto_devmgr_add_dev(struct virtio_crypto *vcrypto_dev)
> +{
> +	struct list_head *itr;
> +
> +	if (num_devices == VIRTIO_CRYPTO_MAX_DEVICES) {
> +		pr_info("virtio_crypto: only support up to %d devices\n",
> +			    VIRTIO_CRYPTO_MAX_DEVICES);
> +		return -EFAULT;
> +	}
> +
> +	mutex_lock(&table_lock);
> +	list_for_each(itr, &virtio_crypto_table) {
> +		struct virtio_crypto *ptr =
> +				list_entry(itr, struct virtio_crypto, list);
> +
> +		if (ptr == vcrypto_dev) {
> +			mutex_unlock(&table_lock);
> +			return -EEXIST;
> +		}
> +	}
> +	atomic_set(&vcrypto_dev->ref_count, 0);
> +	list_add_tail(&vcrypto_dev->list, &virtio_crypto_table);
> +	vcrypto_dev->dev_id = num_devices++;
> +	mutex_unlock(&table_lock);
> +	return 0;
> +}
> +
> +struct list_head *virtcrypto_devmgr_get_head(void)
> +{
> +	return &virtio_crypto_table;
> +}
> +
> +/*
> + * virtcrypto_devmgr_rm_dev() - Remove vcrypto_dev from the acceleration
> + * framework.
> + * @vcrypto_dev:  Pointer to virtio crypto device.
> + *
> + * Function removes virtio crypto device from the acceleration framework.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_devmgr_rm_dev(struct virtio_crypto *vcrypto_dev)
> +{
> +	mutex_lock(&table_lock);
> +	list_del(&vcrypto_dev->list);
> +	num_devices--;
> +	mutex_unlock(&table_lock);
> +}
> +
> +/*
> + * virtcrypto_devmgr_get_first()
> + *
> + * Function returns the first virtio crypto device from the acceleration
> + * framework.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: pointer to vcrypto_dev or NULL if not found.
> + */
> +struct virtio_crypto *virtcrypto_devmgr_get_first(void)
> +{
> +	struct virtio_crypto *dev = NULL;
> +
> +	if (!list_empty(&virtio_crypto_table))
> +		dev = list_first_entry(&virtio_crypto_table,
> +					struct virtio_crypto,
> +				    list);
> +	return dev;
> +}
> +
> +/*
> + * virtcrypto_dev_in_use() - Check whether vcrypto_dev is currently in use
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 1 when device is in use, 0 otherwise.
> + */
> +int virtcrypto_dev_in_use(struct virtio_crypto *vcrypto_dev)
> +{
> +	return atomic_read(&vcrypto_dev->ref_count) != 0;
> +}
> +
> +/*
> + * virtcrypto_dev_get() - Increment vcrypto_dev reference count
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Increment the vcrypto_dev refcount and if this is the first time
> + * incrementing it during this period the vcrypto_dev is in use,
> + * increment the module refcount too.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 when successful, EFAULT when fail to bump module refcount
> + */
> +int virtcrypto_dev_get(struct virtio_crypto *vcrypto_dev)
> +{
> +	if (atomic_add_return(1, &vcrypto_dev->ref_count) == 1)
> +		if (!try_module_get(vcrypto_dev->owner))
> +			return -EFAULT;
> +	return 0;
> +}
> +
> +/*
> + * virtcrypto_dev_put() - Decrement vcrypto_dev reference count
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Decrement the vcrypto_dev refcount and if this is the last time
> + * decrementing it during this period the vcrypto_dev is in use,
> + * decrement the module refcount too.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_dev_put(struct virtio_crypto *vcrypto_dev)
> +{
> +	if (atomic_sub_return(1, &vcrypto_dev->ref_count) == 0)
> +		module_put(vcrypto_dev->owner);
> +}
> +
> +/*
> + * virtcrypto_dev_started() - Check whether device has started
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 1 when the device has started, 0 otherwise
> + */
> +int virtcrypto_dev_started(struct virtio_crypto *vcrypto_dev)
> +{
> +	return (vcrypto_dev->status & VIRTIO_CRYPTO_S_HW_READY);
> +}
> +
> +/*
> + * virtcrypto_get_dev_node() - Get vcrypto_dev on the node.
> + * @node:  Node id the driver works.
> + *
> + * Function returns the virtio crypto device used fewest on the node.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: pointer to vcrypto_dev or NULL if not found.
> + */
> +struct virtio_crypto *virtcrypto_get_dev_node(int node)
> +{
> +	struct virtio_crypto *vcrypto_dev = NULL, *tmp_dev;
> +	unsigned long best = ~0;
> +	unsigned long ctr;
> +
> +	list_for_each_entry(tmp_dev, virtcrypto_devmgr_get_head(), list) {
> +
> +		if ((node == dev_to_node(&tmp_dev->vdev->dev) ||
> +		     dev_to_node(&tmp_dev->vdev->dev) < 0) &&
> +		    virtcrypto_dev_started(tmp_dev)) {
> +			ctr = atomic_read(&tmp_dev->ref_count);
> +			if (best > ctr) {
> +				vcrypto_dev = tmp_dev;
> +				best = ctr;
> +			}
> +		}
> +	}
> +
> +	if (!vcrypto_dev) {
> +		pr_info("virtio_crypto: Could not find a device on node %d\n",
> +				node);
> +		/* Get any started device */
> +		list_for_each_entry(tmp_dev,
> +				virtcrypto_devmgr_get_head(), list) {
> +			if (virtcrypto_dev_started(tmp_dev)) {
> +				vcrypto_dev = tmp_dev;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (!vcrypto_dev)
> +		return NULL;
> +
> +	virtcrypto_dev_get(vcrypto_dev);
> +	return vcrypto_dev;
> +}
> +
> +/*
> + * virtcrypto_dev_start() - Start virtio crypto device
> + * @vcrypto:    Pointer to virtio crypto device.
> + *
> + * Function notifies all the registered services that the virtio crypto device
> + * is ready to be used.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 on success, EFAULT when fail to register algorithms
> + */
> +int virtcrypto_dev_start(struct virtio_crypto *vcrypto)
> +{
> +	if (virtio_crypto_algs_register()) {
> +		pr_err("virtio_crypto: Failed to register crypto algs\n");
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * virtcrypto_dev_stop() - Stop virtio crypto device
> + * @vcrypto:    Pointer to virtio crypto device.
> + *
> + * Function notifies all the registered services that the virtio crypto device
> + * is ready to be used.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_dev_stop(struct virtio_crypto *vcrypto)
> +{
> +	virtio_crypto_algs_unregister();
> +}
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index cd2be1c..4bdb84c 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -460,6 +460,7 @@ header-y += virtio_rng.h
>  header-y += virtio_scsi.h
>  header-y += virtio_types.h
>  header-y += virtio_vsock.h
> +header-y += virtio_crypto.h
>  header-y += vm_sockets.h
>  header-y += vt.h
>  header-y += vtpm_proxy.h
> diff --git a/include/uapi/linux/virtio_crypto.h b/include/uapi/linux/virtio_crypto.h
> new file mode 100644
> index 0000000..abc2cf5
> --- /dev/null
> +++ b/include/uapi/linux/virtio_crypto.h
> @@ -0,0 +1,435 @@
> +#ifndef _VIRTIO_CRYPTO_H
> +#define _VIRTIO_CRYPTO_H
> +/* This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include <linux/types.h>
> +#include <linux/virtio_types.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +
> +
> +#define VIRTIO_CRYPTO_SERVICE_CIPHER 0
> +#define VIRTIO_CRYPTO_SERVICE_HASH   1
> +#define VIRTIO_CRYPTO_SERVICE_MAC    2
> +#define VIRTIO_CRYPTO_SERVICE_AEAD   3
> +
> +#define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header {
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> +	__le32 opcode;
> +	__le32 algo;
> +	__le32 flag;
> +	/* data virtqueue id */
> +	__le32 queue_id;
> +};
> +
> +struct virtio_crypto_cipher_session_para {
> +#define VIRTIO_CRYPTO_NO_CIPHER                 0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4               1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
> +	__le32 algo;
> +	/* length of key */
> +	__le32 keylen;
> +
> +#define VIRTIO_CRYPTO_OP_ENCRYPT  1
> +#define VIRTIO_CRYPTO_OP_DECRYPT  2
> +	/* encrypt or decrypt */
> +	__le32 op;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_session_input {
> +	/* Device-writable part */
> +	__le64 session_id;
> +	__le32 status;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_cipher_session_req {
> +	struct virtio_crypto_cipher_session_para para;
> +};
> +
> +struct virtio_crypto_hash_session_para {
> +#define VIRTIO_CRYPTO_NO_HASH            0
> +#define VIRTIO_CRYPTO_HASH_MD5           1
> +#define VIRTIO_CRYPTO_HASH_SHA1          2
> +#define VIRTIO_CRYPTO_HASH_SHA_224       3
> +#define VIRTIO_CRYPTO_HASH_SHA_256       4
> +#define VIRTIO_CRYPTO_HASH_SHA_384       5
> +#define VIRTIO_CRYPTO_HASH_SHA_512       6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224      7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256      8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384      9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512      10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128      11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256      12
> +	__le32 algo;
> +	/* hash result length */
> +	__le32 hash_result_len;
> +};
> +
> +struct virtio_crypto_hash_create_session_req {
> +	struct virtio_crypto_hash_session_para para;
> +};
> +
> +struct virtio_crypto_mac_session_para {
> +#define VIRTIO_CRYPTO_NO_MAC                       0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5                 1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1                2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224             3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256             4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384             5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512             6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES                25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES                 26
> +#define VIRTIO_CRYPTO_MAC_KASUMI_F9                27
> +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA2              28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES                 41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH             42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES               49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9         50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES                 53
> +	__le32 algo;
> +	/* hash result length */
> +	__le32 hash_result_len;
> +	/* length of authenticated key */
> +	__le32 auth_key_len;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_mac_create_session_req {
> +	struct virtio_crypto_mac_session_para para;
> +};
> +
> +struct virtio_crypto_aead_session_para {
> +#define VIRTIO_CRYPTO_NO_AEAD     0
> +#define VIRTIO_CRYPTO_AEAD_GCM    1
> +#define VIRTIO_CRYPTO_AEAD_CCM    2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305  3
> +	__le32 algo;
> +	/* length of key */
> +	__le32 key_len;
> +	/* hash result length */
> +	__le32 hash_result_len;
> +	/* length of the additional authenticated data (AAD) in bytes */
> +	__le32 aad_len;
> +	/* encrypt or decrypt, See above VIRTIO_CRYPTO_OP_* */
> +	__le32 op;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_aead_create_session_req {
> +	struct virtio_crypto_aead_session_para para;
> +};
> +
> +struct virtio_crypto_alg_chain_session_para {
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
> +	__le32 alg_chain_order;
> +/* Plain hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_PLAIN    1
> +/* Authenticated hash (mac) */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH     2
> +/* Nested hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_NESTED   3
> +	__le32 hash_mode;
> +	struct virtio_crypto_cipher_session_para cipher_param;
> +	union {
> +		struct virtio_crypto_hash_session_para hash_param;
> +		struct virtio_crypto_mac_session_para mac_param;
> +	} u;

I personally dislike the use of unions of different size
in these structures. For example, using hash_param,
there's no way to initialize the trailing part of the
structure. Also, the size of the structure is hard to
figure out. I would like to see padding of all structures
to be same size, and adding a union member that's just padding.

Same applies to other unions.

> +	/* length of the additional authenticated data (AAD) in bytes */
> +	__le32 aad_len;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_alg_chain_session_req {
> +	struct virtio_crypto_alg_chain_session_para para;
> +};
> +
> +struct virtio_crypto_sym_create_session_req {
> +	union {
> +		struct virtio_crypto_cipher_session_req cipher;
> +		struct virtio_crypto_alg_chain_session_req chain;
> +	} u;
> +
> +	/* Device-readable part */
> +
> +/* No operation */
> +#define VIRTIO_CRYPTO_SYM_OP_NONE  0
> +/* Cipher only operation on the data */
> +#define VIRTIO_CRYPTO_SYM_OP_CIPHER  1
> +/*
> + * Chain any cipher with any hash or mac operation. The order
> + * depends on the value of alg_chain_order param
> + */
> +#define VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING  2
> +	__le32 op_type;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_destroy_session_req {
> +	/* Device-readable part */
> +	__le64  session_id;
> +};
> +
> +/* The request of the control viritqueue's packet */
> +struct virtio_crypto_op_ctrl_req {
> +	struct virtio_crypto_ctrl_header header;
> +
> +	union {
> +		struct virtio_crypto_sym_create_session_req
> +			sym_create_session;
> +		struct virtio_crypto_hash_create_session_req
> +			hash_create_session;
> +		struct virtio_crypto_mac_create_session_req
> +			mac_create_session;
> +		struct virtio_crypto_aead_create_session_req
> +			aead_create_session;
> +		struct virtio_crypto_destroy_session_req
> +			destroy_session;
> +	} u;
> +};
> +
> +struct virtio_crypto_op_header {
> +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> +#define VIRTIO_CRYPTO_CIPHER_DECRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> +#define VIRTIO_CRYPTO_HASH \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> +#define VIRTIO_CRYPTO_MAC \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_ENCRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_DECRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> +	__le32 opcode;
> +	/* algo should be service-specific algorithms */
> +	__le32 algo;
> +	/* session_id should be service-specific algorithms */
> +	__le64 session_id;
> +	/* control flag to control the request */
> +	__le32 flag;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_cipher_para {
> +	/*
> +	 * Byte Length of valid IV/Counter
> +	 *
> +	 * For block ciphers in CBC or F8 mode, or for Kasumi in F8 mode, or for
> +	 *   SNOW3G in UEA2 mode, this is the length of the IV (which
> +	 *   must be the same as the block length of the cipher).
> +	 * For block ciphers in CTR mode, this is the length of the counter
> +	 *   (which must be the same as the block length of the cipher).
> +	 * For AES-XTS, this is the 128bit tweak, i, from IEEE Std 1619-2007.
> +	 *
> +	 * The IV/Counter will be updated after every partial cryptographic
> +	 * operation.
> +	 */
> +	__le32 iv_len;
> +	/* length of source data */
> +	__le32 src_data_len;
> +	/* length of dst data */
> +	__le32 dst_data_len;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_hash_para {
> +	/* length of source data */
> +	__le32 src_data_len;
> +	/* hash result length */
> +	__le32 hash_result_len;
> +};
> +
> +struct virtio_crypto_mac_para {
> +	struct virtio_crypto_hash_para hash;
> +};
> +
> +struct virtio_crypto_aead_para {
> +	/*
> +	 * Byte Length of valid IV data pointed to by the below iv_addr
> +	 * parameter.
> +	 *
> +	 * For GCM mode, this is either 12 (for 96-bit IVs) or 16, in which
> +	 *   case iv_addr points to J0.
> +	 * For CCM mode, this is the length of the nonce, which can be in the
> +	 *   range 7 to 13 inclusive.
> +	 */
> +	__le32 iv_len;
> +	/* length of additional auth data */
> +	__le32 aad_len;
> +	/* length of source data */
> +	__le32 src_data_len;
> +	/* length of dst data */
> +	__le32 dst_data_len;
> +};
> +
> +struct virtio_crypto_cipher_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_cipher_para para;
> +};
> +
> +struct virtio_crypto_hash_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_hash_para para;
> +};
> +
> +struct virtio_crypto_mac_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_mac_para para;
> +};
> +
> +struct virtio_crypto_alg_chain_data_para {
> +	__le32 iv_len;
> +	/* Length of source data */
> +	__le32 src_data_len;
> +	/* Length of destination data */
> +	__le32 dst_data_len;
> +	/* Starting point for cipher processing in source data */
> +	__le32 cipher_start_src_offset;
> +	/* Length of the source data that the cipher will be computed on */
> +	__le32 len_to_cipher;
> +	/* Starting point for hash processing in source data */
> +	__le32 hash_start_src_offset;
> +	/* Length of the source data that the hash will be computed on */
> +	__le32 len_to_hash;
> +	/* Length of the additional auth data */
> +	__le32 aad_len;
> +	/* Length of the hash result */
> +	__le32 hash_result_len;
> +	__le32 reserved;
> +};
> +
> +struct virtio_crypto_alg_chain_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_alg_chain_data_para para;
> +};
> +
> +struct virtio_crypto_sym_data_req {
> +	union {
> +		struct virtio_crypto_cipher_data_req cipher;
> +		struct virtio_crypto_alg_chain_data_req chain;
> +	} u;
> +
> +	/* See above VIRTIO_CRYPTO_SYM_OP_* */
> +	__le32 op_type;
> +	__le32 padding;
> +};
> +
> +struct virtio_crypto_aead_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_aead_para para;
> +};
> +
> +/* The request of the data viritqueue's packet */
> +struct virtio_crypto_op_data_req {
> +	struct virtio_crypto_op_header header;
> +
> +	union {
> +		struct virtio_crypto_sym_data_req  sym_req;
> +		struct virtio_crypto_hash_data_req hash_req;
> +		struct virtio_crypto_mac_data_req mac_req;
> +		struct virtio_crypto_aead_data_req aead_req;
> +	} u;
> +};
> +
> +#define VIRTIO_CRYPTO_OK        0
> +#define VIRTIO_CRYPTO_ERR       1
> +#define VIRTIO_CRYPTO_BADMSG    2
> +#define VIRTIO_CRYPTO_NOTSUPP   3
> +#define VIRTIO_CRYPTO_INVSESS   4 /* Invaild session id */
> +
> +/* The accelerator hardware is ready */
> +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> +
> +struct virtio_crypto_config {
> +	/* See VIRTIO_CRYPTO_OP_* above */
> +	__le32  status;
> +
> +	/*
> +	 * Maximum number of data queue legal values are between 1 and 0x8000
> +	 */
> +	__le32  max_dataqueues;
> +
> +	/*
> +	 * Specifies the services mask which the devcie support,
> +	 * see VIRTIO_CRYPTO_SERVICE_* above
> +	 */
> +	__le32 crypto_services;
> +
> +	/* Detailed algorithms mask */
> +	__le32 cipher_algo_l;
> +	__le32 cipher_algo_h;
> +	__le32 hash_algo;
> +	__le32 mac_algo_l;
> +	__le32 mac_algo_h;
> +	__le32 aead_algo;
> +	/* Maximum length of cipher key */
> +	__le32 max_cipher_key_len;
> +	/* Maximum length of authenticated key */
> +	__le32 max_auth_key_len;
> +	__le32 reserve;
> +	/* Maximum size of each crypto request's content */
> +	__le64 max_size;
> +};
> +
> +struct virtio_crypto_inhdr {
> +	/* See VIRTIO_CRYPTO_* above */
> +	__u8 status;
> +};
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 3228d58..6d5c3b2 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -42,5 +42,6 @@
>  #define VIRTIO_ID_GPU          16 /* virtio GPU */
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> +#define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* [PATCH 2/2] crypto: arm/crc32 - accelerated support based on x86 SSE implementation
From: Ard Biesheuvel @ 2016-11-26 20:15 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, Ard Biesheuvel
In-Reply-To: <1480191314-2331-1-git-send-email-ard.biesheuvel@linaro.org>

This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig         |   5 +
 arch/arm/crypto/Makefile        |   2 +
 arch/arm/crypto/crc32-ce-core.S | 257 ++++++++++++++++++++
 arch/arm/crypto/crc32-ce-glue.c | 129 ++++++++++
 4 files changed, 393 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index fce801fa52a1..be5cb5a7d3fa 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -125,4 +125,9 @@ config CRYPTO_CRCT10DIF_ARM_CE
 	depends on KERNEL_MODE_NEON && CRC_T10DIF
 	select CRYPTO_HASH
 
+config CRYPTO_CRC32_ARM_CE
+	tristate "CRC32 digest algorithm using CRC and/or PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC32
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index fc77265014b7..b578a1820ab1 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -14,6 +14,7 @@ ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
+ce-obj-$(CONFIG_CRYPTO_CRC32_ARM_CE) += crc32-arm-ce.o
 
 ifneq ($(ce-obj-y)$(ce-obj-m),)
 ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
@@ -38,6 +39,7 @@ sha2-arm-ce-y	:= sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
+crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/crc32-ce-core.S b/arch/arm/crypto/crc32-ce-core.S
new file mode 100644
index 000000000000..ef671f040672
--- /dev/null
+++ b/arch/arm/crypto/crc32-ce-core.S
@@ -0,0 +1,257 @@
+/*
+ * Accelerated CRC32 using ARM CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see http://www.gnu.org/licenses
+ *
+ * Please  visit http://www.xyratex.com/contact if you need additional
+ * information or have any questions.
+ *
+ * GPL HEADER END
+ */
+
+/*
+ * Copyright 2012 Xyratex Technology Limited
+ *
+ * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
+ * calculation.
+ * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
+ * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
+ * at:
+ * http://www.intel.com/products/processor/manuals/
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
+ * Volume 2B: Instruction Set Reference, N-Z
+ *
+ * Authors:   Gregory Prestas <Gregory_Prestas@us.xyratex.com>
+ *	      Alexander Boyko <Alexander_Boyko@xyratex.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.align		4
+	.arch		armv8-a
+	.arch_extension	crc
+	.fpu		crypto-neon-fp-armv8
+
+	/*
+	 * [x4*128+32 mod P(x) << 32)]'  << 1   = 0x154442bd4
+	 * #define CONSTANT_R1  0x154442bd4LL
+	 *
+	 * [(x4*128-32 mod P(x) << 32)]' << 1   = 0x1c6e41596
+	 * #define CONSTANT_R2  0x1c6e41596LL
+	 */
+.Lconstant_R2R1:
+	.quad		0x0000000154442bd4
+	.quad		0x00000001c6e41596
+
+	/*
+	 * [(x128+32 mod P(x) << 32)]'   << 1   = 0x1751997d0
+	 * #define CONSTANT_R3  0x1751997d0LL
+	 *
+	 * [(x128-32 mod P(x) << 32)]'   << 1   = 0x0ccaa009e
+	 * #define CONSTANT_R4  0x0ccaa009eLL
+	 */
+.Lconstant_R4R3:
+	.quad		0x00000001751997d0
+	.quad		0x00000000ccaa009e
+
+	/*
+	 * [(x64 mod P(x) << 32)]'       << 1   = 0x163cd6124
+	 * #define CONSTANT_R5  0x163cd6124LL
+	 */
+.Lconstant_R5:
+	.quad		0x0000000163cd6124
+
+.Lconstant_mask32:
+	.quad		0x00000000FFFFFFFF
+
+	/*
+	 * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
+	 *
+	 * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
+	 *                                                      = 0x1F7011641LL
+	 * #define CONSTANT_RU  0x1F7011641LL
+	 */
+.Lconstant_RUpoly:
+	.quad		0x00000001DB710641
+	.quad		0x00000001F7011641
+
+	dCONSTANTl	.req	d0
+	dCONSTANTh	.req	d1
+	qCONSTANT	.req	q0
+
+	BUF		.req	r0
+	LEN		.req	r1
+	CRC		.req	r2
+
+	qzr		.req	q9
+
+	/**
+	 * Calculate crc32
+	 * BUF - buffer
+	 * LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
+	 * CRC - initial crc32
+	 * return %eax crc32
+	 * uint crc32_pmull_le(unsigned char const *buffer,
+	 *                     size_t len, uint crc32)
+	 */
+ENTRY(crc32_pmull_le)
+	bic		LEN, LEN, #15
+	vld1.8		{q1-q2}, [BUF]!
+	vld1.8		{q3-q4}, [BUF]!
+	vmov.i8		qzr, #0
+	vmov.i8		qCONSTANT, #0
+	vmov		dCONSTANTl[0], CRC
+	veor.8		d2, d2, dCONSTANTl
+	sub		LEN, LEN, #0x40
+	cmp		LEN, #0x40
+	blt		less_64
+
+	vldr		dCONSTANTl, .Lconstant_R2R1
+	vldr		dCONSTANTh, .Lconstant_R2R1 + 8
+
+loop_64:		/* 64 bytes Full cache line folding */
+	sub		LEN, LEN, #0x40
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q6, d5, dCONSTANTh
+	vmull.p64	q7, d7, dCONSTANTh
+	vmull.p64	q8, d9, dCONSTANTh
+
+	vmull.p64	q1, d2, dCONSTANTl
+	vmull.p64	q2, d4, dCONSTANTl
+	vmull.p64	q3, d6, dCONSTANTl
+	vmull.p64	q4, d8, dCONSTANTl
+
+	veor.8		q1, q1, q5
+	vld1.8		{q5}, [BUF]!
+	veor.8		q2, q2, q6
+	vld1.8		{q6}, [BUF]!
+	veor.8		q3, q3, q7
+	vld1.8		{q7}, [BUF]!
+	veor.8		q4, q4, q8
+	vld1.8		{q8}, [BUF]!
+
+	veor.8		q1, q1, q5
+	veor.8		q2, q2, q6
+	veor.8		q3, q3, q7
+	veor.8		q4, q4, q8
+
+	cmp		LEN, #0x40
+	bge		loop_64
+
+less_64:		/* Folding cache line into 128bit */
+	vldr		dCONSTANTl, .Lconstant_R4R3
+	vldr		dCONSTANTh, .Lconstant_R4R3 + 8
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q2
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q3
+
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q4
+
+	teq		LEN, #0
+	beq		fold_64
+
+loop_16:		/* Folding rest buffer into 128bit */
+	subs		LEN, LEN, #0x10
+
+	vld1.8		{q2}, [BUF]!
+	vmull.p64	q5, d3, dCONSTANTh
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q5
+	veor.8		q1, q1, q2
+
+	bne		loop_16
+
+fold_64:
+	/* perform the last 64 bit fold, also adds 32 zeroes
+	 * to the input stream */
+	vmull.p64	q2, d2, dCONSTANTh
+	vext.8		q1, q1, qzr, #8
+	veor.8		q1, q1, q2
+
+	/* final 32-bit fold */
+	vldr		dCONSTANTl, .Lconstant_R5
+	vldr		d6, .Lconstant_mask32
+	vmov.i8		d7, #0
+
+	vext.8		q2, q1, qzr, #4
+	vand.8		d2, d2, d6
+	vmull.p64	q1, d2, dCONSTANTl
+	veor.8		q1, q1, q2
+
+	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
+	vldr		dCONSTANTl, .Lconstant_RUpoly
+	vldr		dCONSTANTh, .Lconstant_RUpoly + 8
+
+	vand.8		q2, q1, q3
+	vext.8		q2, qzr, q2, #8
+	vmull.p64	q2, d5, dCONSTANTh
+	vand.8		q2, q2, q3
+	vmull.p64	q2, d4, dCONSTANTl
+	veor.8		q1, q1, q2
+	vmov		r0, s5
+
+	bx		lr
+ENDPROC(crc32_pmull_le)
+
+ENTRY(crc32_armv8_le)
+	mov		ip, r2
+0:	subs		ip, ip, #8
+	bmi		4f
+	ldrd		r2, r3, [r1], #8
+ARM_BE8(rev		r2, r2		)
+ARM_BE8(rev		r3, r3		)
+	crc32w		r0, r0, r2
+	crc32w		r0, r0, r3
+	b		0b
+
+4:	tst		ip, #4
+	beq		2f
+	ldr		r3, [r1], #4
+ARM_BE8(rev		r3, r3		)
+	crc32w		r0, r0, r3
+2:	tst		ip, #2
+	beq		1f
+	ldrh		r3, [r1], #2
+ARM_BE8(rev16		r3, r3		)
+	crc32h		r0, r0, r3
+1:	tst		ip, #1
+	bxeq		lr
+	ldrb		r3, [r1]
+	crc32b		r0, r0, r3
+	bx		lr
+ENDPROC(crc32_armv8_le)
diff --git a/arch/arm/crypto/crc32-ce-glue.c b/arch/arm/crypto/crc32-ce-glue.c
new file mode 100644
index 000000000000..61916b6b40e9
--- /dev/null
+++ b/arch/arm/crypto/crc32-ce-glue.c
@@ -0,0 +1,129 @@
+/*
+ * Accelerated CRC32 using ARM CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crc32.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <asm/unaligned.h>
+
+#define PMULL_MIN_LEN		64L	/* minimum size of buffer
+					 * for crc32_pmull_le_16 */
+#define SCALE_F			16L	/* size of NEON register */
+
+asmlinkage u32 crc32_pmull_le(const u8 buf[], u32 len, u32 init_crc);
+asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], u32 len);
+
+static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = 0;
+	return 0;
+}
+
+static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
+			      unsigned int keylen)
+{
+	u32 *mctx = crypto_shash_ctx(hash);
+
+	if (keylen != sizeof(u32)) {
+		crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	*mctx = le32_to_cpup((__le32 *)key);
+	return 0;
+}
+
+static int crc32_pmull_init(struct shash_desc *desc)
+{
+	u32 *mctx = crypto_shash_ctx(desc->tfm);
+	u32 *crc = shash_desc_ctx(desc);
+
+	*crc = *mctx;
+	return 0;
+}
+
+static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN && may_use_simd() &&
+	    (elf_hwcap2 & HWCAP2_PMULL)) {
+		kernel_neon_begin();
+		*crc = crc32_pmull_le(data, round_down(length, SCALE_F), *crc);
+		kernel_neon_end();
+
+		data += round_down(length, SCALE_F);
+		length %= SCALE_F;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap2 & HWCAP2_CRC32)
+			*crc = crc32_armv8_le(*crc, data, length);
+		else
+			*crc = crc32_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(*crc, out);
+	return 0;
+}
+
+static struct shash_alg crc32_pmull_alg = {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32_pmull_update,
+	.final			= crc32_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32_pmull_cra_init,
+	.base.cra_name		= "crc32",
+	.base.cra_driver_name	= "crc32-arm-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init crc32_pmull_mod_init(void)
+{
+	if (!(elf_hwcap2 & (HWCAP2_PMULL|HWCAP2_CRC32)))
+		return -ENODEV;
+
+	return crypto_register_shash(&crc32_pmull_alg);
+}
+
+static void __exit crc32_pmull_mod_exit(void)
+{
+	crypto_unregister_shash(&crc32_pmull_alg);
+}
+
+module_init(crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("crc32");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/2] crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
From: Ard Biesheuvel @ 2016-11-26 20:15 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, Ard Biesheuvel
In-Reply-To: <1480191314-2331-1-git-send-email-ard.biesheuvel@linaro.org>

This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |   6 +
 arch/arm64/crypto/Makefile        |   3 +
 arch/arm64/crypto/crc32-ce-core.S | 246 ++++++++++++++++++++
 arch/arm64/crypto/crc32-ce-glue.c | 124 ++++++++++
 4 files changed, 379 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 1b50671ffec3..11dc2ac1f2e5 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -58,4 +58,10 @@ config CRYPTO_CRC32_ARM64
 	tristate "CRC32 and CRC32C using optional ARMv8 instructions"
 	depends on ARM64
 	select CRYPTO_HASH
+
+config CRYPTO_CRC32_ARM64_CE
+	tristate "CRC32 digest algorithm using PMULL instructions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 36fd3eb4201b..144387805a46 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -20,6 +20,9 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
 crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 
+obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
+crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
 
diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S
new file mode 100644
index 000000000000..eff7fe100dab
--- /dev/null
+++ b/arch/arm64/crypto/crc32-ce-core.S
@@ -0,0 +1,246 @@
+/*
+ * Accelerated CRC32 using arm64 CRC, NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see http://www.gnu.org/licenses
+ *
+ * Please  visit http://www.xyratex.com/contact if you need additional
+ * information or have any questions.
+ *
+ * GPL HEADER END
+ */
+
+/*
+ * Copyright 2012 Xyratex Technology Limited
+ *
+ * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
+ * calculation.
+ * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
+ * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
+ * at:
+ * http://www.intel.com/products/processor/manuals/
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
+ * Volume 2B: Instruction Set Reference, N-Z
+ *
+ * Authors:   Gregory Prestas <Gregory_Prestas@us.xyratex.com>
+ *	      Alexander Boyko <Alexander_Boyko@xyratex.com>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+
+	.text
+	.align		4
+	.cpu		generic+crypto+crc
+
+	/*
+	 * [x4*128+32 mod P(x) << 32)]'  << 1   = 0x154442bd4
+	 * #define CONSTANT_R1  0x154442bd4LL
+	 *
+	 * [(x4*128-32 mod P(x) << 32)]' << 1   = 0x1c6e41596
+	 * #define CONSTANT_R2  0x1c6e41596LL
+	 */
+.Lconstant_R2R1:
+	.octa		0x00000001c6e415960000000154442bd4
+
+	/*
+	 * [(x128+32 mod P(x) << 32)]'   << 1   = 0x1751997d0
+	 * #define CONSTANT_R3  0x1751997d0LL
+	 *
+	 * [(x128-32 mod P(x) << 32)]'   << 1   = 0x0ccaa009e
+	 * #define CONSTANT_R4  0x0ccaa009eLL
+	 */
+.Lconstant_R4R3:
+	.octa		0x00000000ccaa009e00000001751997d0
+
+	/*
+	 * [(x64 mod P(x) << 32)]'       << 1   = 0x163cd6124
+	 * #define CONSTANT_R5  0x163cd6124LL
+	 */
+.Lconstant_R5:
+	.octa		0x00000000000000000000000163cd6124
+.Lconstant_mask32:
+	.octa		0x000000000000000000000000FFFFFFFF
+
+	/*
+	 * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
+	 *
+	 * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
+	 *                                                      = 0x1F7011641LL
+	 * #define CONSTANT_RU  0x1F7011641LL
+	 */
+.Lconstant_RUpoly:
+	.octa		0x00000001F701164100000001DB710641
+
+	vCONSTANT	.req	v0
+	dCONSTANT	.req	d0
+	qCONSTANT	.req	q0
+
+	BUF		.req	x0
+	LEN		.req	x1
+	CRC		.req	x2
+
+	vzr		.req	v9
+
+	/**
+	 * Calculate crc32
+	 * BUF - buffer
+	 * LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
+	 * CRC - initial crc32
+	 * return %eax crc32
+	 * uint crc32_pmull_le(unsigned char const *buffer,
+	 *                     size_t len, uint crc32)
+	 */
+ENTRY(crc32_pmull_le)
+	bic		LEN, LEN, #15
+	ld1		{v1.16b-v4.16b}, [BUF], #0x40
+	movi		vzr.16b, #0
+	fmov		dCONSTANT, CRC
+	eor		v1.16b, v1.16b, vCONSTANT.16b
+	sub		LEN, LEN, #0x40
+	cmp		LEN, #0x40
+	b.lt		less_64
+
+	ldr		qCONSTANT, .Lconstant_R2R1
+
+loop_64:		/* 64 bytes Full cache line folding */
+	sub		LEN, LEN, #0x40
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull2		v6.1q, v2.2d, vCONSTANT.2d
+	pmull2		v7.1q, v3.2d, vCONSTANT.2d
+	pmull2		v8.1q, v4.2d, vCONSTANT.2d
+
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	pmull		v2.1q, v2.1d, vCONSTANT.1d
+	pmull		v3.1q, v3.1d, vCONSTANT.1d
+	pmull		v4.1q, v4.1d, vCONSTANT.1d
+
+	eor		v1.16b, v1.16b, v5.16b
+	ld1		{v5.16b}, [BUF], #0x10
+	eor		v2.16b, v2.16b, v6.16b
+	ld1		{v6.16b}, [BUF], #0x10
+	eor		v3.16b, v3.16b, v7.16b
+	ld1		{v7.16b}, [BUF], #0x10
+	eor		v4.16b, v4.16b, v8.16b
+	ld1		{v8.16b}, [BUF], #0x10
+
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v2.16b, v2.16b, v6.16b
+	eor		v3.16b, v3.16b, v7.16b
+	eor		v4.16b, v4.16b, v8.16b
+
+	cmp		LEN, #0x40
+	b.ge		loop_64
+
+less_64:		/* Folding cache line into 128bit */
+	ldr		qCONSTANT, .Lconstant_R4R3
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v2.16b
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v3.16b
+
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v4.16b
+
+	cbz		LEN, fold_64
+
+loop_16:		/* Folding rest buffer into 128bit */
+	subs		LEN, LEN, #0x10
+
+	ld1		{v2.16b}, [BUF], #0x10
+	pmull2		v5.1q, v1.2d, vCONSTANT.2d
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v5.16b
+	eor		v1.16b, v1.16b, v2.16b
+
+	b.ne		loop_16
+
+fold_64:
+	/* perform the last 64 bit fold, also adds 32 zeroes
+	 * to the input stream */
+	ext		v2.16b, v1.16b, v1.16b, #8
+	pmull2		v2.1q, v2.2d, vCONSTANT.2d
+	ext		v1.16b, v1.16b, vzr.16b, #8
+	eor		v1.16b, v1.16b, v2.16b
+
+	/* final 32-bit fold */
+	ldr		qCONSTANT, .Lconstant_R5
+	ldr		q3, .Lconstant_mask32
+
+	ext		v2.16b, v1.16b, vzr.16b, #4
+	and		v1.16b, v1.16b, v3.16b
+	pmull		v1.1q, v1.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v2.16b
+
+	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
+	ldr		qCONSTANT, .Lconstant_RUpoly
+
+	and		v2.16b, v1.16b, v3.16b
+	ext		v2.16b, vzr.16b, v2.16b, #8
+	pmull2		v2.1q, v2.2d, vCONSTANT.2d
+	and		v2.16b, v2.16b, v3.16b
+	pmull		v2.1q, v2.1d, vCONSTANT.1d
+	eor		v1.16b, v1.16b, v2.16b
+	mov		w0, v1.s[1]
+
+	ret
+ENDPROC(crc32_pmull_le)
+
+ENTRY(crc32_armv8_le)
+0:	subs		x2, x2, #16
+	b.mi		8f
+	ldp		x3, x4, [x1], #16
+CPU_BE(	rev		x3, x3		)
+CPU_BE(	rev		x4, x4		)
+	crc32x		w0, w0, x3
+	crc32x		w0, w0, x4
+	b		0b
+
+8:	tbz		x2, #3, 4f
+	ldr		x3, [x1], #8
+CPU_BE(	rev		x3, x3		)
+	crc32x		w0, w0, x3
+4:	tbz		x2, #2, 2f
+	ldr		w3, [x1], #4
+CPU_BE(	rev		w3, w3		)
+	crc32w		w0, w0, w3
+2:	tbz		x2, #1, 1f
+	ldrh		w3, [x1], #2
+CPU_BE(	rev16		w3, w3		)
+	crc32h		w0, w0, w3
+1:	tbz		x2, #0, 0f
+	ldrb		w3, [x1]
+	crc32b		w0, w0, w3
+0:	ret
+ENDPROC(crc32_armv8_le)
diff --git a/arch/arm64/crypto/crc32-ce-glue.c b/arch/arm64/crypto/crc32-ce-glue.c
new file mode 100644
index 000000000000..567203f29ac6
--- /dev/null
+++ b/arch/arm64/crypto/crc32-ce-glue.c
@@ -0,0 +1,124 @@
+/*
+ * Accelerated CRC32 using arm64 NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufeature.h>
+#include <linux/crc32.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+
+#define PMULL_MIN_LEN		64L	/* minimum size of buffer
+					 * for crc32_pmull_le_16 */
+#define SCALE_F			16L	/* size of NEON register */
+
+asmlinkage u32 crc32_pmull_le(const u8 buf[], u64 len, u32 init_crc);
+asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], u64 len);
+
+static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
+{
+	u32 *key = crypto_tfm_ctx(tfm);
+
+	*key = 0;
+	return 0;
+}
+
+static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
+			      unsigned int keylen)
+{
+	u32 *mctx = crypto_shash_ctx(hash);
+
+	if (keylen != sizeof(u32)) {
+		crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	*mctx = le32_to_cpup((__le32 *)key);
+	return 0;
+}
+
+static int crc32_pmull_init(struct shash_desc *desc)
+{
+	u32 *mctx = crypto_shash_ctx(desc->tfm);
+	u32 *crc = shash_desc_ctx(desc);
+
+	*crc = *mctx;
+	return 0;
+}
+
+static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	if (length >= PMULL_MIN_LEN) {
+		kernel_neon_begin_partial(10);
+		*crc = crc32_pmull_le(data, round_down(length, SCALE_F), *crc);
+		kernel_neon_end();
+
+		data += round_down(length, SCALE_F);
+		length %= SCALE_F;
+	}
+
+	if (length > 0) {
+		if (elf_hwcap & HWCAP_CRC32)
+			*crc = crc32_armv8_le(*crc, data, length);
+		else
+			*crc = crc32_le(*crc, data, length);
+	}
+
+	return 0;
+}
+
+static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
+{
+	u32 *crc = shash_desc_ctx(desc);
+
+	put_unaligned_le32(*crc, out);
+	return 0;
+}
+
+static struct shash_alg crc32_pmull_alg = {
+	.setkey			= crc32_pmull_setkey,
+	.init			= crc32_pmull_init,
+	.update			= crc32_pmull_update,
+	.final			= crc32_pmull_final,
+	.descsize		= sizeof(u32),
+	.digestsize		= sizeof(u32),
+
+	.base.cra_ctxsize	= sizeof(u32),
+	.base.cra_init		= crc32_pmull_cra_init,
+	.base.cra_name		= "crc32",
+	.base.cra_driver_name	= "crc32-arm64-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= 1,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init crc32_pmull_mod_init(void)
+{
+	return crypto_register_shash(&crc32_pmull_alg);
+}
+
+static void __exit crc32_pmull_mod_exit(void)
+{
+	crypto_unregister_shash(&crc32_pmull_alg);
+}
+
+module_cpu_feature_match(PMULL, crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/2] CRC32 for ARM and arm64 using PMULL and CRC instructions
From: Ard Biesheuvel @ 2016-11-26 20:15 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, Ard Biesheuvel

Version 8 of the ARM architecture introduces both a set of dedicated CRC32
instructions, and a 64x64 to 128 bit polynomial multiplication instruction,
both of which can be used to accelerate CRC32 calculations.

These patches contains ports of the existing polynomial multiplication based
CRC32 code that resides in arch/x86/crypto/crc32-pclmul_asm.S, but since that
algorithm operates on multiples of 16 bytes only, and requires at least 64
bytes of input, the remainders are calculated with the CRC32 instructions,
if available.

These patches apply on top of the CRC-T10DIF series I sent out last Thursday.

https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crc32

Ard Biesheuvel (2):
  crypto: arm64/crc32 - accelerated support based on x86 SSE
    implementation
  crypto: arm/crc32 - accelerated support based on x86 SSE
    implementation

 arch/arm/crypto/Kconfig           |   5 +
 arch/arm/crypto/Makefile          |   2 +
 arch/arm/crypto/crc32-ce-core.S   | 257 ++++++++++++++++++++
 arch/arm/crypto/crc32-ce-glue.c   | 129 ++++++++++
 arch/arm64/crypto/Kconfig         |   6 +
 arch/arm64/crypto/Makefile        |   3 +
 arch/arm64/crypto/crc32-ce-core.S | 246 +++++++++++++++++++
 arch/arm64/crypto/crc32-ce-glue.c | 124 ++++++++++
 8 files changed, 772 insertions(+)
 create mode 100644 arch/arm/crypto/crc32-ce-core.S
 create mode 100644 arch/arm/crypto/crc32-ce-glue.c
 create mode 100644 arch/arm64/crypto/crc32-ce-core.S
 create mode 100644 arch/arm64/crypto/crc32-ce-glue.c

-- 
2.7.4

^ permalink raw reply

* [PATCH] crypto: CTR DRBG - prevent invalid SG mappings
From: Stephan Mueller @ 2016-11-26  8:54 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <3805767.QuT5G9UC4v@positron.chronox.de>

Hi Herbert,

as discussed in another thread, SGs must not be used with stack memory 
pointers. This issue was the culprit to the error I see with the CTR DRBG. The 
attached patch fixes the issue.

---8<---

When using SGs, only heap memory (memory that is valid as per
virt_addr_valid) is allowed to be referenced. The CTR DRBG used to
reference the caller-provided memory directly in an SG. In case the
caller provided stack memory pointers, the SG mapping is not considered
to be valid. In some cases, this would even cause a paging fault.

The change adds a new scratch buffer that is used in case the
caller-provided buffer is deemed not suitable for use in an SG. The
crypto operation of the CTR DRBG produces its output with that scratch
buffer.

The scratch buffer is allocated during allocation time of the CTR DRBG
as its access is protected with the DRBG mutex.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/drbg.c         | 35 +++++++++++++++++++++++++++++++----
 include/crypto/drbg.h |  2 ++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/crypto/drbg.c b/crypto/drbg.c
index 9a95b61..cbbd19f 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -262,6 +262,7 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 			      u8 *inbuf, u32 inbuflen,
 			      u8 *outbuf, u32 outlen);
 #define DRBG_CTR_NULL_LEN 128
+#define DRBG_OUTSCRATCHLEN DRBG_CTR_NULL_LEN
 
 /* BCC function for CTR DRBG as defined in 10.4.3 */
 static int drbg_ctr_bcc(struct drbg_state *drbg,
@@ -1644,6 +1645,9 @@ static int drbg_fini_sym_kernel(struct drbg_state *drbg)
 	kfree(drbg->ctr_null_value_buf);
 	drbg->ctr_null_value = NULL;
 
+	kfree(drbg->outscratchpadbuf);
+	drbg->outscratchpadbuf = NULL;
+
 	return 0;
 }
 
@@ -1708,6 +1712,15 @@ static int drbg_init_sym_kernel(struct drbg_state 
*drbg)
 	drbg->ctr_null_value = (u8 *)PTR_ALIGN(drbg->ctr_null_value_buf,
 					       alignmask + 1);
 
+	drbg->outscratchpadbuf = kmalloc(DRBG_OUTSCRATCHLEN + alignmask,
+					 GFP_KERNEL);
+	if (!drbg->outscratchpadbuf) {
+		drbg_fini_sym_kernel(drbg);
+		return -ENOMEM;
+	}
+	drbg->outscratchpad = (u8 *)PTR_ALIGN(drbg->outscratchpadbuf,
+					      alignmask + 1);
+
 	return alignmask;
 }
 
@@ -1737,15 +1750,22 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 			      u8 *outbuf, u32 outlen)
 {
 	struct scatterlist sg_in;
+	bool virt_addr_valid = virt_addr_valid(outbuf);
+	int ret = 0;
 
 	sg_init_one(&sg_in, inbuf, inlen);
 
 	while (outlen) {
 		u32 cryptlen = min_t(u32, inlen, outlen);
 		struct scatterlist sg_out;
-		int ret;
 
-		sg_init_one(&sg_out, outbuf, cryptlen);
+		/* If output buffer is not valid for SGL, use scratchpad */
+		if (virt_addr_valid)
+			sg_init_one(&sg_out, outbuf, cryptlen);
+		else {
+			cryptlen = min_t(u32, cryptlen, DRBG_OUTSCRATCHLEN);
+			sg_init_one(&sg_out, drbg->outscratchpad, cryptlen);
+		}
 		skcipher_request_set_crypt(drbg->ctr_req, &sg_in, &sg_out,
 					   cryptlen, drbg->V);
 		ret = crypto_skcipher_encrypt(drbg->ctr_req);
@@ -1761,15 +1781,22 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 				break;
 			}
 		default:
-			return ret;
+			goto out;
 		}
 		init_completion(&drbg->ctr_completion);
 
+		if (!virt_addr_valid)
+			memcpy(outbuf, drbg->outscratchpad, cryptlen);
+
 		outlen -= cryptlen;
 		outbuf += cryptlen;
 	}
+	ret = 0;
 
-	return 0;
+out:
+	if (!virt_addr_valid)
+		memzero_explicit(drbg->outscratchpad, DRBG_OUTSCRATCHLEN);
+	return ret;
 }
 #endif /* CONFIG_CRYPTO_DRBG_CTR */
 
diff --git a/include/crypto/drbg.h b/include/crypto/drbg.h
index 61580b1..22f884c 100644
--- a/include/crypto/drbg.h
+++ b/include/crypto/drbg.h
@@ -124,6 +124,8 @@ struct drbg_state {
 	struct skcipher_request *ctr_req;	/* CTR mode request handle */
 	__u8 *ctr_null_value_buf;		/* CTR mode unaligned buffer */
 	__u8 *ctr_null_value;			/* CTR mode aligned zero buf */
+	__u8 *outscratchpadbuf;			/* CTR mode output scratchpad */
+        __u8 *outscratchpad;			/* CTR mode aligned outbuf */
 	struct completion ctr_completion;	/* CTR mode async handler */
 	int ctr_async_err;			/* CTR mode async error */
 
-- 
2.9.3

^ permalink raw reply related

* RE: [PATCH v2 0/2] virtio-crypto: add Linux driver
From: Gonglei (Arei) @ 2016-11-26  9:38 UTC (permalink / raw)
  To: Gonglei (Arei), linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org
  Cc: Luonengjun, mst@redhat.com, stefanha@redhat.com, Huangweidong (C),
	Wubin (H), xin.zeng@intel.com, Claudio Fontana,
	herbert@gondor.apana.org.au, pasic@linux.vnet.ibm.com,
	davem@davemloft.net, Zhoujian (jay, Euler), Hanweidong (Randy),
	arei.gonglei@hotmail.com, cornelia.huck@de.ibm.com,
	Xuquan (Quan Xu), longpeng,
	"salvatore.benedetto@intel.com" <salvat
In-Reply-To: <1479802223-121104-1-git-send-email-arei.gonglei@huawei.com>

Hi,

> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Tuesday, November 22, 2016 4:10 PM
> To: linux-kernel@vger.kernel.org; qemu-devel@nongnu.org;
> virtio-dev@lists.oasis-open.org; virtualization@lists.linux-foundation.org;
> linux-crypto@vger.kernel.org
> Subject: [PATCH v2 0/2] virtio-crypto: add Linux driver
> 
> The virtio crypto device is a virtual cryptography device
> as well as a kind of virtual hardware accelerator for
> virtual machines. The encryption anddecryption requests
> are placed in the data queue and are ultimately handled by
> thebackend crypto accelerators. The second queue is the
> control queue used to create or destroy sessions for
> symmetric algorithms and will control some advanced features
> in the future. The virtio crypto device provides the following
> cryptoservices: CIPHER, MAC, HASH, and AEAD.
> 
> For more information about virtio-crypto device, please see:
>   http://qemu-project.org/Features/VirtioCrypto
> 
> For better reviewing:
> 
> Patch 1 introduces the little edian functions for VIRTIO_1
> devices.
> 
> Patch 2 mainly includes five files:
>  1) virtio_crypto.h is the header file for virtio-crypto device,
> which is based on the virtio-crypto specification.
>  2) virtio_crypto.c is the entry of the driver module,
> which is similar with other virtio devices, such as virtio-net,
> virtio-input etc.
>  3) virtio_crypto_mgr.c is used to manage the virtio
> crypto devices in the system. We support up to 32 virtio-crypto
> devices currently. I use a global list to store the virtio crypto
> devices which refer to Intel QAT driver. Meanwhile, the file
> includs the functions of add/del/search/start/stop for virtio
> crypto devices.
>  4) virtio_crypto_common.h is a private header file for virtio
> crypto driver, includes structure definations, and function declarations.
>  5) virtio_crypto_algs.c is the realization of algs based on Linux Crypto
> Framwork,
> which can register different crypto algorithms. Currently it's only support
> AES-CBC.
> The Crypto guys can mainly focus to this file.
> 
> Actually I have no idea the virtio-crypto driver should be gone in whose
> tree, Michael's or Herbert's?
> 
> Would you give me a feedback? Thanks a lot!
> 
Ping?

Any ideas? Thanks.

Regards,
-Gonglei

> 
> v2:
>  - stop doing DMA from the stack, CONFIG_VMAP_STACK=y [Salvatore]
>  - convert __virtio32/64 to __le32/64 in virtio_crypto.h
>  - remove VIRTIO_CRYPTO_S_STARTED based on the lastest virtio crypto spec.
>  - introduces the little edian functions for VIRTIO_1 devices in patch 1.
> 
> Gonglei (2):
>   virtio: introduce little edian functions for virtio_cread/write#
>     family
>   crypto: add virtio-crypto driver
> 
>  MAINTAINERS                                  |   8 +
>  drivers/crypto/Kconfig                       |   2 +
>  drivers/crypto/Makefile                      |   1 +
>  drivers/crypto/virtio/Kconfig                |  10 +
>  drivers/crypto/virtio/Makefile               |   5 +
>  drivers/crypto/virtio/virtio_crypto.c        | 444
> +++++++++++++++++++++++
>  drivers/crypto/virtio/virtio_crypto_algs.c   | 524
> +++++++++++++++++++++++++++
>  drivers/crypto/virtio/virtio_crypto_common.h | 124 +++++++
>  drivers/crypto/virtio/virtio_crypto_mgr.c    | 258 +++++++++++++
>  include/linux/virtio_config.h                |  45 +++
>  include/uapi/linux/Kbuild                    |   1 +
>  include/uapi/linux/virtio_crypto.h           | 435
> ++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h              |   1 +
>  13 files changed, 1858 insertions(+)
>  create mode 100644 drivers/crypto/virtio/Kconfig
>  create mode 100644 drivers/crypto/virtio/Makefile
>  create mode 100644 drivers/crypto/virtio/virtio_crypto.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
>  create mode 100644 include/uapi/linux/virtio_crypto.h
> 
> --
> 1.8.3.1
> 

^ permalink raw reply

* [PATCH] crypto: vmx - rebuild generated asm when target changes
From: Nicholas Piggin @ 2016-11-26  4:24 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Nicholas Piggin, Leonidas S . Barbosa, Paulo Flabiano Smorigo,
	linux-crypto, Michael Ellerman, linuxppc-dev

Switching from big endian to little endian can fail to regenerate
the crypto assembly properly. Switch to using standard form of
kbuild dependency checking (i.e., use FORCE and if_changed).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 drivers/crypto/vmx/Makefile | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile
index b47da00..16ab2a0 100644
--- a/drivers/crypto/vmx/Makefile
+++ b/drivers/crypto/vmx/Makefile
@@ -1,6 +1,8 @@
 obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o
 vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o
 
+targets += aesp8-ppc.S ghashp8-ppc.S
+
 ifdef CONFIG_CPU_LITTLE_ENDIAN
 TARGET := linux-ppc64le
 else
@@ -11,13 +13,13 @@ TARGET := linux-ppc64
 endif
 endif
 
-quiet_cmd_perl = PERL $@
+quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) $(TARGET) > $(@)
 
-$(src)/aesp8-ppc.S: $(src)/aesp8-ppc.pl
-	$(call cmd,perl)
+$(src)/aesp8-ppc.S: $(src)/aesp8-ppc.pl FORCE
+	$(call if_changed,perl)
   
-$(src)/ghashp8-ppc.S: $(src)/ghashp8-ppc.pl
-	$(call cmd,perl)
+$(src)/ghashp8-ppc.S: $(src)/ghashp8-ppc.pl FORCE
+	$(call if_changed,perl)
 
 .PRECIOUS: $(obj)/aesp8-ppc.S $(obj)/ghashp8-ppc.S
-- 
2.10.2

^ permalink raw reply related

* Re: bug in blkcipher_walk code
From: Stephan Mueller @ 2016-11-25 12:43 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <1833318.YYJALW9cvC@positron.chronox.de>

Am Freitag, 18. November 2016, 12:31:10 CET schrieb Stephan Mueller:

Hi Herbert,

> Hi Herbert,
> 
> Once in a while I seem to trigger a bug in the blkcipher_walk code which I
> cannot track down. This bug happens sporadically where I assume that it has
> something to do with the memory management in the slow path of
> blkcipher_walk.
> 
> I am using the CTR DRBG code that in turn uses the ctr-aes-aesni
> implementation. The bug only appears when I want to obtain a random number
> that is less than the CTR AES block size. In my particular case, I want 4
> bytes from the DRBG.
> 
> The bug happens in arch/x86/crypto/aesni-intel_glue.c:ctr_crypt_final() at
> the line:
> 
> 	memcpy(dst, keystream, nbytes);
> 
> The bug looks like the following:
> 
> [   12.328676] BUG: unable to handle kernel paging request at
> ffffa17ae418b988 [   12.328680] IP: [<ffffffff82060eea>]
> ctr_crypt+0x19a/0x1c0
> [   12.328681] PGD 66fed067
> [   12.328681] PUD 0
> [   12.328681]
> [   12.328683] Oops: 0002 [#1] SMP
> [   12.328692] Modules linked in: bridge(+) stp llc ebtable_nat ip6table_raw
> ip6table_security ip6table_mangle iptable_raw iptable_security
> iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr i2c_piix4
> virtio_net virtio_balloon acpi_cpufreq sch_fq_codel virtio_console
> virtio_blk virtio_pci virtio_ring serio_raw crc32c_intel virtio
> [   12.328693] CPU: 0 PID: 521 Comm: modprobe Not tainted 4.9.0-rc1+ #253
> [   12.328694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.9.1-1.fc24 04/01/2014
> [   12.328694] task: ffffa17ab8453fc0 task.stack: ffffbdafc0744000
> [   12.328696] RIP: 0010:[<ffffffff82060eea>]  [<ffffffff82060eea>]
> ctr_crypt +0x19a/0x1c0
> [   12.328696] RSP: 0018:ffffbdafc0747a60  EFLAGS: 00010002
> [   12.328697] RAX: 0000000032e455a6 RBX: 0000000000000004 RCX:
> 0000000000000002
> [   12.328697] RDX: 0000000000000001 RSI: 0000000000000086 RDI:
> 0000000000000086
> [   12.328698] RBP: ffffbdafc0747b28 R08: ffffa17abc16e900 R09:
> 0000000000000019
> [   12.328698] R10: ffffa17a764f68b0 R11: 000000000002e918 R12:
> ffffbdafc0747b38
> [   12.328698] R13: ffffa17a764f6840 R14: ffffa17ae418b988 R15:
> ffffbdafc0747a70
> [   12.328699] FS:  00007f55f57a6700(0000) GS:ffffa17abfc00000(0000) knlGS:
> 0000000000000000
> [   12.328700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.328700] CR2: ffffa17ae418b988 CR3: 0000000079b26000 CR4:
> 00000000003406f0
> [   12.328703] Stack:
> [   12.328705]  ffffa17abc16e900 ffffa17ab845fd80 2ae7e40732e455a6
> 3a224612a8f9841d
> [   12.328706]  fffffb4e81e117c0 ffffa17ab845fd80 fffffb4e829062c0
> ffffa17ae418b988
> [   12.328707]  ffffbdafc0747ba8 ffffffff00000d80 ffffffff00000004
> ffffbdafc0747bc8
> [   12.328708] Call Trace:
> [   12.328712]  [<ffffffff823e5fd3>] __ablk_encrypt+0x43/0x50
> [   12.328714]  [<ffffffff823e6012>] ablk_encrypt+0x32/0xc0
> [   12.328716]  [<ffffffff823c4f2e>] skcipher_encrypt_ablkcipher+0x5e/0x60
> [   12.328717]  [<ffffffff823dbb80>] drbg_kcapi_sym_ctr+0xb0/0x130
> [   12.328719]  [<ffffffff823de153>] drbg_ctr_generate+0x53/0x80
> 
> 
> Now, the interesting part is the following: the original memory pointer that
> shall be processed by the DRBG is in my example ffffffffc018b988 -- this
> pointer is used until the DRBG invokes crypto_skcipher_encrypt. However,
> when I print out the buffer pointer that is used as dst in the memcpy of
> ctr_crypt_final, I see ffffa17ae418b988 -- i.e. the buffer that causes
> paging failure.
> 
> During tracing the blkcipher_walk code I see that the slow code path is used
> when the request size is smaller than the block size. That slow code path
> allocates new memory that will be used for the dst pointer in
> ctr_crypt_final.
> 
> May I ask you for checking whether the allocation and the memory pointer
> logic has an issue that would cause a paging failure?

Following up this issue, I found the location where the wrong memory pointer 
is produced -- the following call tree is used:

1. set up of SGL with proper pointer

2. skcipher_encrypt_ablkcipher with SGL

3. invocation of ctr_crypt from arch/x86/crypto/aesni-intel_glue.c

4. blkcipher_walk_virt_block

5. blkcipher_walk_first

6. blkcipher_walk_next (this code does not use the code path to allocate a 
page)

7. blkcipher_next_fast

        walk->dst.virt.addr = walk->src.virt.addr;
			-> copy src virt address into dst address pointer

		Now, the diff path is used:
        if (diff) {
                walk->flags |= BLKCIPHER_WALK_DIFF;
                blkcipher_map_dst(walk);
        }

8. blkcipher_map_dst

        walk->dst.virt.addr = scatterwalk_map(&walk->out);

		==> this pointer is wrong

The interesting point is that step 8 gets the low and high bits right, but not 
the bits in the middle:

The real data pointer for the dst buffer is ffffffffc0332988. The data pointer 
used by the crypto API is ffff96a995332988 -- as often as I see the issue, 
this similarity in the pointer values is always there.


Please note that the caller uses a static variable that shall be used as dst 
buffer.

Thanks
Stephan

^ permalink raw reply

* 40147 linux-crypto
From: service @ 2016-11-24 19:05 UTC (permalink / raw)
  To: linux-crypto

[-- Attachment #1: INFO_628278483951873_linux-crypto.zip --]
[-- Type: application/zip, Size: 4860 bytes --]

^ permalink raw reply

* Re: [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM
From: Ard Biesheuvel @ 2016-11-24 17:32 UTC (permalink / raw)
  To: linux-crypto@vger.kernel.org, Herbert Xu,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas,
	Will Deacon, Russell King - ARM Linux
  Cc: Steve Capper, dingtinahong, yangshengkai, YueHaibing, Hanjun Guo,
	Ard Biesheuvel
In-Reply-To: <1480002201-1427-5-git-send-email-ard.biesheuvel@linaro.org>

On 24 November 2016 at 15:43, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This is a straight transliteration of the Intel algorithm implemented
> using SSE and PCLMULQDQ instructions that resides under in the file
> arch/x86/crypto/crct10dif-pcl-asm_64.S.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm/crypto/Kconfig                        |   5 +
>  arch/arm/crypto/Makefile                       |   2 +
>  arch/{arm64 => arm}/crypto/crct10dif-ce-core.S | 457 +++++++++++---------
>  arch/{arm64 => arm}/crypto/crct10dif-ce-glue.c |  23 +-
>  4 files changed, 277 insertions(+), 210 deletions(-)
>

This patch needs the following hunk folded in to avoid breaking the
Thumb2 build:

"""
diff --git a/arch/arm/crypto/crct10dif-ce-core.S
b/arch/arm/crypto/crct10dif-ce-core.S
index 30168b0f8581..4fdbca94dd0c 100644
--- a/arch/arm/crypto/crct10dif-ce-core.S
+++ b/arch/arm/crypto/crct10dif-ce-core.S
@@ -152,7 +152,8 @@ CPU_LE(     vrev64.8        q7, q7                  )
        // XOR the initial_crc value
        veor.8          q0, q0, q10

-       adrl            ip, rk3
+ARM(   adrl            ip, rk3         )
+THUMB( adr             ip, rk3         )
        vld1.64         {q10}, [ip]     // xmm10 has rk3 and rk4
                                        // type of pmull instruction
                                        // will determine which constant to use
"""

Updated patch(es) can be found here
https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=arm-crct10dif

^ permalink raw reply related

* [PATCH 3/4] crypto: arm64/crct10dif - port x86 SSE implementation to arm64
From: Ard Biesheuvel @ 2016-11-24 15:43 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, dingtianhong, yangshengkai, yuehaibing, hanjun.guo,
	Ard Biesheuvel
In-Reply-To: <1480002201-1427-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides under in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S.

Suggested-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig             |   5 +
 arch/arm64/crypto/Makefile            |   3 +
 arch/arm64/crypto/crct10dif-ce-core.S | 518 ++++++++++++++++++++
 arch/arm64/crypto/crct10dif-ce-glue.c |  80 +++
 4 files changed, 606 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..1b50671ffec3 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_CRCT10DIF_ARM64_CE
+	tristate "CRCT10DIF digest algorithm using PMULL instructions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 config CRYPTO_AES_ARM64_CE
 	tristate "AES core cipher using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3cfcfe..36fd3eb4201b 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -17,6 +17,9 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
+obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
+crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
+
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
 
diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
new file mode 100644
index 000000000000..9148ebd3470a
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -0,0 +1,518 @@
+//
+// Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+//
+// Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License version 2 as
+// published by the Free Software Foundation.
+//
+
+//
+// Implement fast CRC-T10DIF computation with SSE and PCLMULQDQ instructions
+//
+// Copyright (c) 2013, Intel Corporation
+//
+// Authors:
+//     Erdinc Ozturk <erdinc.ozturk@intel.com>
+//     Vinodh Gopal <vinodh.gopal@intel.com>
+//     James Guilford <james.guilford@intel.com>
+//     Tim Chen <tim.c.chen@linux.intel.com>
+//
+// This software is available to you under a choice of one of two
+// licenses.  You may choose to be licensed under the terms of the GNU
+// General Public License (GPL) Version 2, available from the file
+// COPYING in the main directory of this source tree, or the
+// OpenIB.org BSD license below:
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// * Redistributions of source code must retain the above copyright
+//   notice, this list of conditions and the following disclaimer.
+//
+// * Redistributions in binary form must reproduce the above copyright
+//   notice, this list of conditions and the following disclaimer in the
+//   documentation and/or other materials provided with the
+//   distribution.
+//
+// * Neither the name of the Intel Corporation nor the names of its
+//   contributors may be used to endorse or promote products derived from
+//   this software without specific prior written permission.
+//
+//
+// THIS SOFTWARE IS PROVIDED BY INTEL CORPORATION ""AS IS"" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL CORPORATION OR
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+//       Function API:
+//       UINT16 crc_t10dif_pcl(
+//               UINT16 init_crc, //initial CRC value, 16 bits
+//               const unsigned char *buf, //buffer pointer to calculate CRC on
+//               UINT64 len //buffer length in bytes (64-bit data)
+//       );
+//
+//       Reference paper titled "Fast CRC Computation for Generic
+//	Polynomials Using PCLMULQDQ Instruction"
+//       URL: http://www.intel.com/content/dam/www/public/us/en/documents
+//  /white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
+//
+//
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.cpu		generic+crypto
+
+	arg1_low32	.req	w0
+	arg2		.req	x1
+	arg3		.req	x2
+
+	vzr		.req	v13
+
+ENTRY(crc_t10dif_pmull)
+	stp		x29, x30, [sp, #-32]!
+	mov		x29, sp
+
+	movi		vzr.16b, #0		// init zero register
+
+	// adjust the 16-bit initial_crc value, scale it to 32 bits
+	lsl		arg1_low32, arg1_low32, #16
+
+	// check if smaller than 256
+	cmp		arg3, #256
+
+	// for sizes less than 128, we can't fold 64B at a time...
+	b.lt		_less_than_128
+
+	// load the initial crc value
+	// crc value does not need to be byte-reflected, but it needs
+	// to be moved to the high part of the register.
+	// because data will be byte-reflected and will align with
+	// initial crc at correct place.
+	movi		v10.16b, #0
+	mov		v10.s[3], arg1_low32		// initial crc
+
+	// receive the initial 64B data, xor the initial crc value
+	ld1		{v0.2d-v3.2d}, [arg2], #0x40
+	ld1		{v4.2d-v7.2d}, [arg2], #0x40
+CPU_LE(	rev64		v0.16b, v0.16b		)
+CPU_LE(	rev64		v1.16b, v1.16b		)
+CPU_LE(	rev64		v2.16b, v2.16b		)
+CPU_LE(	rev64		v3.16b, v3.16b		)
+CPU_LE(	rev64		v4.16b, v4.16b		)
+CPU_LE(	rev64		v5.16b, v5.16b		)
+CPU_LE(	rev64		v6.16b, v6.16b		)
+CPU_LE(	rev64		v7.16b, v7.16b		)
+
+	ext		v0.16b, v0.16b, v0.16b, #8
+	ext		v1.16b, v1.16b, v1.16b, #8
+	ext		v2.16b, v2.16b, v2.16b, #8
+	ext		v3.16b, v3.16b, v3.16b, #8
+	ext		v4.16b, v4.16b, v4.16b, #8
+	ext		v5.16b, v5.16b, v5.16b, #8
+	ext		v6.16b, v6.16b, v6.16b, #8
+	ext		v7.16b, v7.16b, v7.16b, #8
+
+	// XOR the initial_crc value
+	eor		v0.16b, v0.16b, v10.16b
+
+	ldr		q10, rk3	// xmm10 has rk3 and rk4
+					// type of pmull instruction
+					// will determine which constant to use
+
+	//
+	// we subtract 256 instead of 128 to save one instruction from the loop
+	//
+	sub		arg3, arg3, #256
+
+	// at this section of the code, there is 64*x+y (0<=y<64) bytes of
+	// buffer. The _fold_64_B_loop will fold 64B at a time
+	// until we have 64+y Bytes of buffer
+
+
+	// fold 64B at a time. This section of the code folds 4 vector
+	// registers in parallel
+_fold_64_B_loop:
+
+	.macro		fold64, reg1, reg2
+	ld1		{v11.2d-v12.2d}, [arg2], #0x20
+CPU_LE(	rev64		v11.16b, v11.16b	)
+CPU_LE(	rev64		v12.16b, v12.16b	)
+	ext		v11.16b, v11.16b, v11.16b, #8
+	ext		v12.16b, v12.16b, v12.16b, #8
+
+	pmull2		v8.1q, \reg1\().2d, v10.2d
+	pmull		\reg1\().1q, \reg1\().1d, v10.1d
+	pmull2		v9.1q, \reg2\().2d, v10.2d
+	pmull		\reg2\().1q, \reg2\().1d, v10.1d
+
+	eor		\reg1\().16b, \reg1\().16b, v11.16b
+	eor		\reg2\().16b, \reg2\().16b, v12.16b
+	eor		\reg1\().16b, \reg1\().16b, v8.16b
+	eor		\reg2\().16b, \reg2\().16b, v9.16b
+	.endm
+
+	fold64		v0, v1
+	fold64		v2, v3
+	fold64		v4, v5
+	fold64		v6, v7
+
+	subs		arg3, arg3, #128
+
+	// check if there is another 64B in the buffer to be able to fold
+	b.ge		_fold_64_B_loop
+
+	// at this point, the buffer pointer is pointing at the last y Bytes
+	// of the buffer the 64B of folded data is in 4 of the vector
+	// registers: v0, v1, v2, v3
+
+	// fold the 8 vector registers to 1 vector register with different
+	// constants
+
+	.macro		fold16, rk, reg
+	ldr		q10, \rk
+	pmull		v8.1q, \reg\().1d, v10.1d
+	pmull2		\reg\().1q, \reg\().2d, v10.2d
+	eor		v7.16b, v7.16b, v8.16b
+	eor		v7.16b, v7.16b, \reg\().16b
+	.endm
+
+	fold16		rk9, v0
+	fold16		rk11, v1
+	fold16		rk13, v2
+	fold16		rk15, v3
+	fold16		rk17, v4
+	fold16		rk19, v5
+	fold16		rk1, v6
+
+	// instead of 64, we add 48 to the loop counter to save 1 instruction
+	// from the loop instead of a cmp instruction, we use the negative
+	// flag with the jl instruction
+	adds		arg3, arg3, #(128-16)
+	b.lt		_final_reduction_for_128
+
+	// now we have 16+y bytes left to reduce. 16 Bytes is in register v7
+	// and the rest is in memory. We can fold 16 bytes at a time if y>=16
+	// continue folding 16B at a time
+
+_16B_reduction_loop:
+	pmull		v8.1q, v7.1d, v10.1d
+	pmull2		v7.1q, v7.2d, v10.2d
+	eor		v7.16b, v7.16b, v8.16b
+
+	ld1		{v0.2d}, [arg2], #16
+CPU_LE(	rev64		v0.16b, v0.16b		)
+	ext		v0.16b, v0.16b, v0.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+	subs		arg3, arg3, #16
+
+	// instead of a cmp instruction, we utilize the flags with the
+	// jge instruction equivalent of: cmp arg3, 16-16
+	// check if there is any more 16B in the buffer to be able to fold
+	b.ge		_16B_reduction_loop
+
+	// now we have 16+z bytes left to reduce, where 0<= z < 16.
+	// first, we reduce the data in the xmm7 register
+
+_final_reduction_for_128:
+	// check if any more data to fold. If not, compute the CRC of
+	// the final 128 bits
+	adds		arg3, arg3, #16
+	b.eq		_128_done
+
+	// here we are getting data that is less than 16 bytes.
+	// since we know that there was data before the pointer, we can
+	// offset the input pointer before the actual point, to receive
+	// exactly 16 bytes. after that the registers need to be adjusted.
+_get_last_two_regs:
+	mov		v2.16b, v7.16b
+
+	add		arg2, arg2, arg3
+	sub		arg2, arg2, #16
+	ld1		{v1.2d}, [arg2]
+CPU_LE(	rev64		v1.16b, v1.16b		)
+	ext		v1.16b, v1.16b, v1.16b, #8
+
+	// get rid of the extra data that was loaded before
+	// load the shift constant
+	adr		x4, tbl_shf_table + 16
+	sub		x4, x4, arg3
+	ld1		{v0.16b}, [x4]
+
+	// shift v2 to the left by arg3 bytes
+	tbl		v2.16b, {v2.16b}, v0.16b
+
+	// shift v7 to the right by 16-arg3 bytes
+	movi		v9.16b, #0x80
+	eor		v0.16b, v0.16b, v9.16b
+	tbl		v7.16b, {v7.16b}, v0.16b
+
+	// blend
+	sshr		v0.16b, v0.16b, #7	// convert to 8-bit mask
+	bsl		v0.16b, v2.16b, v1.16b
+
+	// fold 16 Bytes
+	pmull		v8.1q, v7.1d, v10.1d
+	pmull2		v7.1q, v7.2d, v10.2d
+	eor		v7.16b, v7.16b, v8.16b
+	eor		v7.16b, v7.16b, v0.16b
+
+_128_done:
+	// compute crc of a 128-bit value
+	ldr		q10, rk5		// rk5 and rk6 in xmm10
+
+	// 64b fold
+	mov		v0.16b, v7.16b
+	ext		v7.16b, v7.16b, v7.16b, #8
+	pmull		v7.1q, v7.1d, v10.1d
+	ext		v0.16b, vzr.16b, v0.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	// 32b fold
+	mov		v0.16b, v7.16b
+	mov		v0.s[3], vzr.s[0]
+	ext		v7.16b, v7.16b, vzr.16b, #12
+	ext		v9.16b, v10.16b, v10.16b, #8
+	pmull		v7.1q, v7.1d, v9.1d
+	eor		v7.16b, v7.16b, v0.16b
+
+	// barrett reduction
+_barrett:
+	ldr		q10, rk7
+	mov		v0.16b, v7.16b
+	ext		v7.16b, v7.16b, v7.16b, #8
+
+	pmull		v7.1q, v7.1d, v10.1d
+	ext		v7.16b, vzr.16b, v7.16b, #12
+	pmull2		v7.1q, v7.2d, v10.2d
+	ext		v7.16b, vzr.16b, v7.16b, #12
+	eor		v7.16b, v7.16b, v0.16b
+	mov		w0, v7.s[1]
+
+_cleanup:
+	// scale the result back to 16 bits
+	lsr		x0, x0, #16
+	ldp		x29, x30, [sp], #32
+	ret
+
+	.align		4
+_less_than_128:
+
+	// check if there is enough buffer to be able to fold 16B at a time
+	cmp		arg3, #32
+	b.lt		_less_than_32
+
+	// now if there is, load the constants
+	ldr		q10, rk1		// rk1 and rk2 in xmm10
+
+	movi		v0.16b, #0
+	mov		v0.s[3], arg1_low32	// get the initial crc value
+	ld1		{v7.2d}, [arg2], #0x10
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	// update the counter. subtract 32 instead of 16 to save one
+	// instruction from the loop
+	sub		arg3, arg3, #32
+
+	b		_16B_reduction_loop
+
+	.align		4
+_less_than_32:
+	cbz		arg3, _cleanup
+
+	movi		v0.16b, #0
+	mov		v0.s[3], arg1_low32	// get the initial crc value
+
+	cmp		arg3, #16
+	b.eq		_exact_16_left
+	b.lt		_less_than_16_left
+
+	ld1		{v7.2d}, [arg2], #0x10
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+	sub		arg3, arg3, #16
+	ldr		q10, rk1		// rk1 and rk2 in xmm10
+	b		_get_last_two_regs
+
+	.align		4
+_less_than_16_left:
+	// use stack space to load data less than 16 bytes, zero-out
+	// the 16B in memory first.
+
+	add		x11, sp, #0x10
+	stp		xzr, xzr, [x11]
+
+	cmp		arg3, #4
+	b.lt		_only_less_than_4
+
+	// backup the counter value
+	mov		x9, arg3
+	tbz		arg3, #3, _less_than_8_left
+
+	// load 8 Bytes
+	ldr		x0, [arg2], #8
+	str		x0, [x11], #8
+	sub		arg3, arg3, #8
+
+_less_than_8_left:
+	tbz		arg3, #2, _less_than_4_left
+
+	// load 4 Bytes
+	ldr		w0, [arg2], #4
+	str		w0, [x11], #4
+	sub		arg3, arg3, #4
+
+_less_than_4_left:
+	tbz		arg3, #1, _less_than_2_left
+
+	// load 2 Bytes
+	ldrh		w0, [arg2], #2
+	strh		w0, [x11], #2
+	sub		arg3, arg3, #2
+
+_less_than_2_left:
+	cbz		arg3, _zero_left
+
+	// load 1 Byte
+	ldrb		w0, [arg2]
+	strb		w0, [x11]
+
+_zero_left:
+	add		x11, sp, #0x10
+	ld1		{v7.2d}, [x11]
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	// shl r9, 4
+	adr		x0, tbl_shf_table + 16
+	sub		x0, x0, x9
+	ld1		{v0.16b}, [x0]
+	movi		v9.16b, #0x80
+	eor		v0.16b, v0.16b, v9.16b
+	tbl		v7.16b, {v7.16b}, v0.16b
+
+	b		_128_done
+
+	.align		4
+_exact_16_left:
+	ld1		{v7.2d}, [arg2]
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b	// xor the initial crc value
+
+	b		_128_done
+
+_only_less_than_4:
+	cmp		arg3, #3
+	b.lt		_only_less_than_3
+
+	// load 3 Bytes
+	ldrh		w0, [arg2]
+	strh		w0, [x11]
+
+	ldrb		w0, [arg2, #2]
+	strb		w0, [x11, #2]
+
+	ld1		{v7.2d}, [x11]
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	ext		v7.16b, v7.16b, vzr.16b, #5
+	b		_barrett
+
+_only_less_than_3:
+	cmp		arg3, #2
+	b.lt		_only_less_than_2
+
+	// load 2 Bytes
+	ldrh		w0, [arg2]
+	strh		w0, [x11]
+
+	ld1		{v7.2d}, [x11]
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	ext		v7.16b, v7.16b, vzr.16b, #6
+	b		_barrett
+
+_only_less_than_2:
+
+	// load 1 Byte
+	ldrb		w0, [arg2]
+	strb		w0, [x11]
+
+	ld1		{v7.2d}, [x11]
+CPU_LE(	rev64		v7.16b, v7.16b		)
+	ext		v7.16b, v7.16b, v7.16b, #8
+	eor		v7.16b, v7.16b, v0.16b
+
+	ext		v7.16b, v7.16b, vzr.16b, #7
+	b		_barrett
+
+ENDPROC(crc_t10dif_pmull)
+
+// precomputed constants
+// these constants are precomputed from the poly:
+// 0x8bb70000 (0x8bb7 scaled to 32 bits)
+	.align		4
+// Q = 0x18BB70000
+// rk1 = 2^(32*3) mod Q << 32
+// rk2 = 2^(32*5) mod Q << 32
+// rk3 = 2^(32*15) mod Q << 32
+// rk4 = 2^(32*17) mod Q << 32
+// rk5 = 2^(32*3) mod Q << 32
+// rk6 = 2^(32*2) mod Q << 32
+// rk7 = floor(2^64/Q)
+// rk8 = Q
+
+rk1:	.octa		0x06df0000000000002d56000000000000
+rk3:	.octa		0x7cf50000000000009d9d000000000000
+rk5:	.octa		0x13680000000000002d56000000000000
+rk7:	.octa		0x000000018bb7000000000001f65a57f8
+rk9:	.octa		0xbfd6000000000000ceae000000000000
+rk11:	.octa		0x713c0000000000001e16000000000000
+rk13:	.octa		0x80a6000000000000f7f9000000000000
+rk15:	.octa		0xe658000000000000044c000000000000
+rk17:	.octa		0xa497000000000000ad18000000000000
+rk19:	.octa		0xe7b50000000000006ee3000000000000
+
+tbl_shf_table:
+// use these values for shift constants for the tbl/tbx instruction
+// different alignments result in values as shown:
+//	DDQ 0x008f8e8d8c8b8a898887868584838281 # shl 15 (16-1) / shr1
+//	DDQ 0x01008f8e8d8c8b8a8988878685848382 # shl 14 (16-3) / shr2
+//	DDQ 0x0201008f8e8d8c8b8a89888786858483 # shl 13 (16-4) / shr3
+//	DDQ 0x030201008f8e8d8c8b8a898887868584 # shl 12 (16-4) / shr4
+//	DDQ 0x04030201008f8e8d8c8b8a8988878685 # shl 11 (16-5) / shr5
+//	DDQ 0x0504030201008f8e8d8c8b8a89888786 # shl 10 (16-6) / shr6
+//	DDQ 0x060504030201008f8e8d8c8b8a898887 # shl 9  (16-7) / shr7
+//	DDQ 0x07060504030201008f8e8d8c8b8a8988 # shl 8  (16-8) / shr8
+//	DDQ 0x0807060504030201008f8e8d8c8b8a89 # shl 7  (16-9) / shr9
+//	DDQ 0x090807060504030201008f8e8d8c8b8a # shl 6  (16-10) / shr10
+//	DDQ 0x0a090807060504030201008f8e8d8c8b # shl 5  (16-11) / shr11
+//	DDQ 0x0b0a090807060504030201008f8e8d8c # shl 4  (16-12) / shr12
+//	DDQ 0x0c0b0a090807060504030201008f8e8d # shl 3  (16-13) / shr13
+//	DDQ 0x0d0c0b0a090807060504030201008f8e # shl 2  (16-14) / shr14
+//	DDQ 0x0e0d0c0b0a090807060504030201008f # shl 1  (16-15) / shr15
+
+	.byte		 0x0, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87
+	.byte		0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f
+	.byte		 0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7
+	.byte		 0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe , 0x0
diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
new file mode 100644
index 000000000000..d11f33dae79c
--- /dev/null
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -0,0 +1,80 @@
+/*
+ * Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpufeature.h>
+#include <linux/crc-t10dif.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+#include <crypto/internal/hash.h>
+
+#include <asm/neon.h>
+
+asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
+
+static int crct10dif_init(struct shash_desc *desc)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*crc = 0;
+	return 0;
+}
+
+static int crct10dif_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	kernel_neon_begin_partial(14);
+	*crc = crc_t10dif_pmull(*crc, data, length);
+	kernel_neon_end();
+
+	return 0;
+}
+
+static int crct10dif_final(struct shash_desc *desc, u8 *out)
+{
+	u16 *crc = shash_desc_ctx(desc);
+
+	*(u16 *)out = *crc;
+	return 0;
+}
+
+static struct shash_alg crc_t10dif_alg = {
+	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
+	.init			= crct10dif_init,
+	.update			= crct10dif_update,
+	.final			= crct10dif_final,
+
+	.descsize		= CRC_T10DIF_DIGEST_SIZE,
+	.base.cra_name		= "crct10dif",
+	.base.cra_driver_name	= "crct10dif-arm64-ce",
+	.base.cra_priority	= 200,
+	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init crc_t10dif_mod_init(void)
+{
+	return crypto_register_shash(&crc_t10dif_alg);
+}
+
+static void __exit crc_t10dif_mod_exit(void)
+{
+	crypto_unregister_shash(&crc_t10dif_alg);
+}
+
+module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
+module_exit(crc_t10dif_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/4] crypto: testmgr - add/enhance test cases for CRC-T10DIF
From: Ard Biesheuvel @ 2016-11-24 15:43 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, dingtianhong, yangshengkai, yuehaibing, hanjun.guo,
	Ard Biesheuvel
In-Reply-To: <1480002201-1427-1-git-send-email-ard.biesheuvel@linaro.org>

The existing test cases only exercise a small slice of the various
possible code paths through the x86 SSE/PCLMULQDQ implementation,
and the upcoming ports of it for arm64. So add one that exceeds 256
bytes in size, and convert another to a chunked test.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/testmgr.h | 70 ++++++++++++--------
 1 file changed, 42 insertions(+), 28 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index e64a4ef9d8ca..b7cd41b25a2a 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -1334,36 +1334,50 @@ static struct hash_testvec rmd320_tv_template[] = {
 	}
 };
 
-#define CRCT10DIF_TEST_VECTORS	3
+#define CRCT10DIF_TEST_VECTORS	ARRAY_SIZE(crct10dif_tv_template)
 static struct hash_testvec crct10dif_tv_template[] = {
 	{
-		.plaintext = "abc",
-		.psize  = 3,
-#ifdef __LITTLE_ENDIAN
-		.digest = "\x3b\x44",
-#else
-		.digest = "\x44\x3b",
-#endif
-	}, {
-		.plaintext = "1234567890123456789012345678901234567890"
-			     "123456789012345678901234567890123456789",
-		.psize	= 79,
-#ifdef __LITTLE_ENDIAN
-		.digest	= "\x70\x4b",
-#else
-		.digest	= "\x4b\x70",
-#endif
-	}, {
-		.plaintext =
-		"abcddddddddddddddddddddddddddddddddddddddddddddddddddddd",
-		.psize  = 56,
-#ifdef __LITTLE_ENDIAN
-		.digest = "\xe3\x9c",
-#else
-		.digest = "\x9c\xe3",
-#endif
-		.np     = 2,
-		.tap    = { 28, 28 }
+		.plaintext	= "abc",
+		.psize		= 3,
+		.digest		= (u8 *)(u16 []){ 0x443b },
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 79,
+		.digest 	= (u8 *)(u16 []){ 0x4b70 },
+		.np		= 2,
+		.tap		= { 63, 16 },
+	}, {
+		.plaintext	= "abcdddddddddddddddddddddddddddddddddddddddd"
+				  "ddddddddddddd",
+		.psize		= 56,
+		.digest		= (u8 *)(u16 []){ 0x9ce3 },
+		.np		= 8,
+		.tap		= { 1, 2, 28, 7, 6, 5, 4, 3 },
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 319,
+		.digest		= (u8 *)((u16 []){ 0x44c6 }),
+	}, {
+		.plaintext 	= "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "1234567890123456789012345678901234567890"
+				  "123456789012345678901234567890123456789",
+		.psize		= 319,
+		.digest		= (u8 *)((u16 []){ 0x44c6 }),
+		.np		= 4,
+		.tap		= { 1, 255, 57, 6 },
 	}
 };
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/4] crypto: testmgr - avoid overlap in chunked tests
From: Ard Biesheuvel @ 2016-11-24 15:43 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, dingtianhong, yangshengkai, yuehaibing, hanjun.guo,
	Ard Biesheuvel
In-Reply-To: <1480002201-1427-1-git-send-email-ard.biesheuvel@linaro.org>

The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/testmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 62dffa0028ac..15650597dcc9 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -62,7 +62,7 @@ int alg_test(const char *driver, const char *alg, u32 type, u32 mask)
  */
 #define IDX1		32
 #define IDX2		32400
-#define IDX3		1
+#define IDX3		511
 #define IDX4		8193
 #define IDX5		22222
 #define IDX6		17101
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/4] crypto: CRCT10DIF support for ARM and arm64
From: Ard Biesheuvel @ 2016-11-24 15:43 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, dingtianhong, yangshengkai, yuehaibing, hanjun.guo,
	Ard Biesheuvel

First of all, apologies to Yue Haibing for stealing his thunder, to some
extent. But after reviewing (and replying to) his patch, I noticed that his
code is not original code, but simply a transliteration of the existing Intel
code that resides in arch/x86/crypto/crct10dif-pcl-asm_64.S, but with the
license and copyright statement removed. 

So, if we are going to transliterate code, let's credit the original authors,
even if the resulting code does not look like the code you started out with.

Then, I noticed that we could stay *much* closer to the original, and that
there is no need for jump tables or computed gotos at all. So I got a bit
carried away, and ended up reimplementing the whole thing, for both arm and64
and ARM.

Patch #1 fixes an issue in testmgr that results in spurious false negatives
in the chunking tests if the third chunk exceeds 31 bytes.

Patch #2 expands the existing CRCT10DIF test cases, to ensure that all
code paths are actually covered.

Patch #3 is a straight transliteration of the Intel code to arm64.

Patch #4 is a straight transliteration of the Intel code to ARM. This patch
is against patch #3 (using --find-copies-harder) so that it is easy to
see how the ARM code deviates from the arm64 code.

NOTE: this code uses the 64x64->128 bit polynomial multiply instruction,
which is only available on cores that implement the v8 Crypto Extensions.

Ard Biesheuvel (4):
  crypto: testmgr - avoid overlap in chunked tests
  crypto: testmgr - add/enhance test cases for CRC-T10DIF
  crypto: arm64/crct10dif - port x86 SSE implementation to arm64
  crypto: arm/crct10dif - port x86 SSE implementation to ARM

 arch/arm/crypto/Kconfig               |   5 +
 arch/arm/crypto/Makefile              |   2 +
 arch/arm/crypto/crct10dif-ce-core.S   | 569 ++++++++++++++++++++
 arch/arm/crypto/crct10dif-ce-glue.c   |  89 +++
 arch/arm64/crypto/Kconfig             |   5 +
 arch/arm64/crypto/Makefile            |   3 +
 arch/arm64/crypto/crct10dif-ce-core.S | 518 ++++++++++++++++++
 arch/arm64/crypto/crct10dif-ce-glue.c |  80 +++
 crypto/testmgr.c                      |   2 +-
 crypto/testmgr.h                      |  70 ++-
 10 files changed, 1314 insertions(+), 29 deletions(-)
 create mode 100644 arch/arm/crypto/crct10dif-ce-core.S
 create mode 100644 arch/arm/crypto/crct10dif-ce-glue.c
 create mode 100644 arch/arm64/crypto/crct10dif-ce-core.S
 create mode 100644 arch/arm64/crypto/crct10dif-ce-glue.c

-- 
2.7.4

^ permalink raw reply

* [PATCH 4/4] crypto: arm/crct10dif - port x86 SSE implementation to ARM
From: Ard Biesheuvel @ 2016-11-24 15:43 UTC (permalink / raw)
  To: linux-crypto, herbert, linux-arm-kernel, catalin.marinas,
	will.deacon, linux
  Cc: steve.capper, Ard Biesheuvel, yuehaibing, hanjun.guo,
	dingtianhong, yangshengkai
In-Reply-To: <1480002201-1427-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides under in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig                        |   5 +
 arch/arm/crypto/Makefile                       |   2 +
 arch/{arm64 => arm}/crypto/crct10dif-ce-core.S | 457 +++++++++++---------
 arch/{arm64 => arm}/crypto/crct10dif-ce-glue.c |  23 +-
 4 files changed, 277 insertions(+), 210 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 27ed1b1cd1d7..fce801fa52a1 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -120,4 +120,9 @@ config CRYPTO_GHASH_ARM_CE
 	  that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
 	  that is part of the ARMv8 Crypto Extensions
 
+config CRYPTO_CRCT10DIF_ARM_CE
+	tristate "CRCT10DIF digest algorithm using PMULL instructions"
+	depends on KERNEL_MODE_NEON && CRC_T10DIF
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index fc5150702b64..fc77265014b7 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -13,6 +13,7 @@ ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
+ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
 
 ifneq ($(ce-obj-y)$(ce-obj-m),)
 ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
@@ -36,6 +37,7 @@ sha1-arm-ce-y	:= sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y	:= sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
+crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
similarity index 60%
copy from arch/arm64/crypto/crct10dif-ce-core.S
copy to arch/arm/crypto/crct10dif-ce-core.S
index 9148ebd3470a..30168b0f8581 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm/crypto/crct10dif-ce-core.S
@@ -1,5 +1,5 @@
 //
-// Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+// Accelerated CRC-T10DIF using ARM NEON and Crypto Extensions instructions
 //
 // Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
 //
@@ -71,20 +71,43 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>
 
-	.text
-	.cpu		generic+crypto
-
-	arg1_low32	.req	w0
-	arg2		.req	x1
-	arg3		.req	x2
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define CPU_LE(code...)
+#else
+#define CPU_LE(code...)		code
+#endif
 
-	vzr		.req	v13
+	.text
+	.fpu		crypto-neon-fp-armv8
+
+	arg1_low32	.req	r0
+	arg2		.req	r1
+	arg3		.req	r2
+
+	qzr		.req	q13
+
+	q0l		.req	d0
+	q0h		.req	d1
+	q1l		.req	d2
+	q1h		.req	d3
+	q2l		.req	d4
+	q2h		.req	d5
+	q3l		.req	d6
+	q3h		.req	d7
+	q4l		.req	d8
+	q4h		.req	d9
+	q5l		.req	d10
+	q5h		.req	d11
+	q6l		.req	d12
+	q6h		.req	d13
+	q7l		.req	d14
+	q7h		.req	d15
 
 ENTRY(crc_t10dif_pmull)
-	stp		x29, x30, [sp, #-32]!
-	mov		x29, sp
+	push		{r4, lr}
+	sub		sp, sp, #0x10
 
-	movi		vzr.16b, #0		// init zero register
+	vmov.i8		qzr, #0			// init zero register
 
 	// adjust the 16-bit initial_crc value, scale it to 32 bits
 	lsl		arg1_low32, arg1_low32, #16
@@ -93,41 +116,44 @@ ENTRY(crc_t10dif_pmull)
 	cmp		arg3, #256
 
 	// for sizes less than 128, we can't fold 64B at a time...
-	b.lt		_less_than_128
+	blt		_less_than_128
 
 	// load the initial crc value
 	// crc value does not need to be byte-reflected, but it needs
 	// to be moved to the high part of the register.
 	// because data will be byte-reflected and will align with
 	// initial crc at correct place.
-	movi		v10.16b, #0
-	mov		v10.s[3], arg1_low32		// initial crc
+	vmov		s0, arg1_low32		// initial crc
+	vext.8		q10, qzr, q0, #4
 
 	// receive the initial 64B data, xor the initial crc value
-	ld1		{v0.2d-v3.2d}, [arg2], #0x40
-	ld1		{v4.2d-v7.2d}, [arg2], #0x40
-CPU_LE(	rev64		v0.16b, v0.16b		)
-CPU_LE(	rev64		v1.16b, v1.16b		)
-CPU_LE(	rev64		v2.16b, v2.16b		)
-CPU_LE(	rev64		v3.16b, v3.16b		)
-CPU_LE(	rev64		v4.16b, v4.16b		)
-CPU_LE(	rev64		v5.16b, v5.16b		)
-CPU_LE(	rev64		v6.16b, v6.16b		)
-CPU_LE(	rev64		v7.16b, v7.16b		)
-
-	ext		v0.16b, v0.16b, v0.16b, #8
-	ext		v1.16b, v1.16b, v1.16b, #8
-	ext		v2.16b, v2.16b, v2.16b, #8
-	ext		v3.16b, v3.16b, v3.16b, #8
-	ext		v4.16b, v4.16b, v4.16b, #8
-	ext		v5.16b, v5.16b, v5.16b, #8
-	ext		v6.16b, v6.16b, v6.16b, #8
-	ext		v7.16b, v7.16b, v7.16b, #8
+	vld1.64		{q0-q1}, [arg2]!
+	vld1.64		{q2-q3}, [arg2]!
+	vld1.64		{q4-q5}, [arg2]!
+	vld1.64		{q6-q7}, [arg2]!
+CPU_LE(	vrev64.8	q0, q0			)
+CPU_LE(	vrev64.8	q1, q1			)
+CPU_LE(	vrev64.8	q2, q2			)
+CPU_LE(	vrev64.8	q3, q3			)
+CPU_LE(	vrev64.8	q4, q4			)
+CPU_LE(	vrev64.8	q5, q5			)
+CPU_LE(	vrev64.8	q6, q6			)
+CPU_LE(	vrev64.8	q7, q7			)
+
+	vext.8		q0, q0, q0, #8
+	vext.8		q1, q1, q1, #8
+	vext.8		q2, q2, q2, #8
+	vext.8		q3, q3, q3, #8
+	vext.8		q4, q4, q4, #8
+	vext.8		q5, q5, q5, #8
+	vext.8		q6, q6, q6, #8
+	vext.8		q7, q7, q7, #8
 
 	// XOR the initial_crc value
-	eor		v0.16b, v0.16b, v10.16b
+	veor.8		q0, q0, q10
 
-	ldr		q10, rk3	// xmm10 has rk3 and rk4
+	adrl		ip, rk3
+	vld1.64		{q10}, [ip]	// xmm10 has rk3 and rk4
 					// type of pmull instruction
 					// will determine which constant to use
 
@@ -146,32 +172,32 @@ CPU_LE(	rev64		v7.16b, v7.16b		)
 _fold_64_B_loop:
 
 	.macro		fold64, reg1, reg2
-	ld1		{v11.2d-v12.2d}, [arg2], #0x20
-CPU_LE(	rev64		v11.16b, v11.16b	)
-CPU_LE(	rev64		v12.16b, v12.16b	)
-	ext		v11.16b, v11.16b, v11.16b, #8
-	ext		v12.16b, v12.16b, v12.16b, #8
-
-	pmull2		v8.1q, \reg1\().2d, v10.2d
-	pmull		\reg1\().1q, \reg1\().1d, v10.1d
-	pmull2		v9.1q, \reg2\().2d, v10.2d
-	pmull		\reg2\().1q, \reg2\().1d, v10.1d
-
-	eor		\reg1\().16b, \reg1\().16b, v11.16b
-	eor		\reg2\().16b, \reg2\().16b, v12.16b
-	eor		\reg1\().16b, \reg1\().16b, v8.16b
-	eor		\reg2\().16b, \reg2\().16b, v9.16b
+	vld1.64		{q11-q12}, [arg2]!
+CPU_LE(	vrev64.8	q11, q11		)
+CPU_LE(	vrev64.8	q12, q12		)
+	vext.8		q11, q11, q11, #8
+	vext.8		q12, q12, q12, #8
+
+	vmull.p64	q8, \reg1\()h, d21
+	vmull.p64	\reg1\(), \reg1\()l, d20
+	vmull.p64	q9, \reg2\()h, d21
+	vmull.p64	\reg2\(), \reg2\()l, d20
+
+	veor.8		\reg1, \reg1, q11
+	veor.8		\reg2, \reg2, q12
+	veor.8		\reg1, \reg1, q8
+	veor.8		\reg2, \reg2, q9
 	.endm
 
-	fold64		v0, v1
-	fold64		v2, v3
-	fold64		v4, v5
-	fold64		v6, v7
+	fold64		q0, q1
+	fold64		q2, q3
+	fold64		q4, q5
+	fold64		q6, q7
 
 	subs		arg3, arg3, #128
 
 	// check if there is another 64B in the buffer to be able to fold
-	b.ge		_fold_64_B_loop
+	bge		_fold_64_B_loop
 
 	// at this point, the buffer pointer is pointing at the last y Bytes
 	// of the buffer the 64B of folded data is in 4 of the vector
@@ -181,46 +207,47 @@ CPU_LE(	rev64		v12.16b, v12.16b	)
 	// constants
 
 	.macro		fold16, rk, reg
-	ldr		q10, \rk
-	pmull		v8.1q, \reg\().1d, v10.1d
-	pmull2		\reg\().1q, \reg\().2d, v10.2d
-	eor		v7.16b, v7.16b, v8.16b
-	eor		v7.16b, v7.16b, \reg\().16b
+	vldr		d20, \rk
+	vldr		d21, \rk + 8
+	vmull.p64	q8, \reg\()l, d20
+	vmull.p64	\reg\(), \reg\()h, d21
+	veor.8		q7, q7, q8
+	veor.8		q7, q7, \reg
 	.endm
 
-	fold16		rk9, v0
-	fold16		rk11, v1
-	fold16		rk13, v2
-	fold16		rk15, v3
-	fold16		rk17, v4
-	fold16		rk19, v5
-	fold16		rk1, v6
+	fold16		rk9, q0
+	fold16		rk11, q1
+	fold16		rk13, q2
+	fold16		rk15, q3
+	fold16		rk17, q4
+	fold16		rk19, q5
+	fold16		rk1, q6
 
 	// instead of 64, we add 48 to the loop counter to save 1 instruction
 	// from the loop instead of a cmp instruction, we use the negative
 	// flag with the jl instruction
 	adds		arg3, arg3, #(128-16)
-	b.lt		_final_reduction_for_128
+	blt		_final_reduction_for_128
 
 	// now we have 16+y bytes left to reduce. 16 Bytes is in register v7
 	// and the rest is in memory. We can fold 16 bytes at a time if y>=16
 	// continue folding 16B at a time
 
 _16B_reduction_loop:
-	pmull		v8.1q, v7.1d, v10.1d
-	pmull2		v7.1q, v7.2d, v10.2d
-	eor		v7.16b, v7.16b, v8.16b
-
-	ld1		{v0.2d}, [arg2], #16
-CPU_LE(	rev64		v0.16b, v0.16b		)
-	ext		v0.16b, v0.16b, v0.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vmull.p64	q8, d14, d20
+	vmull.p64	q7, d15, d21
+	veor.8		q7, q7, q8
+
+	vld1.64		{q0}, [arg2]!
+CPU_LE(	vrev64.8	q0, q0		)
+	vext.8		q0, q0, q0, #8
+	veor.8		q7, q7, q0
 	subs		arg3, arg3, #16
 
 	// instead of a cmp instruction, we utilize the flags with the
 	// jge instruction equivalent of: cmp arg3, 16-16
 	// check if there is any more 16B in the buffer to be able to fold
-	b.ge		_16B_reduction_loop
+	bge		_16B_reduction_loop
 
 	// now we have 16+z bytes left to reduce, where 0<= z < 16.
 	// first, we reduce the data in the xmm7 register
@@ -229,99 +256,104 @@ _final_reduction_for_128:
 	// check if any more data to fold. If not, compute the CRC of
 	// the final 128 bits
 	adds		arg3, arg3, #16
-	b.eq		_128_done
+	beq		_128_done
 
 	// here we are getting data that is less than 16 bytes.
 	// since we know that there was data before the pointer, we can
 	// offset the input pointer before the actual point, to receive
 	// exactly 16 bytes. after that the registers need to be adjusted.
 _get_last_two_regs:
-	mov		v2.16b, v7.16b
+	vmov		q2, q7
 
 	add		arg2, arg2, arg3
 	sub		arg2, arg2, #16
-	ld1		{v1.2d}, [arg2]
-CPU_LE(	rev64		v1.16b, v1.16b		)
-	ext		v1.16b, v1.16b, v1.16b, #8
+	vld1.64		{q1}, [arg2]
+CPU_LE(	vrev64.8	q1, q1			)
+	vext.8		q1, q1, q1, #8
 
 	// get rid of the extra data that was loaded before
 	// load the shift constant
-	adr		x4, tbl_shf_table + 16
-	sub		x4, x4, arg3
-	ld1		{v0.16b}, [x4]
+	adr		lr, tbl_shf_table + 16
+	sub		lr, lr, arg3
+	vld1.8		{q0}, [lr]
 
 	// shift v2 to the left by arg3 bytes
-	tbl		v2.16b, {v2.16b}, v0.16b
+	vmov		q9, q2
+	vtbl.8		d4, {d18-d19}, d0
+	vtbl.8		d5, {d18-d19}, d1
 
 	// shift v7 to the right by 16-arg3 bytes
-	movi		v9.16b, #0x80
-	eor		v0.16b, v0.16b, v9.16b
-	tbl		v7.16b, {v7.16b}, v0.16b
+	vmov.i8		q9, #0x80
+	veor.8		q0, q0, q9
+	vmov		q9, q7
+	vtbl.8		d14, {d18-d19}, d0
+	vtbl.8		d15, {d18-d19}, d1
 
 	// blend
-	sshr		v0.16b, v0.16b, #7	// convert to 8-bit mask
-	bsl		v0.16b, v2.16b, v1.16b
+	vshr.s8		q0, q0, #7		// convert to 8-bit mask
+	vbsl.8		q0, q2, q1
 
 	// fold 16 Bytes
-	pmull		v8.1q, v7.1d, v10.1d
-	pmull2		v7.1q, v7.2d, v10.2d
-	eor		v7.16b, v7.16b, v8.16b
-	eor		v7.16b, v7.16b, v0.16b
+	vmull.p64	q8, d14, d20
+	vmull.p64	q7, d15, d21
+	veor.8		q7, q7, q8
+	veor.8		q7, q7, q0
 
 _128_done:
 	// compute crc of a 128-bit value
-	ldr		q10, rk5		// rk5 and rk6 in xmm10
+	vldr		d20, rk5
+	vldr		d21, rk6		// rk5 and rk6 in xmm10
 
 	// 64b fold
-	mov		v0.16b, v7.16b
-	ext		v7.16b, v7.16b, v7.16b, #8
-	pmull		v7.1q, v7.1d, v10.1d
-	ext		v0.16b, vzr.16b, v0.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vmov		q0, q7
+	vmull.p64	q7, d15, d20
+	vext.8		q0, qzr, q0, #8
+	veor.8		q7, q7, q0
 
 	// 32b fold
-	mov		v0.16b, v7.16b
-	mov		v0.s[3], vzr.s[0]
-	ext		v7.16b, v7.16b, vzr.16b, #12
-	ext		v9.16b, v10.16b, v10.16b, #8
-	pmull		v7.1q, v7.1d, v9.1d
-	eor		v7.16b, v7.16b, v0.16b
+	veor.8		d1, d1, d1
+	vmov		d0, d14
+	vmov		s2, s30
+	vext.8		q7, q7, qzr, #12
+	vmull.p64	q7, d14, d21
+	veor.8		q7, q7, q0
 
 	// barrett reduction
 _barrett:
-	ldr		q10, rk7
-	mov		v0.16b, v7.16b
-	ext		v7.16b, v7.16b, v7.16b, #8
+	vldr		d20, rk7
+	vldr		d21, rk8
+	vmov.8		q0, q7
 
-	pmull		v7.1q, v7.1d, v10.1d
-	ext		v7.16b, vzr.16b, v7.16b, #12
-	pmull2		v7.1q, v7.2d, v10.2d
-	ext		v7.16b, vzr.16b, v7.16b, #12
-	eor		v7.16b, v7.16b, v0.16b
-	mov		w0, v7.s[1]
+	vmull.p64	q7, d15, d20
+	vext.8		q7, qzr, q7, #12
+	vmull.p64	q7, d15, d21
+	vext.8		q7, qzr, q7, #12
+	veor.8		q7, q7, q0
+	vmov		r0, s29
 
 _cleanup:
 	// scale the result back to 16 bits
-	lsr		x0, x0, #16
-	ldp		x29, x30, [sp], #32
-	ret
+	lsr		r0, r0, #16
+	add		sp, sp, #0x10
+	pop		{r4, pc}
 
 	.align		4
 _less_than_128:
 
 	// check if there is enough buffer to be able to fold 16B at a time
 	cmp		arg3, #32
-	b.lt		_less_than_32
+	blt		_less_than_32
 
 	// now if there is, load the constants
-	ldr		q10, rk1		// rk1 and rk2 in xmm10
+	vldr		d20, rk1
+	vldr		d21, rk2		// rk1 and rk2 in xmm10
 
-	movi		v0.16b, #0
-	mov		v0.s[3], arg1_low32	// get the initial crc value
-	ld1		{v7.2d}, [arg2], #0x10
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vmov.i8		q0, #0
+	vmov		s3, arg1_low32		// get the initial crc value
+	vld1.64		{q7}, [arg2]!
+CPU_LE(	vrev64.8	q7, q7		)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 
 	// update the counter. subtract 32 instead of 16 to save one
 	// instruction from the loop
@@ -331,21 +363,23 @@ CPU_LE(	rev64		v7.16b, v7.16b		)
 
 	.align		4
 _less_than_32:
-	cbz		arg3, _cleanup
+	teq		arg3, #0
+	beq		_cleanup
 
-	movi		v0.16b, #0
-	mov		v0.s[3], arg1_low32	// get the initial crc value
+	vmov.i8		q0, #0
+	vmov		s3, arg1_low32		// get the initial crc value
 
 	cmp		arg3, #16
-	b.eq		_exact_16_left
-	b.lt		_less_than_16_left
+	beq		_exact_16_left
+	blt		_less_than_16_left
 
-	ld1		{v7.2d}, [arg2], #0x10
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vld1.64		{q7}, [arg2]!
+CPU_LE(	vrev64.8	q7, q7		)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 	sub		arg3, arg3, #16
-	ldr		q10, rk1		// rk1 and rk2 in xmm10
+	vldr		d20, rk1
+	vldr		d21, rk2		// rk1 and rk2 in xmm10
 	b		_get_last_two_regs
 
 	.align		4
@@ -353,117 +387,124 @@ _less_than_16_left:
 	// use stack space to load data less than 16 bytes, zero-out
 	// the 16B in memory first.
 
-	add		x11, sp, #0x10
-	stp		xzr, xzr, [x11]
+	vst1.8		{qzr}, [sp]
+	mov		ip, sp
 
 	cmp		arg3, #4
-	b.lt		_only_less_than_4
+	blt		_only_less_than_4
 
 	// backup the counter value
-	mov		x9, arg3
-	tbz		arg3, #3, _less_than_8_left
+	mov		lr, arg3
+	cmp		arg3, #8
+	blt		_less_than_8_left
 
 	// load 8 Bytes
-	ldr		x0, [arg2], #8
-	str		x0, [x11], #8
+	ldr		r0, [arg2], #4
+	ldr		r3, [arg2], #4
+	str		r0, [ip], #4
+	str		r3, [ip], #4
 	sub		arg3, arg3, #8
 
 _less_than_8_left:
-	tbz		arg3, #2, _less_than_4_left
+	cmp		arg3, #4
+	blt		_less_than_4_left
 
 	// load 4 Bytes
-	ldr		w0, [arg2], #4
-	str		w0, [x11], #4
+	ldr		r0, [arg2], #4
+	str		r0, [ip], #4
 	sub		arg3, arg3, #4
 
 _less_than_4_left:
-	tbz		arg3, #1, _less_than_2_left
+	cmp		arg3, #2
+	blt		_less_than_2_left
 
 	// load 2 Bytes
-	ldrh		w0, [arg2], #2
-	strh		w0, [x11], #2
+	ldrh		r0, [arg2], #2
+	strh		r0, [ip], #2
 	sub		arg3, arg3, #2
 
 _less_than_2_left:
-	cbz		arg3, _zero_left
+	cmp		arg3, #1
+	blt		_zero_left
 
 	// load 1 Byte
-	ldrb		w0, [arg2]
-	strb		w0, [x11]
+	ldrb		r0, [arg2]
+	strb		r0, [ip]
 
 _zero_left:
-	add		x11, sp, #0x10
-	ld1		{v7.2d}, [x11]
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vld1.64		{q7}, [sp]
+CPU_LE(	vrev64.8	q7, q7		)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 
 	// shl r9, 4
-	adr		x0, tbl_shf_table + 16
-	sub		x0, x0, x9
-	ld1		{v0.16b}, [x0]
-	movi		v9.16b, #0x80
-	eor		v0.16b, v0.16b, v9.16b
-	tbl		v7.16b, {v7.16b}, v0.16b
+	adr		ip, tbl_shf_table + 16
+	sub		ip, ip, lr
+	vld1.8		{q0}, [ip]
+	vmov.i8		q9, #0x80
+	veor.8		q0, q0, q9
+	vmov		q9, q7
+	vtbl.8		d14, {d18-d19}, d0
+	vtbl.8		d15, {d18-d19}, d1
 
 	b		_128_done
 
 	.align		4
 _exact_16_left:
-	ld1		{v7.2d}, [arg2]
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b	// xor the initial crc value
+	vld1.64		{q7}, [arg2]
+CPU_LE(	vrev64.8	q7, q7			)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0		// xor the initial crc value
 
 	b		_128_done
 
 _only_less_than_4:
 	cmp		arg3, #3
-	b.lt		_only_less_than_3
+	blt		_only_less_than_3
 
 	// load 3 Bytes
-	ldrh		w0, [arg2]
-	strh		w0, [x11]
+	ldrh		r0, [arg2]
+	strh		r0, [ip]
 
-	ldrb		w0, [arg2, #2]
-	strb		w0, [x11, #2]
+	ldrb		r0, [arg2, #2]
+	strb		r0, [ip, #2]
 
-	ld1		{v7.2d}, [x11]
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vld1.64		{q7}, [ip]
+CPU_LE(	vrev64.8	q7, q7			)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 
-	ext		v7.16b, v7.16b, vzr.16b, #5
+	vext.8		q7, q7, qzr, #5
 	b		_barrett
 
 _only_less_than_3:
 	cmp		arg3, #2
-	b.lt		_only_less_than_2
+	blt		_only_less_than_2
 
 	// load 2 Bytes
-	ldrh		w0, [arg2]
-	strh		w0, [x11]
+	ldrh		r0, [arg2]
+	strh		r0, [ip]
 
-	ld1		{v7.2d}, [x11]
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vld1.64		{q7}, [ip]
+CPU_LE(	vrev64.8	q7, q7			)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 
-	ext		v7.16b, v7.16b, vzr.16b, #6
+	vext.8		q7, q7, qzr, #6
 	b		_barrett
 
 _only_less_than_2:
 
 	// load 1 Byte
-	ldrb		w0, [arg2]
-	strb		w0, [x11]
+	ldrb		r0, [arg2]
+	strb		r0, [ip]
 
-	ld1		{v7.2d}, [x11]
-CPU_LE(	rev64		v7.16b, v7.16b		)
-	ext		v7.16b, v7.16b, v7.16b, #8
-	eor		v7.16b, v7.16b, v0.16b
+	vld1.64		{q7}, [ip]
+CPU_LE(	vrev64.8	q7, q7			)
+	vext.8		q7, q7, q7, #8
+	veor.8		q7, q7, q0
 
-	ext		v7.16b, v7.16b, vzr.16b, #7
+	vext.8		q7, q7, qzr, #7
 	b		_barrett
 
 ENDPROC(crc_t10dif_pmull)
@@ -482,16 +523,26 @@ ENDPROC(crc_t10dif_pmull)
 // rk7 = floor(2^64/Q)
 // rk8 = Q
 
-rk1:	.octa		0x06df0000000000002d56000000000000
-rk3:	.octa		0x7cf50000000000009d9d000000000000
-rk5:	.octa		0x13680000000000002d56000000000000
-rk7:	.octa		0x000000018bb7000000000001f65a57f8
-rk9:	.octa		0xbfd6000000000000ceae000000000000
-rk11:	.octa		0x713c0000000000001e16000000000000
-rk13:	.octa		0x80a6000000000000f7f9000000000000
-rk15:	.octa		0xe658000000000000044c000000000000
-rk17:	.octa		0xa497000000000000ad18000000000000
-rk19:	.octa		0xe7b50000000000006ee3000000000000
+rk1:	.quad		0x2d56000000000000
+rk2:	.quad		0x06df000000000000
+rk3:	.quad		0x9d9d000000000000
+rk4:	.quad		0x7cf5000000000000
+rk5:	.quad		0x2d56000000000000
+rk6:	.quad		0x1368000000000000
+rk7:	.quad		0x00000001f65a57f8
+rk8:	.quad		0x000000018bb70000
+rk9:	.quad		0xceae000000000000
+rk10:	.quad		0xbfd6000000000000
+rk11:	.quad		0x1e16000000000000
+rk12:	.quad		0x713c000000000000
+rk13:	.quad		0xf7f9000000000000
+rk14:	.quad		0x80a6000000000000
+rk15:	.quad		0x044c000000000000
+rk16:	.quad		0xe658000000000000
+rk17:	.quad		0xad18000000000000
+rk18:	.quad		0xa497000000000000
+rk19:	.quad		0x6ee3000000000000
+rk20:	.quad		0xe7b5000000000000
 
 tbl_shf_table:
 // use these values for shift constants for the tbl/tbx instruction
diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
similarity index 76%
copy from arch/arm64/crypto/crct10dif-ce-glue.c
copy to arch/arm/crypto/crct10dif-ce-glue.c
index d11f33dae79c..e717538d902c 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm/crypto/crct10dif-ce-glue.c
@@ -1,5 +1,5 @@
 /*
- * Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
+ * Accelerated CRC-T10DIF using ARM NEON and Crypto Extensions instructions
  *
  * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
  *
@@ -8,7 +8,6 @@
  * published by the Free Software Foundation.
  */
 
-#include <linux/cpufeature.h>
 #include <linux/crc-t10dif.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -18,6 +17,7 @@
 #include <crypto/internal/hash.h>
 
 #include <asm/neon.h>
+#include <asm/simd.h>
 
 asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
 
@@ -34,9 +34,13 @@ static int crct10dif_update(struct shash_desc *desc, const u8 *data,
 {
 	u16 *crc = shash_desc_ctx(desc);
 
-	kernel_neon_begin_partial(14);
-	*crc = crc_t10dif_pmull(*crc, data, length);
-	kernel_neon_end();
+	if (may_use_simd()) {
+		kernel_neon_begin();
+		*crc = crc_t10dif_pmull(*crc, data, length);
+		kernel_neon_end();
+	} else {
+		*crc = crc_t10dif_generic(*crc, data, length);
+	}
 
 	return 0;
 }
@@ -57,7 +61,7 @@ static struct shash_alg crc_t10dif_alg = {
 
 	.descsize		= CRC_T10DIF_DIGEST_SIZE,
 	.base.cra_name		= "crct10dif",
-	.base.cra_driver_name	= "crct10dif-arm64-ce",
+	.base.cra_driver_name	= "crct10dif-arm-ce",
 	.base.cra_priority	= 200,
 	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
 	.base.cra_module	= THIS_MODULE,
@@ -65,6 +69,9 @@ static struct shash_alg crc_t10dif_alg = {
 
 static int __init crc_t10dif_mod_init(void)
 {
+	if (!(elf_hwcap2 & HWCAP2_PMULL))
+		return -ENODEV;
+
 	return crypto_register_shash(&crc_t10dif_alg);
 }
 
@@ -73,8 +80,10 @@ static void __exit crc_t10dif_mod_exit(void)
 	crypto_unregister_shash(&crc_t10dif_alg);
 }
 
-module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
+module_init(crc_t10dif_mod_init);
 module_exit(crc_t10dif_mod_exit);
 
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("crct10dif");
+MODULE_ALIAS_CRYPTO("crct10dif-arm-ce");
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
From: David Howells @ 2016-11-24 13:23 UTC (permalink / raw)
  To: jmorris
  Cc: Dmitry Kasatkin, linux-kernel, stable, dhowells,
	linux-security-module, keyrings, linux-crypto, linux-ima-devel,
	Andrey Ryabinin
In-Reply-To: <147999377574.9697.16315343355948647181.stgit@warthog.procyon.org.uk>

From: Andrey Ryabinin <aryabinin@virtuozzo.com>

This fixes CVE-2016-8650.

If mpi_powm() is given a zero exponent, it wants to immediately return
either 1 or 0, depending on the modulus.  However, if the result was
initalised with zero limb space, no limbs space is allocated and a
NULL-pointer exception ensues.

Fix this by allocating a minimal amount of limb space for the result when
the 0-exponent case when the result is 1 and not touching the limb space
when the result is 0.

This affects the use of RSA keys and X.509 certificates that carry them.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
PGD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804011944c0 task.stack: ffff880401294000
RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
Stack:
 ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
Call Trace:
 [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
 [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
 [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
 [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
 [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
 [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
 RSP <ffff880401297ad8>
CR2: 0000000000000000
---[ end trace d82015255d4a5d8d ]---

Basically, this is a backport of a libgcrypt patch:

	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
---

 lib/mpi/mpi-pow.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index 5464c8744ea9..e24388a863a7 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -64,8 +64,13 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
 	if (!esize) {
 		/* Exponent is zero, result is 1 mod MOD, i.e., 1 or 0
 		 * depending on if MOD equals 1.  */
-		rp[0] = 1;
 		res->nlimbs = (msize == 1 && mod->d[0] == 1) ? 0 : 1;
+		if (res->nlimbs) {
+			if (mpi_resize(res, 1) < 0)
+				goto enomem;
+			rp = res->d;
+			rp[0] = 1;
+		}
 		res->sign = 0;
 		goto leave;
 	}


^ permalink raw reply related

* [PATCH 1/2] X.509: Fix double free in x509_cert_parse() [ver #3]
From: David Howells @ 2016-11-24 13:23 UTC (permalink / raw)
  To: jmorris
  Cc: linux-kernel, stable, dhowells, linux-security-module, keyrings,
	linux-crypto, Andrey Ryabinin
In-Reply-To: <147999377574.9697.16315343355948647181.stgit@warthog.procyon.org.uk>

From: Andrey Ryabinin <aryabinin@virtuozzo.com>

We shouldn't free cert->pub->key in x509_cert_parse() because
x509_free_certificate() also does this:
	BUG: Double free or freeing an invalid pointer
	...
	Call Trace:
	 [<ffffffff81896c20>] dump_stack+0x63/0x83
	 [<ffffffff81356571>] kasan_object_err+0x21/0x70
	 [<ffffffff81356ed9>] kasan_report_double_free+0x49/0x60
	 [<ffffffff813561ad>] kasan_slab_free+0x9d/0xc0
	 [<ffffffff81350b7a>] kfree+0x8a/0x1a0
	 [<ffffffff81844fbf>] public_key_free+0x1f/0x30
	 [<ffffffff818455d4>] x509_free_certificate+0x24/0x90
	 [<ffffffff818460bc>] x509_cert_parse+0x2bc/0x300
	 [<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
	 [<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
	 [<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
	 [<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
	 [<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad
	Object at ffff880110bd1900, in cache kmalloc-512 size: 512
	....
	Freed:
	PID = 2579
	[<ffffffff8104283b>] save_stack_trace+0x1b/0x20
	[<ffffffff813558f6>] save_stack+0x46/0xd0
	[<ffffffff81356183>] kasan_slab_free+0x73/0xc0
	[<ffffffff81350b7a>] kfree+0x8a/0x1a0
	[<ffffffff818460a3>] x509_cert_parse+0x2a3/0x300
	[<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
	[<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
	[<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
	[<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
	[<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad

Fixes: db6c43bd2132 ("crypto: KEYS: convert public key and digsig asym to the akcipher api")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 crypto/asymmetric_keys/x509_cert_parser.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 865f46ea724f..c80765b211cf 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -133,7 +133,6 @@ struct x509_certificate *x509_cert_parse(const void *data, size_t datalen)
 	return cert;
 
 error_decode:
-	kfree(cert->pub->key);
 	kfree(ctx);
 error_no_ctx:
 	x509_free_certificate(cert);

^ permalink raw reply related

* [PATCH 0/2] KEYS: Fixes [ver #3]
From: David Howells @ 2016-11-24 13:22 UTC (permalink / raw)
  To: jmorris
  Cc: dhowells, linux-security-module, keyrings, linux-kernel,
	linux-crypto


Hi James,

Can you pull these patches please and pass them on to Linus?  They include
the following:

 (1) Fix mpi_powm()'s handling of a number with a zero exponent [CVE-2016-8650].

 (2) Fix double free in X.509 error handling.

Ver #3:

 - Integrate my and Andrey's patches for mpi_powm() and use mpi_resize()
   instead of RESIZE_IF_NEEDED() - the latter adds a duplicate check into
   the execution path of a trivial case we don't normally expect to be
   taken.

Ver #2:

 - Use RESIZE_IF_NEEDED() to conditionally resize the result rather than
   manually doing this.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-fixes

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	keys-fixes-20161124-3

David
---
Andrey Ryabinin (2):
      X.509: Fix double free in x509_cert_parse()
      mpi: Fix NULL ptr dereference in mpi_powm()


 crypto/asymmetric_keys/x509_cert_parser.c |    1 -
 lib/mpi/mpi-pow.c                         |    7 ++++++-
 2 files changed, 6 insertions(+), 2 deletions(-)


^ permalink raw reply

* 53486 linux-crypto
From: e.camilla.johansson @ 2016-11-24  6:32 UTC (permalink / raw)
  To: linux-crypto

[-- Attachment #1: INFO_93352944885680_linux-crypto.zip --]
[-- Type: application/zip, Size: 2549 bytes --]

^ permalink raw reply

* [PATCH] crypto: acomp - don't use stack buffer in test_acomp()
From: Eric Biggers @ 2016-11-23 18:24 UTC (permalink / raw)
  To: linux-crypto
  Cc: Giovanni Cabiddu, Herbert Xu, David S. Miller, Andy Lutomirski,
	Eric Biggers

With virtually-mapped stacks (CONFIG_VMAP_STACK=y), using the
scatterlist crypto API with stack buffers is not allowed, and with
appropriate debugging options will cause the
'BUG_ON(!virt_addr_valid(buf));' in sg_set_buf() to be triggered.
Use a heap buffer instead.

Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index ded50b6..aca1b7b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1448,17 +1448,21 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 {
 	const char *algo = crypto_tfm_alg_driver_name(crypto_acomp_tfm(tfm));
 	unsigned int i;
-	char output[COMP_BUF_SIZE];
+	char *output;
 	int ret;
 	struct scatterlist src, dst;
 	struct acomp_req *req;
 	struct tcrypt_result result;
 
+	output = kmalloc(COMP_BUF_SIZE, GFP_KERNEL);
+	if (!output)
+		return -ENOMEM;
+
 	for (i = 0; i < ctcount; i++) {
 		unsigned int dlen = COMP_BUF_SIZE;
 		int ilen = ctemplate[i].inlen;
 
-		memset(output, 0, sizeof(output));
+		memset(output, 0, dlen);
 		init_completion(&result.completion);
 		sg_init_one(&src, ctemplate[i].input, ilen);
 		sg_init_one(&dst, output, dlen);
@@ -1507,7 +1511,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 		unsigned int dlen = COMP_BUF_SIZE;
 		int ilen = dtemplate[i].inlen;
 
-		memset(output, 0, sizeof(output));
+		memset(output, 0, dlen);
 		init_completion(&result.completion);
 		sg_init_one(&src, dtemplate[i].input, ilen);
 		sg_init_one(&dst, output, dlen);
@@ -1555,6 +1559,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 	ret = 0;
 
 out:
+	kfree(output);
 	return ret;
 }
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH] mpi: Fix NULL ptr dereference in mpi_powm()
From: Andrey Ryabinin @ 2016-11-23 16:28 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S . Miller, Nicolai Stange, linux-crypto, linux-kernel,
	Andrey Ryabinin, stable

Parsing certain certificates (see [1]) triggers NULL-ptr
dereference in mpi_powm():

	BUG: unable to handle kernel NULL pointer dereference at (null)
	IP: [<ffffffff818eb118>] mpi_powm+0xf8/0x10b0
	...
	Call Trace:
	 [<ffffffff817e06a6>] _rsa_dec.isra.2+0x66/0x80
	 [<ffffffff817e07c3>] rsa_verify+0x103/0x1c0
	 [<ffffffff817e1683>] pkcs1pad_verify+0x1c3/0x220
	 [<ffffffff8184549a>] public_key_verify_signature+0x3fa/0x4d0
	 [<ffffffff818473c7>] x509_check_for_self_signed+0x167/0x1e0
	 [<ffffffff8184607e>] x509_cert_parse+0x27e/0x300
	 [<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
	 [<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
	 [<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
	 [<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
	 [<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad

This happens because mpi_alloc(0) doesn't allocate the limb space.
Fix this by allocating the result if needed.

Basically, this is a backport of libgcrypt patch [2].

[1] http://seclists.org/fulldisclosure/2016/Nov/76
[2] http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
---
 lib/mpi/mpi-pow.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index 5464c87..9e28d12 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -64,8 +64,13 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
 	if (!esize) {
 		/* Exponent is zero, result is 1 mod MOD, i.e., 1 or 0
 		 * depending on if MOD equals 1.  */
-		rp[0] = 1;
 		res->nlimbs = (msize == 1 && mod->d[0] == 1) ? 0 : 1;
+		if (res->nlimbs) {
+			if (RESIZE_IF_NEEDED(res, 1) < 0)
+				goto enomem;
+			rp = res->d;
+			rp[0] = 1;
+		}
 		res->sign = 0;
 		goto leave;
 	}
-- 
2.7.3

^ permalink raw reply related

* [PATCH] X.509: Fix double free in x509_cert_parse()
From: Andrey Ryabinin @ 2016-11-23 16:16 UTC (permalink / raw)
  To: David Howells
  Cc: Tadeusz Struk, Herbert Xu, David S. Miller, keyrings,
	linux-crypto, linux-kernel, Andrey Ryabinin, stable

We shouldn't free cert->pub->key in x509_cert_parse() because
x509_free_certificate() also does this:
	BUG: Double free or freeing an invalid pointer
	...
	Call Trace:
	 [<ffffffff81896c20>] dump_stack+0x63/0x83
	 [<ffffffff81356571>] kasan_object_err+0x21/0x70
	 [<ffffffff81356ed9>] kasan_report_double_free+0x49/0x60
	 [<ffffffff813561ad>] kasan_slab_free+0x9d/0xc0
	 [<ffffffff81350b7a>] kfree+0x8a/0x1a0
	 [<ffffffff81844fbf>] public_key_free+0x1f/0x30
	 [<ffffffff818455d4>] x509_free_certificate+0x24/0x90
	 [<ffffffff818460bc>] x509_cert_parse+0x2bc/0x300
	 [<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
	 [<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
	 [<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
	 [<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
	 [<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad
	Object at ffff880110bd1900, in cache kmalloc-512 size: 512
	....
	Freed:
	PID = 2579
	[<ffffffff8104283b>] save_stack_trace+0x1b/0x20
	[<ffffffff813558f6>] save_stack+0x46/0xd0
	[<ffffffff81356183>] kasan_slab_free+0x73/0xc0
	[<ffffffff81350b7a>] kfree+0x8a/0x1a0
	[<ffffffff818460a3>] x509_cert_parse+0x2a3/0x300
	[<ffffffff81846cae>] x509_key_preparse+0x3e/0x330
	[<ffffffff818444cf>] asymmetric_key_preparse+0x6f/0x100
	[<ffffffff8178bec0>] key_create_or_update+0x260/0x5f0
	[<ffffffff8178e6d9>] SyS_add_key+0x199/0x2a0
	[<ffffffff821d823b>] entry_SYSCALL_64_fastpath+0x1e/0xad

Fixes: db6c43bd2132 ("crypto: KEYS: convert public key and digsig asym to the akcipher api")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
---
 crypto/asymmetric_keys/x509_cert_parser.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c
index 865f46e..c80765b 100644
--- a/crypto/asymmetric_keys/x509_cert_parser.c
+++ b/crypto/asymmetric_keys/x509_cert_parser.c
@@ -133,7 +133,6 @@ struct x509_certificate *x509_cert_parse(const void *data, size_t datalen)
 	return cert;
 
 error_decode:
-	kfree(cert->pub->key);
 	kfree(ctx);
 error_no_ctx:
 	x509_free_certificate(cert);
-- 
2.7.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox