Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: [PATCH] crypto: add virtio-crypto driver
From: gong lei @ 2016-11-20  7:11 UTC (permalink / raw)
  To: Benedetto, Salvatore, Gonglei, qemu-devel@nongnu.org,
	virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org
  Cc: pasic@linux.vnet.ibm.com, weidong.huang@huawei.com,
	claudio.fontana@huawei.com, mst@redhat.com, luonengjun@huawei.com,
	hanweidong@huawei.com, Zeng, Xin, peter.huangpeng@huawei.com,
	xuquan8@huawei.com, stefanha@redhat.com, jianjay.zhou@huawei.com,
	cornelia.huck@de.ibm.com, davem@davemloft.net,
	wu.wubin@huawei.com, herbert@gondor.apana.org.au
In-Reply-To: <309B30E91F5E2846B79BD9AA9711D031A12767@IRSMSX102.ger.corp.intel.com>

on 2016/11/17 23:55, Benedetto, Salvatore wrote:

> Hi Gonglei,
>
> ...
>> +
>> +static int virtio_crypto_alg_ablkcipher_init_session(
>> +		struct virtio_crypto_ablkcipher_ctx *ctx,
>> +		int alg, const uint8_t *key,
>> +		unsigned int keylen,
>> +		int encrypt)
>> +{
>> +	struct scatterlist outhdr, key_sg, inhdr, *sgs[3];
>> +	unsigned int tmp;
>> +	struct virtio_crypto_session_input input;
>> +	struct virtio_crypto_op_ctrl_req ctrl;
>> +	struct virtio_crypto *vcrypto = ctx->vcrypto;
>> +	int op = encrypt ? VIRTIO_CRYPTO_OP_ENCRYPT :
>> VIRTIO_CRYPTO_OP_DECRYPT;
>> +	int err;
>> +	unsigned int num_out = 0, num_in = 0;
>> +
>> +	memset(&ctrl, 0, sizeof(ctrl));
>> +	memset(&input, 0, sizeof(input));
>> +	/* Pad ctrl header */
>> +	ctrl.header.opcode =
>> cpu_to_le32(VIRTIO_CRYPTO_CIPHER_CREATE_SESSION);
>> +	ctrl.header.algo = cpu_to_le32((uint32_t)alg);
>> +	/* Set the default dataqueue id to 0 */
>> +	ctrl.header.queue_id = 0;
>> +
>> +	input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
>> +	/* Pad cipher's parameters */
>> +	ctrl.u.sym_create_session.op_type =
>> +		cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
>> +	ctrl.u.sym_create_session.u.cipher.para.algo = ctrl.header.algo;
>> +	ctrl.u.sym_create_session.u.cipher.para.keylen =
>> cpu_to_le32(keylen);
>> +	ctrl.u.sym_create_session.u.cipher.para.op = cpu_to_le32(op);
>> +
>> +	sg_init_one(&outhdr, &ctrl, sizeof(ctrl));
> I believe this won't work when the new virtually-mapped kernel stack (VMAP_STACK)
> is enabled.
I see, will fix it in the next version. Thanks for your comments :)
>
> Regards,
> Salvatore

-- 
Regards,
-Gonglei

^ permalink raw reply

* Re: [PATCH 2/3] crypto: AF_ALG - disregard AAD buffer space for output
From: Stephan Mueller @ 2016-11-19 21:08 UTC (permalink / raw)
  To: Herbert Xu; +Cc: mathew.j.martineau, linux-crypto
In-Reply-To: <20161116090446.GE29644@gondor.apana.org.au>

Am Mittwoch, 16. November 2016, 17:04:46 CET schrieb Herbert Xu:

Hi Herbert,

> On Wed, Nov 16, 2016 at 10:02:59AM +0100, Stephan Mueller wrote:
> > One thing occurred to me: The copying of the AD would only be done of src
> > != dst. For the AF_ALG interface, I thing we always have src != dst due
> > to the user space/kernel space translation. That means the kernel copies
> > the AD around even in user space src == dst. Isn't that a waste? I.e.
> > shouldn't we handle the AD copying rather in user space than in kernel
> > space?
> 
> No that's not the case.  You can do zero-copy, in which case src
> would be identical to dst.

The way to go on this topic would be to use the same logic as the authenc 
implementation by using a null cipher for the copy operation. Though, finding 
out whether the src and dst buffers are the same is an interesting 
proposition, because we need to traverse the src and dst SGLs to see whether 
the same pages and same offsets are used. A simple check for src SGL == dst 
SGL will not work for the AF_ALG implementation, because the src SGL will 
always be different from the dst SGL because they are constructed in different 
ways (tsgl will always be different from rsgl). What may be the same are the 
pages and offsets that are pointed to by the SGL in case of zerocopy.

Keeping that in mind, I am wondering whether the authenc() implementation 
should be changed to simply remove the copy operation in there. As there seem 
to be no other AEAD cipher implements that copy operation (at least the major 
CCM and GCM implementations applicable to X86 do not do that), it seems that 
it is not necessary at all for in-kernel users. The authenc implementation 
performs the copy operation of the src SGL if it is different from the dst 
SGL. See the following code used by authenc:

        if (req->src != req->dst) {
                err = crypto_authenc_copy_assoc(req);
                if (err)
                        return err;

                dst = scatterwalk_ffwd(areq_ctx->dst, req->dst, req-
>assoclen);
        }

Thus, the authenc implementation will always copy the AAD over in case of 
AF_ALG even though zerocopy with the same buffers are used.

When the in-kernel users of AEAD seemingly do not care about the copying of 
the AAD, and considering that authenc would not do it right for AF_ALG, I am 
wondering whether we should:

1. remove the AAD copy in authenc to make it en-par with the other AEAD 
implementations

2. re-consider the discussed patch

3. tell users to copy the AAD over if they need it in the dst buffers.

Ciao
Stephan

^ permalink raw reply

* Crypto Fixes for 4.9
From: Herbert Xu @ 2016-11-19 10:27 UTC (permalink / raw)
  To: Linus Torvalds, David S. Miller, Linux Kernel Mailing List,
	Linux Crypto Mailing List

Hi Linus:

This push fixes the following issues:

- Compiler warning in caam driver that was the last one remaining.
- Do not register aes-xts in caam drivers on unsupported platforms.
- Regression in algif_hash interface that may lead to an oops.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Arnd Bergmann (1):
      crypto: caam - fix type mismatch warning

Herbert Xu (1):
      crypto: algif_hash - Fix NULL hash crash with shash

Sven Ebenfeld (1):
      crypto: caam - do not register AES-XTS mode on LP units

 crypto/algif_hash.c           |   17 ++++++++++-------
 drivers/crypto/caam/caamalg.c |   11 ++++++++++-
 2 files changed, 20 insertions(+), 8 deletions(-)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/3] drivers: crypto: Enable CPT options crypto for build
From: kbuild test robot @ 2016-11-18 20:44 UTC (permalink / raw)
  To: gcherianv
  Cc: kbuild-all, linux-kernel, linux-crypto, davem, herbert,
	George Cherian
In-Reply-To: <1479481209-11475-4-git-send-email-gcherianv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 16490 bytes --]

Hi George,

[auto build test ERROR on cryptodev/master]
[also build test ERROR on v4.9-rc5 next-20161117]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/gcherianv-gmail-com/Add-Support-for-Cavium-Cryptographic-Accelerarion-Unit/20161119-005337
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All error/warnings (new ones prefixed by >>):

warning: (CRYPTO_DEV_CPT) selects HW_RANDOM_OCTEON which has unmet direct dependencies (HW_RANDOM && CAVIUM_OCTEON_SOC)
   In file included from drivers/crypto/cavium/cpt/cpt_common.h:27:0,
                    from drivers/crypto/cavium/cpt/cpt.h:12,
                    from drivers/crypto/cavium/cpt/cpt_main.c:19:
>> drivers/crypto/cavium/cpt/cpt_hw_types.h:439:2: warning: no semicolon at end of struct or union
     } s;
     ^
>> drivers/crypto/cavium/cpt/cpt_hw_types.h:608:3: error: expected ',', ';' or '}' before 'uint64_t'
      uint64_t reserved_0_5:6;
      ^~~~~~~~
   drivers/crypto/cavium/cpt/cpt_main.c:236:13: warning: 'cpt_enable_all_interrupts' defined but not used [-Wunused-function]
    static void cpt_enable_all_interrupts(struct cpt_device *cpt)
                ^~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from drivers/crypto/cavium/cpt/cpt_common.h:27:0,
                    from drivers/crypto/cavium/cpt/cpt.h:12,
                    from drivers/crypto/cavium/cpt/cpt_pf_mbox.c:11:
>> drivers/crypto/cavium/cpt/cpt_hw_types.h:439:2: warning: no semicolon at end of struct or union
     } s;
     ^
>> drivers/crypto/cavium/cpt/cpt_hw_types.h:608:3: error: expected ',', ';' or '}' before 'uint64_t'
      uint64_t reserved_0_5:6;
      ^~~~~~~~
--
>> drivers/char/hw_random/octeon-rng.c:19:31: fatal error: asm/octeon/octeon.h: No such file or directory
    #include <asm/octeon/octeon.h>
                                  ^
   compilation terminated.

vim +608 drivers/crypto/cavium/cpt/cpt_hw_types.h

fcb2dbd1 George Cherian 2016-11-18  433  		uint64_t reserved_48_63:16;
fcb2dbd1 George Cherian 2016-11-18  434  		uint64_t bstatus:48
fcb2dbd1 George Cherian 2016-11-18  435  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  436  		uint64_t bstatus:48;
fcb2dbd1 George Cherian 2016-11-18  437  		uint64_t reserved_48_63:16;
fcb2dbd1 George Cherian 2016-11-18  438  #endif /* Word 0 - End */
fcb2dbd1 George Cherian 2016-11-18 @439  	} s;
fcb2dbd1 George Cherian 2016-11-18  440  	struct cptx_pf_exe_bist_status_cn81xx {
fcb2dbd1 George Cherian 2016-11-18  441  #if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
fcb2dbd1 George Cherian 2016-11-18  442  		uint64_t reserved_16_63:48;
fcb2dbd1 George Cherian 2016-11-18  443  		uint64_t bstatus:16;
fcb2dbd1 George Cherian 2016-11-18  444  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  445  		uint64_t bstatus:16;
fcb2dbd1 George Cherian 2016-11-18  446  		uint64_t reserved_16_63:48;
fcb2dbd1 George Cherian 2016-11-18  447  #endif /* Word 0 - End */
fcb2dbd1 George Cherian 2016-11-18  448  	} cn81xx;
fcb2dbd1 George Cherian 2016-11-18  449  };
fcb2dbd1 George Cherian 2016-11-18  450  
fcb2dbd1 George Cherian 2016-11-18  451  /**
fcb2dbd1 George Cherian 2016-11-18  452   * Register (NCB) cpt#_pf_exe_ctl
fcb2dbd1 George Cherian 2016-11-18  453   *
fcb2dbd1 George Cherian 2016-11-18  454   * CPT PF Engine Control Register
fcb2dbd1 George Cherian 2016-11-18  455   * This register enables the engines.
fcb2dbd1 George Cherian 2016-11-18  456   * cptx_pf_exe_ctl_s
fcb2dbd1 George Cherian 2016-11-18  457   * Word0
fcb2dbd1 George Cherian 2016-11-18  458   *  enable:64 [63:0](R/W) Individual enables for each of the engines.
fcb2dbd1 George Cherian 2016-11-18  459   */
fcb2dbd1 George Cherian 2016-11-18  460  union cptx_pf_exe_ctl {
fcb2dbd1 George Cherian 2016-11-18  461  	uint64_t u;
fcb2dbd1 George Cherian 2016-11-18  462  	struct cptx_pf_exe_ctl_s {
fcb2dbd1 George Cherian 2016-11-18  463  #if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
fcb2dbd1 George Cherian 2016-11-18  464  		uint64_t enable:64;
fcb2dbd1 George Cherian 2016-11-18  465  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  466  		uint64_t enable:64;
fcb2dbd1 George Cherian 2016-11-18  467  #endif /* Word 0 - End */
fcb2dbd1 George Cherian 2016-11-18  468  	} s;
fcb2dbd1 George Cherian 2016-11-18  469  };
fcb2dbd1 George Cherian 2016-11-18  470  
fcb2dbd1 George Cherian 2016-11-18  471  /**
fcb2dbd1 George Cherian 2016-11-18  472   * Register (NCB) cpt#_pf_q#_ctl
fcb2dbd1 George Cherian 2016-11-18  473   *
fcb2dbd1 George Cherian 2016-11-18  474   * CPT Queue Control Register
fcb2dbd1 George Cherian 2016-11-18  475   * This register configures queues. This register should be changed only
fcb2dbd1 George Cherian 2016-11-18  476   * when quiescent (see CPT()_VQ()_INPROG[INFLIGHT]).
fcb2dbd1 George Cherian 2016-11-18  477   * cptx_pf_qx_ctl_s
fcb2dbd1 George Cherian 2016-11-18  478   * Word0
fcb2dbd1 George Cherian 2016-11-18  479   *  reserved_60_63:4 [63:60] reserved.
fcb2dbd1 George Cherian 2016-11-18  480   *  aura:12; [59:48](R/W) Guest-aura for returning this queue's
fcb2dbd1 George Cherian 2016-11-18  481   *	instruction-chunk buffers to FPA. Only used when [INST_FREE] is set.
fcb2dbd1 George Cherian 2016-11-18  482   *	For the FPA to not discard the request, FPA_PF_MAP() must map
fcb2dbd1 George Cherian 2016-11-18  483   *	[AURA] and CPT()_PF_Q()_GMCTL[GMID] as valid.
fcb2dbd1 George Cherian 2016-11-18  484   *  reserved_45_47:3 [47:45] reserved.
fcb2dbd1 George Cherian 2016-11-18  485   *  size:13 [44:32](R/W) Command-buffer size, in number of 64-bit words per
fcb2dbd1 George Cherian 2016-11-18  486   *	command buffer segment. Must be 8*n + 1, where n is the number of
fcb2dbd1 George Cherian 2016-11-18  487   *	instructions per buffer segment.
fcb2dbd1 George Cherian 2016-11-18  488   *  reserved_11_31:21 [31:11] Reserved.
fcb2dbd1 George Cherian 2016-11-18  489   *  cont_err:1 [10:10](R/W) Continue on error.
fcb2dbd1 George Cherian 2016-11-18  490   *	0 = When CPT()_VQ()_MISC_INT[NWRP], CPT()_VQ()_MISC_INT[IRDE] or
fcb2dbd1 George Cherian 2016-11-18  491   *	CPT()_VQ()_MISC_INT[DOVF] are set by hardware or software via
fcb2dbd1 George Cherian 2016-11-18  492   *	CPT()_VQ()_MISC_INT_W1S, then CPT()_VQ()_CTL[ENA] is cleared.  Due to
fcb2dbd1 George Cherian 2016-11-18  493   *	pipelining, additional instructions may have been processed between the
fcb2dbd1 George Cherian 2016-11-18  494   *	instruction causing the error and the next instruction in the disabled
fcb2dbd1 George Cherian 2016-11-18  495   *	queue (the instruction at CPT()_VQ()_SADDR).
fcb2dbd1 George Cherian 2016-11-18  496   *	1 = Ignore errors and continue processing instructions.
fcb2dbd1 George Cherian 2016-11-18  497   *	For diagnostic use only.
fcb2dbd1 George Cherian 2016-11-18  498   *  inst_free:1 [9:9](R/W) Instruction FPA free. When set, when CPT reaches the
fcb2dbd1 George Cherian 2016-11-18  499   *	end of an instruction chunk, that chunk will be freed to the FPA.
fcb2dbd1 George Cherian 2016-11-18  500   *  inst_be:1 [8:8](R/W) Instruction big-endian control. When set, instructions,
fcb2dbd1 George Cherian 2016-11-18  501   *	instruction next chunk pointers, and result structures are stored in
fcb2dbd1 George Cherian 2016-11-18  502   *	big-endian format in memory.
fcb2dbd1 George Cherian 2016-11-18  503   *  iqb_ldwb:1 [7:7](R/W) Instruction load don't write back.
fcb2dbd1 George Cherian 2016-11-18  504   *	0 = The hardware issues NCB transient load (LDT) towards the cache,
fcb2dbd1 George Cherian 2016-11-18  505   *	which if the line hits and is is dirty will cause the line to be
fcb2dbd1 George Cherian 2016-11-18  506   *	written back before being replaced.
fcb2dbd1 George Cherian 2016-11-18  507   *	1 = The hardware issues NCB LDWB read-and-invalidate command towards
fcb2dbd1 George Cherian 2016-11-18  508   *	the cache when fetching the last word of instructions; as a result the
fcb2dbd1 George Cherian 2016-11-18  509   *	line will not be written back when replaced.  This improves
fcb2dbd1 George Cherian 2016-11-18  510   *	performance, but software must not read the instructions after they are
fcb2dbd1 George Cherian 2016-11-18  511   *	posted to the hardware.	Reads that do not consume the last word of a
fcb2dbd1 George Cherian 2016-11-18  512   *	cache line always use LDI.
fcb2dbd1 George Cherian 2016-11-18  513   *  reserved_4_6:3 [6:4] Reserved.
fcb2dbd1 George Cherian 2016-11-18  514   *  grp:3; [3:1](R/W) Engine group.
fcb2dbd1 George Cherian 2016-11-18  515   *  pri:1; [0:0](R/W) Queue priority.
fcb2dbd1 George Cherian 2016-11-18  516   *	1 = This queue has higher priority. Round-robin between higher
fcb2dbd1 George Cherian 2016-11-18  517   *	priority queues.
fcb2dbd1 George Cherian 2016-11-18  518   *	0 = This queue has lower priority. Round-robin between lower
fcb2dbd1 George Cherian 2016-11-18  519   *	priority queues.
fcb2dbd1 George Cherian 2016-11-18  520   */
fcb2dbd1 George Cherian 2016-11-18  521  union cptx_pf_qx_ctl {
fcb2dbd1 George Cherian 2016-11-18  522  	uint64_t u;
fcb2dbd1 George Cherian 2016-11-18  523  	struct cptx_pf_qx_ctl_s {
fcb2dbd1 George Cherian 2016-11-18  524  #if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
fcb2dbd1 George Cherian 2016-11-18  525  		uint64_t reserved_60_63:4;
fcb2dbd1 George Cherian 2016-11-18  526  		uint64_t aura:12;
fcb2dbd1 George Cherian 2016-11-18  527  		uint64_t reserved_45_47:3;
fcb2dbd1 George Cherian 2016-11-18  528  		uint64_t size:13;
fcb2dbd1 George Cherian 2016-11-18  529  		uint64_t reserved_11_31:21;
fcb2dbd1 George Cherian 2016-11-18  530  		uint64_t cont_err:1;
fcb2dbd1 George Cherian 2016-11-18  531  		uint64_t inst_free:1;
fcb2dbd1 George Cherian 2016-11-18  532  		uint64_t inst_be:1;
fcb2dbd1 George Cherian 2016-11-18  533  		uint64_t iqb_ldwb:1;
fcb2dbd1 George Cherian 2016-11-18  534  		uint64_t reserved_4_6:3;
fcb2dbd1 George Cherian 2016-11-18  535  		uint64_t grp:3;
fcb2dbd1 George Cherian 2016-11-18  536  		uint64_t pri:1;
fcb2dbd1 George Cherian 2016-11-18  537  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  538  		uint64_t pri:1;
fcb2dbd1 George Cherian 2016-11-18  539  		uint64_t grp:3;
fcb2dbd1 George Cherian 2016-11-18  540  		uint64_t reserved_4_6:3;
fcb2dbd1 George Cherian 2016-11-18  541  		uint64_t iqb_ldwb:1;
fcb2dbd1 George Cherian 2016-11-18  542  		uint64_t inst_be:1;
fcb2dbd1 George Cherian 2016-11-18  543  		uint64_t inst_free:1;
fcb2dbd1 George Cherian 2016-11-18  544  		uint64_t cont_err:1;
fcb2dbd1 George Cherian 2016-11-18  545  		uint64_t reserved_11_31:21;
fcb2dbd1 George Cherian 2016-11-18  546  		uint64_t size:13;
fcb2dbd1 George Cherian 2016-11-18  547  		uint64_t reserved_45_47:3;
fcb2dbd1 George Cherian 2016-11-18  548  		uint64_t aura:12;
fcb2dbd1 George Cherian 2016-11-18  549  		uint64_t reserved_60_63:4;
fcb2dbd1 George Cherian 2016-11-18  550  #endif /* Word 0 - End */
fcb2dbd1 George Cherian 2016-11-18  551  	} s;
fcb2dbd1 George Cherian 2016-11-18  552      /* struct cptx_pf_qx_ctl_s cn; */
fcb2dbd1 George Cherian 2016-11-18  553  };
fcb2dbd1 George Cherian 2016-11-18  554  
fcb2dbd1 George Cherian 2016-11-18  555  /**
fcb2dbd1 George Cherian 2016-11-18  556   * Register (NCB) cpt#_pf_g#_en
fcb2dbd1 George Cherian 2016-11-18  557   *
fcb2dbd1 George Cherian 2016-11-18  558   * CPT PF Group Control Register
fcb2dbd1 George Cherian 2016-11-18  559   * This register configures engine groups.
fcb2dbd1 George Cherian 2016-11-18  560   * cptx_pf_gx_en_s
fcb2dbd1 George Cherian 2016-11-18  561   * Word0
fcb2dbd1 George Cherian 2016-11-18  562   *  en: 64; [63:0](R/W/H) Engine group enable. One bit corresponds to each
fcb2dbd1 George Cherian 2016-11-18  563   *	engine, with the bit set to indicate this engine can service this group.
fcb2dbd1 George Cherian 2016-11-18  564   *	Bits corresponding to unimplemented engines read as zero, i.e. only bit
fcb2dbd1 George Cherian 2016-11-18  565   *	numbers	less than CPT()_PF_CONSTANTS[AE] + CPT()_PF_CONSTANTS[SE] are
fcb2dbd1 George Cherian 2016-11-18  566   *	writable. AE engine bits follow SE engine bits.
fcb2dbd1 George Cherian 2016-11-18  567   *	E.g. if CPT()_PF_CONSTANTS[AE] = 0x1, and CPT()_PF_CONSTANTS[SE] = 0x2,
fcb2dbd1 George Cherian 2016-11-18  568   *	then bits <2:0> are read/writable with bit <2> corresponding to AE<0>,
fcb2dbd1 George Cherian 2016-11-18  569   *	and bit <1> to SE<1>, and bit<0> to SE<0>. Before disabling an engine,
fcb2dbd1 George Cherian 2016-11-18  570   *	the corresponding bit in each group must be cleared. CPT()_PF_EXEC_BUSY
fcb2dbd1 George Cherian 2016-11-18  571   *	can then be polled to determing when the engine becomes	idle.
fcb2dbd1 George Cherian 2016-11-18  572   *	At the point, the engine can be disabled.
fcb2dbd1 George Cherian 2016-11-18  573   */
fcb2dbd1 George Cherian 2016-11-18  574  union cptx_pf_gx_en {
fcb2dbd1 George Cherian 2016-11-18  575  	uint64_t u;
fcb2dbd1 George Cherian 2016-11-18  576  	struct cptx_pf_gx_en_s {
fcb2dbd1 George Cherian 2016-11-18  577  #if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
fcb2dbd1 George Cherian 2016-11-18  578  		uint64_t en:64;
fcb2dbd1 George Cherian 2016-11-18  579  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  580  		uint64_t en:64;
fcb2dbd1 George Cherian 2016-11-18  581  #endif /* Word 0 - End */
fcb2dbd1 George Cherian 2016-11-18  582  	} s;
fcb2dbd1 George Cherian 2016-11-18  583  };
fcb2dbd1 George Cherian 2016-11-18  584  
fcb2dbd1 George Cherian 2016-11-18  585  /**
fcb2dbd1 George Cherian 2016-11-18  586   * Register (NCB) cpt#_vq#_saddr
fcb2dbd1 George Cherian 2016-11-18  587   *
fcb2dbd1 George Cherian 2016-11-18  588   * CPT Queue Starting Buffer Address Registers
fcb2dbd1 George Cherian 2016-11-18  589   * These registers set the instruction buffer starting address.
fcb2dbd1 George Cherian 2016-11-18  590   * cptx_vqx_saddr_s
fcb2dbd1 George Cherian 2016-11-18  591   * Word0
fcb2dbd1 George Cherian 2016-11-18  592   *  reserved_49_63:15 [63:49] Reserved.
fcb2dbd1 George Cherian 2016-11-18  593   *  ptr:43 [48:6](R/W/H) Instruction buffer IOVA <48:6> (64-byte aligned).
fcb2dbd1 George Cherian 2016-11-18  594   *	When written, it is the initial buffer starting address; when read,
fcb2dbd1 George Cherian 2016-11-18  595   *	it is the next read pointer to be requested from L2C. The PTR field
fcb2dbd1 George Cherian 2016-11-18  596   *	is overwritten with the next pointer each time that the command buffer
fcb2dbd1 George Cherian 2016-11-18  597   *	segment is exhausted. New commands will then be read from the newly
fcb2dbd1 George Cherian 2016-11-18  598   *	specified command buffer pointer.
fcb2dbd1 George Cherian 2016-11-18  599   *  reserved_0_5:6 [5:0] Reserved.
fcb2dbd1 George Cherian 2016-11-18  600   *
fcb2dbd1 George Cherian 2016-11-18  601   */
fcb2dbd1 George Cherian 2016-11-18  602  union cptx_vqx_saddr {
fcb2dbd1 George Cherian 2016-11-18  603  	uint64_t u;
fcb2dbd1 George Cherian 2016-11-18  604  	struct cptx_vqx_saddr_s {
fcb2dbd1 George Cherian 2016-11-18  605  #if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
fcb2dbd1 George Cherian 2016-11-18  606  		uint64_t reserved_49_63:15;
fcb2dbd1 George Cherian 2016-11-18  607  		uint64_t ptr:43
fcb2dbd1 George Cherian 2016-11-18 @608  		uint64_t reserved_0_5:6;
fcb2dbd1 George Cherian 2016-11-18  609  #else /* Word 0 - Little Endian */
fcb2dbd1 George Cherian 2016-11-18  610  		uint64_t reserved_0_5:6;
fcb2dbd1 George Cherian 2016-11-18  611  		uint64_t ptr:43;

:::::: The code at line 608 was first introduced by commit
:::::: fcb2dbd14b3247c53056bc2b78e907c569da1d44 drivers: crypto: Add Support for Octeon-tx CPT Engine

:::::: TO: George Cherian <george.cherian@cavium.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 52455 bytes --]

^ permalink raw reply

* Re: [PATCH 1/3] drivers: crypto: Add Support for Octeon-tx CPT Engine
From: George Cherian @ 2016-11-18 19:31 UTC (permalink / raw)
  To: David Daney, gcherianv
  Cc: linux-kernel, linux-crypto, davem, herbert, George Cherian
In-Reply-To: <582F4EA7.9030303@caviumnetworks.com>

Hi David,

Thanks for the review.
On Saturday 19 November 2016 12:25 AM, David Daney wrote:
> On 11/18/2016 07:00 AM, gcherianv@gmail.com wrote:
>> From: George Cherian <george.cherian@cavium.com>
>>
>> Enable the Physical Function diver for the Cavium Crypto Engine (CPT)
>> found in Octeon-tx series of SoC's. CPT is the Cryptographic 
>> Acceleration
>> Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
>> asymmetric engines (AEs).
>>
>> Signed-off-by: George Cherian <george.cherian@cavium.com>
>
>
> How was this tested?
Using ecryptfs and dm-crypt.
>
>
>> ---
>>   drivers/crypto/cavium/cpt/Kconfig        |  22 +
>>   drivers/crypto/cavium/cpt/Makefile       |   2 +
>>   drivers/crypto/cavium/cpt/cpt.h          |  90 +++
>>   drivers/crypto/cavium/cpt/cpt_common.h   | 377 +++++++++++++
>>   drivers/crypto/cavium/cpt/cpt_hw_types.h | 940 
>> +++++++++++++++++++++++++++++++
>>   drivers/crypto/cavium/cpt/cpt_main.c     | 891 
>> +++++++++++++++++++++++++++++
>>   drivers/crypto/cavium/cpt/cpt_pf_mbox.c  | 174 ++++++
>>   7 files changed, 2496 insertions(+)
>>   create mode 100644 drivers/crypto/cavium/cpt/Kconfig
>>   create mode 100644 drivers/crypto/cavium/cpt/Makefile
>>   create mode 100644 drivers/crypto/cavium/cpt/cpt.h
>>   create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
>>   create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
>>   create mode 100644 drivers/crypto/cavium/cpt/cpt_main.c
>>   create mode 100644 drivers/crypto/cavium/cpt/cpt_pf_mbox.c
>>
>> diff --git a/drivers/crypto/cavium/cpt/Kconfig 
>> b/drivers/crypto/cavium/cpt/Kconfig
>> new file mode 100644
>> index 0000000..8fe3f44
>> --- /dev/null
>> +++ b/drivers/crypto/cavium/cpt/Kconfig
>> @@ -0,0 +1,22 @@
>> +#
>> +# Cavium crypto device configuration
>> +#
>> +
>> +config CRYPTO_DEV_CPT
>> +    tristate
>> +    select HW_RANDOM_OCTEON
>
> This makes no sense.  HW_RANDOM_OCTEON is for a mips64 based SOC and 
> isn't present on devices that have this crypto block.  Why select this?
>
Yeah true... I actually wanted to this one instead 
|CONFIG_HW_RANDOM_CAVIUM|
>
>> +    select CRYPTO_AES
>> +    select CRYPTO_DES
>> +    select CRYPTO_BLKCIPHER
>> +    select FW_LOADER
>> +
>> +config OCTEONTX_CPT_PF
>> +    tristate "Octeon-tx CPT Physical function driver"
>> +    depends on ARCH_THUNDER
>> +    select CRYPTO_DEV_CPT
>> +    help
>> +      Support for Cavium CPT block found in octeon-tx series of
>> +      processors.
>> +
>> +      To compile this as a module, choose M here: the module will be
>> +      called cptpf.
>> diff --git a/drivers/crypto/cavium/cpt/Makefile 
>> b/drivers/crypto/cavium/cpt/Makefile
>> new file mode 100644
>> index 0000000..bf758e2
>> --- /dev/null
>> +++ b/drivers/crypto/cavium/cpt/Makefile
>> @@ -0,0 +1,2 @@
>> +obj-$(CONFIG_OCTEONTX_CPT_PF) += cptpf.o
>> +cptpf-objs := cpt_main.o cpt_pf_mbox.o
>> diff --git a/drivers/crypto/cavium/cpt/cpt.h 
>> b/drivers/crypto/cavium/cpt/cpt.h
>> new file mode 100644
>> index 0000000..63d12da
>> --- /dev/null
>> +++ b/drivers/crypto/cavium/cpt/cpt.h
>> @@ -0,0 +1,90 @@
>> +/*
>> + * Copyright (C) 2016 Cavium, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms of version 2 of the GNU General Public License
>> + * as published by the Free Software Foundation.
>> + */
>> +
>> +#ifndef __CPT_H
>> +#define __CPT_H
>> +
>> +#include "cpt_common.h"
>> +
>> +#define BASE_PROC_DIR    "cavium"
>> +
>> +#define PF  0
>> +#define VF  1
>> +
>> +struct cpt_device;
>> +
>> +struct microcode {
>> +    uint8_t  is_mc_valid;
>
> s/uint8_t/u8/  ??
>
> That could be done everywhere.
will do
>
> [...]
>> diff --git a/drivers/crypto/cavium/cpt/cpt_common.h 
>> b/drivers/crypto/cavium/cpt/cpt_common.h
>> new file mode 100644
>> index 0000000..351ed4a
>> --- /dev/null
>> +++ b/drivers/crypto/cavium/cpt/cpt_common.h
>> @@ -0,0 +1,377 @@
>> +/*
>> + * Copyright (C) 2016 Cavium, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms of version 2 of the GNU General Public License
>> + * as published by the Free Software Foundation.
>> + */
>> +
>> +#ifndef __CPT_COMMON_H
>> +#define __CPT_COMMON_H
>> +
>> +#include <asm/byteorder.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/types.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/pci.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/string.h>
>> +#include <linux/pci_regs.h>
>> +#include <linux/delay.h>
>> +#include <linux/printk.h>
>> +#include <linux/sched.h>
>> +#include <linux/completion.h>
>> +#include <asm/arch_timer.h>
>> +#include <linux/types.h>
>> +
>> +#include "cpt_hw_types.h"
>> +
>> +/* configuration space offsets */
>> +#ifndef PCI_VENDOR_ID
>> +#define PCI_VENDOR_ID 0x00 /* 16 bits */
>> +#endif
>> +#ifndef PCI_DEVICE_ID
>> +#define PCI_DEVICE_ID 0x02 /* 16 bits */
>> +#endif
>> +#ifndef PCI_REVISION_ID
>> +#define PCI_REVISION_ID 0x08 /* Revision ID */
>> +#endif
>> +#ifndef PCI_CAPABILITY_LIST
>> +#define PCI_CAPABILITY_LIST 0x34 /* first capability list entry */
>> +#endif
>> +
>
> Standard PCI core functions give you access to all that information, 
> use pdev->device, pdev->revision, etc. instead of reinventing the 
> wheel here with all these #defines.
>
>
>> +/* Device ID */
>> +#define PCI_VENDOR_ID_CAVIUM 0x177d
>
> This is defined in pci_ids.h, use value from there instead of placing 
> a duplicate definition here.
>
okay will remove them
>> +#define CPT_81XX_PCI_PF_DEVICE_ID 0xa040
>> +#define CPT_81XX_PCI_VF_DEVICE_ID 0xa041
>> +
>> +#define PASS_1_0 0x0
>> +
>> +/* CPT Models ((Device ID<<16)|Revision ID) */
>> +/* CPT models */
>> +#define CPT_81XX_PASS1_0 ((CPT_81XX_PCI_PF_DEVICE_ID << 8) | PASS_1_0)
>> +#define CPTVF_81XX_PASS1_0 ((CPT_81XX_PCI_VF_DEVICE_ID << 8) | 
>> PASS_1_0)
>> +
>> +#define PF 0
>> +#define VF 1
>> +
>> +#define DEFAULT_DEVICE_QUEUES CPT_NUM_QS_PER_VF
>> +
>> +#define SUCCESS    (0)
>> +#define FAIL    (1)
>> +
>> +#ifndef ROUNDUP4
>> +#define ROUNDUP4(val) (((val) + 3) & 0xfffffffc)
>> +#endif
>> +
>> +#ifndef ROUNDUP8
>> +#define ROUNDUP8(val) (((val) + 7) & 0xfffffff8)
>> +#endif
>> +
>> +#ifndef ROUNDUP16
>> +#define ROUNDUP16(val) (((val) + 15) & 0xfffffff0)
>> +#endif
>> +
>
> kernel.h has round_up(), use that instead of defining all these.
>
>> +#define ERR_ADDR_LEN 8
>> +
>
> What is that for?  It looks unused.
>
> [...]
>> +/*###### PCIE EP-Mode Configuration Registers #########*/
>> +#define PCIEEP0_CFG000 (0x0)
>> +#define PCIEEP0_CFG002 (0x8)
>> +#define PCIEEP0_CFG011 (0x2C)
>> +#define PCIEEP0_CFG020 (0x50)
>> +#define PCIEEP0_CFG025 (0x64)
>> +#define PCIEEP0_CFG030 (0x78)
>> +#define PCIEEP0_CFG044 (0xB0)
>> +#define PCIEEP0_CFG045 (0xB4)
>> +#define PCIEEP0_CFG082 (0x148)
>> +#define PCIEEP0_CFG095 (0x17C)
>> +#define PCIEEP0_CFG096 (0x180)
>> +#define PCIEEP0_CFG097 (0x184)
>> +#define PCIEEP0_CFG103 (0x19C)
>> +#define PCIEEP0_CFG460 (0x730)
>> +#define PCIEEP0_CFG461 (0x734)
>> +#define PCIEEP0_CFG462 (0x738)
>> +
>> +/*#######  PCIe EP-Mode SR-IOV Configuration Registers  #####*/
>> +#define PCIEEPVF0_CFG000 (0x0)
>> +#define PCIEEPVF0_CFG002 (0x8)
>> +#define PCIEEPVF0_CFG011 (0x2C)
>> +#define PCIEEPVF0_CFG030 (0x78)
>> +#define PCIEEPVF0_CFG044 (0xB0)
>> +
>
> Where are all those defines used?  What are they for?
>
>
> That's all I can look at for now.
>
I will address your comments in next version.
> David.
>

^ permalink raw reply

* Re: [PATCH 1/3] drivers: crypto: Add Support for Octeon-tx CPT Engine
From: David Daney @ 2016-11-18 18:55 UTC (permalink / raw)
  To: gcherianv; +Cc: linux-kernel, linux-crypto, davem, herbert, George Cherian
In-Reply-To: <1479481209-11475-2-git-send-email-gcherianv@gmail.com>

On 11/18/2016 07:00 AM, gcherianv@gmail.com wrote:
> From: George Cherian <george.cherian@cavium.com>
>
> Enable the Physical Function diver for the Cavium Crypto Engine (CPT)
> found in Octeon-tx series of SoC's. CPT is the Cryptographic Acceleration
> Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
> asymmetric engines (AEs).
>
> Signed-off-by: George Cherian <george.cherian@cavium.com>


How was this tested?

> ---
>   drivers/crypto/cavium/cpt/Kconfig        |  22 +
>   drivers/crypto/cavium/cpt/Makefile       |   2 +
>   drivers/crypto/cavium/cpt/cpt.h          |  90 +++
>   drivers/crypto/cavium/cpt/cpt_common.h   | 377 +++++++++++++
>   drivers/crypto/cavium/cpt/cpt_hw_types.h | 940 +++++++++++++++++++++++++++++++
>   drivers/crypto/cavium/cpt/cpt_main.c     | 891 +++++++++++++++++++++++++++++
>   drivers/crypto/cavium/cpt/cpt_pf_mbox.c  | 174 ++++++
>   7 files changed, 2496 insertions(+)
>   create mode 100644 drivers/crypto/cavium/cpt/Kconfig
>   create mode 100644 drivers/crypto/cavium/cpt/Makefile
>   create mode 100644 drivers/crypto/cavium/cpt/cpt.h
>   create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
>   create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
>   create mode 100644 drivers/crypto/cavium/cpt/cpt_main.c
>   create mode 100644 drivers/crypto/cavium/cpt/cpt_pf_mbox.c
>
> diff --git a/drivers/crypto/cavium/cpt/Kconfig b/drivers/crypto/cavium/cpt/Kconfig
> new file mode 100644
> index 0000000..8fe3f44
> --- /dev/null
> +++ b/drivers/crypto/cavium/cpt/Kconfig
> @@ -0,0 +1,22 @@
> +#
> +# Cavium crypto device configuration
> +#
> +
> +config CRYPTO_DEV_CPT
> +	tristate
> +	select HW_RANDOM_OCTEON

This makes no sense.  HW_RANDOM_OCTEON is for a mips64 based SOC and 
isn't present on devices that have this crypto block.  Why select this?


> +	select CRYPTO_AES
> +	select CRYPTO_DES
> +	select CRYPTO_BLKCIPHER
> +	select FW_LOADER
> +
> +config OCTEONTX_CPT_PF
> +	tristate "Octeon-tx CPT Physical function driver"
> +	depends on ARCH_THUNDER
> +	select CRYPTO_DEV_CPT
> +	help
> +	  Support for Cavium CPT block found in octeon-tx series of
> +	  processors.
> +
> +	  To compile this as a module, choose M here: the module will be
> +	  called cptpf.
> diff --git a/drivers/crypto/cavium/cpt/Makefile b/drivers/crypto/cavium/cpt/Makefile
> new file mode 100644
> index 0000000..bf758e2
> --- /dev/null
> +++ b/drivers/crypto/cavium/cpt/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_OCTEONTX_CPT_PF) += cptpf.o
> +cptpf-objs := cpt_main.o cpt_pf_mbox.o
> diff --git a/drivers/crypto/cavium/cpt/cpt.h b/drivers/crypto/cavium/cpt/cpt.h
> new file mode 100644
> index 0000000..63d12da
> --- /dev/null
> +++ b/drivers/crypto/cavium/cpt/cpt.h
> @@ -0,0 +1,90 @@
> +/*
> + * Copyright (C) 2016 Cavium, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.
> + */
> +
> +#ifndef __CPT_H
> +#define __CPT_H
> +
> +#include "cpt_common.h"
> +
> +#define BASE_PROC_DIR	"cavium"
> +
> +#define PF  0
> +#define VF  1
> +
> +struct cpt_device;
> +
> +struct microcode {
> +	uint8_t  is_mc_valid;

s/uint8_t/u8/  ??

That could be done everywhere.

[...]
> diff --git a/drivers/crypto/cavium/cpt/cpt_common.h b/drivers/crypto/cavium/cpt/cpt_common.h
> new file mode 100644
> index 0000000..351ed4a
> --- /dev/null
> +++ b/drivers/crypto/cavium/cpt/cpt_common.h
> @@ -0,0 +1,377 @@
> +/*
> + * Copyright (C) 2016 Cavium, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.
> + */
> +
> +#ifndef __CPT_COMMON_H
> +#define __CPT_COMMON_H
> +
> +#include <asm/byteorder.h>
> +#include <linux/uaccess.h>
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +#include <linux/pci.h>
> +#include <linux/cpumask.h>
> +#include <linux/string.h>
> +#include <linux/pci_regs.h>
> +#include <linux/delay.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/completion.h>
> +#include <asm/arch_timer.h>
> +#include <linux/types.h>
> +
> +#include "cpt_hw_types.h"
> +
> +/* configuration space offsets */
> +#ifndef PCI_VENDOR_ID
> +#define PCI_VENDOR_ID 0x00 /* 16 bits */
> +#endif
> +#ifndef PCI_DEVICE_ID
> +#define PCI_DEVICE_ID 0x02 /* 16 bits */
> +#endif
> +#ifndef PCI_REVISION_ID
> +#define PCI_REVISION_ID 0x08 /* Revision ID */
> +#endif
> +#ifndef PCI_CAPABILITY_LIST
> +#define PCI_CAPABILITY_LIST 0x34 /* first capability list entry */
> +#endif
> +

Standard PCI core functions give you access to all that information, use 
pdev->device, pdev->revision, etc. instead of reinventing the wheel here 
with all these #defines.


> +/* Device ID */
> +#define PCI_VENDOR_ID_CAVIUM 0x177d

This is defined in pci_ids.h, use value from there instead of placing a 
duplicate definition here.

> +#define CPT_81XX_PCI_PF_DEVICE_ID 0xa040
> +#define CPT_81XX_PCI_VF_DEVICE_ID 0xa041
> +
> +#define PASS_1_0 0x0
> +
> +/* CPT Models ((Device ID<<16)|Revision ID) */
> +/* CPT models */
> +#define CPT_81XX_PASS1_0 ((CPT_81XX_PCI_PF_DEVICE_ID << 8) | PASS_1_0)
> +#define CPTVF_81XX_PASS1_0 ((CPT_81XX_PCI_VF_DEVICE_ID << 8) | PASS_1_0)
> +
> +#define PF 0
> +#define VF 1
> +
> +#define DEFAULT_DEVICE_QUEUES CPT_NUM_QS_PER_VF
> +
> +#define SUCCESS	(0)
> +#define FAIL	(1)
> +
> +#ifndef ROUNDUP4
> +#define ROUNDUP4(val) (((val) + 3) & 0xfffffffc)
> +#endif
> +
> +#ifndef ROUNDUP8
> +#define ROUNDUP8(val) (((val) + 7) & 0xfffffff8)
> +#endif
> +
> +#ifndef ROUNDUP16
> +#define ROUNDUP16(val) (((val) + 15) & 0xfffffff0)
> +#endif
> +

kernel.h has round_up(), use that instead of defining all these.

> +#define ERR_ADDR_LEN 8
> +

What is that for?  It looks unused.

[...]
> +/*###### PCIE EP-Mode Configuration Registers #########*/
> +#define PCIEEP0_CFG000 (0x0)
> +#define PCIEEP0_CFG002 (0x8)
> +#define PCIEEP0_CFG011 (0x2C)
> +#define PCIEEP0_CFG020 (0x50)
> +#define PCIEEP0_CFG025 (0x64)
> +#define PCIEEP0_CFG030 (0x78)
> +#define PCIEEP0_CFG044 (0xB0)
> +#define PCIEEP0_CFG045 (0xB4)
> +#define PCIEEP0_CFG082 (0x148)
> +#define PCIEEP0_CFG095 (0x17C)
> +#define PCIEEP0_CFG096 (0x180)
> +#define PCIEEP0_CFG097 (0x184)
> +#define PCIEEP0_CFG103 (0x19C)
> +#define PCIEEP0_CFG460 (0x730)
> +#define PCIEEP0_CFG461 (0x734)
> +#define PCIEEP0_CFG462 (0x738)
> +
> +/*#######  PCIe EP-Mode SR-IOV Configuration Registers  #####*/
> +#define PCIEEPVF0_CFG000 (0x0)
> +#define PCIEEPVF0_CFG002 (0x8)
> +#define PCIEEPVF0_CFG011 (0x2C)
> +#define PCIEEPVF0_CFG030 (0x78)
> +#define PCIEEPVF0_CFG044 (0xB0)
> +

Where are all those defines used?  What are they for?


That's all I can look at for now.

David.

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: Allocate Tx queues dynamically
From: David Miller @ 2016-11-18 19:04 UTC (permalink / raw)
  To: atul.gupta-ut6Up61K2wZBDgjK7y7TUQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, nab-IzHhD5pYlfBP7FQvKIMDCQ,
	jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	varun-ut6Up61K2wZBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	hariprasad-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <1479467260-6509-1-git-send-email-atul.gupta-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Atul Gupta <atul.gupta-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Date: Fri, 18 Nov 2016 16:37:40 +0530

> From: Hariprasad Shenai <hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> 
> Allocate resources dynamically for Upper layer driver's (ULD) like
> cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
> queues which are allocated when ULD register with cxgb4 driver and freed
> while un-registering. The Tx queues which are shared by ULD shall be
> allocated by first registering driver and un-allocated by last
> unregistering driver.
> 
> Signed-off-by: Atul Gupta <atul.gupta-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] hw_random: Make explicit that max >= 32 always
From: PrasannaKumar Muralidharan @ 2016-11-18 17:30 UTC (permalink / raw)
  To: mpm, herbert, daniel.thompson, linux-crypto; +Cc: PrasannaKumar Muralidharan

As hw_random core calls ->read with max > 32 or more, make it explicit.
Also remove checks involving 'max' being less than 8.

Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
---
 drivers/char/hw_random/msm-rng.c     | 4 ----
 drivers/char/hw_random/pic32-rng.c   | 3 ---
 drivers/char/hw_random/pseries-rng.c | 5 ++---
 include/linux/hw_random.h            | 3 +--
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/char/hw_random/msm-rng.c b/drivers/char/hw_random/msm-rng.c
index 96fb986..841fee8 100644
--- a/drivers/char/hw_random/msm-rng.c
+++ b/drivers/char/hw_random/msm-rng.c
@@ -90,10 +90,6 @@ static int msm_rng_read(struct hwrng *hwrng, void *data, size_t max, bool wait)
 	/* calculate max size bytes to transfer back to caller */
 	maxsize = min_t(size_t, MAX_HW_FIFO_SIZE, max);
 
-	/* no room for word data */
-	if (maxsize < WORD_SZ)
-		return 0;
-
 	ret = clk_prepare_enable(rng->clk);
 	if (ret)
 		return ret;
diff --git a/drivers/char/hw_random/pic32-rng.c b/drivers/char/hw_random/pic32-rng.c
index 11dc9b7..9b5e68a 100644
--- a/drivers/char/hw_random/pic32-rng.c
+++ b/drivers/char/hw_random/pic32-rng.c
@@ -62,9 +62,6 @@ static int pic32_rng_read(struct hwrng *rng, void *buf, size_t max,
 	u32 t;
 	unsigned int timeout = RNG_TIMEOUT;
 
-	if (max < 8)
-		return 0;
-
 	do {
 		t = readl(priv->base + RNGRCNT) & RCNT_MASK;
 		if (t == 64) {
diff --git a/drivers/char/hw_random/pseries-rng.c b/drivers/char/hw_random/pseries-rng.c
index 63ce51d..d9f46b4 100644
--- a/drivers/char/hw_random/pseries-rng.c
+++ b/drivers/char/hw_random/pseries-rng.c
@@ -28,7 +28,6 @@
 static int pseries_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 {
 	u64 buffer[PLPAR_HCALL_BUFSIZE];
-	size_t size = max < 8 ? max : 8;
 	int rc;
 
 	rc = plpar_hcall(H_RANDOM, (unsigned long *)buffer);
@@ -36,10 +35,10 @@ static int pseries_rng_read(struct hwrng *rng, void *data, size_t max, bool wait
 		pr_err_ratelimited("H_RANDOM call failed %d\n", rc);
 		return -EIO;
 	}
-	memcpy(data, buffer, size);
+	memcpy(data, buffer, 8);
 
 	/* The hypervisor interface returns 64 bits */
-	return size;
+	return 8;
 }
 
 /**
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 34a0dc1..bee0827 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -30,8 +30,7 @@
  *			Must not be NULL.    *OBSOLETE*
  * @read:		New API. drivers can fill up to max bytes of data
  *			into the buffer. The buffer is aligned for any type
- *			and max is guaranteed to be >= to that alignment
- *			(either 4 or 8 depending on architecture).
+ *			and max is a multiple of 4 and >= 32 bytes.
  * @priv:		Private data, for use by the RNG driver.
  * @quality:		Estimation of true entropy in RNG's bitstream
  *			(per mill).
-- 
2.9.3

^ permalink raw reply related

* [PATCH 3/3] drivers: crypto: Enable CPT options crypto for build
From: gcherianv @ 2016-11-18 15:00 UTC (permalink / raw)
  To: linux-kernel, linux-crypto; +Cc: davem, herbert, George Cherian
In-Reply-To: <1479481209-11475-1-git-send-email-gcherianv@gmail.com>

From: George Cherian <george.cherian@cavium.com>

Add the CPT options in crypto Kconfig and update the
crypto Makefile

Signed-off-by: George Cherian <george.cherian@cavium.com>
---
 drivers/crypto/Kconfig  | 1 +
 drivers/crypto/Makefile | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f..15f9040 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -484,6 +484,7 @@ config CRYPTO_DEV_MXS_DCP
 	  will be called mxs-dcp.
 
 source "drivers/crypto/qat/Kconfig"
+source "drivers/crypto/cavium/cpt/Kconfig"
 
 config CRYPTO_DEV_QCE
 	tristate "Qualcomm crypto engine accelerator"
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index ad7250f..dd33290 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
+obj-$(CONFIG_CRYPTO_DEV_CPT) += cavium/cpt/
-- 
2.1.4

^ permalink raw reply related

* [PATCH 2/3] drivers: crypto: Add the Virtual Function driver for CPT
From: gcherianv @ 2016-11-18 15:00 UTC (permalink / raw)
  To: linux-kernel, linux-crypto; +Cc: davem, herbert, George Cherian
In-Reply-To: <1479481209-11475-1-git-send-email-gcherianv@gmail.com>

From: George Cherian <george.cherian@cavium.com>

Enable the CPT VF driver. CPT is the cryptographic Accelaration Unit
in Octeon-tx series of processors.

Signed-off-by: George Cherian <george.cherian@cavium.com>
---
 drivers/crypto/cavium/cpt/Kconfig            |   10 +
 drivers/crypto/cavium/cpt/Makefile           |    2 +
 drivers/crypto/cavium/cpt/cptvf.h            |  255 +++++++
 drivers/crypto/cavium/cpt/cptvf_algs.c       |  446 +++++++++++
 drivers/crypto/cavium/cpt/cptvf_algs.h       |  159 ++++
 drivers/crypto/cavium/cpt/cptvf_main.c       | 1038 ++++++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cptvf_mbox.c       |  208 ++++++
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c |  655 ++++++++++++++++
 drivers/crypto/cavium/cpt/request_manager.h  |  221 ++++++
 9 files changed, 2994 insertions(+)
 create mode 100644 drivers/crypto/cavium/cpt/cptvf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_reqmanager.c
 create mode 100644 drivers/crypto/cavium/cpt/request_manager.h

diff --git a/drivers/crypto/cavium/cpt/Kconfig b/drivers/crypto/cavium/cpt/Kconfig
index 8fe3f44..d8c3f48 100644
--- a/drivers/crypto/cavium/cpt/Kconfig
+++ b/drivers/crypto/cavium/cpt/Kconfig
@@ -20,3 +20,13 @@ config OCTEONTX_CPT_PF
 
 	  To compile this as a module, choose M here: the module will be
 	  called cptpf.
+config OCTEONTX_CPT_VF
+	tristate "Octeon-tx CPT Virtual function driver"
+	depends on ARCH_THUNDER
+	select CRYPTO_DEV_CPT
+	help
+	  Support for Cavium CPT Virtual function found in octeon-tx
+	  series of processors.
+
+	  To compile this as a module, choose M here: the module will be
+	  called cptvf.
diff --git a/drivers/crypto/cavium/cpt/Makefile b/drivers/crypto/cavium/cpt/Makefile
index bf758e2..6f70b15 100644
--- a/drivers/crypto/cavium/cpt/Makefile
+++ b/drivers/crypto/cavium/cpt/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_OCTEONTX_CPT_PF) += cptpf.o
 cptpf-objs := cpt_main.o cpt_pf_mbox.o
+obj-$(CONFIG_OCTEONTX_CPT_VF) += cptvf.o
+cptvf-objs := cptvf_main.o cptvf_reqmanager.o cptvf_mbox.o cptvf_algs.o
diff --git a/drivers/crypto/cavium/cpt/cptvf.h b/drivers/crypto/cavium/cpt/cptvf.h
new file mode 100644
index 0000000..1fafea8
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf.h
@@ -0,0 +1,255 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPTVF_H
+#define __CPTVF_H
+
+#include <linux/list.h>
+#include "cpt_common.h"
+
+struct command_chunk {
+	uint8_t *head; /* 128-byte aligned real_vaddr */
+	uint8_t *real_vaddr; /* Virtual address after dma_alloc_consistent */
+	dma_addr_t dma_addr; /* 128-byte aligned real_dma_addr */
+	dma_addr_t real_dma_addr; /* DMA address after dma_alloc_consistent */
+	uint32_t size; /* Chunk size, max CPT_INST_CHUNK_MAX_SIZE */
+	struct hlist_node nextchunk;
+};
+
+struct iq_stats {
+	atomic64_t instr_posted;
+	atomic64_t instr_dropped;
+};
+
+/**
+ * comamnd queue structure
+ */
+struct command_queue {
+	spinlock_t lock; /* command queue lock */
+	uint32_t idx; /* Command queue host write idx */
+	uint32_t dbell_count; /* outstanding commands */
+	uint32_t nchunks; /* Number of command chunks */
+	struct command_chunk *qhead;	/* Command queue head, instructions
+					 * are inserted here
+					 */
+	struct hlist_head chead;
+	struct iq_stats stats; /* Queue statistics */
+};
+
+struct command_qinfo {
+	uint32_t dbell_thold; /* Command queue doorbell threshold */
+	uint32_t cmd_size; /* Command size (32/64-Byte) */
+	uint32_t qchunksize; /* Command queue chunk size configured by user */
+	struct command_queue queue[DEFAULT_DEVICE_QUEUES];
+};
+
+/**
+ * pending entry structure
+ */
+struct pending_entry {
+	uint8_t busy; /* Entry status (free/busy) */
+	uint8_t done;
+	uint8_t is_ae;
+
+	volatile uint64_t *completion_addr; /* Completion address */
+	void *post_arg;
+	void (*callback)(int, void *); /* Kernel ASYNC request callabck */
+	void *callback_arg; /* Kernel ASYNC request callabck arg */
+};
+
+/**
+ * pending queue structure
+ */
+struct pending_queue {
+	struct pending_entry *head;	/* head of the queue */
+	uint32_t front; /* Process work from here */
+	uint32_t rear; /* Append new work here */
+	atomic64_t pending_count;
+	spinlock_t lock; /* Queue lock */
+};
+
+struct pending_qinfo {
+	uint32_t nr_queues;	/* Number of queues supported */
+	uint32_t qlen; /* Queue length */
+	struct pending_queue queue[DEFAULT_DEVICE_QUEUES];
+};
+
+#define for_each_pending_queue(qinfo, q, i)	\
+	for (i = 0, q = &qinfo->queue[i]; i < qinfo->nr_queues; i++, \
+	     q = &qinfo->queue[i])
+
+/**
+ * CPT VF device structure
+ */
+struct cpt_vf {
+	uint32_t chip_id; /* CPT Device ID */
+	uint16_t flags; /* Flags to hold device status bits */
+	uint8_t vfid; /* Device Index 0...CPT_MAX_VF_NUM */
+	uint8_t vftype; /* VF type of SE_TYPE(1) or AE_TYPE(1) */
+	uint8_t vfgrp; /* VF group (0 - 8) */
+	uint8_t node; /* Operating node: Bits (46:44) in BAR0 address */
+	uint8_t  priority; /* VF priority ring: 1-High proirity round
+			    * robin ring;0-Low priority round robin ring;
+			    */
+	uint8_t  reqmode; /* Request processing mode POLL/ASYNC */
+	struct pci_dev *pdev; /* pci device handle */
+	void *sysdev; /* sysfs device */
+	void *proc; /* proc dir */
+	void __iomem *reg_base; /* Register start address */
+	void *wqe_info;	/* BH worker threads */
+	void *context;	/* Context Specific Information*/
+	void *nqueue_info; /* Queue Specific Information*/
+	/* MSI-X */
+	bool msix_enabled;
+	uint8_t	num_vec;
+	struct msix_entry msix_entries[CPT_VF_MSIX_VECTORS];
+	bool irq_allocated[CPT_VF_MSIX_VECTORS];
+	cpumask_var_t affinity_mask[CPT_VF_MSIX_VECTORS];
+	uint64_t intcnt;
+	/* Command and Pending queues */
+	uint32_t qlen;
+	uint32_t qsize; /* Calculated queue size */
+	uint32_t nr_queues;
+	uint32_t max_queues;
+	struct command_qinfo cqinfo; /* Command queue information */
+	struct pending_qinfo pqinfo; /* Pending queue information */
+	/* VF-PF mailbox communication */
+	bool pf_acked;
+	bool pf_nacked;
+} ____cacheline_aligned_in_smp;
+
+#define CPT_NODE_ID_SHIFT (44u)
+#define CPT_NODE_ID_MASK (3u)
+
+#define MAX_CPT_AE_CORES 6
+#define MAX_CPT_SE_CORES 10
+
+enum req_mode {
+	BLOCKING,
+	NON_BLOCKING,
+	SPEED,
+	KERN_POLL,
+};
+
+enum dma_mode {
+	DMA_DIRECT_DIRECT, /* Input DIRECT, Output DIRECT */
+	DMA_GATHER_SCATTER
+};
+
+enum inputype {
+	FROM_CTX = 0,
+	FROM_DPTR = 1
+};
+
+enum CspErrorCodes {
+	/*Microcode errors*/
+	NO_ERR = 0x00,
+	ERR_OPCODE_UNSUPPORTED = 0x01,
+
+	/*SCATTER GATHER*/
+	ERR_SCATTER_GATHER_WRITE_LENGTH = 0x02,
+	ERR_SCATTER_GATHER_LIST = 0x03,
+	ERR_SCATTER_GATHER_NOT_SUPPORTED = 0x04,
+
+	/*AE*/
+	ERR_LENGTH_INVALID = 0x05,
+	ERR_MOD_LEN_INVALID = 0x06,
+	ERR_EXP_LEN_INVALID = 0x07,
+	ERR_DATA_LEN_INVALID = 0x08,
+	ERR_MOD_LEN_ODD = 0x09,
+	ERR_PKCS_DECRYPT_INCORRECT = 0x0a,
+	ERR_ECC_PAI = 0xb,
+	ERR_ECC_CURVE_UNSUPPORTED = 0xc,
+	ERR_ECC_SIGN_R_INVALID = 0xd,
+	ERR_ECC_SIGN_S_INVALID = 0xe,
+	ERR_ECC_SIGNATURE_MISMATCH = 0xf,
+
+	/*SE GC*/
+	ERR_GC_LENGTH_INVALID = 0x41,
+	ERR_GC_RANDOM_LEN_INVALID = 0x42,
+	ERR_GC_DATA_LEN_INVALID = 0x43,
+	ERR_GC_DRBG_TYPE_INVALID = 0x44,
+	ERR_GC_CTX_LEN_INVALID = 0x45,
+	ERR_GC_CIPHER_UNSUPPORTED = 0x46,
+	ERR_GC_AUTH_UNSUPPORTED = 0x47,
+	ERR_GC_OFFSET_INVALID = 0x48,
+	ERR_GC_HASH_MODE_UNSUPPORTED = 0x49,
+	ERR_GC_DRBG_ENTROPY_LEN_INVALID = 0x4a,
+	ERR_GC_DRBG_ADDNL_LEN_INVALID = 0x4b,
+	ERR_GC_ICV_MISCOMPARE = 0x4c,
+	ERR_GC_DATA_UNALIGNED = 0x4d,
+
+	/*SE IPSEC*/
+	ERR_IPSEC_AUTH_UNSUPPORTED = 0xB0,
+	ERR_IPSEC_ENCRYPT_UNSUPPORTED = 0xB1,
+	ERR_IPSEC_IP_VERSION = 0xB2,
+	ERR_IPSEC_PROTOCOL = 0xB3,
+	ERR_IPSEC_CONTEXT_INVALID = 0xB4,
+	ERR_IPSEC_CONTEXT_DIRECTION_MISMATCH = 0xB5,
+	ERR_IPSEC_IP_PAYLOAD_TYPE = 0xB6,
+	ERR_IPSEC_CONTEXT_FLAG_MISMATCH = 0xB7,
+	ERR_IPSEC_GRE_HEADER_MISMATCH = 0xB8,
+	ERR_IPSEC_GRE_PROTOCOL = 0xB9,
+	ERR_IPSEC_CUSTOM_HDR_LEN = 0xBA,
+	ERR_IPSEC_ESP_NEXT_HEADER = 0xBB,
+	ERR_IPSEC_IPCOMP_CONFIGURATION = 0xBC,
+	ERR_IPSEC_FRAG_SIZE_CONFIGURATION = 0xBD,
+	ERR_IPSEC_SPI_MISMATCH = 0xBE,
+	ERR_IPSEC_CHECKSUM = 0xBF,
+	ERR_IPSEC_IPCOMP_PACKET_DETECTED = 0xC0,
+	ERR_IPSEC_TFC_PADDING_WITH_PREFRAG = 0xC1,
+	ERR_IPSEC_DSIV_INCORRECT_PARAM = 0xC2,
+	ERR_IPSEC_AUTHENTICATION_MISMATCH = 0xC3,
+	ERR_IPSEC_PADDING = 0xC4,
+	ERR_IPSEC_DUMMY_PAYLOAD = 0xC5,
+	ERR_IPSEC_IPV6_EXTENSION_HEADERS_TOO_BIG = 0xC6,
+	ERR_IPSEC_IPV6_HOP_BY_HOP = 0xC7,
+	ERR_IPSEC_IPV6_RH_LENGTH = 0xC8,
+	ERR_IPSEC_IPV6_OUTBOUND_RH_COPY_ADDR = 0xC9,
+	ERR_IPSEC_IPV6_DECRYPT_RH_SEGS_LEFT = 0xCA,
+	ERR_IPSEC_IPV6_HEADER_INVALID = 0xCB,
+	ERR_IPSEC_SELECTOR_MATCH = 0xCC,
+
+	/*SE SSL*/
+	ERR_SSL_POM_LEN_INVALID = 0x81,
+	ERR_SSL_RECORD_LEN_INVALID = 0x82,
+	ERR_SSL_CTX_LEN_INVALID = 0x83,
+	ERR_SSL_CIPHER_UNSUPPORTED = 0x84,
+	ERR_SSL_MAC_UNSUPPORTED = 0x85,
+	ERR_SSL_VERSION_UNSUPPORTED = 0x86,
+	ERR_SSL_VERIFY_AUTH_UNSUPPORTED = 0x87,
+	ERR_SSL_MS_LEN_INVALID = 0x88,
+	ERR_SSL_MAC_MISMATCH = 0x89,
+
+	/* API Layer */
+	ERR_REQ_TIMEOUT      = (0x40000000 | 0x103),    /* 0x40000103 */
+	ERR_REQ_PENDING      = (0x40000000 | 0x110),    /* 0x40000110 */
+	ERR_BAD_INPUT_LENGTH = (0x40000000 | 384),    /* 0x40000180 */
+	ERR_BAD_KEY_LENGTH,
+	ERR_BAD_KEY_HANDLE,
+	ERR_BAD_CONTEXT_HANDLE,
+	ERR_BAD_SCALAR_LENGTH,
+	ERR_BAD_DIGEST_LENGTH,
+	ERR_BAD_INPUT_ARG,
+	ERR_BAD_SSL_MSG_TYPE,
+	ERR_BAD_RECORD_PADDING,
+	ERR_NB_REQUEST_PENDING,
+};
+
+int cptvf_send_vf_up(struct cpt_vf *cptvf);
+int cptvf_send_vf_down(struct cpt_vf *cptvf);
+int cptvf_send_vf_to_grp_msg(struct cpt_vf *cptvf);
+int cptvf_send_vf_priority_msg(struct cpt_vf *cptvf);
+int cptvf_send_vq_size_msg(struct cpt_vf *cptvf);
+int cptvf_check_pf_ready(struct cpt_vf *cptvf);
+void cptvf_handle_mbox_intr(struct cpt_vf *cptvf);
+void cvm_crypto_exit(void);
+int cvm_crypto_init(struct cpt_vf *cptvf);
+void vq_post_process(struct cpt_vf *cptvf, uint32_t qno);
+void cptvf_write_vq_doorbell(struct cpt_vf *cptvf, uint32_t val);
+#endif /* __CPTVF_H */
diff --git a/drivers/crypto/cavium/cpt/cptvf_algs.c b/drivers/crypto/cavium/cpt/cptvf_algs.c
new file mode 100644
index 0000000..4705e90
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf_algs.c
@@ -0,0 +1,446 @@
+
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/crypto.h>
+#include <crypto/algapi.h>
+#include <crypto/cryptd.h>
+#include <crypto/crypto_wq.h>
+#include <linux/list.h>
+#include <linux/scatterlist.h>
+#include <linux/err.h>
+#include <crypto/aes.h>
+#include <crypto/internal/aead.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <crypto/aes.h>
+#include <crypto/des.h>
+#include "request_manager.h"
+#include "cptvf.h"
+#include "cptvf_algs.h"
+
+struct cpt_device_handle {
+	void *cdev[MAX_DEVICES];
+	uint32_t dev_count;
+};
+
+static struct cpt_device_handle dev_handle;
+
+static void cvm_callback(uint32_t status, void *arg)
+{
+	struct crypto_async_request *req = (struct crypto_async_request *)arg;
+
+	req->complete(req, !status);
+}
+
+static inline void update_input_iv(struct cpt_request_info *req_info,
+				   uint8_t *iv, uint32_t enc_iv_len,
+				   uint32_t *argcnt)
+{
+	/* Setting the iv information */
+	req_info->in[*argcnt].ptr.addr = (void *)iv;
+	req_info->in[*argcnt].size = enc_iv_len;
+	req_info->in[*argcnt].offset = enc_iv_len;
+	req_info->in[*argcnt].type = UNIT_8_BIT;
+	req_info->req.dlen += enc_iv_len;
+
+	++(*argcnt);
+}
+
+static inline void update_output_iv(struct cpt_request_info *req_info,
+				    uint8_t *iv, uint32_t enc_iv_len,
+				    uint32_t *argcnt)
+{
+	/* Setting the iv information */
+	req_info->out[*argcnt].ptr.addr = (void *)iv;
+	req_info->out[*argcnt].size = enc_iv_len;
+	req_info->out[*argcnt].offset = enc_iv_len;
+	req_info->out[*argcnt].type = UNIT_8_BIT;
+
+	req_info->rlen += enc_iv_len;
+
+	++(*argcnt);
+}
+
+static inline void update_input_data(struct cpt_request_info *req_info,
+				     struct scatterlist *inp_sg,
+				     uint32_t nbytes, uint32_t *argcnt)
+{
+	req_info->req.dlen += nbytes;
+
+	while (nbytes) {
+		uint32_t len = min(nbytes, inp_sg->length);
+		uint8_t *ptr = page_address(sg_page(inp_sg)) + inp_sg->offset;
+
+		req_info->in[*argcnt].ptr.addr = (void *)ptr;
+		req_info->in[*argcnt].size = len;
+		req_info->in[*argcnt].offset = len;
+		req_info->in[*argcnt].type = UNIT_8_BIT;
+		nbytes -= len;
+
+		++(*argcnt);
+		++inp_sg;
+	}
+}
+
+static inline void update_output_data(struct cpt_request_info *req_info,
+				      struct scatterlist *outp_sg,
+				      uint32_t nbytes, uint32_t *argcnt)
+{
+	req_info->rlen += nbytes;
+
+	while (nbytes) {
+		uint32_t len = min(nbytes, outp_sg->length);
+		uint8_t *ptr = page_address(sg_page(outp_sg)) +
+					    outp_sg->offset;
+
+		req_info->out[*argcnt].ptr.addr = (void *)ptr;
+		req_info->out[*argcnt].size = len;
+		req_info->out[*argcnt].offset = len;
+		req_info->out[*argcnt].type = UNIT_8_BIT;
+		nbytes -= len;
+		++(*argcnt);
+		++outp_sg;
+	}
+}
+
+static inline uint32_t create_ctx_hdr(struct ablkcipher_request *req,
+				      uint32_t enc, uint32_t cipher_type,
+				      uint32_t aes_key_type, uint32_t *argcnt)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
+	struct cvm_enc_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+	struct cvm_req_ctx *rctx = ablkcipher_request_ctx(req);
+	struct fc_context *fctx = &rctx->fctx;
+	uint64_t *offset_control = &rctx->control_word;
+	uint32_t enc_iv_len = crypto_ablkcipher_ivsize(tfm);
+	struct cpt_request_info *req_info = &rctx->cpt_req;
+	uint64_t *ctrl_flags = NULL;
+	uint8_t iv_inp = FROM_DPTR;
+	uint8_t dma_mode = DMA_GATHER_SCATTER;
+
+	req_info->ctrl.s.grp = 0;
+	req_info->ctrl.s.dma_mode = dma_mode;
+	req_info->ctrl.s.req_mode = NON_BLOCKING;
+	req_info->ctrl.s.se_req = SE_CORE_REQ;
+
+	req_info->ctxl = sizeof(struct fc_context);
+	req_info->handle = 0;
+
+	req_info->req.opcode.s.major = MAJOR_OP_FC | DMA_MODE_FLAG(dma_mode);
+	if (enc)
+		req_info->req.opcode.s.minor = 2;
+	else
+		req_info->req.opcode.s.minor = 3;
+
+	req_info->req.param1 = req->nbytes; /* Encryption Data length */
+	req_info->req.param2 = 0; /*Auth data length */
+
+	fctx->enc.enc_ctrl.e.enc_cipher = cipher_type;
+	fctx->enc.enc_ctrl.e.aes_key = aes_key_type;
+	fctx->enc.enc_ctrl.e.iv_source = iv_inp;
+
+	memcpy(fctx->enc.encr_key, ctx->enc_key, ctx->key_len);
+	ctrl_flags = (uint64_t *)&fctx->enc.enc_ctrl.flags;
+	*ctrl_flags = cpu_to_be64(*ctrl_flags);
+
+	*offset_control = cpu_to_be64(((uint64_t)(enc_iv_len) << 16));
+	/* Storing  Packet Data Information in offset
+	 * Control Word First 8 bytes
+	 */
+	req_info->in[*argcnt].ptr.addr = (uint8_t *)offset_control;
+	req_info->in[*argcnt].size = CONTROL_WORD_LEN;
+	req_info->in[*argcnt].offset = CONTROL_WORD_LEN;
+	req_info->in[*argcnt].type = UNIT_8_BIT;
+	req_info->req.dlen += CONTROL_WORD_LEN;
+
+	++(*argcnt);
+
+	req_info->in[*argcnt].ptr.addr = (uint8_t *)fctx;
+	req_info->in[*argcnt].size = sizeof(struct fc_context);
+	req_info->in[*argcnt].offset = sizeof(struct fc_context);
+	req_info->in[*argcnt].type = UNIT_8_BIT;
+	req_info->req.dlen += sizeof(struct fc_context);
+
+	++(*argcnt);
+
+	return 0;
+}
+
+static inline uint32_t create_input_list(struct ablkcipher_request  *req,
+					 uint32_t enc, uint32_t cipher_type,
+					 uint32_t aes_key_type,
+					 uint32_t enc_iv_len)
+{
+	struct cvm_req_ctx *rctx = ablkcipher_request_ctx(req);
+	struct cpt_request_info *req_info = &rctx->cpt_req;
+	uint32_t argcnt =  0;
+
+	create_ctx_hdr(req, enc, cipher_type, aes_key_type, &argcnt);
+	update_input_iv(req_info, req->info, enc_iv_len, &argcnt);
+	update_input_data(req_info, req->src, req->nbytes, &argcnt);
+	req_info->incnt = argcnt;
+
+	return 0;
+}
+
+static inline void store_cb_info(struct ablkcipher_request *req,
+				 struct cpt_request_info *req_info)
+{
+	req_info->callback = (void *)cvm_callback;
+	req_info->callback_arg = (void *)&req->base;
+}
+
+static inline void create_output_list(struct ablkcipher_request *req,
+				      uint32_t cipher_type,
+				      uint32_t enc_iv_len)
+{
+	struct cvm_req_ctx *rctx = ablkcipher_request_ctx(req);
+	struct cpt_request_info *req_info = &rctx->cpt_req;
+	uint32_t argcnt = 0;
+
+	/* OUTPUT Buffer Processing
+	 * AES encryption/decryption output would be
+	 * received in the following format
+	 *
+	 * ------IV--------|------ENCRYPTED/DECRYPTED DATA-----|
+	 * [ 16 Bytes/     [   Request Enc/Dec/ DATA Len AES CBC ]
+	 */
+	/* Reading IV information */
+	update_output_iv(req_info, req->info, enc_iv_len, &argcnt);
+	update_output_data(req_info, req->dst, req->nbytes, &argcnt);
+	req_info->outcnt = argcnt;
+}
+
+static inline uint32_t cvm_enc_dec(struct ablkcipher_request *req,
+				   uint32_t enc, uint32_t cipher_type)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
+	struct cvm_enc_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+	uint32_t key_type = AES_128_BIT;
+	struct cvm_req_ctx *rctx = ablkcipher_request_ctx(req);
+	uint32_t enc_iv_len = crypto_ablkcipher_ivsize(tfm);
+	struct fc_context *fctx = &rctx->fctx;
+	struct cpt_request_info *req_info = &rctx->cpt_req;
+	void *cdev = NULL;
+	uint32_t status = -1;
+
+	switch (ctx->key_len) {
+	case BYTE_16:
+		key_type = AES_128_BIT;
+		break;
+	case BYTE_24:
+		key_type = AES_192_BIT;
+		break;
+	case BYTE_32:
+		key_type = AES_256_BIT;
+		break;
+	default:
+		return ERR_GC_CIPHER_UNSUPPORTED;
+	}
+
+	if (cipher_type == DES3_CBC)
+		key_type = 0;
+
+	memset(req_info, 0, sizeof(struct cpt_request_info));
+	memset(fctx, 0, sizeof(struct fc_context));
+	create_input_list(req, enc, cipher_type, key_type, enc_iv_len);
+	create_output_list(req, cipher_type, enc_iv_len);
+	store_cb_info(req, req_info);
+	cdev = dev_handle.cdev[smp_processor_id()];
+	status = cptvf_do_request(cdev, req_info);
+	/* We perform an asynchronous send and once
+	 * the request is completed the driver would
+	 * intimate through  registered call back functions
+	 */
+
+	if (status)
+		return status;
+	else
+		return -EINPROGRESS;
+}
+
+int cvm_des3_encrypt_cbc(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, true, DES3_CBC);
+}
+
+int cvm_des3_decrypt_cbc(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, false, DES3_CBC);
+}
+
+int cvm_aes_encrypt_xts(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, true, AES_XTS);
+}
+
+int cvm_aes_decrypt_xts(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, false, AES_XTS);
+}
+
+int cvm_aes_encrypt_cbc(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, true, AES_CBC);
+}
+
+int cvm_aes_decrypt_cbc(struct ablkcipher_request *req)
+{
+	return cvm_enc_dec(req, false, AES_CBC);
+}
+
+int cvm_enc_dec_setkey(struct crypto_ablkcipher *cipher, const uint8_t *key,
+		       uint32_t keylen)
+{
+	struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
+	struct cvm_enc_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if ((keylen == BYTE_16) || (keylen == BYTE_24) ||
+	    (keylen == BYTE_32)) {
+		ctx->key_len = keylen;
+		memcpy(ctx->enc_key, key, keylen);
+		return 0;
+	}
+	crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
+
+	return -EINVAL;
+}
+
+int cvm_enc_dec_init(struct crypto_tfm *tfm)
+{
+	struct cvm_enc_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	memset(ctx, 0, sizeof(*ctx));
+	tfm->crt_ablkcipher.reqsize = sizeof(struct cvm_req_ctx) +
+					sizeof(struct ablkcipher_request);
+	/* Additional memory for ablkcipher_request is
+	 * allocated since the cryptd daemon uses
+	 * this memory for request_ctx information
+	 */
+
+	return 0;
+}
+
+void cvm_enc_dec_exit(struct crypto_tfm *tfm)
+{
+	return;
+}
+
+struct crypto_alg algs[] = { {
+	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct cvm_enc_ctx),
+	.cra_alignmask = 7,
+	.cra_priority = CAV_PRIORITY,
+	.cra_name = "xts(aes)",
+	.cra_driver_name = "cavium-xts-aes",
+	.cra_type = &crypto_ablkcipher_type,
+	.cra_u = {
+		.ablkcipher = {
+			.ivsize = AES_BLOCK_SIZE,
+			.min_keysize = AES_MIN_KEY_SIZE,
+			.max_keysize = AES_MAX_KEY_SIZE,
+			.setkey = cvm_enc_dec_setkey,
+			.encrypt = cvm_aes_encrypt_xts,
+			.decrypt = cvm_aes_decrypt_xts,
+		},
+	},
+	.cra_init = cvm_enc_dec_init,
+	.cra_exit = cvm_enc_dec_exit,
+	.cra_module = THIS_MODULE,
+}, {
+	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct cvm_enc_ctx),
+	.cra_alignmask = 7,
+	.cra_priority = CAV_PRIORITY,
+	.cra_name = "cbc(aes)",
+	.cra_driver_name = "cavium-cbc-aes",
+	.cra_type = &crypto_ablkcipher_type,
+	.cra_u = {
+		.ablkcipher = {
+			.ivsize = AES_BLOCK_SIZE,
+			.min_keysize = AES_MIN_KEY_SIZE,
+			.max_keysize = AES_MAX_KEY_SIZE,
+			.setkey = cvm_enc_dec_setkey,
+			.encrypt = cvm_aes_encrypt_cbc,
+			.decrypt = cvm_aes_decrypt_cbc,
+		},
+	},
+	.cra_init = cvm_enc_dec_init,
+	.cra_exit = cvm_enc_dec_exit,
+	.cra_module = THIS_MODULE,
+}, {
+	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize = DES3_EDE_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct cvm_des3_ctx),
+	.cra_alignmask = 7,
+	.cra_priority = CAV_PRIORITY,
+	.cra_name = "cbc(des3_ede)",
+	.cra_driver_name = "cavium-cbc-des3_ede",
+	.cra_type = &crypto_ablkcipher_type,
+	.cra_u = {
+		.ablkcipher = {
+			.min_keysize = DES3_EDE_KEY_SIZE,
+			.max_keysize = DES3_EDE_KEY_SIZE,
+			.ivsize = DES_BLOCK_SIZE,
+			.setkey = cvm_enc_dec_setkey,
+			.encrypt = cvm_des3_encrypt_cbc,
+			.decrypt = cvm_des3_decrypt_cbc,
+		},
+	},
+	.cra_init = cvm_enc_dec_init,
+	.cra_exit = cvm_enc_dec_exit,
+	.cra_module = THIS_MODULE,
+} };
+
+static inline int cav_register_algs(void)
+{
+	int err = 0;
+
+	err = crypto_register_algs(algs, ARRAY_SIZE(algs));
+	if (err) {
+		pr_err("Error in aes module init %d\n", err);
+		return -1;
+	}
+
+	return 0;
+}
+
+static inline void cav_unregister_algs(void)
+{
+	crypto_unregister_algs(algs, ARRAY_SIZE(algs));
+}
+
+int cvm_crypto_init(struct cpt_vf *cptvf)
+{
+	uint32_t dev_count;
+
+	dev_count = dev_handle.dev_count;
+	dev_handle.cdev[dev_count] = cptvf;
+	dev_handle.dev_count++;
+
+	if (!dev_count) {
+		if (cav_register_algs()) {
+			pr_err("Error in registering crypto algorithms\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+void cvm_crypto_exit(void)
+{
+	uint32_t dev_count;
+
+	dev_count = --dev_handle.dev_count;
+	if (!dev_count)
+		cav_unregister_algs();
+}
diff --git a/drivers/crypto/cavium/cpt/cptvf_algs.h b/drivers/crypto/cavium/cpt/cptvf_algs.h
new file mode 100644
index 0000000..2e45797
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf_algs.h
@@ -0,0 +1,159 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef _CAVIUM_SYM_CRYPTO_H_
+#define _CAVIUM_SYM_CRYPTO_H_
+
+#define MAX_DEVICES 16
+/* AE opcodes*/
+#define MAJOR_OP_MISC         0x01
+#define MAJOR_OP_RANDOM       0x02
+#define MAJOR_OP_MODEXP       0x03
+#define MAJOR_OP_ECDSA        0x04
+#define MAJOR_OP_ECC          0x05
+#define MAJOR_OP_GENRSAPRIME  0x06
+#define MAJOR_OP_AE_RANDOM    0x32
+#define MAJOR_OP_AE_PASSTHRU  0x01
+#define MINOR_OP_AE_PASSTHRU  0x07
+
+/*SE opcodes*/
+#define MAJOR_OP_SE_MISC    0x31
+#define MAJOR_OP_SE_RANDOM  0x32
+#define MAJOR_OP_FC         0x33
+#define MAJOR_OP_HASH       0x34
+#define MAJOR_OP_HMAC       0x35
+#define MAJOR_OP_DSIV       0x36
+
+#define MAJOR_OP_SSL_FULL    0x10
+#define MAJOR_OP_SSL_VERIFY  0x11
+#define MAJOR_OP_SSL_RESUME  0x12
+#define MAJOR_OP_SSL_FINISH  0x13
+#define MAJOR_OP_SSL_ENCREC  0x14
+#define MAJOR_OP_SSL_DECREC  0x15
+
+#define MAJOR_OP_WRITESA_OUTBOUND 0x20
+#define MAJOR_OP_WRITESA_INBOUND  0x21
+#define MAJOR_OP_OUTBOUND         0x23
+#define MAJOR_OP_INBOUND          0x24
+
+#define MAJOR_OP_SE_PASSTHRU  0x01
+#define MINOR_OP_SE_PASSTHRU  0x07
+
+#define  CAV_PRIORITY 1000
+#define  MAX_ENC_KEY_SIZE 32
+#define  MAX_HASH_KEY_SIZE 64
+#define  MAX_KEY_SIZE (MAX_ENC_KEY_SIZE + MAX_HASH_KEY_SIZE)
+#define  CONTROL_WORD_LEN 8
+
+#define IV_OFFSET 8   /* Include SPI | SNO 8 Bytes */
+#define AES_CBC_ALG_NAME "cbc(aes)"
+#define AES_XTS_ALG_NAME "xts(aes)"
+#define DES3_ALG_NAME "cbc(des3_ede)"
+
+#define  BYTE_16 16
+#define  BYTE_24 24
+#define  BYTE_32 32
+
+#define DMA_MODE_FLAG(dma_mode) \
+	((dma_mode == DMA_GATHER_SCATTER) ? (1 << 7) : 0)
+
+enum req_type {
+	AE_CORE_REQ,
+	SE_CORE_REQ,
+};
+
+enum cipher_type {
+	DES3_CBC = 0x1,
+	DES3_ECB = 0x2,
+	AES_CBC = 0x3,
+	AES_ECB = 0x4,
+	AES_CFB = 0x5,
+	AES_CTR = 0x6,
+	AES_GCM = 0x7,
+	AES_XTS = 0x8
+};
+
+enum aes_type {
+	AES_128_BIT = 0x1,
+	AES_192_BIT = 0x2,
+	AES_256_BIT = 0x3
+};
+
+/*Context length in words*/
+#define  FC_CTX_LENGTH       23
+#define  ENC_CTX_LENGTH       7
+#define  HASH_CTX_LENGTH     34
+#define  HMAC_CTX_LENGTH     34
+
+union encr_ctrl {
+	uint64_t flags;
+	struct {
+#if defined(__BIG_ENDIAN_BITFIELD)
+		uint64_t enc_cipher:4;
+		uint64_t reserved1:1;
+		uint64_t aes_key:2;
+		uint64_t iv_source:1;
+		uint64_t hash_type:4;
+		uint64_t reserved2:3;
+		uint64_t auth_input_type:1;
+		uint64_t mac_len:8;
+		uint64_t reserved3:8;
+		uint64_t encr_offset:16;
+		uint64_t iv_offset:8;
+		uint64_t auth_offset:8;
+#else
+		uint64_t auth_offset:8;
+		uint64_t iv_offset:8;
+		uint64_t encr_offset:16;
+		uint64_t reserved3:8;
+		uint64_t mac_len:8;
+		uint64_t auth_input_type:1;
+		uint64_t reserved2:3;
+		uint64_t hash_type:4;
+		uint64_t iv_source:1;
+		uint64_t aes_key:2;
+		uint64_t reserved1:1;
+		uint64_t enc_cipher:4;
+#endif
+	} e;
+};
+
+struct enc_context {
+	union encr_ctrl enc_ctrl;
+	uint8_t  encr_key[32];
+	uint8_t  encr_iv[16];
+};
+
+struct fchmac_context {
+	uint8_t  ipad[64];
+	uint8_t  opad[64]; /* or OPAD */
+};
+
+struct fc_context {
+	struct enc_context enc;
+	struct fchmac_context hmac;
+};
+
+struct cvm_enc_ctx {
+	uint32_t key_len;
+	uint8_t enc_key[MAX_KEY_SIZE];
+};
+
+struct cvm_des3_ctx {
+	uint32_t key_len;
+	uint8_t des3_key[MAX_KEY_SIZE];
+};
+
+struct cvm_req_ctx {
+	struct cpt_request_info cpt_req;
+	uint64_t control_word;
+	struct fc_context fctx;
+};
+
+uint32_t cptvf_do_request(void *cptvf, struct cpt_request_info *);
+#endif /*_CAVIUM_SYM_CRYPTO_H_*/
diff --git a/drivers/crypto/cavium/cpt/cptvf_main.c b/drivers/crypto/cavium/cpt/cptvf_main.c
new file mode 100644
index 0000000..57b796f
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf_main.c
@@ -0,0 +1,1038 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/printk.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/cpumask.h>
+
+#include "cptvf.h"
+
+#define DRV_NAME	"thunder-cptvf"
+#define DRV_VERSION	"1.0"
+
+static uint32_t qlen = DEFAULT_CMD_QLEN;
+module_param(qlen, uint, 0644);
+MODULE_PARM_DESC(qlen, "Command queue length");
+
+static uint32_t chunksize = DEFAULT_CMD_QCHUNK_SIZE;
+module_param(chunksize, uint, 0644);
+MODULE_PARM_DESC(chunksize, "Command queue chunk size");
+
+static uint32_t group = 1; /* Default to SE group */
+module_param(group, uint, 0644);
+MODULE_PARM_DESC(group, "VF group (Value between 0 - 7)");
+
+static uint32_t priority;
+module_param(priority, uint, 0644);
+MODULE_PARM_DESC(priority, "VF/VQ Priority (0-1)");
+
+struct cptvf_wqe {
+	struct tasklet_struct twork;
+	void *cptvf;
+	uint32_t qno;
+};
+
+struct cptvf_wqe_info {
+	struct cptvf_wqe vq_wqe[DEFAULT_DEVICE_QUEUES];
+};
+
+static void vq_work_handler(unsigned long data)
+{
+	struct cptvf_wqe_info *cwqe_info = (struct cptvf_wqe_info *)data;
+	struct cptvf_wqe *cwqe = &cwqe_info->vq_wqe[0];
+
+	vq_post_process(cwqe->cptvf, cwqe->qno);
+}
+
+static int init_worker_threads(struct cpt_vf *cptvf)
+{
+	struct pci_dev *pdev = cptvf->pdev;
+	struct cptvf_wqe_info *cwqe_info;
+	int i;
+
+	cwqe_info = kzalloc(sizeof(*cwqe_info), GFP_KERNEL);
+	if (!cwqe_info)
+		return -ENOMEM;
+
+	if (cptvf->nr_queues) {
+		dev_info(&pdev->dev, "Creating VQ worker threads (%d)\n",
+			 cptvf->nr_queues);
+	}
+
+	for (i = 0; i < cptvf->nr_queues; i++) {
+		tasklet_init(&cwqe_info->vq_wqe[i].twork, vq_work_handler,
+			     (uint64_t)cwqe_info);
+		cwqe_info->vq_wqe[i].qno = i;
+		cwqe_info->vq_wqe[i].cptvf = cptvf;
+	}
+
+	cptvf->wqe_info = cwqe_info;
+
+	return 0;
+}
+
+static void cleanup_worker_threads(struct cpt_vf *cptvf)
+{
+	struct cptvf_wqe_info *cwqe_info;
+	struct pci_dev *pdev = cptvf->pdev;
+	int i;
+
+	cwqe_info = (struct cptvf_wqe_info *)cptvf->wqe_info;
+	if (!cwqe_info)
+		return;
+
+	if (cptvf->nr_queues) {
+		dev_info(&pdev->dev, "Cleaning VQ worker threads (%u)\n",
+			 cptvf->nr_queues);
+	}
+
+	for (i = 0; i < cptvf->nr_queues; i++)
+		tasklet_kill(&cwqe_info->vq_wqe[i].twork);
+
+	kzfree(cwqe_info);
+	cptvf->wqe_info = NULL;
+}
+
+static void free_pending_queues(struct pending_qinfo *pqinfo)
+{
+	int32_t i;
+	struct pending_queue *queue;
+
+	for_each_pending_queue(pqinfo, queue, i) {
+		if (!queue->head)
+			continue;
+
+		/* free single queue */
+		kzfree((queue->head));
+
+		queue->front = 0;
+		queue->rear = 0;
+
+		return;
+	}
+
+	pqinfo->qlen = 0;
+	pqinfo->nr_queues = 0;
+}
+
+static int32_t alloc_pending_queues(struct pending_qinfo *pqinfo,
+				    uint32_t qlen, uint32_t nr_queues)
+{
+	uint32_t i;
+	size_t size;
+	int32_t ret;
+	struct pending_queue *queue = NULL;
+
+	pqinfo->nr_queues = nr_queues;
+	pqinfo->qlen = qlen;
+
+	size = (qlen * sizeof(struct pending_entry));
+
+	for_each_pending_queue(pqinfo, queue, i) {
+		queue->head = kzalloc((size), GFP_KERNEL);
+		if (!queue->head) {
+			pr_err("pending Q (%d) allocation failed\n", i);
+			ret = -ENOMEM;
+			goto pending_qfail;
+		}
+
+		queue->front = 0;
+		queue->rear = 0;
+		atomic64_set((&queue->pending_count), (0));
+
+		/* init queue spin lock */
+		spin_lock_init(&queue->lock);
+	}
+
+	return 0;
+
+pending_qfail:
+	free_pending_queues(pqinfo);
+
+	return ret;
+}
+
+static int32_t init_pending_queues(struct cpt_vf *cptvf, uint32_t qlen,
+				   uint32_t nr_queues)
+{
+	int32_t ret;
+
+	if (!nr_queues)
+		return 0;
+
+	ret = alloc_pending_queues(&cptvf->pqinfo, qlen, nr_queues);
+	if (ret) {
+		pr_err("failed to setup pending queues (%u)\n", nr_queues);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void cleanup_pending_queues(struct cpt_vf *cptvf)
+{
+	struct pci_dev *pdev = cptvf->pdev;
+
+	if (!cptvf->nr_queues)
+		return;
+
+	dev_info(&pdev->dev, "Cleaning VQ pending queue (%u)\n",
+		 cptvf->nr_queues);
+	free_pending_queues(&cptvf->pqinfo);
+}
+
+static void free_command_queues(struct cpt_vf *cptvf,
+				struct command_qinfo *cqinfo)
+{
+	int i, j;
+	struct command_queue *queue = NULL;
+	struct command_chunk *chunk = NULL, *next = NULL;
+	struct pci_dev *pdev = cptvf->pdev;
+	struct hlist_node *node;
+
+	/* clean up for each queue */
+	for (i = 0; i < cptvf->nr_queues; i++) {
+		queue = &cqinfo->queue[i];
+		if (hlist_empty(&cqinfo->queue[i].chead))
+			continue;
+
+		hlist_for_each(node, &cqinfo->queue[i].chead) {
+			chunk = hlist_entry(node, struct command_chunk,
+					    nextchunk);
+			break;
+		}
+
+		for (j = 0; j < queue->nchunks; j++) {
+			if (j < queue->nchunks) {
+				node = node->next;
+				next = hlist_entry(node, struct command_chunk,
+						   nextchunk);
+			}
+
+			dma_free_coherent(&pdev->dev, chunk->size,
+					  chunk->real_vaddr,
+					  chunk->real_dma_addr);
+			chunk->real_vaddr = NULL;
+			chunk->real_dma_addr = 0;
+			chunk->head = NULL;
+			chunk->dma_addr = 0;
+			hlist_del(&chunk->nextchunk);
+			kzfree(chunk);
+			chunk = next;
+		}
+		queue->nchunks = 0;
+		queue->idx = 0;
+		queue->dbell_count = 0;
+	}
+
+	/* common cleanup */
+	cqinfo->cmd_size = 0;
+	cqinfo->dbell_thold = 0;
+}
+
+static int32_t alloc_command_queues(struct cpt_vf *cptvf,
+				    struct command_qinfo *cqinfo,
+				    size_t cmd_size, size_t align,
+				    uint32_t qlen, uint32_t nr_queues)
+{
+	int i;
+	size_t q_size;
+	struct command_queue *queue = NULL;
+	struct pci_dev *pdev = cptvf->pdev;
+
+	/* common init */
+	cqinfo->cmd_size = cmd_size;
+	cqinfo->dbell_thold = CPT_DBELL_THOLD;
+
+	/* Qsize in dwords, needed for SADDR config, 1-next chunk pointer */
+	cptvf->qsize = min(qlen, cqinfo->qchunksize) *
+			CPT_NEXT_CHUNK_PTR_SIZE + 1;
+	/* Qsize in bytes to create space for alignment */
+	q_size = qlen * cqinfo->cmd_size;
+
+	/* per queue initialization */
+	for (i = 0; i < cptvf->nr_queues; i++) {
+		size_t c_size = 0;
+		size_t rem_q_size = q_size;
+		struct command_chunk *curr = NULL, *first = NULL, *last = NULL;
+		uint32_t qcsize_bytes = cqinfo->qchunksize * cqinfo->cmd_size;
+
+		queue = &cqinfo->queue[i];
+		INIT_HLIST_HEAD(&cqinfo->queue[i].chead);
+		do {
+			curr = kzalloc(sizeof(*curr), GFP_KERNEL);
+			if (!curr)
+				goto cmd_qfail;
+
+			c_size = (rem_q_size > qcsize_bytes) ? qcsize_bytes :
+					rem_q_size;
+			curr->real_vaddr = (uint8_t *)dma_zalloc_coherent(&pdev->dev,
+					  c_size + CPT_NEXT_CHUNK_PTR_SIZE,
+					  &curr->real_dma_addr, GFP_KERNEL);
+			if (!curr->real_vaddr) {
+				pr_err("Command Q (%d) chunk (%d) allocation failed\n",
+				       i, queue->nchunks);
+				goto cmd_qfail;
+			}
+
+			curr->head = (uint8_t *)PTR_ALIGN(curr->real_vaddr, align);
+			curr->dma_addr = (dma_addr_t)PTR_ALIGN(curr->real_dma_addr,
+								align);
+			curr->size = c_size;
+			if (queue->nchunks == 0) {
+				hlist_add_head(&curr->nextchunk,
+					       &cqinfo->queue[i].chead);
+				first = curr;
+			} else {
+				hlist_add_behind(&curr->nextchunk,
+						 &last->nextchunk);
+			}
+
+			queue->nchunks++;
+			rem_q_size -= c_size;
+			if (last)
+				*((uint64_t *)(&last->head[last->size])) = (uint64_t)curr->dma_addr;
+
+			last = curr;
+		} while (rem_q_size);
+
+		/* Make the queue circular */
+		/* Tie back last chunk entry to head */
+		curr = first;
+		*((uint64_t *)(&last->head[last->size])) = (uint64_t)curr->dma_addr;
+		last->nextchunk.next = &curr->nextchunk;
+		queue->qhead = curr;
+		queue->dbell_count = 0;
+		spin_lock_init(&queue->lock);
+	}
+	return 0;
+
+cmd_qfail:
+	free_command_queues(cptvf, cqinfo);
+	return -ENOMEM;
+}
+
+static int32_t init_command_queues(struct cpt_vf *cptvf, uint32_t qlen,
+				   uint32_t nr_queues)
+{
+	int32_t ret;
+
+	if (!nr_queues)
+		return 0;
+
+	/* setup AE command queues */
+	ret = alloc_command_queues(cptvf, &cptvf->cqinfo, CPT_INST_SIZE,
+				   CPT_VQ_CHUNK_ALIGN, qlen, nr_queues);
+	if (ret) {
+		pr_err("failed to allocate AE command queues (%u)\n",
+		       nr_queues);
+		return ret;
+	}
+
+	return ret;
+}
+
+static void cleanup_command_queues(struct cpt_vf *cptvf)
+{
+	struct pci_dev *pdev = cptvf->pdev;
+
+	if (!cptvf->nr_queues)
+		return;
+
+	dev_info(&pdev->dev, "Cleaning VQ command queue (%u)\n",
+		 cptvf->nr_queues);
+	free_command_queues(cptvf, &cptvf->cqinfo);
+}
+
+static void cptvf_sw_cleanup(struct cpt_vf *cptvf)
+{
+	cleanup_worker_threads(cptvf);
+	cleanup_pending_queues(cptvf);
+	cleanup_command_queues(cptvf);
+}
+
+static int32_t cptvf_sw_init(struct cpt_vf *cptvf, uint32_t qlen,
+			     uint32_t nr_queues)
+{
+	int32_t ret = 0;
+	uint32_t max_dev_queues = 0, nr_cpus = num_online_cpus();
+
+	max_dev_queues = CPT_NUM_QS_PER_VF;
+	/* possible cpus */
+	nr_queues = max_t(uint32_t, nr_cpus, nr_queues);
+	nr_queues = min_t(uint32_t, nr_queues, max_dev_queues);
+	cptvf->max_queues = nr_queues;
+	cptvf->nr_queues = nr_queues;
+	cptvf->qlen = qlen;
+
+	ret = init_command_queues(cptvf, qlen, nr_queues);
+	if (ret) {
+		pr_err("Failed to setup command queues (%u)\n", nr_queues);
+		return ret;
+	}
+
+	ret = init_pending_queues(cptvf, qlen, nr_queues);
+	if (ret) {
+		pr_err("Failed to setup pending queues (%u)\n", nr_queues);
+		goto setup_pqfail;
+	}
+
+	/* Create worker threads for BH processing */
+	ret = init_worker_threads(cptvf);
+	if (ret) {
+		pr_err("Failed to setup worker threads\n");
+		goto init_work_fail;
+	}
+
+	return 0;
+
+init_work_fail:
+	cleanup_worker_threads(cptvf);
+	cleanup_pending_queues(cptvf);
+
+setup_pqfail:
+	cleanup_command_queues(cptvf);
+
+	return ret;
+}
+
+static inline int cptvf_get_node_id(struct pci_dev *pdev)
+{
+	uint64_t addr = pci_resource_start(pdev, CPT_CSR_BAR);
+
+	return ((addr >> CPT_NODE_ID_SHIFT) & CPT_NODE_ID_MASK);
+}
+
+static void cptvf_disable_msix(struct cpt_vf *cptvf)
+{
+	if (cptvf->msix_enabled) {
+		pci_disable_msix(cptvf->pdev);
+		cptvf->msix_enabled = 0;
+		cptvf->num_vec = 0;
+	}
+}
+
+static int cptvf_enable_msix(struct cpt_vf *cptvf)
+{
+	int i, ret;
+
+	cptvf->num_vec = CPT_VF_MSIX_VECTORS;
+
+	for (i = 0; i < cptvf->num_vec; i++)
+		cptvf->msix_entries[i].entry = i;
+
+	ret = pci_enable_msix(cptvf->pdev, cptvf->msix_entries,
+			      cptvf->num_vec);
+	if (ret) {
+		dev_err(&cptvf->pdev->dev, "Request for #%d msix vectors failed\n",
+			cptvf->num_vec);
+		return ret;
+	}
+
+	cptvf->msix_enabled = 1;
+	/* Mark MSIX enabled */
+	cptvf->flags |= CPT_FLAG_MSIX_ENABLED;
+
+	return 0;
+}
+
+static void cptvf_free_all_interrupts(struct cpt_vf *cptvf)
+{
+	int irq;
+
+	for (irq = 0; irq < cptvf->num_vec; irq++) {
+		if (cptvf->irq_allocated[irq])
+			irq_set_affinity_hint(cptvf->msix_entries[irq].vector,
+					      NULL);
+		free_cpumask_var(cptvf->affinity_mask[irq]);
+		free_irq(cptvf->msix_entries[irq].vector, cptvf);
+		cptvf->irq_allocated[irq] = false;
+	}
+}
+
+static void cptvf_write_vq_ctl(struct cpt_vf *cptvf, bool val)
+{
+	union cptx_vqx_ctl vqx_ctl;
+
+	vqx_ctl.u = cpt_read_csr64(cptvf->reg_base, CPTX_VQX_CTL(0, 0));
+	vqx_ctl.s.ena = val;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_CTL(0, 0), vqx_ctl.u);
+}
+
+void cptvf_write_vq_doorbell(struct cpt_vf *cptvf, uint32_t val)
+{
+	union cptx_vqx_doorbell vqx_dbell;
+
+	vqx_dbell.u = cpt_read_csr64(cptvf->reg_base,
+				     CPTX_VQX_DOORBELL(0, 0));
+	vqx_dbell.s.dbell_cnt = val * 8; /* Num of Instructions * 8 words */
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_DOORBELL(0, 0),
+			vqx_dbell.u);
+}
+
+static void cptvf_write_vq_inprog(struct cpt_vf *cptvf, uint8_t val)
+{
+	union cptx_vqx_inprog vqx_inprg;
+
+	vqx_inprg.u = cpt_read_csr64(cptvf->reg_base, CPTX_VQX_INPROG(0, 0));
+	vqx_inprg.s.inflight = val;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_INPROG(0, 0), vqx_inprg.u);
+}
+
+static void cptvf_write_vq_done_numwait(struct cpt_vf *cptvf, uint32_t val)
+{
+	union cptx_vqx_done_wait vqx_dwait;
+
+	vqx_dwait.u = cpt_read_csr64(cptvf->reg_base,
+				     CPTX_VQX_DONE_WAIT(0, 0));
+	vqx_dwait.s.num_wait = val;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_DONE_WAIT(0, 0),
+			vqx_dwait.u);
+}
+
+static void cptvf_write_vq_done_timewait(struct cpt_vf *cptvf, uint16_t val)
+{
+	union cptx_vqx_done_wait vqx_dwait;
+
+	vqx_dwait.u = cpt_read_csr64(cptvf->reg_base,
+				     CPTX_VQX_DONE_WAIT(0, 0));
+	vqx_dwait.s.time_wait = val;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_DONE_WAIT(0, 0),
+			vqx_dwait.u);
+}
+
+static void cptvf_enable_swerr_interrupts(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_ena_w1s vqx_misc_ena;
+
+	vqx_misc_ena.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_ENA_W1S(0, 0));
+	/* Set mbox(0) interupts for the requested vf */
+	vqx_misc_ena.s.swerr = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_ENA_W1S(0, 0),
+			vqx_misc_ena.u);
+}
+
+static void cptvf_enable_mbox_interrupts(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_ena_w1s vqx_misc_ena;
+
+	vqx_misc_ena.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_ENA_W1S(0, 0));
+	/* Set mbox(0) interupts for the requested vf */
+	vqx_misc_ena.s.mbox = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_ENA_W1S(0, 0),
+			vqx_misc_ena.u);
+}
+
+static void cptvf_enable_done_interrupts(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_done_ena_w1s vqx_done_ena;
+
+	vqx_done_ena.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_DONE_ENA_W1S(0, 0));
+	/* Set DONE interrupt for the requested vf */
+	vqx_done_ena.s.done = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_DONE_ENA_W1S(0, 0),
+			vqx_done_ena.u);
+}
+
+static void cptvf_clear_dovf_intr(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_int vqx_misc_int;
+
+	vqx_misc_int.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_INT(0, 0));
+	/* W1C for the VF */
+	vqx_misc_int.s.dovf = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_INT(0, 0),
+			vqx_misc_int.u);
+}
+
+static void cptvf_clear_irde_intr(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_int vqx_misc_int;
+
+	vqx_misc_int.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_INT(0, 0));
+	/* W1C for the VF */
+	vqx_misc_int.s.irde = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_INT(0, 0),
+			vqx_misc_int.u);
+}
+
+static void cptvf_clear_nwrp_intr(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_int vqx_misc_int;
+
+	vqx_misc_int.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_INT(0, 0));
+	/* W1C for the VF */
+	vqx_misc_int.s.nwrp = 1;
+	cpt_write_csr64(cptvf->reg_base,
+			CPTX_VQX_MISC_INT(0, 0), vqx_misc_int.u);
+}
+
+static void cptvf_clear_mbox_intr(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_int vqx_misc_int;
+
+	vqx_misc_int.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_INT(0, 0));
+	/* W1C for the VF */
+	vqx_misc_int.s.mbox = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_INT(0, 0),
+			vqx_misc_int.u);
+}
+
+static void cptvf_clear_swerr_intr(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_misc_int vqx_misc_int;
+
+	vqx_misc_int.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_MISC_INT(0, 0));
+	/* W1C for the VF */
+	vqx_misc_int.s.swerr = 1;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_MISC_INT(0, 0),
+			vqx_misc_int.u);
+}
+
+static uint64_t cptvf_read_vf_misc_intr_status(struct cpt_vf *cptvf)
+{
+	return cpt_read_csr64(cptvf->reg_base, CPTX_VQX_MISC_INT(0, 0));
+}
+
+static irqreturn_t cptvf_misc_intr_handler(int irq, void *cptvf_irq)
+{
+	struct cpt_vf *cptvf = (struct cpt_vf *)cptvf_irq;
+	uint64_t intr;
+
+	intr = cptvf_read_vf_misc_intr_status(cptvf);
+	/*Check for MISC interrupt types*/
+	if (likely(intr & CPT_VF_INTR_MBOX_MASK)) {
+		pr_err("Mailbox interrupt 0x%llx on CPT VF %d\n",
+		       intr, cptvf->vfid);
+		cptvf_handle_mbox_intr(cptvf);
+		cptvf_clear_mbox_intr(cptvf);
+	} else if (unlikely(intr & CPT_VF_INTR_DOVF_MASK)) {
+		cptvf_clear_dovf_intr(cptvf);
+		/*Clear doorbell count*/
+		cptvf_write_vq_doorbell(cptvf, 0);
+		pr_err("Doorbell overflow error interrupt 0x%llx on CPT VF %d\n",
+		       intr, cptvf->vfid);
+	} else if (unlikely(intr & CPT_VF_INTR_IRDE_MASK)) {
+		cptvf_clear_irde_intr(cptvf);
+		pr_err("Instruction NCB read error interrupt 0x%llx on CPT VF %d\n",
+		       intr, cptvf->vfid);
+	} else if (unlikely(intr & CPT_VF_INTR_NWRP_MASK)) {
+		cptvf_clear_nwrp_intr(cptvf);
+		pr_err("NCB response write error interrupt 0x%llx on CPT VF %d\n",
+		       intr, cptvf->vfid);
+	} else if (unlikely(intr & CPT_VF_INTR_SERR_MASK)) {
+		cptvf_clear_swerr_intr(cptvf);
+		pr_err("Software error interrupt 0x%llx on CPT VF %d\n",
+		       intr, cptvf->vfid);
+	} else {
+		pr_err("Unhandled interrupt in CPT VF %d\n", cptvf->vfid);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static inline struct cptvf_wqe *get_cptvf_vq_wqe(struct cpt_vf *cptvf,
+						 int qno)
+{
+	struct cptvf_wqe_info *nwqe_info;
+
+	if (unlikely(qno >= cptvf->nr_queues))
+		return NULL;
+	nwqe_info = (struct cptvf_wqe_info *)cptvf->wqe_info;
+
+	return &nwqe_info->vq_wqe[qno];
+}
+
+static inline uint32_t cptvf_read_vq_done_count(struct cpt_vf *cptvf)
+{
+	union cptx_vqx_done vqx_done;
+
+	vqx_done.u = cpt_read_csr64(cptvf->reg_base, CPTX_VQX_DONE(0, 0));
+	return vqx_done.s.done;
+}
+
+static inline void cptvf_write_vq_done_ack(struct cpt_vf *cptvf,
+					   uint32_t ackcnt)
+{
+	union cptx_vqx_done_ack vqx_dack_cnt;
+
+	vqx_dack_cnt.u = cpt_read_csr64(cptvf->reg_base,
+					CPTX_VQX_DONE_ACK(0, 0));
+	vqx_dack_cnt.s.done_ack = ackcnt;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_DONE_ACK(0, 0),
+			vqx_dack_cnt.u);
+}
+
+static irqreturn_t cptvf_done_intr_handler(int irq, void *cptvf_irq)
+{
+	struct cpt_vf *cptvf = (struct cpt_vf *)cptvf_irq;
+	/* Read the number of completions */
+	uint32_t intr = cptvf_read_vq_done_count(cptvf);
+
+	cptvf->intcnt += intr;
+	if (intr) {
+		struct cptvf_wqe *wqe;
+
+		/* Acknowledge the number of
+		 * scheduled completions for processing
+		 */
+		cptvf_write_vq_done_ack(cptvf, intr);
+		wqe = get_cptvf_vq_wqe(cptvf, 0);
+		if (unlikely(!wqe)) {
+			pr_err("No work to schedule for VF (%d)",
+			       cptvf->vfid);
+			return 1;
+		}
+		tasklet_hi_schedule(&wqe->twork);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int cptvf_register_misc_intr(struct cpt_vf *cptvf)
+{
+	int ret;
+	struct device *dev = &cptvf->pdev->dev;
+
+	/* Register misc interrupt handlers */
+	ret = request_irq(cptvf->msix_entries[CPT_VF_INT_VEC_E_MISC].vector,
+			  cptvf_misc_intr_handler, 0, "CPT VF misc intr",
+			  cptvf);
+	if (ret)
+		goto fail;
+
+	cptvf->irq_allocated[CPT_VF_INT_VEC_E_MISC] = true;
+
+	/* Enable mailbox interrupt */
+	cptvf_enable_mbox_interrupts(cptvf);
+	cptvf_enable_swerr_interrupts(cptvf);
+
+	return 0;
+
+fail:
+	dev_err(dev, "Request misc irq failed");
+	cptvf_free_all_interrupts(cptvf);
+	return ret;
+}
+
+static int cptvf_register_done_intr(struct cpt_vf *cptvf)
+{
+	int ret;
+	struct device *dev = &cptvf->pdev->dev;
+
+	/* Register DONE interrupt handlers */
+	ret = request_irq(cptvf->msix_entries[CPT_VF_INT_VEC_E_DONE].vector,
+			  cptvf_done_intr_handler, 0, "CPT VF done intr",
+			  cptvf);
+	if (ret)
+		goto fail;
+
+	cptvf->irq_allocated[CPT_VF_INT_VEC_E_DONE] = true;
+
+	/* Enable mailbox interrupt */
+	cptvf_enable_done_interrupts(cptvf);
+	return 0;
+
+fail:
+	dev_err(dev, "Request done irq failed\n");
+	cptvf_free_all_interrupts(cptvf);
+	return ret;
+}
+
+static void cptvf_unregister_interrupts(struct cpt_vf *cptvf)
+{
+	cptvf_free_all_interrupts(cptvf);
+	cptvf_disable_msix(cptvf);
+}
+
+static void cptvf_set_irq_affinity(struct cpt_vf *cptvf)
+{
+	int32_t vec, cpu;
+	int32_t irqnum;
+
+	for (vec = 0; vec < cptvf->num_vec; vec++) {
+		if (!cptvf->irq_allocated[vec])
+			continue;
+
+		if (!zalloc_cpumask_var(&cptvf->affinity_mask[vec],
+					GFP_KERNEL)) {
+			pr_err("Allocation failed for affinity_mask for VF %d",
+			       cptvf->vfid);
+			return;
+		}
+
+		cpu = cptvf->vfid % num_online_cpus();
+		cpumask_set_cpu(cpumask_local_spread(cpu, cptvf->node),
+				cptvf->affinity_mask[vec]);
+		irqnum = cptvf->msix_entries[vec].vector;
+		irq_set_affinity_hint(irqnum, cptvf->affinity_mask[vec]);
+	}
+}
+
+static void cptvf_write_vq_saddr(struct cpt_vf *cptvf, uint64_t val)
+{
+	union cptx_vqx_saddr vqx_saddr;
+
+	vqx_saddr.u = val;
+	cpt_write_csr64(cptvf->reg_base, CPTX_VQX_SADDR(0, 0), vqx_saddr.u);
+}
+
+void cptvf_device_init(struct cpt_vf *cptvf)
+{
+	uint64_t base_addr = 0;
+
+	cptvf->chip_id = CPTVF_81XX_PASS1_0;
+	/* Disable the VQ */
+	cptvf_write_vq_ctl(cptvf, 0);
+	/* Reset the doorbell */
+	cptvf_write_vq_doorbell(cptvf, 0);
+	/* Clear inflight */
+	cptvf_write_vq_inprog(cptvf, 0);
+	/* Write VQ SADDR */
+	/* TODO: for now only one queue, so hard coded */
+	base_addr = (uint64_t)(cptvf->cqinfo.queue[0].qhead->dma_addr);
+	cptvf_write_vq_saddr(cptvf, base_addr);
+	/* Configure timerhold / coalescence */
+	cptvf_write_vq_done_timewait(cptvf, CPT_TIMER_THOLD);
+	cptvf_write_vq_done_numwait(cptvf, CPT_COUNT_THOLD);
+	/* Enable the VQ */
+	cptvf_write_vq_ctl(cptvf, 1);
+	/* Flag the VF ready */
+	cptvf->flags |= CPT_FLAG_DEVICE_READY;
+}
+
+static int cptvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct cpt_vf *cptvf;
+	int    err;
+
+	cptvf = devm_kzalloc(dev, sizeof(struct cpt_vf), GFP_KERNEL);
+	if (!cptvf)
+		return -ENOMEM;
+
+	pci_set_drvdata(pdev, cptvf);
+	cptvf->pdev = pdev;
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Failed to enable PCI device\n");
+		pci_set_drvdata(pdev, NULL);
+		return err;
+	}
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err) {
+		dev_err(dev, "PCI request regions failed 0x%x\n", err);
+		goto cptvf_err_disable_device;
+	}
+	/* Mark as VF driver */
+	cptvf->flags |= CPT_FLAG_VF_DRIVER;
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get usable DMA configuration\n");
+		goto cptvf_err_release_regions;
+	}
+
+	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get 48-bit DMA for consistent allocations\n");
+		goto cptvf_err_release_regions;
+	}
+
+	/* MAP PF's configuration registers */
+	cptvf->reg_base = pcim_iomap(pdev, CPT_CSR_BAR, 0);
+	if (!cptvf->reg_base) {
+		dev_err(dev, "Cannot map config register space, aborting\n");
+		err = -ENOMEM;
+		goto cptvf_err_release_regions;
+	}
+
+	cptvf->node = cptvf_get_node_id(pdev);
+	/* Enable MSI-X */
+	err = cptvf_enable_msix(cptvf);
+	if (err) {
+		dev_err(dev, "cptvf_enable_msix() failed");
+		goto cptvf_err_release_regions;
+	}
+
+	/* Register mailbox interrupts */
+	cptvf_register_misc_intr(cptvf);
+
+	/* Check ready with PF */
+	/* Gets chip ID / device Id from PF if ready */
+	err = cptvf_check_pf_ready(cptvf);
+	if (err) {
+		dev_err(dev, "PF not responding to READY msg");
+		err = -EBUSY;
+		goto cptvf_err_release_regions;
+	}
+
+	/* CPT VF software resources initialization */
+	cptvf->cqinfo.qchunksize = chunksize;
+	err = cptvf_sw_init(cptvf, qlen, CPT_NUM_QS_PER_VF);
+	if (err) {
+		dev_err(dev, "cptvf_sw_init() failed");
+		goto cptvf_err_release_regions;
+	}
+	/* Convey VQ LEN to PF */
+	err = cptvf_send_vq_size_msg(cptvf);
+	if (err) {
+		dev_err(dev, "PF not responding to QLEN msg");
+		err = -EBUSY;
+		goto cptvf_err_release_regions;
+	}
+
+	/* CPT VF device initialization */
+	cptvf_device_init(cptvf);
+	/* Send msg to PF to assign currnet Q to required group */
+	cptvf->vfgrp = group;
+	err = cptvf_send_vf_to_grp_msg(cptvf);
+	if (err) {
+		dev_err(dev, "PF not responding to VF_GRP msg");
+		err = -EBUSY;
+		goto cptvf_err_release_regions;
+	}
+
+	cptvf->priority = priority;
+	err = cptvf_send_vf_priority_msg(cptvf);
+	if (err) {
+		dev_err(dev, "PF not responding to VF_PRIO msg");
+		err = -EBUSY;
+		goto cptvf_err_release_regions;
+	}
+	/* Register DONE interrupts */
+	err = cptvf_register_done_intr(cptvf);
+	if (err)
+		goto cptvf_err_release_regions;
+
+	/* Set irq affinity masks */
+	cptvf_set_irq_affinity(cptvf);
+	/* Convey UP to PF */
+	err = cptvf_send_vf_up(cptvf);
+	if (err) {
+		dev_err(dev, "PF not responding to UP msg");
+		err = -EBUSY;
+		goto cptvf_up_fail;
+	}
+	err = cvm_crypto_init(cptvf);
+	if (err) {
+		dev_err(dev, "Algorithm register failed\n");
+		err = -EBUSY;
+		goto cptvf_up_fail;
+	}
+	return 0;
+
+cptvf_up_fail:
+	cptvf_unregister_interrupts(cptvf);
+cptvf_err_release_regions:
+	pci_release_regions(pdev);
+cptvf_err_disable_device:
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+
+	return err;
+}
+
+static void cptvf_remove(struct pci_dev *pdev)
+{
+	struct cpt_vf *cptvf = pci_get_drvdata(pdev);
+
+	if (!cptvf)
+		pr_err("Invalid CPT-VF device\n");
+
+	/* Convey DOWN to PF */
+	if (cptvf_send_vf_down(cptvf)) {
+		pr_err("PF not responding to DOWN msg");
+	} else {
+		cptvf_unregister_interrupts(cptvf);
+		cptvf_sw_cleanup(cptvf);
+		pci_set_drvdata(pdev, NULL);
+		pci_release_regions(pdev);
+		pci_disable_device(pdev);
+		cvm_crypto_exit();
+	}
+}
+
+static void cptvf_shutdown(struct pci_dev *pdev)
+{
+	cptvf_remove(pdev);
+}
+
+/* Supported devices */
+static const struct pci_device_id cptvf_id_table[] = {
+	{PCI_VDEVICE(CAVIUM, CPT_81XX_PCI_VF_DEVICE_ID), 0},
+	{ 0, }  /* end of table */
+};
+
+static struct pci_driver cptvf_pci_driver = {
+	.name = DRV_NAME,
+	.id_table = cptvf_id_table,
+	.probe = cptvf_probe,
+	.remove = cptvf_remove,
+	.shutdown = cptvf_shutdown,
+};
+
+static int __init cptvf_init_module(void)
+{
+	int ret = -1;
+
+	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
+	if (group < 0 || group > 7) {
+		pr_warn("Invalid group. Should be (0-7), setting to default 1.\n");
+		group = 1;
+	}
+
+	if (chunksize > CPT_INST_CHUNK_MAX_SIZE || chunksize <= 0) {
+		pr_warn("Invalid instruction chunk size. Should be (1-1023). Setting to default 1023\n");
+		chunksize = CPT_INST_CHUNK_MAX_SIZE;
+	}
+
+	if ((qlen > chunksize) && (qlen % chunksize != 0)) {
+		pr_warn("qlen should be multiple of chunksize when qlen > chunksize, rounding up qlen\n");
+		qlen += chunksize - (qlen % chunksize);
+	}
+
+	if (priority < 0 || priority > 1) {
+		pr_warn("Invalid VQ/VF priority. Should be (0-1), setting to default 0.\n");
+		priority = 0;
+	}
+
+	ret = pci_register_driver(&cptvf_pci_driver);
+	if (ret)
+		pr_err("pci_register_driver() failed");
+
+	return ret;
+}
+
+static void __exit cptvf_cleanup_module(void)
+{
+	pci_unregister_driver(&cptvf_pci_driver);
+}
+
+module_init(cptvf_init_module);
+module_exit(cptvf_cleanup_module);
+
+MODULE_AUTHOR("George Cherian <george.cherian@cavium.com>, Murthy Nidadavolu");
+MODULE_DESCRIPTION("Cavium Thunder CPT Physical Function Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, cptvf_id_table);
diff --git a/drivers/crypto/cavium/cpt/cptvf_mbox.c b/drivers/crypto/cavium/cpt/cptvf_mbox.c
new file mode 100644
index 0000000..80de249
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf_mbox.c
@@ -0,0 +1,208 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include "cptvf.h"
+
+static void cptvf_send_msg_to_pf(struct cpt_vf *cptvf, struct cpt_mbox *mbx)
+{
+	/* Writing mbox(1) causes interrupt */
+	cpt_write_csr64(cptvf->reg_base, CPTX_VFX_PF_MBOXX(0, 0, 0),
+			mbx->msg);
+	cpt_write_csr64(cptvf->reg_base, CPTX_VFX_PF_MBOXX(0, 0, 1),
+			mbx->data);
+}
+
+/* ACKs PF's mailbox message
+ */
+void cptvf_mbox_send_ack(struct cpt_vf *cptvf, struct cpt_mbox *mbx)
+{
+	mbx->msg = CPT_MBOX_MSG_TYPE_ACK;
+	cptvf_send_msg_to_pf(cptvf, mbx);
+}
+
+/* NACKs PF's mailbox message that VF is not able to
+ * complete the action
+ */
+void cptvf_mbox_send_nack(struct cpt_vf *cptvf, struct cpt_mbox *mbx)
+{
+	mbx->msg = CPT_MBOX_MSG_TYPE_NACK;
+	cptvf_send_msg_to_pf(cptvf, mbx);
+}
+
+/* Interrupt handler to handle mailbox messages from VFs */
+void cptvf_handle_mbox_intr(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	/*
+	 * MBOX[0] contains msg
+	 * MBOX[1] contains data
+	 */
+	mbx.msg  = cpt_read_csr64(cptvf->reg_base, CPTX_VFX_PF_MBOXX(0, 0, 0));
+	mbx.data = cpt_read_csr64(cptvf->reg_base, CPTX_VFX_PF_MBOXX(0, 0, 1));
+	dev_dbg(&cptvf->pdev->dev, "%s: Mailbox msg 0x%llx from PF\n",
+		__func__, mbx.msg);
+	switch (mbx.msg) {
+	case CPT_MSG_READY:
+	{
+		union cpt_chipid_vfid cid;
+
+		cid.u16 = mbx.data;
+		cptvf->pf_acked = true;
+		cptvf->vfid = cid.s.vfid;
+		dev_dbg(&cptvf->pdev->dev, "Received VFID %d\n", cptvf->vfid);
+		break;
+	}
+	case CPT_MSG_QBIND_GRP:
+		cptvf->pf_acked = true;
+		cptvf->vftype = mbx.data;
+		dev_dbg(&cptvf->pdev->dev, "VF %d type %s group %d\n",
+			cptvf->vfid, ((mbx.data == SE_TYPES) ? "SE" : "AE"),
+			cptvf->vfgrp);
+		break;
+	case CPT_MBOX_MSG_TYPE_ACK:
+		cptvf->pf_acked = true;
+		break;
+	case CPT_MBOX_MSG_TYPE_NACK:
+		cptvf->pf_nacked = true;
+		break;
+	default:
+		dev_err(&cptvf->pdev->dev, "Invalid msg from PF, msg 0x%llx\n",
+			mbx.msg);
+		break;
+	}
+}
+
+static int32_t cptvf_send_msg_to_pf_timeout(struct cpt_vf *cptvf,
+					    struct cpt_mbox *mbx)
+{
+	int timeout = CPT_MBOX_MSG_TIMEOUT;
+	int sleep = 10;
+
+	cptvf->pf_acked = false;
+	cptvf->pf_nacked = false;
+	cptvf_send_msg_to_pf(cptvf, mbx);
+	/* Wait for previous message to be acked, timeout 2sec */
+	while (!cptvf->pf_acked) {
+		if (cptvf->pf_nacked)
+			return -EINVAL;
+		msleep(sleep);
+		if (cptvf->pf_acked)
+			break;
+		timeout -= sleep;
+		if (!timeout) {
+			dev_err(&cptvf->pdev->dev, "PF didn't ack to mbox msg %llx from VF%u\n",
+				(mbx->msg & 0xFF), cptvf->vfid);
+			return -EBUSY;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Checks if VF is able to comminicate with PF
+ * and also gets the CPT number this VF is associated to.
+ */
+int cptvf_check_pf_ready(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_READY;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to READY msg\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Communicate VQs size to PF to program CPT(0)_PF_Q(0-15)_CTL of the VF.
+ * Must be ACKed.
+ */
+int cptvf_send_vq_size_msg(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_QLEN;
+	mbx.data = cptvf->qsize;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to vq_size msg\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Communicate VF group required to PF and get the VQ binded to that group
+ */
+int cptvf_send_vf_to_grp_msg(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_QBIND_GRP;
+	/* Convey group of the VF */
+	mbx.data = cptvf->vfgrp;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to vf_type msg\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Communicate VF group required to PF and get the VQ binded to that group
+ */
+int cptvf_send_vf_priority_msg(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_VQ_PRIORITY;
+	/* Convey group of the VF */
+	mbx.data = cptvf->priority;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to vf_type msg\n");
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Communicate to PF that VF is UP and running
+ */
+int cptvf_send_vf_up(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_VF_UP;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to UP msg\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Communicate to PF that VF is DOWN and running
+ */
+int cptvf_send_vf_down(struct cpt_vf *cptvf)
+{
+	struct cpt_mbox mbx = {};
+
+	mbx.msg = CPT_MSG_VF_DOWN;
+	if (cptvf_send_msg_to_pf_timeout(cptvf, &mbx)) {
+		dev_err(&cptvf->pdev->dev, "PF didn't respond to DOWN msg\n");
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c b/drivers/crypto/cavium/cpt/cptvf_reqmanager.c
new file mode 100644
index 0000000..e6fc3f9
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf_reqmanager.c
@@ -0,0 +1,655 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/bitmap.h>
+#include <linux/kdev_t.h>
+#include <linux/fs.h>
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/poll.h>
+
+#include "cptvf.h"
+#include "request_manager.h"
+
+/**
+ * get_free_pending_entry - get free entry from pending queue
+ * @param pqinfo: pending_qinfo structure
+ * @param qno: queue number
+ */
+static struct pending_entry *get_free_pending_entry(struct pending_queue *q,
+						    int32_t qlen)
+{
+	struct pending_entry *ent = NULL;
+
+	ent = &q->head[q->rear];
+	if (unlikely(ent->busy)) {
+		ent = NULL;
+		goto no_free_entry;
+	}
+
+	q->rear++;
+	if (unlikely(q->rear == qlen))
+		q->rear = 0;
+
+no_free_entry:
+	return ent;
+}
+
+static inline void pending_queue_inc_front(struct pending_qinfo *pqinfo,
+					   int32_t qno)
+{
+	struct pending_queue *queue = &pqinfo->queue[qno];
+
+	queue->front++;
+	if (unlikely(queue->front == pqinfo->qlen))
+		queue->front = 0;
+}
+
+static int32_t setup_sgio_components(struct cpt_vf *cptvf,
+				     struct buf_ptr *list,
+				     int32_t buf_count, uint8_t *buffer)
+{
+	int32_t ret = 0, i, j;
+	int32_t components;
+	struct sglist_component *sg_ptr = NULL;
+	struct pci_dev *pdev = cptvf->pdev;
+
+	if (unlikely(!list)) {
+		pr_err("Input List pointer is NULL\n");
+		ret = -EFAULT;
+		return ret;
+	}
+
+	for (i = 0; i < buf_count; i++) {
+		if (likely(list[i].vptr)) {
+			list[i].dma_addr = dma_map_single(&pdev->dev,
+							  list[i].vptr,
+							  list[i].size,
+							  DMA_BIDIRECTIONAL);
+			if (unlikely(dma_mapping_error(&pdev->dev,
+						       list[i].dma_addr))) {
+				pr_err("DMA map kernel buffer failed for component: %d\n",
+				       i);
+				ret = -EIO;
+				goto sg_cleanup;
+			}
+		}
+	}
+
+	components = buf_count / 4;
+	sg_ptr = (struct sglist_component *)buffer;
+	for (i = 0; i < components; i++) {
+		sg_ptr->u.s.len0 = cpu_to_be16(list[i * 4 + 0].size);
+		sg_ptr->u.s.len1 = cpu_to_be16(list[i * 4 + 1].size);
+		sg_ptr->u.s.len2 = cpu_to_be16(list[i * 4 + 2].size);
+		sg_ptr->u.s.len3 = cpu_to_be16(list[i * 4 + 3].size);
+		sg_ptr->ptr0 = cpu_to_be64(list[i * 4 + 0].dma_addr);
+		sg_ptr->ptr1 = cpu_to_be64(list[i * 4 + 1].dma_addr);
+		sg_ptr->ptr2 = cpu_to_be64(list[i * 4 + 2].dma_addr);
+		sg_ptr->ptr3 = cpu_to_be64(list[i * 4 + 3].dma_addr);
+		sg_ptr++;
+	}
+
+	components = buf_count % 4;
+
+	switch (components) {
+	case 3:
+		sg_ptr->u.s.len2 = cpu_to_be16(list[i * 4 + 2].size);
+		sg_ptr->ptr2 = cpu_to_be64(list[i * 4 + 2].dma_addr);
+		/* Fall through */
+	case 2:
+		sg_ptr->u.s.len1 = cpu_to_be16(list[i * 4 + 1].size);
+		sg_ptr->ptr1 = cpu_to_be64(list[i * 4 + 1].dma_addr);
+		/* Fall through */
+	case 1:
+		sg_ptr->u.s.len0 = cpu_to_be16(list[i * 4 + 0].size);
+		sg_ptr->ptr0 = cpu_to_be64(list[i * 4 + 0].dma_addr);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+
+sg_cleanup:
+	for (j = 0; j < i; j++) {
+		if (list[j].dma_addr) {
+			dma_unmap_single(&pdev->dev, list[i].dma_addr,
+					 list[i].size, DMA_BIDIRECTIONAL);
+		}
+
+		list[j].dma_addr = 0;
+	}
+
+	return ret;
+}
+
+static inline int32_t setup_sgio_list(struct cpt_vf *cptvf,
+				      struct cpt_info_buffer *info,
+				      struct cpt_request_info *req)
+{
+	uint16_t g_size_bytes = 0, s_size_bytes = 0;
+	int32_t i = 0, ret = 0;
+	struct pci_dev *pdev = cptvf->pdev;
+
+	if ((req->incnt + req->outcnt) > MAX_SG_IN_OUT_CNT) {
+		pr_err("Requestes SG components are higher than supported\n");
+		ret = -EINVAL;
+		goto  scatter_gather_clean;
+	}
+
+	/* Setup gather (input) components */
+	info->g_size = (req->incnt + 3) / 4;
+	info->glist_cnt = req->incnt;
+	g_size_bytes = info->g_size * sizeof(struct sglist_component);
+	for (i = 0; i < req->incnt; i++) {
+		info->glist_ptr[i].vptr = req->in[i].ptr.addr;
+		info->glist_ptr[i].size = req->in[i].size;
+	}
+
+	info->gather_components = kzalloc((g_size_bytes), GFP_KERNEL);
+	if (!info->gather_components) {
+		ret = -ENOMEM;
+		goto  scatter_gather_clean;
+	}
+
+	ret = setup_sgio_components(cptvf, info->glist_ptr,
+				    info->glist_cnt,
+				    info->gather_components);
+	if (ret) {
+		pr_err("Failed to setup gather list\n");
+		ret = -EFAULT;
+		goto  scatter_gather_clean;
+	}
+
+	/* Setup scatter (output) components */
+	info->s_size = (req->outcnt + 3) / 4;
+	info->slist_cnt = req->outcnt;
+	s_size_bytes = info->s_size * sizeof(struct sglist_component);
+	for (i = 0; i < info->slist_cnt ; i++) {
+		info->slist_ptr[i].vptr = req->out[i].ptr.addr;
+		info->slist_ptr[i].size = req->out[i].size;
+		info->outptr[i] = req->out[i].ptr.addr;
+		info->outsize[i] = req->out[i].size;
+		info->total_out += info->outsize[i];
+	}
+
+	info->scatter_components = kzalloc((s_size_bytes), GFP_KERNEL);
+	if (!info->scatter_components) {
+		ret = -ENOMEM;
+		goto  scatter_gather_clean;
+	}
+
+	ret = setup_sgio_components(cptvf, info->slist_ptr,
+				    info->slist_cnt,
+				    info->scatter_components);
+	if (ret) {
+		pr_err("Failed to setup gather list\n");
+		ret = -EFAULT;
+		goto  scatter_gather_clean;
+	}
+
+	/* Create and initialize DPTR */
+	info->dlen = g_size_bytes + s_size_bytes + SG_LIST_HDR_SIZE;
+	info->in_buffer = kzalloc((info->dlen), GFP_KERNEL);
+	if (!info->in_buffer) {
+		ret = -ENOMEM;
+		goto  scatter_gather_clean;
+	}
+
+	((uint16_t *)info->in_buffer)[0] = info->slist_cnt;
+	((uint16_t *)info->in_buffer)[1] = info->glist_cnt;
+	((uint16_t *)info->in_buffer)[2] = 0;
+	((uint16_t *)info->in_buffer)[3] = 0;
+	byte_swap_64((uint64_t *)info->in_buffer);
+
+	memcpy(&info->in_buffer[8], info->gather_components,
+	       g_size_bytes);
+	memcpy(&info->in_buffer[8 + g_size_bytes],
+	       info->scatter_components, s_size_bytes);
+
+	info->dptr_baddr = dma_map_single(&pdev->dev,
+					       (void *)info->in_buffer,
+					       info->dlen,
+					       DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(&pdev->dev, info->dptr_baddr)) {
+		pr_err("Mapping DPTR Failed %d\n", info->dlen);
+		ret = -EIO;
+		goto  scatter_gather_clean;
+	}
+
+	/* Create and initialize RPTR */
+	info->rlen = COMPLETION_CODE_SIZE;
+	info->out_buffer = kzalloc((info->rlen), GFP_KERNEL);
+	if (!info->out_buffer) {
+		ret = -ENOMEM;
+		goto  scatter_gather_clean;
+	}
+
+	*((uint64_t *)info->out_buffer) = ~((uint64_t)COMPLETION_CODE_INIT);
+	info->alternate_caddr = (uint64_t *)info->out_buffer;
+	info->rptr_baddr = dma_map_single(&pdev->dev,
+					       (void *)info->out_buffer,
+					       info->rlen,
+					       DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(&pdev->dev, info->rptr_baddr)) {
+		pr_err("Mapping RPTR Failed %d\n", info->rlen);
+		ret = -EIO;
+		goto  scatter_gather_clean;
+	}
+
+	return 0;
+
+scatter_gather_clean:
+	return ret;
+}
+
+int32_t send_cpt_command(struct cpt_vf *cptvf, union cpt_inst_s *cmd,
+			 uint32_t qno)
+{
+	struct command_qinfo *qinfo = NULL;
+	struct command_queue *queue;
+	struct command_chunk *chunk;
+	uint8_t *ent;
+	int32_t ret = 0;
+
+	if (unlikely(qno >= cptvf->nr_queues)) {
+		pr_err("Invalid queue (qno: %d, nr_queues: %d)\n",
+		       qno, cptvf->nr_queues);
+		return -EINVAL;
+	}
+
+	qinfo = &cptvf->cqinfo;
+	queue = &qinfo->queue[qno];
+	/* lock commad queue */
+	spin_lock(&queue->lock);
+	ent = &queue->qhead->head[queue->idx * qinfo->cmd_size];
+	memcpy(ent, (void *)cmd, qinfo->cmd_size);
+
+	if (++queue->idx >= queue->qhead->size / 64) {
+		struct hlist_node *node;
+
+		hlist_for_each(node, &queue->chead) {
+			chunk = hlist_entry(node, struct command_chunk,
+					    nextchunk);
+			if (chunk == queue->qhead) {
+				continue;
+			} else {
+				queue->qhead = chunk;
+				break;
+			}
+		}
+		queue->idx = 0;
+	}
+	/* make sure all memory stores are done before ringing doorbell */
+	smp_wmb();
+	cptvf_write_vq_doorbell(cptvf, 1);
+	/* unlock command queue */
+	spin_unlock(&queue->lock);
+
+	return ret;
+}
+
+void do_request_cleanup(struct cpt_vf *cptvf,
+			struct cpt_info_buffer *info)
+{
+	int32_t i;
+	struct pci_dev *pdev = cptvf->pdev;
+
+	if (info->dptr_baddr) {
+		dma_unmap_single(&pdev->dev, info->dptr_baddr,
+				 info->dlen, DMA_BIDIRECTIONAL);
+		info->dptr_baddr = 0;
+	}
+
+	if (info->rptr_baddr) {
+		dma_unmap_single(&pdev->dev, info->rptr_baddr,
+				 info->rlen, DMA_BIDIRECTIONAL);
+		info->rptr_baddr = 0;
+	}
+
+	if (info->comp_baddr) {
+		dma_unmap_single(&pdev->dev, info->comp_baddr,
+				 sizeof(union cpt_res_s), DMA_BIDIRECTIONAL);
+		info->comp_baddr = 0;
+	}
+
+	if (info->dma_mode == DMA_GATHER_SCATTER) {
+		for (i = 0; i < info->slist_cnt; i++) {
+			if (info->slist_ptr[i].dma_addr) {
+				dma_unmap_single(&pdev->dev,
+						 info->slist_ptr[i].dma_addr,
+						 info->slist_ptr[i].size,
+						 DMA_BIDIRECTIONAL);
+				info->slist_ptr[i].dma_addr = 0ULL;
+			}
+		}
+		info->slist_cnt = 0;
+		if (info->scatter_components)
+			kzfree(info->scatter_components);
+
+		for (i = 0; i < info->glist_cnt; i++) {
+			if (info->glist_ptr[i].dma_addr) {
+				dma_unmap_single(&pdev->dev,
+						 info->glist_ptr[i].dma_addr,
+						 info->glist_ptr[i].size,
+						 DMA_BIDIRECTIONAL);
+				info->glist_ptr[i].dma_addr = 0ULL;
+			}
+		}
+		info->glist_cnt = 0;
+		if (info->gather_components)
+			kzfree((info->gather_components));
+	}
+
+	if (info->out_buffer) {
+		kzfree((info->out_buffer));
+		info->out_buffer = NULL;
+	}
+
+	if (info->in_buffer) {
+		kzfree((info->in_buffer));
+		info->in_buffer = NULL;
+	}
+
+	if (info->completion_addr) {
+		kzfree(((void *)info->completion_addr));
+		info->completion_addr = NULL;
+	}
+
+	if (info) {
+		kzfree((info));
+		info = NULL;
+	}
+}
+
+void do_post_process(struct cpt_vf *cptvf, struct cpt_info_buffer *info)
+{
+	uint64_t *p;
+	uint32_t i;
+
+	if (!info || !cptvf) {
+		pr_err("Input params are incorrect for post processing\n");
+		return;
+	}
+
+	if (info->rlen) {
+		for (i = 0; i < info->slist_cnt; i++) {
+			if (info->outunit[i] == UNIT_64_BIT) {
+				p = (uint64_t *)info->slist_ptr[i].vptr;
+				*p = cpu_to_be64(*p);
+			}
+		}
+	}
+
+	do_request_cleanup(cptvf, info);
+}
+
+static inline void process_pending_queue(struct cpt_vf *cptvf,
+					 struct pending_qinfo *pqinfo,
+					 int32_t qno)
+{
+	struct pending_queue *pqueue = &pqinfo->queue[qno];
+	struct pending_entry *pentry = NULL;
+	struct cpt_info_buffer *info = NULL;
+	union cpt_res_s *status = NULL;
+
+	while (1) {
+		spin_lock_bh(&pqueue->lock);
+		pentry = &pqueue->head[pqueue->front];
+		if (unlikely(!pentry->busy)) {
+			spin_unlock_bh(&pqueue->lock);
+			break;
+		}
+
+		info = (struct cpt_info_buffer *)pentry->post_arg;
+		if (unlikely(!info)) {
+			pr_err("Pending Entry post arg NULL\n");
+			pending_queue_inc_front(pqinfo, qno);
+			spin_unlock_bh(&pqueue->lock);
+			continue;
+		}
+
+		status = (union cpt_res_s *)pentry->completion_addr;
+		if ((status->s.compcode == CPT_COMP_E_FAULT) ||
+		    (status->s.compcode == CPT_COMP_E_SWERR)) {
+			pr_err("Request failed with %s\n",
+			       (status->s.compcode == CPT_COMP_E_FAULT) ?
+			       "DMA Fault" : "Software error");
+			pentry->completion_addr = NULL;
+			pentry->busy = false;
+			atomic64_dec((&pqueue->pending_count));
+			pentry->post_arg = NULL;
+			pending_queue_inc_front(pqinfo, qno);
+			do_request_cleanup(cptvf, info);
+			spin_unlock_bh(&pqueue->lock);
+			break;
+		} else if (status->s.compcode == COMPLETION_CODE_INIT) {
+			/* check for timeout */
+			if (time_after_eq(jiffies,
+			    (info->time_in + (DEFAULT_COMMAND_TIMEOUT * HZ)))) {
+				pr_err("Request timed out");
+				pentry->completion_addr = NULL;
+				pentry->busy = false;
+				atomic64_dec((&pqueue->pending_count));
+				pentry->post_arg = NULL;
+				pending_queue_inc_front(pqinfo, qno);
+				do_request_cleanup(cptvf, info);
+				spin_unlock_bh(&pqueue->lock);
+				break;
+			} else if ((*info->alternate_caddr ==
+				(~COMPLETION_CODE_INIT)) &&
+				(info->extra_time < TIME_IN_RESET_COUNT)) {
+				info->time_in = jiffies;
+				info->extra_time++;
+				spin_unlock_bh(&pqueue->lock);
+				break;
+			}
+		}
+
+		info->status = 0;
+		pentry->completion_addr = NULL;
+		pentry->busy = false;
+		pentry->post_arg = NULL;
+		atomic64_dec((&pqueue->pending_count));
+		pending_queue_inc_front(pqinfo, qno);
+		spin_unlock_bh(&pqueue->lock);
+
+		do_post_process(info->cptvf, info);
+		/*
+		 * Calling callback after we find
+		 * that the request has been serviced
+		 */
+		pentry->callback(status->s.compcode, pentry->callback_arg);
+	}
+}
+
+int32_t process_request(struct cpt_vf *cptvf, struct cpt_request_info *req)
+{
+	int32_t ret = 0, clear = 0, queue = 0;
+	struct cpt_info_buffer *info = NULL;
+	struct cptvf_request *cpt_req = NULL;
+	union ctrl_info *ctrl = NULL;
+	struct pending_entry *pentry = NULL;
+	struct pending_queue *pqueue = NULL;
+	struct pci_dev *pdev = cptvf->pdev;
+	uint64_t key_handle = 0ULL;
+	uint8_t group = 0;
+	struct cpt_vq_command vq_cmd;
+	union cpt_inst_s cptinst;
+
+	if (unlikely(!cptvf || !req)) {
+		pr_err("Invalid inputs (cptvf: %p, req: %p)\n", cptvf, req);
+		return -EINVAL;
+	}
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL | GFP_ATOMIC);
+	if (unlikely(!info)) {
+		pr_err("Unable to allocate memory for info_buffer\n");
+		return -ENOMEM;
+	}
+
+	cpt_req = (struct cptvf_request *)&req->req;
+	ctrl = (union ctrl_info *)&req->ctrl;
+	key_handle = req->handle;
+
+	info->cptvf = cptvf;
+	info->outcnt = req->outcnt;
+	info->req_type = ctrl->s.req_mode;
+	info->dma_mode = ctrl->s.dma_mode;
+	info->dlen   = cpt_req->dlen;
+	/* Add 8-bytes more for microcode completion code */
+	info->rlen   = ROUNDUP8(req->rlen + COMPLETION_CODE_SIZE);
+
+	group = ctrl->s.grp;
+	ret = setup_sgio_list(cptvf, info, req);
+	if (ret) {
+		pr_err("Setting up SG list failed");
+		goto request_cleanup;
+	}
+
+	cpt_req->dlen = info->dlen;
+	info->opcode = cpt_req->opcode.flags;
+	/*
+	 * Get buffer for union cpt_res_s response
+	 * structure and its physical address
+	 */
+	info->completion_addr = kzalloc(sizeof(union cpt_res_s),
+					     GFP_KERNEL | GFP_ATOMIC);
+	*((uint8_t *)(info->completion_addr)) = COMPLETION_CODE_INIT;
+	info->comp_baddr = dma_map_single(&pdev->dev,
+					       (void *)info->completion_addr,
+					       sizeof(union cpt_res_s),
+					       DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(&pdev->dev, info->comp_baddr)) {
+		pr_err("mapping compptr Failed %lu\n", sizeof(union cpt_res_s));
+		ret = -EFAULT;
+		goto  request_cleanup;
+	}
+
+	/* Fill the VQ command */
+	vq_cmd.cmd.u64 = 0;
+	vq_cmd.cmd.s.opcode = cpu_to_be16(cpt_req->opcode.flags);
+	vq_cmd.cmd.s.param1 = cpu_to_be16(cpt_req->param1);
+	vq_cmd.cmd.s.param2 = cpu_to_be16(cpt_req->param2);
+	vq_cmd.cmd.s.dlen   = cpu_to_be16(cpt_req->dlen);
+
+	/* 64-bit swap for microcode data reads, not needed for addresses*/
+	vq_cmd.cmd.u64 = cpu_to_be64(vq_cmd.cmd.u64);
+	vq_cmd.dptr = info->dptr_baddr;
+	vq_cmd.rptr = info->rptr_baddr;
+	vq_cmd.cptr.u64 = 0;
+	vq_cmd.cptr.s.grp = group;
+	/* Get Pending Entry to submit command */
+	/*queue = SMP_PROCESSOR_ID() % cptvf->nr_queues;*/
+	/* Always queue 0, because 1 queue per VF */
+	queue = 0;
+	info->queue = queue;
+	pqueue = &cptvf->pqinfo.queue[queue];
+
+	if (atomic64_read(&pqueue->pending_count) > PENDING_THOLD) {
+		pr_err("pending threshold reached\n");
+		process_pending_queue(cptvf, &cptvf->pqinfo, queue);
+	}
+
+get_pending_entry:
+	spin_lock_bh(&pqueue->lock);
+	pentry = get_free_pending_entry(pqueue, cptvf->pqinfo.qlen);
+	if (unlikely(!pentry)) {
+		spin_unlock_bh(&pqueue->lock);
+		if (clear == 0) {
+			process_pending_queue(cptvf, &cptvf->pqinfo, queue);
+			clear = 1;
+			goto get_pending_entry;
+		}
+		pr_err("Get free entry failed\n");
+		pr_err("queue: %d, rear: %d, front: %d\n",
+		       queue, pqueue->rear, pqueue->front);
+		ret = -EFAULT;
+		goto request_cleanup;
+	}
+
+	pentry->done = false;
+	pentry->completion_addr = info->completion_addr;
+	pentry->post_arg = (void *)info;
+	pentry->callback = req->callback;
+	pentry->callback_arg = req->callback_arg;
+	info->pentry = pentry;
+	pentry->busy = true;
+	atomic64_inc(&pqueue->pending_count);
+
+	/* Send CPT command */
+	info->pentry = pentry;
+	info->status = ERR_REQ_PENDING;
+	info->time_in = jiffies;
+
+	/* Create the CPT_INST_S type command for HW intrepretation */
+	cptinst.s.doneint = true;
+	cptinst.s.res_addr = (uint64_t)info->comp_baddr;
+	cptinst.s.tag = 0;
+	cptinst.s.grp = 0;
+	cptinst.s.wq_ptr = 0;
+	cptinst.s.ei0 = vq_cmd.cmd.u64;
+	cptinst.s.ei1 = vq_cmd.dptr;
+	cptinst.s.ei2 = vq_cmd.rptr;
+	cptinst.s.ei3 = vq_cmd.cptr.u64;
+
+	ret = send_cpt_command(cptvf, &cptinst, queue);
+	spin_unlock_bh(&pqueue->lock);
+	if (unlikely(ret)) {
+		spin_unlock_bh(&pqueue->lock);
+		pr_err("Send command failed for AE\n");
+		ret = -EFAULT;
+		goto request_cleanup;
+	}
+
+	/* Non-Blocking request */
+	req->request_id = (uint64_t)(info);
+	req->status = -EAGAIN;
+
+	return 0;
+
+request_cleanup:
+	pr_debug("Failed to submit CPT command\n");
+	do_request_cleanup(cptvf, info);
+
+	return ret;
+}
+
+void vq_post_process(struct cpt_vf *cptvf, uint32_t qno)
+{
+	if (unlikely(qno > cptvf->nr_queues)) {
+		pr_err("Request for post processing on invalid pending queue: %u\n",
+		       qno);
+		return;
+	}
+
+	process_pending_queue(cptvf, &cptvf->pqinfo, qno);
+}
+
+int32_t cptvf_do_request(void *vfdev, struct cpt_request_info *req)
+{
+	struct cpt_vf *cptvf = (struct cpt_vf *)vfdev;
+
+	if (!cpt_device_ready(cptvf)) {
+		pr_err("CPT Device is not ready");
+		return -ENODEV;
+	}
+
+	if ((cptvf->vftype == SE_TYPES) && (!req->ctrl.s.se_req)) {
+		pr_err("CPTVF-%d of SE TYPE got AE request", cptvf->vfid);
+		return -EINVAL;
+	} else if ((cptvf->vftype == AE_TYPES) && (req->ctrl.s.se_req)) {
+		pr_err("CPTVF-%d of AE TYPE got SE request", cptvf->vfid);
+		return -EINVAL;
+	}
+
+	cptvf->reqmode = req->ctrl.s.req_mode;
+
+	return process_request(cptvf, req);
+}
diff --git a/drivers/crypto/cavium/cpt/request_manager.h b/drivers/crypto/cavium/cpt/request_manager.h
new file mode 100644
index 0000000..d18d95b
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/request_manager.h
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __REQUEST_MANGER_H
+#define __REQUEST_MANGER_H
+
+#include "cpt_common.h"
+
+#define TIME_IN_RESET_COUNT  5
+#define COMPLETION_CODE_SIZE 8
+#define COMPLETION_CODE_INIT 0
+
+#if defined(__BIG_ENDIAN_BITFIELD)
+#define COMPLETION_CODE_SHIFT     56
+#else
+#define COMPLETION_CODE_SHIFT      0
+#endif
+
+#define PENDING_THOLD  100
+
+#define MAX_SG_IN_OUT_CNT (25u)
+#define SG_LIST_HDR_SIZE  (8u)
+
+union data_ptr {
+	uint64_t addr64;
+	uint8_t *addr;
+};
+
+struct cpt_buffer {
+	uint8_t type; /**< How to interpret the buffer */
+	uint8_t reserved0;
+	uint16_t size; /**< Sizeof of the data */
+	uint16_t offset;
+	uint16_t reserved1;
+	union data_ptr ptr; /**< Pointer to data */
+};
+
+union ctrl_info {
+	uint32_t flags;
+	struct {
+#if defined(__BIG_ENDIAN_BITFIELD)
+		uint32_t reserved0:24;
+		uint32_t grp:3; /**< Group bits */
+		uint32_t dma_mode:2; /**< DMA mode */
+		uint32_t req_mode:2; /**< Requeset mode BLOCKING/NONBLOCKING*/
+		uint32_t se_req:1;/**< To SE core */
+#else
+		uint32_t se_req:1; /**< To SE core */
+		uint32_t req_mode:2; /**< Requeset mode BLOCKING/NONBLOCKING*/
+		uint32_t dma_mode:2; /**< DMA mode */
+		uint32_t grp:3; /* Group bits */
+		uint32_t reserved0:24;
+#endif
+	} s;
+};
+
+union opcode_info {
+	uint16_t flags;
+	struct {
+		uint8_t major;
+		uint8_t minor;
+	} s;
+};
+
+struct cptvf_request {
+	union opcode_info opcode;
+	uint16_t param1;
+	uint16_t param2;
+	uint16_t dlen;
+};
+
+#define MAX_BUF_CNT	16
+
+struct cpt_request_info {
+	uint8_t incnt; /**< Number of input buffers */
+	uint8_t outcnt; /**< Number of output buffers */
+	uint8_t ctxl; /**< Context length, if 0, then INLINE */
+	uint16_t rlen; /**< Output length */
+	union ctrl_info ctrl; /**< User control information */
+
+	struct cptvf_request req; /**< Request Information (Core specific) */
+
+	uint64_t handle; /**< key/context handle */
+	uint64_t request_id; /**< Request ID */
+
+	struct cpt_buffer in[MAX_BUF_CNT];
+	struct cpt_buffer out[MAX_BUF_CNT];
+
+	void (*callback)(int, void *); /**< Kernel ASYNC request callabck */
+	void *callback_arg; /**< Kernel ASYNC request callabck arg */
+
+	uint32_t status; /**< Request status */
+};
+
+enum {
+	UNIT_8_BIT,
+	UNIT_16_BIT,
+	UNIT_32_BIT,
+	UNIT_64_BIT
+};
+
+struct sglist_component {
+	union {
+		uint64_t len;
+		struct {
+			uint16_t len0;
+			uint16_t len1;
+			uint16_t len2;
+			uint16_t len3;
+		} s;
+	} u;
+	uint64_t ptr0;
+	uint64_t ptr1;
+	uint64_t ptr2;
+	uint64_t ptr3;
+};
+
+struct buf_ptr {
+	uint8_t *vptr;
+	dma_addr_t dma_addr;
+	uint16_t size;
+};
+
+#define MAX_OUTCNT	10
+#define MAX_INCNT	10
+
+struct cpt_info_buffer {
+	struct cpt_vf *cptvf;
+	uint8_t req_type;
+	uint8_t dma_mode;
+
+	uint16_t opcode;
+	uint8_t queue;
+	uint8_t extra_time;
+	uint8_t is_ae;
+
+	uint16_t glist_cnt;
+	uint16_t slist_cnt;
+	uint16_t g_size;
+	uint16_t s_size;
+
+	uint32_t outcnt;
+	uint32_t status;
+
+	unsigned long time_in;
+	uint64_t request_id;
+
+	uint32_t dlen;
+	uint32_t rlen;
+	uint32_t total_in;
+	uint32_t total_out;
+	uint64_t dptr_baddr;
+	uint64_t rptr_baddr;
+	uint64_t comp_baddr;
+	uint8_t *in_buffer;
+	uint8_t *out_buffer;
+	uint8_t *gather_components;
+	uint8_t *scatter_components;
+	uint32_t outsize[MAX_OUTCNT];
+	uint32_t outunit[MAX_OUTCNT];
+	uint8_t *outptr[MAX_OUTCNT];
+
+	struct pending_entry *pentry;
+	volatile uint64_t *completion_addr;
+	volatile uint64_t *alternate_caddr;
+
+	struct buf_ptr glist_ptr[MAX_INCNT];
+	struct buf_ptr slist_ptr[MAX_OUTCNT];
+};
+
+/*
+ * CPT_INST_S software command definitions
+ * Words EI (0-3)
+ */
+union vq_cmd_word0 {
+	uint64_t u64;
+	struct {
+		uint16_t opcode;
+		uint16_t param1;
+		uint16_t param2;
+		uint16_t dlen;
+	} s;
+};
+
+union vq_cmd_word3 {
+	uint64_t u64;
+	struct {
+#if defined(__BIG_ENDIAN_BITFIELD)
+		uint64_t grp	: 3;
+		uint64_t cptr	: 61;
+#else
+		uint64_t cptr	: 61;
+		uint64_t grp	: 3;
+#endif
+	} s;
+};
+
+struct cpt_vq_command {
+	union vq_cmd_word0 cmd;
+	uint64_t dptr;
+	uint64_t rptr;
+	union vq_cmd_word3 cptr;
+};
+
+#if defined(__BIG_ENDIAN_BITFIELD)
+#define set_scatter_chunks(value, scatter_component)	{\
+	(value) |= (((uint64_t)scatter_component) << 25); }
+#else
+#define set_scatter_chunks(value, scatter_component)	{\
+	(value) |= (((uint64_t)scatter_component) << 32); }
+#endif
+
+void vq_post_process(struct cpt_vf *cptvf, uint32_t qno);
+int32_t process_request(struct cpt_vf *cptvf,
+			struct cpt_request_info *kern_req);
+#endif /* __REQUEST_MANGER_H */
-- 
2.1.4

^ permalink raw reply related

* [PATCH 0/3] Add Support for Cavium Cryptographic Accelerarion Unit
From: gcherianv @ 2016-11-18 15:00 UTC (permalink / raw)
  To: linux-kernel, linux-crypto; +Cc: davem, herbert, George Cherian

From: George Cherian <george.cherian@cavium.com>

This series adds the support for Cavium Cryptographic Accelerarion Unit (CPT)
CPT is available in Octeon-Tx SoC series.

George Cherian (3):
  drivers: crypto: Add Support for Octeon-tx CPT Engine
  drivers: crypto: Add the Virtual Function driver for CPT
  drivers: crypto: Enable CPT options crypto for build

 drivers/crypto/Kconfig                       |    1 +
 drivers/crypto/Makefile                      |    1 +
 drivers/crypto/cavium/cpt/Kconfig            |   32 +
 drivers/crypto/cavium/cpt/Makefile           |    4 +
 drivers/crypto/cavium/cpt/cpt.h              |   90 +++
 drivers/crypto/cavium/cpt/cpt_common.h       |  377 ++++++++++
 drivers/crypto/cavium/cpt/cpt_hw_types.h     |  940 +++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cpt_main.c         |  891 ++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cpt_pf_mbox.c      |  174 +++++
 drivers/crypto/cavium/cpt/cptvf.h            |  255 +++++++
 drivers/crypto/cavium/cpt/cptvf_algs.c       |  446 +++++++++++
 drivers/crypto/cavium/cpt/cptvf_algs.h       |  159 ++++
 drivers/crypto/cavium/cpt/cptvf_main.c       | 1038 ++++++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cptvf_mbox.c       |  208 ++++++
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c |  655 ++++++++++++++++
 drivers/crypto/cavium/cpt/request_manager.h  |  221 ++++++
 16 files changed, 5492 insertions(+)
 create mode 100644 drivers/crypto/cavium/cpt/Kconfig
 create mode 100644 drivers/crypto/cavium/cpt/Makefile
 create mode 100644 drivers/crypto/cavium/cpt/cpt.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cpt_pf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_reqmanager.c
 create mode 100644 drivers/crypto/cavium/cpt/request_manager.h

-- 
2.1.4

^ permalink raw reply

* [PATCH 1/3] drivers: crypto: Add Support for Octeon-tx CPT Engine
From: gcherianv @ 2016-11-18 15:00 UTC (permalink / raw)
  To: linux-kernel, linux-crypto; +Cc: davem, herbert, George Cherian
In-Reply-To: <1479481209-11475-1-git-send-email-gcherianv@gmail.com>

From: George Cherian <george.cherian@cavium.com>

Enable the Physical Function diver for the Cavium Crypto Engine (CPT)
found in Octeon-tx series of SoC's. CPT is the Cryptographic Acceleration
Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
asymmetric engines (AEs).

Signed-off-by: George Cherian <george.cherian@cavium.com>
---
 drivers/crypto/cavium/cpt/Kconfig        |  22 +
 drivers/crypto/cavium/cpt/Makefile       |   2 +
 drivers/crypto/cavium/cpt/cpt.h          |  90 +++
 drivers/crypto/cavium/cpt/cpt_common.h   | 377 +++++++++++++
 drivers/crypto/cavium/cpt/cpt_hw_types.h | 940 +++++++++++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cpt_main.c     | 891 +++++++++++++++++++++++++++++
 drivers/crypto/cavium/cpt/cpt_pf_mbox.c  | 174 ++++++
 7 files changed, 2496 insertions(+)
 create mode 100644 drivers/crypto/cavium/cpt/Kconfig
 create mode 100644 drivers/crypto/cavium/cpt/Makefile
 create mode 100644 drivers/crypto/cavium/cpt/cpt.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cpt_pf_mbox.c

diff --git a/drivers/crypto/cavium/cpt/Kconfig b/drivers/crypto/cavium/cpt/Kconfig
new file mode 100644
index 0000000..8fe3f44
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/Kconfig
@@ -0,0 +1,22 @@
+#
+# Cavium crypto device configuration
+#
+
+config CRYPTO_DEV_CPT
+	tristate
+	select HW_RANDOM_OCTEON
+	select CRYPTO_AES
+	select CRYPTO_DES
+	select CRYPTO_BLKCIPHER
+	select FW_LOADER
+
+config OCTEONTX_CPT_PF
+	tristate "Octeon-tx CPT Physical function driver"
+	depends on ARCH_THUNDER
+	select CRYPTO_DEV_CPT
+	help
+	  Support for Cavium CPT block found in octeon-tx series of
+	  processors.
+
+	  To compile this as a module, choose M here: the module will be
+	  called cptpf.
diff --git a/drivers/crypto/cavium/cpt/Makefile b/drivers/crypto/cavium/cpt/Makefile
new file mode 100644
index 0000000..bf758e2
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_OCTEONTX_CPT_PF) += cptpf.o
+cptpf-objs := cpt_main.o cpt_pf_mbox.o
diff --git a/drivers/crypto/cavium/cpt/cpt.h b/drivers/crypto/cavium/cpt/cpt.h
new file mode 100644
index 0000000..63d12da
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt.h
@@ -0,0 +1,90 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPT_H
+#define __CPT_H
+
+#include "cpt_common.h"
+
+#define BASE_PROC_DIR	"cavium"
+
+#define PF  0
+#define VF  1
+
+struct cpt_device;
+
+struct microcode {
+	uint8_t  is_mc_valid;
+	uint8_t  is_ae;
+	uint8_t  group;
+	uint32_t code_size;
+	void    *code;
+	uint8_t  num_cores;
+	uint64_t core_mask_low; /* Used as long as num # cores are <= 64 */
+	uint64_t core_mask_hi;  /* Unused for now */
+	uint8_t  version[32];
+
+	/* Base info */
+	dma_addr_t dma;
+	dma_addr_t phys_base;
+	void *base;
+};
+
+#define VF_STATE_DOWN	(0)
+#define VF_STATE_UP	(1)
+
+struct cpt_vf_info {
+	uint8_t state;
+	uint8_t priority;
+	uint32_t qlen;
+	union cpt_chipid_vfid id;
+};
+
+/**
+ * cpt device structure
+ */
+struct cpt_device {
+	uint32_t chip_id; /**< CPT Device ID */
+	uint16_t core_freq; /**< CPT Device Frequency */
+	uint16_t flags;	/**< Flags to hold device status bits */
+	uint8_t idx; /**< Device Index (0...MAX_CPT_DEVICES) */
+	uint8_t num_vf_en; /**< Number of VFs enabled (0...CPT_MAX_VF_NUM) */
+
+	struct cpt_vf_info vfinfo[CPT_MAX_VF_NUM]; /* Per VF info */
+	uint8_t next_mc_idx; /**< next microcode index */
+	uint8_t next_group;
+
+	uint8_t max_se_cores;
+	uint8_t max_ae_cores;
+	uint8_t avail_se_cores;
+	uint8_t avail_ae_cores;
+
+	void __iomem *reg_base; /* Register start address */
+
+	/* MSI-X */
+	bool msix_enabled;
+	uint8_t	num_vec;
+	struct msix_entry msix_entries[CPT_PF_MSIX_VECTORS];
+	bool irq_allocated[CPT_PF_MSIX_VECTORS];
+
+	bool mbx_lock[CPT_MAX_VF_NUM]; /* Mailbox locks per VF */
+
+	struct pci_dev *pdev; /**< pci device handle */
+	void *proc; /**< proc dir */
+	struct microcode mcode[CPT_MAX_CORE_GROUPS];
+};
+
+struct cpt_device_list {
+	/* device list lock */
+	spinlock_t lock;
+	uint32_t nr_device;
+	struct cpt_device *device_ptr[MAX_CPT_DEVICES];
+};
+
+void cpt_mbox_intr_handler(struct cpt_device *cpt, int mbx);
+#endif /* __CPT_H */
diff --git a/drivers/crypto/cavium/cpt/cpt_common.h b/drivers/crypto/cavium/cpt/cpt_common.h
new file mode 100644
index 0000000..351ed4a
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt_common.h
@@ -0,0 +1,377 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPT_COMMON_H
+#define __CPT_COMMON_H
+
+#include <asm/byteorder.h>
+#include <linux/uaccess.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+#include <linux/cpumask.h>
+#include <linux/string.h>
+#include <linux/pci_regs.h>
+#include <linux/delay.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <asm/arch_timer.h>
+#include <linux/types.h>
+
+#include "cpt_hw_types.h"
+
+/* configuration space offsets */
+#ifndef PCI_VENDOR_ID
+#define PCI_VENDOR_ID 0x00 /* 16 bits */
+#endif
+#ifndef PCI_DEVICE_ID
+#define PCI_DEVICE_ID 0x02 /* 16 bits */
+#endif
+#ifndef PCI_REVISION_ID
+#define PCI_REVISION_ID 0x08 /* Revision ID */
+#endif
+#ifndef PCI_CAPABILITY_LIST
+#define PCI_CAPABILITY_LIST 0x34 /* first capability list entry */
+#endif
+
+/* Device ID */
+#define PCI_VENDOR_ID_CAVIUM 0x177d
+#define CPT_81XX_PCI_PF_DEVICE_ID 0xa040
+#define CPT_81XX_PCI_VF_DEVICE_ID 0xa041
+
+#define PASS_1_0 0x0
+
+/* CPT Models ((Device ID<<16)|Revision ID) */
+/* CPT models */
+#define CPT_81XX_PASS1_0 ((CPT_81XX_PCI_PF_DEVICE_ID << 8) | PASS_1_0)
+#define CPTVF_81XX_PASS1_0 ((CPT_81XX_PCI_VF_DEVICE_ID << 8) | PASS_1_0)
+
+#define PF 0
+#define VF 1
+
+#define DEFAULT_DEVICE_QUEUES CPT_NUM_QS_PER_VF
+
+#define SUCCESS	(0)
+#define FAIL	(1)
+
+#ifndef ROUNDUP4
+#define ROUNDUP4(val) (((val) + 3) & 0xfffffffc)
+#endif
+
+#ifndef ROUNDUP8
+#define ROUNDUP8(val) (((val) + 7) & 0xfffffff8)
+#endif
+
+#ifndef ROUNDUP16
+#define ROUNDUP16(val) (((val) + 15) & 0xfffffff0)
+#endif
+
+#define ERR_ADDR_LEN 8
+
+#define CPT_MBOX_MSG_TIMEOUT 2000
+#define VF_STATE_DOWN (0)
+#define VF_STATE_UP (1)
+
+/**< flags to indicate the features supported */
+#define CPT_FLAG_DMA_64BIT (uint16_t)(1 << 0)
+#define CPT_FLAG_MSIX_ENABLED (uint16_t)(1 << 1)
+#define CPT_FLAG_SRIOV_ENABLED (uint16_t)(1 << 2)
+#define CPT_FLAG_VF_DRIVER (uint16_t)(1 << 3)
+#define CPT_FLAG_DEVICE_READY (uint16_t)(1 << 4)
+
+#define cpt_msix_enabled(cpt) ((cpt)->flags & CPT_FLAG_MSIX_ENABLED)
+#define cpt_sriov_enabled(cpt) ((cpt)->flags & CPT_FLAG_SRIOV_ENABLED)
+#define cpt_vf_driver(cpt) ((cpt)->flags & CPT_FLAG_VF_DRIVER)
+#define cpt_pf_driver(cpt) (!((cpt)->flags & CPT_FLAG_VF_DRIVER))
+#define cpt_device_ready(cpt) ((cpt)->flags & CPT_FLAG_DEVICE_READY)
+
+#define MAX_CPT_DEVICES	2
+
+/* Default command queue length */
+#define DEFAULT_CMD_QLEN 2046
+#define DEFAULT_CMD_QCHUNK_SIZE 1023
+
+/* Max command queue length allowed. This is to restrict host memory usage */
+#define MAX_CMD_QLEN 16000
+
+/* Completion Interrupt threshold */
+#define COMPLETION_INTR_THOLD 1
+
+/* Default command timeout in seconds */
+#define DEFAULT_COMMAND_TIMEOUT 4
+
+/* Default Mailbox ACK timeout */
+#define DEFAULT_MBOX_ACK_TIMEOUT 4
+
+#define CPT_MBOX_MSG_TYPE_REQ 0
+#define CPT_MBOX_MSG_TYPE_ACK 1
+#define CPT_MBOX_MSG_TYPE_NACK 2
+#define CPT_MBOX_MSG_TYPE_NOP 3
+
+#define CPT_COUNT_THOLD 1
+#define CPT_TIMER_THOLD	0xFFFF
+#define CPT_DBELL_THOLD	1
+
+/*
+ * CPT Registers map for 81xx
+ */
+
+/* PF registers */
+#define CPTX_PF_CONSTANTS(a) (0x0ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_RESET(a) (0x100ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_DIAG(a) (0x120ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_BIST_STATUS(a) (0x160ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_CTL(a) (0x200ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_FLIP(a) (0x210ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_INT(a) (0x220ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_INT_W1S(a) (0x230ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_ENA_W1S(a)	(0x240ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECC0_ENA_W1C(a)	(0x250ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_MBOX_INTX(a, b)	\
+	(0x400ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_MBOX_INT_W1SX(a, b) \
+	(0x420ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_MBOX_ENA_W1CX(a, b) \
+	(0x440ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_MBOX_ENA_W1SX(a, b) \
+	(0x460ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_EXEC_INT(a) (0x500ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_INT_W1S(a)	(0x520ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_ENA_W1C(a)	(0x540ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_ENA_W1S(a)	(0x560ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_GX_EN(a, b) \
+	(0x600ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x7))
+#define CPTX_PF_EXEC_INFO(a) (0x700ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_BUSY(a) (0x800ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_INFO0(a) (0x900ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_INFO1(a) (0x910ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_INST_REQ_PC(a) (0x10000ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_INST_LATENCY_PC(a) \
+	(0x10020ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_RD_REQ_PC(a) (0x10040ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_RD_LATENCY_PC(a) (0x10060ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_RD_UC_PC(a) (0x10080ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ACTIVE_CYCLES_PC(a) \
+	(0x10100ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_CTL(a) (0x4000000ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_STATUS(a) (0x4000008ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_CLK(a) (0x4000010ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_DBG_CTL(a) (0x4000018ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_DBG_DATA(a)	(0x4000020ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_BIST_STATUS(a) \
+	(0x4000028ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_REQ_TIMER(a) (0x4000030ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_MEM_CTL(a) (0x4000038ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_PERF_CTL(a)	(0x4001000ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_DBG_CNTX(a, b) \
+	(0x4001100ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0xf))
+#define CPTX_PF_EXE_PERF_EVENT_CNT(a) \
+	(0x4001180ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_EXE_EPCI_INBX_CNT(a, b) \
+	(0x4001200ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_EXE_EPCI_OUTBX_CNT(a, b) \
+	(0x4001240ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+#define CPTX_PF_ENGX_UCODE_BASE(a, b) \
+	(0x4002000ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x3f))
+#define CPTX_PF_QX_CTL(a, b) \
+	(0x8000000ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_PF_QX_GMCTL(a, b) \
+	(0x8000020ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_PF_QX_CTL2(a, b) \
+	(0x8000100ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_PF_VFX_MBOXX(a, b, c) \
+	(0x8001000ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf) + 0x100ll * ((c) & 0x1))
+#define CPTX_PF_MSIX_VECX_ADDR(a, b) \
+	(0x0ll + 0x1000000000ll * ((a) & 0x1) + 0x10ll * ((b) & 0x3))
+#define CPTX_PF_MSIX_VECX_CTL(a, b) \
+	(0x8ll + 0x1000000000ll * ((a) & 0x1) + 0x10ll * ((b) & 0x3))
+#define CPTX_PF_MSIX_PBAX(a, b)	\
+	(0xf0000ll + 0x1000000000ll * ((a) & 0x1) + 8ll * ((b) & 0x0))
+
+/* VF registers */
+#define CPTX_VQX_CTL(a, b) \
+	(0x100ll + 0x1000000000ll * ((a) & 0x0) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_SADDR(a, b) \
+	(0x200ll + 0x1000000000ll * ((a) & 0x0) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_WAIT(a, b) \
+	(0x400ll + 0x1000000000ll * ((a) & 0x0) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_INPROG(a, b) \
+	(0x410ll + 0x1000000000ll * ((a) & 0x0) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE(a, b) \
+	(0x420ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_ACK(a, b) \
+	(0x440ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_INT_W1S(a, b) \
+	(0x460ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_INT_W1C(a, b) \
+	(0x468ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_ENA_W1S(a, b) \
+	(0x470ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DONE_ENA_W1C(a, b) \
+	(0x478ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_MISC_INT(a, b)	\
+	(0x500ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_MISC_INT_W1S(a, b) \
+	(0x508ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_MISC_ENA_W1S(a, b) \
+	(0x510ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_MISC_ENA_W1C(a, b) \
+	(0x518ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VQX_DOORBELL(a, b)	\
+	(0x600ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf))
+#define CPTX_VFX_PF_MBOXX(a, b, c) \
+	(0x1000ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf) + 8ll * ((c) & 0x1))
+#define CPTX_VFX_MSIX_VECX_ADDR(a, b, c) \
+	(0x0ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf) + 0x10ll * ((c) & 0x1))
+#define CPTX_VFX_MSIX_VECX_CTL(a, b, c) \
+	(0x8ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf) + 0x10ll * ((c) & 0x1))
+#define CPTX_VFX_MSIX_PBAX(a, b, c) \
+	(0xf0000ll + 0x1000000000ll * ((a) & 0x1) + 0x100000ll * ((b) & 0xf) + 8ll * ((c) & 0x0))
+
+/* Future extensions */
+#define CPTX_BRIDGE_BP_TEST(a) (0x1c0ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_CQM_CORE_OBS0(a) (0x1a0ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_CQM_CORE_OBS1(a) (0x1a8ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_NCBI_OBS(a) (0x190ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_BP_TEST(a) (0x180ll + 0x1000000000ll * ((a) & 0x1))
+#define CPTX_PF_ECO(a) (0x140ll + 0x1000000000ll * ((a) & 0x1))
+
+/*###### PCIE EP-Mode Configuration Registers #########*/
+#define PCIEEP0_CFG000 (0x0)
+#define PCIEEP0_CFG002 (0x8)
+#define PCIEEP0_CFG011 (0x2C)
+#define PCIEEP0_CFG020 (0x50)
+#define PCIEEP0_CFG025 (0x64)
+#define PCIEEP0_CFG030 (0x78)
+#define PCIEEP0_CFG044 (0xB0)
+#define PCIEEP0_CFG045 (0xB4)
+#define PCIEEP0_CFG082 (0x148)
+#define PCIEEP0_CFG095 (0x17C)
+#define PCIEEP0_CFG096 (0x180)
+#define PCIEEP0_CFG097 (0x184)
+#define PCIEEP0_CFG103 (0x19C)
+#define PCIEEP0_CFG460 (0x730)
+#define PCIEEP0_CFG461 (0x734)
+#define PCIEEP0_CFG462 (0x738)
+
+/*#######  PCIe EP-Mode SR-IOV Configuration Registers  #####*/
+#define PCIEEPVF0_CFG000 (0x0)
+#define PCIEEPVF0_CFG002 (0x8)
+#define PCIEEPVF0_CFG011 (0x2C)
+#define PCIEEPVF0_CFG030 (0x78)
+#define PCIEEPVF0_CFG044 (0xB0)
+
+enum vftype {
+	AE_TYPES = 1,
+	SE_TYPES = 2,
+	BAD_CPT_TYPES,
+};
+
+static inline int32_t count_set_bits(uint64_t mask)
+{
+	int32_t count = 0;
+
+	while (mask) {
+		if (mask & 1ULL)
+			count++;
+		mask = mask >> 1;
+	}
+
+	return count;
+}
+
+static const uint8_t cpt_device_name[] = "CPT81XX";
+static const uint8_t cptvf_device_name[] = "CPT81XX-VF";
+static const uint8_t cpt_device_file[] = "cpt";
+static const uint8_t cptvf_device_file[] = "cptvf";
+
+static const uint8_t cpt_driver_name[] = "CPT Driver";
+static const uint8_t cpt_driver_class[] = "crypto";
+static const uint8_t cptvf_driver_class[] = "cryptovf";
+
+/* Max CPT devices supported */
+enum cpt_mbox_opcode {
+	CPT_MSG_VF_CFG = 1,
+	CPT_MSG_VF_UP,
+	CPT_MSG_VF_DOWN,
+	CPT_MSG_CHIPID_VFID,
+	CPT_MSG_READY,
+	CPT_MSG_QLEN,
+	CPT_MSG_QBIND_GRP,
+	CPT_MSG_VQ_PRIORITY,
+	CPT_MSG_VF_QUERY_HEALTH,
+};
+
+union cpt_chipid_vfid {
+	uint16_t u16;
+	struct {
+#if defined(__BIG_ENDIAN_BITFIELD)
+		uint16_t chip_id:8;
+		uint16_t vfid:8;
+#else
+		uint16_t vfid:8;
+		uint16_t chip_id:8;
+#endif
+	} s;
+};
+
+/* CPT mailbox structure */
+struct cpt_mbox {
+	uint64_t msg; /* Message type MBOX[0] */
+	uint64_t data;/* Data         MBOX[1] */
+};
+
+/* The Cryptographic Acceleration Unit can *only* be found in SoCs
+ * containing the ThunderX ARM64 CPU implementation.  All accesses to the device
+ * registers on this platform are implicitly strongly ordered with respect
+ * to memory accesses. So writeq_relaxed() and readq_relaxed() are safe to use
+ * with no memory barriers in this driver.  The readq()/writeq() functions add
+ * explicit ordering operation which in this case are redundant, and only
+ * add overhead.
+ */
+/* Register read/write APIs */
+static inline void cpt_write_csr64(uint8_t __iomem *hw_addr, uint64_t offset,
+				   uint64_t val)
+{
+	uint8_t __iomem *base = ACCESS_ONCE(hw_addr);
+
+	writeq_relaxed(val, base + offset);
+}
+
+static inline uint64_t cpt_read_csr64(uint8_t __iomem *hw_addr, uint64_t offset)
+{
+	uint8_t __iomem *base = ACCESS_ONCE(hw_addr);
+
+	return readq_relaxed(base + offset);
+}
+
+static inline void byte_swap_64(uint64_t *data)
+{
+	uint64_t val = 0ULL;
+	uint8_t *a, *b;
+
+	a = (uint8_t *)data;
+	b = (uint8_t *)&val;
+	b[0] = a[7];
+	b[1] = a[6];
+	b[2] = a[5];
+	b[3] = a[4];
+	b[4] = a[3];
+	b[5] = a[2];
+	b[6] = a[1];
+	b[7] = a[0];
+	*data = val;
+}
+
+static inline void byte_swap_16(uint16_t *data)
+{
+	uint16_t val = *data;
+	*data = (val >> 8) | (val << 8);
+}
+#endif /* __CPT_COMMON_H */
diff --git a/drivers/crypto/cavium/cpt/cpt_hw_types.h b/drivers/crypto/cavium/cpt/cpt_hw_types.h
new file mode 100644
index 0000000..a6def18
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt_hw_types.h
@@ -0,0 +1,940 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPT_HW_TYPES_H
+#define __CPT_HW_TYPES_H
+
+#include "cpt_common.h"
+
+#define NR_CLUSTER (4)
+#define CSR_DELAY (30)
+
+#define CPT_NUM_QS_PER_VF (1)
+#define CPT_INST_SIZE (64)
+#define CPT_VQ_CHUNK_ALIGN (128) /**< 128 byte align */
+#define CPT_NEXT_CHUNK_PTR_SIZE (8)
+#define CPT_INST_CHUNK_MAX_SIZE (1023)
+
+#define CPT_MAX_CORE_GROUPS (8)
+#define CPT_MAX_SE_CORES (10)
+#define CPT_MAX_AE_CORES (6)
+#define CPT_MAX_TOTAL_CORES (CPT_MAX_SE_CORES + CPT_MAX_AE_CORES)
+#define CPT_MAX_VF_NUM (16)
+#define CPT_MAX_VQ_NUM (16)
+#define CPT_PF_VF_MAILBOX_SIZE (2)
+
+/* MSI-X interrupts */
+#define	CPT_PF_MSIX_VECTORS (3)
+#define	CPT_VF_MSIX_VECTORS (2)
+
+/* Configuration and Status registers are in BAR 0 */
+#define CPT_CSR_BAR 0
+#define CPT_MSIX_BAR 4
+
+/**
+ * Enumeration cpt_bar_e
+ *
+ * CPT Base Address Register Enumeration
+ * Enumerates the base address registers.
+ */
+#define CPT_BAR_E_CPTX_PF_BAR0(a) (0x872000000000ll + 0x1000000000ll * (a))
+#define CPT_BAR_E_CPTX_PF_BAR4(a) (0x872010000000ll + 0x1000000000ll * (a))
+#define CPT_BAR_E_CPTX_VFX_BAR0(a, b) \
+	(0x872020000000ll + 0x1000000000ll * (a) + 0x100000ll * (b))
+#define CPT_BAR_E_CPTX_VFX_BAR4(a, b) \
+	(0x872030000000ll + 0x1000000000ll * (a) + 0x100000ll * (b))
+
+/**
+ * Enumeration cpt_comp_e
+ *
+ * CPT Completion Enumeration
+ * Enumerates the values of CPT_RES_S[COMPCODE].
+ */
+enum cpt_comp_e {
+	CPT_COMP_E_NOTDONE = 0x00,
+	CPT_COMP_E_GOOD = 0x01,
+	CPT_COMP_E_FAULT = 0x02,
+	CPT_COMP_E_SWERR = 0x03,
+	CPT_COMP_E_LAST_ENTRY = 0xFF
+};
+
+/**
+ * Enumeration cpt_engine_err_type_e
+ *
+ * CPT Engine Error Code Enumeration
+ * Enumerates the values of CPT_RES_S[COMPCODE].
+ */
+enum cpt_engine_err_type_e {
+	CPT_ENGINE_ERR_TYPE_E_NOERR = 0x00,
+	CPT_ENGINE_ERR_TYPE_E_RF = 0x01,
+	CPT_ENGINE_ERR_TYPE_E_UC = 0x02,
+	CPT_ENGINE_ERR_TYPE_E_WD = 0x04,
+	CPT_ENGINE_ERR_TYPE_E_GE = 0x08,
+	CPT_ENGINE_ERR_TYPE_E_BUS = 0x20,
+	CPT_ENGINE_ERR_TYPE_E_LAST = 0xFF
+};
+
+/**
+ * Enumeration cpt_eop_e
+ *
+ * CPT EOP (EPCI Opcodes) Enumeration
+ * Opcodes on the epci bus.
+ */
+enum cpt_eop_e {
+	CPT_EOP_E_DMA_RD_LDT = 0x01,
+	CPT_EOP_E_DMA_RD_LDI = 0x02,
+	CPT_EOP_E_DMA_RD_LDY = 0x06,
+	CPT_EOP_E_DMA_RD_LDD = 0x08,
+	CPT_EOP_E_DMA_RD_LDE = 0x0b,
+	CPT_EOP_E_DMA_RD_LDWB = 0x0d,
+	CPT_EOP_E_DMA_WR_STY = 0x0e,
+	CPT_EOP_E_DMA_WR_STT = 0x11,
+	CPT_EOP_E_DMA_WR_STP = 0x12,
+	CPT_EOP_E_ATM_FAA64 = 0x3b,
+	CPT_EOP_E_RANDOM1_REQ = 0x61,
+	CPT_EOP_E_RANDOM_REQ = 0x60,
+	CPT_EOP_E_ERR_REQUEST = 0xfb,
+	CPT_EOP_E_UCODE_REQ = 0xfc,
+	CPT_EOP_E_MEMB = 0xfd,
+	CPT_EOP_E_NEW_WORK_REQ = 0xff,
+};
+
+/**
+ * Enumeration cpt_pf_int_vec_e
+ *
+ * CPT PF MSI-X Vector Enumeration
+ * Enumerates the MSI-X interrupt vectors.
+ */
+enum cpt_pf_int_vec_e {
+	CPT_PF_INT_VEC_E_ECC0 = 0x00,
+	CPT_PF_INT_VEC_E_EXEC = 0x01
+};
+
+#define CPT_PF_INT_VEC_E_MBOXX(a) (0x02 + (a))
+
+/**
+ * Enumeration cpt_rams_e
+ *
+ * CPT RAM Field Enumeration
+ * Enumerates the relative bit positions within CPT()_PF_ECC0_CTL[CDIS].
+ */
+enum cpt_rams_e {
+	CPT_RAMS_E_NCBI_DATFIF = 0x00,
+	CPT_RAMS_E_NCBO_MEM0 = 0x01,
+	CPT_RAMS_E_CQM_CTLMEM = 0x02,
+	CPT_RAMS_E_CQM_BPTR = 0x03,
+	CPT_RAMS_E_CQM_GMID = 0x04,
+	CPT_RAMS_E_CQM_INSTFIF0 = 0x05,
+	CPT_RAMS_E_CQM_INSTFIF1 = 0x06,
+	CPT_RAMS_E_CQM_INSTFIF2 = 0x07,
+	CPT_RAMS_E_CQM_INSTFIF3 = 0x08,
+	CPT_RAMS_E_CQM_INSTFIF4 = 0x09,
+	CPT_RAMS_E_CQM_INSTFIF5 = 0x0a,
+	CPT_RAMS_E_CQM_INSTFIF6 = 0x0b,
+	CPT_RAMS_E_CQM_INSTFIF7 = 0x0c,
+	CPT_RAMS_E_CQM_DONE_CNT = 0x0d,
+	CPT_RAMS_E_CQM_DONE_TIMER = 0x0e,
+	CPT_RAMS_E_COMP_FIFO = 0x0f,
+	CPT_RAMS_E_MBOX_MEM = 0x10,
+	CPT_RAMS_E_FPA_MEM = 0x11,
+	CPT_RAMS_E_CDEI_UCODE = 0x12,
+	CPT_RAMS_E_COMP_ARRAY0 = 0x13,
+	CPT_RAMS_E_COMP_ARRAY1 = 0x14,
+	CPT_RAMS_E_CSR_VMEM = 0x15,
+	CPT_RAMS_E_RSP_MAP = 0x16,
+	CPT_RAMS_E_RSP_INST = 0x17,
+	CPT_RAMS_E_RSP_NCBO = 0x18,
+	CPT_RAMS_E_RSP_RNM = 0x19,
+	CPT_RAMS_E_CDEI_FIFO0 = 0x1a,
+	CPT_RAMS_E_CDEI_FIFO1 = 0x1b,
+	CPT_RAMS_E_EPCO_FIFO0 = 0x1c,
+	CPT_RAMS_E_EPCO_FIFO1 = 0x1d,
+	CPT_RAMS_E_LAST_ENTRY = 0xff
+};
+
+/**
+ * Enumeration cpt_vf_int_vec_e
+ *
+ * CPT VF MSI-X Vector Enumeration
+ * Enumerates the MSI-X interrupt vectors.
+ */
+enum cpt_vf_int_vec_e {
+	CPT_VF_INT_VEC_E_MISC = 0x00,
+	CPT_VF_INT_VEC_E_DONE = 0x01
+};
+
+#define CPT_VF_INTR_MBOX_MASK BIT(0)
+#define CPT_VF_INTR_DOVF_MASK BIT(1)
+#define CPT_VF_INTR_IRDE_MASK BIT(2)
+#define CPT_VF_INTR_NWRP_MASK BIT(3)
+#define CPT_VF_INTR_SERR_MASK BIT(4)
+
+/**
+ * Structure cpt_inst_s
+ *
+ * CPT Instruction Structure
+ * This structure specifies the instruction layout. Instructions are
+ * stored in memory as little-endian unless CPT()_PF_Q()_CTL[INST_BE] is set.
+ * cpt_inst_s_s
+ * Word 0
+ * doneint:1 Done interrupt.
+ *	0 = No interrupts related to this instruction.
+ *	1 = When the instruction completes, CPT()_VQ()_DONE[DONE] will be
+ *	incremented,and based on the rules described there an interrupt may
+ *	occur.
+ * Word 1
+ * res_addr:64 [127: 64] Result IOVA.
+ *	If nonzero, specifies where to write CPT_RES_S.
+ *	If zero, no result structure will be written.
+ *	Address must be 16-byte aligned.
+ *	Bits <63:49> are ignored by hardware; software should use a
+ *	sign-extended bit <48> for forward compatibility.
+ * Word 2
+ *  grp:10 [171:162] If [WQ_PTR] is nonzero, the SSO guest-group to use when
+ *	CPT submits work SSO.
+ *	For the SSO to not discard the add-work request, FPA_PF_MAP() must map
+ *	[GRP] and CPT()_PF_Q()_GMCTL[GMID] as valid.
+ *  tt:2 [161:160] If [WQ_PTR] is nonzero, the SSO tag type to use when CPT
+ *	submits work to SSO
+ *  tag:32 [159:128] If [WQ_PTR] is nonzero, the SSO tag to use when CPT
+ *	submits work to SSO.
+ * Word 3
+ *  wq_ptr:64 [255:192] If [WQ_PTR] is nonzero, it is a pointer to a
+ *	work-queue entry that CPT submits work to SSO after all context,
+ *	output data, and result write operations are visible to other
+ *	CNXXXX units and the cores. Bits <2:0> must be zero.
+ *	Bits <63:49> are ignored by hardware; software should
+ *	use a sign-extended bit <48> for forward compatibility.
+ *	Internal:
+ *	Bits <63:49>, <2:0> are ignored by hardware, treated as always 0x0.
+ * Word 4
+ *  ei0:64; [319:256] Engine instruction word 0. Passed to the AE/SE.
+ * Word 5
+ *  ei1:64; [383:320] Engine instruction word 1. Passed to the AE/SE.
+ * Word 6
+ *  ei2:64; [447:384] Engine instruction word 1. Passed to the AE/SE.
+ * Word 7
+ *  ei3:64; [511:448] Engine instruction word 1. Passed to the AE/SE.
+ *
+ */
+union cpt_inst_s {
+	uint64_t u[8];
+	struct cpt_inst_s_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_17_63:47;
+		uint64_t doneint:1;
+		uint64_t reserved_0_1:16;
+#else /* Word 0 - Little Endian */
+		uint64_t reserved_0_15:16;
+		uint64_t doneint:1;
+		uint64_t reserved_17_63:47;
+#endif /* Word 0 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 1 - Big Endian */
+		uint64_t res_addr:64;
+#else /* Word 1 - Little Endian */
+		uint64_t res_addr:64;
+#endif /* Word 1 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 2 - Big Endian */
+		uint64_t reserved_172_19:20;
+		uint64_t grp:10;
+		uint64_t tt:2;
+		uint64_t tag:32;
+#else /* Word 2 - Little Endian */
+		uint64_t tag:32;
+		uint64_t tt:2;
+		uint64_t grp:10;
+		uint64_t reserved_172_191:20;
+#endif /* Word 2 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 3 - Big Endian */
+		uint64_t wq_ptr:64;
+#else /* Word 3 - Little Endian */
+		uint64_t wq_ptr:64;
+#endif /* Word 3 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 4 - Big Endian */
+		uint64_t ei0:64;
+#else /* Word 4 - Little Endian */
+		uint64_t ei0:64;
+#endif /* Word 4 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 5 - Big Endian */
+		uint64_t ei1:64;
+#else /* Word 5 - Little Endian */
+		uint64_t ei1:64;
+#endif /* Word 5 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 6 - Big Endian */
+		uint64_t ei2:64;
+#else /* Word 6 - Little Endian */
+		uint64_t ei2:64;
+#endif /* Word 6 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 7 - Big Endian */
+		uint64_t ei3:64;
+#else /* Word 7 - Little Endian */
+		uint64_t ei3:64;
+#endif /* Word 7 - End */
+	} s;
+};
+
+/**
+ * Structure cpt_res_s
+ *
+ * CPT Result Structure
+ * The CPT coprocessor writes the result structure after it completes a
+ * CPT_INST_S instruction. The result structure is exactly 16 bytes, and
+ * each instruction completion produces exactly one result structure.
+ *
+ * This structure is stored in memory as little-endian unless
+ * CPT()_PF_Q()_CTL[INST_BE] is set.
+ * cpt_res_s_s
+ * Word 0
+ *  doneint:1 [16:16] Done interrupt. This bit is copied from the
+ *	corresponding instruction's CPT_INST_S[DONEINT].
+ *  compcode:8 [7:0] Indicates completion/error status of the CPT coprocessor
+ *	for the	associated instruction, as enumerated by CPT_COMP_E.
+ *	Core software may write the memory location containing [COMPCODE] to
+ *	0x0 before ringing the doorbell, and then poll for completion by
+ *	checking for a nonzero value.
+ *	Once the core observes a nonzero [COMPCODE] value in this case,the CPT
+ *	coprocessor will have also completed L2/DRAM write operations.
+ * Word 1
+ *  reserved
+ *
+ */
+union cpt_res_s {
+	uint64_t u[2];
+	struct cpt_res_s_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_17_63:47;
+		uint64_t doneint:1;
+		uint64_t reserved_8_15:8;
+		uint64_t compcode:8;
+#else /* Word 0 - Little Endian */
+		uint64_t compcode:8;
+		uint64_t reserved_8_15:8;
+		uint64_t doneint:1;
+		uint64_t reserved_17_63:47;
+#endif /* Word 0 - End */
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 1 - Big Endian */
+		uint64_t reserved_64_127:64;
+#else /* Word 1 - Little Endian */
+		uint64_t reserved_64_127:64;
+#endif /* Word 1 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_pf_bist_status
+ *
+ * CPT PF Control Bist Status Register
+ * This register has the BIST status of memories. Each bit is the BIST result
+ * of an individual memory (per bit, 0 = pass and 1 = fail).
+ * cptx_pf_bist_status_s
+ * Word0
+ *  bstatus [29:0](RO/H) BIST status. One bit per memory, enumerated by
+ *	CPT_RAMS_E.
+ */
+union cptx_pf_bist_status {
+	uint64_t u;
+	struct cptx_pf_bist_status_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_30_63:34;
+		uint64_t bstatus:30;
+#else /* Word 0 - Little Endian */
+		uint64_t bstatus:30;
+		uint64_t reserved_30_63:34;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_pf_constants
+ *
+ * CPT PF Constants Register
+ * This register contains implementation-related parameters of CPT in CNXXXX.
+ * cptx_pf_constants_s
+ * Word 0
+ *  reserved_40_63:24 [63:40] Reserved.
+ *  epcis:8 [39:32](RO) Number of EPCI busses.
+ *  grps:8 [31:24](RO) Number of engine groups implemented.
+ *  ae:8 [23:16](RO/H) Number of AEs. In CNXXXX, for CPT0 returns 0x0,
+ *	for CPT1 returns 0x18, or less if there are fuse-disables.
+ *  se:8 [15:8](RO/H) Number of SEs. In CNXXXX, for CPT0 returns 0x30,
+ *	or less if there are fuse-disables, for CPT1 returns 0x0.
+ *  vq:8 [7:0](RO) Number of VQs.
+ * cptx_pf_constants_cn81xx
+ * Word 0
+ *  reserved_40_63:24 [63:40] Reserved
+ *  epcis:8 [39:32](RO) Number of EPCI busses.
+ *  grps:8 [31:24](RO) Number of engine groups implemented.
+ *  ae:8 [23:16](RO/H) Number of AEs. In CNXXXX, returns 0x6 or less
+ *	if there are fuse-disables.
+ *  se:8 [15: 8](RO/H) Number of SEs. In CNXXXX, returns 0xA, or less
+ *	if there are fuse-disables.
+ *  vq:8 [7:0](RO) Number of VQs.
+ *
+ */
+union cptx_pf_constants {
+	uint64_t u;
+	struct cptx_pf_constants_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_40_63:24;
+		uint64_t epcis:8;
+		uint64_t grps:8;
+		uint64_t ae:8;
+		uint64_t se:8;
+		uint64_t vq:8;
+#else /* Word 0 - Little Endian */
+		uint64_t vq:8;
+		uint64_t se:8;
+		uint64_t ae:8;
+		uint64_t grps:8;
+		uint64_t epcis:8;
+		uint64_t reserved_40_63:24;
+#endif /* Word 0 - End */
+	} s;
+	struct cptx_pf_constants_cn81xx {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_40_63:24;
+		uint64_t epcis:8;
+		uint64_t grps:8;
+		uint64_t ae:8;
+		uint64_t se:8;
+		uint64_t vq:8;
+#else /* Word 0 - Little Endian */
+		uint64_t vq:8;
+		uint64_t se:8;
+		uint64_t ae:8;
+		uint64_t grps:8;
+		uint64_t epcis:8;
+		uint64_t reserved_40_63:24;
+#endif /* Word 0 - End */
+	} cn81xx;
+};
+
+/**
+ * Register (NCB) cpt#_pf_exe_bist_status
+ *
+ * CPT PF Engine Bist Status Register
+ * This register has the BIST status of each engine.  Each bit is the
+ * BIST result of an individual engine (per bit, 0 = pass and 1 = fail).
+ * cptx_pf_exe_bist_status_s
+ * Word0
+ *  reserved_48_63:16 [63:48] reserved
+ *  bstatus:48 [47:0](RO/H) BIST status. One bit per engine.
+ *
+ */
+union cptx_pf_exe_bist_status {
+	uint64_t u;
+	struct cptx_pf_exe_bist_status_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_48_63:16;
+		uint64_t bstatus:48
+#else /* Word 0 - Little Endian */
+		uint64_t bstatus:48;
+		uint64_t reserved_48_63:16;
+#endif /* Word 0 - End */
+	} s;
+	struct cptx_pf_exe_bist_status_cn81xx {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_16_63:48;
+		uint64_t bstatus:16;
+#else /* Word 0 - Little Endian */
+		uint64_t bstatus:16;
+		uint64_t reserved_16_63:48;
+#endif /* Word 0 - End */
+	} cn81xx;
+};
+
+/**
+ * Register (NCB) cpt#_pf_exe_ctl
+ *
+ * CPT PF Engine Control Register
+ * This register enables the engines.
+ * cptx_pf_exe_ctl_s
+ * Word0
+ *  enable:64 [63:0](R/W) Individual enables for each of the engines.
+ */
+union cptx_pf_exe_ctl {
+	uint64_t u;
+	struct cptx_pf_exe_ctl_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t enable:64;
+#else /* Word 0 - Little Endian */
+		uint64_t enable:64;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_pf_q#_ctl
+ *
+ * CPT Queue Control Register
+ * This register configures queues. This register should be changed only
+ * when quiescent (see CPT()_VQ()_INPROG[INFLIGHT]).
+ * cptx_pf_qx_ctl_s
+ * Word0
+ *  reserved_60_63:4 [63:60] reserved.
+ *  aura:12; [59:48](R/W) Guest-aura for returning this queue's
+ *	instruction-chunk buffers to FPA. Only used when [INST_FREE] is set.
+ *	For the FPA to not discard the request, FPA_PF_MAP() must map
+ *	[AURA] and CPT()_PF_Q()_GMCTL[GMID] as valid.
+ *  reserved_45_47:3 [47:45] reserved.
+ *  size:13 [44:32](R/W) Command-buffer size, in number of 64-bit words per
+ *	command buffer segment. Must be 8*n + 1, where n is the number of
+ *	instructions per buffer segment.
+ *  reserved_11_31:21 [31:11] Reserved.
+ *  cont_err:1 [10:10](R/W) Continue on error.
+ *	0 = When CPT()_VQ()_MISC_INT[NWRP], CPT()_VQ()_MISC_INT[IRDE] or
+ *	CPT()_VQ()_MISC_INT[DOVF] are set by hardware or software via
+ *	CPT()_VQ()_MISC_INT_W1S, then CPT()_VQ()_CTL[ENA] is cleared.  Due to
+ *	pipelining, additional instructions may have been processed between the
+ *	instruction causing the error and the next instruction in the disabled
+ *	queue (the instruction at CPT()_VQ()_SADDR).
+ *	1 = Ignore errors and continue processing instructions.
+ *	For diagnostic use only.
+ *  inst_free:1 [9:9](R/W) Instruction FPA free. When set, when CPT reaches the
+ *	end of an instruction chunk, that chunk will be freed to the FPA.
+ *  inst_be:1 [8:8](R/W) Instruction big-endian control. When set, instructions,
+ *	instruction next chunk pointers, and result structures are stored in
+ *	big-endian format in memory.
+ *  iqb_ldwb:1 [7:7](R/W) Instruction load don't write back.
+ *	0 = The hardware issues NCB transient load (LDT) towards the cache,
+ *	which if the line hits and is is dirty will cause the line to be
+ *	written back before being replaced.
+ *	1 = The hardware issues NCB LDWB read-and-invalidate command towards
+ *	the cache when fetching the last word of instructions; as a result the
+ *	line will not be written back when replaced.  This improves
+ *	performance, but software must not read the instructions after they are
+ *	posted to the hardware.	Reads that do not consume the last word of a
+ *	cache line always use LDI.
+ *  reserved_4_6:3 [6:4] Reserved.
+ *  grp:3; [3:1](R/W) Engine group.
+ *  pri:1; [0:0](R/W) Queue priority.
+ *	1 = This queue has higher priority. Round-robin between higher
+ *	priority queues.
+ *	0 = This queue has lower priority. Round-robin between lower
+ *	priority queues.
+ */
+union cptx_pf_qx_ctl {
+	uint64_t u;
+	struct cptx_pf_qx_ctl_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_60_63:4;
+		uint64_t aura:12;
+		uint64_t reserved_45_47:3;
+		uint64_t size:13;
+		uint64_t reserved_11_31:21;
+		uint64_t cont_err:1;
+		uint64_t inst_free:1;
+		uint64_t inst_be:1;
+		uint64_t iqb_ldwb:1;
+		uint64_t reserved_4_6:3;
+		uint64_t grp:3;
+		uint64_t pri:1;
+#else /* Word 0 - Little Endian */
+		uint64_t pri:1;
+		uint64_t grp:3;
+		uint64_t reserved_4_6:3;
+		uint64_t iqb_ldwb:1;
+		uint64_t inst_be:1;
+		uint64_t inst_free:1;
+		uint64_t cont_err:1;
+		uint64_t reserved_11_31:21;
+		uint64_t size:13;
+		uint64_t reserved_45_47:3;
+		uint64_t aura:12;
+		uint64_t reserved_60_63:4;
+#endif /* Word 0 - End */
+	} s;
+    /* struct cptx_pf_qx_ctl_s cn; */
+};
+
+/**
+ * Register (NCB) cpt#_pf_g#_en
+ *
+ * CPT PF Group Control Register
+ * This register configures engine groups.
+ * cptx_pf_gx_en_s
+ * Word0
+ *  en: 64; [63:0](R/W/H) Engine group enable. One bit corresponds to each
+ *	engine, with the bit set to indicate this engine can service this group.
+ *	Bits corresponding to unimplemented engines read as zero, i.e. only bit
+ *	numbers	less than CPT()_PF_CONSTANTS[AE] + CPT()_PF_CONSTANTS[SE] are
+ *	writable. AE engine bits follow SE engine bits.
+ *	E.g. if CPT()_PF_CONSTANTS[AE] = 0x1, and CPT()_PF_CONSTANTS[SE] = 0x2,
+ *	then bits <2:0> are read/writable with bit <2> corresponding to AE<0>,
+ *	and bit <1> to SE<1>, and bit<0> to SE<0>. Before disabling an engine,
+ *	the corresponding bit in each group must be cleared. CPT()_PF_EXEC_BUSY
+ *	can then be polled to determing when the engine becomes	idle.
+ *	At the point, the engine can be disabled.
+ */
+union cptx_pf_gx_en {
+	uint64_t u;
+	struct cptx_pf_gx_en_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t en:64;
+#else /* Word 0 - Little Endian */
+		uint64_t en:64;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_saddr
+ *
+ * CPT Queue Starting Buffer Address Registers
+ * These registers set the instruction buffer starting address.
+ * cptx_vqx_saddr_s
+ * Word0
+ *  reserved_49_63:15 [63:49] Reserved.
+ *  ptr:43 [48:6](R/W/H) Instruction buffer IOVA <48:6> (64-byte aligned).
+ *	When written, it is the initial buffer starting address; when read,
+ *	it is the next read pointer to be requested from L2C. The PTR field
+ *	is overwritten with the next pointer each time that the command buffer
+ *	segment is exhausted. New commands will then be read from the newly
+ *	specified command buffer pointer.
+ *  reserved_0_5:6 [5:0] Reserved.
+ *
+ */
+union cptx_vqx_saddr {
+	uint64_t u;
+	struct cptx_vqx_saddr_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_49_63:15;
+		uint64_t ptr:43
+		uint64_t reserved_0_5:6;
+#else /* Word 0 - Little Endian */
+		uint64_t reserved_0_5:6;
+		uint64_t ptr:43;
+		uint64_t reserved_49_63:15;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_misc_ena_w1s
+ *
+ * CPT Queue Misc Interrupt Enable Set Register
+ * This register sets interrupt enable bits.
+ * cptx_vqx_misc_ena_w1s_s
+ * Word0
+ * reserved_5_63:59 [63:5] Reserved.
+ * swerr:1 [4:4](R/W1S/H) Reads or sets enable for
+ *	CPT(0..1)_VQ(0..63)_MISC_INT[SWERR].
+ * nwrp:1 [3:3](R/W1S/H) Reads or sets enable for
+ *	CPT(0..1)_VQ(0..63)_MISC_INT[NWRP].
+ * irde:1 [2:2](R/W1S/H) Reads or sets enable for
+ *	CPT(0..1)_VQ(0..63)_MISC_INT[IRDE].
+ * dovf:1 [1:1](R/W1S/H) Reads or sets enable for
+ *	CPT(0..1)_VQ(0..63)_MISC_INT[DOVF].
+ * mbox:1 [0:0](R/W1S/H) Reads or sets enable for
+ *	CPT(0..1)_VQ(0..63)_MISC_INT[MBOX].
+ *
+ */
+union cptx_vqx_misc_ena_w1s {
+	uint64_t u;
+	struct cptx_vqx_misc_ena_w1s_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_5_63:59;
+		uint64_t swerr:1;
+		uint64_t nwrp:1;
+		uint64_t irde:1;
+		uint64_t dovf:1;
+		uint64_t mbox:1;
+#else /* Word 0 - Little Endian */
+		uint64_t mbox:1;
+		uint64_t dovf:1;
+		uint64_t irde:1;
+		uint64_t nwrp:1;
+		uint64_t swerr:1;
+		uint64_t reserved_5_63:59;
+#endif /* Word 0 - End */
+	} s;
+	struct cptx_vqx_misc_ena_w1s_cn81xx {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_5_63:59;
+		uint64_t swerr:1;
+		uint64_t nwrp:1;
+		uint64_t irde:1;
+		uint64_t dovf:1;
+		uint64_t mbox:1;
+#else /* Word 0 - Little Endian */
+		uint64_t mbox:1;
+		uint64_t dovf:1;
+		uint64_t irde:1;
+		uint64_t nwrp:1;
+		uint64_t swerr:1;
+		uint64_t reserved_5_63:59;
+#endif /* Word 0 - End */
+	} cn81xx;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_doorbell
+ *
+ * CPT Queue Doorbell Registers
+ * Doorbells for the CPT instruction queues.
+ * cptx_vqx_doorbell_s
+ * Word0
+ *  reserved_20_63:44 [63:20] Reserved.
+ *  dbell_cnt:20 [19:0](R/W/H) Number of instruction queue 64-bit words to add
+ *	to the CPT instruction doorbell count. Readback value is the the
+ *	current number of pending doorbell requests. If counter overflows
+ *	CPT()_VQ()_MISC_INT[DBELL_DOVF] is set. To reset the count back to
+ *	zero, write one to clear CPT()_VQ()_MISC_INT_ENA_W1C[DBELL_DOVF],
+ *	then write a value of 2^20 minus the read [DBELL_CNT], then write one
+ *	to CPT()_VQ()_MISC_INT_W1C[DBELL_DOVF] and
+ *	CPT()_VQ()_MISC_INT_ENA_W1S[DBELL_DOVF]. Must be a multiple of 8.
+ *	All CPT instructions are 8 words and require a doorbell count of
+ *	multiple of 8.
+ */
+union cptx_vqx_doorbell {
+	uint64_t u;
+	struct cptx_vqx_doorbell_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_20_63:44;
+		uint64_t dbell_cnt:20;
+#else /* Word 0 - Little Endian */
+		uint64_t dbell_cnt:20;
+		uint64_t reserved_20_63:44;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_inprog
+ *
+ * CPT Queue In Progress Count Registers
+ * These registers contain the per-queue instruction in flight registers.
+ * cptx_vqx_inprog_s
+ * Word0
+ *  reserved_8_63:56 [63:8] Reserved.
+ *  inflight:8 [7:0](RO/H) Inflight count. Counts the number of instructions
+ *	for the VF for which CPT is fetching, executing or responding to
+ *	instructions. However this does not include any interrupts that are
+ *	awaiting software handling (CPT()_VQ()_DONE[DONE] != 0x0).
+ *	A queue may not be reconfigured until:
+ *	1. CPT()_VQ()_CTL[ENA] is cleared by software.
+ *	2. [INFLIGHT] is polled until equals to zero.
+ */
+union cptx_vqx_inprog {
+	uint64_t u;
+	struct cptx_vqx_inprog_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_8_63:56;
+		uint64_t inflight:8;
+#else /* Word 0 - Little Endian */
+		uint64_t inflight:8;
+		uint64_t reserved_8_63:56;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_misc_int
+ *
+ * CPT Queue Misc Interrupt Register
+ * These registers contain the per-queue miscellaneous interrupts.
+ * cptx_vqx_misc_int_s
+ * Word 0
+ *  reserved_5_63:59 [63:5] Reserved.
+ *  swerr:1 [4:4](R/W1C/H) Software error from engines.
+ *  nwrp:1  [3:3](R/W1C/H) NCB result write response error.
+ *  irde:1  [2:2](R/W1C/H) Instruction NCB read response error.
+ *  dovf:1 [1:1](R/W1C/H) Doorbell overflow.
+ *  mbox:1 [0:0](R/W1C/H) PF to VF mailbox interrupt. Set when
+ *	CPT()_VF()_PF_MBOX(0) is written.
+ *
+ */
+union cptx_vqx_misc_int {
+	uint64_t u;
+	struct cptx_vqx_misc_int_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_5_63:59;
+		uint64_t swerr:1;
+		uint64_t nwrp:1;
+		uint64_t irde:1;
+		uint64_t dovf:1;
+		uint64_t mbox:1;
+#else /* Word 0 - Little Endian */
+		uint64_t mbox:1;
+		uint64_t dovf:1;
+		uint64_t irde:1;
+		uint64_t nwrp:1;
+		uint64_t swerr:1;
+		uint64_t reserved_5_63:59;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_done_ack
+ *
+ * CPT Queue Done Count Ack Registers
+ * This register is written by software to acknowledge interrupts.
+ * cptx_vqx_done_ack_s
+ * Word0
+ *  reserved_20_63:44 [63:20] Reserved.
+ *  done_ack:20 [19:0](R/W/H) Number of decrements to CPT()_VQ()_DONE[DONE].
+ *	Reads CPT()_VQ()_DONE[DONE]. Written by software to acknowledge
+ *	interrupts. If CPT()_VQ()_DONE[DONE] is still nonzero the interrupt
+ *	will be re-sent if the conditions described in CPT()_VQ()_DONE[DONE]
+ *	are satisfied.
+ *
+ */
+union cptx_vqx_done_ack {
+	uint64_t u;
+	struct cptx_vqx_done_ack_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_20_63:44;
+		uint64_t done_ack:20;
+#else /* Word 0 - Little Endian */
+		uint64_t done_ack:20;
+		uint64_t reserved_20_63:44;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_done
+ *
+ * CPT Queue Done Count Registers
+ * These registers contain the per-queue instruction done count.
+ * cptx_vqx_done_s
+ * Word0
+ *  reserved_20_63:44 [63:20] Reserved.
+ *  done:20 [19:0](R/W/H) Done count. When CPT_INST_S[DONEINT] set and that
+ *	instruction completes, CPT()_VQ()_DONE[DONE] is incremented when the
+ *	instruction finishes. Write to this field are for diagnostic use only;
+ *	instead software writes CPT()_VQ()_DONE_ACK with the number of
+ *	decrements for this field.
+ *	Interrupts are sent as follows:
+ *	* When CPT()_VQ()_DONE[DONE] = 0, then no results are pending, the
+ *	interrupt coalescing timer is held to zero, and an interrupt is not
+ *	sent.
+ *	* When CPT()_VQ()_DONE[DONE] != 0, then the interrupt coalescing timer
+ *	counts. If the counter is >= CPT()_VQ()_DONE_WAIT[TIME_WAIT]*1024, or
+ *	CPT()_VQ()_DONE[DONE] >= CPT()_VQ()_DONE_WAIT[NUM_WAIT], i.e. enough
+ *	time has passed or enough results have arrived, then the interrupt is
+ *	sent.
+ *	* When CPT()_VQ()_DONE_ACK is written (or CPT()_VQ()_DONE is written
+ *	but this is not typical), the interrupt coalescing timer restarts.
+ *	Note after decrementing this interrupt equation is recomputed,
+ *	for example if CPT()_VQ()_DONE[DONE] >= CPT()_VQ()_DONE_WAIT[NUM_WAIT]
+ *	and because the timer is zero, the interrupt will be resent immediately.
+ *	(This covers the race case between software acknowledging an interrupt
+ *	and a result returning.)
+ *	* When CPT()_VQ()_DONE_ENA_W1S[DONE] = 0, interrupts are not sent,
+ *	but the counting described above still occurs.
+ *	Since CPT instructions complete out-of-order, if software is using
+ *	completion interrupts the suggested scheme is to request a DONEINT on
+ *	each request, and when an interrupt arrives perform a "greedy" scan for
+ *	completions; even if a later command is acknowledged first this will
+ *	not result in missing a completion.
+ *	Software is responsible for making sure [DONE] does not overflow;
+ *	for example by insuring there are not more than 2^20-1 instructions in
+ *	flight that may request interrupts.
+ *
+ */
+union cptx_vqx_done {
+	uint64_t u;
+	struct cptx_vqx_done_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_20_63:44;
+		uint64_t done:20;
+#else /* Word 0 - Little Endian */
+		uint64_t done:20;
+		uint64_t reserved_20_63:44;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_done_wait
+ *
+ * CPT Queue Done Interrupt Coalescing Wait Registers
+ * Specifies the per queue interrupt coalescing settings.
+ * cptx_vqx_done_wait_s
+ * Word0
+ *  reserved_48_63:16 [63:48] Reserved.
+ *  time_wait:16; [47:32](R/W) Time hold-off. When CPT()_VQ()_DONE[DONE] = 0
+ *	or CPT()_VQ()_DONE_ACK is written a timer is cleared. When the timer
+ *	reaches [TIME_WAIT]*1024 then interrupt coalescing ends.
+ *	see CPT()_VQ()_DONE[DONE]. If 0x0, time coalescing is disabled.
+ *  reserved_20_31:12 [31:20] Reserved.
+ *  num_wait:20 [19:0](R/W) Number of messages hold-off.
+ *	When CPT()_VQ()_DONE[DONE] >= [NUM_WAIT] then interrupt coalescing ends
+ *	see CPT()_VQ()_DONE[DONE]. If 0x0, same behavior as 0x1.
+ *
+ */
+union cptx_vqx_done_wait {
+	uint64_t u;
+	struct cptx_vqx_done_wait_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_48_63:16;
+		uint64_t time_wait:16;
+		uint64_t reserved_20_31:12;
+		uint64_t num_wait:20;
+#else /* Word 0 - Little Endian */
+		uint64_t num_wait:20;
+		uint64_t reserved_20_31:12;
+		uint64_t time_wait:16;
+		uint64_t reserved_48_63:16;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_done_ena_w1s
+ *
+ * CPT Queue Done Interrupt Enable Set Registers
+ * Write 1 to these registers will enable the DONEINT interrupt for the queue.
+ * cptx_vqx_done_ena_w1s_s
+ * Word0
+ *  reserved_1_63:63 [63:1] Reserved.
+ *  done:1 [0:0](R/W1S/H) Write 1 will enable DONEINT for this queue.
+ *	Write 0 has no effect. Read will return the enable bit.
+ */
+union cptx_vqx_done_ena_w1s {
+	uint64_t u;
+	struct cptx_vqx_done_ena_w1s_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_1_63:63;
+		uint64_t done:1;
+#else /* Word 0 - Little Endian */
+		uint64_t done:1;
+		uint64_t reserved_1_63:63;
+#endif /* Word 0 - End */
+	} s;
+};
+
+/**
+ * Register (NCB) cpt#_vq#_ctl
+ *
+ * CPT VF Queue Control Registers
+ * This register configures queues. This register should be changed (other than
+ * clearing [ENA]) only when quiescent (see CPT()_VQ()_INPROG[INFLIGHT]).
+ * cptx_vqx_ctl_s
+ * Word0
+ *  reserved_1_63:63 [63:1] Reserved.
+ *  ena:1 [0:0](R/W/H) Enables the logical instruction queue.
+ *	See also CPT()_PF_Q()_CTL[CONT_ERR] and	CPT()_VQ()_INPROG[INFLIGHT].
+ *	1 = Queue is enabled.
+ *	0 = Queue is disabled.
+ */
+union cptx_vqx_ctl {
+	uint64_t u;
+	struct cptx_vqx_ctl_s {
+#if defined(__BIG_ENDIAN_BITFIELD) /* Word 0 - Big Endian */
+		uint64_t reserved_1_63:63;
+		uint64_t ena:1;
+#else /* Word 0 - Little Endian */
+		uint64_t ena:1;
+		uint64_t reserved_1_63:63;
+#endif /* Word 0 - End */
+	} s;
+};
+#endif /*__CPT_HW_TYPES_H*/
diff --git a/drivers/crypto/cavium/cpt/cpt_main.c b/drivers/crypto/cavium/cpt/cpt_main.c
new file mode 100644
index 0000000..f5a89f9
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt_main.c
@@ -0,0 +1,891 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/printk.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/firmware.h>
+#include <linux/pci.h>
+
+#include "cpt.h"
+
+#define DRV_NAME	"thunder-cpt"
+#define DRV_VERSION	"1.0"
+
+/* Global list for holding all cpt_device pointers */
+struct cpt_device_list cpt_dev_list;
+
+static uint32_t num_vfs = 1; /* Default 1 VF enabled */
+module_param(num_vfs, uint, 0);
+MODULE_PARM_DESC(num_vfs, "Number of VFs to enable(1-16)");
+
+static inline void cpt_init_device_list(struct cpt_device_list *cpt_list)
+{
+	cpt_list->nr_device = 0;
+	spin_lock_init(&cpt_list->lock);
+
+	memset(cpt_list->device_ptr, 0, (sizeof(void *) * MAX_CPT_DEVICES));
+}
+
+static inline int32_t cpt_get_device_number(struct cpt_device_list *cpt_list,
+					    void *dev)
+{
+	struct cpt_device *cpt = (struct cpt_device *)dev;
+	int32_t i = 0;
+
+	spin_lock(&cpt_list->lock);
+
+	for (i = 0; i < MAX_CPT_DEVICES; i++) {
+		if (cpt_list->device_ptr[i] == cpt) {
+			spin_unlock(&cpt_list->lock);
+			return i;
+		}
+	}
+	spin_unlock(&cpt_list->lock);
+
+	return -1;
+}
+
+static inline int32_t cpt_add_device(struct cpt_device_list *cpt_list,
+				     struct cpt_device *cpt)
+{
+	/* lock the global device list */
+	spin_lock(&cpt_list->lock);
+
+	if (cpt_list->nr_device > MAX_CPT_DEVICES) {
+		/* unlock the global device list */
+		spin_unlock(&cpt_list->lock);
+		return -ENOMEM;
+	}
+
+	cpt->idx = cpt_list->nr_device;
+
+	cpt_list->device_ptr[cpt_list->nr_device] = cpt;
+	cpt_list->nr_device++;
+
+	/* unlock the global device list */
+	spin_unlock(&cpt_list->lock);
+
+	return 0;
+}
+
+static inline void cpt_remove_device(struct cpt_device_list *cpt_list,
+				     struct cpt_device *cpt)
+{
+	int32_t i = 0;
+
+	/* lock the global device list */
+	spin_lock(&cpt_list->lock);
+
+	while (i < MAX_CPT_DEVICES) {
+		if (cpt_list->device_ptr[i] == cpt) {
+			cpt_list->device_ptr[i] = NULL;
+			cpt_list->nr_device--;
+			break;
+		}
+		i++;
+	}
+
+	/* unlock the global device list */
+	spin_unlock(&cpt_list->lock);
+}
+
+struct cpt_device *cpt_get_device(struct cpt_device_list *cpt_list,
+				  int32_t dev_no)
+{
+	if (dev_no >= cpt_list->nr_device)
+		return NULL;
+
+	return cpt_list->device_ptr[dev_no];
+}
+
+int32_t nr_cpt_devices(struct cpt_device_list *cpt_list)
+{
+	return cpt_list->nr_device;
+}
+
+static uint64_t get_mask_from_value(int32_t value)
+{
+	uint64_t mask = 0ULL;
+	int32_t i;
+
+	for (i = 0; i < value; i++)
+		mask |= ((uint64_t)1 << i);
+
+	return mask;
+}
+
+/*
+ * Disable cores specified by coremask
+ */
+static void cpt_disable_cores(struct cpt_device *cpt, uint64_t coremask,
+			      uint8_t type, uint8_t grp)
+{
+	union cptx_pf_exe_ctl pf_exe_ctl;
+	uint32_t timeout = 0xFFFFFFFF;
+	uint64_t grpmask = 0;
+	struct device *dev = &cpt->pdev->dev;
+
+	if (type == AE_TYPES)
+		coremask = (coremask << cpt->max_se_cores);
+
+	/* Disengage the cores from groups */
+	grpmask = cpt_read_csr64(cpt->reg_base, CPTX_PF_GX_EN(0, grp));
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_GX_EN(0, grp),
+			(grpmask & ~coremask));
+	udelay(CSR_DELAY);
+	grp = cpt_read_csr64(cpt->reg_base, CPTX_PF_EXEC_BUSY(0));
+	while (grp & coremask) {
+		dev_err(dev, "Cores still busy %llx", coremask);
+		grp = cpt_read_csr64(cpt->reg_base,
+				     CPTX_PF_EXEC_BUSY(0));
+		if (timeout--)
+			break;
+	}
+
+	/* Disable the cores */
+	pf_exe_ctl.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_EXE_CTL(0));
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_EXE_CTL(0),
+			(pf_exe_ctl.u & ~coremask));
+	udelay(CSR_DELAY);
+}
+
+/*
+ * Enable cores specified by coremask
+ */
+static void cpt_enable_cores(struct cpt_device *cpt, uint64_t coremask,
+			     uint8_t type)
+{
+	union cptx_pf_exe_ctl pf_exe_ctl;
+
+	if (type == AE_TYPES)
+		coremask = (coremask << cpt->max_se_cores);
+
+	pf_exe_ctl.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_EXE_CTL(0));
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_EXE_CTL(0),
+			(pf_exe_ctl.u | coremask));
+	udelay(CSR_DELAY);
+}
+
+static void cpt_configure_group(struct cpt_device *cpt, uint8_t grp,
+				uint64_t coremask, uint8_t type)
+{
+	union cptx_pf_gx_en pf_gx_en = {0};
+
+	if (type == AE_TYPES)
+		coremask = (coremask << cpt->max_se_cores);
+
+	pf_gx_en.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_GX_EN(0, grp));
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_GX_EN(0, grp),
+			(pf_gx_en.u | coremask));
+	udelay(CSR_DELAY);
+}
+
+static void cpt_disable_mbox_interrupts(struct cpt_device *cpt)
+{
+	/* Clear mbox(0) interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_MBOX_ENA_W1CX(0, 0), ~0ull);
+}
+
+static void cpt_disable_ecc_interrupts(struct cpt_device *cpt)
+{
+	/* Clear ecc(0) interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_ECC0_ENA_W1C(0), ~0ull);
+}
+
+static void cpt_disable_exec_interrupts(struct cpt_device *cpt)
+{
+	/* Clear exec interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_EXEC_ENA_W1C(0), ~0ull);
+}
+
+static void cpt_disable_all_interrupts(struct cpt_device *cpt)
+{
+	cpt_disable_mbox_interrupts(cpt);
+	cpt_disable_ecc_interrupts(cpt);
+	cpt_disable_exec_interrupts(cpt);
+}
+
+static void cpt_enable_mbox_interrupts(struct cpt_device *cpt)
+{
+	/* Set mbox(0) interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_MBOX_ENA_W1SX(0, 0), ~0ull);
+}
+
+static void cpt_enable_ecc_interrupts(struct cpt_device *cpt)
+{
+	/* Set ecc(0) interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_ECC0_ENA_W1S(0), ~0ull);
+}
+
+static void cpt_enable_exec_interrupts(struct cpt_device *cpt)
+{
+	/* Set exec interupts for all vfs */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_EXEC_ENA_W1S(0), ~0ull);
+}
+
+static void cpt_enable_all_interrupts(struct cpt_device *cpt)
+{
+	cpt_enable_mbox_interrupts(cpt);
+	cpt_enable_ecc_interrupts(cpt);
+	cpt_enable_exec_interrupts(cpt);
+}
+
+static int32_t cpt_load_microcode(struct cpt_device *cpt,
+				  struct microcode *mcode)
+{
+	int32_t ret = 0, core = 0, shift = 0;
+	uint32_t total_cores = 0;
+	struct device *dev = &cpt->pdev->dev;
+
+	if (!mcode || !mcode->code) {
+		dev_err(dev, "Either the mcode is null or data is NULL\n");
+		return 1;
+	}
+
+	if (mcode->code_size == 0) {
+		dev_err(dev, "microcode size is 0\n");
+		return 1;
+	}
+
+	/* Assumes 0-9 are SE cores for UCODE_BASE registers and
+	 * AE core bases follow
+	 */
+	if (mcode->is_ae) {
+		core = CPT_MAX_SE_CORES; /* start couting from 10 */
+		total_cores = CPT_MAX_TOTAL_CORES; /* upto 15 */
+	} else {
+		core = 0; /* start couting from 0 */
+		total_cores = CPT_MAX_SE_CORES; /* upto 9 */
+	}
+
+	/* Point to microcode for each core of the group */
+	for (; core < total_cores ; core++, shift++) {
+		if (mcode->core_mask_low & (1 << shift)) {
+			cpt_write_csr64(cpt->reg_base,
+					CPTX_PF_ENGX_UCODE_BASE(0, core),
+					(uint64_t)mcode->phys_base);
+		}
+	}
+	return ret;
+}
+
+static int32_t do_cpt_init(struct cpt_device *cpt, struct microcode *mcode)
+{
+	int32_t ret = 0;
+	struct device *dev = &cpt->pdev->dev;
+
+	/* Make device not ready */
+	cpt->flags &= ~CPT_FLAG_DEVICE_READY;
+	/* Disable All PF interrupts */
+	cpt_disable_all_interrupts(cpt);
+	/* Calculate mcode group and coremasks */
+	if (mcode->is_ae) {
+		if (mcode->num_cores > cpt->avail_ae_cores) {
+			dev_err(dev, "Requested for more cores than available AE cores\n");
+			ret = -1;
+			goto cpt_init_fail;
+		}
+
+		if (cpt->next_group >= CPT_MAX_CORE_GROUPS) {
+			dev_err(dev, "Can't load, all eight microcode groups in use");
+			return -ENFILE;
+		}
+
+		mcode->group = cpt->next_group;
+		/* Convert requested cores to mask */
+		mcode->core_mask_low = get_mask_from_value(mcode->num_cores);
+		mcode->core_mask_low <<= (cpt->max_ae_cores -
+					  cpt->avail_ae_cores);
+		/* Deduct the available ae cores */
+		cpt->avail_ae_cores -= mcode->num_cores;
+		cpt_disable_cores(cpt, mcode->core_mask_low, AE_TYPES,
+				  mcode->group);
+		/* Load microcode for AE engines */
+		if (cpt_load_microcode(cpt, mcode)) {
+			dev_err(dev, "Microcode load Failed for %s\n",
+				mcode->version);
+			ret = -1;
+			goto cpt_init_fail;
+		}
+		cpt->next_group++;
+		/* Configure group mask for the mcode */
+		cpt_configure_group(cpt, mcode->group, mcode->core_mask_low,
+				    AE_TYPES);
+		/* Enable AE cores for the group mask */
+		cpt_enable_cores(cpt, mcode->core_mask_low, AE_TYPES);
+	} else {
+		if (mcode->num_cores > cpt->avail_se_cores) {
+			dev_err(dev, "Requested for more cores than available SE cores\n");
+			ret = -1;
+			goto cpt_init_fail;
+		}
+		if (cpt->next_group >= CPT_MAX_CORE_GROUPS) {
+			dev_err(dev, "Can't load, all eight microcode groups in use");
+			return -ENFILE;
+		}
+
+		mcode->group = cpt->next_group;
+		/* Covert requested cores to mask */
+		mcode->core_mask_low = get_mask_from_value(mcode->num_cores);
+		mcode->core_mask_low <<= (cpt->max_se_cores -
+					  cpt->avail_se_cores);
+		/* Deduct the available se cores */
+		cpt->avail_se_cores -= mcode->num_cores;
+		cpt_disable_cores(cpt, mcode->core_mask_low, SE_TYPES,
+				  mcode->group);
+		/* Load microcode for SE engines */
+		if (cpt_load_microcode(cpt, mcode)) {
+			dev_err(dev, "Microcode load Failed for %s\n",
+				mcode->version);
+			ret = -1;
+			goto cpt_init_fail;
+		}
+		cpt->next_group++;
+		/* Configure group mask for the mcode */
+		cpt_configure_group(cpt, mcode->group, mcode->core_mask_low,
+				    SE_TYPES);
+		/* Enable SE cores for the group mask */
+		cpt_enable_cores(cpt, mcode->core_mask_low, SE_TYPES);
+	}
+
+	/* Enabled PF mailbox interrupts */
+	cpt_enable_mbox_interrupts(cpt);
+	cpt->flags |= CPT_FLAG_DEVICE_READY;
+
+	return ret;
+
+cpt_init_fail:
+	/* Enabled PF mailbox interrupts */
+	cpt_enable_mbox_interrupts(cpt);
+	/* Reset coremask values */
+	/* TODO: Revisit this failure case for more loads case */
+	cpt->avail_ae_cores = cpt->max_ae_cores;
+	cpt->avail_se_cores = cpt->max_se_cores;
+
+	return ret;
+}
+
+struct ucode_header {
+	uint8_t version[32];
+	uint32_t code_length;
+	uint32_t data_length;
+	uint64_t sram_address;
+};
+
+static int32_t cpt_ucode_load_fw(struct cpt_device *cpt, const uint8_t *fw,
+				 bool is_ae)
+{
+	const struct firmware *fw_entry;
+	struct device *dev = &cpt->pdev->dev;
+	struct ucode_header *ucode;
+	struct microcode *mcode;
+	int j, ret = 0;
+
+	ret = request_firmware(&fw_entry, fw, dev);
+	if (ret)
+		return ret;
+
+	mcode = &cpt->mcode[cpt->next_mc_idx];
+	ucode = (struct ucode_header *)fw_entry->data;
+	memcpy(mcode->version, (uint8_t *)fw_entry->data, 32);
+	mcode->code_size = ntohl(ucode->code_length) * 2;
+	mcode->is_ae = is_ae;
+	mcode->core_mask_low  = 0ULL;
+	mcode->core_mask_hi   = 0ULL;
+	mcode->num_cores = is_ae ? 6 : 10;
+
+	/*  Allocate DMAable space */
+	mcode->code = dma_zalloc_coherent(&cpt->pdev->dev, mcode->code_size,
+					  &mcode->dma, GFP_KERNEL);
+	if (!mcode->code) {
+		dev_err(dev, "Unable to allocate space for microcode");
+		return -ENOMEM;
+	}
+	/* Align memory address for 'align_bytes' */
+	/* Neglect Bits 6:0 and 49:63: Align for 128-bytes */
+	mcode->phys_base = ALIGN((uint64_t)mcode->dma, 128);
+	mcode->base = mcode->code + (mcode->phys_base - mcode->dma);
+	memcpy((void *)mcode->base, (void *)(fw_entry->data + 48),
+	       mcode->code_size);
+
+	/* Byte swap 64-bit */
+	for (j = 0; j < (mcode->code_size / 8); j++)
+		byte_swap_64(&((uint64_t *)mcode->base)[j]);
+	/*  MC needs 16-bit swap */
+	for (j = 0; j < (mcode->code_size / 2); j++)
+		byte_swap_16(&((uint16_t *)mcode->base)[j]);
+
+	dev_dbg(dev, "mcode->code_size = %u\n", mcode->code_size);
+	dev_dbg(dev, "mcode->is_ae       = %u\n", mcode->is_ae);
+	dev_dbg(dev, "mcode->num_cores   = %u\n", mcode->num_cores);
+	dev_dbg(dev, "mcode->code = %llx\n", (uint64_t)mcode->code);
+	dev_dbg(dev, "mcode->phys_base = %llx\n", mcode->phys_base);
+	dev_dbg(dev, "mcode->base = %llx\n", (uint64_t)mcode->base);
+	dev_dbg(dev, "mcode->is_mc_valid = %u\n", mcode->is_mc_valid);
+
+	ret = do_cpt_init(cpt, mcode);
+	if (ret) {
+		dev_err(dev, "do_cpt_init failed with ret: %d\n", ret);
+		return ret;
+	}
+
+	dev_dbg(dev, "Microcode Loaded\n");
+	mcode->is_mc_valid = 1;
+	cpt->next_mc_idx++;
+	dev_dbg(dev, "mcode->is_mc_valid = %u\n", mcode->is_mc_valid);
+	release_firmware(fw_entry);
+
+	return ret;
+}
+
+static int32_t cpt_ucode_load(struct cpt_device *cpt)
+{
+	int32_t ret = 0;
+	struct device *dev = &cpt->pdev->dev;
+
+	ret = cpt_ucode_load_fw(cpt, "cpt8x-mc-ae.out", true);
+	if (ret) {
+		dev_err(dev, "ae:cpt_ucode_load failed with ret: %d\n", ret);
+		return ret;
+	}
+	ret = cpt_ucode_load_fw(cpt, "cpt8x-mc-se.out", false);
+	if (ret) {
+		dev_err(dev, "se:cpt_ucode_load failed with ret: %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+uint16_t active_cpt_devmask(struct cpt_device_list *cpt_list)
+{
+	struct cpt_device *cpt;
+	uint16_t mask = 0;
+	int32_t i = 0;
+
+	while (i < MAX_CPT_DEVICES) {
+		cpt = cpt_list->device_ptr[i];
+		if (cpt && cpt_device_ready(cpt))
+			mask |= (1 << i);
+		i++;
+	}
+
+	return mask;
+}
+
+static int32_t cpt_enable_msix(struct cpt_device *cpt)
+{
+	int32_t i, ret;
+
+	cpt->num_vec = CPT_PF_MSIX_VECTORS;
+
+	for (i = 0; i < cpt->num_vec; i++)
+		cpt->msix_entries[i].entry = i;
+
+	ret = pci_enable_msix(cpt->pdev, cpt->msix_entries, cpt->num_vec);
+	if (ret) {
+		dev_err(&cpt->pdev->dev, "Request for #%d msix vectors failed\n",
+			cpt->num_vec);
+		return ret;
+	}
+
+	cpt->msix_enabled = 1;
+	return 0;
+}
+
+static irqreturn_t cpt_mbx0_intr_handler (int32_t irq, void *cpt_irq)
+{
+	struct cpt_device *cpt = (struct cpt_device *)cpt_irq;
+
+	cpt_mbox_intr_handler(cpt, 0);
+
+	return IRQ_HANDLED;
+}
+
+static void cpt_disable_msix(struct cpt_device *cpt)
+{
+	if (cpt->msix_enabled) {
+		pci_disable_msix(cpt->pdev);
+		cpt->msix_enabled = 0;
+		cpt->num_vec = 0;
+	}
+}
+
+static void cpt_free_all_interrupts(struct cpt_device *cpt)
+{
+	int32_t irq;
+
+	for (irq = 0; irq < cpt->num_vec; irq++) {
+		if (cpt->irq_allocated[irq])
+			free_irq(cpt->msix_entries[irq].vector, cpt);
+		cpt->irq_allocated[irq] = false;
+	}
+}
+
+static void cpt_reset(struct cpt_device *cpt)
+{
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_RESET(0), 1);
+}
+
+static void cpt_find_max_enabled_cores(struct cpt_device *cpt)
+{
+	union cptx_pf_constants pf_cnsts = {0};
+
+	pf_cnsts.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_CONSTANTS(0));
+	cpt->max_se_cores = pf_cnsts.s.se;
+	cpt->max_ae_cores = pf_cnsts.s.ae;
+}
+
+static uint32_t cpt_check_bist_status(struct cpt_device *cpt)
+{
+	union cptx_pf_bist_status bist_sts = {0};
+
+	bist_sts.u = cpt_read_csr64(cpt->reg_base,
+				    CPTX_PF_BIST_STATUS(0));
+
+	return bist_sts.u;
+}
+
+static uint64_t cpt_check_exe_bist_status(struct cpt_device *cpt)
+{
+	union cptx_pf_exe_bist_status bist_sts = {0};
+
+	bist_sts.u = cpt_read_csr64(cpt->reg_base,
+				    CPTX_PF_EXE_BIST_STATUS(0));
+
+	return bist_sts.u;
+}
+
+static void cpt_disable_all_cores(struct cpt_device *cpt)
+{
+	uint32_t grp, timeout = 0xFFFFFFFF;
+	struct device *dev = &cpt->pdev->dev;
+
+	/* Disengage the cores from groups */
+	for (grp = 0; grp < CPT_MAX_CORE_GROUPS; grp++) {
+		cpt_write_csr64(cpt->reg_base, CPTX_PF_GX_EN(0, grp), 0);
+		udelay(CSR_DELAY);
+	}
+
+	grp = cpt_read_csr64(cpt->reg_base, CPTX_PF_EXEC_BUSY(0));
+	while (grp) {
+		dev_err(dev, "Cores still busy");
+		grp = cpt_read_csr64(cpt->reg_base,
+				     CPTX_PF_EXEC_BUSY(0));
+		if (timeout--)
+			break;
+	}
+	/* Disable the cores */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_EXE_CTL(0), 0);
+}
+
+/**
+ * Ensure all cores are disenganed from all groups by
+ * calling cpt_disable_all_cores() before calling this
+ * function.
+ */
+static void cpt_unload_microcode(struct cpt_device *cpt)
+{
+	uint32_t grp = 0, core;
+
+	/* Free microcode bases and reset group masks */
+	for (grp = 0; grp < CPT_MAX_CORE_GROUPS; grp++) {
+		struct microcode *mcode = &cpt->mcode[grp];
+
+		if (cpt->mcode[grp].code)
+			dma_free_coherent(&cpt->pdev->dev, mcode->code_size,
+					  mcode->code, mcode->dma);
+		mcode->code = NULL;
+		mcode->base = NULL;
+	}
+	/* Clear UCODE_BASE registers for all engines */
+	for (core = 0; core < CPT_MAX_TOTAL_CORES; core++)
+		cpt_write_csr64(cpt->reg_base,
+				CPTX_PF_ENGX_UCODE_BASE(0, core), 0ull);
+}
+
+static int32_t cpt_device_init(struct cpt_device *cpt)
+{
+	uint16_t device_id;
+	uint8_t rev_id;
+	uint64_t bist;
+	struct device *dev = &cpt->pdev->dev;
+
+	/* Reset the PF when probed first */
+	cpt_reset(cpt);
+	mdelay((100));
+
+	pci_read_config_word(cpt->pdev, PCI_DEVICE_ID, &device_id);
+	pci_read_config_byte(cpt->pdev, PCI_REVISION_ID, &rev_id);
+	cpt->chip_id = (device_id << 8) | rev_id;
+	dev_dbg(dev, "CPT Chip ID: 0x%0x ", cpt->chip_id);
+
+	/*Check BIST status*/
+	bist = (uint64_t)cpt_check_bist_status(cpt);
+	if (bist) {
+		dev_err(dev, "RAM BIST failed with code 0x%llx", bist);
+		return -ENODEV;
+	}
+
+	bist = cpt_check_exe_bist_status(cpt);
+	if (bist) {
+		dev_err(dev, "Engine BIST failed with code 0x%llx", bist);
+	return -ENODEV;
+	}
+
+	/*Get CLK frequency*/
+	/*Get max enabled cores */
+	cpt_find_max_enabled_cores(cpt);
+	/*Disable all cores*/
+	cpt_disable_all_cores(cpt);
+	/*Reset device parameters*/
+	cpt->next_mc_idx   = 0;
+	cpt->next_group = 0;
+	cpt->avail_se_cores = cpt->max_se_cores;
+	cpt->avail_ae_cores = cpt->max_ae_cores;
+	/* PF is ready */
+	cpt->flags |= CPT_FLAG_DEVICE_READY;
+
+	return 0;
+}
+
+static int32_t cpt_register_interrupts(struct cpt_device *cpt)
+{
+	int32_t ret;
+	struct device *dev = &cpt->pdev->dev;
+
+	/* Enable MSI-X */
+	ret = cpt_enable_msix(cpt);
+	if (ret)
+		return ret;
+
+	/* Register mailbox interrupt handlers */
+	ret = request_irq(cpt->msix_entries[CPT_PF_INT_VEC_E_MBOXX(0)].vector,
+			  cpt_mbx0_intr_handler, 0, "CPT Mbox0", cpt);
+	if (ret)
+		goto fail;
+
+	cpt->irq_allocated[CPT_PF_INT_VEC_E_MBOXX(0)] = true;
+
+	/* Enable mailbox interrupt */
+	cpt_enable_mbox_interrupts(cpt);
+	return 0;
+
+fail:
+	dev_err(dev, "Request irq failed\n");
+	cpt_free_all_interrupts(cpt);
+	return ret;
+}
+
+static void cpt_unregister_interrupts(struct cpt_device *cpt)
+{
+	cpt_free_all_interrupts(cpt);
+	cpt_disable_msix(cpt);
+}
+
+static int32_t cpt_sriov_init(struct cpt_device *cpt, int32_t num_vfs)
+{
+	int32_t pos = 0;
+	int32_t err;
+	uint16_t total_vf_cnt;
+	struct pci_dev *pdev = cpt->pdev;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+	if (!pos) {
+		dev_err(&pdev->dev, "SRIOV capability is not found in PCIe config space\n");
+		return -ENODEV;
+	}
+
+	cpt->num_vf_en = num_vfs; /* User requested VFs */
+	pci_read_config_word(pdev, (pos + PCI_SRIOV_TOTAL_VF), &total_vf_cnt);
+	if (total_vf_cnt < cpt->num_vf_en)
+		cpt->num_vf_en = total_vf_cnt;
+
+	if (!total_vf_cnt)
+		return 0;
+
+	/*Enabled the available VFs */
+	err = pci_enable_sriov(pdev, cpt->num_vf_en);
+	if (err) {
+		dev_err(&pdev->dev, "SRIOV enable failed, num VF is %d\n",
+			cpt->num_vf_en);
+		cpt->num_vf_en = 0;
+		return err;
+	}
+
+	/* TODO: Optionally enable static VQ priorities feature */
+
+	dev_info(&pdev->dev, "SRIOV enabled, number of VF available %d\n",
+		 cpt->num_vf_en);
+
+	cpt->flags |= CPT_FLAG_SRIOV_ENABLED;
+
+	return 0;
+}
+
+static int32_t cpt_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct cpt_device *cpt;
+	int32_t    err;
+
+	cpt = devm_kzalloc(dev, sizeof(struct cpt_device), GFP_KERNEL);
+	if (!cpt)
+		return -ENOMEM;
+
+	pci_set_drvdata(pdev, cpt);
+	cpt->pdev = pdev;
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Failed to enable PCI device\n");
+		pci_set_drvdata(pdev, NULL);
+		return err;
+	}
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err) {
+		dev_err(dev, "PCI request regions failed 0x%x\n", err);
+		goto cpt_err_disable_device;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get usable DMA configuration\n");
+		goto cpt_err_release_regions;
+	}
+
+	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get 48-bit DMA for consistent allocations\n");
+		goto cpt_err_release_regions;
+	}
+
+	/* MAP PF's configuration registers */
+	cpt->reg_base = pcim_iomap(pdev, CPT_CSR_BAR, 0);
+	if (!cpt->reg_base) {
+		dev_err(dev, "Cannot map config register space, aborting\n");
+		err = -ENOMEM;
+		goto cpt_err_release_regions;
+	}
+
+	/* CPT device HW initialization */
+	cpt_device_init(cpt);
+
+	/* Register interrupts */
+	err = cpt_register_interrupts(cpt);
+	if (err)
+		goto cpt_err_release_regions;
+
+	err = cpt_ucode_load(cpt);
+	if (err)
+		goto cpt_err_unregister_interrupts;
+
+	/* Configure SRIOV */
+	err = cpt_sriov_init(cpt, num_vfs);
+	if (err)
+		goto cpt_err_unregister_interrupts;
+
+	/* Add device to global device list */
+	cpt_add_device(&cpt_dev_list, cpt);
+
+	return 0;
+
+cpt_err_unregister_interrupts:
+	cpt_unregister_interrupts(cpt);
+cpt_err_release_regions:
+	pci_release_regions(pdev);
+cpt_err_disable_device:
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+	return err;
+}
+
+static void cpt_remove(struct pci_dev *pdev)
+{
+	struct cpt_device *cpt = pci_get_drvdata(pdev);
+
+	/* Disengage SE and AE cores from all groups*/
+	cpt_disable_all_cores(cpt);
+	/* Unload microcodes */
+	cpt_unload_microcode(cpt);
+	cpt_unregister_interrupts(cpt);
+	pci_disable_sriov(pdev);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void cpt_shutdown(struct pci_dev *pdev)
+{
+	struct cpt_device *cpt = pci_get_drvdata(pdev);
+
+	if (!cpt)
+		return;
+
+	dev_info(&pdev->dev, "Shutdown device %x:%x.\n",
+		 (uint32_t)pdev->vendor, (uint32_t)pdev->device);
+
+	cpt_unregister_interrupts(cpt);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+	kzfree(cpt);
+}
+
+/* Supported devices */
+static const struct pci_device_id cpt_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, CPT_81XX_PCI_PF_DEVICE_ID) },
+	{ 0, }  /* end of table */
+};
+
+static struct pci_driver cpt_pci_driver = {
+	.name = DRV_NAME,
+	.id_table = cpt_id_table,
+	.probe = cpt_probe,
+	.remove = cpt_remove,
+	.shutdown = cpt_shutdown,
+};
+
+static int32_t __init cpt_init_module(void)
+{
+	int32_t ret = -1;
+
+	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
+
+	if (num_vfs > 16) {
+		pr_warn("Invalid vf count %d, Resetting it to 1(default)\n",
+			num_vfs);
+		num_vfs = 1;
+	}
+
+	cpt_init_device_list(&cpt_dev_list);
+	ret = pci_register_driver(&cpt_pci_driver);
+	if (ret)
+		pr_err("pci_register_driver() failed");
+
+	return ret;
+}
+
+static void __exit cpt_cleanup_module(void)
+{
+	pci_unregister_driver(&cpt_pci_driver);
+}
+
+module_init(cpt_init_module);
+module_exit(cpt_cleanup_module);
+
+MODULE_AUTHOR("George Cherian <george.cherian@cavium.com>, Murthy Nidadavolu");
+MODULE_DESCRIPTION("Cavium Thunder CPT Physical Function Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, cpt_id_table);
diff --git a/drivers/crypto/cavium/cpt/cpt_pf_mbox.c b/drivers/crypto/cavium/cpt/cpt_pf_mbox.c
new file mode 100644
index 0000000..7ed2d9c
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt_pf_mbox.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+
+#include "cpt.h"
+
+static void cpt_send_msg_to_vf(struct cpt_device *cpt, int vf,
+			       struct cpt_mbox *mbx)
+{
+	/* Writing mbox(0) causes interrupt */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_VFX_MBOXX(0, vf, 1),
+			mbx->data);
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_VFX_MBOXX(0, vf, 0), mbx->msg);
+}
+
+/* ACKs VF's mailbox message
+ * @vf: VF to which ACK to be sent
+ */
+static void cpt_mbox_send_ack(struct cpt_device *cpt, int vf,
+			      struct cpt_mbox *mbx)
+{
+	mbx->data = 0ull;
+	mbx->msg = CPT_MBOX_MSG_TYPE_ACK;
+	cpt_send_msg_to_vf(cpt, vf, mbx);
+}
+
+static void cpt_clear_mbox_intr(struct cpt_device *cpt, uint32_t vf)
+{
+	/* W1C for the VF */
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_MBOX_INTX(0, 0), (1 << vf));
+}
+
+/*
+ *  Configure QLEN/Chunk sizes for VF
+ */
+static void cpt_cfg_qlen_for_vf(struct cpt_device *cpt, int vf, uint32_t size)
+{
+	union cptx_pf_qx_ctl pf_qx_ctl;
+
+	pf_qx_ctl.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, vf));
+	pf_qx_ctl.s.size = size;
+	pf_qx_ctl.s.cont_err = true;
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, vf), pf_qx_ctl.u);
+}
+
+/*
+ * Configure VQ priority
+ */
+static void cpt_cfg_vq_priority(struct cpt_device *cpt, int vf, uint32_t pri)
+{
+	union cptx_pf_qx_ctl pf_qx_ctl;
+
+	pf_qx_ctl.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, vf));
+	pf_qx_ctl.s.pri = pri;
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, vf), pf_qx_ctl.u);
+}
+
+static uint8_t cpt_bind_vq_to_grp(struct cpt_device *cpt, uint8_t q,
+				  uint8_t grp)
+{
+	struct microcode *mcode = cpt->mcode;
+	union cptx_pf_qx_ctl pf_qx_ctl;
+	struct device *dev = &cpt->pdev->dev;
+
+	if (q >= CPT_MAX_VQ_NUM) {
+		dev_err(dev, "Queues are more than cores in the group");
+		return -EINVAL;
+	}
+	if (grp >= CPT_MAX_CORE_GROUPS) {
+		dev_err(dev, "Request group is more than possible groups");
+		return -EINVAL;
+	}
+	if (grp >= cpt->next_mc_idx) {
+		dev_err(dev, "Request group is higher than available functional groups");
+		return -EINVAL;
+	}
+	pf_qx_ctl.u = cpt_read_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, q));
+	pf_qx_ctl.s.grp = mcode[grp].group;
+	cpt_write_csr64(cpt->reg_base, CPTX_PF_QX_CTL(0, q), pf_qx_ctl.u);
+	dev_dbg(dev, "VF %d TYPE %s", q, (mcode[grp].is_ae ? "AE" : "SE"));
+
+	return mcode[grp].is_ae ? AE_TYPES : SE_TYPES;
+}
+
+/* Interrupt handler to handle mailbox messages from VFs */
+static void cpt_handle_mbox_intr(struct cpt_device *cpt, int vf)
+{
+	struct cpt_vf_info *vfx = &cpt->vfinfo[vf];
+	struct cpt_mbox mbx = {};
+	union cpt_chipid_vfid chipid_vfid;
+	uint8_t vftype;
+	struct device *dev = &cpt->pdev->dev;
+	/* Take mbox lock */
+	cpt->mbx_lock[vf] = true;
+	/*
+	 * MBOX[0] contains msg
+	 * MBOX[1] contains data
+	 */
+	mbx.msg  = cpt_read_csr64(cpt->reg_base, CPTX_PF_VFX_MBOXX(0, vf, 0));
+	mbx.data = cpt_read_csr64(cpt->reg_base, CPTX_PF_VFX_MBOXX(0, vf, 1));
+	dev_dbg(dev, "%s: Mailbox msg 0x%llx from VF%d", __func__, mbx.msg, vf);
+	switch (mbx.msg) {
+	case CPT_MSG_VF_UP:
+		vfx->state = VF_STATE_UP;
+		try_module_get(THIS_MODULE);
+		cpt_mbox_send_ack(cpt, vf, &mbx);
+		break;
+	case CPT_MSG_READY:
+		chipid_vfid.u16 = 0;
+		chipid_vfid.s.chip_id = cpt->chip_id;
+		chipid_vfid.s.vfid = vf;
+		mbx.msg  = CPT_MSG_READY;
+		mbx.data = chipid_vfid.u16;
+		cpt_send_msg_to_vf(cpt, vf, &mbx);
+		break;
+	case CPT_MSG_VF_DOWN:
+		/* First msg in VF teardown sequence */
+		vfx->state = VF_STATE_DOWN;
+		module_put(THIS_MODULE);
+		cpt_mbox_send_ack(cpt, vf, &mbx);
+		break;
+	case CPT_MSG_QLEN:
+		vfx->qlen = mbx.data;
+		cpt_cfg_qlen_for_vf(cpt, vf, vfx->qlen);
+		cpt_mbox_send_ack(cpt, vf, &mbx);
+		break;
+	case CPT_MSG_QBIND_GRP:
+		vftype = cpt_bind_vq_to_grp(cpt, vf, (uint8_t)mbx.data);
+		if ((vftype != AE_TYPES) && (vftype != SE_TYPES))
+			dev_err(dev, "Queue %d binding to group %llu failed",
+				vf, mbx.data);
+		else {
+			dev_dbg(dev, "Queue %d binding to group %llu successful",
+				vf, mbx.data);
+			mbx.msg = CPT_MSG_QBIND_GRP;
+			mbx.data = vftype;
+			cpt_send_msg_to_vf(cpt, vf, &mbx);
+		}
+		break;
+	case CPT_MSG_VQ_PRIORITY:
+		vfx->priority = mbx.data;
+		cpt_cfg_vq_priority(cpt, vf, vfx->priority);
+		cpt_mbox_send_ack(cpt, vf, &mbx);
+		break;
+	default:
+		dev_err(&cpt->pdev->dev, "Invalid msg from VF%d, msg 0x%llx\n",
+			vf, mbx.msg);
+		break;
+	}
+	/* Unlock mailbox */
+	cpt->mbx_lock[vf] = false;
+}
+
+void cpt_mbox_intr_handler (struct cpt_device *cpt, int mbx)
+{
+	uint64_t intr;
+	uint8_t  vf;
+
+	intr = cpt_read_csr64(cpt->reg_base, CPTX_PF_MBOX_INTX(0, 0));
+	dev_dbg(&cpt->pdev->dev, "PF interrupt Mbox%d 0x%llx\n", mbx, intr);
+	for (vf = 0; vf < CPT_MAX_VF_NUM; vf++) {
+		if (intr & (1ULL << vf)) {
+			dev_dbg(&cpt->pdev->dev, "Intr from VF %d\n", vf);
+			cpt_handle_mbox_intr(cpt, vf);
+			cpt_clear_mbox_intr(cpt, vf);
+		}
+	}
+}
-- 
2.1.4

^ permalink raw reply related

* Re: [patch] s390/crypto: unlock on error in prng_tdes_read()
From: Martin Schwidefsky @ 2016-11-18 12:12 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Herbert Xu, Harald Freudenberger, David S. Miller, Heiko Carstens,
	linux-crypto, linux-s390, kernel-janitors
In-Reply-To: <20161118105451.GA26523@mwanda>

On Fri, 18 Nov 2016 14:11:00 +0300
Dan Carpenter <dan.carpenter@oracle.com> wrote:

> We added some new locking but forgot to unlock on error.
> 
> Fixes: 57127645d79d ("s390/zcrypt: Introduce new SHA-512 based Pseudo Random Generator.")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> diff --git a/arch/s390/crypto/prng.c b/arch/s390/crypto/prng.c
> index 9cc050f..1113389 100644
> --- a/arch/s390/crypto/prng.c
> +++ b/arch/s390/crypto/prng.c
> @@ -507,8 +507,10 @@ static ssize_t prng_tdes_read(struct file *file, char __user *ubuf,
>  		prng_data->prngws.byte_counter += n;
>  		prng_data->prngws.reseed_counter += n;
> 
> -		if (copy_to_user(ubuf, prng_data->buf, chunk))
> -			return -EFAULT;
> +		if (copy_to_user(ubuf, prng_data->buf, chunk)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> 
>  		nbytes -= chunk;
>  		ret += chunk;
> 

Nice spotting, I will add this to my fixes tree. Thank you..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply

* [PATCH] crypto: CTR DRBG - advance output buffer pointer
From: Stephan Mueller @ 2016-11-18 11:27 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto

The CTR DRBG segments the number of random bytes to be generated into
128 byte blocks. The current code misses the advancement of the output
buffer pointer when the requestor asks for more than 128 bytes of data.
In this case, the next 128 byte block of random numbers is copied to
the beginning of the output buffer again. This implies that only the
first 128 bytes of the output buffer would ever be filled.

The patch adds the advancement of the buffer pointer to fill the entire
buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/drbg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/drbg.c b/crypto/drbg.c
index fb33f7d..9a95b61 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -1766,6 +1766,7 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 		init_completion(&drbg->ctr_completion);

 		outlen -= cryptlen;
+		outbuf += cryptlen;
 	}

 	return 0;
-- 
2.7.4

^ permalink raw reply related

* bug in blkcipher_walk code
From: Stephan Mueller @ 2016-11-18 11:31 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto

Hi Herbert,

Once in a while I seem to trigger a bug in the blkcipher_walk code which I 
cannot track down. This bug happens sporadically where I assume that it has 
something to do with the memory management in the slow path of blkcipher_walk.

I am using the CTR DRBG code that in turn uses the ctr-aes-aesni 
implementation. The bug only appears when I want to obtain a random number 
that is less than the CTR AES block size. In my particular case, I want 4 
bytes from the DRBG.

The bug happens in arch/x86/crypto/aesni-intel_glue.c:ctr_crypt_final() at the 
line:

	memcpy(dst, keystream, nbytes);

The bug looks like the following:

[   12.328676] BUG: unable to handle kernel paging request at ffffa17ae418b988
[   12.328680] IP: [<ffffffff82060eea>] ctr_crypt+0x19a/0x1c0
[   12.328681] PGD 66fed067
[   12.328681] PUD 0
[   12.328681]
[   12.328683] Oops: 0002 [#1] SMP
[   12.328692] Modules linked in: bridge(+) stp llc ebtable_nat ip6table_raw 
ip6table_security ip6table_mangle iptable_raw iptable_security iptable_mangle 
ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel pcspkr i2c_piix4 virtio_net virtio_balloon 
acpi_cpufreq sch_fq_codel virtio_console virtio_blk virtio_pci virtio_ring 
serio_raw crc32c_intel virtio
[   12.328693] CPU: 0 PID: 521 Comm: modprobe Not tainted 4.9.0-rc1+ #253
[   12.328694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.1-1.fc24 04/01/2014
[   12.328694] task: ffffa17ab8453fc0 task.stack: ffffbdafc0744000
[   12.328696] RIP: 0010:[<ffffffff82060eea>]  [<ffffffff82060eea>] ctr_crypt
+0x19a/0x1c0
[   12.328696] RSP: 0018:ffffbdafc0747a60  EFLAGS: 00010002
[   12.328697] RAX: 0000000032e455a6 RBX: 0000000000000004 RCX: 
0000000000000002
[   12.328697] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 
0000000000000086
[   12.328698] RBP: ffffbdafc0747b28 R08: ffffa17abc16e900 R09: 
0000000000000019
[   12.328698] R10: ffffa17a764f68b0 R11: 000000000002e918 R12: 
ffffbdafc0747b38
[   12.328698] R13: ffffa17a764f6840 R14: ffffa17ae418b988 R15: 
ffffbdafc0747a70
[   12.328699] FS:  00007f55f57a6700(0000) GS:ffffa17abfc00000(0000) knlGS:
0000000000000000
[   12.328700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.328700] CR2: ffffa17ae418b988 CR3: 0000000079b26000 CR4: 
00000000003406f0
[   12.328703] Stack:
[   12.328705]  ffffa17abc16e900 ffffa17ab845fd80 2ae7e40732e455a6 
3a224612a8f9841d
[   12.328706]  fffffb4e81e117c0 ffffa17ab845fd80 fffffb4e829062c0 
ffffa17ae418b988
[   12.328707]  ffffbdafc0747ba8 ffffffff00000d80 ffffffff00000004 
ffffbdafc0747bc8
[   12.328708] Call Trace:
[   12.328712]  [<ffffffff823e5fd3>] __ablk_encrypt+0x43/0x50
[   12.328714]  [<ffffffff823e6012>] ablk_encrypt+0x32/0xc0
[   12.328716]  [<ffffffff823c4f2e>] skcipher_encrypt_ablkcipher+0x5e/0x60
[   12.328717]  [<ffffffff823dbb80>] drbg_kcapi_sym_ctr+0xb0/0x130
[   12.328719]  [<ffffffff823de153>] drbg_ctr_generate+0x53/0x80

Now, the interesting part is the following: the original memory pointer that 
shall be processed by the DRBG is in my example ffffffffc018b988 -- this 
pointer is used until the DRBG invokes crypto_skcipher_encrypt. However, when 
I print out the buffer pointer that is used as dst in the memcpy of 
ctr_crypt_final, I see ffffa17ae418b988 -- i.e. the buffer that causes paging 
failure.

During tracing the blkcipher_walk code I see that the slow code path is used 
when the request size is smaller than the block size. That slow code path 
allocates new memory that will be used for the dst pointer in ctr_crypt_final.

May I ask you for checking whether the allocation and the memory pointer logic 
has an issue that would cause a paging failure?

Ciao
Stephan

^ permalink raw reply

* [patch] s390/crypto: unlock on error in prng_tdes_read()
From: Dan Carpenter @ 2016-11-18 11:11 UTC (permalink / raw)
  To: Herbert Xu, Harald Freudenberger
  Cc: David S. Miller, Martin Schwidefsky, Heiko Carstens, linux-crypto,
	linux-s390, kernel-janitors

We added some new locking but forgot to unlock on error.

Fixes: 57127645d79d ("s390/zcrypt: Introduce new SHA-512 based Pseudo Random Generator.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/arch/s390/crypto/prng.c b/arch/s390/crypto/prng.c
index 9cc050f..1113389 100644
--- a/arch/s390/crypto/prng.c
+++ b/arch/s390/crypto/prng.c
@@ -507,8 +507,10 @@ static ssize_t prng_tdes_read(struct file *file, char __user *ubuf,
 		prng_data->prngws.byte_counter += n;
 		prng_data->prngws.reseed_counter += n;
 
-		if (copy_to_user(ubuf, prng_data->buf, chunk))
-			return -EFAULT;
+		if (copy_to_user(ubuf, prng_data->buf, chunk)) {
+			ret = -EFAULT;
+			break;
+		}
 
 		nbytes -= chunk;
 		ret += chunk;

^ permalink raw reply related

* [PATCH net-next] cxgb4: Allocate Tx queues dynamically
From: Atul Gupta @ 2016-11-18 11:07 UTC (permalink / raw)
  To: netdev, linux-scsi, target-devel, linux-rdma, linux-crypto
  Cc: davem, nab, jejb, martin.petersen, dledford, herbert, leedom,
	nirranjan, varun, swise, hariprasad, Atul Gupta

From: Hariprasad Shenai <hariprasad@chelsio.com>

Allocate resources dynamically for Upper layer driver's (ULD) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues which are allocated when ULD register with cxgb4 driver and freed
while un-registering. The Tx queues which are shared by ULD shall be
allocated by first registering driver and un-allocated by last
unregistering driver.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
---
 drivers/crypto/chelsio/chcr_algo.c                 |  16 +--
 drivers/crypto/chelsio/chcr_core.c                 |   3 +-
 drivers/infiniband/hw/cxgb4/device.c               |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  19 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  12 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  64 +++++++----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c     | 114 +++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |  17 +++
 drivers/net/ethernet/chelsio/cxgb4/sge.c           | 121 +++++++++++++++------
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |   1 +
 drivers/target/iscsi/cxgbit/cxgbit_main.c          |   1 +
 11 files changed, 287 insertions(+), 82 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
index e4ddb921d7b3..56b153805462 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -592,16 +592,18 @@ static int chcr_aes_cbc_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
 
 static int cxgb4_is_crypto_q_full(struct net_device *dev, unsigned int idx)
 {
-	int ret = 0;
-	struct sge_ofld_txq *q;
 	struct adapter *adap = netdev2adap(dev);
+	struct sge_uld_txq_info *txq_info =
+		adap->sge.uld_txq_info[CXGB4_TX_CRYPTO];
+	struct sge_uld_txq *txq;
+	int ret = 0;
 
 	local_bh_disable();
-	q = &adap->sge.ofldtxq[idx];
-	spin_lock(&q->sendq.lock);
-	if (q->full)
+	txq = &txq_info->uldtxq[idx];
+	spin_lock(&txq->sendq.lock);
+	if (txq->full)
 		ret = -1;
-	spin_unlock(&q->sendq.lock);
+	spin_unlock(&txq->sendq.lock);
 	local_bh_enable();
 	return ret;
 }
@@ -674,11 +676,11 @@ static int chcr_device_init(struct chcr_context *ctx)
 		}
 		u_ctx = ULD_CTX(ctx);
 		rxq_perchan = u_ctx->lldi.nrxq / u_ctx->lldi.nchan;
-		ctx->dev->tx_channel_id = 0;
 		rxq_idx = ctx->dev->tx_channel_id * rxq_perchan;
 		rxq_idx += id % rxq_perchan;
 		spin_lock(&ctx->dev->lock_chcr_dev);
 		ctx->tx_channel_id = rxq_idx;
+		ctx->dev->tx_channel_id = !ctx->dev->tx_channel_id;
 		spin_unlock(&ctx->dev->lock_chcr_dev);
 	}
 out:
diff --git a/drivers/crypto/chelsio/chcr_core.c b/drivers/crypto/chelsio/chcr_core.c
index fb5f9bbfa09c..4d7f6700fd7e 100644
--- a/drivers/crypto/chelsio/chcr_core.c
+++ b/drivers/crypto/chelsio/chcr_core.c
@@ -42,6 +42,7 @@ static chcr_handler_func work_handlers[NUM_CPL_CMDS] = {
 static struct cxgb4_uld_info chcr_uld_info = {
 	.name = DRV_MODULE_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 1024,
 	.add = chcr_uld_add,
 	.state_change = chcr_uld_state_change,
@@ -126,7 +127,7 @@ static int cpl_fw6_pld_handler(struct chcr_dev *dev,
 
 int chcr_send_wr(struct sk_buff *skb)
 {
-	return cxgb4_ofld_send(skb->dev, skb);
+	return cxgb4_crypto_send(skb->dev, skb);
 }
 
 static void *chcr_uld_add(const struct cxgb4_lld_info *lld)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 93e3d270a98a..4e5baf4fe15e 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -1481,6 +1481,7 @@ static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
 static struct cxgb4_uld_info c4iw_uld_info = {
 	.name = DRV_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 511,
 	.ciq = true,
 	.lro = false,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 2125903043fb..0bce1bf9ca0f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -635,6 +635,7 @@ struct tx_sw_desc;
 
 struct sge_txq {
 	unsigned int  in_use;       /* # of in-use Tx descriptors */
+	unsigned int  q_type;	    /* Q type Eth/Ctrl/Ofld */
 	unsigned int  size;         /* # of descriptors */
 	unsigned int  cidx;         /* SW consumer index */
 	unsigned int  pidx;         /* producer index */
@@ -665,7 +666,7 @@ struct sge_eth_txq {                /* state for an SGE Ethernet Tx queue */
 	unsigned long mapping_err;  /* # of I/O MMU packet mapping errors */
 } ____cacheline_aligned_in_smp;
 
-struct sge_ofld_txq {               /* state for an SGE offload Tx queue */
+struct sge_uld_txq {               /* state for an SGE offload Tx queue */
 	struct sge_txq q;
 	struct adapter *adap;
 	struct sk_buff_head sendq;  /* list of backpressured packets */
@@ -693,14 +694,20 @@ struct sge_uld_rxq_info {
 	u8 uld;			/* uld type */
 };
 
+struct sge_uld_txq_info {
+	struct sge_uld_txq *uldtxq; /* Txq's for ULD */
+	atomic_t users;		/* num users */
+	u16 ntxq;		/* # of egress uld queues */
+};
+
 struct sge {
 	struct sge_eth_txq ethtxq[MAX_ETH_QSETS];
-	struct sge_ofld_txq ofldtxq[MAX_OFLD_QSETS];
 	struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
 	struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
 	struct sge_rspq fw_evtq ____cacheline_aligned_in_smp;
 	struct sge_uld_rxq_info **uld_rxq_info;
+	struct sge_uld_txq_info **uld_txq_info;
 
 	struct sge_rspq intrq ____cacheline_aligned_in_smp;
 	spinlock_t intrq_lock;
@@ -1298,8 +1305,9 @@ int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 			  unsigned int cmplqid);
 int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
 			unsigned int cmplqid);
-int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
-			  struct net_device *dev, unsigned int iqid);
+int t4_sge_alloc_uld_txq(struct adapter *adap, struct sge_uld_txq *txq,
+			 struct net_device *dev, unsigned int iqid,
+			 unsigned int uld_type);
 irqreturn_t t4_sge_intr_msix(int irq, void *cookie);
 int t4_sge_init(struct adapter *adap);
 void t4_sge_start(struct adapter *adap);
@@ -1661,4 +1669,7 @@ int t4_uld_mem_alloc(struct adapter *adap);
 void t4_uld_clean_up(struct adapter *adap);
 void t4_register_netevent_notifier(void);
 void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq, struct sge_fl *fl);
+void free_tx_desc(struct adapter *adap, struct sge_txq *q,
+		  unsigned int n, bool unmap);
+void free_txq(struct adapter *adap, struct sge_txq *q);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 20455d082cb8..acc231293e4d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2512,18 +2512,6 @@ do { \
 		RL("FLLow:", fl.low);
 		RL("FLStarving:", fl.starving);
 
-	} else if (ofld_idx < ofld_entries) {
-		const struct sge_ofld_txq *tx =
-			&adap->sge.ofldtxq[ofld_idx * 4];
-		int n = min(4, adap->sge.ofldqsets - 4 * ofld_idx);
-
-		S("QType:", "OFLD-Txq");
-		T("TxQ ID:", q.cntxt_id);
-		T("TxQ size:", q.size);
-		T("TxQ inuse:", q.in_use);
-		T("TxQ CIDX:", q.cidx);
-		T("TxQ PIDX:", q.pidx);
-
 	} else if (ctrl_idx < ctrl_entries) {
 		const struct sge_ctrl_txq *tx = &adap->sge.ctrlq[ctrl_idx * 4];
 		int n = min(4, adap->params.nports - 4 * ctrl_idx);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c0cc2ee77be7..449884f8dd67 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -530,15 +530,15 @@ static int fwevtq_handler(struct sge_rspq *q, const __be64 *rsp,
 
 		txq = q->adap->sge.egr_map[qid - q->adap->sge.egr_start];
 		txq->restarts++;
-		if ((u8 *)txq < (u8 *)q->adap->sge.ofldtxq) {
+		if (txq->q_type == CXGB4_TXQ_ETH) {
 			struct sge_eth_txq *eq;
 
 			eq = container_of(txq, struct sge_eth_txq, q);
 			netif_tx_wake_queue(eq->txq);
 		} else {
-			struct sge_ofld_txq *oq;
+			struct sge_uld_txq *oq;
 
-			oq = container_of(txq, struct sge_ofld_txq, q);
+			oq = container_of(txq, struct sge_uld_txq, q);
 			tasklet_schedule(&oq->qresume_tsk);
 		}
 	} else if (opcode == CPL_FW6_MSG || opcode == CPL_FW4_MSG) {
@@ -885,15 +885,6 @@ static int setup_sge_queues(struct adapter *adap)
 		}
 	}
 
-	j = s->ofldqsets / adap->params.nports; /* iscsi queues per channel */
-	for_each_ofldtxq(s, i) {
-		err = t4_sge_alloc_ofld_txq(adap, &s->ofldtxq[i],
-					    adap->port[i / j],
-					    s->fw_evtq.cntxt_id);
-		if (err)
-			goto freeout;
-	}
-
 	for_each_port(adap, i) {
 		/* Note that cmplqid below is 0 if we don't
 		 * have RDMA queues, and that's the right value.
@@ -1922,8 +1913,18 @@ static void disable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		disable_txq_db(&adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		disable_txq_db(&adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				disable_txq_db(&txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		disable_txq_db(&adap->sge.ctrlq[i].q);
 }
@@ -1934,8 +1935,18 @@ static void enable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		enable_txq_db(adap, &adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		enable_txq_db(adap, &adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				enable_txq_db(adap, &txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		enable_txq_db(adap, &adap->sge.ctrlq[i].q);
 }
@@ -2006,8 +2017,17 @@ static void recover_all_queues(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		sync_txq_pidx(adap, &adap->sge.ethtxq[i].q);
-	for_each_ofldtxq(&adap->sge, i)
-		sync_txq_pidx(adap, &adap->sge.ofldtxq[i].q);
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info =
+			adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+		if (txq_info) {
+			for_each_ofldtxq(&adap->sge, i) {
+				struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+				sync_txq_pidx(adap, &txq->q);
+			}
+		}
+	}
 	for_each_port(adap, i)
 		sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
 }
@@ -3991,7 +4011,7 @@ static inline bool is_x_10g_port(const struct link_config *lc)
 static void cfg_queues(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
-	int i, n10g = 0, qidx = 0;
+	int i = 0, n10g = 0, qidx = 0;
 #ifndef CONFIG_CHELSIO_T4_DCB
 	int q10g = 0;
 #endif
@@ -4006,8 +4026,7 @@ static void cfg_queues(struct adapter *adap)
 		adap->params.crypto = 0;
 	}
 
-	for_each_port(adap, i)
-		n10g += is_x_10g_port(&adap2pinfo(adap, i)->link_cfg);
+	n10g += is_x_10g_port(&adap2pinfo(adap, i)->link_cfg);
 #ifdef CONFIG_CHELSIO_T4_DCB
 	/* For Data Center Bridging support we need to be able to support up
 	 * to 8 Traffic Priorities; each of which will be assigned to its
@@ -4075,9 +4094,6 @@ static void cfg_queues(struct adapter *adap)
 	for (i = 0; i < ARRAY_SIZE(s->ctrlq); i++)
 		s->ctrlq[i].q.size = 512;
 
-	for (i = 0; i < ARRAY_SIZE(s->ofldtxq); i++)
-		s->ofldtxq[i].q.size = 1024;
-
 	init_rspq(adap, &s->fw_evtq, 0, 1, 1024, 64);
 	init_rspq(adap, &s->intrq, 0, 1, 512, 64);
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 2471ff465d5c..565a6c6bfeaf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -447,6 +447,106 @@ static void quiesce_rx_uld(struct adapter *adap, unsigned int uld_type)
 		quiesce_rx(adap, &rxq_info->uldrxq[idx].rspq);
 }
 
+static void
+free_sge_txq_uld(struct adapter *adap, struct sge_uld_txq_info *txq_info)
+{
+	int nq = txq_info->ntxq;
+	int i;
+
+	for (i = 0; i < nq; i++) {
+		struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+		if (txq && txq->q.desc) {
+			tasklet_kill(&txq->qresume_tsk);
+			t4_ofld_eq_free(adap, adap->mbox, adap->pf, 0,
+					txq->q.cntxt_id);
+			free_tx_desc(adap, &txq->q, txq->q.in_use, false);
+			kfree(txq->q.sdesc);
+			__skb_queue_purge(&txq->sendq);
+			free_txq(adap, &txq->q);
+		}
+	}
+}
+
+static int
+alloc_sge_txq_uld(struct adapter *adap, struct sge_uld_txq_info *txq_info,
+		  unsigned int uld_type)
+{
+	struct sge *s = &adap->sge;
+	int nq = txq_info->ntxq;
+	int i, j, err;
+
+	j = nq / adap->params.nports;
+	for (i = 0; i < nq; i++) {
+		struct sge_uld_txq *txq = &txq_info->uldtxq[i];
+
+		txq->q.size = 1024;
+		err = t4_sge_alloc_uld_txq(adap, txq, adap->port[i / j],
+					   s->fw_evtq.cntxt_id, uld_type);
+		if (err)
+			goto freeout;
+	}
+	return 0;
+freeout:
+	free_sge_txq_uld(adap, txq_info);
+	return err;
+}
+
+static void
+release_sge_txq_uld(struct adapter *adap, unsigned int uld_type)
+{
+	struct sge_uld_txq_info *txq_info = NULL;
+	int tx_uld_type = TX_ULD(uld_type);
+
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+
+	if (txq_info && atomic_dec_and_test(&txq_info->users)) {
+		free_sge_txq_uld(adap, txq_info);
+		kfree(txq_info->uldtxq);
+		kfree(txq_info);
+		adap->sge.uld_txq_info[tx_uld_type] = NULL;
+	}
+}
+
+static int
+setup_sge_txq_uld(struct adapter *adap, unsigned int uld_type,
+		  const struct cxgb4_uld_info *uld_info)
+{
+	struct sge_uld_txq_info *txq_info = NULL;
+	int tx_uld_type, i;
+
+	tx_uld_type = TX_ULD(uld_type);
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+
+	if ((tx_uld_type == CXGB4_TX_OFLD) && txq_info &&
+	    (atomic_inc_return(&txq_info->users) > 1))
+		return 0;
+
+	txq_info = kzalloc(sizeof(*txq_info), GFP_KERNEL);
+	if (!txq_info)
+		return -ENOMEM;
+
+	i = min_t(int, uld_info->ntxq, num_online_cpus());
+	txq_info->ntxq = roundup(i, adap->params.nports);
+
+	txq_info->uldtxq = kcalloc(txq_info->ntxq, sizeof(struct sge_uld_txq),
+				   GFP_KERNEL);
+	if (!txq_info->uldtxq) {
+		kfree(txq_info->uldtxq);
+		return -ENOMEM;
+	}
+
+	if (alloc_sge_txq_uld(adap, txq_info, tx_uld_type)) {
+		kfree(txq_info->uldtxq);
+		kfree(txq_info);
+		return -ENOMEM;
+	}
+
+	atomic_inc(&txq_info->users);
+	adap->sge.uld_txq_info[tx_uld_type] = txq_info;
+	return 0;
+}
+
 static void uld_queue_init(struct adapter *adap, unsigned int uld_type,
 			   struct cxgb4_lld_info *lli)
 {
@@ -472,7 +572,15 @@ int t4_uld_mem_alloc(struct adapter *adap)
 	if (!s->uld_rxq_info)
 		goto err_uld;
 
+	s->uld_txq_info = kzalloc(CXGB4_TX_MAX *
+				  sizeof(struct sge_uld_txq_info *),
+				  GFP_KERNEL);
+	if (!s->uld_txq_info)
+		goto err_uld_rx;
 	return 0;
+
+err_uld_rx:
+	kfree(s->uld_rxq_info);
 err_uld:
 	kfree(adap->uld);
 	return -ENOMEM;
@@ -482,6 +590,7 @@ void t4_uld_mem_free(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
 
+	kfree(s->uld_txq_info);
 	kfree(s->uld_rxq_info);
 	kfree(adap->uld);
 }
@@ -616,6 +725,9 @@ int cxgb4_register_uld(enum cxgb4_uld type,
 			ret = -EBUSY;
 			goto free_irq;
 		}
+		ret = setup_sge_txq_uld(adap, type, p);
+		if (ret)
+			goto free_irq;
 		adap->uld[type] = *p;
 		uld_attach(adap, type);
 		adap_idx++;
@@ -644,6 +756,7 @@ int cxgb4_register_uld(enum cxgb4_uld type,
 			break;
 		adap->uld[type].handle = NULL;
 		adap->uld[type].add = NULL;
+		release_sge_txq_uld(adap, type);
 		if (adap->flags & FULL_INIT_DONE)
 			quiesce_rx_uld(adap, type);
 		if (adap->flags & USING_MSIX)
@@ -679,6 +792,7 @@ int cxgb4_unregister_uld(enum cxgb4_uld type)
 			continue;
 		adap->uld[type].handle = NULL;
 		adap->uld[type].add = NULL;
+		release_sge_txq_uld(adap, type);
 		if (adap->flags & FULL_INIT_DONE)
 			quiesce_rx_uld(adap, type);
 		if (adap->flags & USING_MSIX)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 2996793b1aaa..4c856605fdfa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -77,6 +77,8 @@ enum {
 
 /* Special asynchronous notification message */
 #define CXGB4_MSG_AN ((void *)1)
+#define TX_ULD(uld)(((uld) != CXGB4_ULD_CRYPTO) ? CXGB4_TX_OFLD :\
+		      CXGB4_TX_CRYPTO)
 
 struct serv_entry {
 	void *data;
@@ -223,6 +225,19 @@ enum cxgb4_uld {
 	CXGB4_ULD_MAX
 };
 
+enum cxgb4_tx_uld {
+	CXGB4_TX_OFLD,
+	CXGB4_TX_CRYPTO,
+	CXGB4_TX_MAX
+};
+
+enum cxgb4_txq_type {
+	CXGB4_TXQ_ETH,
+	CXGB4_TXQ_ULD,
+	CXGB4_TXQ_CTRL,
+	CXGB4_TXQ_MAX
+};
+
 enum cxgb4_state {
 	CXGB4_STATE_UP,
 	CXGB4_STATE_START_RECOVERY,
@@ -316,6 +331,7 @@ struct cxgb4_uld_info {
 	void *handle;
 	unsigned int nrxq;
 	unsigned int rxq_size;
+	unsigned int ntxq;
 	bool ciq;
 	bool lro;
 	void *(*add)(const struct cxgb4_lld_info *p);
@@ -333,6 +349,7 @@ struct cxgb4_uld_info {
 int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
 int cxgb4_unregister_uld(enum cxgb4_uld type);
 int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
+int cxgb4_crypto_send(struct net_device *dev, struct sk_buff *skb);
 unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo);
 unsigned int cxgb4_port_chan(const struct net_device *dev);
 unsigned int cxgb4_port_viid(const struct net_device *dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1e74fd6085df..b7d0753b9242 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -377,8 +377,8 @@ unmap:			dma_unmap_page(dev, be64_to_cpu(p->addr[0]),
  *	Reclaims Tx descriptors from an SGE Tx queue and frees the associated
  *	Tx buffers.  Called with the Tx queue lock held.
  */
-static void free_tx_desc(struct adapter *adap, struct sge_txq *q,
-			 unsigned int n, bool unmap)
+void free_tx_desc(struct adapter *adap, struct sge_txq *q,
+		  unsigned int n, bool unmap)
 {
 	struct tx_sw_desc *d;
 	unsigned int cidx = q->cidx;
@@ -1543,7 +1543,7 @@ static inline unsigned int calc_tx_flits_ofld(const struct sk_buff *skb)
  *	inability to map packets.  A periodic timer attempts to restart
  *	queues so marked.
  */
-static void txq_stop_maperr(struct sge_ofld_txq *q)
+static void txq_stop_maperr(struct sge_uld_txq *q)
 {
 	q->mapping_err++;
 	q->q.stops++;
@@ -1559,7 +1559,7 @@ static void txq_stop_maperr(struct sge_ofld_txq *q)
  *	Stops an offload Tx queue that has become full and modifies the packet
  *	being written to request a wakeup.
  */
-static void ofldtxq_stop(struct sge_ofld_txq *q, struct sk_buff *skb)
+static void ofldtxq_stop(struct sge_uld_txq *q, struct sk_buff *skb)
 {
 	struct fw_wr_hdr *wr = (struct fw_wr_hdr *)skb->data;
 
@@ -1586,7 +1586,7 @@ static void ofldtxq_stop(struct sge_ofld_txq *q, struct sk_buff *skb)
  *	boolean "service_ofldq_running" to make sure that only one instance
  *	is ever running at a time ...
  */
-static void service_ofldq(struct sge_ofld_txq *q)
+static void service_ofldq(struct sge_uld_txq *q)
 {
 	u64 *pos, *before, *end;
 	int credits;
@@ -1706,7 +1706,7 @@ static void service_ofldq(struct sge_ofld_txq *q)
  *
  *	Send an offload packet through an SGE offload queue.
  */
-static int ofld_xmit(struct sge_ofld_txq *q, struct sk_buff *skb)
+static int ofld_xmit(struct sge_uld_txq *q, struct sk_buff *skb)
 {
 	skb->priority = calc_tx_flits_ofld(skb);       /* save for restart */
 	spin_lock(&q->sendq.lock);
@@ -1735,7 +1735,7 @@ static int ofld_xmit(struct sge_ofld_txq *q, struct sk_buff *skb)
  */
 static void restart_ofldq(unsigned long data)
 {
-	struct sge_ofld_txq *q = (struct sge_ofld_txq *)data;
+	struct sge_uld_txq *q = (struct sge_uld_txq *)data;
 
 	spin_lock(&q->sendq.lock);
 	q->full = 0;            /* the queue actually is completely empty now */
@@ -1767,17 +1767,23 @@ static inline unsigned int is_ctrl_pkt(const struct sk_buff *skb)
 	return skb->queue_mapping & 1;
 }
 
-static inline int ofld_send(struct adapter *adap, struct sk_buff *skb)
+static inline int uld_send(struct adapter *adap, struct sk_buff *skb,
+			   unsigned int tx_uld_type)
 {
+	struct sge_uld_txq_info *txq_info;
+	struct sge_uld_txq *txq;
 	unsigned int idx = skb_txq(skb);
 
+	txq_info = adap->sge.uld_txq_info[tx_uld_type];
+	txq = &txq_info->uldtxq[idx];
+
 	if (unlikely(is_ctrl_pkt(skb))) {
 		/* Single ctrl queue is a requirement for LE workaround path */
 		if (adap->tids.nsftids)
 			idx = 0;
 		return ctrl_xmit(&adap->sge.ctrlq[idx], skb);
 	}
-	return ofld_xmit(&adap->sge.ofldtxq[idx], skb);
+	return ofld_xmit(txq, skb);
 }
 
 /**
@@ -1794,7 +1800,7 @@ int t4_ofld_send(struct adapter *adap, struct sk_buff *skb)
 	int ret;
 
 	local_bh_disable();
-	ret = ofld_send(adap, skb);
+	ret = uld_send(adap, skb, CXGB4_TX_OFLD);
 	local_bh_enable();
 	return ret;
 }
@@ -1813,6 +1819,39 @@ int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(cxgb4_ofld_send);
 
+/**
+ *	t4_crypto_send - send crypto packet
+ *	@adap: the adapter
+ *	@skb: the packet
+ *
+ *	Sends crypto packet.  We use the packet queue_mapping to select the
+ *	appropriate Tx queue as follows: bit 0 indicates whether the packet
+ *	should be sent as regular or control, bits 1-15 select the queue.
+ */
+static int t4_crypto_send(struct adapter *adap, struct sk_buff *skb)
+{
+	int ret;
+
+	local_bh_disable();
+	ret = uld_send(adap, skb, CXGB4_TX_CRYPTO);
+	local_bh_enable();
+	return ret;
+}
+
+/**
+ *	cxgb4_crypto_send - send crypto packet
+ *	@dev: the net device
+ *	@skb: the packet
+ *
+ *	Sends crypto packet.  This is an exported version of @t4_crypto_send,
+ *	intended for ULDs.
+ */
+int cxgb4_crypto_send(struct net_device *dev, struct sk_buff *skb)
+{
+	return t4_crypto_send(netdev2adap(dev), skb);
+}
+EXPORT_SYMBOL(cxgb4_crypto_send);
+
 static inline void copy_frags(struct sk_buff *skb,
 			      const struct pkt_gl *gl, unsigned int offset)
 {
@@ -2479,7 +2518,7 @@ static void sge_tx_timer_cb(unsigned long data)
 	for (i = 0; i < BITS_TO_LONGS(s->egr_sz); i++)
 		for (m = s->txq_maperr[i]; m; m &= m - 1) {
 			unsigned long id = __ffs(m) + i * BITS_PER_LONG;
-			struct sge_ofld_txq *txq = s->egr_map[id];
+			struct sge_uld_txq *txq = s->egr_map[id];
 
 			clear_bit(id, s->txq_maperr);
 			tasklet_schedule(&txq->qresume_tsk);
@@ -2799,6 +2838,7 @@ int t4_sge_alloc_eth_txq(struct adapter *adap, struct sge_eth_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_ETH;
 	init_txq(adap, &txq->q, FW_EQ_ETH_CMD_EQID_G(ntohl(c.eqid_pkd)));
 	txq->txq = netdevq;
 	txq->tso = txq->tx_cso = txq->vlan_ins = 0;
@@ -2852,6 +2892,7 @@ int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_CTRL;
 	init_txq(adap, &txq->q, FW_EQ_CTRL_CMD_EQID_G(ntohl(c.cmpliqid_eqid)));
 	txq->adap = adap;
 	skb_queue_head_init(&txq->sendq);
@@ -2872,13 +2913,15 @@ int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
 	return t4_set_params(adap, adap->mbox, adap->pf, 0, 1, &param, &val);
 }
 
-int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
-			  struct net_device *dev, unsigned int iqid)
+int t4_sge_alloc_uld_txq(struct adapter *adap, struct sge_uld_txq *txq,
+			 struct net_device *dev, unsigned int iqid,
+			 unsigned int uld_type)
 {
 	int ret, nentries;
 	struct fw_eq_ofld_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int cmd = FW_EQ_OFLD_CMD;
 
 	/* Add status entries */
 	nentries = txq->q.size + s->stat_len / sizeof(struct tx_desc);
@@ -2891,7 +2934,9 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 		return -ENOMEM;
 
 	memset(&c, 0, sizeof(c));
-	c.op_to_vfn = htonl(FW_CMD_OP_V(FW_EQ_OFLD_CMD) | FW_CMD_REQUEST_F |
+	if (unlikely(uld_type == CXGB4_TX_CRYPTO))
+		cmd = FW_EQ_CTRL_CMD;
+	c.op_to_vfn = htonl(FW_CMD_OP_V(cmd) | FW_CMD_REQUEST_F |
 			    FW_CMD_WRITE_F | FW_CMD_EXEC_F |
 			    FW_EQ_OFLD_CMD_PFN_V(adap->pf) |
 			    FW_EQ_OFLD_CMD_VFN_V(0));
@@ -2919,6 +2964,7 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 		return ret;
 	}
 
+	txq->q.q_type = CXGB4_TXQ_ULD;
 	init_txq(adap, &txq->q, FW_EQ_OFLD_CMD_EQID_G(ntohl(c.eqid_pkd)));
 	txq->adap = adap;
 	skb_queue_head_init(&txq->sendq);
@@ -2928,7 +2974,7 @@ int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 	return 0;
 }
 
-static void free_txq(struct adapter *adap, struct sge_txq *q)
+void free_txq(struct adapter *adap, struct sge_txq *q)
 {
 	struct sge *s = &adap->sge;
 
@@ -3026,21 +3072,6 @@ void t4_free_sge_resources(struct adapter *adap)
 		}
 	}
 
-	/* clean up offload Tx queues */
-	for (i = 0; i < ARRAY_SIZE(adap->sge.ofldtxq); i++) {
-		struct sge_ofld_txq *q = &adap->sge.ofldtxq[i];
-
-		if (q->q.desc) {
-			tasklet_kill(&q->qresume_tsk);
-			t4_ofld_eq_free(adap, adap->mbox, adap->pf, 0,
-					q->q.cntxt_id);
-			free_tx_desc(adap, &q->q, q->q.in_use, false);
-			kfree(q->q.sdesc);
-			__skb_queue_purge(&q->sendq);
-			free_txq(adap, &q->q);
-		}
-	}
-
 	/* clean up control Tx queues */
 	for (i = 0; i < ARRAY_SIZE(adap->sge.ctrlq); i++) {
 		struct sge_ctrl_txq *cq = &adap->sge.ctrlq[i];
@@ -3093,12 +3124,34 @@ void t4_sge_stop(struct adapter *adap)
 	if (s->tx_timer.function)
 		del_timer_sync(&s->tx_timer);
 
-	for (i = 0; i < ARRAY_SIZE(s->ofldtxq); i++) {
-		struct sge_ofld_txq *q = &s->ofldtxq[i];
+	if (is_offload(adap)) {
+		struct sge_uld_txq_info *txq_info;
+
+		txq_info = adap->sge.uld_txq_info[CXGB4_TX_OFLD];
+		if (txq_info) {
+			struct sge_uld_txq *txq = txq_info->uldtxq;
 
-		if (q->q.desc)
-			tasklet_kill(&q->qresume_tsk);
+			for_each_ofldtxq(&adap->sge, i) {
+				if (txq->q.desc)
+					tasklet_kill(&txq->qresume_tsk);
+			}
+		}
 	}
+
+	if (is_pci_uld(adap)) {
+		struct sge_uld_txq_info *txq_info;
+
+		txq_info = adap->sge.uld_txq_info[CXGB4_TX_CRYPTO];
+		if (txq_info) {
+			struct sge_uld_txq *txq = txq_info->uldtxq;
+
+			for_each_ofldtxq(&adap->sge, i) {
+				if (txq->q.desc)
+					tasklet_kill(&txq->qresume_tsk);
+			}
+		}
+	}
+
 	for (i = 0; i < ARRAY_SIZE(s->ctrlq); i++) {
 		struct sge_ctrl_txq *cq = &s->ctrlq[i];
 
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 0039bebaa9e2..4655a9f9dcea 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -85,6 +85,7 @@ static inline int send_tx_flowc_wr(struct cxgbi_sock *);
 static const struct cxgb4_uld_info cxgb4i_uld_info = {
 	.name = DRV_MODULE_NAME,
 	.nrxq = MAX_ULD_QSETS,
+	.ntxq = MAX_ULD_QSETS,
 	.rxq_size = 1024,
 	.lro = false,
 	.add = t4_uld_add,
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_main.c b/drivers/target/iscsi/cxgbit/cxgbit_main.c
index ad26b9372f10..96eedfc49c94 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_main.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_main.c
@@ -653,6 +653,7 @@ static struct iscsit_transport cxgbit_transport = {
 static struct cxgb4_uld_info cxgbit_uld_info = {
 	.name		= DRV_NAME,
 	.nrxq		= MAX_ULD_QSETS,
+	.ntxq		= MAX_ULD_QSETS,
 	.rxq_size	= 1024,
 	.lro		= true,
 	.add		= cxgbit_uld_add,
-- 
2.3.4

^ permalink raw reply related

* Re: [PATCH] crypto: sun4i-ss: support the Security System PRNG
From: Corentin Labbe @ 2016-11-18  7:55 UTC (permalink / raw)
  To: Sandy Harris
  Cc: Herbert Xu, David S. Miller, maxime.ripard, wens, LKML,
	linux-crypto, linux-arm-kernel
In-Reply-To: <CACXcFmmMc7U1Qz6A+mvMXVnfSmOmssydcmSugo21jrX3u-95Qg@mail.gmail.com>

On Thu, Nov 17, 2016 at 08:07:09PM -0500, Sandy Harris wrote:
> Add Ted T'so to cc list. Shouldn't he be included on anything affecting
> the random(4) driver?
> 

Blindy used get_maintainer.pl, and since the file is in crypto, hw_random people were not set.
Note that get_maintainer.pl on drivers/char/hw_random/, does not give his address also.
My V2 patch will have them in CC/TO.

> On Tue, Oct 18, 2016 at 8:34 AM, Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> 
> > From: LABBE Corentin <clabbe.montjoie@gmail.com>
> >
> > The Security System have a PRNG.
> > This patch add support for it as an hwrng.
> 
> Which is it? A PRNG & a HW RNG are quite different things.
> It would, in general, be a fairly serious error to treat a PRNG
> as a HWRNG.
> 
> If it is just a prng (which it appears to be from a quick look
> at your code) then it is not clear it is useful since the
> random(4) driver already has two PRNGs. It might be
> but I cannot tell.

For me hwrng is a way to give user space an another way to get "random" data via /dev/hwrng.
The only impact of hwrng with random is that just after init some data of hwrng is used for having more entropy.

Grepping prng in drivers/char/hw_random/ and drivers/crypto show me some other PRNG used with hwrng.

Regards
Corentin Labbe

^ permalink raw reply

* Re: [PATCH] crypto: sun4i-ss: support the Security System PRNG
From: Sandy Harris @ 2016-11-18  1:07 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Herbert Xu, David S. Miller, maxime.ripard, wens, LKML,
	linux-crypto, linux-arm-kernel
In-Reply-To: <1476794067-28563-1-git-send-email-clabbe.montjoie@gmail.com>

Add Ted T'so to cc list. Shouldn't he be included on anything affecting
the random(4) driver?

On Tue, Oct 18, 2016 at 8:34 AM, Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:

> From: LABBE Corentin <clabbe.montjoie@gmail.com>
>
> The Security System have a PRNG.
> This patch add support for it as an hwrng.

Which is it? A PRNG & a HW RNG are quite different things.
It would, in general, be a fairly serious error to treat a PRNG
as a HWRNG.

If it is just a prng (which it appears to be from a quick look
at your code) then it is not clear it is useful since the
random(4) driver already has two PRNGs. It might be
but I cannot tell.

^ permalink raw reply

* Re: BUG: algif_hash crash with extra recv() in 4.9-rc5
From: Laura Abbott @ 2016-11-17 21:20 UTC (permalink / raw)
  To: Herbert Xu, Mat Martineau; +Cc: linux-crypto, Russell King - ARM Linux
In-Reply-To: <20161117140757.GA1149@gondor.apana.org.au>

On 11/17/2016 06:07 AM, Herbert Xu wrote:
> On Wed, Nov 16, 2016 at 11:17:33AM -0800, Mat Martineau wrote:
>>
>> Herbert -
>>
>> Following commit 493b2ed3f7603a15ff738553384d5a4510ffeb95, there is a NULL
>> dereference crash in algif_hash when recv() is called twice like this:
>>
>> send(sk, data, len, MSG_MORE);
>> recv(sk, hash1, len, 0);
>> recv(sk, hash2, len, 0);
>>
>> In 4.8 and earlier, the two recvs return identical data. In 4.9-rc5, the
>> second recv triggers this:
>>
>> [   53.041287] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> [   53.042048] IP: [<ffffffffa73fdfb3>] shash_ahash_digest+0x23/0x130
> 
> Ugh.  It looks like the shash wrapper is incorrectly dereferencing
> the SG list even when the length is zero.  Rather than fixing it
> I'm just going to make algif_hash do the safe thing of doing an
> init followed by a final.
> 
> Thanks,
> 
> ---8<---
> Subject: crypto: algif_hash - Fix NULL hash crash with shash
> 
> Recently algif_hash has been changed to allow null hashes.  This
> triggers a bug when used with an shash algorithm whereby it will
> cause a crash during the digest operation.
> 
> This patch fixes it by avoiding the digest operation and instead
> doing an init followed by a final which avoids the buggy code in
> shash.
> 
> This patch also ensures that the result buffer is freed after an
> error so that it is not returned as a genuine hash result on the
> next recv call.
> 
> The shash/ahash wrapper code will be fixed later to handle this
> case correctly.
> 
> Fixes: 493b2ed3f760 ("crypto: algif_hash - Handle NULL hashes correctly")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
> index 2d8466f..05e21b4 100644
> --- a/crypto/algif_hash.c
> +++ b/crypto/algif_hash.c
> @@ -214,23 +214,26 @@ static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>  
>  	ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
>  
> -	if (ctx->more) {
> +	if (!result) {
> +		err = af_alg_wait_for_completion(
> +				crypto_ahash_init(&ctx->req),
> +				&ctx->completion);
> +		if (err)
> +			goto unlock;
> +	}
> +
> +	if (!result || ctx->more) {
>  		ctx->more = 0;
>  		err = af_alg_wait_for_completion(crypto_ahash_final(&ctx->req),
>  						 &ctx->completion);
>  		if (err)
>  			goto unlock;
> -	} else if (!result) {
> -		err = af_alg_wait_for_completion(
> -				crypto_ahash_digest(&ctx->req),
> -				&ctx->completion);
>  	}
>  
>  	err = memcpy_to_msg(msg, ctx->result, len);
>  
> -	hash_free_result(sk, ctx);
> -
>  unlock:
> +	hash_free_result(sk, ctx);
>  	release_sock(sk);
>  
>  	return err ?: len;
> 

Confirmed to work for me. You can take that as a Tested-by.

Thanks,
Laura

^ permalink raw reply

* Re: crypto: caam warning fix, was: master build: 0 failures 1 warnings (v4.9-rc5-177-g81bcfe5)
From: Arnd Bergmann @ 2016-11-17 15:42 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linaro-kernel, Build bot for Mark Brown, kernel-build-reports,
	Horia Geantă, linux-crypto
In-Reply-To: <20161117153403.GA1687@gondor.apana.org.au>

On Thursday, November 17, 2016 11:34:04 PM CET Herbert Xu wrote:
> On Wed, Nov 16, 2016 at 05:40:41PM +0100, Arnd Bergmann wrote:
> > 
> > This is currently the only build warning reported for v4.9, and you have merged
> > the fix for v4.10 in
> > 
> > d69985a07692 ("crypto: caam - fix type mismatch warning")
> > 
> > Any chance you can send this for v4.9 so we have a clean build?
> 
> OK I'll push that along.
> 
> 

Thanks!

	Arnd

^ permalink raw reply

* Re: crypto: caam warning fix, was: master build: 0 failures 1 warnings (v4.9-rc5-177-g81bcfe5)
From: Herbert Xu @ 2016-11-17 15:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-kernel, Build bot for Mark Brown, kernel-build-reports,
	Horia Geantă, linux-crypto
In-Reply-To: <5255539.MpGQJX1bep@wuerfel>

On Wed, Nov 16, 2016 at 05:40:41PM +0100, Arnd Bergmann wrote:
> 
> This is currently the only build warning reported for v4.9, and you have merged
> the fix for v4.10 in
> 
> d69985a07692 ("crypto: caam - fix type mismatch warning")
> 
> Any chance you can send this for v4.9 so we have a clean build?

OK I'll push that along.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: BUG: algif_hash crash with extra recv() in 4.9-rc5
From: Herbert Xu @ 2016-11-17 14:07 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-crypto, Laura Abbott, Russell King - ARM Linux
In-Reply-To: <alpine.OSX.2.20.1611161027010.67352@mjmartin-mac01.local>

On Wed, Nov 16, 2016 at 11:17:33AM -0800, Mat Martineau wrote:
> 
> Herbert -
> 
> Following commit 493b2ed3f7603a15ff738553384d5a4510ffeb95, there is a NULL
> dereference crash in algif_hash when recv() is called twice like this:
> 
> send(sk, data, len, MSG_MORE);
> recv(sk, hash1, len, 0);
> recv(sk, hash2, len, 0);
> 
> In 4.8 and earlier, the two recvs return identical data. In 4.9-rc5, the
> second recv triggers this:
> 
> [   53.041287] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [   53.042048] IP: [<ffffffffa73fdfb3>] shash_ahash_digest+0x23/0x130

Ugh.  It looks like the shash wrapper is incorrectly dereferencing
the SG list even when the length is zero.  Rather than fixing it
I'm just going to make algif_hash do the safe thing of doing an
init followed by a final.

Thanks,

---8<---
Subject: crypto: algif_hash - Fix NULL hash crash with shash

Recently algif_hash has been changed to allow null hashes.  This
triggers a bug when used with an shash algorithm whereby it will
cause a crash during the digest operation.

This patch fixes it by avoiding the digest operation and instead
doing an init followed by a final which avoids the buggy code in
shash.

This patch also ensures that the result buffer is freed after an
error so that it is not returned as a genuine hash result on the
next recv call.

The shash/ahash wrapper code will be fixed later to handle this
case correctly.

Fixes: 493b2ed3f760 ("crypto: algif_hash - Handle NULL hashes correctly")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 2d8466f..05e21b4 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -214,23 +214,26 @@ static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
 
-	if (ctx->more) {
+	if (!result) {
+		err = af_alg_wait_for_completion(
+				crypto_ahash_init(&ctx->req),
+				&ctx->completion);
+		if (err)
+			goto unlock;
+	}
+
+	if (!result || ctx->more) {
 		ctx->more = 0;
 		err = af_alg_wait_for_completion(crypto_ahash_final(&ctx->req),
 						 &ctx->completion);
 		if (err)
 			goto unlock;
-	} else if (!result) {
-		err = af_alg_wait_for_completion(
-				crypto_ahash_digest(&ctx->req),
-				&ctx->completion);
 	}
 
 	err = memcpy_to_msg(msg, ctx->result, len);
 
-	hash_free_result(sk, ctx);
-
 unlock:
+	hash_free_result(sk, ctx);
 	release_sock(sk);
 
 	return err ?: len;
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: [PATCH] crypto: ccp - Fix handling of RSA exponent on a v5 device
From: Herbert Xu @ 2016-11-17 13:14 UTC (permalink / raw)
  To: Gary R Hook; +Cc: Gary R Hook, linux-crypto, thomas.lendacky, davem
In-Reply-To: <368b41ee-45e3-c330-10c5-16fcc22d3d16@amd.com>

On Wed, Nov 16, 2016 at 11:25:19AM -0600, Gary R Hook wrote:
>
> The kernel crypto layer does not yet support RSA, true. However, we
> designed the ccp.ko layer to be available to anyone that wants to use
> it. The underlying module currently has differing behavior/results
> between the v3 and v5 implementations of the RSA command function.
> This patch fixes the borked v5 code.

Do you mean that an out-of-tree module could enter the buggy
code path?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: BUG: algif_hash crash with extra recv() in 4.9-rc5
From: Mat Martineau @ 2016-11-17 16:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto, Laura Abbott, Russell King - ARM Linux
In-Reply-To: <20161117140757.GA1149@gondor.apana.org.au>


Herbert,

On Thu, 17 Nov 2016, Herbert Xu wrote:

> On Wed, Nov 16, 2016 at 11:17:33AM -0800, Mat Martineau wrote:
>>
>> Herbert -
>>
>> Following commit 493b2ed3f7603a15ff738553384d5a4510ffeb95, there is a NULL
>> dereference crash in algif_hash when recv() is called twice like this:
>>
>> send(sk, data, len, MSG_MORE);
>> recv(sk, hash1, len, 0);
>> recv(sk, hash2, len, 0);
>>
>> In 4.8 and earlier, the two recvs return identical data. In 4.9-rc5, the
>> second recv triggers this:
>>
>> [   53.041287] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> [   53.042048] IP: [<ffffffffa73fdfb3>] shash_ahash_digest+0x23/0x130
>
> Ugh.  It looks like the shash wrapper is incorrectly dereferencing
> the SG list even when the length is zero.  Rather than fixing it
> I'm just going to make algif_hash do the safe thing of doing an
> init followed by a final.

Thanks for the patch. For my test code, it fixed the crash and restored 
the previous behavior.

Regards,
Mat


> ---8<---
> Subject: crypto: algif_hash - Fix NULL hash crash with shash
>
> Recently algif_hash has been changed to allow null hashes.  This
> triggers a bug when used with an shash algorithm whereby it will
> cause a crash during the digest operation.
>
> This patch fixes it by avoiding the digest operation and instead
> doing an init followed by a final which avoids the buggy code in
> shash.
>
> This patch also ensures that the result buffer is freed after an
> error so that it is not returned as a genuine hash result on the
> next recv call.
>
> The shash/ahash wrapper code will be fixed later to handle this
> case correctly.
>
> Fixes: 493b2ed3f760 ("crypto: algif_hash - Handle NULL hashes correctly")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
> index 2d8466f..05e21b4 100644
> --- a/crypto/algif_hash.c
> +++ b/crypto/algif_hash.c
> @@ -214,23 +214,26 @@ static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>
> 	ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
>
> -	if (ctx->more) {
> +	if (!result) {
> +		err = af_alg_wait_for_completion(
> +				crypto_ahash_init(&ctx->req),
> +				&ctx->completion);
> +		if (err)
> +			goto unlock;
> +	}
> +
> +	if (!result || ctx->more) {
> 		ctx->more = 0;
> 		err = af_alg_wait_for_completion(crypto_ahash_final(&ctx->req),
> 						 &ctx->completion);
> 		if (err)
> 			goto unlock;
> -	} else if (!result) {
> -		err = af_alg_wait_for_completion(
> -				crypto_ahash_digest(&ctx->req),
> -				&ctx->completion);
> 	}
>
> 	err = memcpy_to_msg(msg, ctx->result, len);
>
> -	hash_free_result(sk, ctx);
> -
> unlock:
> +	hash_free_result(sk, ctx);
> 	release_sock(sk);
>
> 	return err ?: len;
> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

--
Mat Martineau
Intel OTC

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox