Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* [PATCH 5/8] crypto: omap-sham: change the DMA threshold value to a define
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Currently the threshold value was hardcoded in the driver. Having a define
for it makes it easier to configure.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 8558989..5c95bf9 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -137,6 +137,7 @@
 #define OMAP_ALIGNED		__attribute__((aligned(sizeof(u32))))
 
 #define BUFLEN			PAGE_SIZE
+#define OMAP_SHA_DMA_THRESHOLD	256
 
 struct omap_sham_dev;
 
@@ -1435,10 +1436,11 @@ static int omap_sham_final(struct ahash_request *req)
 	/*
 	 * OMAP HW accel works only with buffers >= 9.
 	 * HMAC is always >= 9 because ipad == block size.
-	 * If buffersize is less than 240, we use fallback SW encoding,
-	 * as using DMA + HW in this case doesn't provide any benefit.
+	 * If buffersize is less than DMA_THRESHOLD, we use fallback
+	 * SW encoding, as using DMA + HW in this case doesn't provide
+	 * any benefit.
 	 */
-	if (!ctx->digcnt && ctx->bufcnt < 240)
+	if (!ctx->digcnt && ctx->bufcnt < OMAP_SHA_DMA_THRESHOLD)
 		return omap_sham_final_shash(req);
 	else if (ctx->bufcnt)
 		return omap_sham_enqueue(req, OP_FINAL);
-- 
1.9.1

^ permalink raw reply related

* [PATCH 3/8] crypto: omap-sham: rename sgl to sgl_tmp for deprecation
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

The current usage of sgl will be deprecated, and will be replaced by an
array required by the sg based driver implementation. Rename the existing
variable as sgl_tmp so that it can be removed from the driver easily later.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 3f2bf98..33bea52 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -151,7 +151,7 @@ struct omap_sham_reqctx {
 
 	/* walk state */
 	struct scatterlist	*sg;
-	struct scatterlist	sgl;
+	struct scatterlist	sgl_tmp;
 	unsigned int		offset;	/* offset in current sg */
 	unsigned int		total;	/* total request */
 
@@ -583,18 +583,19 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
 	if (is_sg) {
 		/*
 		 * The SG entry passed in may not have the 'length' member
-		 * set correctly so use a local SG entry (sgl) with the
+		 * set correctly so use a local SG entry (sgl_tmp) with the
 		 * proper value for 'length' instead.  If this is not done,
 		 * the dmaengine may try to DMA the incorrect amount of data.
 		 */
-		sg_init_table(&ctx->sgl, 1);
-		sg_assign_page(&ctx->sgl, sg_page(ctx->sg));
-		ctx->sgl.offset = ctx->sg->offset;
-		sg_dma_len(&ctx->sgl) = len32;
-		sg_dma_address(&ctx->sgl) = sg_dma_address(ctx->sg);
-
-		tx = dmaengine_prep_slave_sg(dd->dma_lch, &ctx->sgl, 1,
-			DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+		sg_init_table(&ctx->sgl_tmp, 1);
+		sg_assign_page(&ctx->sgl_tmp, sg_page(ctx->sg));
+		ctx->sgl_tmp.offset = ctx->sg->offset;
+		sg_dma_len(&ctx->sgl_tmp) = len32;
+		sg_dma_address(&ctx->sgl_tmp) = sg_dma_address(ctx->sg);
+
+		tx = dmaengine_prep_slave_sg(dd->dma_lch, &ctx->sgl_tmp, 1,
+					     DMA_MEM_TO_DEV,
+					     DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 	} else {
 		tx = dmaengine_prep_slave_single(dd->dma_lch, dma_addr, len32,
 			DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-- 
1.9.1

^ permalink raw reply related

* [PATCH 0/8] crypto: omap-sham: convert to sg based data engine
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel

Hi,

This series converts the omap-sham buffer handling towards a scatterlist
approach. This avoids the need to have a huge internal buffer within the
driver, and also allows us to properly implement export/import for the
driver. I tried splitting up the changes to some sane patches, but this
was rather difficult due to the fact that this is largely a complete
rewrite of portions of the driver. Patch #6 is a prime example of this
being pretty large, but splitting this up would break bisectability.
Hopefully the patch is still understandable though.

Crypto manager tests work fine at least on omap3/am3/am4/dra7 SoC:s.
(Something is broken with test farm again and could not try omap2/omap4.)

Also tested tcrypt SHA performance on DRA7 and it seems to be working
fine with different buffer sizes.

My test branch is also available here for interested parties:
tree: https://github.com/t-kristo/linux-pm.git
breanch: 4.8-rc1-crypto

-Tero

^ permalink raw reply

* [PATCH 2/8] crypto: omap-sham: align algorithms on word offset
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

OMAP HW generally expects data for DMA to be on word boundary, so make the
SHA driver inform crypto framework of the same preference.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 74653c9..3f2bf98 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -1368,7 +1368,7 @@ static struct ahash_alg algs_sha1_md5[] = {
 						CRYPTO_ALG_NEED_FALLBACK,
 		.cra_blocksize		= SHA1_BLOCK_SIZE,
 		.cra_ctxsize		= sizeof(struct omap_sham_ctx),
-		.cra_alignmask		= 0,
+		.cra_alignmask		= OMAP_ALIGN_MASK,
 		.cra_module		= THIS_MODULE,
 		.cra_init		= omap_sham_cra_init,
 		.cra_exit		= omap_sham_cra_exit,
@@ -1467,7 +1467,7 @@ static struct ahash_alg algs_sha224_sha256[] = {
 						CRYPTO_ALG_NEED_FALLBACK,
 		.cra_blocksize		= SHA224_BLOCK_SIZE,
 		.cra_ctxsize		= sizeof(struct omap_sham_ctx),
-		.cra_alignmask		= 0,
+		.cra_alignmask		= OMAP_ALIGN_MASK,
 		.cra_module		= THIS_MODULE,
 		.cra_init		= omap_sham_cra_init,
 		.cra_exit		= omap_sham_cra_exit,
@@ -1489,7 +1489,7 @@ static struct ahash_alg algs_sha224_sha256[] = {
 						CRYPTO_ALG_NEED_FALLBACK,
 		.cra_blocksize		= SHA256_BLOCK_SIZE,
 		.cra_ctxsize		= sizeof(struct omap_sham_ctx),
-		.cra_alignmask		= 0,
+		.cra_alignmask		= OMAP_ALIGN_MASK,
 		.cra_module		= THIS_MODULE,
 		.cra_init		= omap_sham_cra_init,
 		.cra_exit		= omap_sham_cra_exit,
@@ -1562,7 +1562,7 @@ static struct ahash_alg algs_sha384_sha512[] = {
 						CRYPTO_ALG_NEED_FALLBACK,
 		.cra_blocksize		= SHA384_BLOCK_SIZE,
 		.cra_ctxsize		= sizeof(struct omap_sham_ctx),
-		.cra_alignmask		= 0,
+		.cra_alignmask		= OMAP_ALIGN_MASK,
 		.cra_module		= THIS_MODULE,
 		.cra_init		= omap_sham_cra_init,
 		.cra_exit		= omap_sham_cra_exit,
@@ -1584,7 +1584,7 @@ static struct ahash_alg algs_sha384_sha512[] = {
 						CRYPTO_ALG_NEED_FALLBACK,
 		.cra_blocksize		= SHA512_BLOCK_SIZE,
 		.cra_ctxsize		= sizeof(struct omap_sham_ctx),
-		.cra_alignmask		= 0,
+		.cra_alignmask		= OMAP_ALIGN_MASK,
 		.cra_module		= THIS_MODULE,
 		.cra_init		= omap_sham_cra_init,
 		.cra_exit		= omap_sham_cra_exit,
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/8] crypto: omap-sham: add context export/import stubs
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Initially these just return -ENOTSUPP to indicate that they don't
really do anything yet. Some sort of implementation is required
for the driver to at least probe.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index cf9f617c..74653c9 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -1340,6 +1340,16 @@ static void omap_sham_cra_exit(struct crypto_tfm *tfm)
 	}
 }
 
+static int omap_sham_export(struct ahash_request *req, void *out)
+{
+	return -ENOTSUPP;
+}
+
+static int omap_sham_import(struct ahash_request *req, const void *in)
+{
+	return -ENOTSUPP;
+}
+
 static struct ahash_alg algs_sha1_md5[] = {
 {
 	.init		= omap_sham_init,
@@ -1998,8 +2008,13 @@ static int omap_sham_probe(struct platform_device *pdev)
 
 	for (i = 0; i < dd->pdata->algs_info_size; i++) {
 		for (j = 0; j < dd->pdata->algs_info[i].size; j++) {
-			err = crypto_register_ahash(
-					&dd->pdata->algs_info[i].algs_list[j]);
+			struct ahash_alg *alg;
+
+			alg = &dd->pdata->algs_info[i].algs_list[j];
+			alg->export = omap_sham_export;
+			alg->import = omap_sham_import;
+			alg->halg.statesize = sizeof(struct omap_sham_reqctx);
+			err = crypto_register_ahash(alg);
 			if (err)
 				goto err_algs;
 
-- 
1.9.1

^ permalink raw reply related

* Crypto Fixes for 4.8
From: Herbert Xu @ 2016-09-19 11:21 UTC (permalink / raw)
  To: Linus Torvalds, David S. Miller, Linux Kernel Mailing List,
	Linux Crypto Mailing List
In-Reply-To: <20160905093318.GA30895@gondor.apana.org.au>

Hi Linus:

This push fixes a potential weakness in IPsec CBC IV generation,
as well as a number of issues that arose out of an OOM crash on
ARM with CTR-mode AES.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Ard Biesheuvel (2):
      crypto: arm/aes-ctr - fix NULL dereference in tail processing
      crypto: arm64/aes-ctr - fix NULL dereference in tail processing

Herbert Xu (2):
      crypto: echainiv - Replace chaining with multiplication
      crypto: skcipher - Fix blkcipher walk OOM crash

 arch/arm/crypto/aes-ce-glue.c |    2 +-
 arch/arm64/crypto/aes-glue.c  |    2 +-
 crypto/blkcipher.c            |    3 +-
 crypto/echainiv.c             |  115 +++++++++--------------------------------
 4 files changed, 28 insertions(+), 94 deletions(-)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next] chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's
From: David Miller @ 2016-09-19  5:38 UTC (permalink / raw)
  To: hariprasad
  Cc: netdev, linux-scsi, target-devel, linux-rdma, linux-crypto, nab,
	jejb, martin.petersen, dledford, herbert, varun, nirranjan, swise,
	atul.gupta, rahul.lakkireddy
In-Reply-To: <1474080159-14980-1-git-send-email-hariprasad@chelsio.com>

From: Hariprasad Shenai <hariprasad@chelsio.com>
Date: Sat, 17 Sep 2016 08:12:39 +0530

> Allocate resources dynamically to cxgb4's Upper layer driver's(ULD) like
> cxgbit, iw_cxgb4 and cxgb4i. Allocate resources when they register with
> cxgb4 driver and free them while unregistering. All the queues and the
> interrupts for them will be allocated during ULD probe only and freed
> during remove.
> 
> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>

Applied.

^ permalink raw reply

* Re: bcachefs: Encryption (Posting for review)
From: Eric Wheeler @ 2016-09-19  0:04 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, linux-fsdevel, linux-crypto, linux-bcache
In-Reply-To: <20160902044145.mkk6eipej6kbfhhi@kmo-pixel>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 13480 bytes --]

On Thu, 1 Sep 2016, Kent Overstreet wrote:

> Encryption in bcachefs is done and working and I just finished documenting the
> design - so now, it needs more eyeballs and vetting before letting users play
> with it.
> 
> ### Algorithms
> 
> By virtue of working within a copy on write filesystem with provisions for ZFS
> style checksums (that is, checksums with the pointers, not the data), we’re
> able to use a modern AEAD style construction. We use ChaCha20 and Poly1305. We
> use the cyphers directly instead of using the kernel AEAD library (and thus
> means there's a bit more in the design that needs auditing).

A few thoughts:

Great work implementing your own monotnoically-increasing nonce in 
bcachefs.  You have implemented your own crypto stack, I'm sure lots of 
time went into that.  Stream ciphers are great, but not always seekable 
(eg: output-feedback mode, OFM).  Of course if you plan to validate the 
MAC for every read, then seekable might not matter---but you will need to 
decrypt up to the offset you want even after MAC validation which adds 
overhead.

Since you have written a solid nonce generator, supporting the existing 
kernel library for block ciphers in counter-mode (CTR) is probably easy 
and would aide the validation of your protocol; indeed, it would 
future-proof bcachefs against maintaining new ciphers if the user can 
specify the kernel crypto library's block- or streamping- cipher.

For counter mode, shift the nonce by as many bits as the extent size 
divided by the cipher-block size.  For 64k extents and AES-128, you would 
shift the nonce like so:
	nonce <<= ilog2( 65536 / ilog2(128) )  

This example uses 12 nonce bits leaving 2^84 extent generations before 
wrapping.  (Of course you'll need to store the nonce in your extent 
metadata.)

You then have a counter for each 16-bytes of AES in the bottom bits of 
the extent nonce.  Counter mode is trivial to implement:
	ciphertext[   0] = E_k(nonce+   0) XOR plaintext[0]
	[...]
	ciphertext[4095] = E_k(nonce+4095) XOR plaintext[4095]

This gives you cipher-block-independent parallelism, seekability, and 
flexibility using existing and future block ciphers.  Counter mode is well 
studied, I don't think anyone will argue against that if your nonces are 
well founded. 

[... more below ]


> The current design uses the same key for both ChaCha20 and Poly1305, but my
> recent rereading of the Poly1305-AES paper seems to imply that the Poly1305 key
> shouldn't be used for anything else. Guidance from actual cryptographers would
> be appreciated here; the ChaCha20/Poly1305 AEAD RFC appears to be silent on the
> matter.
> 
> Note that ChaCha20 is a stream cypher. This means that it’s critical that we use
> a cryptographic MAC (which would be highly desirable anyways), and also avoiding
> nonce reuse is critical. Getting nonces right is where most of the trickiness is
> involved in bcachefs’s encryption.
> 
> The current algorithm choices are not hard coded. Bcachefs already has
> selectable checksum types, and every individual data and metadata write has a
> field that describes the checksum algorithm that was used. On disk, encrypted
> data is represented as a new checksum type - so we now have [none, crc32c,
> crc64, chacha20/poly1305] as possible methods for data to be
> checksummed/encrypted. If in the future we add new encryption algorithms, users
> will be able to switch to the new algorithm on existing encrypted filesystems;
> new data will be written with the new algorithm and old data will be read with
> the old algorithm until it is rewritten.
> 
> ### Key derivation, master key
> 
> Userspace tooling takes the user's passphrase and derives an encryption key with
> scrypt. This key is made available to the kernel (via the Linux kernel's keyring
> service) prior to mounting the filesystem.
> 
> On filesystem mount, the userspace provided key is used to decrypt the master
> key, which is stored in the superblock - also with ChaCha20. The master key is
> encrypted with an 8 byte header, so that we can tell if the correct key was
> supplied.
> 
> ### Metadata
> 
> Except for the superblock, no metadata in bcache/bcachefs is updated in place -
> everything is more or less log structured. Only the superblock is stored
> unencrypted; other metadata is stored with an unencrypted header and encrypted
> contents.
> 
> The superblock contains:
>  * Label and UUIDs identifying the filesystem
>  * A list of component devices (for multi-device filesystems), and information
>    on their size, geometry, status (active/failed), last used timestamp
>  * Filesystem options
>  * The location of the journal
> 
> For the rest of the metadata, the unencrypted portion contains:
> 
>  * 128 bit checksum/MAC field
>  * Magic number - identifies a given structure as btree/journal/allocation
>    information, for that filesystem
>  * Version number (of on disk format), flags (including checksum/encryption
>    type).
>  * Sequence numbers: journal entries have an ascending 64 bit sequence number,
>    btree node entries have a random 64 bit sequence number identifying them as
>    belonging to that node. Btree nodes also have a field containing the sequence
>    number of the most recent journal entry they contain updates from; this is
>    stored unencrypted so it can be used as part of the nonce.
>  * Size of the btree node entry/journal entry, in u64s
> 
> Btree node layout information is encrypted; an attacker could tell that a given
> location on disk was a btree node, but the part of the header that indicates
> what range of the keyspace, or which btree ID (extents/dirents/xattrs/etc.), or
> which level of the btree is all encrypted.
> 
> #### Metadata nonces
> 
>  * Journal entries use their sequence number - which is unique for a given
>    filesystem. When metadata is being replicated and we're doing multiple
>    journal writes with the same sequence number - and thus nonce - we really are
>    writing the same data (we only checksum once, not once per write).
> 
>  * Btree nodes concatenate a few things for the nonce:
>    - A 64 bit random integer, which is generated per btree node (but btree nodes
>      are log structured, so entries within a given btree node share the same
>      integer).
>    - A journal sequence number. For btree node writes done at around the same
>      point in time, this field can be identical in unrelated btree node writes -
>      but only for btree nodes writes done relatively close in time, so the
>      journal sequence number plus the previous random integer should be more
>      than sufficient entropy.
>    - And lastly the offset within the btree node, so that btree node entries
>      sharing the same random integer are guaranteed a different nonce.
> 
>  * Allocation information (struct prio_set):
>    bcache/bcachefs doesn't have allocation information persisted like other
>    filesystems, but this is our closest equivalent - this structure mainly
>    stores generation numbers that correspond to extent pointers.
> 
>    Allocation information uses a dedicated randomly generated 96 bit nonce
>    field.
> 
> ### Data
> 
> Data writes have no unencrypted header: checksums/MACs, nonces, etc. are stored
> with the pointers, ZFS style.
> 
> Bcache/bcachefs is extent based, not block based: pointers point to variable
> sized chunks of data, and we store one checksum/MAC per extent, not per block: a
> checksum or MAC might cover up to 64k (extents that aren't checksummed or
> compressed may be larger). Nonces are thus also per extent, not per block.
> 
> Currently, the Poly1305 MAC is truncated to 64 bits - due to a desire not to
> inflate our metadata any more than necessary. Guidance from cryptographers is
> requested as to whether this is a reasonable option; do note that the MAC is not
> stored with the data, but is itself stored encrypted elsewhere in the btree. 

Encrypted or not, MACs are already only as strong as sqrt(2^n) because of 
birthday paradox collisions.  Truncating to 64 bits guarantees a MAC 
collision every 2^32 generations which is too weak in my opinion.  Lost 
space or no, it is definitely better to keep the MAC the same size as the 
MACing function (or hash if HMAC).

Another note on MACs:  Using Poly1305 sounds great, but what if it is 
found to be weakened or broken or deprecated in the future? 

IMO, it would best to support MACs as HMACs and allow the user to specify 
the hash function using the existing kernel crypto library.  HMAC 
construction is as follows (|| means append):

	HMAC_{HASH,key}(data) = HASH(HASH(key) || HASH(data))


Of course you are welcome to support all of the above and let the user 
decide on the MAC function (poly1305 vs HMAC_{HASH,key}) and crypto 
function (counter-mode+block-cipher, streaming ciphers including chacha*).

It is appealing to have super-fast chacha20+poly1305---and equally so to 
have the flexibility and breadth of the existing crypto suite using well 
tested crypto constructs like counter-mode and HMAC.


--
Eric Wheeler

> We do already have different fields for storing 4 byte checksums and 8 
> byte checksums; it will be a trivial matter to add a field allowing 16 
> byte checksums to be stored, and we will add that anyways - so this 
> isn't a pressing design issue, this is just a matter of what the 
> defaults should be and what we should tell users.
> 
> #### Extent nonces
> 
> We don't wish to simply add a random 96 bit nonce to every extent - that would
> inflate our metadata size by a very significant amount. Instead, keys (of which
> extents are a subset) have a 96 bit version number field; when encryption is
> enabled, we ensure that version numbers are enabled and every new extent gets a
> new, unique version number.
> 
> However, extents may be partially overwritten or split, and then to defragment
> we may have to rewrite those partially overwritten extents elsewhere. We cannot
> simply pick a new version number when we rewrite an extent - that would break
> semantics other uses of version numbers expect.
> 
> When we rewrite an extent, we only write the currently live portions of the
> extent - we don't rewrite the parts that were overwritten. We can't write it out
> with the same nonce as the original extent.
> 
> If we concatenated the version number with the offset within the file, and the
> extent's current size - that would work, except that it would break fcollapse(),
> which moves extents to a new position within a file. We are forced to add some
> additional state to extents.
> 
> We could add a small counter that is incremented every time the size of an
> extent is reduced (and the data it points to changes); we can easily bound the
> size of the counter we need by the maximum size of a checksummed extent. But
> this approach fails when extents are split.
> 
> What can work is if we add a field for "offset from the start of the original
> extent to the start of the current extent" - updating that field whenever we
> trim the front of an extent.
> 
> If we have that, then we could simply skip ahead in the keystream to where the
> currently live data lived in the original extent - there's no problem with nonce
> reuse if you're encrypting exactly the same data. Except - that fails with
> compression, since if we take an extent, drop the first 4k, and compress it,
> that won't give the same data as if we compress it and then drop the first 4k of
> the compressed data.
> 
> The approach almost works though, if we take that offset and use it as part of
> our nonce: what we want to do is construct a function that will output the same
> nonce iff two extents (fragments of the same original extent) really are the
> same data.
> 
> Offset into the original extent works in the absence of compression - two
> fragments with the same offset but different sizes will be equal in their common
> prefix, ignoring compression. We can handle compression if we also include both
> the current size, and the current compression function - offset + current size
> uniquely determines the uncompressed data, so, offset + current size +
> compression function will uniquely determine the compressed output.
> 
> #### Nonce reuse on startup
> 
> After recovery, we must ensure we don't reuse existing version numbers - we must
> ensure that newly allocated version numbers are strictly greater than any
> version number that has every been used before.
> 
> The problem here is that we use the version number to write the data before
> adding the extent with that version number to the btree: after unclean shutdown,
> there will have been version numbers used to write data for which we have no
> record in the btree.
> 
> The rigorous solution to this is to add a field (likely to the journal header)
> that indicates version numbers smaller than that field may have been used.
> However, we don't do that yet - it's not completely trivial since it'll add
> another potential dependency in the IO path that needs some analysis.
> 
> The current solution implemented by the code is to scan every existing version
> number (as part of an existing pass), and set the next version number to
> allocate to be 64k greater than the highest existing version number that was
> found.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH] crypto: sun4i-ss: mark sun4i_hash() static
From: Baoyou Xie @ 2016-09-18 12:52 UTC (permalink / raw)
  To: clabbe.montjoie, herbert, davem, maxime.ripard, wens
  Cc: linux-crypto, linux-arm-kernel, linux-kernel, arnd, baoyou.xie,
	xie.baoyou

We get 1 warning when building kernel with W=1:
drivers/crypto/sunxi-ss/sun4i-ss-hash.c:168:5: warning: no previous prototype for 'sun4i_hash' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
So this patch marks it 'static'.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
---
 drivers/crypto/sunxi-ss/sun4i-ss-hash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-hash.c b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
index 1afeb8e..0de2f62 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
@@ -165,7 +165,7 @@ int sun4i_hash_import_sha1(struct ahash_request *areq, const void *in)
  * write remaining data in op->buf
  * final state op->len=56
  */
-int sun4i_hash(struct ahash_request *areq)
+static int sun4i_hash(struct ahash_request *areq)
 {
 	u32 v, ivmode = 0;
 	unsigned int i = 0;
-- 
2.7.4

^ permalink raw reply related

* [PATCH] crypto: ccp - Fix return value check in ccp_dmaengine_register()
From: Wei Yongjun @ 2016-09-17 16:01 UTC (permalink / raw)
  To: Tom Lendacky, Gary Hook, Herbert Xu; +Cc: Wei Yongjun, linux-crypto

From: Wei Yongjun <weiyongjun1@huawei.com>

Fix the retrn value check which testing the wrong variable
in ccp_dmaengine_register().

Fixes: 58ea8abf4904 ("crypto: ccp - Register the CCP as a DMA resource")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/crypto/ccp/ccp-dmaengine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/ccp-dmaengine.c b/drivers/crypto/ccp/ccp-dmaengine.c
index 94f77b0..32f645e 100644
--- a/drivers/crypto/ccp/ccp-dmaengine.c
+++ b/drivers/crypto/ccp/ccp-dmaengine.c
@@ -650,7 +650,7 @@ int ccp_dmaengine_register(struct ccp_device *ccp)
 	dma_desc_cache_name = devm_kasprintf(ccp->dev, GFP_KERNEL,
 					     "%s-dmaengine-desc-cache",
 					     ccp->name);
-	if (!dma_cmd_cache_name)
+	if (!dma_desc_cache_name)
 		return -ENOMEM;
 	ccp->dma_desc_cache = kmem_cache_create(dma_desc_cache_name,
 						sizeof(struct ccp_dma_desc),

^ permalink raw reply related

* [PATCH net-next] chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's
From: Hariprasad Shenai @ 2016-09-17  2:42 UTC (permalink / raw)
  To: netdev, linux-scsi, target-devel, linux-rdma, linux-crypto
  Cc: davem, nab, jejb, martin.petersen, dledford, herbert, varun,
	nirranjan, swise, atul.gupta, rahul.lakkireddy, Hariprasad Shenai

Allocate resources dynamically to cxgb4's Upper layer driver's(ULD) like
cxgbit, iw_cxgb4 and cxgb4i. Allocate resources when they register with
cxgb4 driver and free them while unregistering. All the queues and the
interrupts for them will be allocated during ULD probe only and freed
during remove.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/crypto/chelsio/chcr_core.c                 |   10 +-
 drivers/infiniband/hw/cxgb4/device.c               |    4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   47 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  127 +----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  613 +++++---------------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c     |  223 ++++++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |   31 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |   18 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |    3 +
 drivers/target/iscsi/cxgbit/cxgbit_main.c          |    3 +
 10 files changed, 385 insertions(+), 694 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_core.c b/drivers/crypto/chelsio/chcr_core.c
index 2f6156b..fb5f9bb 100644
--- a/drivers/crypto/chelsio/chcr_core.c
+++ b/drivers/crypto/chelsio/chcr_core.c
@@ -39,12 +39,10 @@ static chcr_handler_func work_handlers[NUM_CPL_CMDS] = {
 	[CPL_FW6_PLD] = cpl_fw6_pld_handler,
 };
 
-static struct cxgb4_pci_uld_info chcr_uld_info = {
+static struct cxgb4_uld_info chcr_uld_info = {
 	.name = DRV_MODULE_NAME,
-	.nrxq = 4,
+	.nrxq = MAX_ULD_QSETS,
 	.rxq_size = 1024,
-	.nciq = 0,
-	.ciq_size = 0,
 	.add = chcr_uld_add,
 	.state_change = chcr_uld_state_change,
 	.rx_handler = chcr_uld_rx_handler,
@@ -205,7 +203,7 @@ static int chcr_uld_state_change(void *handle, enum cxgb4_state state)
 
 static int __init chcr_crypto_init(void)
 {
-	if (cxgb4_register_pci_uld(CXGB4_PCI_ULD1, &chcr_uld_info)) {
+	if (cxgb4_register_uld(CXGB4_ULD_CRYPTO, &chcr_uld_info)) {
 		pr_err("ULD register fail: No chcr crypto support in cxgb4");
 		return -1;
 	}
@@ -228,7 +226,7 @@ static void __exit chcr_crypto_exit(void)
 		kfree(u_ctx);
 	}
 	mutex_unlock(&dev_mutex);
-	cxgb4_unregister_pci_uld(CXGB4_PCI_ULD1);
+	cxgb4_unregister_uld(CXGB4_ULD_CRYPTO);
 }
 
 module_init(chcr_crypto_init);
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 071d733..f170b63 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -1475,6 +1475,10 @@ static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
 
 static struct cxgb4_uld_info c4iw_uld_info = {
 	.name = DRV_NAME,
+	.nrxq = MAX_ULD_QSETS,
+	.rxq_size = 511,
+	.ciq = true,
+	.lro = false,
 	.add = c4iw_uld_add,
 	.rx_handler = c4iw_uld_rx_handler,
 	.state_change = c4iw_uld_state_change,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 4595569..1f9867d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -437,11 +437,6 @@ enum {
 	MAX_ETH_QSETS = 32,           /* # of Ethernet Tx/Rx queue sets */
 	MAX_OFLD_QSETS = 16,          /* # of offload Tx, iscsi Rx queue sets */
 	MAX_CTRL_QUEUES = NCHAN,      /* # of control Tx queues */
-	MAX_RDMA_QUEUES = NCHAN,      /* # of streaming RDMA Rx queues */
-	MAX_RDMA_CIQS = 32,        /* # of  RDMA concentrator IQs */
-
-	/* # of streaming iSCSIT Rx queues */
-	MAX_ISCSIT_QUEUES = MAX_OFLD_QSETS,
 };
 
 enum {
@@ -458,8 +453,7 @@ enum {
 enum {
 	INGQ_EXTRAS = 2,        /* firmware event queue and */
 				/*   forwarded interrupts */
-	MAX_INGQ = MAX_ETH_QSETS + MAX_OFLD_QSETS + MAX_RDMA_QUEUES +
-		   MAX_RDMA_CIQS + MAX_ISCSIT_QUEUES + INGQ_EXTRAS,
+	MAX_INGQ = MAX_ETH_QSETS + INGQ_EXTRAS,
 };
 
 struct adapter;
@@ -704,10 +698,6 @@ struct sge {
 	struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
 	struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
-	struct sge_ofld_rxq iscsirxq[MAX_OFLD_QSETS];
-	struct sge_ofld_rxq iscsitrxq[MAX_ISCSIT_QUEUES];
-	struct sge_ofld_rxq rdmarxq[MAX_RDMA_QUEUES];
-	struct sge_ofld_rxq rdmaciq[MAX_RDMA_CIQS];
 	struct sge_rspq fw_evtq ____cacheline_aligned_in_smp;
 	struct sge_uld_rxq_info **uld_rxq_info;
 
@@ -717,15 +707,8 @@ struct sge {
 	u16 max_ethqsets;           /* # of available Ethernet queue sets */
 	u16 ethqsets;               /* # of active Ethernet queue sets */
 	u16 ethtxq_rover;           /* Tx queue to clean up next */
-	u16 iscsiqsets;              /* # of active iSCSI queue sets */
-	u16 niscsitq;               /* # of available iSCST Rx queues */
-	u16 rdmaqs;                 /* # of available RDMA Rx queues */
-	u16 rdmaciqs;               /* # of available RDMA concentrator IQs */
+	u16 ofldqsets;              /* # of active ofld queue sets */
 	u16 nqs_per_uld;	    /* # of Rx queues per ULD */
-	u16 iscsi_rxq[MAX_OFLD_QSETS];
-	u16 iscsit_rxq[MAX_ISCSIT_QUEUES];
-	u16 rdma_rxq[MAX_RDMA_QUEUES];
-	u16 rdma_ciq[MAX_RDMA_CIQS];
 	u16 timer_val[SGE_NTIMERS];
 	u8 counter_val[SGE_NCOUNTERS];
 	u32 fl_pg_order;            /* large page allocation size */
@@ -749,10 +732,7 @@ struct sge {
 };
 
 #define for_each_ethrxq(sge, i) for (i = 0; i < (sge)->ethqsets; i++)
-#define for_each_iscsirxq(sge, i) for (i = 0; i < (sge)->iscsiqsets; i++)
-#define for_each_iscsitrxq(sge, i) for (i = 0; i < (sge)->niscsitq; i++)
-#define for_each_rdmarxq(sge, i) for (i = 0; i < (sge)->rdmaqs; i++)
-#define for_each_rdmaciq(sge, i) for (i = 0; i < (sge)->rdmaciqs; i++)
+#define for_each_ofldtxq(sge, i) for (i = 0; i < (sge)->ofldqsets; i++)
 
 struct l2t_data;
 
@@ -786,6 +766,7 @@ struct uld_msix_bmap {
 struct uld_msix_info {
 	unsigned short vec;
 	char desc[IFNAMSIZ + 10];
+	unsigned int idx;
 };
 
 struct vf_info {
@@ -818,7 +799,7 @@ struct adapter {
 	} msix_info[MAX_INGQ + 1];
 	struct uld_msix_info *msix_info_ulds; /* msix info for uld's */
 	struct uld_msix_bmap msix_bmap_ulds; /* msix bitmap for all uld */
-	unsigned int msi_idx;
+	int msi_idx;
 
 	struct doorbell_stats db_stats;
 	struct sge sge;
@@ -836,9 +817,10 @@ struct adapter {
 	unsigned int clipt_start;
 	unsigned int clipt_end;
 	struct clip_tbl *clipt;
-	struct cxgb4_pci_uld_info *uld;
+	struct cxgb4_uld_info *uld;
 	void *uld_handle[CXGB4_ULD_MAX];
 	unsigned int num_uld;
+	unsigned int num_ofld_uld;
 	struct list_head list_node;
 	struct list_head rcu_node;
 	struct list_head mac_hlist; /* list of MAC addresses in MPS Hash */
@@ -858,6 +840,8 @@ struct adapter {
 #define T4_OS_LOG_MBOX_CMDS 256
 	struct mbox_cmd_log *mbox_log;
 
+	struct mutex uld_mutex;
+
 	struct dentry *debugfs_root;
 	bool use_bd;     /* Use SGE Back Door intfc for reading SGE Contexts */
 	bool trace_rss;	/* 1 implies that different RSS flit per filter is
@@ -1051,6 +1035,11 @@ static inline int is_pci_uld(const struct adapter *adap)
 	return adap->params.crypto;
 }
 
+static inline int is_uld(const struct adapter *adap)
+{
+	return (adap->params.offload || adap->params.crypto);
+}
+
 static inline u32 t4_read_reg(struct adapter *adap, u32 reg_addr)
 {
 	return readl(adap->regs + reg_addr);
@@ -1277,6 +1266,8 @@ int t4_sge_alloc_eth_txq(struct adapter *adap, struct sge_eth_txq *txq,
 int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 			  struct net_device *dev, unsigned int iqid,
 			  unsigned int cmplqid);
+int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
+			unsigned int cmplqid);
 int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 			  struct net_device *dev, unsigned int iqid);
 irqreturn_t t4_sge_intr_msix(int irq, void *cookie);
@@ -1635,7 +1626,9 @@ void t4_idma_monitor(struct adapter *adapter,
 		     int hz, int ticks);
 int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf,
 		      unsigned int naddr, u8 *addr);
-void uld_mem_free(struct adapter *adap);
-int uld_mem_alloc(struct adapter *adap);
+void t4_uld_mem_free(struct adapter *adap);
+int t4_uld_mem_alloc(struct adapter *adap);
+void t4_uld_clean_up(struct adapter *adap);
+void t4_register_netevent_notifier(void);
 void free_rspq_fl(struct adapter *adap, struct sge_rspq *rq, struct sge_fl *fl);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 91fb508..52be9a4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2432,17 +2432,11 @@ static int sge_qinfo_show(struct seq_file *seq, void *v)
 {
 	struct adapter *adap = seq->private;
 	int eth_entries = DIV_ROUND_UP(adap->sge.ethqsets, 4);
-	int iscsi_entries = DIV_ROUND_UP(adap->sge.iscsiqsets, 4);
-	int iscsit_entries = DIV_ROUND_UP(adap->sge.niscsitq, 4);
-	int rdma_entries = DIV_ROUND_UP(adap->sge.rdmaqs, 4);
-	int ciq_entries = DIV_ROUND_UP(adap->sge.rdmaciqs, 4);
+	int ofld_entries = DIV_ROUND_UP(adap->sge.ofldqsets, 4);
 	int ctrl_entries = DIV_ROUND_UP(MAX_CTRL_QUEUES, 4);
 	int i, r = (uintptr_t)v - 1;
-	int iscsi_idx = r - eth_entries;
-	int iscsit_idx = iscsi_idx - iscsi_entries;
-	int rdma_idx = iscsit_idx - iscsit_entries;
-	int ciq_idx = rdma_idx - rdma_entries;
-	int ctrl_idx =  ciq_idx - ciq_entries;
+	int ofld_idx = r - eth_entries;
+	int ctrl_idx =  ofld_idx - ofld_entries;
 	int fq_idx =  ctrl_idx - ctrl_entries;
 
 	if (r)
@@ -2518,119 +2512,17 @@ do { \
 		RL("FLLow:", fl.low);
 		RL("FLStarving:", fl.starving);
 
-	} else if (iscsi_idx < iscsi_entries) {
-		const struct sge_ofld_rxq *rx =
-			&adap->sge.iscsirxq[iscsi_idx * 4];
+	} else if (ofld_idx < ofld_entries) {
 		const struct sge_ofld_txq *tx =
-			&adap->sge.ofldtxq[iscsi_idx * 4];
-		int n = min(4, adap->sge.iscsiqsets - 4 * iscsi_idx);
+			&adap->sge.ofldtxq[ofld_idx * 4];
+		int n = min(4, adap->sge.ofldqsets - 4 * ofld_idx);
 
-		S("QType:", "iSCSI");
+		S("QType:", "OFLD-Txq");
 		T("TxQ ID:", q.cntxt_id);
 		T("TxQ size:", q.size);
 		T("TxQ inuse:", q.in_use);
 		T("TxQ CIDX:", q.cidx);
 		T("TxQ PIDX:", q.pidx);
-		R("RspQ ID:", rspq.abs_id);
-		R("RspQ size:", rspq.size);
-		R("RspQE size:", rspq.iqe_len);
-		R("RspQ CIDX:", rspq.cidx);
-		R("RspQ Gen:", rspq.gen);
-		S3("u", "Intr delay:", qtimer_val(adap, &rx[i].rspq));
-		S3("u", "Intr pktcnt:",
-		   adap->sge.counter_val[rx[i].rspq.pktcnt_idx]);
-		R("FL ID:", fl.cntxt_id);
-		R("FL size:", fl.size - 8);
-		R("FL pend:", fl.pend_cred);
-		R("FL avail:", fl.avail);
-		R("FL PIDX:", fl.pidx);
-		R("FL CIDX:", fl.cidx);
-		RL("RxPackets:", stats.pkts);
-		RL("RxImmPkts:", stats.imm);
-		RL("RxNoMem:", stats.nomem);
-		RL("FLAllocErr:", fl.alloc_failed);
-		RL("FLLrgAlcErr:", fl.large_alloc_failed);
-		RL("FLMapErr:", fl.mapping_err);
-		RL("FLLow:", fl.low);
-		RL("FLStarving:", fl.starving);
-
-	} else if (iscsit_idx < iscsit_entries) {
-		const struct sge_ofld_rxq *rx =
-			&adap->sge.iscsitrxq[iscsit_idx * 4];
-		int n = min(4, adap->sge.niscsitq - 4 * iscsit_idx);
-
-		S("QType:", "iSCSIT");
-		R("RspQ ID:", rspq.abs_id);
-		R("RspQ size:", rspq.size);
-		R("RspQE size:", rspq.iqe_len);
-		R("RspQ CIDX:", rspq.cidx);
-		R("RspQ Gen:", rspq.gen);
-		S3("u", "Intr delay:", qtimer_val(adap, &rx[i].rspq));
-		S3("u", "Intr pktcnt:",
-		   adap->sge.counter_val[rx[i].rspq.pktcnt_idx]);
-		R("FL ID:", fl.cntxt_id);
-		R("FL size:", fl.size - 8);
-		R("FL pend:", fl.pend_cred);
-		R("FL avail:", fl.avail);
-		R("FL PIDX:", fl.pidx);
-		R("FL CIDX:", fl.cidx);
-		RL("RxPackets:", stats.pkts);
-		RL("RxImmPkts:", stats.imm);
-		RL("RxNoMem:", stats.nomem);
-		RL("FLAllocErr:", fl.alloc_failed);
-		RL("FLLrgAlcErr:", fl.large_alloc_failed);
-		RL("FLMapErr:", fl.mapping_err);
-		RL("FLLow:", fl.low);
-		RL("FLStarving:", fl.starving);
-
-	} else if (rdma_idx < rdma_entries) {
-		const struct sge_ofld_rxq *rx =
-				&adap->sge.rdmarxq[rdma_idx * 4];
-		int n = min(4, adap->sge.rdmaqs - 4 * rdma_idx);
-
-		S("QType:", "RDMA-CPL");
-		S("Interface:",
-		  rx[i].rspq.netdev ? rx[i].rspq.netdev->name : "N/A");
-		R("RspQ ID:", rspq.abs_id);
-		R("RspQ size:", rspq.size);
-		R("RspQE size:", rspq.iqe_len);
-		R("RspQ CIDX:", rspq.cidx);
-		R("RspQ Gen:", rspq.gen);
-		S3("u", "Intr delay:", qtimer_val(adap, &rx[i].rspq));
-		S3("u", "Intr pktcnt:",
-		   adap->sge.counter_val[rx[i].rspq.pktcnt_idx]);
-		R("FL ID:", fl.cntxt_id);
-		R("FL size:", fl.size - 8);
-		R("FL pend:", fl.pend_cred);
-		R("FL avail:", fl.avail);
-		R("FL PIDX:", fl.pidx);
-		R("FL CIDX:", fl.cidx);
-		RL("RxPackets:", stats.pkts);
-		RL("RxImmPkts:", stats.imm);
-		RL("RxNoMem:", stats.nomem);
-		RL("FLAllocErr:", fl.alloc_failed);
-		RL("FLLrgAlcErr:", fl.large_alloc_failed);
-		RL("FLMapErr:", fl.mapping_err);
-		RL("FLLow:", fl.low);
-		RL("FLStarving:", fl.starving);
-
-	} else if (ciq_idx < ciq_entries) {
-		const struct sge_ofld_rxq *rx = &adap->sge.rdmaciq[ciq_idx * 4];
-		int n = min(4, adap->sge.rdmaciqs - 4 * ciq_idx);
-
-		S("QType:", "RDMA-CIQ");
-		S("Interface:",
-		  rx[i].rspq.netdev ? rx[i].rspq.netdev->name : "N/A");
-		R("RspQ ID:", rspq.abs_id);
-		R("RspQ size:", rspq.size);
-		R("RspQE size:", rspq.iqe_len);
-		R("RspQ CIDX:", rspq.cidx);
-		R("RspQ Gen:", rspq.gen);
-		S3("u", "Intr delay:", qtimer_val(adap, &rx[i].rspq));
-		S3("u", "Intr pktcnt:",
-		   adap->sge.counter_val[rx[i].rspq.pktcnt_idx]);
-		RL("RxAN:", stats.an);
-		RL("RxNoMem:", stats.nomem);
 
 	} else if (ctrl_idx < ctrl_entries) {
 		const struct sge_ctrl_txq *tx = &adap->sge.ctrlq[ctrl_idx * 4];
@@ -2672,10 +2564,7 @@ do { \
 static int sge_queue_entries(const struct adapter *adap)
 {
 	return DIV_ROUND_UP(adap->sge.ethqsets, 4) +
-	       DIV_ROUND_UP(adap->sge.iscsiqsets, 4) +
-	       DIV_ROUND_UP(adap->sge.niscsitq, 4) +
-	       DIV_ROUND_UP(adap->sge.rdmaqs, 4) +
-	       DIV_ROUND_UP(adap->sge.rdmaciqs, 4) +
+	       DIV_ROUND_UP(adap->sge.ofldqsets, 4) +
 	       DIV_ROUND_UP(MAX_CTRL_QUEUES, 4) + 1;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 44cc976..d1ebb84 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -226,11 +226,6 @@ static struct dentry *cxgb4_debugfs_root;
 
 LIST_HEAD(adapter_list);
 DEFINE_MUTEX(uld_mutex);
-/* Adapter list to be accessed from atomic context */
-static LIST_HEAD(adap_rcu_list);
-static DEFINE_SPINLOCK(adap_rcu_lock);
-static struct cxgb4_uld_info ulds[CXGB4_ULD_MAX];
-static const char *const uld_str[] = { "RDMA", "iSCSI", "iSCSIT" };
 
 static void link_report(struct net_device *dev)
 {
@@ -678,56 +673,6 @@ out:
 	return 0;
 }
 
-/* Flush the aggregated lro sessions */
-static void uldrx_flush_handler(struct sge_rspq *q)
-{
-	if (ulds[q->uld].lro_flush)
-		ulds[q->uld].lro_flush(&q->lro_mgr);
-}
-
-/**
- *	uldrx_handler - response queue handler for ULD queues
- *	@q: the response queue that received the packet
- *	@rsp: the response queue descriptor holding the offload message
- *	@gl: the gather list of packet fragments
- *
- *	Deliver an ingress offload packet to a ULD.  All processing is done by
- *	the ULD, we just maintain statistics.
- */
-static int uldrx_handler(struct sge_rspq *q, const __be64 *rsp,
-			 const struct pkt_gl *gl)
-{
-	struct sge_ofld_rxq *rxq = container_of(q, struct sge_ofld_rxq, rspq);
-	int ret;
-
-	/* FW can send CPLs encapsulated in a CPL_FW4_MSG.
-	 */
-	if (((const struct rss_header *)rsp)->opcode == CPL_FW4_MSG &&
-	    ((const struct cpl_fw4_msg *)(rsp + 1))->type == FW_TYPE_RSSCPL)
-		rsp += 2;
-
-	if (q->flush_handler)
-		ret = ulds[q->uld].lro_rx_handler(q->adap->uld_handle[q->uld],
-						  rsp, gl, &q->lro_mgr,
-						  &q->napi);
-	else
-		ret = ulds[q->uld].rx_handler(q->adap->uld_handle[q->uld],
-					      rsp, gl);
-
-	if (ret) {
-		rxq->stats.nomem++;
-		return -1;
-	}
-
-	if (gl == NULL)
-		rxq->stats.imm++;
-	else if (gl == CXGB4_MSG_AN)
-		rxq->stats.an++;
-	else
-		rxq->stats.pkts++;
-	return 0;
-}
-
 static void disable_msi(struct adapter *adapter)
 {
 	if (adapter->flags & USING_MSIX) {
@@ -779,30 +724,12 @@ static void name_msix_vecs(struct adapter *adap)
 			snprintf(adap->msix_info[msi_idx].desc, n, "%s-Rx%d",
 				 d->name, i);
 	}
-
-	/* offload queues */
-	for_each_iscsirxq(&adap->sge, i)
-		snprintf(adap->msix_info[msi_idx++].desc, n, "%s-iscsi%d",
-			 adap->port[0]->name, i);
-
-	for_each_iscsitrxq(&adap->sge, i)
-		snprintf(adap->msix_info[msi_idx++].desc, n, "%s-iSCSIT%d",
-			 adap->port[0]->name, i);
-
-	for_each_rdmarxq(&adap->sge, i)
-		snprintf(adap->msix_info[msi_idx++].desc, n, "%s-rdma%d",
-			 adap->port[0]->name, i);
-
-	for_each_rdmaciq(&adap->sge, i)
-		snprintf(adap->msix_info[msi_idx++].desc, n, "%s-rdma-ciq%d",
-			 adap->port[0]->name, i);
 }
 
 static int request_msix_queue_irqs(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
-	int err, ethqidx, iscsiqidx = 0, rdmaqidx = 0, rdmaciqqidx = 0;
-	int iscsitqidx = 0;
+	int err, ethqidx;
 	int msi_index = 2;
 
 	err = request_irq(adap->msix_info[1].vec, t4_sge_intr_msix, 0,
@@ -819,57 +746,9 @@ static int request_msix_queue_irqs(struct adapter *adap)
 			goto unwind;
 		msi_index++;
 	}
-	for_each_iscsirxq(s, iscsiqidx) {
-		err = request_irq(adap->msix_info[msi_index].vec,
-				  t4_sge_intr_msix, 0,
-				  adap->msix_info[msi_index].desc,
-				  &s->iscsirxq[iscsiqidx].rspq);
-		if (err)
-			goto unwind;
-		msi_index++;
-	}
-	for_each_iscsitrxq(s, iscsitqidx) {
-		err = request_irq(adap->msix_info[msi_index].vec,
-				  t4_sge_intr_msix, 0,
-				  adap->msix_info[msi_index].desc,
-				  &s->iscsitrxq[iscsitqidx].rspq);
-		if (err)
-			goto unwind;
-		msi_index++;
-	}
-	for_each_rdmarxq(s, rdmaqidx) {
-		err = request_irq(adap->msix_info[msi_index].vec,
-				  t4_sge_intr_msix, 0,
-				  adap->msix_info[msi_index].desc,
-				  &s->rdmarxq[rdmaqidx].rspq);
-		if (err)
-			goto unwind;
-		msi_index++;
-	}
-	for_each_rdmaciq(s, rdmaciqqidx) {
-		err = request_irq(adap->msix_info[msi_index].vec,
-				  t4_sge_intr_msix, 0,
-				  adap->msix_info[msi_index].desc,
-				  &s->rdmaciq[rdmaciqqidx].rspq);
-		if (err)
-			goto unwind;
-		msi_index++;
-	}
 	return 0;
 
 unwind:
-	while (--rdmaciqqidx >= 0)
-		free_irq(adap->msix_info[--msi_index].vec,
-			 &s->rdmaciq[rdmaciqqidx].rspq);
-	while (--rdmaqidx >= 0)
-		free_irq(adap->msix_info[--msi_index].vec,
-			 &s->rdmarxq[rdmaqidx].rspq);
-	while (--iscsitqidx >= 0)
-		free_irq(adap->msix_info[--msi_index].vec,
-			 &s->iscsitrxq[iscsitqidx].rspq);
-	while (--iscsiqidx >= 0)
-		free_irq(adap->msix_info[--msi_index].vec,
-			 &s->iscsirxq[iscsiqidx].rspq);
 	while (--ethqidx >= 0)
 		free_irq(adap->msix_info[--msi_index].vec,
 			 &s->ethrxq[ethqidx].rspq);
@@ -885,16 +764,6 @@ static void free_msix_queue_irqs(struct adapter *adap)
 	free_irq(adap->msix_info[1].vec, &s->fw_evtq);
 	for_each_ethrxq(s, i)
 		free_irq(adap->msix_info[msi_index++].vec, &s->ethrxq[i].rspq);
-	for_each_iscsirxq(s, i)
-		free_irq(adap->msix_info[msi_index++].vec,
-			 &s->iscsirxq[i].rspq);
-	for_each_iscsitrxq(s, i)
-		free_irq(adap->msix_info[msi_index++].vec,
-			 &s->iscsitrxq[i].rspq);
-	for_each_rdmarxq(s, i)
-		free_irq(adap->msix_info[msi_index++].vec, &s->rdmarxq[i].rspq);
-	for_each_rdmaciq(s, i)
-		free_irq(adap->msix_info[msi_index++].vec, &s->rdmaciq[i].rspq);
 }
 
 /**
@@ -1033,42 +902,11 @@ static void enable_rx(struct adapter *adap)
 	}
 }
 
-static int alloc_ofld_rxqs(struct adapter *adap, struct sge_ofld_rxq *q,
-			   unsigned int nq, unsigned int per_chan, int msi_idx,
-			   u16 *ids, bool lro)
-{
-	int i, err;
-
-	for (i = 0; i < nq; i++, q++) {
-		if (msi_idx > 0)
-			msi_idx++;
-		err = t4_sge_alloc_rxq(adap, &q->rspq, false,
-				       adap->port[i / per_chan],
-				       msi_idx, q->fl.size ? &q->fl : NULL,
-				       uldrx_handler,
-				       lro ? uldrx_flush_handler : NULL,
-				       0);
-		if (err)
-			return err;
-		memset(&q->stats, 0, sizeof(q->stats));
-		if (ids)
-			ids[i] = q->rspq.abs_id;
-	}
-	return 0;
-}
 
-/**
- *	setup_sge_queues - configure SGE Tx/Rx/response queues
- *	@adap: the adapter
- *
- *	Determines how many sets of SGE queues to use and initializes them.
- *	We support multiple queue sets per port if we have MSI-X, otherwise
- *	just one queue set per port.
- */
-static int setup_sge_queues(struct adapter *adap)
+static int setup_fw_sge_queues(struct adapter *adap)
 {
-	int err, i, j;
 	struct sge *s = &adap->sge;
+	int err = 0;
 
 	bitmap_zero(s->starving_fl, s->egr_sz);
 	bitmap_zero(s->txq_maperr, s->egr_sz);
@@ -1083,25 +921,27 @@ static int setup_sge_queues(struct adapter *adap)
 		adap->msi_idx = -((int)s->intrq.abs_id + 1);
 	}
 
-	/* NOTE: If you add/delete any Ingress/Egress Queue allocations in here,
-	 * don't forget to update the following which need to be
-	 * synchronized to and changes here.
-	 *
-	 * 1. The calculations of MAX_INGQ in cxgb4.h.
-	 *
-	 * 2. Update enable_msix/name_msix_vecs/request_msix_queue_irqs
-	 *    to accommodate any new/deleted Ingress Queues
-	 *    which need MSI-X Vectors.
-	 *
-	 * 3. Update sge_qinfo_show() to include information on the
-	 *    new/deleted queues.
-	 */
 	err = t4_sge_alloc_rxq(adap, &s->fw_evtq, true, adap->port[0],
 			       adap->msi_idx, NULL, fwevtq_handler, NULL, -1);
-	if (err) {
-freeout:	t4_free_sge_resources(adap);
-		return err;
-	}
+	if (err)
+		t4_free_sge_resources(adap);
+	return err;
+}
+
+/**
+ *	setup_sge_queues - configure SGE Tx/Rx/response queues
+ *	@adap: the adapter
+ *
+ *	Determines how many sets of SGE queues to use and initializes them.
+ *	We support multiple queue sets per port if we have MSI-X, otherwise
+ *	just one queue set per port.
+ */
+static int setup_sge_queues(struct adapter *adap)
+{
+	int err, i, j;
+	struct sge *s = &adap->sge;
+	struct sge_uld_rxq_info *rxq_info = s->uld_rxq_info[CXGB4_ULD_RDMA];
+	unsigned int cmplqid = 0;
 
 	for_each_port(adap, i) {
 		struct net_device *dev = adap->port[i];
@@ -1132,8 +972,8 @@ freeout:	t4_free_sge_resources(adap);
 		}
 	}
 
-	j = s->iscsiqsets / adap->params.nports; /* iscsi queues per channel */
-	for_each_iscsirxq(s, i) {
+	j = s->ofldqsets / adap->params.nports; /* iscsi queues per channel */
+	for_each_ofldtxq(s, i) {
 		err = t4_sge_alloc_ofld_txq(adap, &s->ofldtxq[i],
 					    adap->port[i / j],
 					    s->fw_evtq.cntxt_id);
@@ -1141,30 +981,15 @@ freeout:	t4_free_sge_resources(adap);
 			goto freeout;
 	}
 
-#define ALLOC_OFLD_RXQS(firstq, nq, per_chan, ids, lro) do { \
-	err = alloc_ofld_rxqs(adap, firstq, nq, per_chan, adap->msi_idx, ids, lro); \
-	if (err) \
-		goto freeout; \
-	if (adap->msi_idx > 0) \
-		adap->msi_idx += nq; \
-} while (0)
-
-	ALLOC_OFLD_RXQS(s->iscsirxq, s->iscsiqsets, j, s->iscsi_rxq, false);
-	ALLOC_OFLD_RXQS(s->iscsitrxq, s->niscsitq, j, s->iscsit_rxq, true);
-	ALLOC_OFLD_RXQS(s->rdmarxq, s->rdmaqs, 1, s->rdma_rxq, false);
-	j = s->rdmaciqs / adap->params.nports; /* rdmaq queues per channel */
-	ALLOC_OFLD_RXQS(s->rdmaciq, s->rdmaciqs, j, s->rdma_ciq, false);
-
-#undef ALLOC_OFLD_RXQS
-
 	for_each_port(adap, i) {
-		/*
-		 * Note that ->rdmarxq[i].rspq.cntxt_id below is 0 if we don't
+		/* Note that cmplqid below is 0 if we don't
 		 * have RDMA queues, and that's the right value.
 		 */
+		if (rxq_info)
+			cmplqid	= rxq_info->uldrxq[i].rspq.cntxt_id;
+
 		err = t4_sge_alloc_ctrl_txq(adap, &s->ctrlq[i], adap->port[i],
-					    s->fw_evtq.cntxt_id,
-					    s->rdmarxq[i].rspq.cntxt_id);
+					    s->fw_evtq.cntxt_id, cmplqid);
 		if (err)
 			goto freeout;
 	}
@@ -1175,6 +1000,9 @@ freeout:	t4_free_sge_resources(adap);
 		     RSSCONTROL_V(netdev2pinfo(adap->port[0])->tx_chan) |
 		     QUEUENUMBER_V(s->ethrxq[0].rspq.abs_id));
 	return 0;
+freeout:
+	t4_free_sge_resources(adap);
+	return err;
 }
 
 /*
@@ -2317,7 +2145,7 @@ static void disable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		disable_txq_db(&adap->sge.ethtxq[i].q);
-	for_each_iscsirxq(&adap->sge, i)
+	for_each_ofldtxq(&adap->sge, i)
 		disable_txq_db(&adap->sge.ofldtxq[i].q);
 	for_each_port(adap, i)
 		disable_txq_db(&adap->sge.ctrlq[i].q);
@@ -2329,7 +2157,7 @@ static void enable_dbs(struct adapter *adap)
 
 	for_each_ethrxq(&adap->sge, i)
 		enable_txq_db(adap, &adap->sge.ethtxq[i].q);
-	for_each_iscsirxq(&adap->sge, i)
+	for_each_ofldtxq(&adap->sge, i)
 		enable_txq_db(adap, &adap->sge.ofldtxq[i].q);
 	for_each_port(adap, i)
 		enable_txq_db(adap, &adap->sge.ctrlq[i].q);
@@ -2337,9 +2165,10 @@ static void enable_dbs(struct adapter *adap)
 
 static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
 {
-	if (adap->uld_handle[CXGB4_ULD_RDMA])
-		ulds[CXGB4_ULD_RDMA].control(adap->uld_handle[CXGB4_ULD_RDMA],
-				cmd);
+	enum cxgb4_uld type = CXGB4_ULD_RDMA;
+
+	if (adap->uld && adap->uld[type].handle)
+		adap->uld[type].control(adap->uld[type].handle, cmd);
 }
 
 static void process_db_full(struct work_struct *work)
@@ -2393,13 +2222,14 @@ out:
 	if (ret)
 		CH_WARN(adap, "DB drop recovery failed.\n");
 }
+
 static void recover_all_queues(struct adapter *adap)
 {
 	int i;
 
 	for_each_ethrxq(&adap->sge, i)
 		sync_txq_pidx(adap, &adap->sge.ethtxq[i].q);
-	for_each_iscsirxq(&adap->sge, i)
+	for_each_ofldtxq(&adap->sge, i)
 		sync_txq_pidx(adap, &adap->sge.ofldtxq[i].q);
 	for_each_port(adap, i)
 		sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
@@ -2464,94 +2294,12 @@ void t4_db_dropped(struct adapter *adap)
 	queue_work(adap->workq, &adap->db_drop_task);
 }
 
-static void uld_attach(struct adapter *adap, unsigned int uld)
-{
-	void *handle;
-	struct cxgb4_lld_info lli;
-	unsigned short i;
-
-	lli.pdev = adap->pdev;
-	lli.pf = adap->pf;
-	lli.l2t = adap->l2t;
-	lli.tids = &adap->tids;
-	lli.ports = adap->port;
-	lli.vr = &adap->vres;
-	lli.mtus = adap->params.mtus;
-	if (uld == CXGB4_ULD_RDMA) {
-		lli.rxq_ids = adap->sge.rdma_rxq;
-		lli.ciq_ids = adap->sge.rdma_ciq;
-		lli.nrxq = adap->sge.rdmaqs;
-		lli.nciq = adap->sge.rdmaciqs;
-	} else if (uld == CXGB4_ULD_ISCSI) {
-		lli.rxq_ids = adap->sge.iscsi_rxq;
-		lli.nrxq = adap->sge.iscsiqsets;
-	} else if (uld == CXGB4_ULD_ISCSIT) {
-		lli.rxq_ids = adap->sge.iscsit_rxq;
-		lli.nrxq = adap->sge.niscsitq;
-	}
-	lli.ntxq = adap->sge.iscsiqsets;
-	lli.nchan = adap->params.nports;
-	lli.nports = adap->params.nports;
-	lli.wr_cred = adap->params.ofldq_wr_cred;
-	lli.adapter_type = adap->params.chip;
-	lli.iscsi_iolen = MAXRXDATA_G(t4_read_reg(adap, TP_PARA_REG2_A));
-	lli.iscsi_tagmask = t4_read_reg(adap, ULP_RX_ISCSI_TAGMASK_A);
-	lli.iscsi_pgsz_order = t4_read_reg(adap, ULP_RX_ISCSI_PSZ_A);
-	lli.iscsi_llimit = t4_read_reg(adap, ULP_RX_ISCSI_LLIMIT_A);
-	lli.iscsi_ppm = &adap->iscsi_ppm;
-	lli.cclk_ps = 1000000000 / adap->params.vpd.cclk;
-	lli.udb_density = 1 << adap->params.sge.eq_qpp;
-	lli.ucq_density = 1 << adap->params.sge.iq_qpp;
-	lli.filt_mode = adap->params.tp.vlan_pri_map;
-	/* MODQ_REQ_MAP sets queues 0-3 to chan 0-3 */
-	for (i = 0; i < NCHAN; i++)
-		lli.tx_modq[i] = i;
-	lli.gts_reg = adap->regs + MYPF_REG(SGE_PF_GTS_A);
-	lli.db_reg = adap->regs + MYPF_REG(SGE_PF_KDOORBELL_A);
-	lli.fw_vers = adap->params.fw_vers;
-	lli.dbfifo_int_thresh = dbfifo_int_thresh;
-	lli.sge_ingpadboundary = adap->sge.fl_align;
-	lli.sge_egrstatuspagesize = adap->sge.stat_len;
-	lli.sge_pktshift = adap->sge.pktshift;
-	lli.enable_fw_ofld_conn = adap->flags & FW_OFLD_CONN;
-	lli.max_ordird_qp = adap->params.max_ordird_qp;
-	lli.max_ird_adapter = adap->params.max_ird_adapter;
-	lli.ulptx_memwrite_dsgl = adap->params.ulptx_memwrite_dsgl;
-	lli.nodeid = dev_to_node(adap->pdev_dev);
-
-	handle = ulds[uld].add(&lli);
-	if (IS_ERR(handle)) {
-		dev_warn(adap->pdev_dev,
-			 "could not attach to the %s driver, error %ld\n",
-			 uld_str[uld], PTR_ERR(handle));
-		return;
-	}
-
-	adap->uld_handle[uld] = handle;
-
+void t4_register_netevent_notifier(void)
+{
 	if (!netevent_registered) {
 		register_netevent_notifier(&cxgb4_netevent_nb);
 		netevent_registered = true;
 	}
-
-	if (adap->flags & FULL_INIT_DONE)
-		ulds[uld].state_change(handle, CXGB4_STATE_UP);
-}
-
-static void attach_ulds(struct adapter *adap)
-{
-	unsigned int i;
-
-	spin_lock(&adap_rcu_lock);
-	list_add_tail_rcu(&adap->rcu_node, &adap_rcu_list);
-	spin_unlock(&adap_rcu_lock);
-
-	mutex_lock(&uld_mutex);
-	list_add_tail(&adap->list_node, &adapter_list);
-	for (i = 0; i < CXGB4_ULD_MAX; i++)
-		if (ulds[i].add)
-			uld_attach(adap, i);
-	mutex_unlock(&uld_mutex);
 }
 
 static void detach_ulds(struct adapter *adap)
@@ -2561,12 +2309,6 @@ static void detach_ulds(struct adapter *adap)
 	mutex_lock(&uld_mutex);
 	list_del(&adap->list_node);
 	for (i = 0; i < CXGB4_ULD_MAX; i++)
-		if (adap->uld_handle[i]) {
-			ulds[i].state_change(adap->uld_handle[i],
-					     CXGB4_STATE_DETACH);
-			adap->uld_handle[i] = NULL;
-		}
-	for (i = 0; i < CXGB4_PCI_ULD_MAX; i++)
 		if (adap->uld && adap->uld[i].handle) {
 			adap->uld[i].state_change(adap->uld[i].handle,
 					     CXGB4_STATE_DETACH);
@@ -2577,10 +2319,6 @@ static void detach_ulds(struct adapter *adap)
 		netevent_registered = false;
 	}
 	mutex_unlock(&uld_mutex);
-
-	spin_lock(&adap_rcu_lock);
-	list_del_rcu(&adap->rcu_node);
-	spin_unlock(&adap_rcu_lock);
 }
 
 static void notify_ulds(struct adapter *adap, enum cxgb4_state new_state)
@@ -2589,65 +2327,12 @@ static void notify_ulds(struct adapter *adap, enum cxgb4_state new_state)
 
 	mutex_lock(&uld_mutex);
 	for (i = 0; i < CXGB4_ULD_MAX; i++)
-		if (adap->uld_handle[i])
-			ulds[i].state_change(adap->uld_handle[i], new_state);
-	for (i = 0; i < CXGB4_PCI_ULD_MAX; i++)
 		if (adap->uld && adap->uld[i].handle)
 			adap->uld[i].state_change(adap->uld[i].handle,
 						  new_state);
 	mutex_unlock(&uld_mutex);
 }
 
-/**
- *	cxgb4_register_uld - register an upper-layer driver
- *	@type: the ULD type
- *	@p: the ULD methods
- *
- *	Registers an upper-layer driver with this driver and notifies the ULD
- *	about any presently available devices that support its type.  Returns
- *	%-EBUSY if a ULD of the same type is already registered.
- */
-int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p)
-{
-	int ret = 0;
-	struct adapter *adap;
-
-	if (type >= CXGB4_ULD_MAX)
-		return -EINVAL;
-	mutex_lock(&uld_mutex);
-	if (ulds[type].add) {
-		ret = -EBUSY;
-		goto out;
-	}
-	ulds[type] = *p;
-	list_for_each_entry(adap, &adapter_list, list_node)
-		uld_attach(adap, type);
-out:	mutex_unlock(&uld_mutex);
-	return ret;
-}
-EXPORT_SYMBOL(cxgb4_register_uld);
-
-/**
- *	cxgb4_unregister_uld - unregister an upper-layer driver
- *	@type: the ULD type
- *
- *	Unregisters an existing upper-layer driver.
- */
-int cxgb4_unregister_uld(enum cxgb4_uld type)
-{
-	struct adapter *adap;
-
-	if (type >= CXGB4_ULD_MAX)
-		return -EINVAL;
-	mutex_lock(&uld_mutex);
-	list_for_each_entry(adap, &adapter_list, list_node)
-		adap->uld_handle[type] = NULL;
-	ulds[type].add = NULL;
-	mutex_unlock(&uld_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(cxgb4_unregister_uld);
-
 #if IS_ENABLED(CONFIG_IPV6)
 static int cxgb4_inet6addr_handler(struct notifier_block *this,
 				   unsigned long event, void *data)
@@ -2752,7 +2437,6 @@ static int cxgb_up(struct adapter *adap)
 				  adap->msix_info[0].desc, adap);
 		if (err)
 			goto irq_err;
-
 		err = request_msix_queue_irqs(adap);
 		if (err) {
 			free_irq(adap->msix_info[0].vec, adap);
@@ -4262,6 +3946,7 @@ static int adap_init0(struct adapter *adap)
 		adap->params.ofldq_wr_cred = val[5];
 
 		adap->params.offload = 1;
+		adap->num_ofld_uld += 1;
 	}
 	if (caps_cmd.rdmacaps) {
 		params[0] = FW_PARAM_PFVF(STAG_START);
@@ -4314,6 +3999,7 @@ static int adap_init0(struct adapter *adap)
 			 "max_ordird_qp %d max_ird_adapter %d\n",
 			 adap->params.max_ordird_qp,
 			 adap->params.max_ird_adapter);
+		adap->num_ofld_uld += 2;
 	}
 	if (caps_cmd.iscsicaps) {
 		params[0] = FW_PARAM_PFVF(ISCSI_START);
@@ -4324,6 +4010,8 @@ static int adap_init0(struct adapter *adap)
 			goto bye;
 		adap->vres.iscsi.start = val[0];
 		adap->vres.iscsi.size = val[1] - val[0] + 1;
+		/* LIO target and cxgb4i initiaitor */
+		adap->num_ofld_uld += 2;
 	}
 	if (caps_cmd.cryptocaps) {
 		/* Should query params here...TODO */
@@ -4523,14 +4211,14 @@ static void cfg_queues(struct adapter *adap)
 #ifndef CONFIG_CHELSIO_T4_DCB
 	int q10g = 0;
 #endif
-	int ciq_size;
 
 	/* Reduce memory usage in kdump environment, disable all offload.
 	 */
 	if (is_kdump_kernel()) {
 		adap->params.offload = 0;
 		adap->params.crypto = 0;
-	} else if (adap->num_uld && uld_mem_alloc(adap)) {
+	} else if (is_uld(adap) && t4_uld_mem_alloc(adap)) {
+		adap->params.offload = 0;
 		adap->params.crypto = 0;
 	}
 
@@ -4576,33 +4264,18 @@ static void cfg_queues(struct adapter *adap)
 	s->ethqsets = qidx;
 	s->max_ethqsets = qidx;   /* MSI-X may lower it later */
 
-	if (is_offload(adap)) {
+	if (is_uld(adap)) {
 		/*
 		 * For offload we use 1 queue/channel if all ports are up to 1G,
 		 * otherwise we divide all available queues amongst the channels
 		 * capped by the number of available cores.
 		 */
 		if (n10g) {
-			i = min_t(int, ARRAY_SIZE(s->iscsirxq),
-				  num_online_cpus());
-			s->iscsiqsets = roundup(i, adap->params.nports);
-		} else
-			s->iscsiqsets = adap->params.nports;
-		/* For RDMA one Rx queue per channel suffices */
-		s->rdmaqs = adap->params.nports;
-		/* Try and allow at least 1 CIQ per cpu rounding down
-		 * to the number of ports, with a minimum of 1 per port.
-		 * A 2 port card in a 6 cpu system: 6 CIQs, 3 / port.
-		 * A 4 port card in a 6 cpu system: 4 CIQs, 1 / port.
-		 * A 4 port card in a 2 cpu system: 4 CIQs, 1 / port.
-		 */
-		s->rdmaciqs = min_t(int, MAX_RDMA_CIQS, num_online_cpus());
-		s->rdmaciqs = (s->rdmaciqs / adap->params.nports) *
-				adap->params.nports;
-		s->rdmaciqs = max_t(int, s->rdmaciqs, adap->params.nports);
-
-		if (!is_t4(adap->params.chip))
-			s->niscsitq = s->iscsiqsets;
+			i = num_online_cpus();
+			s->ofldqsets = roundup(i, adap->params.nports);
+		} else {
+			s->ofldqsets = adap->params.nports;
+		}
 	}
 
 	for (i = 0; i < ARRAY_SIZE(s->ethrxq); i++) {
@@ -4621,47 +4294,8 @@ static void cfg_queues(struct adapter *adap)
 	for (i = 0; i < ARRAY_SIZE(s->ofldtxq); i++)
 		s->ofldtxq[i].q.size = 1024;
 
-	for (i = 0; i < ARRAY_SIZE(s->iscsirxq); i++) {
-		struct sge_ofld_rxq *r = &s->iscsirxq[i];
-
-		init_rspq(adap, &r->rspq, 5, 1, 1024, 64);
-		r->rspq.uld = CXGB4_ULD_ISCSI;
-		r->fl.size = 72;
-	}
-
-	if (!is_t4(adap->params.chip)) {
-		for (i = 0; i < ARRAY_SIZE(s->iscsitrxq); i++) {
-			struct sge_ofld_rxq *r = &s->iscsitrxq[i];
-
-			init_rspq(adap, &r->rspq, 5, 1, 1024, 64);
-			r->rspq.uld = CXGB4_ULD_ISCSIT;
-			r->fl.size = 72;
-		}
-	}
-
-	for (i = 0; i < ARRAY_SIZE(s->rdmarxq); i++) {
-		struct sge_ofld_rxq *r = &s->rdmarxq[i];
-
-		init_rspq(adap, &r->rspq, 5, 1, 511, 64);
-		r->rspq.uld = CXGB4_ULD_RDMA;
-		r->fl.size = 72;
-	}
-
-	ciq_size = 64 + adap->vres.cq.size + adap->tids.nftids;
-	if (ciq_size > SGE_MAX_IQ_SIZE) {
-		CH_WARN(adap, "CIQ size too small for available IQs\n");
-		ciq_size = SGE_MAX_IQ_SIZE;
-	}
-
-	for (i = 0; i < ARRAY_SIZE(s->rdmaciq); i++) {
-		struct sge_ofld_rxq *r = &s->rdmaciq[i];
-
-		init_rspq(adap, &r->rspq, 5, 1, ciq_size, 64);
-		r->rspq.uld = CXGB4_ULD_RDMA;
-	}
-
 	init_rspq(adap, &s->fw_evtq, 0, 1, 1024, 64);
-	init_rspq(adap, &s->intrq, 0, 1, 2 * MAX_INGQ, 64);
+	init_rspq(adap, &s->intrq, 0, 1, 512, 64);
 }
 
 /*
@@ -4695,7 +4329,15 @@ static void reduce_ethqs(struct adapter *adap, int n)
 static int get_msix_info(struct adapter *adap)
 {
 	struct uld_msix_info *msix_info;
-	int max_ingq = (MAX_OFLD_QSETS * adap->num_uld);
+	unsigned int max_ingq = 0;
+
+	if (is_offload(adap))
+		max_ingq += MAX_OFLD_QSETS * adap->num_ofld_uld;
+	if (is_pci_uld(adap))
+		max_ingq += MAX_OFLD_QSETS * adap->num_uld;
+
+	if (!max_ingq)
+		goto out;
 
 	msix_info = kcalloc(max_ingq, sizeof(*msix_info), GFP_KERNEL);
 	if (!msix_info)
@@ -4709,12 +4351,13 @@ static int get_msix_info(struct adapter *adap)
 	}
 	spin_lock_init(&adap->msix_bmap_ulds.lock);
 	adap->msix_info_ulds = msix_info;
+out:
 	return 0;
 }
 
 static void free_msix_info(struct adapter *adap)
 {
-	if (!adap->num_uld)
+	if (!(adap->num_uld && adap->num_ofld_uld))
 		return;
 
 	kfree(adap->msix_info_ulds);
@@ -4733,32 +4376,32 @@ static int enable_msix(struct adapter *adap)
 	struct msix_entry *entries;
 	int max_ingq = MAX_INGQ;
 
-	max_ingq += (MAX_OFLD_QSETS * adap->num_uld);
+	if (is_pci_uld(adap))
+		max_ingq += (MAX_OFLD_QSETS * adap->num_uld);
+	if (is_offload(adap))
+		max_ingq += (MAX_OFLD_QSETS * adap->num_ofld_uld);
 	entries = kmalloc(sizeof(*entries) * (max_ingq + 1),
 			  GFP_KERNEL);
 	if (!entries)
 		return -ENOMEM;
 
 	/* map for msix */
-	if (is_pci_uld(adap) && get_msix_info(adap))
+	if (get_msix_info(adap)) {
+		adap->params.offload = 0;
 		adap->params.crypto = 0;
+	}
 
 	for (i = 0; i < max_ingq + 1; ++i)
 		entries[i].entry = i;
 
 	want = s->max_ethqsets + EXTRA_VECS;
 	if (is_offload(adap)) {
-		want += s->rdmaqs + s->rdmaciqs + s->iscsiqsets	+
-			s->niscsitq;
-		/* need nchan for each possible ULD */
-		if (is_t4(adap->params.chip))
-			ofld_need = 3 * nchan;
-		else
-			ofld_need = 4 * nchan;
+		want += adap->num_ofld_uld * s->ofldqsets;
+		ofld_need = adap->num_ofld_uld * nchan;
 	}
 	if (is_pci_uld(adap)) {
-		want += netif_get_num_default_rss_queues() * nchan;
-		uld_need = nchan;
+		want += adap->num_uld * s->ofldqsets;
+		uld_need = adap->num_uld * nchan;
 	}
 #ifdef CONFIG_CHELSIO_T4_DCB
 	/* For Data Center Bridging we need 8 Ethernet TX Priority Queues for
@@ -4786,43 +4429,25 @@ static int enable_msix(struct adapter *adap)
 		if (i < s->ethqsets)
 			reduce_ethqs(adap, i);
 	}
-	if (is_pci_uld(adap)) {
+	if (is_uld(adap)) {
 		if (allocated < want)
 			s->nqs_per_uld = nchan;
 		else
-			s->nqs_per_uld = netif_get_num_default_rss_queues() *
-					nchan;
+			s->nqs_per_uld = s->ofldqsets;
 	}
 
-	if (is_offload(adap)) {
-		if (allocated < want) {
-			s->rdmaqs = nchan;
-			s->rdmaciqs = nchan;
-
-			if (!is_t4(adap->params.chip))
-				s->niscsitq = nchan;
-		}
-
-		/* leftovers go to OFLD */
-		i = allocated - EXTRA_VECS - s->max_ethqsets -
-			s->rdmaqs - s->rdmaciqs - s->niscsitq;
-		if (is_pci_uld(adap))
-			i -= s->nqs_per_uld * adap->num_uld;
-		s->iscsiqsets = (i / nchan) * nchan;  /* round down */
-
-	}
-
-	for (i = 0; i < (allocated - (s->nqs_per_uld * adap->num_uld)); ++i)
+	for (i = 0; i < (s->max_ethqsets + EXTRA_VECS); ++i)
 		adap->msix_info[i].vec = entries[i].vector;
-	if (is_pci_uld(adap)) {
-		for (j = 0 ; i < allocated; ++i, j++)
+	if (is_uld(adap)) {
+		for (j = 0 ; i < allocated; ++i, j++) {
 			adap->msix_info_ulds[j].vec = entries[i].vector;
+			adap->msix_info_ulds[j].idx = i;
+		}
 		adap->msix_bmap_ulds.mapsize = j;
 	}
 	dev_info(adap->pdev_dev, "%d MSI-X vectors allocated, "
-		 "nic %d iscsi %d rdma cpl %d rdma ciq %d uld %d\n",
-		 allocated, s->max_ethqsets, s->iscsiqsets, s->rdmaqs,
-		 s->rdmaciqs, s->nqs_per_uld);
+		 "nic %d per uld %d\n",
+		 allocated, s->max_ethqsets, s->nqs_per_uld);
 
 	kfree(entries);
 	return 0;
@@ -5535,10 +5160,14 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* PCIe EEH recovery on powerpc platforms needs fundamental reset */
 	pdev->needs_freset = 1;
 
-	if (is_offload(adapter))
-		attach_ulds(adapter);
+	if (is_uld(adapter)) {
+		mutex_lock(&uld_mutex);
+		list_add_tail(&adapter->list_node, &adapter_list);
+		mutex_unlock(&uld_mutex);
+	}
 
 	print_adapter_info(adapter);
+	setup_fw_sge_queues(adapter);
 	return 0;
 
 sriov:
@@ -5593,8 +5222,8 @@ sriov:
 	free_some_resources(adapter);
 	if (adapter->flags & USING_MSIX)
 		free_msix_info(adapter);
-	if (adapter->num_uld)
-		uld_mem_free(adapter);
+	if (adapter->num_uld || adapter->num_ofld_uld)
+		t4_uld_mem_free(adapter);
  out_unmap_bar:
 	if (!is_t4(adapter->params.chip))
 		iounmap(adapter->bar2);
@@ -5631,7 +5260,7 @@ static void remove_one(struct pci_dev *pdev)
 		 */
 		destroy_workqueue(adapter->workq);
 
-		if (is_offload(adapter))
+		if (is_uld(adapter))
 			detach_ulds(adapter);
 
 		disable_interrupts(adapter);
@@ -5658,8 +5287,8 @@ static void remove_one(struct pci_dev *pdev)
 
 		if (adapter->flags & USING_MSIX)
 			free_msix_info(adapter);
-		if (adapter->num_uld)
-			uld_mem_free(adapter);
+		if (adapter->num_uld || adapter->num_ofld_uld)
+			t4_uld_mem_free(adapter);
 		free_some_resources(adapter);
 #if IS_ENABLED(CONFIG_IPV6)
 		t4_cleanup_clip_tbl(adapter);
@@ -5690,12 +5319,58 @@ static void remove_one(struct pci_dev *pdev)
 #endif
 }
 
+/* "Shutdown" quiesces the device, stopping Ingress Packet and Interrupt
+ * delivery.  This is essentially a stripped down version of the PCI remove()
+ * function where we do the minimal amount of work necessary to shutdown any
+ * further activity.
+ */
+static void shutdown_one(struct pci_dev *pdev)
+{
+	struct adapter *adapter = pci_get_drvdata(pdev);
+
+	/* As with remove_one() above (see extended comment), we only want do
+	 * do cleanup on PCI Devices which went all the way through init_one()
+	 * ...
+	 */
+	if (!adapter) {
+		pci_release_regions(pdev);
+		return;
+	}
+
+	if (adapter->pf == 4) {
+		int i;
+
+		for_each_port(adapter, i)
+			if (adapter->port[i]->reg_state == NETREG_REGISTERED)
+				cxgb_close(adapter->port[i]);
+
+		t4_uld_clean_up(adapter);
+		disable_interrupts(adapter);
+		disable_msi(adapter);
+
+		t4_sge_stop(adapter);
+		if (adapter->flags & FW_OK)
+			t4_fw_bye(adapter, adapter->mbox);
+	}
+#ifdef CONFIG_PCI_IOV
+	else {
+		if (adapter->port[0])
+			unregister_netdev(adapter->port[0]);
+		iounmap(adapter->regs);
+		kfree(adapter->vfinfo);
+		kfree(adapter);
+		pci_disable_sriov(pdev);
+		pci_release_regions(pdev);
+	}
+#endif
+}
+
 static struct pci_driver cxgb4_driver = {
 	.name     = KBUILD_MODNAME,
 	.id_table = cxgb4_pci_tbl,
 	.probe    = init_one,
 	.remove   = remove_one,
-	.shutdown = remove_one,
+	.shutdown = shutdown_one,
 #ifdef CONFIG_PCI_IOV
 	.sriov_configure = cxgb4_iov_configure,
 #endif
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 5d402ba..fc04e3b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -82,6 +82,24 @@ static void free_msix_idx_in_bmap(struct adapter *adap, unsigned int msix_idx)
 	spin_unlock_irqrestore(&bmap->lock, flags);
 }
 
+/* Flush the aggregated lro sessions */
+static void uldrx_flush_handler(struct sge_rspq *q)
+{
+	struct adapter *adap = q->adap;
+
+	if (adap->uld[q->uld].lro_flush)
+		adap->uld[q->uld].lro_flush(&q->lro_mgr);
+}
+
+/**
+ *	uldrx_handler - response queue handler for ULD queues
+ *	@q: the response queue that received the packet
+ *	@rsp: the response queue descriptor holding the offload message
+ *	@gl: the gather list of packet fragments
+ *
+ *	Deliver an ingress offload packet to a ULD.  All processing is done by
+ *	the ULD, we just maintain statistics.
+ */
 static int uldrx_handler(struct sge_rspq *q, const __be64 *rsp,
 			 const struct pkt_gl *gl)
 {
@@ -124,8 +142,8 @@ static int alloc_uld_rxqs(struct adapter *adap,
 	struct sge_ofld_rxq *q = rxq_info->uldrxq + offset;
 	unsigned short *ids = rxq_info->rspq_id + offset;
 	unsigned int per_chan = nq / adap->params.nports;
-	unsigned int msi_idx, bmap_idx;
-	int i, err;
+	unsigned int bmap_idx = 0;
+	int i, err, msi_idx;
 
 	if (adap->flags & USING_MSIX)
 		msi_idx = 1;
@@ -135,14 +153,14 @@ static int alloc_uld_rxqs(struct adapter *adap,
 	for (i = 0; i < nq; i++, q++) {
 		if (msi_idx >= 0) {
 			bmap_idx = get_msix_idx_from_bmap(adap);
-			adap->msi_idx++;
+			msi_idx = adap->msix_info_ulds[bmap_idx].idx;
 		}
 		err = t4_sge_alloc_rxq(adap, &q->rspq, false,
 				       adap->port[i / per_chan],
-				       adap->msi_idx,
+				       msi_idx,
 				       q->fl.size ? &q->fl : NULL,
 				       uldrx_handler,
-				       NULL,
+				       lro ? uldrx_flush_handler : NULL,
 				       0);
 		if (err)
 			goto freeout;
@@ -159,7 +177,6 @@ freeout:
 		if (q->rspq.desc)
 			free_rspq_fl(adap, &q->rspq,
 				     q->fl.size ? &q->fl : NULL);
-		adap->msi_idx--;
 	}
 
 	/* We need to free rxq also in case of ciq allocation failure */
@@ -169,7 +186,6 @@ freeout:
 			if (q->rspq.desc)
 				free_rspq_fl(adap, &q->rspq,
 					     q->fl.size ? &q->fl : NULL);
-			adap->msi_idx--;
 		}
 	}
 	return err;
@@ -178,17 +194,38 @@ freeout:
 int setup_sge_queues_uld(struct adapter *adap, unsigned int uld_type, bool lro)
 {
 	struct sge_uld_rxq_info *rxq_info = adap->sge.uld_rxq_info[uld_type];
+	int i, ret = 0;
 
 	if (adap->flags & USING_MSIX) {
-		rxq_info->msix_tbl = kzalloc(rxq_info->nrxq + rxq_info->nciq,
+		rxq_info->msix_tbl = kcalloc((rxq_info->nrxq + rxq_info->nciq),
+					     sizeof(unsigned short),
 					     GFP_KERNEL);
 		if (!rxq_info->msix_tbl)
 			return -ENOMEM;
 	}
 
-	return !(!alloc_uld_rxqs(adap, rxq_info, rxq_info->nrxq, 0, lro) &&
+	ret = !(!alloc_uld_rxqs(adap, rxq_info, rxq_info->nrxq, 0, lro) &&
 		 !alloc_uld_rxqs(adap, rxq_info, rxq_info->nciq,
 				 rxq_info->nrxq, lro));
+
+	/* Tell uP to route control queue completions to rdma rspq */
+	if (adap->flags & FULL_INIT_DONE &&
+	    !ret && uld_type == CXGB4_ULD_RDMA) {
+		struct sge *s = &adap->sge;
+		unsigned int cmplqid;
+		u32 param, cmdop;
+
+		cmdop = FW_PARAMS_PARAM_DMAQ_EQ_CMPLIQID_CTRL;
+		for_each_port(adap, i) {
+			cmplqid = rxq_info->uldrxq[i].rspq.cntxt_id;
+			param = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DMAQ) |
+				 FW_PARAMS_PARAM_X_V(cmdop) |
+				 FW_PARAMS_PARAM_YZ_V(s->ctrlq[i].q.cntxt_id));
+			ret = t4_set_params(adap, adap->mbox, adap->pf,
+					    0, 1, &param, &cmplqid);
+		}
+	}
+	return ret;
 }
 
 static void t4_free_uld_rxqs(struct adapter *adap, int n,
@@ -198,7 +235,6 @@ static void t4_free_uld_rxqs(struct adapter *adap, int n,
 		if (q->rspq.desc)
 			free_rspq_fl(adap, &q->rspq,
 				     q->fl.size ? &q->fl : NULL);
-		adap->msi_idx--;
 	}
 }
 
@@ -206,6 +242,21 @@ void free_sge_queues_uld(struct adapter *adap, unsigned int uld_type)
 {
 	struct sge_uld_rxq_info *rxq_info = adap->sge.uld_rxq_info[uld_type];
 
+	if (adap->flags & FULL_INIT_DONE && uld_type == CXGB4_ULD_RDMA) {
+		struct sge *s = &adap->sge;
+		u32 param, cmdop, cmplqid = 0;
+		int i;
+
+		cmdop = FW_PARAMS_PARAM_DMAQ_EQ_CMPLIQID_CTRL;
+		for_each_port(adap, i) {
+			param = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DMAQ) |
+				 FW_PARAMS_PARAM_X_V(cmdop) |
+				 FW_PARAMS_PARAM_YZ_V(s->ctrlq[i].q.cntxt_id));
+			t4_set_params(adap, adap->mbox, adap->pf,
+				      0, 1, &param, &cmplqid);
+		}
+	}
+
 	if (rxq_info->nciq)
 		t4_free_uld_rxqs(adap, rxq_info->nciq,
 				 rxq_info->uldrxq + rxq_info->nrxq);
@@ -215,26 +266,38 @@ void free_sge_queues_uld(struct adapter *adap, unsigned int uld_type)
 }
 
 int cfg_queues_uld(struct adapter *adap, unsigned int uld_type,
-		   const struct cxgb4_pci_uld_info *uld_info)
+		   const struct cxgb4_uld_info *uld_info)
 {
 	struct sge *s = &adap->sge;
 	struct sge_uld_rxq_info *rxq_info;
-	int i, nrxq;
+	int i, nrxq, ciq_size;
 
 	rxq_info = kzalloc(sizeof(*rxq_info), GFP_KERNEL);
 	if (!rxq_info)
 		return -ENOMEM;
 
-	if (uld_info->nrxq > s->nqs_per_uld)
-		rxq_info->nrxq = s->nqs_per_uld;
-	else
-		rxq_info->nrxq = uld_info->nrxq;
-	if (!uld_info->nciq)
+	if (adap->flags & USING_MSIX && uld_info->nrxq > s->nqs_per_uld) {
+		i = s->nqs_per_uld;
+		rxq_info->nrxq = roundup(i, adap->params.nports);
+	} else {
+		i = min_t(int, uld_info->nrxq,
+			  num_online_cpus());
+		rxq_info->nrxq = roundup(i, adap->params.nports);
+	}
+	if (!uld_info->ciq) {
 		rxq_info->nciq = 0;
-	else if (uld_info->nciq && uld_info->nciq > s->nqs_per_uld)
-		rxq_info->nciq = s->nqs_per_uld;
-	else
-		rxq_info->nciq = uld_info->nciq;
+	} else  {
+		if (adap->flags & USING_MSIX)
+			rxq_info->nciq = min_t(int, s->nqs_per_uld,
+					       num_online_cpus());
+		else
+			rxq_info->nciq = min_t(int, MAX_OFLD_QSETS,
+					       num_online_cpus());
+		rxq_info->nciq = ((rxq_info->nciq / adap->params.nports) *
+				  adap->params.nports);
+		rxq_info->nciq = max_t(int, rxq_info->nciq,
+				       adap->params.nports);
+	}
 
 	nrxq = rxq_info->nrxq + rxq_info->nciq; /* total rxq's */
 	rxq_info->uldrxq = kcalloc(nrxq, sizeof(struct sge_ofld_rxq),
@@ -259,12 +322,17 @@ int cfg_queues_uld(struct adapter *adap, unsigned int uld_type,
 		r->fl.size = 72;
 	}
 
+	ciq_size = 64 + adap->vres.cq.size + adap->tids.nftids;
+	if (ciq_size > SGE_MAX_IQ_SIZE) {
+		dev_warn(adap->pdev_dev, "CIQ size too small for available IQs\n");
+		ciq_size = SGE_MAX_IQ_SIZE;
+	}
+
 	for (i = rxq_info->nrxq; i < nrxq; i++) {
 		struct sge_ofld_rxq *r = &rxq_info->uldrxq[i];
 
-		init_rspq(adap, &r->rspq, 5, 1, uld_info->ciq_size, 64);
+		init_rspq(adap, &r->rspq, 5, 1, ciq_size, 64);
 		r->rspq.uld = uld_type;
-		r->fl.size = 72;
 	}
 
 	memcpy(rxq_info->name, uld_info->name, IFNAMSIZ);
@@ -285,7 +353,8 @@ void free_queues_uld(struct adapter *adap, unsigned int uld_type)
 int request_msix_queue_irqs_uld(struct adapter *adap, unsigned int uld_type)
 {
 	struct sge_uld_rxq_info *rxq_info = adap->sge.uld_rxq_info[uld_type];
-	int idx, bmap_idx, err = 0;
+	int err = 0;
+	unsigned int idx, bmap_idx;
 
 	for_each_uldrxq(rxq_info, idx) {
 		bmap_idx = rxq_info->msix_tbl[idx];
@@ -310,10 +379,10 @@ unwind:
 void free_msix_queue_irqs_uld(struct adapter *adap, unsigned int uld_type)
 {
 	struct sge_uld_rxq_info *rxq_info = adap->sge.uld_rxq_info[uld_type];
-	int idx;
+	unsigned int idx, bmap_idx;
 
 	for_each_uldrxq(rxq_info, idx) {
-		unsigned int bmap_idx = rxq_info->msix_tbl[idx];
+		bmap_idx = rxq_info->msix_tbl[idx];
 
 		free_msix_idx_in_bmap(adap, bmap_idx);
 		free_irq(adap->msix_info_ulds[bmap_idx].vec,
@@ -325,10 +394,10 @@ void name_msix_vecs_uld(struct adapter *adap, unsigned int uld_type)
 {
 	struct sge_uld_rxq_info *rxq_info = adap->sge.uld_rxq_info[uld_type];
 	int n = sizeof(adap->msix_info_ulds[0].desc);
-	int idx;
+	unsigned int idx, bmap_idx;
 
 	for_each_uldrxq(rxq_info, idx) {
-		unsigned int bmap_idx = rxq_info->msix_tbl[idx];
+		bmap_idx = rxq_info->msix_tbl[idx];
 
 		snprintf(adap->msix_info_ulds[bmap_idx].desc, n, "%s-%s%d",
 			 adap->port[0]->name, rxq_info->name, idx);
@@ -390,15 +459,15 @@ static void uld_queue_init(struct adapter *adap, unsigned int uld_type,
 	lli->nciq = rxq_info->nciq;
 }
 
-int uld_mem_alloc(struct adapter *adap)
+int t4_uld_mem_alloc(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
 
-	adap->uld = kcalloc(adap->num_uld, sizeof(*adap->uld), GFP_KERNEL);
+	adap->uld = kcalloc(CXGB4_ULD_MAX, sizeof(*adap->uld), GFP_KERNEL);
 	if (!adap->uld)
 		return -ENOMEM;
 
-	s->uld_rxq_info = kzalloc(adap->num_uld *
+	s->uld_rxq_info = kzalloc(CXGB4_ULD_MAX *
 				  sizeof(struct sge_uld_rxq_info *),
 				  GFP_KERNEL);
 	if (!s->uld_rxq_info)
@@ -410,7 +479,7 @@ err_uld:
 	return -ENOMEM;
 }
 
-void uld_mem_free(struct adapter *adap)
+void t4_uld_mem_free(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
 
@@ -418,6 +487,26 @@ void uld_mem_free(struct adapter *adap)
 	kfree(adap->uld);
 }
 
+void t4_uld_clean_up(struct adapter *adap)
+{
+	struct sge_uld_rxq_info *rxq_info;
+	unsigned int i;
+
+	if (!adap->uld)
+		return;
+	for (i = 0; i < CXGB4_ULD_MAX; i++) {
+		if (!adap->uld[i].handle)
+			continue;
+		rxq_info = adap->sge.uld_rxq_info[i];
+		if (adap->flags & FULL_INIT_DONE)
+			quiesce_rx_uld(adap, i);
+		if (adap->flags & USING_MSIX)
+			free_msix_queue_irqs_uld(adap, i);
+		free_sge_queues_uld(adap, i);
+		free_queues_uld(adap, i);
+	}
+}
+
 static void uld_init(struct adapter *adap, struct cxgb4_lld_info *lld)
 {
 	int i;
@@ -429,10 +518,15 @@ static void uld_init(struct adapter *adap, struct cxgb4_lld_info *lld)
 	lld->ports = adap->port;
 	lld->vr = &adap->vres;
 	lld->mtus = adap->params.mtus;
-	lld->ntxq = adap->sge.iscsiqsets;
+	lld->ntxq = adap->sge.ofldqsets;
 	lld->nchan = adap->params.nports;
 	lld->nports = adap->params.nports;
 	lld->wr_cred = adap->params.ofldq_wr_cred;
+	lld->iscsi_iolen = MAXRXDATA_G(t4_read_reg(adap, TP_PARA_REG2_A));
+	lld->iscsi_tagmask = t4_read_reg(adap, ULP_RX_ISCSI_TAGMASK_A);
+	lld->iscsi_pgsz_order = t4_read_reg(adap, ULP_RX_ISCSI_PSZ_A);
+	lld->iscsi_llimit = t4_read_reg(adap, ULP_RX_ISCSI_LLIMIT_A);
+	lld->iscsi_ppm = &adap->iscsi_ppm;
 	lld->adapter_type = adap->params.chip;
 	lld->cclk_ps = 1000000000 / adap->params.vpd.cclk;
 	lld->udb_density = 1 << adap->params.sge.eq_qpp;
@@ -472,23 +566,37 @@ static void uld_attach(struct adapter *adap, unsigned int uld)
 	}
 
 	adap->uld[uld].handle = handle;
+	t4_register_netevent_notifier();
 
 	if (adap->flags & FULL_INIT_DONE)
 		adap->uld[uld].state_change(handle, CXGB4_STATE_UP);
 }
 
-int cxgb4_register_pci_uld(enum cxgb4_pci_uld type,
-			   struct cxgb4_pci_uld_info *p)
+/**
+ *	cxgb4_register_uld - register an upper-layer driver
+ *	@type: the ULD type
+ *	@p: the ULD methods
+ *
+ *	Registers an upper-layer driver with this driver and notifies the ULD
+ *	about any presently available devices that support its type.  Returns
+ *	%-EBUSY if a ULD of the same type is already registered.
+ */
+int cxgb4_register_uld(enum cxgb4_uld type,
+		       const struct cxgb4_uld_info *p)
 {
 	int ret = 0;
+	unsigned int adap_idx = 0;
 	struct adapter *adap;
 
-	if (type >= CXGB4_PCI_ULD_MAX)
+	if (type >= CXGB4_ULD_MAX)
 		return -EINVAL;
 
 	mutex_lock(&uld_mutex);
 	list_for_each_entry(adap, &adapter_list, list_node) {
-		if (!is_pci_uld(adap))
+		if ((type == CXGB4_ULD_CRYPTO && !is_pci_uld(adap)) ||
+		    (type != CXGB4_ULD_CRYPTO && !is_offload(adap)))
+			continue;
+		if (type == CXGB4_ULD_ISCSIT && is_t4(adap->params.chip))
 			continue;
 		ret = cfg_queues_uld(adap, type, p);
 		if (ret)
@@ -510,11 +618,14 @@ int cxgb4_register_pci_uld(enum cxgb4_pci_uld type,
 		}
 		adap->uld[type] = *p;
 		uld_attach(adap, type);
+		adap_idx++;
 	}
 	mutex_unlock(&uld_mutex);
 	return 0;
 
 free_irq:
+	if (adap->flags & FULL_INIT_DONE)
+		quiesce_rx_uld(adap, type);
 	if (adap->flags & USING_MSIX)
 		free_msix_queue_irqs_uld(adap, type);
 free_rxq:
@@ -522,21 +633,49 @@ free_rxq:
 free_queues:
 	free_queues_uld(adap, type);
 out:
+
+	list_for_each_entry(adap, &adapter_list, list_node) {
+		if ((type == CXGB4_ULD_CRYPTO && !is_pci_uld(adap)) ||
+		    (type != CXGB4_ULD_CRYPTO && !is_offload(adap)))
+			continue;
+		if (type == CXGB4_ULD_ISCSIT && is_t4(adap->params.chip))
+			continue;
+		if (!adap_idx)
+			break;
+		adap->uld[type].handle = NULL;
+		adap->uld[type].add = NULL;
+		if (adap->flags & FULL_INIT_DONE)
+			quiesce_rx_uld(adap, type);
+		if (adap->flags & USING_MSIX)
+			free_msix_queue_irqs_uld(adap, type);
+		free_sge_queues_uld(adap, type);
+		free_queues_uld(adap, type);
+		adap_idx--;
+	}
 	mutex_unlock(&uld_mutex);
 	return ret;
 }
-EXPORT_SYMBOL(cxgb4_register_pci_uld);
+EXPORT_SYMBOL(cxgb4_register_uld);
 
-int cxgb4_unregister_pci_uld(enum cxgb4_pci_uld type)
+/**
+ *	cxgb4_unregister_uld - unregister an upper-layer driver
+ *	@type: the ULD type
+ *
+ *	Unregisters an existing upper-layer driver.
+ */
+int cxgb4_unregister_uld(enum cxgb4_uld type)
 {
 	struct adapter *adap;
 
-	if (type >= CXGB4_PCI_ULD_MAX)
+	if (type >= CXGB4_ULD_MAX)
 		return -EINVAL;
 
 	mutex_lock(&uld_mutex);
 	list_for_each_entry(adap, &adapter_list, list_node) {
-		if (!is_pci_uld(adap))
+		if ((type == CXGB4_ULD_CRYPTO && !is_pci_uld(adap)) ||
+		    (type != CXGB4_ULD_CRYPTO && !is_offload(adap)))
+			continue;
+		if (type == CXGB4_ULD_ISCSIT && is_t4(adap->params.chip))
 			continue;
 		adap->uld[type].handle = NULL;
 		adap->uld[type].add = NULL;
@@ -551,4 +690,4 @@ int cxgb4_unregister_pci_uld(enum cxgb4_pci_uld type)
 
 	return 0;
 }
-EXPORT_SYMBOL(cxgb4_unregister_pci_uld);
+EXPORT_SYMBOL(cxgb4_unregister_uld);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index ab40372..b3544f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -42,6 +42,8 @@
 #include <linux/atomic.h>
 #include "cxgb4.h"
 
+#define MAX_ULD_QSETS 16
+
 /* CPL message priority levels */
 enum {
 	CPL_PRIORITY_DATA     = 0,  /* data messages */
@@ -189,9 +191,11 @@ static inline void set_wr_txq(struct sk_buff *skb, int prio, int queue)
 }
 
 enum cxgb4_uld {
+	CXGB4_ULD_INIT,
 	CXGB4_ULD_RDMA,
 	CXGB4_ULD_ISCSI,
 	CXGB4_ULD_ISCSIT,
+	CXGB4_ULD_CRYPTO,
 	CXGB4_ULD_MAX
 };
 
@@ -284,31 +288,11 @@ struct cxgb4_lld_info {
 
 struct cxgb4_uld_info {
 	const char *name;
-	void *(*add)(const struct cxgb4_lld_info *p);
-	int (*rx_handler)(void *handle, const __be64 *rsp,
-			  const struct pkt_gl *gl);
-	int (*state_change)(void *handle, enum cxgb4_state new_state);
-	int (*control)(void *handle, enum cxgb4_control control, ...);
-	int (*lro_rx_handler)(void *handle, const __be64 *rsp,
-			      const struct pkt_gl *gl,
-			      struct t4_lro_mgr *lro_mgr,
-			      struct napi_struct *napi);
-	void (*lro_flush)(struct t4_lro_mgr *);
-};
-
-enum cxgb4_pci_uld {
-	CXGB4_PCI_ULD1,
-	CXGB4_PCI_ULD_MAX
-};
-
-struct cxgb4_pci_uld_info {
-	const char *name;
-	bool lro;
 	void *handle;
 	unsigned int nrxq;
-	unsigned int nciq;
 	unsigned int rxq_size;
-	unsigned int ciq_size;
+	bool ciq;
+	bool lro;
 	void *(*add)(const struct cxgb4_lld_info *p);
 	int (*rx_handler)(void *handle, const __be64 *rsp,
 			  const struct pkt_gl *gl);
@@ -323,9 +307,6 @@ struct cxgb4_pci_uld_info {
 
 int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
 int cxgb4_unregister_uld(enum cxgb4_uld type);
-int cxgb4_register_pci_uld(enum cxgb4_pci_uld type,
-			   struct cxgb4_pci_uld_info *p);
-int cxgb4_unregister_pci_uld(enum cxgb4_pci_uld type);
 int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
 unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo);
 unsigned int cxgb4_port_chan(const struct net_device *dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 9a607db..1e74fd6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2860,6 +2860,18 @@ int t4_sge_alloc_ctrl_txq(struct adapter *adap, struct sge_ctrl_txq *txq,
 	return 0;
 }
 
+int t4_sge_mod_ctrl_txq(struct adapter *adap, unsigned int eqid,
+			unsigned int cmplqid)
+{
+	u32 param, val;
+
+	param = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DMAQ) |
+		 FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_DMAQ_EQ_CMPLIQID_CTRL) |
+		 FW_PARAMS_PARAM_YZ_V(eqid));
+	val = cmplqid;
+	return t4_set_params(adap, adap->mbox, adap->pf, 0, 1, &param, &val);
+}
+
 int t4_sge_alloc_ofld_txq(struct adapter *adap, struct sge_ofld_txq *txq,
 			  struct net_device *dev, unsigned int iqid)
 {
@@ -3014,12 +3026,6 @@ void t4_free_sge_resources(struct adapter *adap)
 		}
 	}
 
-	/* clean up RDMA and iSCSI Rx queues */
-	t4_free_ofld_rxqs(adap, adap->sge.iscsiqsets, adap->sge.iscsirxq);
-	t4_free_ofld_rxqs(adap, adap->sge.niscsitq, adap->sge.iscsitrxq);
-	t4_free_ofld_rxqs(adap, adap->sge.rdmaqs, adap->sge.rdmarxq);
-	t4_free_ofld_rxqs(adap, adap->sge.rdmaciqs, adap->sge.rdmaciq);
-
 	/* clean up offload Tx queues */
 	for (i = 0; i < ARRAY_SIZE(adap->sge.ofldtxq); i++) {
 		struct sge_ofld_txq *q = &adap->sge.ofldtxq[i];
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index e4ba2d2..7c0d7af 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -84,6 +84,9 @@ static inline int send_tx_flowc_wr(struct cxgbi_sock *);
 
 static const struct cxgb4_uld_info cxgb4i_uld_info = {
 	.name = DRV_MODULE_NAME,
+	.nrxq = MAX_ULD_QSETS,
+	.rxq_size = 1024,
+	.lro = false,
 	.add = t4_uld_add,
 	.rx_handler = t4_uld_rx_handler,
 	.state_change = t4_uld_state_change,
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_main.c b/drivers/target/iscsi/cxgbit/cxgbit_main.c
index 27dd11a..ad26b93 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_main.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_main.c
@@ -652,6 +652,9 @@ static struct iscsit_transport cxgbit_transport = {
 
 static struct cxgb4_uld_info cxgbit_uld_info = {
 	.name		= DRV_NAME,
+	.nrxq		= MAX_ULD_QSETS,
+	.rxq_size	= 1024,
+	.lro		= true,
 	.add		= cxgbit_uld_add,
 	.state_change	= cxgbit_uld_state_change,
 	.lro_rx_handler = cxgbit_uld_lro_rx_handler,
-- 
1.7.3

^ permalink raw reply related

* Re: crypto: caam from tasklet to threadirq
From: Russell King - ARM Linux @ 2016-09-16 16:53 UTC (permalink / raw)
  To: Cata Vasile; +Cc: Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <DB5PR04MB1302AD00536D2E37E726E104EEF30@DB5PR04MB1302.eurprd04.prod.outlook.com>

On Fri, Sep 16, 2016 at 02:01:00PM +0000, Cata Vasile wrote:
> Hi,
> 
> We've tried to test and benchmark your submitted work[1].
> 
> Cryptographic offloading is also used in IPsec in the Linux Kernel. In
> heavy traffic scenarios, the NIC driver competes with the crypto device
> driver. Most NICs use the NAPI context, which is one of the most
> prioritized context types. In IPsec scenarios  the performance is
> trashed because, although raw data gets in to device, the data is
> encrypted/decrypted and the dequeue code in CAAM driver has a hard time
> being scheduled to actually call the callback to notify the networking
> stack it can continue working with  that data.

Having received a reply from Thomas Gleixner today, there appears to be
some disagreement with your findings, and a suggestion that the problem
needs proper and more in-depth investigation.

Thomas indicates that the NAPI processing shows an improvement when
moved to the same context that threaded interrupts run in, as opposed
to the current softirq context - which also would run the tasklets.

What I would say is that if threaded IRQs are causing harm, then there
seems to be something very wrong somewhere.

> Being this scenario, at heavy load, the Kernel warns on rcu stalls and
> the forwarding path has a lot of latency.  Have you tried benchmarking
> the board you used for testing?

It's way too long ago for me to remember - these patches were created
almost a year ago - October 20th 2015, which is when I'd have tested
them.  So, I'm afraid I can't help very much at this point, apart from
trying to re-run some benchmarks.

I'd suggest testing the openssl (with AF_ALG support), which is probably
what I tested and benchmarked.  However, as I say, it's far too long
ago for me to really remember at this point.

> I have ran some on our other platforms. The after benchmark fails to
> run at the top level of the before results.

Sorry, that last sentence doesn't make any sense to me.

I don't have the bandwidth to look at this, and IPsec doesn't interest
me one bit - I've never been able to work out how to setup IPsec
locally.  From what I remember when I looked into it many years ago,
you had to have significant information about ipsec to get it up and
running.  Maybe things have changed since then, I don't know.

If you want me to reproduce it, please send me a step-by-step idiots
guide on setting up a working test scenario which reproduces your
problem.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* crypto: caam from tasklet to threadirq
From: Cata Vasile @ 2016-09-16 14:01 UTC (permalink / raw)
  To: rmk+kernel@armlinux.org.uk
  Cc: Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <DB5PR04MB130229BBD433C8FF7BDD975FEEF90@DB5PR04MB1302.eurprd04.prod.outlook.com>

Hi,

We've tried to test and benchmark your submitted work[1].

Cryptographic offloading is also used in IPsec in the Linux Kernel. In heavy traffic scenarios, the NIC driver competes with the crypto device driver. Most NICs use the NAPI context, which is one of the most prioritized context types. In IPsec scenarios  the performance is trashed because, although raw data gets in to device, the data is encrypted/decrypted and the dequeue code in CAAM driver has a hard time being scheduled to actually call the callback to notify the networking stack it can continue working with  that data.

Being this scenario, at heavy load, the Kernel warns on rcu stalls and the forwarding path has a lot of latency.
Have you tried benchmarking the board you used for testing?

I have ran some on our other platforms. The after benchmark fails to run at the top level of the before results. The rcu stall does not always stall in the same place. The after ping latency is greater, and oscillates a lot.

It might be a good idea for the codebase to change to a threadirq, but from a pragmatic perspective, the whole system has to suffer. That is one the reasons most crypto accelerators try to run dequeue primitives in high priority contexts.

Regards,
Catalin Vasile

[1] https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=66d2e2028091a074aa1290d2eeda5ddb1a6c329c

^ permalink raw reply

* Re: [PATCH] crypto: caam - fix sg dump
From: Horia Geanta Neag @ 2016-09-16 13:49 UTC (permalink / raw)
  To: Cata Vasile, linux-crypto@vger.kernel.org
  Cc: linux-crypto-owner@vger.kernel.org, Alexandru Porosanu
In-Reply-To: <1474016787-5792-1-git-send-email-cata.vasile@nxp.com>

On 9/16/2016 12:06 PM, Catalin Vasile wrote:
> Ensure scatterlists have a virtual memory mapping before dumping.
> 
> Signed-off-by: Catalin Vasile <cata.vasile@nxp.com>
> ---
>  drivers/crypto/caam/caamalg.c | 65 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 50 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
> index 6dc5971..49f1ea7 100644
> --- a/drivers/crypto/caam/caamalg.c
> +++ b/drivers/crypto/caam/caamalg.c
> @@ -111,6 +111,41 @@
>  #else
>  #define debug(format, arg...)
>  #endif
> +
> +#ifdef DEBUG
> +#include <linux/highmem.h>
> +
> +static void dbg_dump_sg(const char *level, const char *prefix_str,
> +			int prefix_type, int rowsize, int groupsize,
> +			struct scatterlist *sg, size_t tlen, bool ascii)
> +{
> +	struct scatterlist *it;
> +	size_t len;
> +	void *buf;
> +
> +	for (it = sg; it != NULL && tlen > 0 ; it = sg_next(sg)) {
> +		/*
> +		 * make sure the scatterlist's page
> +		 * has a valid virtual memory mapping
> +		 */
> +		buf = kmap(sg_page(it));
Even though driver has been updated recently to use threaded irq instead
of tasklet, there are still cases when you are not allowed to sleep here.
You should check this and use kmap_atomic when needed.

Horia

^ permalink raw reply

* Re: [PATCH -next] hwrng: amd - Fix return value check in mod_init()
From: PrasannaKumar Muralidharan @ 2016-09-16 13:13 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Matt Mackall, Herbert Xu, Corentin LABBE, Wei Yongjun,
	linux-crypto
In-Reply-To: <1473990581-18602-1-git-send-email-weiyj.lk@gmail.com>

> In case of error, the function devm_kzalloc() or devm_ioport_map()
> return NULL pointer not ERR_PTR(). The IS_ERR() test in the return
> value check should be replaced with NULL test.
>
> Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/char/hw_random/amd-rng.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/hw_random/amd-rng.c b/drivers/char/hw_random/amd-rng.c
> index 4dbc5aa..4a99ac7 100644
> --- a/drivers/char/hw_random/amd-rng.c
> +++ b/drivers/char/hw_random/amd-rng.c
> @@ -149,8 +149,8 @@ static int __init mod_init(void)
>                 return -EIO;
>
>         priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> -       if (IS_ERR(priv))
> -               return PTR_ERR(priv);
> +       if (!priv)
> +               return -ENOMEM;
>
>         if (!devm_request_region(&pdev->dev, pmbase + PMBASE_OFFSET,
>                                 PMBASE_SIZE, DRV_NAME)) {
> @@ -161,9 +161,9 @@ static int __init mod_init(void)
>
>         priv->iobase = devm_ioport_map(&pdev->dev, pmbase + PMBASE_OFFSET,
>                         PMBASE_SIZE);
> -       if (IS_ERR(priv->iobase)) {
> +       if (!priv->iobase) {
>                 pr_err(DRV_NAME "Cannot map ioport\n");
> -               return PTR_ERR(priv->iobase);
> +               return -ENOMEM;
>         }
>
>         amd_rng.priv = (unsigned long)priv;
>

My change introduced this issue. Thanks for fixing it.

^ permalink raw reply

* Re: [PATCH v3 0/8] Add support for SafeXcel IP-76 to OMAP RNG
From: Romain Perier @ 2016-09-16 12:52 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Nadav Haklai, Omri Itach,
	Shadi Ammouri, Yahuda Yitschak, Hanna Hawa, Neta Zur Hershkovits,
	Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

Hello,

Le 16/09/2016 12:08, Romain Perier a écrit :
> The driver omap-rng has a lot of similarity with the IP block SafeXcel
> IP-76. A lot of registers are the same and the way that the driver works
> is very closed the description of the TRNG EIP76 in its datasheet.
>
> This series refactorize the driver, add support for generating bigger
> output random data and add a device variant for SafeXcel IP-76, found
> in Armada 8K.
>
> Romain Perier (8):
>    dt-bindings: Add vendor prefix for INSIDE Secure
>    dt-bindings: omap-rng: Document SafeXcel IP-76 device variant
>    hwrng: omap - Switch to non-obsolete read API implementation
>    hwrng: omap - Remove global definition of hwrng
>    hwrng: omap - Add support for 128-bit output of data
>    hwrng: omap - Don't prefix the probe message with OMAP
>    hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K
>    arm64: dts: marvell: add TRNG description for Armada 8K CP
>
>   Documentation/devicetree/bindings/rng/omap_rng.txt |  14 +-
>   .../devicetree/bindings/vendor-prefixes.txt        |   1 +
>   .../boot/dts/marvell/armada-cp110-master.dtsi      |   8 +
>   .../arm64/boot/dts/marvell/armada-cp110-slave.dtsi |   8 +
>   drivers/char/hw_random/Kconfig                     |   2 +-
>   drivers/char/hw_random/omap-rng.c                  | 162 ++++++++++++++++-----
>   6 files changed, 152 insertions(+), 43 deletions(-)
>

If possible, I would like to get "Tested-by" tags by the omap guys 
before this set of patches is merged, to be sure that there are no 
regressions for the OMAP variant.

Thanks,
Regards,
-- 
Romain Perier, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH] crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
From: Ondrej Mosnáček @ 2016-09-16 12:07 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.

Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
---
I randomly noticed this while going over igcm.c for an unrelated
reason. It seems the wrong buffer size never caused any noticeable
problems (it's been there since 2007), but it should be corrected
nonetheless...

 crypto/gcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 70a892e8..f624ac9 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -117,7 +117,7 @@ static int crypto_gcm_setkey(struct crypto_aead
*aead, const u8 *key,
 	struct crypto_skcipher *ctr = ctx->ctr;
 	struct {
 		be128 hash;
-		u8 iv[8];
+		u8 iv[16];

 		struct crypto_gcm_setkey_result result;

-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 7/8] hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6241 bytes --]

This commits adds a device variant for Safexcel,EIP76 found in Marvell
Armada 8k. It defines registers mapping with the good offset and add a
specific initialization function.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v2:
 - Call pm_runtime_put_sync from the label err_register. When there is an
   EPROBE_DEFER, strange errors can happen because the call to pm_runtime_*
   is not well balanced.

 drivers/char/hw_random/Kconfig    |  2 +-
 drivers/char/hw_random/omap-rng.c | 86 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index 56ad5a5..aea3613 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -168,7 +168,7 @@ config HW_RANDOM_IXP4XX
 
 config HW_RANDOM_OMAP
 	tristate "OMAP Random Number Generator support"
-	depends on ARCH_OMAP16XX || ARCH_OMAP2PLUS
+	depends on ARCH_OMAP16XX || ARCH_OMAP2PLUS || ARCH_MVEBU
 	default HW_RANDOM
  	---help---
  	  This driver provides kernel-side support for the Random Number
diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index bbbce16..43ee8b3 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -28,6 +28,7 @@
 #include <linux/of_device.h>
 #include <linux/of_address.h>
 #include <linux/interrupt.h>
+#include <linux/clk.h>
 
 #include <asm/io.h>
 
@@ -63,6 +64,7 @@
 
 #define OMAP2_RNG_OUTPUT_SIZE			0x4
 #define OMAP4_RNG_OUTPUT_SIZE			0x8
+#define EIP76_RNG_OUTPUT_SIZE			0x10
 
 enum {
 	RNG_OUTPUT_0_REG = 0,
@@ -108,6 +110,23 @@ static const u16 reg_map_omap4[] = {
 	[RNG_SYSCONFIG_REG]	= 0x1FE4,
 };
 
+static const u16 reg_map_eip76[] = {
+	[RNG_OUTPUT_0_REG]	= 0x0,
+	[RNG_OUTPUT_1_REG]	= 0x4,
+	[RNG_OUTPUT_2_REG]	= 0x8,
+	[RNG_OUTPUT_3_REG]	= 0xc,
+	[RNG_STATUS_REG]	= 0x10,
+	[RNG_INTACK_REG]	= 0x10,
+	[RNG_CONTROL_REG]	= 0x14,
+	[RNG_CONFIG_REG]	= 0x18,
+	[RNG_ALARMCNT_REG]	= 0x1c,
+	[RNG_FROENABLE_REG]	= 0x20,
+	[RNG_FRODETUNE_REG]	= 0x24,
+	[RNG_ALARMMASK_REG]	= 0x28,
+	[RNG_ALARMSTOP_REG]	= 0x2c,
+	[RNG_REV_REG]		= 0x7c,
+};
+
 struct omap_rng_dev;
 /**
  * struct omap_rng_pdata - RNG IP block-specific data
@@ -130,6 +149,7 @@ struct omap_rng_dev {
 	struct device			*dev;
 	const struct omap_rng_pdata	*pdata;
 	struct hwrng rng;
+	struct clk 			*clk;
 };
 
 static inline u32 omap_rng_read(struct omap_rng_dev *priv, u16 reg)
@@ -221,6 +241,38 @@ static inline u32 omap4_rng_data_present(struct omap_rng_dev *priv)
 	return omap_rng_read(priv, RNG_STATUS_REG) & RNG_REG_STATUS_RDY;
 }
 
+static int eip76_rng_init(struct omap_rng_dev *priv)
+{
+	u32 val;
+
+	/* Return if RNG is already running. */
+	if (omap_rng_read(priv, RNG_CONTROL_REG) & RNG_CONTROL_ENABLE_TRNG_MASK)
+		return 0;
+
+	/*  Number of 512 bit blocks of raw Noise Source output data that must
+	 *  be processed by either the Conditioning Function or the
+	 *  SP 800-90 DRBG ‘BC_DF’ functionality to yield a ‘full entropy’
+	 *  output value.
+	 */
+	val = 0x5 << RNG_CONFIG_MIN_REFIL_CYCLES_SHIFT;
+
+	/* Number of FRO samples that are XOR-ed together into one bit to be
+	 * shifted into the main shift register
+	 */
+	val |= RNG_CONFIG_MAX_REFIL_CYCLES << RNG_CONFIG_MAX_REFIL_CYCLES_SHIFT;
+	omap_rng_write(priv, RNG_CONFIG_REG, val);
+
+	/* Enable all available FROs */
+	omap_rng_write(priv, RNG_FRODETUNE_REG, 0x0);
+	omap_rng_write(priv, RNG_FROENABLE_REG, RNG_REG_FROENABLE_MASK);
+
+	/* Enable TRNG */
+	val = RNG_CONTROL_ENABLE_TRNG_MASK;
+	omap_rng_write(priv, RNG_CONTROL_REG, val);
+
+	return 0;
+}
+
 static int omap4_rng_init(struct omap_rng_dev *priv)
 {
 	u32 val;
@@ -290,6 +342,14 @@ static struct omap_rng_pdata omap4_rng_pdata = {
 	.cleanup	= omap4_rng_cleanup,
 };
 
+static struct omap_rng_pdata eip76_rng_pdata = {
+	.regs		= (u16 *)reg_map_eip76,
+	.data_size	= EIP76_RNG_OUTPUT_SIZE,
+	.data_present	= omap4_rng_data_present,
+	.init		= eip76_rng_init,
+	.cleanup	= omap4_rng_cleanup,
+};
+
 static const struct of_device_id omap_rng_of_match[] = {
 		{
 			.compatible	= "ti,omap2-rng",
@@ -299,6 +359,10 @@ static const struct of_device_id omap_rng_of_match[] = {
 			.compatible	= "ti,omap4-rng",
 			.data		= &omap4_rng_pdata,
 		},
+		{
+			.compatible	= "inside-secure,safexcel-eip76",
+			.data		= &eip76_rng_pdata,
+		},
 		{},
 };
 MODULE_DEVICE_TABLE(of, omap_rng_of_match);
@@ -317,7 +381,8 @@ static int of_get_omap_rng_device_details(struct omap_rng_dev *priv,
 	}
 	priv->pdata = match->data;
 
-	if (of_device_is_compatible(dev->of_node, "ti,omap4-rng")) {
+	if (of_device_is_compatible(dev->of_node, "ti,omap4-rng") ||
+	    of_device_is_compatible(dev->of_node, "inside-secure,safexcel-eip76")) {
 		irq = platform_get_irq(pdev, 0);
 		if (irq < 0) {
 			dev_err(dev, "%s: error getting IRQ resource - %d\n",
@@ -333,6 +398,16 @@ static int of_get_omap_rng_device_details(struct omap_rng_dev *priv,
 			return err;
 		}
 		omap_rng_write(priv, RNG_INTMASK_REG, RNG_SHUTDOWN_OFLO_MASK);
+
+		priv->clk = of_clk_get(pdev->dev.of_node, 0);
+		if (IS_ERR(priv->clk) && PTR_ERR(priv->clk) == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+		if (!IS_ERR(priv->clk)) {
+			err = clk_prepare_enable(priv->clk);
+			if (err)
+				dev_err(&pdev->dev, "unable to enable the clk, "
+						    "err = %d\n", err);
+		}
 	}
 	return 0;
 }
@@ -394,7 +469,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 	ret = (dev->of_node) ? of_get_omap_rng_device_details(priv, pdev) :
 				get_omap_rng_device_details(priv);
 	if (ret)
-		goto err_ioremap;
+		goto err_register;
 
 	ret = hwrng_register(&priv->rng);
 	if (ret)
@@ -407,7 +482,11 @@ static int omap_rng_probe(struct platform_device *pdev)
 
 err_register:
 	priv->base = NULL;
+	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
+
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
 err_ioremap:
 	dev_err(dev, "initialization failed.\n");
 	return ret;
@@ -424,6 +503,9 @@ static int omap_rng_remove(struct platform_device *pdev)
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
 
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
+
 	return 0;
 }
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 8/8] arm64: dts: marvell: add TRNG description for Armada 8K CP
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

This commits adds the devicetree description of the SafeXcel IP-76 TRNG
found in the two Armada CP110.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi | 8 ++++++++
 arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
index da31bbb..aaffa24 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
@@ -164,6 +164,14 @@
 				clocks = <&cpm_syscon0 1 21>;
 				status = "disabled";
 			};
+
+			cpm_trng: trng@760000 {
+				compatible = "marvell,armada-8k-rng", "inside-secure,safexcel-eip76";
+				reg = <0x760000 0x7d>;
+				interrupts = <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
+				clocks = <&cpm_syscon0 1 25>;
+				status = "okay";
+			};
 		};
 
 		cpm_pcie0: pcie@f2600000 {
diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
index 6ff1201..216de12 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
@@ -164,6 +164,14 @@
 				clocks = <&cps_syscon0 1 21>;
 				status = "disabled";
 			};
+
+			cps_trng: trng@760000 {
+				compatible = "marvell,armada-8k-rng", "inside-secure,safexcel-eip76";
+				reg = <0x760000 0x7d>;
+				interrupts = <GIC_SPI 312 IRQ_TYPE_LEVEL_HIGH>;
+				clocks = <&cps_syscon0 1 25>;
+				status = "okay";
+			};
 		};
 
 		cps_pcie0: pcie@f4600000 {
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 6/8] hwrng: omap - Don't prefix the probe message with OMAP
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

So far, this driver was only used for OMAP SoCs. However, if a device
variant is added for an IP block that has nothing to do with the OMAP
platform, the message "OMAP Random Number Generator Ver" is displayed
anyway. Instead of hardcoding "OMAP" into this message, we decide to
only display "Random Number Generator". As dev_info is already
pre-pending the message with the name of the device, we have enough
informations.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---
 drivers/char/hw_random/omap-rng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index a84ab49..bbbce16 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -400,7 +400,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_register;
 
-	dev_info(&pdev->dev, "OMAP Random Number Generator ver. %02x\n",
+	dev_info(&pdev->dev, "Random Number Generator ver. %02x\n",
 		 omap_rng_read(priv, RNG_REV_REG));
 
 	return 0;
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 5/8] hwrng: omap - Add support for 128-bit output of data
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

So far, this driver only supports up to 64 bits of output data generated
by an RNG. Some IP blocks, like the SafeXcel IP-76 supports up to 128
bits of output data. This commits renames registers descriptions
OUTPUT_L_REG and OUTPUT_H_REG to OUTPUT_0_REG and OUPUT_1_REG,
respectively. It also adds two new values to the enumeration of existing
registers: OUTPUT_2_REG and OUTPUT_3_REG.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---
 drivers/char/hw_random/omap-rng.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index 5ea804d..a84ab49 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -65,8 +65,10 @@
 #define OMAP4_RNG_OUTPUT_SIZE			0x8
 
 enum {
-	RNG_OUTPUT_L_REG = 0,
-	RNG_OUTPUT_H_REG,
+	RNG_OUTPUT_0_REG = 0,
+	RNG_OUTPUT_1_REG,
+	RNG_OUTPUT_2_REG,
+	RNG_OUTPUT_3_REG,
 	RNG_STATUS_REG,
 	RNG_INTMASK_REG,
 	RNG_INTACK_REG,
@@ -82,7 +84,7 @@ enum {
 };
 
 static const u16 reg_map_omap2[] = {
-	[RNG_OUTPUT_L_REG]	= 0x0,
+	[RNG_OUTPUT_0_REG]	= 0x0,
 	[RNG_STATUS_REG]	= 0x4,
 	[RNG_CONFIG_REG]	= 0x28,
 	[RNG_REV_REG]		= 0x3c,
@@ -90,8 +92,8 @@ static const u16 reg_map_omap2[] = {
 };
 
 static const u16 reg_map_omap4[] = {
-	[RNG_OUTPUT_L_REG]	= 0x0,
-	[RNG_OUTPUT_H_REG]	= 0x4,
+	[RNG_OUTPUT_0_REG]	= 0x0,
+	[RNG_OUTPUT_1_REG]	= 0x4,
 	[RNG_STATUS_REG]	= 0x8,
 	[RNG_INTMASK_REG]	= 0xc,
 	[RNG_INTACK_REG]	= 0x10,
@@ -163,7 +165,7 @@ static int omap_rng_do_read(struct hwrng *rng, void *data, size_t max,
 	if (!present)
 		return 0;
 
-	memcpy_fromio(data, priv->base + priv->pdata->regs[RNG_OUTPUT_L_REG],
+	memcpy_fromio(data, priv->base + priv->pdata->regs[RNG_OUTPUT_0_REG],
 		      priv->pdata->data_size);
 
 	if (priv->pdata->regs[RNG_INTACK_REG])
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 4/8] hwrng: omap - Remove global definition of hwrng
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

The omap-rng driver currently assumes that there will only ever be a
single instance of an RNG device. For this reason, there is a statically
allocated struct hwrng, with a fixed name. However, registering two
struct hwrng with the same isn't accepted by the RNG framework, so we
need to switch to a dynamically allocated struct hwrng, each using a
different name. Then, we define the name of this hwrng to "dev_name(dev)",
so the name of the data structure is unique per device.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v2:
 - Fix the goto label used when there is an error for devm_kstrdup

 drivers/char/hw_random/omap-rng.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index b3f6047..5ea804d 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -127,6 +127,7 @@ struct omap_rng_dev {
 	void __iomem			*base;
 	struct device			*dev;
 	const struct omap_rng_pdata	*pdata;
+	struct hwrng rng;
 };
 
 static inline u32 omap_rng_read(struct omap_rng_dev *priv, u16 reg)
@@ -187,12 +188,6 @@ static void omap_rng_cleanup(struct hwrng *rng)
 	priv->pdata->cleanup(priv);
 }
 
-static struct hwrng omap_rng_ops = {
-	.name		= "omap",
-	.read 		= omap_rng_do_read,
-	.init		= omap_rng_init,
-	.cleanup	= omap_rng_cleanup,
-};
 
 static inline u32 omap2_rng_data_present(struct omap_rng_dev *priv)
 {
@@ -365,7 +360,11 @@ static int omap_rng_probe(struct platform_device *pdev)
 	if (!priv)
 		return -ENOMEM;
 
-	omap_rng_ops.priv = (unsigned long)priv;
+	priv->rng.read = omap_rng_do_read;
+	priv->rng.init = omap_rng_init;
+	priv->rng.cleanup = omap_rng_cleanup;
+
+	priv->rng.priv = (unsigned long)priv;
 	platform_set_drvdata(pdev, priv);
 	priv->dev = dev;
 
@@ -376,6 +375,12 @@ static int omap_rng_probe(struct platform_device *pdev)
 		goto err_ioremap;
 	}
 
+	priv->rng.name = devm_kstrdup(dev, dev_name(dev), GFP_KERNEL);
+	if (!priv->rng.name) {
+		ret = -ENOMEM;
+		goto err_ioremap;
+	}
+
 	pm_runtime_enable(&pdev->dev);
 	ret = pm_runtime_get_sync(&pdev->dev);
 	if (ret) {
@@ -389,7 +394,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_ioremap;
 
-	ret = hwrng_register(&omap_rng_ops);
+	ret = hwrng_register(&priv->rng);
 	if (ret)
 		goto err_register;
 
@@ -410,7 +415,7 @@ static int omap_rng_remove(struct platform_device *pdev)
 {
 	struct omap_rng_dev *priv = platform_get_drvdata(pdev);
 
-	hwrng_unregister(&omap_rng_ops);
+	hwrng_unregister(&priv->rng);
 
 	priv->pdata->cleanup(priv);
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 3/8] hwrng: omap - Switch to non-obsolete read API implementation
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

The ".data_present" and ".data_read" operations are marked as OBSOLETE
in the hwrng API. We have to use the ".read" operation instead. It makes
the driver simpler and moves the busy loop, that waits until enough data
is generated, to the read function. We simplify this step by only
checking the status of the engine, if there is data, we copy the data to
the output buffer and the amout of copied data is returned to the caller,
otherwise zero is returned. The hwrng core will re-call the read operation
as many times as required until enough data has been copied.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---

Changes in v3:
 - Re-add code for busy loop that waits until enough data is generated. When no
   data is available, the busy loop is tried several time until the function
   returns to the hw_random core and then schedule_timeout_interruptible(1) is
   called. in v2, schedule_timeout_interruptible(1) was called each time data
   was not available, which added more latency.

 drivers/char/hw_random/omap-rng.c | 41 ++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index 01d4be2..b3f6047 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -140,41 +140,35 @@ static inline void omap_rng_write(struct omap_rng_dev *priv, u16 reg,
 	__raw_writel(val, priv->base + priv->pdata->regs[reg]);
 }
 
-static int omap_rng_data_present(struct hwrng *rng, int wait)
+
+static int omap_rng_do_read(struct hwrng *rng, void *data, size_t max,
+			    bool wait)
 {
 	struct omap_rng_dev *priv;
-	int data, i;
+	int i, present;
 
 	priv = (struct omap_rng_dev *)rng->priv;
 
+	if (max < priv->pdata->data_size)
+		return 0;
+
 	for (i = 0; i < 20; i++) {
-		data = priv->pdata->data_present(priv);
-		if (data || !wait)
+		present = priv->pdata->data_present(priv);
+		if (present || !wait)
 			break;
-		/* RNG produces data fast enough (2+ MBit/sec, even
-		 * during "rngtest" loads, that these delays don't
-		 * seem to trigger.  We *could* use the RNG IRQ, but
-		 * that'd be higher overhead ... so why bother?
-		 */
+
 		udelay(10);
 	}
-	return data;
-}
-
-static int omap_rng_data_read(struct hwrng *rng, u32 *data)
-{
-	struct omap_rng_dev *priv;
-	u32 data_size, i;
-
-	priv = (struct omap_rng_dev *)rng->priv;
-	data_size = priv->pdata->data_size;
+	if (!present)
+		return 0;
 
-	for (i = 0; i < data_size / sizeof(u32); i++)
-		data[i] = omap_rng_read(priv, RNG_OUTPUT_L_REG + i);
+	memcpy_fromio(data, priv->base + priv->pdata->regs[RNG_OUTPUT_L_REG],
+		      priv->pdata->data_size);
 
 	if (priv->pdata->regs[RNG_INTACK_REG])
 		omap_rng_write(priv, RNG_INTACK_REG, RNG_REG_INTACK_RDY_MASK);
-	return data_size;
+
+	return priv->pdata->data_size;
 }
 
 static int omap_rng_init(struct hwrng *rng)
@@ -195,8 +189,7 @@ static void omap_rng_cleanup(struct hwrng *rng)
 
 static struct hwrng omap_rng_ops = {
 	.name		= "omap",
-	.data_present	= omap_rng_data_present,
-	.data_read	= omap_rng_data_read,
+	.read 		= omap_rng_do_read,
 	.init		= omap_rng_init,
 	.cleanup	= omap_rng_cleanup,
 };
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 2/8] dt-bindings: omap-rng: Document SafeXcel IP-76 device variant
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

This commits add missing fields in the documentation that are used
by the new device variant. It also includes DT example to show how
the variant should be used.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---
 Documentation/devicetree/bindings/rng/omap_rng.txt | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/rng/omap_rng.txt b/Documentation/devicetree/bindings/rng/omap_rng.txt
index 6a62acd..4714772 100644
--- a/Documentation/devicetree/bindings/rng/omap_rng.txt
+++ b/Documentation/devicetree/bindings/rng/omap_rng.txt
@@ -1,4 +1,4 @@
-OMAP SoC HWRNG Module
+OMAP SoC and Inside-Secure HWRNG Module
 
 Required properties:
 
@@ -6,11 +6,13 @@ Required properties:
   RNG versions:
   - "ti,omap2-rng" for OMAP2.
   - "ti,omap4-rng" for OMAP4, OMAP5 and AM33XX.
+  - "inside-secure,safexcel-eip76" for SoCs with EIP76 IP block
   Note that these two versions are incompatible.
 - ti,hwmods: Name of the hwmod associated with the RNG module
 - reg : Offset and length of the register set for the module
 - interrupts : the interrupt number for the RNG module.
-		Only used for "ti,omap4-rng".
+		Used for "ti,omap4-rng" and "inside-secure,safexcel-eip76"
+- clocks: the trng clock source
 
 Example:
 /* AM335x */
@@ -20,3 +22,11 @@ rng: rng@48310000 {
 	reg = <0x48310000 0x2000>;
 	interrupts = <111>;
 };
+
+/* SafeXcel IP-76 */
+trng: rng@f2760000 {
+	compatible = "inside-secure,safexcel-eip76";
+	reg = <0xf2760000 0x7d>;
+	interrupts = <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
+	clocks = <&cpm_syscon0 1 25>;
+};
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 1/8] dt-bindings: Add vendor prefix for INSIDE Secure
From: Romain Perier @ 2016-09-16 10:08 UTC (permalink / raw)
  To: dsaxena, mpm, Herbert Xu
  Cc: Gregory Clement, Thomas Petazzoni, Romain Perier, Nadav Haklai,
	Omri Itach, Shadi Ammouri, Yahuda Yitschak, Hanna Hawa,
	Neta Zur Hershkovits, Igal Liberman, Marcin Wojtas, linux-crypto
In-Reply-To: <20160916100856.31727-1-romain.perier@free-electrons.com>

This commits adds a vendor for the company INSIDE Secure.
See https://www.insidesecure.com, for more details.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 1992aa9..6a5e872 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -132,6 +132,7 @@ infineon Infineon Technologies
 inforce	Inforce Computing
 ingenic	Ingenic Semiconductor
 innolux	Innolux Corporation
+inside-secure	INSIDE Secure
 intel	Intel Corporation
 intercontrol	Inter Control Group
 invensense	InvenSense Inc.
-- 
2.9.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox