Linux cryptographic layer development
 help / color / mirror / Atom feed
* [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues
@ 2025-05-21 10:04 Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 1/4 v3] crypto: octeontx2: add timeout for load_fvc completion poll Bharat Bhushan
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-21 10:04 UTC (permalink / raw)
  To: bbrezillon, schalla, herbert, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel
  Cc: Bharat Bhushan

First patch of the series fixes possible infinite loop.

Remaining three patches fixes address alignment issue observed
after "9382bc44b5f5 arm64: allow kmalloc() caches aligned to the
       smaller cache_line_size()"
  
First 3 patches applies to Linux version 6.5 onwards.
Patch-4 applies to Linux version 6.8 onwards

v2->v3:
 - Align DMA memory to ARCH_DMA_MINALIGN as that is mapped as
   bidirectional
 
v1->v2:
 - Fixed memory padding size calculation as per review comment

Bharat Bhushan (4):
  crypto: octeontx2: add timeout for load_fvc completion poll
  crypto: octeontx2: Fix address alignment issue on ucode loading
  crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2
  crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0

 .../marvell/octeontx2/otx2_cpt_reqmgr.h       | 121 +++++++++++++-----
 .../marvell/octeontx2/otx2_cptpf_ucode.c      |  51 +++++---
 2 files changed, 126 insertions(+), 46 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4 v3] crypto: octeontx2: add timeout for load_fvc completion poll
  2025-05-21 10:04 [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues Bharat Bhushan
@ 2025-05-21 10:04 ` Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 2/4 v3] crypto: octeontx2: Fix address alignment issue on ucode loading Bharat Bhushan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-21 10:04 UTC (permalink / raw)
  To: bbrezillon, schalla, herbert, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel
  Cc: Bharat Bhushan

Adds timeout to exit from possible infinite loop, which polls
on CPT instruction(load_fvc) completion.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
---
 .../crypto/marvell/octeontx2/otx2_cptpf_ucode.c  | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
index 78367849c3d5..9095dea2748d 100644
--- a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
+++ b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
@@ -1494,6 +1494,7 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 	dma_addr_t rptr_baddr;
 	struct pci_dev *pdev;
 	u32 len, compl_rlen;
+	int timeout = 10000;
 	int ret, etype;
 	void *rptr;
 
@@ -1554,16 +1555,27 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 							 etype);
 		otx2_cpt_fill_inst(&inst, &iq_cmd, rptr_baddr);
 		lfs->ops->send_cmd(&inst, 1, &cptpf->lfs.lf[0]);
+		timeout = 10000;
 
 		while (lfs->ops->cpt_get_compcode(result) ==
-						OTX2_CPT_COMPLETION_CODE_INIT)
+						OTX2_CPT_COMPLETION_CODE_INIT) {
 			cpu_relax();
+			udelay(1);
+			timeout--;
+			if (!timeout) {
+				ret = -ENODEV;
+				cptpf->is_eng_caps_discovered = false;
+				dev_warn(&pdev->dev, "Timeout on CPT load_fvc completion poll\n");
+				goto error_no_response;
+			}
+		}
 
 		cptpf->eng_caps[etype].u = be64_to_cpup(rptr);
 	}
-	dma_unmap_single(&pdev->dev, rptr_baddr, len, DMA_BIDIRECTIONAL);
 	cptpf->is_eng_caps_discovered = true;
 
+error_no_response:
+	dma_unmap_single(&pdev->dev, rptr_baddr, len, DMA_BIDIRECTIONAL);
 free_result:
 	kfree(result);
 lf_cleanup:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4 v3] crypto: octeontx2: Fix address alignment issue on ucode loading
  2025-05-21 10:04 [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 1/4 v3] crypto: octeontx2: add timeout for load_fvc completion poll Bharat Bhushan
@ 2025-05-21 10:04 ` Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2 Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 4/4 v3] crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0 Bharat Bhushan
  3 siblings, 0 replies; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-21 10:04 UTC (permalink / raw)
  To: bbrezillon, schalla, herbert, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel
  Cc: Bharat Bhushan

octeontx2 crypto driver allocates memory using kmalloc/kzalloc,
and uses this memory for dma (does dma_map_single()). It assumes
that kmalloc/kzalloc will return 128-byte aligned address. But
kmalloc/kzalloc returns 8-byte aligned address after below changes:
  "9382bc44b5f5 arm64: allow kmalloc() caches aligned to the
   smaller cache_line_size()"

Completion address should be 32-Byte alignment when loading
microcode.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
---
v2->v3:
 - Align DMA memory to ARCH_DMA_MINALIGN as that is mapped as
   bidirectional
 
 .../marvell/octeontx2/otx2_cptpf_ucode.c      | 35 +++++++++++--------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
index 9095dea2748d..56645b3eb717 100644
--- a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
+++ b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
@@ -1491,12 +1491,13 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 	union otx2_cpt_opcode opcode;
 	union otx2_cpt_res_s *result;
 	union otx2_cpt_inst_s inst;
+	dma_addr_t result_baddr;
 	dma_addr_t rptr_baddr;
 	struct pci_dev *pdev;
-	u32 len, compl_rlen;
 	int timeout = 10000;
+	void *base, *rptr;
 	int ret, etype;
-	void *rptr;
+	u32 len;
 
 	/*
 	 * We don't get capabilities if it was already done
@@ -1519,22 +1520,28 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 	if (ret)
 		goto delete_grps;
 
-	compl_rlen = ALIGN(sizeof(union otx2_cpt_res_s), OTX2_CPT_DMA_MINALIGN);
-	len = compl_rlen + LOADFVC_RLEN;
+	/* Allocate extra memory for "rptr" and "result" pointer alignment */
+	len = LOADFVC_RLEN + ARCH_DMA_MINALIGN +
+	       sizeof(union otx2_cpt_res_s) + OTX2_CPT_RES_ADDR_ALIGN;
 
-	result = kzalloc(len, GFP_KERNEL);
-	if (!result) {
+	base = kzalloc(len, GFP_KERNEL);
+	if (!base) {
 		ret = -ENOMEM;
 		goto lf_cleanup;
 	}
-	rptr_baddr = dma_map_single(&pdev->dev, (void *)result, len,
-				    DMA_BIDIRECTIONAL);
+
+	rptr = PTR_ALIGN(base, ARCH_DMA_MINALIGN);
+	rptr_baddr = dma_map_single(&pdev->dev, rptr, len, DMA_BIDIRECTIONAL);
 	if (dma_mapping_error(&pdev->dev, rptr_baddr)) {
 		dev_err(&pdev->dev, "DMA mapping failed\n");
 		ret = -EFAULT;
-		goto free_result;
+		goto free_rptr;
 	}
-	rptr = (u8 *)result + compl_rlen;
+
+	result = (union otx2_cpt_res_s *)PTR_ALIGN(rptr + LOADFVC_RLEN,
+						   OTX2_CPT_RES_ADDR_ALIGN);
+	result_baddr = ALIGN(rptr_baddr + LOADFVC_RLEN,
+			     OTX2_CPT_RES_ADDR_ALIGN);
 
 	/* Fill in the command */
 	opcode.s.major = LOADFVC_MAJOR_OP;
@@ -1546,14 +1553,14 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 	/* 64-bit swap for microcode data reads, not needed for addresses */
 	cpu_to_be64s(&iq_cmd.cmd.u);
 	iq_cmd.dptr = 0;
-	iq_cmd.rptr = rptr_baddr + compl_rlen;
+	iq_cmd.rptr = rptr_baddr;
 	iq_cmd.cptr.u = 0;
 
 	for (etype = 1; etype < OTX2_CPT_MAX_ENG_TYPES; etype++) {
 		result->s.compcode = OTX2_CPT_COMPLETION_CODE_INIT;
 		iq_cmd.cptr.s.grp = otx2_cpt_get_eng_grp(&cptpf->eng_grps,
 							 etype);
-		otx2_cpt_fill_inst(&inst, &iq_cmd, rptr_baddr);
+		otx2_cpt_fill_inst(&inst, &iq_cmd, result_baddr);
 		lfs->ops->send_cmd(&inst, 1, &cptpf->lfs.lf[0]);
 		timeout = 10000;
 
@@ -1576,8 +1583,8 @@ int otx2_cpt_discover_eng_capabilities(struct otx2_cptpf_dev *cptpf)
 
 error_no_response:
 	dma_unmap_single(&pdev->dev, rptr_baddr, len, DMA_BIDIRECTIONAL);
-free_result:
-	kfree(result);
+free_rptr:
+	kfree(base);
 lf_cleanup:
 	otx2_cptlf_shutdown(lfs);
 delete_grps:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2
  2025-05-21 10:04 [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 1/4 v3] crypto: octeontx2: add timeout for load_fvc completion poll Bharat Bhushan
  2025-05-21 10:04 ` [PATCH 2/4 v3] crypto: octeontx2: Fix address alignment issue on ucode loading Bharat Bhushan
@ 2025-05-21 10:04 ` Bharat Bhushan
  2025-05-21 11:28   ` Herbert Xu
  2025-05-21 10:04 ` [PATCH 4/4 v3] crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0 Bharat Bhushan
  3 siblings, 1 reply; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-21 10:04 UTC (permalink / raw)
  To: bbrezillon, schalla, herbert, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel
  Cc: Bharat Bhushan, stable

octeontx2 crypto driver allocates memory using kmalloc/kzalloc,
and uses this memory for dma (does dma_map_single()). It assumes
that kmalloc/kzalloc will return 128-byte aligned address. But
kmalloc/kzalloc returns 8-byte aligned address after below changes:
  "9382bc44b5f5 arm64: allow kmalloc() caches aligned to the
   smaller cache_line_size()

Memory allocated are used for following purpose:
 - Input data or scatter list address - 8-Byte alignment
 - Output data or gather list address - 8-Byte alignment
 - Completion address - 32-Byte alignment.

This patch ensures all addresses are aligned as mentioned above.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Cc: <stable@vger.kernel.org> #v6.5+
---
v2->v3:
 - Align DMA memory to ARCH_DMA_MINALIGN as that is mapped as
   bidirectional
 
v1->v2:
 - Fixed memory padding size calculation as per review comment

 .../marvell/octeontx2/otx2_cpt_reqmgr.h       | 63 ++++++++++++++-----
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h b/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
index e27e849b01df..204a31755710 100644
--- a/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
+++ b/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
@@ -34,6 +34,9 @@
 #define SG_COMP_2    2
 #define SG_COMP_1    1
 
+#define OTX2_CPT_DPTR_RPTR_ALIGN	8
+#define OTX2_CPT_RES_ADDR_ALIGN		32
+
 union otx2_cpt_opcode {
 	u16 flags;
 	struct {
@@ -417,10 +420,9 @@ static inline struct otx2_cpt_inst_info *
 otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 		    gfp_t gfp)
 {
-	int align = OTX2_CPT_DMA_MINALIGN;
 	struct otx2_cpt_inst_info *info;
-	u32 dlen, align_dlen, info_len;
-	u16 g_sz_bytes, s_sz_bytes;
+	u32 dlen, info_len;
+	u16 g_len, s_len;
 	u32 total_mem_len;
 
 	if (unlikely(req->in_cnt > OTX2_CPT_MAX_SG_IN_CNT ||
@@ -429,22 +431,51 @@ otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 		return NULL;
 	}
 
-	g_sz_bytes = ((req->in_cnt + 3) / 4) *
-		      sizeof(struct otx2_cpt_sglist_component);
-	s_sz_bytes = ((req->out_cnt + 3) / 4) *
-		      sizeof(struct otx2_cpt_sglist_component);
+	/* Allocate memory to meet below alignment requirement:
+	 *  ----------------------------------
+	 * |    struct otx2_cpt_inst_info     |
+	 * |    (No alignment required)       |
+	 * |     -----------------------------|
+	 * |    | padding for 8B alignment    |
+	 * |----------------------------------|
+	 * |    SG List Gather/Input memory   |
+	 * |    Length = multiple of 32Bytes  |
+	 * |    Alignment = 8Byte             |
+	 * |----------------------------------|
+	 * |    SG List Scatter/Output memory |
+	 * |    Length = multiple of 32Bytes  |
+	 * |    Alignment = 8Byte             |
+	 * |    (padding for below alignment) |
+	 * |     -----------------------------|
+	 * |    | padding for 32B alignment   |
+	 * |----------------------------------|
+	 * |    Result response memory        |
+	 *  ----------------------------------
+	 */
 
-	dlen = g_sz_bytes + s_sz_bytes + SG_LIST_HDR_SIZE;
-	align_dlen = ALIGN(dlen, align);
-	info_len = ALIGN(sizeof(*info), align);
-	total_mem_len = align_dlen + info_len + sizeof(union otx2_cpt_res_s);
+	info_len = sizeof(*info);
+
+	g_len = ((req->in_cnt + 3) / 4) *
+		 sizeof(struct otx2_cpt_sglist_component);
+	s_len = ((req->out_cnt + 3) / 4) *
+		 sizeof(struct otx2_cpt_sglist_component);
+
+	dlen = g_len + s_len + SG_LIST_HDR_SIZE;
+
+	/* Allocate extra memory for SG and response address alignment */
+	total_mem_len = ALIGN(info_len, ARCH_DMA_MINALIGN) + dlen;
+	total_mem_len = ALIGN(total_mem_len, OTX2_CPT_DPTR_RPTR_ALIGN);
+	total_mem_len += (OTX2_CPT_RES_ADDR_ALIGN - 1) &
+			  ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
+	total_mem_len += sizeof(union otx2_cpt_res_s);
 
 	info = kzalloc(total_mem_len, gfp);
 	if (unlikely(!info))
 		return NULL;
 
 	info->dlen = dlen;
-	info->in_buffer = (u8 *)info + info_len;
+	info->in_buffer = PTR_ALIGN((u8 *)info + info_len, ARCH_DMA_MINALIGN);
+	info->out_buffer = info->in_buffer + SG_LIST_HDR_SIZE + g_len;
 
 	((u16 *)info->in_buffer)[0] = req->out_cnt;
 	((u16 *)info->in_buffer)[1] = req->in_cnt;
@@ -460,7 +491,7 @@ otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 	}
 
 	if (setup_sgio_components(pdev, req->out, req->out_cnt,
-				  &info->in_buffer[8 + g_sz_bytes])) {
+				  info->out_buffer)) {
 		dev_err(&pdev->dev, "Failed to setup scatter list\n");
 		goto destroy_info;
 	}
@@ -476,8 +507,10 @@ otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 	 * Get buffer for union otx2_cpt_res_s response
 	 * structure and its physical address
 	 */
-	info->completion_addr = info->in_buffer + align_dlen;
-	info->comp_baddr = info->dptr_baddr + align_dlen;
+	info->completion_addr = PTR_ALIGN((info->in_buffer + dlen),
+					  OTX2_CPT_RES_ADDR_ALIGN);
+	info->comp_baddr = ALIGN((info->dptr_baddr + dlen),
+				 OTX2_CPT_RES_ADDR_ALIGN);
 
 	return info;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4 v3] crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0
  2025-05-21 10:04 [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues Bharat Bhushan
                   ` (2 preceding siblings ...)
  2025-05-21 10:04 ` [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2 Bharat Bhushan
@ 2025-05-21 10:04 ` Bharat Bhushan
  3 siblings, 0 replies; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-21 10:04 UTC (permalink / raw)
  To: bbrezillon, schalla, herbert, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel
  Cc: Bharat Bhushan, stable

octeontx2 crypto driver allocates memory using kmalloc/kzalloc,
and uses this memory for dma (does dma_map_single()). It assumes
that kmalloc/kzalloc will return 128-byte aligned address. But
kmalloc/kzalloc returns 8-byte aligned address after below changes:
  "9382bc44b5f5 arm64: allow kmalloc() caches aligned to the
   smaller cache_line_size()

Memory allocated are used for following purpose:
 - Input data or scatter list address - 8-Byte alignment
 - Output data or gather list address - 8-Byte alignment
 - Completion address - 32-Byte alignment.

This patch ensures all addresses are aligned as mentioned above.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Cc: <stable@vger.kernel.org> #v6.8+
---
v2->v3:
 - Align DMA memory to ARCH_DMA_MINALIGN as that is mapped as
   bidirectional
 
v1->v2:
 - Fixed memory padding size calculation as per review comment

 .../marvell/octeontx2/otx2_cpt_reqmgr.h       | 58 ++++++++++++++-----
 1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h b/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
index 204a31755710..8e95036e91d9 100644
--- a/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
+++ b/drivers/crypto/marvell/octeontx2/otx2_cpt_reqmgr.h
@@ -350,22 +350,47 @@ static inline struct otx2_cpt_inst_info *
 cn10k_sgv2_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 		       gfp_t gfp)
 {
-	u32 dlen = 0, g_len, sg_len, info_len;
-	int align = OTX2_CPT_DMA_MINALIGN;
+	u32 dlen = 0, g_len, s_len, sg_len, info_len;
 	struct otx2_cpt_inst_info *info;
-	u16 g_sz_bytes, s_sz_bytes;
 	u32 total_mem_len;
 	int i;
 
-	g_sz_bytes = ((req->in_cnt + 2) / 3) *
-		      sizeof(struct cn10kb_cpt_sglist_component);
-	s_sz_bytes = ((req->out_cnt + 2) / 3) *
-		      sizeof(struct cn10kb_cpt_sglist_component);
+	/* Allocate memory to meet below alignment requirement:
+	 *  ----------------------------------
+	 * |    struct otx2_cpt_inst_info     |
+	 * |    (No alignment required)       |
+	 * |     -----------------------------|
+	 * |    | padding for 8B alignment    |
+	 * |----------------------------------|
+	 * |    SG List Gather/Input memory   |
+	 * |    Length = multiple of 32Bytes  |
+	 * |    Alignment = 8Byte             |
+	 * |----------------------------------|
+	 * |    SG List Scatter/Output memory |
+	 * |    Length = multiple of 32Bytes  |
+	 * |    Alignment = 8Byte             |
+	 * |    (padding for below alignment) |
+	 * |     -----------------------------|
+	 * |    | padding for 32B alignment   |
+	 * |----------------------------------|
+	 * |    Result response memory        |
+	 *  ----------------------------------
+	 */
+
+	info_len = sizeof(*info);
+
+	g_len = ((req->in_cnt + 2) / 3) *
+		 sizeof(struct cn10kb_cpt_sglist_component);
+	s_len = ((req->out_cnt + 2) / 3) *
+		 sizeof(struct cn10kb_cpt_sglist_component);
+	sg_len = g_len + s_len;
 
-	g_len = ALIGN(g_sz_bytes, align);
-	sg_len = ALIGN(g_len + s_sz_bytes, align);
-	info_len = ALIGN(sizeof(*info), align);
-	total_mem_len = sg_len + info_len + sizeof(union otx2_cpt_res_s);
+	/* Allocate extra memory for SG and response address alignment */
+	total_mem_len = ALIGN(info_len, ARCH_DMA_MINALIGN) + sg_len;
+	total_mem_len = ALIGN(total_mem_len, OTX2_CPT_DPTR_RPTR_ALIGN);
+	total_mem_len += (OTX2_CPT_RES_ADDR_ALIGN - 1) &
+			  ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
+	total_mem_len += sizeof(union otx2_cpt_res_s);
 
 	info = kzalloc(total_mem_len, gfp);
 	if (unlikely(!info))
@@ -375,7 +400,8 @@ cn10k_sgv2_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 		dlen += req->in[i].size;
 
 	info->dlen = dlen;
-	info->in_buffer = (u8 *)info + info_len;
+	info->in_buffer = PTR_ALIGN((u8 *)info + info_len, ARCH_DMA_MINALIGN);
+	info->out_buffer = info->in_buffer + g_len;
 	info->gthr_sz = req->in_cnt;
 	info->sctr_sz = req->out_cnt;
 
@@ -387,7 +413,7 @@ cn10k_sgv2_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 	}
 
 	if (sgv2io_components_setup(pdev, req->out, req->out_cnt,
-				    &info->in_buffer[g_len])) {
+				    info->out_buffer)) {
 		dev_err(&pdev->dev, "Failed to setup scatter list\n");
 		goto destroy_info;
 	}
@@ -404,8 +430,10 @@ cn10k_sgv2_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
 	 * Get buffer for union otx2_cpt_res_s response
 	 * structure and its physical address
 	 */
-	info->completion_addr = info->in_buffer + sg_len;
-	info->comp_baddr = info->dptr_baddr + sg_len;
+	info->completion_addr = PTR_ALIGN((info->in_buffer + sg_len),
+					  OTX2_CPT_RES_ADDR_ALIGN);
+	info->comp_baddr = ALIGN((info->dptr_baddr + sg_len),
+				 OTX2_CPT_RES_ADDR_ALIGN);
 
 	return info;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2
  2025-05-21 10:04 ` [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2 Bharat Bhushan
@ 2025-05-21 11:28   ` Herbert Xu
  2025-05-22 10:08     ` Bharat Bhushan
  0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2025-05-21 11:28 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: bbrezillon, schalla, davem, giovanni.cabiddu, linux,
	bharatb.linux, linux-crypto, linux-kernel, stable

On Wed, May 21, 2025 at 03:34:46PM +0530, Bharat Bhushan wrote:
>
> @@ -429,22 +431,51 @@ otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
>  		return NULL;
>  	}
>  
> -	g_sz_bytes = ((req->in_cnt + 3) / 4) *
> -		      sizeof(struct otx2_cpt_sglist_component);
> -	s_sz_bytes = ((req->out_cnt + 3) / 4) *
> -		      sizeof(struct otx2_cpt_sglist_component);
> +	/* Allocate memory to meet below alignment requirement:
> +	 *  ----------------------------------
> +	 * |    struct otx2_cpt_inst_info     |
> +	 * |    (No alignment required)       |
> +	 * |     -----------------------------|
> +	 * |    | padding for 8B alignment    |
> +	 * |----------------------------------|

This should be updated to show that everything following this
is on an 128-byte boundary.

> +	 * |    SG List Gather/Input memory   |
> +	 * |    Length = multiple of 32Bytes  |
> +	 * |    Alignment = 8Byte             |
> +	 * |----------------------------------|
> +	 * |    SG List Scatter/Output memory |
> +	 * |    Length = multiple of 32Bytes  |
> +	 * |    Alignment = 8Byte             |
> +	 * |    (padding for below alignment) |
> +	 * |     -----------------------------|
> +	 * |    | padding for 32B alignment   |
> +	 * |----------------------------------|
> +	 * |    Result response memory        |
> +	 *  ----------------------------------
> +	 */
>  
> -	dlen = g_sz_bytes + s_sz_bytes + SG_LIST_HDR_SIZE;
> -	align_dlen = ALIGN(dlen, align);
> -	info_len = ALIGN(sizeof(*info), align);
> -	total_mem_len = align_dlen + info_len + sizeof(union otx2_cpt_res_s);
> +	info_len = sizeof(*info);
> +
> +	g_len = ((req->in_cnt + 3) / 4) *
> +		 sizeof(struct otx2_cpt_sglist_component);
> +	s_len = ((req->out_cnt + 3) / 4) *
> +		 sizeof(struct otx2_cpt_sglist_component);
> +
> +	dlen = g_len + s_len + SG_LIST_HDR_SIZE;
> +
> +	/* Allocate extra memory for SG and response address alignment */
> +	total_mem_len = ALIGN(info_len, ARCH_DMA_MINALIGN) + dlen;
> +	total_mem_len = ALIGN(total_mem_len, OTX2_CPT_DPTR_RPTR_ALIGN);
> +	total_mem_len += (OTX2_CPT_RES_ADDR_ALIGN - 1) &
> +			  ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
> +	total_mem_len += sizeof(union otx2_cpt_res_s);

This calculation is wrong again.  It should be:

	total_mem_len = ALIGN(info_len, OTX2_CPT_DPTR_RPTR_ALIGN);
	total_mem_len += (ARCH_DMA_MINALIGN - 1) &
			 ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
	total_mem_len += ALIGN(dlen, OTX2_CPT_RES_ADDR_ALIGN);
	total_mem_len += sizeof(union otx2_cpt_res_s);

Remember ALIGN may not actually give you extra memory.  So if you
need to add memory for alignment padding, you will need to do it
by hand.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2
  2025-05-21 11:28   ` Herbert Xu
@ 2025-05-22 10:08     ` Bharat Bhushan
  0 siblings, 0 replies; 7+ messages in thread
From: Bharat Bhushan @ 2025-05-22 10:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Bharat Bhushan, bbrezillon, schalla, davem, giovanni.cabiddu,
	linux, linux-crypto, linux-kernel, stable

On Wed, May 21, 2025 at 4:58 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, May 21, 2025 at 03:34:46PM +0530, Bharat Bhushan wrote:
> >
> > @@ -429,22 +431,51 @@ otx2_sg_info_create(struct pci_dev *pdev, struct otx2_cpt_req_info *req,
> >               return NULL;
> >       }
> >
> > -     g_sz_bytes = ((req->in_cnt + 3) / 4) *
> > -                   sizeof(struct otx2_cpt_sglist_component);
> > -     s_sz_bytes = ((req->out_cnt + 3) / 4) *
> > -                   sizeof(struct otx2_cpt_sglist_component);
> > +     /* Allocate memory to meet below alignment requirement:
> > +      *  ----------------------------------
> > +      * |    struct otx2_cpt_inst_info     |
> > +      * |    (No alignment required)       |
> > +      * |     -----------------------------|
> > +      * |    | padding for 8B alignment    |
> > +      * |----------------------------------|
>
> This should be updated to show that everything following this
> is on an 128-byte boundary.
>
> > +      * |    SG List Gather/Input memory   |
> > +      * |    Length = multiple of 32Bytes  |
> > +      * |    Alignment = 8Byte             |
> > +      * |----------------------------------|
> > +      * |    SG List Scatter/Output memory |
> > +      * |    Length = multiple of 32Bytes  |
> > +      * |    Alignment = 8Byte             |
> > +      * |    (padding for below alignment) |
> > +      * |     -----------------------------|
> > +      * |    | padding for 32B alignment   |
> > +      * |----------------------------------|
> > +      * |    Result response memory        |
> > +      *  ----------------------------------
> > +      */
> >
> > -     dlen = g_sz_bytes + s_sz_bytes + SG_LIST_HDR_SIZE;
> > -     align_dlen = ALIGN(dlen, align);
> > -     info_len = ALIGN(sizeof(*info), align);
> > -     total_mem_len = align_dlen + info_len + sizeof(union otx2_cpt_res_s);
> > +     info_len = sizeof(*info);
> > +
> > +     g_len = ((req->in_cnt + 3) / 4) *
> > +              sizeof(struct otx2_cpt_sglist_component);
> > +     s_len = ((req->out_cnt + 3) / 4) *
> > +              sizeof(struct otx2_cpt_sglist_component);
> > +
> > +     dlen = g_len + s_len + SG_LIST_HDR_SIZE;
> > +
> > +     /* Allocate extra memory for SG and response address alignment */
> > +     total_mem_len = ALIGN(info_len, ARCH_DMA_MINALIGN) + dlen;
> > +     total_mem_len = ALIGN(total_mem_len, OTX2_CPT_DPTR_RPTR_ALIGN);
> > +     total_mem_len += (OTX2_CPT_RES_ADDR_ALIGN - 1) &
> > +                       ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
> > +     total_mem_len += sizeof(union otx2_cpt_res_s);
>
> This calculation is wrong again.  It should be:
>
>         total_mem_len = ALIGN(info_len, OTX2_CPT_DPTR_RPTR_ALIGN);
>         total_mem_len += (ARCH_DMA_MINALIGN - 1) &
>                          ~(OTX2_CPT_DPTR_RPTR_ALIGN - 1);
>         total_mem_len += ALIGN(dlen, OTX2_CPT_RES_ADDR_ALIGN);
>         total_mem_len += sizeof(union otx2_cpt_res_s);
>
> Remember ALIGN may not actually give you extra memory.  So if you
> need to add memory for alignment padding, you will need to do it
> by hand.

Will do changes as proposed above, Thanks for reviewing.

Thanks
-Bharat

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-05-22 10:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 10:04 [PATCH 0/4 v3] crypto: octeontx2: Fix hang and address alignment issues Bharat Bhushan
2025-05-21 10:04 ` [PATCH 1/4 v3] crypto: octeontx2: add timeout for load_fvc completion poll Bharat Bhushan
2025-05-21 10:04 ` [PATCH 2/4 v3] crypto: octeontx2: Fix address alignment issue on ucode loading Bharat Bhushan
2025-05-21 10:04 ` [PATCH 3/4 v3] crypto: octeontx2: Fix address alignment on CN10K A0/A1 and OcteonTX2 Bharat Bhushan
2025-05-21 11:28   ` Herbert Xu
2025-05-22 10:08     ` Bharat Bhushan
2025-05-21 10:04 ` [PATCH 4/4 v3] crypto: octeontx2: Fix address alignment on CN10KB and CN10KA-B0 Bharat Bhushan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox