[PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol

public inbox for dmaengine@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol
@ 2024-11-11 14:02 Jyothi Kumar Seerapu
  2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-11 14:02 UTC (permalink / raw)
  To: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König
  Cc: linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

The I2C driver gets an interrupt upon transfer completion.
For multiple messages in a single transfer, N interrupts will be
received for N messages, leading to significant software interrupt
latency. To mitigate this latency, utilize Block Event Interrupt (BEI)
only when an interrupt is necessary. This means large transfers can be
split into multiple chunks of 8 messages internally, without expecting
interrupts for the first 7 messages completion, only the last one will
trigger an interrupt indicating 8 messages completed.

By implementing BEI, multi-message transfers can be divided into
chunks of 8 messages, improving overall transfer time.
This optimization reduces transfer time from 168 ms to 48 ms for a
series of 200 I2C write messages in a single transfer, with a
clock frequency support of 100 kHz.

BEI optimizations are currently implemented for I2C write transfers only,
as there is no use case for multiple I2C read messages in a single transfer
at this time.

v1 -> v2:
  - DT changes are reverted for adding dma channel size as a new arg of
    dma-cells property.
  - DT binding change reveted for dma channel size as a new arg of
    dma-cells property.
  - In GPI driver, reverted the changes to parse the channel TRE size
    from device tree.
  - Made the changes in QCOM I2C geni driver to support the BEI
    functionality with the existing TRE size of 64.
  - Made changes in QCOM I2C geni driver as per the review comments.
  - Fixed Kernel test robot reported compiltion issues.

Jyothi Kumar Seerapu (3):
  dmaengine: qcom: gpi: Add GPI Block event interrupt support
  i2c: qcom_geni: Update compile dependenices for qcom geni
  i2c: i2c-qcom-geni: Add Block event interrupt support

 drivers/dma/qcom/gpi.c             |  49 +++++++
 drivers/i2c/busses/Kconfig         |   1 +
 drivers/i2c/busses/i2c-qcom-geni.c | 203 +++++++++++++++++++++++++----
 include/linux/dma/qcom-gpi-dma.h   |  37 ++++++
 4 files changed, 265 insertions(+), 25 deletions(-)

base-commit: 55bcd2e0d04c1171d382badef1def1fd04ef66c5
-- 
2.17.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support
  2024-11-11 14:02 [PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol Jyothi Kumar Seerapu
@ 2024-11-11 14:02 ` Jyothi Kumar Seerapu
  2024-11-11 22:36   ` Andi Shyti
  2024-11-12  4:56   ` Bjorn Andersson
  2024-11-11 14:02 ` [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni Jyothi Kumar Seerapu
  2024-11-11 14:02 ` [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support Jyothi Kumar Seerapu
  2 siblings, 2 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-11 14:02 UTC (permalink / raw)
  To: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König
  Cc: linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

GSI hardware generates an interrupt for each transfer completion.
For multiple messages within a single transfer, this results
in receiving N interrupts for N messages, which can introduce
significant software interrupt latency. To mitigate this latency,
utilize Block Event Interrupt (BEI) only when an interrupt is necessary.
When using BEI, consider splitting a single multi-message transfer into
chunks of 8. This approach can enhance overall transfer time and
efficiency.

Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
---

v1 -> v2: 
   - Changed dma_addr type from array of pointers to array.
   - To support BEI functionality with the TRE size of 64 defined in GPI driver,
     updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8.
 
 drivers/dma/qcom/gpi.c           | 49 ++++++++++++++++++++++++++++++++
 include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
index 52a7c8f2498f..a98de3178764 100644
--- a/drivers/dma/qcom/gpi.c
+++ b/drivers/dma/qcom/gpi.c
@@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
 
 		tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE);
 		tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT);
+
+		if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ)
+			tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
 	}
 
 	for (i = 0; i < tre_idx; i++)
@@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid)
 	return -EIO;
 }
 
+/**
+ * gpi_multi_desc_process() - Process received transfers from GSI HW
+ * @dev: pointer to the corresponding dev node
+ * @multi_xfer: pointer to the gpi_multi_xfer
+ * @num_xfers: total number of transfers
+ * @transfer_timeout_msecs: transfer timeout value
+ * @transfer_comp: completion object of the transfer
+ *
+ * This function is used to process the received transfers based on the
+ * completion events
+ *
+ * Return: On success returns 0, otherwise return error code
+ */
+int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
+			   u32 num_xfers, u32 transfer_timeout_msecs,
+			   struct completion *transfer_comp)
+{
+	int i;
+	u32 max_irq_cnt, time_left;
+
+	max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ;
+	if (num_xfers % NUM_MSGS_PER_IRQ)
+		max_irq_cnt++;
+
+	/*
+	 * Wait for the interrupts of the processed transfers in multiple
+	 * of 64 and for the last transfer. If the hardware is fast and
+	 * already processed all the transfers then no need to wait.
+	 */
+	for (i = 0; i < max_irq_cnt; i++) {
+		reinit_completion(transfer_comp);
+		if (max_irq_cnt != multi_xfer->irq_cnt) {
+			time_left = wait_for_completion_timeout(transfer_comp,
+								transfer_timeout_msecs);
+			if (!time_left) {
+				dev_err(dev, "%s: Transfer timeout\n", __func__);
+				return -ETIMEDOUT;
+			}
+		}
+		if (num_xfers > multi_xfer->msg_idx_cnt)
+			return 0;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gpi_multi_desc_process);
+
 /* gpi_of_dma_xlate: open client requested channel */
 static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args,
 					 struct of_dma *of_dma)
diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
index 6680dd1a43c6..1341ff0db808 100644
--- a/include/linux/dma/qcom-gpi-dma.h
+++ b/include/linux/dma/qcom-gpi-dma.h
@@ -15,6 +15,12 @@ enum spi_transfer_cmd {
 	SPI_DUPLEX,
 };
 
+#define QCOM_GPI_BLOCK_EVENT_IRQ	BIT(0)
+
+#define QCOM_GPI_MAX_NUM_MSGS		16
+#define NUM_MSGS_PER_IRQ		8
+#define MIN_NUM_OF_MSGS_MULTI_DESC	4
+
 /**
  * struct gpi_spi_config - spi config for peripheral
  *
@@ -51,6 +57,29 @@ enum i2c_op {
 	I2C_READ,
 };
 
+/**
+ * struct gpi_multi_xfer - Used for multi transfer support
+ *
+ * @msg_idx_cnt: message index for the transfer
+ * @buf_idx: dma buffer index
+ * @unmap_msg_cnt: unampped transfer index
+ * @freed_msg_cnt: freed transfer index
+ * @irq_cnt: received interrupt count
+ * @irq_msg_cnt: transfer message count for the received irqs
+ * @dma_buf: virtual address of the buffer
+ * @dma_addr: dma address of the buffer
+ */
+struct gpi_multi_xfer {
+	u32 msg_idx_cnt;
+	u32 buf_idx;
+	u32 unmap_msg_cnt;
+	u32 freed_msg_cnt;
+	u32 irq_cnt;
+	u32 irq_msg_cnt;
+	void *dma_buf[QCOM_GPI_MAX_NUM_MSGS];
+	dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS];
+};
+
 /**
  * struct gpi_i2c_config - i2c config for peripheral
  *
@@ -65,6 +94,8 @@ enum i2c_op {
  * @rx_len: receive length for buffer
  * @op: i2c cmd
  * @muli-msg: is part of multi i2c r-w msgs
+ * @flags: true for block event interrupt support
+ * @multi_xfer: indicates transfer has multi messages
  */
 struct gpi_i2c_config {
 	u8 set_config;
@@ -78,6 +109,12 @@ struct gpi_i2c_config {
 	u32 rx_len;
 	enum i2c_op op;
 	bool multi_msg;
+	u8 flags;
+	struct gpi_multi_xfer multi_xfer;
 };
 
+int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
+			   u32 num_xfers, u32 tranfer_timeout_msecs,
+			   struct completion *transfer_comp);
+
 #endif /* QCOM_GPI_DMA_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni
  2024-11-11 14:02 [PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol Jyothi Kumar Seerapu
  2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
@ 2024-11-11 14:02 ` Jyothi Kumar Seerapu
  2024-11-12  4:09   ` Bjorn Andersson
  2024-11-11 14:02 ` [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support Jyothi Kumar Seerapu
  2 siblings, 1 reply; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-11 14:02 UTC (permalink / raw)
  To: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König
  Cc: linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

I2C_QCOM_GENI is having compile dependencies on QCOM_GPI_DMA and
so update I2C_QCOM_GENI to depends on QCOM_GPI_DMA.

Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
---

v1 -> v2: 
	This patch is added in v2 to address the kernel test robot
	reported compilation error.
	ERROR: modpost: "gpi_multi_desc_process" [drivers/i2c/busses/i2c-qcom-geni.ko] undefined! 

drivers/i2c/busses/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 0aa948014008..87634a682855 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -1049,6 +1049,7 @@ config I2C_QCOM_GENI
 	tristate "Qualcomm Technologies Inc.'s GENI based I2C controller"
 	depends on ARCH_QCOM || COMPILE_TEST
 	depends on QCOM_GENI_SE
+	depends on QCOM_GPI_DMA
 	help
 	  This driver supports GENI serial engine based I2C controller in
 	  master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support
  2024-11-11 14:02 [PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol Jyothi Kumar Seerapu
  2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
  2024-11-11 14:02 ` [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni Jyothi Kumar Seerapu
@ 2024-11-11 14:02 ` Jyothi Kumar Seerapu
  2024-11-12  4:33   ` Bjorn Andersson
  2 siblings, 1 reply; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-11 14:02 UTC (permalink / raw)
  To: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König
  Cc: linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

The I2C driver gets an interrupt upon transfer completion.
For multiple messages in a single transfer, N interrupts will be
received for N messages, leading to significant software interrupt
latency. To mitigate this latency, utilize Block Event Interrupt (BEI)
only when an interrupt is necessary. This means large transfers can be
split into multiple chunks of 8 messages internally, without expecting
interrupts for the first 7 message completions, only the last one will
trigger an interrupt indicating 8 messages completed.

By implementing BEI, multi-message transfers can be divided into
chunks of 8 messages, improving overall transfer time.
This optimization reduces transfer time from 168 ms to 48 ms for a
series of 200 I2C write messages in a single transfer, with a
clock frequency support of 100 kHz.

BEI optimizations are currently implemented for I2C write transfers only,
as there is no use case for multiple I2C read messages in a single transfer
at this time.

Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
---

v1 -> v2:
   - Moved gi2c_gpi_xfer->msg_idx_cnt to separate local variable.
   - Updated goto labels for error scenarios in geni_i2c_gpi function
   - memset tx_multi_xfer to 0.
   - Removed passing current msg index to geni_i2c_gpi.
   - Fixed kernel test robot reported compilation issues.    

 drivers/i2c/busses/i2c-qcom-geni.c | 203 +++++++++++++++++++++++++----
 1 file changed, 178 insertions(+), 25 deletions(-)

diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 7a22e1f46e60..04a7d926dadc 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -100,6 +100,10 @@ struct geni_i2c_dev {
 	struct dma_chan *rx_c;
 	bool gpi_mode;
 	bool abort_done;
+	bool is_tx_multi_xfer;
+	u32 num_msgs;
+	u32 tx_irq_cnt;
+	struct gpi_i2c_config *gpi_config;
 };
 
 struct geni_i2c_desc {
@@ -500,6 +504,7 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
 {
 	struct geni_i2c_dev *gi2c = cb;
+	struct gpi_multi_xfer *tx_multi_xfer;
 
 	if (result->result != DMA_TRANS_NOERROR) {
 		dev_err(gi2c->se.dev, "DMA txn failed:%d\n", result->result);
@@ -508,7 +513,21 @@ static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
 		dev_dbg(gi2c->se.dev, "DMA xfer has pending: %d\n", result->residue);
 	}
 
-	complete(&gi2c->done);
+	if (gi2c->is_tx_multi_xfer) {
+		tx_multi_xfer = &gi2c->gpi_config->multi_xfer;
+
+		/*
+		 * Send Completion for last message or multiple of NUM_MSGS_PER_IRQ.
+		 */
+		if ((tx_multi_xfer->irq_msg_cnt == gi2c->num_msgs - 1) ||
+		    (!((tx_multi_xfer->irq_msg_cnt + 1) % NUM_MSGS_PER_IRQ))) {
+			tx_multi_xfer->irq_cnt++;
+			complete(&gi2c->done);
+		}
+		tx_multi_xfer->irq_msg_cnt++;
+	} else {
+		complete(&gi2c->done);
+	}
 }
 
 static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
@@ -526,7 +545,42 @@ static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 	}
 }
 
-static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
+/**
+ * gpi_i2c_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
+ * @dev: pointer to the corresponding dev node
+ * @gi2c: i2c dev handle
+ * @msgs: i2c messages array
+ * @peripheral: pointer to the gpi_i2c_config
+ */
+static void gpi_i2c_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
+				     struct gpi_i2c_config *peripheral)
+{
+	u32 msg_xfer_cnt, wr_idx = 0;
+	struct gpi_multi_xfer *tx_multi_xfer = &peripheral->multi_xfer;
+
+	/*
+	 * In error case, need to unmap all messages based on the msg_idx_cnt.
+	 * Non-error case unmap all the processed messages.
+	 */
+	if (gi2c->err)
+		msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
+	else
+		msg_xfer_cnt = tx_multi_xfer->irq_cnt * NUM_MSGS_PER_IRQ;
+
+	/* Unmap the processed DMA buffers based on the received interrupt count */
+	for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
+		if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
+			break;
+		wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_GPI_MAX_NUM_MSGS;
+		geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
+				   tx_multi_xfer->dma_buf[wr_idx],
+				   tx_multi_xfer->dma_addr[wr_idx],
+				   NULL, (dma_addr_t)NULL);
+		tx_multi_xfer->freed_msg_cnt++;
+	}
+}
+
+static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
 			struct dma_slave_config *config, dma_addr_t *dma_addr_p,
 			void **buf, unsigned int op, struct dma_chan *dma_chan)
 {
@@ -538,26 +592,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 	enum dma_transfer_direction dma_dirn;
 	struct dma_async_tx_descriptor *desc;
 	int ret;
+	struct gpi_multi_xfer *gi2c_gpi_xfer;
+	dma_cookie_t cookie;
+	u32 msg_idx;
 
 	peripheral = config->peripheral_config;
-
-	dma_buf = i2c_get_dma_safe_msg_buf(msg, 1);
-	if (!dma_buf)
-		return -ENOMEM;
+	gi2c_gpi_xfer = &peripheral->multi_xfer;
+	dma_buf = gi2c_gpi_xfer->dma_buf[gi2c_gpi_xfer->buf_idx];
+	addr = gi2c_gpi_xfer->dma_addr[gi2c_gpi_xfer->buf_idx];
+	msg_idx = gi2c_gpi_xfer->msg_idx_cnt;
+
+	dma_buf = i2c_get_dma_safe_msg_buf(&msgs[msg_idx], 1);
+	if (!dma_buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	if (op == I2C_WRITE)
 		map_dirn = DMA_TO_DEVICE;
 	else
 		map_dirn = DMA_FROM_DEVICE;
 
-	addr = dma_map_single(gi2c->se.dev->parent, dma_buf, msg->len, map_dirn);
+	addr = dma_map_single(gi2c->se.dev->parent, dma_buf,
+			      msgs[msg_idx].len, map_dirn);
 	if (dma_mapping_error(gi2c->se.dev->parent, addr)) {
-		i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
-		return -ENOMEM;
+		i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (gi2c->is_tx_multi_xfer) {
+		if (((msg_idx + 1) % NUM_MSGS_PER_IRQ))
+			peripheral->flags |= QCOM_GPI_BLOCK_EVENT_IRQ;
+		else
+			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
+
+		/* BEI bit to be cleared for last TRE */
+		if (msg_idx == gi2c->num_msgs - 1)
+			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
 	}
 
 	/* set the length as message for rx txn */
-	peripheral->rx_len = msg->len;
+	peripheral->rx_len = msgs[msg_idx].len;
 	peripheral->op = op;
 
 	ret = dmaengine_slave_config(dma_chan, config);
@@ -575,7 +651,8 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 	else
 		dma_dirn = DMA_DEV_TO_MEM;
 
-	desc = dmaengine_prep_slave_single(dma_chan, addr, msg->len, dma_dirn, flags);
+	desc = dmaengine_prep_slave_single(dma_chan, addr, msgs[msg_idx].len,
+					   dma_dirn, flags);
 	if (!desc) {
 		dev_err(gi2c->se.dev, "prep_slave_sg failed\n");
 		ret = -EIO;
@@ -585,15 +662,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 	desc->callback_result = i2c_gpi_cb_result;
 	desc->callback_param = gi2c;
 
-	dmaengine_submit(desc);
-	*buf = dma_buf;
-	*dma_addr_p = addr;
+	if (!((msgs[msg_idx].flags & I2C_M_RD) && op == I2C_WRITE)) {
+		gi2c_gpi_xfer->msg_idx_cnt++;
+		gi2c_gpi_xfer->buf_idx = (msg_idx + 1) % QCOM_GPI_MAX_NUM_MSGS;
+	}
+	cookie = dmaengine_submit(desc);
+	if (dma_submit_error(cookie)) {
+		dev_err(gi2c->se.dev,
+			"%s: dmaengine_submit failed (%d)\n", __func__, cookie);
+		ret = -EINVAL;
+		goto err_config;
+	}
 
+	if (gi2c->is_tx_multi_xfer) {
+		dma_async_issue_pending(gi2c->tx_c);
+		if ((msg_idx == (gi2c->num_msgs - 1)) ||
+		    (gi2c_gpi_xfer->msg_idx_cnt >=
+		     QCOM_GPI_MAX_NUM_MSGS + gi2c_gpi_xfer->freed_msg_cnt)) {
+			ret = gpi_multi_desc_process(gi2c->se.dev, gi2c_gpi_xfer,
+						     gi2c->num_msgs, XFER_TIMEOUT,
+						     &gi2c->done);
+			if (ret) {
+				dev_err(gi2c->se.dev,
+					"I2C multi write msg transfer timeout: %d\n",
+					ret);
+				gi2c->err = ret;
+				goto err_config;
+			}
+		}
+	} else {
+		/* Non multi descriptor message transfer */
+		*buf = dma_buf;
+		*dma_addr_p = addr;
+	}
 	return 0;
 
 err_config:
-	dma_unmap_single(gi2c->se.dev->parent, addr, msg->len, map_dirn);
-	i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
+	dma_unmap_single(gi2c->se.dev->parent, addr,
+			 msgs[msg_idx].len, map_dirn);
+	i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
+
+out:
+	gi2c->err = ret;
 	return ret;
 }
 
@@ -605,6 +715,7 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 	unsigned long time_left;
 	dma_addr_t tx_addr, rx_addr;
 	void *tx_buf = NULL, *rx_buf = NULL;
+	struct gpi_multi_xfer *tx_multi_xfer;
 	const struct geni_i2c_clk_fld *itr = gi2c->clk_fld;
 
 	config.peripheral_config = &peripheral;
@@ -618,6 +729,34 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 	peripheral.set_config = 1;
 	peripheral.multi_msg = false;
 
+	gi2c->gpi_config = &peripheral;
+	gi2c->num_msgs = num;
+	gi2c->is_tx_multi_xfer = false;
+	gi2c->tx_irq_cnt = 0;
+
+	tx_multi_xfer = &peripheral.multi_xfer;
+	memset(tx_multi_xfer, 0, sizeof(struct gpi_multi_xfer));
+
+	/*
+	 * If number of write messages are four and higher then
+	 * configure hardware for multi descriptor transfers with BEI.
+	 */
+	if (num >= MIN_NUM_OF_MSGS_MULTI_DESC) {
+		gi2c->is_tx_multi_xfer = true;
+		for (i = 0; i < num; i++) {
+			if (msgs[i].flags & I2C_M_RD) {
+				/*
+				 * Multi descriptor transfer with BEI
+				 * support is enabled for write transfers.
+				 * Add BEI optimization support for read
+				 * transfers later.
+				 */
+				gi2c->is_tx_multi_xfer = false;
+				break;
+			}
+		}
+	}
+
 	for (i = 0; i < num; i++) {
 		gi2c->cur = &msgs[i];
 		gi2c->err = 0;
@@ -628,14 +767,16 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 			peripheral.stretch = 1;
 
 		peripheral.addr = msgs[i].addr;
+		if (i > 0 && (!(msgs[i].flags & I2C_M_RD)))
+			peripheral.multi_msg = false;
 
-		ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
+		ret =  geni_i2c_gpi(gi2c, msgs, &config,
 				    &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
 		if (ret)
 			goto err;
 
 		if (msgs[i].flags & I2C_M_RD) {
-			ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
+			ret =  geni_i2c_gpi(gi2c, msgs, &config,
 					    &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
 			if (ret)
 				goto err;
@@ -643,18 +784,26 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 			dma_async_issue_pending(gi2c->rx_c);
 		}
 
-		dma_async_issue_pending(gi2c->tx_c);
-
-		time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
-		if (!time_left)
-			gi2c->err = -ETIMEDOUT;
+		if (!gi2c->is_tx_multi_xfer) {
+			dma_async_issue_pending(gi2c->tx_c);
+			time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
+			if (!time_left) {
+				dev_err(gi2c->se.dev, "%s:I2C timeout\n", __func__);
+				gi2c->err = -ETIMEDOUT;
+			}
+		}
 
 		if (gi2c->err) {
 			ret = gi2c->err;
 			goto err;
 		}
 
-		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
+		if (!gi2c->is_tx_multi_xfer) {
+			geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
+		} else if (gi2c->tx_irq_cnt != tx_multi_xfer->irq_cnt) {
+			gi2c->tx_irq_cnt = tx_multi_xfer->irq_cnt;
+			gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
+		}
 	}
 
 	return num;
@@ -663,7 +812,11 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 	dev_err(gi2c->se.dev, "GPI transfer failed: %d\n", ret);
 	dmaengine_terminate_sync(gi2c->rx_c);
 	dmaengine_terminate_sync(gi2c->tx_c);
-	geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
+	if (gi2c->is_tx_multi_xfer)
+		gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
+	else
+		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support
  2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
@ 2024-11-11 22:36   ` Andi Shyti
  2024-11-21 12:51     ` Jyothi Kumar Seerapu
  2024-11-12  4:56   ` Bjorn Andersson
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Shyti @ 2024-11-11 22:36 UTC (permalink / raw)
  To: Jyothi Kumar Seerapu
  Cc: Vinod Koul, Sumit Semwal, Christian König, linux-arm-msm,
	dmaengine, linux-kernel, linux-i2c, linux-media, dri-devel,
	linaro-mm-sig, quic_msavaliy, quic_vtanuku

Ping, Vinod :-)

Andi

On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote:
> GSI hardware generates an interrupt for each transfer completion.
> For multiple messages within a single transfer, this results
> in receiving N interrupts for N messages, which can introduce
> significant software interrupt latency. To mitigate this latency,
> utilize Block Event Interrupt (BEI) only when an interrupt is necessary.
> When using BEI, consider splitting a single multi-message transfer into
> chunks of 8. This approach can enhance overall transfer time and
> efficiency.
> 
> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
> ---
> 
> v1 -> v2: 
>    - Changed dma_addr type from array of pointers to array.
>    - To support BEI functionality with the TRE size of 64 defined in GPI driver,
>      updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8.
>  
>  drivers/dma/qcom/gpi.c           | 49 ++++++++++++++++++++++++++++++++
>  include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
> index 52a7c8f2498f..a98de3178764 100644
> --- a/drivers/dma/qcom/gpi.c
> +++ b/drivers/dma/qcom/gpi.c
> @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
>  
>  		tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE);
>  		tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT);
> +
> +		if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ)
> +			tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
>  	}
>  
>  	for (i = 0; i < tre_idx; i++)
> @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid)
>  	return -EIO;
>  }
>  
> +/**
> + * gpi_multi_desc_process() - Process received transfers from GSI HW
> + * @dev: pointer to the corresponding dev node
> + * @multi_xfer: pointer to the gpi_multi_xfer
> + * @num_xfers: total number of transfers
> + * @transfer_timeout_msecs: transfer timeout value
> + * @transfer_comp: completion object of the transfer
> + *
> + * This function is used to process the received transfers based on the
> + * completion events
> + *
> + * Return: On success returns 0, otherwise return error code
> + */
> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
> +			   u32 num_xfers, u32 transfer_timeout_msecs,
> +			   struct completion *transfer_comp)
> +{
> +	int i;
> +	u32 max_irq_cnt, time_left;
> +
> +	max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ;
> +	if (num_xfers % NUM_MSGS_PER_IRQ)
> +		max_irq_cnt++;
> +
> +	/*
> +	 * Wait for the interrupts of the processed transfers in multiple
> +	 * of 64 and for the last transfer. If the hardware is fast and
> +	 * already processed all the transfers then no need to wait.
> +	 */
> +	for (i = 0; i < max_irq_cnt; i++) {
> +		reinit_completion(transfer_comp);
> +		if (max_irq_cnt != multi_xfer->irq_cnt) {
> +			time_left = wait_for_completion_timeout(transfer_comp,
> +								transfer_timeout_msecs);
> +			if (!time_left) {
> +				dev_err(dev, "%s: Transfer timeout\n", __func__);
> +				return -ETIMEDOUT;
> +			}
> +		}
> +		if (num_xfers > multi_xfer->msg_idx_cnt)
> +			return 0;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gpi_multi_desc_process);
> +
>  /* gpi_of_dma_xlate: open client requested channel */
>  static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args,
>  					 struct of_dma *of_dma)
> diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
> index 6680dd1a43c6..1341ff0db808 100644
> --- a/include/linux/dma/qcom-gpi-dma.h
> +++ b/include/linux/dma/qcom-gpi-dma.h
> @@ -15,6 +15,12 @@ enum spi_transfer_cmd {
>  	SPI_DUPLEX,
>  };
>  
> +#define QCOM_GPI_BLOCK_EVENT_IRQ	BIT(0)
> +
> +#define QCOM_GPI_MAX_NUM_MSGS		16
> +#define NUM_MSGS_PER_IRQ		8
> +#define MIN_NUM_OF_MSGS_MULTI_DESC	4
> +
>  /**
>   * struct gpi_spi_config - spi config for peripheral
>   *
> @@ -51,6 +57,29 @@ enum i2c_op {
>  	I2C_READ,
>  };
>  
> +/**
> + * struct gpi_multi_xfer - Used for multi transfer support
> + *
> + * @msg_idx_cnt: message index for the transfer
> + * @buf_idx: dma buffer index
> + * @unmap_msg_cnt: unampped transfer index
> + * @freed_msg_cnt: freed transfer index
> + * @irq_cnt: received interrupt count
> + * @irq_msg_cnt: transfer message count for the received irqs
> + * @dma_buf: virtual address of the buffer
> + * @dma_addr: dma address of the buffer
> + */
> +struct gpi_multi_xfer {
> +	u32 msg_idx_cnt;
> +	u32 buf_idx;
> +	u32 unmap_msg_cnt;
> +	u32 freed_msg_cnt;
> +	u32 irq_cnt;
> +	u32 irq_msg_cnt;
> +	void *dma_buf[QCOM_GPI_MAX_NUM_MSGS];
> +	dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS];
> +};
> +
>  /**
>   * struct gpi_i2c_config - i2c config for peripheral
>   *
> @@ -65,6 +94,8 @@ enum i2c_op {
>   * @rx_len: receive length for buffer
>   * @op: i2c cmd
>   * @muli-msg: is part of multi i2c r-w msgs
> + * @flags: true for block event interrupt support
> + * @multi_xfer: indicates transfer has multi messages
>   */
>  struct gpi_i2c_config {
>  	u8 set_config;
> @@ -78,6 +109,12 @@ struct gpi_i2c_config {
>  	u32 rx_len;
>  	enum i2c_op op;
>  	bool multi_msg;
> +	u8 flags;
> +	struct gpi_multi_xfer multi_xfer;
>  };
>  
> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
> +			   u32 num_xfers, u32 tranfer_timeout_msecs,
> +			   struct completion *transfer_comp);
> +
>  #endif /* QCOM_GPI_DMA_H */
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni
  2024-11-11 14:02 ` [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni Jyothi Kumar Seerapu
@ 2024-11-12  4:09   ` Bjorn Andersson
  2024-11-21 12:52     ` Jyothi Kumar Seerapu
  0 siblings, 1 reply; 13+ messages in thread
From: Bjorn Andersson @ 2024-11-12  4:09 UTC (permalink / raw)
  To: Jyothi Kumar Seerapu
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

On Mon, Nov 11, 2024 at 07:32:43PM +0530, Jyothi Kumar Seerapu wrote:
> I2C_QCOM_GENI is having compile dependencies on QCOM_GPI_DMA and
> so update I2C_QCOM_GENI to depends on QCOM_GPI_DMA.
> 

Given that this is a separate patch, your wording can only be
interpreted as this being an existing problem.

> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
> ---
> 
> v1 -> v2: 
> 	This patch is added in v2 to address the kernel test robot
> 	reported compilation error.
> 	ERROR: modpost: "gpi_multi_desc_process" [drivers/i2c/busses/i2c-qcom-geni.ko] undefined! 

But as far as I can tell you introduce this problem in patch 3. If so
this addition should be part of patch 3.



Also, you have different subject prefix for patch 2 and 3, yet they
relate to the same driver. Not pretty.

Regards,
Bjorn

> 
> drivers/i2c/busses/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
> index 0aa948014008..87634a682855 100644
> --- a/drivers/i2c/busses/Kconfig
> +++ b/drivers/i2c/busses/Kconfig
> @@ -1049,6 +1049,7 @@ config I2C_QCOM_GENI
>  	tristate "Qualcomm Technologies Inc.'s GENI based I2C controller"
>  	depends on ARCH_QCOM || COMPILE_TEST
>  	depends on QCOM_GENI_SE
> +	depends on QCOM_GPI_DMA
>  	help
>  	  This driver supports GENI serial engine based I2C controller in
>  	  master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support
  2024-11-11 14:02 ` [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support Jyothi Kumar Seerapu
@ 2024-11-12  4:33   ` Bjorn Andersson
  2024-11-21 12:58     ` Jyothi Kumar Seerapu
  0 siblings, 1 reply; 13+ messages in thread
From: Bjorn Andersson @ 2024-11-12  4:33 UTC (permalink / raw)
  To: Jyothi Kumar Seerapu
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

On Mon, Nov 11, 2024 at 07:32:44PM +0530, Jyothi Kumar Seerapu wrote:
> The I2C driver gets an interrupt upon transfer completion.
> For multiple messages in a single transfer, N interrupts will be
> received for N messages, leading to significant software interrupt
> latency. To mitigate this latency, utilize Block Event Interrupt (BEI)

Please rewrite this to the tone that the reader doesn't know what Block
Event Interrupt is, or that it exists.

> only when an interrupt is necessary. This means large transfers can be
> split into multiple chunks of 8 messages internally, without expecting
> interrupts for the first 7 message completions, only the last one will
> trigger an interrupt indicating 8 messages completed.
> 
> By implementing BEI, multi-message transfers can be divided into
> chunks of 8 messages, improving overall transfer time.

You already wrote this in the paragraph above.


Where is this number 8 coming from btw?

> This optimization reduces transfer time from 168 ms to 48 ms for a
> series of 200 I2C write messages in a single transfer, with a
> clock frequency support of 100 kHz.
> 
> BEI optimizations are currently implemented for I2C write transfers only,
> as there is no use case for multiple I2C read messages in a single transfer
> at this time.
> 
> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
> ---
> 
> v1 -> v2:
>    - Moved gi2c_gpi_xfer->msg_idx_cnt to separate local variable.
>    - Updated goto labels for error scenarios in geni_i2c_gpi function
>    - memset tx_multi_xfer to 0.
>    - Removed passing current msg index to geni_i2c_gpi.
>    - Fixed kernel test robot reported compilation issues.    
> 
>  drivers/i2c/busses/i2c-qcom-geni.c | 203 +++++++++++++++++++++++++----
>  1 file changed, 178 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
> index 7a22e1f46e60..04a7d926dadc 100644
> --- a/drivers/i2c/busses/i2c-qcom-geni.c
> +++ b/drivers/i2c/busses/i2c-qcom-geni.c
> @@ -100,6 +100,10 @@ struct geni_i2c_dev {
>  	struct dma_chan *rx_c;
>  	bool gpi_mode;
>  	bool abort_done;
> +	bool is_tx_multi_xfer;
> +	u32 num_msgs;
> +	u32 tx_irq_cnt;
> +	struct gpi_i2c_config *gpi_config;
>  };
>  
>  struct geni_i2c_desc {
> @@ -500,6 +504,7 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>  static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
>  {
>  	struct geni_i2c_dev *gi2c = cb;
> +	struct gpi_multi_xfer *tx_multi_xfer;
>  
>  	if (result->result != DMA_TRANS_NOERROR) {
>  		dev_err(gi2c->se.dev, "DMA txn failed:%d\n", result->result);
> @@ -508,7 +513,21 @@ static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
>  		dev_dbg(gi2c->se.dev, "DMA xfer has pending: %d\n", result->residue);
>  	}
>  
> -	complete(&gi2c->done);
> +	if (gi2c->is_tx_multi_xfer) {

Wouldn't it be cleaner to treat the !is_tx_multi_xfer case as a
multi-xfer of length 1?

> +		tx_multi_xfer = &gi2c->gpi_config->multi_xfer;
> +
> +		/*
> +		 * Send Completion for last message or multiple of NUM_MSGS_PER_IRQ.
> +		 */
> +		if ((tx_multi_xfer->irq_msg_cnt == gi2c->num_msgs - 1) ||
> +		    (!((tx_multi_xfer->irq_msg_cnt + 1) % NUM_MSGS_PER_IRQ))) {
> +			tx_multi_xfer->irq_cnt++;
> +			complete(&gi2c->done);

Why? You're removing the wait_for_completion_timeout() from
geni_i2c_gpi_xfer() when is_tx_multi_xfer is set.

> +		}
> +		tx_multi_xfer->irq_msg_cnt++;
> +	} else {
> +		complete(&gi2c->done);
> +	}
>  }
>  
>  static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
> @@ -526,7 +545,42 @@ static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>  	}
>  }
>  
> -static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
> +/**
> + * gpi_i2c_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
> + * @dev: pointer to the corresponding dev node
> + * @gi2c: i2c dev handle
> + * @msgs: i2c messages array
> + * @peripheral: pointer to the gpi_i2c_config
> + */
> +static void gpi_i2c_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
> +				     struct gpi_i2c_config *peripheral)
> +{
> +	u32 msg_xfer_cnt, wr_idx = 0;
> +	struct gpi_multi_xfer *tx_multi_xfer = &peripheral->multi_xfer;
> +
> +	/*
> +	 * In error case, need to unmap all messages based on the msg_idx_cnt.
> +	 * Non-error case unmap all the processed messages.

What is the benefit of this optimization, compared to keeping things
simple and just unmap all buffers at the end of geni_i2c_gpi_xfer()?

> +	 */
> +	if (gi2c->err)
> +		msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
> +	else
> +		msg_xfer_cnt = tx_multi_xfer->irq_cnt * NUM_MSGS_PER_IRQ;
> +
> +	/* Unmap the processed DMA buffers based on the received interrupt count */
> +	for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
> +		if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
> +			break;
> +		wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_GPI_MAX_NUM_MSGS;
> +		geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
> +				   tx_multi_xfer->dma_buf[wr_idx],
> +				   tx_multi_xfer->dma_addr[wr_idx],
> +				   NULL, (dma_addr_t)NULL);
> +		tx_multi_xfer->freed_msg_cnt++;
> +	}
> +}
> +
> +static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
>  			struct dma_slave_config *config, dma_addr_t *dma_addr_p,
>  			void **buf, unsigned int op, struct dma_chan *dma_chan)
>  {
> @@ -538,26 +592,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>  	enum dma_transfer_direction dma_dirn;
>  	struct dma_async_tx_descriptor *desc;
>  	int ret;
> +	struct gpi_multi_xfer *gi2c_gpi_xfer;
> +	dma_cookie_t cookie;
> +	u32 msg_idx;
>  
>  	peripheral = config->peripheral_config;
> -
> -	dma_buf = i2c_get_dma_safe_msg_buf(msg, 1);
> -	if (!dma_buf)
> -		return -ENOMEM;
> +	gi2c_gpi_xfer = &peripheral->multi_xfer;
> +	dma_buf = gi2c_gpi_xfer->dma_buf[gi2c_gpi_xfer->buf_idx];
> +	addr = gi2c_gpi_xfer->dma_addr[gi2c_gpi_xfer->buf_idx];
> +	msg_idx = gi2c_gpi_xfer->msg_idx_cnt;
> +
> +	dma_buf = i2c_get_dma_safe_msg_buf(&msgs[msg_idx], 1);
> +	if (!dma_buf) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
>  
>  	if (op == I2C_WRITE)
>  		map_dirn = DMA_TO_DEVICE;
>  	else
>  		map_dirn = DMA_FROM_DEVICE;
>  
> -	addr = dma_map_single(gi2c->se.dev->parent, dma_buf, msg->len, map_dirn);
> +	addr = dma_map_single(gi2c->se.dev->parent, dma_buf,
> +			      msgs[msg_idx].len, map_dirn);
>  	if (dma_mapping_error(gi2c->se.dev->parent, addr)) {
> -		i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
> -		return -ENOMEM;
> +		i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	if (gi2c->is_tx_multi_xfer) {
> +		if (((msg_idx + 1) % NUM_MSGS_PER_IRQ))
> +			peripheral->flags |= QCOM_GPI_BLOCK_EVENT_IRQ;
> +		else
> +			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
> +
> +		/* BEI bit to be cleared for last TRE */
> +		if (msg_idx == gi2c->num_msgs - 1)
> +			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
>  	}
>  
>  	/* set the length as message for rx txn */
> -	peripheral->rx_len = msg->len;
> +	peripheral->rx_len = msgs[msg_idx].len;
>  	peripheral->op = op;
>  
>  	ret = dmaengine_slave_config(dma_chan, config);
> @@ -575,7 +651,8 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>  	else
>  		dma_dirn = DMA_DEV_TO_MEM;
>  
> -	desc = dmaengine_prep_slave_single(dma_chan, addr, msg->len, dma_dirn, flags);
> +	desc = dmaengine_prep_slave_single(dma_chan, addr, msgs[msg_idx].len,
> +					   dma_dirn, flags);
>  	if (!desc) {
>  		dev_err(gi2c->se.dev, "prep_slave_sg failed\n");
>  		ret = -EIO;
> @@ -585,15 +662,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>  	desc->callback_result = i2c_gpi_cb_result;
>  	desc->callback_param = gi2c;
>  
> -	dmaengine_submit(desc);
> -	*buf = dma_buf;
> -	*dma_addr_p = addr;
> +	if (!((msgs[msg_idx].flags & I2C_M_RD) && op == I2C_WRITE)) {
> +		gi2c_gpi_xfer->msg_idx_cnt++;
> +		gi2c_gpi_xfer->buf_idx = (msg_idx + 1) % QCOM_GPI_MAX_NUM_MSGS;
> +	}
> +	cookie = dmaengine_submit(desc);
> +	if (dma_submit_error(cookie)) {
> +		dev_err(gi2c->se.dev,
> +			"%s: dmaengine_submit failed (%d)\n", __func__, cookie);
> +		ret = -EINVAL;
> +		goto err_config;
> +	}
>  
> +	if (gi2c->is_tx_multi_xfer) {
> +		dma_async_issue_pending(gi2c->tx_c);
> +		if ((msg_idx == (gi2c->num_msgs - 1)) ||
> +		    (gi2c_gpi_xfer->msg_idx_cnt >=
> +		     QCOM_GPI_MAX_NUM_MSGS + gi2c_gpi_xfer->freed_msg_cnt)) {
> +			ret = gpi_multi_desc_process(gi2c->se.dev, gi2c_gpi_xfer,

A function call straight into the GPI driver? I'm not entirely familiar
with the details of the dmaengine API, but this doesn't look correct.

> +						     gi2c->num_msgs, XFER_TIMEOUT,
> +						     &gi2c->done);
> +			if (ret) {
> +				dev_err(gi2c->se.dev,
> +					"I2C multi write msg transfer timeout: %d\n",
> +					ret);
> +				gi2c->err = ret;
> +				goto err_config;
> +			}
> +		}
> +	} else {
> +		/* Non multi descriptor message transfer */
> +		*buf = dma_buf;
> +		*dma_addr_p = addr;
> +	}
>  	return 0;
>  
>  err_config:
> -	dma_unmap_single(gi2c->se.dev->parent, addr, msg->len, map_dirn);
> -	i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
> +	dma_unmap_single(gi2c->se.dev->parent, addr,
> +			 msgs[msg_idx].len, map_dirn);
> +	i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
> +
> +out:
> +	gi2c->err = ret;
>  	return ret;
>  }
>  
> @@ -605,6 +715,7 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>  	unsigned long time_left;
>  	dma_addr_t tx_addr, rx_addr;
>  	void *tx_buf = NULL, *rx_buf = NULL;
> +	struct gpi_multi_xfer *tx_multi_xfer;
>  	const struct geni_i2c_clk_fld *itr = gi2c->clk_fld;
>  
>  	config.peripheral_config = &peripheral;
> @@ -618,6 +729,34 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>  	peripheral.set_config = 1;
>  	peripheral.multi_msg = false;
>  
> +	gi2c->gpi_config = &peripheral;
> +	gi2c->num_msgs = num;
> +	gi2c->is_tx_multi_xfer = false;
> +	gi2c->tx_irq_cnt = 0;
> +
> +	tx_multi_xfer = &peripheral.multi_xfer;
> +	memset(tx_multi_xfer, 0, sizeof(struct gpi_multi_xfer));
> +
> +	/*
> +	 * If number of write messages are four and higher then

Why four?

> +	 * configure hardware for multi descriptor transfers with BEI.
> +	 */
> +	if (num >= MIN_NUM_OF_MSGS_MULTI_DESC) {
> +		gi2c->is_tx_multi_xfer = true;
> +		for (i = 0; i < num; i++) {
> +			if (msgs[i].flags & I2C_M_RD) {
> +				/*
> +				 * Multi descriptor transfer with BEI
> +				 * support is enabled for write transfers.
> +				 * Add BEI optimization support for read
> +				 * transfers later.

Prefix this comment with "TODO:"

> +				 */
> +				gi2c->is_tx_multi_xfer = false;
> +				break;
> +			}
> +		}
> +	}
> +
>  	for (i = 0; i < num; i++) {
>  		gi2c->cur = &msgs[i];
>  		gi2c->err = 0;
> @@ -628,14 +767,16 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>  			peripheral.stretch = 1;
>  
>  		peripheral.addr = msgs[i].addr;
> +		if (i > 0 && (!(msgs[i].flags & I2C_M_RD)))
> +			peripheral.multi_msg = false;
>  
> -		ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
> +		ret =  geni_i2c_gpi(gi2c, msgs, &config,
>  				    &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
>  		if (ret)
>  			goto err;
>  
>  		if (msgs[i].flags & I2C_M_RD) {
> -			ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
> +			ret =  geni_i2c_gpi(gi2c, msgs, &config,
>  					    &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
>  			if (ret)
>  				goto err;
> @@ -643,18 +784,26 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>  			dma_async_issue_pending(gi2c->rx_c);
>  		}
>  
> -		dma_async_issue_pending(gi2c->tx_c);
> -
> -		time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
> -		if (!time_left)
> -			gi2c->err = -ETIMEDOUT;
> +		if (!gi2c->is_tx_multi_xfer) {
> +			dma_async_issue_pending(gi2c->tx_c);
> +			time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);

By making this conditional on !is_tx_multi_xfer transfers, what makes
the loop wait for the transfer to complete before you below unmap the
buffers?

> +			if (!time_left) {
> +				dev_err(gi2c->se.dev, "%s:I2C timeout\n", __func__);
> +				gi2c->err = -ETIMEDOUT;
> +			}
> +		}
>  
>  		if (gi2c->err) {
>  			ret = gi2c->err;
>  			goto err;
>  		}
>  
> -		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
> +		if (!gi2c->is_tx_multi_xfer) {
> +			geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
> +		} else if (gi2c->tx_irq_cnt != tx_multi_xfer->irq_cnt) {
> +			gi2c->tx_irq_cnt = tx_multi_xfer->irq_cnt;
> +			gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
> +		}
>  	}
>  
>  	return num;
> @@ -663,7 +812,11 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>  	dev_err(gi2c->se.dev, "GPI transfer failed: %d\n", ret);
>  	dmaengine_terminate_sync(gi2c->rx_c);
>  	dmaengine_terminate_sync(gi2c->tx_c);
> -	geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
> +	if (gi2c->is_tx_multi_xfer)
> +		gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
> +	else
> +		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
> +

As above, it would be nice if multi-xfer was just a special case with a
single buffer; rather than inflating the cyclomatic complexity.

Regards,
Bjorn

>  	return ret;
>  }
>  
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support
  2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
  2024-11-11 22:36   ` Andi Shyti
@ 2024-11-12  4:56   ` Bjorn Andersson
  2024-11-21 12:54     ` Jyothi Kumar Seerapu
  1 sibling, 1 reply; 13+ messages in thread
From: Bjorn Andersson @ 2024-11-12  4:56 UTC (permalink / raw)
  To: Jyothi Kumar Seerapu
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku

On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote:
> GSI hardware generates an interrupt for each transfer completion.
> For multiple messages within a single transfer, this results
> in receiving N interrupts for N messages, which can introduce
> significant software interrupt latency.

Here's an excellent opportunity for splitting your problem description
and solution description in two easy to read paragraphs by adding some
newlines.

> To mitigate this latency,
> utilize Block Event Interrupt (BEI) only when an interrupt is necessary.
> When using BEI, consider splitting a single multi-message transfer into
> chunks of 8. This approach can enhance overall transfer time and

The reason for the number 8 must be documented.

"This approach..." wouldn't hurt from having it's own paragraph once
again.

> efficiency.
> 
> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
> ---
> 
> v1 -> v2: 
>    - Changed dma_addr type from array of pointers to array.
>    - To support BEI functionality with the TRE size of 64 defined in GPI driver,
>      updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8.
>  
>  drivers/dma/qcom/gpi.c           | 49 ++++++++++++++++++++++++++++++++
>  include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
> index 52a7c8f2498f..a98de3178764 100644
> --- a/drivers/dma/qcom/gpi.c
> +++ b/drivers/dma/qcom/gpi.c
> @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
>  
>  		tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE);
>  		tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT);
> +
> +		if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ)
> +			tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
>  	}
>  
>  	for (i = 0; i < tre_idx; i++)
> @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid)
>  	return -EIO;
>  }
>  
> +/**
> + * gpi_multi_desc_process() - Process received transfers from GSI HW
> + * @dev: pointer to the corresponding dev node
> + * @multi_xfer: pointer to the gpi_multi_xfer
> + * @num_xfers: total number of transfers
> + * @transfer_timeout_msecs: transfer timeout value
> + * @transfer_comp: completion object of the transfer
> + *
> + * This function is used to process the received transfers based on the
> + * completion events

As far as I can tell it doesn't "process" anything. All it does is
reinit the completion (n + 7) / 8 times, and for the first n / 8
iterations it will wait for an externally defined completion.

Why is this function even defined here, it solely operates on parameters
coming from the I2C driver?

> + *
> + * Return: On success returns 0, otherwise return error code
> + */
> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
> +			   u32 num_xfers, u32 transfer_timeout_msecs,
> +			   struct completion *transfer_comp)
> +{
> +	int i;
> +	u32 max_irq_cnt, time_left;
> +
> +	max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ;
> +	if (num_xfers % NUM_MSGS_PER_IRQ)
> +		max_irq_cnt++;
> +
> +	/*
> +	 * Wait for the interrupts of the processed transfers in multiple
> +	 * of 64 and for the last transfer. If the hardware is fast and

I'm confused, where does this 64 come from?

> +	 * already processed all the transfers then no need to wait.
> +	 */
> +	for (i = 0; i < max_irq_cnt; i++) {
> +		reinit_completion(transfer_comp);

I'm trying to convince myself that this isn't racey, but the split
ownership of updating and checking multi_xfer->irq_cnt between the GPI
and I2C drivers is just too hard for me to follow.

> +		if (max_irq_cnt != multi_xfer->irq_cnt) {
> +			time_left = wait_for_completion_timeout(transfer_comp,
> +								transfer_timeout_msecs);
> +			if (!time_left) {
> +				dev_err(dev, "%s: Transfer timeout\n", __func__);
> +				return -ETIMEDOUT;
> +			}
> +		}
> +		if (num_xfers > multi_xfer->msg_idx_cnt)
> +			return 0;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gpi_multi_desc_process);

The dmaengine framework is expected to provide an abstraction between
clients and DMA engines, so this doesn't look right.

> +
>  /* gpi_of_dma_xlate: open client requested channel */
>  static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args,
>  					 struct of_dma *of_dma)
> diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
> index 6680dd1a43c6..1341ff0db808 100644
> --- a/include/linux/dma/qcom-gpi-dma.h
> +++ b/include/linux/dma/qcom-gpi-dma.h
> @@ -15,6 +15,12 @@ enum spi_transfer_cmd {
>  	SPI_DUPLEX,
>  };
>  
> +#define QCOM_GPI_BLOCK_EVENT_IRQ	BIT(0)
> +
> +#define QCOM_GPI_MAX_NUM_MSGS		16
> +#define NUM_MSGS_PER_IRQ		8
> +#define MIN_NUM_OF_MSGS_MULTI_DESC	4

Prefixing these QCOM_GPI_ seems like an excellent idea. Still puzzled
about the numbers 8 and 4 though, are they universal for all variants of
GPI or are they just arbitrary numbers picked by experimentation?

> +
>  /**
>   * struct gpi_spi_config - spi config for peripheral
>   *
> @@ -51,6 +57,29 @@ enum i2c_op {
>  	I2C_READ,
>  };
>  
> +/**
> + * struct gpi_multi_xfer - Used for multi transfer support
> + *
> + * @msg_idx_cnt: message index for the transfer
> + * @buf_idx: dma buffer index
> + * @unmap_msg_cnt: unampped transfer index

s/unampped/unmapped

> + * @freed_msg_cnt: freed transfer index
> + * @irq_cnt: received interrupt count
> + * @irq_msg_cnt: transfer message count for the received irqs
> + * @dma_buf: virtual address of the buffer
> + * @dma_addr: dma address of the buffer

"the buffer"? There's up to 16 of them...


As mentioned above, I'm skeptical about this custom API - but if we
were to go this route, the exact responsibilities and semantics should
be documented.

Regards,
Bjorn

> + */
> +struct gpi_multi_xfer {
> +	u32 msg_idx_cnt;
> +	u32 buf_idx;
> +	u32 unmap_msg_cnt;
> +	u32 freed_msg_cnt;
> +	u32 irq_cnt;
> +	u32 irq_msg_cnt;
> +	void *dma_buf[QCOM_GPI_MAX_NUM_MSGS];
> +	dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS];
> +};
> +
>  /**
>   * struct gpi_i2c_config - i2c config for peripheral
>   *
> @@ -65,6 +94,8 @@ enum i2c_op {
>   * @rx_len: receive length for buffer
>   * @op: i2c cmd
>   * @muli-msg: is part of multi i2c r-w msgs
> + * @flags: true for block event interrupt support
> + * @multi_xfer: indicates transfer has multi messages
>   */
>  struct gpi_i2c_config {
>  	u8 set_config;
> @@ -78,6 +109,12 @@ struct gpi_i2c_config {
>  	u32 rx_len;
>  	enum i2c_op op;
>  	bool multi_msg;
> +	u8 flags;
> +	struct gpi_multi_xfer multi_xfer;
>  };
>  
> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
> +			   u32 num_xfers, u32 tranfer_timeout_msecs,
> +			   struct completion *transfer_comp);
> +
>  #endif /* QCOM_GPI_DMA_H */
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support
  2024-11-11 22:36   ` Andi Shyti
@ 2024-11-21 12:51     ` Jyothi Kumar Seerapu
  0 siblings, 0 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-21 12:51 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Vinod Koul, Sumit Semwal, Christian König, linux-arm-msm,
	dmaengine, linux-kernel, linux-i2c, linux-media, dri-devel,
	linaro-mm-sig, quic_msavaliy, quic_vtanuku



On 11/12/2024 4:06 AM, Andi Shyti wrote:
> Ping, Vinod :-)
Sure, thanks.
> 
> Andi
> 
> On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote:
>> GSI hardware generates an interrupt for each transfer completion.
>> For multiple messages within a single transfer, this results
>> in receiving N interrupts for N messages, which can introduce
>> significant software interrupt latency. To mitigate this latency,
>> utilize Block Event Interrupt (BEI) only when an interrupt is necessary.
>> When using BEI, consider splitting a single multi-message transfer into
>> chunks of 8. This approach can enhance overall transfer time and
>> efficiency.
>>
>> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
>> ---
>>
>> v1 -> v2:
>>     - Changed dma_addr type from array of pointers to array.
>>     - To support BEI functionality with the TRE size of 64 defined in GPI driver,
>>       updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8.
>>   
>>   drivers/dma/qcom/gpi.c           | 49 ++++++++++++++++++++++++++++++++
>>   include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++
>>   2 files changed, 86 insertions(+)
>>
>> diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
>> index 52a7c8f2498f..a98de3178764 100644
>> --- a/drivers/dma/qcom/gpi.c
>> +++ b/drivers/dma/qcom/gpi.c
>> @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
>>   
>>   		tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE);
>>   		tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT);
>> +
>> +		if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ)
>> +			tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
>>   	}
>>   
>>   	for (i = 0; i < tre_idx; i++)
>> @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid)
>>   	return -EIO;
>>   }
>>   
>> +/**
>> + * gpi_multi_desc_process() - Process received transfers from GSI HW
>> + * @dev: pointer to the corresponding dev node
>> + * @multi_xfer: pointer to the gpi_multi_xfer
>> + * @num_xfers: total number of transfers
>> + * @transfer_timeout_msecs: transfer timeout value
>> + * @transfer_comp: completion object of the transfer
>> + *
>> + * This function is used to process the received transfers based on the
>> + * completion events
>> + *
>> + * Return: On success returns 0, otherwise return error code
>> + */
>> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
>> +			   u32 num_xfers, u32 transfer_timeout_msecs,
>> +			   struct completion *transfer_comp)
>> +{
>> +	int i;
>> +	u32 max_irq_cnt, time_left;
>> +
>> +	max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ;
>> +	if (num_xfers % NUM_MSGS_PER_IRQ)
>> +		max_irq_cnt++;
>> +
>> +	/*
>> +	 * Wait for the interrupts of the processed transfers in multiple
>> +	 * of 64 and for the last transfer. If the hardware is fast and
>> +	 * already processed all the transfers then no need to wait.
>> +	 */
>> +	for (i = 0; i < max_irq_cnt; i++) {
>> +		reinit_completion(transfer_comp);
>> +		if (max_irq_cnt != multi_xfer->irq_cnt) {
>> +			time_left = wait_for_completion_timeout(transfer_comp,
>> +								transfer_timeout_msecs);
>> +			if (!time_left) {
>> +				dev_err(dev, "%s: Transfer timeout\n", __func__);
>> +				return -ETIMEDOUT;
>> +			}
>> +		}
>> +		if (num_xfers > multi_xfer->msg_idx_cnt)
>> +			return 0;
>> +	}
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(gpi_multi_desc_process);
>> +
>>   /* gpi_of_dma_xlate: open client requested channel */
>>   static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args,
>>   					 struct of_dma *of_dma)
>> diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
>> index 6680dd1a43c6..1341ff0db808 100644
>> --- a/include/linux/dma/qcom-gpi-dma.h
>> +++ b/include/linux/dma/qcom-gpi-dma.h
>> @@ -15,6 +15,12 @@ enum spi_transfer_cmd {
>>   	SPI_DUPLEX,
>>   };
>>   
>> +#define QCOM_GPI_BLOCK_EVENT_IRQ	BIT(0)
>> +
>> +#define QCOM_GPI_MAX_NUM_MSGS		16
>> +#define NUM_MSGS_PER_IRQ		8
>> +#define MIN_NUM_OF_MSGS_MULTI_DESC	4
>> +
>>   /**
>>    * struct gpi_spi_config - spi config for peripheral
>>    *
>> @@ -51,6 +57,29 @@ enum i2c_op {
>>   	I2C_READ,
>>   };
>>   
>> +/**
>> + * struct gpi_multi_xfer - Used for multi transfer support
>> + *
>> + * @msg_idx_cnt: message index for the transfer
>> + * @buf_idx: dma buffer index
>> + * @unmap_msg_cnt: unampped transfer index
>> + * @freed_msg_cnt: freed transfer index
>> + * @irq_cnt: received interrupt count
>> + * @irq_msg_cnt: transfer message count for the received irqs
>> + * @dma_buf: virtual address of the buffer
>> + * @dma_addr: dma address of the buffer
>> + */
>> +struct gpi_multi_xfer {
>> +	u32 msg_idx_cnt;
>> +	u32 buf_idx;
>> +	u32 unmap_msg_cnt;
>> +	u32 freed_msg_cnt;
>> +	u32 irq_cnt;
>> +	u32 irq_msg_cnt;
>> +	void *dma_buf[QCOM_GPI_MAX_NUM_MSGS];
>> +	dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS];
>> +};
>> +
>>   /**
>>    * struct gpi_i2c_config - i2c config for peripheral
>>    *
>> @@ -65,6 +94,8 @@ enum i2c_op {
>>    * @rx_len: receive length for buffer
>>    * @op: i2c cmd
>>    * @muli-msg: is part of multi i2c r-w msgs
>> + * @flags: true for block event interrupt support
>> + * @multi_xfer: indicates transfer has multi messages
>>    */
>>   struct gpi_i2c_config {
>>   	u8 set_config;
>> @@ -78,6 +109,12 @@ struct gpi_i2c_config {
>>   	u32 rx_len;
>>   	enum i2c_op op;
>>   	bool multi_msg;
>> +	u8 flags;
>> +	struct gpi_multi_xfer multi_xfer;
>>   };
>>   
>> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
>> +			   u32 num_xfers, u32 tranfer_timeout_msecs,
>> +			   struct completion *transfer_comp);
>> +
>>   #endif /* QCOM_GPI_DMA_H */
>> -- 
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni
  2024-11-12  4:09   ` Bjorn Andersson
@ 2024-11-21 12:52     ` Jyothi Kumar Seerapu
  0 siblings, 0 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-21 12:52 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku



On 11/12/2024 9:39 AM, Bjorn Andersson wrote:
> On Mon, Nov 11, 2024 at 07:32:43PM +0530, Jyothi Kumar Seerapu wrote:
>> I2C_QCOM_GENI is having compile dependencies on QCOM_GPI_DMA and
>> so update I2C_QCOM_GENI to depends on QCOM_GPI_DMA.
>>
> 
> Given that this is a separate patch, your wording can only be
> interpreted as this being an existing problem.
> 
>> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
>> ---
>>
>> v1 -> v2:
>> 	This patch is added in v2 to address the kernel test robot
>> 	reported compilation error.
>> 	ERROR: modpost: "gpi_multi_desc_process" [drivers/i2c/busses/i2c-qcom-geni.ko] undefined!
> 
> But as far as I can tell you introduce this problem in patch 3. If so
> this addition should be part of patch 3.

Sure, this change is added part of patch3.
> 
> 
> 
> Also, you have different subject prefix for patch 2 and 3, yet they
> relate to the same driver. Not pretty.
Thanks, corrected it.
> 
> Regards,
> Bjorn
> 
>>
>> drivers/i2c/busses/Kconfig | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
>> index 0aa948014008..87634a682855 100644
>> --- a/drivers/i2c/busses/Kconfig
>> +++ b/drivers/i2c/busses/Kconfig
>> @@ -1049,6 +1049,7 @@ config I2C_QCOM_GENI
>>   	tristate "Qualcomm Technologies Inc.'s GENI based I2C controller"
>>   	depends on ARCH_QCOM || COMPILE_TEST
>>   	depends on QCOM_GENI_SE
>> +	depends on QCOM_GPI_DMA
>>   	help
>>   	  This driver supports GENI serial engine based I2C controller in
>>   	  master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
>> -- 
>> 2.17.1
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support
  2024-11-12  4:56   ` Bjorn Andersson
@ 2024-11-21 12:54     ` Jyothi Kumar Seerapu
  0 siblings, 0 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-21 12:54 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku



On 11/12/2024 10:26 AM, Bjorn Andersson wrote:
> On Mon, Nov 11, 2024 at 07:32:42PM +0530, Jyothi Kumar Seerapu wrote:
>> GSI hardware generates an interrupt for each transfer completion.
>> For multiple messages within a single transfer, this results
>> in receiving N interrupts for N messages, which can introduce
>> significant software interrupt latency.
> 
> Here's an excellent opportunity for splitting your problem description
> and solution description in two easy to read paragraphs by adding some
> newlines.
Thanks, Done.
> 
>> To mitigate this latency,
>> utilize Block Event Interrupt (BEI) only when an interrupt is necessary.
>> When using BEI, consider splitting a single multi-message transfer into
>> chunks of 8. This approach can enhance overall transfer time and
> 
> The reason for the number 8 must be documented.
Its documented in "qcom-gpi-dma.h" file.

> 
> "This approach..." wouldn't hurt from having it's own paragraph once
> again.

Done
> 
>> efficiency.
>>
>> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
>> ---
>>
>> v1 -> v2:
>>     - Changed dma_addr type from array of pointers to array.
>>     - To support BEI functionality with the TRE size of 64 defined in GPI driver,
>>       updated QCOM_GPI_MAX_NUM_MSGS to 16 and NUM_MSGS_PER_IRQ to 8.
>>   
>>   drivers/dma/qcom/gpi.c           | 49 ++++++++++++++++++++++++++++++++
>>   include/linux/dma/qcom-gpi-dma.h | 37 ++++++++++++++++++++++++
>>   2 files changed, 86 insertions(+)
>>
>> diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
>> index 52a7c8f2498f..a98de3178764 100644
>> --- a/drivers/dma/qcom/gpi.c
>> +++ b/drivers/dma/qcom/gpi.c
>> @@ -1693,6 +1693,9 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
>>   
>>   		tre->dword[3] = u32_encode_bits(TRE_TYPE_DMA, TRE_FLAGS_TYPE);
>>   		tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOT);
>> +
>> +		if (i2c->flags & QCOM_GPI_BLOCK_EVENT_IRQ)
>> +			tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
>>   	}
>>   
>>   	for (i = 0; i < tre_idx; i++)
>> @@ -2098,6 +2101,52 @@ static int gpi_find_avail_gpii(struct gpi_dev *gpi_dev, u32 seid)
>>   	return -EIO;
>>   }
>>   
>> +/**
>> + * gpi_multi_desc_process() - Process received transfers from GSI HW
>> + * @dev: pointer to the corresponding dev node
>> + * @multi_xfer: pointer to the gpi_multi_xfer
>> + * @num_xfers: total number of transfers
>> + * @transfer_timeout_msecs: transfer timeout value
>> + * @transfer_comp: completion object of the transfer
>> + *
>> + * This function is used to process the received transfers based on the
>> + * completion events
> 
> As far as I can tell it doesn't "process" anything. All it does is
> reinit the completion (n + 7) / 8 times, and for the first n / 8
> iterations it will wait for an externally defined completion.
Yes correct. Changed the function name and description.
> 
> Why is this function even defined here, it solely operates on parameters
> coming from the I2C driver?
This function can be used for the other protocols as well and so defined 
here. Please let me know if its not a good idea to make this as common 
function for all required protocols and keep in gpi driver.
> 
>> + *
>> + * Return: On success returns 0, otherwise return error code
>> + */
>> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
>> +			   u32 num_xfers, u32 transfer_timeout_msecs,
>> +			   struct completion *transfer_comp)
>> +{
>> +	int i;
>> +	u32 max_irq_cnt, time_left;
>> +
>> +	max_irq_cnt = num_xfers / NUM_MSGS_PER_IRQ;
>> +	if (num_xfers % NUM_MSGS_PER_IRQ)
>> +		max_irq_cnt++;
>> +
>> +	/*
>> +	 * Wait for the interrupts of the processed transfers in multiple
>> +	 * of 64 and for the last transfer. If the hardware is fast and
> 
> I'm confused, where does this 64 come from?
It supposed to be 8, thanks for pointing.
> 
>> +	 * already processed all the transfers then no need to wait.
>> +	 */
>> +	for (i = 0; i < max_irq_cnt; i++) {
>> +		reinit_completion(transfer_comp);
> 
> I'm trying to convince myself that this isn't racey, but the split
> ownership of updating and checking multi_xfer->irq_cnt between the GPI
> and I2C drivers is just too hard for me to follow.
If this functionlaity is added into gpi driver, then it can be used for 
i2c and other protocols as well.
Otherwise, this code or function to be added into all protocol drivers.

>> +		if (max_irq_cnt != multi_xfer->irq_cnt) {
>> +			time_left = wait_for_completion_timeout(transfer_comp,
>> +								transfer_timeout_msecs);
>> +			if (!time_left) {
>> +				dev_err(dev, "%s: Transfer timeout\n", __func__);
>> +				return -ETIMEDOUT;
>> +			}
>> +		}
>> +		if (num_xfers > multi_xfer->msg_idx_cnt)
>> +			return 0;
>> +	}
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(gpi_multi_desc_process);
> 
> The dmaengine framework is expected to provide an abstraction between
> clients and DMA engines, so this doesn't look right.
> 
>> +
>>   /* gpi_of_dma_xlate: open client requested channel */
>>   static struct dma_chan *gpi_of_dma_xlate(struct of_phandle_args *args,
>>   					 struct of_dma *of_dma)
>> diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
>> index 6680dd1a43c6..1341ff0db808 100644
>> --- a/include/linux/dma/qcom-gpi-dma.h
>> +++ b/include/linux/dma/qcom-gpi-dma.h
>> @@ -15,6 +15,12 @@ enum spi_transfer_cmd {
>>   	SPI_DUPLEX,
>>   };
>>   
>> +#define QCOM_GPI_BLOCK_EVENT_IRQ	BIT(0)
>> +
>> +#define QCOM_GPI_MAX_NUM_MSGS		16
>> +#define NUM_MSGS_PER_IRQ		8
>> +#define MIN_NUM_OF_MSGS_MULTI_DESC	4
> 
> Prefixing these QCOM_GPI_ seems like an excellent idea. Still puzzled
> about the numbers 8 and 4 though, are they universal for all variants of
> GPI or are they just arbitrary numbers picked by experimentation?
MIN_NUM_OF_MSGS_MULTI_DESC changed to 2 in V3, so that if the number of 
messages in a transfer are greter than 1, then is_tx_multi_xfer is set.

maximum channel tre's are 64 which is for config TRE, go TRE and DMA(Tx, 
Rx) TRE. 32 messages can't fit to allocate memory with size of 64 and so 
used maximum messages as 16. And plan to use multiple factor of 
MAX_NUM_MSGS for interrupts per messages and so defined NUM_MSGS_PER_IRQ 
to 8.	

> 
>> +
>>   /**
>>    * struct gpi_spi_config - spi config for peripheral
>>    *
>> @@ -51,6 +57,29 @@ enum i2c_op {
>>   	I2C_READ,
>>   };
>>   
>> +/**
>> + * struct gpi_multi_xfer - Used for multi transfer support
>> + *
>> + * @msg_idx_cnt: message index for the transfer
>> + * @buf_idx: dma buffer index
>> + * @unmap_msg_cnt: unampped transfer index
> 
> s/unampped/unmapped
Done
> 
>> + * @freed_msg_cnt: freed transfer index
>> + * @irq_cnt: received interrupt count
>> + * @irq_msg_cnt: transfer message count for the received irqs
>> + * @dma_buf: virtual address of the buffer
>> + * @dma_addr: dma address of the buffer
> 
> "the buffer"? There's up to 16 of them...

Done
> 
> 
> As mentioned above, I'm skeptical about this custom API - but if we
> were to go this route, the exact responsibilities and semantics should
> be documented.

Documentated the API's and other added in this file.
> 
> Regards,
> Bjorn
> 
>> + */
>> +struct gpi_multi_xfer {
>> +	u32 msg_idx_cnt;
>> +	u32 buf_idx;
>> +	u32 unmap_msg_cnt;
>> +	u32 freed_msg_cnt;
>> +	u32 irq_cnt;
>> +	u32 irq_msg_cnt;
>> +	void *dma_buf[QCOM_GPI_MAX_NUM_MSGS];
>> +	dma_addr_t dma_addr[QCOM_GPI_MAX_NUM_MSGS];
>> +};
>> +
>>   /**
>>    * struct gpi_i2c_config - i2c config for peripheral
>>    *
>> @@ -65,6 +94,8 @@ enum i2c_op {
>>    * @rx_len: receive length for buffer
>>    * @op: i2c cmd
>>    * @muli-msg: is part of multi i2c r-w msgs
>> + * @flags: true for block event interrupt support
>> + * @multi_xfer: indicates transfer has multi messages
>>    */
>>   struct gpi_i2c_config {
>>   	u8 set_config;
>> @@ -78,6 +109,12 @@ struct gpi_i2c_config {
>>   	u32 rx_len;
>>   	enum i2c_op op;
>>   	bool multi_msg;
>> +	u8 flags;
>> +	struct gpi_multi_xfer multi_xfer;
>>   };
>>   
>> +int gpi_multi_desc_process(struct device *dev, struct gpi_multi_xfer *multi_xfer,
>> +			   u32 num_xfers, u32 tranfer_timeout_msecs,
>> +			   struct completion *transfer_comp);
>> +
>>   #endif /* QCOM_GPI_DMA_H */
>> -- 
>> 2.17.1
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support
  2024-11-12  4:33   ` Bjorn Andersson
@ 2024-11-21 12:58     ` Jyothi Kumar Seerapu
  2024-12-02 13:41       ` Jyothi Kumar Seerapu
  0 siblings, 1 reply; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-11-21 12:58 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku



On 11/12/2024 10:03 AM, Bjorn Andersson wrote:
> On Mon, Nov 11, 2024 at 07:32:44PM +0530, Jyothi Kumar Seerapu wrote:
>> The I2C driver gets an interrupt upon transfer completion.
>> For multiple messages in a single transfer, N interrupts will be
>> received for N messages, leading to significant software interrupt
>> latency. To mitigate this latency, utilize Block Event Interrupt (BEI)
> 
> Please rewrite this to the tone that the reader doesn't know what Block
> Event Interrupt is, or that it exists.
Sure, done.
> 
>> only when an interrupt is necessary. This means large transfers can be
>> split into multiple chunks of 8 messages internally, without expecting
>> interrupts for the first 7 message completions, only the last one will
>> trigger an interrupt indicating 8 messages completed.
>>
>> By implementing BEI, multi-message transfers can be divided into
>> chunks of 8 messages, improving overall transfer time.
> 
> You already wrote this in the paragraph above.
yeah removed it.
> 
> 
> Where is this number 8 coming from btw?
Its documented in "qcom-gpi-dma.h" file.
Trigger an interrupt, after the completion of 8 messages.
> 
>> This optimization reduces transfer time from 168 ms to 48 ms for a
>> series of 200 I2C write messages in a single transfer, with a
>> clock frequency support of 100 kHz.
>>
>> BEI optimizations are currently implemented for I2C write transfers only,
>> as there is no use case for multiple I2C read messages in a single transfer
>> at this time.
>>
>> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
>> ---
>>
>> v1 -> v2:
>>     - Moved gi2c_gpi_xfer->msg_idx_cnt to separate local variable.
>>     - Updated goto labels for error scenarios in geni_i2c_gpi function
>>     - memset tx_multi_xfer to 0.
>>     - Removed passing current msg index to geni_i2c_gpi.
>>     - Fixed kernel test robot reported compilation issues.
>>
>>   drivers/i2c/busses/i2c-qcom-geni.c | 203 +++++++++++++++++++++++++----
>>   1 file changed, 178 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
>> index 7a22e1f46e60..04a7d926dadc 100644
>> --- a/drivers/i2c/busses/i2c-qcom-geni.c
>> +++ b/drivers/i2c/busses/i2c-qcom-geni.c
>> @@ -100,6 +100,10 @@ struct geni_i2c_dev {
>>   	struct dma_chan *rx_c;
>>   	bool gpi_mode;
>>   	bool abort_done;
>> +	bool is_tx_multi_xfer;
>> +	u32 num_msgs;
>> +	u32 tx_irq_cnt;
>> +	struct gpi_i2c_config *gpi_config;
>>   };
>>   
>>   struct geni_i2c_desc {
>> @@ -500,6 +504,7 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>   static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
>>   {
>>   	struct geni_i2c_dev *gi2c = cb;
>> +	struct gpi_multi_xfer *tx_multi_xfer;
>>   
>>   	if (result->result != DMA_TRANS_NOERROR) {
>>   		dev_err(gi2c->se.dev, "DMA txn failed:%d\n", result->result);
>> @@ -508,7 +513,21 @@ static void i2c_gpi_cb_result(void *cb, const struct dmaengine_result *result)
>>   		dev_dbg(gi2c->se.dev, "DMA xfer has pending: %d\n", result->residue);
>>   	}
>>   
>> -	complete(&gi2c->done);
>> +	if (gi2c->is_tx_multi_xfer) {
> 
> Wouldn't it be cleaner to treat the !is_tx_multi_xfer case as a
> multi-xfer of length 1?
Sure, addressed the change in V3 patch.
> 
>> +		tx_multi_xfer = &gi2c->gpi_config->multi_xfer;
>> +
>> +		/*
>> +		 * Send Completion for last message or multiple of NUM_MSGS_PER_IRQ.
>> +		 */
>> +		if ((tx_multi_xfer->irq_msg_cnt == gi2c->num_msgs - 1) ||
>> +		    (!((tx_multi_xfer->irq_msg_cnt + 1) % NUM_MSGS_PER_IRQ))) {
>> +			tx_multi_xfer->irq_cnt++;
>> +			complete(&gi2c->done);
> 
> Why? You're removing the wait_for_completion_timeout() from
> geni_i2c_gpi_xfer() when is_tx_multi_xfer is set.
For (!is_tx_multi_xfer) case, need to wait for every message.
But whereas for multi-message (when is_tx_multi_xfer is set) cases, 
"wait_for_completion_timeout" will trigger after queuing messages till 
QCOM_GPI_MAX_NUM_MSGS (32) or total number of i2c msgs and 
"wait_for_completion_timeout" for this case is handled in GPI driver.
> 
>> +		}
>> +		tx_multi_xfer->irq_msg_cnt++;
>> +	} else {
>> +		complete(&gi2c->done);
>> +	}
>>   }
>>   
>>   static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>> @@ -526,7 +545,42 @@ static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>   	}
>>   }
>>   
>> -static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>> +/**
>> + * gpi_i2c_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
>> + * @dev: pointer to the corresponding dev node
>> + * @gi2c: i2c dev handle
>> + * @msgs: i2c messages array
>> + * @peripheral: pointer to the gpi_i2c_config
>> + */
>> +static void gpi_i2c_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
>> +				     struct gpi_i2c_config *peripheral)
>> +{
>> +	u32 msg_xfer_cnt, wr_idx = 0;
>> +	struct gpi_multi_xfer *tx_multi_xfer = &peripheral->multi_xfer;
>> +
>> +	/*
>> +	 * In error case, need to unmap all messages based on the msg_idx_cnt.
>> +	 * Non-error case unmap all the processed messages.
> 
> What is the benefit of this optimization, compared to keeping things
> simple and just unmap all buffers at the end of geni_i2c_gpi_xfer()?

The maximum number of messages can allocate and submit to GSI hardware 
is 16 (QCOM_GPI_MAX_NUM_MSGS) and to handle more messages beyond this 
need to unmap the processed messages.
If there is 200 messages or more in a transfer then we need to unmap the 
processed messages for handling all messages in a transfer.
So, instead of Unmap all messages together, here unmapping the chunk of 
messages.

> 
>> +	 */
>> +	if (gi2c->err)
>> +		msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
>> +	else
>> +		msg_xfer_cnt = tx_multi_xfer->irq_cnt * NUM_MSGS_PER_IRQ;
>> +
>> +	/* Unmap the processed DMA buffers based on the received interrupt count */
>> +	for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
>> +		if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
>> +			break;
>> +		wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_GPI_MAX_NUM_MSGS;
>> +		geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
>> +				   tx_multi_xfer->dma_buf[wr_idx],
>> +				   tx_multi_xfer->dma_addr[wr_idx],
>> +				   NULL, (dma_addr_t)NULL);
>> +		tx_multi_xfer->freed_msg_cnt++;
>> +	}
>> +}
>> +
>> +static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
>>   			struct dma_slave_config *config, dma_addr_t *dma_addr_p,
>>   			void **buf, unsigned int op, struct dma_chan *dma_chan)
>>   {
>> @@ -538,26 +592,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>   	enum dma_transfer_direction dma_dirn;
>>   	struct dma_async_tx_descriptor *desc;
>>   	int ret;
>> +	struct gpi_multi_xfer *gi2c_gpi_xfer;
>> +	dma_cookie_t cookie;
>> +	u32 msg_idx;
>>   
>>   	peripheral = config->peripheral_config;
>> -
>> -	dma_buf = i2c_get_dma_safe_msg_buf(msg, 1);
>> -	if (!dma_buf)
>> -		return -ENOMEM;
>> +	gi2c_gpi_xfer = &peripheral->multi_xfer;
>> +	dma_buf = gi2c_gpi_xfer->dma_buf[gi2c_gpi_xfer->buf_idx];
>> +	addr = gi2c_gpi_xfer->dma_addr[gi2c_gpi_xfer->buf_idx];
>> +	msg_idx = gi2c_gpi_xfer->msg_idx_cnt;
>> +
>> +	dma_buf = i2c_get_dma_safe_msg_buf(&msgs[msg_idx], 1);
>> +	if (!dma_buf) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>>   
>>   	if (op == I2C_WRITE)
>>   		map_dirn = DMA_TO_DEVICE;
>>   	else
>>   		map_dirn = DMA_FROM_DEVICE;
>>   
>> -	addr = dma_map_single(gi2c->se.dev->parent, dma_buf, msg->len, map_dirn);
>> +	addr = dma_map_single(gi2c->se.dev->parent, dma_buf,
>> +			      msgs[msg_idx].len, map_dirn);
>>   	if (dma_mapping_error(gi2c->se.dev->parent, addr)) {
>> -		i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
>> -		return -ENOMEM;
>> +		i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	if (gi2c->is_tx_multi_xfer) {
>> +		if (((msg_idx + 1) % NUM_MSGS_PER_IRQ))
>> +			peripheral->flags |= QCOM_GPI_BLOCK_EVENT_IRQ;
>> +		else
>> +			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
>> +
>> +		/* BEI bit to be cleared for last TRE */
>> +		if (msg_idx == gi2c->num_msgs - 1)
>> +			peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
>>   	}
>>   
>>   	/* set the length as message for rx txn */
>> -	peripheral->rx_len = msg->len;
>> +	peripheral->rx_len = msgs[msg_idx].len;
>>   	peripheral->op = op;
>>   
>>   	ret = dmaengine_slave_config(dma_chan, config);
>> @@ -575,7 +651,8 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>   	else
>>   		dma_dirn = DMA_DEV_TO_MEM;
>>   
>> -	desc = dmaengine_prep_slave_single(dma_chan, addr, msg->len, dma_dirn, flags);
>> +	desc = dmaengine_prep_slave_single(dma_chan, addr, msgs[msg_idx].len,
>> +					   dma_dirn, flags);
>>   	if (!desc) {
>>   		dev_err(gi2c->se.dev, "prep_slave_sg failed\n");
>>   		ret = -EIO;
>> @@ -585,15 +662,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>   	desc->callback_result = i2c_gpi_cb_result;
>>   	desc->callback_param = gi2c;
>>   
>> -	dmaengine_submit(desc);
>> -	*buf = dma_buf;
>> -	*dma_addr_p = addr;
>> +	if (!((msgs[msg_idx].flags & I2C_M_RD) && op == I2C_WRITE)) {
>> +		gi2c_gpi_xfer->msg_idx_cnt++;
>> +		gi2c_gpi_xfer->buf_idx = (msg_idx + 1) % QCOM_GPI_MAX_NUM_MSGS;
>> +	}
>> +	cookie = dmaengine_submit(desc);
>> +	if (dma_submit_error(cookie)) {
>> +		dev_err(gi2c->se.dev,
>> +			"%s: dmaengine_submit failed (%d)\n", __func__, cookie);
>> +		ret = -EINVAL;
>> +		goto err_config;
>> +	}
>>   
>> +	if (gi2c->is_tx_multi_xfer) {
>> +		dma_async_issue_pending(gi2c->tx_c);
>> +		if ((msg_idx == (gi2c->num_msgs - 1)) ||
>> +		    (gi2c_gpi_xfer->msg_idx_cnt >=
>> +		     QCOM_GPI_MAX_NUM_MSGS + gi2c_gpi_xfer->freed_msg_cnt)) {
>> +			ret = gpi_multi_desc_process(gi2c->se.dev, gi2c_gpi_xfer,
> 
> A function call straight into the GPI driver? I'm not entirely familiar
> with the details of the dmaengine API, but this doesn't look correct.

"gpi_multi_desc_process" can be used for the other protocols as well and 
so defined here. Please let me know if its not a good idea to make this 
as common function for all required protocols and keep in GPI driver.

Also, gpi_multi_desc_process can't be fit into dmaengine API and so 
invoked a function call to GPI driver.
> 
>> +						     gi2c->num_msgs, XFER_TIMEOUT,
>> +						     &gi2c->done);
>> +			if (ret) {
>> +				dev_err(gi2c->se.dev,
>> +					"I2C multi write msg transfer timeout: %d\n",
>> +					ret);
>> +				gi2c->err = ret;
>> +				goto err_config;
>> +			}
>> +		}
>> +	} else {
>> +		/* Non multi descriptor message transfer */
>> +		*buf = dma_buf;
>> +		*dma_addr_p = addr;
>> +	}
>>   	return 0;
>>   
>>   err_config:
>> -	dma_unmap_single(gi2c->se.dev->parent, addr, msg->len, map_dirn);
>> -	i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
>> +	dma_unmap_single(gi2c->se.dev->parent, addr,
>> +			 msgs[msg_idx].len, map_dirn);
>> +	i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
>> +
>> +out:
>> +	gi2c->err = ret;
>>   	return ret;
>>   }
>>   
>> @@ -605,6 +715,7 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>   	unsigned long time_left;
>>   	dma_addr_t tx_addr, rx_addr;
>>   	void *tx_buf = NULL, *rx_buf = NULL;
>> +	struct gpi_multi_xfer *tx_multi_xfer;
>>   	const struct geni_i2c_clk_fld *itr = gi2c->clk_fld;
>>   
>>   	config.peripheral_config = &peripheral;
>> @@ -618,6 +729,34 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>   	peripheral.set_config = 1;
>>   	peripheral.multi_msg = false;
>>   
>> +	gi2c->gpi_config = &peripheral;
>> +	gi2c->num_msgs = num;
>> +	gi2c->is_tx_multi_xfer = false;
>> +	gi2c->tx_irq_cnt = 0;
>> +
>> +	tx_multi_xfer = &peripheral.multi_xfer;
>> +	memset(tx_multi_xfer, 0, sizeof(struct gpi_multi_xfer));
>> +
>> +	/*
>> +	 * If number of write messages are four and higher then
> 
> Why four?
It changed to 2 in V3, so that if the number of messages in a transfer 
are greter than 1, then "is_tx_multi_xfer" is set.
> 
>> +	 * configure hardware for multi descriptor transfers with BEI.
>> +	 */
>> +	if (num >= MIN_NUM_OF_MSGS_MULTI_DESC) {
>> +		gi2c->is_tx_multi_xfer = true;
>> +		for (i = 0; i < num; i++) {
>> +			if (msgs[i].flags & I2C_M_RD) {
>> +				/*
>> +				 * Multi descriptor transfer with BEI
>> +				 * support is enabled for write transfers.
>> +				 * Add BEI optimization support for read
>> +				 * transfers later.
> 
> Prefix this comment with "TODO:"
Done
> 
>> +				 */
>> +				gi2c->is_tx_multi_xfer = false;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>>   	for (i = 0; i < num; i++) {
>>   		gi2c->cur = &msgs[i];
>>   		gi2c->err = 0;
>> @@ -628,14 +767,16 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>   			peripheral.stretch = 1;
>>   
>>   		peripheral.addr = msgs[i].addr;
>> +		if (i > 0 && (!(msgs[i].flags & I2C_M_RD)))
>> +			peripheral.multi_msg = false;
>>   
>> -		ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
>> +		ret =  geni_i2c_gpi(gi2c, msgs, &config,
>>   				    &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
>>   		if (ret)
>>   			goto err;
>>   
>>   		if (msgs[i].flags & I2C_M_RD) {
>> -			ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
>> +			ret =  geni_i2c_gpi(gi2c, msgs, &config,
>>   					    &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
>>   			if (ret)
>>   				goto err;
>> @@ -643,18 +784,26 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>   			dma_async_issue_pending(gi2c->rx_c);
>>   		}
>>   
>> -		dma_async_issue_pending(gi2c->tx_c);
>> -
>> -		time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
>> -		if (!time_left)
>> -			gi2c->err = -ETIMEDOUT;
>> +		if (!gi2c->is_tx_multi_xfer) {
>> +			dma_async_issue_pending(gi2c->tx_c);
>> +			time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
> 
> By making this conditional on !is_tx_multi_xfer transfers, what makes
> the loop wait for the transfer to complete before you below unmap the
> buffers?
Yes, for (!is_tx_multi_xfer) case, need to wait for every message and 
then unmap it and for is_tx_multi_xfer transfers shouldn't unmap per 
message wise instead unmap the chunk of messages together.

> 
>> +			if (!time_left) {
>> +				dev_err(gi2c->se.dev, "%s:I2C timeout\n", __func__);
>> +				gi2c->err = -ETIMEDOUT;
>> +			}
>> +		}
>>   
>>   		if (gi2c->err) {
>>   			ret = gi2c->err;
>>   			goto err;
>>   		}
>>   
>> -		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
>> +		if (!gi2c->is_tx_multi_xfer) {
>> +			geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
>> +		} else if (gi2c->tx_irq_cnt != tx_multi_xfer->irq_cnt) {
>> +			gi2c->tx_irq_cnt = tx_multi_xfer->irq_cnt;
>> +			gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
>> +		}
>>   	}
>>   
>>   	return num;
>> @@ -663,7 +812,11 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>   	dev_err(gi2c->se.dev, "GPI transfer failed: %d\n", ret);
>>   	dmaengine_terminate_sync(gi2c->rx_c);
>>   	dmaengine_terminate_sync(gi2c->tx_c);
>> -	geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
>> +	if (gi2c->is_tx_multi_xfer)
>> +		gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
>> +	else
>> +		geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, rx_addr);
>> +
> 
> As above, it would be nice if multi-xfer was just a special case with a
> single buffer; rather than inflating the cyclomatic complexity.

For a single i2c message, data can be placed at contigious memory 
locations, but for multiple i2c messages in a transfer, all the messages 
offsets and data may not guarantee to placed at contigious memory 
locations.
So, looks single large buffer is not helpful here.

> 
> Regards,
> Bjorn
> 
>>   	return ret;
>>   }
>>   
>> -- 
>> 2.17.1
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support
  2024-11-21 12:58     ` Jyothi Kumar Seerapu
@ 2024-12-02 13:41       ` Jyothi Kumar Seerapu
  0 siblings, 0 replies; 13+ messages in thread
From: Jyothi Kumar Seerapu @ 2024-12-02 13:41 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Vinod Koul, Andi Shyti, Sumit Semwal, Christian König,
	linux-arm-msm, dmaengine, linux-kernel, linux-i2c, linux-media,
	dri-devel, linaro-mm-sig, quic_msavaliy, quic_vtanuku



On 11/21/2024 6:28 PM, Jyothi Kumar Seerapu wrote:
> 
> 
> On 11/12/2024 10:03 AM, Bjorn Andersson wrote:
>> On Mon, Nov 11, 2024 at 07:32:44PM +0530, Jyothi Kumar Seerapu wrote:
>>> The I2C driver gets an interrupt upon transfer completion.
>>> For multiple messages in a single transfer, N interrupts will be
>>> received for N messages, leading to significant software interrupt
>>> latency. To mitigate this latency, utilize Block Event Interrupt (BEI)
>>
>> Please rewrite this to the tone that the reader doesn't know what Block
>> Event Interrupt is, or that it exists.
> Sure, done.
>>
>>> only when an interrupt is necessary. This means large transfers can be
>>> split into multiple chunks of 8 messages internally, without expecting
>>> interrupts for the first 7 message completions, only the last one will
>>> trigger an interrupt indicating 8 messages completed.
>>>
>>> By implementing BEI, multi-message transfers can be divided into
>>> chunks of 8 messages, improving overall transfer time.
>>
>> You already wrote this in the paragraph above.
> yeah removed it.
>>
>>
>> Where is this number 8 coming from btw?
> Its documented in "qcom-gpi-dma.h" file.
> Trigger an interrupt, after the completion of 8 messages.
>>
>>> This optimization reduces transfer time from 168 ms to 48 ms for a
>>> series of 200 I2C write messages in a single transfer, with a
>>> clock frequency support of 100 kHz.
>>>
>>> BEI optimizations are currently implemented for I2C write transfers 
>>> only,
>>> as there is no use case for multiple I2C read messages in a single 
>>> transfer
>>> at this time.
>>>
>>> Signed-off-by: Jyothi Kumar Seerapu <quic_jseerapu@quicinc.com>
>>> ---
>>>
>>> v1 -> v2:
>>>     - Moved gi2c_gpi_xfer->msg_idx_cnt to separate local variable.
>>>     - Updated goto labels for error scenarios in geni_i2c_gpi function
>>>     - memset tx_multi_xfer to 0.
>>>     - Removed passing current msg index to geni_i2c_gpi.
>>>     - Fixed kernel test robot reported compilation issues.
>>>
>>>   drivers/i2c/busses/i2c-qcom-geni.c | 203 +++++++++++++++++++++++++----
>>>   1 file changed, 178 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/i2c/busses/i2c-qcom-geni.c 
>>> b/drivers/i2c/busses/i2c-qcom-geni.c
>>> index 7a22e1f46e60..04a7d926dadc 100644
>>> --- a/drivers/i2c/busses/i2c-qcom-geni.c
>>> +++ b/drivers/i2c/busses/i2c-qcom-geni.c
>>> @@ -100,6 +100,10 @@ struct geni_i2c_dev {
>>>       struct dma_chan *rx_c;
>>>       bool gpi_mode;
>>>       bool abort_done;
>>> +    bool is_tx_multi_xfer;
>>> +    u32 num_msgs;
>>> +    u32 tx_irq_cnt;
>>> +    struct gpi_i2c_config *gpi_config;
>>>   };
>>>   struct geni_i2c_desc {
>>> @@ -500,6 +504,7 @@ static int geni_i2c_tx_one_msg(struct 
>>> geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>>   static void i2c_gpi_cb_result(void *cb, const struct 
>>> dmaengine_result *result)
>>>   {
>>>       struct geni_i2c_dev *gi2c = cb;
>>> +    struct gpi_multi_xfer *tx_multi_xfer;
>>>       if (result->result != DMA_TRANS_NOERROR) {
>>>           dev_err(gi2c->se.dev, "DMA txn failed:%d\n", result->result);
>>> @@ -508,7 +513,21 @@ static void i2c_gpi_cb_result(void *cb, const 
>>> struct dmaengine_result *result)
>>>           dev_dbg(gi2c->se.dev, "DMA xfer has pending: %d\n", 
>>> result->residue);
>>>       }
>>> -    complete(&gi2c->done);
>>> +    if (gi2c->is_tx_multi_xfer) {
>>
>> Wouldn't it be cleaner to treat the !is_tx_multi_xfer case as a
>> multi-xfer of length 1?
> Sure, addressed the change in V3 patch.
>>
>>> +        tx_multi_xfer = &gi2c->gpi_config->multi_xfer;
>>> +
>>> +        /*
>>> +         * Send Completion for last message or multiple of 
>>> NUM_MSGS_PER_IRQ.
>>> +         */
>>> +        if ((tx_multi_xfer->irq_msg_cnt == gi2c->num_msgs - 1) ||
>>> +            (!((tx_multi_xfer->irq_msg_cnt + 1) % NUM_MSGS_PER_IRQ))) {
>>> +            tx_multi_xfer->irq_cnt++;
>>> +            complete(&gi2c->done);
>>
>> Why? You're removing the wait_for_completion_timeout() from
>> geni_i2c_gpi_xfer() when is_tx_multi_xfer is set.
> For (!is_tx_multi_xfer) case, need to wait for every message.
> But whereas for multi-message (when is_tx_multi_xfer is set) cases, 
> "wait_for_completion_timeout" will trigger after queuing messages till 
> QCOM_GPI_MAX_NUM_MSGS (32) or total number of i2c msgs and 
> "wait_for_completion_timeout" for this case is handled in GPI driver.
>>
>>> +        }
>>> +        tx_multi_xfer->irq_msg_cnt++;
>>> +    } else {
>>> +        complete(&gi2c->done);
>>> +    }
>>>   }
>>>   static void geni_i2c_gpi_unmap(struct geni_i2c_dev *gi2c, struct 
>>> i2c_msg *msg,
>>> @@ -526,7 +545,42 @@ static void geni_i2c_gpi_unmap(struct 
>>> geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>>       }
>>>   }
>>> -static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
>>> +/**
>>> + * gpi_i2c_multi_desc_unmap() - unmaps the buffers post multi 
>>> message TX transfers
>>> + * @dev: pointer to the corresponding dev node
>>> + * @gi2c: i2c dev handle
>>> + * @msgs: i2c messages array
>>> + * @peripheral: pointer to the gpi_i2c_config
>>> + */
>>> +static void gpi_i2c_multi_desc_unmap(struct geni_i2c_dev *gi2c, 
>>> struct i2c_msg msgs[],
>>> +                     struct gpi_i2c_config *peripheral)
>>> +{
>>> +    u32 msg_xfer_cnt, wr_idx = 0;
>>> +    struct gpi_multi_xfer *tx_multi_xfer = &peripheral->multi_xfer;
>>> +
>>> +    /*
>>> +     * In error case, need to unmap all messages based on the 
>>> msg_idx_cnt.
>>> +     * Non-error case unmap all the processed messages.
>>
>> What is the benefit of this optimization, compared to keeping things
>> simple and just unmap all buffers at the end of geni_i2c_gpi_xfer()?
> 
> The maximum number of messages can allocate and submit to GSI hardware 
> is 16 (QCOM_GPI_MAX_NUM_MSGS) and to handle more messages beyond this 
> need to unmap the processed messages.
> If there is 200 messages or more in a transfer then we need to unmap the 
> processed messages for handling all messages in a transfer.
> So, instead of Unmap all messages together, here unmapping the chunk of 
> messages.
> 
>>
>>> +     */
>>> +    if (gi2c->err)
>>> +        msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
>>> +    else
>>> +        msg_xfer_cnt = tx_multi_xfer->irq_cnt * NUM_MSGS_PER_IRQ;
>>> +
>>> +    /* Unmap the processed DMA buffers based on the received 
>>> interrupt count */
>>> +    for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; 
>>> tx_multi_xfer->unmap_msg_cnt++) {
>>> +        if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
>>> +            break;
>>> +        wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_GPI_MAX_NUM_MSGS;
>>> +        geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
>>> +                   tx_multi_xfer->dma_buf[wr_idx],
>>> +                   tx_multi_xfer->dma_addr[wr_idx],
>>> +                   NULL, (dma_addr_t)NULL);
>>> +        tx_multi_xfer->freed_msg_cnt++;
>>> +    }
>>> +}
>>> +
>>> +static int geni_i2c_gpi(struct geni_i2c_dev *gi2c, struct i2c_msg 
>>> msgs[],
>>>               struct dma_slave_config *config, dma_addr_t *dma_addr_p,
>>>               void **buf, unsigned int op, struct dma_chan *dma_chan)
>>>   {
>>> @@ -538,26 +592,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg *msg,
>>>       enum dma_transfer_direction dma_dirn;
>>>       struct dma_async_tx_descriptor *desc;
>>>       int ret;
>>> +    struct gpi_multi_xfer *gi2c_gpi_xfer;
>>> +    dma_cookie_t cookie;
>>> +    u32 msg_idx;
>>>       peripheral = config->peripheral_config;
>>> -
>>> -    dma_buf = i2c_get_dma_safe_msg_buf(msg, 1);
>>> -    if (!dma_buf)
>>> -        return -ENOMEM;
>>> +    gi2c_gpi_xfer = &peripheral->multi_xfer;
>>> +    dma_buf = gi2c_gpi_xfer->dma_buf[gi2c_gpi_xfer->buf_idx];
>>> +    addr = gi2c_gpi_xfer->dma_addr[gi2c_gpi_xfer->buf_idx];
>>> +    msg_idx = gi2c_gpi_xfer->msg_idx_cnt;
>>> +
>>> +    dma_buf = i2c_get_dma_safe_msg_buf(&msgs[msg_idx], 1);
>>> +    if (!dma_buf) {
>>> +        ret = -ENOMEM;
>>> +        goto out;
>>> +    }
>>>       if (op == I2C_WRITE)
>>>           map_dirn = DMA_TO_DEVICE;
>>>       else
>>>           map_dirn = DMA_FROM_DEVICE;
>>> -    addr = dma_map_single(gi2c->se.dev->parent, dma_buf, msg->len, 
>>> map_dirn);
>>> +    addr = dma_map_single(gi2c->se.dev->parent, dma_buf,
>>> +                  msgs[msg_idx].len, map_dirn);
>>>       if (dma_mapping_error(gi2c->se.dev->parent, addr)) {
>>> -        i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
>>> -        return -ENOMEM;
>>> +        i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
>>> +        ret = -ENOMEM;
>>> +        goto out;
>>> +    }
>>> +
>>> +    if (gi2c->is_tx_multi_xfer) {
>>> +        if (((msg_idx + 1) % NUM_MSGS_PER_IRQ))
>>> +            peripheral->flags |= QCOM_GPI_BLOCK_EVENT_IRQ;
>>> +        else
>>> +            peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
>>> +
>>> +        /* BEI bit to be cleared for last TRE */
>>> +        if (msg_idx == gi2c->num_msgs - 1)
>>> +            peripheral->flags &= ~QCOM_GPI_BLOCK_EVENT_IRQ;
>>>       }
>>>       /* set the length as message for rx txn */
>>> -    peripheral->rx_len = msg->len;
>>> +    peripheral->rx_len = msgs[msg_idx].len;
>>>       peripheral->op = op;
>>>       ret = dmaengine_slave_config(dma_chan, config);
>>> @@ -575,7 +651,8 @@ static int geni_i2c_gpi(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg *msg,
>>>       else
>>>           dma_dirn = DMA_DEV_TO_MEM;
>>> -    desc = dmaengine_prep_slave_single(dma_chan, addr, msg->len, 
>>> dma_dirn, flags);
>>> +    desc = dmaengine_prep_slave_single(dma_chan, addr, 
>>> msgs[msg_idx].len,
>>> +                       dma_dirn, flags);
>>>       if (!desc) {
>>>           dev_err(gi2c->se.dev, "prep_slave_sg failed\n");
>>>           ret = -EIO;
>>> @@ -585,15 +662,48 @@ static int geni_i2c_gpi(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg *msg,
>>>       desc->callback_result = i2c_gpi_cb_result;
>>>       desc->callback_param = gi2c;
>>> -    dmaengine_submit(desc);
>>> -    *buf = dma_buf;
>>> -    *dma_addr_p = addr;
>>> +    if (!((msgs[msg_idx].flags & I2C_M_RD) && op == I2C_WRITE)) {
>>> +        gi2c_gpi_xfer->msg_idx_cnt++;
>>> +        gi2c_gpi_xfer->buf_idx = (msg_idx + 1) % QCOM_GPI_MAX_NUM_MSGS;
>>> +    }
>>> +    cookie = dmaengine_submit(desc);
>>> +    if (dma_submit_error(cookie)) {
>>> +        dev_err(gi2c->se.dev,
>>> +            "%s: dmaengine_submit failed (%d)\n", __func__, cookie);
>>> +        ret = -EINVAL;
>>> +        goto err_config;
>>> +    }
>>> +    if (gi2c->is_tx_multi_xfer) {
>>> +        dma_async_issue_pending(gi2c->tx_c);
>>> +        if ((msg_idx == (gi2c->num_msgs - 1)) ||
>>> +            (gi2c_gpi_xfer->msg_idx_cnt >=
>>> +             QCOM_GPI_MAX_NUM_MSGS + gi2c_gpi_xfer->freed_msg_cnt)) {
>>> +            ret = gpi_multi_desc_process(gi2c->se.dev, gi2c_gpi_xfer,
>>
>> A function call straight into the GPI driver? I'm not entirely familiar
>> with the details of the dmaengine API, but this doesn't look correct.
> 
> "gpi_multi_desc_process" can be used for the other protocols as well and 
> so defined here. Please let me know if its not a good idea to make this 
> as common function for all required protocols and keep in GPI driver.
> 
> Also, gpi_multi_desc_process can't be fit into dmaengine API and so 
> invoked a function call to GPI driver.

Hi Bjorn, this function(gpi_multi_desc_process) does not fit into any 
DMA engine API.
So, I am considering moving this function to the I2C driver from GPI. 
Please let me know if this is acceptable or if you have any suggestions.
>>
>>> +                             gi2c->num_msgs, XFER_TIMEOUT,
>>> +                             &gi2c->done);
>>> +            if (ret) {
>>> +                dev_err(gi2c->se.dev,
>>> +                    "I2C multi write msg transfer timeout: %d\n",
>>> +                    ret);
>>> +                gi2c->err = ret;
>>> +                goto err_config;
>>> +            }
>>> +        }
>>> +    } else {
>>> +        /* Non multi descriptor message transfer */
>>> +        *buf = dma_buf;
>>> +        *dma_addr_p = addr;
>>> +    }
>>>       return 0;
>>>   err_config:
>>> -    dma_unmap_single(gi2c->se.dev->parent, addr, msg->len, map_dirn);
>>> -    i2c_put_dma_safe_msg_buf(dma_buf, msg, false);
>>> +    dma_unmap_single(gi2c->se.dev->parent, addr,
>>> +             msgs[msg_idx].len, map_dirn);
>>> +    i2c_put_dma_safe_msg_buf(dma_buf, &msgs[msg_idx], false);
>>> +
>>> +out:
>>> +    gi2c->err = ret;
>>>       return ret;
>>>   }
>>> @@ -605,6 +715,7 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg msgs[], i
>>>       unsigned long time_left;
>>>       dma_addr_t tx_addr, rx_addr;
>>>       void *tx_buf = NULL, *rx_buf = NULL;
>>> +    struct gpi_multi_xfer *tx_multi_xfer;
>>>       const struct geni_i2c_clk_fld *itr = gi2c->clk_fld;
>>>       config.peripheral_config = &peripheral;
>>> @@ -618,6 +729,34 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg msgs[], i
>>>       peripheral.set_config = 1;
>>>       peripheral.multi_msg = false;
>>> +    gi2c->gpi_config = &peripheral;
>>> +    gi2c->num_msgs = num;
>>> +    gi2c->is_tx_multi_xfer = false;
>>> +    gi2c->tx_irq_cnt = 0;
>>> +
>>> +    tx_multi_xfer = &peripheral.multi_xfer;
>>> +    memset(tx_multi_xfer, 0, sizeof(struct gpi_multi_xfer));
>>> +
>>> +    /*
>>> +     * If number of write messages are four and higher then
>>
>> Why four?
> It changed to 2 in V3, so that if the number of messages in a transfer 
> are greter than 1, then "is_tx_multi_xfer" is set.
>>
>>> +     * configure hardware for multi descriptor transfers with BEI.
>>> +     */
>>> +    if (num >= MIN_NUM_OF_MSGS_MULTI_DESC) {
>>> +        gi2c->is_tx_multi_xfer = true;
>>> +        for (i = 0; i < num; i++) {
>>> +            if (msgs[i].flags & I2C_M_RD) {
>>> +                /*
>>> +                 * Multi descriptor transfer with BEI
>>> +                 * support is enabled for write transfers.
>>> +                 * Add BEI optimization support for read
>>> +                 * transfers later.
>>
>> Prefix this comment with "TODO:"
> Done
>>
>>> +                 */
>>> +                gi2c->is_tx_multi_xfer = false;
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>       for (i = 0; i < num; i++) {
>>>           gi2c->cur = &msgs[i];
>>>           gi2c->err = 0;
>>> @@ -628,14 +767,16 @@ static int geni_i2c_gpi_xfer(struct 
>>> geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>>               peripheral.stretch = 1;
>>>           peripheral.addr = msgs[i].addr;
>>> +        if (i > 0 && (!(msgs[i].flags & I2C_M_RD)))
>>> +            peripheral.multi_msg = false;
>>> -        ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
>>> +        ret =  geni_i2c_gpi(gi2c, msgs, &config,
>>>                       &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
>>>           if (ret)
>>>               goto err;
>>>           if (msgs[i].flags & I2C_M_RD) {
>>> -            ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
>>> +            ret =  geni_i2c_gpi(gi2c, msgs, &config,
>>>                           &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
>>>               if (ret)
>>>                   goto err;
>>> @@ -643,18 +784,26 @@ static int geni_i2c_gpi_xfer(struct 
>>> geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
>>>               dma_async_issue_pending(gi2c->rx_c);
>>>           }
>>> -        dma_async_issue_pending(gi2c->tx_c);
>>> -
>>> -        time_left = wait_for_completion_timeout(&gi2c->done, 
>>> XFER_TIMEOUT);
>>> -        if (!time_left)
>>> -            gi2c->err = -ETIMEDOUT;
>>> +        if (!gi2c->is_tx_multi_xfer) {
>>> +            dma_async_issue_pending(gi2c->tx_c);
>>> +            time_left = wait_for_completion_timeout(&gi2c->done, 
>>> XFER_TIMEOUT);
>>
>> By making this conditional on !is_tx_multi_xfer transfers, what makes
>> the loop wait for the transfer to complete before you below unmap the
>> buffers?
> Yes, for (!is_tx_multi_xfer) case, need to wait for every message and 
> then unmap it and for is_tx_multi_xfer transfers shouldn't unmap per 
> message wise instead unmap the chunk of messages together.
> 
>>
>>> +            if (!time_left) {
>>> +                dev_err(gi2c->se.dev, "%s:I2C timeout\n", __func__);
>>> +                gi2c->err = -ETIMEDOUT;
>>> +            }
>>> +        }
>>>           if (gi2c->err) {
>>>               ret = gi2c->err;
>>>               goto err;
>>>           }
>>> -        geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, 
>>> rx_addr);
>>> +        if (!gi2c->is_tx_multi_xfer) {
>>> +            geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, 
>>> rx_buf, rx_addr);
>>> +        } else if (gi2c->tx_irq_cnt != tx_multi_xfer->irq_cnt) {
>>> +            gi2c->tx_irq_cnt = tx_multi_xfer->irq_cnt;
>>> +            gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
>>> +        }
>>>       }
>>>       return num;
>>> @@ -663,7 +812,11 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev 
>>> *gi2c, struct i2c_msg msgs[], i
>>>       dev_err(gi2c->se.dev, "GPI transfer failed: %d\n", ret);
>>>       dmaengine_terminate_sync(gi2c->rx_c);
>>>       dmaengine_terminate_sync(gi2c->tx_c);
>>> -    geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, 
>>> rx_addr);
>>> +    if (gi2c->is_tx_multi_xfer)
>>> +        gpi_i2c_multi_desc_unmap(gi2c, msgs, &peripheral);
>>> +    else
>>> +        geni_i2c_gpi_unmap(gi2c, &msgs[i], tx_buf, tx_addr, rx_buf, 
>>> rx_addr);
>>> +
>>
>> As above, it would be nice if multi-xfer was just a special case with a
>> single buffer; rather than inflating the cyclomatic complexity.
> 
> For a single i2c message, data can be placed at contigious memory 
> locations, but for multiple i2c messages in a transfer, all the messages 
> offsets and data may not guarantee to placed at contigious memory 
> locations.
> So, looks single large buffer is not helpful here.
> 
>>
>> Regards,
>> Bjorn
>>
>>>       return ret;
>>>   }
>>> -- 
>>> 2.17.1
>>>
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-12-02 13:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-11 14:02 [PATCH v2 RESEND 0/3] Add Block event interrupt support for I2C protocol Jyothi Kumar Seerapu
2024-11-11 14:02 ` [PATCH v2 RESEND 1/3] dmaengine: qcom: gpi: Add GPI Block event interrupt support Jyothi Kumar Seerapu
2024-11-11 22:36   ` Andi Shyti
2024-11-21 12:51     ` Jyothi Kumar Seerapu
2024-11-12  4:56   ` Bjorn Andersson
2024-11-21 12:54     ` Jyothi Kumar Seerapu
2024-11-11 14:02 ` [PATCH v2 RESEND 2/3] i2c: qcom_geni: Update compile dependenices for qcom geni Jyothi Kumar Seerapu
2024-11-12  4:09   ` Bjorn Andersson
2024-11-21 12:52     ` Jyothi Kumar Seerapu
2024-11-11 14:02 ` [PATCH v2 RESEND 3/3] i2c: i2c-qcom-geni: Add Block event interrupt support Jyothi Kumar Seerapu
2024-11-12  4:33   ` Bjorn Andersson
2024-11-21 12:58     ` Jyothi Kumar Seerapu
2024-12-02 13:41       ` Jyothi Kumar Seerapu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox