Re: [PATCH v2 2/3] i3c: master: dw: Map CCC hardware errors to I3C M0/M2 - sashiko-bot

From: sashiko-bot@kernel.org
To: tze.yee.ng@altera.com
Cc: linux-i3c@lists.infradead.org, Frank.Li@kernel.org
Subject: Re: [PATCH v2 2/3] i3c: master: dw: Map CCC hardware errors to I3C M0/M2
Date: Tue, 09 Jun 2026 10:32:18 +0000	[thread overview]
Message-ID: <20260609103218.E3EEF1F00893@smtp.kernel.org> (raw)
In-Reply-To: <a4d909479fb8d78e930398fbdc94289d0a173a09.1780999511.git.tze.yee.ng@altera.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

Pre-existing issues:
- [High] dw_i3c_master_daa() ignores the transfer error status and uses an uninitialized receive length, leading to spurious registration of dummy devices on failure or timeout.
- [High] The driver blindly trusts hardware response counts and Transaction IDs (TID) from the I3C controller, leading to out-of-bounds array access and arbitrary memory corruption.
- [High] The driver blindly trusts the response data length from the hardware and copies it into the receive buffer, risking a buffer overflow.
- [High] The IRQ handler prematurely completes ongoing I3C transfers when unrelated interrupts (such as In-Band Interrupts) occur, bypassing hardware state verification.
- [High] If a transfer times out, subsequent queued transfers are left indefinitely stalled, and any new transfer will execute out-of-order.
--

commit 4df4fe8c6ced05d7c1108a9f52a2ac16c9feeede
Author: Adrian Ng Ho Yin <adrian.ho.yin.ng@altera.com>

i3c: master: dw: Map CCC hardware errors to I3C M0/M2

This commit maps DesignWare response-queue status to standard I3C error
codes in ccc->err and returns -EIO for bus NACKs so they can be retried by
the core.

> diff --git a/drivers/i3c/master/dw-i3c-master.c b/drivers/i3c/master/dw-i3c-master.c
> index e42cf3ddcec8f..c4004f87c6857 100644
> --- a/drivers/i3c/master/dw-i3c-master.c
> +++ b/drivers/i3c/master/dw-i3c-master.c
> @@ -493,6 +493,7 @@ static void dw_i3c_master_end_xfer_locked(struct dw_i3c_master *master, u32 isr)
>  			break;
>  		case RESPONSE_ERROR_PARITY:
>  		case RESPONSE_ERROR_IBA_NACK:
> +		case RESPONSE_ERROR_ADDRESS_NACK:
>  		case RESPONSE_ERROR_TRANSF_ABORT:
>  		case RESPONSE_ERROR_CRC:
>  		case RESPONSE_ERROR_FRAME:

[Severity: High]
This isn't a regression introduced by this patch, but while reviewing
dw_i3c_master_end_xfer_locked(), could the hardware response TID cause an
out-of-bounds access?

Looking at the loop earlier in this function:

dw_i3c_master_end_xfer_locked()
    ...
    resp = readl(master->regs + RESPONSE_QUEUE_PORT);
    cmd = &xfer->cmds[RESPONSE_PORT_TID(resp)];

Since RESPONSE_PORT_TID(resp) is used directly to index the xfer->cmds
array without bounds checking, could a hardware fault or arbitrary response
trigger an out-of-bounds heap access?

[Severity: High]
This is a pre-existing issue, but I also noticed a potential buffer
overflow in the same loop:

dw_i3c_master_end_xfer_locked()
    ...
    cmd->rx_len = RESPONSE_PORT_DATA_LEN(resp);
    cmd->error = RESPONSE_PORT_ERR_STATUS(resp);
    if (cmd->rx_len && !cmd->error)
        dw_i3c_master_read_rx_fifo(master, cmd->rx_buf,
                                   cmd->rx_len);

Since cmd->rx_len is overwritten with the hardware response length and
passed to dw_i3c_master_read_rx_fifo() without validation, could a response
length larger than the allocated cmd->rx_buf cause a heap buffer overflow?

[Severity: High]
This isn't a regression introduced by this patch, but does the IRQ handler
prematurely complete transfers?

If an In-Band Interrupt (IBI) triggers while a transfer is active:

dw_i3c_master_irq_handler()
    ...
    spin_lock(&master->xferqueue.lock);
    dw_i3c_master_end_xfer_locked(master, status);

dw_i3c_master_end_xfer_locked() doesn't verify if the interrupt was a
response completion. If nresp is 0, the function still proceeds to
unconditionally call complete(&xfer->comp) and start the next transfer.

Does this prematurely abort hardware synchronization for the active
transfer?

[ ... ]

[Severity: High]
This is a pre-existing issue in dw_i3c_master_daa(), but does it register
phantom devices on timeout?

If wait_for_completion_timeout() fails, cmd->rx_len remains uninitialized.
Then the calculation:

dw_i3c_master_daa()
    ...
    newdevs = GENMASK(master->maxdevs - cmd->rx_len - 1, 0);
    newdevs &= ~olddevs;

    for (pos = 0; pos < master->maxdevs; pos++) {
        if (newdevs & BIT(pos))
            i3c_master_add_i3c_dev_locked(m, master->devs[pos].addr);
    }

Since cmd->rx_len is 0, GENMASK evaluates to setting all remaining slots in
newdevs, spuriously registering non-existent devices. Is there a missing
check for the transfer error status here?

[Severity: High]
This isn't a regression introduced by this patch, but what happens to
queued transfers if one times out?

Looking at dw_i3c_master_dequeue_xfer_locked():

dw_i3c_master_dequeue_xfer_locked()
    if (master->xferqueue.cur == xfer) {
        master->xferqueue.cur = NULL;
        writel(RESET_CTRL_RX_FIFO | RESET_CTRL_TX_FIFO |
               RESET_CTRL_RESP_QUEUE | RESET_CTRL_CMD_QUEUE,
               master->regs + RESET_CTRL);
    ...
    } else {
        list_del_init(&xfer->node);
    }

If the current transfer is cancelled, the hardware queue is reset and
master->xferqueue.cur is set to NULL, but it fails to pop the next pending
transfer from the wait queue and restart execution.

Doesn't this stall subsequent queued transfers indefinitely, and allow new
transfers to jump the line since cur is NULL?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780999511.git.tze.yee.ng@altera.com?part=2

-- 
linux-i3c mailing list
linux-i3c@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-i3c