Linux PCI Non-Transparent Bridge framework and drivers
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Alok Tiwari" <alok.a.tiwari@oracle.com>
Cc: ntb@lists.linux.dev
Subject: Re: [PATCH] NTB: epf: log teardown command failures with error code
Date: Fri, 29 May 2026 19:05:46 +0000	[thread overview]
Message-ID: <20260529190546.E71581F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260408055002.436349-1-alok.a.tiwari@oracle.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

Pre-existing issues:
- [Critical] ntb_free_mw() ignores the return value of ntb_mw_clear_trans() before freeing DMA memory, risking memory corruption if hardware teardown fails.
- [High] ntb_epf_mw_set_trans() completely ignores the return value of ntb_epf_send_command(), leading to silent failures during memory window configuration.
- [High] Missing cleanup of previously requested IRQs in the error path of ntb_epf_init_isr(), causing a use-after-free crash on subsequent interrupts.
- [High] The host-to-EPF command protocol cannot recover from a timeout, permanently desynchronizing the command state machine.
- [High] Doorbell bitmask and integer indexing mismatch in ntb_epf_vec_isr() and ntb_epf_db_read(), violating the NTB framework protocol.
--

commit a6cd16ced6f6513467ba1eb8d60eb48cc645ec44
Author: Alok Tiwari <alok.a.tiwari@oracle.com>

NTB: epf: log teardown command failures with error code

This commit updates ntb_epf_mw_clear_trans() to capture the return value
from ntb_epf_send_command(). It logs the error code and returns it to the
caller if the teardown command fails.

> diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> index d3ecf25a51625..0c8fc459eb7a6 100644
> --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> @@ -438,11 +438,11 @@ static int ntb_epf_mw_clear_trans(struct ntb_dev *ntb, int pidx, int idx)
>  {
>  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
>  	struct device *dev = ndev->dev;
> -	int ret = 0;
> +	int ret;
>  
> -	ntb_epf_send_command(ndev, CMD_TEARDOWN_MW, idx);
> +	ret = ntb_epf_send_command(ndev, CMD_TEARDOWN_MW, idx);

[Severity: Critical]
This isn't a bug introduced by this patch, but now that this function correctly
returns the teardown error, does the caller ntb_free_mw() need an update to
handle it safely?

drivers/ntb/ntb_transport.c:ntb_free_mw() {
    ...
    ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
    dma_free_attrs(dma_dev, mw->alloc_size, mw->alloc_addr,
                   mw->original_dma_addr, DMA_ATTR_FORCE_CONTIGUOUS);
    ...
}

Could this risk memory corruption if the hardware teardown fails? If the
teardown command times out and the hardware window remains active, the remote
peer might write into physical pages that have already been returned to the
allocator.

>  	if (ret)
> -		dev_err(dev, "Failed to teardown memory window\n");
> +		dev_err(dev, "Failed to teardown memory window %d err %d\n", idx, ret);
>  
>  	return ret;
>  }

[Severity: High]
This is a pre-existing issue, but I noticed ntb_epf_mw_set_trans() completely
ignores the return value of ntb_epf_send_command():

drivers/ntb/hw/epf/ntb_hw_epf.c:ntb_epf_mw_set_trans() {
    ...
    ntb_epf_send_command(ndev, CMD_CONFIGURE_MW, idx);

    return 0;
}

Could this lead to silent failures during memory window configuration? If
ntb_epf_send_command() fails due to a hardware timeout, returning 0 might
cause the framework to assume the memory window is ready and initiate DMA
operations over an unconfigured window.

[Severity: High]
This is a pre-existing issue, but is there a missing cleanup of previously
requested IRQs in the error path of ntb_epf_init_isr()?

drivers/ntb/hw/epf/ntb_hw_epf.c:ntb_epf_init_isr() {
    ...
err_request_irq:
    pci_free_irq_vectors(pdev);

    return ret;
}

Does this leave dangling IRQ handlers? If request_irq() fails during the loop,
merely calling pci_free_irq_vectors() without calling free_irq() for the
already registered handlers could result in a use-after-free crash on
subsequent interrupts, since the ndev structure will be freed.

[Severity: High]
This isn't a bug introduced by this patch, but does the host-to-EPF command
protocol permanently desynchronize on timeout?

drivers/ntb/hw/epf/ntb_hw_epf.c:ntb_epf_send_command() {
    ...
    if (WARN_ON(timedout)) {
        ret = -ETIMEDOUT;
        break;
    }
    ...
}

If a command times out and the host clears the status register without aborting
the state machine, might the endpoint firmware still be processing it? When the
firmware eventually completes, it could write COMMAND_STATUS_OK to the status
register, causing the next command sent by the host to instantly read this
stale status.

[Severity: High]
This is a pre-existing issue, but does the doorbell implementation violate
the NTB framework protocol by returning an integer index instead of a bitmask?

drivers/ntb/hw/epf/ntb_hw_epf.c:ntb_epf_vec_isr() {
    ...
    irq_no = irq - pci_irq_vector(ndev->ntb.pdev, 0);
    ndev->db_val = irq_no + 1;
    ...
}

drivers/ntb/hw/epf/ntb_hw_epf.c:ntb_epf_db_read() {
    ...
    return ndev->db_val;
}

Does the framework expect a bitmask of triggered doorbells here? Setting
ndev->db_val = irq_no + 1 means a trigger on doorbell 2 results in a value
of 3, which the framework might misinterpret as doorbells 0 and 1. Also, does
this direct assignment clobber previously pending doorbells instead of
atomically OR-ing them together?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260408055002.436349-1-alok.a.tiwari@oracle.com?part=1

      parent reply	other threads:[~2026-05-29 19:05 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08  5:49 [PATCH] NTB: epf: log teardown command failures with error code Alok Tiwari
2026-04-08 15:35 ` Dave Jiang
2026-05-29 19:05 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260529190546.E71581F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=alok.a.tiwari@oracle.com \
    --cc=ntb@lists.linux.dev \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox