Re: [PATCH v2 1/2] can: Add Loongson CAN-FD controller driver

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Binbin Zhou" <zhoubinbin@loongson.cn>
Cc: "Vincent Mailhol" <mailhol@kernel.org>,
	"Marc Kleine-Budde" <mkl@pengutronix.de>,
	linux-can@vger.kernel.org,
	"Oleksij Rempel" <o.rempel@pengutronix.de>,
	"Oliver Hartkopp" <socketcan@hartkopp.net>
Subject: Re: [PATCH v2 1/2] can: Add Loongson CAN-FD controller driver
Date: Tue, 09 Jun 2026 18:49:18 +0000	[thread overview]
Message-ID: <20260609184919.342881F00893@smtp.kernel.org> (raw)
In-Reply-To: <ed069249061d105d72d2115b8a05216a60cebffb.1780908445.git.zhoubinbin@loongson.cn>

Thank you for your contribution! Sashiko AI review found 14 potential issue(s) to consider:
- [Critical] Missing bounds checks on the hardware-provided RX word count (`RWCNT`) causes severe heap buffer overflow.
- [High] Uninitialized variable `buf_id` in `loongson_canfd_start_xmit()` causes packet drops and potential out-of-bounds array access.
- [High] Unhandled Bus Error Interrupts (BEI) and overwritten ISR variables lead to permanent loss of interrupts and device lockup.
- [High] Missing NULL pointer check in `CAN_STATE_ERROR_ACTIVE` handling causes a kernel panic.
- [High] Failing to drain the RX FIFO upon SKB allocation failure permanently desynchronizes the controller's packet stream.
- [High] Incorrect NAPI exit logic causes list corruption when `work_done == quota`.
- [High] The NAPI poll loop permanently stalls the RX path if `loongson_canfd_rx()` encounters an error.
- [High] Race condition in driver initialization permanently drops RX capability if an interrupt fires early.
- [High] The driver misuses strictly reserved SocketCAN UAPI fields, corrupting the length indicator for Classic CAN frames.
- [High] Syntax Error: Declarations immediately follow a label, violating C11 standard requirements.
- [High] The driver violates the SocketCAN standard by prematurely resetting the hardware error counters upon entering BUS_OFF.
- [Medium] Incorrect logic in `loongson_canfd_txbuf_freed` forces a strict queue depth of 1, severely degrading transmission throughput.
- [Medium] Polling of MMIO asynchronous TX status without proper flow control risks processing the same transmission multiple times.
- [Medium] Lost increments on `stats->rx_errors` across softirq and hardirq contexts.
--

commit da73fa842fa0bacd63f1ec58f9a813c7b28476d3
Author: Binbin Zhou <zhoubinbin@loongson.cn>
Subject: can: Add Loongson CAN-FD controller driver

Add support for the CAN-FD controller integrated into Loongson-2K series
SoCs. The controller supports both Classic CAN and CAN FD, with up to 8
transmit buffers, hardware timestamping, error counters, and various
control modes (loopback, listen-only, one-shot, non-ISO FD, etc.).

> diff --git a/drivers/net/can/loongson_canfd.c b/drivers/net/can/loongson_canfd.c
> new file mode 100644
> index 0000000000000..3794700e04c8d
> --- /dev/null
> +++ b/drivers/net/can/loongson_canfd.c

[ ... ]

> +static bool loongson_canfd_txbuf_freed(struct loongson_canfd_priv *priv)
> +{
> +	return !regmap_test_bits(priv->regmap, LOONGSON_CANFD_TX_STAT, REG_TX_STAT_BRP);
> +}

[Severity: Medium]
Does this logic overly restrict the transmission queue? Since regmap_test_bits()
returns true if ANY bit is set in the mask, checking it with ! means this
returns true only when ALL hardware buffers are completely free. This forces a
strict queue depth of 1 and prevents utilizing the 8-buffer hardware pipeline.

[ ... ]

> +	for_each_clear_bit(i, &tx_brp, LOONGSON_CANFD_TXBUF_NUM) {
> +		u32 transmission_complete_mask = 0x3 << (i * 2);
> +
> +		if (!(tx_bs & transmission_complete_mask)) {
> +			buf_id = i;
> +			break;
> +		}
> +	}
> +
> +	if (!loongson_canfd_insert_frame(priv, cf, buf_id, can_is_canfd_skb(skb))) {

[Severity: High]
What happens in loongson_canfd_start_xmit() if the conditional
!(tx_bs & transmission_complete_mask) is false for all clear bits?

The loop would exit without assigning a value to buf_id. Passing an
uninitialized buf_id to loongson_canfd_insert_frame() could lead to silent
packet drops or unpredictable out-of-bounds array access.

[ ... ]

> +	/* Timesamp */
> +	cf->__res0 = meta1;
> +	cf->__res1 = meta1 >> 8;

[Severity: High]
Is it safe to write to cf->__res1 here?

For Classic CAN frames, alloc_can_skb() returns a struct can_frame where the
offset of __res1 physically aliases with the len8_dlc field. Overwriting this
memory corrupts the length indicator for 8-byte frames in the upstream
networking stack.

[ ... ]

> +	cf->len = len;
> +	if (unlikely(len > wc * LOONGSON_CANFD_DW_BYTE))
> +		len = wc * LOONGSON_CANFD_DW_BYTE;
> +
> +	/* Data */
> +	for (i = 0; i < len; i += LOONGSON_CANFD_DW_BYTE) {

[Severity: Critical]
Is it possible for the hardware-provided RX word count to exceed expected
bounds here? 

The wc variable is derived from RWCNT - 2. If RWCNT is large, len could exceed
CANFD_MAX_DLEN, leading to a massive buffer overflow when writing directly
into cf->data. Furthermore, if RWCNT is 0 or 1, wc would underflow, resulting
in an infinite loop and massive memory corruption.

[ ... ]

> +	/* Flexible Data-rate Format */
> +	if (meta1 & REG_FRAME_META1_FDF)
> +		skb = alloc_canfd_skb(ndev, &cf);
> +	else
> +		skb = alloc_can_skb(ndev, (struct can_frame **)&cf);
> +	if (unlikely(!skb))
> +		return -ENOMEM;

[Severity: High]
If this allocation fails, loongson_canfd_rx() returns early without draining
the remaining data payload words for this frame from the RX FIFO. Will this
leave unread payload words in the stream that get mistakenly parsed as metadata
headers for the next frame, permanently misaligning the parser?

[ ... ]

> +	switch (FIELD_GET(REG_FSTAT_MASK, fstat)) {
> +	case REG_FSTAT_ERA:
> +		u32 ewl, erl, rec_tec, max_tec;

[Severity: High]
Does this compile? Under the C11 standard (-std=gnu11), a label cannot be
followed immediately by a variable declaration. This needs to be wrapped in a
block or the variables must be declared earlier.

[ ... ]

> +		if (priv->can.state == state)
> +			netdev_warn(ndev, "cur and pre state is the same!(miss intr?)\n");
> +
> +		isr = REG_INT_STAT_FCSI | REG_INT_STAT_EWLI;
> +		priv->can.state = state;

[Severity: High]
Overwriting the isr variable here discards any other concurrent interrupt
flags (such as REG_INT_STAT_ALI) that were present. Since this function
later writes the overwritten isr value to LOONGSON_CANFD_INT_MASK, does
this permanently fail to unmask those erased interrupts?

> +		switch (state) {
> +		case CAN_STATE_BUS_OFF:
> +			priv->can.can_stats.bus_off++;
> +			if (priv->can.restart_ms)
> +				regmap_write(priv->regmap, LOONGSON_CANFD_CMD,
> +					     REG_CMD_ERCRST);

[Severity: High]
Does issuing a hardware reset immediately upon detecting BUS_OFF bypass the
standard CAN bus backoff mechanism?

The kernel's CAN core manages the auto-restart delay using a timer for
restart_ms. Resetting the controller's error counters here causes it to rejoin
the bus immediately, bypassing the configured recovery delay.

[ ... ]

> +		case CAN_STATE_ERROR_ACTIVE:
> +			cf->can_id |= CAN_ERR_CNT;
> +			cf->data[1] = CAN_ERR_CRTL_ACTIVE;

[Severity: High]
If alloc_can_err_skb() failed earlier due to memory pressure, cf will be
NULL here. The other error state cases guard their writes with if (skb), but
this case writes unconditionally, which would cause a kernel panic.

[ ... ]

> +	/* Check for Bus Error interrupt */
> +	if (isr & REG_INT_STAT_BEI) {
> +		isr = REG_INT_STAT_BEI;
> +		netdev_err(ndev, "Bus Error interrupt\n");
> +		priv->can.can_stats.bus_error++;
> +		stats->rx_errors++;

[Severity: Medium]
Could this increment of stats->rx_errors race with the identical increment in
loongson_canfd_rx_napi()? 

This runs in hardirq context, while the other runs in softirq context. A hardirq
interrupting the softirq's read-modify-write cycle on the exact same statistic
could cause the hardirq's increment to be overwritten and lost.

[ ... ]

> +	while (rx_frc && work_done < quota && ret > 0) {
> +		ret = loongson_canfd_rx(ndev);

[Severity: High]
If loongson_canfd_rx() encounters an error (e.g. -ENOMEM), ret becomes negative
and the loop terminates early. Because rx_frc remains non-zero, the logic below
will skip napi_complete_done() and return work_done < quota.

Returning less than the quota without completing NAPI puts it into an invalid
state, removing the device from the poll list but leaving the SCHED bit set,
which will silently stall the RX path.

[ ... ]

> +		netdev_info(ndev, "Loongson canfd RX overflow\n");
> +		stats->rx_over_errors++;
> +		stats->rx_errors++;

[Severity: Medium]
As mentioned earlier, could this unprotected increment in softirq context race
with the identical increment in the hardirq error handler, resulting in lost
updates to stats->rx_errors?

[ ... ]

> +	if (!rx_frc && ret != 0) {
> +		if (napi_complete_done(napi, work_done)) {

[Severity: High]
Is it safe to call napi_complete_done() when work_done equals quota?

If the FIFO happens to empty on exactly the last frame of the budget, this will
call napi_complete_done() and still return quota to the caller. Returning quota
indicates to the core networking stack that the driver consumed its budget and
remains scheduled, but napi_complete_done() detaches it from the poll list,
leading to list corruption.

[ ... ]

> +	guard(spinlock_irqsave)(&priv->tx_lock);
> +
> +	while ((buf_id = loongson_canfd_get_txbuf_id(priv, &sts)) < LOONGSON_CANFD_TXBUF_NUM) {
> +		switch (sts) {

[Severity: Medium]
Could this tight loop re-process the exact same TX buffer multiple times?

The loop sends a hardware clean command (TXT_CMD_SR_CLEAN) at the end of the
iteration and immediately polls the live status via loongson_canfd_get_txbuf_id()
on the next pass. Since MMIO propagation may take multiple cycles, the read might
fetch the un-cleared stale status, leading to redundant register writes and
double increments of stats->tx_packets.

> +		case TX_BS_FAIL:
> +			u32 cnt;

[Severity: High]
Similar to the error handler earlier, does this code compile under C11? A
variable declaration immediately following a case label is a syntax error.

[ ... ]

> +		/* Error interrupts */
> +		imask = isr & REG_INT_STAT_ERRORI;

[Severity: High]
Does this logic handle Bus Error Interrupts (BEI) correctly?

REG_INT_STAT_BEI is enabled during setup, but it is not included in the
REG_INT_STAT_ERRORI mask. Consequently, if a BEI fires, imask evaluates to 0,
skipping the handler and failing to clear the interrupt in the hardware, which
would generate an interrupt storm and lock up the device.

[ ... ]

> +	netdev_info(ndev, "loongson_canfd_device registered\n");
> +	napi_enable(&priv->napi);
> +	netif_start_queue(ndev);

[Severity: High]
Is there a race condition between unmasking the hardware interrupts and enabling
NAPI?

loongson_canfd_chip_start() fully arms the hardware interrupts before
napi_enable() is called. If a frame is received in this narrow window, the ISR
will fire, mask the interrupt, and call napi_schedule(). Since NAPI is still
disabled, the schedule request is silently ignored. When napi_enable()
subsequently executes, the interrupt remains masked, resulting in a completely
dead RX path.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1780908445.git.zhoubinbin@loongson.cn?part=1

next prev parent reply	other threads:[~2026-06-09 18:49 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08  8:49 [PATCH v2 0/2] Add Loongson CAN-FD controller driver Binbin Zhou
2026-06-08  8:49 ` [PATCH v2 1/2] can: " Binbin Zhou
2026-06-08 17:43   ` Vincent Mailhol
2026-06-09 18:49   ` sashiko-bot [this message]
2026-06-08  8:49 ` [PATCH v2 2/2] can: loongson_canfd: Add RXDMA support Binbin Zhou
2026-06-08 12:13   ` Vincent Mailhol
2026-06-09 18:59   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260609184919.342881F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=mailhol@kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=o.rempel@pengutronix.de \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=socketcan@hartkopp.net \
    --cc=zhoubinbin@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.