Re: [PATCH v3 1/2] can: Add Loongson CAN-FD controller driver

Linux CAN drivers development
 help / color / mirror / Atom feed

From: Vincent Mailhol <mailhol@kernel.org>
To: Binbin Zhou <zhoubinbin@loongson.cn>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>,
	linux-can@vger.kernel.org, Marc Kleine-Budde <mkl@pengutronix.de>,
	Oliver Hartkopp <socketcan@hartkopp.net>,
	sashiko-reviews@lists.linux.dev
Subject: Re: [PATCH v3 1/2] can: Add Loongson CAN-FD controller driver
Date: Tue, 30 Jun 2026 08:51:10 +0200	[thread overview]
Message-ID: <b963bde4-25fb-4cf3-9fa5-4eb417cf49ef@kernel.org> (raw)
In-Reply-To: <20260630031622.14F981F000E9@smtp.kernel.org>

Hi Binbin,

Thanks for the v3. Please first have a look at the sashiko bot findings.
Some of these are legit.

I will continue my review after you address the relevant findings.

On 30/06/2026 at 05:16, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 18 potential issue(s) to consider:
> - [High] Missing `skb` check in `CAN_STATE_ERROR_ACTIVE` case leads to a NULL pointer dereference.
> - [High] SKB allocation failure or invalid frame in RX path permanently stalls NAPI and corrupts RX FIFO.
> - [High] NAPI poll list corruption due to calling `napi_complete_done` while returning `quota`.
> - [High] Missing `napi_disable()` on error paths in `loongson_canfd_open()` leads to a kernel panic on subsequent open attempts.
> - [High] Race condition in `loongson_canfd_start_xmit` causes unexpected packet drops due to un-reaped TX buffers.
> - [High] Missing bounds check on the hardware-provided word count `REG_FRAME_META1_RWCNT` causes heap buffer overflow.
> - [High] Unchecked return value of `device_property_read_u32` leaves `clk_rate` uninitialized.
> - [High] Premature hardware restart on Bus-Off when auto-restart is configured.
> - [High] Bitwise NOT on a 16-bit variable unintentionally unmasks all interrupts due to integer promotion.
> - [High] Overwriting of `isr` variable causes lost error interrupts and permanent interrupt masking.
> - [High] Using `regmap_update_bits` on command-style SET/CLR registers causes conflicting hardware commands.
> - [High] Calling `kfree_skb()` with hardware interrupts disabled.
> - [High] Potential infinite loop in TX interrupt handler due to repeated status reads of posted writes.
> - [Medium] The driver advertises 'hardware timestamping' in commit message but does not implement it.
> - [Medium] `ndev->stats.rx_errors` is modified concurrently from both hardirq and NAPI softirq contexts without locking.
> - [Medium] Incorrect check for RTR frames causes invalid RX byte statistics.
> - [Medium] Missing `cf->len8_dlc` assignment for Classic CAN frames.
> - [Low] Syntax error due to variable declarations immediately following a switch case label.
> --
> 
> commit e0d474f047900e47a73eaa207fc07dd8c35084bf
> Author: Binbin Zhou <zhoubinbin@loongson.cn>
> Subject: can: Add Loongson CAN-FD controller driver
> 
> This commit adds support for the CAN-FD controller integrated into 
> Loongson-2K series SoCs. It implements NAPI for the RX path, TX buffer 
> management, bus error reporting, and various timing configurations.
> 
> [Severity: Medium]
> The commit message states the driver supports hardware timestamping, but is
> this functionality actually implemented? Looking at the driver code, it sets
> ethtool_op_get_ts_info but does not seem to extract the timestamp field from
> the hardware or write to skb_hwtstamps in the receive path.
> 
>> diff --git a/drivers/net/can/loongson_canfd.c b/drivers/net/can/loongson_canfd.c
>> new file mode 100644
>> index 0000000000000..1c7381129c2b6
>> --- /dev/null
>> +++ b/drivers/net/can/loongson_canfd.c
> [ ... ]
>> @@ -891,37 +891,24 @@ static int loongson_canfd_chip_start(struct net_device *ndev)
> [ ... ]
>> +	/* Configure interrupts */
>> +	int_ena = REG_INT_STAT_RBNEI | REG_INT_STAT_TXBHCI |
>> +		  REG_INT_STAT_EWLI | REG_INT_STAT_FCSI;
>> +
>> +	/* Bus error reporting */
>> +	if (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
>> +		int_ena |= REG_INT_STAT_ALI | REG_INT_STAT_BEI;
>> +
>> +	/* It's after reset, so there is no need to clear anything */
>> +	regmap_write(priv->regmap, LOONGSON_CANFD_INT_MASK, ~int_ena);
> 
> [Severity: High]
> Will this bitwise NOT operation unmask all interrupts on the controller?
> Because int_ena is a 16-bit unsigned integer, the bitwise NOT causes integer
> promotion to a 32-bit signed int, filling the upper 16 bits with 1s. Since
> the upper bits of this register clear the mask, does this unintentionally
> enable every possible interrupt?
> 
>> +	regmap_write(priv->regmap, LOONGSON_CANFD_INT_ENA, int_ena);
> [ ... ]
>> @@ -1048,46 +1048,22 @@ static netdev_tx_t loongson_canfd_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> [ ... ]
>> +	for_each_clear_bit(i, &tx_brp, LOONGSON_CANFD_TXBUF_NUM) {
>> +		u32 transmission_complete_mask = 0x3 << (i * 2);
>> +
>> +		if (!(tx_bs & transmission_complete_mask)) {
>> +			buf_id = i;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!loongson_canfd_insert_frame(priv, cf, buf_id, can_is_canfd_skb(skb))) {
>> +		netdev_err(ndev, "TXNF set but cannot insert frame into TXTB! HW Bug?");
>> +		kfree_skb(skb);
> 
> [Severity: High]
> Could a race condition here cause silent packet drops? The hardware clears
> tx_brp immediately on completion, but tx_bs is cleared later in the ISR.
> If all inactive buffers are awaiting a reap, the loop finishes without
> finding a buffer, leaving buf_id at 0. This would cause the insert check to
> fail, dropping the packet while incorrectly returning NETDEV_TX_OK.
> 
> [Severity: High]
> Is it safe to call kfree_skb here? The tx_lock is held with
> spinlock_irqsave which disables local interrupts. Calling kfree_skb with
> hard interrupts disabled can lead to kernel warnings and potential deadlocks.
> Should this use dev_kfree_skb_any instead?
> 
>> +		ndev->stats.tx_dropped++;
>> +		return NETDEV_TX_OK;
>> +	}
> [ ... ]
>> @@ -1105,37 +1105,25 @@ static void loongson_canfd_read_rx_frame(struct loongson_canfd_priv *priv, struct canfd_frame *cf,
> [ ... ]
>> +	/* Data Length Code */
>> +	len = FIELD_GET(REG_FRAME_META1_DLC, meta1);
>> +	if (len > 8) {
>> +		if (meta1 & REG_FRAME_META1_FDF)
>> +			len = wc << 2;
>> +		else
>> +			len = 8;
>> +	}
>> +
>> +	cf->len = len;
> 
> [Severity: Medium]
> Does this properly support the CAN_CTRLMODE_CC_LEN8_DLC feature? For Classic
> CAN frames with a DLC greater than 8, the length is appropriately capped to
> 8, but the original DLC is not saved to cf->len8_dlc. This might break
> userspace protocols that rely on the original DLC data.
> 
>> +	if (unlikely(len > wc * LOONGSON_CANFD_DW_BYTE))
>> +		len = wc * LOONGSON_CANFD_DW_BYTE;
>> +
>> +	/* Data */
>> +	for (i = 0; i < len; i += LOONGSON_CANFD_DW_BYTE)
>> +		regmap_read(priv->regmap, LOONGSON_CANFD_RX_DATA, (u32 *)(cf->data + i));
> 
> [Severity: High]
> Can a malformed hardware word count cause a heap buffer overflow? The len
> variable is checked against the hardware-provided word count, but not against
> CANFD_MAX_DLEN. If the hardware yields an invalid read word count, the length
> could exceed the 64-byte payload limit, causing out of bounds memory writes
> to the skb data.
> 
> [ ... ]
>> @@ -1164,28 +1164,22 @@ static int loongson_canfd_rx(struct net_device *ndev)
> [ ... ]
>> +	/* Flexible Data-rate Format */
>> +	if (meta1 & REG_FRAME_META1_FDF)
>> +		skb = alloc_canfd_skb(ndev, &cf);
>> +	else
>> +		skb = alloc_can_skb(ndev, (struct can_frame **)&cf);
>> +	if (unlikely(!skb))
>> +		return 0;
> 
> [Severity: High]
> If the skb allocation fails, does returning 0 early permanently stall the
> NAPI loop? Since the rest of the data payload is not read from the hardware
> FIFO, the RX stream might desynchronize. Also, returning 0 here causes the
> caller to exit its loop without completing NAPI, potentially leaving it
> permanently in a scheduled state.
> 
>> +
>> +	loongson_canfd_read_rx_frame(priv, cf, meta0, meta1);
>> +
>> +	if (cf->can_id != CAN_RTR_FLAG)
>> +		stats->rx_bytes += cf->len;
> 
> [Severity: Medium]
> Is this RTR check accurate? The can_id field contains both the CAN ID and
> the flags, so a direct inequality comparison to CAN_RTR_FLAG might evaluate
> true for almost all RTR frames, erroneously inflating the RX byte statistics
> for frames with no actual payload.
> 
> [ ... ]
>> @@ -1203,32 +1203,26 @@ static enum can_state loongson_canfd_read_fault_state(struct net_device *ndev)
> [ ... ]
>> +	switch (FIELD_GET(REG_FSTAT_MASK, fstat)) {
>> +	case REG_FSTAT_ERA:
>> +		u32 ewl, erl, rec_tec, max_tec;
> 
> [Severity: Low]
> Will this compile on strict C11 compilers? The C11 standard does not allow
> variable declarations immediately following a switch case label.
> 
> [ ... ]
>> @@ -1278,82 +1278,28 @@ static void loongson_canfd_err_interrupt(struct net_device *ndev, u32 isr)
> [ ... ]
>> +	if ((isr & REG_INT_STAT_FCSI) || (isr & REG_INT_STAT_EWLI)) {
>> +		netdev_info(ndev, "state changes from %s to %s\n",
>> +			    can_get_state_str(priv->can.state), can_get_state_str(state));
>> +
>> +		if (priv->can.state == state)
>> +			netdev_warn(ndev, "cur and pre state is the same!(miss intr?)\n");
>> +
>> +		isr = REG_INT_STAT_FCSI | REG_INT_STAT_EWLI;
> 
> [Severity: High]
> Does overwriting the isr variable here permanently disable other interrupts?
> If arbitration lost or bus error flags are pending, this assignment destroys
> them. Later in the function, isr is shifted to unmask interrupts, meaning the
> erased flags will never be unmasked again.
> 
>> +		priv->can.state = state;
>> +		switch (state) {
>> +		case CAN_STATE_BUS_OFF:
>> +			priv->can.can_stats.bus_off++;
>> +			if (priv->can.restart_ms)
>> +				regmap_write(priv->regmap, LOONGSON_CANFD_CMD,
>> +					     REG_CMD_ERCRST);
> 
> [Severity: High]
> Does this prematurely restart the hardware upon a bus-off event? Instantly
> writing the error reset command clears the hardware error state and rejoins
> the bus immediately, ignoring the restart delay contract enforced by the core
> can_bus_off timer.
> 
> [ ... ]
>> +		case CAN_STATE_ERROR_ACTIVE:
>> +			cf->can_id |= CAN_ERR_CNT;
>> +			cf->data[1] = CAN_ERR_CRTL_ACTIVE;
>> +			cf->data[6] = bec.txerr;
> 
> [Severity: High]
> Could this lead to a null pointer dereference? The alloc_can_err_skb call
> earlier in the function can fail and leave cf uninitialized. While the other
> switch cases check if skb is valid before modifying cf data, this case does
> not, which would crash if memory pressure prevents allocation.
> 
> [ ... ]
>> +	/* Check for Bus Error interrupt */
>> +	if (isr & REG_INT_STAT_BEI) {
>> +		isr = REG_INT_STAT_BEI;
>> +		netdev_err(ndev, "Bus Error interrupt\n");
>> +		priv->can.can_stats.bus_error++;
>> +		stats->rx_errors++;
> 
> [Severity: Medium]
> Is it safe to increment stats->rx_errors without locking here? This counter
> is modified from hardirq context here and concurrently from softirq context
> in the NAPI poll routine, which could cause lost increments due to data races
> on 32-bit systems.
> 
> [ ... ]
>> @@ -1391,37 +1391,25 @@ static int loongson_canfd_rx_napi(struct napi_struct *napi, int quota)
> [ ... ]
>> +	/* Check for RX FIFO Overflow */
>> +	regmap_read(priv->regmap, LOONGSON_CANFD_STAT, &sts);
>> +	if (sts & REG_STAT_DOR) {
>> +		struct net_device_stats *stats = &ndev->stats;
>> +		struct can_frame *cf;
>> +		struct sk_buff *skb;
>> +
>> +		netdev_info(ndev, "Loongson canfd RX overflow\n");
>> +		stats->rx_over_errors++;
>> +		stats->rx_errors++;
> 
> [Severity: Medium]
> As noted above, stats->rx_errors is updated concurrently in both softirq and
> hardirq contexts without locking.
> 
> [ ... ]
>> +	if (rxbuf_is_empty && ret != 0) {
>> +		if (napi_complete_done(napi, work_done)) {
> 
> [Severity: High]
> Should this check if work_done is less than quota? If exactly quota frames
> are processed and the hardware buffer becomes empty, this logic calls
> napi_complete_done and then returns quota to the networking core. Returning
> the full quota instructs the core to reschedule the instance, but the state
> has already been manipulated, which can corrupt the NAPI poll list.
> 
> [ ... ]
>> @@ -1425,33 +1425,18 @@ static void loongson_canfd_tx_interrupt(struct net_device *ndev)
> [ ... ]
>> +	while ((buf_id = loongson_canfd_get_txbuf_id(priv, &sts)) < LOONGSON_CANFD_TXBUF_NUM) {
>> +		switch (sts) {
>> +		case TX_BS_VALID:
>> +			stats->tx_bytes += can_get_echo_skb(ndev, buf_id, NULL);
>> +			stats->tx_packets++;
>> +			break;
> 
> [Severity: High]
> Could this result in an inescapable infinite loop? When the driver writes a
> clear command to the posted MMIO register at the end of the loop, it
> immediately reads the status register back in the while condition. Because
> the hardware takes cycles to clear the bit, the old status is read, causing
> the driver to repeatedly process the same buffer and spin in the ISR.
> 
>> +		case TX_BS_FAIL:
>> +			u32 cnt;
> 
> [Severity: Low]
> Similar to the fault state read function, this violates C11 syntax by
> declaring a variable immediately following a switch case label.
> 
> [ ... ]
>> @@ -1584,20 +1584,14 @@ static void loongson_canfd_chip_stop(struct net_device *ndev)
>>  {
>>  	struct loongson_canfd_priv *priv = netdev_priv(ndev);
>>  
>>  	/* Disable interrupts and disable CAN */
>>  	regmap_update_bits(priv->regmap, LOONGSON_CANFD_INT_ENA, REG_INT_ENA_CLR, REG_INT_ENA_CLR);
>>  	regmap_update_bits(priv->regmap, LOONGSON_CANFD_INT_MASK,
>>  			   REG_INT_MASK_SET, REG_INT_MASK_SET);
> 
> [Severity: High]
> Are these regmap_update_bits calls safe for W1S/W1C command registers? This
> function reads the current 1s from the lower half of INT_ENA and writes them
> back along with the clear mask in the upper half, which commands the hardware
> to simultaneously set and clear the exact same interrupts.
> 
> [ ... ]
>> @@ -1604,45 +1604,22 @@ static int loongson_canfd_open(struct net_device *ndev)
> [ ... ]
>> +err_chip_start:
>> +	free_irq(ndev->irq, ndev);
>> +err_irq:
>> +	close_candev(ndev);
>> +	return ret;
>> +}
> 
> [Severity: High]
> Are we missing a napi_disable call on these error paths? If the driver fails
> to open, the candev is closed but NAPI is left enabled. A subsequent attempt
> to bring the interface up will call napi_enable on an already enabled
> instance, triggering a BUG_ON in the core.
> 
> [ ... ]
>> @@ -1690,46 +1690,24 @@ static int loongson_canfd_probe(struct platform_device *pdev)
> [ ... ]
>> +	regmap = devm_regmap_init_mmio(dev, base, &loongson_cangfd_regmap);
>> +	if (IS_ERR(regmap))
>> +		return PTR_ERR(regmap);
>> +
>> +	device_property_read_u32(dev, "clock-frequency", &clk_rate);
>> +
>> +	irq = platform_get_irq(pdev, 0);
> 
> [Severity: High]
> Should the return value of device_property_read_u32 be checked? If the
> property is missing, clk_rate retains uninitialized stack garbage which is
> then assigned to priv->can.clock.freq, leading to unpredictable hardware
> timing.


Yours sincerely,
Vincent Mailhol

next prev parent reply	other threads:[~2026-06-30  6:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-30  3:01 [PATCH v3 0/2] Add Loongson CAN-FD controller driver Binbin Zhou
2026-06-30  3:01 ` [PATCH v3 1/2] can: " Binbin Zhou
2026-06-30  3:16   ` sashiko-bot
2026-06-30  6:51     ` Vincent Mailhol [this message]
2026-06-30  3:01 ` [PATCH v3 2/2] can: loongson_canfd: Add RXDMA support Binbin Zhou
2026-06-30  3:11   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b963bde4-25fb-4cf3-9fa5-4eb417cf49ef@kernel.org \
    --to=mailhol@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=o.rempel@pengutronix.de \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=socketcan@hartkopp.net \
    --cc=zhoubinbin@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox