Linux CAN drivers development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Matias Ezequiel Vara Larsen" <mvaralar@redhat.com>
Cc: linux-can@vger.kernel.org,
	"Oleksij Rempel" <o.rempel@pengutronix.de>,
	"Marc Kleine-Budde" <mkl@pengutronix.de>,
	"Vincent Mailhol" <mailhol@kernel.org>,
	"Oliver Hartkopp" <socketcan@hartkopp.net>
Subject: Re: [PATCH v15] can: virtio: Add virtio CAN driver
Date: Tue, 09 Jun 2026 18:47:45 +0000	[thread overview]
Message-ID: <20260609184745.91C201F00893@smtp.kernel.org> (raw)
In-Reply-To: <ahXNb+KzuHYbS24+@fedora>

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Virtqueue interrupts arriving while NAPI is disabled (e.g. before open, or during close/re-open) permanently disable virtqueue callbacks and stall the queues.
- [High] Unbounded `wait_for_completion` in `virtio_can_send_ctrl_msg` can deadlock the networking subsystem.
- [High] Lockless evaluation of TX flow control creates a TOCTOU race that can permanently stall the TX queue.
- [High] The `can_ctr_msg` struct and the elements of the `rpkt` array are not cacheline aligned, leading to critical DMA cacheline sharing and memory corruption on architectures with non-coherent DMA.
- [High] Valid RTR (Remote Transmission Request) frames are incorrectly dropped due to flawed payload length validation.
- [High] The driver incorrectly drops received packets when the controller is in the `CAN_STATE_ERROR_PASSIVE` state.
- [High] If the synthesized Bus-Off error SKB is dropped by the networking stack, the NAPI poll routine permanently loses the event and leaves the device stuck.
- [Medium] Concurrent modification of global `dev->stats` from separate NAPI and Qdisc contexts leads to lost statistic increments.
- [Medium] TX byte statistics are not properly accounted because can_put_echo_skb() is called with a frame_len of 0.
- [Medium] Control message failures due to a broken virtqueue silently return success, misleading the caller.
--

commit 43be0c71c5752e0d7687d30df7f568cb87caee83
Author: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>

can: virtio: Add virtio CAN driver

This commit adds a virtio CAN driver based on the Virtio 1.4 specification.
The driver implements a complete CAN bus interface over Virtio transport,
supporting both CAN Classic and CAN-FD.

> diff --git a/drivers/net/can/virtio_can.c b/drivers/net/can/virtio_can.c
> index 0000000000000..f67d0bf09681f
> --- /dev/null
> +++ b/drivers/net/can/virtio_can.c

[ ... ]

> +struct virtio_can_priv {
> +	struct can_priv can;	/* must be the first member */
> +	/* NAPI for RX messages */
> +	struct napi_struct napi;
> +	/* NAPI for TX messages */
> +	struct napi_struct napi_tx;
> +	/* The network device we're associated with */
> +	struct net_device *dev;
> +	/* The virtio device we're associated with */
> +	struct virtio_device *vdev;
> +	/* The virtqueues */
> +	struct virtqueue *vqs[VIRTIO_CAN_QUEUE_COUNT];
> +	/* Lock for TX operations */
> +	spinlock_t tx_lock;
> +	/* Control queue lock */
> +	struct mutex ctrl_lock;
> +	/* Wait for control queue processing without polling */
> +	struct completion ctrl_done;
> +	/* Array of receive queue messages */
> +	struct virtio_can_rx *rpkt;
> +	struct virtio_can_control can_ctr_msg;
> +	/* Data to get and maintain the putidx for local TX echo */

[Severity: High]
The can_ctr_msg struct and the elements of the rpkt array are not cacheline
aligned.

Since can_ctr_msg is embedded without cacheline alignment, it shares a CPU
cacheline with fields like tx_putidx_ida and tx_inflight. On non-coherent
architectures, the DMA unmap invalidation for the control response can
silently overwrite concurrent CPU updates to these fields.

Could this cause memory corruption during DMA mapping?

[ ... ]

> +static u8 virtio_can_send_ctrl_msg(struct net_device *ndev, u16 msg_type)
> +{
> +	struct scatterlist sg_out, sg_in, *sgs[2] = { &sg_out, &sg_in };
> +	struct virtio_can_priv *priv = netdev_priv(ndev);
> +	struct virtqueue *vq = priv->vqs[VIRTIO_CAN_QUEUE_CONTROL];
> +	struct device *dev = &priv->vdev->dev;
> +	unsigned int len;
> +	int err;
> +
> +	if (!vq)
> +		return VIRTIO_CAN_RESULT_NOT_OK;
> +
> +	guard(mutex)(&priv->ctrl_lock);
> +
> +	priv->can_ctr_msg.cpkt_out.msg_type = cpu_to_le16(msg_type);
> +	sg_init_one(&sg_out, &priv->can_ctr_msg.cpkt_out,
> +		    sizeof(priv->can_ctr_msg.cpkt_out));
> +	sg_init_one(&sg_in, &priv->can_ctr_msg.cpkt_in, sizeof(priv->can_ctr_msg.cpkt_in));
> +
> +	reinit_completion(&priv->ctrl_done);
> +
> +	err = virtqueue_add_sgs(vq, sgs, 1u, 1u, priv, GFP_ATOMIC);
> +	if (err != 0) {
> +		dev_err(dev, "%s(): virtqueue_add_sgs() failed\n", __func__);
> +		return VIRTIO_CAN_RESULT_NOT_OK;
> +	}
> +
> +	if (!virtqueue_kick(vq)) {
> +		dev_err(dev, "%s(): Kick failed\n", __func__);
> +		return VIRTIO_CAN_RESULT_NOT_OK;
> +	}
> +
> +	while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
> +		wait_for_completion(&priv->ctrl_done);
> +
> +	return priv->can_ctr_msg.cpkt_in.result;
> +}

[Severity: High]
This unbounded wait_for_completion() can deadlock if the device is removed
or becomes unresponsive.

Since the wait is indefinite and uninterruptible, the loop condition checking
virtqueue_is_broken(vq) is never re-evaluated while sleeping. This function
is called from ndo_open and ndo_stop with the global rtnl_lock held.

Could this freeze the networking subsystem if the hypervisor fails to respond?

[Severity: Medium]
If the virtqueue goes into a broken state, virtqueue_is_broken(vq) evaluates
to true, which breaks the wait loop. However, the function then unconditionally
returns the possibly stale cpkt_in.result since virtqueue_get_buf never
successfully retrieved the new buffer.

Does this cause callers to erroneously assume the operation succeeded when
it actually failed?

[ ... ]

> +static netdev_tx_t virtio_can_start_xmit(struct sk_buff *skb,
> +					 struct net_device *dev)
> +{

[ ... ]

> +	/* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
> +	err = can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);

[Severity: Medium]
The frame_len parameter is hardcoded to 0 here.

When the TX packet is successfully acknowledged, virtio_can_read_tx_queue()
will increment stats->tx_bytes by the return value of can_get_echo_skb().
Since 0 was stored, the returned length is 0, and tx_bytes is never
incremented.

Should this use can_skb_get_frame_len(skb) to track TX bytes correctly?

[ ... ]

> +	/* Normal flow control: stop queue when no transmission slots left */
> +	if (atomic_read(&priv->tx_inflight) >= priv->can.echo_skb_max ||
> +	    vq->num_free == 0 || (vq->num_free < ARRAY_SIZE(sgs) &&
> +	    !virtio_has_feature(vq->vdev, VIRTIO_RING_F_INDIRECT_DESC))) {
> +		netif_stop_queue(dev);
> +		netdev_dbg(dev, "TX: Normal stop queue\n");
> +	}

[Severity: High]
This flow control evaluation and netif_stop_queue() are performed outside
of the priv->tx_lock critical section.

If the CPU executing xmit evaluates the condition as true but gets preempted
before executing netif_stop_queue(), the device might process the packet
and trigger a TX interrupt. The NAPI polling function will run, evaluate
netif_queue_stopped(dev) as false, and do nothing to wake the queue. When
this thread resumes, it will execute netif_stop_queue(), leaving the queue
stopped permanently.

Does this create a TOCTOU race that stalls the TX queue?

[ ... ]

> +static int virtio_can_read_tx_queue(struct virtqueue *vq)
> +{

[ ... ]

> +	if (can_priv->can.state < CAN_STATE_BUS_OFF) {
> +		if (result != VIRTIO_CAN_RESULT_OK) {

[ ... ]

> +		} else {
> +			stats->tx_bytes += can_get_echo_skb(dev, can_tx_msg->putidx,
> +				NULL);
> +			stats->tx_packets++;
> +		}

[Severity: Medium]
This function modifies the global dev->stats structure without holding a
common lock or using atomics. This pattern is also present in
virtio_can_read_rx_queue() for rx_packets.

The driver initializes independent NAPI instances for RX and TX, and
virtio_can_start_xmit() runs in Qdisc context. Concurrent read-modify-write
operations on dev->stats from these contexts will lead to torn or lost
updates.

Should this use per-CPU stats or u64_stats_sync to safely track statistics?

[ ... ]

> +static int virtio_can_read_rx_queue(struct virtqueue *vq)
> +{

[ ... ]

> +	if (priv->can.state >= CAN_STATE_ERROR_PASSIVE) {
> +		netdev_dbg(dev, "%s(): Controller not active\n", __func__);
> +		goto putback;
> +	}

[Severity: High]
This check drops incoming frames when the controller is in the
CAN_STATE_ERROR_PASSIVE state.

In the CAN protocol, a node in the Error Passive state is fully operational
and must continue to receive and transmit frames.

Should this check be >= CAN_STATE_BUS_OFF instead to avoid breaking bus
functionality in degraded environments?

[ ... ]

> +	if (transport_len < header_size + len) {
> +		netdev_warn(dev, "RX: Message too small for payload\n");
> +		goto putback;
> +	}

[Severity: High]
This length validation incorrectly enforces the presence of a payload for
RTR frames.

For CAN RTR frames, the len variable represents the requested Data Length
Code, but the frame contains no actual payload. If the device strictly adheres
to the protocol and transmits only the header for RTR frames, transport_len
will equal header_size, and this check will drop valid frames.

Can this be updated to handle RTR frames correctly?

[ ... ]

> +static int virtio_can_handle_busoff(struct net_device *dev)
> +{

[ ... ]

> +	/* Ensure that the BusOff indication does not get lost */
> +	if (netif_receive_skb(skb) == NET_RX_SUCCESS)
> +		priv->busoff_pending = false;
> +
> +	return 1;
> +}

[Severity: High]
If the synthesized Bus-Off error SKB is dropped by the networking stack (e.g.
returning NET_RX_DROP), the busoff_pending flag fails to clear.

The function returns 1, allowing the NAPI poll loop to eventually complete
and sleep. Because the physical device is now in a Bus-Off state, it will
never generate another RX or configuration interrupt, leaving the driver
permanently stuck with busoff_pending = true.

Will the NAPI loop fail to retry the failed event, permanently blinding the
system to the hardware failure?

[ ... ]

> +static void virtio_can_rx_intr(struct virtqueue *vq)
> +{
> +	struct virtio_can_priv *can_priv = vq->vdev->priv;
> +
> +	virtqueue_disable_cb(vq);
> +	napi_schedule(&can_priv->napi);
> +}

[Severity: High]
This disables the virtqueue callback and schedules NAPI.

If an interrupt fires while NAPI is disabled (e.g., before virtio_can_open
enables it, or after close), napi_schedule() is a no-op and silently drops
the wakeup. When NAPI is eventually enabled, it is not scheduled, and the
virtqueue callbacks are never re-enabled.

Can this permanently stall the virtqueues if an interrupt arrives at the
wrong time?

[ ... ]

> +static int virtio_can_probe(struct virtio_device *vdev)
> +{

[ ... ]

> +	if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD))
> +		priv->rpkt_len += CANFD_MAX_DLEN;
> +	else
> +		priv->rpkt_len += CAN_MAX_DLEN;

[Severity: High]
This calculates rpkt_len dynamically without aligning to cacheline boundaries.

When virtqueue_add_inbuf() maps these unaligned chunks for DMA, they will
overlap cachelines. On non-coherent architectures, this violates the DMA
API.

Could this cause memory corruption during receive operations?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/ahXNb+KzuHYbS24+@fedora?part=1

      reply	other threads:[~2026-06-09 18:47 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-26 16:42 [PATCH v15] can: virtio: Add virtio CAN driver Matias Ezequiel Vara Larsen
2026-06-09 18:47 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260609184745.91C201F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=mailhol@kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=mvaralar@redhat.com \
    --cc=o.rempel@pengutronix.de \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox