linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
To: Francesco Valla <francesco@valla.it>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>,
	Vincent Mailhol <mailhol@kernel.org>,
	Harald Mommer <harald.mommer@oss.qualcomm.com>,
	Mikhail Golubev-Ciuchea
	<mikhail.golubev-ciuchea@oss.qualcomm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	linux-can@vger.kernel.org, virtualization@lists.linux.dev,
	Wolfgang Grandegger <wg@grandegger.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [PATCH v6] can: virtio: Add virtio CAN driver
Date: Fri, 26 Dec 2025 21:52:48 +0100	[thread overview]
Message-ID: <aU71oJScQ8aC0npw@fedora> (raw)
In-Reply-To: <aT7XAsTWr0_yyfx_@bywater>

On Sun, Dec 14, 2025 at 04:25:54PM +0100, Francesco Valla wrote:
> Hi Matias,
> 
> On Thu, Dec 11, 2025 at 06:52:21PM +0100, Matias Ezequiel Vara Larsen wrote:
> > Thanks for your comments, I took most of them. I have some questions
> > though. 
> > 
> 
> (snip)
> 
> > > > + */
> > > > +static int virtio_can_add_inbuf(struct virtqueue *vq, void *buf,
> > > > +				unsigned int size)
> > > > +{
> > > > +	struct scatterlist sg[1];
> > > > +	int ret;
> > > > +
> > > > +	sg_init_one(sg, buf, size);
> > > > +
> > > > +	ret = virtqueue_add_inbuf(vq, sg, 1, buf, GFP_ATOMIC);
> > > > +	virtqueue_kick(vq);
> > > > +	if (ret == 0)
> > > > +		ret = vq->num_free;
> > > 
> > > The returned value in case of success (i.e.: vq->num_free) is not used.
> > > 
> > 
> > I removed it this but something is not working anymore so I need to
> > investigate.
> >
> 
> That should't happen, since vq->num_free is defined as an unsigned int
> and no caller is reacting on return values >= 0.
> 
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +/* Send a control message with message type either
> > > > + *
> > > > + * - VIRTIO_CAN_SET_CTRL_MODE_START or
> > > > + * - VIRTIO_CAN_SET_CTRL_MODE_STOP.
> > > > + *
> > > > + * Unlike AUTOSAR CAN Driver Can_SetControllerMode() there is no requirement
> > > > + * for this Linux driver to have an asynchronous implementation of the mode
> > > > + * setting function so in order to keep things simple the function is
> > > > + * implemented as synchronous function. Design pattern is
> > > > + * virtio_console.c/__send_control_msg() & virtio_net.c/virtnet_send_command().
> > > > + */
> > > 
> > > Comment is not really helpful nor informative, at least for me. I'd drop
> > > the AUTOSAR part at least.
> > > 
> > 
> > I removed it.
> > 
> > > > +static u8 virtio_can_send_ctrl_msg(struct net_device *ndev, u16 msg_type)
> > > > +{
> > > > +	struct scatterlist sg_out, sg_in, *sgs[2] = { &sg_out, &sg_in };
> > > > +	struct virtio_can_priv *priv = netdev_priv(ndev);
> > > > +	struct device *dev = &priv->vdev->dev;
> > > > +	struct virtqueue *vq;
> > > > +	unsigned int len;
> > > > +	int err;
> > > > +
> > > > +	vq = priv->vqs[VIRTIO_CAN_QUEUE_CONTROL];
> > > > +
> > > > +	/* The function may be serialized by rtnl lock. Not sure.
> > > > +	 * Better safe than sorry.
> > > > +	 */
> > > > +	mutex_lock(&priv->ctrl_lock);
> > > 
> > > Consider the newer guard() syntax:
> > > 
> > >     guard(mutex)(&priv->ctrl_lock)
> > > 
> > 
> > IIUC this adds the unlock automatically, right?
> >
> 
> Yes, as soon as the function returnsi, and even in case of early
> returns.
> 
> > > > +
> > > > +	priv->can_ctr_msg.cpkt_out.msg_type = cpu_to_le16(msg_type);
> > > > +	sg_init_one(&sg_out, &priv->can_ctr_msg.cpkt_out,
> > > > +		    sizeof(priv->can_ctr_msg.cpkt_out));
> > > > +	sg_init_one(&sg_in, &priv->can_ctr_msg.cpkt_in, sizeof(priv->can_ctr_msg.cpkt_in));
> > > > +
> > > > +	err = virtqueue_add_sgs(vq, sgs, 1u, 1u, priv, GFP_ATOMIC);
> > > > +	if (err != 0) {
> > > > +		/* Not expected to happen */
> > > > +		dev_err(dev, "%s(): virtqueue_add_sgs() failed\n", __func__);
> > > > +		mutex_unlock(&priv->ctrl_lock);
> > > > +		return VIRTIO_CAN_RESULT_NOT_OK;
> > > > +	}
> > > > +
> > > > +	if (!virtqueue_kick(vq)) {
> > > > +		/* Not expected to happen */
> > > > +		dev_err(dev, "%s(): Kick failed\n", __func__);
> > > > +		mutex_unlock(&priv->ctrl_lock);
> > > > +		return VIRTIO_CAN_RESULT_NOT_OK;
> > > > +	}
> > > > +
> > > > +	while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
> > > > +		wait_for_completion(&priv->ctrl_done);
> > > > +
> > > 
> > > Since the call is synchronous, does can_ctr_msg really need to be part
> > > of priv? Cannot be it allocated from the stack?
> > > 
> > 
> > I moved it to the stack.
> > 
> 
> This was a bad suggestion, as Michael S. Tsirkin suggested in another
> branch of this thread [0]. Sorry.
> 
> > > > +	mutex_unlock(&priv->ctrl_lock);
> > > > +
> > > > +	return priv->can_ctr_msg.cpkt_in.result;
> > > > +}
> > > > +
> > > > +static void virtio_can_start(struct net_device *ndev)
> > > > +{
> > > > +	struct virtio_can_priv *priv = netdev_priv(ndev);
> > > > +	u8 result;
> > > > +
> > > > +	result = virtio_can_send_ctrl_msg(ndev, VIRTIO_CAN_SET_CTRL_MODE_START);
> > > > +	if (result != VIRTIO_CAN_RESULT_OK) {
> > > > +		/* Not expected to happen */
> > > > +		netdev_err(ndev, "CAN controller start failed\n");
> > > > +	}
> > > > +
> > > > +	priv->busoff_pending = false;
> > > > +	priv->can.state = CAN_STATE_ERROR_ACTIVE;
> > > > +
> > > > +	/* Switch carrier on if device was not connected to the bus */
> > > > +	if (!netif_carrier_ok(ndev))
> > > > +		netif_carrier_on(ndev);
> > > > +}
> > > > +
> > > > +/* See also m_can.c/m_can_set_mode()
> > > > + *
> > > > + * It is interesting that not only the M-CAN implementation but also all other
> > > > + * implementations I looked into only support CAN_MODE_START.
> > > > + * That CAN_MODE_SLEEP is frequently not found to be supported anywhere did not
> > > > + * come not as surprise but that CAN_MODE_STOP is also never supported was one.
> > > > + * The function is accessible via the method pointer do_set_mode in
> > > > + * struct can_priv. As usual no documentation there.
> > > > + * May not play any role as grepping through the code did not reveal any place
> > > > + * from where the method is actually called.
> > > > + */
> > > 
> > > The do_set_mode method is used e.g. to restart the CAN after a bus-off event.
> > > 
> > 
> > Do you think we need to support CAN_MODE_STOP?
> >
> 
> No, I don't think so, but the comment is somewhat misleading and feels
> more like some private notes.
> 
> > > > +static int virtio_can_set_mode(struct net_device *dev, enum can_mode mode)
> > > > +{
> > > > +	switch (mode) {
> > > > +	case CAN_MODE_START:
> > > > +		virtio_can_start(dev);
> > > > +		netif_wake_queue(dev);
> > > > +		break;
> > > > +	default:
> > > > +		return -EOPNOTSUPP;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> 
> (snip)
> 
> > > > +/* Compare with m_can.c/m_can_echo_tx_event() */
> > > > +static int virtio_can_read_tx_queue(struct virtqueue *vq)
> > > > +{
> > > > +	struct virtio_can_priv *can_priv = vq->vdev->priv;
> > > > +	struct net_device *dev = can_priv->dev;
> > > > +	struct virtio_can_tx *can_tx_msg;
> > > > +	struct net_device_stats *stats;
> > > > +	unsigned long flags;
> > > > +	unsigned int len;
> > > > +	u8 result;
> > > > +
> > > > +	stats = &dev->stats;
> > > > +
> > > > +	/* Protect list and virtio queue operations */
> > > > +	spin_lock_irqsave(&can_priv->tx_lock, flags);
> > > 
> > > The section below seems a pretty big one to protect behind a spin lock. 
> > > 
> > 
> > How can I split it? 
> > 
> 
> Question here is: what needs to be protected? As far as I can tell, the
> only entity needing some kind of locking here is the queue, while both
> ida_* and tx_inflight operations are already covered (the former by
> design [1], the second because it's implemented using an atomic.
> 
> If I'm not wrong (but I might be, so please double check) this can be
> limited to:
> 
> 	/* Protect queue operations */
> 	scoped_guard(spinlock_irqsave, &priv->tx_lock)
> 		err = virtqueue_add_sgs(vq, sgs, 1u, 1u, can_tx_msg, GFP_ATOMIC);
> 
> 
> Maybe the whole locking pattern is a leftover from a previous version, 
> where a list of TX messages was kept?
> 

I followed this approach for the three queues. I wonder why the rx queue
and the ctrl queue use a mutex instead of spinlock? I added a mutex for
the operations to the rx queue.

Matias

> > > > +
> > > > +	can_tx_msg = virtqueue_get_buf(vq, &len);
> > > > +	if (!can_tx_msg) {
> > > > +		spin_unlock_irqrestore(&can_priv->tx_lock, flags);
> > > > +		return 0; /* No more data */
> > > > +	}
> > > > +
> > > > +	if (unlikely(len < sizeof(struct virtio_can_tx_in))) {
> > > > +		netdev_err(dev, "TX ACK: Device sent no result code\n");
> > > > +		result = VIRTIO_CAN_RESULT_NOT_OK; /* Keep things going */
> > > > +	} else {
> > > > +		result = can_tx_msg->tx_in.result;
> > > > +	}
> > > > +
> 
> (snip)
> 
> > > > +static int virtio_can_probe(struct virtio_device *vdev)
> > > > +{
> > > > +	struct virtio_can_priv *priv;
> > > > +	struct net_device *dev;
> > > > +	int err;
> > > > +
> > > > +	dev = alloc_candev(sizeof(struct virtio_can_priv),
> > > > +			   VIRTIO_CAN_ECHO_SKB_MAX);
> > > > +	if (!dev)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	priv = netdev_priv(dev);
> > > > +
> > > > +	ida_init(&priv->tx_putidx_ida);
> > > > +
> > > > +	netif_napi_add(dev, &priv->napi, virtio_can_rx_poll);
> > > > +	netif_napi_add(dev, &priv->napi_tx, virtio_can_tx_poll);
> > > > +
> > > > +	SET_NETDEV_DEV(dev, &vdev->dev);
> > > > +
> > > > +	priv->dev = dev;
> > > > +	priv->vdev = vdev;
> > > > +	vdev->priv = priv;
> > > > +
> > > > +	priv->can.do_set_mode = virtio_can_set_mode;
> > > > +	/* Set Virtio CAN supported operations */
> > > > +	priv->can.ctrlmode_supported = CAN_CTRLMODE_BERR_REPORTING;
> > > > +	if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD)) {
> > > > +		err = can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD);
> > > > +		if (err != 0)
> > > > +			goto on_failure;
> > > > +	}
> > > > +
> > > > +	/* Initialize virtqueues */
> > > > +	err = virtio_can_find_vqs(priv);
> > > > +	if (err != 0)
> > > > +		goto on_failure;
> > > > +
> > > > +	spin_lock_init(&priv->tx_lock);
> > > > +	mutex_init(&priv->ctrl_lock);
> > > > +
> > > > +	init_completion(&priv->ctrl_done);
> > > > +
> > > > +	if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD))
> > > > +		priv->sdu_len = CANFD_MAX_DLEN;
> > > > +	else
> > > > +		priv->sdu_len = CAN_MAX_DLEN;
> > > 
> > > Consider replacing sdu_len with something like rpkt_len, which is
> > > directly related to priv->rpkt and can make code more readable, here and
> > > in other locations. E.g.:
> > 
> > I replaced it with rpkt_len.
> > 
> > > 
> > > 	priv->rpkt_len = sizeof(struct virtio_can_rx);
> > > 	if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD))
> > >     		priv->rpkt_len += CANFD_MAX_DLEN;
> > > 	else
> > > 		priv->rpkt_len += CAN_MAX_DLEN;
> > > 
> > > 	priv->rpkt = kzalloc(priv->rpkt_len * ...
> > > 
> > > > +
> > > > +	priv->rpkt = kzalloc((sizeof(struct virtio_can_rx) + priv->sdu_len) *
> > > > +						priv->vqs[VIRTIO_CAN_QUEUE_RX]->num_free,
> > > > +						GFP_KERNEL);
> > > 
> > > If I'm not mistaken, priv->rpkt is never freed. Consider moving to
> > > devm_kzalloc().
> > > 
> > 
> > Done.
> > 
> > > > +	if (!priv->rpkt) {
> > > > +		virtio_can_del_vq(vdev);
> > > > +		goto on_failure;
> > > > +	}
> > > > +	virtio_can_populate_rx_vq(vdev);
> > > > +
> > > > +	err = register_virtio_can_dev(dev);
> > > > +	if (err) {
> > > > +		virtio_can_del_vq(vdev);
> > > > +		goto on_failure;
> > > > +	}
> > > > +
> > > > +	napi_enable(&priv->napi);
> > > > +	napi_enable(&priv->napi_tx);
> > > 
> > > Most of the existing drivers enable the napi(s) during the open() phase,
> > > IIUC to avoid scheduling napi operations for devices that might never
> > > get used. But here maybe there is a specific reason to do it this way?
> > > 
> > 
> > I do not have idea. I moved to open() and something stopped to work. I
> > am investigating it.
> >
> 
> On a second thought, it may be wiser to have the napis enabled on probe,
> to drop the incoming messages even when the interface is brought down.
> 
> (last snip)
> 
> 
> While stress testing this, I noticed that flooding the virtio-can
> interface with packets leads to an hang of the interface itself.
> I am seeing this issuing, at host side:
> 
> 	while true; do cansend can0 123#00; done
> 
> with:
> 
>  - QEMU: the tip of the master branch plus [2]
>  - vhost-device: the tip of the main branch
> 
> and the following QEMU invocation:
> 
> qemu-system-x86_64 -serial mon:stdio \
>     -m 2G -smp 2 \
>     -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
>     -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
>     -append "loglevel=7 console=ttyS0" \
>     -machine memory-backend=pc.ram \
>     -object memory-backend-file,id=pc.ram,size=2G,mem-path=/tmp/pc.ram,share=on \
>     -chardev socket,id=can0,path=/tmp/sock-can0 \
>     -device vhost-user-can-pci,chardev=can0
> 
> 
> Restarting the interface (i.e.: ip link set down and the up) does not
> fix the situation.
> 
> I'll try to do some more testing during the next days.
> 
> 
> > Thanks for reviewing it!
> > 
> > Matias
> > 
> >
> 
> Regards,
> Francesco
> 
> 
> [0] https://lore.kernel.org/linux-can/20251214022000-mutt-send-email-mst@kernel.org/
> [1] https://elixir.bootlin.com/linux/v6.18.1/source/lib/idr.c#L323
> [2] https://lore.kernel.org/qemu-devel/20251031155617.1223248-1-mvaralar@redhat.com/
> 
> 


  parent reply	other threads:[~2025-12-26 20:52 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29 17:40 [PATCH v6] can: virtio: Add virtio CAN driver Matias Ezequiel Vara Larsen
2025-11-03 21:37 ` Francesco Valla
2025-12-11 17:52   ` Matias Ezequiel Vara Larsen
2025-12-14 15:25     ` Francesco Valla
2025-12-18 19:51       ` Harald Mommer
2025-12-18 23:07         ` Francesco Valla
2025-12-21 13:01         ` Michael S. Tsirkin
2025-12-26 19:45         ` Matias Ezequiel Vara Larsen
2025-12-26 15:08       ` Francesco Valla
2025-12-26 20:52       ` Matias Ezequiel Vara Larsen [this message]
2025-12-26 22:22         ` Francesco Valla
2025-12-29 15:47           ` Matias Ezequiel Vara Larsen
2025-12-29 18:53       ` Matias Ezequiel Vara Larsen
2025-12-29 20:55         ` Francesco Valla
2025-12-12 15:35   ` Matias Ezequiel Vara Larsen
2025-12-14  7:20     ` Michael S. Tsirkin
2025-12-14 14:24       ` Francesco Valla
2025-11-17  9:49 ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aU71oJScQ8aC0npw@fedora \
    --to=mvaralar@redhat.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=francesco@valla.it \
    --cc=harald.mommer@oss.qualcomm.com \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=mailhol@kernel.org \
    --cc=mikhail.golubev-ciuchea@oss.qualcomm.com \
    --cc=mkl@pengutronix.de \
    --cc=mst@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=wg@grandegger.com \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).