All of lore.kernel.org
 help / color / mirror / Atom feed
From: Axel Forsman <axfo@kvaser.com>
To: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-can@vger.kernel.org, mailhol.vincent@wanadoo.fr,
	stable@vger.kernel.org, Jimmy Assarsson <extja@kvaser.com>
Subject: Re: [PATCH 2/3] can: kvaser_pciefd: Fix echo_skb race conditions
Date: Wed, 09 Apr 2025 12:09:47 +0200	[thread overview]
Message-ID: <87y0w97ij8.fsf@kvaser.com> (raw)
In-Reply-To: <20250407-vanilla-lyrebird-of-leadership-5d0c72-mkl@pengutronix.de>

Marc Kleine-Budde <mkl@pengutronix.de> writes:

> On 31.03.2025 09:25:27, Axel Forsman wrote:
>> The functions kvaser_pciefd_start_xmit() and
>> kvaser_pciefd_handle_ack_packet() raced to stop/wake TX queues and
>> get/put echo skbs, as kvaser_pciefd_can->echo_lock was only ever taken
>> when transmitting. E.g., this caused the following error:
>> 
>>     can_put_echo_skb: BUG! echo_skb 5 is occupied!
>> 
>> Instead, use the synchronization helpers in netdev_queues.h. As those
>> piggyback on BQL barriers, start updating in-flight packets and bytes
>> counts as well.
>
> This looks like it does in the right direction. Using the
> netif_subqueue_completed helpers is a great idea.
>
> What usually works even better is to have 2 counters and a mask:
> - unsigned int tx_head, tx_tail
> - TXFIFO_DEPTH
>
> The tx_head is incremented in the xmit function, tail is incremented in
> the tx_done function.
>
> There's no need to check how many buffers are free in the HW.
>
> Have a look at the rockchip-canfd driver for an example.

An attempt was made to keep this a bugfix-only commit,
but I do agree it is nicer to maintain an in-flight counter
instead of querying the device.

(A mask is inapplicable,
as the size of the device TX FIFO is not necessarily a power-of-2,
though the packets have sequence numbers so it does not matter.)

I guess a write barrier would be needed before tx_tail is incremented,
for the xmit function not to see it before __can_get_echo_skb() has
cleared the echo skb slot? (Or else, can_put_echo_skb() errors if the
slot is already non-NULL.) It is not obvious to me how rockchip-canfd
handles this?

>> +	/*
>> +	 * Without room for a new message, stop the queue until at least
>> +	 * one successful transmit.
>> +	 */
>> +	if (!netif_subqueue_maybe_stop(netdev, 0, kvaser_pciefd_tx_avail(can), 1, 1))
>> +		return NETDEV_TX_BUSY;
>
> Returning NETDEV_TX_BUSY is quite expensive, stop the queue at the end
> of this function, if the buffers are full.

Will do.

>> @@ -1638,11 +1650,25 @@ static int kvaser_pciefd_read_buffer(struct kvaser_pciefd *pcie, int dma_buf)
>>  {
>>  	int pos = 0;
>>  	int res = 0;
>> +	unsigned int i;
>>  
>>  	do {
>>  		res = kvaser_pciefd_read_packet(pcie, &pos, dma_buf);
>>  	} while (!res && pos > 0 && pos < KVASER_PCIEFD_DMA_SIZE);
>>  
>> +	for (i = 0; i < pcie->nr_channels; ++i) {
>> +		struct kvaser_pciefd_can *can = pcie->can[i];
>> +
>> +		if (!can->completed_tx_pkts)
>> +			continue;
>> +		netif_subqueue_completed_wake(can->can.dev, 0,
>> +					      can->completed_tx_pkts,
>> +					      can->completed_tx_bytes,
>> +					      kvaser_pciefd_tx_avail(can), 1);
>
> You can do this as soon as as one packet is finished, if you want to
> avoid too frequent wakeups, use threshold of more than 1.

I did that initially, but changed it to follow the advice of the
netdev_tx_completed_queue() docstring. Since the RX buffer for a single
IRQ may have multiple packets, would not BQL see incorrect intervals in
that case?


Thanks for the review
Axel Forsman

  reply	other threads:[~2025-04-09 10:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-31  7:25 [PATCH 0/3] can: kvaser_pciefd: Fix ISR race conditions Axel Forsman
2025-03-31  7:25 ` [PATCH 1/3] can: kvaser_pciefd: Force IRQ edge in case of nested IRQ Axel Forsman
2025-04-07 14:34   ` Marc Kleine-Budde
2025-04-08 12:24     ` Axel Forsman
2025-03-31  7:25 ` [PATCH 2/3] can: kvaser_pciefd: Fix echo_skb race conditions Axel Forsman
2025-04-07 15:04   ` Marc Kleine-Budde
2025-04-09 10:09     ` Axel Forsman [this message]
2025-03-31  7:25 ` [PATCH 3/3] can: kvaser_pciefd: Continue parsing DMA buf after dropped RX Axel Forsman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y0w97ij8.fsf@kvaser.com \
    --to=axfo@kvaser.com \
    --cc=extja@kvaser.com \
    --cc=linux-can@vger.kernel.org \
    --cc=mailhol.vincent@wanadoo.fr \
    --cc=mkl@pengutronix.de \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.