netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Simon Horman <horms@kernel.org>
To: Ronald Wahl <rwahl@gmx.de>
Cc: Ronald Wahl <ronald.wahl@raritan.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Ben Dooks <ben.dooks@codethink.co.uk>,
	Tristram Ha <Tristram.Ha@microchip.com>,
	netdev@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH net v3] net: ks8851: Fix TX stall caused by TX buffer overrun
Date: Sun, 17 Dec 2023 12:53:53 +0000	[thread overview]
Message-ID: <20231217125353.GY6288@kernel.org> (raw)
In-Reply-To: <20231214181112.76052-1-rwahl@gmx.de>

On Thu, Dec 14, 2023 at 07:11:12PM +0100, Ronald Wahl wrote:
> From: Ronald Wahl <ronald.wahl@raritan.com>
> 
> There is a bug in the ks8851 Ethernet driver that more data is written
> to the hardware TX buffer than actually available. This is caused by
> wrong accounting of the free TX buffer space.
> 
> The driver maintains a tx_space variable that represents the TX buffer
> space that is deemed to be free. The ks8851_start_xmit_spi() function
> adds an SKB to a queue if tx_space is large enough and reduces tx_space
> by the amount of buffer space it will later need in the TX buffer and
> then schedules a work item. If there is not enough space then the TX
> queue is stopped.
> 
> The worker function ks8851_tx_work() dequeues all the SKBs and writes
> the data into the hardware TX buffer. The last packet will trigger an
> interrupt after it was send. Here it is assumed that all data fits into
> the TX buffer.
> 
> In the interrupt routine (which runs asynchronously because it is a
> threaded interrupt) tx_space is updated with the current value from the
> hardware. Also the TX queue is woken up again.
> 
> Now it could happen that after data was sent to the hardware and before
> handling the TX interrupt new data is queued in ks8851_start_xmit_spi()
> when the TX buffer space had still some space left. When the interrupt
> is actually handled tx_space is updated from the hardware but now we
> already have new SKBs queued that have not been written to the hardware
> TX buffer yet. Since tx_space has been overwritten by the value from the
> hardware the space is not accounted for.
> 
> Now we have more data queued then buffer space available in the hardware
> and ks8851_tx_work() will potentially overrun the hardware TX buffer. In
> many cases it will still work because often the buffer is written out
> fast enough so that no overrun occurs but for example if the peer
> throttles us via flow control then an overrun may happen.
> 
> This can be fixed in different ways. The most simple way would be to set
> tx_space to 0 before writing data to the hardware TX buffer preventing
> the queuing of more SKBs until the TX interrupt has been handled. I have
> chosen a slightly more efficient (and still rather simple) way and
> track the amount of data that is already queued and not yet written to
> the hardware. When new SKBs are to be queued the already queued amount
> of data is honoured when checking free TX buffer space.
> 
> I tested this with a setup of two linked KS8851 running iperf3 between
> the two in bidirectional mode. Before the fix I got a stall after some
> minutes. With the fix I saw now issues anymore after hours.
> 
> Fixes: 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver")
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Ben Dooks <ben.dooks@codethink.co.uk>
> Cc: Tristram Ha <Tristram.Ha@microchip.com>
> Cc: netdev@vger.kernel.org
> Cc: stable@vger.kernel.org # 5.10+
> Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
> ---
> V3: - Add missing kdoc of structure fields
>     - Avoid potential NULL pointer dereference
>     - Fix stack variable declaration order
> 
> V2: - Added Fixes: tag (issue actually present from the beginning)
>     - cosmetics reported by checkpatch

Thanks for the updates.

This change looks good to me, and I agree that
the problem was introduced in the cited commit.

Reviewed-by: Simon Horman <horms@kernel.org>

  reply	other threads:[~2023-12-17 12:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 18:11 [PATCH net v3] net: ks8851: Fix TX stall caused by TX buffer overrun Ronald Wahl
2023-12-17 12:53 ` Simon Horman [this message]
2023-12-19 11:00 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231217125353.GY6288@kernel.org \
    --to=horms@kernel.org \
    --cc=Tristram.Ha@microchip.com \
    --cc=ben.dooks@codethink.co.uk \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ronald.wahl@raritan.com \
    --cc=rwahl@gmx.de \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).