netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Walle" <mwalle@kernel.org>
To: "Vladimir Oltean" <vladimir.oltean@nxp.com>, <netdev@vger.kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, "Andrew Lunn" <andrew@lunn.ch>,
	"Claudiu Manoil" <claudiu.manoil@nxp.com>,
	"Alexandre Belloni" <alexandre.belloni@bootlin.com>,
	<UNGLinuxDriver@microchip.com>,
	"Xiaoliang Yang" <xiaoliang.yang_1@nxp.com>,
	"Yangbo Lu" <yangbo.lu@nxp.com>,
	"Radu Bulie" <radu-andrei.bulie@nxp.com>
Subject: Re: [PATCH net] net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
Date: Tue, 10 Dec 2024 14:50:33 +0100	[thread overview]
Message-ID: <D682I0FBTZYR.RBPXND1NUZFR@kernel.org> (raw)
In-Reply-To: <20241210132640.3426788-1-vladimir.oltean@nxp.com>

[-- Attachment #1: Type: text/plain, Size: 6750 bytes --]

On Tue Dec 10, 2024 at 2:26 PM CET, Vladimir Oltean wrote:
> With this port schedule:
>
> tc qdisc replace dev $send_if parent root handle 100 taprio \
> 	num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	map 0 1 2 3 4 5 6 7 \
> 	base-time 0 cycle-time 10000 \
> 	sched-entry S 01 1250 \
> 	sched-entry S 02 1250 \
> 	sched-entry S 04 1250 \
> 	sched-entry S 08 1250 \
> 	sched-entry S 10 1250 \
> 	sched-entry S 20 1250 \
> 	sched-entry S 40 1250 \
> 	sched-entry S 80 1250 \
> 	flags 2
>
> ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
>
> increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> ptp4l[4134.168]: port 2: send peer delay response failed
>
> It turns out that the driver can't take their TX timestamps because it
> can't transmit them in the first place. And there's nothing special
> about the Pdelay_Resp packets - they're just regular 68 byte packets.
> But with this taprio configuration, the switch would refuse to send even
> the ETH_ZLEN minimum packet size.
>
> This should have definitely not been the case. When applying the taprio
> config, the driver prints:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
>
> and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
> without problems. Yet it's not.
>
> For the forwarding path, the configuration is fine, yet packets injected
> from Linux get stuck with this schedule no matter what.
>
> The first hint that the static guard bands are the cause of the problem
> is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
> re-enable TAS guard band mode") made things work. It must be that the
> guard bands are calculated incorrectly.
>
> I remembered that there is a magic constant in the driver, set to 33 ns
> for no logical reason other than experimentation, which says "never let
> the static guard bands get so large as to leave less than this amount of
> remaining space in the time slot, because the queue system will refuse
> to schedule packets otherwise, and they will get stuck". I had a hunch
> that my previous experimentally-determined value was only good for
> packets coming from the forwarding path, and that the CPU injection path
> needed more.
>
> I came to the new value of 35 ns through binary search, after seeing
> that with 544 ns (the bit time required to send the Pdelay_Resp packet
> at gigabit) it works. Again, this is purely experimental, there's no
> logic and the manual doesn't say anything.
>
> The new driver prints for this schedule look like this:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
>
> So yes, the maximum MTU is now even smaller by 1 byte than before.
> This is maybe counter-intuitive, but makes more sense with a diagram of
> one time slot.
>
> Before:
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <----><---------------------------------------------->
>   33 ns            1217 ns static guard band
>   useful
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <-----><--------------------------------------------->
>   35 ns            1215 ns static guard band
>   useful
>
> The static guard band implemented by this switch hardware directly
> determines the maximum allowable MTU for that traffic class. The larger
> it is, the earlier the switch will stop scheduling frames for
> transmission, because otherwise they might overrun the gate close time
> (and avoiding that is the entire purpose of Michael's patch).
> So, we now have guard bands smaller by 2 ns, thus, in this particular
> case, we lose a byte of the maximum MTU.
>
> Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Makes sense:

Reviewed-by: Michael Walle <mwalle@kernel.org>

-michael

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 297 bytes --]

  reply	other threads:[~2024-12-10 13:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-10 13:26 [PATCH net] net: dsa: felix: fix stuck CPU-injected packets with short taprio windows Vladimir Oltean
2024-12-10 13:50 ` Michael Walle [this message]
2024-12-12  4:40 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D682I0FBTZYR.RBPXND1NUZFR@kernel.org \
    --to=mwalle@kernel.org \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=claudiu.manoil@nxp.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=radu-andrei.bulie@nxp.com \
    --cc=vladimir.oltean@nxp.com \
    --cc=xiaoliang.yang_1@nxp.com \
    --cc=yangbo.lu@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).