All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael Walle" <mwalle@kernel.org>
To: "Vladimir Oltean" <vladimir.oltean@nxp.com>, <netdev@vger.kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, "Andrew Lunn" <andrew@lunn.ch>,
	"Claudiu Manoil" <claudiu.manoil@nxp.com>,
	"Alexandre Belloni" <alexandre.belloni@bootlin.com>,
	<UNGLinuxDriver@microchip.com>,
	"Xiaoliang Yang" <xiaoliang.yang_1@nxp.com>,
	"Yangbo Lu" <yangbo.lu@nxp.com>,
	"Radu Bulie" <radu-andrei.bulie@nxp.com>
Subject: Re: [PATCH net] net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
Date: Tue, 10 Dec 2024 14:50:33 +0100	[thread overview]
Message-ID: <D682I0FBTZYR.RBPXND1NUZFR@kernel.org> (raw)
In-Reply-To: <20241210132640.3426788-1-vladimir.oltean@nxp.com>

[-- Attachment #1: Type: text/plain, Size: 6750 bytes --]

On Tue Dec 10, 2024 at 2:26 PM CET, Vladimir Oltean wrote:
> With this port schedule:
>
> tc qdisc replace dev $send_if parent root handle 100 taprio \
> 	num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	map 0 1 2 3 4 5 6 7 \
> 	base-time 0 cycle-time 10000 \
> 	sched-entry S 01 1250 \
> 	sched-entry S 02 1250 \
> 	sched-entry S 04 1250 \
> 	sched-entry S 08 1250 \
> 	sched-entry S 10 1250 \
> 	sched-entry S 20 1250 \
> 	sched-entry S 40 1250 \
> 	sched-entry S 80 1250 \
> 	flags 2
>
> ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
>
> increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> ptp4l[4134.168]: port 2: send peer delay response failed
>
> It turns out that the driver can't take their TX timestamps because it
> can't transmit them in the first place. And there's nothing special
> about the Pdelay_Resp packets - they're just regular 68 byte packets.
> But with this taprio configuration, the switch would refuse to send even
> the ETH_ZLEN minimum packet size.
>
> This should have definitely not been the case. When applying the taprio
> config, the driver prints:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
>
> and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
> without problems. Yet it's not.
>
> For the forwarding path, the configuration is fine, yet packets injected
> from Linux get stuck with this schedule no matter what.
>
> The first hint that the static guard bands are the cause of the problem
> is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
> re-enable TAS guard band mode") made things work. It must be that the
> guard bands are calculated incorrectly.
>
> I remembered that there is a magic constant in the driver, set to 33 ns
> for no logical reason other than experimentation, which says "never let
> the static guard bands get so large as to leave less than this amount of
> remaining space in the time slot, because the queue system will refuse
> to schedule packets otherwise, and they will get stuck". I had a hunch
> that my previous experimentally-determined value was only good for
> packets coming from the forwarding path, and that the CPU injection path
> needed more.
>
> I came to the new value of 35 ns through binary search, after seeing
> that with 544 ns (the bit time required to send the Pdelay_Resp packet
> at gigabit) it works. Again, this is purely experimental, there's no
> logic and the manual doesn't say anything.
>
> The new driver prints for this schedule look like this:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
>
> So yes, the maximum MTU is now even smaller by 1 byte than before.
> This is maybe counter-intuitive, but makes more sense with a diagram of
> one time slot.
>
> Before:
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <----><---------------------------------------------->
>   33 ns            1217 ns static guard band
>   useful
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <-----><--------------------------------------------->
>   35 ns            1215 ns static guard band
>   useful
>
> The static guard band implemented by this switch hardware directly
> determines the maximum allowable MTU for that traffic class. The larger
> it is, the earlier the switch will stop scheduling frames for
> transmission, because otherwise they might overrun the gate close time
> (and avoiding that is the entire purpose of Michael's patch).
> So, we now have guard bands smaller by 2 ns, thus, in this particular
> case, we lose a byte of the maximum MTU.
>
> Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Makes sense:

Reviewed-by: Michael Walle <mwalle@kernel.org>

-michael

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 297 bytes --]

  reply	other threads:[~2024-12-10 13:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-10 13:26 [PATCH net] net: dsa: felix: fix stuck CPU-injected packets with short taprio windows Vladimir Oltean
2024-12-10 13:50 ` Michael Walle [this message]
2024-12-12  4:40 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D682I0FBTZYR.RBPXND1NUZFR@kernel.org \
    --to=mwalle@kernel.org \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=claudiu.manoil@nxp.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=radu-andrei.bulie@nxp.com \
    --cc=vladimir.oltean@nxp.com \
    --cc=xiaoliang.yang_1@nxp.com \
    --cc=yangbo.lu@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.