From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Vladimir Oltean <vladimir.oltean@nxp.com>,
Michael Walle <mwalle@kernel.org>,
Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 58/76] net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
Date: Tue, 17 Dec 2024 18:07:38 +0100 [thread overview]
Message-ID: <20241217170528.676866588@linuxfoundation.org> (raw)
In-Reply-To: <20241217170526.232803729@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean <vladimir.oltean@nxp.com>
[ Upstream commit acfcdb78d5d4cdb78e975210c8825b9a112463f6 ]
With this port schedule:
tc qdisc replace dev $send_if parent root handle 100 taprio \
num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
map 0 1 2 3 4 5 6 7 \
base-time 0 cycle-time 10000 \
sched-entry S 01 1250 \
sched-entry S 02 1250 \
sched-entry S 04 1250 \
sched-entry S 08 1250 \
sched-entry S 10 1250 \
sched-entry S 20 1250 \
sched-entry S 40 1250 \
sched-entry S 80 1250 \
flags 2
ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4134.168]: port 2: send peer delay response failed
It turns out that the driver can't take their TX timestamps because it
can't transmit them in the first place. And there's nothing special
about the Pdelay_Resp packets - they're just regular 68 byte packets.
But with this taprio configuration, the switch would refuse to send even
the ETH_ZLEN minimum packet size.
This should have definitely not been the case. When applying the taprio
config, the driver prints:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
without problems. Yet it's not.
For the forwarding path, the configuration is fine, yet packets injected
from Linux get stuck with this schedule no matter what.
The first hint that the static guard bands are the cause of the problem
is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
re-enable TAS guard band mode") made things work. It must be that the
guard bands are calculated incorrectly.
I remembered that there is a magic constant in the driver, set to 33 ns
for no logical reason other than experimentation, which says "never let
the static guard bands get so large as to leave less than this amount of
remaining space in the time slot, because the queue system will refuse
to schedule packets otherwise, and they will get stuck". I had a hunch
that my previous experimentally-determined value was only good for
packets coming from the forwarding path, and that the CPU injection path
needed more.
I came to the new value of 35 ns through binary search, after seeing
that with 544 ns (the bit time required to send the Pdelay_Resp packet
at gigabit) it works. Again, this is purely experimental, there's no
logic and the manual doesn't say anything.
The new driver prints for this schedule look like this:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
So yes, the maximum MTU is now even smaller by 1 byte than before.
This is maybe counter-intuitive, but makes more sense with a diagram of
one time slot.
Before:
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<----><---------------------------------------------->
33 ns 1217 ns static guard band
useful
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<-----><--------------------------------------------->
35 ns 1215 ns static guard band
useful
The static guard band implemented by this switch hardware directly
determines the maximum allowable MTU for that traffic class. The larger
it is, the earlier the switch will stop scheduling frames for
transmission, because otherwise they might overrun the gate close time
(and avoiding that is the entire purpose of Michael's patch).
So, we now have guard bands smaller by 2 ns, thus, in this particular
case, we lose a byte of the maximum MTU.
Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/dsa/ocelot/felix_vsc9959.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 0186482194d2..391c4e3cb66f 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -22,7 +22,7 @@
#define VSC9959_NUM_PORTS 6
#define VSC9959_TAS_GCL_ENTRY_MAX 63
-#define VSC9959_TAS_MIN_GATE_LEN_NS 33
+#define VSC9959_TAS_MIN_GATE_LEN_NS 35
#define VSC9959_VCAP_POLICER_BASE 63
#define VSC9959_VCAP_POLICER_MAX 383
#define VSC9959_SWITCH_PCI_BAR 4
@@ -1057,11 +1057,15 @@ static void vsc9959_mdio_bus_free(struct ocelot *ocelot)
mdiobus_free(felix->imdio);
}
-/* The switch considers any frame (regardless of size) as eligible for
- * transmission if the traffic class gate is open for at least 33 ns.
+/* The switch considers any frame (regardless of size) as eligible
+ * for transmission if the traffic class gate is open for at least
+ * VSC9959_TAS_MIN_GATE_LEN_NS.
+ *
* Overruns are prevented by cropping an interval at the end of the gate time
- * slot for which egress scheduling is blocked, but we need to still keep 33 ns
- * available for one packet to be transmitted, otherwise the port tc will hang.
+ * slot for which egress scheduling is blocked, but we need to still keep
+ * VSC9959_TAS_MIN_GATE_LEN_NS available for one packet to be transmitted,
+ * otherwise the port tc will hang.
+ *
* This function returns the size of a gate interval that remains available for
* setting the guard band, after reserving the space for one egress frame.
*/
@@ -1293,7 +1297,8 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
* per-tc static guard band lengths, so it reduces the
* useful gate interval length. Therefore, be careful
* to calculate a guard band (and therefore max_sdu)
- * that still leaves 33 ns available in the time slot.
+ * that still leaves VSC9959_TAS_MIN_GATE_LEN_NS
+ * available in the time slot.
*/
max_sdu = div_u64(remaining_gate_len_ps, picos_per_byte);
/* A TC gate may be completely closed, which is a
--
2.39.5
next prev parent reply other threads:[~2024-12-17 17:17 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-17 17:06 [PATCH 6.1 00/76] 6.1.121-rc1 review Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 01/76] bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 02/76] ksmbd: fix racy issue from session lookup and expire Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 03/76] tcp: check space before adding MPTCP SYN options Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 04/76] blk-cgroup: Fix UAF in blkcg_unpin_online() Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 05/76] ALSA: usb-audio: Add implicit feedback quirk for Yamaha THR5 Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 06/76] usb: host: max3421-hcd: Correctly abort a USB request Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 07/76] ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys() Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 08/76] usb: dwc2: Fix HCD resume Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 09/76] usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 10/76] usb: dwc2: Fix HCD port connection race Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 11/76] usb: ehci-hcd: fix call balance of clocks handling routines Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 12/76] usb: typec: anx7411: fix fwnode_handle reference leak Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 13/76] usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe() Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 14/76] usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 15/76] usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 16/76] drm/i915: Fix memory leak by correcting cache object name in error handler Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 17/76] xfs: update btree keys correctly when _insrec splits an inode root block Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 18/76] xfs: dont drop errno values when we fail to ficlone the entire range Greg Kroah-Hartman
2024-12-17 17:06 ` [PATCH 6.1 19/76] xfs: return from xfs_symlink_verify early on V4 filesystems Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 20/76] xfs: fix scrub tracepoints when inode-rooted btrees are involved Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 21/76] xfs: only run precommits once per transaction object Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 22/76] bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 23/76] bpf, sockmap: Fix update element with same Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 24/76] smb: client: fix UAF in smb2_reconnect_server() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 25/76] exfat: support dynamic allocate bh for exfat_entry_set_cache Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 26/76] exfat: fix potential deadlock on __exfat_get_dentry_set Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 27/76] wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 28/76] wifi: mac80211: clean up ret in sta_link_apply_parameters() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 29/76] wifi: mac80211: fix station NSS capability initialization order Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 30/76] acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 31/76] amdgpu/uvd: get ring reference from rq scheduler Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 32/76] batman-adv: Do not send uninitialized TT changes Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 33/76] batman-adv: Remove uninitialized data in full table TT response Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 34/76] batman-adv: Do not let TT changes list grows indefinitely Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 35/76] tipc: fix NULL deref in cleanup_bearer() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 36/76] net/mlx5: DR, prevent potential error pointer dereference Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 37/76] selftests: mlxsw: sharedbuffer: Remove h1 ingress test case Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 38/76] selftests: mlxsw: sharedbuffer: Remove duplicate test cases Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 39/76] selftests: mlxsw: sharedbuffer: Ensure no extra packets are counted Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 40/76] ptp: kvm: Use decrypted memory in confidential guest on x86 Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 41/76] ptp: kvm: x86: Return EOPNOTSUPP instead of ENODEV from kvm_arch_ptp_init() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 42/76] net: lapb: increase LAPB_HEADER_LEN Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 43/76] net: defer final struct net free in netns dismantle Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 44/76] net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 45/76] net: mscc: ocelot: improve handling of TX timestamp for unknown skb Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 46/76] net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are IRQ-safe Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 47/76] net: mscc: ocelot: be resilient to loss of PTP packets during transmission Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 48/76] net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 49/76] spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 50/76] net: sparx5: fix FDMA performance issue Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 51/76] net: sparx5: fix the maximum frame length register Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 52/76] ACPI: resource: Fix memory resource type union access Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 53/76] cxgb4: use port number to set mac addr Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 54/76] qca_spi: Fix clock speed for multiple QCA7000 Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 55/76] qca_spi: Make driver probing reliable Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 56/76] ASoC: amd: yc: Fix the wrong return value Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 57/76] Documentation: PM: Clarify pm_runtime_resume_and_get() " Greg Kroah-Hartman
2024-12-17 17:07 ` Greg Kroah-Hartman [this message]
2024-12-17 17:07 ` [PATCH 6.1 59/76] net/sched: netem: account for backlog updates from child qdisc Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 60/76] bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 61/76] team: " Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 62/76] ACPICA: events/evxfregn: dont release the ContextMutex that was never acquired Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 63/76] Bluetooth: iso: Fix recursive locking warning Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 64/76] Bluetooth: SCO: Add support for 16 bits transparent voice setting Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 65/76] blk-iocost: Avoid using clamp() on inuse in __propagate_weights() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 66/76] bpf: sync_linked_regs() must preserve subreg_def Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 67/76] tracing/kprobes: Skip symbol counting logic for module symbols in create_local_trace_kprobe() Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 68/76] xen/netfront: fix crash when removing device Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 69/76] x86: make get_cpu_vendor() accessible from Xen code Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 70/76] objtool/x86: allow syscall instruction Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 71/76] x86/static-call: provide a way to do very early static-call updates Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 72/76] x86/xen: dont do PV iret hypercall through hypercall page Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 73/76] x86/xen: add central hypercall functions Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 74/76] x86/xen: use new hypercall functions instead of hypercall page Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 75/76] x86/xen: remove " Greg Kroah-Hartman
2024-12-17 17:07 ` [PATCH 6.1 76/76] ALSA: usb-audio: Fix a DMA to stack memory bug Greg Kroah-Hartman
2024-12-17 19:55 ` [PATCH 6.1 00/76] 6.1.121-rc1 review Florian Fainelli
2024-12-17 21:27 ` Pavel Machek
2024-12-17 23:03 ` Shuah Khan
2024-12-18 6:55 ` Ron Economos
2024-12-18 11:35 ` Peter Schneider
2024-12-18 12:49 ` Mark Brown
2024-12-18 15:46 ` Naresh Kamboju
2024-12-18 17:21 ` Jon Hunter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241217170528.676866588@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=kuba@kernel.org \
--cc=mwalle@kernel.org \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=vladimir.oltean@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox