From: Vladimir Oltean <olteanv@gmail.com>
To: Jiri Pirko <jiri@resnulli.us>,
netdev@vger.kernel.org, UNGLinuxDriver@microchip.com,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: devlink-sb on ocelot switches
Date: Mon, 17 Aug 2020 19:32:22 +0300
Message-ID: <20200817163222.opf576vyvapk4bqm@skbuf>
In-Reply-To: <20200814104228.eidqu7fd7mfyur5n@skbuf>
So after some more fiddling, it looks like I got the diagram wrong.
Here's how the switch really consumes resources. It performs 4 lookups
in parallel, whose results are ORed in 2 pairs (ingress with egress
forms a pair: one pair for the memory checks, one for the frame
reference checks), and the two results are then ANDed. Ingress and
egress consumption are really completely independent of each other.
                     Frame forwarding decision taken
                                    |
                                    |
                                    v
      +-------------------+-------------------+-------------------+
      |                   |                   |                   |
      v                   v                   v                   v
Ingress memory      Egress memory       Ingress frame       Egress frame
    check               check          reference check     reference check
      |                   |                   |                   |
      v                   v                   v                   v
 BUF_Q_RSRV_I  ok    BUF_Q_RSRV_E  ok    REF_Q_RSRV_I  ok    REF_Q_RSRV_E  ok
(src port, prio) -+ (dst port, prio) -+ (src port, prio) -+ (dst port, prio) -+
      |           |       |           |       |           |       |           |
      | exceeded  |       | exceeded  |       | exceeded  |       | exceeded  |
      |           |       |           |       |           |       |           |
      v           |       v           |       v           |       v           |
 BUF_P_RSRV_I  ok |  BUF_P_RSRV_E  ok |  REF_P_RSRV_I  ok |  REF_P_RSRV_E  ok |
(src port) -------+ (dst port) -------+ (src port) -------+ (dst port) -------+
      |           |       |           |       |           |       |           |
      | exceeded  |       | exceeded  |       | exceeded  |       | exceeded  |
      |           |       |           |       |           |       |           |
      v           |       v           |       v           |       v           |
BUF_PRIO_SHR_I ok | BUF_PRIO_SHR_E ok | REF_PRIO_SHR_I ok | REF_PRIO_SHR_E ok |
    (prio) -------+     (prio) -------+     (prio) -------+     (prio) -------+
      |           |       |           |       |           |       |           |
      | exceeded  |       | exceeded  |       | exceeded  |       | exceeded  |
      |           |       |           |       |           |       |           |
      v           |       v           |       v           |       v           |
BUF_COL_SHR_I  ok | BUF_COL_SHR_E  ok | REF_COL_SHR_I  ok | REF_COL_SHR_E  ok |
     (dp) --------+      (dp) --------+      (dp) --------+      (dp) --------+
      |           |       |           |       |           |       |           |
      | exceeded  |       | exceeded  |       | exceeded  |       | exceeded  |
      |           |       |           |       |           |       |           |
      v           v       v           v       v           v       v           v
     fail      success   fail      success   fail      success   fail      success
      |           |       |           |       |           |       |           |
      v           v       v           v       v           v       v           v
      +-----+-----+       +-----+-----+       +-----+-----+       +-----+-----+
            |                   |                   |                   |
            +-------> OR <------+                   +-------> OR <------+
                      |                                       |
                      v                                       v
                      +----------------> AND <----------------+
                                          |
                                          v
                                 FIFO drop / accept
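As a reading aid, the decision logic of the diagram can be written down as a small C model. It is a sketch of the diagram only, not of the ocelot driver or of the hardware: struct wm_level, lookup_ok() and frame_admitted() are hypothetical names, and "ok" is assumed to mean that the current usage plus the frame's cost still fits under the watermark.

```c
#include <stdbool.h>

/* Hypothetical model of one watermark level in a lookup chain. "ok" is
 * assumed to mean usage + cost <= limit (an assumption, not taken from
 * the datasheet).
 */
struct wm_level {
	unsigned int used;  /* current consumption accounted to this level */
	unsigned int limit; /* watermark, e.g. BUF_Q_RSRV_I for (src port, prio) */
};

/* Levels in the order the diagram checks them:
 * Q_RSRV (port, prio) -> P_RSRV (port) -> PRIO_SHR (prio) -> COL_SHR (dp)
 */
enum { WM_Q_RSRV, WM_P_RSRV, WM_PRIO_SHR, WM_COL_SHR, WM_NUM_LEVELS };

/* One of the four parallel lookups: the first level with room wins
 * ("ok" -> success); only if every level is exceeded does it fail.
 */
static bool lookup_ok(const struct wm_level chain[WM_NUM_LEVELS],
		      unsigned int cost)
{
	for (int i = 0; i < WM_NUM_LEVELS; i++)
		if (chain[i].used + cost <= chain[i].limit)
			return true;
	return false;
}

/* The four lookups are ORed in pairs (ingress with egress) and the two
 * pair results are ANDed: true means accept, false means FIFO drop.
 */
static bool frame_admitted(const struct wm_level buf_i[WM_NUM_LEVELS],
			   const struct wm_level buf_e[WM_NUM_LEVELS],
			   const struct wm_level ref_i[WM_NUM_LEVELS],
			   const struct wm_level ref_e[WM_NUM_LEVELS],
			   unsigned int buf_cost, unsigned int ref_cost)
{
	bool mem_ok = lookup_ok(buf_i, buf_cost) || lookup_ok(buf_e, buf_cost);
	bool ref_ok = lookup_ok(ref_i, ref_cost) || lookup_ok(ref_e, ref_cost);

	return mem_ok && ref_ok;
}
```

In this model a failed ingress lookup can still be rescued by a successful egress lookup of the same resource type (and vice versa), while a failure of both memory lookups drops the frame regardless of the frame reference lookups.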
Something that devlink-sb doesn't state explicitly is whether a pool
bound to a port-TC is allowed to spill over into the port pool, and
whether the port pool, in turn, is allowed to spill over into something
else (a shared pool).
If they are, then I could expose BUF_P_RSRV_I (buffer reservation per
ingress port) as the threshold of the port pool, BUF_Q_RSRV_I and
BUF_Q_RSRV_E (buffer reservations per QoS class of ingress and egress
ports, respectively) as port-TC pools, and I could implicitly configure
the remaining sharing watermarks to consume the rest of the memory
available in the pool.
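A sketch of the arithmetic behind "consume the rest of the memory available in the pool", for one direction. It assumes (not confirmed by the datasheet) that the port and port-TC reservations are carved out of a single pool and that whatever is left over becomes the budget for the sharing watermarks. struct rsrv_cfg, shared_budget(), NUM_PORTS and NUM_PRIOS are hypothetical.

```c
#include <errno.h>

#define NUM_PORTS 4 /* hypothetical; the real count depends on the switch */
#define NUM_PRIOS 8 /* QoS classes */

/* Hypothetical reservation table for one direction (ingress or egress):
 * p_rsrv[port]       ~ BUF_P_RSRV_x, exposed as the port pool threshold
 * q_rsrv[port][prio] ~ BUF_Q_RSRV_x, exposed as port-TC pool thresholds
 */
struct rsrv_cfg {
	unsigned long p_rsrv[NUM_PORTS];
	unsigned long q_rsrv[NUM_PORTS][NUM_PRIOS];
};

/* Compute how much of the pool is left for the sharing watermarks
 * (BUF_PRIO_SHR_x / BUF_COL_SHR_x) once all reservations are subtracted.
 * Returns 0 and fills *shared, or -ERANGE if reservations exceed the pool.
 */
static int shared_budget(const struct rsrv_cfg *cfg, unsigned long pool_size,
			 unsigned long *shared)
{
	unsigned long reserved = 0;

	for (int port = 0; port < NUM_PORTS; port++) {
		reserved += cfg->p_rsrv[port];
		for (int prio = 0; prio < NUM_PRIOS; prio++)
			reserved += cfg->q_rsrv[port][prio];
	}

	if (reserved > pool_size)
		return -ERANGE;

	*shared = pool_size - reserved;
	return 0;
}
```

How the resulting budget would be split between the per-priority and per-color sharing watermarks is a separate choice that this sketch does not make.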
But looking at the selftests, I don't see any test that clearly covers
the case where the occupancy of a port-TC pool exceeds the size of that
pool, and what should happen then. There's just a vague hint, in
tools/testing/selftests/drivers/net/mlxsw/sch_ets.sh, that once the
port-TC pool threshold has been exceeded, the excess should simply be
dropped:
        # Set the ingress quota high and use the three egress TCs to limit the
        # amount of traffic that is admitted to the shared buffers. This makes
        # sure that there is always enough traffic of all types to select from
        # for the DWRR process.
        devlink_port_pool_th_set $swp1 0 12
        devlink_tc_bind_pool_th_set $swp1 0 ingress 0 12
        devlink_port_pool_th_set $swp2 4 12
        devlink_tc_bind_pool_th_set $swp2 7 egress 4 5
        devlink_tc_bind_pool_th_set $swp2 6 egress 4 5
        devlink_tc_bind_pool_th_set $swp2 5 egress 4 5
So I'm guessing that this is not the same behavior as in ocelot. But,
truth be told, it doesn't help either that nfp and mlxsw simply pass
these parameters to firmware, which gives no insight into how they are
interpreted.
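To make the question concrete, here are the two readings side by side for a single chain, as a hedged C sketch. The "strict" reading is only what the sch_ets.sh comment seems to imply, not documented mlxsw or nfp behavior; the "spill-over" reading follows the ocelot diagram, with its two sharing levels (PRIO_SHR and COL_SHR) collapsed into one shared pool. struct pool, fits(), admit_strict() and admit_spill() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical occupancy/threshold pair for one pool. */
struct pool {
	unsigned int used;
	unsigned int th;
};

static bool fits(const struct pool *p, unsigned int cost)
{
	return p->used + cost <= p->th;
}

/* Strict reading: once the port-TC pool threshold is exceeded, the
 * excess is dropped, regardless of what other pools still hold.
 */
static bool admit_strict(const struct pool *port_tc, unsigned int cost)
{
	return fits(port_tc, cost);
}

/* Spill-over reading (ocelot): an exhausted port-TC reservation falls
 * back to the port reservation, and that in turn to the shared pool.
 */
static bool admit_spill(const struct pool *port_tc, const struct pool *port,
			const struct pool *shared, unsigned int cost)
{
	return fits(port_tc, cost) || fits(port, cost) || fits(shared, cost);
}
```

With the port-TC pool full, the strict reading drops the frame, while the spill-over reading still admits it as long as either the port pool or the shared pool has room.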
Would it be simpler if I just exposed these watermarks as generic
devlink resources? In a way, though, that would be a wasted opportunity
for devlink-sb. I also don't think I could monitor occupancy if I
modeled them as generic resources.
Am I missing something?
Thanks,
-Vladimir