Netdev List

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
From: levipearson @ 2017-09-20  1:59 UTC
  To: vinicius.gomes
  Cc: netdev, intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong, jhs, xiyou.wangcong, jiri,
	richardcochran, henrik
In-Reply-To: <20170901012625.14838-1-vinicius.gomes@intel.com>

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> Hi,
> 
> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of traffic shapers into network devices
> that provide support for them in HW. Our goal here is to start upstreaming
> support for features related to the Time-Sensitive Networking (TSN) set of
> standards into the kernel.

I'm very excited to see these features moving into the kernel! I am one of the
maintainers of the OpenAvnu project and I've been involved in building AVB/TSN
systems and working on the standards for around 10 years, so the support that's
been slowly making it into more silicon and now Linux drivers is very
encouraging.

My team at Harman is working on endpoint code based on what's in the OpenAvnu
project and a few Linux-based platforms. The Qav interface you've proposed will
fit nicely with our traffic shaper management daemon, which already uses mqprio
as a base but uses the htb shaper to approximate the Qav credit-based shaper on
platforms where launch time scheduling isn't available.

I've applied your patches and plan on testing them in conjunction with our
shaper manager to see if we run into any hitches, but I don't expect any
problems.

> As part of this work, we've assessed previous public discussions related to TSN
> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
> 
> Please note that the patches provided as part of this RFC are implementing what
> is needed only for 802.1Qav (FQTSS), but we'd like to take advantage of
> this discussion and share our WIP ideas for the 802.1Qbv and 802.1Qbu interfaces
> as well. The current patches are only providing support for HW offload of the
> configs.
> 
> 
> Overview
> ========
> 
> Time-sensitive Networking (TSN) is a set of standards that aim to address
> resources availability for providing bandwidth reservation and bounded latency
> on Ethernet based LANs. The proposal described here aims to cover mainly what is
> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
> 802.1Qbu.
> 
> The initial target of this work is the Intel i210 NIC, but other controllers'
> datasheets were also taken into account, like the Renesas RZ/A1H RZ/A1M group
> and the Synopsys DesignWare Ethernet QoS controller.

Recent SoCs from NXP (the i.MX 6 SoloX, and all the i.MX 7 and 8 parts) support
Qav shaping as well as scheduled launch functionality; these are the parts I 
have been mostly working with. Marvell silicon (some subset of Armada processors
and Link Street DSA switches) generally supports traffic shaping as well.

I think a lack of an interface like this has probably slowed upstream driver
support for this functionality where it exists; most vendors have an out-of-
tree version of their driver with TSN functionality enabled via non-standard
interfaces. Hopefully making it available will encourage vendors to upstream
their driver support!

> Proposal
> ========
> 
> Feature-wise, what is covered here are configuration interfaces for HW
> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
> Qbv and Qbu must be configured per port, with the configuration covering all
> queues. Given that these features are related to traffic shaping, and that the
> traffic control subsystem already provides a queueing discipline that offloads
> config into the device driver (i.e. mqprio), designing new qdiscs for the
> specific purpose of offloading the config for each shaper seemed like a good
> fit.

This makes sense to me too. The 802.1Q standards are all based on the sort of
mappings between priority, traffic class, and hardware queues that the existing
tc infrastructure seems to be modeling. I believe the mqprio module's mapping
scheme is flexible enough to meet any TSN needs in conjunction with the other
parts of the kernel qdisc system.
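
For illustration, the application end of that priority -> traffic class ->
queue chain is just the socket priority. A minimal sketch, assuming Linux
(SO_PRIORITY is Linux-specific); the helper name and the priority value 3 are
hypothetical, and which traffic class that priority lands in depends entirely
on the map configured in the qdisc:

```python
import socket


def open_stream_socket(priority):
    """Open a UDP socket whose packets carry the given SO_PRIORITY value.

    The kernel copies this value into each outgoing packet's priority field,
    which a qdisc such as mqprio then maps to a traffic class. On Linux,
    values outside 0..6 require CAP_NET_ADMIN.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    return sock


# Hypothetical priority chosen for a time-sensitive stream.
sock = open_stream_socket(3)
configured = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY)
sock.close()
```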

> For steering traffic into the correct queues, we use the socket option
> SO_PRIORITY and then a mechanism to map priority to traffic classes / Txqueues.
> The qdisc mqprio is currently used in our tests.
> 
> As for the shapers config interface:
> 
>  * CBS (802.1Qav)
> 
>    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
>    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>      idleslope I
> 
>    Note that the parameters for this qdisc are the ones defined by the
>    802.1Q-2014 spec, so no hardware specific functionality is exposed here.

These parameters look good to me as a baseline; some additional optional
parameters may be useful for software-based implementations--such as setting an
interval at which to recalculate queues--but those can be discussed later.
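
As a rough sketch of how a shaper manager might derive the four values, the
following uses the credit-based shaper relations as I read them in 802.1Q
Annex L for the highest-priority shaped class. The units (kbit/s for slopes,
bytes for credits) and all names are assumptions for illustration; the RFC
does not state units.

```python
def cbs_params(port_rate_kbps, idle_slope_kbps,
               max_frame_bytes, max_interference_bytes):
    """Derive CBS parameters for one queue (sketch, assumed units).

    sendslope = idleslope - port rate
    hicredit  = max interference size * idleslope / port rate
    locredit  = max frame size * sendslope / port rate
    """
    send_slope = idle_slope_kbps - port_rate_kbps
    # Round hicredit up and locredit down so the limits are never too tight.
    hi_credit = -((-max_interference_bytes * idle_slope_kbps) // port_rate_kbps)
    lo_credit = (max_frame_bytes * send_slope) // port_rate_kbps
    return {
        "idleslope": idle_slope_kbps,
        "sendslope": send_slope,
        "hicredit": hi_credit,
        "locredit": lo_credit,
    }


# Hypothetical example: 20 Mbit/s reserved on a 1 Gbit/s port, 1500-byte
# frames, and at most one 1500-byte interfering frame.
params = cbs_params(1_000_000, 20_000, 1500, 1500)
# params == {"idleslope": 20000, "sendslope": -980000,
#            "hicredit": 30, "locredit": -1470}
```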

>  * Time-aware shaper (802.1Qbv):

I haven't come across any specific NIC or SoC MAC that does Qbv, but I have
been experimenting with an EspressoBin board, which has a "Topaz" DSA switch
in it that has some features intended for Qbv support, although they were done
with a draft version in mind.

I haven't looked at the interaction between the qdisc subsystem and DSA yet,
but this mechanism might be useful to configure Qbv on the slave ports in
that context. I've got both the board and the documentation, so I might be
able to work on an implementation at some point.

If some endpoint device shows up with direct Qbv support, this interface would
probably work well there too, although a talker would need to be able to
schedule its transmits pretty precisely to achieve the lowest possible latency.

>    The idea we are currently exploring is to add a "time-aware", priority based
>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> 
>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4      \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> 	   queues 0 1 2 3                                              \
>      	   sched-file gates.sched [base-time <interval>]               \
>            [cycle-time <interval>] [extension-time <interval>]

One concern here is that base-time is described as an interval; it's really
an absolute time with respect to the PTP timescale. Good documentation will
be important for this one, since the specification discusses some subtleties
regarding the impact of the different time values that can be chosen here.

The format for specifying the actual intervals such as cycle-time could prove
to be an important detail as well; Qbv specifies cycle-time as a ratio of two
integers expressed in seconds, while extension-time is specified as an integer
number of nanoseconds.

Precision with the cycle-time is especially important, since base-time can be
almost arbitrarily far in the past or future, and any given cycle start should
be calculable from the base-time plus/minus some integer multiple of cycle-
time.
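
To illustrate why the representation matters, here is a small sketch that
keeps cycle-time as an exact ratio and compares it with the same value
truncated to whole nanoseconds. The function name and the 1/3000 s cycle are
hypothetical; base-time is treated as an absolute time in nanoseconds.

```python
from fractions import Fraction
from math import ceil

NSEC_PER_SEC = 10**9


def cycle_start(base_time_ns, cycle_time_s, now_ns):
    """Return (n, start_ns) for the first cycle start at or after now_ns.

    start_ns = base_time_ns + n * cycle_time, with n a (possibly negative)
    integer, so the result is valid whether base-time lies in the past or
    in the future.
    """
    cycle_ns = Fraction(cycle_time_s) * NSEC_PER_SEC
    n = ceil((now_ns - base_time_ns) / cycle_ns)
    return n, base_time_ns + n * cycle_ns


# A cycle of 1/3000 s is 333333.33... ns and has no exact integer form.
cycle = Fraction(1, 3000)
n, start_ns = cycle_start(0, cycle, NSEC_PER_SEC)   # n == 3000

# The same cycle truncated to integer nanoseconds drifts by 1000 ns after
# only one second of elapsed time.
truncated_ns = int(cycle * NSEC_PER_SEC)            # 333333
drift_ns = start_ns - n * truncated_ns              # 1000
```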

>    <file> is multi-line, with each line being of the following format:
>    <cmd> <gate mask> <interval in nanoseconds>
> 
>    Qbv only defines one <cmd>: "S" for 'SetGates'
> 
>    For example:
> 
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.
> 
>    Additionally, an option to set just one entry of the gate control list will
>    also be provided by 'taprio':
> 
>    $ tc qdisc (...) \
>         sched-row <row number> <cmd> <gate mask> <interval>  \
>         [base-time <interval>] [cycle-time <interval>] \
>         [extension-time <interval>]

If I understand correctly, 'sched-row' is meant to be usable multiple times in
a single command and the 'sched-file' option is just a shorthand version for
large tables? Or is it meant to update an existing schedule table? It doesn't
seem very useful if it can only be specified once when the whole taprio instance
is being established.
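
For reference, this is how I read the proposed file format, sketched as a
small parser. It is only an illustration of the quoted description, not code
from the patchset; the names are hypothetical, and it also accepts the "H" and
"R" commands proposed further down. Under the first reading above, each parsed
entry would correspond to one 'sched-row' argument, with the row number being
its index in the list.

```python
from typing import NamedTuple

# Commands named in the RFC: S from Qbv, H and R for use with preemption.
COMMANDS = {
    "S": "SetGates",
    "H": "set gates and hold",
    "R": "set gates and release",
}


class GateEntry(NamedTuple):
    cmd: str
    gate_mask: int
    interval_ns: int


def parse_sched(text):
    """Parse '<cmd> <gate mask> <interval in nanoseconds>' lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3 or fields[0] not in COMMANDS:
            raise ValueError(
                f"line {lineno}: expected '<cmd> <gate mask> <interval>'")
        cmd, mask, interval = fields
        # int(mask, 0) accepts the 0x-prefixed masks used in the examples.
        entries.append(GateEntry(cmd, int(mask, 0), int(interval)))
    return entries


entries = parse_sched("S 0x01 300\nS 0x03 500\n")
total_ns = sum(e.interval_ns for e in entries)   # 800
```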

>  * Frame Preemption (802.1Qbu):
> 
>    To control even further the latency, it may prove useful to signal which
>    traffic classes are marked as preemptable. For that, 'taprio' provides the
>    preemption command so you set each traffic class as preemptable or not:
> 
>    $ tc qdisc (...) \
>         preemption 0 1 1 1
> 
> 
>  * Time-aware shaper + Preemption:
> 
>    As an example of how Qbv and Qbu can be used together, we may specify
>    both the schedule and the preempt-mask, and this way we may also
>    specify the Set-Gates-and-Hold and Set-Gates-and-Release commands as
>    specified in the Qbu spec:
> 
>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                    \
> 	   queues 0 1 2 3                                         \
>      	   preemption 0 1 1 1                                     \
> 	   sched-file preempt_gates.sched
> 
>     <file> is multi-line, with each line being of the following format:
>     <cmd> <gate mask> <interval in nanoseconds>
> 
>     For this case, two new commands are introduced:
> 
>     "H" for 'set gates and hold'
>     "R" for 'set gates and release'
> 
>     H 0x01 300
>     R 0x03 500
> 

The new Hold and Release gate commands look right, but I'm not sure about the
preemption flags. Qbu describes a preemption parameter table indexed by
*priority* rather than traffic class or queue. These select which of two MAC
service interfaces is used by the frame at the ISS layer, either express or
preemptable, at the time the frame is selected for transmit. If my
understanding is correct, it's possible to map a preemptable priority as well
as an express priority to the same queue, so flagging preemptability at the
queue level is not correct.
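
A small sketch of the concern, using the priority map from the example
command above. The per-priority express/preemptable assignment is hypothetical
and only there to show the conflict, and it assumes my reading of Qbu (status
indexed by priority) is right:

```python
from collections import defaultdict

# Priority -> traffic class, from "map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3".
PRIO_TO_TC = [2, 2, 1, 0] + [3] * 12

# Hypothetical per-priority status: True = preemptable, False = express.
PRIO_PREEMPTABLE = [False, True, True, False] + [True] * 12


def mixed_traffic_classes(prio_to_tc, prio_preemptable):
    """Return traffic classes that carry both express and preemptable
    priorities, which a single per-class flag cannot describe."""
    status_by_tc = defaultdict(set)
    for prio, tc in enumerate(prio_to_tc):
        status_by_tc[tc].add(prio_preemptable[prio])
    return sorted(tc for tc, statuses in status_by_tc.items()
                  if len(statuses) > 1)


# Priorities 0 (express) and 1 (preemptable) both map to traffic class 2.
mixed = mixed_traffic_classes(PRIO_TO_TC, PRIO_PREEMPTABLE)   # [2]
```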

I'm not aware of any endpoint interfaces that support Qbu either, nor do I 
know of any switches that support it that someone could experiment with right
now, so there's no pressure on getting that interface nailed down yet.

Hopefully you find this feedback useful, and I appreciate the effort taken to
get the RFC posted here!

Levi
