From: "Liang, Cunming"
Subject: Re: [RFC] Generic flow director/filtering/classification API
Date: Fri, 8 Jul 2016 19:11:28 +0800
Message-ID: <577F8A60.2000409@intel.com>
References: <20160705181646.GO7621@6wind.com>
To: dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
 Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala, John Daley,
 Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero, Sony Chacko,
 Jerin Jacob, Pablo de Lara, Olga Shern

Hi Adrien,

On 7/6/2016 2:16 AM, Adrien Mazarguil wrote:
> Hi All,
>
> First, forgive me for this large message, I know our mailboxes already
> suffer quite a bit from the amount of traffic on this ML.
>
> This is not exactly yet another thread about how flow director should be
> extended, rather about a brand new API to handle filtering and
> classification for incoming packets in the most PMD-generic and
> application-friendly fashion we can come up with. Reasons described below.
>
> I think this topic is important enough to include both the users of this API
> as well as PMD maintainers. So far I have CC'ed librte_ether (especially
> rte_eth_ctrl.h contributors), testpmd and PMD maintainers (with and without
> a .filter_ctrl implementation), but if you know application maintainers
> other than testpmd who use FDIR or might be interested in this discussion,
> feel free to add them.
>
> The issues we found with the current approach are already summarized in the
> following document, but here is a quick summary for TL;DR folks:
>
> - PMDs do not expose a common set of filter types and even when they do,
>   their behavior more or less differs.
>
> - Applications need to determine and adapt to device-specific limitations
>   and quirks on their own, without help from PMDs.
>
> - Writing an application that creates flow rules targeting all devices
>   supported by DPDK is thus difficult, if not impossible.
>
> - The current API has too many unspecified areas (particularly regarding
>   side effects of flow rules) that make PMD implementation tricky.
>
> This RFC API handles everything currently supported by .filter_ctrl, the
> idea being to reimplement all of these to make them fully usable by
> applications in a more generic and well defined fashion. It has a very small
> set of mandatory features and an easy method to let applications probe for
> supported capabilities.
>
> The only downside is more work for the software control side of PMDs because
> they have to adapt to the API instead of the reverse. I think helpers can be
> added to EAL to assist with this.
>
> HTML version:
>
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
>
> PDF version:
>
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
>
> Related draft header file (for reference while reading the specification):
>
>  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
>
> Git tree for completeness (latest .rst version can be retrieved from here):
>
>  https://github.com/6WIND/rte_flow
>
> What follows is the ReST source of the above, for inline comments and
> discussion. I intend to update that specification accordingly.
>
> ========================
> Generic filter interface
> ========================
>
> .. footer::
>
>    v0.6
>
> ..
contents::
> .. sectnum::
> .. raw:: pdf
>
>    PageBreak
>
> Overview
> ========
>
> DPDK provides several competing interfaces added over time to perform packet
> matching and related actions such as filtering and classification.
>
> They must be extended to implement the features supported by newer devices
> in order to expose them to applications, however the current design has
> several drawbacks:
>
> - Complicated filter combinations which have not been hard-coded cannot be
>   expressed.
> - Prone to API/ABI breakage when new features must be added to an existing
>   filter type, which frequently happens.
>
> From an application point of view:
>
> - Having disparate interfaces, all optional and lacking in features does not
>   make this API easy to use.
> - Seemingly arbitrary built-in limitations of filter types based on the
>   device they were initially designed for.
> - Undefined relationship between different filter types.
> - High complexity, considerable undocumented and/or undefined behavior.
>
> Considering the growing number of devices supported by DPDK, adding a new
> filter type each time a new feature must be implemented is not sustainable
> in the long term. Applications not written to target a specific device
> cannot really benefit from such an API.
>
> For these reasons, this document defines an extensible unified API that
> encompasses and supersedes these legacy filter types.
>
> .. raw:: pdf
>
>    PageBreak
>
> Current API
> ===========
>
> Rationale
> ---------
>
> The reason several competing (and mostly overlapping) filtering APIs are
> present in DPDK is due to its nature as a thin layer between hardware and
> software.
>
> Each subsequent interface has been added to better match the capabilities
> and limitations of the latest supported device, which usually happened to
> need an incompatible configuration approach.
Because of this, many ended up
> device-centric and not usable by applications that were not written for that
> particular device.
>
> This document is not the first attempt to address this proliferation issue,
> in fact a lot of work has already been done both to create a more generic
> interface while somewhat keeping compatibility with legacy ones through a
> common call interface (``rte_eth_dev_filter_ctrl()`` with the
> ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
>
> Today, these previously incompatible interfaces are known as filter types
> (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
>
> However while trivial to extend with new types, it only shifted the
> underlying problem as applications still need to be written for one kind of
> filter type, which, as described in the following sections, is not
> necessarily implemented by all PMDs that support filtering.
>
> .. raw:: pdf
>
>    PageBreak
>
> Filter types
> ------------
>
> This section summarizes the capabilities of each filter type.
>
> Although the following list is exhaustive, the description of individual
> types may contain inaccuracies due to the lack of documentation or usage
> examples.
>
> Note: names are prefixed with ``RTE_ETH_FILTER_``.
>
> ``MACVLAN``
> ~~~~~~~~~~~
>
> Matching:
>
> - L2 source/destination addresses.
> - Optional 802.1Q VLAN ID.
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Packets are redirected either to a given VF device using its ID or to the
>   PF.
>
> ``ETHERTYPE``
> ~~~~~~~~~~~~~
>
> Matching:
>
> - L2 source/destination addresses (optional).
> - Ethertype (no VLAN ID?).
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Receive packets on a given queue.
> - Drop packets.
>
> ``FLEXIBLE``
> ~~~~~~~~~~~~
>
> Matching:
>
> - At most 128 consecutive bytes anywhere in packets.
> - Masking is supported with byte granularity.
> - Priorities are supported (relative to this filter type, undefined
>   otherwise).
>
> Action:
>
> - Receive packets on a given queue.
>
> ``SYN``
> ~~~~~~~
>
> Matching:
>
> - TCP SYN packets only.
> - One high priority bit can be set to give the highest possible priority to
>   this type when other filters with different types are configured.
>
> Action:
>
> - Receive packets on a given queue.
>
> ``NTUPLE``
> ~~~~~~~~~~
>
> Matching:
>
> - Source/destination IPv4 addresses (optional in 2-tuple mode).
> - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> - L4 protocol (2 and 5-tuple modes).
> - Masking individual fields is supported.
> - TCP flags.
> - Up to 7 levels of priority relative to this filter type, undefined
>   otherwise.
> - No IPv6.
>
> Action:
>
> - Receive packets on a given queue.
>
> ``TUNNEL``
> ~~~~~~~~~~
>
> Matching:
>
> - Outer L2 source/destination addresses.
> - Inner L2 source/destination addresses.
> - Inner VLAN ID.
> - IPv4/IPv6 source (destination?) address.
> - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, 802.1BR
>   E-Tag).
> - Tenant ID for tunneling protocols that have one.
> - Any combination of the above can be specified.
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Receive packets on a given queue.
>
> .. raw:: pdf
>
>    PageBreak
>
> ``FDIR``
> ~~~~~~~~
>
> Queries:
>
> - Device capabilities and limitations.
> - Device statistics about configured filters (resource usage, collisions).
> - Device configuration (matching input set and masks)
>
> Matching:
>
> - Device mode of operation: none (to disable filtering), signature
>   (hash-based dispatching from masked fields) or perfect (either MAC VLAN or
>   tunnel).
> - L2 Ethertype.
> - Outer L2 destination address (MAC VLAN mode).
> - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
>   (tunnel mode).
> - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> - UDP source/destination IPv4/IPv6 and ports.
> - TCP source/destination IPv4/IPv6 and ports.
> - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> - Note, only one protocol type at once (either only L2 Ethertype, basic
>   IPv6, IPv4+UDP, IPv4+TCP and so on).
> - VLAN TCI (extended API).
> - At most 16 bytes to match in payload (extended API). A global device
>   look-up table specifies for each possible protocol layer (unknown, raw,
>   L2, L3, L4) the offset to use for each byte (they do not need to be
>   contiguous) and the related bitmask.
> - Whether packet is addressed to PF or VF, in that case its ID can be
>   matched as well (extended API).
> - Masking most of the above fields is supported, but simultaneously affects
>   all filters configured on a device.
> - Input set can be modified in a similar fashion for a given device to
>   ignore individual fields of filters (i.e. do not match the destination
>   address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
>   macros). Configuring this also affects RSS processing on **i40e**.
> - Filters can also provide 32 bits of arbitrary data to return as part of
>   matched packets.
>
> Action:
>
> - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
>   otherwise process it with subsequent filters.
> - For accepted packets and if requested by filter, either 32 bits of
>   arbitrary data and four bytes of matched payload (only in case of flex
>   bytes matching), or eight bytes of matched payload (flex also) are added
>   to meta data.
>
> .. raw:: pdf
>
>    PageBreak
>
> ``HASH``
> ~~~~~~~~
>
> Not an actual filter type. Provides and retrieves the global device
> configuration (per port or entire NIC) for hash functions and their
> properties.
>
> Hash function selection: "default" (keep current), XOR or Toeplitz.
>
> This function can be configured per flow type (**RTE_ETH_FLOW_**
> definitions), supported types are:
>
> - Unknown.
> - Raw.
> - Fragmented or non-fragmented IPv4.
> - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> - Fragmented or non-fragmented IPv6.
> - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> - L2 payload.
> - IPv6 with extensions.
> - IPv6 with L4 (TCP, UDP) and extensions.
>
> ``L2_TUNNEL``
> ~~~~~~~~~~~~~
>
> Matching:
>
> - All packets received on a given port.
>
> Action:
>
> - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>   802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
>   is implemented at the moment).
> - VF ID to use for tag insertion (currently unused).
> - Destination pool for tag based forwarding (pools are IDs that can be
>   affected to ports, duplication occurs if the same ID is shared by several
>   ports of the same NIC).
>
> ..
raw:: pdf
>
>    PageBreak
>
> Driver support
> --------------
>
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> bnx2x
> cxgbe
> e1000            yes       yes      yes yes
> ena
> enic                                                  yes
> fm10k
> i40e     yes     yes                           yes    yes  yes
> ixgbe            yes                yes yes           yes       yes
> mlx4
> mlx5                                                  yes
> szedata2
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
>
> Flow director
> -------------
>
> Flow director (FDIR) is the name of the most capable filter type, which
> covers most features offered by others. As such, it is the most widespread
> in PMDs that support filtering (i.e. all of them besides **e1000**).
>
> It is also the only type that allows an arbitrary 32 bits value provided by
> applications to be attached to a filter and returned with matching packets
> instead of relying on the destination queue to recognize flows.
>
> Unfortunately, even FDIR requires applications to be aware of low-level
> capabilities and limitations (most of which come directly from **ixgbe** and
> **i40e**):
>
> - Bitmasks are set globally per device (port?), not per filter.
> - Configuration state is not expected to be saved by the driver, and
>   stopping/restarting a port requires the application to perform it again
>   (API documentation is also unclear about this).
> - Monolithic approach with ABI issues as soon as a new kind of flow or
>   combination needs to be supported.
> - Cryptic global statistics/counters.
> - Unclear about how priorities are managed; filters seem to be arranged as a
>   linked list in hardware (possibly related to configuration order).
>
> Packet alteration
> -----------------
>
> One interesting feature is that the L2 tunnel filter type implements the
> ability to alter incoming packets through a filter (in this case to
> encapsulate them), thus the **mlx5** flow encap/decap features are not a
> foreign concept.
>
> .. raw:: pdf
>
>    PageBreak
>
> Proposed API
> ============
>
> Terminology
> -----------
>
> - **Filtering API**: overall framework affecting the fate of selected
>   packets, covers everything described in this document.
> - **Matching pattern**: properties to look for in received packets, a
>   combination of any number of items.
> - **Pattern item**: part of a pattern that either matches packet data
>   (protocol header, payload or derived information), or specifies properties
>   of the pattern itself.
> - **Actions**: what needs to be done when a packet matches a pattern.
> - **Flow rule**: this is the result of combining a *matching pattern* with
>   *actions*.
> - **Filter rule**: a less generic term than *flow rule*, can otherwise be
>   used interchangeably.
> - **Hit**: a flow rule is said to be *hit* when processing a matching
>   packet.
>
> Requirements
> ------------
>
> As described in the previous section, there is a growing need for a common
> method to configure filtering and related actions in a hardware independent
> fashion.
>
> The filtering API should not disallow any filter combination by design and
> must remain as simple as possible to use. It can simply be defined as a
> method to perform one or several actions on selected packets.
>
> PMDs are aware of the capabilities of the device they manage and should be
> responsible for preventing unsupported or conflicting combinations.
>
> This approach is fundamentally different as it places most of the burden on
> the software side of the PMD instead of having device capabilities directly
> mapped to API functions, then expecting applications to work around ensuing
> compatibility issues.
>
> Requirements for a new API:
>
> - Flexible and extensible without causing API/ABI problems for existing
>   applications.
> - Should be unambiguous and easy to use.
> - Support existing filtering features and actions listed in `Filter types`_.
> - Support packet alteration.
> - In case of overlapping filters, their priority should be well documented.
> - Support filter queries (for example to retrieve counters).
>
> .. raw:: pdf
>
>    PageBreak
>
> High level design
> -----------------
>
> The chosen approach to make filtering as generic as possible is by
> expressing matching patterns through lists of items instead of the flat
> structures used in DPDK today, enabling combinations that are not predefined
> and thus being more versatile.
>
> Flow rules can have several distinct actions (such as counting,
> encapsulating, decapsulating before redirecting packets to a particular
> queue, etc.), instead of relying on several rules to achieve this and having
> applications deal with hardware implementation details regarding their
> order.
>
> Support for different priority levels on a rule basis is provided, for
> example in order to force a more specific rule come before a more generic
> one for packets matched by both, however hardware support for more than a
> single priority level cannot be guaranteed. When supported, the number of
> available priority levels is usually low, which is why they can also be
> implemented in software by PMDs (e.g. to simulate missing priority levels by
> reordering rules).
>
> In order to remain as hardware agnostic as possible, by default all rules
> are considered to have the same priority, which means that the order between
> overlapping rules (when a packet is matched by several filters) is
> undefined, packet duplication may even occur as a result.
>
> PMDs may refuse to create overlapping rules at a given priority level when
> they can be detected (e.g. if a pattern matches an existing filter).
>
> Thus predictable results for a given priority level can only be achieved
> with non-overlapping rules, using perfect matching on all protocol layers.
>
> Support for multiple actions per rule may be implemented internally on top
> of non-default hardware priorities, as a result both features may not be
> simultaneously available to applications.
>
> Considering that allowed pattern/actions combinations cannot be known in
> advance and would result in an unpractically large number of capabilities to
> expose, a method is provided to validate a given rule from the current
> device configuration state without actually adding it (akin to a "dry run"
> mode).
>
> This enables applications to check if the rule types they need is supported
> at initialization time, before starting their data path. This method can be
> used anytime, its only requirement being that the resources needed by a rule
> must exist (e.g. a target RX queue must be configured first).
>
> Each defined rule is associated with an opaque handle managed by the PMD,
> applications are responsible for keeping it. These can be used for queries
> and rules management, such as retrieving counters or other data and
> destroying them.
>
> Handles must be destroyed before releasing associated resources such as
> queues.
>
> Integration
> -----------
>
> To avoid ABI breakage, this new interface will be implemented through the
> existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> **RTE_ETH_FILTER_GENERIC** as a new filter type.
>
> However a public front-end API described in `Rules management`_ will
> be added as the preferred method to use it.
>
> Once discussions with the community have converged to a definite API, legacy
> filter types should be deprecated and a deadline defined to remove their
> support entirely.
>
> PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
> drop filtering support entirely. Less maintained PMDs for older hardware may
> lose support at this point.
>
> The notion of filter type will then be deprecated and subsequently dropped
> to avoid confusion between both frameworks.
>
> Implementation details
> ======================
>
> Flow rule
> ---------
>
> A flow rule is the combination of a matching pattern with a list of actions,
> and is the basis of this API.
>
> Priorities
> ~~~~~~~~~~
>
> A priority can be assigned to a matching pattern.
>
> The default priority level is 0 and is also the highest. Support for more
> than a single priority level in hardware is not guaranteed.
>
> If a packet is matched by several filters at a given priority level, the
> outcome is undefined. It can take any path and can even be duplicated.
>
> Matching pattern
> ~~~~~~~~~~~~~~~~
>
> A matching pattern comprises any number of items of various types.
>
> Items are arranged in a list to form a matching pattern for packets. They
> fall in two categories:
>
> - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN and so
>   on), usually associated with a specification structure. These must be
>   stacked in the same order as the protocol layers to match, starting from
>   L2.
>
> - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
>   SIGNATURE and so on), often without a specification structure. Since they
>   are meta data that does not match packet contents, these can be specified
>   anywhere within item lists without affecting the protocol matching items.
>
> Most item specifications can be optionally paired with a mask to narrow the
> specific fields or bits to be matched.
>
> - Items are defined with ``struct rte_flow_item``.
> - Patterns are defined with ``struct rte_flow_pattern``.
>
> Example of an item specification matching an Ethernet header:
>
> +-----------------------------------------+
> | Ethernet                                |
> +==========+=========+====================+
> | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:2a:66:00:01`` |
> +----------+---------+--------------------+
> | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:00:00:00:ff`` |
> +----------+---------+--------------------+
>
> Non-masked bits stand for any value, Ethernet headers with the following
> properties are thus matched:
>
> - ``src``: ``??:01:02:03:??``
> - ``dst``: ``??:??:??:??:01``
>
> Except for meta types that do not need one, ``spec`` must be a valid pointer
> to a structure of the related item type. A ``mask`` of the same type can be
> provided to tell which bits in ``spec`` are to be matched.
>
> A mask is normally only needed for ``spec`` fields matching packet data,
> ignored otherwise. See individual item types for more information.
>
> A ``NULL`` mask pointer is allowed and is similar to matching with a full
> mask (all ones) ``spec`` fields supported by hardware, the remaining fields
> are ignored (all zeroes), there is thus no error checking for unsupported
> fields.
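
For illustration only, the spec/mask semantics quoted above can be sketched in a few lines of C. masked_match() is a hypothetical helper that does not exist in the draft rte_flow.h, and treating a NULL mask as all ones is a simplification of the RFC wording (which limits the full mask to fields supported by hardware).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the draft rte_flow.h): compare packet bytes
 * against spec, looking only at the bits set in mask.
 * Assumption: a NULL mask means "match every bit" (all ones).
 */
bool masked_match(const uint8_t *data, const uint8_t *spec,
                  const uint8_t *mask, size_t len)
{
    assert(data != NULL && spec != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t m = mask != NULL ? mask[i] : 0xff;

        /* XOR exposes differing bits, the mask keeps the relevant ones. */
        if ((data[i] ^ spec[i]) & m)
            return false;
    }
    return true;
}
```

With the ``src`` spec and mask from the table, any address of the form ??:01:02:03:?? is accepted, while a difference in a masked byte is rejected.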
>
> Matching pattern items for packet data must be naturally stacked (ordered
> from lowest to highest protocol layer), as in the following examples:
>
> +--------------+
> | TCPv4 as L4  |
> +===+==========+
> | 0 | Ethernet |
> +---+----------+
> | 1 | IPv4     |
> +---+----------+
> | 2 | TCP      |
> +---+----------+
>
> +----------------+
> | TCPv6 in VXLAN |
> +===+============+
> | 0 | Ethernet   |
> +---+------------+
> | 1 | IPv4       |
> +---+------------+
> | 2 | UDP        |
> +---+------------+
> | 3 | VXLAN      |
> +---+------------+
> | 4 | Ethernet   |
> +---+------------+
> | 5 | IPv6       |
> +---+------------+
> | 6 | TCP        |
> +---+------------+
>
> +-----------------------------+
> | TCPv4 as L4 with meta items |
> +===+=========================+
> | 0 | VOID                    |
> +---+-------------------------+
> | 1 | Ethernet                |
> +---+-------------------------+
> | 2 | VOID                    |
> +---+-------------------------+
> | 3 | IPv4                    |
> +---+-------------------------+
> | 4 | TCP                     |
> +---+-------------------------+
> | 5 | VOID                    |
> +---+-------------------------+
> | 6 | VOID                    |
> +---+-------------------------+
>
> The above example shows how meta items do not affect packet data matching
> items, as long as those remain stacked properly. The resulting matching
> pattern is identical to "TCPv4 as L4".
>
> +----------------+
> | UDPv6 anywhere |
> +===+============+
> | 0 | IPv6       |
> +---+------------+
> | 1 | UDP        |
> +---+------------+
>
> If supported by the PMD, omitting one or several protocol layers at the
> bottom of the stack as in the above example (missing an Ethernet
> specification) enables hardware to look anywhere in packets.
>
> It is unspecified whether the payload of supported encapsulations
> (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> inner, outer or both packets.
>
> +---------------------+
> | Invalid, missing L3 |
> +===+=================+
> | 0 | Ethernet        |
> +---+-----------------+
> | 1 | UDP             |
> +---+-----------------+
>
> The above pattern is invalid due to a missing L3 specification between L2
> and L4. It is only allowed at the bottom and at the top of the stack.
>
> Meta item types
> ~~~~~~~~~~~~~~~
>
> These do not match packet data but affect how the pattern is processed, most
> of them do not need a specification structure. This particularity allows
> them to be specified anywhere without affecting other item types.

[LC] For the meta items (END, VOID, INVERT) and some data matching types
like ANY and RAW, is it entirely the PMD's responsibility to understand
their key characteristics and to parse the header graph?

>
> ``END``
> ^^^^^^^
>
> End marker for item lists. Prevents further processing of items, thereby
> ending the pattern.
>
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | END                |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> ``VOID``
> ^^^^^^^^
>
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
>
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | VOID               |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> One usage example for this type is generating rules that share a common
> prefix quickly without reallocating memory, only by updating item types:
>
> +------------------------+
> | TCP, UDP or ICMP as L4 |
> +===+====================+
> | 0 | Ethernet           |
> +---+--------------------+
> | 1 | IPv4               |
> +---+------+------+------+
> | 2 | UDP  | VOID | VOID |
> +---+------+------+------+
> | 3 | VOID | TCP  | VOID |
> +---+------+------+------+
> | 4 | VOID | VOID | ICMP |
> +---+------+------+------+
>
> .. raw:: pdf
>
>    PageBreak
>
> ``INVERT``
> ^^^^^^^^^^
>
> Inverted matching, i.e. process packets that do not match the pattern.
>
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | INVERT             |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> Usage example in order to match non-TCPv4 packets only:
>
> +--------------------+
> | Anything but TCPv4 |
> +===+================+
> | 0 | INVERT         |
> +---+----------------+
> | 1 | Ethernet       |
> +---+----------------+
> | 2 | IPv4           |
> +---+----------------+
> | 3 | TCP            |
> +---+----------------+
>
> ``PF``
> ^^^^^^
>
> Matches packets addressed to the physical function of the device.
>
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | PF                 |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> ``VF``
> ^^^^^^
>
> Matches packets addressed to the given virtual function ID of the device.
>
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
>
> +----------------------------------------+
> | VF                                     |
> +==========+=========+===================+
> | ``spec`` | ``vf``  | destination VF ID |
> +----------+---------+-------------------+
> | ``mask`` | ignored                     |
> +----------+-----------------------------+
>
> ``SIGNATURE``
> ^^^^^^^^^^^^^
>
> Requests hash-based signature dispatching for this rule.
>
> Considering this is a global setting on devices that support it, all
> subsequent filter rules may have to be created with it as well.
>
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
>
> +--------------------+
> | SIGNATURE          |
> +==========+=========+
> | ``spec`` | TBD     |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> .. raw:: pdf
>
>    PageBreak
>
> Data matching item types
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> Most of these are basically protocol header definitions with associated
> bitmasks. They must be specified (stacked) from lowest to highest protocol
> layer.
>
> The following list is not exhaustive as new protocols will be added in the
> future.
>
> ``ANY``
> ^^^^^^^
>
> Matches any protocol in place of the current layer, a single ANY may also
> stand for several protocol layers.
>
> This is usually specified as the first pattern item when looking for a
> protocol anywhere in a packet.
>
> - A maximum value of **0** requests matching any number of protocol layers
>   above or equal to the minimum value, a maximum value lower than the
>   minimum one is otherwise invalid.
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
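
As a rough sketch of the min/max rule stated in the ANY bullets above: the structure layout and both function names below are assumptions made for this example, not definitions taken from the draft header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for the ANY item specification (hypothetical). */
struct any_spec {
    uint32_t min; /* minimum number of layers covered */
    uint32_t max; /* maximum number of layers covered, 0 for infinity */
};

/* A non-zero max lower than min is invalid. */
bool any_spec_valid(const struct any_spec *s)
{
    assert(s != NULL);
    return s->max == 0 || s->max >= s->min;
}

/* True when an ANY item with this spec may stand for `layers` layers. */
bool any_spec_covers(const struct any_spec *s, uint32_t layers)
{
    if (!any_spec_valid(s) || layers < s->min)
        return false;
    return s->max == 0 || layers <= s->max;
}
```

A spec of {2, 2} covers exactly two layers, while {1, 0} covers one layer or more.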
>
> +-----------------------------------------------------------------------+
> | ANY                                                                   |
> +==========+=========+==================================================+
> | ``spec`` | ``min`` | minimum number of layers covered                 |
> |          +---------+--------------------------------------------------+
> |          | ``max`` | maximum number of layers covered, 0 for infinity |
> +----------+---------+--------------------------------------------------+
> | ``mask`` | ignored                                                    |
> +----------+------------------------------------------------------------+
>
> Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
> and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
> or IPv6) matched by the second ANY specification:
>
> +----------------------------------+
> | TCP in VXLAN with wildcards      |
> +===+==============================+
> | 0 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 1 | ANY | ``spec`` | ``min`` | 2 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 2 |
> +---+-----+----------+---------+---+
> | 2 | VXLAN                        |
> +---+------------------------------+
> | 3 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 4 | ANY | ``spec`` | ``min`` | 1 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 1 |
> +---+-----+----------+---------+---+
> | 5 | TCP                          |
> +---+------------------------------+
>
> .. raw:: pdf
>
>    PageBreak
>
> ``RAW``
> ^^^^^^^
>
> Matches a string of a given length at a given offset (in bytes), or anywhere
> in the payload of the current protocol layer (including L2 header if used as
> the first item in the stack).
>
> This does not increment the protocol layer count as it is not a protocol
> definition. Subsequent RAW items modulate the first absolute one with
> relative offsets.
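
To make the offset handling concrete, here is a minimal software sketch of combined RAW items. It is only an illustration under stated assumptions: struct raw_spec and raw_match() are hypothetical names, masks are left out, an offset of -1 on the first item means "search anywhere" as the RFC states, and retrying later occurrences of the first pattern is a choice made for this example (the RFC does not say what hardware does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout for a RAW item specification (hypothetical, no mask). */
struct raw_spec {
    int32_t offset;         /* absolute for item 0, relative afterwards */
    size_t length;          /* pattern length in bytes */
    const uint8_t *pattern; /* byte string of the above length */
};

/* Check one pattern at position pos, rejecting out-of-bounds positions. */
static bool raw_at(const uint8_t *buf, size_t len, int64_t pos,
                   const struct raw_spec *it)
{
    if (pos < 0 || (uint64_t)pos > len || it->length > len - (size_t)pos)
        return false;
    return memcmp(buf + pos, it->pattern, it->length) == 0;
}

/* Match items[0] (absolute, or anywhere if offset is -1), then check the
 * remaining items at offsets relative to where items[0] was found. */
bool raw_match(const uint8_t *buf, size_t len,
               const struct raw_spec *items, size_t n)
{
    size_t first = 0, last = len;

    assert(buf != NULL && items != NULL && n > 0);
    if (items[0].offset >= 0)
        first = last = (size_t)items[0].offset;
    else if (items[0].offset != -1)
        return false; /* only -1 is given a meaning by the RFC */

    for (size_t base = first; base <= last; base++) {
        size_t i = 1;

        if (!raw_at(buf, len, (int64_t)base, &items[0]))
            continue;
        while (i < n &&
               raw_at(buf, len, (int64_t)base + items[i].offset, &items[i]))
            i++;
        if (i == n)
            return true;
    }
    return false;
}
```

A relative offset that points before the start of the payload simply fails to match in this sketch.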
>
> - Using **-1** as the ``offset`` of the first RAW item makes its absolute
>   offset not fixed, i.e. the pattern is searched everywhere.
> - ``mask`` only affects the pattern.

The RAW matching type allows an offset and a length, which supports anchor
setting and string matching.
It is not defined for a user-defined packet layout. Sometimes comparing raw
payload data after a header requires {offset, length}.
One typical case is 5-tuple matching: the 'PORT' of the transport layer is
at an offset from the IP header.
It can't be addressed by IP/ANY, as it requires extracting the key from a
field inside ANY.

>
> +--------------------------------------------------------------+
> | RAW                                                          |
> +==========+=============+=====================================+
> | ``spec`` | ``offset``  | absolute or relative pattern offset |
> |          +-------------+-------------------------------------+
> |          | ``length``  | pattern length                      |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | byte string of the above length     |
> +----------+-------------+-------------------------------------+
> | ``mask`` | ``offset``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``length``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | bitmask with the same byte length   |
> +----------+-------------+-------------------------------------+
>
> Example pattern looking for several strings at various offsets of a UDP
> payload, using combined RAW items:
>
> +------------------------------------------+
> | UDP payload matching                     |
> +===+======================================+
> | 0 | Ethernet                             |
> +---+--------------------------------------+
> | 1 | IPv4                                 |
> +---+--------------------------------------+
> | 2 | UDP                                  |
> +---+-----+----------+-------------+-------+
> | 3 | RAW | ``spec`` | ``offset``  | -1    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "foo" |
> +---+-----+----------+-------------+-------+
> | 4 | RAW | ``spec`` | ``offset``  | 20    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "bar" |
> +---+-----+----------+-------------+-------+
> | 5 | RAW | ``spec`` | ``offset``  | -30   |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "baz" |
> +---+-----+----------+-------------+-------+
>
> This translates to:
>
> - Locate "foo" in UDP payload, remember its offset.
> - Check "bar" at "foo"'s offset plus 20 bytes.
> - Check "baz" at "foo"'s offset minus 30 bytes.
>
> .. raw:: pdf
>
>    PageBreak
>
> ``ETH``
> ^^^^^^^
>
> Matches an Ethernet header.
>
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
>
>   - ``tpid``: Tag protocol identifier.
>   - ``tci``: Tag control information.
>
> ``IPV4``
> ^^^^^^^^
>
> Matches an IPv4 header.
>
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tos``: ToS/DSCP field.
> - ``ttl``: TTL field.
> - ``proto``: protocol number for the next layer.
>
> ``IPV6``
> ^^^^^^^^
>
> Matches an IPv6 header.
>
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tc``: traffic class field.
> - ``nh``: Next header field (protocol).
> - ``hop_limit``: hop limit field (TTL).
>
> ``ICMP``
> ^^^^^^^^
>
> Matches an ICMP header.
>
> - TBD.
>
> ``UDP``
> ^^^^^^^
>
> Matches a UDP header.
>
> - ``sport``: source port.
> - ``dport``: destination port.
> - ``length``: UDP length.
> - ``checksum``: UDP checksum.
>
> .. raw:: pdf
>
>    PageBreak
>
> ``TCP``
> ^^^^^^^
>
> Matches a TCP header.
>
> - ``sport``: source port.
> - ``dport``: destination port.
> - All other TCP fields and bits.
>
> ``VXLAN``
> ^^^^^^^^^
>
> Matches a VXLAN header.
>
> - TBD.
>
> .. raw:: pdf
>
>    PageBreak
>
> Actions
> ~~~~~~~
>
> Each possible action is represented by a type. Some have associated
> configuration structures. Several actions combined in a list can be affected
> to a flow rule. That list is not ordered.
>
> At least one action must be defined in a filter rule in order to do
> something with matched packets.
>
> - Actions are defined with ``struct rte_flow_action``.
> - A list of actions is defined with ``struct rte_flow_actions``.
>
> They fall in three categories:
>
> - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
>   processing matched packets by subsequent flow rules, unless overridden
>   with PASSTHRU.
>
> - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
>   additional processing by subsequent flow rules.
>
> - Other non terminating meta actions that do not affect the fate of packets
>   (END, VOID, ID, COUNT).
>
> When several actions are combined in a flow rule, they should all have
> different types (e.g. dropping a packet twice is not possible). However
> considering the VOID type is an exception to this rule, the defined behavior
> is for PMDs to only take into account the last action of a given type found
> in the list. PMDs still perform error checking on the entire list.
>
> *Note that PASSTHRU is the only action able to override a terminating rule.*

[LC] I'm wondering how to address the metadata carried by the mbuf; it
isn't mentioned here.
For packets hitting one specific flow, there is usually something that lets
the CPU identify the flow.
FDIR and RSS, for example, have an ID or key in the mbuf. In addition, some
metadata may be pointed to by userdata in the mbuf.
Any view on this?

>
> .. raw:: pdf
>
>    PageBreak
>
> Example of an action that redirects packets to queue index 10:
>
> +----------------+
> | QUEUE          |
> +===========+====+
> | ``queue`` | 10 |
> +-----------+----+
>
> Action lists examples, their order is not significant, applications must
> consider all actions to be performed simultaneously:
>
> +----------------+
> | Count and drop |
> +=======+========+
> | COUNT |        |
> +-------+--------+
> | DROP  |        |
> +-------+--------+
>
> +--------------------------+
> | Tag, count and redirect  |
> +=======+===========+======+
> | ID    | ``id``    | 0x2a |
> +-------+-----------+------+
> | COUNT |                  |
> +-------+-----------+------+
> | QUEUE | ``queue`` | 10   |
> +-------+-----------+------+
>
> +-----------------------+
> | Redirect to queue 5   |
> +=======+===============+
> | DROP  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
>
> In the above example, considering both actions are performed simultaneously,
> its end result is that only QUEUE has any effect.
>
> +-----------------------+
> | Redirect to queue 3   |
> +=======+===========+===+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> | VOID  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 3 |
> +-------+-----------+---+
>
> As previously described, only the last action of a given type found in the
> list is taken into account. The above example also shows that VOID is
> ignored.
>
> .. raw:: pdf
>
>    PageBreak
>
> Action types
> ~~~~~~~~~~~~
>
> Common action types are described in this section. Like pattern item types,
> this list is not exhaustive as new actions will be added in the future.
>
> ``END`` (action)
> ^^^^^^^^^^^^^^^^
>
> End marker for action lists. Prevents further processing of actions, thereby
> ending the list.
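The list handling described so far (END stops the walk, VOID is skipped, the last action of a given type wins) could be sketched as follows. This is only an illustration under my own assumptions: the enum, the structs and resolve_actions() are hypothetical names rather than the draft rte_flow.h definitions, and apart from END being 0 the numeric values are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action types; END is 0 as in the RFC, the rest is arbitrary. */
enum action_type {
    ACTION_END = 0,
    ACTION_VOID,
    ACTION_PASSTHRU,
    ACTION_ID,
    ACTION_QUEUE,
    ACTION_DROP,
    ACTION_COUNT,
    ACTION_TYPE_MAX,
};

struct action {
    enum action_type type;
    unsigned int value; /* queue index or ID, unused otherwise */
};

/* Result of walking a list: which types are present and their last value. */
struct resolved_actions {
    bool present[ACTION_TYPE_MAX];
    unsigned int value[ACTION_TYPE_MAX];
};

/*
 * Walk the list up to END, skip VOID and keep the last action of each type.
 * Returns 0 on success, -1 when an unknown action type is found.
 */
int
resolve_actions(const struct action *list, struct resolved_actions *out)
{
    size_t i;

    for (i = 0; i < ACTION_TYPE_MAX; i++) {
        out->present[i] = false;
        out->value[i] = 0;
    }
    for (i = 0; list[i].type != ACTION_END; i++) {
        if ((unsigned int)list[i].type >= ACTION_TYPE_MAX)
            return -1;
        if (list[i].type == ACTION_VOID)
            continue;
        out->present[list[i].type] = true;
        out->value[list[i].type] = list[i].value;
    }
    return 0;
}
```

Fed with the "Redirect to queue 3" list (QUEUE 5, VOID, QUEUE 3, END), this leaves a single QUEUE entry with index 3.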
> > - Its numeric value is **0** for convenience. > - PMD support is mandatory. > - No configurable property. > > +---------------+ > | END | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > ``VOID`` (action) > ^^^^^^^^^^^^^^^^^ > > Used as a placeholder for convenience. It is ignored and simply discard= ed by > PMDs. > > - PMD support is mandatory. > - No configurable property. > > +---------------+ > | VOID | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > ``PASSTHRU`` > ^^^^^^^^^^^^ > > Leaves packets up for additional processing by subsequent flow rules. T= his > is the default when a rule does not contain a terminating action, but c= an be > specified to force a rule to become non-terminating. > > - No configurable property. > > +---------------+ > | PASSTHRU | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > Example to copy a packet to a queue and continue processing by subseque= nt > flow rules: > > +--------------------------+ > | Copy to queue 8 | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D+ > | PASSTHRU | | > +----------+-----------+---+ > | QUEUE | ``queue`` | 8 | > +----------+-----------+---+ > > ``ID`` > ^^^^^^ > > Attaches a 32 bit value to packets. > > +----------------------------------------------+ > | ID | > +=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | ``id`` | 32 bit value to return with packets | > +--------+-------------------------------------+ > > .. raw:: pdf > > PageBreak > > ``QUEUE`` > ^^^^^^^^^ > > Assigns packets to a given queue index. > > - Terminating by default. 
> > +--------------------------------+ > | QUEUE | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D+ > | ``queue`` | queue index to use | > +-----------+--------------------+ > > ``DROP`` > ^^^^^^^^ > > Drop packets. > > - No configurable property. > - Terminating by default. > - PASSTHRU overrides this action if both are specified. > > +---------------+ > | DROP | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > ``COUNT`` > ^^^^^^^^^ > > Enables hits counter for this rule. > > This counter can be retrieved and reset through ``rte_flow_query()``, s= ee > ``struct rte_flow_query_count``. > > - Counters can be retrieved with ``rte_flow_query()``. > - No configurable property. > > +---------------+ > | COUNT | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > Query structure to retrieve and reset the flow rule hits counter: > > +------------------------------------------------+ > | COUNT query | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | ``reset`` | in | reset counter after query | > +-----------+-----+------------------------------+ > | ``hits`` | out | number of hits for this flow | > +-----------+-----+------------------------------+ > > ``DUP`` > ^^^^^^^ > > Duplicates packets to a given queue index. > > This is normally combined with QUEUE, however when used alone, it is > actually similar to QUEUE + PASSTHRU. > > - Non-terminating by default. > > +------------------------------------------------+ > | DUP | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | ``queue`` | queue index to duplicate packet to | > +-----------+------------------------------------+ > > .. 
raw:: pdf > > PageBreak > > ``RSS`` > ^^^^^^^ > > Similar to QUEUE, except RSS is additionally performed on packets to sp= read > them among several queues according to the provided parameters. > > - Terminating by default. > > +---------------------------------------------+ > | RSS | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | ``rss_conf`` | RSS parameters | > +--------------+------------------------------+ > | ``queues`` | number of entries in queue[] | > +--------------+------------------------------+ > | ``queue[]`` | queue indices to use | > +--------------+------------------------------+ > > ``PF`` (action) > ^^^^^^^^^^^^^^^ > > Redirects packets to the physical function (PF) of the current device. > > - No configurable property. > - Terminating by default. > > +---------------+ > | PF | > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | no properties | > +---------------+ > > ``VF`` (action) > ^^^^^^^^^^^^^^^ > > Redirects packets to the virtual function (VF) of the current device wi= th > the specified ID. > > - Terminating by default. > > +---------------------------------------+ > | VF | > +=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | ``id`` | VF ID to redirect packets to | > +--------+------------------------------+ > > Planned types > ~~~~~~~~~~~~~ > > Other action types are planned but not defined yet. These actions will = add > the ability to alter matching packets in several ways, such as performi= ng > encapsulation/decapsulation of tunnel headers on specific flows. > > .. raw:: pdf > > PageBreak > > Rules management > ---------------- > > A simple API with only four functions is provided to fully manage flows= . > > Each created flow rule is associated with an opaque, PMD-specific handl= e > pointer. 
The application is responsible for keeping it until the rule i= s > destroyed. > > Flows rules are defined with ``struct rte_flow``. > > Validation > ~~~~~~~~~~ > > Given that expressing a definite set of device capabilities with this A= PI is > not practical, a dedicated function is provided to check if a flow rule= is > supported and can be created. > > :: > > int > rte_flow_validate(uint8_t port_id, > const struct rte_flow_pattern *pattern, > const struct rte_flow_actions *actions); > > While this function has no effect on the target device, the flow rule i= s > validated against its current configuration state and the returned valu= e > should be considered valid by the caller for that state only. > > The returned value is guaranteed to remain valid only as long as no > successful calls to rte_flow_create() or rte_flow_destroy() are made in= the > meantime and no device parameter affecting flow rules in any way are > modified, due to possible collisions or resource limitations (although = in > such cases ``EINVAL`` should not be returned). > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``pattern``: pattern specification to check. > - ``actions``: actions associated with the flow definition. > > Return value: > > - **0** if flow rule is valid and can be created. A negative errno valu= e > otherwise (``rte_errno`` is also set), the following errors are defi= ned. > - ``-EINVAL``: unknown or invalid rule specification. > - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial = masks > are unsupported). > - ``-EEXIST``: collision with an existing rule. > - ``-ENOMEM``: not enough resources. > > .. raw:: pdf > > PageBreak > > Creation > ~~~~~~~~ > > Creating a flow rule is similar to validating one, except the rule is > actually created. 
> > :: > > struct rte_flow * > rte_flow_create(uint8_t port_id, > const struct rte_flow_pattern *pattern, > const struct rte_flow_actions *actions); > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``pattern``: pattern specification to add. > - ``actions``: actions associated with the flow definition. > > Return value: > > A valid flow pointer in case of success, NULL otherwise and ``rte_errno= `` is > set to the positive version of one of the error codes defined for > ``rte_flow_validate()``. > > Destruction > ~~~~~~~~~~~ > > Flow rules destruction is not automatic, and a queue should not be rele= ased > if any are still attached to it. Applications must take care of perform= ing > this step before releasing resources. > > :: > > int > rte_flow_destroy(uint8_t port_id, > struct rte_flow *flow); > > > Failure to destroy a flow rule may occur when other flow rules depend o= n it, > and destroying it would result in an inconsistent state. > > This function is only guaranteed to succeed if flow rules are destroyed= in > reverse order of their creation. > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``flow``: flow rule to destroy. > > Return value: > > - **0** on success, a negative errno value otherwise and ``rte_errno`` = is > set. > > .. raw:: pdf > > PageBreak > > Query > ~~~~~ > > Query an existing flow rule. > > This function allows retrieving flow-specific data such as counters. Da= ta > is gathered by special actions which must be present in the flow rule > definition. > > :: > > int > rte_flow_query(uint8_t port_id, > struct rte_flow *flow, > enum rte_flow_action_type action, > void *data); > > Arguments: > > - ``port_id``: port identifier of Ethernet device. > - ``flow``: flow rule to query. > - ``action``: action type to query. > - ``data``: pointer to storage for the associated query data type. > > Return value: > > - **0** on success, a negative errno value otherwise and ``rte_errno`` = is > set. > > .. 
raw:: pdf > > PageBreak > > Behavior > -------- > > - API operations are synchronous and blocking (``EAGAIN`` cannot be > returned). > > - There is no provision for reentrancy/multi-thread safety, although no= thing > should prevent different devices from being configured at the same > time. PMDs may protect their control path functions accordingly. > > - Stopping the data path (TX/RX) should not be necessary when managing = flow > rules. If this cannot be achieved naturally or with workarounds (suc= h as > temporarily replacing the burst function pointers), an appropriate e= rror > code must be returned (``EBUSY``). > > - PMDs, not applications, are responsible for maintaining flow rules > configuration when stopping and restarting a port or performing othe= r > actions which may affect them. They can only be destroyed explicitly= . > > .. raw:: pdf > > PageBreak > > Compatibility > ------------- > > No known hardware implementation supports all the features described in= this > document. > > Unsupported features or combinations are not expected to be fully emula= ted > in software by PMDs for performance reasons. Partially supported featur= es > may be completed in software as long as hardware performs most of the w= ork > (such as queue redirection and packet recognition). > > However PMDs are expected to do their best to satisfy application reque= sts > by working around hardware limitations as long as doing so does not aff= ect > the behavior of existing flow rules. > > The following sections provide a few examples of such cases, they are b= ased > on limitations built into the previous APIs. > > Global bitmasks > ~~~~~~~~~~~~~~~ > > Each flow rule comes with its own, per-layer bitmasks, while hardware m= ay > support only a single, device-wide bitmask for a given layer type, so t= hat > two IPv4 rules cannot use different bitmasks. 
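For the device-wide bitmask case, one way a PMD might keep track of it is sketched below: the first accepted rule configures the global mask, and a later rule carrying a different mask is refused with EEXIST, which is the error code the RFC suggests for this situation. The structure, the function names and the reference counting on release are my own assumptions, not something the RFC defines.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state for one layer type (IPv4 addresses here). */
struct global_ipv4_mask {
    bool configured;    /* set once a first rule has been accepted */
    uint32_t src;       /* device-wide source address mask */
    uint32_t dst;       /* device-wide destination address mask */
    unsigned int users; /* number of rules relying on this mask */
};

/* Called when a rule is created; returns 0 or a negative errno value. */
int
global_mask_acquire(struct global_ipv4_mask *gm, uint32_t src, uint32_t dst)
{
    if (!gm->configured) {
        /* First rule: its mask becomes the device-wide one. */
        gm->configured = true;
        gm->src = src;
        gm->dst = dst;
        gm->users = 1;
        return 0;
    }
    if (gm->src != src || gm->dst != dst)
        return -EEXIST;
    gm->users++;
    return 0;
}

/* Called when a rule is destroyed (assumption: last user frees the mask). */
void
global_mask_release(struct global_ipv4_mask *gm)
{
    if (gm->users > 0 && --gm->users == 0)
        gm->configured = false;
}
```

Whether the mask should become reconfigurable once the last rule using it is destroyed is left open here.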
> > The expected behavior in this case is that PMDs automatically configure > global bitmasks according to the needs of the first created flow rule. > > Subsequent rules are allowed only if their bitmasks match those, the > ``EEXIST`` error code should be returned otherwise. > > Unsupported layer types > ~~~~~~~~~~~~~~~~~~~~~~~ > > Many protocols can be simulated by crafting patterns with the `RAW`_ ty= pe. > > PMDs can rely on this capability to simulate support for protocols with > fixed headers not directly recognized by hardware. > > ``ANY`` pattern item > ~~~~~~~~~~~~~~~~~~~~ > > This pattern item stands for anything, which can be difficult to transl= ate > to something hardware would understand, particularly if followed by mor= e > specific types. > > Consider the following pattern: > > +---+--------------------------------+ > | 0 | ETHER | > +---+--------------------------------+ > | 1 | ANY (``min`` =3D 1, ``max`` =3D 1) | > +---+--------------------------------+ > | 2 | TCP | > +---+--------------------------------+ > > Knowing that TCP does not make sense with something other than IPv4 and= IPv6 > as L3, such a pattern may be translated to two flow rules instead: > > +---+--------------------+ > | 0 | ETHER | > +---+--------------------+ > | 1 | IPV4 (zeroed mask) | > +---+--------------------+ > | 2 | TCP | > +---+--------------------+ > > +---+--------------------+ > | 0 | ETHER | > +---+--------------------+ > | 1 | IPV6 (zeroed mask) | > +---+--------------------+ > | 2 | TCP | > +---+--------------------+ > > Note that as soon as a ANY rule covers several layers, this approach ma= y > yield a large number of hidden flow rules. It is thus suggested to only > support the most common scenarios (anything as L2 and/or L3). > > .. 
raw:: pdf > > PageBreak > > Unsupported actions > ~~~~~~~~~~~~~~~~~~~ > > - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and > tagging (`ID`_) may be implemented in software as long as the target= queue > is used by a single rule. > > - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hid= den > rules combining `QUEUE`_ and `PASSTHRU`_. > > - When a single target queue is provided, `RSS`_ can also be implemente= d > through `QUEUE`_. > > Flow rules priority > ~~~~~~~~~~~~~~~~~~~ > > While it would naturally make sense, flow rules cannot be assumed to be > processed by hardware in the same order as their creation for several > reasons: > > - They may be managed internally as a tree or a hash table instead of a > list. > - Removing a flow rule before adding another one can either put the new= rule > at the end of the list or reuse a freed entry. > - Duplication may occur when packets are matched by several rules. > > For overlapping rules (particularly in order to use the `PASSTHRU`_ act= ion) > predictable behavior is only guaranteed by using different priority lev= els. > > Priority levels are not necessarily implemented in hardware, or may be > severely limited (e.g. a single priority bit). > > For these reasons, priority levels may be implemented purely in softwar= e by > PMDs. > > - For devices expecting flow rules to be added in the correct order, PM= Ds > may destroy and re-create existing rules after adding a new one with > a higher priority. > > - A configurable number of dummy or empty rules can be created at > initialization time to save high priority slots for later. > > - In order to save priority levels, PMDs may evaluate whether rules are > likely to collide and adjust their priority accordingly. > > .. raw:: pdf > > PageBreak > > API migration > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > Exhaustive list of deprecated filter types and how to convert them to > generic flow rules. 
> > ``MACVLAN`` to ``ETH`` =E2=86=92 ``VF``, ``PF`` > --------------------------------------- > > `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF > (action)`_ or `PF (action)`_ terminating action. > > +------------------------------------+ > | MACVLAN | > +--------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= +=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH | ``spec`` | any | VF, | > | | +----------+-----+ PF | > | | | ``mask`` | any | | > +---+-----+----------+-----+---------+ > > ``ETHERTYPE`` to ``ETH`` =E2=86=92 ``QUEUE``, ``DROP`` > ---------------------------------------------- > > `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ = as > a terminating action. > > +------------------------------------+ > | ETHERTYPE | > +--------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= +=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH | ``spec`` | any | QUEUE, | > | | +----------+-----+ DROP | > | | | ``mask`` | any | | > +---+-----+----------+-----+---------+ > > ``FLEXIBLE`` to ``RAW`` =E2=86=92 ``QUEUE`` > ----------------------------------- > > `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as th= e > terminating action and a defined priority level. > > +------------------------------------+ > | FLEXIBLE | > +--------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= +=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | RAW | ``spec`` | any | QUEUE | > | | +----------+-----+ | > | | | ``mask`` | any | | > +---+-----+----------+-----+---------+ > > ``SYN`` to ``TCP`` =E2=86=92 ``QUEUE`` > ------------------------------ > > `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, a= nd > `QUEUE`_ as the terminating action. 
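To make the "enabled and masked" wording concrete, here is a small sketch of generic spec/mask matching applied to the SYN case. The helper names are hypothetical; the only outside fact used is that SYN is bit 0x02 of the TCP flags byte.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TCP_SYN 0x02 /* SYN bit in the TCP flags byte */

/* A field matches when it equals spec on every bit that is set in mask. */
bool
masked_match(uint8_t field, uint8_t spec, uint8_t mask)
{
    return ((field ^ spec) & mask) == 0;
}

/* The SYN rule: spec has syn = 1 and mask covers only the syn bit. */
bool
tcp_syn_rule_matches(uint8_t tcp_flags)
{
    return masked_match(tcp_flags, SKETCH_TCP_SYN, SKETCH_TCP_SYN);
}
```

Since only the ``syn`` bit is masked, a SYN+ACK segment (flags 0x12) matches as well; excluding it would require adding the ACK bit to the mask with a zero spec.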
> > Priority level can be set to simulate the high priority bit. > > +---------------------------------------------+ > | SYN | > +-----------------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH | ``spec`` | N/A | QUEUE | > | | +----------+-------------+ | > | | | ``mask`` | empty | | > +---+------+----------+-------------+ | > | 1 | IPV4 | ``spec`` | N/A | | > | | +----------+-------------+ | > | | | ``mask`` | empty | | > +---+------+----------+-------------+ | > | 2 | TCP | ``spec`` | ``syn`` =3D 1 | | > | | +----------+-------------+ | > | | | ``mask`` | ``syn`` =3D 1 | | > +---+------+----------+-------------+---------+ > > ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` =E2=86=92 ``QUEUE`` > ---------------------------------------------------- > > `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP= `_ or > `UDP`_ as L4 and `QUEUE`_ as the terminating action. > > A priority level can be specified as well. > > +---------------------------------------+ > | NTUPLE | > +-----------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D= =3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH | ``spec`` | N/A | QUEUE | > | | +----------+-------+ | > | | | ``mask`` | empty | | > +---+------+----------+-------+ | > | 1 | IPV4 | ``spec`` | any | | > | | +----------+-------+ | > | | | ``mask`` | any | | > +---+------+----------+-------+ | > | 2 | TCP, | ``spec`` | any | | > | | UDP +----------+-------+ | > | | | ``mask`` | any | | > +---+------+----------+-------+---------+ > > ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) =E2=86=92= ``QUEUE`` > -----------------------------------------------------------------------= ---- > > `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types. 
> > In the following table, `ANY`_ is used to cover the optional L4. > > +------------------------------------------------+ > | TUNNEL | > +--------------------------------------+---------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH | ``spec`` | any | QUEUE | > | | +----------+-------------+ | > | | | ``mask`` | any | | > +---+---------+----------+-------------+ | > | 1 | IPV4, | ``spec`` | any | | > | | IPV6 +----------+-------------+ | > | | | ``mask`` | any | | > +---+---------+----------+-------------+ | > | 2 | ANY | ``spec`` | ``min`` =3D 0 | | > | | | +-------------+ | > | | | | ``max`` =3D 0 | | > | | +----------+-------------+ | > | | | ``mask`` | N/A | | > +---+---------+----------+-------------+ | > | 3 | VXLAN, | ``spec`` | any | | > | | GENEVE, +----------+-------------+ | > | | TEREDO, | ``mask`` | any | | > | | NVGRE, | | | | > | | GRE, | | | | > | | ... | | | | > +---+---------+----------+-------------+---------+ > > .. raw:: pdf > > PageBreak > > ``FDIR`` to most item types =E2=86=92 ``QUEUE``, ``DROP``, ``PASSTHRU`` > --------------------------------------------------------------- > > `FDIR`_ is more complex than any other type, there are several methods = to > emulate its functionality. It is summarized for the most part in the ta= ble > below. > > A few features are intentionally not supported: > > - The ability to configure the matching input set and masks for the ent= ire > device, PMDs should take care of it automatically according to flow = rules. > > - Returning four or eight bytes of matched data when using flex bytes > filtering. Although a specific action could implement it, it conflic= ts > with the much more useful 32 bits tagging on devices that support it= . > > - Side effects on RSS processing of the entire device. 
Flow rules that > conflict with the current device configuration should not be > allowed. Similarly, device configuration should not be allowed when = it > affects existing flow rules. > > - Device modes of operation. "none" is unsupported since filtering cann= ot be > disabled as long as a flow rule is present. > > - "MAC VLAN" or "tunnel" perfect matching modes should be automatically= set > according to the created flow rules. > > +----------------------------------------------+ > | FDIR | > +---------------------------------+------------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D+=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | ETH, | ``spec`` | any | QUEUE, | > | | RAW +----------+-----+ DROP, | > | | | ``mask`` | any | PASSTHRU | > +---+------------+----------+-----+------------+ > | 1 | IPV4, | ``spec`` | any | ID | > | | IPV6 +----------+-----+ (optional) | > | | | ``mask`` | any | | > +---+------------+----------+-----+ | > | 2 | TCP, | ``spec`` | any | | > | | UDP, +----------+-----+ | > | | SCTP | ``mask`` | any | | > +---+------------+----------+-----+ | > | 3 | VF, | ``spec`` | any | | > | | PF, +----------+-----+ | > | | SIGNATURE | ``mask`` | any | | > | | (optional) | | | | > +---+------------+----------+-----+------------+ > > ``HASH`` > ~~~~~~~~ > > Hashing configuration is set per rule through the `SIGNATURE`_ item. > > Since it is usually a global device setting, all flow rules created wit= h > this item may have to share the same specification. > > ``L2_TUNNEL`` to ``VOID`` =E2=86=92 ``VXLAN`` (or others) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All packets are matched. This type alters incoming packets to encapsula= te > them in a chosen tunnel type, optionally redirect them to a VF as well. > > The destination pool for tag based forwarding can be emulated with othe= r > flow rules using `DUP`_ as the action. 
> > +----------------------------------------+ > | L2_TUNNEL | > +---------------------------+------------+ > | Pattern | Actions | > +=3D=3D=3D+=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D= =3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > | 0 | VOID | ``spec`` | N/A | VXLAN, | > | | | | | GENEVE, | > | | | | | ... | > | | +----------+-----+------------+ > | | | ``mask`` | N/A | VF | > | | | | | (optional) | > +---+------+----------+-----+------------+ >