From: Stanislav Fomichev <sdf@google.com>
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org, kuba@kernel.org,
toke@kernel.org, willemb@google.com, dsahern@kernel.org,
magnus.karlsson@intel.com, bjorn@kernel.org,
maciej.fijalkowski@intel.com, hawk@kernel.org,
yoong.siang.song@intel.com, netdev@vger.kernel.org,
xdp-hints@xdp-project.net
Subject: [PATCH bpf-next v3 10/10] xsk: document tx_metadata_len layout
Date: Tue, 3 Oct 2023 13:05:22 -0700 [thread overview]
Message-ID: <20231003200522.1914523-11-sdf@google.com> (raw)
In-Reply-To: <20231003200522.1914523-1-sdf@google.com>
- how to use
- how to query features
- pointers to the examples
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
Documentation/networking/index.rst | 1 +
Documentation/networking/xsk-tx-metadata.rst | 77 ++++++++++++++++++++
2 files changed, 78 insertions(+)
create mode 100644 Documentation/networking/xsk-tx-metadata.rst
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 5b75c3f7a137..9b2accb48df7 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -123,6 +123,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
xfrm_sync
xfrm_sysctl
xdp-rx-metadata
+ xsk-tx-metadata
.. only:: subproject and html
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
new file mode 100644
index 000000000000..b7289f06745c
--- /dev/null
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -0,0 +1,77 @@
+==================
+AF_XDP TX Metadata
+==================
+
+This document describes how to enable offloads when transmitting packets
+via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
+metadata on the receive side.
+
+General Design
+==============
+
+The headroom for the metadata is reserved via ``tx_metadata_len`` in
+``struct xdp_umem_reg``. The metadata length is therefore the same for
+every socket that shares the same umem. The metadata layout is a fixed UAPI,
+refer to ``union xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
+Thus, generally, the ``tx_metadata_len`` field above should contain
+``sizeof(union xsk_tx_metadata)``.
+
+The headroom and the metadata itself should be located right before
+``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
+layout is as follows::
+
+ tx_metadata_len
+ / \
+ +-----------------+---------+----------------------------+
+ | xsk_tx_metadata | padding | payload |
+ +-----------------+---------+----------------------------+
+ ^
+ |
+ xdp_desc->addr
+
+An AF_XDP application can request headrooms larger than ``sizeof(struct
+xsk_tx_metadata)``. The kernel will ignore the padding (and will still
+use ``xdp_desc->addr - tx_metadata_len`` to locate
+the ``xsk_tx_metadata``). For the frames that shouldn't carry
+any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
+the metadata area is ignored by the kernel as well.
+
+The flags field enables the particular offload:
+
+- ``XDP_TX_METADATA_TIMESTAMP``: requests the device to put transmission
+ timestamp into ``tx_timestamp`` field of ``union xsk_tx_metadata``.
+- ``XDP_TX_METADATA_CHECKSUM``: requests the device to calculate L4
+ checksum. ``csum_start`` specifies byte offset of there the checksumming
+ should start and ``csum_offset`` specifies byte offset where the
+ device should store the computed checksum.
+- ``XDP_TX_METADATA_CHECKSUM_SW``: requests checksum calculation to
+ be done in software; this mode works only in ``XSK_COPY`` mode and
+ is mostly intended for testing. Do not enable this option, it
+ will negatively affect performance.
+
+Besides the flags above, in order to trigger the offloads, the first
+packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
+bit in the ``options`` field. Also not that in a multi-buffer packet
+only the first chunk should carry the metadata.
+
+Querying Device Capabilities
+============================
+
+Every devices exports its offloads capabilities via netlink netdev family.
+Refer to ``xsk-flags`` features bitmask in
+``Documentation/netlink/specs/netdev.yaml``.
+
+- ``tx-timestamp``: device supports ``XDP_TX_METADATA_TIMESTAMP``
+- ``tx-checksum``: device supports ``XDP_TX_METADATA_CHECKSUM``
+
+Note that every devices supports ``XDP_TX_METADATA_CHECKSUM_SW`` when
+running in ``XSK_COPY`` mode.
+
+See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
+program that handles TX metadata. Also see https://github.com/fomichev/xskgen
+for a more bare-bones example.
--
2.42.0.582.g8ccd20d70d-goog
prev parent reply other threads:[~2023-10-03 20:05 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 20:05 [PATCH bpf-next v3 00/10] xsk: TX metadata Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 01/10] xsk: Support tx_metadata_len Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 02/10] xsk: add TX timestamp and TX checksum offload support Stanislav Fomichev
2023-10-04 6:18 ` Song, Yoong Siang
2023-10-04 17:48 ` Stanislav Fomichev
2023-10-04 17:56 ` Stanislav Fomichev
2023-10-05 1:16 ` Song, Yoong Siang
2023-10-03 20:05 ` [PATCH bpf-next v3 03/10] tools: ynl: print xsk-features from the sample Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 04/10] net/mlx5e: Implement AF_XDP TX timestamp and checksum offload Stanislav Fomichev
2023-10-04 23:47 ` kernel test robot
2023-10-03 20:05 ` [PATCH bpf-next v3 05/10] net: stmmac: Add Tx HWTS support to XDP ZC Stanislav Fomichev
2023-10-04 23:05 ` kernel test robot
2023-10-04 23:14 ` Stanislav Fomichev
2023-10-06 4:38 ` kernel test robot
2023-10-03 20:05 ` [PATCH bpf-next v3 06/10] selftests/xsk: Support tx_metadata_len Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add csum helpers Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 08/10] selftests/bpf: Add TX side to xdp_metadata Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 09/10] selftests/bpf: Add TX side to xdp_hw_metadata Stanislav Fomichev
2023-10-09 8:12 ` Jesper Dangaard Brouer
2023-10-09 16:37 ` Stanislav Fomichev
2023-10-10 20:40 ` Stanislav Fomichev
2023-10-13 1:13 ` Song, Yoong Siang
2023-10-13 18:47 ` Stanislav Fomichev
2023-10-15 13:28 ` Song, Yoong Siang
2023-10-03 20:05 ` Stanislav Fomichev [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231003200522.1914523-11-sdf@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dsahern@kernel.org \
--cc=haoluo@google.com \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=song@kernel.org \
--cc=toke@kernel.org \
--cc=willemb@google.com \
--cc=xdp-hints@xdp-project.net \
--cc=yhs@fb.com \
--cc=yoong.siang.song@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.