* [v10 0/3] net/af_xdp: fix multi interface support for K8s
@ 2024-02-29 13:01 Maryam Tahhan
2024-02-29 13:01 ` [v10 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-29 13:01 UTC
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan
The original `use_cni` implementation was limited to
supporting only a single netdev in a DPDK pod. This patchset
aims to fix this limitation transparently to the end user.
It will also enable compatibility with the latest AF_XDP
Device Plugin.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
v10:
* Add UDS acronym
* Update `use_cni` in docs with ``use_cni``
* Remove reference to limitations and simply document behaviour
before and after DPDK 23.11.
v9:
* Fixup checkpatch issues.
v8:
* Go back to using `use_cni` vdev argument
* Introduce `use_map_pinning` vdev param.
* Rename `uds_path` to `dp_path` so that it can be used
with map pinning as well as `use_cni`.
* Set `dp_path` internally in the AF_XDP PMD if it's
not configured by the user.
* Clean up the original `use_cni` documentation separately
to coding changes.
v7:
* Give a more descriptive commit msg headline.
* Fixup typos in documentation.
v6:
* Add link to PR 81 in commit message
* Add release notes changes to this patchset
v5:
* Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
* Remove use_cni references in af_xdp.rst
v4:
* Rename af_xdp_cni.rst to af_xdp_dp.rst
* Removed all incorrect references to CNI throughout af_xdp
PMD file.
* Fixed Typos in af_xdp_dp.rst
v3:
* Remove `use_cni` vdev argument as it's no longer needed.
* Update incorrect CNI references for the AF_XDP DP in the
documentation.
* Update the documentation to run a simple example with the
AF_XDP DP plugin in K8s.
v2:
* Rename sock_path to uds_path.
* Update documentation to reflect when CAP_BPF is needed.
* Fix testpmd arguments in the provided example for Pods.
* Use AF_XDP API to update the xskmap entry.
---
Maryam Tahhan (3):
docs: AF_XDP Device Plugin
net/af_xdp: fix multi interface support for K8s
net/af_xdp: support AF_XDP DP pinned maps
doc/guides/howto/af_xdp_cni.rst | 253 ------------------
doc/guides/howto/af_xdp_dp.rst | 340 +++++++++++++++++++++++++
doc/guides/howto/index.rst | 2 +-
doc/guides/nics/af_xdp.rst | 44 +++-
doc/guides/rel_notes/release_24_03.rst | 17 ++
drivers/net/af_xdp/rte_eth_af_xdp.c | 167 ++++++++----
6 files changed, 522 insertions(+), 301 deletions(-)
delete mode 100644 doc/guides/howto/af_xdp_cni.rst
create mode 100644 doc/guides/howto/af_xdp_dp.rst
--
2.41.0
* [v10 1/3] docs: AF_XDP Device Plugin
2024-02-29 13:01 [v10 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-02-29 13:01 ` Maryam Tahhan
2024-02-29 13:01 ` [v10 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-29 13:01 ` [v10 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan
2 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-29 13:01 UTC
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan, stable
Fix up the references to the AF_XDP Device Plugin in
the documentation (it was previously referred to as the CNI)
and document the single netdev limitation for deploying
an AF_XDP based DPDK pod. Also rename af_xdp_cni.rst to
af_xdp_dp.rst.
Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
doc/guides/howto/af_xdp_cni.rst | 253 ---------------------------
doc/guides/howto/af_xdp_dp.rst | 299 ++++++++++++++++++++++++++++++++
doc/guides/howto/index.rst | 2 +-
doc/guides/nics/af_xdp.rst | 4 +-
4 files changed, 302 insertions(+), 256 deletions(-)
delete mode 100644 doc/guides/howto/af_xdp_cni.rst
create mode 100644 doc/guides/howto/af_xdp_dp.rst
diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
deleted file mode 100644
index a1a6d5b99c..0000000000
--- a/doc/guides/howto/af_xdp_cni.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
- Copyright(c) 2023 Intel Corporation.
-
-Using a CNI with the AF_XDP driver
-==================================
-
-Introduction
-------------
-
-CNI, the Container Network Interface, is a technology for configuring
-container network interfaces
-and which can be used to setup Kubernetes networking.
-AF_XDP is a Linux socket Address Family that enables an XDP program
-to redirect packets to a memory buffer in userspace.
-
-This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
-a DPDK application using the :doc:`../nics/af_xdp` to connect and use these technologies.
-
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
-
-
-Background
-----------
-
-The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
-onto the kernel netdev to be used by the PMD.
-This operation requires root or escalated Linux privileges
-and thus prevents the PMD from working in an unprivileged container.
-The AF_XDP CNI plugin handles this situation
-by providing a device plugin that performs the program loading.
-
-At a technical level the CNI opens a Unix Domain Socket and listens for a client
-to make requests over that socket.
-A DPDK application acting as a client connects and initiates a configuration "handshake".
-The client then receives a file descriptor which points to the XSKMAP
-associated with the loaded eBPF program.
-The XSKMAP is a BPF map of AF_XDP sockets (XSK).
-The client can then proceed with creating an AF_XDP socket
-and inserting that socket into the XSKMAP pointed to by the descriptor.
-
-The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
-to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
-from the CNI.
-When this flag is set,
-the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
-should be used when creating the socket
-to instruct libbpf not to load the default libbpf program on the netdev.
-Instead the loading is handled by the CNI.
-
-.. note::
-
- The Unix Domain Socket file path appear in the end user is "/tmp/afxdp.sock".
-
-
-Prerequisites
--------------
-
-Docker and container prerequisites:
-
-* Set up the device plugin
- as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
-
-* The Docker image should contain the libbpf and libxdp libraries,
- which are dependencies for AF_XDP,
- and should include support for the ``ethtool`` command.
-
-* The Pod should have enabled the capabilities ``CAP_NET_RAW`` and ``CAP_BPF``
- for AF_XDP along with support for hugepages.
-
-* Increase locked memory limit so containers have enough memory for packet buffers.
- For example:
-
- .. code-block:: console
-
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
-
-* dpdk-testpmd application should have AF_XDP feature enabled.
-
- For further information see the docs for the: :doc:`../../nics/af_xdp`.
-
-
-Example
--------
-
-Howto run dpdk-testpmd with CNI plugin:
-
-* Clone the CNI plugin
-
- .. code-block:: console
-
- # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
-
-* Build the CNI plugin
-
- .. code-block:: console
-
- # cd afxdp-plugins-for-kubernetes/
- # make build
-
- .. note::
-
- CNI plugin has a dependence on the config.json.
-
- Sample Config.json
-
- .. code-block:: json
-
- {
- "logLevel":"debug",
- "logFile":"afxdp-dp-e2e.log",
- "pools":[
- {
- "name":"e2e",
- "mode":"primary",
- "timeout":30,
- "ethtoolCmds" : ["-L -device- combined 1"],
- "devices":[
- {
- "name":"ens785f0"
- }
- ]
- }
- ]
- }
-
- For further reference please use the `config.json`_
-
- .. _config.json: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.json
-
-* Create the Network Attachment definition
-
- .. code-block:: console
-
- # kubectl create -f nad.yaml
-
- Sample nad.yml
-
- .. code-block:: yaml
-
- apiVersion: "k8s.cni.cncf.io/v1"
- kind: NetworkAttachmentDefinition
- metadata:
- name: afxdp-e2e-test
- annotations:
- k8s.v1.cni.cncf.io/resourceName: afxdp/e2e
- spec:
- config: '{
- "cniVersion": "0.3.0",
- "type": "afxdp",
- "mode": "cdq",
- "logFile": "afxdp-cni-e2e.log",
- "logLevel": "debug",
- "ipam": {
- "type": "host-local",
- "subnet": "192.168.1.0/24",
- "rangeStart": "192.168.1.200",
- "rangeEnd": "192.168.1.216",
- "routes": [
- { "dst": "0.0.0.0/0" }
- ],
- "gateway": "192.168.1.1"
- }
- }'
-
- For further reference please use the `nad.yaml`_
-
- .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/nad.yaml
-
-* Build the Docker image
-
- .. code-block:: console
-
- # docker build -t afxdp-e2e-test -f Dockerfile .
-
- Sample Dockerfile:
-
- .. code-block:: console
-
- FROM ubuntu:20.04
- RUN apt-get update -y
- RUN apt install build-essential libelf-dev -y
- RUN apt-get install iproute2 acl -y
- RUN apt install python3-pyelftools ethtool -y
- RUN apt install libnuma-dev libjansson-dev libpcap-dev net-tools -y
- RUN apt-get install clang llvm -y
- COPY ./libbpf<version>.tar.gz /tmp
- RUN cd /tmp && tar -xvmf libbpf<version>.tar.gz && cd libbpf/src && make install
- COPY ./libxdp<version>.tar.gz /tmp
- RUN cd /tmp && tar -xvmf libxdp<version>.tar.gz && cd libxdp && make install
-
- .. note::
-
- All the files that need to COPY-ed should be in the same directory as the Dockerfile
-
-* Run the Pod
-
- .. code-block:: console
-
- # kubectl create -f pod.yaml
-
- Sample pod.yaml:
-
- .. code-block:: yaml
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: afxdp-e2e-test
- annotations:
- k8s.v1.cni.cncf.io/networks: afxdp-e2e-test
- spec:
- containers:
- - name: afxdp
- image: afxdp-e2e-test:latest
- imagePullPolicy: Never
- env:
- - name: LD_LIBRARY_PATH
- value: /usr/lib64/:/usr/local/lib/
- command: ["tail", "-f", "/dev/null"]
- securityContext:
- capabilities:
- add:
- - CAP_NET_RAW
- - CAP_BPF
- resources:
- requests:
- hugepages-2Mi: 2Gi
- memory: 2Gi
- afxdp/e2e: '1'
- limits:
- hugepages-2Mi: 2Gi
- memory: 2Gi
- afxdp/e2e: '1'
-
- For further reference please use the `pod.yaml`_
-
- .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
-
-* Run DPDK with a command like the following:
-
- .. code-block:: console
-
- kubectl exec -i <Pod name> --container <containers name> -- \
- /<Path>/dpdk-testpmd -l 0,1 --no-pci \
- --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
- -- --no-mlockall --in-memory
-
-For further reference please use the `e2e`_ test case in `AF_XDP Plugin for Kubernetes`_
-
- .. _e2e: https://github.com/intel/afxdp-plugins-for-kubernetes/tree/v0.0.2/test/e2e
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
new file mode 100644
index 0000000000..7166d904bd
--- /dev/null
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2023 Intel Corporation.
+
+Using the AF_XDP driver in Kubernetes
+=====================================
+
+Introduction
+------------
+
+Two infrastructure components are needed in order to provision a pod that is
+using the AF_XDP PMD in Kubernetes:
+
+1. AF_XDP Device Plugin (DP).
+2. AF_XDP Container Network Interface (CNI) binary.
+
+Both of these components are available through the `AF_XDP Device Plugin for Kubernetes`_
+repository.
+
+The AF_XDP DP provisions and advertises networking interfaces to Kubernetes,
+while the CNI configures and plumbs network interfaces for the Pod.
+
+This document explains how to use the `AF_XDP Device Plugin for Kubernetes`_ with
+a DPDK application using the :doc:`../nics/af_xdp`.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+Background
+----------
+
+The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
+onto the kernel netdev to be used by the PMD.
+This operation requires root or escalated Linux privileges
+and thus prevents the PMD from working in an unprivileged container.
+The AF_XDP Device Plugin handles this situation
+by managing the eBPF program(s) on behalf of the Pod, outside of the pod context.
+
+At a technical level the AF_XDP Device Plugin opens a Unix Domain Socket (UDS) and listens for a client
+to make requests over that socket.
+A DPDK application acting as a client connects and initiates a configuration "handshake".
+After some validation on the Device Plugin side, the client receives a file descriptor which points to the XSKMAP
+associated with the loaded eBPF program.
+The XSKMAP is an eBPF map of AF_XDP sockets (XSK).
+The client can then proceed with creating an AF_XDP socket
+and inserting that socket into the XSKMAP pointed to by the descriptor.
+
+The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
+to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
+from the AF_XDP Device Plugin.
+When this flag is set,
+the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
+should be used when creating the socket
+to instruct libbpf not to load the default libbpf program on the netdev.
+Instead the loading is handled by the AF_XDP Device Plugin.
+
+Limitations
+-----------
+
+For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
+the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
+is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
+and the pod is limited to a single netdev.
+
+.. note::
+
+ DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
+ AF_XDP Device Plugin.
+
+The issue is if a single pod requests different devices from different pools it
+results in multiple UDS servers serving the pod with the container using only a
+single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
+device might be able to complete the handshake. This has been fixed in the AF_XDP
+Device Plugin so that the mount point in the pods for the UDS appear at
+``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
+in the PMD alongside the ``use_cni`` parameter.
+
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+
+
+Prerequisites
+-------------
+
+Device Plugin and DPDK container prerequisites:
+
+* Create a DPDK container image.
+
+* Set up the device plugin and prepare the Pod Spec as described in
+ the instructions for `AF_XDP Device Plugin for Kubernetes`_.
+
+* The Docker image should contain the libbpf and libxdp libraries,
+ which are dependencies for AF_XDP,
+ and should include support for the ``ethtool`` command.
+
+* The Pod should have enabled the capabilities ``CAP_NET_RAW`` for
+ AF_XDP socket creation, ``IPC_LOCK`` for umem creation and
+ ``CAP_BPF`` (for Kernel < 5.19) along with support for hugepages.
+
+ .. note::
+
+ For kernel versions < 5.19, all BPF syscalls required CAP_BPF, including those used to access maps shared
+ between the eBPF program and the userspace program. Kernels >= 5.19 only require CAP_BPF
+ for map creation (BPF_MAP_CREATE) and loading programs (BPF_PROG_LOAD).
+
+* Increase locked memory limit so containers have enough memory for packet buffers.
+ For example:
+
+ .. code-block:: console
+
+ cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+ [Service]
+ LimitMEMLOCK=infinity
+ EOF
+
+* dpdk-testpmd application should have AF_XDP feature enabled.
+
+ For further information see the docs for the :doc:`../nics/af_xdp`.
+
+
+Example
+-------
+
+Build a DPDK container image (using Docker)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Create a Dockerfile (should be placed in top level DPDK directory):
+
+ .. code-block:: console
+
+ FROM fedora:38
+
+ # Setup container to build DPDK applications
+ RUN dnf -y upgrade && dnf -y install \
+ libbsd-devel \
+ numactl-libs \
+ libbpf-devel \
+ libbpf \
+ meson \
+ ninja-build \
+ libxdp-devel \
+ libxdp \
+ numactl-devel \
+ python3-pyelftools \
+ python38 \
+ iproute
+ RUN dnf groupinstall -y 'Development Tools'
+
+ # Create DPDK dir and copy over sources
+ # Create DPDK dir and copy over sources
+ COPY ./ /dpdk
+ WORKDIR /dpdk
+
+ # Build DPDK
+ RUN meson setup build
+ RUN ninja -C build
+
+2. Build a DPDK container image (using Docker)
+
+ .. code-block:: console
+
+ # docker build -t dpdk -f Dockerfile .
+
+Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* Clone the AF_XDP Device plugin and CNI
+
+ .. code-block:: console
+
+ # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
+
+ .. note::
+
+ Ensure you have the AF_XDP Device Plugin + CNI prerequisites installed.
+
+* Build the AF_XDP Device plugin and CNI
+
+ .. code-block:: console
+
+ # cd afxdp-plugins-for-kubernetes/
+ # make image
+
+* Make sure to modify the image used by the `daemonset.yml`_ file in the deployments directory with
+ the following configuration:
+
+ .. _daemonset.yml : https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/deployments/daemonset.yml
+
+ .. code-block:: yaml
+
+ image: afxdp-device-plugin:latest
+
+ .. note::
+
+ This will select the AF_XDP DP image that was built locally. Detailed configuration
+ options can be found in the AF_XDP Device Plugin `readme`_ .
+
+ .. _readme: https://github.com/intel/afxdp-plugins-for-kubernetes#readme
+
+* Deploy the AF_XDP Device Plugin and CNI
+
+ .. code-block:: console
+
+ # kubectl create -f deployments/daemonset.yml
+
+* Create the Network Attachment definition
+
+ .. code-block:: console
+
+ # kubectl create -f nad.yaml
+
+ Sample nad.yaml:
+
+ .. code-block:: yaml
+
+ apiVersion: "k8s.cni.cncf.io/v1"
+ kind: NetworkAttachmentDefinition
+ metadata:
+ name: afxdp-network
+ annotations:
+ k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
+ spec:
+ config: '{
+ "cniVersion": "0.3.0",
+ "type": "afxdp",
+ "mode": "primary",
+ "logFile": "afxdp-cni.log",
+ "logLevel": "debug",
+ "ethtoolCmds" : ["-N -device- rx-flow-hash udp4 fn",
+ "-N -device- flow-type udp4 dst-port 2152 action 22"
+ ],
+ "ipam": {
+ "type": "host-local",
+ "subnet": "192.168.1.0/24",
+ "rangeStart": "192.168.1.200",
+ "rangeEnd": "192.168.1.220",
+ "routes": [
+ { "dst": "0.0.0.0/0" }
+ ],
+ "gateway": "192.168.1.1"
+ }
+ }'
+
+ For further reference please use the example provided by the AF_XDP DP `nad.yaml`_
+
+ .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/network-attachment-definition.yaml
+
+* Run the Pod
+
+ .. code-block:: console
+
+ # kubectl create -f pod.yaml
+
+ Sample pod.yaml:
+
+ .. code-block:: yaml
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: dpdk
+ annotations:
+ k8s.v1.cni.cncf.io/networks: afxdp-network
+ spec:
+ containers:
+ - name: testpmd
+ image: dpdk:latest
+ command: ["tail", "-f", "/dev/null"]
+ securityContext:
+ capabilities:
+ add:
+ - NET_RAW
+ - IPC_LOCK
+ resources:
+ requests:
+ afxdp/myPool: '1'
+ limits:
+ hugepages-1Gi: 2Gi
+ cpu: 2
+ memory: 256Mi
+ afxdp/myPool: '1'
+ volumeMounts:
+ - name: hugepages
+ mountPath: /dev/hugepages
+ volumes:
+ - name: hugepages
+ emptyDir:
+ medium: HugePages
+
+ For further reference please use the `pod.yaml`_
+
+ .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
+
+* Run DPDK with a command like the following:
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
diff --git a/doc/guides/howto/index.rst b/doc/guides/howto/index.rst
index 71a3381c36..a7692e8a97 100644
--- a/doc/guides/howto/index.rst
+++ b/doc/guides/howto/index.rst
@@ -8,7 +8,7 @@ HowTo Guides
:maxdepth: 2
:numbered:
- af_xdp_cni
+ af_xdp_dp
lm_bond_virtio_sriov
lm_virtio_vhost_user
flow_bifurcation
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 1932525d4d..4dd9c73742 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -155,9 +155,9 @@ use_cni
~~~~~~~
The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
-enable the `AF_XDP Plugin for Kubernetes`_ within a DPDK application.
+enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
.. code-block:: console
--
2.41.0
* [v10 2/3] net/af_xdp: fix multi interface support for K8s
2024-02-29 13:01 [v10 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-29 13:01 ` [v10 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
@ 2024-02-29 13:01 ` Maryam Tahhan
2024-02-29 13:06 ` Maryam Tahhan
2024-02-29 13:01 ` [v10 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan
2 siblings, 1 reply; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-29 13:01 UTC
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan, stable
The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However, 'use_cni' used a hardcoded socket path rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
which means that at best only 1 netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket path configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it is configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.
[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81
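The fallback described above can be sketched roughly as follows. This is
only an illustration, not the code in this patch: the helper name
example_resolve_dp_path() and its signature are hypothetical, and only the
DP_BASE_PATH and DP_UDS_SOCK values and the resulting
/tmp/afxdp_dp/<netdev>/afxdp.sock layout are taken from the series.

```c
#include <stdio.h>   /* snprintf */
#include <string.h>  /* strlen */

/* Values as defined in the patch. */
#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK  "afxdp.sock"

/*
 * Hypothetical helper (not part of the PMD): pick the UDS path for one
 * netdev. If the user passed dp_path, use it verbatim; otherwise build
 * the per-netdev default "<DP_BASE_PATH>/<if_name>/<DP_UDS_SOCK>".
 * Returns 0 on success, -1 on bad arguments or if dst is too small.
 */
static int
example_resolve_dp_path(char *dst, size_t dst_len, const char *if_name,
			const char *user_dp_path)
{
	int n;

	if (dst == NULL || dst_len == 0 || if_name == NULL)
		return -1;

	if (user_dp_path != NULL && strlen(user_dp_path) > 0)
		n = snprintf(dst, dst_len, "%s", user_dp_path);
	else
		n = snprintf(dst, dst_len, "%s/%s/%s",
			     DP_BASE_PATH, if_name, DP_UDS_SOCK);

	/* snprintf reports the untruncated length; reject truncation. */
	if (n < 0 || (size_t)n >= dst_len)
		return -1;

	return 0;
}
```

With a per-netdev path, two vdevs in the same pod resolve to different
sockets, while passing dp_path=/tmp/afxdp.sock keeps the old
single-socket behaviour.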
Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
doc/guides/howto/af_xdp_dp.rst | 62 +++++++++++------
doc/guides/nics/af_xdp.rst | 14 ++++
doc/guides/rel_notes/release_24_03.rst | 7 ++
drivers/net/af_xdp/rte_eth_af_xdp.c | 94 ++++++++++++++++----------
4 files changed, 121 insertions(+), 56 deletions(-)
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 7166d904bd..ec348c3b82 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,29 +52,33 @@ should be used when creating the socket
to instruct libbpf not to load the default libbpf program on the netdev.
Instead the loading is handled by the AF_XDP Device Plugin.
-Limitations
------------
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
-For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
-the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+.. note::
+
+ DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
+ <= commit id `38317c2`_.
.. note::
- DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
- AF_XDP Device Plugin.
+ DPDK AF_XDP PMD > v23.11 will work with the latest version of the
+ AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
+ the ``use_cni`` parameter. In these versions of the PMD, if a user doesn't
+ explicitly set the ``dp_path`` parameter when using ``use_cni`` then that
+ path is transparently configured in the AF_XDP PMD to the default
+ `AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
+ be overridden by explicitly setting the ``dp_path`` param.
-The issue is if a single pod requests different devices from different pools it
-results in multiple UDS servers serving the pod with the container using only a
-single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
-device might be able to complete the handshake. This has been fixed in the AF_XDP
-Device Plugin so that the mount point in the pods for the UDS appear at
-``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
-in the PMD alongside the ``use_cni`` parameter.
+.. note::
-.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+ DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
+ of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` to
+ ``/tmp/afxdp.sock``.
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
Prerequisites
-------------
@@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
.. code-block:: console
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
+ cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+ [Service]
+ LimitMEMLOCK=infinity
+ EOF
* dpdk-testpmd application should have AF_XDP feature enabled.
@@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
emptyDir:
medium: HugePages
- For further reference please use the `pod.yaml`_
+ For further reference please see the `pod.yaml`_
.. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
@@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+ Or
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+.. note::
+
+ If the ``dp_path`` parameter isn't explicitly set (as in the first example above)
+ the AF_XDP PMD will set the parameter value to
+ ``/tmp/afxdp_dp/<interface name>/afxdp.sock``.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 4dd9c73742..7f8651beda 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,6 +171,20 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK application
is also not supported.
+dp_path
+~~~~~~~
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
+alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock"
+
Limitations
-----------
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..b2b1f2566f 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -138,6 +138,13 @@ New Features
to support TLS v1.2, TLS v1.3 and DTLS v1.2.
* Added PMD API to allow raw submission of instructions to CPT.
+* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
+
+ The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
+ a pod to using only a single netdev/interface. The latest changes (adding
+ the ``dp_path`` parameter) remove this limitation and maintain backward
+ compatibility for any applications already using the ``use_cni`` vdev
+ argument with the AF_XDP Device Plugin.
Removed Items
-------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 2d151e45c7..3fb0c6a3b9 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -83,12 +83,13 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
#define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
+#define DP_BASE_PATH "/tmp/afxdp_dp"
+#define DP_UDS_SOCK "afxdp.sock"
#define MAX_LONG_OPT_SZ 64
#define UDS_MAX_FD_NUM 2
#define UDS_MAX_CMD_LEN 64
#define UDS_MAX_CMD_RESP 128
#define UDS_XSK_MAP_FD_MSG "/xsk_map_fd"
-#define UDS_SOCK "/tmp/afxdp.sock"
#define UDS_CONNECT_MSG "/connect"
#define UDS_HOST_OK_MSG "/host_ok"
#define UDS_HOST_NAK_MSG "/host_nak"
@@ -171,6 +172,7 @@ struct pmd_internals {
bool custom_prog_configured;
bool force_copy;
bool use_cni;
+ char dp_path[PATH_MAX];
struct bpf_map *map;
struct rte_ether_addr eth_addr;
@@ -191,6 +193,7 @@ struct pmd_process_private {
#define ETH_AF_XDP_BUDGET_ARG "busy_budget"
#define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
#define ETH_AF_XDP_USE_CNI_ARG "use_cni"
+#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
static const char * const valid_arguments[] = {
ETH_AF_XDP_IFACE_ARG,
@@ -201,6 +204,7 @@ static const char * const valid_arguments[] = {
ETH_AF_XDP_BUDGET_ARG,
ETH_AF_XDP_FORCE_COPY_ARG,
ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_DP_PATH_ARG,
NULL
};
@@ -1351,7 +1355,7 @@ configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
}
static int
-init_uds_sock(struct sockaddr_un *server)
+init_uds_sock(struct sockaddr_un *server, const char *dp_path)
{
int sock;
@@ -1362,7 +1366,7 @@ init_uds_sock(struct sockaddr_un *server)
}
server->sun_family = AF_UNIX;
- strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
+ strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
close(sock);
@@ -1382,7 +1386,7 @@ struct msg_internal {
};
static int
-send_msg(int sock, char *request, int *fd)
+send_msg(int sock, char *request, int *fd, const char *dp_path)
{
int snd;
struct iovec iov;
@@ -1393,7 +1397,7 @@ send_msg(int sock, char *request, int *fd)
memset(&dst, 0, sizeof(dst));
dst.sun_family = AF_UNIX;
- strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
+ strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
/* Initialize message header structure */
memset(&msgh, 0, sizeof(msgh));
@@ -1470,8 +1474,8 @@ read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
}
static int
-make_request_cni(int sock, struct sockaddr_un *server, char *request,
- int *req_fd, char *response, int *out_fd)
+make_request_dp(int sock, struct sockaddr_un *server, char *request,
+ int *req_fd, char *response, int *out_fd, const char *dp_path)
{
int rval;
@@ -1483,7 +1487,7 @@ make_request_cni(int sock, struct sockaddr_un *server, char *request,
if (req_fd == NULL)
rval = write(sock, request, strlen(request));
else
- rval = send_msg(sock, request, req_fd);
+ rval = send_msg(sock, request, req_fd, dp_path);
if (rval < 0) {
AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
@@ -1507,7 +1511,7 @@ check_response(char *response, char *exp_resp, long size)
}
static int
-get_cni_fd(char *if_name)
+uds_get_xskmap_fd(char *if_name, const char *dp_path)
{
char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
@@ -1520,14 +1524,14 @@ get_cni_fd(char *if_name)
return -1;
memset(&server, 0, sizeof(server));
- sock = init_uds_sock(&server);
+ sock = init_uds_sock(&server, dp_path);
if (sock < 0)
return -1;
- /* Initiates handshake to CNI send: /connect,hostname */
+ /* Initiates handshake to the AF_XDP Device Plugin send: /connect,hostname */
snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1541,7 +1545,7 @@ get_cni_fd(char *if_name)
/* Request for "/version" */
strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1549,7 +1553,7 @@ get_cni_fd(char *if_name)
/* Request for file descriptor for netdev name*/
snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1571,7 +1575,7 @@ get_cni_fd(char *if_name)
/* Initiate close connection */
strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1695,17 +1699,16 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
}
if (internals->use_cni) {
- int err, fd, map_fd;
+ int err, map_fd;
- /* get socket fd from CNI plugin */
- map_fd = get_cni_fd(internals->if_name);
+ /* get socket fd from AF_XDP Device Plugin */
+ map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
if (map_fd < 0) {
- AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
+ AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
goto out_xsk;
}
- /* get socket fd */
- fd = xsk_socket__fd(rxq->xsk);
- err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
+
+ err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
if (err) {
AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
goto out_xsk;
@@ -1881,13 +1884,13 @@ static const struct eth_dev_ops ops = {
.get_monitor_addr = eth_get_monitor_addr,
};
-/* CNI option works in unprivileged container environment
- * and ethernet device functionality will be reduced. So
- * additional customiszed eth_dev_ops struct is needed
- * for cni. Promiscuous enable and disable functionality
- * is removed.
+/* AF_XDP Device Plugin option works in unprivileged
+ * container environments and ethernet device functionality
+ * will be reduced. So additional customised eth_dev_ops
+ * struct is needed for the Device Plugin. Promiscuous
+ * enable and disable functionality is removed.
**/
-static const struct eth_dev_ops ops_cni = {
+static const struct eth_dev_ops ops_afxdp_dp = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_close = eth_dev_close,
@@ -2023,7 +2026,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
static int
parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
int *queue_cnt, int *shared_umem, char *prog_path,
- int *busy_budget, int *force_copy, int *use_cni)
+ int *busy_budget, int *force_copy, int *use_cni,
+ char *dp_path)
{
int ret;
@@ -2069,6 +2073,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
if (ret < 0)
goto free_kvlist;
+ ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
+ &parse_prog_arg, dp_path);
+ if (ret < 0)
+ goto free_kvlist;
+
free_kvlist:
rte_kvargs_free(kvlist);
return ret;
@@ -2108,7 +2117,7 @@ static struct rte_eth_dev *
init_internals(struct rte_vdev_device *dev, const char *if_name,
int start_queue_idx, int queue_cnt, int shared_umem,
const char *prog_path, int busy_budget, int force_copy,
- int use_cni)
+ int use_cni, const char *dp_path)
{
const char *name = rte_vdev_device_name(dev);
const unsigned int numa_node = dev->device.numa_node;
@@ -2138,6 +2147,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
internals->shared_umem = shared_umem;
internals->force_copy = force_copy;
internals->use_cni = use_cni;
+ strlcpy(internals->dp_path, dp_path, PATH_MAX);
if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
&internals->combined_queue_cnt)) {
@@ -2199,7 +2209,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
if (!internals->use_cni)
eth_dev->dev_ops = &ops;
else
- eth_dev->dev_ops = &ops_cni;
+ eth_dev->dev_ops = &ops_afxdp_dp;
eth_dev->rx_pkt_burst = eth_af_xdp_rx;
eth_dev->tx_pkt_burst = eth_af_xdp_tx;
@@ -2328,6 +2338,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
int busy_budget = -1, ret;
int force_copy = 0;
int use_cni = 0;
+ char dp_path[PATH_MAX] = {'\0'};
struct rte_eth_dev *eth_dev = NULL;
const char *name = rte_vdev_device_name(dev);
@@ -2370,7 +2381,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
&xsk_queue_cnt, &shared_umem, prog_path,
- &busy_budget, &force_copy, &use_cni) < 0) {
+ &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
AF_XDP_LOG(ERR, "Invalid kvargs value\n");
return -EINVAL;
}
@@ -2384,7 +2395,19 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (use_cni && strnlen(prog_path, PATH_MAX)) {
AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
- return -EINVAL;
+ return -EINVAL;
+ }
+
+ if (use_cni && !strnlen(dp_path, PATH_MAX)) {
+ snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
+ AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+ ETH_AF_XDP_DP_PATH_ARG, dp_path);
+ }
+
+ if (!use_cni && strnlen(dp_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
+ ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+ return -EINVAL;
}
if (strlen(if_name) == 0) {
@@ -2410,7 +2433,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
xsk_queue_cnt, shared_umem, prog_path,
- busy_budget, force_copy, use_cni);
+ busy_budget, force_copy, use_cni, dp_path);
if (eth_dev == NULL) {
AF_XDP_LOG(ERR, "Failed to init internals\n");
return -1;
@@ -2471,4 +2494,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
"xdp_prog=<string> "
"busy_budget=<int> "
"force_copy=<int> "
- "use_cni=<int> ");
+ "use_cni=<int> "
+ "dp_path=<string> ");
--
2.41.0
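
Note on the default UDS path: when ``use_cni`` is set without ``dp_path``, the
patch above builds the path as DP_BASE_PATH/<if_name>/DP_UDS_SOCK and later
copies it into ``sun_path`` with strlcpy(). ``dp_path`` is sized PATH_MAX while
``sun_path`` is much smaller (108 bytes on Linux), so an over-long path would be
truncated silently. The following is a minimal sketch of a length-checked
variant, not the code in the patch. The helper name ``default_uds_path`` is
hypothetical; only the two macros and the "%s/%s/%s" format come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

#define DP_BASE_PATH "/tmp/afxdp_dp"   /* from the patch */
#define DP_UDS_SOCK  "afxdp.sock"      /* from the patch */

/*
 * Hypothetical helper (not part of the PMD): fill a sockaddr_un with the
 * default per-netdev UDS path. Returns 0 on success, -1 if the path would
 * not fit in sun_path.
 */
static int
default_uds_path(const char *if_name, struct sockaddr_un *addr)
{
	int n;

	assert(if_name != NULL && addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;

	n = snprintf(addr->sun_path, sizeof(addr->sun_path), "%s/%s/%s",
		     DP_BASE_PATH, if_name, DP_UDS_SOCK);
	/* snprintf returns the length it wanted to write; reject truncation */
	if (n < 0 || (size_t)n >= sizeof(addr->sun_path))
		return -1;

	return 0;
}
```

Failing early on an over-long path is a design choice for this sketch; it avoids
connecting to a different, truncated socket path.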
* [v10 3/3] net/af_xdp: support AF_XDP DP pinned maps
2024-02-29 13:01 [v10 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-29 13:01 ` [v10 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
2024-02-29 13:01 ` [v10 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-02-29 13:01 ` Maryam Tahhan
2 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-29 13:01 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan
Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enables unprivileged
pods to create and use AF_XDP sockets.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
doc/guides/howto/af_xdp_dp.rst | 35 ++++++++--
doc/guides/nics/af_xdp.rst | 34 ++++++++--
doc/guides/rel_notes/release_24_03.rst | 10 +++
drivers/net/af_xdp/rte_eth_af_xdp.c | 93 ++++++++++++++++++++------
4 files changed, 141 insertions(+), 31 deletions(-)
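
Note on retrieving a pinned map: bpf_obj_get() in libbpf wraps the bpf(2)
BPF_OBJ_GET command. Failure is reported as a negative value, and 0 is a valid
file descriptor. The following is a minimal sketch that issues the command
directly so that it builds without libbpf. It assumes a Linux host with the
kernel UAPI headers installed; the helper name ``pinned_map_fd`` is
hypothetical and is not what the PMD uses.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the PMD): open a pinned eBPF object.
 * Returns a file descriptor (>= 0) on success or -errno on failure.
 */
static int
pinned_map_fd(const char *path)
{
	union bpf_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;

	fd = syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
	if (fd < 0)
		return -errno;	/* e.g. -ENOENT if nothing is pinned there */

	return (int)fd;
}
```

A caller would pass the ``dp_path`` value, for example
pinned_map_fd("/tmp/afxdp_dp/<interface name>/xsks_map"), and treat only a
negative return as an error.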
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index ec348c3b82..9aa9f7d8d4 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@ should be used when creating the socket
to instruct libbpf not to load the default libbpf program on the netdev.
Instead the loading is handled by the AF_XDP Device Plugin.
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` is used to indicate to the AF_XDP PMD to
+retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be pinned
+by an external entity like the AF_XDP Device Plugin. This enables unprivileged pods
+to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD when
+creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD sets it internally to the `AF_XDP Device Plugin for Kubernetes`_ default path.
.. note::
@@ -312,8 +323,18 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+ Or
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_pinned_map=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/xsks_map" \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
.. note::
- If the ``dp_path`` parameter isn't explicitly set (like the example above)
- the AF_XDP PMD will set the parameter value to
- ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
+ If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or ``use_pinned_map``
+ the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin for Kubernetes`_
+ defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK application
is also not supported.
+use_pinned_map
+~~~~~~~~~~~~~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+ This feature can also be used with any external entity that can pin an eBPF map, not just
+ the `AF_XDP Device Plugin for Kubernetes`_.
+
dp_path
~~~~~~~
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD sets it internally to the `AF_XDP Device Plugin for Kubernetes`_ default path.
.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
@@ -185,6 +207,10 @@ alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
--vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"
+
Limitations
-----------
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index b2b1f2566f..95d9a0f842 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -146,6 +146,16 @@ New Features
compatibility for any applications already using the ``use_cni`` vdev
argument with the AF_XDP Device Plugin.
+* **Integrated AF_XDP PMD with AF_XDP Device Plugin eBPF map pinning support**.
+
+ The EAL vdev argument for the AF_XDP PMD ``use_pinned_map`` was added
+ to allow Kubernetes Pods to use AF_XDP with DPDK, and run with limited
+ privileges, without having to do a full handshake over a Unix Domain
+ Socket with the Device Plugin. This flag indicates that the AF_XDP PMD
+ will be used in unprivileged mode and will obtain the XSKMAP FD by calling
+ ``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
+ container.
+
Removed Items
-------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 3fb0c6a3b9..f13bdb9017 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -85,6 +85,7 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK "afxdp.sock"
+#define DP_XSK_MAP "xsks_map"
#define MAX_LONG_OPT_SZ 64
#define UDS_MAX_FD_NUM 2
#define UDS_MAX_CMD_LEN 64
@@ -172,6 +173,7 @@ struct pmd_internals {
bool custom_prog_configured;
bool force_copy;
bool use_cni;
+ bool use_pinned_map;
char dp_path[PATH_MAX];
struct bpf_map *map;
@@ -193,6 +195,7 @@ struct pmd_process_private {
#define ETH_AF_XDP_BUDGET_ARG "busy_budget"
#define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
#define ETH_AF_XDP_USE_CNI_ARG "use_cni"
+#define ETH_AF_XDP_USE_PINNED_MAP_ARG "use_pinned_map"
#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
static const char * const valid_arguments[] = {
@@ -204,6 +207,7 @@ static const char * const valid_arguments[] = {
ETH_AF_XDP_BUDGET_ARG,
ETH_AF_XDP_FORCE_COPY_ARG,
ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_USE_PINNED_MAP_ARG,
ETH_AF_XDP_DP_PATH_ARG,
NULL
};
@@ -1258,6 +1262,21 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
#endif
+static int
+get_pinned_map(const char *dp_path, int *map_fd)
+{
+ *map_fd = bpf_obj_get(dp_path);
+ if (*map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to find xsks_map in %s\n", dp_path);
+ return -1;
+ }
+
+ AF_XDP_LOG(INFO, "Successfully retrieved map %s with fd %d\n",
+ dp_path, *map_fd);
+
+ return 0;
+}
+
static int
load_custom_xdp_prog(const char *prog_path, int if_index, struct bpf_map **map)
{
@@ -1644,7 +1663,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
#endif
/* Disable libbpf from loading XDP program */
- if (internals->use_cni)
+ if (internals->use_cni || internals->use_pinned_map)
cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
if (strnlen(internals->prog_path, PATH_MAX)) {
@@ -1698,14 +1717,23 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
}
}
- if (internals->use_cni) {
+ if (internals->use_cni || internals->use_pinned_map) {
int err, map_fd;
- /* get socket fd from AF_XDP Device Plugin */
- map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
- if (map_fd < 0) {
- AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
- goto out_xsk;
+ if (internals->use_cni) {
+ /* get xskmap fd from AF_XDP Device Plugin */
+ map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
+ if (map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
+ goto out_xsk;
+ }
+ } else {
+ /* get xskmap fd from the pinned eBPF map */
+ err = get_pinned_map(internals->dp_path, &map_fd);
+ if (err < 0 || map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to retrieve pinned map fd\n");
+ goto out_xsk;
+ }
}
err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
@@ -2027,7 +2055,7 @@ static int
parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
int *queue_cnt, int *shared_umem, char *prog_path,
int *busy_budget, int *force_copy, int *use_cni,
- char *dp_path)
+ int *use_pinned_map, char *dp_path)
{
int ret;
@@ -2073,6 +2101,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
if (ret < 0)
goto free_kvlist;
+ ret = rte_kvargs_process(kvlist, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ &parse_integer_arg, use_pinned_map);
+ if (ret < 0)
+ goto free_kvlist;
+
ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
&parse_prog_arg, dp_path);
if (ret < 0)
@@ -2117,7 +2150,7 @@ static struct rte_eth_dev *
init_internals(struct rte_vdev_device *dev, const char *if_name,
int start_queue_idx, int queue_cnt, int shared_umem,
const char *prog_path, int busy_budget, int force_copy,
- int use_cni, const char *dp_path)
+ int use_cni, int use_pinned_map, const char *dp_path)
{
const char *name = rte_vdev_device_name(dev);
const unsigned int numa_node = dev->device.numa_node;
@@ -2147,6 +2180,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
internals->shared_umem = shared_umem;
internals->force_copy = force_copy;
internals->use_cni = use_cni;
+ internals->use_pinned_map = use_pinned_map;
strlcpy(internals->dp_path, dp_path, PATH_MAX);
if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
@@ -2206,7 +2240,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
eth_dev->data->dev_link = pmd_link;
eth_dev->data->mac_addrs = &internals->eth_addr;
eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
- if (!internals->use_cni)
+ if (!internals->use_cni && !internals->use_pinned_map)
eth_dev->dev_ops = &ops;
else
eth_dev->dev_ops = &ops_afxdp_dp;
@@ -2338,6 +2372,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
int busy_budget = -1, ret;
int force_copy = 0;
int use_cni = 0;
+ int use_pinned_map = 0;
char dp_path[PATH_MAX] = {'\0'};
struct rte_eth_dev *eth_dev = NULL;
const char *name = rte_vdev_device_name(dev);
@@ -2381,20 +2416,29 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
&xsk_queue_cnt, &shared_umem, prog_path,
- &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
+ &busy_budget, &force_copy, &use_cni, &use_pinned_map,
+ dp_path) < 0) {
AF_XDP_LOG(ERR, "Invalid kvargs value\n");
return -EINVAL;
}
- if (use_cni && busy_budget > 0) {
+ if (use_cni && use_pinned_map) {
AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
- ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_BUDGET_ARG);
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG);
return -EINVAL;
}
- if (use_cni && strnlen(prog_path, PATH_MAX)) {
- AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
- ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
+ if ((use_cni || use_pinned_map) && busy_budget > 0) {
+ AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ ETH_AF_XDP_BUDGET_ARG);
+ return -EINVAL;
+ }
+
+ if ((use_cni || use_pinned_map) && strnlen(prog_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ ETH_AF_XDP_PROG_ARG);
return -EINVAL;
}
@@ -2404,9 +2448,16 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
ETH_AF_XDP_DP_PATH_ARG, dp_path);
}
- if (!use_cni && strnlen(dp_path, PATH_MAX)) {
- AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
- ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+ if (use_pinned_map && !strnlen(dp_path, PATH_MAX)) {
+ snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_XSK_MAP);
+ AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+ ETH_AF_XDP_DP_PATH_ARG, dp_path);
+ }
+
+ if ((!use_cni && !use_pinned_map) && strnlen(dp_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "'%s' parameter is set, but neither '%s' nor '%s' is enabled\n",
+ ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_USE_PINNED_MAP_ARG);
return -EINVAL;
}
@@ -2433,7 +2484,8 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
xsk_queue_cnt, shared_umem, prog_path,
- busy_budget, force_copy, use_cni, dp_path);
+ busy_budget, force_copy, use_cni, use_pinned_map,
+ dp_path);
if (eth_dev == NULL) {
AF_XDP_LOG(ERR, "Failed to init internals\n");
return -1;
@@ -2495,4 +2547,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
"busy_budget=<int> "
"force_copy=<int> "
"use_cni=<int> "
+ "use_pinned_map=<int> "
"dp_path=<string> ");
--
2.41.0
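
Note on the argument rules: the probe function in this patch rejects
``use_cni`` together with ``use_pinned_map``, fills in a default ``dp_path``
for whichever mode is active, and rejects ``dp_path`` when neither mode is set.
The following is a minimal standalone sketch of just those rules. The helper
name ``resolve_dp_path`` is hypothetical, and the ``busy_budget`` and
``xdp_prog`` checks from the patch are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DP_BASE_PATH "/tmp/afxdp_dp"   /* from the patch */
#define DP_UDS_SOCK  "afxdp.sock"      /* from the patch */
#define DP_XSK_MAP   "xsks_map"        /* from the patch */

/*
 * Hypothetical helper (not part of the PMD): validate the mode flags and
 * fill dp_path with the mode's default when the user left it empty.
 * Returns 0 on success, -EINVAL on an invalid combination.
 */
static int
resolve_dp_path(int use_cni, int use_pinned_map, const char *if_name,
		char *dp_path, size_t len)
{
	int n;

	assert(if_name != NULL && dp_path != NULL && len > 0);

	/* the two modes are mutually exclusive */
	if (use_cni && use_pinned_map)
		return -EINVAL;

	/* dp_path only makes sense alongside one of the modes */
	if (!use_cni && !use_pinned_map)
		return dp_path[0] != '\0' ? -EINVAL : 0;

	/* a user-supplied path is kept as-is */
	if (dp_path[0] != '\0')
		return 0;

	n = snprintf(dp_path, len, "%s/%s/%s", DP_BASE_PATH, if_name,
		     use_cni ? DP_UDS_SOCK : DP_XSK_MAP);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;

	return 0;
}
```

With an empty ``dp_path`` and ``if_name`` set to "eth0", ``use_cni`` resolves to
/tmp/afxdp_dp/eth0/afxdp.sock and ``use_pinned_map`` resolves to
/tmp/afxdp_dp/eth0/xsks_map.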
* Re: [v10 2/3] net/af_xdp: fix multi interface support for K8s
2024-02-29 13:01 ` [v10 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-02-29 13:06 ` Maryam Tahhan
0 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-29 13:06 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, stable
On 29/02/2024 13:01, Maryam Tahhan wrote:
> The original 'use_cni' implementation was added
> to enable support for the AF_XDP PMD in a K8s env
> without any escalated privileges.
> However 'use_cni' used a hardcoded socket rather
> than a configurable one. If a DPDK pod is requesting
> multiple net devices and these devices are from
> different pools, then the AF_XDP PMD attempts to
> mount all the netdev UDSes in the pod as /tmp/afxdp.sock,
> which means that at best only 1 netdev will handshake
> correctly with the AF_XDP DP. This patch addresses
> this by making the socket parameter configurable using
> a new vdev param called 'dp_path' alongside the
> original 'use_cni' param. If the 'dp_path' parameter
> is not set alongside the 'use_cni' parameter, then
> it's configured inside the AF_XDP PMD (transparently
> to the user). This change has been tested
> with the AF_XDP DP PR 81[1], with both single and
> multiple interfaces.
>
> [1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81
>
> Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> ---
> doc/guides/howto/af_xdp_dp.rst | 62 +++++++++++------
> doc/guides/nics/af_xdp.rst | 14 ++++
> doc/guides/rel_notes/release_24_03.rst | 7 ++
> drivers/net/af_xdp/rte_eth_af_xdp.c | 94 ++++++++++++++++----------
> 4 files changed, 121 insertions(+), 56 deletions(-)
>
> diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
> index 7166d904bd..ec348c3b82 100644
> --- a/doc/guides/howto/af_xdp_dp.rst
> +++ b/doc/guides/howto/af_xdp_dp.rst
> @@ -52,29 +52,33 @@ should be used when creating the socket
> to instruct libbpf not to load the default libbpf program on the netdev.
> Instead the loading is handled by the AF_XDP Device Plugin.
>
> -Limitations
> ------------
> +The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
> +to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
> +AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
> +argument then the AF_XDP PMD configures it internally.
>
> -For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
> -the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
> -is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
> -and the pod is limited to a single netdev.
> +.. note::
> +
> + DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
> + <= commit id `38317c2`_.
>
> .. note::
>
> - DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
> - AF_XDP Device Plugin.
> + DPDK AF_XDP PMD > v23.11 will work with the latest version of the
> + AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
> + the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
> + explicitly set the ``dp_path``parameter when using ``use_cni`` then that
I see the typo - will respin - sorry, it's been a long day already
> + path is transparently configured in the AF_XDP PMD to the default
> + `AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
> + be overridden by explicitly setting the ``dp_path`` param.
>
> -The issue is if a single pod requests different devices from different pools it
> -results in multiple UDS servers serving the pod with the container using only a
> -single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
> -device might be able to complete the handshake. This has been fixed in the AF_XDP
> -Device Plugin so that the mount point in the pods for the UDS appear at
> -``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
> -in the PMD alongside the ``use_cni`` parameter.
> +.. note::
>
> -.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
> + DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
> + of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` to
> + ``/tmp/afxdp.sock``.
>
> +.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
>
> Prerequisites
> -------------
> @@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
>
> .. code-block:: console
>
> - cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
> - [Service]
> - LimitMEMLOCK=infinity
> - EOF
> + cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
> + [Service]
> + LimitMEMLOCK=infinity
> + EOF
>
> * dpdk-testpmd application should have AF_XDP feature enabled.
>
> @@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
> emptyDir:
> medium: HugePages
>
> - For further reference please use the `pod.yaml`_
> + For further reference please see the `pod.yaml`_
>
> .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
>
> @@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
> --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
> --no-mlockall --in-memory \
> -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
> +
> + Or
> +
> + .. code-block:: console
> +
> + kubectl exec -i <Pod name> --container <containers name> -- \
> + /<Path>/dpdk-testpmd -l 0,1 --no-pci \
> + --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
> + --no-mlockall --in-memory \
> + -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
> +
> +.. note::
> +
> + If the ``dp_path`` parameter isn't explicitly set (like the example above)
> + the AF_XDP PMD will set the parameter value to
> + ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> index 4dd9c73742..7f8651beda 100644
> --- a/doc/guides/nics/af_xdp.rst
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -171,6 +171,20 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
> so enabling and disabling of the promiscuous mode through the DPDK application
> is also not supported.
>
> +dp_path
> +~~~~~~~
> +
> +The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
> +to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
> +`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
> +alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
> +
> +.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
> +
> +.. code-block:: console
> +
> + --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
> +
> Limitations
> -----------
>
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 879bb4944c..b2b1f2566f 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -138,6 +138,13 @@ New Features
> to support TLS v1.2, TLS v1.3 and DTLS v1.2.
> * Added PMD API to allow raw submission of instructions to CPT.
>
> +* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
> +
> + The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
> + a pod to using only a single netdev/interface. The latest changes (adding
> + the ``dp_path`` parameter) remove this limitation and maintain backward
> + compatibility for any applications already using the ``use_cni`` vdev
> + argument with the AF_XDP Device Plugin.
>
> Removed Items
> -------------
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2d151e45c7..3fb0c6a3b9 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -83,12 +83,13 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>
> #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
>
> +#define DP_BASE_PATH "/tmp/afxdp_dp"
> +#define DP_UDS_SOCK "afxdp.sock"
> #define MAX_LONG_OPT_SZ 64
> #define UDS_MAX_FD_NUM 2
> #define UDS_MAX_CMD_LEN 64
> #define UDS_MAX_CMD_RESP 128
> #define UDS_XSK_MAP_FD_MSG "/xsk_map_fd"
> -#define UDS_SOCK "/tmp/afxdp.sock"
> #define UDS_CONNECT_MSG "/connect"
> #define UDS_HOST_OK_MSG "/host_ok"
> #define UDS_HOST_NAK_MSG "/host_nak"
> @@ -171,6 +172,7 @@ struct pmd_internals {
> bool custom_prog_configured;
> bool force_copy;
> bool use_cni;
> + char dp_path[PATH_MAX];
> struct bpf_map *map;
>
> struct rte_ether_addr eth_addr;
> @@ -191,6 +193,7 @@ struct pmd_process_private {
> #define ETH_AF_XDP_BUDGET_ARG "busy_budget"
> #define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
> #define ETH_AF_XDP_USE_CNI_ARG "use_cni"
> +#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
>
> static const char * const valid_arguments[] = {
> ETH_AF_XDP_IFACE_ARG,
> @@ -201,6 +204,7 @@ static const char * const valid_arguments[] = {
> ETH_AF_XDP_BUDGET_ARG,
> ETH_AF_XDP_FORCE_COPY_ARG,
> ETH_AF_XDP_USE_CNI_ARG,
> + ETH_AF_XDP_DP_PATH_ARG,
> NULL
> };
>
> @@ -1351,7 +1355,7 @@ configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
> }
>
> static int
> -init_uds_sock(struct sockaddr_un *server)
> +init_uds_sock(struct sockaddr_un *server, const char *dp_path)
> {
> int sock;
>
> @@ -1362,7 +1366,7 @@ init_uds_sock(struct sockaddr_un *server)
> }
>
> server->sun_family = AF_UNIX;
> - strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
> + strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
>
> if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
> close(sock);
> @@ -1382,7 +1386,7 @@ struct msg_internal {
> };
>
> static int
> -send_msg(int sock, char *request, int *fd)
> +send_msg(int sock, char *request, int *fd, const char *dp_path)
> {
> int snd;
> struct iovec iov;
> @@ -1393,7 +1397,7 @@ send_msg(int sock, char *request, int *fd)
>
> memset(&dst, 0, sizeof(dst));
> dst.sun_family = AF_UNIX;
> - strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
> + strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
>
> /* Initialize message header structure */
> memset(&msgh, 0, sizeof(msgh));
> @@ -1470,8 +1474,8 @@ read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
> }
>
> static int
> -make_request_cni(int sock, struct sockaddr_un *server, char *request,
> - int *req_fd, char *response, int *out_fd)
> +make_request_dp(int sock, struct sockaddr_un *server, char *request,
> + int *req_fd, char *response, int *out_fd, const char *dp_path)
> {
> int rval;
>
> @@ -1483,7 +1487,7 @@ make_request_cni(int sock, struct sockaddr_un *server, char *request,
> if (req_fd == NULL)
> rval = write(sock, request, strlen(request));
> else
> - rval = send_msg(sock, request, req_fd);
> + rval = send_msg(sock, request, req_fd, dp_path);
>
> if (rval < 0) {
> AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
> @@ -1507,7 +1511,7 @@ check_response(char *response, char *exp_resp, long size)
> }
>
> static int
> -get_cni_fd(char *if_name)
> +uds_get_xskmap_fd(char *if_name, const char *dp_path)
> {
> char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
> char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
> @@ -1520,14 +1524,14 @@ get_cni_fd(char *if_name)
> return -1;
>
> memset(&server, 0, sizeof(server));
> - sock = init_uds_sock(&server);
> + sock = init_uds_sock(&server, dp_path);
> if (sock < 0)
> return -1;
>
> - /* Initiates handshake to CNI send: /connect,hostname */
> + /* Initiate handshake with the AF_XDP Device Plugin, send: /connect,hostname */
> snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1541,7 +1545,7 @@ get_cni_fd(char *if_name)
> /* Request for "/version" */
> strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1549,7 +1553,7 @@ get_cni_fd(char *if_name)
> /* Request for file descriptor for netdev name*/
> snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1571,7 +1575,7 @@ get_cni_fd(char *if_name)
> /* Initiate close connection */
> strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
> memset(response, 0, sizeof(response));
> - if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
> + if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
> goto err_close;
> }
> @@ -1695,17 +1699,16 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
> }
>
> if (internals->use_cni) {
> - int err, fd, map_fd;
> + int err, map_fd;
>
> - /* get socket fd from CNI plugin */
> - map_fd = get_cni_fd(internals->if_name);
> + /* get xskmap fd from AF_XDP Device Plugin */
> + map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
> if (map_fd < 0) {
> - AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
> + AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
> goto out_xsk;
> }
> - /* get socket fd */
> - fd = xsk_socket__fd(rxq->xsk);
> - err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
> +
> + err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
> if (err) {
> AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
> goto out_xsk;
> @@ -1881,13 +1884,13 @@ static const struct eth_dev_ops ops = {
> .get_monitor_addr = eth_get_monitor_addr,
> };
>
> -/* CNI option works in unprivileged container environment
> - * and ethernet device functionality will be reduced. So
> - * additional customiszed eth_dev_ops struct is needed
> - * for cni. Promiscuous enable and disable functionality
> - * is removed.
> +/* The AF_XDP Device Plugin option works in unprivileged
> + * container environments, where ethernet device functionality
> + * is reduced. An additional, customised eth_dev_ops struct
> + * is therefore needed for the Device Plugin; promiscuous
> + * enable and disable functionality is removed.
> **/
> -static const struct eth_dev_ops ops_cni = {
> +static const struct eth_dev_ops ops_afxdp_dp = {
> .dev_start = eth_dev_start,
> .dev_stop = eth_dev_stop,
> .dev_close = eth_dev_close,
> @@ -2023,7 +2026,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
> static int
> parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> int *queue_cnt, int *shared_umem, char *prog_path,
> - int *busy_budget, int *force_copy, int *use_cni)
> + int *busy_budget, int *force_copy, int *use_cni,
> + char *dp_path)
> {
> int ret;
>
> @@ -2069,6 +2073,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> if (ret < 0)
> goto free_kvlist;
>
> + ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
> + &parse_prog_arg, dp_path);
> + if (ret < 0)
> + goto free_kvlist;
> +
> free_kvlist:
> rte_kvargs_free(kvlist);
> return ret;
> @@ -2108,7 +2117,7 @@ static struct rte_eth_dev *
> init_internals(struct rte_vdev_device *dev, const char *if_name,
> int start_queue_idx, int queue_cnt, int shared_umem,
> const char *prog_path, int busy_budget, int force_copy,
> - int use_cni)
> + int use_cni, const char *dp_path)
> {
> const char *name = rte_vdev_device_name(dev);
> const unsigned int numa_node = dev->device.numa_node;
> @@ -2138,6 +2147,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
> internals->shared_umem = shared_umem;
> internals->force_copy = force_copy;
> internals->use_cni = use_cni;
> + strlcpy(internals->dp_path, dp_path, PATH_MAX);
>
> if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
> &internals->combined_queue_cnt)) {
> @@ -2199,7 +2209,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
> if (!internals->use_cni)
> eth_dev->dev_ops = &ops;
> else
> - eth_dev->dev_ops = &ops_cni;
> + eth_dev->dev_ops = &ops_afxdp_dp;
>
> eth_dev->rx_pkt_burst = eth_af_xdp_rx;
> eth_dev->tx_pkt_burst = eth_af_xdp_tx;
> @@ -2328,6 +2338,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> int busy_budget = -1, ret;
> int force_copy = 0;
> int use_cni = 0;
> + char dp_path[PATH_MAX] = {'\0'};
> struct rte_eth_dev *eth_dev = NULL;
> const char *name = rte_vdev_device_name(dev);
>
> @@ -2370,7 +2381,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>
> if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
> &xsk_queue_cnt, &shared_umem, prog_path,
> - &busy_budget, &force_copy, &use_cni) < 0) {
> + &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
> AF_XDP_LOG(ERR, "Invalid kvargs value\n");
> return -EINVAL;
> }
> @@ -2384,7 +2395,19 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> if (use_cni && strnlen(prog_path, PATH_MAX)) {
> AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
> ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
> - return -EINVAL;
> + return -EINVAL;
> + }
> +
> + if (use_cni && !strnlen(dp_path, PATH_MAX)) {
> + snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
> + AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
> + ETH_AF_XDP_DP_PATH_ARG, dp_path);
> + }
> +
> + if (!use_cni && strnlen(dp_path, PATH_MAX)) {
> + AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
> + ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
> + return -EINVAL;
> }
>
> if (strlen(if_name) == 0) {
> @@ -2410,7 +2433,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>
> eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
> xsk_queue_cnt, shared_umem, prog_path,
> - busy_budget, force_copy, use_cni);
> + busy_budget, force_copy, use_cni, dp_path);
> if (eth_dev == NULL) {
> AF_XDP_LOG(ERR, "Failed to init internals\n");
> return -1;
> @@ -2471,4 +2494,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
> "xdp_prog=<string> "
> "busy_budget=<int> "
> "force_copy=<int> "
> - "use_cni=<int> ");
> + "use_cni=<int> "
> + "dp_path=<string> ");
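
For readers following the probe-time checks in the last hunks, the `dp_path` handling can be sketched as a standalone helper. This is only an illustration under stated assumptions, not code from the patch: `resolve_dp_path()` and `SKETCH_PATH_MAX` are hypothetical names, and the truncation check on `snprintf()` is an addition the patch does not make. The two macros and the "%s/%s/%s" format are taken from the diff above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the patch. */
#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK "afxdp.sock"

/* Hypothetical stand-in for PATH_MAX so the sketch has no platform dependency. */
#define SKETCH_PATH_MAX 4096

/*
 * Hypothetical helper mirroring the checks added to rte_pmd_af_xdp_probe():
 *  - dp_path given without use_cni: rejected (-1);
 *  - use_cni without dp_path: default to <base>/<if_name>/<sock>;
 *  - otherwise dp_path is left untouched.
 * Returns 0 on success, -1 on an invalid combination or truncation.
 */
static int
resolve_dp_path(bool use_cni, const char *if_name, char *dp_path, size_t len)
{
	bool have_path = strnlen(dp_path, len) > 0;

	if (!use_cni && have_path)
		return -1;

	if (use_cni && !have_path) {
		int n = snprintf(dp_path, len, "%s/%s/%s",
				 DP_BASE_PATH, if_name, DP_UDS_SOCK);

		/* Assumption: treat truncation as an error (not done in the patch). */
		if (n < 0 || (size_t)n >= len)
			return -1;
	}

	return 0;
}
```

Calling `resolve_dp_path(true, "eth0", buf, sizeof(buf))` with an empty `buf` would fill it with a per-interface socket path, which is how each netdev in a pod ends up with its own UDS instead of the single shared `/tmp/afxdp.sock` used before.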
Thread overview: 5+ messages
2024-02-29 13:01 [v10 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-29 13:01 ` [v10 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
2024-02-29 13:01 ` [v10 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-29 13:06 ` Maryam Tahhan
2024-02-29 13:01 ` [v10 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan