All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: yoong.siang.song@intel.com
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"ast@kernel.org" <ast@kernel.org>,
	 "daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	 "martin.lau@linux.dev" <martin.lau@linux.dev>,
	"song@kernel.org" <song@kernel.org>, "yhs@fb.com" <yhs@fb.com>,
	 "john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	 "haoluo@google.com" <haoluo@google.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	 "kuba@kernel.org" <kuba@kernel.org>,
	"toke@kernel.org" <toke@kernel.org>,
	 "willemb@google.com" <willemb@google.com>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	magnus.karlsson@intel.com,  "bjorn@kernel.org" <bjorn@kernel.org>,
	maciej.fijalkowski@intel.com,
	 "hawk@kernel.org" <hawk@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	 "xdp-hints@xdp-project.net" <xdp-hints@xdp-project.net>
Subject: Re: [PATCH bpf-next v3 02/10] xsk: add TX timestamp and TX checksum offload support
Date: Wed, 4 Oct 2023 10:48:08 -0700	[thread overview]
Message-ID: <ZR2lNjyyNZO4aQP0@google.com> (raw)
In-Reply-To: <PH0PR11MB583036756C64ED287ECA2344D8CBA@PH0PR11MB5830.namprd11.prod.outlook.com>

On 10/04, Song, Yoong Siang wrote:
> On Wednesday, October 4, 2023 4:05 AM Stanislav Fomichev <sdf@google.com> wrote:
> >This change actually defines the (initial) metadata layout that should be used by
> >AF_XDP userspace (xsk_tx_metadata).
> >The first field is flags which requests appropriate offloads, followed by the offload-
> >specific fields. The supported per-device offloads are exported via netlink (new
> >xsk-flags).
> >
> >The offloads themselves are still implemented in a bit of a framework-y fashion
> >that's left from my initial kfunc attempt.
> >I'm introducing new xsk_tx_metadata_ops which drivers are supposed to
> >implement. The drivers are also supposed to call
> >xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places. Since
> >xsk_tx_metadata_{request,_complete}
> >are static inline, we don't incur any extra overhead doing indirect calls.
> >
> >The benefit of this scheme is as follows:
> >- keeps all metadata layout parsing away from driver code
> >- makes it easy to grep and see which drivers implement what
> >- don't need any extra flags to maintain to keep track of what
> >  offloads are implemented; if the callback is implemented - the offload
> >  is supported (used by netlink reporting code)
> >
> >Two offloads are defined right now:
> >1. XDP_TX_METADATA_CHECKSUM: skb-style csum_start+csum_offset 2.
> >XDP_TX_METADATA_TIMESTAMP: writes TX timestamp back into metadata
> >   area upon completion (tx_timestamp field)
> >
> >The offloads are also implemented for copy mode:
> >1. Extra XDP_TX_METADATA_CHECKSUM_SW to trigger skb_checksum_help; this
> >   might be useful as a reference implementation and for testing 2.
> >XDP_TX_METADATA_TIMESTAMP writes SW timestamp from the skb
> >   destructor (note I'm reusing hwtstamps to pass metadata pointer)
> >
> >The struct is forward-compatible and can be extended in the future by appending
> >more fields.
> >
> >Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >---
> > Documentation/netlink/specs/netdev.yaml | 19 ++++++
> > include/linux/netdevice.h               | 27 +++++++++
> > include/linux/skbuff.h                  | 14 ++++-
> > include/net/xdp_sock.h                  | 80 +++++++++++++++++++++++++
> > include/net/xdp_sock_drv.h              | 13 ++++
> > include/net/xsk_buff_pool.h             |  6 ++
> > include/uapi/linux/if_xdp.h             | 40 +++++++++++++
> > include/uapi/linux/netdev.h             | 16 +++++
> > net/core/netdev-genl.c                  | 12 +++-
> > net/xdp/xsk.c                           | 39 ++++++++++++
> > net/xdp/xsk_queue.h                     |  2 +-
> > tools/include/uapi/linux/if_xdp.h       | 54 +++++++++++++++--
> > tools/include/uapi/linux/netdev.h       | 16 +++++
> > tools/net/ynl/generated/netdev-user.c   | 19 ++++++
> > tools/net/ynl/generated/netdev-user.h   |  3 +
> > 15 files changed, 352 insertions(+), 8 deletions(-)
> >
> >diff --git a/Documentation/netlink/specs/netdev.yaml
> >b/Documentation/netlink/specs/netdev.yaml
> >index c46fcc78fc04..3735c26c8646 100644
> >--- a/Documentation/netlink/specs/netdev.yaml
> >+++ b/Documentation/netlink/specs/netdev.yaml
> >@@ -55,6 +55,19 @@ name: netdev
> >         name: hash
> >         doc:
> >           Device is capable of exposing receive packet hash via
> >bpf_xdp_metadata_rx_hash().
> >+  -
> >+    type: flags
> >+    name: xsk-flags
> >+    render-max: true
> >+    entries:
> >+      -
> >+        name: tx-timestamp
> >+        doc:
> >+          HW timestamping egress packets is supported by the driver.
> >+      -
> >+        name: tx-checksum
> >+        doc:
> >+          L3 checksum HW offload is supported by the driver.
> >
> > attribute-sets:
> >   -
> >@@ -88,6 +101,11 @@ name: netdev
> >         type: u64
> >         enum: xdp-rx-metadata
> >         enum-as-flags: true
> >+      -
> >+        name: xsk-features
> >+        doc: Bitmask of enabled AF_XDP features.
> >+        type: u64
> >+        enum: xsk-flags
> >
> > operations:
> >   list:
> >@@ -105,6 +123,7 @@ name: netdev
> >             - xdp-features
> >             - xdp-zc-max-segs
> >             - xdp-rx-metadata-features
> >+            - xsk-features
> >       dump:
> >         reply: *dev-all
> >     -
> >diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index
> >7e520c14eb8c..0e1cb026cbe5 100644
> >--- a/include/linux/netdevice.h
> >+++ b/include/linux/netdevice.h
> >@@ -1650,6 +1650,31 @@ struct net_device_ops {
> > 						    struct netlink_ext_ack
> >*extack);  };
> >
> >+/*
> >+ * This structure defines the AF_XDP TX metadata hooks for network devices.
> >+ * The following hooks can be defined; unless noted otherwise, they are
> >+ * optional and can be filled with a null pointer.
> >+ *
> >+ * int (*tmo_request_timestamp)(void *priv)

[..]
 
> Should be "void" instead of "int"
> 
> >+ *     This function is called when AF_XDP frame requested egress timestamp.
> >+ *
> >+ * int (*tmo_fill_timestamp)(void *priv)
> 
> Should be "u64" instead of "int"
> 
> >+ *     This function is called when AF_XDP frame, that had requested
> >+ *     egress timestamp, received a completion. The hook needs to return
> >+ *     the actual HW timestamp.
> >+ *
> >+ * int (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv)
> 
> Should be "void" instead of "int"

Oh, good catch, will update these doc entries!

> >+ *     This function is called when AF_XDP frame requested HW checksum
> >+ *     offload. csum_start indicates position where checksumming should start.
> >+ *     csum_offset indicates position where checksum should be stored.
> >+ *
> >+ */
> >+struct xsk_tx_metadata_ops {
> >+	void	(*tmo_request_timestamp)(void *priv);
> >+	u64	(*tmo_fill_timestamp)(void *priv);
> >+	void	(*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void
> >*priv);
> >+};
> >+
> > /**
> >  * enum netdev_priv_flags - &struct net_device priv_flags
> >  *
> >@@ -1838,6 +1863,7 @@ enum netdev_ml_priv_type {
> >  *	@netdev_ops:	Includes several pointers to callbacks,
> >  *			if one wants to override the ndo_*() functions
> >  *	@xdp_metadata_ops:	Includes pointers to XDP metadata callbacks.
> >+ *	@xsk_tx_metadata_ops:	Includes pointers to AF_XDP TX
> >metadata callbacks.
> >  *	@ethtool_ops:	Management operations
> >  *	@l3mdev_ops:	Layer 3 master device operations
> >  *	@ndisc_ops:	Includes callbacks for different IPv6 neighbour
> >@@ -2097,6 +2123,7 @@ struct net_device {
> > 	unsigned long long	priv_flags;
> > 	const struct net_device_ops *netdev_ops;
> > 	const struct xdp_metadata_ops *xdp_metadata_ops;
> >+	const struct xsk_tx_metadata_ops *xsk_tx_metadata_ops;
> > 	int			ifindex;
> > 	unsigned short		gflags;
> > 	unsigned short		hard_header_len;
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index
> >4174c4b82d13..444d35dcd690 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -566,6 +566,15 @@ struct ubuf_info_msgzc {  int
> >mm_account_pinned_pages(struct mmpin *mmp, size_t size);  void
> >mm_unaccount_pinned_pages(struct mmpin *mmp);
> >
> >+/* Preserve some data across TX submission and completion.
> >+ *
> >+ * Note, this state is stored in the driver. Extending the layout
> >+ * might need some special care.
> >+ */
> >+struct xsk_tx_metadata_compl {
> >+	__u64 *tx_timestamp;
> >+};
> >+
> > /* This data is invariant across clones and lives at
> >  * the end of the header data, ie. at skb->end.
> >  */
> >@@ -578,7 +587,10 @@ struct skb_shared_info {
> > 	/* Warning: this field is not always filled in (UFO)! */
> > 	unsigned short	gso_segs;
> > 	struct sk_buff	*frag_list;
> >-	struct skb_shared_hwtstamps hwtstamps;
> >+	union {
> >+		struct skb_shared_hwtstamps hwtstamps;
> >+		struct xsk_tx_metadata_compl xsk_meta;
> >+	};
> > 	unsigned int	gso_type;
> > 	u32		tskey;
> >
> >diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index
> >caa1f04106be..29427a69784d 100644
> >--- a/include/net/xdp_sock.h
> >+++ b/include/net/xdp_sock.h
> >@@ -92,6 +92,74 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff
> >*xdp);  int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp);  void
> >__xsk_map_flush(void);
> >
> >+/**
> >+ *  xsk_tx_metadata_to_compl - Save enough relevant metadata
> >+information
> >+ *  to perform tx completion in the future.
> >+ *  @meta: pointer to AF_XDP metadata area
> >+ *  @compl: pointer to output struct xsk_tx_metadata_to_compl
> >+ *
> >+ *  This function should be called by the networking device when
> >+ *  it prepares AF_XDP egress packet. The value of @compl should be
> >+stored
> >+ *  and passed to xsk_tx_metadata_complete upon TX completion.
> >+ */
> >+static inline void xsk_tx_metadata_to_compl(struct xsk_tx_metadata *meta,
> >+					    struct xsk_tx_metadata_compl
> >*compl) {
> >+	if (!meta)
> >+		return;
> >+
> >+	if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
> >+		compl->tx_timestamp = &meta->completion.tx_timestamp;
> >+	else
> >+		compl->tx_timestamp = NULL;
> >+}
> >+
> >+/**
> >+ *  xsk_tx_metadata_request - Evaluate AF_XDP TX metadata at submission
> >+ *  and call appropriate xsk_tx_metadata_ops operation.
> >+ *  @meta: pointer to AF_XDP metadata area
> >+ *  @ops: pointer to struct xsk_tx_metadata_ops
> >+ *  @priv: pointer to driver-private aread
> >+ *
> >+ *  This function should be called by the networking device when
> >+ *  it prepares AF_XDP egress packet.
> >+ */
> >+static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
> >+					   const struct xsk_tx_metadata_ops
> >*ops,
> >+					   void *priv)
> >+{
> >+	if (!meta)
> >+		return;
> >+
> >+	if (ops->tmo_request_timestamp)
> >+		if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
> >+			ops->tmo_request_timestamp(priv);
> >+
> >+	if (ops->tmo_request_checksum)
> >+		if (meta->flags & XDP_TX_METADATA_CHECKSUM)
> >+			ops->tmo_request_checksum(meta->csum_start, meta-
> >>csum_offset,
> >+priv); }
> >+
> >+/**
> >+ *  xsk_tx_metadata_complete - Evaluate AF_XDP TX metadata at
> >+completion
> >+ *  and call appropriate xsk_tx_metadata_ops operation.
> >+ *  @compl: pointer to completion metadata produced from
> >+xsk_tx_metadata_to_compl
> >+ *  @ops: pointer to struct xsk_tx_metadata_ops
> >+ *  @priv: pointer to driver-private aread
> >+ *
> >+ *  This function should be called by the networking device upon
> >+ *  AF_XDP egress completion.
> >+ */
> >+static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl
> >*compl,
> >+					    const struct xsk_tx_metadata_ops
> >*ops,
> >+					    void *priv)
> >+{
> >+	if (!compl)
> >+		return;
> >+
> >+	*compl->tx_timestamp = ops->tmo_fill_timestamp(priv); }
> >+
> > #else
> >
> > static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) @@ -
> >108,6 +176,18 @@ static inline void __xsk_map_flush(void)  {  }
> >
> >+static inline void xsk_tx_metadata_request(struct xsk_tx_metadata *meta,
> >+					   const struct xsk_tx_metadata_ops
> >*ops,
> >+					   void *priv)
> >+{
> >+}
> >+
> >+static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl
> >*compl,
> >+					    const struct xsk_tx_metadata_ops
> >*ops,
> >+					    void *priv)
> >+{
> >+}
> >+
> > #endif /* CONFIG_XDP_SOCKETS */
> >
> > #endif /* _LINUX_XDP_SOCK_H */
> >diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index
> >1f6fc8c7a84c..e2558ac3e195 100644
> >--- a/include/net/xdp_sock_drv.h
> >+++ b/include/net/xdp_sock_drv.h
> >@@ -165,6 +165,14 @@ static inline void *xsk_buff_raw_get_data(struct
> >xsk_buff_pool *pool, u64 addr)
> > 	return xp_raw_get_data(pool, addr);
> > }
> >
> >+static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct
> >+xsk_buff_pool *pool, u64 addr) {
> >+	if (!pool->tx_metadata_len)
> >+		return NULL;
> >+
> >+	return xp_raw_get_data(pool, addr) - pool->tx_metadata_len; }
> >+
> > static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct
> >xsk_buff_pool *pool)  {
> > 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
> >@@ -324,6 +332,11 @@ static inline void *xsk_buff_raw_get_data(struct
> >xsk_buff_pool *pool, u64 addr)
> > 	return NULL;
> > }
> >
> >+static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct
> >+xsk_buff_pool *pool, u64 addr) {
> >+	return NULL;
> >+}
> >+
> > static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct
> >xsk_buff_pool *pool)  {  } diff --git a/include/net/xsk_buff_pool.h
> >b/include/net/xsk_buff_pool.h index 1985ffaf9b0c..97f5cc10d79e 100644
> >--- a/include/net/xsk_buff_pool.h
> >+++ b/include/net/xsk_buff_pool.h
> >@@ -33,6 +33,7 @@ struct xdp_buff_xsk {
> > };
> >
> > #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct
> >xdp_buff_xsk, cb))
> >+#define XSK_TX_COMPL_FITS(t) BUILD_BUG_ON(sizeof(struct
> >+xsk_tx_metadata_compl) > sizeof(t))
> >
> > struct xsk_dma_map {
> > 	dma_addr_t *dma_pages;
> >@@ -234,4 +235,9 @@ static inline u64 xp_get_handle(struct xdp_buff_xsk *xskb)
> > 	return xskb->orig_addr + (offset <<
> >XSK_UNALIGNED_BUF_OFFSET_SHIFT);  }
> >
> >+static inline bool xp_tx_metadata_enabled(const struct xsk_buff_pool
> >+*pool) {
> >+	return pool->tx_metadata_len > 0;
> >+}
> >+
> > #endif /* XSK_BUFF_POOL_H_ */
> >diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index
> >2ecf79282c26..ecfd67988283 100644
> >--- a/include/uapi/linux/if_xdp.h
> >+++ b/include/uapi/linux/if_xdp.h
> >@@ -106,6 +106,43 @@ struct xdp_options {  #define
> >XSK_UNALIGNED_BUF_ADDR_MASK \
> > 	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> >
> >+/* Request transmit timestamp. Upon completion, put it into
> >+tx_timestamp
> >+ * field of struct xsk_tx_metadata.
> >+ */
> >+#define XDP_TX_METADATA_TIMESTAMP		(1 << 0)

[..]

> Suggestion from checkpatch.pl:
> CHECK: Prefer using the BIT macro
> 
> >+
> >+/* Request transmit checksum offload. Checksum start position and
> >+offset
> >+ * are communicated via csum_start and csum_offset fields of struct
> >+ * xsk_tx_metadata.
> >+ */
> >+#define XDP_TX_METADATA_CHECKSUM		(1 << 1)
> 
> Suggestion from checkpatch.pl:
> CHECK: Prefer using the BIT macro
> 
> >+
> >+/* Force checksum calculation in software. Can be used for testing or
> >+ * working around potential HW issues. This option causes performance
> >+ * degradation and only works in XDP_COPY mode.
> >+ */
> >+#define XDP_TX_METADATA_CHECKSUM_SW		(1 << 2)
> 
> Suggestion from checkpatch.pl:
> CHECK: Prefer using the BIT macro

Will do! Hopefully nothing breaks, let me check...

  reply	other threads:[~2023-10-04 17:48 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-03 20:05 [PATCH bpf-next v3 00/10] xsk: TX metadata Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 01/10] xsk: Support tx_metadata_len Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 02/10] xsk: add TX timestamp and TX checksum offload support Stanislav Fomichev
2023-10-04  6:18   ` Song, Yoong Siang
2023-10-04 17:48     ` Stanislav Fomichev [this message]
2023-10-04 17:56       ` Stanislav Fomichev
2023-10-05  1:16         ` Song, Yoong Siang
2023-10-03 20:05 ` [PATCH bpf-next v3 03/10] tools: ynl: print xsk-features from the sample Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 04/10] net/mlx5e: Implement AF_XDP TX timestamp and checksum offload Stanislav Fomichev
2023-10-04 23:47   ` kernel test robot
2023-10-03 20:05 ` [PATCH bpf-next v3 05/10] net: stmmac: Add Tx HWTS support to XDP ZC Stanislav Fomichev
2023-10-04 23:05   ` kernel test robot
2023-10-04 23:14     ` Stanislav Fomichev
2023-10-06  4:38   ` kernel test robot
2023-10-03 20:05 ` [PATCH bpf-next v3 06/10] selftests/xsk: Support tx_metadata_len Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 07/10] selftests/bpf: Add csum helpers Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 08/10] selftests/bpf: Add TX side to xdp_metadata Stanislav Fomichev
2023-10-03 20:05 ` [PATCH bpf-next v3 09/10] selftests/bpf: Add TX side to xdp_hw_metadata Stanislav Fomichev
2023-10-09  8:12   ` Jesper Dangaard Brouer
2023-10-09 16:37     ` Stanislav Fomichev
2023-10-10 20:40       ` Stanislav Fomichev
2023-10-13  1:13         ` Song, Yoong Siang
2023-10-13 18:47           ` Stanislav Fomichev
2023-10-15 13:28             ` Song, Yoong Siang
2023-10-03 20:05 ` [PATCH bpf-next v3 10/10] xsk: document tx_metadata_len layout Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZR2lNjyyNZO4aQP0@google.com \
    --to=sdf@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@kernel.org \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=toke@kernel.org \
    --cc=willemb@google.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yhs@fb.com \
    --cc=yoong.siang.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.