* [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy.
@ 2026-06-12 0:17 Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 1/5] ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces Kuniyuki Iwashima
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
We have some hosts where packets come from special hardware
and are provided directly to userspace, bypassing the kernel
networking stack.
When standard socket applications are run on these hosts,
a userspace proxy is required to mediate traffic between the
hardware and the applications.
+---------+ +----------------------+
| proxy | | socket application |
+---------+ +----------------------+
^ ^ ^
userspace | | |
-----------| |-----------------------------------------------
| | | +---------------------+ | skb
| | `--->| virtual interface |<---'
kernel | | skb +---------------------+
-----------| |-----------------------------------------------
|
v
+------------+
| hardware |
+------------+
However, even though the hardware fully supports timestamping,
the HW timestamps are not directly accessible to the socket
applications because the skb is consumed/injected by the proxy.
This series extends ethtool and adds BPF kfuncs to transparently
support HW timestamping in such a setup.
Patch 1 is a pure net-next patch to advertise fake timestamping
capability on virtual interfaces (e.g. ipvlan, geneve, etc).
Patch 2 is a misc cleanup.
Patches 3 & 4 add kfuncs to proxy RX/TX hwtstamp.
Patch 5 is a selftest to demonstrate how it works.
Note the test requires this iproute2 commit:
https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=c9a9f12aa619288fd3d4e16bc4b3c73b655a4efe
Kuniyuki Iwashima (5):
ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces.
bpf: Rename bpf_kfunc_set_tcp_reqsk to bpf_kfunc_set_sched_cls.
bpf: Add bpf_skb_set_hwtstamp().
bpf: Add kfunc to proxy TX HW Timestamp.
selftest: bpf: Add test for hwtstamp proxy.
Documentation/netlink/specs/ethtool.yaml | 13 +
include/linux/filter.h | 2 +
include/linux/netdevice.h | 11 +
include/linux/skbuff.h | 13 +
include/net/tcx.h | 1 +
include/uapi/linux/bpf.h | 1 +
.../uapi/linux/ethtool_netlink_generated.h | 1 +
include/uapi/linux/pkt_cls.h | 3 +-
kernel/bpf/verifier.c | 13 +-
net/core/dev.c | 39 ++
net/core/dev_ioctl.c | 31 +-
net/core/filter.c | 91 ++-
net/ethtool/common.c | 4 +
net/ethtool/netlink.c | 8 +
net/ethtool/netlink.h | 1 +
net/ethtool/tsconfig.c | 7 +-
net/ethtool/tsinfo.c | 123 +++-
tools/testing/selftests/bpf/bpf_kfuncs.h | 10 +
.../selftests/bpf/prog_tests/proxy_hwtstamp.c | 580 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/proxy_hwtstamp.c | 234 +++++++
21 files changed, 1169 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/proxy_hwtstamp.c
create mode 100644 tools/testing/selftests/bpf/progs/proxy_hwtstamp.c
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH v1 bpf-next/net 1/5] ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces.
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
@ 2026-06-12 0:17 ` Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 2/5] bpf: Rename bpf_kfunc_set_tcp_reqsk to bpf_kfunc_set_sched_cls Kuniyuki Iwashima
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
Before enabling SO_TIMESTAMPING, applications typically try to
enable hardware timestamping on network interfaces via SIOCSHWTSTAMP
(or ETHTOOL_MSG_TSCONFIG_SET).
The timestamping capability on an interface can be checked via
ETHTOOL_MSG_TSINFO_GET:
# ethtool -T eth0
Time stamping parameters for eth0:
Capabilities:
hardware-transmit
software-transmit
hardware-receive
software-receive
software-system-clock
hardware-raw-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes:
off
on
Hardware Receive Filter Modes:
none
all
These operations rely on the driver implementing two callbacks,
dev->netdev_ops->ndo_hwtstamp_{get,set}().
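For reference, the same capability check can be done from C with the
legacy ETHTOOL_GET_TS_INFO ioctl, which reports the same fields as
"ethtool -T". This is only a minimal sketch: the helper names
ts_info_has_hw_txrx() and query_ts_info() are made up for illustration,
and "usable" is assumed to mean hardware TX/RX plus the "on" and "all"
modes shown above.

```c
#define _GNU_SOURCE
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: do the reported capabilities allow enabling
 * both TX and RX hardware timestamping?
 */
bool ts_info_has_hw_txrx(const struct ethtool_ts_info *info)
{
	const __u32 need = SOF_TIMESTAMPING_TX_HARDWARE |
			   SOF_TIMESTAMPING_RX_HARDWARE;

	return (info->so_timestamping & need) == need &&
	       (info->tx_types & (1U << HWTSTAMP_TX_ON)) &&
	       (info->rx_filters & (1U << HWTSTAMP_FILTER_ALL));
}

/* Hypothetical helper: fetch the capabilities of @ifname.
 * Returns 0 on success, -1 on error.
 */
int query_ts_info(const char *ifname, struct ethtool_ts_info *info)
{
	struct ifreq ifr;
	int fd, ret;

	if (strlen(ifname) >= IFNAMSIZ)
		return -1;

	memset(info, 0, sizeof(*info));
	info->cmd = ETHTOOL_GET_TS_INFO;

	memset(&ifr, 0, sizeof(ifr));
	strcpy(ifr.ifr_name, ifname);
	ifr.ifr_data = (void *)info;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);
	return ret;
}
```

An application would call query_ts_info("eth0", &info) first and only
attempt SIOCSHWTSTAMP when ts_info_has_hw_txrx(&info) is true.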
However, among all virtual network interfaces, only bond and
macvlan currently implement them.
As a result, most virtual interfaces cannot advertise the
capabilities of their underlying devices:
# ip link add ipvl0 link eth0 type ipvlan mode l2 bridge
# ethtool -T ipvl0
Time stamping parameters for ipvl0:
Capabilities:
software-receive
software-system-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes: none
Hardware Receive Filter Modes: none
While these callbacks could be implemented in each virtual
interface, this approach is limited to those directly linked
to a physical device.
Not all virtual interfaces are directly linked to a physical device,
though. For instance, UDP tunnel devices are not, yet their packets
eventually pass through physical devices and can be
hardware-timestamped there.
Let's allow configuring the hardware timestamping capability on
virtual interfaces via ETHTOOL_MSG_TSINFO_SET.
Note that SOF_TIMESTAMPING_RX_SOFTWARE and SOF_TIMESTAMPING_SOFTWARE
are automatically added since __ethtool_get_ts_info() and
ethnl_tsinfo_end_dump() report them for all devices.
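To make the resulting state easier to reason about, here is a small
userspace model of what ethnl_tsinfo_set() in this patch does with the
three bitmaps once they have been parsed. It is a sketch, not the
kernel code: struct tsinfo_model and tsinfo_model_apply() are
illustrative names, and the "nothing changed" early exit of the real
function is omitted.

```c
#include <linux/net_tstamp.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the tsinfo state added to struct net_device. */
struct tsinfo_model {
	uint32_t cfg_tx_type;	/* currently selected HWTSTAMP_TX_* */
	uint32_t cfg_rx_filter;	/* currently selected HWTSTAMP_FILTER_* */
	uint32_t so_timestamping;
	uint32_t tx_types;
	uint32_t rx_filters;
	bool enabled;
};

/* Apply a new capability set, following the rules in the patch. */
void tsinfo_model_apply(struct tsinfo_model *m, uint32_t so_ts,
			uint32_t tx_types, uint32_t rx_filters)
{
	/* All-zero capabilities disable the feature and reset the state. */
	if (!so_ts && !tx_types && !rx_filters) {
		*m = (struct tsinfo_model){ 0 };
		return;
	}

	/* These two flags are reported for every device anyway. */
	so_ts |= SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;

	/* If the current mode is no longer advertised, fall back to
	 * "off" / "none" and make sure that mode is advertised.
	 */
	if (!(tx_types & (1U << m->cfg_tx_type))) {
		tx_types |= 1U << HWTSTAMP_TX_OFF;
		m->cfg_tx_type = HWTSTAMP_TX_OFF;
	}
	if (!(rx_filters & (1U << m->cfg_rx_filter))) {
		rx_filters |= 1U << HWTSTAMP_FILTER_NONE;
		m->cfg_rx_filter = HWTSTAMP_FILTER_NONE;
	}

	m->so_timestamping = so_ts;
	m->tx_types = tx_types;
	m->rx_filters = rx_filters;
	m->enabled = true;
}
```

In this model, requesting only "on" / "all" on a fresh device still ends
up advertising "off" / "none", because the current mode (0) would
otherwise not be covered.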
By configuring this capability, ioctl(SIOCSHWTSTAMP) (hwstamp_ctl
below) can enable TX/RX hardware timestamping successfully:
# ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do tsinfo-set \
--json '{"header": {"dev-index": 6},
"timestamping": {"nomask": true, "bits": {
"bit": [{"name": "hardware-transmit"},
{"name": "software-transmit"},
{"name": "hardware-receive"}]
}},
"tx-types": {"nomask": true, "bits": {
"bit": [{"name": "off"}, {"name": "on"}]
}},
"rx-filters": {"nomask": true, "bits": {
"bit" : [{"name": "none"}, {"name": "all"}]
}}}'
# ethtool -T ipvl0
Time stamping parameters for ipvl0:
Capabilities:
hardware-transmit
software-transmit
hardware-receive
software-receive
software-system-clock
PTP Hardware Clock: none
Hardware Transmit Timestamp Modes:
off
on
Hardware Receive Filter Modes:
none
all
# hwstamp_ctl -i ipvl0 -t 1 -r 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 1
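For applications that do not shell out to hwstamp_ctl, the equivalent
request looks roughly like the sketch below. hwtstamp_fill_req() and
hwtstamp_enable() are illustrative names, not existing helpers, and the
caller is assumed to have CAP_NET_ADMIN. Note that flags is left at 0,
since the virtual-interface path in this patch rejects non-zero flags
with -EINVAL.

```c
#define _GNU_SOURCE
#include <linux/net_tstamp.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Prepare @ifr and @cfg for SIOCSHWTSTAMP.
 * Returns 0, or -1 if @ifname does not fit into ifr_name.
 */
int hwtstamp_fill_req(struct ifreq *ifr, struct hwtstamp_config *cfg,
		      const char *ifname, int tx_type, int rx_filter)
{
	if (strlen(ifname) >= IFNAMSIZ)
		return -1;

	memset(cfg, 0, sizeof(*cfg));	/* keeps cfg->flags == 0 */
	cfg->tx_type = tx_type;
	cfg->rx_filter = rx_filter;

	memset(ifr, 0, sizeof(*ifr));
	strcpy(ifr->ifr_name, ifname);
	ifr->ifr_data = (void *)cfg;
	return 0;
}

/* Same effect as "hwstamp_ctl -i <ifname> -t 1 -r 1".
 * Returns 0 on success, -1 on error.
 */
int hwtstamp_enable(const char *ifname)
{
	struct hwtstamp_config cfg;
	struct ifreq ifr;
	int fd, ret;

	if (hwtstamp_fill_req(&ifr, &cfg, ifname,
			      HWTSTAMP_TX_ON, HWTSTAMP_FILTER_ALL))
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, SIOCSHWTSTAMP, &ifr);
	close(fd);
	return ret;
}
```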
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
Documentation/netlink/specs/ethtool.yaml | 13 ++
include/linux/netdevice.h | 11 ++
.../uapi/linux/ethtool_netlink_generated.h | 1 +
net/core/dev_ioctl.c | 31 ++++-
net/ethtool/common.c | 4 +
net/ethtool/netlink.c | 8 ++
net/ethtool/netlink.h | 1 +
net/ethtool/tsconfig.c | 7 +-
net/ethtool/tsinfo.c | 123 +++++++++++++++++-
9 files changed, 188 insertions(+), 11 deletions(-)
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 5dd4d1b5d94b..2dace12c8b4d 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -2859,6 +2859,19 @@ operations:
- worst-channel
- link
dump: *mse-get-op
+ -
+ name: tsinfo-set
+ doc: Set tsinfo params.
+
+ attribute-set: tsinfo
+
+ do: &tsinfo-set-op
+ request:
+ attributes:
+ - header
+ - timestamping
+ - tx-types
+ - rx-filters
mcast-groups:
list:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74507c006490..2693161d4168 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2087,6 +2087,7 @@ enum netdev_reg_state {
* offload capabilities of the device
* @udp_tunnel_nic: UDP tunnel offload state
* @ethtool: ethtool related state
+ * @tsinfo: HW timestamping capability for virtual devices
* @xdp_state: stores info on attached XDP BPF programs
*
* @nested_level: Used as a parameter of spin_lock_nested() of
@@ -2509,6 +2510,16 @@ struct net_device {
*/
struct netdev_config *cfg_pending;
struct ethtool_netdev_state *ethtool;
+ struct {
+ struct {
+ u32 tx_type;
+ u32 rx_filter;
+ } cfg;
+ u32 so_timestamping;
+ u32 tx_types;
+ u32 rx_filters;
+ bool enabled;
+ } tsinfo;
/* protected by rtnl_lock */
struct bpf_xdp_entity xdp_state[__MAX_XDP_MODE];
diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
index 8134baf7860f..5aaa3bf9a073 100644
--- a/include/uapi/linux/ethtool_netlink_generated.h
+++ b/include/uapi/linux/ethtool_netlink_generated.h
@@ -893,6 +893,7 @@ enum {
ETHTOOL_MSG_RSS_CREATE_ACT,
ETHTOOL_MSG_RSS_DELETE_ACT,
ETHTOOL_MSG_MSE_GET,
+ ETHTOOL_MSG_TSINFO_SET,
__ETHTOOL_MSG_USER_CNT,
ETHTOOL_MSG_USER_MAX = (__ETHTOOL_MSG_USER_CNT - 1)
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index f3979b276090..3ecc57fd75c9 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -260,6 +260,15 @@ int dev_get_hwtstamp_phylib(struct net_device *dev,
{
struct hwtstamp_provider *hwprov;
+ if (!dev->netdev_ops->ndo_hwtstamp_get) {
+ if (!dev->tsinfo.enabled)
+ return -EOPNOTSUPP;
+
+ cfg->rx_filter = dev->tsinfo.cfg.rx_filter;
+ cfg->tx_type = dev->tsinfo.cfg.tx_type;
+ return 0;
+ }
+
hwprov = rtnl_dereference(dev->hwprov);
if (hwprov) {
cfg->qualifier = hwprov->desc.qualifier;
@@ -286,7 +295,7 @@ static int dev_get_hwtstamp(struct net_device *dev, struct ifreq *ifr)
struct hwtstamp_config cfg;
int err;
- if (!ops->ndo_hwtstamp_get)
+ if (!ops->ndo_hwtstamp_get && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
if (!netif_device_present(dev))
@@ -337,6 +346,20 @@ int dev_set_hwtstamp_phylib(struct net_device *dev,
bool phy_ts;
int err;
+ if (!ops->ndo_hwtstamp_set) {
+ if (!dev->tsinfo.enabled ||
+ !(dev->tsinfo.tx_types & BIT(cfg->tx_type)) ||
+ !(dev->tsinfo.rx_filters & BIT(cfg->rx_filter)))
+ return -EOPNOTSUPP;
+
+ if (cfg->flags)
+ return -EINVAL;
+
+ dev->tsinfo.cfg.tx_type = cfg->tx_type;
+ dev->tsinfo.cfg.rx_filter = cfg->rx_filter;
+ return 0;
+ }
+
hwprov = rtnl_dereference(dev->hwprov);
if (hwprov) {
if (hwprov->source == HWTSTAMP_SOURCE_PHYLIB &&
@@ -413,7 +436,7 @@ static int dev_set_hwtstamp(struct net_device *dev, struct ifreq *ifr)
return err;
}
- if (!ops->ndo_hwtstamp_set)
+ if (!ops->ndo_hwtstamp_set && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
if (!netif_device_present(dev))
@@ -447,7 +470,7 @@ int generic_hwtstamp_get_lower(struct net_device *dev,
if (!netif_device_present(dev))
return -ENODEV;
- if (!ops->ndo_hwtstamp_get)
+ if (!ops->ndo_hwtstamp_get && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
netdev_lock_ops(dev);
@@ -468,7 +491,7 @@ int generic_hwtstamp_set_lower(struct net_device *dev,
if (!netif_device_present(dev))
return -ENODEV;
- if (!ops->ndo_hwtstamp_set)
+ if (!ops->ndo_hwtstamp_set && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
netdev_lock_ops(dev);
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 84ec88dee05c..bb0c02d92a9b 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -1090,6 +1090,10 @@ int __ethtool_get_ts_info(struct net_device *dev,
err = ops->get_ts_info(dev, info);
if (!err && info->phc_index >= 0)
info->phc_source = HWTSTAMP_SOURCE_NETDEV;
+ } else if (dev->tsinfo.enabled) {
+ info->so_timestamping = dev->tsinfo.so_timestamping;
+ info->tx_types = dev->tsinfo.tx_types;
+ info->rx_filters = dev->tsinfo.rx_filters;
}
info->so_timestamping |= SOF_TIMESTAMPING_RX_SOFTWARE |
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 25e22c48060a..07d1a010b1cc 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -422,6 +422,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
[ETHTOOL_MSG_TSCONFIG_SET] = &ethnl_tsconfig_request_ops,
[ETHTOOL_MSG_PHY_GET] = &ethnl_phy_request_ops,
[ETHTOOL_MSG_MSE_GET] = &ethnl_mse_request_ops,
+ [ETHTOOL_MSG_TSINFO_SET] = &ethnl_tsinfo_request_ops,
};
static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -1548,6 +1549,13 @@ static const struct genl_ops ethtool_genl_ops[] = {
.policy = ethnl_mse_get_policy,
.maxattr = ARRAY_SIZE(ethnl_mse_get_policy) - 1,
},
+ {
+ .cmd = ETHTOOL_MSG_TSINFO_SET,
+ .flags = GENL_UNS_ADMIN_PERM,
+ .doit = ethnl_default_set_doit,
+ .policy = ethnl_tsinfo_set_policy,
+ .maxattr = ARRAY_SIZE(ethnl_tsinfo_set_policy) - 1,
+ },
};
static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 674c9c19529b..7c2a350f8ba4 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -502,6 +502,7 @@ extern const struct nla_policy ethnl_phy_get_policy[ETHTOOL_A_PHY_HEADER + 1];
extern const struct nla_policy ethnl_tsconfig_get_policy[ETHTOOL_A_TSCONFIG_HEADER + 1];
extern const struct nla_policy ethnl_tsconfig_set_policy[ETHTOOL_A_TSCONFIG_MAX + 1];
extern const struct nla_policy ethnl_mse_get_policy[ETHTOOL_A_MSE_HEADER + 1];
+extern const struct nla_policy ethnl_tsinfo_set_policy[ETHTOOL_A_TSINFO_MAX + 1];
int ethnl_set_features(struct sk_buff *skb, struct genl_info *info);
int ethnl_act_cable_test(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/tsconfig.c b/net/ethtool/tsconfig.c
index 664c3fe49b5b..9f387339fc8e 100644
--- a/net/ethtool/tsconfig.c
+++ b/net/ethtool/tsconfig.c
@@ -41,7 +41,7 @@ static int tsconfig_prepare_data(const struct ethnl_req_info *req_base,
struct kernel_hwtstamp_config cfg = {};
int ret;
- if (!dev->netdev_ops->ndo_hwtstamp_get)
+ if (!dev->netdev_ops->ndo_hwtstamp_get && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
ret = ethnl_ops_begin(dev);
@@ -61,7 +61,7 @@ static int tsconfig_prepare_data(const struct ethnl_req_info *req_base,
if (hwprov) {
data->hwprov_desc.index = hwprov->desc.index;
data->hwprov_desc.qualifier = hwprov->desc.qualifier;
- } else {
+ } else if (!dev->tsinfo.enabled) {
struct kernel_ethtool_ts_info ts_info = {};
ts_info.phc_index = -1;
@@ -252,7 +252,8 @@ static int ethnl_set_tsconfig_validate(struct ethnl_req_info *req_base,
{
const struct net_device_ops *ops = req_base->dev->netdev_ops;
- if (!ops->ndo_hwtstamp_set || !ops->ndo_hwtstamp_get)
+ if ((!ops->ndo_hwtstamp_set || !ops->ndo_hwtstamp_get) &&
+ !READ_ONCE(req_base->dev->tsinfo.enabled))
return -EOPNOTSUPP;
return 1;
diff --git a/net/ethtool/tsinfo.c b/net/ethtool/tsinfo.c
index 14bf01e3b55c..c9078aa4a897 100644
--- a/net/ethtool/tsinfo.c
+++ b/net/ethtool/tsinfo.c
@@ -38,6 +38,14 @@ const struct nla_policy ethnl_tsinfo_get_policy[ETHTOOL_A_TSINFO_MAX + 1] = {
NLA_POLICY_NESTED(ethnl_ts_hwtst_prov_policy),
};
+const struct nla_policy ethnl_tsinfo_set_policy[ETHTOOL_A_TSINFO_MAX + 1] = {
+ [ETHTOOL_A_TSINFO_HEADER] =
+ NLA_POLICY_NESTED(ethnl_header_policy_stats),
+ [ETHTOOL_A_TSINFO_TIMESTAMPING] = { .type = NLA_NESTED },
+ [ETHTOOL_A_TSINFO_TX_TYPES] = { .type = NLA_NESTED },
+ [ETHTOOL_A_TSINFO_RX_FILTERS] = { .type = NLA_NESTED },
+};
+
int ts_parse_hwtst_provider(const struct nlattr *nest,
struct hwtstamp_provider_desc *hwprov_desc,
struct netlink_ext_ack *extack,
@@ -390,15 +398,17 @@ static int ethnl_tsinfo_dump_one_netdev(struct sk_buff *skb,
{
struct ethnl_tsinfo_dump_ctx *ctx = (void *)cb->ctx;
const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct kernel_ethtool_ts_info *ts_info;
struct tsinfo_reply_data *reply_data;
struct tsinfo_req_info *req_info;
void *ehdr = NULL;
int ret = 0;
- if (!ops->get_ts_info)
+ if (!ops->get_ts_info && !dev->tsinfo.enabled)
return -EOPNOTSUPP;
reply_data = ctx->reply_data;
+ ts_info = &reply_data->ts_info;
req_info = ctx->req_info;
for (; ctx->pos_phcqualifier < HWTSTAMP_PROVIDER_QUALIFIER_CNT;
ctx->pos_phcqualifier++) {
@@ -411,9 +421,16 @@ static int ethnl_tsinfo_dump_one_netdev(struct sk_buff *skb,
return PTR_ERR(ehdr);
reply_data->ts_info.phc_qualifier = ctx->pos_phcqualifier;
- ret = ops->get_ts_info(dev, &reply_data->ts_info);
- if (ret < 0)
- goto err;
+
+ if (dev->tsinfo.enabled) {
+ ts_info->so_timestamping |= dev->tsinfo.so_timestamping;
+ ts_info->tx_types |= dev->tsinfo.tx_types;
+ ts_info->rx_filters |= dev->tsinfo.rx_filters;
+ } else {
+ ret = ops->get_ts_info(dev, ts_info);
+ if (ret < 0)
+ goto err;
+ }
if (reply_data->ts_info.phc_index >= 0)
reply_data->ts_info.phc_source = HWTSTAMP_SOURCE_NETDEV;
@@ -563,6 +580,101 @@ int ethnl_tsinfo_done(struct netlink_callback *cb)
return 0;
}
+static int ethnl_tsinfo_set_validate(struct ethnl_req_info *req_base,
+ struct genl_info *info)
+{
+ const struct net_device *dev = req_base->dev;
+
+ if (!dev->rtnl_link_ops ||
+ dev->ethtool_ops->get_ts_info ||
+ dev->netdev_ops->ndo_hwtstamp_set ||
+ dev->netdev_ops->ndo_hwtstamp_get)
+ return -EOPNOTSUPP;
+
+ return 1;
+}
+
+static int ethnl_tsinfo_set(struct ethnl_req_info *req_base,
+ struct genl_info *info)
+{
+ struct net_device *dev = req_base->dev;
+ struct kernel_ethtool_ts_info ts_info;
+ struct nlattr **tb = info->attrs;
+ bool config_mod = false;
+ int ret;
+
+ ts_info.so_timestamping = dev->tsinfo.so_timestamping;
+ ts_info.tx_types = dev->tsinfo.tx_types;
+ ts_info.rx_filters = dev->tsinfo.rx_filters;
+
+ if (tb[ETHTOOL_A_TSINFO_TIMESTAMPING]) {
+ ret = ethnl_update_bitset32(&ts_info.so_timestamping,
+ __SOF_TIMESTAMPING_CNT,
+ tb[ETHTOOL_A_TSINFO_TIMESTAMPING],
+ sof_timestamping_names, info->extack,
+ &config_mod);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (tb[ETHTOOL_A_TSINFO_TX_TYPES]) {
+ ret = ethnl_update_bitset32(&ts_info.tx_types,
+ __HWTSTAMP_TX_CNT,
+ tb[ETHTOOL_A_TSINFO_TX_TYPES],
+ ts_tx_type_names, info->extack,
+ &config_mod);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (tb[ETHTOOL_A_TSINFO_RX_FILTERS]) {
+ ret = ethnl_update_bitset32(&ts_info.rx_filters,
+ __HWTSTAMP_FILTER_CNT,
+ tb[ETHTOOL_A_TSINFO_RX_FILTERS],
+ ts_rx_filter_names, info->extack,
+ &config_mod);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (!config_mod)
+ goto out;
+
+ if (!ts_info.so_timestamping &&
+ !ts_info.tx_types && !ts_info.rx_filters) {
+ WRITE_ONCE(dev->tsinfo.enabled, false);
+ memset(&dev->tsinfo, 0, offsetof(typeof(dev->tsinfo), enabled));
+ goto out;
+ }
+
+ /* __ethtool_get_ts_info() and ethnl_tsinfo_end_dump()
+ * unconditionally report these flags.
+ */
+ ts_info.so_timestamping |= SOF_TIMESTAMPING_RX_SOFTWARE |
+ SOF_TIMESTAMPING_SOFTWARE;
+
+ /* Fallback to HWTSTAMP_TX_OFF / HWTSTAMP_FILTER_NONE
+ * if the current mode is not supported.
+ */
+ if (!(ts_info.tx_types & BIT(dev->tsinfo.cfg.tx_type))) {
+ ts_info.tx_types |= BIT(HWTSTAMP_TX_OFF);
+ dev->tsinfo.cfg.tx_type = HWTSTAMP_TX_OFF;
+ }
+ if (!(ts_info.rx_filters & BIT(dev->tsinfo.cfg.rx_filter))) {
+ ts_info.rx_filters |= BIT(HWTSTAMP_FILTER_NONE);
+ dev->tsinfo.cfg.rx_filter = HWTSTAMP_FILTER_NONE;
+ }
+
+ dev->tsinfo.so_timestamping = ts_info.so_timestamping;
+ dev->tsinfo.tx_types = ts_info.tx_types;
+ dev->tsinfo.rx_filters = ts_info.rx_filters;
+
+ WRITE_ONCE(dev->tsinfo.enabled, true);
+out:
+ /* no notification. */
+ return 0;
+}
+
const struct ethnl_request_ops ethnl_tsinfo_request_ops = {
.request_cmd = ETHTOOL_MSG_TSINFO_GET,
.reply_cmd = ETHTOOL_MSG_TSINFO_GET_REPLY,
@@ -574,4 +686,7 @@ const struct ethnl_request_ops ethnl_tsinfo_request_ops = {
.prepare_data = tsinfo_prepare_data,
.reply_size = tsinfo_reply_size,
.fill_reply = tsinfo_fill_reply,
+
+ .set_validate = ethnl_tsinfo_set_validate,
+ .set = ethnl_tsinfo_set,
};
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH v1 bpf-next/net 2/5] bpf: Rename bpf_kfunc_set_tcp_reqsk to bpf_kfunc_set_sched_cls.
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 1/5] ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces Kuniyuki Iwashima
@ 2026-06-12 0:17 ` Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 3/5] bpf: Add bpf_skb_set_hwtstamp() Kuniyuki Iwashima
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
Currently bpf_kfunc_set_tcp_reqsk is registered for
BPF_PROG_TYPE_SCHED_CLS.
We will add more kfuncs there, but the name is too specific.
Let's rename it to bpf_kfunc_set_sched_cls.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/core/filter.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 80439767e0ee..acdc66aa4f27 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -12498,9 +12498,9 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_addr)
BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path)
BTF_KFUNCS_END(bpf_kfunc_check_set_sock_addr)
-BTF_KFUNCS_START(bpf_kfunc_check_set_tcp_reqsk)
+BTF_KFUNCS_START(bpf_kfunc_check_set_sched_cls)
BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
-BTF_KFUNCS_END(bpf_kfunc_check_set_tcp_reqsk)
+BTF_KFUNCS_END(bpf_kfunc_check_set_sched_cls)
BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp)
@@ -12526,9 +12526,9 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = {
.set = &bpf_kfunc_check_set_sock_addr,
};
-static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
+static const struct btf_kfunc_id_set bpf_kfunc_set_sched_cls = {
.owner = THIS_MODULE,
- .set = &bpf_kfunc_check_set_tcp_reqsk,
+ .set = &bpf_kfunc_check_set_sched_cls,
};
static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
@@ -12556,7 +12556,7 @@ static int __init bpf_kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
&bpf_kfunc_set_sock_addr);
- ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_sched_cls);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
}
late_initcall(bpf_kfunc_init);
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH v1 bpf-next/net 3/5] bpf: Add bpf_skb_set_hwtstamp().
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 1/5] ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 2/5] bpf: Rename bpf_kfunc_set_tcp_reqsk to bpf_kfunc_set_sched_cls Kuniyuki Iwashima
@ 2026-06-12 0:17 ` Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 4/5] bpf: Add kfunc to proxy TX HW Timestamp Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 5/5] selftest: bpf: Add test for hwtstamp proxy Kuniyuki Iwashima
4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
We have some hosts where packets come from special hardware
and are provided directly to userspace, bypassing the kernel
networking stack.
When standard socket applications are run on these hosts,
a userspace proxy is required to mediate traffic between the
hardware and the applications.
+---------+ +----------------------+
| proxy | | socket application |
+---------+ +----------------------+
^ ^ ^
userspace | | |
-----------| |-----------------------------------------------
| | | +---------------------+ | skb
| | `--->| virtual interface |<---'
kernel | | skb +---------------------+
-----------| |-----------------------------------------------
|
v
+------------+
| hardware |
+------------+
However, even though the hardware fully supports timestamping,
the HW timestamps are not directly accessible to the socket
applications because the skb is consumed/injected by the proxy.
For the RX flow, let's add a kfunc to update
skb_hwtstamps(skb)->hwtstamp at tc/ingress.
With this kfunc, the proxy can carry the RX hardware timestamp
via encapsulated packets (e.g. in a GENEVE option), and a BPF prog
can extract it into skb_hwtstamps(skb)->hwtstamp at tc/ingress
of the virtual interface above.
+---------+ +----------------------+
| proxy | | socket application |
+---------+ +----------------------+
^ | encap packets ^ recv payload
userspace | | w/ RX hwtstamp | w/ RX hwtstamp
-----------| |-----------------------------------------------
| | | +---------------------+ | skb
| | `--->| geneve0 |----'
kernel | | skb +---------------------+
| | | ^
| | v |
| | +------------------+ extract RX hwtstamp
| | | BPF@tc/ingress | and set it to skb
| | +------------------+
-----------| |-----------------------------------------------
|
|
+------------+
| hardware |
+------------+
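As an illustration of the proxy side, the timestamp could be carried in
a GENEVE option like this. The 4-byte option header layout (class, type,
length in 4-byte words) follows RFC 8926, but everything else is an
assumption made for this sketch: the DEMO_* class/type values, the
8-byte big-endian nanosecond payload, and the function names are not
taken from the series, and the selftest may use a different layout.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OPT_CLASS		0xfff0	/* hypothetical option class */
#define DEMO_OPT_TYPE_HWTSTAMP	0x01	/* hypothetical option type */
#define DEMO_OPT_LEN		12	/* 4-byte header + 8-byte timestamp */

/* Write the option into @buf. Returns the number of bytes written,
 * or 0 if @buf is too small.
 */
size_t demo_opt_put_hwtstamp(uint8_t *buf, size_t len, uint64_t ns)
{
	int i;

	if (len < DEMO_OPT_LEN)
		return 0;

	buf[0] = DEMO_OPT_CLASS >> 8;
	buf[1] = DEMO_OPT_CLASS & 0xff;
	buf[2] = DEMO_OPT_TYPE_HWTSTAMP;
	buf[3] = 2;	/* reserved bits 0, data length = 2 * 4 bytes */

	for (i = 0; i < 8; i++)
		buf[4 + i] = (uint8_t)(ns >> (56 - 8 * i));

	return DEMO_OPT_LEN;
}

/* Parse the option at @buf. Returns 0 and stores the timestamp in @ns,
 * or -1 if this is not a well-formed timestamp option.
 */
int demo_opt_get_hwtstamp(const uint8_t *buf, size_t len, uint64_t *ns)
{
	uint64_t v = 0;
	int i;

	if (len < DEMO_OPT_LEN)
		return -1;
	if (buf[0] != (DEMO_OPT_CLASS >> 8) ||
	    buf[1] != (DEMO_OPT_CLASS & 0xff) ||
	    buf[2] != DEMO_OPT_TYPE_HWTSTAMP ||
	    (buf[3] & 0x1f) != 2)
		return -1;

	for (i = 0; i < 8; i++)
		v = (v << 8) | buf[4 + i];

	*ns = v;
	return 0;
}
```

A BPF prog at tc/ingress would perform the same decode on the option
and pass the value to bpf_skb_set_hwtstamp().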
This allows transparently proxying RX hardware timestamps to
the socket applications via SCM_TIMESTAMPING.
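On the application side nothing changes; the timestamp shows up in the
usual control message. A minimal sketch of the lookup, assuming the
socket enabled SO_TIMESTAMPING with RX hardware and raw-hardware
reporting (msg_get_hwtstamp() is an illustrative name):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <linux/errqueue.h>

/* Look for SCM_TIMESTAMPING in the control data returned by recvmsg().
 * Returns 0 and stores the raw hardware timestamp (ts[2]) in @hw,
 * or -1 if the message carries no hardware timestamp.
 */
int msg_get_hwtstamp(struct msghdr *msg, struct timespec *hw)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		struct scm_timestamping tss;

		if (cmsg->cmsg_level != SOL_SOCKET ||
		    cmsg->cmsg_type != SCM_TIMESTAMPING)
			continue;

		if (cmsg->cmsg_len < CMSG_LEN(sizeof(tss)))
			return -1;

		memcpy(&tss, CMSG_DATA(cmsg), sizeof(tss));

		/* An all-zero ts[2] means no hardware timestamp. */
		if (!tss.ts[2].tv_sec && !tss.ts[2].tv_nsec)
			return -1;

		*hw = tss.ts[2];
		return 0;
	}

	return -1;
}
```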
Note that bpf_skb_set_hwtstamp() calls skb_header_unclone() and
bpf_compute_data_pointers(), so it is marked as a packet-changing
kfunc.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/skbuff.h | 5 +++++
kernel/bpf/verifier.c | 9 ++++++++-
net/core/filter.c | 23 +++++++++++++++++++++++
3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 115db8c44db2..b4ac1180f5a8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4701,6 +4701,11 @@ static inline bool skb_defer_rx_timestamp(struct sk_buff *skb)
#endif /* !CONFIG_NETWORK_PHY_TIMESTAMPING */
+struct bpf_hwtstamp {
+ ktime_t hwtstamp;
+ u64 reserved;
+} __packed;
+
/**
* skb_complete_tx_timestamp() - deliver cloned skb with tx timestamps
*
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7fb88e1cd7c4..6b23577d001a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11191,6 +11191,7 @@ enum special_kfunc_type {
KF_bpf_session_is_return,
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
+ KF_bpf_skb_set_hwtstamp,
};
BTF_ID_LIST(special_kfunc_list)
@@ -11283,6 +11284,11 @@ BTF_ID_UNUSED
#endif
BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
+#ifdef CONFIG_NET
+BTF_ID(func, bpf_skb_set_hwtstamp)
+#else
+BTF_ID_UNUSED
+#endif
static bool is_bpf_obj_new_kfunc(u32 func_id)
{
@@ -11364,7 +11370,8 @@ static bool is_kfunc_bpf_preempt_enable(struct bpf_kfunc_call_arg_meta *meta)
bool bpf_is_kfunc_pkt_changing(struct bpf_kfunc_call_arg_meta *meta)
{
- return meta->func_id == special_kfunc_list[KF_bpf_xdp_pull_data];
+ return meta->func_id == special_kfunc_list[KF_bpf_xdp_pull_data] ||
+ meta->func_id == special_kfunc_list[KF_bpf_skb_set_hwtstamp];
}
static enum kfunc_ptr_arg_type
diff --git a/net/core/filter.c b/net/core/filter.c
index acdc66aa4f27..ab7adef9c015 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -12372,6 +12372,28 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
return 0;
}
+__bpf_kfunc int bpf_skb_set_hwtstamp(struct __sk_buff *s,
+ struct bpf_hwtstamp *attrs, int attrs__sz)
+{
+ struct sk_buff *skb = (struct sk_buff *)s;
+
+ if (attrs__sz != sizeof(*attrs) || attrs->reserved)
+ return -EINVAL;
+
+ if (!skb_at_tc_ingress(skb))
+ return -EINVAL;
+
+ if (skb_header_unclone(skb, GFP_ATOMIC))
+ return -ENOMEM;
+
+ skb_clear_tstamp(skb);
+ skb_hwtstamps(skb)->hwtstamp = attrs->hwtstamp;
+
+ bpf_compute_data_pointers(skb);
+
+ return 0;
+}
+
/**
* bpf_xdp_pull_data() - Pull in non-linear xdp data.
* @x: &xdp_md associated with the XDP buffer
@@ -12500,6 +12522,7 @@ BTF_KFUNCS_END(bpf_kfunc_check_set_sock_addr)
BTF_KFUNCS_START(bpf_kfunc_check_set_sched_cls)
BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
+BTF_ID_FLAGS(func, bpf_skb_set_hwtstamp)
BTF_KFUNCS_END(bpf_kfunc_check_set_sched_cls)
BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH v1 bpf-next/net 4/5] bpf: Add kfunc to proxy TX HW Timestamp.
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
` (2 preceding siblings ...)
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 3/5] bpf: Add bpf_skb_set_hwtstamp() Kuniyuki Iwashima
@ 2026-06-12 0:17 ` Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 5/5] selftest: bpf: Add test for hwtstamp proxy Kuniyuki Iwashima
4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
In the setup mentioned in the previous patch, it is impossible
for socket applications to get TX hardware timestamps via
SCM_TIMESTAMPING.
To proxy TX hardware timestamp, let's add two kfuncs:
* bpf_skb_scrub_tx_tstamp() : scrub skb_shinfo(skb)->tx_flags
* bpf_skb_complete_tx_tstamp() : enqueue skb to sk->sk_error_queue
The key idea is to regenerate an skb that contains all the
information required for the TX timestamp, identical to the
original skb.
Here is how it works:
When the socket application sends a packet, the BPF prog at tc/egress
checks skb_shinfo()->tx_flags. If it has SKBTX_HW_TSTAMP_NOBPF, the
BPF prog scrubs the value with bpf_skb_scrub_tx_tstamp() and inserts
a GENEVE option to signal that the packet wants a TX HW timestamp.
The proxy decapsulates and forwards the packet to the hardware,
and if it has the GENEVE option, the proxy keeps the original packet
until TX completion.
+---------+ +----------------------+
| proxy | | socket application |
+---------+ +----------------------+
| ^ decap packet and |
userspace | | keep it till TX cmpl |
-----------| |-----------------------------------------------
| | | +---------------------+ | skb
| | `----| geneve0 |<---'
kernel | | skb +---------------------+
| | ^ |
| | | v
| | +------------------+ check skb_shinfo()->tx_flags
| | | BPF@tc/egress | and insert a GENEVE option
| | +------------------+
-----------| |-----------------------------------------------
|
v
+------------+
| hardware |
+------------+
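For context, the tx_flags bit checked at tc/egress is set when the
sending socket has requested TX hardware timestamps. A typical request
looks like the sketch below; the helper names are illustrative, and
SOF_TIMESTAMPING_OPT_ID is an assumption about what the application
wants (a key to match reports to sends), not something this series
requires.

```c
#define _GNU_SOURCE
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Flags an application would typically pass to SO_TIMESTAMPING
 * to get TX hardware timestamps.
 */
unsigned int tx_hwtstamp_sockopt_flags(void)
{
	return SOF_TIMESTAMPING_TX_HARDWARE |	/* generate a TX HW timestamp */
	       SOF_TIMESTAMPING_RAW_HARDWARE |	/* report it in ts[2] */
	       SOF_TIMESTAMPING_OPT_ID;		/* tag each report with a key */
}

/* Enable TX hardware timestamping on @fd.
 * Returns 0 on success, -1 on error.
 */
int enable_tx_hwtstamp(int fd)
{
	unsigned int flags = tx_hwtstamp_sockopt_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```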
Once the proxy gets the TX hwtstamp, it encapsulates the original
packet with the TX hwtstamp embedded in a GENEVE option and sends it
to the GENEVE device.
At tc/ingress, BPF extracts the TX hwtstamp and sets it to skb.
Then, it looks up the sender socket, assigns it to skb->sk,
calls bpf_skb_complete_tx_tstamp(), and returns TCX_ERRQUEUE to
put the skb on skb->sk->sk_error_queue.
        +---------+                 +----------------------+
        |  proxy  |                 |  socket application  |
        +---------+                 +----------------------+
            ^   |   encap packet                ^ get TX hwtstamp by
 userspace  |   |   w/ TX hwtstamp              | recvmsg(MSG_ERRQUEUE)
-----------| |-----------------------------------------------
           | |  |    +---------------------+    | skb
           | |  `--->|       geneve0       |    |
 kernel    | |  skb  +---------------------+    |
           | |             |            ________'
           | |             v           |    extract TX hwtstamp to skb
           | |        +------------------+  and look up the sender sk
           | |        |  BPF@tc/ingress  |  and enqueue skb to its
           | |        +------------------+  sk->sk_error_queue
-----------| |-----------------------------------------------
            |
            |  TX completion w/ TX hwtstamp
     +------------+
     |  hardware  |
     +------------+
This provides transparent TX HW timestamp support, and the socket
application can finally receive it via recvmsg(MSG_ERRQUEUE).
Note that struct bpf_tx_tstamp_cmpl needs network_offset and
payload_offset so that
1. ip_cmsg_recv() and ipv6_recv_error() can correctly parse
the IPv4/IPv6 header for some control messages
2. applications can receive the original payload
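As a worked example of the two offsets, the sketch below computes
them for an untagged Ethernet frame carrying UDP over IPv4 or IPv6
without IP options or extension headers. It is a userspace model
only: struct cmpl_offsets and both helpers are hypothetical, and the
validity check merely imitates the bounds checks that
bpf_skb_complete_tx_tstamp() performs on payload_offset and on the
gap between the two offsets.

```c
#include <stdint.h>

#define SKETCH_ETH_HLEN		14	/* Ethernet header, no VLAN tag */
#define SKETCH_IPV4_HLEN	20	/* IPv4 header without options */
#define SKETCH_IPV6_HLEN	40	/* fixed IPv6 header */
#define SKETCH_UDP_HLEN		8
#define SKETCH_ETH_P_IP		0x0800
#define SKETCH_ETH_P_IPV6	0x86DD

/* Hypothetical mirror of the offset fields in struct bpf_tx_tstamp_cmpl. */
struct cmpl_offsets {
	uint16_t network_offset;	/* start of the IPv4/IPv6 header */
	uint16_t payload_offset;	/* start of the application payload */
};

/* Length of the L3 header for a supported EtherType, or 0 otherwise. */
static uint16_t sketch_l3_hlen(uint16_t ethertype)
{
	switch (ethertype) {
	case SKETCH_ETH_P_IP:
		return SKETCH_IPV4_HLEN;
	case SKETCH_ETH_P_IPV6:
		return SKETCH_IPV6_HLEN;
	default:
		return 0;
	}
}

/* Fill *off for an Ethernet + IP + UDP frame; return -1 if unsupported. */
static int cmpl_offsets_for_udp(uint16_t ethertype, struct cmpl_offsets *off)
{
	uint16_t l3_hlen = sketch_l3_hlen(ethertype);

	if (!l3_hlen)
		return -1;

	off->network_offset = SKETCH_ETH_HLEN;
	off->payload_offset = SKETCH_ETH_HLEN + l3_hlen + SKETCH_UDP_HLEN;
	return 0;
}

/* The payload must start inside the frame and leave room for the L3 header. */
static int cmpl_offsets_ok(const struct cmpl_offsets *off,
			   uint16_t ethertype, uint32_t frame_len)
{
	uint16_t l3_hlen = sketch_l3_hlen(ethertype);
	int32_t delta = (int32_t)off->payload_offset - off->network_offset;

	return l3_hlen && off->payload_offset <= frame_len && delta >= l3_hlen;
}
```

For IPv4 this gives network_offset 14 and payload_offset 42; for IPv6
it gives 14 and 62, the same values the selftest ends up with by
starting both fields at sizeof(struct ethhdr).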
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
include/linux/filter.h | 2 ++
include/linux/skbuff.h | 8 +++++
include/net/tcx.h | 1 +
include/uapi/linux/bpf.h | 1 +
include/uapi/linux/pkt_cls.h | 3 +-
kernel/bpf/verifier.c | 6 +++-
net/core/dev.c | 39 ++++++++++++++++++++++++
net/core/filter.c | 58 ++++++++++++++++++++++++++++++++++++
8 files changed, 116 insertions(+), 2 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 88a241aac36a..59097bfd8522 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -770,6 +770,7 @@ struct bpf_nh_params {
#define BPF_RI_F_CPU_MAP_INIT BIT(2)
#define BPF_RI_F_DEV_MAP_INIT BIT(3)
#define BPF_RI_F_XSK_MAP_INIT BIT(4)
+#define BPF_RI_F_TX_TS_CMPL BIT(5)
struct bpf_redirect_info {
u64 tgt_index;
@@ -780,6 +781,7 @@ struct bpf_redirect_info {
enum bpf_map_type map_type;
struct bpf_nh_params nh;
u32 kern_flags;
+ struct bpf_tx_tstamp_cmpl txtscmpl;
};
struct bpf_net_context {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b4ac1180f5a8..bd9343288928 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4706,6 +4706,14 @@ struct bpf_hwtstamp {
u64 reserved;
} __packed;
+struct bpf_tx_tstamp_cmpl {
+ u32 tskey;
+ __be16 protocol;
+ u16 network_offset;
+ u16 payload_offset;
+ u16 reserved;
+} __packed;
+
/**
* skb_complete_tx_timestamp() - deliver cloned skb with tx timestamps
*
diff --git a/include/net/tcx.h b/include/net/tcx.h
index 23a61af13547..052e751d907e 100644
--- a/include/net/tcx.h
+++ b/include/net/tcx.h
@@ -151,6 +151,7 @@ static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
fallthrough;
case TCX_DROP:
case TCX_REDIRECT:
+ case TCX_ERRQUEUE:
return code;
case TCX_NEXT:
default:
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..60950aa583aa 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6532,6 +6532,7 @@ enum tcx_action_base {
TCX_PASS = 0,
TCX_DROP = 2,
TCX_REDIRECT = 7,
+ TCX_ERRQUEUE = 9,
};
struct bpf_xdp_sock {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 28d94b11d1aa..337f1bdbabb6 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -76,7 +76,8 @@ enum {
* the skb and act like everything
* is alright.
*/
-#define TC_ACT_VALUE_MAX TC_ACT_TRAP
+#define TC_ACT_ERRQUEUE 9
+#define TC_ACT_VALUE_MAX TC_ACT_ERRQUEUE
/* There is a special kind of actions called "extended actions",
* which need a value parameter. These have a local opcode located in
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6b23577d001a..5451a19847ec 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11192,6 +11192,7 @@ enum special_kfunc_type {
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
KF_bpf_skb_set_hwtstamp,
+ KF_bpf_skb_scrub_tx_tstamp,
};
BTF_ID_LIST(special_kfunc_list)
@@ -11286,8 +11287,10 @@ BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
#ifdef CONFIG_NET
BTF_ID(func, bpf_skb_set_hwtstamp)
+BTF_ID(func, bpf_skb_scrub_tx_tstamp)
#else
BTF_ID_UNUSED
+BTF_ID_UNUSED
#endif
static bool is_bpf_obj_new_kfunc(u32 func_id)
@@ -11371,7 +11374,8 @@ static bool is_kfunc_bpf_preempt_enable(struct bpf_kfunc_call_arg_meta *meta)
bool bpf_is_kfunc_pkt_changing(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_xdp_pull_data] ||
- meta->func_id == special_kfunc_list[KF_bpf_skb_set_hwtstamp];
+ meta->func_id == special_kfunc_list[KF_bpf_skb_set_hwtstamp] ||
+ meta->func_id == special_kfunc_list[KF_bpf_skb_scrub_tx_tstamp];
}
static enum kfunc_ptr_arg_type
diff --git a/net/core/dev.c b/net/core/dev.c
index 1ecd5691992e..6f39e613cbbd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4457,6 +4457,41 @@ tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb,
return tcx_action_code(skb, ret);
}
+static int skb_do_completion(struct sk_buff *skb)
+{
+ enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
+ struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
+ struct bpf_tx_tstamp_cmpl *txtscmpl;
+
+ if (!(ri->kern_flags & BPF_RI_F_TX_TS_CMPL))
+ goto drop;
+
+ if (skb_header_unclone(skb, GFP_ATOMIC))
+ goto drop;
+
+ __skb_push(skb, skb->mac_len);
+
+ txtscmpl = &ri->txtscmpl;
+
+ drop_reason = pskb_may_pull_reason(skb, txtscmpl->payload_offset);
+ if (drop_reason)
+ goto drop;
+
+ skb->protocol = txtscmpl->protocol;
+ skb_set_network_header(skb, txtscmpl->network_offset);
+ __skb_pull(skb, txtscmpl->payload_offset);
+
+ skb_shinfo(skb)->tskey = txtscmpl->tskey;
+ skb_shinfo(skb)->tx_flags = SKBTX_HW_TSTAMP_NOBPF;
+ __skb_tstamp_tx(skb, NULL, skb_hwtstamps(skb), skb->sk, SCM_TSTAMP_SND);
+
+ consume_skb(skb);
+ return NET_RX_SUCCESS;
+drop:
+ kfree_skb_reason(skb, drop_reason);
+ return NET_RX_DROP;
+}
+
static __always_inline struct sk_buff *
sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
struct net_device *orig_dev, bool *another)
@@ -4505,6 +4540,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
*ret = NET_RX_DROP;
bpf_net_ctx_clear(bpf_net_ctx);
return NULL;
+ case TC_ACT_ERRQUEUE:
+ *ret = skb_do_completion(skb);
+ bpf_net_ctx_clear(bpf_net_ctx);
+ return NULL;
/* used by tc_run */
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
diff --git a/net/core/filter.c b/net/core/filter.c
index ab7adef9c015..0bb8122f9f2e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -12394,6 +12394,62 @@ __bpf_kfunc int bpf_skb_set_hwtstamp(struct __sk_buff *s,
return 0;
}
+__bpf_kfunc int bpf_skb_scrub_tx_tstamp(struct __sk_buff *s)
+{
+ struct sk_buff *skb = (struct sk_buff *)s;
+
+ if (skb_at_tc_ingress(skb))
+ return -EINVAL;
+
+ if (skb_header_unclone(skb, GFP_ATOMIC))
+ return -ENOMEM;
+
+ skb_shinfo(skb)->tx_flags = 0;
+
+ bpf_compute_data_pointers(skb);
+
+ return 0;
+}
+
+__bpf_kfunc int bpf_skb_complete_tx_tstamp(struct __sk_buff *s,
+ struct bpf_tx_tstamp_cmpl *attrs,
+ int attrs__sz)
+{
+ struct sk_buff *skb = (struct sk_buff *)s;
+ struct bpf_redirect_info *ri;
+ struct sock *sk = skb->sk;
+ s32 delta;
+
+ if (attrs__sz != sizeof(*attrs) || attrs->reserved)
+ return -EINVAL;
+
+ if (!sk || !sk_fullsock(sk))
+ return -EINVAL;
+
+ if (attrs->payload_offset > skb->len)
+ return -EINVAL;
+
+ delta = attrs->payload_offset - attrs->network_offset;
+ switch (attrs->protocol) {
+ case htons(ETH_P_IP):
+ if (delta < (s32)sizeof(struct iphdr) || !sk_is_inet(sk))
+ return -EINVAL;
+ break;
+ case htons(ETH_P_IPV6):
+ if (delta < (s32)sizeof(struct ipv6hdr) || sk->sk_family != AF_INET6)
+ return -EINVAL;
+ break;
+ default:
+ return -EAFNOSUPPORT;
+ }
+
+ ri = bpf_net_ctx_get_ri();
+ ri->kern_flags |= BPF_RI_F_TX_TS_CMPL;
+ ri->txtscmpl = *attrs;
+
+ return 0;
+}
+
/**
* bpf_xdp_pull_data() - Pull in non-linear xdp data.
* @x: &xdp_md associated with the XDP buffer
@@ -12523,6 +12579,8 @@ BTF_KFUNCS_END(bpf_kfunc_check_set_sock_addr)
BTF_KFUNCS_START(bpf_kfunc_check_set_sched_cls)
BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk)
BTF_ID_FLAGS(func, bpf_skb_set_hwtstamp)
+BTF_ID_FLAGS(func, bpf_skb_scrub_tx_tstamp)
+BTF_ID_FLAGS(func, bpf_skb_complete_tx_tstamp)
BTF_KFUNCS_END(bpf_kfunc_check_set_sched_cls)
BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
--
2.54.0.1136.gdb2ca164c4-goog
* [PATCH v1 bpf-next/net 5/5] selftest: bpf: Add test for hwtstamp proxy.
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
` (3 preceding siblings ...)
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 4/5] bpf: Add kfunc to proxy TX HW Timestamp Kuniyuki Iwashima
@ 2026-06-12 0:17 ` Kuniyuki Iwashima
4 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12 0:17 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
Kumar Kartikeya Dwivedi, Eduard Zingerman
Cc: Song Liu, Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Willem de Bruijn, Kuniyuki Iwashima, Kuniyuki Iwashima, bpf,
netdev
This selftest simulates the hardware timestamp proxy scenario mentioned
in the previous commits using two UDP sockets.
Here, app_fd represents a standard socket application, and proxy_fd
simulates a userspace proxy that receives and injects encapsulated
packets from/to app_fd via a GENEVE device (geneve0).
TX:
1. app_fd sends data w/ SCM_TS_OPT_ID
2. BPF prog hooks at tc/egress of geneve0
3. BPF inserts the GENEVE option with Type 0x1 to save SCM_TS_OPT_ID
4. proxy_fd receives the encapsulated packet
5. proxy changes the option Type to 0x2 and sets TX hwtstamp
6. proxy sends it back to geneve0
7. BPF prog hooks at tc/ingress of geneve0
8. BPF extracts TX hwtstamp into skb
9. BPF looks up the app_fd socket
10. BPF enqueues skb to app_fd's sk->sk_error_queue
11. app_fd receives TX hwtstamp and verifies the value
RX:
12. proxy_fd generates RX packet from TX packet
by swapping src/dst in each header
13. proxy changes the option Type to 0x3 and sets RX hwtstamp
14. proxy sends the encapsulated packet to geneve0
15. BPF prog hooks at tc/ingress of geneve0
16. BPF extracts RX hwtstamp into skb
17. app_fd receives RX hwtstamp and verifies the value
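Step 12 can be sketched on a raw buffer as below. This is a
simplified illustration, not the selftest code: swap_bytes() and
tx_frame_to_rx_ipv4_udp() are hypothetical helpers that handle only
an untagged inner Ethernet frame with IPv4 (no options) and UDP.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange n bytes between two non-overlapping regions. */
static void swap_bytes(uint8_t *a, uint8_t *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint8_t tmp = a[i];

		a[i] = b[i];
		b[i] = tmp;
	}
}

/*
 * Turn a TX frame into the matching RX frame in place.
 * Layout assumed: |ETH (14)|IPv4 (20)|UDP (8)|payload|
 * Returns 0 on success, -1 if the frame does not match that layout.
 */
static int tx_frame_to_rx_ipv4_udp(uint8_t *frame, size_t len)
{
	uint8_t *ip = frame + 14;
	uint8_t *udp = ip + 20;

	if (len < 14 + 20 + 8)
		return -1;
	if (frame[12] != 0x08 || frame[13] != 0x00)	/* EtherType IPv4 */
		return -1;
	if ((ip[0] >> 4) != 4 || (ip[0] & 0x0f) != 5)	/* version, IHL */
		return -1;
	if (ip[9] != 17)				/* protocol UDP */
		return -1;

	swap_bytes(frame, frame + 6, 6);	/* dst MAC <-> src MAC */
	swap_bytes(ip + 12, ip + 16, 4);	/* saddr <-> daddr */
	swap_bytes(udp, udp + 2, 2);		/* sport <-> dport */
	return 0;
}
```

Swapping the address and port pairs leaves the IPv4 header checksum
and the UDP checksum unchanged, because the one's-complement sum does
not depend on the order of the summed words.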
The GENEVE TLV option is structured as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Class | Type |0|0|0| Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ HW Timestamp (8 bytes) +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp key (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type:
- 0x1: TX packet
- 0x2: TX completion packet w/ TX hwtstamp
- 0x3: RX packet w/ RX hwtstamp
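The layout above can be sketched as a 16-byte buffer: a 4-byte option
header followed by 12 bytes of data, so the Length field (counted in
4-byte units, excluding the option header) is 3. The helpers below are
hypothetical and only illustrate the encoding. They assume, as the
selftest's packed struct does, that the timestamp and the key are
stored in host byte order while the option class is in network byte
order.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define SKETCH_OPT_CLASS	0x9009	/* value used by the selftest */
#define SKETCH_OPT_HDR_LEN	4
#define SKETCH_OPT_DATA_LEN	12	/* 8-byte hwtstamp + 4-byte tskey */
#define SKETCH_OPT_TOTAL_LEN	(SKETCH_OPT_HDR_LEN + SKETCH_OPT_DATA_LEN)

enum {
	SKETCH_OPT_TYPE_TX	= 1,
	SKETCH_OPT_TYPE_TX_CMPL	= 2,
	SKETCH_OPT_TYPE_RX	= 3,
};

/* Serialize the option into buf, which must hold SKETCH_OPT_TOTAL_LEN bytes. */
static void hwtstamp_opt_encode(uint8_t *buf, uint8_t type,
				int64_t hwtstamp, uint32_t tskey)
{
	uint16_t opt_class = htons(SKETCH_OPT_CLASS);

	memcpy(buf, &opt_class, sizeof(opt_class));
	buf[2] = type;
	/* Three reserved bits are zero; low 5 bits hold the length. */
	buf[3] = SKETCH_OPT_DATA_LEN / 4;
	memcpy(buf + 4, &hwtstamp, sizeof(hwtstamp));
	memcpy(buf + 12, &tskey, sizeof(tskey));
}

/* Parse the option; return -1 if the class or length does not match. */
static int hwtstamp_opt_decode(const uint8_t *buf, uint8_t *type,
			       int64_t *hwtstamp, uint32_t *tskey)
{
	uint16_t opt_class;

	memcpy(&opt_class, buf, sizeof(opt_class));
	if (ntohs(opt_class) != SKETCH_OPT_CLASS ||
	    (buf[3] & 0x1f) != SKETCH_OPT_DATA_LEN / 4)
		return -1;

	*type = buf[2];
	memcpy(hwtstamp, buf + 4, sizeof(*hwtstamp));
	memcpy(tskey, buf + 12, sizeof(*tskey));
	return 0;
}
```

A proxy that turns a TX packet into a TX completion would only
rewrite the type byte and the timestamp, leaving the key untouched.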
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
tools/testing/selftests/bpf/bpf_kfuncs.h | 10 +
.../selftests/bpf/prog_tests/proxy_hwtstamp.c | 580 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/proxy_hwtstamp.c | 234 +++++++
4 files changed, 825 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/proxy_hwtstamp.c
create mode 100644 tools/testing/selftests/bpf/progs/proxy_hwtstamp.c
diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index 7dad01439391..8d119b10ed0d 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -92,4 +92,14 @@ extern int bpf_set_dentry_xattr(struct dentry *dentry, const char *name__str,
const struct bpf_dynptr *value_p, int flags) __ksym __weak;
extern int bpf_remove_dentry_xattr(struct dentry *dentry, const char *name__str) __ksym __weak;
+extern int bpf_skb_scrub_tx_tstamp(struct __sk_buff *s) __ksym __weak;
+
+struct bpf_hwtstamp;
+extern int bpf_skb_set_hwtstamp(struct __sk_buff *s,
+ struct bpf_hwtstamp *attrs, int attrs__sz) __ksym __weak;
+
+struct bpf_tx_tstamp_cmpl;
+extern int bpf_skb_complete_tx_tstamp(struct __sk_buff *s,
+ struct bpf_tx_tstamp_cmpl *attrs,
+ int attrs__sz) __ksym __weak;
#endif
diff --git a/tools/testing/selftests/bpf/prog_tests/proxy_hwtstamp.c b/tools/testing/selftests/bpf/prog_tests/proxy_hwtstamp.c
new file mode 100644
index 000000000000..d0f90f22bea2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/proxy_hwtstamp.c
@@ -0,0 +1,580 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2026 Google LLC */
+
+#include <sys/epoll.h>
+#include <net/if.h>
+#include <linux/errqueue.h>
+#include <linux/net_tstamp.h>
+
+#include "test_progs.h"
+#include <network_helpers.h>
+#include "proxy_hwtstamp.skel.h"
+
+#define swap(a, b) \
+ do { \
+ typeof(a) __tmp = (a); \
+ (a) = (b); \
+ (b) = __tmp; \
+ } while (0)
+
+#define swap_array(a, b) \
+ do { \
+ char __tmp[sizeof(a)]; \
+ memcpy(__tmp, a, sizeof(a)); \
+ memcpy(a, b, sizeof(a)); \
+ memcpy(b, __tmp, sizeof(a)); \
+ } while (0)
+
+struct genevehdr {
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ u8 opt_len:6;
+ u8 ver:2;
+ u8 rsvd1:6;
+ u8 critical:1;
+ u8 oam:1;
+#else
+ u8 ver:2;
+ u8 opt_len:6;
+ u8 oam:1;
+ u8 critical:1;
+ u8 rsvd1:6;
+#endif
+ __be16 proto_type;
+ u8 vni[3];
+ u8 rsvd2;
+};
+
+struct geneve_opt {
+ __be16 opt_class;
+ u8 type;
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ u8 length:5;
+ u8 r3:1;
+ u8 r2:1;
+ u8 r1:1;
+#else
+ u8 r1:1;
+ u8 r2:1;
+ u8 r3:1;
+ u8 length:5;
+#endif
+};
+
+struct proxy_header {
+ struct genevehdr geneve;
+ struct geneve_opt geneve_opt;
+ s64 hwtstamp;
+ u32 tskey;
+ struct ethhdr eth;
+ union {
+ struct {
+ struct iphdr ip;
+ struct udphdr udp;
+ } v4;
+ struct {
+ struct ipv6hdr ip;
+ struct udphdr udp;
+ } v6;
+ };
+} __attribute__((packed));
+
+#define GENEVE_VNI 0x900913
+#define GENEVE_OPT_CLASS 0x9009
+#define GENEVE_OPT_LEN ((sizeof(struct proxy_hwtstamp_opt) \
+ - sizeof(struct geneve_opt)) / 4)
+enum {
+ GENEVE_OPT_TYPE_TX = 1,
+ GENEVE_OPT_TYPE_TX_CMPL = 2,
+ GENEVE_OPT_TYPE_RX = 3,
+};
+
+#define APP_DST_IPV4 "192.168.0.1"
+#define APP_DST_IPV6 "2001:db7::92"
+
+#define GENEVE_PORT 6081
+#define APP_SRC_IPV4 "10.0.3.1"
+#define APP_SRC_IPV6 "2001:db8::1"
+
+#define HWTSTAMP 0x12345678
+#define TSKEY 0xaabbccdd
+
+static struct proxy_hwtstamp_test_case {
+ char name[8];
+ int family;
+ char geneve_remote_ip[16];
+ char geneve_local_ip[16];
+ char app_dst_ip[16];
+ int app_dst_port;
+ int encap_payload_len;
+
+ /* fields below are populated during test. */
+ struct proxy_hwtstamp *skel;
+ struct netns_obj *netns;
+ struct sockaddr_storage geneve_remote_addr;
+ struct sockaddr_storage geneve_local_addr;
+ socklen_t addrlen;
+ int proxy_fd;
+ int app_fd;
+#define APP_PAYLOAD_LEN 512
+ char app_payload[APP_PAYLOAD_LEN];
+ char encap_payload[APP_PAYLOAD_LEN + sizeof(struct proxy_header)];
+} test_cases[] = {
+ {
+ .name = "IPv4",
+ .family = AF_INET,
+ .geneve_remote_ip = "127.0.0.1",
+ .geneve_local_ip = APP_SRC_IPV4,
+ .app_dst_ip = APP_DST_IPV4,
+ .app_dst_port = 443,
+ .encap_payload_len = APP_PAYLOAD_LEN + offsetofend(struct proxy_header, v4),
+ },
+ {
+ .name = "IPv6",
+ .family = AF_INET6,
+ .geneve_remote_ip = "::1",
+ .geneve_local_ip = APP_SRC_IPV6,
+ .app_dst_ip = APP_DST_IPV6,
+ .app_dst_port = 443,
+ .encap_payload_len = APP_PAYLOAD_LEN + offsetofend(struct proxy_header, v6),
+ },
+};
+
+char *ipv4_commands[] = {
+ "ip link set dev lo up",
+ "ip link add geneve0 type geneve local " APP_SRC_IPV4 " external",
+ "ip addr add " APP_SRC_IPV4 "/24 dev geneve0",
+ "ip link set dev geneve0 address aa:bb:cc:dd:ee:ff",
+ "ip link set dev geneve0 up",
+ "ip route add " APP_DST_IPV4 "/32 dev geneve0",
+ /* We do not forward ARP to the wire in this test,
+ * so a static neighbour entry is needed for APP_DST_IPV4.
+ */
+ "ip neigh add " APP_DST_IPV4 " lladdr ab:bc:cd:de:ef:fa dev geneve0",
+};
+
+char *ipv6_commands[] = {
+ "ip link set dev lo up",
+ "ip link add geneve0 type geneve local " APP_SRC_IPV6 " external",
+ "ip -6 addr add " APP_SRC_IPV6 "/32 dev geneve0 nodad",
+ "ip link set dev geneve0 address aa:bb:cc:dd:ee:ff",
+ "ip link set dev geneve0 up",
+ "ip -6 route add " APP_DST_IPV6 "/128 dev geneve0",
+ /* Similarly, APP_DST_IPV6 needs a static neighbour entry */
+ "ip -6 neigh add " APP_DST_IPV6 " lladdr ab:bc:cd:de:ef:fa dev geneve0",
+};
+
+static int setup_netns(struct proxy_hwtstamp_test_case *test_case)
+{
+ int i, array_size, ret;
+ char **commands;
+
+ if (test_case->family == AF_INET) {
+ commands = ipv4_commands;
+ array_size = ARRAY_SIZE(ipv4_commands);
+ } else {
+ commands = ipv6_commands;
+ array_size = ARRAY_SIZE(ipv6_commands);
+ }
+
+ for (i = 0; i < array_size; i++) {
+ ret = system(commands[i]);
+ if (!ASSERT_OK(ret, commands[i]))
+ break;
+ }
+
+ return ret;
+}
+
+static int setup_tcx(struct proxy_hwtstamp_test_case *test_case)
+{
+ struct proxy_hwtstamp *skel = test_case->skel;
+ LIBBPF_OPTS(bpf_tcx_opts, tcx_opts_ingress);
+ LIBBPF_OPTS(bpf_tcx_opts, tcx_opts_egress);
+ struct bpf_link *link;
+ int ifindex;
+
+ ifindex = if_nametoindex("geneve0");
+
+ if (make_sockaddr(test_case->family, test_case->geneve_remote_ip, GENEVE_PORT,
+ &test_case->geneve_remote_addr, &test_case->addrlen))
+ goto err;
+
+ if (make_sockaddr(test_case->family, test_case->geneve_local_ip, GENEVE_PORT,
+ &test_case->geneve_local_addr, &test_case->addrlen))
+ goto err;
+
+ /* Set up struct bpf_tunnel_key for GENEVE.
+ * Note that bpf_skb_set_tunnel_key() expects
+ * IPv4 address in host byte order
+ * IPv6 address in network byte order.
+ */
+ skel->bss->key_dst.tunnel_id = GENEVE_VNI;
+ if (test_case->family == AF_INET) {
+ struct sockaddr_in *addr4;
+
+ addr4 = (struct sockaddr_in *)&test_case->geneve_remote_addr;
+ skel->bss->key_dst.remote_ipv4 = ntohl(addr4->sin_addr.s_addr);
+
+ addr4 = (struct sockaddr_in *)&test_case->geneve_local_addr;
+ skel->bss->key_dst.local_ipv4 = ntohl(addr4->sin_addr.s_addr);
+
+ skel->bss->tunnel_tx_flags = BPF_F_ZERO_CSUM_TX;
+ skel->bss->tunnel_rx_flags = 0;
+ } else {
+ struct sockaddr_in6 *addr6;
+
+ addr6 = (struct sockaddr_in6 *)&test_case->geneve_remote_addr;
+ memcpy(&skel->bss->key_dst.remote_ipv6,
+ &addr6->sin6_addr, sizeof(addr6->sin6_addr));
+
+ addr6 = (struct sockaddr_in6 *)&test_case->geneve_local_addr;
+ memcpy(&skel->bss->key_dst.local_ipv6,
+ &addr6->sin6_addr, sizeof(addr6->sin6_addr));
+
+ /* IPv6 requires BPF_F_TUNINFO_IPV6.
+ * Since udpv6_rcv() drops 0 csum packets unlike udp_rcv()
+ * by default, UDP_NO_CHECK6_RX must be set on the proxy socket.
+ */
+ skel->bss->tunnel_tx_flags = BPF_F_ZERO_CSUM_TX | BPF_F_TUNINFO_IPV6;
+ skel->bss->tunnel_rx_flags = BPF_F_TUNINFO_IPV6;
+ }
+
+ /* Attach BPF progs to egress and ingress. */
+ link = bpf_program__attach_tcx(skel->progs.proxy_hwtstamp_ingress,
+ ifindex, &tcx_opts_ingress);
+ if (!ASSERT_OK_PTR(link, "attach_tcx(ingress)"))
+ goto err;
+
+ skel->links.proxy_hwtstamp_ingress = link;
+
+ link = bpf_program__attach_tcx(skel->progs.proxy_hwtstamp_egress,
+ ifindex, &tcx_opts_egress);
+ if (!ASSERT_OK_PTR(link, "attach_tcx(egress)"))
+ goto err;
+
+ skel->links.proxy_hwtstamp_egress = link;
+
+ return 0;
+err:
+ return -1;
+}
+
+static int setup_fd(struct proxy_hwtstamp_test_case *test_case)
+{
+ int proxy_fd, app_fd;
+ int val, ret;
+
+ proxy_fd = start_server_addr(SOCK_DGRAM, &test_case->geneve_remote_addr,
+ test_case->addrlen, NULL);
+ if (!ASSERT_OK_FD(proxy_fd, "start_server"))
+ goto err;
+
+ if (test_case->family == AF_INET6) {
+ /* udpv6_rcv() drops 0 csum (BPF_F_ZERO_CSUM_TX) packets
+ * unless UDP_NO_CHECK6_RX is set.
+ */
+ val = 1;
+ ret = setsockopt(proxy_fd, SOL_UDP, UDP_NO_CHECK6_RX, &val, sizeof(val));
+ if (!ASSERT_OK(ret, "setsockopt(UDP_NO_CHECK6_RX)"))
+ goto close_proxy;
+ }
+
+ app_fd = connect_to_addr_str(test_case->family, SOCK_DGRAM,
+ test_case->app_dst_ip,
+ test_case->app_dst_port, NULL);
+ if (!ASSERT_OK_FD(app_fd, "connect_to_addr_str"))
+ goto close_proxy;
+
+ val = SOF_TIMESTAMPING_RX_HARDWARE |
+ SOF_TIMESTAMPING_TX_HARDWARE |
+ SOF_TIMESTAMPING_RAW_HARDWARE |
+ SOF_TIMESTAMPING_OPT_ID;
+ ret = setsockopt(app_fd, SOL_SOCKET, SO_TIMESTAMPING_NEW, &val, sizeof(val));
+ if (!ASSERT_OK(ret, "setsockopt(SO_TIMESTAMPING_NEW)"))
+ goto close_app;
+
+ test_case->proxy_fd = proxy_fd;
+ test_case->app_fd = app_fd;
+
+ return 0;
+
+close_app:
+ close(app_fd);
+close_proxy:
+ close(proxy_fd);
+err:
+ return -1;
+}
+
+static void destroy_env(struct proxy_hwtstamp_test_case *test_case)
+{
+ close(test_case->app_fd);
+ close(test_case->proxy_fd);
+ proxy_hwtstamp__destroy(test_case->skel);
+ netns_free(test_case->netns);
+}
+
+static int setup_env(struct proxy_hwtstamp_test_case *test_case)
+{
+ test_case->netns = netns_new("proxy_hwtstamp", true);
+ if (!ASSERT_OK_PTR(test_case->netns, "netns_new"))
+ goto err;
+
+ if (setup_netns(test_case))
+ goto free_netns;
+
+ test_case->skel = proxy_hwtstamp__open_and_load();
+ if (!ASSERT_OK_PTR(test_case->skel, "open_and_load"))
+ goto free_netns;
+
+ if (setup_tcx(test_case))
+ goto destroy_skel;
+
+ if (setup_fd(test_case))
+ goto destroy_skel;
+
+ return 0;
+
+destroy_skel:
+ proxy_hwtstamp__destroy(test_case->skel);
+free_netns:
+ netns_free(test_case->netns);
+err:
+ return -1;
+}
+
+static int wait_data(struct proxy_hwtstamp_test_case *test_case, bool tx)
+{
+ struct epoll_event event = {
+ .events = tx ? EPOLLERR : EPOLLIN,
+ .data.fd = test_case->app_fd,
+ };
+ int epoll_fd;
+ int ret = -1;
+
+ epoll_fd = epoll_create1(0);
+ if (!ASSERT_GE(epoll_fd, 0, "epoll_create1"))
+ goto out;
+
+ ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, test_case->app_fd, &event);
+ if (!ASSERT_OK(ret, "epoll_ctl"))
+ goto close_epoll;
+
+ ret = epoll_wait(epoll_fd, &event, 1, 3000);
+ if (ASSERT_EQ(ret, 1, "epoll_wait"))
+ ret = 0;
+ else
+ ret = -1;
+
+close_epoll:
+ close(epoll_fd);
+out:
+ return ret;
+}
+
+static int check_tstamp(struct proxy_hwtstamp_test_case *test_case, bool tx)
+{
+ char buf_msg[APP_PAYLOAD_LEN * 2], buf_cmsg[1024];
+ bool saw_tstamp = false, saw_tskey = false;
+ struct msghdr msg = {};
+ struct iovec iov = {};
+ struct cmsghdr *cmsg;
+ int ret;
+
+ if (wait_data(test_case, tx))
+ return -1;
+
+ iov.iov_base = buf_msg;
+ iov.iov_len = sizeof(buf_msg);
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = buf_cmsg;
+ msg.msg_controllen = sizeof(buf_cmsg);
+
+ ret = recvmsg(test_case->app_fd, &msg, tx ? MSG_ERRQUEUE : 0);
+
+ if (ret > 0)
+ hexdump(tx ? "tx tstamp " : "rx tstamp ", buf_msg, ret);
+
+ if (!ASSERT_EQ(ret, APP_PAYLOAD_LEN, "recvmsg"))
+ return -1;
+
+ ret = memcmp(buf_msg, test_case->app_payload, sizeof(test_case->app_payload));
+ ASSERT_OK(ret, "memcmp");
+
+ ret = -1;
+
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMPING_NEW) {
+ struct scm_timestamping *ts;
+
+ ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+ ASSERT_EQ(ts->ts[2].tv_sec, 0, "tv_sec");
+ ASSERT_EQ(ts->ts[2].tv_nsec, HWTSTAMP, "tv_nsec");
+
+ saw_tstamp = true;
+ } else if ((cmsg->cmsg_level == SOL_IP && cmsg->cmsg_type == IP_RECVERR) ||
+ (cmsg->cmsg_level == SOL_IPV6 && cmsg->cmsg_type == IPV6_RECVERR)) {
+ struct sock_extended_err *ee;
+
+ ee = (struct sock_extended_err *)CMSG_DATA(cmsg);
+
+ if (ee->ee_origin == SO_EE_ORIGIN_TIMESTAMPING) {
+ ASSERT_EQ(ee->ee_data, TSKEY, "tskey");
+ saw_tskey = true;
+ }
+ }
+ }
+
+ ret = ASSERT_TRUE(saw_tstamp && (!tx || saw_tskey), "timestamp cmsg") ? 0 : -1;
+
+ return ret;
+}
+
+static int test_proxy_hwtstamp_tx(struct proxy_hwtstamp_test_case *test_case)
+{
+ unsigned char h_source_dummy[ETH_ALEN] = {0xFF, 0xEE, 0xDD, 0xCC, 0xBB, 0xAA};
+ char buf_cmsg[CMSG_SPACE(sizeof(u32))];
+ struct proxy_header *phdr;
+ struct msghdr msg = {};
+ struct iovec iov = {};
+ struct cmsghdr *cmsg;
+ int ret;
+
+ memset(test_case->app_payload, 0xAB, sizeof(test_case->app_payload));
+ iov.iov_base = test_case->app_payload;
+ iov.iov_len = sizeof(test_case->app_payload);
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = buf_cmsg;
+ msg.msg_controllen = sizeof(buf_cmsg);
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_TS_OPT_ID;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(u32));
+ *(u32 *)CMSG_DATA(cmsg) = TSKEY;
+
+ ret = sendmsg(test_case->app_fd, &msg, 0);
+ if (!ASSERT_EQ(ret, sizeof(test_case->app_payload), "send"))
+ return -1;
+
+ while (1) {
+ memset(test_case->encap_payload, 0, sizeof(test_case->encap_payload));
+
+ ret = recv(test_case->proxy_fd, test_case->encap_payload,
+ sizeof(test_case->encap_payload), 0);
+ if (ret <= (int)sizeof(phdr->geneve)) {
+ ASSERT_GT(ret, (int)sizeof(phdr->geneve), "recv(tx ingress)");
+ return -1;
+ }
+
+ phdr = (struct proxy_header *)test_case->encap_payload;
+
+ /* In the real world, we forward all packets,
+ * including ARP, NDP, etc, but now we ignore them.
+ * In this test case, we only care about skb with
+ * the GENEVE option, meaning it was sent by app_fd.
+ */
+ if (phdr->geneve.opt_len)
+ break;
+ }
+
+ hexdump("tx payload ", test_case->encap_payload,
+ test_case->encap_payload_len);
+
+ if (!ASSERT_EQ(ret, test_case->encap_payload_len, "encap payload len"))
+ return -1;
+
+ if (!ASSERT_EQ(phdr->tskey, TSKEY, "tskey"))
+ return -1;
+
+ /* Assume we have got TX hwtstamp now.
+ * Reuse the original payload to "regenerate" the
+ * same skb to put into app_fd's sk_error_queue.
+ */
+ phdr->geneve_opt.type = GENEVE_OPT_TYPE_TX_CMPL;
+ phdr->hwtstamp = HWTSTAMP;
+
+ /* GENEVE drops a packet if the outer/inner eth headers
+ * have the same source address. (See geneve_rx())
+ * Work around it by filling a fake address.
+ */
+ swap_array(phdr->eth.h_source, h_source_dummy);
+
+ /* Send the TX completion packet to geneve0. */
+ ret = sendto(test_case->proxy_fd,
+ test_case->encap_payload, test_case->encap_payload_len, 0,
+ (struct sockaddr *)&test_case->geneve_local_addr, test_case->addrlen);
+ if (!ASSERT_EQ(ret, test_case->encap_payload_len, "sendto(tx cmpl)"))
+ return -1;
+
+ swap_array(phdr->eth.h_source, h_source_dummy);
+
+ return check_tstamp(test_case, true);
+}
+
+static int test_proxy_hwtstamp_rx(struct proxy_hwtstamp_test_case *test_case)
+{
+ struct proxy_header *phdr;
+ int ret;
+
+ /* Assume we have received a packet w/ RX hwtstamp.
+ * Generate RX packet by swapping source/dest of the
+ * original TX packet.
+ */
+ phdr = (struct proxy_header *)test_case->encap_payload;
+
+ swap_array(phdr->eth.h_dest, phdr->eth.h_source);
+
+ if (test_case->family == AF_INET) {
+ swap(phdr->v4.ip.daddr, phdr->v4.ip.saddr);
+ swap(phdr->v4.udp.dest, phdr->v4.udp.source);
+ } else {
+ swap(phdr->v6.ip.daddr, phdr->v6.ip.saddr);
+ swap(phdr->v6.udp.dest, phdr->v6.udp.source);
+ }
+
+ /* Embed RX hwtstamp into the GENEVE option. */
+ phdr->geneve_opt.type = GENEVE_OPT_TYPE_RX;
+ phdr->hwtstamp = HWTSTAMP;
+ phdr->tskey = 0;
+
+ /* Send the packet to geneve0. */
+ ret = sendto(test_case->proxy_fd,
+ test_case->encap_payload, test_case->encap_payload_len, 0,
+ (struct sockaddr *)&test_case->geneve_local_addr, test_case->addrlen);
+ if (!ASSERT_EQ(ret, test_case->encap_payload_len, "sendto(rx)"))
+ return -1;
+
+ return check_tstamp(test_case, false);
+}
+
+static void run_test(struct proxy_hwtstamp_test_case *test_case)
+{
+ int ret;
+
+ ret = setup_env(test_case);
+ if (ret)
+ return;
+
+ ret = test_proxy_hwtstamp_tx(test_case);
+ if (!ret)
+ test_proxy_hwtstamp_rx(test_case);
+
+ destroy_env(test_case);
+}
+
+void test_proxy_hwtstamp(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+ if (!test__start_subtest(test_cases[i].name))
+ continue;
+
+ run_test(&test_cases[i]);
+ }
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index d8dacef37c16..77a88dc20a64 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -73,6 +73,7 @@
#define ETH_P_IPV6 0x86DD
#define NEXTHDR_TCP 6
+#define NEXTHDR_UDP 17
#define TCPOPT_NOP 1
#define TCPOPT_EOL 0
diff --git a/tools/testing/selftests/bpf/progs/proxy_hwtstamp.c b/tools/testing/selftests/bpf/progs/proxy_hwtstamp.c
new file mode 100644
index 000000000000..c15428e4c20f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/proxy_hwtstamp.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2026 Google LLC */
+
+#include "vmlinux.h"
+#include <errno.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_tracing_net.h"
+
+struct proxy_hwtstamp_opt {
+ struct geneve_opt header;
+ ktime_t hwtstamp;
+ u32 tskey;
+} __attribute__((packed));
+
+#define GENEVE_VNI 0x900913
+#define GENEVE_OPT_CLASS 0x9009
+#define GENEVE_OPT_LEN ((sizeof(struct proxy_hwtstamp_opt) \
+ - sizeof(struct geneve_opt)) / 4)
+enum {
+ GENEVE_OPT_TYPE_TX = 1,
+ GENEVE_OPT_TYPE_TX_CMPL = 2,
+ GENEVE_OPT_TYPE_RX = 3,
+};
+
+struct bpf_tunnel_key key_dst; /* Populated from userspace for TX encap. */
+int tunnel_tx_flags;
+int tunnel_rx_flags;
+
+SEC("tcx/egress")
+int proxy_hwtstamp_egress(struct __sk_buff *skb)
+{
+ struct skb_shared_info *shared_info;
+ struct proxy_hwtstamp_opt opt = {};
+ struct sk_buff *kskb;
+ int ret;
+
+ /* Outgoing packet will be |ETH|IP|UDP|GENEVE|ETH|IP|UDP|Payload| */
+ ret = bpf_skb_set_tunnel_key(skb, &key_dst, sizeof(key_dst), tunnel_tx_flags);
+ if (ret < 0)
+ goto drop;
+
+ kskb = bpf_cast_to_kern_ctx(skb);
+ shared_info = bpf_core_cast(kskb->head + kskb->end, struct skb_shared_info);
+ if (!shared_info->tx_flags) {
+ /* If TX tstamp is not needed, don't insert the GENEVE option.
+ * The proxy socket will see genevehdr.opt_len == 0.
+ */
+ goto pass;
+ }
+
+ opt.header.opt_class = bpf_htons(GENEVE_OPT_CLASS);
+ opt.header.type = GENEVE_OPT_TYPE_TX;
+ opt.header.length = GENEVE_OPT_LEN;
+ opt.tskey = shared_info->tskey;
+
+ /* Outgoing packet will be |ETH|IP|UDP|GENEVE|GENEVE_OPT|ETH|IP|UDP|Payload| */
+ ret = bpf_skb_set_tunnel_opt(skb, &opt, sizeof(opt));
+ if (ret < 0)
+ goto drop;
+
+ bpf_skb_scrub_tx_tstamp(skb);
+pass:
+ return TCX_PASS;
+drop:
+ return TCX_DROP;
+}
+
+static int proxy_hwtstamp_sk_assign(struct __sk_buff *skb,
+ struct bpf_tx_tstamp_cmpl *attrs)
+{
+ struct bpf_sock_tuple tuple;
+ void *data_end, *data_l4;
+ __be16 *dport, *sport;
+ struct bpf_sock *skc;
+ struct ethhdr *eth;
+ int protocol_l4;
+ int tuple_size;
+ int ret;
+
+ data_end = (void *)(long)skb->data_end;
+ eth = (struct ethhdr *)(long)skb->data;
+
+ if (eth + 1 > data_end)
+ goto drop;
+
+ attrs->protocol = eth->h_proto;
+
+ switch (bpf_ntohs(eth->h_proto)) {
+ case ETH_P_IP: {
+ struct iphdr *ipv4 = (struct iphdr *)(eth + 1);
+
+ if (ipv4 + 1 > data_end)
+ goto drop;
+
+ attrs->payload_offset += sizeof(struct iphdr);
+
+ protocol_l4 = ipv4->protocol;
+ data_l4 = ipv4 + 1;
+
+ /* Swap daddr/saddr since this skb has the original TX headers. */
+ tuple.ipv4.daddr = ipv4->saddr;
+ tuple.ipv4.saddr = ipv4->daddr;
+
+ tuple_size = sizeof(tuple.ipv4);
+ dport = &tuple.ipv4.dport;
+ sport = &tuple.ipv4.sport;
+ break;
+ }
+ case ETH_P_IPV6: {
+ struct ipv6hdr *ipv6 = (struct ipv6hdr *)(eth + 1);
+
+ if (ipv6 + 1 > data_end)
+ goto drop;
+
+ attrs->payload_offset += sizeof(struct ipv6hdr);
+
+ protocol_l4 = ipv6->nexthdr;
+ data_l4 = ipv6 + 1;
+
+ /* Swap daddr/saddr since this skb has the original TX headers. */
+ __builtin_memcpy(tuple.ipv6.daddr, &ipv6->saddr, sizeof(tuple.ipv6.daddr));
+ __builtin_memcpy(tuple.ipv6.saddr, &ipv6->daddr, sizeof(tuple.ipv6.saddr));
+
+ tuple_size = sizeof(tuple.ipv6);
+ dport = &tuple.ipv6.dport;
+ sport = &tuple.ipv6.sport;
+ break;
+ }
+ default:
+ goto drop;
+ }
+
+ switch (protocol_l4) {
+ case IPPROTO_UDP: {
+ struct udphdr *udp = data_l4;
+
+ if (udp + 1 > data_end)
+ goto drop;
+
+ attrs->payload_offset += sizeof(struct udphdr);
+
+ /* Swap sport/dport since this skb has the original TX headers. */
+ *dport = udp->source;
+ *sport = udp->dest;
+
+ skc = bpf_sk_lookup_udp(skb, &tuple, tuple_size, -1, 0);
+ break;
+ }
+ default:
+ goto drop;
+ }
+ if (!skc)
+ goto drop;
+
+ ret = bpf_sk_assign(skb, skc, 0);
+ bpf_sk_release(skc);
+
+ if (ret)
+ goto drop;
+
+ return 0;
+drop:
+ return TCX_DROP;
+}
+
+static int proxy_hwtstamp_tx_completion(struct __sk_buff *skb, u32 tskey)
+{
+ struct bpf_tx_tstamp_cmpl attrs = {
+ .network_offset = sizeof(struct ethhdr),
+ .payload_offset = sizeof(struct ethhdr),
+ .tskey = tskey,
+ };
+ int ret;
+
+ /* Set skb->sk to the socket of the original sender. */
+ ret = proxy_hwtstamp_sk_assign(skb, &attrs);
+ if (ret)
+ return ret;
+
+ ret = bpf_skb_complete_tx_tstamp(skb, &attrs, sizeof(attrs));
+ if (ret)
+ return TCX_DROP;
+
+ return TCX_ERRQUEUE;
+}
+
+SEC("tcx/ingress")
+int proxy_hwtstamp_ingress(struct __sk_buff *skb)
+{
+ struct proxy_hwtstamp_opt opt;
+ struct bpf_tunnel_key key;
+ int ret;
+
+ /* Get the GENEVE header. */
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), tunnel_rx_flags);
+ if (ret < 0)
+ goto drop;
+
+ if (key.tunnel_id != GENEVE_VNI)
+ goto drop;
+
+ /* Get the GENEVE option. */
+ ret = bpf_skb_get_tunnel_opt(skb, &opt, sizeof(opt));
+ if (ret < (int)sizeof(opt)) {
+ /* If TX/RX tstamp is not needed, the proxy socket
+ * does not insert the GENEVE option.
+ */
+ goto pass;
+ }
+
+ if (opt.header.opt_class != bpf_htons(GENEVE_OPT_CLASS) ||
+ opt.header.length != GENEVE_OPT_LEN)
+ goto drop;
+
+ if (opt.header.type == GENEVE_OPT_TYPE_TX_CMPL ||
+ opt.header.type == GENEVE_OPT_TYPE_RX) {
+ struct bpf_hwtstamp attrs = {
+ .hwtstamp = opt.hwtstamp,
+ };
+
+ bpf_skb_set_hwtstamp(skb, &attrs, sizeof(attrs));
+
+ if (opt.header.type == GENEVE_OPT_TYPE_TX_CMPL)
+ return proxy_hwtstamp_tx_completion(skb, opt.tskey);
+ }
+pass:
+ return TCX_PASS;
+drop:
+ return TCX_DROP;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.54.0.1136.gdb2ca164c4-goog
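
For readers following the TX-completion path outside the kernel tree, here is a
minimal userspace sketch of the lookup key that proxy_hwtstamp_sk_assign()
builds. The looped-back skb still carries the original TX headers, so the
addresses and ports are swapped before the socket lookup, and the payload
offset is the sum of the header sizes. The names demo_eth, demo_ipv4, demo_udp,
demo_tuple and demo_reverse_tuple() are hypothetical and not part of this
series. The sketch assumes Ethernet + IPv4 without options + UDP only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel header layouts. */
struct demo_eth  { uint8_t dst[6], src[6]; uint16_t proto; };   /* 14 bytes */
struct demo_ipv4 { uint8_t vhl, tos; uint16_t len, id, frag;
		   uint8_t ttl, protocol; uint16_t csum;
		   uint32_t saddr, daddr; };                    /* 20 bytes */
struct demo_udp  { uint16_t source, dest, len, csum; };        /* 8 bytes */

/* All fields stay in network byte order, as in the frame. */
struct demo_tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

/* Return 0 and fill *t and *payload_off, or -1 if the frame is truncated
 * or is not Ethernet + IPv4 (no options assumed) + UDP.
 */
static int demo_reverse_tuple(const uint8_t *data, size_t len,
			      struct demo_tuple *t, size_t *payload_off)
{
	struct demo_ipv4 ip;
	struct demo_udp udp;
	size_t off = 0;

	if (len < off + sizeof(struct demo_eth))
		return -1;
	/* EtherType 0x0800 (IPv4) sits at bytes 12..13 of the frame. */
	if (data[12] != 0x08 || data[13] != 0x00)
		return -1;
	off += sizeof(struct demo_eth);

	if (len < off + sizeof(ip))
		return -1;
	memcpy(&ip, data + off, sizeof(ip));
	if (ip.protocol != 17)	/* 17 is IPPROTO_UDP */
		return -1;
	off += sizeof(ip);

	if (len < off + sizeof(udp))
		return -1;
	memcpy(&udp, data + off, sizeof(udp));
	off += sizeof(udp);

	/* Swap source and destination: the frame holds the sender's view. */
	t->saddr = ip.daddr;
	t->daddr = ip.saddr;
	t->sport = udp.dest;
	t->dport = udp.source;
	*payload_off = off;
	return 0;
}
```

For a well-formed frame the payload offset comes out as 14 + 20 + 8 bytes, and
the resulting tuple is the one an incoming reply to the sender would carry,
which is why a regular UDP socket lookup can find the original sender.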
Thread overview: 6+ messages
2026-06-12 0:17 [PATCH v1 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 1/5] ethtool: Introduce ETHTOOL_MSG_TSINFO_SET for virtual interfaces Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 2/5] bpf: Rename bpf_kfunc_set_tcp_reqsk to bpf_kfunc_set_sched_cls Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 3/5] bpf: Add bpf_skb_set_hwtstamp() Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 4/5] bpf: Add kfunc to proxy TX HW Timestamp Kuniyuki Iwashima
2026-06-12 0:17 ` [PATCH v1 bpf-next/net 5/5] selftest: bpf: Add test for hwtstamp proxy Kuniyuki Iwashima