hsr: netlink notifications leak across network namespaces

The Linux Kernel Mailing List

* hsr: netlink notifications leak across network namespaces
@ 2026-05-18  6:44 Maoyi Xie
  2026-05-27  7:59 ` [PATCH net] hsr: broadcast netlink notifications in the device's net namespace Maoyi Xie
  0 siblings, 1 reply; 6+ messages in thread
From: Maoyi Xie @ 2026-05-18  6:44 UTC
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2889 bytes --]

Hi all,

(Resending from a personal address. My previous attempt from
my NTU corporate account carried an auto-appended confidentiality
disclaimer that you've declined to accept. The content below is
unchanged.)

I noticed what looks like a cross-namespace leak in
net/hsr/hsr_netlink.c on linux-7.1-rc1. Could you confirm whether
this is a real defect and worth fixing?

# Where it is

net/hsr/hsr_netlink.c line 250 in hsr_nl_ringerror()
net/hsr/hsr_netlink.c line 287 in hsr_nl_nodedown()

Both call genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC).

# What looks wrong

hsr_genl_family is registered with .netnsok = true (line 557),
so HSR devices can live in non-init network namespaces. But
genlmsg_multicast() delivers on the kernel default generic
netlink socket. That socket lives in init_net. Events from any
HSR device end up there, regardless of which network namespace
the device sits in.

Other .netnsok = true networking subsystems use
genlmsg_multicast_netns() instead. Examples:

  drivers/net/gtp.c
  drivers/net/ovpn/netlink.c
  drivers/net/team/team_core.c
  net/batman-adv/netlink.c
  net/core/netdev-genl.c
  net/ethtool/netlink.c
  net/handshake/netlink.c
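
The difference between the two helpers can be illustrated with a
userspace toy model. This is a sketch, not kernel code: struct toy_net,
toy_multicast(), toy_multicast_netns(), toy_nodedown() and toy_run() are
hypothetical stand-ins. The only behavior taken from the report is that
genlmsg_multicast() always delivers in init_net, while
genlmsg_multicast_netns() delivers in the namespace it is handed.

```c
/* Toy stand-in for struct net: it only counts delivered notifications. */
struct toy_net {
	const char *name;
	int delivered;
};

static struct toy_net toy_init_net = { "init_net", 0 };

/* Models genlmsg_multicast_netns(): deliver in the namespace given. */
static void toy_multicast_netns(struct toy_net *net)
{
	net->delivered++;
}

/* Models genlmsg_multicast(): always deliver in init_net. */
static void toy_multicast(void)
{
	toy_multicast_netns(&toy_init_net);
}

/* One node-down event for a device living in dev_net.
 * fixed != 0 selects the per-namespace helper.
 */
static void toy_nodedown(struct toy_net *dev_net, int fixed)
{
	if (fixed)
		toy_multicast_netns(dev_net);
	else
		toy_multicast();
}

/* Replays the scenario: hsr1 lives in init_net, hsr0 in a child netns. */
static void toy_run(int fixed, int *init_count, int *child_count)
{
	struct toy_net child = { "child", 0 };

	toy_init_net.delivered = 0;
	toy_nodedown(&toy_init_net, fixed);	/* hsr1's event */
	toy_nodedown(&child, fixed);		/* hsr0's event */
	*init_count = toy_init_net.delivered;
	*child_count = child.delivered;
}
```

Calling toy_run(0, &a, &b) leaves a = 2 and b = 0, and toy_run(1, &a, &b)
leaves a = 1 and b = 1, which is the same split the real listeners
report in the test section.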

# Two effects

1. A privileged listener in init_net subscribed to the
   "hsr-network" group receives ring-error and node-down events
   for HSR devices in any network namespace. The payload carries
   the peer MAC (HSR_A_NODE_ADDR) and the slave port ifindex
   (HSR_A_IFINDEX). Looks like a cross-namespace information
   leak.

2. A listener inside the HSR device's own network namespace
   never sees its own events. Functional defect for any
   namespaced HSR consumer.
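
To show what a listener can read from such a notification, here is a
minimal attribute walk. It is a sketch under stated assumptions: the
attribute numbers HSR_A_NODE_ADDR = 1 and HSR_A_IFINDEX = 2 should be
checked against include/uapi/linux/hsr_netlink.h in your tree, and
struct hsr_event and hsr_parse_event() are hypothetical names that are
not part of the attached PoC.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Assumed attribute ids; verify against the UAPI header. */
#define HSR_A_NODE_ADDR	1
#define HSR_A_IFINDEX	2
#define MAC_LEN		6

struct hsr_event {
	uint8_t cmd;			/* genetlink command of the event */
	uint8_t node_addr[MAC_LEN];	/* valid if has_addr is set */
	int has_addr;
	uint32_t ifindex;		/* valid if has_ifindex is set */
	int has_ifindex;
};

/* Decode one generic netlink message of len bytes into *ev.
 * Returns 0 on success, -1 if the message is truncated or malformed.
 */
static int hsr_parse_event(const void *msg, size_t len, struct hsr_event *ev)
{
	const struct nlmsghdr *nh = msg;
	const struct genlmsghdr *gh;
	const char *p, *end;

	memset(ev, 0, sizeof(*ev));
	if (len < NLMSG_LENGTH(GENL_HDRLEN) ||
	    nh->nlmsg_len < NLMSG_LENGTH(GENL_HDRLEN) ||
	    nh->nlmsg_len > len)
		return -1;

	gh = (const struct genlmsghdr *)((const char *)nh + NLMSG_HDRLEN);
	ev->cmd = gh->cmd;
	p = (const char *)gh + GENL_HDRLEN;
	end = (const char *)nh + nh->nlmsg_len;

	/* Walk the top-level attributes, copying the two we care about. */
	while (end - p >= (long)NLA_HDRLEN) {
		struct nlattr na;

		memcpy(&na, p, sizeof(na));
		if (na.nla_len < NLA_HDRLEN || na.nla_len > end - p)
			return -1;

		size_t plen = (size_t)(na.nla_len - NLA_HDRLEN);
		int type = na.nla_type & NLA_TYPE_MASK;

		if (type == HSR_A_NODE_ADDR && plen == MAC_LEN) {
			memcpy(ev->node_addr, p + NLA_HDRLEN, MAC_LEN);
			ev->has_addr = 1;
		} else if (type == HSR_A_IFINDEX && plen == sizeof(uint32_t)) {
			memcpy(&ev->ifindex, p + NLA_HDRLEN, sizeof(uint32_t));
			ev->has_ifindex = 1;
		}
		p += NLA_ALIGN(na.nla_len);
	}
	return 0;
}
```

A message that holds only the 6-byte MAC attribute is 16 + 4 + 12 = 32
bytes long (netlink header, genetlink header, padded attribute), which
is consistent with the "cmd=2 len=32" lines in the attached log.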

# How I tested

Attached proof of concept sets up vethA0/vethB0 with hsr1 in
init_net and vethA1/vethB1 with hsr0 in a child network
namespace. Veth pairs bridge the namespace boundary. Both HSRs
talk for 4 seconds so node tables fill. Then vethA0/vethB0 go
down. Each side's prune timer fires hsr_nl_nodedown() for the
peer MAC. Two listeners report what they got.

Vanilla linux-7.1-rc1:
    init_net          received 2 HSR notifications
    child namespace   received 0 HSR notifications

With genlmsg_multicast_netns() routed via
dev_net(master->dev) and dev_net(port->dev):
    init_net          received 1 HSR notification    (hsr1's event)
    child namespace   received 1 HSR notification    (hsr0's event)

The proof of concept source poc_hsr_pernet.c and the run log
poc_hsr_pernet.log are attached. The proof of concept build
lowers HSR_NODE_FORGET_TIME from 60000 ms to 3000 ms to keep the
run short; the behavior is the same with the stock 60 s value.
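
The timeout needed for a run can be estimated with a little arithmetic.
This is a rough sketch, assuming the prune timer period is 3000 ms (an
assumption; check PRUNE_PERIOD in net/hsr/hsr_main.h) and that the last
frame from the peer arrives at about the moment the links go down.
poc_min_timeout_s() is a hypothetical helper, not part of the PoC.

```c
/* Lower bound, in whole seconds, on the PoC's timeout argument.
 * soak_s:          time both HSRs exchange frames before the links go down
 * forget_ms:       HSR_NODE_FORGET_TIME
 * prune_period_ms: interval of the prune timer (assumed value)
 */
static int poc_min_timeout_s(int soak_s, int forget_ms, int prune_period_ms)
{
	/* An entry expires forget_ms after the last frame, and the prune
	 * timer notices it at most one period later. Round up to seconds.
	 */
	int wait_ms = forget_ms + prune_period_ms;

	return soak_s + (wait_ms + 999) / 1000;
}
```

With the stock value, poc_min_timeout_s(4, 60000, 3000) gives 67 s,
under the 70 suggested in the PoC header comment. With the 3000 ms
build, poc_min_timeout_s(4, 3000, 3000) gives 10 s, under the 15 used
in the attached log.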

If you confirm this is a real bug, I have a small patch ready
that switches the two callers to genlmsg_multicast_netns().
Please let me know if you would like me to send it as a
follow-up.

Thanks,
Maoyi
--
Nanyang Technological University
https://maoyixie.com/

[-- Attachment #2: poc_hsr_pernet.log --]
[-- Type: application/octet-stream, Size: 1814 bytes --]

# Unpatched (linux-7.1-rc1 vanilla)

$ rmmod hsr; insmod /root/hsr.ko; /root/poc_hsr_pernet 15
[child] bringing up vethA1, vethB1, hsr0
[child] HSR family_id=37 mcgrp_id=15
[child] hsr0 up
[child] listening on HSR mcast in child namespace for 15s...
[child] received 0 HSR notifications in child namespace
[parent] creating veth pairs vethA0<->vethA1, vethB0<->vethB1
[parent] moving vethA1, vethB1 into the child namespace
[parent] creating hsr1 on vethA0+vethB0 in init_net
[parent] HSR family_id=37 mcgrp_id=15 (init_net)
[parent] soak 4s
[parent] bringing vethA0, vethB0 down
[parent] listening 11s for nodedown
[parent(init_net)] HSR notification cmd=2 len=32
[parent(init_net)] HSR notification cmd=2 len=32

========== RESULT ==========
init_net   received 2 HSR notifications
child namespace  received 0 HSR notifications
UNPATCHED: init_net got events that belong to the child namespace.


# Patched (linux-7.1-rc1 + the two-line genlmsg_multicast_netns conversion)

$ rmmod hsr; insmod /root/hsr.ko; /root/poc_hsr_pernet 15
[child] bringing up vethA1, vethB1, hsr0
[child] HSR family_id=38 mcgrp_id=15
[child] hsr0 up
[child] listening on HSR mcast in child namespace for 15s...
[child] HSR notification cmd=2 len=32
[child] received 1 HSR notifications in child namespace
[parent] creating veth pairs vethA0<->vethA1, vethB0<->vethB1
[parent] moving vethA1, vethB1 into the child namespace
[parent] creating hsr1 on vethA0+vethB0 in init_net
[parent] HSR family_id=38 mcgrp_id=15 (init_net)
[parent] soak 4s
[parent] bringing vethA0, vethB0 down
[parent] listening 11s for nodedown
[parent(init_net)] HSR notification cmd=2 len=32

========== RESULT ==========
init_net   received 1 HSR notifications
child namespace  received >=1 HSR notifications
PATCHED: each namespace receives only its own events.

[-- Attachment #3: poc_hsr_pernet.c --]
[-- Type: application/octet-stream, Size: 17541 bytes --]

/*
 * PoC for HSR genlmsg_multicast cross-namespace leak.
 *
 * Setup:
 *   parent in init_net.
 *   parent creates two veth pairs: vethA0/vethA1 and vethB0/vethB1.
 *   parent moves vethA1, vethB1 into a child network namespace.
 *   parent builds hsr1 on vethA0 + vethB0 in init_net.
 *   child  builds hsr0 on vethA1 + vethB1 in the child namespace.
 *   parent listens on the "hsr-network" mcast group in init_net.
 *   child  listens on the "hsr-network" mcast group in the child
 *          namespace.
 *
 * Trigger:
 *   let both HSRs talk for 4 s.
 *   parent brings vethA0 and vethB0 down.
 *   each HSR's prune timer eventually fires hsr_nl_nodedown for the
 *   peer MAC.
 *
 * Expected:
 *   unpatched: parent counts 2 events (hsr1 own + hsr0 leak),
 *              child  counts 0.
 *   patched:   parent counts 1 (hsr1 own), child counts 1 (hsr0 own).
 *
 * Build: cc -O0 -g poc_hsr_pernet.c -o poc_hsr_pernet
 * Run as root in a VM with hsr.ko loaded.
 *
 * Note on timing:
 *   The trigger relies on hsr_nl_nodedown firing once each side's
 *   node-table entry for the peer expires. The expiry time is
 *   HSR_NODE_FORGET_TIME in net/hsr/hsr_main.h, default 60000 ms.
 *   To finish in 15 s, rebuild the hsr module with
 *   HSR_NODE_FORGET_TIME = 3000. To run against a stock kernel,
 *   pass a timeout argument >= 70:
 *
 *     ./poc_hsr_pernet 75
 *
 *   The bug shape is identical in both cases.
 */

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/genetlink.h>
#include <linux/if.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/veth.h>
#include <poll.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* if_nametoindex is normally in <net/if.h>, which clashes with
 * <linux/if.h>. Forward-declare to avoid the conflict.
 */
extern unsigned int if_nametoindex(const char *);

/* HSR netlink UAPI (mirror of include/uapi/linux/hsr_netlink.h) */
#define HSR_GENL_NAME		"HSR"
#define HSR_GENL_VERSION	1
#define HSR_GENL_MCGRP_NAME	"hsr-network"
#define HSR_C_RING_ERROR	1
#define HSR_C_NODE_DOWN		2

/* Attribute ids follow enum order in the UAPI header (HSR_A_UNSPEC = 0). */
#define HSR_A_NODE_ADDR		1
#define HSR_A_IFINDEX		2

/* IFLA_INFO_* for nested netlink */
#define IFLA_INFO_KIND		1
#define IFLA_INFO_DATA		2

/* HSR rtnetlink attrs (include/uapi/linux/if_link.h IFLA_HSR_*) */
#define IFLA_HSR_SLAVE1		1
#define IFLA_HSR_SLAVE2		2

static int nl_open(int *pid_out)
{
	int sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (sk < 0) { perror("socket(NETLINK_ROUTE)"); return -1; }
	struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
	if (bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("bind NETLINK_ROUTE"); return -1;
	}
	socklen_t slen = sizeof(sa);
	getsockname(sk, (struct sockaddr *)&sa, &slen);
	if (pid_out) *pid_out = sa.nl_pid;
	return sk;
}

static int nl_open_generic(void)
{
	int sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (sk < 0) { perror("socket(NETLINK_GENERIC)"); return -1; }
	struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
	if (bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("bind NETLINK_GENERIC"); return -1;
	}
	return sk;
}

static int nl_send_recv(int sk, struct nlmsghdr *nh)
{
	if (send(sk, nh, nh->nlmsg_len, 0) < 0) {
		perror("send"); return -1;
	}
	char buf[8192];
	int n = recv(sk, buf, sizeof(buf), 0);
	if (n < 0) { perror("recv"); return -1; }
	struct nlmsghdr *rnh = (struct nlmsghdr *)buf;
	if (rnh->nlmsg_type == NLMSG_ERROR) {
		struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(rnh);
		if (err->error) {
			fprintf(stderr, "nlmsgerr %d (%s)\n",
				err->error, strerror(-err->error));
			return err->error;
		}
	}
	return 0;
}

/* helper: add an nla to existing nh */
static struct nlattr *nla_put(struct nlmsghdr *nh, int type,
			      const void *data, int len)
{
	struct nlattr *na = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	na->nla_type = type;
	na->nla_len = NLA_HDRLEN + len;
	if (data) memcpy((char *)na + NLA_HDRLEN, data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(na->nla_len);
	return na;
}

/* Create a veth pair via RTM_NEWLINK */
static int create_veth_pair(int sk, const char *a, const char *b)
{
	char buf[4096] = {0};
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh->nlmsg_type = RTM_NEWLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL;
	nh->nlmsg_seq = 1;
	struct ifinfomsg *ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;

	nla_put(nh, IFLA_IFNAME, a, strlen(a) + 1);

	/* nested: linkinfo = { kind="veth", data = { peer = { ifname=b } } } */
	struct nlattr *linfo = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	linfo->nla_type = IFLA_LINKINFO;
	int linfo_start = nh->nlmsg_len;
	linfo->nla_len = NLA_HDRLEN;
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_HDRLEN;
	nla_put(nh, IFLA_INFO_KIND, "veth", strlen("veth") + 1);

	struct nlattr *info_data = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	info_data->nla_type = IFLA_INFO_DATA;
	int info_data_start = nh->nlmsg_len;
	info_data->nla_len = NLA_HDRLEN;
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_HDRLEN;

	struct nlattr *peer = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	peer->nla_type = VETH_INFO_PEER;
	int peer_start = nh->nlmsg_len;
	peer->nla_len = NLA_HDRLEN + sizeof(struct ifinfomsg);
	/* zero ifinfomsg for peer */
	memset((char *)peer + NLA_HDRLEN, 0, sizeof(struct ifinfomsg));
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + peer->nla_len;
	nla_put(nh, IFLA_IFNAME, b, strlen(b) + 1);
	peer->nla_len = nh->nlmsg_len - peer_start;
	info_data->nla_len = nh->nlmsg_len - info_data_start;
	linfo->nla_len = nh->nlmsg_len - linfo_start;

	return nl_send_recv(sk, nh);
}

/* Move netdev (by ifname) to target netns_fd */
static int set_link_netns(int sk, const char *ifname, int netns_fd)
{
	char buf[4096] = {0};
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh->nlmsg_type = RTM_NEWLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nh->nlmsg_seq = 2;
	struct ifinfomsg *ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;

	nla_put(nh, IFLA_IFNAME, ifname, strlen(ifname) + 1);
	__u32 fd = netns_fd;
	nla_put(nh, IFLA_NET_NS_FD, &fd, sizeof(fd));

	return nl_send_recv(sk, nh);
}

/* Bring up an interface by name */
static int link_up(int sk, const char *ifname)
{
	char buf[4096] = {0};
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh->nlmsg_type = RTM_NEWLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nh->nlmsg_seq = 3;
	struct ifinfomsg *ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;
	ifi->ifi_flags = IFF_UP;
	ifi->ifi_change = IFF_UP;
	nla_put(nh, IFLA_IFNAME, ifname, strlen(ifname) + 1);
	return nl_send_recv(sk, nh);
}

/* Thin wrapper around if_nametoindex(); despite the name, it does not
 * read /proc.
 */
static int if_nametoindex_via_proc(const char *name)
{
	return if_nametoindex(name);
}

/* Create an HSR device with the two named slaves. */
static int create_hsr_in_ns(int sk, const char *hsr_name,
			    const char *slave1, const char *slave2)
{
	char buf[4096] = {0};
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh->nlmsg_type = RTM_NEWLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL;
	nh->nlmsg_seq = 4;
	struct ifinfomsg *ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;
	nla_put(nh, IFLA_IFNAME, hsr_name, strlen(hsr_name) + 1);

	struct nlattr *linfo = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	linfo->nla_type = IFLA_LINKINFO;
	int linfo_start = nh->nlmsg_len;
	linfo->nla_len = NLA_HDRLEN;
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_HDRLEN;
	nla_put(nh, IFLA_INFO_KIND, "hsr", 4);

	struct nlattr *idata = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	idata->nla_type = IFLA_INFO_DATA;
	int idata_start = nh->nlmsg_len;
	idata->nla_len = NLA_HDRLEN;
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_HDRLEN;

	__u32 sidx1 = if_nametoindex_via_proc(slave1);
	__u32 sidx2 = if_nametoindex_via_proc(slave2);
	if (!sidx1 || !sidx2) {
		fprintf(stderr, "if_nametoindex %s=%u %s=%u\n",
			slave1, sidx1, slave2, sidx2);
		return -1;
	}
	nla_put(nh, IFLA_HSR_SLAVE1, &sidx1, sizeof(sidx1));
	nla_put(nh, IFLA_HSR_SLAVE2, &sidx2, sizeof(sidx2));

	idata->nla_len = nh->nlmsg_len - idata_start;
	linfo->nla_len = nh->nlmsg_len - linfo_start;

	return nl_send_recv(sk, nh);
}

/* Resolve HSR genl family id and mcast group id */
static int resolve_hsr_genl(int sk, __u16 *family_id, __u32 *mc_id)
{
	char buf[8192] = {0};
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	nh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
	nh->nlmsg_type = GENL_ID_CTRL;
	nh->nlmsg_flags = NLM_F_REQUEST;
	nh->nlmsg_seq = 10;
	struct genlmsghdr *gh = NLMSG_DATA(nh);
	gh->cmd = CTRL_CMD_GETFAMILY;
	gh->version = 1;
	nla_put(nh, CTRL_ATTR_FAMILY_NAME, HSR_GENL_NAME, strlen(HSR_GENL_NAME) + 1);

	if (send(sk, buf, nh->nlmsg_len, 0) < 0) {
		perror("send GETFAMILY"); return -1;
	}
	int n = recv(sk, buf, sizeof(buf), 0);
	if (n < 0) { perror("recv GETFAMILY"); return -1; }
	struct nlmsghdr *rnh = (struct nlmsghdr *)buf;
	if (rnh->nlmsg_type == NLMSG_ERROR) {
		struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(rnh);
		fprintf(stderr, "GETFAMILY error %d\n", err->error);
		return -1;
	}
	char *p = (char *)NLMSG_DATA(rnh) + GENL_HDRLEN;
	int rem = rnh->nlmsg_len - NLMSG_LENGTH(GENL_HDRLEN);
	*family_id = 0;
	*mc_id = 0;
	while (rem > 0) {
		struct nlattr *a = (struct nlattr *)p;
		if (a->nla_type == CTRL_ATTR_FAMILY_ID) {
			*family_id = *(__u16 *)((char *)a + NLA_HDRLEN);
		} else if (a->nla_type == CTRL_ATTR_MCAST_GROUPS) {
			char *q = (char *)a + NLA_HDRLEN;
			int qrem = a->nla_len - NLA_HDRLEN;
			while (qrem > 0) {
				struct nlattr *g = (struct nlattr *)q;
				char *gq = (char *)g + NLA_HDRLEN;
				int grem = g->nla_len - NLA_HDRLEN;
				__u32 gid = 0;
				const char *gname = NULL;
				while (grem > 0) {
					struct nlattr *ga = (struct nlattr *)gq;
					if (ga->nla_type == CTRL_ATTR_MCAST_GRP_ID)
						gid = *(__u32 *)((char *)ga + NLA_HDRLEN);
					else if (ga->nla_type == CTRL_ATTR_MCAST_GRP_NAME)
						gname = (char *)ga + NLA_HDRLEN;
					int gal = NLA_ALIGN(ga->nla_len);
					gq += gal; grem -= gal;
				}
				if (gname && !strcmp(gname, HSR_GENL_MCGRP_NAME))
					*mc_id = gid;
				int gl = NLA_ALIGN(g->nla_len);
				q += gl; qrem -= gl;
			}
		}
		int al = NLA_ALIGN(a->nla_len);
		p += al; rem -= al;
	}
	return (*family_id && *mc_id) ? 0 : -1;
}

/* Open a NETLINK_GENERIC socket and subscribe to HSR mcast group */
static int hsr_listener(__u32 mc_id)
{
	int sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (sk < 0) { perror("socket(GEN)"); return -1; }
	struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
	if (bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("bind GEN"); return -1;
	}
	if (setsockopt(sk, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &mc_id, sizeof(mc_id)) < 0) {
		perror("setsockopt ADD_MEMBERSHIP");
		return -1;
	}
	return sk;
}

/* Drain & count notifications received within timeout_ms */
static int drain_notifications(int sk, int timeout_ms, const char *label)
{
	int count = 0;
	struct pollfd pfd = { .fd = sk, .events = POLLIN };
	int remain = timeout_ms;
	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (remain > 0) {
		int n = poll(&pfd, 1, remain);
		if (n < 0) break;
		if (n == 0) break;
		char buf[4096];
		int r = recv(sk, buf, sizeof(buf), MSG_DONTWAIT);
		if (r < 0) break;
		struct nlmsghdr *nh = (struct nlmsghdr *)buf;
		while (NLMSG_OK(nh, r)) {
			struct genlmsghdr *gh = NLMSG_DATA(nh);
			printf("[%s] HSR notification cmd=%u len=%u\n",
			       label, gh->cmd, nh->nlmsg_len);
			count++;
			nh = NLMSG_NEXT(nh, r);
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		long elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000 +
				  (t1.tv_nsec - t0.tv_nsec) / 1000000;
		remain = timeout_ms - (int)elapsed_ms;
	}
	return count;
}

int main(int argc, char **argv)
{
	int timeout_s = 30;
	if (argc >= 2) timeout_s = atoi(argv[1]);

	/* Spawn helper child in fresh netns. Parent stays in init_net. */
	int sync_p[2], ready_p[2];
	if (pipe(sync_p) < 0 || pipe(ready_p) < 0) { perror("pipe"); return 1; }

	pid_t pid = fork();
	if (pid < 0) { perror("fork"); return 1; }

	if (pid == 0) {
		setbuf(stdout, NULL); setbuf(stderr, NULL);
		/* CHILD */
		close(sync_p[0]); close(ready_p[1]);
		if (unshare(CLONE_NEWNET) < 0) {
			perror("unshare(NEWNET)"); _exit(1);
		}
		/* Signal "I am ready to receive vethA1/vethB1" */
		write(sync_p[1], "x", 1);
		/* Wait for parent to finish device moves + HSR creation */
		char c; read(ready_p[0], &c, 1);
		close(sync_p[1]); close(ready_p[0]);

		/* In child netns: bring up slaves + create HSR. */
		int sk = nl_open(NULL);
		if (sk < 0) _exit(1);
		fprintf(stderr, "[child] bringing up vethA1, vethB1, hsr0\n");
		link_up(sk, "vethA1");
		link_up(sk, "vethB1");

		__u16 fid; __u32 mcid;
		int gsk = nl_open_generic();
		if (gsk < 0) _exit(1);
		if (resolve_hsr_genl(gsk, &fid, &mcid) < 0) {
			fprintf(stderr, "[child] resolve HSR genl failed\n");
			_exit(1);
		}
		fprintf(stderr, "[child] HSR family_id=%u mcgrp_id=%u\n", fid, mcid);

		if (create_hsr_in_ns(sk, "hsr0", "vethA1", "vethB1") < 0) {
			fprintf(stderr, "[child] create_hsr_in_ns failed\n");
			_exit(1);
		}
		link_up(sk, "hsr0");
		fprintf(stderr, "[child] hsr0 up\n");

		/* Open genl mcast listener for HSR_MCGRP */
		int lsk = hsr_listener(mcid);
		if (lsk < 0) _exit(1);
		fprintf(stderr, "[child] listening on HSR mcast in child namespace for %ds...\n",
		       timeout_s);
		int n = drain_notifications(lsk, timeout_s * 1000, "child");
		fprintf(stderr, "[child] received %d HSR notifications in child namespace\n", n);
		_exit(n > 0 ? 0 : 42);
	}

	/* PARENT (init_net) */
	close(sync_p[1]); close(ready_p[0]);
	char c; read(sync_p[0], &c, 1);  /* wait until child unshare'd */
	close(sync_p[0]);

	int sk = nl_open(NULL);
	if (sk < 0) { kill(pid, SIGKILL); return 1; }

	printf("[parent] creating veth pairs vethA0<->vethA1, vethB0<->vethB1\n");
	if (create_veth_pair(sk, "vethA0", "vethA1") < 0 ||
	    create_veth_pair(sk, "vethB0", "vethB1") < 0) {
		kill(pid, SIGKILL); return 1;
	}

	char path[64];
	snprintf(path, sizeof(path), "/proc/%d/ns/net", pid);
	int netns_fd = open(path, O_RDONLY);
	if (netns_fd < 0) { perror("open child namespace"); kill(pid, SIGKILL); return 1; }

	printf("[parent] moving vethA1, vethB1 into the child namespace\n");
	if (set_link_netns(sk, "vethA1", netns_fd) < 0 ||
	    set_link_netns(sk, "vethB1", netns_fd) < 0) {
		kill(pid, SIGKILL); return 1;
	}

	link_up(sk, "vethA0");
	link_up(sk, "vethB0");
	close(netns_fd);

	/* hsr1 is the peer that hsr0 in the child namespace will time out
	 * once we cut frames.
	 */
	printf("[parent] creating hsr1 on vethA0+vethB0 in init_net\n");
	if (create_hsr_in_ns(sk, "hsr1", "vethA0", "vethB0") < 0) {
		fprintf(stderr, "[parent] create hsr1 failed\n");
		kill(pid, SIGKILL); return 1;
	}
	link_up(sk, "hsr1");

	/* Parent subscribes to HSR mcast in init_net */
	__u16 fid; __u32 mcid;
	int gsk = nl_open_generic();
	if (gsk < 0) { kill(pid, SIGKILL); return 1; }
	if (resolve_hsr_genl(gsk, &fid, &mcid) < 0) {
		fprintf(stderr, "[parent] resolve HSR genl failed\n");
		kill(pid, SIGKILL); return 1;
	}
	printf("[parent] HSR family_id=%u mcgrp_id=%u (init_net)\n", fid, mcid);

	int lsk = hsr_listener(mcid);
	if (lsk < 0) { kill(pid, SIGKILL); return 1; }

	/* Signal child to proceed with HSR creation */
	write(ready_p[1], "x", 1);
	close(ready_p[1]);

	/* let both HSRs talk for 4 s so node tables populate */
	printf("[parent] soak 4s\n");
	drain_notifications(lsk, 4000, "parent(init_net)");

	/* cut frames by bringing the init_net slaves down */
	printf("[parent] bringing vethA0, vethB0 down\n");
	char buf2[4096] = {0};
	struct nlmsghdr *nh2 = (struct nlmsghdr *)buf2;
	nh2->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh2->nlmsg_type = RTM_NEWLINK;
	nh2->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nh2->nlmsg_seq = 5;
	struct ifinfomsg *ifi2 = NLMSG_DATA(nh2);
	ifi2->ifi_family = AF_UNSPEC;
	ifi2->ifi_change = IFF_UP;
	ifi2->ifi_flags = 0;
	nla_put(nh2, IFLA_IFNAME, "vethA0", 7);
	nl_send_recv(sk, nh2);
	nh2->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	ifi2->ifi_family = AF_UNSPEC;
	ifi2->ifi_change = IFF_UP;
	ifi2->ifi_flags = 0;
	nh2->nlmsg_seq = 6;
	nla_put(nh2, IFLA_IFNAME, "vethB0", 7);
	nl_send_recv(sk, nh2);

	printf("[parent] listening %ds for nodedown\n", timeout_s - 4);
	int n = drain_notifications(lsk, (timeout_s - 4) * 1000, "parent(init_net)");

	int status;
	waitpid(pid, &status, 0);
	int child_count = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 :
			  (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? 0 :
			  -1;

	printf("\n========== RESULT ==========\n");
	printf("init_net   received %d HSR notifications\n", n);
	printf("child namespace  received %s HSR notifications\n",
	       child_count == 1 ? ">=1" : "0");
	if (n > 0 && child_count == 0)
		printf("UNPATCHED: init_net got events that belong to the child namespace.\n");
	else if (n >= 1 && child_count >= 1)
		printf("PATCHED: each namespace receives only its own events.\n");
	else
		printf("No events in the chosen window. Try a longer timeout.\n");
	return 0;
}


* [PATCH net] hsr: broadcast netlink notifications in the device's net namespace
  2026-05-18  6:44 hsr: netlink notifications leak across network namespaces Maoyi Xie
@ 2026-05-27  7:59 ` Maoyi Xie
  2026-05-27 11:11   ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 6+ messages in thread
From: Maoyi Xie @ 2026-05-27  7:59 UTC
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Fernando Fernandez Mancera, Jan Vaclav, Andrew Lunn, Taehee Yoo,
	netdev, linux-kernel, Maoyi Xie, stable

The HSR generic netlink family sets .netnsok = true. HSR devices can
live in network namespaces other than init_net.

Two async notifiers broadcast events with genlmsg_multicast(). They
are hsr_nl_ringerror() and hsr_nl_nodedown(). That helper delivers
only on the default genl socket in init_net. So the events always land
in init_net. The network namespace of the device does not matter.

This has two effects. A listener in the device's own namespace never
sees its own ring error and node down events. A privileged listener in
init_net receives events from HSR devices in other namespaces. The
payload carries the peer node MAC (HSR_A_NODE_ADDR) and the slave port
ifindex (HSR_A_IFINDEX). It leaks information across network
namespaces.

Switch both callers to genlmsg_multicast_netns(). Other families with
.netnsok = true already do this. Examples are gtp, ovpn, team,
batman-adv, netdev-genl, ethtool and handshake.

hsr_nl_ringerror() already has the slave port. It uses
dev_net(port->dev). hsr_nl_nodedown() takes the namespace from the
master port via hsr_port_get_hsr().

Fixes: 09e91dbea0aa ("hsr: set .netnsok flag")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
This is the fix for the problem I reported on netdev on 2026-05-18 [1].
That thread had no reply, so I am sending the patch and adding the HSR
maintainers to Cc. The proof of concept and the test numbers are in
that message.

[1] https://lore.kernel.org/netdev/CAHPEe=GO=2qqWZPwBB4rrXc3mkD0dznp2K78nCsKwF=c-QwxEw@mail.gmail.com/

 net/hsr/hsr_netlink.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index db0b0af7a692..067ceaf7304b 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@ -247,7 +247,8 @@ void hsr_nl_ringerror(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN],
 		goto nla_put_failure;

 	genlmsg_end(skb, msg_head);
-	genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC);
+	genlmsg_multicast_netns(&hsr_genl_family, dev_net(port->dev),
+				skb, 0, 0, GFP_ATOMIC);

 	return;

@@ -283,8 +284,17 @@ void hsr_nl_nodedown(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN])
 	if (res < 0)
 		goto nla_put_failure;

+	rcu_read_lock();
+	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
+	if (!master) {
+		rcu_read_unlock();
+		goto nla_put_failure;
+	}
+
 	genlmsg_end(skb, msg_head);
-	genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC);
+	genlmsg_multicast_netns(&hsr_genl_family, dev_net(master->dev),
+				skb, 0, 0, GFP_ATOMIC);
+	rcu_read_unlock();

 	return;

-- 
2.34.1


* Re: [PATCH net] hsr: broadcast netlink notifications in the device's net namespace
  2026-05-27  7:59 ` [PATCH net] hsr: broadcast netlink notifications in the device's net namespace Maoyi Xie
@ 2026-05-27 11:11   ` Fernando Fernandez Mancera
  2026-05-27 23:18     ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-27 11:11 UTC
  To: Maoyi Xie, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Jan Vaclav, Andrew Lunn, Taehee Yoo, netdev, linux-kernel, stable

On 5/27/26 9:59 AM, Maoyi Xie wrote:
> The HSR generic netlink family sets .netnsok = true. HSR devices can
> live in network namespaces other than init_net.
> 
> Two async notifiers broadcast events with genlmsg_multicast(). They
> are hsr_nl_ringerror() and hsr_nl_nodedown(). That helper delivers
> only on the default genl socket in init_net. So the events always land
> in init_net. The network namespace of the device does not matter.
> 
> This has two effects. A listener in the device's own namespace never
> sees its own ring error and node down events. A privileged listener in
> init_net receives events from HSR devices in other namespaces. The
> payload carries the peer node MAC (HSR_A_NODE_ADDR) and the slave port
> ifindex (HSR_A_IFINDEX). It leaks information across network
> namespaces.
> 
> Switch both callers to genlmsg_multicast_netns(). Other families with
> .netnsok = true already do this. Examples are gtp, ovpn, team,
> batman-adv, netdev-genl, ethtool and handshake.
> 
> hsr_nl_ringerror() already has the slave port. It uses
> dev_net(port->dev). hsr_nl_nodedown() takes the namespace from the
> master port via hsr_port_get_hsr().
> 
> Fixes: 09e91dbea0aa ("hsr: set .netnsok flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
> ---
> This is the fix for the problem I reported on netdev on 2026-05-18 [1].
> That thread had no reply, so I am sending the patch and adding the HSR
> maintainers to Cc. The proof of concept and the test numbers are in
> that message.
> 
> [1] https://lore.kernel.org/netdev/CAHPEe=GO=2qqWZPwBB4rrXc3mkD0dznp2K78nCsKwF=c-QwxEw@mail.gmail.com/
> 
>   net/hsr/hsr_netlink.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
> index db0b0af7a692..067ceaf7304b 100644
> --- a/net/hsr/hsr_netlink.c
> +++ b/net/hsr/hsr_netlink.c
> @@ -247,7 +247,8 @@ void hsr_nl_ringerror(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN],
>   		goto nla_put_failure;
>   
>   	genlmsg_end(skb, msg_head);
> -	genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC);
> +	genlmsg_multicast_netns(&hsr_genl_family, dev_net(port->dev),
> +				skb, 0, 0, GFP_ATOMIC);
>   
>   	return;
>   
> @@ -283,8 +284,17 @@ void hsr_nl_nodedown(struct hsr_priv *hsr, unsigned char addr[ETH_ALEN])
>   	if (res < 0)
>   		goto nla_put_failure;
>   
> +	rcu_read_lock();
> +	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
> +	if (!master) {
> +		rcu_read_unlock();
> +		goto nla_put_failure;
> +	}
> +
>   	genlmsg_end(skb, msg_head);
> -	genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC);
> +	genlmsg_multicast_netns(&hsr_genl_family, dev_net(master->dev),
> +				skb, 0, 0, GFP_ATOMIC);
> +	rcu_read_unlock();
>   

The patch makes sense to me. Anyway, I think we can reduce the RCU 
critical section here.

What about moving rcu_read_unlock() after the if and extract the net 
before unlocking? The benefit is minimal but I think it is worth it.

Thanks,
Fernando.


* Re: [PATCH net] hsr: broadcast netlink notifications in the device's net namespace
  2026-05-27 11:11   ` Fernando Fernandez Mancera
@ 2026-05-27 23:18     ` Jakub Kicinski
  2026-05-31 10:23       ` Maoyi Xie
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2026-05-27 23:18 UTC
  To: Fernando Fernandez Mancera
  Cc: Maoyi Xie, David S . Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jan Vaclav, Andrew Lunn, Taehee Yoo, netdev,
	linux-kernel, stable

On Wed, 27 May 2026 13:11:57 +0200 Fernando Fernandez Mancera wrote:
> >   	genlmsg_end(skb, msg_head);
> > -	genlmsg_multicast(&hsr_genl_family, skb, 0, 0, GFP_ATOMIC);
> > +	genlmsg_multicast_netns(&hsr_genl_family, dev_net(master->dev),
> > +				skb, 0, 0, GFP_ATOMIC);
> > +	rcu_read_unlock();
> >     
> 
> The patch makes sense to me. Anyway, I think we can reduce the RCU 
> critical section here.
> 
> What about moving rcu_read_unlock() after the if and extract the net 
> before unlocking? The benefit is minimal but I think it is worth it.

Not sure TBH, we'd need to take a ref on the netns and allocate 
a tracker (on DEBUG kernels). One could go either way.

I'm replying because I wanted to question whether this is Fixes+stable@ 
worthy. Sending the notifications to the namespace where the device is
makes sense. But it's as much a behavior change as it is a fix.
The commit in question was merged to 5.6, real users clearly don't care.


* Re: [PATCH net] hsr: broadcast netlink notifications in the device's net namespace
  2026-05-27 23:18     ` Jakub Kicinski
@ 2026-05-31 10:23       ` Maoyi Xie
  2026-05-31 20:01         ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 6+ messages in thread
From: Maoyi Xie @ 2026-05-31 10:23 UTC
  To: Jakub Kicinski
  Cc: Fernando Fernandez Mancera, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jan Vaclav, Andrew Lunn, Taehee Yoo,
	netdev, linux-kernel, stable

On Thu, 28 May 2026 07:18, Jakub Kicinski <kuba@kernel.org> wrote:
> Not sure TBH, we'd need to take a ref on the netns and allocate
> a tracker (on DEBUG kernels). One could go either way.

On the RCU side, you're right that moving the net out of the lock
means taking a ref, and this isn't a hot path where that really pays
off. So I'd lean towards keeping it as posted, with the multicast
still inside the rcu_read_lock. Fernando, thanks for the suggestion
either way.

> I'm replying because I wanted to question whether this is Fixes+stable@
> worthy. Sending the notifications to the namespace where the device is
> makes sense. But it's as much a behavior change as it is a fix.
> The commit in question was merged to 5.6, real users clearly don't care.

On the Fixes and stable tags, my thinking was that the init_net side
is an information leak. A privileged listener there ends up seeing
ring error and node down events from devices in other netns. The
payload carries the peer MAC and the slave ifindex. That was my reason
for tagging it.

But I see your point that it is as much a behavior change as a fix,
and if nobody has hit it since 5.6, the risk is clearly low. I don't
feel strongly here. If you'd rather take it as a plain net-next
improvement without the two tags, that is completely fine by me and I
will respin it that way.

Thanks,
Maoyi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] hsr: broadcast netlink notifications in the device's net namespace
  2026-05-31 10:23       ` Maoyi Xie
@ 2026-05-31 20:01         ` Fernando Fernandez Mancera
  0 siblings, 0 replies; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-31 20:01 UTC (permalink / raw)
  To: Maoyi Xie, Jakub Kicinski
  Cc: David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jan Vaclav, Andrew Lunn, Taehee Yoo, netdev, linux-kernel, stable



On 5/31/26 12:23 PM, Maoyi Xie wrote:
> On Thu, 28 May 2026 07:18, Jakub Kicinski <kuba@kernel.org> wrote:
>> Not sure TBH, we'd need to take a ref on the netns and allocate
>> a tracker (on DEBUG kernels). One could go either way.
> 
> On the RCU side, you're right that moving the net out of the lock
> means taking a ref, and this isn't a hot path where that really pays
> off. So I'd lean towards keeping it as posted, with the multicast
> still inside the rcu_read_lock. Fernando, thanks for the suggestion
> either way.
> 

In that case:

Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>

Keep it if re-posted to net-next please.

Thanks!

>> I'm replying because I wanted to question whether this is Fixes+stable@
>> worthy. Sending the notifications to the namespace where the device is
>> makes sense. But it's as much a behavior change as it is a fix.
>> The commit in question was merged to 5.6, real users clearly don't care.
> 
> On the Fixes and stable tags, my thinking was that the init_net side
> is an information leak. A privileged listener there ends up seeing
> ring error and node down events from devices in other netns. The
> payload carries the peer MAC and the slave ifindex. That was my reason
> for tagging it.
> 
> But I see your point that it is as much a behavior change as a fix,
> and if nobody has hit it since 5.6, the risk is clearly low. I don't
> feel strongly here. If you'd rather take it as a plain net-next
> improvement without the two tags, that is completely fine by me and I
> will respin it that way.
> 
> Thanks,
> Maoyi


^ permalink raw reply	[flat|nested] 6+ messages in thread
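
The RCU trade-off discussed in this thread can be pictured with a small
userspace model. This is only a sketch under stated assumptions, not the
patch and not kernel code: every type and helper below is a simplified
stand-in that borrows a kernel name for readability, multicast_netns()
is a hypothetical placeholder for genlmsg_multicast_netns(), and the
netns tracker that DEBUG kernels would allocate is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins only; these are NOT the kernel definitions. */
struct net {
	int refs;			/* models the netns reference count */
	int events;			/* notifications delivered to this netns */
	bool last_sent_under_rcu;	/* was the last send inside the section? */
};

struct net_device {
	struct net *nd_net;
};

static int rcu_depth;	/* models read-side critical section nesting */

static void rcu_read_lock(void)
{
	rcu_depth++;
}

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/*
 * Modelling assumption: the master device is only safe to dereference
 * inside the read-side section, so the model asserts that here.
 */
static struct net *dev_net(const struct net_device *dev)
{
	assert(rcu_depth > 0);
	return dev->nd_net;
}

static struct net *get_net(struct net *net)
{
	net->refs++;
	return net;
}

static void put_net(struct net *net)
{
	assert(net->refs > 0);
	net->refs--;
}

/* Hypothetical placeholder for genlmsg_multicast_netns(). */
static void multicast_netns(struct net *net)
{
	net->events++;
	net->last_sent_under_rcu = rcu_depth > 0;
}

/* As posted: the multicast stays inside the read-side section. */
static void notify_as_posted(const struct net_device *master)
{
	rcu_read_lock();
	multicast_netns(dev_net(master));
	rcu_read_unlock();
}

/* Suggested alternative: shorter section, at the cost of a netns ref. */
static void notify_short_section(const struct net_device *master)
{
	struct net *net;

	rcu_read_lock();
	net = get_net(dev_net(master));	/* pin the netns before unlocking */
	rcu_read_unlock();

	multicast_netns(net);
	put_net(net);
}
```

Both variants deliver the event to the device's own namespace. The
second one shortens the critical section but adds a get/put pair on
every notification, which is the cost weighed above for a path that is
not hot.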

end of thread, other threads:[~2026-05-31 20:01 UTC | newest]

Thread overview: 6+ messages
2026-05-18  6:44 hsr: netlink notifications leak across network namespaces Maoyi Xie
2026-05-27  7:59 ` [PATCH net] hsr: broadcast netlink notifications in the device's net namespace Maoyi Xie
2026-05-27 11:11   ` Fernando Fernandez Mancera
2026-05-27 23:18     ` Jakub Kicinski
2026-05-31 10:23       ` Maoyi Xie
2026-05-31 20:01         ` Fernando Fernandez Mancera
