public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies
@ 2026-04-16  2:46 Weiming Shi
  2026-04-20  1:25 ` Ilya Maximets
  2026-04-20 21:59 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Weiming Shi @ 2026-04-16  2:46 UTC (permalink / raw)
  To: Aaron Conole, Eelco Chaudron, Ilya Maximets, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Pravin B Shelar, Thomas Graf, Alex Wang, netdev,
	dev, Xiang Mei, Weiming Shi

The vport netlink reply helpers allocate a fixed-size skb with
nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
array via ovs_vport_get_upcall_portids().  Since
ovs_vport_set_upcall_portids() accepts any non-zero multiple of
sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
array large enough to overflow the reply buffer, causing nla_put() to
fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
unprivileged user namespaces enabled (e.g., Ubuntu default), this is
reachable via unshare -Urn since OVS vport mutation operations use
GENL_UNS_ADMIN_PERM.

 kernel BUG at net/openvswitch/datapath.c:2414!
 Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
 CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
 RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
 Call Trace:
  <TASK>
  genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
  genl_rcv_msg (net/netlink/genetlink.c:1194)
  netlink_rcv_skb (net/netlink/af_netlink.c:2550)
  genl_rcv (net/netlink/genetlink.c:1219)
  netlink_unicast (net/netlink/af_netlink.c:1344)
  netlink_sendmsg (net/netlink/af_netlink.c:1894)
  __sys_sendto (net/socket.c:2206)
  __x64_sys_sendto (net/socket.c:2209)
  do_syscall_64 (arch/x86/entry/syscall_64.c:63)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  </TASK>
 Kernel panic - not syncing: Fatal exception

Reject attempts to set more PIDs than nr_cpu_ids in
ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply
size in ovs_vport_cmd_msg_size() based on that bound, similar to the
existing ovs_dp_cmd_msg_size().  nr_cpu_ids matches the cap already
used by the per-CPU dispatch configuration on the datapath side
(ovs_dp_cmd_fill_info() serialises at most nr_cpu_ids PIDs), so the
two sides stay consistent.

Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")
Reported-by: Xiang Mei <xmei5@asu.edu>
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v5 (per Ilya):
- Add blank lines before multi-line comment blocks in
  ovs_vport_cmd_msg_size() for readability.
- Drop parenthetical from the OVS_VPORT_ATTR_UPCALL_PID comment.
- Add lore links for previous versions.
v4: https://lore.kernel.org/netdev/20260415125121.110874-2-bestswngs@gmail.com
- Use nr_cpu_ids instead of num_possible_cpus() for consistency with
  the per-CPU dispatch on the datapath side.
- Annotate ovs_vport_cmd_msg_size() per-attribute; split nested sums.
v3: https://lore.kernel.org/netdev/20260413035514.2113886-3-bestswngs@gmail.com
- Cap at num_possible_cpus(); add ovs_vport_cmd_msg_size(); keep
  BUG_ON(); fix Fixes tag.
v2: https://lore.kernel.org/netdev/20260411141448.1479933-3-bestswngs@gmail.com
- Dynamically size reply skb; drop WARN_ON_ONCE, return plain errors.
v1: https://lore.kernel.org/netdev/20260411055915.1224902-2-bestswngs@gmail.com
---
 net/openvswitch/datapath.c | 35 +++++++++++++++++++++++++++++++++--
 net/openvswitch/vport.c    |  3 +++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e209099218b4..bbbde50fc649 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2184,9 +2184,40 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
 	return err;
 }
 
+static size_t ovs_vport_cmd_msg_size(void)
+{
+	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
+
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_PORT_NO */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_TYPE */
+	msgsize += nla_total_size(IFNAMSIZ);    /* OVS_VPORT_ATTR_NAME */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_IFINDEX */
+	msgsize += nla_total_size(sizeof(s32)); /* OVS_VPORT_ATTR_NETNSID */
+
+	/* OVS_VPORT_ATTR_STATS */
+	msgsize += nla_total_size_64bit(sizeof(struct ovs_vport_stats));
+
+	/* OVS_VPORT_ATTR_UPCALL_STATS(OVS_VPORT_UPCALL_ATTR_SUCCESS +
+	 *                             OVS_VPORT_UPCALL_ATTR_FAIL)
+	 */
+	msgsize += nla_total_size(nla_total_size_64bit(sizeof(u64)) +
+				  nla_total_size_64bit(sizeof(u64)));
+
+	/* OVS_VPORT_ATTR_UPCALL_PID */
+	msgsize += nla_total_size(nr_cpu_ids * sizeof(u32));
+
+	/* OVS_VPORT_ATTR_OPTIONS(OVS_TUNNEL_ATTR_DST_PORT +
+	 *                        OVS_TUNNEL_ATTR_EXTENSION(OVS_VXLAN_EXT_GBP))
+	 */
+	msgsize += nla_total_size(nla_total_size(sizeof(u16)) +
+				  nla_total_size(nla_total_size(0)));
+
+	return msgsize;
+}
+
 static struct sk_buff *ovs_vport_cmd_alloc_info(void)
 {
-	return nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	return genlmsg_new(ovs_vport_cmd_msg_size(), GFP_KERNEL);
 }
 
 /* Called with ovs_mutex, only via ovs_dp_notify_wq(). */
@@ -2196,7 +2227,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, struct net *net,
 	struct sk_buff *skb;
 	int retval;
 
-	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	skb = ovs_vport_cmd_alloc_info();
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 23f629e94a36..56b2e2d1a749 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -406,6 +406,9 @@ int ovs_vport_set_upcall_portids(struct vport *vport, const struct nlattr *ids)
 	if (!nla_len(ids) || nla_len(ids) % sizeof(u32))
 		return -EINVAL;
 
+	if (nla_len(ids) / sizeof(u32) > nr_cpu_ids)
+		return -EINVAL;
+
 	old = ovsl_dereference(vport->upcall_portids);
 
 	vport_portids = kmalloc(sizeof(*vport_portids) + nla_len(ids),
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies
  2026-04-16  2:46 [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies Weiming Shi
@ 2026-04-20  1:25 ` Ilya Maximets
  2026-04-20 21:59 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Ilya Maximets @ 2026-04-20  1:25 UTC (permalink / raw)
  To: Weiming Shi, Aaron Conole, Eelco Chaudron, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: i.maximets, Simon Horman, netdev, dev, Xiang Mei

On 4/16/26 4:46 AM, Weiming Shi wrote:
> The vport netlink reply helpers allocate a fixed-size skb with
> nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
> array via ovs_vport_get_upcall_portids().  Since
> ovs_vport_set_upcall_portids() accepts any non-zero multiple of
> sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
> array large enough to overflow the reply buffer, causing nla_put() to
> fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
> unprivileged user namespaces enabled (e.g., Ubuntu default), this is
> reachable via unshare -Urn since OVS vport mutation operations use
> GENL_UNS_ADMIN_PERM.
> 
>  kernel BUG at net/openvswitch/datapath.c:2414!
>  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>  CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
>  RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
>  Call Trace:
>   <TASK>
>   genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
>   genl_rcv_msg (net/netlink/genetlink.c:1194)
>   netlink_rcv_skb (net/netlink/af_netlink.c:2550)
>   genl_rcv (net/netlink/genetlink.c:1219)
>   netlink_unicast (net/netlink/af_netlink.c:1344)
>   netlink_sendmsg (net/netlink/af_netlink.c:1894)
>   __sys_sendto (net/socket.c:2206)
>   __x64_sys_sendto (net/socket.c:2209)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:63)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>   </TASK>
>  Kernel panic - not syncing: Fatal exception
> 
> Reject attempts to set more PIDs than nr_cpu_ids in
> ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply
> size in ovs_vport_cmd_msg_size() based on that bound, similar to the
> existing ovs_dp_cmd_msg_size().  nr_cpu_ids matches the cap already
> used by the per-CPU dispatch configuration on the datapath side
> (ovs_dp_cmd_fill_info() serialises at most nr_cpu_ids PIDs), so the
> two sides stay consistent.
> 
> Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---

I am still slightly concerned about a few things:

1. Potentially higher memory usage for these messages on systems with
   high core count.  This is not really a problem as we have similarly
   sized datapath info messages and even larger flow messages.

2. Technically a uAPI change since we're limiting the number of PIDs
   that can be supplied.  But, as I said before, this number should be
   limited by some value regardless, it's not good to have it unbounded,
   and also it is generally not a good idea to configure more handlers
   than cores, as it will worsen the "thundering herd" issue the
   per-vport dispatch method has.  Existing userspace by default sizes
   the number of threads to be lower than the number of cores.

In practice, per-vport dispatch is only used as a fallback in all modern
versions of Open vSwitch (since 2021).  So we should also, probably,
explore ways to deprecate it and eventually remove it.  That would likely
require at least removing the warning for the case where the
dp->upcall_portids array is smaller than the CPU id, so that users are not
always required to supply the full array, e.g. for tests, but that's a
separate topic.

For this particular patch:

Reviewed-by: Ilya Maximets <i.maximets@ovn.org>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies
  2026-04-16  2:46 [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies Weiming Shi
  2026-04-20  1:25 ` Ilya Maximets
@ 2026-04-20 21:59 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-20 21:59 UTC (permalink / raw)
  To: Weiming Shi
  Cc: aconole, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, pshelar, tgraf, alexw, netdev, dev, xmei5

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 19:46:54 -0700 you wrote:
> The vport netlink reply helpers allocate a fixed-size skb with
> nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
> array via ovs_vport_get_upcall_portids().  Since
> ovs_vport_set_upcall_portids() accepts any non-zero multiple of
> sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
> array large enough to overflow the reply buffer, causing nla_put() to
> fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
> unprivileged user namespaces enabled (e.g., Ubuntu default), this is
> reachable via unshare -Urn since OVS vport mutation operations use
> GENL_UNS_ADMIN_PERM.
> 
> [...]

Here is the summary with links:
  - [net,v5] openvswitch: cap upcall PID array size and pre-size vport replies
    https://git.kernel.org/netdev/net/c/2091c6aa0df6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-04-20 22:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-16  2:46 [PATCH net v5] openvswitch: cap upcall PID array size and pre-size vport replies Weiming Shi
2026-04-20  1:25 ` Ilya Maximets
2026-04-20 21:59 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox