From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFB6438BF92 for ; Thu, 2 Apr 2026 07:26:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775114796; cv=none; b=reAjvCMrLzb0cYZqf+zGyF1akznnR1hasppqRrR12stw6zSRmMH+1Z8RriJA11/tRFwKm13s6rS/Ek59J738CJLJn6AhGuQzTu0iqyx46NkvMfPFmf9To22+bAILz5cCL9pPCXdi5FndaSXY2MFjzgxRbl8fT8V7mf1s+GtyP/A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775114796; c=relaxed/simple; bh=MTUKsZMQhqavcLs0bYu45unM5oI1oNCah3f+jH1mpZ8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AVnoG5arGkAuxiJp+NSuPp6Fmkzumw+iWZiR+08rS4yMhzszMNx6CrgOksetQl/g8ZEQy7/ZxsN8jO2fo3isq05RLXCFA0z+r8LaXoETqIBYk0nhfotBKBnHwncqKTzFEQDfLJu/yA6CxLmWO9MI8gLIHNZ2FwKpGyzx16XmO8I= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de; spf=pass smtp.mailfrom=suse.de; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.de Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 941615BD95; Thu, 2 Apr 2026 07:26:29 +0000 (UTC) Authentication-Results: smtp-out2.suse.de; none Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id DCFDC4A0B0; Thu, 2 Apr 2026 07:26:28 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id UIQRMyQazmn5agAAD6G6ig (envelope-from ); Thu, 02 Apr 2026 07:26:28 +0000 From: Fernando Fernandez Mancera To: netdev@vger.kernel.org Cc: idosch@nvidia.com, petrm@nvidia.com, horms@kernel.org, pabeni@redhat.com, kuba@kernel.org, edumazet@google.com, davem@davemloft.net, dsahern@kernel.org, kees@kernel.org, Fernando Fernandez Mancera , Yiming Qian Subject: [PATCH 2/2 net v3] ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop() Date: Thu, 2 Apr 2026 09:26:13 +0200 Message-ID: <20260402072613.25262-2-fmancera@suse.de> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260402072613.25262-1-fmancera@suse.de> References: <20260402072613.25262-1-fmancera@suse.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Rspamd-Queue-Id: 941615BD95 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Action: no action X-Spam-Flag: NO X-Spam-Score: -4.00 X-Spam-Level: When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for single nexthops and small Equal-Cost Multi-Path groups, this fixed allocation fails for large nexthop groups like 512 nexthops. This results in the following warning splat: WARNING: net/ipv4/nexthop.c:3395 at rtm_get_nexthop+0x176/0x1c0, CPU#20: rep/4608 [...] RIP: 0010:rtm_get_nexthop (net/ipv4/nexthop.c:3395) [...] Call Trace: rtnetlink_rcv_msg (net/core/rtnetlink.c:6989) netlink_rcv_skb (net/netlink/af_netlink.c:2550) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) ____sys_sendmsg (net/socket.c:721 net/socket.c:736 net/socket.c:2585) ___sys_sendmsg (net/socket.c:2641) __sys_sendmsg (net/socket.c:2671) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fix this by allocating the size dynamically using nh_nlmsg_size() and using nlmsg_new(), this is consistent with nexthop_notify() behavior. In addition, adjust nh_nlmsg_size_grp() so it calculates the size needed based on flags passed. While at it, also add the size of NHA_FDB for nexthop group size calculation as it was missing too. This cannot be reproduced via iproute2 as the group size is currently limited and the command fails as follows: addattr_l ERROR: message exceeded bound of 1048 Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera Reviewed-by: Eric Dumazet --- v2: adjust nh_nlmsg_size_grp() to handle size for stats and add symbols to the trace in commit message v3: include NHA_FDB size into nh_nlmsg_size_grp() calculation --- net/ipv4/nexthop.c | 38 +++++++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index a0c694583299..2c9036c719b6 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -1003,16 +1003,32 @@ static size_t nh_nlmsg_size_grp_res(struct nh_group *nhg) nla_total_size_64bit(8);/* NHA_RES_GROUP_UNBALANCED_TIME */ } -static size_t nh_nlmsg_size_grp(struct nexthop *nh) +static size_t nh_nlmsg_size_grp(struct nexthop *nh, u32 op_flags) { struct nh_group *nhg = rtnl_dereference(nh->nh_grp); size_t sz = sizeof(struct nexthop_grp) * nhg->num_nh; size_t tot = nla_total_size(sz) + - nla_total_size(2); /* NHA_GROUP_TYPE */ + nla_total_size(2) + /* NHA_GROUP_TYPE */ + nla_total_size(0); /* NHA_FDB */ if (nhg->resilient) tot += nh_nlmsg_size_grp_res(nhg); + if (op_flags & NHA_OP_FLAG_DUMP_STATS) { + tot += nla_total_size(0) + /* NHA_GROUP_STATS */ + nla_total_size(4); /* NHA_HW_STATS_ENABLE */ + tot += nhg->num_nh * + (nla_total_size(0) + /* NHA_GROUP_STATS_ENTRY */ + nla_total_size(4) + /* NHA_GROUP_STATS_ENTRY_ID */ + nla_total_size_64bit(8)); /* NHA_GROUP_STATS_ENTRY_PACKETS */ + + if (op_flags & NHA_OP_FLAG_DUMP_HW_STATS) { + tot += nhg->num_nh * + nla_total_size_64bit(8); /* NHA_GROUP_STATS_ENTRY_PACKETS_HW */ + tot += nla_total_size(4); /* NHA_HW_STATS_USED */ + } + } + return tot; } @@ -1047,14 +1063,14 @@ static size_t nh_nlmsg_size_single(struct nexthop *nh) return sz; } -static size_t nh_nlmsg_size(struct nexthop *nh) +static size_t nh_nlmsg_size(struct nexthop *nh, u32 op_flags) { size_t sz = NLMSG_ALIGN(sizeof(struct nhmsg)); sz += nla_total_size(4); /* NHA_ID */ if (nh->is_group) - sz += nh_nlmsg_size_grp(nh) + + sz += nh_nlmsg_size_grp(nh, op_flags) + nla_total_size(4) + /* NHA_OP_FLAGS */ 0; else @@ -1070,7 +1086,7 @@ static void nexthop_notify(int event, struct nexthop *nh, struct nl_info *info) struct sk_buff *skb; int err = -ENOBUFS; - skb = nlmsg_new(nh_nlmsg_size(nh), gfp_any()); + skb = nlmsg_new(nh_nlmsg_size(nh, 0), gfp_any()); if (!skb) goto errout; @@ -3376,15 +3392,15 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh, if (err) return err; - err = -ENOBUFS; - skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); - if (!skb) - goto out; - err = -ENOENT; nh = nexthop_find_by_id(net, id); if (!nh) - goto errout_free; + goto out; + + err = -ENOBUFS; + skb = nlmsg_new(nh_nlmsg_size(nh, op_flags), GFP_KERNEL); + if (!skb) + goto out; err = nh_fill_node(skb, nh, RTM_NEWNEXTHOP, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0, op_flags); -- 2.53.0