public inbox for netdev@vger.kernel.org
From: Fernando Fernandez Mancera <fmancera@suse.de>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, horms@kernel.org, pabeni@redhat.com,
	edumazet@google.com, davem@davemloft.net, dsahern@kernel.org,
	Yiming Qian <yimingqian591@gmail.com>
Subject: Re: [PATCH net] ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop()
Date: Tue, 31 Mar 2026 19:40:38 +0200
Message-ID: <3025cc67-ddbf-43f6-a313-602193979ace@suse.de>
In-Reply-To: <20260331103538.103e3778@kernel.org>

On 3/31/26 7:35 PM, Jakub Kicinski wrote:
> On Tue, 31 Mar 2026 13:59:43 +0200 Fernando Fernandez Mancera wrote:
>> When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently
>> allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for
>> single nexthops and small Equal-Cost Multi-Path groups, this fixed
>> allocation fails for large nexthop groups like 512+ nexthops.
> 
> router_mpath_seed.sh says:
> 
> [    9.366434] WARNING: net/ipv4/nexthop.c:3395 at rtm_get_nexthop+0x181/0x1b0, CPU#0: ip/342
> [    9.366490] Modules linked in: vrf veth
> [    9.366519] CPU: 0 UID: 0 PID: 342 Comm: ip Not tainted 7.0.0-rc5-virtme #1 PREEMPT(lazy)
> [    9.366567] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    9.366610] RIP: 0010:rtm_get_nexthop+0x181/0x1b0
> [    9.366649] Code: 25 a7 ee ff eb 80 48 c7 c7 a0 ee af ae e8 57 30 f5 ff 4d 85 ed 74 08 49 c7 45 00 a0 ee af ae b8 ea ff ff ff e9 5d ff ff ff 90 <0f> 0b 90 ba 02 00 00 00 4c 89 ee 31 ff e8 2d 84 eb ff b8 a6 ff ff
> [    9.366754] RSP: 0018:ff5db175808bf9d8 EFLAGS: 00010286
> [    9.366790] RAX: 00000000ffffffa6 RBX: ff3c7cb941dab700 RCX: 0000000000000000
> [    9.366835] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ff3c7cb94404a300
> [    9.366872] RBP: ff3c7cb94404a400 R08: ff3c7cb9413f25a8 R09: ff3c7cb94196cb40
> [    9.366916] R10: ff3c7cb94196ca00 R11: 000000000000000e R12: ffffffffaf8df6c0
> [    9.366967] R13: ff3c7cb94404a300 R14: ff3c7cb94525e180 R15: ff3c7cb94404a400
> [    9.367018] FS:  00007fa1f5261440(0000) GS:ff3c7cb9cf3fb000(0000) knlGS:0000000000000000
> [    9.367064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.367102] CR2: 000000000044f720 CR3: 000000000573f001 CR4: 0000000000771ef0
> [    9.367149] PKRU: 55555554
> [    9.367165] Call Trace:
> [    9.367183]  <TASK>
> [    9.367201]  rtnetlink_rcv_msg+0x13a/0x3e0
> [    9.367227]  ? get_page_from_freelist+0x1109/0x16c0
> [    9.367259]  ? rtnl_calcit.isra.0+0x120/0x120
> [    9.367286]  netlink_rcv_skb+0x59/0x100
> [    9.367310]  netlink_unicast+0x255/0x380
> [    9.367333]  netlink_sendmsg+0x1cc/0x3e0
> [    9.367356]  ____sys_sendmsg+0x164/0x260
> [    9.367390]  ___sys_sendmsg+0x99/0xe0
> [    9.367415]  __sys_sendmsg+0x8a/0xe0
> [    9.367441]  do_syscall_64+0x101/0xfc0
> [    9.367466]  ? exc_page_fault+0x6e/0x170
> [    9.367493]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    9.367528] RIP: 0033:0x7fa1f53bbc5e
> [    9.367550] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
> [    9.367653] RSP: 002b:00007ffc00947ff0 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> [    9.367700] RAX: ffffffffffffffda RBX: 000000000048ba90 RCX: 00007fa1f53bbc5e
> [    9.367744] RDX: 0000000000000000 RSI: 00007ffc009480b0 RDI: 0000000000000005
> [    9.367787] RBP: 00007ffc00948000 R08: 0000000000000000 R09: 0000000000000000
> [    9.367835] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000049d620
> [    9.367889] R13: 0000000069cbe823 R14: 0000000000000004 R15: 000000000049d620
> 
> decoded:
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/forwarding/results/582821/vm-crash-thr4-1

Hi Jakub,

Thanks for sharing. As I replied to Eric, this is the main reason a v2
is needed. I also discovered another bug while fixing this: when dumping
the stats, NHA_HW_STATS_ENABLE is included twice per group. I am fixing
that in a separate patch that will be part of the v2 series.

Thanks,
Fernando.
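
For context on why a fixed NLMSG_GOODSIZE buffer runs out at around 512
nexthops, here is a rough userspace sketch of the arithmetic. It is not
code from the patch. The struct below is assumed to mirror the 8-byte
layout of struct nexthop_grp, the helper names are hypothetical, and the
4096-byte budget is an assumed upper bound for NLMSG_GOODSIZE on systems
with 4 KiB pages (the real value is smaller because of skb overhead).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the uapi struct nexthop_grp layout: 8 bytes per entry. */
struct nh_grp_entry {
	uint32_t id;          /* nexthop id */
	uint8_t  weight;      /* low bits of the weight */
	uint8_t  weight_high; /* high bits (reserved on older kernels) */
	uint16_t resvd2;
};

_Static_assert(sizeof(struct nh_grp_entry) == 8, "entry must be 8 bytes");

#define NLA_HDR_LEN 4u                    /* u16 len + u16 type */
#define NLA_ALIGN4(len) (((len) + 3u) & ~(size_t)3u)

/* Size of one netlink attribute carrying `payload` bytes, padded to 4. */
static size_t nla_total(size_t payload)
{
	return NLA_ALIGN4(NLA_HDR_LEN + payload);
}

/* Size of the NHA_GROUP attribute alone for a group of num_nh entries. */
static size_t nh_group_attr_size(unsigned int num_nh)
{
	return nla_total((size_t)num_nh * sizeof(struct nh_grp_entry));
}

/* Hypothetical check: does the group attribute alone fit in `budget`? */
static int nh_group_fits(unsigned int num_nh, size_t budget)
{
	return nh_group_attr_size(num_nh) <= budget;
}
```

With 512 entries the NHA_GROUP attribute alone is 4 + 512 * 8 = 4100
bytes, which already exceeds a 4096-byte page before the netlink header
and the remaining attributes are counted.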


Thread overview: 9+ messages
2026-03-31 11:59 [PATCH net] ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop() Fernando Fernandez Mancera
2026-03-31 12:13 ` Eric Dumazet
2026-03-31 12:50   ` Fernando Fernandez Mancera
2026-03-31 13:38     ` Eric Dumazet
2026-03-31 14:38       ` Fernando Fernandez Mancera
2026-03-31 17:35 ` Jakub Kicinski
2026-03-31 17:40   ` Fernando Fernandez Mancera [this message]
2026-03-31 22:41     ` Jakub Kicinski
2026-04-01  7:18       ` Fernando Fernandez Mancera
