Intel-Wired-Lan Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Ian Kumlien <ian.kumlien@gmail.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading?
Date: Wed, 28 Jun 2023 11:06:23 +0200	[thread overview]
Message-ID: <486ae2687cd2e2624c0db1ea1f3d6ca36db15411.camel@redhat.com> (raw)
In-Reply-To: <CAA85sZurk7-_0XGmoCEM93vu3vbqRgPTH4QVymPR5BeeFw6iFg@mail.gmail.com>

Hello,

On Wed, 2023-06-28 at 09:37 +0200, Ian Kumlien wrote:
> Been running all night but eventually it crashed again...
> 
> [21753.055795] Out of memory: Killed process 970 (qemu-system-x86)
> total-vm:4709488kB, anon-rss:2172652kB, file-rss:4608kB,
> shmem-rss:0kB, UID:77 pgtables:4800kB oom_score_adj:0
> [24249.061154] general protection fault, probably for non-canonical
> address 0xb0746d4e6bee35e2: 0000 [#1] PREEMPT SMP NOPTI
> [24249.072138] CPU: 0 PID: 893 Comm: napi/eno1-68 Tainted: G        W
>         6.4.0-dirty #366
> [24249.080670] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F,
> BIOS 1.7a 10/13/2022
> [24249.088852] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
> mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> [24249.094086] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
> 00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
> 48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
> 89 17
> All code
> ========
>    0: 0f 84 46 ff ff ff    je     0xffffffffffffff4c
>    6: 65 ff 05 a4 bd e4 47 incl   %gs:0x47e4bda4(%rip)        # 0x47e4bdb1
>    d: 48 8b 4d 00          mov    0x0(%rbp),%rcx
>   11: 65 48 03 0d e8 5f e3 add    %gs:0x47e35fe8(%rip),%rcx        # 0x47e36001
>   18: 47
>   19: 9c                    pushf
>   1a: 5e                    pop    %rsi
>   1b: fa                    cli
>   1c: 45 31 d2              xor    %r10d,%r10d
>   1f: eb 2f                jmp    0x50
>   21: 8b 45 28              mov    0x28(%rbp),%eax
>   24: 48 01 d0              add    %rdx,%rax
>   27: 48 89 c7              mov    %rax,%rdi
>   2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>   2d: 48 33 85 b8 00 00 00 xor    0xb8(%rbp),%rax
>   34: 48 0f cf              bswap  %rdi
>   37: 48 31 f8              xor    %rdi,%rax
>   3a: 48 89 01              mov    %rax,(%rcx)
>   3d: 49 89 17              mov    %rdx,(%r15)
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 48 8b 00              mov    (%rax),%rax
>    3: 48 33 85 b8 00 00 00 xor    0xb8(%rbp),%rax
>    a: 48 0f cf              bswap  %rdi
>    d: 48 31 f8              xor    %rdi,%rax
>   10: 48 89 01              mov    %rax,(%rcx)
>   13: 49 89 17              mov    %rdx,(%r15)
> [24249.112951] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
> [24249.118275] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
> [24249.125501] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
> [24249.132730] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
> [24249.139958] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
> [24249.147187] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
> [24249.154415] FS:  0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
> knlGS:0000000000000000
> [24249.162620] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [24249.168471] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
> [24249.175717] Call Trace:
> [24249.178268]  <TASK>
> [24249.180476] ? die_addr (arch/x86/kernel/dumpstack.c:421
> arch/x86/kernel/dumpstack.c:460)
> [24249.183907] ? exc_general_protection (arch/x86/kernel/traps.c:783
> arch/x86/kernel/traps.c:728)
> [24249.188726] ? asm_exc_general_protection
> (./arch/x86/include/asm/idtentry.h:564)
> [24249.193720] ? kmem_cache_alloc_bulk (mm/slub.c:377 mm/slub.c:388
> mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> [24249.198361] ? netif_receive_skb_list_internal (net/core/dev.c:5729)
> [24249.203960] napi_skb_cache_get (net/core/skbuff.c:338)
> [24249.208078] __napi_build_skb (net/core/skbuff.c:517)
> [24249.211934] napi_build_skb (net/core/skbuff.c:541)
> [24249.215616] ixgbe_poll
> (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2165
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2361
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3178)
> [24249.219305] __napi_poll (net/core/dev.c:6498)
> [24249.222905] napi_threaded_poll (./include/linux/netpoll.h:89
> net/core/dev.c:6640)
> [24249.227197] ? __napi_poll (net/core/dev.c:6625)
> [24249.231050] kthread (kernel/kthread.c:379)
> [24249.234300] ? kthread_complete_and_exit (kernel/kthread.c:332)
> [24249.239207] ret_from_fork (arch/x86/entry/entry_64.S:314)
> [24249.242892]  </TASK>
> [24249.245185] Modules linked in: chaoskey
> [24249.249133] ---[ end trace 0000000000000000 ]---
> [24249.270157] pstore: backend (erst) writing error (-28)
> [24249.275408] RIP: 0010:kmem_cache_alloc_bulk (mm/slub.c:377
> mm/slub.c:388 mm/slub.c:395 mm/slub.c:3963 mm/slub.c:4026)
> [24249.280660] Code: 0f 84 46 ff ff ff 65 ff 05 a4 bd e4 47 48 8b 4d
> 00 65 48 03 0d e8 5f e3 47 9c 5e fa 45 31 d2 eb 2f 8b 45 28 48 01 d0
> 48 89 c7 <48> 8b 00 48 33 85 b8 00 00 00 48 0f cf 48 31 f8 48 89 01 49
> 89 17
> All code
> ========
>    0: 0f 84 46 ff ff ff    je     0xffffffffffffff4c
>    6: 65 ff 05 a4 bd e4 47 incl   %gs:0x47e4bda4(%rip)        # 0x47e4bdb1
>    d: 48 8b 4d 00          mov    0x0(%rbp),%rcx
>   11: 65 48 03 0d e8 5f e3 add    %gs:0x47e35fe8(%rip),%rcx        # 0x47e36001
>   18: 47
>   19: 9c                    pushf
>   1a: 5e                    pop    %rsi
>   1b: fa                    cli
>   1c: 45 31 d2              xor    %r10d,%r10d
>   1f: eb 2f                jmp    0x50
>   21: 8b 45 28              mov    0x28(%rbp),%eax
>   24: 48 01 d0              add    %rdx,%rax
>   27: 48 89 c7              mov    %rax,%rdi
>   2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
>   2d: 48 33 85 b8 00 00 00 xor    0xb8(%rbp),%rax
>   34: 48 0f cf              bswap  %rdi
>   37: 48 31 f8              xor    %rdi,%rax
>   3a: 48 89 01              mov    %rax,(%rcx)
>   3d: 49 89 17              mov    %rdx,(%r15)
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 48 8b 00              mov    (%rax),%rax
>    3: 48 33 85 b8 00 00 00 xor    0xb8(%rbp),%rax
>    a: 48 0f cf              bswap  %rdi
>    d: 48 31 f8              xor    %rdi,%rax
>   10: 48 89 01              mov    %rax,(%rcx)
>   13: 49 89 17              mov    %rdx,(%r15)
> [24249.299578] RSP: 0018:ffff9fc303973d20 EFLAGS: 00010086
> [24249.304917] RAX: b0746d4e6bee35e2 RBX: 0000000000000001 RCX: ffff8d5a2fa31da0
> [24249.312161] RDX: b0746d4e6bee3572 RSI: 0000000000000286 RDI: b0746d4e6bee35e2
> [24249.319407] RBP: ffff8d56c016d500 R08: 0000000000000400 R09: ffff8d56ede0e67a
> [24249.326651] R10: 0000000000000001 R11: ffff8d56c59d88c0 R12: 0000000000000010
> [24249.333896] R13: 0000000000000820 R14: ffff8d5a2fa2a810 R15: ffff8d5a2fa2a818
> [24249.341141] FS:  0000000000000000(0000) GS:ffff8d5a2fa00000(0000)
> knlGS:0000000000000000
> [24249.349356] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [24249.355206] CR2: 00007f0f3f7f8760 CR3: 0000000102466000 CR4: 00000000003526f0
> [24249.362452] Kernel panic - not syncing: Fatal exception in interrupt
> [24249.566854] Kernel Offset: 0x36e00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [24249.594124] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
> 
> It's also odd that i get a OOM - it only seems to happen when i enable
> rx-gro-list 

Unfortunately, not the result I was looking for. That leads to more
questions then answer, I'm sorry.

How long did the host keep going with rx-gro-list enabled?

Did you observe the WARN_ON() introduced by the tentative fix?

> - it's also odd because this machine always has ~8GB of
> memory available

It looks like there is a memory leak somewhere, and I don't think the
tentative fixup introduced such issue.

It looks like the above splat is due to a slab corruption, which in
turn could be unrelated from the mentioned leak, but it could/should 
be related to rx-gro-list. 

Could you please run the test with both kmemleak and kasan enabled?

Additionally could you please disclose if you have non trivial
netfilter and/or bridge filter and/or tc rules possibly modifying the
incoming/egress packets?

If kasan is not an option, could you please apply the debug the patch
below? (on top of the previous one)

Thanks!

Paolo
---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6c5915efbc17..94adca27b205 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4295,6 +4295,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 		delta_len += nskb->len;
 
 		skb_push(nskb, -skb_network_offset(nskb) + offset);
+		if (WARN_ON_ONCE(nskb->data - skb->head > skb->tail))
+			goto err_linearize;
 
 		skb_release_head_state(nskb);
 		len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
@@ -4302,6 +4304,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 
 		skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
 		nskb->transport_header += len_diff;
+		if (WARN_ON_ONCE(tnl_hlen > skb_headroom(nskb)))
+			goto err_linearize;
+		if (WARN_ON_ONCE(skb_headroom(nskb) + offset > nskb->tail))
+			goto err_linearize;
+
 		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
 						 nskb->data - tnl_hlen,
 						 offset + tnl_hlen);


_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

  reply	other threads:[~2023-06-28  9:06 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-24 20:03 [Intel-wired-lan] bug with rx-udp-gro-forwarding offloading? Ian Kumlien
2023-06-25 10:59 ` Ian Kumlien
2023-06-26 13:39   ` Ian Kumlien
2023-06-26 14:17   ` Alexander Lobakin
2023-06-26 14:25     ` Ian Kumlien
2023-06-26 16:42       ` Paolo Abeni
2023-06-26 17:07       ` Alexander Lobakin
2023-06-26 17:24         ` Ian Kumlien
2023-06-26 17:30           ` Ian Kumlien
2023-06-26 17:55             ` Paolo Abeni
2023-06-26 18:01               ` Ian Kumlien
2023-06-26 18:20                 ` Ian Kumlien
2023-06-26 18:59                   ` Ian Kumlien
2023-06-27  9:19                     ` Paolo Abeni
2023-06-27 12:31                       ` Ian Kumlien
2023-06-28  7:37                         ` Ian Kumlien
2023-06-28  9:06                           ` Paolo Abeni [this message]
2023-06-28 11:47                             ` Ian Kumlien
2023-06-28 12:04                               ` Ian Kumlien
2023-06-28 15:14                                 ` Paolo Abeni
2023-06-28 20:18                                   ` Ian Kumlien
2023-06-29 10:50                                     ` Ian Kumlien
2023-07-03  9:37                                       ` Ian Kumlien
2023-07-04 10:10                                         ` Paolo Abeni
2023-07-04 11:36                                           ` Ian Kumlien
2023-07-04 12:54                                             ` Paolo Abeni
2023-07-04 13:23                                               ` Ian Kumlien
2023-07-04 13:41                                                 ` Paolo Abeni
2023-07-04 14:06                                                   ` Ian Kumlien
2023-07-04 14:27                                                     ` Ian Kumlien
2023-07-04 14:57                                                       ` Ian Kumlien
2023-07-05 10:28                                                       ` Paolo Abeni
2023-07-05 11:32                                                         ` Ian Kumlien
2023-07-05 13:29                                                           ` Paolo Abeni
2023-07-05 13:58                                                             ` Ian Kumlien
2023-07-06  8:42                                                               ` Paolo Abeni
2023-07-06 11:27                                                                 ` Ian Kumlien
2023-07-06 13:01                                                                   ` Paolo Abeni
2023-07-06 13:56                                                                     ` Eric Dumazet
2023-07-06 14:04                                                                       ` Paolo Abeni
2023-07-06 16:17                                                                         ` Ian Kumlien
2023-07-06 17:10                                                                           ` Paolo Abeni
2023-07-06 17:43                                                                             ` Ian Kumlien
2023-07-06 22:32                                                                             ` Ian Kumlien
2023-07-06 22:41                                                                               ` Ian Kumlien
2023-07-07  6:55                                                                               ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=486ae2687cd2e2624c0db1ea1f3d6ca36db15411.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=edumazet@google.com \
    --cc=ian.kumlien@gmail.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox