0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues


* 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
@ 2024-09-06  6:20 Jaroslav Pulchart
  2024-09-13  7:50 ` Jaroslav Pulchart
  0 siblings, 1 reply; 13+ messages in thread
From: Jaroslav Pulchart @ 2024-09-06  6:20 UTC
  To: virtualization

Hello,

My virtual machine crashed with the message
"0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]". See the full log
below.

I made two changes:
* updated the VM's packages (kernel from 6.9.5 to 6.10.7)
* enabled "packed virtqueues" via libvirt on the host
and the crash happens after a few hours of uptime.
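
One way to confirm from inside the guest that packed virtqueues were really negotiated is to look at the device feature bits. The sketch below is only an illustration: it assumes the Linux sysfs layout in which /sys/bus/virtio/devices/<dev>/features is a string of '0'/'1' characters with bit 0 first, and that VIRTIO_F_RING_PACKED is feature bit 34 as in the virtio specification. The helper names has_feature and packed_ring_devices are hypothetical.

```python
from pathlib import Path

VIRTIO_F_RING_PACKED = 34  # feature bit number from the virtio 1.1 specification


def has_feature(features: str, bit: int) -> bool:
    """Return True if `bit` is set in a sysfs-style features string.

    The string holds one '0'/'1' character per feature bit, bit 0 first.
    """
    features = features.strip()
    return bit < len(features) and features[bit] == "1"


def packed_ring_devices(root: str = "/sys/bus/virtio/devices") -> dict:
    """Map each virtio device name to whether the packed ring was negotiated."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        # Not a Linux guest with virtio devices (assumption: nothing to report).
        return result
    for dev in sorted(base.iterdir()):
        feat = dev / "features"
        if feat.is_file():
            result[dev.name] = has_feature(feat.read_text(), VIRTIO_F_RING_PACKED)
    return result


if __name__ == "__main__":
    for name, packed in packed_ring_devices().items():
        state = "negotiated" if packed else "not negotiated"
        print(f"{name}: packed ring {state}")
```

Run inside the guest, this prints one line per virtio device; a device reported as "not negotiated" is still using the split ring layout.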

Any hints on how to prevent or fix this issue?

[52890.265362] BUG: unable to handle page fault for address: ffff9b94c480000c
[52890.266264] #PF: supervisor write access in kernel mode
[52890.266814] #PF: error_code(0x000b) - reserved bit violation
[52890.267299] PGD 4c3c01067 P4D 4c3c01067 PUD 103be6063 PMD 7849cc063
PTE 7a1dd28f4e77cee7
[52890.267926] Oops: Oops: 000b [#1] PREEMPT SMP NOPTI
[52890.268372] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G            E
     6.10.7-1.gdc.el9.x86_64 #1
[52890.269007] Hardware name: RDO OpenStack Compute/RHEL, BIOS
edk2-20240524-1.el9 05/24/2024
[52890.269853] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
[52890.270349] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
48 89
[52890.272173] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
[52890.272637] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
[52890.273209] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
[52890.273772] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
[52890.274341] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
[52890.274898] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
[52890.275476] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
knlGS:0000000000000000
[52890.276087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52890.276592] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
[52890.277167] PKRU: 55555554
[52890.277524] Call Trace:
[52890.277869]  <IRQ>
[52890.278198]  ? __die+0x20/0x70
[52890.278580]  ? page_fault_oops+0x75/0x170
[52890.279009]  ? exc_page_fault+0xbe/0x160
[52890.279441]  ? asm_exc_page_fault+0x22/0x30
[52890.279881]  ? virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
[52890.280377]  try_fill_recv+0x22c/0x440 [virtio_net]
[52890.280848]  virtnet_receive+0x1ce/0x230 [virtio_net]
[52890.281334]  virtnet_poll+0x179/0x3a0 [virtio_net]
[52890.281804]  __napi_poll+0x29/0x1b0
[52890.282222]  net_rx_action+0x2b5/0x390
[52890.282641]  ? _raw_spin_unlock_irqrestore+0xa/0x30
[52890.283118]  handle_softirqs+0xd3/0x2b0
[52890.283550]  __irq_exit_rcu+0x9b/0xc0
[52890.283970]  common_interrupt+0x7f/0xa0
[52890.284409]  </IRQ>
[52890.284749]  <TASK>
[52890.285084]  asm_common_interrupt+0x22/0x40
[52890.285526] RIP: 0010:default_idle+0xb/0x20
[52890.285953] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d c3 92 30
00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 90
[52890.287332] RSP: 0018:ffffb5dac436fec0 EFLAGS: 00000206
[52890.287814] RAX: 000000000000000e RBX: ffff9b94c320ce00 RCX: 0000000103227730
[52890.288395] RDX: 000000000000000e RSI: 0000000000000082 RDI: 000000002f3198a4
[52890.288962] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[52890.289533] R10: 00000000000002cd R11: 0000000000000000 R12: 0000000000000000
[52890.290092] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[52890.290657]  default_idle_call+0x2c/0xf0
[52890.291053]  cpuidle_idle_call+0x109/0x120
[52890.291464]  do_idle+0x76/0xb0
[52890.291813]  cpu_startup_entry+0x25/0x30
[52890.292269]  start_secondary+0x113/0x130
[52890.292812]  common_startup_64+0x13e/0x141
[52890.293279]  </TASK>
[52890.293615] Modules linked in: mptcp_diag(E) xsk_diag(E)
raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E)
udp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E)
nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) vfat(E)
fat(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E)
i2c_i801(E) virtio_net(E) net_failover(E) failover(E) virtio_gpu(E)
i2c_smbus(E) dimlib(E) virtio_balloon(E) virtio_dma_buf(E) fuse(E)
ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E)
crct10dif_pclmul(E) libahci(E) crc32_pclmul(E) polyval_clmulni(E)
polyval_generic(E) libata(E) ghash_clmulni_intel(E) sha512_ssse3(E)
virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E)
raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E)
dm_region_hash(E) dm_log(E) dm_mod(E)
[52890.293657] Unloaded tainted modules: edac_mce_amd(E):1
amd_atl(E):2 padlock_aes(E):3
[52890.299716] CR2: ffff9b94c480000c
[52890.300092] ---[ end trace 0000000000000000 ]---
[52890.300101] Oops: general protection fault, probably for
non-canonical address 0x86304c8ed4b709b3: 0000 [#2] PREEMPT SMP NOPTI
[52890.300336] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
[52890.301177] CPU: 10 PID: 51 Comm: ksoftirqd/10 Tainted: G      D
 E      6.10.7-1.gdc.el9.x86_64 #1
[52890.301472] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
48 89
[52890.302137] Hardware name: RDO OpenStack Compute/RHEL, BIOS
edk2-20240524-1.el9 05/24/2024
[52890.303074] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
[52890.303689] RIP: 0010:put_cpu_partial+0x15/0x70
[52890.303895]
[52890.304291] Code: 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 0f 1f 44 00 00 9c 59 fa 48 8b 07 65 4c 8b 40 18 4d 85
c0 74 54 <41> 8b 40 18 85 d2 75 22 83 c0 01 89 46 18 4c 89 46 10 48 8b
07 65
[52890.304556] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
[52890.304680] RSP: 0018:ffffb5dac6473d48 EFLAGS: 00010082
[52890.305597] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
[52890.306115]
[52890.306384] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
[52890.306910] RAX: 00003a3f1c003ff0 RBX: ffff9b99322a16d0 RCX: 0000000000000246
[52890.307010] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
[52890.307574] RDX: 0000000000000001 RSI: ffffe063afc8a800 RDI: ffff9b94cc48fc00
[52890.307927] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
[52890.308450] RBP: ffffb5dac6473da0 R08: 86304c8ed4b7099b R09: 00000000001c001b
[52890.309006] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
knlGS:0000000000000000
[52890.309543] R10: 0000000000040000 R11: 0000000000000001 R12: ffffe063afc8a840
[52890.310127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52890.310722] R13: ffffe063afc8a800 R14: ffff9b94cc48fc00 R15: ffff9b9ba3b36100
[52890.311314] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
[52890.311735] FS:  0000000000000000(0000) GS:ffff9b9ba3b00000(0000)
knlGS:0000000000000000
[52890.312333] PKRU: 55555554
[52890.312850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52890.313298] Kernel panic - not syncing: Fatal exception in interrupt
[52891.350776] Shutting down cpus with NMI
[52891.358926] Kernel Offset: 0xae00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[52891.359713] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
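
For readers decoding the first oops, the following sketch cross-checks the numbers in the log. It assumes the standard x86 page-fault error-code bit layout (as listed in arch/x86/include/asm/trap_pf.h) and hand-decodes the bytes around the <66> marker in the Code: line; the function name decode_pf_error is hypothetical, and all register values are copied from the log above.

```python
# x86 page-fault error-code bits (Intel SDM / arch/x86/include/asm/trap_pf.h).
PF_BITS = {
    0: "protection violation (page present)",
    1: "write access",
    2: "user mode",
    3: "reserved bit set in a paging-structure entry",
    4: "instruction fetch",
    5: "protection-key violation",
}


def decode_pf_error(code: int) -> list:
    """List the conditions flagged by an x86 page-fault error code."""
    return [text for bit, text in PF_BITS.items() if code & (1 << bit)]


# Values copied from the first oops in the log.
error_code = 0x000B
rax = 0x0000000000001000
rbx = 0xFFFF9B94C4800000
rdx = 0x0000000000000FF0
cr2 = 0xFFFF9B94C480000C

flags = decode_pf_error(error_code)
print(flags)

# The marked bytes <66> 89 53 0c encode a 16-bit store of %dx to 0xc(%rbx),
# so the faulting address should be RBX + 0xc.
fault_addr = rbx + 0xC
print(hex(fault_addr), fault_addr == cr2)

# The preceding bytes 8d 50 f0 encode lea -0x10(%rax),%edx, so RDX = RAX - 0x10.
print(hex(rax - 0x10), rax - 0x10 == rdx)
```

The user-mode bit is clear, which matches the "supervisor write access" line. A reserved-bit fault on a store to the start of a page-aligned buffer suggests that the page-table entry itself (PTE 7a1dd28f4e77cee7 in the log) was overwritten, rather than that the driver computed a wrong address.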

Best,
Jaroslav Pulchart


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-06  6:20 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues Jaroslav Pulchart
@ 2024-09-13  7:50 ` Jaroslav Pulchart
  2024-09-13  8:26   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 13+ messages in thread
From: Jaroslav Pulchart @ 2024-09-13  7:50 UTC
  To: virtualization

Hello,

Actually, I'm getting random memory-corruption-related crashes after
updating to 6.10.y. I expect they are related to this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=219154
It has been almost a month since the last comment there, yet the
patches fixing the regression are not reverted from the 6.10.y tree,
which surprises me.

I will try reverting them in our builds to see whether that avoids the
crashes, which happen randomly every day.

Best




-- 
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13  7:50 ` Jaroslav Pulchart
@ 2024-09-13  8:26   ` Linux regression tracking (Thorsten Leemhuis)
  2024-09-13  8:42     ` Xuan Zhuo
  0 siblings, 1 reply; 13+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-09-13  8:26 UTC
  To: Xuan Zhuo, Michael S. Tsirkin
  Cc: Linux kernel regressions list, Jaroslav Pulchart, virtualization

[CCing a few people that know more about this stuff than I do]

On 13.09.24 09:50, Jaroslav Pulchart wrote:
> 
> actually I'm getting random memory corruption related crashes after
> updating to 6.10.y. My expectation is that it relates to this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> It looks like it is almost 1 month ago

A lot of developers ignore bugzilla.

> already from the last comment
> there, However the patches fixing the regression are not reverted from
> the 6.10.y tree which surprises me.
> 
> I will try to revert them from our builds and see if it helps to avoid
> random daily happening crashes.

Not my area of expertise, but to me it sounds like the problem will be
resolved by "Revert "virtio_net: rx enable premapped mode by default"":
https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/

That set just landed in mainline. It's likely to be backported to 6.10.y
within a week or two, but that is not guaranteed because the patches lack
a stable tag. So you might want to keep an eye on it.

Ciao, Thorsten




* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13  8:26   ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-09-13  8:42     ` Xuan Zhuo
  2024-09-13  8:51       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 13+ messages in thread
From: Xuan Zhuo @ 2024-09-13  8:42 UTC
  To: Linux regression tracking (Thorsten Leemhuis)
  Cc: Linux kernel regressions list, Jaroslav Pulchart, virtualization,
	Michael S. Tsirkin

On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> [CCing a few people that know more about this stuff than I do]
>
> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> >
> > actually I'm getting random memory corruption related crashes after
> > updating to 6.10.y. My expectation is that it relates to this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > It looks like it is almost 1 month ago
>
> A lot of developers ignore bugzilla.
>
> > already from the last comment
> > there, However the patches fixing the regression are not reverted from
> > the 6.10.y tree which surprises me.
> >
> > I will try to revert them from our builds and see if it helps to avoid
> > random daily happening crashes.
>
> Not my area of expertise, but to me it sounds like the problem will be
> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/


Yes. That has been merged into net.

Thanks


>
> That set just landed in mainline. It's likely to be backported to 6.10.y
> within a week or two, but that is not guaranteed because the patches lack
> a stable tag. So you might want to keep an eye on it.
>
> Ciao, Thorsten
>
> > On Fri, 6 Sep 2024 at 8:20, Jaroslav Pulchart
> > <jaroslav.pulchart@gooddata.com> wrote:
> >>
> >> Hello,
> >>
> >> My virtual machine crashed with the message
> >> "0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]". See the full log
> >> below.
> >>
> >> I did two changes:
> >> * Updated my VM packages (kernel from 6.9.5 to 6.10.7) on the VM
> >> * enabled "packed virtqueues" by libvirt on host
> >> and it happens after a few hours of uptime.
> >>
> >> Any hint how to prevent it or fix this issue?
> >>
> >> [52890.265362] BUG: unable to handle page fault for address: ffff9b94c480000c
> >> [52890.266264] #PF: supervisor write access in kernel mode
> >> [52890.266814] #PF: error_code(0x000b) - reserved bit violation
> >> [52890.267299] PGD 4c3c01067 P4D 4c3c01067 PUD 103be6063 PMD 7849cc063
> >> PTE 7a1dd28f4e77cee7
> >> [52890.267926] Oops: Oops: 000b [#1] PREEMPT SMP NOPTI
> >> [52890.268372] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G            E
> >>      6.10.7-1.gdc.el9.x86_64 #1
> >> [52890.269007] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> >> edk2-20240524-1.el9 05/24/2024
> >> [52890.269853] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> >> [52890.270349] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> >> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> >> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> >> 48 89
> >> [52890.272173] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> >> [52890.272637] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> >> [52890.273209] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> >> [52890.273772] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> >> [52890.274341] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> >> [52890.274898] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> >> [52890.275476] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> >> knlGS:0000000000000000
> >> [52890.276087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [52890.276592] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> >> [52890.277167] PKRU: 55555554
> >> [52890.277524] Call Trace:
> >> [52890.277869]  <IRQ>
> >> [52890.278198]  ? __die+0x20/0x70
> >> [52890.278580]  ? page_fault_oops+0x75/0x170
> >> [52890.279009]  ? exc_page_fault+0xbe/0x160
> >> [52890.279441]  ? asm_exc_page_fault+0x22/0x30
> >> [52890.279881]  ? virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> >> [52890.280377]  try_fill_recv+0x22c/0x440 [virtio_net]
> >> [52890.280848]  virtnet_receive+0x1ce/0x230 [virtio_net]
> >> [52890.281334]  virtnet_poll+0x179/0x3a0 [virtio_net]
> >> [52890.281804]  __napi_poll+0x29/0x1b0
> >> [52890.282222]  net_rx_action+0x2b5/0x390
> >> [52890.282641]  ? _raw_spin_unlock_irqrestore+0xa/0x30
> >> [52890.283118]  handle_softirqs+0xd3/0x2b0
> >> [52890.283550]  __irq_exit_rcu+0x9b/0xc0
> >> [52890.283970]  common_interrupt+0x7f/0xa0
> >> [52890.284409]  </IRQ>
> >> [52890.284749]  <TASK>
> >> [52890.285084]  asm_common_interrupt+0x22/0x40
> >> [52890.285526] RIP: 0010:default_idle+0xb/0x20
> >> [52890.285953] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90
> >> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d c3 92 30
> >> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> >> 00 90
> >> [52890.287332] RSP: 0018:ffffb5dac436fec0 EFLAGS: 00000206
> >> [52890.287814] RAX: 000000000000000e RBX: ffff9b94c320ce00 RCX: 0000000103227730
> >> [52890.288395] RDX: 000000000000000e RSI: 0000000000000082 RDI: 000000002f3198a4
> >> [52890.288962] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> >> [52890.289533] R10: 00000000000002cd R11: 0000000000000000 R12: 0000000000000000
> >> [52890.290092] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >> [52890.290657]  default_idle_call+0x2c/0xf0
> >> [52890.291053]  cpuidle_idle_call+0x109/0x120
> >> [52890.291464]  do_idle+0x76/0xb0
> >> [52890.291813]  cpu_startup_entry+0x25/0x30
> >> [52890.292269]  start_secondary+0x113/0x130
> >> [52890.292812]  common_startup_64+0x13e/0x141
> >> [52890.293279]  </TASK>
> >> [52890.293615] Modules linked in: mptcp_diag(E) xsk_diag(E)
> >> raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E)
> >> udp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E)
> >> nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) vfat(E)
> >> fat(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E)
> >> i2c_i801(E) virtio_net(E) net_failover(E) failover(E) virtio_gpu(E)
> >> i2c_smbus(E) dimlib(E) virtio_balloon(E) virtio_dma_buf(E) fuse(E)
> >> ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E)
> >> crct10dif_pclmul(E) libahci(E) crc32_pclmul(E) polyval_clmulni(E)
> >> polyval_generic(E) libata(E) ghash_clmulni_intel(E) sha512_ssse3(E)
> >> virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E)
> >> raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E)
> >> dm_region_hash(E) dm_log(E) dm_mod(E)
> >> [52890.293657] Unloaded tainted modules: edac_mce_amd(E):1
> >> amd_atl(E):2 padlock_aes(E):3
> >> [52890.299716] CR2: ffff9b94c480000c
> >> [52890.300092] ---[ end trace 0000000000000000 ]---
> >> [52890.300101] Oops: general protection fault, probably for
> >> non-canonical address 0x86304c8ed4b709b3: 0000 [#2] PREEMPT SMP NOPTI
> >> [52890.300336] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> >> [52890.301177] CPU: 10 PID: 51 Comm: ksoftirqd/10 Tainted: G      D
> >>  E      6.10.7-1.gdc.el9.x86_64 #1
> >> [52890.301472] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> >> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> >> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> >> 48 89
> >> [52890.302137] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> >> edk2-20240524-1.el9 05/24/2024
> >> [52890.303074] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> >> [52890.303689] RIP: 0010:put_cpu_partial+0x15/0x70
> >> [52890.303895]
> >> [52890.304291] Code: 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90
> >> 90 90 90 90 90 0f 1f 44 00 00 9c 59 fa 48 8b 07 65 4c 8b 40 18 4d 85
> >> c0 74 54 <41> 8b 40 18 85 d2 75 22 83 c0 01 89 46 18 4c 89 46 10 48 8b
> >> 07 65
> >> [52890.304556] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> >> [52890.304680] RSP: 0018:ffffb5dac6473d48 EFLAGS: 00010082
> >> [52890.305597] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> >> [52890.306115]
> >> [52890.306384] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> >> [52890.306910] RAX: 00003a3f1c003ff0 RBX: ffff9b99322a16d0 RCX: 0000000000000246
> >> [52890.307010] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> >> [52890.307574] RDX: 0000000000000001 RSI: ffffe063afc8a800 RDI: ffff9b94cc48fc00
> >> [52890.307927] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> >> [52890.308450] RBP: ffffb5dac6473da0 R08: 86304c8ed4b7099b R09: 00000000001c001b
> >> [52890.309006] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> >> knlGS:0000000000000000
> >> [52890.309543] R10: 0000000000040000 R11: 0000000000000001 R12: ffffe063afc8a840
> >> [52890.310127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [52890.310722] R13: ffffe063afc8a800 R14: ffff9b94cc48fc00 R15: ffff9b9ba3b36100
> >> [52890.311314] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> >> [52890.311735] FS:  0000000000000000(0000) GS:ffff9b9ba3b00000(0000)
> >> knlGS:0000000000000000
> >> [52890.312333] PKRU: 55555554
> >> [52890.312850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [52890.313298] Kernel panic - not syncing: Fatal exception in interrupt
> >> [52891.350776] Shutting down cpus with NMI
> >> [52891.358926] Kernel Offset: 0xae00000 from 0xffffffff81000000
> >> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >> [52891.359713] ---[ end Kernel panic - not syncing: Fatal exception in
> >> interrupt ]---
> >>
> >> Best,
> >> Jaroslav Pulchart
> >
> >
> >
>


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13  8:42     ` Xuan Zhuo
@ 2024-09-13  8:51       ` Linux regression tracking (Thorsten Leemhuis)
  2024-09-13  9:21         ` Jaroslav Pulchart
  0 siblings, 1 reply; 13+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-09-13  8:51 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Linux kernel regressions list, Jaroslav Pulchart, virtualization,
	Michael S. Tsirkin

On 13.09.24 10:42, Xuan Zhuo wrote:
> On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
>> [CCing a few people that know more about this stuff than I do]
>>
>> On 13.09.24 09:50, Jaroslav Pulchart wrote:
>>>
>>> actually I'm getting random memory corruption related crashes after
>>> updating to 6.10.y. My expectation is that it relates to this issue:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
>>> It looks like it is almost 1 month ago
>>
>> A lot of developers ignore bugzilla.
>>
>>> already from the last comment
>>> there, However the patches fixing the regression are not reverted from
>>> the 6.10.y tree which surprises me.
>>>
>>> I will try to revert them from our builds and see if it helps to avoid
>>> random daily happening crashes.
>>
>> Not my area of expertise, but to me it sounds like the problem will be
>> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
>> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> 
> YES. That is merged into net.

Well, yes, but TWIMC to avoid confusion, it's already one step further,
as mentioned:

>> That set just landed in mainline. 

See
https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
or
https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209

Ciao, Thorsten


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13  8:51       ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-09-13  9:21         ` Jaroslav Pulchart
  2024-09-13 14:38           ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jaroslav Pulchart @ 2024-09-13  9:21 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Xuan Zhuo, virtualization, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

So far:

1/ I was able to reproduce the "random memory corruption" issue with
vanilla 6.10.10 in our setup after ~28 minutes of uptime; see the
attached 6.10.10-1.gdc.el9.x86_64.log.
2/ I reverted these commits in our next build:
"virtio_net: rx remove premapped failover code":
defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
"virtio_net: big mode skip the unmap check":
a377ae542d8d0a20a3173da3bbba72e045bea7a9
"virtio_ring: enable premapped mode whatever use_dma_api":
f9dac92ba9081062a6477ee015bd3b8c5914efc4
So far the environment is stable and does not crash under the same
conditions as the previous crash.


On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 13.09.24 10:42, Xuan Zhuo wrote:
> > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> >> [CCing a few people that know more about this stuff than I do]
> >>
> >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> >>>
> >>> actually I'm getting random memory corruption related crashes after
> >>> updating to 6.10.y. My expectation is that it relates to this issue:
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> >>> It looks like it is almost 1 month ago
> >>
> >> A lot of developers ignore bugzilla.
> >>
> >>> already from the last comment
> >>> there, However the patches fixing the regression are not reverted from
> >>> the 6.10.y tree which surprises me.
> >>>
> >>> I will try to revert them from our builds and see if it helps to avoid
> >>> random daily happening crashes.
> >>
> >> Not my area of expertise, but to me it sounds like the problem will be
> >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> >
> > YES. That is merged into net.
>
> Well, yes, but TWIMC to avoid confusion, it's already one step further,
> as mentioned:
>
> >> That set just landed in mainline.
>
> See
> https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> or
> https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
>
> Ciao, Thorsten



-- 
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData

[-- Attachment #2: 6.10.10-1.gdc.el9.x86_64.log --]
[-- Type: text/x-log, Size: 5923 bytes --]

[ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
[ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
[ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
[ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
[ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
[ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
[ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
[ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
[ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
[ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
[ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
[ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
[ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
[ 2224.754271] PKRU: 55555554
[ 2224.754697] Call Trace:
[ 2224.755112]  <IRQ>
[ 2224.755509]  ? die+0x33/0x90
[ 2224.755949]  ? do_trap+0xd9/0x100
[ 2224.756418]  ? do_error_trap+0x65/0x80
[ 2224.756903]  ? exc_stack_segment+0x35/0x50
[ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
[ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
[ 2224.758549]  ? refill_obj_stock+0x40/0x170
[ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
[ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
[ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
[ 2224.760845]  rcu_do_batch+0x1a7/0x530
[ 2224.761399]  ? rcu_do_batch+0x13b/0x530
[ 2224.761950]  rcu_core+0x256/0x420
[ 2224.762475]  ? ktime_get+0x34/0xc0
[ 2224.763010]  handle_softirqs+0xd3/0x2b0
[ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
[ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
[ 2224.764738]  </IRQ>
[ 2224.765159]  <TASK>
[ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
[ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
[ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
[ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
[ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
[ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
[ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
[ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
[ 2224.772678]  list_lru_add_obj+0x6b/0xa0
[ 2224.773158]  iput+0x1f1/0x210
[ 2224.773596]  __dentry_kill+0x71/0x170
[ 2224.774055]  shrink_dentry_list+0x67/0xe0
[ 2224.774542]  prune_dcache_sb+0x54/0x80
[ 2224.774996]  super_cache_scan+0x120/0x1c0
[ 2224.775470]  do_shrink_slab+0x134/0x350
[ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
[ 2224.776387]  shrink_one+0x118/0x1b0
[ 2224.776845]  shrink_many+0x127/0x2a0
[ 2224.777314]  shrink_node+0x3d7/0x430
[ 2224.777765]  ? pick_next_task+0x5a/0xae0
[ 2224.778250]  balance_pgdat+0x29c/0x730
[ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
[ 2224.779227]  ? __pfx_kswapd+0x10/0x10
[ 2224.779674]  kswapd+0xf7/0x180
[ 2224.780082]  kthread+0xcc/0x100
[ 2224.780483]  ? __pfx_kthread+0x10/0x10
[ 2224.780887]  ret_from_fork+0x2d/0x50
[ 2224.781297]  ? __pfx_kthread+0x10/0x10
[ 2224.781703]  ret_from_fork_asm+0x1a/0x30
[ 2224.782118]  </TASK>
[ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
[ 2224.787698] ---[ end trace 0000000000000000 ]---
[ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
[ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
[ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
[ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
[ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
[ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
[ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
[ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
[ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
[ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
[ 2224.796887] PKRU: 55555554
[ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
[ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---



* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13  9:21         ` Jaroslav Pulchart
@ 2024-09-13 14:38           ` Michael S. Tsirkin
  2024-09-16  7:32             ` Jaroslav Pulchart
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-09-13 14:38 UTC (permalink / raw)
  To: Jaroslav Pulchart
  Cc: Linux regressions mailing list, Xuan Zhuo, virtualization

On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> So far:
> 
> 1/ I was able to reproduce the "random memory corruption" issue with
> vanilla 6.10.10 in our setup after ~28 minutes of uptime; see the
> attached 6.10.10-1.gdc.el9.x86_64.log.
> 2/ I reverted these commits in our next build:
> "virtio_net: rx remove premapped failover code":
> defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> "virtio_net: big mode skip the unmap check":
> a377ae542d8d0a20a3173da3bbba72e045bea7a9
> "virtio_ring: enable premapped mode whatever use_dma_api":
> f9dac92ba9081062a6477ee015bd3b8c5914efc4
> So far the environment is stable and does not crash under the same
> conditions as the previous crash.


Automated backport failed:

http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh

Since you have done the revert, and actually tested it, feel free
to post, I will ack.




> 
> On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
> >
> > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> > >> [CCing a few people that know more about this stuff than I do]
> > >>
> > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > >>>
> > >>> actually I'm getting random memory corruption related crashes after
> > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > >>> It looks like it is almost 1 month ago
> > >>
> > >> A lot of developers ignore bugzilla.
> > >>
> > >>> already from the last comment
> > >>> there, However the patches fixing the regression are not reverted from
> > >>> the 6.10.y tree which surprises me.
> > >>>
> > >>> I will try to revert them from our builds and see if it helps to avoid
> > >>> random daily happening crashes.
> > >>
> > >> Not my area of expertise, but to me it sounds like the problem will be
> > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> > >
> > > YES. That is merged into net.
> >
> > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > as mentioned:
> >
> > >> That set just landed in mainline.
> >
> > See
> > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > or
> > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> >
> > Ciao, Thorsten
> 
> 
> 
> -- 
> Jaroslav Pulchart
> Sr. Principal SW Engineer
> GoodData




* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-13 14:38           ` Michael S. Tsirkin
@ 2024-09-16  7:32             ` Jaroslav Pulchart
  2024-11-06  9:01               ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jaroslav Pulchart @ 2024-09-16  7:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linux regressions mailing list, Xuan Zhuo, virtualization

>
> On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> > So far:
> >
> > 1/ I was able to reproduce the "random memory corruption" issue with
> > vanilla 6.10.10 in our setup after ~28 minutes of uptime; see the
> > attached 6.10.10-1.gdc.el9.x86_64.log.
> > 2/ I reverted these commits in our next build:
> > "virtio_net: rx remove premapped failover code":
> > defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > "virtio_net: big mode skip the unmap check":
> > a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > "virtio_ring: enable premapped mode whatever use_dma_api":
> > f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > So far the environment is stable and does not crash under the same
> > conditions as the previous crash.
>
>
> Automated backport failed:
>
> http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
>
> Since you have done the revert, and actually tested it, feel free
> to post, I will ack.
>
>

What I did is:
git checkout linux-6.10.y
git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
(no changes or conflict fixes were needed)
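
The revert steps above can also be sketched as a small shell helper.
This is only an illustration: revert_series is a hypothetical name, and
the --no-edit and -s options are additions that were not part of the
commands actually run. They keep git's generated "Revert ..." subject
and append a Signed-off-by line.

```shell
#!/bin/sh
# Sketch only: revert each given commit, in the order given, on the
# currently checked-out branch. revert_series is a hypothetical helper
# name, not a git command.
revert_series() {
    for commit in "$@"; do
        # --no-edit keeps the auto-generated "Revert ..." message;
        # -s appends a Signed-off-by trailer using the committer identity.
        git revert --no-edit -s "$commit" || return 1
    done
}

# Usage mirroring the commands above (inside a linux-6.10.y checkout):
#   git checkout linux-6.10.y
#   revert_series defd28aa5acb0fd7c15adc6bc40a8ac277d04dea \
#                 a377ae542d8d0a20a3173da3bbba72e045bea7a9 \
#                 f9dac92ba9081062a6477ee015bd3b8c5914efc4
#   git format-patch -3   # writes one patch file per revert commit
```

The helper stops at the first revert that fails, so a conflict leaves
the tree in the state git reports rather than continuing with the
remaining commits.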

I'm a newbie at posting changes upstream. Can you help me with
some simple steps on how to do it?

>
>
> >
> > On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
> > Leemhuis) <regressions@leemhuis.info> wrote:
> > >
> > > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> > > >> [CCing a few people that know more about this stuff than I do]
> > > >>
> > > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > > >>>
> > > >>> actually I'm getting random memory corruption related crashes after
> > > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > > >>> It looks like it is almost 1 month ago
> > > >>
> > > >> A lot of developers ignore bugzilla.
> > > >>
> > > >>> already from the last comment
> > > >>> there, However the patches fixing the regression are not reverted from
> > > >>> the 6.10.y tree which surprises me.
> > > >>>
> > > >>> I will try to revert them from our builds and see if it helps to avoid
> > > >>> random daily happening crashes.
> > > >>
> > > >> Not my area of expertise, but to me it sounds like the problem will be
> > > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> > > >
> > > > YES. That is merged into net.
> > >
> > > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > > as mentioned:
> > >
> > > >> That set just landed in mainline.
> > >
> > > See
> > > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > > or
> > > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> > >
> > > Ciao, Thorsten
> >
> >
> >
> > --
> > Jaroslav Pulchart
> > Sr. Principal SW Engineer
> > GoodData
>


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-09-16  7:32             ` Jaroslav Pulchart
@ 2024-11-06  9:01               ` Michael S. Tsirkin
  2024-11-06  9:04                 ` Xuan Zhuo
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-11-06  9:01 UTC (permalink / raw)
  To: Jaroslav Pulchart
  Cc: Linux regressions mailing list, Xuan Zhuo, virtualization

On Mon, Sep 16, 2024 at 09:32:38AM +0200, Jaroslav Pulchart wrote:
> >
> > On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> > > So far:
> > >
> > > 1/ I was able to "do a reproducer" and hit the "random memory
> > > corruption" issue with vanilla 6.10.10 in our setup in ~28m of uptime
> > > see attached 6.10.10-1.gdc.el9.x86_64.log.
> > > 2/ I reverted these commits
> > > "virtio_net: rx remove premapped failover code":
> > > defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > > "virtio_net: big mode skip the unmap check":
> > > a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > > "virtio_ring: enable premapped mode whatever use_dma_api":
> > > f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > > in our next build and so far the environment is stable and not
> > > crashing under same conditions like the previous crash.
> >
> >
> > Automated backport failed:
> >
> > http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
> >
> > Since you have done the revert, and actually tested it, feel free
> > to post, I will ack.
> >
> >
> 
> What I did is:
> git checkout linux-6.10.y
> git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
> git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
> (no changes nor fixing conflicts was needed)
> 
> I'm newbie in posting the changes to upstream, Can you help me with
> some simple steps on how to do it?

Basically in this case, I think it is enough
to reply to the revert patches and CC stable.
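
For readers who would rather post the reverts as fresh patches than
reply to the existing ones, the quoted steps can be sketched roughly as
below. This is only an illustration under stated assumptions: it runs in
a throwaway repository with three placeholder commits standing in for
the three virtio commits, and the names "demo", "file.txt" and
"outgoing" are hypothetical. On a real tree you would check out
linux-6.10.y and pass the three commit IDs quoted above to git revert.

```shell
#!/bin/sh
# Sketch only: a scratch repository with placeholder commits, not the kernel tree.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

# One base commit plus three changes that play the role of the commits to revert.
echo base > file.txt
git add file.txt
git commit -q -m "base"
for n in 1 2 3; do
    echo "change $n" >> file.txt
    git commit -q -am "change $n"
done

# Revert the three most recent commits, newest first, one revert commit each.
for c in $(git rev-list -n 3 HEAD); do
    git revert --no-edit "$c" > /dev/null
done

# Turn the three revert commits into mailable patch files.
git format-patch -3 -o outgoing > /dev/null

patch_count=$(ls outgoing/*.patch | wc -l | tr -d ' ')
content=$(cat file.txt)
echo "patches: $patch_count, file content: $content"

# The patches could then be mailed with git send-email, with
# stable@vger.kernel.org in CC (not run here).
```

Reverting newest first keeps each revert applying cleanly in this toy
history; in the thread the three kernel reverts were reported to apply
without conflicts in the order listed.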



> >
> >
> > >
> > > On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
> > > Leemhuis) <regressions@leemhuis.info> wrote:
> > > >
> > > > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> > > > >> [CCing a few people that know more about this stuff than I do]
> > > > >>
> > > > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > > > >>>
> > > > >>> actually I'm getting random memory corruption related crashes after
> > > > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > > > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > > > >>> It looks like it is almost 1 month ago
> > > > >>
> > > > >> A lot of developers ignore bugzilla.
> > > > >>
> > > > >>> already from the last comment
> > > > >>> there, However the patches fixing the regression are not reverted from
> > > > >>> the 6.10.y tree which surprises me.
> > > > >>>
> > > > >>> I will try to revert them from our builds and see if it helps to avoid
> > > > >>> random daily happening crashes.
> > > > >>
> > > > >> Not my area of expertise, but to me it sounds like the problem will be
> > > > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > > > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> > > > >
> > > > > YES. That is merged into net.
> > > >
> > > > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > > > as mentioned:
> > > >
> > > > >> That set just landed in mainline.
> > > >
> > > > See
> > > > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > > > or
> > > > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> > > >
> > > > Ciao, Thorsten
> > >
> > >
> > >
> > > --
> > > Jaroslav Pulchart
> > > Sr. Principal SW Engineer
> > > GoodData
> >
> > > [ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
> > > [ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
> > > [ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
> > > [ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > [ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > [ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > [ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > [ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > [ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > [ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > [ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > [ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > [ 2224.754271] PKRU: 55555554
> > > [ 2224.754697] Call Trace:
> > > [ 2224.755112]  <IRQ>
> > > [ 2224.755509]  ? die+0x33/0x90
> > > [ 2224.755949]  ? do_trap+0xd9/0x100
> > > [ 2224.756418]  ? do_error_trap+0x65/0x80
> > > [ 2224.756903]  ? exc_stack_segment+0x35/0x50
> > > [ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
> > > [ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
> > > [ 2224.758549]  ? refill_obj_stock+0x40/0x170
> > > [ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
> > > [ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
> > > [ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
> > > [ 2224.760845]  rcu_do_batch+0x1a7/0x530
> > > [ 2224.761399]  ? rcu_do_batch+0x13b/0x530
> > > [ 2224.761950]  rcu_core+0x256/0x420
> > > [ 2224.762475]  ? ktime_get+0x34/0xc0
> > > [ 2224.763010]  handle_softirqs+0xd3/0x2b0
> > > [ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
> > > [ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
> > > [ 2224.764738]  </IRQ>
> > > [ 2224.765159]  <TASK>
> > > [ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > [ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
> > > [ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
> > > [ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
> > > [ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
> > > [ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
> > > [ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
> > > [ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
> > > [ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
> > > [ 2224.772678]  list_lru_add_obj+0x6b/0xa0
> > > [ 2224.773158]  iput+0x1f1/0x210
> > > [ 2224.773596]  __dentry_kill+0x71/0x170
> > > [ 2224.774055]  shrink_dentry_list+0x67/0xe0
> > > [ 2224.774542]  prune_dcache_sb+0x54/0x80
> > > [ 2224.774996]  super_cache_scan+0x120/0x1c0
> > > [ 2224.775470]  do_shrink_slab+0x134/0x350
> > > [ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
> > > [ 2224.776387]  shrink_one+0x118/0x1b0
> > > [ 2224.776845]  shrink_many+0x127/0x2a0
> > > [ 2224.777314]  shrink_node+0x3d7/0x430
> > > [ 2224.777765]  ? pick_next_task+0x5a/0xae0
> > > [ 2224.778250]  balance_pgdat+0x29c/0x730
> > > [ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
> > > [ 2224.779227]  ? __pfx_kswapd+0x10/0x10
> > > [ 2224.779674]  kswapd+0xf7/0x180
> > > [ 2224.780082]  kthread+0xcc/0x100
> > > [ 2224.780483]  ? __pfx_kthread+0x10/0x10
> > > [ 2224.780887]  ret_from_fork+0x2d/0x50
> > > [ 2224.781297]  ? __pfx_kthread+0x10/0x10
> > > [ 2224.781703]  ret_from_fork_asm+0x1a/0x30
> > > [ 2224.782118]  </TASK>
> > > [ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > [ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
> > > [ 2224.787698] ---[ end trace 0000000000000000 ]---
> > > [ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
> > > [ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > [ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > [ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > [ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > [ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > [ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > [ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > [ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > [ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > [ 2224.796887] PKRU: 55555554
> > > [ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
> > > [ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > >
> >



* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-11-06  9:01               ` Michael S. Tsirkin
@ 2024-11-06  9:04                 ` Xuan Zhuo
  2024-11-06  9:43                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Xuan Zhuo @ 2024-11-06  9:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linux regressions mailing list, virtualization, Jaroslav Pulchart

On Wed, 6 Nov 2024 04:01:43 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Sep 16, 2024 at 09:32:38AM +0200, Jaroslav Pulchart wrote:
> > >
> > > On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> > > > So far:
> > > >
> > > > 1/ I was able to "do a reproducer" and hit the "random memory
> > > > corruption" issue with vanilla 6.10.10 in our setup in ~28m of uptime
> > > > see attached 6.10.10-1.gdc.el9.x86_64.log.
> > > > 2/ I reverted these commits
> > > > "virtio_net: rx remove premapped failover code":
> > > > defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > > > "virtio_net: big mode skip the unmap check":
> > > > a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > > > "virtio_ring: enable premapped mode whatever use_dma_api":
> > > > f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > > > in our next build and so far the environment is stable and not
> > > > crashing under same conditions like the previous crash.
> > >
> > >
> > > Automated backport failed:
> > >
> > > http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
> > >
> > > Since you have done the revert, and actually tested it, feel free
> > > to post, I will ack.
> > >
> > >
> >
> > What I did is:
> > git checkout linux-6.10.y
> > git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > (no changes nor fixing conflicts was needed)
> >
> > I'm newbie in posting the changes to upstream, Can you help me with
> > some simple steps on how to do it?
>
> Basically in this case, I think it is enough
> to reply to the revert patches and CC stable.

Oh, I am ok.

If you need me to do something, please let me know.

Thanks.

>
>
>
> > >
> > >
> > > >
> > > > On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
> > > > Leemhuis) <regressions@leemhuis.info> wrote:
> > > > >
> > > > > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > > > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> > > > > >> [CCing a few people that know more about this stuff than I do]
> > > > > >>
> > > > > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > > > > >>>
> > > > > >>> actually I'm getting random memory corruption related crashes after
> > > > > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > > > > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > > > > >>> It looks like it is almost 1 month ago
> > > > > >>
> > > > > >> A lot of developers ignore bugzilla.
> > > > > >>
> > > > > >>> already from the last comment
> > > > > >>> there, However the patches fixing the regression are not reverted from
> > > > > >>> the 6.10.y tree which surprises me.
> > > > > >>>
> > > > > >>> I will try to revert them from our builds and see if it helps to avoid
> > > > > >>> random daily happening crashes.
> > > > > >>
> > > > > >> Not my area of expertise, but to me it sounds like the problem will be
> > > > > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > > > > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> > > > > >
> > > > > > YES. That is merged into net.
> > > > >
> > > > > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > > > > as mentioned:
> > > > >
> > > > > >> That set just landed in mainline.
> > > > >
> > > > > See
> > > > > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > > > > or
> > > > > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> > > > >
> > > > > Ciao, Thorsten
> > > >
> > > >
> > > >
> > > > --
> > > > Jaroslav Pulchart
> > > > Sr. Principal SW Engineer
> > > > GoodData
> > >
> > > > [ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > > [ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
> > > > [ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
> > > > [ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
> > > > [ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > > [ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > > [ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > > [ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > > [ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > > [ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > > [ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > > [ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > > [ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > > [ 2224.754271] PKRU: 55555554
> > > > [ 2224.754697] Call Trace:
> > > > [ 2224.755112]  <IRQ>
> > > > [ 2224.755509]  ? die+0x33/0x90
> > > > [ 2224.755949]  ? do_trap+0xd9/0x100
> > > > [ 2224.756418]  ? do_error_trap+0x65/0x80
> > > > [ 2224.756903]  ? exc_stack_segment+0x35/0x50
> > > > [ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
> > > > [ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
> > > > [ 2224.758549]  ? refill_obj_stock+0x40/0x170
> > > > [ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
> > > > [ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
> > > > [ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
> > > > [ 2224.760845]  rcu_do_batch+0x1a7/0x530
> > > > [ 2224.761399]  ? rcu_do_batch+0x13b/0x530
> > > > [ 2224.761950]  rcu_core+0x256/0x420
> > > > [ 2224.762475]  ? ktime_get+0x34/0xc0
> > > > [ 2224.763010]  handle_softirqs+0xd3/0x2b0
> > > > [ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
> > > > [ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
> > > > [ 2224.764738]  </IRQ>
> > > > [ 2224.765159]  <TASK>
> > > > [ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > > [ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
> > > > [ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
> > > > [ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
> > > > [ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
> > > > [ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
> > > > [ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
> > > > [ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
> > > > [ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
> > > > [ 2224.772678]  list_lru_add_obj+0x6b/0xa0
> > > > [ 2224.773158]  iput+0x1f1/0x210
> > > > [ 2224.773596]  __dentry_kill+0x71/0x170
> > > > [ 2224.774055]  shrink_dentry_list+0x67/0xe0
> > > > [ 2224.774542]  prune_dcache_sb+0x54/0x80
> > > > [ 2224.774996]  super_cache_scan+0x120/0x1c0
> > > > [ 2224.775470]  do_shrink_slab+0x134/0x350
> > > > [ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
> > > > [ 2224.776387]  shrink_one+0x118/0x1b0
> > > > [ 2224.776845]  shrink_many+0x127/0x2a0
> > > > [ 2224.777314]  shrink_node+0x3d7/0x430
> > > > [ 2224.777765]  ? pick_next_task+0x5a/0xae0
> > > > [ 2224.778250]  balance_pgdat+0x29c/0x730
> > > > [ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
> > > > [ 2224.779227]  ? __pfx_kswapd+0x10/0x10
> > > > [ 2224.779674]  kswapd+0xf7/0x180
> > > > [ 2224.780082]  kthread+0xcc/0x100
> > > > [ 2224.780483]  ? __pfx_kthread+0x10/0x10
> > > > [ 2224.780887]  ret_from_fork+0x2d/0x50
> > > > [ 2224.781297]  ? __pfx_kthread+0x10/0x10
> > > > [ 2224.781703]  ret_from_fork_asm+0x1a/0x30
> > > > [ 2224.782118]  </TASK>
> > > > [ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > > [ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
> > > > [ 2224.787698] ---[ end trace 0000000000000000 ]---
> > > > [ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
> > > > [ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > > [ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > > [ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > > [ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > > [ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > > [ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > > [ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > > [ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > > [ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > > [ 2224.796887] PKRU: 55555554
> > > > [ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
> > > > [ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > > >
> > >
>


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-11-06  9:04                 ` Xuan Zhuo
@ 2024-11-06  9:43                   ` Michael S. Tsirkin
  2024-11-06 10:09                     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-11-06  9:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Linux regressions mailing list, virtualization, Jaroslav Pulchart

On Wed, Nov 06, 2024 at 05:04:34PM +0800, Xuan Zhuo wrote:
> On Wed, 6 Nov 2024 04:01:43 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Sep 16, 2024 at 09:32:38AM +0200, Jaroslav Pulchart wrote:
> > > >
> > > > On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> > > > > So far:
> > > > >
> > > > > 1/ I was able to "do a reproducer" and hit the "random memory
> > > > > corruption" issue with vanilla 6.10.10 in our setup in ~28m of uptime
> > > > > see attached 6.10.10-1.gdc.el9.x86_64.log.
> > > > > 2/ I reverted these commits
> > > > > "virtio_net: rx remove premapped failover code":
> > > > > defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > > > > "virtio_net: big mode skip the unmap check":
> > > > > a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > > > > "virtio_ring: enable premapped mode whatever use_dma_api":
> > > > > f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > > > > in our next build and so far the environment is stable and not
> > > > > crashing under same conditions like the previous crash.
> > > >
> > > >
> > > > Automated backport failed:
> > > >
> > > > http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
> > > >
> > > > Since you have done the revert, and actually tested it, feel free
> > > > to post, I will ack.
> > > >
> > > >
> > >
> > > What I did is:
> > > git checkout linux-6.10.y
> > > git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> > > git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
> > > git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
> > > (no changes nor fixing conflicts was needed)
> > >
> > > I'm newbie in posting the changes to upstream, Can you help me with
> > > some simple steps on how to do it?
> >
> > Basically in this case, I think it is enough
> > to reply to the revert patches and CC stable.
> 
> Oh, I am ok.
> 
> If you need me to do something, please let me know.
> 
> Thanks.

Yes, please reply and CC stable ;)

> >
> >
> >
> > > >
> > > >
> > > > >
> > > > > On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
> > > > > Leemhuis) <regressions@leemhuis.info> wrote:
> > > > > >
> > > > > > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > > > > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> > > > > > >> [CCing a few people that know more about this stuff than I do]
> > > > > > >>
> > > > > > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > > > > > >>>
> > > > > > >>> actually I'm getting random memory corruption related crashes after
> > > > > > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > > > > > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > > > > > >>> It looks like it is almost 1 month ago
> > > > > > >>
> > > > > > >> A lot of developers ignore bugzilla.
> > > > > > >>
> > > > > > >>> already from the last comment
> > > > > > >>> there, However the patches fixing the regression are not reverted from
> > > > > > >>> the 6.10.y tree which surprises me.
> > > > > > >>>
> > > > > > >>> I will try to revert them from our builds and see if it helps to avoid
> > > > > > >>> random daily happening crashes.
> > > > > > >>
> > > > > > >> Not my area of expertise, but to me it sounds like the problem will be
> > > > > > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > > > > > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com/
> > > > > > >
> > > > > > > YES. That is merged into net.
> > > > > >
> > > > > > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > > > > > as mentioned:
> > > > > >
> > > > > > >> That set just landed in mainline.
> > > > > >
> > > > > > See
> > > > > > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > > > > > or
> > > > > > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> > > > > >
> > > > > > Ciao, Thorsten
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jaroslav Pulchart
> > > > > Sr. Principal SW Engineer
> > > > > GoodData
> > > >
> > > > > [ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > > > > [ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
> > > > > [ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
> > > > > [ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
> > > > > [ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > > > [ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > > > [ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > > > [ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > > > [ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > > > [ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > > > [ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > > > [ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > > > [ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > > > [ 2224.754271] PKRU: 55555554
> > > > > [ 2224.754697] Call Trace:
> > > > > [ 2224.755112]  <IRQ>
> > > > > [ 2224.755509]  ? die+0x33/0x90
> > > > > [ 2224.755949]  ? do_trap+0xd9/0x100
> > > > > [ 2224.756418]  ? do_error_trap+0x65/0x80
> > > > > [ 2224.756903]  ? exc_stack_segment+0x35/0x50
> > > > > [ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
> > > > > [ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
> > > > > [ 2224.758549]  ? refill_obj_stock+0x40/0x170
> > > > > [ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
> > > > > [ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
> > > > > [ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
> > > > > [ 2224.760845]  rcu_do_batch+0x1a7/0x530
> > > > > [ 2224.761399]  ? rcu_do_batch+0x13b/0x530
> > > > > [ 2224.761950]  rcu_core+0x256/0x420
> > > > > [ 2224.762475]  ? ktime_get+0x34/0xc0
> > > > > [ 2224.763010]  handle_softirqs+0xd3/0x2b0
> > > > > [ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
> > > > > [ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
> > > > > [ 2224.764738]  </IRQ>
> > > > > [ 2224.765159]  <TASK>
> > > > > [ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > > > [ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
> > > > > [ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
> > > > > [ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
> > > > > [ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
> > > > > [ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
> > > > > [ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
> > > > > [ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
> > > > > [ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
> > > > > [ 2224.772678]  list_lru_add_obj+0x6b/0xa0
> > > > > [ 2224.773158]  iput+0x1f1/0x210
> > > > > [ 2224.773596]  __dentry_kill+0x71/0x170
> > > > > [ 2224.774055]  shrink_dentry_list+0x67/0xe0
> > > > > [ 2224.774542]  prune_dcache_sb+0x54/0x80
> > > > > [ 2224.774996]  super_cache_scan+0x120/0x1c0
> > > > > [ 2224.775470]  do_shrink_slab+0x134/0x350
> > > > > [ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
> > > > > [ 2224.776387]  shrink_one+0x118/0x1b0
> > > > > [ 2224.776845]  shrink_many+0x127/0x2a0
> > > > > [ 2224.777314]  shrink_node+0x3d7/0x430
> > > > > [ 2224.777765]  ? pick_next_task+0x5a/0xae0
> > > > > [ 2224.778250]  balance_pgdat+0x29c/0x730
> > > > > [ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
> > > > > [ 2224.779227]  ? __pfx_kswapd+0x10/0x10
> > > > > [ 2224.779674]  kswapd+0xf7/0x180
> > > > > [ 2224.780082]  kthread+0xcc/0x100
> > > > > [ 2224.780483]  ? __pfx_kthread+0x10/0x10
> > > > > [ 2224.780887]  ret_from_fork+0x2d/0x50
> > > > > [ 2224.781297]  ? __pfx_kthread+0x10/0x10
> > > > > [ 2224.781703]  ret_from_fork_asm+0x1a/0x30
> > > > > [ 2224.782118]  </TASK>
> > > > > [ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > > > [ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
> > > > > [ 2224.787698] ---[ end trace 0000000000000000 ]---
> > > > > [ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
> > > > > [ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> > > > > [ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> > > > > [ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> > > > > [ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> > > > > [ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> > > > > [ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> > > > > [ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> > > > > [ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> > > > > [ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> > > > > [ 2224.796887] PKRU: 55555554
> > > > > [ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
> > > > > [ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > [ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > > > >
> > > >
> >



* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-11-06  9:43                   ` Michael S. Tsirkin
@ 2024-11-06 10:09                     ` Linux regression tracking (Thorsten Leemhuis)
  2024-11-07  2:21                       ` Xuan Zhuo
  0 siblings, 1 reply; 13+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-11-06 10:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Xuan Zhuo
  Cc: Linux regressions mailing list, virtualization, Jaroslav Pulchart

On 06.11.24 10:43, Michael S. Tsirkin wrote:
> On Wed, Nov 06, 2024 at 05:04:34PM +0800, Xuan Zhuo wrote:
>> On Wed, 6 Nov 2024 04:01:43 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>> On Mon, Sep 16, 2024 at 09:32:38AM +0200, Jaroslav Pulchart wrote:
>>>>> On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
>>>>>>
>>>>>> 1/ I was able to "do a reproducer" and hit the "random memory
>>>>>> corruption" issue with vanila 6.10.10 in our setup in ~28m of uptime
>>>>>> see attached 6.10.10-1.gdc.el9.x86_64.log.
>>>>>> 2/ I reverted these commits
>>>>>> "virtio_net: rx remove premapped failover code":
>>>>>> defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
>>>>>> "virtio_net: big mode skip the unmap check":
>>>>>> a377ae542d8d0a20a3173da3bbba72e045bea7a9
>>>>>> "virtio_ring: enable premapped mode whatever use_dma_api":
>>>>>> f9dac92ba9081062a6477ee015bd3b8c5914efc4
>>>>>> in our next build and so far the environment is stable and not
>>>>>> crashing under same conditions like the previous crash.
>>>>>
>>>>> Automated backport failed:
>>>>>
>>>>> http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
>>>>>
>>>>> Since you have done the revert, and actually tested it, feel free
>>>>> to post, I will ack.
>>>>
>>>> What I did is:
>>>> git checkout linux-6.10.y
>>>> git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
>>>> git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
>>>> git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
>>>> (no changes nor fixing conflicts was needed)
>>>>
>>>> I'm newbie in posting the changes to upstream, Can you help me with
>>>> some simple steps on how to do it?
>>>
>>> Basically in this case, I think it is enough
>>> to reply to the revert patches and CC stable.
>>
>> Oh, I am ok.
>>
>> If need me to do something, please let me know.
>>
>> Thanks.
> 
> yes, pls reply and CC stable ;)

I see that ";)", but it seems I'm missing something here:

Why did this thread resurface? 6.10.y is EOL since a while now, so that
reverting ship has sailed. And the commit in question is in 6.11 afaics.

Ciao, Thorsten


* Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues
  2024-11-06 10:09                     ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-11-07  2:21                       ` Xuan Zhuo
  0 siblings, 0 replies; 13+ messages in thread
From: Xuan Zhuo @ 2024-11-07  2:21 UTC (permalink / raw)
  To: Linux regression tracking (Thorsten Leemhuis)
  Cc: Linux regressions mailing list, virtualization, Jaroslav Pulchart,
	Michael S. Tsirkin

On Wed, 6 Nov 2024 11:09:17 +0100, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> On 06.11.24 10:43, Michael S. Tsirkin wrote:
> > On Wed, Nov 06, 2024 at 05:04:34PM +0800, Xuan Zhuo wrote:
> >> On Wed, 6 Nov 2024 04:01:43 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>> On Mon, Sep 16, 2024 at 09:32:38AM +0200, Jaroslav Pulchart wrote:
> >>>>> On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> >>>>>>
> >>>>>> 1/ I was able to "do a reproducer" and hit the "random memory
> >>>>>> corruption" issue with vanila 6.10.10 in our setup in ~28m of uptime
> >>>>>> see attached 6.10.10-1.gdc.el9.x86_64.log.
> >>>>>> 2/ I reverted these commits
> >>>>>> "virtio_net: rx remove premapped failover code":
> >>>>>> defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> >>>>>> "virtio_net: big mode skip the unmap check":
> >>>>>> a377ae542d8d0a20a3173da3bbba72e045bea7a9
> >>>>>> "virtio_ring: enable premapped mode whatever use_dma_api":
> >>>>>> f9dac92ba9081062a6477ee015bd3b8c5914efc4
> >>>>>> in our next build and so far the environment is stable and not
> >>>>>> crashing under same conditions like the previous crash.
> >>>>>
> >>>>> Automated backport failed:
> >>>>>
> >>>>> http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh
> >>>>>
> >>>>> Since you have done the revert, and actually tested it, feel free
> >>>>> to post, I will ack.
> >>>>
> >>>> What I did is:
> >>>> git checkout linux-6.10.y
> >>>> git revert defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> >>>> git revert a377ae542d8d0a20a3173da3bbba72e045bea7a9
> >>>> git revert f9dac92ba9081062a6477ee015bd3b8c5914efc4
> >>>> (no changes nor fixing conflicts was needed)
> >>>>
> >>>> I'm newbie in posting the changes to upstream, Can you help me with
> >>>> some simple steps on how to do it?
> >>>
> >>> Basically in this case, I think it is enough
> >>> to reply to the revert patches and CC stable.
> >>
> >> Oh, I am ok.
> >>
> >> If need me to do something, please let me know.
> >>
> >> Thanks.
> >
> > yes, pls reply and CC stable ;)
>
> I see that ";)", but it seems I'm missing something here:
>
> Why did this thread resurface? 6.10.y is EOL since a while now, so that
> reverting ship has sailed. And the commit in question is in 6.11 afaics.
>


I did some quick research, and the only affected version is 6.10.y.

6.9 does not include the problematic commits.
6.11 is fixed by reverting the problematic commits.

The backport of that revert to 6.10.y failed.

But 6.10.y is EOL, so is a fix commit for 6.10.y still needed?

If yes, I can post a fix commit.


Thanks.


> Ciao, Thorsten
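
For readers following along, the revert sequence quoted earlier in this message (a checkout of linux-6.10.y followed by three `git revert` calls) can be sketched as a small shell function. This is only an illustration and was not posted in the thread: the function name `revert_series` is hypothetical, and it assumes it is run inside a clone that already contains the named branch and commits.

```shell
# revert_series BRANCH COMMIT...
# Hypothetical helper (not from the thread): check out BRANCH and revert
# each COMMIT in the order given, stopping at the first failure.
revert_series() {
    branch=$1
    shift
    # Switch to the target branch; abort if it does not exist.
    git checkout --quiet "$branch" || return 1
    for commit in "$@"; do
        # --no-edit keeps the default "Revert ..." commit message.
        git revert --no-edit "$commit" || return 1
    done
}
```

With the branch and hashes from the quoted steps, the call would be `revert_series linux-6.10.y defd28aa5acb0fd7c15adc6bc40a8ac277d04dea a377ae542d8d0a20a3173da3bbba72e045bea7a9 f9dac92ba9081062a6477ee015bd3b8c5914efc4`. According to the report, reverting in that order needed no conflict fixes.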


Thread overview: 13+ messages
2024-09-06  6:20 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues Jaroslav Pulchart
2024-09-13  7:50 ` Jaroslav Pulchart
2024-09-13  8:26   ` Linux regression tracking (Thorsten Leemhuis)
2024-09-13  8:42     ` Xuan Zhuo
2024-09-13  8:51       ` Linux regression tracking (Thorsten Leemhuis)
2024-09-13  9:21         ` Jaroslav Pulchart
2024-09-13 14:38           ` Michael S. Tsirkin
2024-09-16  7:32             ` Jaroslav Pulchart
2024-11-06  9:01               ` Michael S. Tsirkin
2024-11-06  9:04                 ` Xuan Zhuo
2024-11-06  9:43                   ` Michael S. Tsirkin
2024-11-06 10:09                     ` Linux regression tracking (Thorsten Leemhuis)
2024-11-07  2:21                       ` Xuan Zhuo
