From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 217562] New: kernel NULL pointer dereference on deletion of guest physical memory slot
Date: Fri, 16 Jun 2023 16:02:45 +0000
Message-ID: <bug-217562-28872@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=217562

            Bug ID: 217562
           Summary: kernel NULL pointer dereference on deletion of guest
                    physical memory slot
           Product: Virtualization
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: arnaud.lefebvre@clever-cloud.com
        Regression: No

Created attachment 304438
  --> https://bugzilla.kernel.org/attachment.cgi?id=304438&action=edit
dmesg logs with decoded backtrace

Hello,

We've been hitting this BUG for the last 6 months on both Intel and AMD hosts,
without being able to reproduce it on demand. It occurs seemingly at random:

[Mon Jun 12 10:50:08 UTC 2023] BUG: kernel NULL pointer dereference, address:
0000000000000008
[Mon Jun 12 10:50:08 UTC 2023] #PF: supervisor write access in kernel mode
[Mon Jun 12 10:50:08 UTC 2023] #PF: error_code(0x0002) - not-present page
[Mon Jun 12 10:50:08 UTC 2023] PGD 0 P4D 0
[Mon Jun 12 10:50:08 UTC 2023] Oops: 0002 [#1] SMP NOPTI
[Mon Jun 12 10:50:08 UTC 2023] CPU: 88 PID: 856806 Comm: qemu Kdump: loaded Not
tainted 5.15.115 #1
[Mon Jun 12 10:50:08 UTC 2023] Hardware name: MCT         Capri                
         /Capri           , BIOS V2010 04/19/2022
[Mon Jun 12 10:50:08 UTC 2023] RIP: 0010:__handle_changed_spte+0x5f3/0x670
[Mon Jun 12 10:50:08 UTC 2023] Code: b8 a8 00 00 00 e9 4d be 0f 00 4d 8d be 60
6a 01 00 4c 89 44 24 08 4c 89 ff e8 69 30 43 01 4c 8b 44 24 08 49 8b 40 08 49
8b 10 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 83
[Mon Jun 12 10:50:09 UTC 2023] RSP: 0018:ffffc90029477840 EFLAGS: 00010246
[Mon Jun 12 10:50:09 UTC 2023] RAX: 0000000000000000 RBX: ffff89581a1f6000 RCX:
0000000000000000
[Mon Jun 12 10:50:09 UTC 2023] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
ffffc9002d858a60
[Mon Jun 12 10:50:09 UTC 2023] RBP: 0000000000002200 R08: ffff893450426450 R09:
0000000000000002
[Mon Jun 12 10:50:09 UTC 2023] R10: 0000000000000001 R11: 0000000000000001 R12:
0000000000000001
[Mon Jun 12 10:50:09 UTC 2023] R13: 00000000000005a0 R14: ffffc9002d842000 R15:
ffffc9002d858a60
[Mon Jun 12 10:50:09 UTC 2023] FS:  00007fdb6c1ff6c0(0000)
GS:ffff89804d800000(0000) knlGS:0000000000000000
[Mon Jun 12 10:50:09 UTC 2023] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Mon Jun 12 10:50:09 UTC 2023] CR2: 0000000000000008 CR3: 0000005380534005 CR4:
0000000000770ee0
[Mon Jun 12 10:50:09 UTC 2023] PKRU: 55555554
[Mon Jun 12 10:50:09 UTC 2023] Call Trace:
[Mon Jun 12 10:50:09 UTC 2023]  <TASK>
[Mon Jun 12 10:50:09 UTC 2023]  ? __die+0x50/0x8d
[Mon Jun 12 10:50:09 UTC 2023]  ? page_fault_oops+0x184/0x2f0
[Mon Jun 12 10:50:09 UTC 2023]  ? exc_page_fault+0x535/0x7d0
[Mon Jun 12 10:50:09 UTC 2023]  ? asm_exc_page_fault+0x22/0x30
[Mon Jun 12 10:50:09 UTC 2023]  ? __handle_changed_spte+0x5f3/0x670
[Mon Jun 12 10:50:09 UTC 2023]  ? update_load_avg+0x73/0x560
[Mon Jun 12 10:50:09 UTC 2023]  __handle_changed_spte+0x3ae/0x670
[Mon Jun 12 10:50:09 UTC 2023]  __handle_changed_spte+0x3ae/0x670
[Mon Jun 12 10:50:09 UTC 2023]  zap_gfn_range+0x21a/0x320
[Mon Jun 12 10:50:09 UTC 2023]  kvm_tdp_mmu_zap_invalidated_roots+0x50/0xa0
[Mon Jun 12 10:50:09 UTC 2023]  kvm_mmu_zap_all_fast+0x178/0x1b0
[Mon Jun 12 10:50:09 UTC 2023]  kvm_page_track_flush_slot+0x4f/0x90
[Mon Jun 12 10:50:09 UTC 2023]  kvm_set_memslot+0x32b/0x8e0
[Mon Jun 12 10:50:09 UTC 2023]  kvm_delete_memslot+0x58/0x80
[Mon Jun 12 10:50:09 UTC 2023]  __kvm_set_memory_region+0x3c4/0x4a0
[Mon Jun 12 10:50:09 UTC 2023]  kvm_vm_ioctl+0x3d1/0xea0
[Mon Jun 12 10:50:09 UTC 2023]  __x64_sys_ioctl+0x8b/0xc0
[Mon Jun 12 10:50:09 UTC 2023]  do_syscall_64+0x3f/0x90
[Mon Jun 12 10:50:09 UTC 2023]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[Mon Jun 12 10:50:09 UTC 2023] RIP: 0033:0x7fdc71e3a5ef
[Mon Jun 12 10:50:09 UTC 2023] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7
04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Mon Jun 12 10:50:09 UTC 2023] RSP: 002b:00007fdb6c1fc920 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
[Mon Jun 12 10:50:09 UTC 2023] RAX: ffffffffffffffda RBX: 000000004020ae46 RCX:
00007fdc71e3a5ef
[Mon Jun 12 10:50:09 UTC 2023] RDX: 00007fdb6c1fca40 RSI: 000000004020ae46 RDI:
000000000000000d
[Mon Jun 12 10:50:09 UTC 2023] RBP: 00007fdc714fa000 R08: 0000000000000002 R09:
00007fdc6f7e8e10
[Mon Jun 12 10:50:09 UTC 2023] R10: 00007fdc724966e8 R11: 0000000000000246 R12:
00007fdb6c1fca40
[Mon Jun 12 10:50:09 UTC 2023] R13: 0000000001000000 R14: 00007fdb68400000 R15:
00000000fd000000
[Mon Jun 12 10:50:09 UTC 2023]  </TASK>
[Mon Jun 12 10:50:09 UTC 2023] CR2: 0000000000000008
[Mon Jun 12 10:50:09 UTC 2023] ---[ end trace 353e5ae9ef11cd10 ]---
[Mon Jun 12 10:50:09 UTC 2023] RIP: 0010:__handle_changed_spte+0x5f3/0x670
[Mon Jun 12 10:50:09 UTC 2023] Code: b8 a8 00 00 00 e9 4d be 0f 00 4d 8d be 60
6a 01 00 4c 89 44 24 08 4c 89 ff e8 69 30 43 01 4c 8b 44 24 08 49 8b 40 08 49
8b 10 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 83
[Mon Jun 12 10:50:09 UTC 2023] RSP: 0018:ffffc90029477840 EFLAGS: 00010246
[Mon Jun 12 10:50:09 UTC 2023] RAX: 0000000000000000 RBX: ffff89581a1f6000 RCX:
0000000000000000
[Mon Jun 12 10:50:09 UTC 2023] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
ffffc9002d858a60
[Mon Jun 12 10:50:09 UTC 2023] RBP: 0000000000002200 R08: ffff893450426450 R09:
0000000000000002
[Mon Jun 12 10:50:09 UTC 2023] R10: 0000000000000001 R11: 0000000000000001 R12:
0000000000000001
[Mon Jun 12 10:50:09 UTC 2023] R13: 00000000000005a0 R14: ffffc9002d842000 R15:
ffffc9002d858a60
[Mon Jun 12 10:50:09 UTC 2023] FS:  00007fdb6c1ff6c0(0000)
GS:ffff89804d800000(0000) knlGS:0000000000000000
[Mon Jun 12 10:50:09 UTC 2023] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Mon Jun 12 10:50:09 UTC 2023] CR2: 0000000000000008 CR3: 0000005380534005 CR4:
0000000000770ee0
[Mon Jun 12 10:50:09 UTC 2023] PKRU: 55555554
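
The faulting instruction in the "Code:" line can be cross-checked against the
register dump by hand. The sketch below is not part of the original report. It
is a minimal hand-decoder for the one instruction form found at the `<48>`
marker (a 64-bit register store with an 8-bit displacement), using values
copied from the oops above. The helper names `faulting_bytes` and
`decode_store` are hypothetical. The 64-bit immediate that follows in the byte
stream appears to match the value the kernel uses for LIST_POISON1 on x86_64
with the default configuration, which would be consistent with a linked-list
unlink on an entry whose next pointer was NULL. That reading is an inference
from the bytes, not a confirmed root cause.

```python
# Values copied from the oops in this report.
CODE = ("b8 a8 00 00 00 e9 4d be 0f 00 4d 8d be 60 6a 01 00 4c 89 44 24 08 "
        "4c 89 ff e8 69 30 43 01 4c 8b 44 24 08 49 8b 40 08 49 8b 10 "
        "<48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 83")
REGS = {"rax": 0x0, "rdx": 0x0}
CR2 = 0x8

# Register numbering used by the ModRM byte (low 8 registers only).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker, i.e. at the faulting RIP."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_store(insn):
    """Decode only the form `REX.W 89 /r` with an 8-bit displacement.

    Returns (source register, base register, signed displacement).
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    assert rex == 0x48 and opcode == 0x89, "not a plain 64-bit MOV store"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm != 4, "only [base + disp8] is handled here"
    disp = int.from_bytes(insn[3:4], "little", signed=True)
    return REG_NAMES[reg], REG_NAMES[rm], disp


insn = faulting_bytes(CODE)
src, base, disp = decode_store(insn)
target = REGS[base] + disp
print(f"mov %{src}, {disp:#x}(%{base})  -> write to {target:#x}")

# The 8-byte immediate of the following `48 b8` instruction is little-endian.
poison = int.from_bytes(insn[9:17], "little")
print(f"following immediate: {poison:#x}")
```

The computed write target can be compared with CR2 and with the "supervisor
write access" line of the oops. For anything beyond this single instruction, a
real disassembler (for example the kernel's scripts/decodecode) is the better
tool.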

We've seen this issue with kernels 5.15.115, 5.15.79, some versions between the
two, and probably a 5.15.4x (not sure about that one). Initially, only a few
"identical" hosts (same hardware model) were affected, but since then we've
also had crashes on hosts running different hardware. Unfortunately, it
sometimes takes a few weeks to trigger (the last occurrence before this one was
2 months ago), and we can't think of a way to reproduce it.

As you can see in the dmesg.log.gz file, this bug then causes soft lockups in
other processes, I guess because they are waiting on some kind of lock that
never gets released. The host then becomes more and more unresponsive as time
goes by.
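
To see which tasks pile up after the oops, the soft lockup reports in a dmesg
log can be tallied per task. The following is a rough sketch, not something
taken from the attachment: it assumes the usual watchdog line format
("soft lockup - CPU#N stuck for Ns! [comm:pid]"), the function name
`summarize_soft_lockups` is hypothetical, and the sample lines are invented
purely for illustration.

```python
import re
from collections import Counter

# Matches the kernel watchdog's soft lockup report line (assumed format).
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Count soft lockup reports per (command, pid)."""
    counts = Counter()
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            counts[(m["comm"], int(m["pid"]))] += 1
    return counts


# Invented sample lines for illustration only; not taken from the attachment.
sample = [
    "[Mon Jan  1 00:00:00 UTC 2001] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [qemu:1111]",
    "[Mon Jan  1 00:00:28 UTC 2001] watchdog: BUG: soft lockup - CPU#3 stuck for 49s! [qemu:1111]",
    "[Mon Jan  1 00:00:30 UTC 2001] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:1:2222]",
]

for (comm, pid), n in summarize_soft_lockups(sample).most_common():
    print(f"{comm} (pid {pid}): {n} soft lockup report(s)")
```

On a real log, the lines would come from the decompressed file, for example
via `gzip.open("dmesg.log.gz", "rt")`.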

Let me know if I can provide any other details.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Thread overview: 4+ messages
2023-06-16 16:02 bugzilla-daemon [this message]
2023-06-16 23:53 ` [Bug 217562] New: kernel NULL pointer dereference on deletion of guest physical memory slot Sean Christopherson
2023-06-16 23:53 ` [Bug 217562] " bugzilla-daemon
2023-06-22 17:51 ` bugzilla-daemon
