From: Gavin Shan <gshan@redhat.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, mst@redhat.com,
anisinha@redhat.com, gengdongjiu1@gmail.com,
peter.maydell@linaro.org, pbonzini@redhat.com,
mchehab+huawei@kernel.org, shan.gavin@gmail.com,
James Houghton <jthoughton@google.com>
Subject: Re: [PATCH RESEND v2 3/3] target/arm/kvm: Support multiple memory CPERs injection
Date: Fri, 7 Nov 2025 15:11:59 +1000
Message-ID: <c96879f7-122c-4da9-bb2c-4b5b66e99033@redhat.com>
In-Reply-To: <20251105090242.00004f93@huawei.com>
Hi Jonathan,
On 11/5/25 7:02 PM, Jonathan Cameron wrote:
[...]
>>
>> I already had a prototype with one error source per vCPU, which works fine for
>> 64KB-host-4KB-guest. However, it doesn't work for huge pages. For example,
>> a problematic 512MB huge page can cause a heavy memory error storm in QEMU,
>> which we absolutely can't handle.
>>
>> 1. Start the VM with hugetlb pages
>>
>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
>> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
>> -m 4096M,slots=16,maxmem=128G \
>> -object memory-backend-file,id=mem0,prealloc=on,mem-path=/dev/hugepages-524288kB,size=4096M \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>>
>> 2. Run 'victim -d' on guest
>>
>> guest$ ./victim -d
>> physical address of (0xffff889d6000) = 0x11a7da000
>> Hit any key to trigger error:
>>
>> 3. Inject error from host
>>
>> host$ errinjct 0x11a7da000
>>
>> 4. QEMU crashes with the error message "Bus error (core dumped)", which is triggered
>> along the following path.
>>
>> sigbus_handler
>>   kvm_on_sigbus_vcpu       // have_sigbus_pending = 1
>>   sigbus_reraise
>
> To me this sounds like something that should not be happening on the host unless
> a real memory error is detected that blows away the whole of / most of a huge page.
> I'm not sure we care about surviving that case if it isn't mapped using hugetlb/DAX or
> similar in the guest (so contiguous in both with contained impact in both).
>
> I assume the issue is backing with hugetlbfs which doesn't have a sub huge page granularity
> for poison tracking. I vaguely recall an effort to solve that
> https://lore.kernel.org/linux-mm/20220624173656.2033256-1-jthoughton@google.com/
> was the first thing google threw me. Looks like it got to v2.
> https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
>
> +CC James.
>
In this particular case, the guest memory is backed by 512MB hugetlb pages, so the
4GB guest uses 8 of them. I agree it's impossible to recover from this extreme
situation, for a couple of reasons: (1) a problematic huge page is likely shared by
multiple vCPUs, so multiple SIGBUS signals can be raised at once, which we're unable
to handle; (2) the instructions (text section) of the guest's applications or kernel
can reside in the problematic huge page. Any instruction fetch from it then leads to
a SIGBUS signal, meaning the vCPUs can't continue executing.
I'm summarizing my findings for the above case below, at least to make this thread
complete.

In the current implementation, QEMU allows only one pending SIGBUS signal. If another
SIGBUS arrives while one is still pending, QEMU crashes: sigbus_handler() calls
sigbus_reraise(), which re-raises SIGBUS with the default action.

qemu
====
sigbus_handler
  kvm_on_sigbus_vcpu
    have_sigbus_pending = true;
    qatomic_set(&cpu->exit_request, true)
:
kvm_cpu_exec
  kvm_cpu_kick_self
    kvm_cpu_kick
      qatomic_set(&cpu->kvm_run->immediate_exit, 1);
  kvm_vcpu_ioctl                 // Return immediately
  kvm_arch_on_sigbus_vcpu
  have_sigbus_pending = false;

The host raises two SIGBUS signals before the target vCPU can be stopped. The first
one is raised when the host handles the memory error.
host
====
memory_failure
  try_memory_failure_hugetlb
    get_huge_page_for_hwpoison
      __get_huge_page_for_hwpoison
        folio_set_hugetlb_hwpoison
    hwpoison_user_mappings
      collect_procs                  // Collect tasks using the folio
      unmap_poisoned_folio
        try_to_unmap                 // TTU_HWPOISON
          try_to_unmap_one
            mmu_notifier_invalidate_range_start
            swp_entry_to_pte(make_hwpoison_entry(subpage))
            set_huge_pte_at          // Poisoned PMD
            mmu_notifier_invalidate_range_end
      kill_procs                     // Raise SIGBUS
    identify_page_state

The second one is raised by the stage-2 page fault handler because of the poisoned PMD.

kvm_handle_guest_abort
  user_mem_abort
    __kvm_faultin_pfn
      kvm_follow_pfn
        hva_to_pfn
          hva_to_pfn_fast
          hva_to_pfn_slow
            get_user_pages_unlocked
              __get_user_pages_locked
                __get_user_pages
                  follow_page_mask           // No PMD mapping
                  faultin_page
                    handle_mm_fault
                      hugetlb_fault
                        is_hugetlb_entry_hwpoisoned  // Return VM_FAULT_HWPOISON_LARGE
    kvm_send_hwpoison_signal                 // Raise SIGBUS
Thanks,
Gavin