Re: [PATCH RESEND v2 3/3] target/arm/kvm: Support multiple memory CPERs injection

From: Gavin Shan <gshan@redhat.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
	qemu-arm@nongnu.org, qemu-devel@nongnu.org, mst@redhat.com,
	anisinha@redhat.com, gengdongjiu1@gmail.com,
	peter.maydell@linaro.org, pbonzini@redhat.com,
	mchehab+huawei@kernel.org, shan.gavin@gmail.com,
	James Houghton <jthoughton@google.com>
Subject: Re: [PATCH RESEND v2 3/3] target/arm/kvm: Support multiple memory CPERs injection
Date: Fri, 7 Nov 2025 15:11:59 +1000
Message-ID: <c96879f7-122c-4da9-bb2c-4b5b66e99033@redhat.com>
In-Reply-To: <20251105090242.00004f93@huawei.com>

Hi Jonathan,

On 11/5/25 7:02 PM, Jonathan Cameron wrote:

[...]

>>
>> I already had the prototype of error source per vcpu, which works fine for
>> 64KB-host-4KB-guest. However, it doesn't work for huge pages. For example,
>> a problematic 512MB huge page can cause heavy memory error storm to QEMU
>> where we absolutely can't handle.
>>
>> 1. Start the VM with hugetlb pages
>>
>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                                     \
>> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on                                  \
>> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1                      \
>> -m 4096M,slots=16,maxmem=128G                                                               \
>> -object memory-backend-file,id=mem0,prealloc=on,mem-path=/dev/hugepages-524288kB,size=4096M \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem0                                                    \
>>
>> 2. Run 'victim -d' on guest
>>
>> guest$ ./victim -d
>> physical address of (0xffff889d6000) = 0x11a7da000
>> Hit any key to trigger error:
>>
>> 3. Inject error from host
>>
>> host$ errinjct 0x11a7da000
>>
>> 4. QEMU crashes with error message "Bus error (core dumped)", which is triggered
>> the following path.
>>
>> sigbus_handler
>>     kvm_on_sigbus_vcpu           // have_sigbus_pending = 1
>>     sigbus_reraise
> 
> To me this sounds like something that should not be happening on the host unless
> a real memory error is detected that blows away the whole of / most of a huge page.
> I'm not sure we care about surviving that case if it isn't mapped using hugetlb/DAX or
> similar in the guest (so contiguous in both with contained impact in both).
> 
> I assume the issue is backing with hugetlbfs which doesn't have a sub huge page granularity
> for poison tracking.  I vaguely recall an effort to solve that
> https://lore.kernel.org/linux-mm/20220624173656.2033256-1-jthoughton@google.com/
> was the first thing google threw me. Looks like it got to v2.
> https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
> 
> +CC James.
> 

In this particular case, the guest memory is backed by 512MB hugetlb pages, so there
are 8 hugetlb pages since the guest has 4GB of memory. I agree it's impossible to
recover from this extreme situation, for a couple of reasons: (1) A problematic huge
page is likely to be shared by multiple vCPUs. Multiple SIGBUS signals can be raised
at once, which we're unable to handle; (2) The instructions (TEXT section) of the
guest's application or kernel can reside in the problematic huge page. Any instruction
fetch then leads to a SIGBUS signal, meaning the vCPUs can't continue executing.
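
To put numbers on the blast radius, here is a minimal arithmetic sketch. The helper
name pages_in() and the MiB macro are made up for illustration; the sizes are the
ones from this thread (4GB guest, 512MB hugetlb pages, 4KB guest pages).

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical helper: how many pages of page_bytes fit in region_bytes. */
uint64_t pages_in(uint64_t region_bytes, uint64_t page_bytes)
{
    return region_bytes / page_bytes;
}
```

pages_in(4096 * MiB, 512 * MiB) gives the 8 hugetlb pages backing the guest, and
pages_in(512 * MiB, 4096) gives 131072, the number of 4KB guest pages that sit on
a single poisoned huge page.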

I'm summarizing my findings for the above case, at least to make this thread complete.

Only one pending SIGBUS signal is allowed by QEMU in the current implementation. If a
second one arrives while the first is still pending, QEMU crashes in sigbus_handler()
due to the SIGBUS signal sent from sigbus_reraise().

   qemu
   ====
   sigbus_handler
     kvm_on_sigbus_vcpu
       have_sigbus_pending = true;
       qatomic_set(&cpu->exit_request, true)
            :
   kvm_cpu_exec
     kvm_cpu_kick_self
       kvm_cpu_kick
         qatomic_set(&cpu->kvm_run->immediate_exit, 1);
     kvm_vcpu_ioctl                                       // Return immediately
     kvm_arch_on_sigbus_vcpu
     have_sigbus_pending = false;
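
The single-slot behaviour can be modelled roughly as below. This is a simplified
sketch, not QEMU's actual code: struct sigbus_slot, slot_record() and slot_consume()
are hypothetical names, standing in for have_sigbus_pending, the recording done in
kvm_on_sigbus_vcpu(), and the consume step that runs in kvm_cpu_exec() after the
vCPU exits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the one pending-SIGBUS slot (hypothetical type). */
struct sigbus_slot {
    bool pending;   /* models have_sigbus_pending */
    int code;       /* si_code of the recorded signal */
    void *addr;     /* si_addr of the recorded signal */
};

/*
 * Models the recording step: returns 0 when the signal was stored, 1 when
 * the slot is already occupied. In the trace above, a non-zero return is
 * what sends sigbus_handler() down the sigbus_reraise() path.
 */
int slot_record(struct sigbus_slot *s, int code, void *addr)
{
    if (s->pending) {
        return 1;
    }
    s->pending = true;
    s->code = code;
    s->addr = addr;
    return 0;
}

/* Models the consume step: hands the recorded error out and frees the slot. */
bool slot_consume(struct sigbus_slot *s, int *code, void **addr)
{
    if (!s->pending) {
        return false;
    }
    *code = s->code;
    *addr = s->addr;
    s->pending = false;
    return true;
}
```

In this model, two slot_record() calls without a slot_consume() in between make the
second call return 1, which corresponds to the crash described above.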

The host raises two SIGBUS signals before the target vCPU can be stopped. The first
one is raised when the host handles the memory error.

   host
   ====
   memory_failure
     try_memory_failure_hugetlb
       get_huge_page_for_hwpoison
         __get_huge_page_for_hwpoison
           folio_set_hugetlb_hwpoison
     hwpoison_user_mappings
       collect_procs                                     // Collect tasks using the folio
       unmap_poisoned_folio
         try_to_unmap                                    // TTU_HWPOISON
           try_to_unmap_one
             mmu_notifier_invalidate_range_start
             swp_entry_to_pte(make_hwpoison_entry(subpage))
             set_huge_pte_at                             // Poisoned PMD
             mmu_notifier_invalidate_range_end
       kill_procs                                        // Raise SIGBUS
     identify_page_state


The second one is raised by the stage2 page fault handler due to the poisoned PMD.

   kvm_handle_guest_abort
     user_mem_abort
       __kvm_faultin_pfn
         kvm_follow_pfn
           hva_to_pfn
             hva_to_pfn_fast
             hva_to_pfn_slow
               get_user_pages_unlocked
                 __get_user_pages_locked
                   __get_user_pages
                     follow_page_mask                      // No PMD mapping
                     faultin_page
                       handle_mm_fault
                         hugetlb_fault
                           is_hugetlb_entry_hwpoisoned     // Return VM_FAULT_HWPOISON_LARGE
       kvm_send_hwpoison_signal                            // Raise SIGBUS
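
For reference, a SIGBUS with si_code BUS_MCEERR_AR or BUS_MCEERR_AO carries si_addr
and si_addr_lsb in its siginfo_t, where si_addr_lsb gives the least significant bit
of the reported address and hence the size of the affected range. A minimal sketch
of turning that pair into a range follows. struct poison_range and
poison_range_from_siginfo() are hypothetical names, and the assumption that a 512MB
hugetlb mapping is reported with si_addr_lsb = 29 (its page shift) is mine, not
something verified in this thread.

```c
#include <stdint.h>

/* Hypothetical result type: the host virtual range covered by one report. */
struct poison_range {
    uint64_t start;   /* address rounded down to the reported granularity */
    uint64_t size;    /* 1 << si_addr_lsb */
};

/*
 * addr is siginfo->si_addr cast to an integer, addr_lsb is
 * siginfo->si_addr_lsb. The low addr_lsb bits of addr are not meaningful,
 * so the affected range is the naturally aligned block containing addr.
 */
struct poison_range poison_range_from_siginfo(uint64_t addr, short addr_lsb)
{
    struct poison_range r;

    r.size = 1ULL << addr_lsb;
    r.start = addr & ~(r.size - 1);
    return r;
}
```

With addr_lsb = 12 the range is a single 4KB page; with addr_lsb = 29 it is the
whole 512MB huge page, which is why one bad huge page takes so much guest memory
with it.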

Thanks,
Gavin
