linux-arm-kernel.lists.infradead.org archive mirror
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v6 4/7] arm64: kvm: support user space to query RAS extension feature
Date: Thu, 14 Sep 2017 13:38:52 +0100
Message-ID: <59BA785C.40504@arm.com>
In-Reply-To: <0184EA26B2509940AA629AE1405DD7F2015F61EA@DGGEMA503-MBX.china.huawei.com>

Hi gengdongjiu,

On 08/09/17 18:36, gengdongjiu wrote:
>> The code to signal memory-failure to user-space doesn't depend on the CPU's RAS-extensions.
> I have roughly checked your answer and agree with your general idea.
> Later I will check it in detail.

> I have a question: are you sure that, if the CPU does not support the RAS extensions, the kernel
> can still call memory_failure() to send a signal to qemu?

If CONFIG_MEMORY_FAILURE is selected, then the kernel has the code to send
SIGBUS signals with an si_code of BUS_MCEERR_A* (BUS_MCEERR_AR/BUS_MCEERR_AO)
to user space.
This can be triggered by any GHES. A case in point: the 'AMD Seattle Overdrive'
under my desk has a HEST with four polled GHES entries. If any of these generate
a memory error the kernel will trigger the memory_failure() code.
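For illustration, here is a minimal sketch of how a VMM such as Qemu or kvmtool might tell these memory-failure signals apart from an ordinary SIGBUS. It relies only on the Linux si_code values BUS_MCEERR_AR (poison consumed, action required) and BUS_MCEERR_AO (poison found, action optional). The names classify_sigbus(), report_guest_memory_error() and install_sigbus_handler() are hypothetical, and the hook that would describe the error to a guest is only a stub.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Fallbacks in case the libc headers hide these; the values match Linux. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum mce_kind { MCE_NONE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

/* Did memory_failure() send this SIGBUS, and if so, what kind? */
enum mce_kind classify_sigbus(const siginfo_t *info)
{
    assert(info != NULL);
    switch (info->si_code) {
    case BUS_MCEERR_AR: /* poisoned memory was consumed */
        return MCE_ACTION_REQUIRED;
    case BUS_MCEERR_AO: /* poison found, not (yet) consumed */
        return MCE_ACTION_OPTIONAL;
    default:            /* an ordinary SIGBUS, e.g. BUS_ADRERR */
        return MCE_NONE;
    }
}

/* Hypothetical hook: a real VMM would describe the error to the guest
 * here, e.g. with a CPER record. Returning from an action-required
 * SIGBUS would just retry the faulting access, so this stub exits. */
static void report_guest_memory_error(void *hva, enum mce_kind kind)
{
    (void)hva;
    if (kind == MCE_ACTION_REQUIRED)
        _exit(1);
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    enum mce_kind kind = classify_sigbus(info);

    if (kind == MCE_NONE) {
        /* Not a memory error: restore the default action so that a
         * re-executed faulting access terminates the process as usual. */
        signal(sig, SIG_DFL);
        return;
    }
    report_guest_memory_error(info->si_addr, kind);
}

/* Returns 0 on success, -1 with errno set on failure. */
int install_sigbus_handler(void)
{
    struct sigaction sa = {0};

    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Nothing in this sketch depends on the CPU's RAS extensions: the same si_code values arrive whether memory_failure() was reached from a GHES, from madvise(MADV_HWPOISON) or from sysfs.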

Even without all the ACPI stuff, we still have CONFIG_HWPOISON_INJECT,
madvise(MADV_HWPOISON), and the sysfs files 'soft_offline_page' and
'hard_offline_page' as mechanisms that may trigger memory_failure().
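As a sketch of the madvise() route, assuming a test machine where the process has CAP_SYS_ADMIN and the kernel was built with CONFIG_MEMORY_FAILURE, a test tool might poison the page backing a buffer like this. The helper names page_base() and poison_page_containing() are hypothetical; MADV_HWPOISON itself is the real madvise() advice value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100 /* value from the Linux uapi headers */
#endif

/* Round addr down to the start of its page (page_size must be a power of 2). */
uintptr_t page_base(uintptr_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~((uintptr_t)page_size - 1);
}

/*
 * Ask the kernel to treat the page holding addr as hardware-poisoned.
 * This goes through memory_failure(), so it needs CONFIG_MEMORY_FAILURE
 * and CAP_SYS_ADMIN. Only use it on a test machine: the underlying
 * physical page really is taken out of service.
 * Returns 0 on success, or -1 on failure.
 */
int poison_page_containing(void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return -1;
    uintptr_t base = page_base((uintptr_t)addr, (size_t)page_size);
    return madvise((void *)base, (size_t)page_size, MADV_HWPOISON);
}
```

After a successful call, touching that page is expected to raise SIGBUS, which makes this a way to exercise a VMM's error path on hardware with no RAS features at all.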

User space shouldn't try to guess whether it's likely to get one of these.

I've been using these mechanisms to test SDEI virtualisation with kvmtool.


> After checking the code, the general flow is: the RAS module detects the error, or the CPU consumes the
> hardware-poisoned data and takes an exception; then EL3 firmware records the address in the APEI table and
> sends a notification to the kernel. The kernel parses the APEI table to get the address and calls
> memory_failure() to identify the page to poison. That is to say, usually after RAS detects the error
> it calls memory_failure(); otherwise it does not know whether this address is poisoned.

> I am worried about one thing: if the hardware does not have RAS, the OS cannot know which address is poisoned,
> so it cannot identify the address, and then the address that is delivered to Qemu (user space) may not be right.

You've switched from talking about the CPU's 'ARM v8.2 RAS extensions' to 'RAS'.
Supporting memory_failure() is a linux:RAS feature; it doesn't depend on the
cpu:'ARM v8.2 RAS extensions'.

> As you said, the kernel can also call memory_failure() even without RAS support. In this case without RAS,
> how does it decide that the address is poisoned and that it needs to send SIGBUS to QEMU?

Which component doesn't have RAS? The CPU? Okay, what about the memory controller:
The memory controller may catch a parity error during DRAM refresh/scrub, and
signal firmware via an interrupt. Firmware can then read the affected address
from the memory-controller's error registers and report it to the OS as a
firmware-first error.
The CPU doesn't need any RAS features for this to work.

To caricature your argument: 'the CPU doesn't have this particular version of
this particular RAS feature, thus no component in the system has any RAS feature'.

APEI's firmware-first is an abstraction so that we don't need to know which
system components have RAS features (or how to drive them); we let firmware do
the work and tell us the results.


>> If Qemu supports notifying the guest about RAS errors using CPER records, it should generate a HEST describing firmware first. It can then
>> choose the notification methods, some of which may require optional KVM APIs to support.
>>
>> Seattle has a HEST, it doesn't support the CPU RAS-extensions. The kernel can notify user-space about memory_failure() on this machine. I
>> would expect Qemu to be able to receive signals and describe memory errors to a guest (1).
> 
> Usually we consider the address obtained from the APEI table to be poisoned. If so, I want to know: without RAS and the APEI table, how does it identify the address to hwpoison?

~s/APEI/CPER/

I agree the main path into memory_failure() is from APEI, and we get the address
from the CPER records. This isn't the only path into memory_failure(), and there
may be more in the future.
None of the firmware-first stuff depends on CPU RAS features; it may be
reporting errors from some other component in the system.

Back to the issue at hand: should qemu/kvmtool generate a HEST?
If these tools want to inject emulated errors into a guest: yes (this may be
totally independent of what the host supports).
If these tools want to pass memory-failure notifications for guest memory into a
guest: yes.
You may want to make this depend on whether the host supports memory-failure
notifications, but you shouldn't care where they come from.

Does the host support memory-failure notifications?
You can poke around in /proc to find this out; /proc/sys/vm has:
  memory_failure_early_kill
  memory_failure_recovery
when the kernel was built with CONFIG_MEMORY_FAILURE. But if user space has
support for this stuff, I don't know why you wouldn't unconditionally turn it on.
(What happens if you migrate between hosts with different support?)
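If a tool does want to gate this on host support, a minimal sketch of the /proc check might look like the following. The function name host_has_memory_failure() is hypothetical, and it only tests that both sysctl files exist, not what they are set to. Advertising the feature unconditionally may still be the simpler choice, not least because of migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Sysctls that only exist when the kernel has CONFIG_MEMORY_FAILURE. */
static const char *const memory_failure_knobs[] = {
    "memory_failure_early_kill",
    "memory_failure_recovery",
};

/* vm_dir is normally "/proc/sys/vm"; taking it as a parameter makes the
 * check easy to point at a fake tree in tests. */
bool host_has_memory_failure(const char *vm_dir)
{
    char path[4096];
    size_t nknobs = sizeof memory_failure_knobs / sizeof memory_failure_knobs[0];

    assert(vm_dir != NULL);
    for (size_t i = 0; i < nknobs; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", vm_dir,
                         memory_failure_knobs[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return false; /* path too long: treat as unsupported */
        if (access(path, F_OK) != 0)
            return false; /* knob missing */
    }
    return true;
}
```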


James


Thread overview: 49+ messages
2017-08-28 10:38 [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM Dongjiu Geng
2017-08-28 10:38 ` [PATCH v6 1/7] arm64: cpufeature: Detect CPU RAS Extentions Dongjiu Geng
2017-08-31 17:44   ` James Morse
2017-09-04 11:20     ` gengdongjiu
2017-08-28 10:38 ` [PATCH v6 2/7] KVM: arm64: Save ESR_EL2 on guest SError Dongjiu Geng
2017-08-28 10:38 ` [PATCH v6 3/7] acpi: apei: remove the unused code Dongjiu Geng
2017-08-31 17:50   ` James Morse
2017-09-04 11:43     ` gengdongjiu
2017-09-08 18:17       ` James Morse
2017-09-11 12:04         ` gengdongjiu
2017-09-14 12:35           ` James Morse
2017-09-14 12:51             ` gengdongjiu
2017-08-28 10:38 ` [PATCH v6 4/7] arm64: kvm: support user space to query RAS extension feature Dongjiu Geng
2017-08-31 18:04   ` James Morse
2017-09-05  7:18     ` gengdongjiu
2017-09-07 16:31       ` James Morse
2017-09-08 14:34         ` Re: " gengdongjiu
2017-09-08 15:03           ` Peter Maydell
2017-09-14 12:34             ` James Morse
2017-09-08 17:36         ` gengdongjiu
2017-09-14 12:38           ` James Morse [this message]
2017-08-28 10:38 ` [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2 Dongjiu Geng
2017-09-07 16:31   ` James Morse
2017-09-13  8:12     ` gengdongjiu
2017-09-14 11:12     ` gengdongjiu
2017-09-14 12:36       ` James Morse
2017-10-16 11:44       ` James Morse
2017-10-16 13:44         ` gengdongjiu
2017-08-28 10:38 ` [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace Dongjiu Geng
2017-09-07 16:30   ` James Morse
2017-09-13  7:32     ` gengdongjiu
2017-09-14 13:00       ` James Morse
2017-09-18 13:36         ` gengdongjiu
2017-09-22 16:39           ` James Morse
2017-09-25 15:13             ` Re: " gengdongjiu
2017-10-06 16:46               ` James Morse
2017-10-19  5:48                 ` gengdongjiu
2017-09-21  7:55         ` gengdongjiu
2017-09-22 16:51           ` James Morse
2017-09-27 11:07             ` gengdongjiu
2017-09-27 15:37               ` gengdongjiu
2017-10-06 17:31               ` James Morse
2017-10-19  7:49                 ` gengdongjiu
2017-08-28 10:38 ` [PATCH v6 7/7] arm64: kvm: handle SEI notification and pass the virtual syndrome Dongjiu Geng
2017-08-31 17:43 ` [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM James Morse
2017-09-04 11:10   ` gengdongjiu
2017-09-07 16:32     ` James Morse
2017-09-06 11:19 ` Peter Maydell
2017-09-06 11:29   ` gengdongjiu
