linux-arm-kernel.lists.infradead.org archive mirror
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace
Date: Thu, 14 Sep 2017 14:00:34 +0100
Message-ID: <59BA7D72.4090403@arm.com>
In-Reply-To: <2a42d1ea-3456-2873-c9ea-d8a027b59789@huawei.com>

Hi gengdongjiu,

(re-ordered hunks)

On 13/09/17 08:32, gengdongjiu wrote:
> On 2017/9/8 0:30, James Morse wrote:
>> On 28/08/17 11:38, Dongjiu Geng wrote:
>> For BUS_MCEERR_A* from memory_failure() we can't know if they are caused by
>> an access or not.

Actually it looks like we can: I thought 'BUS_MCEERR_AR' could be triggered via
some CPER flags, but it's not. The only code that flags MF_ACTION_REQUIRED is
x86's kernel-first handling, which nicely matches this 'direct access' problem.
BUS_MCEERR_AR also comes from KVM stage2 faults (and the x86 equivalent). Powerpc
also triggers these directly, both from what look to be synchronous paths, so I
think it's fair to equate BUS_MCEERR_AR to a synchronous access and BUS_MCEERR_AO
to something_else.

I don't think we need anything else.
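
As a rough sketch of that mapping on the user-space side, keyed only on the
si_code delivered with SIGBUS: the enum and the function name below are
hypothetical, not taken from Qemu or kvmtool, and the fallback #defines assume
the Linux UAPI values for libc headers that don't expose them.

```c
#include <signal.h>

/* Fallbacks for libc headers that hide these; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical names: what the VMM decides to tell the guest. */
enum guest_notify {
    NOTIFY_NONE,    /* not a memory-failure SIGBUS */
    NOTIFY_SYNC,    /* synchronous access: a candidate for an emulated SEA */
    NOTIFY_ASYNC,   /* "something else": IRQ, polled or SEI-style notification */
};

/* Map the SIGBUS si_code to a notification class; no ESR is consulted. */
static enum guest_notify pick_guest_notification(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return NOTIFY_SYNC;
    case BUS_MCEERR_AO:
        return NOTIFY_ASYNC;
    default:
        return NOTIFY_NONE;
    }
}
```

A SIGBUS handler installed with SA_SIGINFO would pass info->si_code to this
function and use info->si_addr to find the affected guest address.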


>> When the mm code gets -EHWPOISON when trying to resolve a
>
> Because of that, so I allow  userspace getting exception information

... and there are cases where you can't get the exception information, and other
cases where it wasn't an exception at all.

[...]


>> What happens if the dram-scrub hardware spots an error in guest memory, but
>> the guest wasn't running? KVM won't have a relevant ESR value to give you.

> if the dram-scrub hardware spots an error in guest memory, it will generate
> IRQ in DDR controller, not SEA or SEI exception. I still do not consider the
> GSIV. For GSIV, may be we can only handle it in the host OS.

Great example: this IRQ pulls us out of a guest, we tromp through APEI and then
memory_failure(), the memory happened to belong to the same guest
(coincidence!), we send it some signal and now it's user-space's problem.

Your KVM_REG_ARM64_FAULT mechanism is going to return stale data, even though
the notification interrupted the guest, and it was guest memory that was
affected. KVM doesn't have a relevant ESR.


I'm strongly against exposing 'which notification type' this error originally
came from because:
* it doesn't matter once we've got the CPER records,
* there isn't always an answer (there are/will-be other ways of tripping
  memory_failure()),
* it creates ABI between firmware, host userspace and guest userspace.
  Firmware's choice of notification type shouldn't affect anything other than
  the host kernel.


On 13/09/17 08:32, gengdongjiu wrote:
> On 2017/9/8 0:30, James Morse wrote:
>> On 28/08/17 11:38, Dongjiu Geng wrote:
>>> when userspace gets SIGBUS signal, it does not know whether
>>> this is a synchronous external abort or SError,
>>
>> Why would Qemu/kvmtool need to know if the original notification (if there was
>> one) was synchronous or asynchronous? This is between firmware and the kernel.

> there are two reasons:
> 
> 1. Let us firstly discuss the SEA and SEI, there are different workflow for the two different Errors.
> 2. when record the CPER in the user space, it needs to know the error type, because SEA and SEI are different Error source,
>    so they have different offset in the APEI table, that is to say they will be recorded to different place of the APEI table.

user-space can choose whether to use SEA or SEI, it doesn't have to choose the
same notification type that firmware used, which in turn doesn't have to be the
same as that used by the CPU to notify firmware.

The choice only matters because these notifications hang on existing pieces
of the Arm architecture, so the notification can only add to the architecturally
defined meaning. (i.e. you can only send an SEA for something that can already
be described as a synchronous external abort.)

Once we get to user-space, for memory_failure() notifications, (which so far is
all we are talking about here), the only thing that could matter is whether the
guest hit a PG_hwpoison page as a stage2 fault. These can be described as
Synchronous-External-Abort.

The Synchronous-External-Abort/SError-Interrupt distinction matters for the CPU
because it can't always make an error synchronous. For memory_failure()
notifications to a KVM guest we really can do this, and we already have this
behaviour for free. An example:

A guest touches some hardware-poisoned memory; for whatever reason the CPU can't
put the world back together to make this a synchronous exception, so it reports
it to firmware as an SError-interrupt.
Linux gets an APEI notification and memory_failure() causes the affected page to
be unmapped from the guest's stage2, and SIGBUS with BUS_MCEERR_AO to be sent to
user-space.

Qemu/kvmtool can now notify the guest with an IRQ or POLLed notification. AO means
action optional, probably asynchronous.

But in our example it wasn't really asynchronous, that was just a property of
the original CPU->firmware notification. What happens? The guest vcpu is re-run,
it re-runs the same instructions (this was a contained error so KVM's ELR points
at/before the instruction that steps in the problem). This time KVM takes a
stage2 fault, which the mm code will refuse to fixup because the relevant page
was marked as PG_hwpoison by memory_failure(). KVM signals Qemu/kvmtool with
SIGBUS (BUS_MCEERR_AR). Now Qemu/kvmtool can notify the guest using SEA.
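
The sequence in this example can be written down as a toy model. This is only
an illustration of the reasoning: the struct, enum and function names are made
up here, and none of it is kernel or KVM code.

```c
#include <stdbool.h>

/* Toy stand-ins for the SIGBUS si_code values user-space would see. */
enum toy_sigbus { TOY_NO_SIGNAL, TOY_MCEERR_AO, TOY_MCEERR_AR };

struct toy_page {
    bool hwpoison;       /* models PG_hwpoison */
    bool mapped_stage2;  /* models the guest's stage2 mapping */
};

/* Models memory_failure(): mark the page, unmap it, report action-optional. */
static enum toy_sigbus toy_memory_failure(struct toy_page *p)
{
    p->hwpoison = true;
    p->mapped_stage2 = false;
    return TOY_MCEERR_AO;
}

/* Models the vcpu (re-)running the instruction that touches the page. */
static enum toy_sigbus toy_guest_access(const struct toy_page *p)
{
    if (p->mapped_stage2)
        return TOY_NO_SIGNAL;   /* no stage2 fault */
    /*
     * Stage2 fault: a healthy page would simply be faulted back in, but
     * the mm code refuses to fix up a poisoned one, giving action-required.
     */
    return p->hwpoison ? TOY_MCEERR_AR : TOY_NO_SIGNAL;
}
```

Calling toy_memory_failure() and then toy_guest_access() on the same page gives
AO followed by AR, which is the 'for free' behaviour described above.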




>          etc/acpi/tables                               etc/hardware_errors
>         ====================                    ==========================================
>     + +--------------------------+            +------------------+
>     | | HEST                     |            |    address       |              +--------------+
>     | +--------------------------+            |    registers     |              | Error Status |
>     | | GHES0                    |            | +----------------+              | Data Block 0 |
>     | +--------------------------+ +--------->| |status_address0 |------------->| +------------+
>     | | .................        | |          | +----------------+              | |  CPER      |
>     | | error_status_address-----+-+ +------->| |status_address1 |----------+   | |  CPER      |
>     | | .................        |   |        | +----------------+          |   | |  ....      |
>     | | read_ack_register--------+-+ |        |  .............   |          |   | |  CPER      |
>     | | read_ack_preserve        | | |        +------------------+          |   | +-+------------+
>     | | read_ack_write           | | | +----->| |status_address10|--------+ |   | Error Status |
>     + +--------------------------+ | | |      | +----------------+        | |   | Data Block 1 |
>     | | GHES1                    | +-+-+----->| | ack_value0     |        | +-->| +------------+
>     + +--------------------------+   | |      | +----------------+        |     | |  CPER      |
>     | | .................        |   | | +--->| | ack_value1     |        |     | |  CPER      |
>     | | error_status_address-----+---+ | |    | +----------------+        |     | |  ....      |
>     | | .................        |     | |    | |  ............. |        |     | |  CPER      |
>     | | read_ack_register--------+-----+-+    | +----------------+        |     +-+------------+
>     | | read_ack_preserve        |     |   +->| | ack_value10    |        |     | |..........  |
>     | | read_ack_write           |     |   |  | +----------------+        |     | +------------+
>     + +--------------------------|     |   |                              |     | Error Status |
>     | | ...............          |     |   |                              |     | Data Block 10|
>     + +--------------------------+     |   |                              +---->| +------------+
>     | | GHES10                   |     |   |                                    | |  CPER      |
>     + +--------------------------+     |   |                                    | |  CPER      |
>     | | .................        |     |   |                                    | |  ....      |
>     | | error_status_address-----+-----+   |                                    | |  CPER      |
>     | | .................        |         |                                    +-+------------+
>     | | read_ack_register--------+---------+
>     | | read_ack_preserve        |
>     | | read_ack_write           |
>     + +--------------------------+
> 

(nice ascii art!)

>> I think I can see why you need this: to choose whether to emulate SEA or SEI,

> emulating SEA or SEI is one reason, another reason is that the CPER will be recorded to different place of APEI.

(This doesn't matter: generate the CPER records after you've chosen the
notification and this isn't a problem. Or map your 'Error Status Data Blocks'
to status_address* depending on usage, not in a fixed 1:1 way.)
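
One way the second option could look, as a sketch only: blocks come from a
shared pool and a source's status_address is pointed at one once the
notification type has been chosen. The pool size, the struct layout and every
name below are assumptions for illustration, not Qemu's actual layout.

```c
#define NUM_BLOCKS 4    /* assumption: number of Error Status Data Blocks */

/* Hypothetical error sources, one status_address slot each. */
enum ghes_source { GHES_SRC_SEA, GHES_SRC_SEI, GHES_SRC_COUNT };

struct error_region {
    int block_in_use[NUM_BLOCKS];          /* 1 while a record is pending */
    int status_address[GHES_SRC_COUNT];    /* block index, -1 if none */
};

static void region_init(struct error_region *r)
{
    for (int i = 0; i < NUM_BLOCKS; i++)
        r->block_in_use[i] = 0;
    for (int i = 0; i < GHES_SRC_COUNT; i++)
        r->status_address[i] = -1;
}

/*
 * Called after the notification type has been chosen: take any free block
 * and point that source's status_address at it. Returns the block index,
 * or -1 if the source still has a pending record or the pool is empty.
 */
static int publish_record(struct error_region *r, enum ghes_source src)
{
    if (r->status_address[src] != -1)
        return -1;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (!r->block_in_use[i]) {
            r->block_in_use[i] = 1;
            r->status_address[src] = i;
            return i;
        }
    }
    return -1;
}
```

With this shape, the CPER records are written into whichever block
publish_record() returns, so nothing needs to know the source in advance.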


>> I think what you need is some way of knowing if the BUS_MCEERR_A* was directly
>> caused by a user-space (or guest) access, and if so was it a data or instruction

> when user space received the signal, it will judge whether the memory address is user-space (or guest) address

>> fetch. These can become SEA notifications.

> In fact, it can be SEI, not always SEA, why it will always SEA notifications?
> If the memory properties of data is device type, it may become SEI notification.

Let's take a step back: in what scenario should we use an emulated-SEA instead
of an emulated-SEI? (forget what the CPU and firmware did, this is up to Qemu to
decide).

It can use SEA if this is a valid Synchronous-external-abort. Stage 2 faults are
synchronous exceptions: if you hit a PG_hwpoison page on this path you can
report this back to the guest as a Synchronous-external-abort/SEA.
How do you know? You get SIGBUS with BUS_MCEERR_AR from KVM.


Thanks,

James
