linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: gengdongjiu <gengdj.1984@gmail.com>
Cc: wuquanming <wuquanming@huawei.com>,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	Marc Zyngier <marc.zyngier@arm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linuxarm@huawei.com, linux@armlinux.org.uk,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	linux-acpi@vger.kernel.org, Huangshaoyu <huangshaoyu@huawei.com>,
	kvmarm@lists.cs.columbia.edu,
	arm-mail-list <linux-arm-kernel@lists.infradead.org>,
	devel@acpica.org
Subject: Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
Date: Fri, 12 Jan 2018 18:05:31 +0000	[thread overview]
Message-ID: <5A58F8EB.9080909@arm.com> (raw)
In-Reply-To: <CAMj-D2CwHWNpXzeNMEVKGTveAwuJvja219NF6PZ7xqYMUY17Kw@mail.gmail.com>

Hi gengdongjiu,

On 16/12/17 04:47, gengdongjiu wrote:
> [...]
>>
>>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>> +             /*
>>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>>> +              * error has not been propagated
>>> +              */
>>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>> +             return 0;
>>
>> We should not pass RAS notifications to user space. The kernel either handles
>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
> 
> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
> below, which get from [0]

[..]

> so we can see the  exception is precise and PE can recover execution
> from the preferred return address of the exception, 

> so let guest handling it is
> better, for example, if it is guest application RAS error, we can kill
> the guest application instead of panic whole OS; if it is guest kernel
> RAS error, guest will panic.

If the kernel takes an unhandled RAS error it should panic - we don't know where
the error is.

I understand you want to kill-off guest tasks as a result of RAS errors, but
this needs to go through the whole APEI->memory_failure()->sigbus machinery so
that the kernel knows the kernel can keep running.

This saves us signalling user-space when we don't need to. An example:
code-corruption. Linux can happily re-read affected user-space executables from
disk, there is absolutely nothing user-space can do about it.
Handling errors first in the kernel allows us to do recovery for all the
affected processes, not just the one that happens to be running right now.


> Host does not know which application of guest has error, so host can
> not handle it,

It has to work this out, otherwise the errors we can handle never get a chance.

This kernel is expected to look at the error description, (which for some reason
we aren't talking about here), e.g. the CPER records, and determine what
recovery action is necessary for this error.
For memory errors this may be re-reading from disk, or at the worst case,
unmapping from all user-space users (including KVM's stage2) and raining signals
on all affected processes.

For a memory error the important piece of information is the physical address.
Only the kernel can do anything with this, it determines who owns the affected
memory and what needs doing to recover from the error.

If you pass the notification to user-space, all it can do is signal the guest to
"stop doing whatever it is you're doing". The guest may have been able to
re-read pages from disk, or otherwise handle the error.
Has the error been handled? No: The error remains latent in the system.


> panic OS is not a good choice for the Recoverable error.

If we don't know where the error is, and we can't make progress, its the only
sane choice.

This code is never expected to run! (why are we arguing about it?) We should get
RAS errors as GHES notifications from firmware via some mechanism. If those are
NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
handling based on the CPER records. If/when we get kernel-first, that can claim
the SError. What we're left with is RAS notifications that no-one claimed
because there was no error-description found.



James

  reply	other threads:[~2018-01-12 18:05 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-10 19:54 [PATCH v8 0/7] Support RAS virtualization in KVM Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 2/7] KVM: arm64: Save ESR_EL2 on guest SError Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8 Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 4/7] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization Dongjiu Geng
2017-11-14 16:00   ` James Morse
2017-11-15 11:29     ` gengdongjiu
2017-12-06 10:26     ` gengdongjiu
2017-12-06 19:04       ` James Morse
2017-12-07  6:37         ` gengdongjiu
2017-12-15  3:30           ` gengdongjiu
2018-01-12 18:05             ` James Morse
2018-01-15  8:33               ` Christoffer Dall
2018-01-16 11:19                 ` gengdongjiu
2018-01-21  3:10                 ` gengdongjiu
2018-01-21  2:45               ` gengdongjiu
2018-01-22 19:32                 ` James Morse
2017-12-15 18:52           ` James Morse
2017-12-16  3:44             ` gengdongjiu
2018-01-22 19:36               ` James Morse
2017-12-16  4:47     ` gengdongjiu
2018-01-12 18:05       ` James Morse [this message]
2018-01-16 11:22         ` gengdongjiu
2018-01-21  2:54         ` gengdongjiu
2017-11-14 16:00 ` [PATCH v8 0/7] Support RAS virtualization in KVM James Morse
2017-11-15 11:06   ` gengdongjiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A58F8EB.9080909@arm.com \
    --to=james.morse@arm.com \
    --cc=devel@acpica.org \
    --cc=gengdj.1984@gmail.com \
    --cc=gengdongjiu@huawei.com \
    --cc=huangshaoyu@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxarm@huawei.com \
    --cc=marc.zyngier@arm.com \
    --cc=wuquanming@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).