From: James Morse <james.morse@arm.com>
To: devel@acpica.org
Subject: Re: [Devel] [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
Date: Tue, 30 Jan 2018 19:39:40 +0000
Message-ID: <5A70C9FC.5080100@arm.com>
In-Reply-To: <168c3f61-0d11-13a4-c383-1f6a97d0ef37@huawei.com>

Hi gengdongjiu,

On 23/01/18 09:23, gengdongjiu wrote:
> On 2018/1/23 3:39, James Morse wrote:
>> gengdongjiu wrote:
>>> This error source parsing and handling method
>>> is similar to that of SEA.
>>
>> There are problems with doing this:
>>
>> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
>> | How do SEA and SEI interact?
>> |
>> | As far as I can see they can both interrupt each other, which isn't something
>> | the single in_nmi() path in APEI can handle. I think we should fix this
>> | first.
>>
>> [..]
>>
>> | SEA gets away with a lot of things because it's synchronous. SEI isn't. Xie
>> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
>> | from SEA, but not SEI. (What happens if an SError arrives while we are
>> | queueing memory_failure work from an IRQ?)
>> |
>> | The one that scares me is the trace-point reporting stuff. What happens if an
>> | SError arrives while we are enabling a trace point? (these are static-keys
>> | right?)
>> |
>> |  I don't think we can just plumb SEI in like this and be done with it.
>> |  (I'm looking at teasing out the estatus cache code from being x86:NMI only.
>> |  This way we solve the same 'can't do this from NMI context' with the same
>> |  code.)
>>
>>
>> I will post what I've got for this estatus-cache thing as an RFC; it's not ready
>> to be considered yet.

> Yes, I know you are doing that. Your patch series will consider all the above things, right?

Assuming I got it right, yes. It currently makes the race Xie XiuQi spotted
worse, which I want to fix too. (details on the cover letter)


> If your patches take that into account, this patch can be based on your patchset. Thanks.

I'd like to pick these patches onto the end of that series, but first I want to
know what NOTIFY_SEI means for any OS. The ACPI spec doesn't say, and because
it's asynchronous, route-able and mask-able, there are many more corner cases
than with NOTIFY_SEA.

This thing is a notification using an emulated SError exception. (emulated
because physical-SError must be routed to EL3 for firmware-first, and
virtual-SError belongs to EL2).

Does your firmware emulate SError exactly as the TakeException() pseudo code in
the Arm-Arm?
Is the emulated SError routed following the routing rules for HCR_EL2.{AMO, TGE}?
What does your firmware do when it wants to emulate SError but it's masked?
(e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had PSTATE.A
 set.
 e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the emulated
 SError should go to EL1. This effectively masks SError.)


Answers to these let us determine whether a bug is in the firmware or the
kernel. If firmware is expecting the OS to do something special, I'd like to
know about it from the beginning!
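
As a rough illustration of the routing and masking questions above, the
following sketch models where EL3 firmware could deliver an emulated SError if
it followed the architectural rules. It is only a simplified model, not what any
real firmware does: the function and enum names are hypothetical, and it
assumes the interrupted context was AArch64 and Non-secure, and ignores
HCR_EL2.E2H and virtual SError. The bit positions (HCR_EL2.AMO, HCR_EL2.TGE,
SPSR.A, SPSR.M[3:2]) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define HCR_EL2_AMO   (UINT64_C(1) << 5)   /* route physical SError to EL2 */
#define HCR_EL2_TGE   (UINT64_C(1) << 27)  /* trap general exceptions to EL2 */
#define SPSR_A_BIT    (UINT64_C(1) << 8)   /* saved PSTATE.A (SError mask) */
#define SPSR_EL_SHIFT 2                    /* SPSR.M[3:2] holds the EL */
#define SPSR_EL_MASK  UINT64_C(0x3)

/* Hypothetical names, for illustration only. */
enum sei_outcome {
	SEI_DELIVER_EL1,
	SEI_DELIVER_EL2,
	SEI_LEFT_PENDING,	/* masked: firmware has to decide what to do */
};

static enum sei_outcome emulated_sei_outcome(uint64_t spsr_el3,
					     uint64_t hcr_el2)
{
	unsigned int cur_el = (spsr_el3 >> SPSR_EL_SHIFT) & SPSR_EL_MASK;
	bool a_masked = spsr_el3 & SPSR_A_BIT;
	unsigned int target_el =
		(hcr_el2 & (HCR_EL2_AMO | HCR_EL2_TGE)) ? 2 : 1;

	assert(cur_el <= 2);	/* the interrupted context was below EL3 */

	/* e.g.2: the target is below the interrupted EL, so it can't be taken. */
	if (target_el < cur_el)
		return SEI_LEFT_PENDING;

	/*
	 * e.g.1: PSTATE.A masks SError taken to the current EL. It also masks
	 * EL0 -> EL1, but not exceptions routed up to EL2.
	 */
	if (a_masked && (target_el == cur_el || target_el == 1))
		return SEI_LEFT_PENDING;

	return target_el == 2 ? SEI_DELIVER_EL2 : SEI_DELIVER_EL1;
}
```

Both examples above end up in the SEI_LEFT_PENDING case, which is exactly where
the spec is silent about what firmware and the OS should expect from each other.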


>>> Expose API ghes_notify_sei() to external users. External
>>> modules can call this exposed API to parse APEI table and
>>> handle the SEI notification.
>>
>> external modules? You mean called by the arch code when it gets this NOTIFY_SEI?

> yes, called by kernel ARCH code, such as below, I remember I have discussed with you.

Sure. The phrase 'external modules' usually means the '.ko' files that live in
/lib/modules; nothing outside the kernel tree should be doing this stuff.


Thanks,

James

