* [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-05-16 16:28 James Morse
2018-05-16 16:28 ` [PATCH v4 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef James Morse
` (12 more replies)
0 siblings, 13 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Xie XiuQi,
Marc Zyngier, Catalin Marinas, Tyler Baicar, Will Deacon,
Christoffer Dall, Dongjiu Geng, Punit Agrawal, Borislav Petkov,
James Morse, Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown
The aim of this series is to wire arm64's SDEI into APEI.
Since v3 the NMI fixmap entries and locks have moved into their own
structure. This moves the indirection up from the 'lock', which should
be more acceptable to polite society.
Changes are noted in each patch.
This touches a few trees, so I'm not sure how best it should be merged.
Patches 11 and 12 reduce a race that is made worse by patch 4, so I'd
like them to arrive together, even though patch 11 doesn't depend on anything
else in the series. A partial merge of this would be 1-3 and 11.
The earlier boiler-plate:
What's SDEI? It's ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.
These Software exceptions can interrupt anything, so I describe them as
NMI-like. They aren't the only NMI-like way to notify the OS about
firmware-first RAS events: the ACPI spec also defines 'NOTIFY_SEA' and
'NOTIFY_SEI'.
(Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
but the owner of that memory said no. These are always synchronous with the
instruction that caused them. SEI, System-Error Interrupt, commonly called
SError. This is an asynchronous external abort, the memory-owner didn't say no
at the right point. Collectively these things are called external-aborts.
How is firmware involved? It traps these and re-injects them into the kernel
once it has written the CPER records).
APEI's GHES code only expects one source of NMI. If a platform implements
more than one of these mechanisms, APEI needs to handle the interaction.
'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
with itself: its exceptions can be 'normal' or 'critical', and firmware
could use both types for RAS. (errors using normal, 'panic-now' using
critical).
What does this series do?
Patches 1-4 refactor APEI's 'estatus queue' so it can be used for all
NMI-like notifications. This defers the NMI work to irq_work, which will
happen when we next unmask interrupts.
Patches 5&6 move the arch and KVM code around so that NMI-like notifications
are always called in_nmi().
Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
to be per-ghes. When called in_nmi(), the struct ghes is expected to
provide a fixmap slot and lock that is safe to use. NMI-like notifications
that mask each other can share these resources. Those that interact should
have their own fixmap slot and lock.
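The sharing rule for patch 7 can be modelled in user space. This is only a
sketch: the slot names, struct nmi_ctx, struct err_source and the two lock
helpers are hypothetical and are not the names the patch uses. Sources that
mask each other point at the same context; sources that can interrupt each
other get one each.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slot identifiers; the real ones live in asm/fixmap.h. */
enum nmi_slot { SLOT_NMI, SLOT_SEA, SLOT_SDEI_NORMAL, SLOT_SDEI_CRITICAL };

/* One mapping slot plus the lock that protects it (assumed layout). */
struct nmi_ctx {
	enum nmi_slot slot;
	atomic_flag lock;
};

/* Hypothetical stand-in for struct ghes: each source names its context. */
struct err_source {
	const char *name;
	struct nmi_ctx *nmi;
};

/* Try to take the context's lock without spinning; true on success. */
static bool nmi_ctx_trylock(struct nmi_ctx *ctx)
{
	return !atomic_flag_test_and_set_explicit(&ctx->lock,
						  memory_order_acquire);
}

static void nmi_ctx_unlock(struct nmi_ctx *ctx)
{
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}
```

In this model a source that interrupts another never contends for the
interrupted source's lock, because the two hold different contexts.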
Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
about to have multiple NMI-like users that can't share resources.
Patches 9&10 add the SDEI helper, and notify methods for APEI.
After this, adding further firmware-first pieces for arm64 is simple
(and safe), and all our NMI-like notifications behave the same as x86's
NOTIFY_NMI.
All of this makes the race between memory_failure_queue() and
ret_to_user worse, as there is now always irq_work involved.
Patch 11 makes the reschedule to memory_failure() run as soon as possible.
Patch 12 makes sure the arch code knows whether the irq_work has run by
the time do_sea() returns. We can skip the signalling step if it has as
APEI has done its work.
ghes.c became clearer to me when I worked out that it has three sets of
functions with 'estatus' in the name. One is a pool of memory that can be
allocated-from atomically. This is grown/shrunk when new NMI users are
allocated.
The second is the estatus-cache, which holds recent notifications so it
can suppress notifications we've already handled.
The last is the estatus-queue, which holds data from NMI-like notifications
(in pool memory) to be processed from irq_work.
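The estatus-queue pattern can be sketched outside the kernel. The model below
uses C11 atomics instead of the kernel's llist helpers; struct est_node and
the est_*() names are hypothetical. It shows why the consumer reverses the
list: lock-less pushes leave the newest entry at the head.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ghes_estatus_node. */
struct est_node {
	struct est_node *next;
	int seq;		/* order in which the record was queued */
};

/* Lock-less LIFO list head, modelled on the kernel's llist. */
static _Atomic(struct est_node *) est_queue;

/* Producer side: takes no locks, so it is usable from NMI-like context. */
static void est_queue_add(struct est_node *node)
{
	struct est_node *first = atomic_load(&est_queue);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&est_queue, &first, node));
}

/* Consumer side: detach every queued entry in one atomic exchange. */
static struct est_node *est_queue_del_all(void)
{
	return atomic_exchange(&est_queue, (struct est_node *)NULL);
}

/* Entries come off newest-first; reverse them to restore time order. */
static struct est_node *est_reverse(struct est_node *head)
{
	struct est_node *prev = NULL;

	while (head) {
		struct est_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

A consumer running later, in IRQ context in the real code, would call
est_reverse(est_queue_del_all()) and walk the result oldest-first.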
Testing?
Tested with the SDEI FVP based software model and a mocked up NOTIFY_SEA using
KVM. I've added a case where 'corrected errors' are discovered at probe time
to exercise ghes_probe() during boot. I've only build tested this on x86.
Thanks,
James
[0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf
James Morse (12):
ACPI / APEI: Move the estatus queue code up, and under its own ifdef
ACPI / APEI: Generalise the estatus queue's add/remove and notify code
ACPI / APEI: don't wait to serialise with oops messages when
panic()ing
ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
in_nmi() users
ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
firmware: arm_sdei: Add ACPI GHES registration helper
ACPI / APEI: Add support for the SDEI GHES Notification type
mm/memory-failure: increase queued recovery work's priority
arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
arch/arm/include/asm/kvm_ras.h | 14 +
arch/arm/include/asm/system_misc.h | 5 -
arch/arm64/include/asm/acpi.h | 4 +
arch/arm64/include/asm/daifflags.h | 1 +
arch/arm64/include/asm/fixmap.h | 8 +-
arch/arm64/include/asm/kvm_ras.h | 24 ++
arch/arm64/include/asm/system_misc.h | 2 -
arch/arm64/kernel/acpi.c | 49 ++++
arch/arm64/mm/fault.c | 30 +-
drivers/acpi/apei/ghes.c | 518 ++++++++++++++++++++---------------
drivers/firmware/arm_sdei.c | 67 +++++
include/acpi/ghes.h | 17 ++
include/linux/arm_sdei.h | 8 +
mm/memory-failure.c | 11 +-
virt/kvm/arm/mmu.c | 4 +-
15 files changed, 503 insertions(+), 259 deletions(-)
create mode 100644 arch/arm/include/asm/kvm_ras.h
create mode 100644 arch/arm64/include/asm/kvm_ras.h
--
2.16.2
* [PATCH v4 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code James Morse
` (11 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.
First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.
This patch moves code around ... and makes the following trivial change:
Freshen the dated comment above ghes_estatus_llist. printk() is no
longer the issue; it's the helpers like memory_failure_queue() that
still aren't nmi safe.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
1 file changed, 137 insertions(+), 128 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..e2af91c92135 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -545,6 +545,16 @@ static int ghes_print_estatus(const char *pfx,
return 0;
}
+static void __ghes_panic(struct ghes *ghes)
+{
+ __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+ /* reboot to log the error! */
+ if (!panic_timeout)
+ panic_timeout = ghes_panic_timeout;
+ panic("Fatal hardware error!");
+}
+
/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
@@ -672,6 +682,133 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
}
+#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+/*
+ * Handlers for CPER records may not be NMI safe. For example,
+ * memory_failure_queue() takes spinlocks and calls schedule_work_on().
+ * In any NMI-like handler, memory from ghes_estatus_pool is used to save
+ * estatus, and added to the ghes_estatus_llist. irq_work_queue() causes
+ * ghes_proc_in_irq() to run in IRQ context where each estatus in
+ * ghes_estatus_llist is processed. Each NMI-like error source must grow
+ * the ghes_estatus_pool to ensure memory is available.
+ *
+ * Memory from the ghes_estatus_pool is also used with the ghes_estatus_cache
+ * to suppress frequent messages.
+ */
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
+static void ghes_print_queued_estatus(void)
+{
+ struct llist_node *llnode;
+ struct ghes_estatus_node *estatus_node;
+ struct acpi_hest_generic *generic;
+ struct acpi_hest_generic_status *estatus;
+
+ llnode = llist_del_all(&ghes_estatus_llist);
+ /*
+ * Because the time order of estatus in list is reversed,
+ * revert it back to proper order.
+ */
+ llnode = llist_reverse_order(llnode);
+ while (llnode) {
+ estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+ llnode);
+ estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+ generic = estatus_node->generic;
+ ghes_print_estatus(NULL, generic, estatus);
+ llnode = llnode->next;
+ }
+}
+
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ u32 len, node_len;
+ struct ghes_estatus_node *estatus_node;
+ struct acpi_hest_generic_status *estatus;
+
+ if (ghes_estatus_cached(ghes->estatus))
+ return;
+
+ len = cper_estatus_len(ghes->estatus);
+ node_len = GHES_ESTATUS_NODE_LEN(len);
+
+ estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+ if (!estatus_node)
+ return;
+
+ estatus_node->ghes = ghes;
+ estatus_node->generic = ghes->generic;
+ estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+ memcpy(estatus, ghes->estatus, len);
+ llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
+static unsigned long ghes_esource_prealloc_size(
+ const struct acpi_hest_generic *generic)
+{
+ unsigned long block_length, prealloc_records, prealloc_size;
+
+ block_length = min_t(unsigned long, generic->error_block_length,
+ GHES_ESTATUS_MAX_SIZE);
+ prealloc_records = max_t(unsigned long,
+ generic->records_to_preallocate, 1);
+ prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+ GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+ return prealloc_size;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+ ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+ struct llist_node *llnode, *next;
+ struct ghes_estatus_node *estatus_node;
+ struct acpi_hest_generic *generic;
+ struct acpi_hest_generic_status *estatus;
+ u32 len, node_len;
+
+ llnode = llist_del_all(&ghes_estatus_llist);
+ /*
+ * Because the time order of estatus in list is reversed,
+ * revert it back to proper order.
+ */
+ llnode = llist_reverse_order(llnode);
+ while (llnode) {
+ next = llnode->next;
+ estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+ llnode);
+ estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+ len = cper_estatus_len(estatus);
+ node_len = GHES_ESTATUS_NODE_LEN(len);
+ ghes_do_proc(estatus_node->ghes, estatus);
+ if (!ghes_estatus_cached(estatus)) {
+ generic = estatus_node->generic;
+ if (ghes_print_estatus(NULL, generic, estatus))
+ ghes_estatus_cache_add(generic, estatus);
+ }
+ gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+ node_len);
+ llnode = next;
+ }
+}
+
+static void ghes_nmi_init_cxt(void)
+{
+ init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
+}
+
+#else
+static inline void ghes_nmi_init_cxt(void) { }
+#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+
static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
{
int rc;
@@ -687,16 +824,6 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
return apei_write(val, &gv2->read_ack_register);
}
-static void __ghes_panic(struct ghes *ghes)
-{
- __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
-
- /* reboot to log the error! */
- if (!panic_timeout)
- panic_timeout = ghes_panic_timeout;
- panic("Fatal hardware error!");
-}
-
static int ghes_proc(struct ghes *ghes)
{
int rc;
@@ -828,17 +955,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
#endif /* CONFIG_ACPI_APEI_SEA */
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
-/*
- * printk is not safe in NMI context. So in NMI handler, we allocate
- * required memory from lock-less memory allocator
- * (ghes_estatus_pool), save estatus into it, put them into lock-less
- * list (ghes_estatus_llist), then delay printk into IRQ context via
- * irq_work (ghes_proc_irq_work). ghes_estatus_size_request record
- * required pool size by all NMI error source.
- */
-static struct llist_head ghes_estatus_llist;
-static struct irq_work ghes_proc_irq_work;
-
/*
* NMI may be triggered on any CPU, so ghes_in_nmi is used for
* having only one concurrent reader.
@@ -847,88 +963,6 @@ static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
static LIST_HEAD(ghes_nmi);
-static void ghes_proc_in_irq(struct irq_work *irq_work)
-{
- struct llist_node *llnode, *next;
- struct ghes_estatus_node *estatus_node;
- struct acpi_hest_generic *generic;
- struct acpi_hest_generic_status *estatus;
- u32 len, node_len;
-
- llnode = llist_del_all(&ghes_estatus_llist);
- /*
- * Because the time order of estatus in list is reversed,
- * revert it back to proper order.
- */
- llnode = llist_reverse_order(llnode);
- while (llnode) {
- next = llnode->next;
- estatus_node = llist_entry(llnode, struct ghes_estatus_node,
- llnode);
- estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
- len = cper_estatus_len(estatus);
- node_len = GHES_ESTATUS_NODE_LEN(len);
- ghes_do_proc(estatus_node->ghes, estatus);
- if (!ghes_estatus_cached(estatus)) {
- generic = estatus_node->generic;
- if (ghes_print_estatus(NULL, generic, estatus))
- ghes_estatus_cache_add(generic, estatus);
- }
- gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
- node_len);
- llnode = next;
- }
-}
-
-static void ghes_print_queued_estatus(void)
-{
- struct llist_node *llnode;
- struct ghes_estatus_node *estatus_node;
- struct acpi_hest_generic *generic;
- struct acpi_hest_generic_status *estatus;
-
- llnode = llist_del_all(&ghes_estatus_llist);
- /*
- * Because the time order of estatus in list is reversed,
- * revert it back to proper order.
- */
- llnode = llist_reverse_order(llnode);
- while (llnode) {
- estatus_node = llist_entry(llnode, struct ghes_estatus_node,
- llnode);
- estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
- generic = estatus_node->generic;
- ghes_print_estatus(NULL, generic, estatus);
- llnode = llnode->next;
- }
-}
-
-/* Save estatus for further processing in IRQ context */
-static void __process_error(struct ghes *ghes)
-{
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
- u32 len, node_len;
- struct ghes_estatus_node *estatus_node;
- struct acpi_hest_generic_status *estatus;
-
- if (ghes_estatus_cached(ghes->estatus))
- return;
-
- len = cper_estatus_len(ghes->estatus);
- node_len = GHES_ESTATUS_NODE_LEN(len);
-
- estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
- if (!estatus_node)
- return;
-
- estatus_node->ghes = ghes;
- estatus_node->generic = ghes->generic;
- estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
- memcpy(estatus, ghes->estatus, len);
- llist_add(&estatus_node->llnode, &ghes_estatus_llist);
-#endif
-}
-
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
struct ghes *ghes;
@@ -967,26 +1001,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
return ret;
}
-static unsigned long ghes_esource_prealloc_size(
- const struct acpi_hest_generic *generic)
-{
- unsigned long block_length, prealloc_records, prealloc_size;
-
- block_length = min_t(unsigned long, generic->error_block_length,
- GHES_ESTATUS_MAX_SIZE);
- prealloc_records = max_t(unsigned long,
- generic->records_to_preallocate, 1);
- prealloc_size = min_t(unsigned long, block_length * prealloc_records,
- GHES_ESOURCE_PREALLOC_MAX_SIZE);
-
- return prealloc_size;
-}
-
-static void ghes_estatus_pool_shrink(unsigned long len)
-{
- ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
-}
-
static void ghes_nmi_add(struct ghes *ghes)
{
unsigned long len;
@@ -1018,14 +1032,9 @@ static void ghes_nmi_remove(struct ghes *ghes)
ghes_estatus_pool_shrink(len);
}
-static void ghes_nmi_init_cxt(void)
-{
- init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
-}
#else /* CONFIG_HAVE_ACPI_APEI_NMI */
static inline void ghes_nmi_add(struct ghes *ghes) { }
static inline void ghes_nmi_remove(struct ghes *ghes) { }
-static inline void ghes_nmi_init_cxt(void) { }
#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
static int ghes_probe(struct platform_device *ghes_dev)
--
2.16.2
* [PATCH v4 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
2018-05-16 16:28 ` [PATCH v4 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing James Morse
` (10 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.
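The pool accounting being refactored can be illustrated with a user-space
model of the arithmetic in ghes_esource_prealloc_size(). The two 65536 limits
come from ghes.c; the 4K page size, the helper names and the simplified
grow side (which only tracks the request and allocates nothing) are
assumptions made for this sketch.

```c
#define EST_MAX_SIZE		65536UL	/* GHES_ESTATUS_MAX_SIZE */
#define PREALLOC_MAX_SIZE	65536UL	/* GHES_ESOURCE_PREALLOC_MAX_SIZE */
#define PAGE_SZ			4096UL	/* assumption: 4K pages */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Clamp the block length, force at least one record, clamp the product. */
static unsigned long prealloc_size(unsigned long block_length,
				   unsigned long records)
{
	block_length = min_ul(block_length, EST_MAX_SIZE);
	records = max_ul(records, 1);
	return min_ul(block_length * records, PREALLOC_MAX_SIZE);
}

/* Round up to a whole number of pages, as PAGE_ALIGN() does. */
static unsigned long page_align(unsigned long len)
{
	return (len + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}

/* Model of the size-request counter only; no memory is allocated here. */
static unsigned long pool_size_request;

static void pool_grow(unsigned long block_length, unsigned long records)
{
	pool_size_request += page_align(prealloc_size(block_length, records));
}

static void pool_shrink(unsigned long block_length, unsigned long records)
{
	pool_size_request -= page_align(prealloc_size(block_length, records));
}
```

Because grow and shrink both derive the length from the same per-source
values, a source that is added and later removed leaves the request unchanged.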
This patch adds rcu_read_lock()/rcu_read_unlock() around the
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.
The existing ghes_estatus_pool_shrink() is folded into the new
ghes_estatus_queue_shrink_pool() as only the queue uses it.
_in_nmi_notify_one() is separate from the rcu-list walker for a later
caller that doesn't need to walk a list.
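The split between the per-source helper and the list walker can be modelled
as below. This is a simplified user-space sketch: struct src, notify_one(),
queue_notified() and the irq_work_queued counter are hypothetical stand-ins,
and the panic path is left out. The walker reports success if any source had
a record, and requests deferred work once, only in that case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ghes. */
struct src {
	bool has_error;
};

/* Counts how often deferred processing was requested. */
static int irq_work_queued;

/* Per-source step: -ENOENT if nothing is pending, 0 if a record was saved. */
static int notify_one(struct src *s)
{
	if (!s->has_error)
		return -ENOENT;

	s->has_error = false;	/* record saved for later, status cleared */
	return 0;
}

/* Walker: succeed if any source had work, then kick the deferred work. */
static int queue_notified(struct src *srcs, size_t n)
{
	int ret = -ENOENT;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!notify_one(&srcs[i]))
			ret = 0;
	}

	if (!ret)
		irq_work_queued++;

	return ret;
}
```

A caller with a single source, such as the later one mentioned above, could
call notify_one() directly and skip the walk.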
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
Changes since v3:
* Removed duplicate or redundant paragraphs in commit message.
* Fixed the style of a zero check
Changes since v1:
* Tidied up _in_nmi_notify_one().
drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
1 file changed, 65 insertions(+), 35 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e2af91c92135..5a0b8a1bddb1 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -747,6 +747,51 @@ static void __process_error(struct ghes *ghes)
#endif
}
+static int _in_nmi_notify_one(struct ghes *ghes)
+{
+ int sev;
+
+ if (ghes_read_estatus(ghes, 1)) {
+ ghes_clear_estatus(ghes);
+ return -ENOENT;
+ }
+
+ sev = ghes_severity(ghes->estatus->error_severity);
+ if (sev >= GHES_SEV_PANIC) {
+#ifdef CONFIG_X86
+ oops_begin();
+#endif
+ ghes_print_queued_estatus();
+ __ghes_panic(ghes);
+ }
+
+ if (!(ghes->flags & GHES_TO_CLEAR))
+ return 0;
+
+ __process_error(ghes);
+ ghes_clear_estatus(ghes);
+
+ return 0;
+}
+
+static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+{
+ int ret = -ENOENT;
+ struct ghes *ghes;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ghes, rcu_list, list) {
+ if (!_in_nmi_notify_one(ghes))
+ ret = 0;
+ }
+ rcu_read_unlock();
+
+ if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && !ret)
+ irq_work_queue(&ghes_proc_irq_work);
+
+ return ret;
+}
+
static unsigned long ghes_esource_prealloc_size(
const struct acpi_hest_generic *generic)
{
@@ -762,11 +807,24 @@ static unsigned long ghes_esource_prealloc_size(
return prealloc_size;
}
-static void ghes_estatus_pool_shrink(unsigned long len)
+/* After removing a queue user, we can shrink the pool */
+static void ghes_estatus_queue_shrink_pool(struct ghes *ghes)
{
+ unsigned long len;
+
+ len = ghes_esource_prealloc_size(ghes->generic);
ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
}
+/* Before adding a queue user, grow the pool */
+static void ghes_estatus_queue_grow_pool(struct ghes *ghes)
+{
+ unsigned long len;
+
+ len = ghes_esource_prealloc_size(ghes->generic);
+ ghes_estatus_pool_expand(len);
+}
+
static void ghes_proc_in_irq(struct irq_work *irq_work)
{
struct llist_node *llnode, *next;
@@ -965,48 +1023,22 @@ static LIST_HEAD(ghes_nmi);
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
- struct ghes *ghes;
- int sev, ret = NMI_DONE;
+ int ret = NMI_DONE;
if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
return ret;
- list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
- if (ghes_read_estatus(ghes, 1)) {
- ghes_clear_estatus(ghes);
- continue;
- } else {
- ret = NMI_HANDLED;
- }
-
- sev = ghes_severity(ghes->estatus->error_severity);
- if (sev >= GHES_SEV_PANIC) {
- oops_begin();
- ghes_print_queued_estatus();
- __ghes_panic(ghes);
- }
+ if (!ghes_estatus_queue_notified(&ghes_nmi))
+ ret = NMI_HANDLED;
- if (!(ghes->flags & GHES_TO_CLEAR))
- continue;
-
- __process_error(ghes);
- ghes_clear_estatus(ghes);
- }
-
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
- if (ret == NMI_HANDLED)
- irq_work_queue(&ghes_proc_irq_work);
-#endif
atomic_dec(&ghes_in_nmi);
return ret;
}
static void ghes_nmi_add(struct ghes *ghes)
{
- unsigned long len;
+ ghes_estatus_queue_grow_pool(ghes);
- len = ghes_esource_prealloc_size(ghes->generic);
- ghes_estatus_pool_expand(len);
mutex_lock(&ghes_list_mutex);
if (list_empty(&ghes_nmi))
register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
@@ -1016,8 +1048,6 @@ static void ghes_nmi_add(struct ghes *ghes)
static void ghes_nmi_remove(struct ghes *ghes)
{
- unsigned long len;
-
mutex_lock(&ghes_list_mutex);
list_del_rcu(&ghes->list);
if (list_empty(&ghes_nmi))
@@ -1028,8 +1058,8 @@ static void ghes_nmi_remove(struct ghes *ghes)
* freed after NMI handler finishes.
*/
synchronize_rcu();
- len = ghes_esource_prealloc_size(ghes->generic);
- ghes_estatus_pool_shrink(len);
+
+ ghes_estatus_queue_shrink_pool(ghes);
}
#else /* CONFIG_HAVE_ACPI_APEI_NMI */
--
2.16.2
* [PATCH v4 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
2018-05-16 16:28 ` [PATCH v4 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef James Morse
2018-05-16 16:28 ` [PATCH v4 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
` (9 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.
Don't wait for another CPU to release the die-lock before we can
panic(): our only goal is to print this fatal error and panic().
This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
generic solution for safe printk in NMI"), it has been safe to call
printk() from this context. Messages are batched in a per-cpu buffer
and printed via irq-work, or a call back from panic().
Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/acpi/apei/ghes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 5a0b8a1bddb1..9b5f9642ee32 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -33,7 +33,6 @@
#include <linux/interrupt.h>
#include <linux/timer.h>
#include <linux/cper.h>
-#include <linux/kdebug.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
#include <linux/ratelimit.h>
@@ -758,9 +757,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
sev = ghes_severity(ghes->estatus->error_severity);
if (sev >= GHES_SEV_PANIC) {
-#ifdef CONFIG_X86
- oops_begin();
-#endif
ghes_print_queued_estatus();
__ghes_panic(ghes);
}
--
2.16.2
* [PATCH v4 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (2 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
` (8 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.
Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
drivers/acpi/apei/ghes.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9b5f9642ee32..40f8f9f34b05 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -58,6 +58,10 @@
#define GHES_PFX "GHES: "
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#define WANT_NMI_ESTATUS_QUEUE 1
+#endif
+
#define GHES_ESTATUS_MAX_SIZE 65536
#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
@@ -681,7 +685,7 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
}
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+#ifdef WANT_NMI_ESTATUS_QUEUE
/*
* Handlers for CPER records may not be NMI safe. For example,
* memory_failure_queue() takes spinlocks and calls schedule_work_on().
@@ -861,7 +865,7 @@ static void ghes_nmi_init_cxt(void)
#else
static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+#endif /* WANT_NMI_ESTATUS_QUEUE */
static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
{
@@ -977,20 +981,13 @@ static LIST_HEAD(ghes_sea);
*/
int ghes_notify_sea(void)
{
- struct ghes *ghes;
- int ret = -ENOENT;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ghes, &ghes_sea, list) {
- if (!ghes_proc(ghes))
- ret = 0;
- }
- rcu_read_unlock();
- return ret;
+ return ghes_estatus_queue_notified(&ghes_sea);
}
static void ghes_sea_add(struct ghes *ghes)
{
+ ghes_estatus_queue_grow_pool(ghes);
+
mutex_lock(&ghes_list_mutex);
list_add_rcu(&ghes->list, &ghes_sea);
mutex_unlock(&ghes_list_mutex);
@@ -1002,6 +999,8 @@ static void ghes_sea_remove(struct ghes *ghes)
list_del_rcu(&ghes->list);
mutex_unlock(&ghes_list_mutex);
synchronize_rcu();
+
+ ghes_estatus_queue_shrink_pool(ghes);
}
#else /* CONFIG_ACPI_APEI_SEA */
static inline void ghes_sea_add(struct ghes *ghes) { }
--
2.16.2
* [PATCH v4 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (3 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
` (7 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
To split up APEI's in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, so pull the RAS plumbing
out into a header file.
Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch code's
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.
There is going to be more of this in the future if/when we support
the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.
Create a header file for all this.
This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
arch/arm/include/asm/kvm_ras.h | 14 ++++++++++++++
arch/arm/include/asm/system_misc.h | 5 -----
arch/arm64/include/asm/kvm_ras.h | 11 +++++++++++
arch/arm64/include/asm/system_misc.h | 2 --
arch/arm64/mm/fault.c | 2 +-
virt/kvm/arm/mmu.c | 4 ++--
6 files changed, 28 insertions(+), 10 deletions(-)
create mode 100644 arch/arm/include/asm/kvm_ras.h
create mode 100644 arch/arm64/include/asm/kvm_ras.h
diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..aaff56bf338f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_ras.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM_KVM_RAS_H__
+#define __ARM_KVM_RAS_H__
+
+#include <linux/types.h>
+
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+ return -1;
+}
+
+#endif /* __ARM_KVM_RAS_H__ */
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index 78f6db114faf..51e5ab50b35f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -23,11 +23,6 @@ extern void (*arm_pm_idle)(void);
extern unsigned int user_debug;
-static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
- return -1;
-}
-
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..5f72b07b7912
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM64_KVM_RAS_H__
+#define __ARM64_KVM_RAS_H__
+
+#include <linux/types.h>
+
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
+#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 28893a0b141d..48ded3628a89 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -45,8 +45,6 @@ extern void __show_regs(struct pt_regs *);
extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
-int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
#endif /* __ASSEMBLY__ */
#endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4165485e8b6e..d61a886afec7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -662,7 +662,7 @@ static const struct fault_info fault_info[] = {
{ do_bad, SIGKILL, SI_KERNEL, "unknown 63" },
};
-int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
{
int ret = -ENOENT;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..673141de1e67 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -27,10 +27,10 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_mmio.h>
+#include <asm/kvm_ras.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/virt.h>
-#include <asm/system_misc.h>
#include "trace.h"
@@ -1647,7 +1647,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
* For RAS the host kernel may handle this abort.
* There is no need to pass the error into the guest.
*/
- if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+ if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
return 1;
if (unlikely(!is_iabt)) {
--
2.16.2
* [PATCH v4 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (4 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users James Morse
` (6 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
To split up APEI's in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.
When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. We can then go on to see if (a future)
kernel-first mechanism wants to claim the notification, before
falling through to the existing default behaviour.
The NOTIFY_SEA code was merged before we had multiple, possibly
interacting, NMI-like notifications and the need to consider
kernel-first in the future. Make the 'claiming' behaviour explicit.
As we're restructuring the APEI code to allow multiple NMI-like
notifications, any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
to use in_nmi() to choose between the raw/regular spinlock routines.
We mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).
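
The claim ordering described above can be sketched in plain userspace C.
This is only an illustration of the control flow, not the kernel code:
claim_fn, claim_exception() and the two claimant functions are
hypothetical names, and the kernel-first claimant is a placeholder since
no such mechanism exists yet.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical claimant: returns 0 if it handled the exception, else -ENOENT. */
typedef int (*claim_fn)(void *ctx);

/*
 * Ask each claimant in priority order and stop at the first one that
 * returns 0. Returns the index of the claimant, or -ENOENT if nobody
 * claimed the exception and the caller should fall through to its
 * existing default behaviour.
 */
static int claim_exception(const claim_fn *claimants, size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		if (claimants[i](ctx) == 0)
			return (int)i;
	}
	return -ENOENT;
}

/* Stand-in for firmware-first: claims only if an error record is pending. */
static int firmware_first_claim(void *ctx)
{
	const int *pending = ctx;

	return (pending && *pending) ? 0 : -ENOENT;
}

/* Placeholder for a future kernel-first mechanism: never claims. */
static int kernel_first_claim(void *ctx)
{
	(void)ctx;
	return -ENOENT;
}
```

A caller would build an array such as { firmware_first_claim,
kernel_first_claim } and treat -ENOENT from claim_exception() as "nobody
wanted it", which matches the return convention apei_claim_sea() uses.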
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
Why does apei_claim_sea() take a pt_regs? This gets used later to take
APEI by the hand through NMI->IRQ context, depending on what we
interrupted.
Changes since v3:
* Removed spurious whitespace change
* Updated comment in acpi.c to cover SError masking
Changes since v2:
* Added dummy definition for !ACPI and culled IS_ENABLED() checks.
arch/arm64/include/asm/acpi.h | 4 ++++
arch/arm64/include/asm/daifflags.h | 1 +
arch/arm64/include/asm/kvm_ras.h | 15 ++++++++++++++-
arch/arm64/kernel/acpi.c | 30 ++++++++++++++++++++++++++++++
arch/arm64/mm/fault.c | 29 +++++------------------------
5 files changed, 54 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..f5fc9330ea2f 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -16,6 +16,7 @@
#include <linux/psci.h>
#include <asm/cputype.h>
+#include <asm/ptrace.h>
#include <asm/smp_plat.h>
#include <asm/tlbflush.h>
@@ -126,6 +127,9 @@ static inline const char *acpi_get_enable_method(int cpu)
*/
#define acpi_disable_cmcff 1
pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+int apei_claim_sea(struct pt_regs *regs);
+#else
+static inline int apei_claim_sea(struct pt_regs *regs) { return -ENOENT; }
#endif /* CONFIG_ACPI_APEI */
#ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 22e4c83de5a5..cbd753855bf3 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -20,6 +20,7 @@
#define DAIF_PROCCTX 0
#define DAIF_PROCCTX_NOIRQ PSR_I_BIT
+#define DAIF_ERRCTX (PSR_I_BIT | PSR_A_BIT)
/* mask/save/unmask/restore all exceptions, including interrupts. */
static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 5f72b07b7912..52edc9b3b937 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -4,8 +4,21 @@
#ifndef __ARM64_KVM_RAS_H__
#define __ARM64_KVM_RAS_H__
+#include <linux/acpi.h>
+#include <linux/errno.h>
#include <linux/types.h>
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+#include <asm/acpi.h>
+
+/*
+ * Was this synchronous external abort a RAS notification?
+ * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
+ *
+ * Call with irqs unmasked.
+ */
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+ return apei_claim_sea(NULL);
+}
#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..df2c6bff8c58 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -33,6 +33,8 @@
#ifdef CONFIG_ACPI_APEI
# include <linux/efi.h>
+# include <acpi/ghes.h>
+# include <asm/daifflags.h>
# include <asm/pgtable.h>
#endif
@@ -261,4 +263,32 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
return __pgprot(PROT_NORMAL_NC);
return __pgprot(PROT_DEVICE_nGnRnE);
}
+
+
+/*
+ * Claim Synchronous External Aborts as a firmware first notification.
+ *
+ * Used by KVM and the arch do_sea handler.
+ * @regs may be NULL when called from process context.
+ */
+int apei_claim_sea(struct pt_regs *regs)
+{
+ int err = -ENOENT;
+ unsigned long current_flags = arch_local_save_flags();
+
+ if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+ return err;
+
+ /*
+ * SEA can interrupt SError, mask it and describe this as an NMI so
+ * that APEI defers the handling.
+ */
+ local_daif_restore(DAIF_ERRCTX);
+ nmi_enter();
+ err = ghes_notify_sea();
+ nmi_exit();
+ local_daif_restore(current_flags);
+
+ return err;
+}
#endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d61a886afec7..d7e89da0e5df 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -18,6 +18,7 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
+#include <linux/acpi.h>
#include <linux/extable.h>
#include <linux/signal.h>
#include <linux/mm.h>
@@ -33,6 +34,7 @@
#include <linux/preempt.h>
#include <linux/hugetlb.h>
+#include <asm/acpi.h>
#include <asm/bug.h>
#include <asm/cmpxchg.h>
#include <asm/cpufeature.h>
@@ -45,8 +47,6 @@
#include <asm/tlbflush.h>
#include <asm/traps.h>
-#include <acpi/ghes.h>
-
struct fault_info {
int (*fn)(unsigned long addr, unsigned int esr,
struct pt_regs *regs);
@@ -569,19 +569,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
inf = esr_to_fault_info(esr);
/*
- * Synchronous aborts may interrupt code which had interrupts masked.
- * Before calling out into the wider kernel tell the interested
- * subsystems.
+ * Return value ignored as we rely on signal merging.
+ * Future patches will make this more robust.
*/
- if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
- if (interrupts_enabled(regs))
- nmi_enter();
-
- ghes_notify_sea();
-
- if (interrupts_enabled(regs))
- nmi_exit();
- }
+ apei_claim_sea(regs);
info.si_signo = inf->sig;
info.si_errno = 0;
@@ -662,16 +653,6 @@ static const struct fault_info fault_info[] = {
{ do_bad, SIGKILL, SI_KERNEL, "unknown 63" },
};
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
- int ret = -ENOENT;
-
- if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
- ret = ghes_notify_sea();
-
- return ret;
-}
-
asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
struct pt_regs *regs)
{
--
2.16.2
* [PATCH v4 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (5 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications James Morse
` (5 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
Arm64 has multiple NMI-like notifications, but ghes.c only has one
in_nmi() path, risking deadlock if one NMI-like notification can
interrupt another.
To support this we need a fixmap entry and lock for each notification
type. But ghes_probe() attempts to process each struct ghes at probe
time, to ensure any error that was notified before ghes_probe() was
called has been handled. This releases the CPER buffers (and maybe
acknowledges this to firmware) so that future errors can be delivered.
NMI-like notifications need two fixmap entries and locks, one for the
ghes_probe() time call, and another for the actual NMI that could
interrupt ghes_probe().
Split this single path up by adding an nmi-fixmap structure that holds
the fixmap-idx and lock to struct ghes. Any notification that can be
called as an NMI can use these to separate its resources from any other
notification it may interrupt.
The majority of notifications occur in IRQ context, so unless it's
called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
fixmap entry and the ghes_fixmap_lock_irq lock. This allows
NMI-notifications to be processed by ghes_probe(), and then taken
as an NMI.
Add a helper to create these nmi_fixmap structs, and code to read them
in ghes_copy_tofrom_phys(). This lets us merge the two 'ghes_ioremap'
helpers, and remove the unmap helpers. Remove the last references
to 'ioremap' as this is all done via fixmap.
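
The slot selection described above, reduced to a userspace sketch. The
names here (fix_slot, nmi_slot, err_source, pick_slot) are hypothetical
stand-ins for the fixmap enum, struct ghes_nmi_fixmap and struct ghes.
Locking is left out, and returning -1 for a missing NMI slot is an
assumption made for the sketch: the patch itself expects nmi_fixmap to
be populated and does not check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot numbers standing in for the FIX_APEI_GHES_* entries. */
enum fix_slot { SLOT_IRQ, SLOT_SEA, SLOT_SDEI };

/* Models struct ghes_nmi_fixmap; the real one pairs idx with a raw spinlock. */
struct nmi_slot {
	int idx;
};

/* Models the one field of struct ghes that matters here. */
struct err_source {
	const struct nmi_slot *nmi; /* NULL if never notified in NMI context */
};

/*
 * Pick the fixmap slot a copy would go through. Outside NMI context every
 * source shares the IRQ slot, so an NMI-like source can also be processed
 * at probe time. In NMI context the source must bring its own slot.
 */
static int pick_slot(const struct err_source *src, bool in_nmi)
{
	if (!in_nmi)
		return SLOT_IRQ;
	if (!src->nmi)
		return -1; /* sketch only: flags a source missing its NMI slot */
	return src->nmi->idx;
}
```

Because each NMI-like notification type gets its own slot, a
notification that interrupts another never contends for the mapping the
interrupted one is still using.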
Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
Changes since v3:
* Moved idx/lock into a struct, to avoid tasteless lock pointers.
* Tried to improve the commit message.
Changes since v1:
* Fixed for ghes_proc() always calling every notification in process context.
Now only NMI-like notifications need an additional fixmap-slot/lock.
---
drivers/acpi/apei/ghes.c | 68 ++++++++++++++++--------------------------------
include/acpi/ghes.h | 17 ++++++++++++
2 files changed, 39 insertions(+), 46 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 40f8f9f34b05..13bb3bb94fbd 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,12 +117,9 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
- *
- * These 2 spinlocks are used to prevent the fixmap entries from being used
- * simultaneously.
*/
-static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
-static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
+static DEFINE_GHES_NMI_FIXMAP(nmi_fixmap, FIX_APEI_GHES_NMI);
+static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
static struct gen_pool *ghes_estatus_pool;
static unsigned long ghes_estatus_pool_size_request;
@@ -132,38 +129,16 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_panic_timeout __read_mostly = 30;
-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
-{
- phys_addr_t paddr;
- pgprot_t prot;
-
- paddr = pfn << PAGE_SHIFT;
- prot = arch_apei_get_mem_attribute(paddr);
- __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
-
- return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
+static void __iomem *ghes_fixmap_pfn(int fixmap_idx, u64 pfn)
{
phys_addr_t paddr;
pgprot_t prot;
paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
- __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
+ __set_fixmap(fixmap_idx, paddr, prot);
- return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
-}
-
-static void ghes_iounmap_nmi(void)
-{
- clear_fixmap(FIX_APEI_GHES_NMI);
-}
-
-static void ghes_iounmap_irq(void)
-{
- clear_fixmap(FIX_APEI_GHES_IRQ);
+ return (void __iomem *) __fix_to_virt(fixmap_idx);
}
static int ghes_estatus_pool_init(void)
@@ -291,10 +266,11 @@ static inline int ghes_severity(int severity)
}
}
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
- int from_phys)
+static void ghes_copy_tofrom_phys(struct ghes *ghes, void *buffer, u64 paddr,
+ u32 len, int from_phys)
{
void __iomem *vaddr;
+ int fixmap_idx = FIX_APEI_GHES_IRQ;
unsigned long flags = 0;
int in_nmi = in_nmi();
u64 offset;
@@ -303,12 +279,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
while (len > 0) {
offset = paddr - (paddr & PAGE_MASK);
if (in_nmi) {
- raw_spin_lock(&ghes_ioremap_lock_nmi);
- vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
+ raw_spin_lock(&ghes->nmi_fixmap->lock);
+ fixmap_idx = ghes->nmi_fixmap->idx;
} else {
- spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
- vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+ spin_lock_irqsave(&ghes_fixmap_lock_irq, flags);
}
+ vaddr = ghes_fixmap_pfn(fixmap_idx, paddr >> PAGE_SHIFT);
trunk = PAGE_SIZE - offset;
trunk = min(trunk, len);
if (from_phys)
@@ -318,13 +294,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
len -= trunk;
paddr += trunk;
buffer += trunk;
- if (in_nmi) {
- ghes_iounmap_nmi();
- raw_spin_unlock(&ghes_ioremap_lock_nmi);
- } else {
- ghes_iounmap_irq();
- spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
- }
+ clear_fixmap(fixmap_idx);
+ if (in_nmi)
+ raw_spin_unlock(&ghes->nmi_fixmap->lock);
+ else
+ spin_unlock_irqrestore(&ghes_fixmap_lock_irq, flags);
}
}
@@ -346,7 +320,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
if (!buf_paddr)
return -ENOENT;
- ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
+ ghes_copy_tofrom_phys(ghes, ghes->estatus, buf_paddr,
sizeof(*ghes->estatus), 1);
if (!ghes->estatus->block_status)
return -ENOENT;
@@ -362,7 +336,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
goto err_read_block;
if (cper_estatus_check_header(ghes->estatus))
goto err_read_block;
- ghes_copy_tofrom_phys(ghes->estatus + 1,
+ ghes_copy_tofrom_phys(ghes, ghes->estatus + 1,
buf_paddr + sizeof(*ghes->estatus),
len - sizeof(*ghes->estatus), 1);
if (cper_estatus_check(ghes->estatus))
@@ -381,7 +355,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
ghes->estatus->block_status = 0;
if (!(ghes->flags & GHES_TO_CLEAR))
return;
- ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
+ ghes_copy_tofrom_phys(ghes, ghes->estatus, ghes->buffer_paddr,
sizeof(ghes->estatus->block_status), 0);
ghes->flags &= ~GHES_TO_CLEAR;
}
@@ -986,6 +960,7 @@ int ghes_notify_sea(void)
static void ghes_sea_add(struct ghes *ghes)
{
+ ghes->nmi_fixmap = &nmi_fixmap;
ghes_estatus_queue_grow_pool(ghes);
mutex_lock(&ghes_list_mutex);
@@ -1032,6 +1007,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
static void ghes_nmi_add(struct ghes *ghes)
{
+ ghes->nmi_fixmap = &nmi_fixmap;
ghes_estatus_queue_grow_pool(ghes);
mutex_lock(&ghes_list_mutex);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c866ee0..d38dce5ef83e 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -5,6 +5,20 @@
#include <acpi/apei.h>
#include <acpi/hed.h>
+/*
+ * Systems with multiple NMI-like notifications may need separate locks/fixmap
+ * entries.
+ */
+struct ghes_nmi_fixmap {
+ raw_spinlock_t lock;
+ int idx;
+};
+
+#define DEFINE_GHES_NMI_FIXMAP(name, slot) struct ghes_nmi_fixmap name = {\
+ .lock = __RAW_SPIN_LOCK_INITIALIZER(lock), \
+ .idx = slot, \
+}
+
/*
* One struct ghes is created for each generic hardware error source.
* It provides the context for APEI hardware error timer/IRQ/SCI/NMI
@@ -29,6 +43,9 @@ struct ghes {
struct timer_list timer;
unsigned int irq;
};
+
+ /* If this ghes can be called in NMI context, this must be populated. */
+ struct ghes_nmi_fixmap *nmi_fixmap;
};
struct ghes_estatus_node {
--
2.16.2
* [PATCH v4 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (6 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 09/12] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
` (4 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.
These should be named after their notification method. x86's
NOTIFY_NMI already is, move it to live with the ghes_nmi list.
Change the SEA fixmap entry to be called FIX_APEI_GHES_SEA.
Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
Changes since v3:
* idx/lock are now in a separate struct.
* Add to the comment above ghes_fixmap_lock_irq so that it makes more
sense in isolation.
arch/arm64/include/asm/fixmap.h | 4 +++-
drivers/acpi/apei/ghes.c | 12 ++++++++----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index ec1e6d6fa14c..c3974517c2cb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -55,7 +55,9 @@ enum fixed_addresses {
#ifdef CONFIG_ACPI_APEI_GHES
/* Used for GHES mapping from assorted contexts */
FIX_APEI_GHES_IRQ,
- FIX_APEI_GHES_NMI,
+#ifdef CONFIG_ACPI_APEI_SEA
+ FIX_APEI_GHES_SEA,
+#endif
#endif /* CONFIG_ACPI_APEI_GHES */
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 13bb3bb94fbd..014966bdd0a7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,8 +117,10 @@ static DEFINE_MUTEX(ghes_list_mutex);
* from BIOS to Linux can be determined only in NMI, IRQ or timer
* handler, but general ioremap can not be used in atomic context, so
* the fixmap is used instead.
+ * This lock protects access to the FIX_APEI_GHES_IRQ entry.
+ * NMI-like notifications use DEFINE_GHES_NMI_FIXMAP() to pair a fixmap
+ * entry and a lock.
*/
-static DEFINE_GHES_NMI_FIXMAP(nmi_fixmap, FIX_APEI_GHES_NMI);
static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
static struct gen_pool *ghes_estatus_pool;
@@ -948,6 +950,7 @@ static struct notifier_block ghes_notifier_hed = {
#ifdef CONFIG_ACPI_APEI_SEA
static LIST_HEAD(ghes_sea);
+static DEFINE_GHES_NMI_FIXMAP(sea_fixmap, FIX_APEI_GHES_SEA);
/*
* Return 0 only if one of the SEA error sources successfully reported an error
@@ -960,7 +963,7 @@ int ghes_notify_sea(void)
static void ghes_sea_add(struct ghes *ghes)
{
- ghes->nmi_fixmap = &nmi_fixmap;
+ ghes->nmi_fixmap = &sea_fixmap;
ghes_estatus_queue_grow_pool(ghes);
mutex_lock(&ghes_list_mutex);
@@ -984,12 +987,13 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
/*
- * NMI may be triggered on any CPU, so ghes_in_nmi is used for
- * having only one concurrent reader.
+ * NOTIFY_NMI may be triggered on any CPU, so ghes_in_nmi is
+ * used for having only one concurrent reader.
*/
static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
static LIST_HEAD(ghes_nmi);
+static DEFINE_GHES_NMI_FIXMAP(nmi_fixmap, FIX_APEI_GHES_NMI);
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
--
2.16.2
* [PATCH v4 09/12] firmware: arm_sdei: Add ACPI GHES registration helper
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (7 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse
` (3 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.
GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.
Add a helper to ask firmware for this information so it can initialise
the struct ghes and register then enable the event.
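
The order of operations in the helper (query priority, pick the
resources, register, then enable) can be modelled outside the kernel.
This is a sketch under assumptions: fw_ops, register_event() and the
fake_* functions are hypothetical and only imitate the firmware calls,
and the rule that odd event numbers are critical is invented so the
example has something to return.

```c
#include <errno.h>

enum prio { PRIO_NORMAL, PRIO_CRITICAL }; /* hypothetical priorities */

/* Hypothetical table of firmware calls, so the flow can run in userspace. */
struct fw_ops {
	int (*get_priority)(unsigned int event, enum prio *out);
	int (*reg)(unsigned int event);
	int (*enable)(unsigned int event);
};

/*
 * Query the event's priority, choose slot 1 for critical and 0 for normal,
 * then register and enable the event. Returns 0 or a negative errno.
 */
static int register_event(const struct fw_ops *fw, unsigned int event,
			  int *slot_out)
{
	enum prio p;
	int err;

	if (event == 0)
		return -EINVAL; /* event 0 is reserved for SDEI_EVENT_SIGNAL */

	err = fw->get_priority(event, &p);
	if (err)
		return err;

	*slot_out = (p == PRIO_CRITICAL) ? 1 : 0;

	err = fw->reg(event);
	if (!err)
		err = fw->enable(event);

	return err;
}

/* Fake firmware for illustration: odd event numbers are critical. */
static int fake_get_priority(unsigned int event, enum prio *out)
{
	*out = (event & 1) ? PRIO_CRITICAL : PRIO_NORMAL;
	return 0;
}

static int fake_ok(unsigned int event)
{
	(void)event;
	return 0;
}
```

The slot is chosen before the event is registered so that the callback
never runs against a source whose resources have not been set up.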
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
Changes since v3:
* Removed acpi_disabled() checks that aren't necessary after v2's #ifdef
change.
Changes since v2:
* Added header file, thanks kbuild-robot!
* changed ifdef to the GHES version to match the fixmap definition
Changes since v1:
* ghes->fixmap_idx variable rename
arch/arm64/include/asm/fixmap.h | 4 +++
drivers/firmware/arm_sdei.c | 67 +++++++++++++++++++++++++++++++++++++++++
include/linux/arm_sdei.h | 5 +++
3 files changed, 76 insertions(+)
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c3974517c2cb..e2b423a5feaf 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -58,6 +58,10 @@ enum fixed_addresses {
#ifdef CONFIG_ACPI_APEI_SEA
FIX_APEI_GHES_SEA,
#endif
+#ifdef CONFIG_ARM_SDE_INTERFACE
+ FIX_APEI_GHES_SDEI_NORMAL,
+ FIX_APEI_GHES_SDEI_CRITICAL,
+#endif
#endif /* CONFIG_ACPI_APEI_GHES */
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1ea71640fdc2..7c304ebb6282 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -2,6 +2,7 @@
// Copyright (C) 2017 Arm Ltd.
#define pr_fmt(fmt) "sdei: " fmt
+#include <acpi/ghes.h>
#include <linux/acpi.h>
#include <linux/arm_sdei.h>
#include <linux/arm-smccc.h>
@@ -32,6 +33,8 @@
#include <linux/spinlock.h>
#include <linux/uaccess.h>
+#include <asm/fixmap.h>
+
/*
* The call to use to reach the firmware.
*/
@@ -887,6 +890,70 @@ static void sdei_smccc_hvc(unsigned long function_id,
arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4, 0, 0, res);
}
+#ifdef CONFIG_ACPI_APEI_GHES
+static DEFINE_GHES_NMI_FIXMAP(sde_normal, FIX_APEI_GHES_SDEI_NORMAL);
+static DEFINE_GHES_NMI_FIXMAP(sde_critical, FIX_APEI_GHES_SDEI_CRITICAL);
+
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb)
+{
+ int err;
+ u32 event_num;
+ u64 result;
+
+ event_num = ghes->generic->notify.vector;
+ if (event_num == 0) {
+ /*
+ * Event 0 is reserved by the specification for
+ * SDEI_EVENT_SIGNAL.
+ */
+ return -EINVAL;
+ }
+
+ err = sdei_api_event_get_info(event_num, SDEI_EVENT_INFO_EV_PRIORITY,
+ &result);
+ if (err)
+ return err;
+
+ if (result == SDEI_EVENT_PRIORITY_CRITICAL)
+ ghes->nmi_fixmap = &sde_critical;
+ else
+ ghes->nmi_fixmap = &sde_normal;
+
+ err = sdei_event_register(event_num, cb, ghes);
+ if (!err)
+ err = sdei_event_enable(event_num);
+
+ return err;
+}
+
+int sdei_unregister_ghes(struct ghes *ghes)
+{
+ int i;
+ int err;
+ u32 event_num = ghes->generic->notify.vector;
+
+ might_sleep();
+
+ /*
+ * The event may be running on another CPU. Disable it
+ * to stop new events, then try to unregister a few times.
+ */
+ err = sdei_event_disable(event_num);
+ if (err)
+ return err;
+
+ for (i = 0; i < 3; i++) {
+ err = sdei_event_unregister(event_num);
+ if (err != -EINPROGRESS)
+ break;
+
+ schedule();
+ }
+
+ return err;
+}
+#endif /* CONFIG_ACPI_APEI_GHES */
+
static int sdei_get_conduit(struct platform_device *pdev)
{
const char *method;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 942afbd544b7..5fdf799be026 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -11,6 +11,7 @@ enum sdei_conduit_types {
CONDUIT_HVC,
};
+#include <acpi/ghes.h>
#include <asm/sdei.h>
/* Arch code should override this to set the entry point from firmware... */
@@ -39,6 +40,10 @@ int sdei_event_unregister(u32 event_num);
int sdei_event_enable(u32 event_num);
int sdei_event_disable(u32 event_num);
+/* GHES register/unregister helpers */
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb);
+int sdei_unregister_ghes(struct ghes *ghes);
+
#ifdef CONFIG_ARM_SDE_INTERFACE
/* For use by arch code when CPU hotplug notifiers are not appropriate. */
int sdei_mask_local_cpu(void);
--
2.16.2
* [PATCH v4 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (8 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 09/12] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:28 ` [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
` (2 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
drivers/acpi/apei/ghes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/arm_sdei.h | 3 +++
2 files changed, 67 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 014966bdd0a7..72f7bc8435f7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -25,6 +25,7 @@
* GNU General Public License for more details.
*/
+#include <linux/arm_sdei.h>
#include <linux/kernel.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
@@ -58,7 +59,7 @@
#define GHES_PFX "GHES: "
-#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ARM_SDE_INTERFACE)
#define WANT_NMI_ESTATUS_QUEUE 1
#endif
@@ -750,7 +751,7 @@ static int _in_nmi_notify_one(struct ghes *ghes)
return 0;
}
-static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+static int __maybe_unused ghes_estatus_queue_notified(struct list_head *rcu_list)
{
int ret = -ENOENT;
struct ghes *ghes;
@@ -1042,6 +1043,49 @@ static inline void ghes_nmi_add(struct ghes *ghes) { }
static inline void ghes_nmi_remove(struct ghes *ghes) { }
#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+static int ghes_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
+{
+ struct ghes *ghes = arg;
+
+ if (!_in_nmi_notify_one(ghes)) {
+ if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG))
+ irq_work_queue(&ghes_proc_irq_work);
+
+ return 0;
+ }
+
+ return -ENOENT;
+}
+
+static int apei_sdei_register_ghes(struct ghes *ghes)
+{
+ int err = -EINVAL;
+
+ if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+ ghes_estatus_queue_grow_pool(ghes);
+
+ err = sdei_register_ghes(ghes, ghes_sdei_callback);
+ if (err)
+ ghes_estatus_queue_shrink_pool(ghes);
+ }
+
+ return err;
+}
+
+static int apei_sdei_unregister_ghes(struct ghes *ghes)
+{
+ int err = -EINVAL;
+
+ if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+ err = sdei_unregister_ghes(ghes);
+
+ if (!err)
+ ghes_estatus_queue_shrink_pool(ghes);
+ }
+
+ return err;
+}
+
static int ghes_probe(struct platform_device *ghes_dev)
{
struct acpi_hest_generic *generic;
@@ -1076,6 +1120,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
goto err;
}
break;
+ case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+ if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+ pr_warn(GHES_PFX "Generic hardware error source: %d notified via SDE Interface is not supported!\n",
+ generic->header.source_id);
+ goto err;
+ }
+ break;
case ACPI_HEST_NOTIFY_LOCAL:
pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
generic->header.source_id);
@@ -1143,6 +1194,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
break;
+ case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+ rc = apei_sdei_register_ghes(ghes);
+ if (rc)
+ goto err_edac_unreg;
+ break;
default:
BUG();
}
@@ -1164,6 +1220,7 @@ err:
static int ghes_remove(struct platform_device *ghes_dev)
{
+ int rc;
struct ghes *ghes;
struct acpi_hest_generic *generic;
@@ -1196,6 +1253,11 @@ static int ghes_remove(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);
break;
+ case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+ rc = apei_sdei_unregister_ghes(ghes);
+ if (rc)
+ return rc;
+ break;
default:
BUG();
break;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 5fdf799be026..f49063ca206d 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -12,7 +12,10 @@ enum sdei_conduit_types {
};
#include <acpi/ghes.h>
+
+#ifdef CONFIG_ARM_SDE_INTERFACE
#include <asm/sdei.h>
+#endif
/* Arch code should override this to set the entry point from firmware... */
#ifndef sdei_arch_get_entry_point
--
2.16.2
* [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (9 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-20 7:12 ` Re: " gengdongjiu
2018-05-20 7:13 ` gengdongjiu
2018-05-16 16:28 ` [PATCH v4 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
2018-05-16 16:46 ` [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
12 siblings, 2 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.
Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()'s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.
Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recovery work starts immediately.
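A rough user-space model of the queueing change may help when reading the
diff. It is only a sketch: the model_* names, the four-entry FIFO and the
two boolean flags are assumptions made for illustration. They stand in for
the per-CPU kfifo, for queue_work_on() on system_highpri_wq, and for the
set_tsk_need_resched()/preempt_set_need_resched() pair. The kernel's locking
and real workqueue scheduling are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FIFO_SIZE 4	/* hypothetical depth, not the kernel's value */

struct model_entry {
	unsigned long pfn;
	int flags;
};

/* Hypothetical stand-in for the per-CPU struct memory_failure_cpu. */
struct model_cpu {
	struct model_entry fifo[MODEL_FIFO_SIZE];
	size_t head;		/* total entries ever written */
	size_t tail;		/* total entries ever consumed */
	bool work_queued;	/* models queue_work_on(cpu, system_highpri_wq, ...) */
	bool need_resched;	/* models the two need_resched calls */
};

/* Returns true if the entry was stored, false if the FIFO is full. */
static bool model_fifo_put(struct model_cpu *c, struct model_entry e)
{
	if (c->head - c->tail == MODEL_FIFO_SIZE)
		return false;
	c->fifo[c->head % MODEL_FIFO_SIZE] = e;
	c->head++;
	return true;
}

/* Models memory_failure_queue(): queue the entry, then ask for a reschedule. */
static bool model_memory_failure_queue(struct model_cpu *c,
				       unsigned long pfn, int flags)
{
	struct model_entry e = { .pfn = pfn, .flags = flags };

	if (!model_fifo_put(c, e))
		return false;	/* the kernel reports a buffer overflow here */

	c->work_queued = true;
	c->need_resched = true;
	return true;
}

/* Models the worker: drain everything queued, return how many were handled. */
static size_t model_run_work(struct model_cpu *c)
{
	size_t handled = 0;

	assert(c->head - c->tail <= MODEL_FIFO_SIZE);
	while (c->tail != c->head) {
		c->tail++;
		handled++;
	}
	c->work_queued = false;
	return handled;
}
```

In this model a successful put always leaves both flags set, which mirrors
the patch requesting a reschedule whenever work was queued. A full FIFO
changes nothing, matching the pr_err() path.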
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
mm/memory-failure.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..f0e69d7ac406 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
#include <linux/hugetlb.h>
#include <linux/memory_hotplug.h>
#include <linux/mm_inline.h>
+#include <linux/preempt.h>
#include <linux/kfifo.h>
#include <linux/ratelimit.h>
#include "internal.h"
@@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
*/
void memory_failure_queue(unsigned long pfn, int flags)
{
+ int cpu = smp_processor_id();
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
struct memory_failure_entry entry = {
@@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
mf_cpu = &get_cpu_var(memory_failure_cpu);
spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
- schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ if (kfifo_put(&mf_cpu->fifo, entry)) {
+ queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+ set_tsk_need_resched(current);
+ preempt_set_need_resched();
+ } else {
pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
pfn);
+ }
spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
put_cpu_var(memory_failure_cpu);
}
--
2.16.2
* [PATCH v4 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (10 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
@ 2018-05-16 16:28 ` James Morse
2018-05-16 16:46 ` [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:28 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, inux-mm, Marc Zyngier,
Catalin Marinas, Tyler Baicar, Will Deacon, Dongjiu Geng,
Punit Agrawal, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but we can't guarantee this will be
taken before we return.
Unless we interrupted a context with irqs-masked, we can call
irq_work_run() to do the work now. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.
With this we can take apei_claim_sea() returning '0' to mean this
external-abort was also notification of a firmware-first RAS error,
and that APEI has processed the CPER records.
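The return-value contract described above can be sketched as a small
user-space model. This is an illustration under stated assumptions, not the
kernel code: model_apei_claim_sea(), model_do_sea_signals() and the
model_irq_work_runs counter are hypothetical names, the notify_err argument
stands in for the result of ghes_notify_sea(), and the boolean stands in for
arch_irqs_disabled_flags(interrupted_flags).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Counts how often the model would have called irq_work_run(). */
static int model_irq_work_runs;

/*
 * notify_err: 0 if the notification found work, negative errno otherwise.
 * interrupted_irqs_masked: true if the interrupted context had irqs masked.
 */
static int model_apei_claim_sea(int notify_err, bool interrupted_irqs_masked)
{
	int err = notify_err;

	assert(notify_err <= 0);

	if (!err) {
		if (!interrupted_irqs_masked)
			model_irq_work_runs++;	/* work is done before returning */
		else
			err = -EINPROGRESS;	/* work found, but still pending */
	}

	return err;
}

/* Models do_sea(): only a return of 0 suppresses the signal to the task. */
static bool model_do_sea_signals(int claim_err)
{
	return claim_err != 0;
}
```

Under this model, -EINPROGRESS is treated like any other non-zero result by
the fault handler, so the task is still signalled when the deferred work
could not be run synchronously.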
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
Changes since v2:
* Removed IS_ENABLED() check, done by the caller unless we have a dummy
definition.
arch/arm64/kernel/acpi.c | 19 +++++++++++++++++++
arch/arm64/mm/fault.c | 9 ++++-----
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index df2c6bff8c58..9ef2d91f0000 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -22,6 +22,7 @@
#include <linux/init.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
+#include <linux/irq_work.h>
#include <linux/memblock.h>
#include <linux/of_fdt.h>
#include <linux/smp.h>
@@ -275,10 +276,14 @@ int apei_claim_sea(struct pt_regs *regs)
{
int err = -ENOENT;
unsigned long current_flags = arch_local_save_flags();
+ unsigned long interrupted_flags = current_flags;
if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
return err;
+ if (regs)
+ interrupted_flags = regs->pstate;
+
/*
* SEA can interrupt SError, mask it and describe this as an NMI so
* that APEI defers the handling.
@@ -287,6 +292,20 @@ int apei_claim_sea(struct pt_regs *regs)
nmi_enter();
err = ghes_notify_sea();
nmi_exit();
+
+ /*
+ * APEI NMI-like notifications are deferred to irq_work. Unless
+ * we interrupted irqs-masked code, we can do that now.
+ */
+ if (!err) {
+ if (!arch_irqs_disabled_flags(interrupted_flags)) {
+ local_daif_restore(DAIF_PROCCTX_NOIRQ);
+ irq_work_run();
+ } else {
+ err = -EINPROGRESS;
+ }
+ }
+
local_daif_restore(current_flags);
return err;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d7e89da0e5df..0232e9064144 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -568,11 +568,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
inf = esr_to_fault_info(esr);
- /*
- * Return value ignored as we rely on signal merging.
- * Future patches will make this more robust.
- */
- apei_claim_sea(regs);
+ if (apei_claim_sea(regs) == 0) {
+ /* APEI claimed this as a firmware-first notification */
+ return 0;
+ }
info.si_signo = inf->sig;
info.si_errno = 0;
--
2.16.2
* Re: [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
` (11 preceding siblings ...)
2018-05-16 16:28 ` [PATCH v4 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
@ 2018-05-16 16:46 ` James Morse
12 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2018-05-16 16:46 UTC (permalink / raw)
To: linux-acpi
Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Punit Agrawal,
Marc Zyngier, Catalin Marinas, Tyler Baicar, Will Deacon,
Dongjiu Geng, linux-mm, Borislav Petkov, Naoya Horiguchi, kvmarm,
linux-arm-kernel, Len Brown
On 16/05/18 17:28, James Morse wrote:
> The aim of this series is to wire arm64's SDEI into APEI.
... and I missed the 'l' from the beginning of the well-known inux-mm@kvack.org
mailing list. I won't increase the spam by resending; please fix it when
pointing out my other mistakes!
Thanks,
James
> Since v3 the NMI fixmap entries and locks have moved into their own
> structure. This moves the indirection up from the 'lock', which should
> be more acceptable to polite society.
> Changes are noted in each patch.
>
> This touches a few trees, so I'm not sure how best it should be merged.
> Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
> like them to arrive together, even though patch 11 doesn't depend on anything
> else in the series. A partial merge of this would be 1-3 and 11.
[...]
> Patch 11 makes the reschedule to memory_failure() run as soon as possible.
[...]
> James Morse (12):
> ACPI / APEI: Move the estatus queue code up, and under its own ifdef
> ACPI / APEI: Generalise the estatus queue's add/remove and notify code
> ACPI / APEI: don't wait to serialise with oops messages when
> panic()ing
> ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
> KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
> arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
> ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
> in_nmi() users
> ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
> firmware: arm_sdei: Add ACPI GHES registration helper
> ACPI / APEI: Add support for the SDEI GHES Notification type
> mm/memory-failure: increase queued recovery work's priority
> arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
> arch/arm/include/asm/kvm_ras.h | 14 +
> arch/arm/include/asm/system_misc.h | 5 -
> arch/arm64/include/asm/acpi.h | 4 +
> arch/arm64/include/asm/daifflags.h | 1 +
> arch/arm64/include/asm/fixmap.h | 8 +-
> arch/arm64/include/asm/kvm_ras.h | 24 ++
> arch/arm64/include/asm/system_misc.h | 2 -
> arch/arm64/kernel/acpi.c | 49 ++++
> arch/arm64/mm/fault.c | 30 +-
> drivers/acpi/apei/ghes.c | 518 ++++++++++++++++++++---------------
> drivers/firmware/arm_sdei.c | 67 +++++
> include/acpi/ghes.h | 17 ++
> include/linux/arm_sdei.h | 8 +
> mm/memory-failure.c | 11 +-
> virt/kvm/arm/mmu.c | 4 +-
> 15 files changed, 503 insertions(+), 259 deletions(-)
> create mode 100644 arch/arm/include/asm/kvm_ras.h
> create mode 100644 arch/arm64/include/asm/kvm_ras.h
>
* Re: [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority
2018-05-16 16:28 ` [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
@ 2018-05-20 7:12 ` gengdongjiu
2018-05-20 7:13 ` gengdongjiu
1 sibling, 0 replies; 16+ messages in thread
From: gengdongjiu @ 2018-05-20 7:12 UTC (permalink / raw)
To: James Morse, linux-acpi@vger.kernel.org
Cc: jonathan.zhang@cavium.com, Rafael Wysocki, Tony Luck,
inux-mm@kvack.org, Xiexiuqi, Marc Zyngier, Catalin Marinas,
Tyler Baicar, Will Deacon, Christoffer Dall, Punit Agrawal,
Borislav Petkov, Naoya Horiguchi, kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org, Len Brown
>
> arm64 can take an NMI-like error notification when user-space steps in some corrupt memory. APEI's GHES code will call
> memory_failure_queue() to schedule the recovery work. We then return to user-space, possibly taking the fault again.
>
> Currently the arch code unconditionally signals user-space from this path, so we don't get stuck in this loop, but the affected process never
> benefits from memory_failure()'s recovery work. To fix this we need to know the recovery work will run before we get back to user-space.
>
> Increase the priority of the recovery work by scheduling it on the system_highpri_wq, then try to bump the current task off this CPU so that
> the recovery work starts immediately.
>
> Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
> CC: Xie XiuQi <xiexiuqi@huawei.com>
> CC: gengdongjiu <gengdongjiu@huawei.com>
Tested-by: gengdongjiu <gengdongjiu@huawei.com>
> ---
> mm/memory-failure.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9d142b9b86dc..f0e69d7ac406 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -55,6 +55,7 @@
> #include <linux/hugetlb.h>
> #include <linux/memory_hotplug.h>
> #include <linux/mm_inline.h>
> +#include <linux/preempt.h>
> #include <linux/kfifo.h>
> #include <linux/ratelimit.h>
> #include "internal.h"
> @@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
> */
> void memory_failure_queue(unsigned long pfn, int flags)
> {
> + int cpu = smp_processor_id();
> struct memory_failure_cpu *mf_cpu;
> unsigned long proc_flags;
> struct memory_failure_entry entry = {
> @@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
>
> mf_cpu = &get_cpu_var(memory_failure_cpu);
> spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> - if (kfifo_put(&mf_cpu->fifo, entry))
> - schedule_work_on(smp_processor_id(), &mf_cpu->work);
> - else
> + if (kfifo_put(&mf_cpu->fifo, entry)) {
> + queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
> + set_tsk_need_resched(current);
> + preempt_set_need_resched();
> + } else {
> pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
> pfn);
> + }
> spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> put_cpu_var(memory_failure_cpu);
> }
> --
> 2.16.2
* Re: [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority
2018-05-16 16:28 ` [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
2018-05-20 7:12 ` Re: " gengdongjiu
@ 2018-05-20 7:13 ` gengdongjiu
1 sibling, 0 replies; 16+ messages in thread
From: gengdongjiu @ 2018-05-20 7:13 UTC (permalink / raw)
To: James Morse, linux-acpi@vger.kernel.org
Cc: jonathan.zhang@cavium.com, Rafael Wysocki, Tony Luck,
inux-mm@kvack.org, Marc Zyngier, Catalin Marinas, Tyler Baicar,
Will Deacon, Punit Agrawal, Borislav Petkov, Naoya Horiguchi,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org, Len Brown
>
> arm64 can take an NMI-like error notification when user-space steps in some corrupt memory. APEI's GHES code will call
> memory_failure_queue() to schedule the recovery work. We then return to user-space, possibly taking the fault again.
>
> Currently the arch code unconditionally signals user-space from this path, so we don't get stuck in this loop, but the affected process never
> benefits from memory_failure()'s recovery work. To fix this we need to know the recovery work will run before we get back to user-space.
>
> Increase the priority of the recovery work by scheduling it on the system_highpri_wq, then try to bump the current task off this CPU so that
> the recovery work starts immediately.
>
> Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
> CC: Xie XiuQi <xiexiuqi@huawei.com>
> CC: gengdongjiu <gengdongjiu@huawei.com>
Tested-by: gengdongjiu <gengdongjiu@huawei.com>
> ---
> mm/memory-failure.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9d142b9b86dc..f0e69d7ac406 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -55,6 +55,7 @@
> #include <linux/hugetlb.h>
> #include <linux/memory_hotplug.h>
> #include <linux/mm_inline.h>
> +#include <linux/preempt.h>
> #include <linux/kfifo.h>
> #include <linux/ratelimit.h>
> #include "internal.h"
> @@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
> */
> void memory_failure_queue(unsigned long pfn, int flags)
> {
> + int cpu = smp_processor_id();
> struct memory_failure_cpu *mf_cpu;
> unsigned long proc_flags;
> struct memory_failure_entry entry = {
> @@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
>
> mf_cpu = &get_cpu_var(memory_failure_cpu);
> spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> - if (kfifo_put(&mf_cpu->fifo, entry))
> - schedule_work_on(smp_processor_id(), &mf_cpu->work);
> - else
> + if (kfifo_put(&mf_cpu->fifo, entry)) {
> + queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
> + set_tsk_need_resched(current);
> + preempt_set_need_resched();
> + } else {
> pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
> pfn);
> + }
> spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> put_cpu_var(memory_failure_cpu);
> }
> --
> 2.16.2
Thread overview: 16+ messages
2018-05-16 16:28 [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
2018-05-16 16:28 ` [PATCH v4 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef James Morse
2018-05-16 16:28 ` [PATCH v4 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code James Morse
2018-05-16 16:28 ` [PATCH v4 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing James Morse
2018-05-16 16:28 ` [PATCH v4 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
2018-05-16 16:28 ` [PATCH v4 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
2018-05-16 16:28 ` [PATCH v4 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
2018-05-16 16:28 ` [PATCH v4 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users James Morse
2018-05-16 16:28 ` [PATCH v4 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications James Morse
2018-05-16 16:28 ` [PATCH v4 09/12] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
2018-05-16 16:28 ` [PATCH v4 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse
2018-05-16 16:28 ` [PATCH v4 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
2018-05-20 7:12 ` Re: " gengdongjiu
2018-05-20 7:13 ` gengdongjiu
2018-05-16 16:28 ` [PATCH v4 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
2018-05-16 16:46 ` [PATCH v4 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse