From: Shuai Xue <xueshuai@linux.alibaba.com>
To: rafael@kernel.org, lenb@kernel.org, james.morse@arm.com,
tony.luck@intel.com, bp@alien8.de, dave.hansen@linux.intel.com,
jarkko@kernel.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com,
akpm@linux-foundation.org
Cc: stable@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-kernel@vger.kernel.org, cuibixuan@linux.alibaba.com,
baolin.wang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
xueshuai@linux.alibaba.com
Subject: [PATCH v2] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
Date: Sat, 24 Sep 2022 15:49:53 +0800
Message-ID: <20220924074953.83064-1-xueshuai@linux.alibaba.com>
In-Reply-To: <20220916050535.26625-1-xueshuai@linux.alibaba.com>
If an error is detected as a result of a user-space process accessing a
corrupt memory location, the CPU may take an abort. The platform firmware
then reports the error to the kernel via NMI-like notifications, e.g.
NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.
For NMI-like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keeps track of whether
memory_failure() work was queued, and makes task_work pending to flush out
the queue so that the work is processed before returning to user space.
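The deferral can be illustrated with a small userspace model. This is a sketch only: model_task_work_add(), model_ret_to_user(), model_ret_to_kernel() and model_kick() are hypothetical names, not kernel APIs. It assumes, as this patch states, that pending callbacks are drained on the return-to-user path and not on the return-to-kernel path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct callback_head. */
struct model_callback_head {
	struct model_callback_head *next;
	void (*func)(struct model_callback_head *head);
};

/* Hypothetical stand-in for a task with a pending-work list. */
struct model_task {
	struct model_callback_head *works;
};

/* Queue a callback; like task_work_add(), it runs later, not now. */
static void model_task_work_add(struct model_task *t,
				struct model_callback_head *work)
{
	work->next = t->works;
	t->works = work;
}

/* In this model, only the return-to-user path drains pending work. */
static void model_ret_to_user(struct model_task *t)
{
	struct model_callback_head *work = t->works;

	t->works = NULL;
	while (work) {
		struct model_callback_head *next = work->next;

		work->func(work);
		work = next;
	}
}

/* Returning to kernel mode leaves pending callbacks untouched. */
static void model_ret_to_kernel(struct model_task *t)
{
	(void)t;
}

static int flushed;

/* Stands in for ghes_kick_task_work(): flush the queue, free the node. */
static void model_kick(struct model_callback_head *head)
{
	(void)head;
	flushed++;
}
```

Queuing model_kick() and then calling model_ret_to_kernel() leaves it pending; only model_ret_to_user() runs it.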
The code uses init_mm to check whether the error occurred in user space:
if (current->mm != &init_mm)
The condition is always true, because _nobody_ ever has "init_mm" as a real
VM any more.
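Why the check cannot filter out kernel threads can be shown with a minimal model of the mm/active_mm convention from active_mm.rst: a kernel thread has mm == NULL and only borrows an address space (possibly init_mm) through active_mm. The structs and functions below are hypothetical stand-ins for task_struct and mm_struct, sketched under that assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct model_mm {
	int id;
};

struct model_task {
	struct model_mm *mm;        /* NULL for kernel threads */
	struct model_mm *active_mm; /* borrowed address space */
};

/* init_mm is only ever borrowed as active_mm, never owned as ->mm. */
static struct model_mm model_init_mm = { 0 };

/* The check introduced by commit 7f17b4a121d0. */
static bool old_check(const struct model_task *t)
{
	return t->mm != &model_init_mm;
}

/* The check proposed by this patch. */
static bool new_check(const struct model_task *t)
{
	return t->mm != NULL;
}
```

For a kernel thread, old_check() still returns true because its mm is NULL rather than &model_init_mm, while new_check() returns false.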
In addition to aborts, errors can also be signaled as asynchronous
exceptions, such as interrupts and SErrors. In such a case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the ghes_kick_task_work work deferred to task_work will never
be processed, because entry_handler returns via ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node allocated from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will never be freed.
After around 200 allocations on our platform, ghes_estatus_pool will
run out of memory and ghes_in_nmi_queue_one_entry() returns -ENOMEM. As a
result, the event fails to be processed:
sdei: event 805 on CPU 113 failed with error: -2
Eventually, a large number of unhandled events may cause the platform
firmware to exceed some threshold and reboot.
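The leak can be reproduced in miniature with a fixed-size pool. This is a hypothetical model, not ghes.c: POOL_NODES is an assumed capacity chosen to echo the "around 200" reported above (the real figure depends on the pool size and record lengths), and pool_alloc(), pool_free() and handle_error_buggy() are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_NODES 200	/* assumed capacity, for illustration only */

static int pool_in_use;
static char model_init_mm;	/* stand-in for init_mm */

static int pool_alloc(void)
{
	if (pool_in_use == POOL_NODES)
		return -ENOMEM;
	pool_in_use++;
	return 0;
}

static void pool_free(void)
{
	pool_in_use--;
}

/*
 * One asynchronous error with the buggy check. For a kernel thread
 * current_mm is NULL, the check still passes, and the node is handed to
 * task_work that is never run, so it is never returned to the pool.
 */
static int handle_error_buggy(void *current_mm)
{
	int ret = pool_alloc();

	if (ret)
		return ret;		/* the event is lost */
	if (current_mm != (void *)&model_init_mm)
		return 0;		/* queued; leaked for a kernel thread */
	pool_free();			/* never reached in practice */
	return 0;
}

/* Count errors on a kernel thread until the pool runs dry. */
static int errors_until_enomem(void)
{
	int handled = 0;

	while (handle_error_buggy(NULL) == 0)
		handled++;
	return handled;
}
```

After POOL_NODES errors on kernel threads, every further call fails with -ENOMEM.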
The condition should generally just be
if (current->mm)
as described in the active_mm.rst documentation.
Then, if an asynchronous error is detected while a kernel thread is running
(e.g. by a background scrubber), task_work is not added to it, as the
original patch intended.
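The node lifecycle after the fix can be sketched as below. It assumes, as in ghes_proc_in_irq() at the time, that a node whose task_work was not added is returned to ghes_estatus_pool immediately, and that task_work_add() succeeds for user tasks; handle_error() and return_to_user() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one estatus node. */
struct model_node {
	bool queued;	/* handed to task_work, freed on return to user */
};

struct model_task {
	void *mm;	/* NULL for kernel threads */
};

static int nodes_in_use;

/* Queue only for tasks with a user mm; otherwise free right away. */
static void handle_error(const struct model_task *current_task,
			 struct model_node *node, bool task_work_pending)
{
	nodes_in_use++;				/* taken from the pool */
	node->queued = task_work_pending && current_task->mm;
	if (!node->queued)
		nodes_in_use--;			/* freed immediately */
}

/* What the deferred callback eventually does for a queued node. */
static void return_to_user(struct model_node *node)
{
	if (node->queued) {
		node->queued = false;
		nodes_in_use--;
	}
}
```

However many errors interrupt a kernel thread, the pool usage returns to zero after each one; a node queued for a user task is released once that task returns to user space.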
Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
changes since v1:
- add a description of the side effect and give more details
drivers/acpi/apei/ghes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..80ad530583c9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -985,7 +985,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
ghes_estatus_cache_add(generic, estatus);
}
- if (task_work_pending && current->mm != &init_mm) {
+ if (task_work_pending && current->mm) {
estatus_node->task_work.func = ghes_kick_task_work;
estatus_node->task_work_cpu = smp_processor_id();
ret = task_work_add(current, &estatus_node->task_work,
--
2.20.1.12.g72788fdb