From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: David Belanger <david.belanger@amd.com>,
Felix Kuehling <Felix.Kuehling@amd.com>,
Alex Deucher <alexander.deucher@amd.com>,
Sasha Levin <sashal@kernel.org>,
christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@gmail.com,
daniel@ffwll.ch, amd-gfx@lists.freedesktop.org,
dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.1 26/34] drm/amdkfd: Fixed kfd_process cleanup on module exit.
Date: Wed, 22 Mar 2023 15:59:18 -0400 [thread overview]
Message-ID: <20230322195926.1996699-26-sashal@kernel.org> (raw)
In-Reply-To: <20230322195926.1996699-1-sashal@kernel.org>
From: David Belanger <david.belanger@amd.com>
[ Upstream commit 20bc9f76b6a2455c6b54b91ae7634f147f64987f ]
Handle case when module is unloaded (kfd_exit) before a process space
(mm_struct) is released.
v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.
v3: Fixed loop element access / synchronization. Fixed extra empty lines.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/gpu/drm/amd/amdkfd/kfd_module.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 67 +++++++++++++++++++++---
3 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 09b966dc37681..aee2212e52f69 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -77,6 +77,7 @@ static int kfd_init(void)
static void kfd_exit(void)
{
+ kfd_cleanup_processes();
kfd_debugfs_fini();
kfd_process_destroy_wq();
kfd_procfs_shutdown();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index bf610e3b683bb..6d6588b9beed7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -928,6 +928,7 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev);
int kfd_process_create_wq(void);
void kfd_process_destroy_wq(void);
+void kfd_cleanup_processes(void);
struct kfd_process *kfd_create_process(struct file *filep);
struct kfd_process *kfd_get_process(const struct task_struct *task);
struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index dd351105c1bcf..7f68d51541e8e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1167,6 +1167,17 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
kfd_unref_process(container_of(mn, struct kfd_process, mmu_notifier));
}
+static void kfd_process_notifier_release_internal(struct kfd_process *p)
+{
+ cancel_delayed_work_sync(&p->eviction_work);
+ cancel_delayed_work_sync(&p->restore_work);
+
+ /* Indicate to other users that MM is no longer valid */
+ p->mm = NULL;
+
+ mmu_notifier_put(&p->mmu_notifier);
+}
+
static void kfd_process_notifier_release(struct mmu_notifier *mn,
struct mm_struct *mm)
{
@@ -1181,17 +1192,22 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
return;
mutex_lock(&kfd_processes_mutex);
+ /*
+ * Do early return if table is empty.
+ *
+ * This could potentially happen if this function is called concurrently
+ * by mmu_notifier and by kfd_cleanup_pocesses.
+ *
+ */
+ if (hash_empty(kfd_processes_table)) {
+ mutex_unlock(&kfd_processes_mutex);
+ return;
+ }
hash_del_rcu(&p->kfd_processes);
mutex_unlock(&kfd_processes_mutex);
synchronize_srcu(&kfd_processes_srcu);
- cancel_delayed_work_sync(&p->eviction_work);
- cancel_delayed_work_sync(&p->restore_work);
-
- /* Indicate to other users that MM is no longer valid */
- p->mm = NULL;
-
- mmu_notifier_put(&p->mmu_notifier);
+ kfd_process_notifier_release_internal(p);
}
static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
@@ -1200,6 +1216,43 @@ static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
.free_notifier = kfd_process_free_notifier,
};
+/*
+ * This code handles the case when driver is being unloaded before all
+ * mm_struct are released. We need to safely free the kfd_process and
+ * avoid race conditions with mmu_notifier that might try to free them.
+ *
+ */
+void kfd_cleanup_processes(void)
+{
+ struct kfd_process *p;
+ struct hlist_node *p_temp;
+ unsigned int temp;
+ HLIST_HEAD(cleanup_list);
+
+ /*
+ * Move all remaining kfd_process from the process table to a
+ * temp list for processing. Once done, callback from mmu_notifier
+ * release will not see the kfd_process in the table and do early return,
+ * avoiding double free issues.
+ */
+ mutex_lock(&kfd_processes_mutex);
+ hash_for_each_safe(kfd_processes_table, temp, p_temp, p, kfd_processes) {
+ hash_del_rcu(&p->kfd_processes);
+ synchronize_srcu(&kfd_processes_srcu);
+ hlist_add_head(&p->kfd_processes, &cleanup_list);
+ }
+ mutex_unlock(&kfd_processes_mutex);
+
+ hlist_for_each_entry_safe(p, p_temp, &cleanup_list, kfd_processes)
+ kfd_process_notifier_release_internal(p);
+
+ /*
+ * Ensures that all outstanding free_notifier get called, triggering
+ * the release of the kfd_process struct.
+ */
+ mmu_notifier_synchronize();
+}
+
static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
{
unsigned long offset;
--
2.39.2
next prev parent reply other threads:[~2023-03-22 20:08 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-22 19:58 [PATCH AUTOSEL 6.1 01/34] xfrm: Zero padding when dumping algos and encap Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 02/34] ASoC: codecs: tx-macro: Fix for KASAN: slab-out-of-bounds Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 03/34] ASoC: Intel: avs: max98357a: Explicitly define codec format Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 04/34] ASoC: Intel: avs: da7219: " Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 05/34] ASoC: Intel: avs: ssm4567: Remove nau8825 bits Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 06/34] ASoC: Intel: avs: nau8825: Adjust clock control Sasha Levin
2023-03-22 19:58 ` [PATCH AUTOSEL 6.1 07/34] zstd: Fix definition of assert() Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 08/34] ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535 Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 09/34] ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90 Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 10/34] ASoC: SOF: ipc3: Check for upper size limit for the received message Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 11/34] ASoC: SOF: ipc4-topology: Fix incorrect sample rate print unit Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 12/34] ASoC: SOF: Intel: pci-tng: revert invalid bar size setting Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 13/34] ASoC: SOF: IPC4: update gain ipc msg definition to align with fw Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 14/34] md: avoid signed overflow in slot_store() Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 15/34] x86/PVH: obtain VGA console info in Dom0 Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 16/34] drm/amdkfd: Fix BO offset for multi-VMA page migration Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 17/34] drm/amdkfd: fix a potential double free in pqm_create_queue Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 18/34] drm/amdkfd: fix potential kgd_mem UAFs Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 19/34] net: hsr: Don't log netdev_err message on unknown prp dst node Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 20/34] ALSA: asihpi: check pao in control_message() Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 21/34] ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set() Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 22/34] fbdev: tgafb: Fix potential divide by zero Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 23/34] ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 24/34] sched_getaffinity: don't assume 'cpumask_size()' is fully initialized Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 25/34] nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620 Sasha Levin
2023-03-22 19:59 ` Sasha Levin [this message]
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 27/34] net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisites Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 28/34] fbdev: nvidia: Fix potential divide by zero Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 29/34] fbdev: intelfb: " Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 30/34] fbdev: lxfb: " Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 31/34] fbdev: au1200fb: " Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 32/34] tools/power turbostat: Fix /dev/cpu_dma_latency warnings Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 33/34] tools/power turbostat: fix decoding of HWP_STATUS Sasha Levin
2023-03-22 19:59 ` [PATCH AUTOSEL 6.1 34/34] tracing: Fix wrong return in kprobe_event_gen_test.c Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230322195926.1996699-26-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=Felix.Kuehling@amd.com \
--cc=Xinhui.Pan@amd.com \
--cc=airlied@gmail.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=christian.koenig@amd.com \
--cc=daniel@ffwll.ch \
--cc=david.belanger@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox