From: Greg KH <gregkh@linuxfoundation.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: david@redhat.com, patches@lists.linux.dev,
linux-modules@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, pmladek@suse.com,
petr.pavlu@suse.com, prarit@redhat.com,
torvalds@linux-foundation.org, rafael@kernel.org,
christophe.leroy@csgroup.eu, tglx@linutronix.de,
peterz@infradead.org, song@kernel.org, rppt@kernel.org,
dave@stgolabs.net, willy@infradead.org, vbabka@suse.cz,
mhocko@suse.com, dave.hansen@linux.intel.com,
colin.i.king@gmail.com, jim.cromie@gmail.com,
catalin.marinas@arm.com, jbaron@akamai.com,
rick.p.edgecombe@intel.com
Subject: Re: [PATCH] module: add debugging auto-load duplicate module support
Date: Wed, 19 Apr 2023 09:15:11 +0200 [thread overview]
Message-ID: <2023041951-evolution-unwitting-1791@gregkh> (raw)
In-Reply-To: <20230418204636.791699-1-mcgrof@kernel.org>
On Tue, Apr 18, 2023 at 01:46:36PM -0700, Luis Chamberlain wrote:
> This adds debugging support to the kernel module auto-loader to
> easily detect and deal with duplicate module requests. To aid
> with possible bootup failure issues it will supress the waste
> in virtual memory when races happen before userspace loads a
> module and the kernel is still issuing requests for the same
> module.
>
> Folks debugging virtual memory abuse on bootup can and should
> enable this to see what WARN()s come on, to see if module
> auto-loading is to blame for their woes.
You get 72 columns for changelog text, so you can use it :)
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Changes on this patch since the last RFC:
>
> o dropped the kernel_read*() patch from this series moving to
> punt the issues as a udev issue now that we have proof auto-loading
> is not the issue
> o some spell checks
>
> kernel/module/Kconfig | 40 +++++++
> kernel/module/Makefile | 1 +
> kernel/module/dups.c | 234 +++++++++++++++++++++++++++++++++++++++
> kernel/module/internal.h | 15 +++
> kernel/module/kmod.c | 23 +++-
> 5 files changed, 309 insertions(+), 4 deletions(-)
> create mode 100644 kernel/module/dups.c
>
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index e6df183e2c80..cc146ef4a6ac 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -59,6 +59,46 @@ config MODULE_STATS
>
> If unsure, say N.
>
> +config MODULE_AUTOLOAD_SUPRESS_DUPS
MODULE_DEBUG_DUPLICATE perhaps? It has nothing to do with autoloading
(other than that is what userspace is doing) and you aren't suppressing
anything except throwing up warnings, right?
> + bool "Debug duplicate modules with auto-loading"
> + help
> + Module autoloading allows in-kernel code to request modules through
> + the *request_module*() API calls. This in turn just calls userspace
> + modprobe. Although modprobe checks to see if a module is already
> + loaded before trying to load a module there is a small time window in
> + which multiple duplicate requests can end up in userspace and multiple
> + modprobe calls race calling finit_module() around the same time for
> + duplicate modules. The finit_module() system call can consume in the
> + worst case more than twice the respective module size in virtual
> + memory for each duplicate module requests. Although duplicate module
> + requests are non-fatal virtual memory is a limited resource and each
> + duplicate module request ends up just wasting virtual memory.
It's not "wasted", as it is returned when the module is determined to be
a duplicate. Otherwise everyone will want this enabled as they think it
will actually save memory.
> + This debugging facility will create WARN() splats for duplicate module
> + requests to help identify if module auto-loading is the culprit to your
> + woes. Since virtual memory abuse caused by duplicate module requests
> + could render a system unusable this functionality will also suppresses
> + the waste in virtual memory caused by duplicate requests by sharing
> + races in requests for the same module to a single unified request.
> + Once a non-wait request_module() call completes a module should be
> + loaded and modprobe should simply not allow new finit_module() calls.
> +
> + Enable this functionality to try to debug virtual memory abuse during
> + boot on systems and identify if the abuse was due to module
> + auto-loading.
> +
> + If the first module request used request_module_nowait() we cannot
> + use that as the anchor to wait for duplicate module requests, since
> + users of request_module() do want a proper return value. If a call
> + for the same module happened earlier with request_module() though,
> + then a duplicate request_module_nowait() would be detected.
> +
> + You want to enable this if you want to debug and see if duplicate
> + module auto-loading might be causing virtual memory abuse during
> + bootup. A kernel trace will be provided for each duplicate request.
> +
> + Disable this if you are on production.
"on production"? That does not translate well, how about:
Only enable this for debugging system functionality, never have
it enabled on real systems.
or something like that?
> +
> endif # MODULE_DEBUG
>
> config MODULE_FORCE_LOAD
> diff --git a/kernel/module/Makefile b/kernel/module/Makefile
> index 52340bce497e..e8b121ac39cf 100644
> --- a/kernel/module/Makefile
> +++ b/kernel/module/Makefile
> @@ -10,6 +10,7 @@ KCOV_INSTRUMENT_module.o := n
> obj-y += main.o
> obj-y += strict_rwx.o
> obj-y += kmod.o
> +obj-$(CONFIG_MODULE_AUTOLOAD_SUPRESS_DUPS) += dups.o
> obj-$(CONFIG_MODULE_DECOMPRESS) += decompress.o
> obj-$(CONFIG_MODULE_SIG) += signing.o
> obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/kernel/module/dups.c b/kernel/module/dups.c
> new file mode 100644
> index 000000000000..903ab7c7e8f4
> --- /dev/null
> +++ b/kernel/module/dups.c
> @@ -0,0 +1,234 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * kmod dups - the kernel module autoloader duplicate suppressor
> + *
> + * Copyright (C) 2023 Luis Chamberlain <mcgrof@kernel.org>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/sched/task.h>
> +#include <linux/binfmts.h>
> +#include <linux/syscalls.h>
> +#include <linux/unistd.h>
> +#include <linux/kmod.h>
> +#include <linux/slab.h>
> +#include <linux/completion.h>
> +#include <linux/cred.h>
> +#include <linux/file.h>
> +#include <linux/fdtable.h>
> +#include <linux/workqueue.h>
> +#include <linux/security.h>
> +#include <linux/mount.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/resource.h>
> +#include <linux/notifier.h>
> +#include <linux/suspend.h>
> +#include <linux/rwsem.h>
> +#include <linux/ptrace.h>
> +#include <linux/async.h>
> +#include <linux/uaccess.h>
> +
> +DEFINE_MUTEX(kmod_dup_mutex);
static?
> +static LIST_HEAD(dup_kmod_reqs);
> +
> +struct kmod_dup_req {
> + struct list_head list;
> + char name[MODULE_NAME_LEN];
> + struct completion first_req_done;
> + struct work_struct complete_work;
> + struct delayed_work delete_work;
> + int dup_ret;
> +};
> +
> +static struct kmod_dup_req *kmod_dup_request_lookup(char *module_name)
Lock requirements? You should document that here.
> +{
> + struct kmod_dup_req *kmod_req;
> +
> + list_for_each_entry_rcu(kmod_req, &dup_kmod_reqs, list,
> + lockdep_is_held(&kmod_dup_mutex)) {
> + if (strlen(kmod_req->name) == strlen(module_name) &&
> + !memcmp(kmod_req->name, module_name, strlen(module_name))) {
> + return kmod_req;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static void kmod_dup_request_delete(struct work_struct *work)
> +{
> + struct kmod_dup_req *kmod_req;
> + kmod_req = container_of(to_delayed_work(work), struct kmod_dup_req, delete_work);
> +
> + /*
> + * The typical situation is a module successully loaded. In that
> + * situation the module will be present already in userspace. If
> + * new requests come in after that, userspace will already know the
> + * module is loaded so will just return 0 right away. There is still
> + * a small chance right after we delete this entry new request_module()
> + * calls may happen after that, they can happen. These heuristics
> + * are to protect finit_module() abuse for auto-loading, if modules
> + * are still tryign to auto-load even if a module is already loaded,
> + * that's on them, and those inneficiencies should not be fixed by
> + * kmod. The inneficies there are a call to modprobe and modprobe
> + * just returning 0.
> + */
> + mutex_lock(&kmod_dup_mutex);
> + list_del_rcu(&kmod_req->list);
> + synchronize_rcu();
> + mutex_unlock(&kmod_dup_mutex);
> + kfree(kmod_req);
> +}
> +
> +static void kmod_dup_request_complete(struct work_struct *work)
> +{
> + struct kmod_dup_req *kmod_req;
> +
> + kmod_req = container_of(work, struct kmod_dup_req, complete_work);
> +
> + /*
> + * This will ensure that the kernel will let all the waiters get
> + * informed its time to check the return value. It's time to
> + * go home.
> + */
> + complete_all(&kmod_req->first_req_done);
> +
> + /*
> + * Now that we have allowed prior request_module() calls to go on
> + * with life, let's schedule deleting this entry. We don't have
> + * to do it right away, but we *eventually* want to do it so to not
> + * let this linger forever as this is just a boot optimization for
> + * possible abuses of vmalloc() incurred by finit_module() thrashing.
> + */
> + queue_delayed_work(system_wq, &kmod_req->delete_work, 60 * HZ);
> +}
> +
> +bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
> +{
> + struct kmod_dup_req *kmod_req, *new_kmod_req;
> + int ret;
> +
> + /*
> + * Pre-allocate the entry in case we have to use it later
> + * to avoid contention with the mutex.
> + */
> + new_kmod_req = kzalloc(sizeof(*new_kmod_req), GFP_KERNEL);
> + if (!new_kmod_req)
> + return false;
> +
> + memcpy(new_kmod_req->name, module_name, strlen(module_name));
> + INIT_WORK(&new_kmod_req->complete_work, kmod_dup_request_complete);
> + INIT_DELAYED_WORK(&new_kmod_req->delete_work, kmod_dup_request_delete);
> + init_completion(&new_kmod_req->first_req_done);
> +
> + mutex_lock(&kmod_dup_mutex);
> +
> + kmod_req = kmod_dup_request_lookup(module_name);
> + if (!kmod_req) {
> + /*
> + * If the first request that came through for a module
> + * was with request_module_nowait() we cannot wait for it
> + * and share its return value with other users which may
> + * have used request_module() and need a proper return value
> + * so just skip using them as an anchor.
> + *
> + * If a prior request to this one came through with
> + * request_module() though, then a request_module_nowait()
> + * would benefit from duplicate detection.
> + */
> + if (!wait) {
> + kfree(new_kmod_req);
> + pr_warn("New request_module_nowait() for %s -- cannot track duplicates for this request\n", module_name);
pr_dbg() as this is debugging?
And did you set your pr_fmt value to make it obvious where these
messages are coming from?
> + mutex_unlock(&kmod_dup_mutex);
> + return false;
> + }
> +
> + /*
> + * There was no duplicate, just add the request so we can
> + * keep tab on duplicates later.
> + */
> + pr_info("New request_module() for %s\n", module_name);
Why not pr_dbg()?
> + list_add_rcu(&new_kmod_req->list, &dup_kmod_reqs);
> + mutex_unlock(&kmod_dup_mutex);
> + return false;
> + }
> + mutex_unlock(&kmod_dup_mutex);
> +
> + /* We are dealing with a duplicate request now */
> +
No blank line needed.
> + kfree(new_kmod_req);
> +
> + /*
> + * To fix these try to use try_then_request_module() instead as that
> + * will check if the component you are looking for is present or not.
> + * You could also just queue a single request to load the module once,
> + * instead of having each and everything you need try to request for
> + * the module.
> + *
> + * Duplicate request_module() calls can cause quite a bit of wasted
> + * vmalloc() space when racing with userspace.
> + */
> + WARN(1, "module-autoload: duplicate request for module %s\n", module_name);
Remember, this will trigger syzbot and any system that has
panic-on-warn, so be prepared for that mess. Especially the syzbot
reports, that's going to cause lots of issues for you if you don't tell
them to always disable this option.
thanks,
greg k-h
next prev parent reply other threads:[~2023-04-19 7:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 20:46 [PATCH] module: add debugging auto-load duplicate module support Luis Chamberlain
2023-04-19 7:15 ` Greg KH [this message]
2023-04-19 23:32 ` Luis Chamberlain
2023-04-20 5:32 ` Greg KH
2023-04-20 21:03 ` Luis Chamberlain
2023-04-21 15:12 ` Greg KH
2023-04-21 16:42 ` Lucas De Marchi
2023-04-21 17:38 ` Luis Chamberlain
2023-04-21 18:31 ` Lucas De Marchi
2023-04-21 18:45 ` Luis Chamberlain
2023-04-26 10:13 ` Petr Pavlu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2023041951-evolution-unwitting-1791@gregkh \
--to=gregkh@linuxfoundation.org \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=colin.i.king@gmail.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=jbaron@akamai.com \
--cc=jim.cromie@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-modules@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=mhocko@suse.com \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=petr.pavlu@suse.com \
--cc=pmladek@suse.com \
--cc=prarit@redhat.com \
--cc=rafael@kernel.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rppt@kernel.org \
--cc=song@kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.