From: Petr Pavlu <petr.pavlu@suse.com>
To: Luis Chamberlain <mcgrof@kernel.org>,
Petr Mladek <pmladek@suse.com>,
Prarit Bhargava <prarit@redhat.com>,
Vegard Nossum <vegard.nossum@oracle.com>,
Borislav Petkov <bp@alien8.de>, NeilBrown <neilb@suse.de>,
Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: david@redhat.com, mwilck@suse.com, linux-modules@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] module: Don't wait for GOING modules
Date: Wed, 18 Jan 2023 16:12:05 +0100
Message-ID: <79aad139-5305-1081-8a84-42ef3763d4f4@suse.com>
In-Reply-To: <Y8c3hgVwKiVrKJM1@bombadil.infradead.org>
On 1/18/23 01:04, Luis Chamberlain wrote:
> On Tue, Dec 13, 2022 at 11:17:42AM +0100, Petr Mladek wrote:
>> On Mon 2022-12-12 21:09:19, Luis Chamberlain wrote:
>>> 3) *Fixing* a kernel regression by adding new expected API for testing
>>> against -EBUSY seems not ideal.
>>
>> IMHO, the right solution is to fix the subsystems so that they send
>> only one uevent.
>
> Makes sense, but that can take time and some folks are stuck on old kernels
> and perhaps porting fixes for this on subsystems may take time to land
> to some enterprisy kernels. And then there is also systemd that issues
> the requests too, at least that was reflected in commit 6e6de3dee51a
> ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
> that commit claims it was systemd issuing the requests, which I take to
> mean finit_module(), not calling modprobe.
>
> The rationale for making a regression fix with a new userspace return value
> is fair given the old fix made things so much worse that some kernel
> boots would fail. So the rationale to suggest we *must* short-cut
> parallel loads as effectively as possible seems sensible *iff* that
> could not make things worse too, but sadly I've proactively found an issue
> with this fix, or at least found that this issue is also not fixed:
>
> ./tools/testing/selftests/kmod/kmod.sh -t 0006
> Tue Jan 17 23:18:13 UTC 2023
> Running test: kmod_test_0006 - run #0
> kmod_test_0006: OK! - loading kmod test
> kmod_test_0006: FAIL, test expects SUCCESS (0) - got -EINVAL (-22)
> ----------------------------------------------------
> Custom trigger configuration for: test_kmod0
> Number of threads: 50
> Test_case: TEST_KMOD_FS_TYPE (2)
> driver: test_module
> fs: xfs
> ----------------------------------------------------
> Test completed
>
> When can multiple get_fs_type() calls be issued on a system? When
> mounting a large number of filesystems. Sadly though this issue seems
> to have gone unnoticed for a while now. Even reverting commit
> 6e6de3dee51a doesn't fix it, and I've run into issues with trying
> to bisect, first due to missing Kees' patch which fixes a compiler
> failure on older kernel [0] and now I'm seeing this while trying to
> build v5.1:
>
> ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order';
> arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here
> ld: warning: arch/x86/boot/compressed/efi_thunk_64.o: missing .note.GNU-stack section implies executable stack
> ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
> ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
> ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions
> ld: warning: creating DT_TEXTREL in a PIE
> make[2]: *** [arch/x86/boot/compressed/Makefile:118: arch/x86/boot/compressed/vmlinux] Error 1
> make[1]: *** [arch/x86/boot/Makefile:112: arch/x86/boot/compressed/vmlinux] Error 2
> make: *** [arch/x86/Makefile:283: bzImage] Error 2
>
> [0] http://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.org
>
> But we should try to bisect to see what caused the above kmod test 0006
> to start failing.

It is not clear to me from your description whether the observed failure of
kmod_test_0006 is related to the fix in this thread.

I was not able to reproduce the problem on my system. I tested on an 8-CPU
x86_64 machine using v6.2-rc4 with "defconfig + kvm_guest.config +
tools/testing/selftests/kmod/config".

Could you perhaps trace the test to determine where the -EINVAL value comes
from?
>> The question is how the module loader would deal with "broken"
>> subsystems. Petr Pavlu, please, fixme. I think that there are
>> more subsystems doing this ugly thing.
>>
> I personally think that returning -EBUSY is better than serializing
> all the loads. It makes an eventual problem easier to reproduce and fix.
>
> I agree with this assessment, however given the multiple get_fs_type()
> calls as an example, I am not sure if there are other areas which rely on the
> old busy-wait mechanism.
>
> *If* we knew this issue was not so common I'd go so far as to say we
> should pr_warn_once() on failure, but at this point in time I think it'd
> be pretty chatty.
>
> I don't yet have confidence that the new fast track to -EEXIST or -EBUSY may
> not create regressions, so the below I think would be too chatty. If it
> wasn't true, I'd say we should keep record of these uses so we fix the
> callers.

A similar fast-track logic was present prior to 6e6de3dee51a. The fix in this
thread doesn't introduce completely new behavior but rather restores the
previous one. The fix should differ in only two ways: the window in which
parallel loads are detected is extended, and the return code is -EBUSY instead
of -EEXIST.
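As a rough illustration of the resulting return codes, here is a simplified
userspace model, not the actual kernel code. The enum is only a stand-in for
the kernel's module states, duplicate_load_error() and its check function are
hypothetical names, and the -EBUSY branch is assumed from the description
above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's enum module_state. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/*
 * Hypothetical helper: the error reported to a loader that found a
 * same-name module already registered. 'old' points to the state of
 * that module once any waiting is over, or is NULL when the module is
 * no longer present (its load failed or it was removed meanwhile).
 */
int duplicate_load_error(const enum module_state *old)
{
	if (old && *old == MODULE_STATE_LIVE)
		return -EEXIST;	/* the other load finished successfully */
	return -EBUSY;		/* assumed: every other case is a "try later" */
}

/* Small self-check of the three interesting cases. */
void check_duplicate_load_error(void)
{
	enum module_state live = MODULE_STATE_LIVE;
	enum module_state going = MODULE_STATE_GOING;

	assert(duplicate_load_error(&live) == -EEXIST);
	assert(duplicate_load_error(&going) == -EBUSY);
	assert(duplicate_load_error(NULL) == -EBUSY);
}
```

In this model, a loader racing with a module that is still being torn down
sees -EBUSY, while one racing with a successfully loaded module sees -EEXIST.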
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index d3be89de706d..d1ad0b510cb8 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2589,13 +2589,6 @@ static int add_unformed_module(struct module *mod)
>  					      true);
>  		}
>  
> -		/*
> -		 * We are here only when the same module was being loaded. Do
> -		 * not try to load it again right now. It prevents long delays
> -		 * caused by serialized module load failures. It might happen
> -		 * when more devices of the same type trigger load of
> -		 * a particular module.
> -		 */
>  		if (old && old->state == MODULE_STATE_LIVE)
>  			err = -EEXIST;
>  		else
> @@ -2610,6 +2603,15 @@ static int add_unformed_module(struct module *mod)
>  out:
>  	mutex_unlock(&module_mutex);
>  out_unlocked:
> +	/*
> +	 * We get an error here only when there is an attempt to load the
> +	 * same module. Subsystems should strive to only issue one request
> +	 * for a needed module. Multiple requests might happen when more devices
> +	 * of the same type trigger load of a particular module.
> +	 */
> +	if (err)
> +		pr_warn_once("%s: dropping duplicate module request, err: %d\n",
> +			     mod->name, err);
>  	return err;
>  }
I'm not sure if this would be the right thing to do.

What I think would be good to fix is the pattern used by some cpufreq (and
edac) modules. It is a combination of them being loaded per-CPU and using
a cooperative pattern to allow only one module of each such type on the
system. They can be viewed more as whole-platform drivers, which should be
attempted to load only once. It is still on my todo list to post a patch
to the cpufreq maintainers to start a discussion on this.

Another consideration on the kernel side could be to try grouping other
modules that are currently loaded per-CPU.

However, in general, a system can have many pieces of hardware of the same
type, and it looks correct to me that each is individually exposed in sysfs
and via uevents with its set of needed modules.

I would say that if this turns out to be a further issue in practice, udevd
could optimize its processing and avoid same-module loads when it is bringing
up multiple devices.

Thanks,
Petr