From: Ming Lei <ming.lei@redhat.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.com>,
linux-kernel@vger.kernel.org,
Luis Chamberlain <mcgrof@kernel.org>
Subject: Re: [PATCH V2 2/2] kobject: wait until kobject is cleaned up before freeing module
Date: Tue, 7 Dec 2021 09:50:03 +0800 [thread overview]
Message-ID: <Ya69y7hPo52V0kRy@T590> (raw)
In-Reply-To: <Ya3EGLbhNWrpTqX+@kroah.com>
On Mon, Dec 06, 2021 at 09:04:40AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Dec 06, 2021 at 10:13:53AM +0800, Ming Lei wrote:
> > On Fri, Dec 03, 2021 at 04:07:39PM +0100, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 29, 2021 at 11:45:09AM +0800, Ming Lei wrote:
> > > > kobject_put() may become asynchronously because of
> > > > CONFIG_DEBUG_KOBJECT_RELEASE, so once kobject_put() returns, the caller may
> > > > expect the kobject is released after the last refcnt is dropped, however
> > > > CONFIG_DEBUG_KOBJECT_RELEASE just schedules one delayed work function
> > > > for cleaning up the kobject.
> > >
> > > The caller should NOT expect the kobject to be released. That's the
> > > whole point of dynamic reference counted objects, you never "know" when
> > > the last object is released. This option just makes it obvious so that
> > > you know when to fix up code that has this assumption.
> >
> > Yes, so CONFIG_DEBUG_KOBJECT_RELEASE needs to be fixed.
>
> What is broken with it today? It is there for you to find problems in
> your kernel code that uses kobjects. What oops/crash/whatever is it
> causing that you feel it should not be causing?
>
> A module's kobject is "owned" by the module core, not the module code
No, this patch is nothing to do with module's kobject, we are talking
about any kobjects allocated/released from one driver built as module.
> that is being unloaded, so I don't see the problem here. More details
> please.
If CONFIG_DEBUG_KOBJECT_RELEASE is enabled, kobject_release() will
schedule a (random time)delay work to run kobject_cleanup(), and the delay
work may be run after the module which allocates/frees the kobject is
unloaded.
kobject_cleanup():
struct kobj_type *t = get_ktype(kobj);
...
if (t && t->release) {
pr_debug("kobject: '%s' (%p): calling ktype release\n",
kobject_name(kobj), kobj);
t->release(kobj);
}
Both kobj_type and ->release are allocated in the module data/text section,
so kernel panic is triggered when 't && t->release' is run from the
delay work context.
>
> > > > Inside the cleanup handler, kobj->ktype and kobj->ktype->release are
> > > > required.
> > >
> > > Yes. Is that a problem?
> >
> > Of course for CONFIG_DEBUG_KOBJECT_RELEASE, which delays to call
> > ->release after random time, when the module for storing ->ktype and
> > ->ktype->release has been unloaded.
> >
> > As I mentioned, the issue can be triggered 100% by 'modprobe -r
> > kset-example' when CONFIG_DEBUG_KOBJECT_RELEASE is enabled if the
> > 1st patch is applied.
>
> Is there any "real" kernel code that this causes problems on?
>
> Again, this is for debugging, yes, this tiny example will crash that
> way, but that is fine, as we can obviously see that the kernel code here
> is correct.
Nothing is wrong with kset-example, the issue is just that foo_ktype and
foo_release are allocated in code/data section of the module 'kset-example'.
There are ~150 such uses:
[linux]$ git grep -n "static struct kobj_type" ./ | grep "{" | wc
153 923 11676
Most of the code can be built as module, so all should have such problem,
that is why I think it as one generic issue, not kset-example specific.
Here kset-example is referred just for showing the issue easily.
>
> And if you really want to ensure that it works properly, let's wait on
> release before allowing that module to be unloaded. But again, module
Then all modules which uses kobject need such change.
> unload is NOT a normal operation and is not what this debugging option
> was created to help out with.
But CONFIG_MODULE and CONFIG_DEBUG_KOBJECT_RELEASE can be enabled at the
same time.
>
> Again, the confusion between kobjects (which protect data) and module
> references (which protect code) is getting mixed up here.
>
> > > > It is supposed that no activity is on kobject itself any more since
> > > > module_exit() is started, so it is reasonable for the kobject user or
> > > > driver to expect that kobject can be really released in the last run of
> > > > kobject_put() in module_exit() code path. Otherwise, it can be thought as
> > > > one driver's bug since the module is going away.
> > >
> > > Why is module_exit() somehow special here? What is so odd about that?
> >
> > After module_exit() is done, the module will be unloaded, then any code
> > or data stored in the module can't be referred.
> >
> > >
> > > > When the ->ktype and ->ktype->release are allocated as module static
> > > > variable, it can cause trouble because the delayed cleanup handler may
> > > > be run after the module is unloaded.
> > >
> > > Why is ktype and release part of module code?
> >
> > Lots of driver defines ktype and ktype->release in its module static
> > variable.
>
> They do? Where?
>
> > > What module kobject is causing this problem?
> >
> > Any modules which defines its ktype and ktype->release in its module
> > static variable, which is pretty common.
>
> What non-example code does this? Let's fix that.
>
> > > > Fixes the issue by flushing scheduled kobject cleanup work before
> > > > freeing module.
> > >
> > > Why are modules special here?
> > >
> > > And if you enable this option, and then start unloading kernel modules,
> > > yes, things can go wrong, but that's not what this kernel option is for
> > > at all.
> > >
> > > This feels like a hack for not a real problem.
> >
> > I think it is caused by CONFIG_DEBUG_KOBJECT_RELEASE, that is why this
> > patch is posted. Otherwise I'd suggest to remove
> > CONFIG_DEBUG_KOBJECT_RELEASE, which supposes to not panic kernel since
> > there isn't anything wrong from driver side.
>
> Perhaps just put a nice warning in that debug option that says "beware
> of unloading modules with this option enabled."
>
> Or better yet, forbid it if that option is enabled :)
You mean disabling CONFIG_DEBUG_KOBJECT_RELEASE if CONFIG_MODULE is
enabled?
Thanks,
Ming
next prev parent reply other threads:[~2021-12-07 1:50 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-29 3:45 [PATCH V2 0/2] kobject: avoid to cleanup kobject after module is unloaded Ming Lei
2021-11-29 3:45 ` [PATCH V2 1/2] kobject: don't delay to cleanup module kobject Ming Lei
2021-12-03 15:08 ` Greg Kroah-Hartman
2021-12-07 13:58 ` kernel test robot
2021-11-29 3:45 ` [PATCH V2 2/2] kobject: wait until kobject is cleaned up before freeing module Ming Lei
2021-12-03 15:07 ` Greg Kroah-Hartman
2021-12-06 2:13 ` Ming Lei
2021-12-06 8:04 ` Greg Kroah-Hartman
2021-12-07 1:50 ` Ming Lei [this message]
2021-12-07 10:32 ` Petr Mladek
2021-12-07 12:51 ` Ming Lei
2021-12-07 14:02 ` Petr Mladek
2022-07-07 4:54 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ya69y7hPo52V0kRy@T590 \
--to=ming.lei@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=pmladek@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox