public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: richard kennedy <richard@rsk.demon.co.uk>
To: Karsten Keil <kkeil@suse.de>
Cc: linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [RFC] Suspicious bug in module refcounting
Date: Tue, 03 Feb 2009 15:02:25 +0000	[thread overview]
Message-ID: <49885C81.2040901@rsk.demon.co.uk> (raw)
In-Reply-To: <20090203134721.GA11069@pingi.kke.suse.de>

Karsten Keil wrote:
> Hi,
> 
> While debugging a wired SCTP bug we hit from time to time
> the BUG_ON(module_refcount(module) == 0) statement in __module_get().
> After fixing the SCTP bug in final tests runs with lot of traffic
> we still saw this bug message from time to time.
> Looking at the refcounting in sctp, does not show up any forgotten
> get module operation or some wrong put module calls.
> We added some debug code and this shows only, that the test case heavly
> change module_refcount(sctp) bacause the socket accept call also increase
> the module count (and the BUG always was triggered in the __module_get() here).
> If the socket close, it decrease the module refcount, so no problem here.
> Some times the refcount seems to go very high, very quickly and later
> goes down very quickly again, I think this occurs if the network stalls for
> some time (the test case try to saturate a GB networklink).
> So I had the idea that the bug is in the refcounting itself and not related
> to the sctp code.
> After looking closer at __module_get() and module_refcount(module) it looks
> as I'm right, but I can not belive that this bug was not discovered before,
> the code is here since long time.
> The refcount is a per CPU atomic variable, module_refcount() simple add
> in a fully unprotected loop (not disabled irqs, not protected against
> scheduling) all per cpu values.
> 
> The issue is that so the process (it is as syscall from a userspace process)
> get scheduled in the middle of the counting loop, so it already has counted
> some per cpu vales, but not all.
> Now while the process is not active, other processes modify the counts, some
> accepts (== module gets) increase the count on already summed up CPUs,
> some other release sockets on CPUs, which are not already summed up.
> If the process become active again, it read the now decremented values from
> later CPUS, so the total count is too low and may reach zero, in which the above
> BUG_ON() will be triggered.
> 
> To prove this I replaced the BUG_ON with following code:
> 
>         if (module) {
> -               BUG_ON(module_refcount(module) == 0);
> +               unsigned int c = module_refcount(module);
> +
> +               if (unlikely(c == 0)) {
> +                       printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c,  module_refcount(module));
> +                       module_dump_refcounts(module);
> +                       WARN_ON(1);
> +               }
>                 local_inc(&module->ref[get_cpu()].count);
>                 put_cpu();
>         }
> 
> module_dump_refcounts() does print out the per cpu refcounts.
> If my idea is correct the second call to  module_refcount(module) should
> have a none zero value.
> 
> And indeed:
> 
> Feb  2 03:29:51 pingi5 kernel:  module sctp refcount=0/8e
> Feb  2 03:29:51 pingi5 kernel: CPU0 refcount:38562331
> Feb  2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187
> Feb  2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374
> Feb  2 03:29:51 pingi5 kernel:
> Feb  2 03:29:51 pingi5 kernel: Call Trace: <ffffffff802837c2>{sys_accept+212} <ffffffff8019c76f>{dput+44}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff8018704f>{__fput+355} <ffffffff801a0dff>{mntput_no_expire+29}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff801845df>{filp_close+92} <ffffffff8010adba>{system_call+126}
> 
> 
> Do my findings be correct, or do I miss something ?
> 
> A other thing is, why __module_get() should be used anyway, I think it was a
> optimation long time ago, current code seems to need more cycles on default
> SMP kernels as try_module_get(), because of the big loop  in module_refcount()
> which goes trough all possible CPUs (NR_CPU, 64 in default config).
> try_module_get() only has one test and one atomic increment.
> 
> 
> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.
> 
> 
> Any comments ?
> 

Hi Karsten,

I've been seeing problems in a simple socket test harness that I think
has the same cause. I have several clients all calling a single socket
server on the same machine, under heavy load & after some time the
clients will fail to connect with -EADDRNOTAVAIL even though the server
is still running.

I changed the module ref count to be an atomic_t and the problem goes
away. I haven't tracked down the exact cause of the problem yet ( the
network code is kind of tricky!).

So I think you're right module_refcount() does need some sort of locking.

BTW On my desktop machine I cannot measure any significant performance
differences between the atomic_t refcount & the existing code, so it may
be that atomic_t will be good enough.

regards
Richard






  reply	other threads:[~2009-02-03 15:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-03 13:47 [RFC] Suspicious bug in module refcounting Karsten Keil
2009-02-03 15:02 ` richard kennedy [this message]
2009-02-04  3:48 ` Rusty Russell
2009-02-04 10:11   ` Russell King
2009-02-04 10:55     ` Rusty Russell
2009-02-04 10:59       ` Russell King
2009-02-04 16:33   ` Dan Williams
2009-02-06 22:41   ` Karsten Keil
2009-02-09 15:18   ` Michal Hocko
2009-02-10  3:15     ` Rusty Russell
2009-02-10  3:42       ` Karsten Keil
2009-02-10 10:31       ` Michal Hocko
2009-02-10 13:36         ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49885C81.2040901@rsk.demon.co.uk \
    --to=richard@rsk.demon.co.uk \
    --cc=kkeil@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox