All of lore.kernel.org
 help / color / mirror / Atom feed
From: richard kennedy <richard@rsk.demon.co.uk>
To: Karsten Keil <kkeil@suse.de>
Cc: linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [RFC] Suspicious bug in module refcounting
Date: Tue, 03 Feb 2009 15:02:25 +0000	[thread overview]
Message-ID: <49885C81.2040901@rsk.demon.co.uk> (raw)
In-Reply-To: <20090203134721.GA11069@pingi.kke.suse.de>

Karsten Keil wrote:
> Hi,
> 
> While debugging a wired SCTP bug we hit from time to time
> the BUG_ON(module_refcount(module) == 0) statement in __module_get().
> After fixing the SCTP bug in final tests runs with lot of traffic
> we still saw this bug message from time to time.
> Looking at the refcounting in sctp, does not show up any forgotten
> get module operation or some wrong put module calls.
> We added some debug code and this shows only, that the test case heavly
> change module_refcount(sctp) bacause the socket accept call also increase
> the module count (and the BUG always was triggered in the __module_get() here).
> If the socket close, it decrease the module refcount, so no problem here.
> Some times the refcount seems to go very high, very quickly and later
> goes down very quickly again, I think this occurs if the network stalls for
> some time (the test case try to saturate a GB networklink).
> So I had the idea that the bug is in the refcounting itself and not related
> to the sctp code.
> After looking closer at __module_get() and module_refcount(module) it looks
> as I'm right, but I can not belive that this bug was not discovered before,
> the code is here since long time.
> The refcount is a per CPU atomic variable, module_refcount() simple add
> in a fully unprotected loop (not disabled irqs, not protected against
> scheduling) all per cpu values.
> 
> The issue is that so the process (it is as syscall from a userspace process)
> get scheduled in the middle of the counting loop, so it already has counted
> some per cpu vales, but not all.
> Now while the process is not active, other processes modify the counts, some
> accepts (== module gets) increase the count on already summed up CPUs,
> some other release sockets on CPUs, which are not already summed up.
> If the process become active again, it read the now decremented values from
> later CPUS, so the total count is too low and may reach zero, in which the above
> BUG_ON() will be triggered.
> 
> To prove this I replaced the BUG_ON with following code:
> 
>         if (module) {
> -               BUG_ON(module_refcount(module) == 0);
> +               unsigned int c = module_refcount(module);
> +
> +               if (unlikely(c == 0)) {
> +                       printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c,  module_refcount(module));
> +                       module_dump_refcounts(module);
> +                       WARN_ON(1);
> +               }
>                 local_inc(&module->ref[get_cpu()].count);
>                 put_cpu();
>         }
> 
> module_dump_refcounts() does print out the per cpu refcounts.
> If my idea is correct the second call to  module_refcount(module) should
> have a none zero value.
> 
> And indeed:
> 
> Feb  2 03:29:51 pingi5 kernel:  module sctp refcount=0/8e
> Feb  2 03:29:51 pingi5 kernel: CPU0 refcount:38562331
> Feb  2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187
> Feb  2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374
> Feb  2 03:29:51 pingi5 kernel:
> Feb  2 03:29:51 pingi5 kernel: Call Trace: <ffffffff802837c2>{sys_accept+212} <ffffffff8019c76f>{dput+44}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff8018704f>{__fput+355} <ffffffff801a0dff>{mntput_no_expire+29}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff801845df>{filp_close+92} <ffffffff8010adba>{system_call+126}
> 
> 
> Do my findings be correct, or do I miss something ?
> 
> A other thing is, why __module_get() should be used anyway, I think it was a
> optimation long time ago, current code seems to need more cycles on default
> SMP kernels as try_module_get(), because of the big loop  in module_refcount()
> which goes trough all possible CPUs (NR_CPU, 64 in default config).
> try_module_get() only has one test and one atomic increment.
> 
> 
> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.
> 
> 
> Any comments ?
> 

Hi Karsten,

I've been seeing problems in a simple socket test harness that I think
has the same cause. I have several clients all calling a single socket
server on the same machine, under heavy load & after some time the
clients will fail to connect with -EADDRNOTAVAIL even though the server
is still running.

I changed the module ref count to be an atomic_t and the problem goes
away. I haven't tracked down the exact cause of the problem yet ( the
network code is kind of tricky!).

So I think you're right module_refcount() does need some sort of locking.

BTW On my desktop machine I cannot measure any significant performance
differences between the atomic_t refcount & the existing code, so it may
be that atomic_t will be good enough.

regards
Richard






  reply	other threads:[~2009-02-03 15:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-03 13:47 [RFC] Suspicious bug in module refcounting Karsten Keil
2009-02-03 15:02 ` richard kennedy [this message]
2009-02-04  3:48 ` Rusty Russell
2009-02-04 10:11   ` Russell King
2009-02-04 10:55     ` Rusty Russell
2009-02-04 10:59       ` Russell King
2009-02-04 16:33   ` Dan Williams
2009-02-06 22:41   ` Karsten Keil
2009-02-09 15:18   ` Michal Hocko
2009-02-10  3:15     ` Rusty Russell
2009-02-10  3:42       ` Karsten Keil
2009-02-10 10:31       ` Michal Hocko
2009-02-10 13:36         ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49885C81.2040901@rsk.demon.co.uk \
    --to=richard@rsk.demon.co.uk \
    --cc=kkeil@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.