From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754201AbZBCPCh (ORCPT ); Tue, 3 Feb 2009 10:02:37 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752808AbZBCPCb (ORCPT ); Tue, 3 Feb 2009 10:02:31 -0500 Received: from anchor-post-1.mail.demon.net ([195.173.77.132]:39405 "EHLO anchor-post-1.mail.demon.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752770AbZBCPCa (ORCPT ); Tue, 3 Feb 2009 10:02:30 -0500 Message-ID: <49885C81.2040901@rsk.demon.co.uk> Date: Tue, 03 Feb 2009 15:02:25 +0000 From: richard kennedy User-Agent: Thunderbird 2.0.0.19 (X11/20090105) MIME-Version: 1.0 To: Karsten Keil CC: linux-kernel@vger.kernel.org, Michal Hocko , Rusty Russell Subject: Re: [RFC] Suspicious bug in module refcounting References: <20090203134721.GA11069@pingi.kke.suse.de> In-Reply-To: <20090203134721.GA11069@pingi.kke.suse.de> X-Enigmail-Version: 0.95.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Karsten Keil wrote: > Hi, > > While debugging a wired SCTP bug we hit from time to time > the BUG_ON(module_refcount(module) == 0) statement in __module_get(). > After fixing the SCTP bug in final tests runs with lot of traffic > we still saw this bug message from time to time. > Looking at the refcounting in sctp, does not show up any forgotten > get module operation or some wrong put module calls. > We added some debug code and this shows only, that the test case heavly > change module_refcount(sctp) bacause the socket accept call also increase > the module count (and the BUG always was triggered in the __module_get() here). > If the socket close, it decrease the module refcount, so no problem here. > Some times the refcount seems to go very high, very quickly and later > goes down very quickly again, I think this occurs if the network stalls for > some time (the test case try to saturate a GB networklink). > So I had the idea that the bug is in the refcounting itself and not related > to the sctp code. > After looking closer at __module_get() and module_refcount(module) it looks > as I'm right, but I can not belive that this bug was not discovered before, > the code is here since long time. > The refcount is a per CPU atomic variable, module_refcount() simple add > in a fully unprotected loop (not disabled irqs, not protected against > scheduling) all per cpu values. > > The issue is that so the process (it is as syscall from a userspace process) > get scheduled in the middle of the counting loop, so it already has counted > some per cpu vales, but not all. > Now while the process is not active, other processes modify the counts, some > accepts (== module gets) increase the count on already summed up CPUs, > some other release sockets on CPUs, which are not already summed up. > If the process become active again, it read the now decremented values from > later CPUS, so the total count is too low and may reach zero, in which the above > BUG_ON() will be triggered. > > To prove this I replaced the BUG_ON with following code: > > if (module) { > - BUG_ON(module_refcount(module) == 0); > + unsigned int c = module_refcount(module); > + > + if (unlikely(c == 0)) { > + printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c, module_refcount(module)); > + module_dump_refcounts(module); > + WARN_ON(1); > + } > local_inc(&module->ref[get_cpu()].count); > put_cpu(); > } > > module_dump_refcounts() does print out the per cpu refcounts. > If my idea is correct the second call to module_refcount(module) should > have a none zero value. > > And indeed: > > Feb 2 03:29:51 pingi5 kernel: module sctp refcount=0/8e > Feb 2 03:29:51 pingi5 kernel: CPU0 refcount:38562331 > Feb 2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187 > Feb 2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374 > Feb 2 03:29:51 pingi5 kernel: > Feb 2 03:29:51 pingi5 kernel: Call Trace: {sys_accept+212} {dput+44} > Feb 2 03:29:51 pingi5 kernel: {__fput+355} {mntput_no_expire+29} > Feb 2 03:29:51 pingi5 kernel: {filp_close+92} {system_call+126} > > > Do my findings be correct, or do I miss something ? > > A other thing is, why __module_get() should be used anyway, I think it was a > optimation long time ago, current code seems to need more cycles on default > SMP kernels as try_module_get(), because of the big loop in module_refcount() > which goes trough all possible CPUs (NR_CPU, 64 in default config). > try_module_get() only has one test and one atomic increment. > > > I think we should replace all unprotected __module_get() calls with > try_module_get(), or remove __module_get() completely. > > > Any comments ? > Hi Karsten, I've been seeing problems in a simple socket test harness that I think has the same cause. I have several clients all calling a single socket server on the same machine, under heavy load & after some time the clients will fail to connect with -EADDRNOTAVAIL even though the server is still running. I changed the module ref count to be an atomic_t and the problem goes away. I haven't tracked down the exact cause of the problem yet ( the network code is kind of tricky!). So I think you're right module_refcount() does need some sort of locking. BTW On my desktop machine I cannot measure any significant performance differences between the atomic_t refcount & the existing code, so it may be that atomic_t will be good enough. regards Richard