From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S261310AbULHTEN (ORCPT ); Wed, 8 Dec 2004 14:04:13 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S261315AbULHTEN (ORCPT ); Wed, 8 Dec 2004 14:04:13 -0500
Received: from bgm-24-94-57-164.stny.rr.com ([24.94.57.164]:29346
	"EHLO localhost.localdomain") by vger.kernel.org with ESMTP
	id S261310AbULHTD6 (ORCPT ); Wed, 8 Dec 2004 14:03:58 -0500
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-6
From: Steven Rostedt
To: Rui Nuno Capela
Cc: Ingo Molnar, LKML, Lee Revell, Mark Johnson, "K.R. Foley",
	Bill Huey, Adam Heath, Florian Schmidt, Thomas Gleixner,
	Michal Schmidt, Fernando Pablo Lopez-Lezcano, Karsten Wiese,
	Gunther Persoons, emann@mrv.com, Shane Shrybman, Amit Shah,
	Esben Nielsen
In-Reply-To: <32950.192.168.1.5.1102529664.squirrel@192.168.1.5>
References: <20041116130946.GA11053@elte.hu>
	<20041116134027.GA13360@elte.hu>
	<20041117124234.GA25956@elte.hu>
	<20041118123521.GA29091@elte.hu>
	<20041118164612.GA17040@elte.hu>
	<20041122005411.GA19363@elte.hu>
	<20041123175823.GA8803@elte.hu>
	<20041124101626.GA31788@elte.hu>
	<20041203205807.GA25578@elte.hu>
	<20041207132927.GA4846@elte.hu>
	<20041207141123.GA12025@elte.hu>
	<1102526018.25841.308.camel@localhost.localdomain>
	<32950.192.168.1.5.1102529664.squirrel@192.168.1.5>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization: Kihon Technologies
Date: Wed, 08 Dec 2004 14:03:45 -0500
Message-Id: <1102532625.25841.327.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.2
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2004-12-08 at 18:14 +0000, Rui Nuno Capela wrote:
> Steven Rostedt wrote:
> >
> > I found a race condition in slab.c, but I'm still trying to figure out
> > exactly how it's playing out. This has to do with dynamic loading and
> > unloading of caches.
> > I have a small test case that simulates the problem
> > at http://home.stny.rr.com/rostedt/tests/sillycaches.tgz
> >
> > This was done on:
> >
> > # uname -r
> > 2.6.10-rc2-mm3-V0.7.32-9
> >

Found the culprit!!!

I did a diff of 2.6.10-rc2-mm3 to 2.6.10-rc2-mm3-V0.7.32-9 and found
this in slab.c:

----------------------------
+#ifndef CONFIG_PREEMPT_RT
+/*
+ * Executes in an IRQ context:
+ */
 static void do_drain(void *arg)
 {
 	kmem_cache_t *cachep = (kmem_cache_t*)arg;
 	struct array_cache *ac;
+	int cpu = smp_processor_id();

 	check_irq_off();
-	ac = ac_data(cachep);
+	ac = ac_data(cachep, cpu);
 	spin_lock(&cachep->spinlock);
 	free_block(cachep, &ac_entry(ac)[0], ac->avail);
 	spin_unlock(&cachep->spinlock);
 	ac->avail = 0;
 }
+#endif

 static void drain_cpu_caches(kmem_cache_t *cachep)
 {
+#ifndef CONFIG_PREEMPT_RT
 	smp_call_function_all_cpus(do_drain, cachep);
+#endif
 	check_irq_on();
--------------------------------

(I have CONFIG_PREEMPT_RT defined :-)

I then put in

 static void drain_cpu_caches(kmem_cache_t *cachep)
 {
 #ifndef CONFIG_PREEMPT_RT
 	smp_call_function_all_cpus(do_drain, cachep);
 #endif
 	check_irq_on();
 	spin_lock_irq(&cachep->spinlock);
+	{
+		struct array_cache *ac;
+		ac = ac_data(cachep, smp_processor_id());
+		free_block(cachep, &ac_entry(ac)[0], ac->avail);
+		ac->avail = 0;
+	}

to see what would happen, and this indeed fixed the problem. At least
it didn't cause the problem to appear after a few tests.

Obviously, this is not the right answer, and Ingo, since I don't know
exactly what you are accomplishing with the added cpu changes, I think
you are probably better at writing a patch than I.

Which brings up another point. In places like kmem_cache_create you
have cpu = _smp_processor_id(); and way down near the bottom, you use
it. Since this is a preemptible kernel, can't the process jump CPUs in
the meantime? So isn't that in itself a race condition?
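
A minimal userspace sketch of the concern above (not kernel code): a
CPU number is sampled early, the task is moved to another CPU, and the
stale number is then used to index per-CPU data. Every sim_* name is
hypothetical and exists only for this illustration. sim_get_cpu() and
sim_put_cpu() are modelled on the kernel's get_cpu()/put_cpu(), which
disable and re-enable preemption around a per-CPU access; whether that
is the appropriate remedy in the -RT tree is an assumption, not
something established here.

```c
#include <assert.h>

#define SIM_CPUS 2                       /* assumed CPU count for the simulation */

static int sim_current_cpu;              /* CPU the simulated task is running on */
static int sim_preempt_count;            /* > 0: the task may not be migrated */
static int sim_percpu_counter[SIM_CPUS]; /* stand-in for some per-CPU data */

/* Stand-in for smp_processor_id(): report the CPU we are on right now. */
static int sim_smp_processor_id(void)
{
	return sim_current_cpu;
}

/* Simulated scheduler: move the task unless it is pinned. Returns 1 if moved. */
static int sim_try_migrate(int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < SIM_CPUS);
	if (sim_preempt_count > 0)
		return 0;
	sim_current_cpu = new_cpu;
	return 1;
}

/* Modelled on get_cpu(): pin the task, then return the CPU number. */
static int sim_get_cpu(void)
{
	sim_preempt_count++;
	return sim_current_cpu;
}

/* Modelled on put_cpu(): allow migration again. */
static void sim_put_cpu(void)
{
	assert(sim_preempt_count > 0);
	sim_preempt_count--;
}

/* Racy pattern: sample the CPU early, use it later. Returns 1 if it went stale. */
static int sim_update_racy(int migrate_to)
{
	int cpu = sim_smp_processor_id();

	sim_try_migrate(migrate_to);     /* preemption strikes in between */
	sim_percpu_counter[cpu]++;       /* may touch another CPU's data */
	return cpu != sim_smp_processor_id();
}

/* Pinned pattern: the migration attempt is refused while we hold the CPU. */
static int sim_update_pinned(int migrate_to)
{
	int cpu = sim_get_cpu();
	int stale;

	sim_try_migrate(migrate_to);     /* refused: sim_preempt_count > 0 */
	sim_percpu_counter[cpu]++;
	stale = (cpu != sim_smp_processor_id());
	sim_put_cpu();
	return stale;
}
```

With the task starting on CPU 0, sim_update_racy(1) updates CPU 0's
counter while the task is already running on CPU 1, whereas
sim_update_pinned(1) stays on CPU 0 for the whole update.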

Thanks,

-- Steve

Rui,

Try adding the following in slab.c

--- slab.c	2004-12-08 09:27:10.000000000 -0500
+++ slab.c.new	2004-12-08 13:58:40.000000000 -0500
@@ -1533,6 +1533,12 @@
 #ifndef CONFIG_PREEMPT_RT
 	smp_call_function_all_cpus(do_drain, cachep);
 #endif
+	{
+		struct array_cache *ac;
+		ac = ac_data(cachep, smp_processor_id());
+		free_block(cachep, &ac_entry(ac)[0], ac->avail);
+		ac->avail = 0;
+	}
 	check_irq_on();
 	spin_lock_irq(&cachep->spinlock);
 	if (cachep->lists.shared)

and see if this fixes your usb problems. I would say that this is not
a proper fix, especially for an SMP system. But if it fixes your
problem, then you know this is the solution.
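
To illustrate why draining only the current CPU's array cache is not
enough on SMP, here is a small userspace simulation (a sketch, not
kernel code). The struct layout, the sim_* names, NR_CPUS and AC_LIMIT
are all assumptions made for the example and are much simpler than the
real slab.c structures; "freed" just counts objects handed back instead
of calling free_block().

```c
#include <assert.h>

#define NR_CPUS  4     /* assumed CPU count for the simulation */
#define AC_LIMIT 16    /* assumed per-CPU cache capacity */

/* Simplified stand-in for the kernel's struct array_cache. */
struct array_cache {
	unsigned int avail;        /* number of cached object pointers */
	void *entry[AC_LIMIT];     /* the cached objects themselves */
};

/* Simplified stand-in for kmem_cache_t: one array_cache per CPU. */
struct sim_cache {
	struct array_cache cpu[NR_CPUS];
	unsigned int freed;        /* objects handed back to the slab lists */
};

/* Park one object in a CPU's cache. Returns 0 on success, -1 if full. */
static int sim_cache_put(struct sim_cache *c, int cpu, void *obj)
{
	struct array_cache *ac;

	assert(cpu >= 0 && cpu < NR_CPUS);
	ac = &c->cpu[cpu];
	if (ac->avail >= AC_LIMIT)
		return -1;
	ac->entry[ac->avail++] = obj;
	return 0;
}

/* Drain a single CPU's cache, as the test hunk does for the current CPU. */
static void sim_drain_cpu(struct sim_cache *c, int cpu)
{
	struct array_cache *ac;

	assert(cpu >= 0 && cpu < NR_CPUS);
	ac = &c->cpu[cpu];
	c->freed += ac->avail;     /* stands in for free_block() */
	ac->avail = 0;
}

/* Drain every CPU's cache, the effect of running do_drain on all CPUs. */
static void sim_drain_all(struct sim_cache *c)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sim_drain_cpu(c, cpu);
}

/* Count the objects still parked in per-CPU caches. */
static unsigned int sim_cache_cached(const struct sim_cache *c)
{
	unsigned int n = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += c->cpu[cpu].avail;
	return n;
}
```

If one object sits in CPU 0's cache and two in CPU 1's, then
sim_drain_cpu(&c, 0) leaves sim_cache_cached(&c) at 2, and only
sim_drain_all(&c) brings it to 0. On a single CPU the two are
equivalent, which would be consistent with the hunk above helping on a
UP box while still being wrong for SMP.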