Date: Thu, 18 Jun 2009 08:28:54 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tetsuo Handa
Cc: linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] TOMOYO: Add garbage collector support. (v3)
Message-ID: <20090618152854.GB6794@linux.vnet.ibm.com>
In-Reply-To: <200906180534.n5I5YgMY080614@www262.sakura.ne.jp>

On Thu, Jun 18, 2009 at 02:34:42PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Paul E. McKenney wrote:
> > Consider the following sequence of events:
> >
> > o	CPU 0 picks up users_counter_idx into local variable idx.
> >	Let's assume that the value is zero.
> >
> > o	CPU 0 is now preempted, interrupted, or otherwise delayed.
> >
> > o	CPU 1 starts garbage collection, finding some elements to
> >	delete, thus setting "element_deleted" to true.
> >
> > o	CPU 1 continues garbage collection, inverting the value of
> >	users_counter_idx, so that the value is now one, waiting
> >	for the value-zero readers, and freeing up the old elements. [1]
> >
> > o	CPU 0 continues execution, first atomically incrementing
> >	users_counter[0], then traversing the list, possibly sleeping.
> >
> > o	CPU 2 starts a new round of garbage collection, again finding
> >	some elements to delete, and thus again setting
> >	"element_deleted" to true.  One of the elements deleted
> >	is the one that CPU 0 is currently referencing while asleep.
>
> No.  CPU 2 can't start a new round of GC, because the GC function is
> executed exclusively under the gc_mutex mutex.

But CPU 1 would have released gc_mutex back at time [1], right?

> > o	CPU 2 continues garbage collection, inverting the value of
> >	users_counter_idx, so that the value is now zero, waiting
> >	for the value-one readers, and freeing up the old elements.
> >	Note that CPU 0 is a value-zero reader, so that CPU 2 will
> >	not wait on it.
> >
> >	CPU 2 therefore kfree()s the element that CPU 0 is currently
> >	referencing.
>
> CPU 2 won't continue GC, for CPU 2 can't start a new round of GC.

I still don't see why CPU 1 would not have released gc_mutex back at
point [1].

> > o	CPU 0 wakes up, and suffers possibly fatal disappointment upon
> >	attempting to reference an element that has been freed -- and,
> >	worse yet, possibly re-allocated as some other type of
> >	structure.
>
> CPU 0 won't suffer, for the first round of GC (by CPU 1) prevents CPU 2
> from starting a new round of GC.

Why would CPU 1 be unable to complete its round of GC, thus releasing
gc_mutex, thus allowing CPU 2 to start a new one?  For that matter,
CPU 1 itself could start a new round, correct?
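To make sure we are arguing about the same thing, here is a minimal
sketch of a single GC round as I understand it from your pseudocode
below -- tomoyo_gc_round() and the surrounding declarations are names
invented for illustration, not code from your patch:

	static DEFINE_MUTEX(gc_mutex);
	static atomic_t users_counter[2];
	static atomic_t users_counter_idx;

	static void tomoyo_gc_round(void)
	{
		int idx;

		mutex_lock(&gc_mutex);
		/* ... unlink doomed elements from the lists via RCU ... */
		idx = atomic_read(&users_counter_idx);
		atomic_set(&users_counter_idx, idx ^ 1);
		/* Wait for readers still using the old counter. */
		while (atomic_read(&users_counter[idx]))
			msleep(1000);
		/* ... kfree() the elements unlinked above ... */
		mutex_unlock(&gc_mutex);  /* point [1]: next round may begin */
	}

The gc_mutex serializes GC rounds against each other, but once
mutex_unlock() executes at point [1], nothing prevents a new round from
starting while a reader that sampled the old index is still asleep in
the middle of its traversal.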
> > Or am I missing something in your pseudocode?
>
> I think you missed that the GC function is executed exclusively.
>
> The race between readers and GC is avoided as below.

If you can tell me why CPU 1 cannot release gc_mutex, I will look at the
following.  Until then, I will stand by my scenario above.  ;-)

> (a-1) A reader reads users_counter_idx and saves to r_idx
> (a-2) GC removes element from the list using RCU
> (a-3) GC reads users_counter_idx and saves to g_idx
> (a-4) GC inverts users_counter_idx
> (a-5) GC releases the removed element
> (a-6) A reader increments users_counter[r_idx]
> (a-7) A reader won't see the element removed by GC because
>       the reader has not started list traversal as of (a-2)
>
> (b-1) A reader reads users_counter_idx and saves to r_idx
> (b-2) A reader increments users_counter[r_idx]
> (b-3) GC removes element from the list using RCU
> (b-4) A reader won't see the element removed by GC
> (b-5) GC reads users_counter_idx and saves to g_idx
> (b-6) GC inverts users_counter_idx
> (b-7) GC waits for users_counter[g_idx] to become 0
> (b-8) A reader decrements users_counter[r_idx]
> (b-9) GC releases the removed element
>
> (c-1) A reader reads users_counter_idx and saves to r_idx
> (c-2) A reader increments users_counter[r_idx]
> (c-3) A reader sees the element
> (c-4) GC removes element from the list using RCU
> (c-5) GC reads users_counter_idx and saves to g_idx
> (c-6) GC inverts users_counter_idx
> (c-7) GC waits for users_counter[g_idx] to become 0
> (c-8) A reader decrements users_counter[r_idx]
> (c-9) GC releases the removed element
>
> What worries me is that some memory barriers might be needed between
>
> > > {
> > >	/* Get counter index. */
> > >	int idx = atomic_read(&users_counter_idx);
> > >	/* Lock counter. */
> > >	atomic_inc(&users_counter[idx]);
> - here -
> > >	list_for_each_entry_rcu() {
> > >		... /* Allowed to sleep. */
> > >	}
> - here -
> > >	/* Unlock counter. */
> > >	atomic_dec(&users_counter[idx]);
> > > }
>
> and
>
> > > if (element_deleted) {
> > >	/* Swap active counter. */
> > >	const int idx = atomic_read(&users_counter_idx);
> - here -
> > >	atomic_set(&users_counter_idx, idx ^ 1);
> - here -
> > >	/*
> > >	 * Wait for readers who are using the previously active counter.
> > >	 * This is similar to synchronize_rcu(), but this code allows
> > >	 * readers to do operations which may sleep.
> > >	 */
> > >	while (atomic_read(&users_counter[idx]))
> > >		msleep(1000);
> > >	/*
> > >	 * Nobody is using the previously active counter.
> > >	 * Ready to release memory of elements removed before the
> > >	 * previously active counter became inactive.
> > >	 */
> > >	kfree(element);
> > > }
>
> in order to enforce ordering.

Quite possibly.  One of the advantages of using things like SRCU is
that the necessary memory barriers are already in place.  (knock wood)
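Purely as an untested sketch, one plausible placement of those barriers
-- mirroring what classic SRCU does internally, and using p and the_list
as placeholders -- would be:

	/* Reader: make the increment visible before the traversal
	 * begins, and complete the traversal before the decrement
	 * can be seen. */
	int idx = atomic_read(&users_counter_idx);

	atomic_inc(&users_counter[idx]);
	smp_mb();  /* pairs with the GC barrier after list removal */
	list_for_each_entry_rcu(p, &the_list, list) {
		... /* Allowed to sleep. */
	}
	smp_mb();  /* finish traversal before the decrement is seen */
	atomic_dec(&users_counter[idx]);

	/* GC: make the removals visible before the index flips, and
	 * the flip visible before sampling the old counter. */
	if (element_deleted) {
		const int idx = atomic_read(&users_counter_idx);

		smp_mb();  /* removals visible before the flip */
		atomic_set(&users_counter_idx, idx ^ 1);
		smp_mb();  /* flip visible before sampling the counter */
		while (atomic_read(&users_counter[idx]))
			msleep(1000);
		smp_mb();  /* sample the counter before the kfree()s */
		kfree(element);
	}

The idea is the classic store-buffering pattern: either the GC sees the
reader's atomic_inc() and waits for it, or the reader sees the GC's list
removal and never touches the doomed element.  The full barriers on both
sides are what forbid the case where neither sees the other.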
> > Also, if you have lots of concurrent readers, you can suffer high
> > memory contention on the users_counter[] array, correct?
>
> Excuse me.  I couldn't understand "memory contention"...
>
> ( http://www.answers.com/topic/memory-contention )
> | A situation in which two different programs, or two parts of a program,
> | try to read items in the same block of memory at the same time.
>
> Why would readers suffer by doing atomic_read() at the same time?
> Cache invalidation by atomic_inc()/atomic_dec() of a shared variable?

Yes, cache invalidation by atomic_inc()/atomic_dec() of a shared variable
can definitely result in memory contention and extremely bad performance.

> ( http://wiki.answers.com/Q/What_is_memory_contention )
> | Memory contention is a state a OS memory manager can reside in when to
> | many memory requests (alloc, realloc, free) are issued to it from an
> | active application possibly leading to a DOS condition specific to that
> | application.
>
> There is no memory allocation for the users_counter[] array.

This is not the type of memory contention I was thinking of.

> > I recommend that you look into use of SRCU in this case.
>
> I have one worry regarding SRCU.
> Inside synchronize_srcu(), there is a loop
>
>	while (srcu_readers_active_idx(sp, idx))
>		schedule_timeout_interruptible(1);
>
> but a reader's sleeping duration varies from less than one second to
> more than hours.
>
> Checking the counters every jiffy sounds like too much wasted CPU.
> Delaying kfree() for seconds or minutes won't cause trouble for TOMOYO.
> It would be nice if the checking interval were configurable, like
> "schedule_timeout_interruptible(sp->timeout);".

This would not be a difficult change, and it certainly would make sense
in your case.  I would be happy to review a patch from you, or to create
a patch to SRCU myself if you would prefer.

I would add a field as you say, and add an API element:

	void set_srcu_timeout(struct srcu_struct *sp, unsigned long timeout)

The timeout would default to 1, and a value of 0 would be interpreted
as 1.  In your case, perhaps you would do the following after
initializing the srcu_struct:

	set_srcu_timeout(&tomoyo_srcu, HZ);

Would this work?

> > Anyway, the general approach would be to make changes to your code
> > roughly as follows:
> >
> > 1.	Replace your users_counter and users_counter_idx with a
> >	struct srcu_struct.
> >
> > 2.	In the reader, replace the fetch from users_counter_idx and
> >	the atomic_inc() with srcu_read_lock().
> >
> > 3.	In the garbage collector, replace the fetch/update of
> >	users_counter_idx and the "while" loop with synchronize_srcu().
>
> I see.  Since I have isolated the GC into a dedicated kernel thread,
> writers no longer wait for elements to be kfree()ed.  I can use SRCU.

Very good!

> > Or is there some reason why SRCU does not work for you?
>
> None for the mainline version.
>
> I'm also maintaining TOMOYO for older/distributor kernels, for those who
> want to enable both SELinux/SMACK/AppArmor/grsecurity etc. and TOMOYO at
> the same time.  Thus, if my idea works, I want to backport it to TOMOYO
> for these kernels.

SRCU has been in the kernel since 2.6.19, but would be easy to backport.
If you have a recent git tree, run "gitk kernel/srcu.c include/linux/srcu.h".
You will find four commits that are pretty well isolated -- you would need
the changes to kernel/srcu.c, include/linux/srcu.h, and kernel/Makefile.

							Thanx, Paul
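P.S.  For concreteness, steps 1-3 above come out roughly as follows.
This is an untested sketch: tomoyo_srcu matches the example earlier in
this message, p/the_list/element are placeholders, and set_srcu_timeout()
is the API proposed above, which does not yet exist in mainline.

	static struct srcu_struct tomoyo_srcu;

	/* Once, at initialization time: */
	init_srcu_struct(&tomoyo_srcu);
	set_srcu_timeout(&tomoyo_srcu, HZ);  /* proposed API, not mainline */

	/* Reader: replaces the index fetch and the atomic_inc()/_dec(). */
	idx = srcu_read_lock(&tomoyo_srcu);
	list_for_each_entry_rcu(p, &the_list, list) {
		... /* Still allowed to sleep. */
	}
	srcu_read_unlock(&tomoyo_srcu, idx);

	/* GC: replaces the index flip and the polling loop. */
	list_del_rcu(&element->list);
	synchronize_srcu(&tomoyo_srcu);  /* waits for pre-existing readers */
	kfree(element);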