From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] TOMOYO: Add garbage collector support. (v3)
Date: Thu, 18 Jun 2009 08:28:54 -0700 [thread overview]
Message-ID: <20090618152854.GB6794@linux.vnet.ibm.com> (raw)
In-Reply-To: <200906180534.n5I5YgMY080614@www262.sakura.ne.jp>
On Thu, Jun 18, 2009 at 02:34:42PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Paul E. McKenney wrote:
> > Consider the following sequence of events:
> >
> > o CPU 0 picks up users_counter_idx int local variable idx.
> > Let's assume that the value is zero.
> >
> > o CPU 0 is now preempted, interrupted, or otherwise delayed.
> >
> > o CPU 1 starts garbage collection, finding some elements to
> > delete, thus setting "element_deleted" to true.
> >
> > o CPU 1 continues garbage collection, inverting the value of
> > users_counter_idx, so that the value is now one, waiting
> > for the value-zero readers, and freeing up the old elements.
[1]
> > o CPU 0 continues execution, first atomically incrementing
> > users_counter[0], then traversing the list, possibly sleeping.
> >
> > o CPU 2 starts a new round of garbage collection, again finding
> > some elements to delete, and thus again setting
> > "elements_deleted" to true. One of the elements deleted
> > is the one that CPU 0 is currently referencing while asleep.
> >
> No. CPU 2 can't start a new round of GC because GC function is exclusively
> executed because of gc_mutex mutex.
But CPU 1 would have released gc_mutex back at time [1], right?
> > o CPU 2 continues garbage collection, inverting the value of
> > users_counter_idx, so that the value is now zero, waiting
> > for the value-one readers, and freeing up the old elements.
> > Note that CPU 0 is a value-zero reader, so that CPU 2 will
> > not wait on it.
> >
> > CPU 2 therefore kfree()s the element that CPU 0 is currently
> > referencing.
> >
> CPU 2 won't continue GC, for CPU 2 can't start a new round of GC.
I still don't see why CPU 0 would not have released gc_mutex back
at point [1].
> > o CPU 0 wakes up, and suffers possibly fatal disappointment upon
> > attempting to reference an element that has been freed -- and,
> > worse yet, possibly re-allocated as some other type of
> > structure.
> >
> CPU 0 won't suffer, for first round of GC (by CPU 1) prevents CPU 2 from
> starting a new round of GC.
Why would CPU 1 be unable to complete its round of GC, thus releasing
gc_mutex, thus allowing CPU 2 from starting a new one? For that matter,
CPU 1 could start a new round, correct?
> > Or am I missing something in your pseudocode?
> I think you missed that GC function is executed exclusively.
>
> The race between readers and GC is avoided as below.
If you can tell me why CPU 1 cannot release gc_mutex, I will look at
the following. Until then, I will stand by my scenario above. ;-)
> (a-1) A reader reads users_counter_idx and saves to r_idx
> (a-2) GC removes element from the list using RCU
> (a-3) GC reads users_counter_idx and saves to g_idx
> (a-4) GC inverts users_counter_idx
> (a-5) GC releases the removed element
> (a-6) A reader increments users_counter[r_idx]
> (a-7) A reader won't see the element removed by GC because
> the reader has not started list traversal as of (a-2)
>
> (b-1) A reader reads users_counter_idx and saves to r_idx
> (b-2) A reader increments users_counter[r_idx]
> (b-3) GC removes element from the list using RCU
> (b-4) A reader won't see the element removed by GC
> (b-5) GC reads users_counter_idx and saves to g_idxo
> (b-6) GC inverts users_counter_idx
> (b-7) GC waits for users_counter[g_idx] to become 0
> (b-8) A reader decrements users_counter[r_idx]
> (b-9) GC releases the removed element
>
> (c-1) A reader reads users_counter_idx and saves to r_idx
> (c-2) A reader increments users_counter[r_idx]
> (c-3) A reader sees the element
> (c-4) GC removes element from the list using RCU
> (c-5) GC reads users_counter_idx and saves to g_idx
> (c-6) GC inverts users_counter_idx
> (c-7) GC waits for users_counter[g_idx] to become 0
> (c-8) A reader decrements users_counter[r_idx]
> (c-9) GC releases the removed element
>
> What I worry is that some memory barriers might be needed between
>
> > > {
> > > /* Get counter index. */
> > > int idx = atomic_read(&users_counter_idx);
> > > /* Lock counter. */
> > > atomic_inc(&users_counter[idx]);
> - here -
> > > list_for_each_entry_rcu() {
> > > ... /* Allowed to sleep. */
> > > }
> - here -
> > > /* Unlock counter. */
> > > atomic_dec(&users_counter[idx]);
> > > }
>
> and
>
> > > if (element_deleted) {
> > > /* Swap active counter. */
> > > const int idx = atomic_read(&users_counter_idx);
> - here -
> > > atomic_set(&users_counter_idx, idx ^ 1);
> - here -
> > > /*
> > > * Wait for readers who are using previously active counter.
> > > * This is similar to synchronize_rcu() while this code allows
> > > * readers to do operations which may sleep.
> > > */
> > > while (atomic_read(&users_counter[idx]))
> > > msleep(1000);
> > > /*
> > > * Nobody is using previously active counter.
> > > * Ready to release memory of elements removed before
> > > * previously active counter became inactive.
> > > */
> > > kfree(element);
> > > }
>
> in order to enforce ordering.
Quite possibly. One of the advantages of using things like SRCU is that
the necessary memory barriers are already in place. (knock wood)
> > Also, if you have lots of concurrent readers, you can suffer high memory
> > contention on the users_counter[] array, correct?
>
> Excuse me. I couldn't understand "memory contention"...
>
> ( http://www.answers.com/topic/memory-contention )
> | A situation in which two different programs, or two parts of a program,
> | try to read items in the same block of memory at the same time.
> Why suffered by atomic_read() at the same time?
> Cache invalidation by atomic_inc()/atomic_dec() a shared variable?
Yes, cache invalidation by atomic_inc()/atomic_dec() of a shared
variable can definitly result in memory contention and extremely
bad performance.
> ( http://wiki.answers.com/Q/What_is_memory_contention )
> | Memory contention is a state a OS memory manager can reside in when to many
> | memory requests (alloc, realloc, free) are issued to it from an active
> | application possibly leading to a DOS condition specific to that
> | application.
> No memory allocation for users_counter[] array.
This is not the type of memory contention I was thinking of.
> > I recommend that you look into use of SRCU in this case.
>
> I have one worry regarding SRCU.
> Inside synchronize_srcu(), there is a loop
>
> while (srcu_readers_active_idx(sp, idx))
> schedule_timeout_interruptible(1);
>
> but the reader's sleeping duration varies from less than one second to
> more than hours.
>
> Checking for counters for every jiffies sounds too much waste of CPU.
> Delaying kfree() for seconds or minutes won't cause troubles for TOMOYO.
> It would be nice if checking interval is configurable like
> "schedule_timeout_interruptible(sp->timeout);".
This would not be a difficult change, and certainly would make sense in
your case. I would be happy to review a patch from you, or to create a
patch to SRCU if you would prefer.
I would add a field as you say, and add an API element:
void set_srcu_timeout(struct srcu_struct *sp, unsigned long timeout)
The timeout would default to 1, and a value of 0 would be interpreted
as 1. In your case, perhaps you would do the following after initializing
the srcu_struct:
set_srcu_timeout(&tomoyo_srcu, HZ);
Would this work?
> > Anyway, the general approach would be to make changes to your code
> > roughly as follows:
> >
> > 1. replace your users_counter and users_counter_idx with a
> > struct srcu_struct.
> >
> > 2. In the reader, replace the fetch from users_counter_idx and
> > the atomic_inc() with srcu_read_lock().
> >
> > 3. In the garbage collector, replace the fetch/update of
> > users_counter_idx and the "while" loop with synchronize_srcu().
> >
> I see. Since I isolated the GC as a dedicated kernel thread, writers no longer
> wait for elements to be kfree()ed. I can use SRCU.
Very good!
> > Or is there some reason why SRCU does not work for you?
> None for mainline version.
>
> I'm also maintaining TOMOYO for older/distributor kernels for those who want to
> enable both SELinux/SMACK/AppArmor/grsecurity etc. and TOMOYO at the same time.
> Thus, if my idea works, I want to backport it to TOMOYO for these kernels.
SRCU has been in the kernel since 2.6.18, but would be easy to
backport. If you have a recent git tree, run "gitk kernel/srcu.c
include/linux/srcu.h". You will find four commits that are pretty
well isolated -- you would need the changes to kernel/srcu.c,
include/linux/srcu.h, and kernel/Makefile.
Thanx, Paul
next prev parent reply other threads:[~2009-06-18 15:29 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-17 11:19 [PATCH] TOMOYO: Add garbage collector support. (v3) Tetsuo Handa
2009-06-17 11:21 ` [PATCH 1/3] TOMOYO: Move sleeping operations to outside the semaphore Tetsuo Handa
2009-06-17 11:22 ` [PATCH 2/3] TOMOYO: Replace tomoyo_save_name() with tomoyo_get_name()/tomoyo_put_name() Tetsuo Handa
2009-06-17 11:23 ` [PATCH 3/3] TOMOYO: Add RCU-like garbage collector Tetsuo Handa
2009-06-17 12:28 ` [PATCH] TOMOYO: Add garbage collector support. (v3) Peter Zijlstra
2009-06-17 16:31 ` Paul E. McKenney
2009-06-18 5:34 ` Tetsuo Handa
2009-06-18 6:45 ` [PATCH 3/3] TOMOYO: Add SRCU based garbage collector Tetsuo Handa
2009-06-18 16:05 ` Paul E. McKenney
2009-06-18 15:28 ` Paul E. McKenney [this message]
2009-06-19 4:57 ` [PATCH] TOMOYO: Add garbage collector support. (v3) Tetsuo Handa
2009-06-20 1:28 ` Paul E. McKenney
2009-06-20 7:04 ` Tetsuo Handa
2009-06-21 4:07 ` Paul E. McKenney
-- strict thread matches above, loose matches on Subject: below --
2009-06-02 1:39 Tetsuo Handa
2009-06-02 1:57 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090618152854.GB6794@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=penguin-kernel@i-love.sakura.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.