public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Kaigai Kohei <kaigai@ak.jp.nec.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>,
	"SELinux-ML(Eng)" <selinux@tycho.nsa.gov>,
	"Linux Kernel ML(Eng)" <linux-kernel@vger.kernel.org>,
	James Morris <jmorris@redhat.com>
Subject: Re: RCU issue with SELinux (Re: SELINUX performance issues)
Date: Wed, 25 Aug 2004 10:34:24 -0700	[thread overview]
Message-ID: <20040825173423.GC1244@us.ibm.com> (raw)
In-Reply-To: <024b01c48a89$26765b60$f97d220a@linux.bs1.fc.nec.co.jp>

On Wed, Aug 25, 2004 at 06:51:53PM +0900, Kaigai Kohei wrote:

Hello again, Kai,

> Hi Paul, thanks for your comments.
> 
> > > I modified the following points:
> > > - We hold the lock for hash backet when avc_insert() and avc_ss_reset() are
> > >   called for safety.
> > > - list_for_each_rcu() and list_entry() are replaced by list_for_entry().
> > 
> > One subtlety here...
> > 
> > The traversals that are protected by rcu_read_lock() (rather than an
> > update-side spinlock) need to be list_for_each_entry_rcu() rather than
> > list_for_each_entry().  The "_rcu()" is required in order to work
> > reliably on Alpha, and has the added benefit of calling out exactly
> > which traversals are RCU-protected.
> > 
> > Update-side code remains list_for_each_entry().
> 
> It was a simple misconception.
> I fixed them in the take3-patch.

Let's take them one at a time:

1.	The list_for_each_entry_rcu() in avc_hash_eval() is correct.
	It is protected by rcu_read_lock(), so it is documenting
	a RCU-protected traversal of the list, and inserting needed
	memory barriers on Alpha CPUs.

2.	The list_for_each_entry_rcu() in avc_reclaim_node() should
	instead by list_for_each_entry(), with no _rcu().  This is
	because this traversal is guarded by the update-side lock,
	not by RCU.  The fact that the lock is held means that no
	other CPU can be changing this list during the traversal.

3.	The list_for_each_entry_rcu() in avc_search_node() is correct.
	As in item #1, it is protected by rcu_read_lock().

4.	The list_for_each_entry_rcu() in avc_insert() is more interesting,
	but needs to be list_for_each_entry().  It is protected by -both-
	the spinlock and by an rcu_read_lock() acquired by the caller!

	However, the spinlock prevents any other CPU from modifying the
	data structure, so this should be list_for_each_entry() rather
	than the current list_for_each_entry_rcu().

	What is happening is that the code is upgrading from a read-side
	rcu_read_lock() to an update-side spin_lock_irqsave().  This
	is legal, though perhaps a bit unconventional.

	However, looking closer, I don't understand why the rcu_read_lock()
	needs to be re-acquired before the call to avc_insert() in
	avc_has_perm_noaudit():

	o	avc_latest_notif_update() allocates a sequence number
		under a lock.

	o	avc_get_node() allocates and initializes a node.  It does
		call avc_reclaim_node(), but this function is guarded
		by a spinlock.

	o	avc_insert() calls the above two functions, and other
		than that, initializes the new node (which no other CPU
		has access to), and inserts it under the spinlock.

	So I do not believe that avc_insert() needs rcu_read_lock().
	Unless I am missing something, the rcu_read_lock() acquired
	in avc_has_perm_noaudit() should be moved after the call to
	avc_insert().

	I was concerned about the possibility that avc_has_perm_noaudit()
	might be preempted between the rcu_read_unlock() and the
	rcu_read_lock(), but this issue appears to be correctly handled
	by the code in avc_insert(), which handles either case, updating
	the existing node or inserting a new one.

5.	The list_for_each_entry_rcu() in avc_update_node() should be
	changed to list_for_each_entry(), since it is protected
	by the update-side spinlock, and therefore does not need
	RCU protection, as in item #2 above.  This is another case
	where an rcu_read_lock() is "upgraded" to a write-side lock
	by acquiring a spinlock.

6.	The list_for_each_entry_rcu() in avc_update_cache() is correct.
	It is protected only by rcu_read_lock(), so RCU protection is
	required.

7.	The list_for_each_entry_rcu() in avc_ss_reset() needs to be
	changed to list_for_each_entry(), since it is protected by
	the update-side spinlock.  Since the array is statically allocated,
	there is no need for the enclosing rcu_read_lock() and
	rcu_read_unlock(), and they should be removed.

I clearly need to provide more documentation.  ;-)  This is in progress.

See below for a patch that applies on top of your take3 patch, since
the C code might be more clear than the English in this case.

> > > - avc_node_dual structure which contains two avc_node objects is defined. 
> > >   It allows to do avc_update_node() without kmalloc() or any locks.
> > 
> > What happens when you have two consecutive updates to the same object?
> > Don't you have to defer the second update until a grace period has
> > elapsed since the first update in order to avoid confusing readers that
> > are still accessing the original version?
> 
> I didn't imagine such a situation. Indeed, such thing may happen.
> 
> > One way to do this would be to set a "don't-touch-me" bit that is
> > cleared by an RCU callback.  An update to an element with the
> > "don't-touch-me" bit set would block until the bit clears.  There
> > are probably better ways...
> 
> I think we can't apply this approach for the implementation
> of avc_update_node(), because execution context isn't permitted to block.

That would certainly invalidate my suggestion!  ;-)

> I changed my opinion and implementation of avc_update_node().
> If kmalloc() returns NULL in avc_update_node(), it returns -ENOMEM.
> 
> But this effect of changing the prototype is limited, because only
> avc_has_perm_noaudit() and avc_update_cache() call avc_update_node().
> 
> Even if avc_update_node() return -ENOMEM to avc_has_perm_noaudit(),
> avc_has_perm_noaudit() can ignore it, because the purpose is only
> to control the audit-log floods.
> This adverse effect is only that audit-logs are printed twice.
> 
> Nobody calls avc_update_cache(), which is only defined.

Sounds good!

> Some other trivial fixes are as follows:
> - All list_for_each_entry() were replaced by list_for_each_entry_rcu().

See above.  ;-)

> - All spin_lock()/spin_unlock() were replaced by spin_lock_irqsave()
>   /spin_unlock_restore().
> - In avc_node_insert(), if an entry with the same ssid/tsid/tclass as new
>   one exists, the older entry is replaced by the new one.
> 
> Thank you for the opinion as a specialist of RCU!

Thank -you- for giving RCU a try!  Again, the performance results are
quite impressive!

						Thanx, Paul


diff -urpN -X dontdiff linux-2.6.8.1-selinux3.rcu/security/selinux/avc.c linux-2.6.8.1-selinux3.rcu.pem/security/selinux/avc.c
--- linux-2.6.8.1-selinux3.rcu/security/selinux/avc.c	Wed Aug 25 09:34:26 2004
+++ linux-2.6.8.1-selinux3.rcu.pem/security/selinux/avc.c	Wed Aug 25 10:30:10 2004
@@ -253,7 +253,7 @@ static inline int avc_reclaim_node(void)
 		if (!spin_trylock(&avc_cache.slots_lock[hvalue]))
 			continue;
 
-		list_for_each_entry_rcu(node, &avc_cache.slots[hvalue], list) {
+		list_for_each_entry(node, &avc_cache.slots[hvalue], list) {
 			if (!atomic_dec_and_test(&node->ae.used)) {
 				/* Recently Unused */
 				list_del_rcu(&node->list);
@@ -419,7 +419,7 @@ struct avc_node *avc_insert(u32 ssid, u3
 		node->ae.avd.seqno = ae->avd.seqno;
 
 		spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
-		list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list) {
+		list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
 			if (pos->ae.ssid == ssid &&
 			    pos->ae.tsid == tsid &&
 			    pos->ae.tclass == tclass) {
@@ -719,7 +719,7 @@ static int avc_update_node(u32 event, u3
 	hvalue = avc_hash(ssid, tsid, tclass);
 	spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
 
-	list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list){
+	list_for_each_entry(pos, &avc_cache.slots[hvalue], list){
 		if ( ssid==pos->ae.ssid &&
 		     tsid==pos->ae.tsid &&
 		     tclass==pos->ae.tclass ){
@@ -912,16 +912,14 @@ int avc_ss_reset(u32 seqno)
 
 	avc_hash_eval("reset");
 
-	rcu_read_lock();
 	for (i = 0; i < AVC_CACHE_SLOTS; i++) {
 		spin_lock_irqsave(&avc_cache.slots_lock[i], flag);
-		list_for_each_entry_rcu(node, &avc_cache.slots[i], list){
+		list_for_each_entry(node, &avc_cache.slots[i], list){
 			list_del_rcu(&node->list);
 			call_rcu(&node->rhead, avc_node_free);
 		}
 		spin_unlock_irqrestore(&avc_cache.slots_lock[i], flag);
 	}
-	rcu_read_unlock();
 
 	for (i = 0; i < AVC_NSTATS; i++)
 		avc_cache_stats[i] = 0;

      reply	other threads:[~2004-08-25 17:38 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-16  9:33 RCU issue with SELinux (Re: SELINUX performance issues) Kaigai Kohei
2004-08-16 15:19 ` James Morris
2004-08-20 13:36   ` Kaigai Kohei
2004-08-20 14:53     ` James Morris
2004-08-24  7:27       ` Kaigai Kohei
2004-08-24 13:24         ` James Morris
2004-08-25  9:51           ` Kaigai Kohei
2004-08-25 18:31             ` James Morris
2004-08-25  9:52           ` [PATCH]atomic_inc_return() for i386/x86_64 (Re: RCU issue with SELinux) Kaigai Kohei
2004-08-20 17:31     ` RCU issue with SELinux (Re: SELINUX performance issues) Luke Kenneth Casson Leighton
2004-08-20 18:15       ` James Morris
2004-08-20 20:19     ` Paul E. McKenney
2004-08-20 20:35       ` James Morris
2004-08-24  7:27       ` Kaigai Kohei
     [not found]     ` <1093014789.16585.186.camel@moss-spartans.epoch.ncsc.mil>
2004-08-24  7:25       ` Kaigai Kohei
2004-08-24 15:37         ` Stephen Smalley
2004-08-25  9:51           ` Kaigai Kohei
2004-08-25 15:50             ` Stephen Smalley
2004-08-25 16:11               ` Stephen Smalley
2004-08-26  7:53               ` Kaigai Kohei
2004-08-26 13:24                 ` Stephen Smalley
2004-08-27 11:07                   ` Kaigai Kohei
2004-08-30 11:17                   ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
2004-08-30 15:35                     ` Stephen Smalley
2004-08-30 16:13                       ` Paul E. McKenney
2004-08-31  4:33                         ` Kaigai Kohei
2004-08-31 16:20                           ` Paul E. McKenney
2004-08-31 15:33                     ` James Morris
2004-08-24 23:02         ` RCU issue with SELinux (Re: SELINUX performance issues) Paul E. McKenney
2004-08-25  9:51           ` Kaigai Kohei
2004-08-25 17:34             ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040825173423.GC1244@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=jmorris@redhat.com \
    --cc=kaigai@ak.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sds@epoch.ncsc.mil \
    --cc=selinux@tycho.nsa.gov \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox