public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2.6.13-stable] cpuset semaphore double trip fix
@ 2005-09-10  0:44 Paul Jackson
  2005-09-10  1:56 ` Randy.Dunlap
  2005-09-10  3:01 ` Chris Wright
  0 siblings, 2 replies; 7+ messages in thread
From: Paul Jackson @ 2005-09-10  0:44 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrew Morton, Simon Derr, Paul Jackson, linux-kernel,
	Linus Torvalds

Code reading uncovered a potential deadlock on the global cpuset
semaphore, cpuset_sem.

==> This patch is only useful in the 2.6.13-stable series.

    (It's harmless, and useless, in the pre 2.6.14 fork)

The pre-2.6.14 fork has already diverged, with an additional patch
that further aggrevated this problem, and a more thorough overhaul
of the cpuset locking, to fix the problems.

All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
grab cpuset_sem and then allocate memory _must_ call the routine
'refresh_mems()', after getting cpuset_sem, before any possible
allocation.

If this refresh_mems() call is not done, then there is a risk that one
of the cpuset_zone_allowed() calls made from within the page allocator
(__alloc_pages) will find that the mems_generation of the current task
doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
Since it already held cpuset_sem, this deadlocks that task, and any
subsequent task wanting cpuset_sem.

==> The code paths leading to the kmalloc in check_for_release(), from
    cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
    fail to invoke refresh_mems() as required.

    The fix is easy enough - add the requisite refresh_mems() call.

Unless someone is rapidly creating, modifying and destroying cpusets,
they are unlikely to have any chance of encountering this deadlock.
And even then, it is apparently difficult to do so.

In the case we got here from cpuset_exit(), we have already torn
down the tasks connection to this cpuset and current->cpuset is NULL.
Don't call refresh_mems() in that case - it oops the kernel.

Signed-off-by: Paul Jackson <pj@sgi.com>

Index: linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
===================================================================
--- linux-2.6.13-mem_exclusive_oom.orig/kernel/cpuset.c
+++ linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
@@ -458,7 +458,10 @@ static void check_for_release(struct cpu
 	if (notify_on_release(cs) && atomic_read(&cs->count) == 0 &&
 	    list_empty(&cs->children)) {
 		char *buf;
+		static void refresh_mems(void);
 
+		if (current->cpuset)
+			refresh_mems();
 		buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 		if (!buf)
 			return;

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  0:44 [PATCH 2.6.13-stable] cpuset semaphore double trip fix Paul Jackson
@ 2005-09-10  1:56 ` Randy.Dunlap
  2005-09-10  2:31   ` Paul Jackson
  2005-09-10  3:01 ` Chris Wright
  1 sibling, 1 reply; 7+ messages in thread
From: Randy.Dunlap @ 2005-09-10  1:56 UTC (permalink / raw)
  To: Paul Jackson, stable; +Cc: chrisw, akpm, Simon.Derr, pj, linux-kernel, torvalds

On Fri, 9 Sep 2005 17:44:03 -0700 (PDT) Paul Jackson wrote:

> Code reading uncovered a potential deadlock on the global cpuset
> semaphore, cpuset_sem.
> 
> ==> This patch is only useful in the 2.6.13-stable series.

Paul,

stable patch candidates should be sent to stable@kernel.org

Chris/Greg,
maybe that should be listed in MAINTAINERS as well as in
Documentation/stable_kernel_rules.txt (it's here already).


>     (It's harmless, and useless, in the pre 2.6.14 fork)
> 
> The pre-2.6.14 fork has already diverged, with an additional patch
> that further aggrevated this problem, and a more thorough overhaul
> of the cpuset locking, to fix the problems.
> 
> All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
> grab cpuset_sem and then allocate memory _must_ call the routine
> 'refresh_mems()', after getting cpuset_sem, before any possible
> allocation.
> 
> If this refresh_mems() call is not done, then there is a risk that one
> of the cpuset_zone_allowed() calls made from within the page allocator
> (__alloc_pages) will find that the mems_generation of the current task
> doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
> Since it already held cpuset_sem, this deadlocks that task, and any
> subsequent task wanting cpuset_sem.
> 
> ==> The code paths leading to the kmalloc in check_for_release(), from
>     cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
>     fail to invoke refresh_mems() as required.
> 
>     The fix is easy enough - add the requisite refresh_mems() call.
> 
> Unless someone is rapidly creating, modifying and destroying cpusets,
> they are unlikely to have any chance of encountering this deadlock.
> And even then, it is apparently difficult to do so.
> 
> In the case we got here from cpuset_exit(), we have already torn
> down the tasks connection to this cpuset and current->cpuset is NULL.
> Don't call refresh_mems() in that case - it oops the kernel.
> 
> Signed-off-by: Paul Jackson <pj@sgi.com>
> 
> Index: linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.13-mem_exclusive_oom.orig/kernel/cpuset.c
> +++ linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
> @@ -458,7 +458,10 @@ static void check_for_release(struct cpu
>  	if (notify_on_release(cs) && atomic_read(&cs->count) == 0 &&
>  	    list_empty(&cs->children)) {
>  		char *buf;
> +		static void refresh_mems(void);
>  
> +		if (current->cpuset)
> +			refresh_mems();
>  		buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  		if (!buf)
>  			return;
> 

---
~Randy

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  1:56 ` Randy.Dunlap
@ 2005-09-10  2:31   ` Paul Jackson
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Jackson @ 2005-09-10  2:31 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: stable, chrisw, akpm, Simon.Derr, linux-kernel, torvalds

> Documentation/stable_kernel_rules.txt 

Aha - that's the stable documentation that I missed.  Thanks, Randy.

I figured that there was an explanation somewhere.  And there it is,
right in plain view.

Reading ...

My patch does pass the following criteria:

 - No "theoretical race condition" issues, unless an explanation of how
   the race can be exploited.

The conditions to trigger the race are too delicate to exploit
quickly.  I don't have a coded exploit.

Apparently I did the right thing by _not_ sending this patch to
stable@kernel.org <grin>.

This patch is dead.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  0:44 [PATCH 2.6.13-stable] cpuset semaphore double trip fix Paul Jackson
  2005-09-10  1:56 ` Randy.Dunlap
@ 2005-09-10  3:01 ` Chris Wright
  2005-09-10  3:20   ` Paul Jackson
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Wright @ 2005-09-10  3:01 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Chris Wright, Andrew Morton, Simon Derr, linux-kernel,
	Linus Torvalds, stable

Thanks Paul.  As Randy mentioned, please send these to stable@kernel.org
in the future.

* Paul Jackson (pj@sgi.com) wrote:
> Code reading uncovered a potential deadlock on the global cpuset
> semaphore, cpuset_sem.

Another 'by inspection' patch, perhaps we'll need to update the stable
rules, since these can be quite valid fixes, yet typically trigger
review replies asking if it's necessary for -stable.

> ==> This patch is only useful in the 2.6.13-stable series.
> 
>     (It's harmless, and useless, in the pre 2.6.14 fork)
> 
> The pre-2.6.14 fork has already diverged, with an additional patch
> that further aggrevated this problem, and a more thorough overhaul
> of the cpuset locking, to fix the problems.
> 
> All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
> grab cpuset_sem and then allocate memory _must_ call the routine
> 'refresh_mems()', after getting cpuset_sem, before any possible
> allocation.
> 
> If this refresh_mems() call is not done, then there is a risk that one
> of the cpuset_zone_allowed() calls made from within the page allocator
> (__alloc_pages) will find that the mems_generation of the current task
> doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
> Since it already held cpuset_sem, this deadlocks that task, and any
> subsequent task wanting cpuset_sem.
> 
> ==> The code paths leading to the kmalloc in check_for_release(), from
>     cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
>     fail to invoke refresh_mems() as required.
> 
>     The fix is easy enough - add the requisite refresh_mems() call.
> 
> Unless someone is rapidly creating, modifying and destroying cpusets,
> they are unlikely to have any chance of encountering this deadlock.
> And even then, it is apparently difficult to do so.

How unlikely?  So unlikely that it's more a theoreitical race, or did
you find ways to trigger?  If it's purely theoretical then it's not a
good candidiate for -stable.

> In the case we got here from cpuset_exit(), we have already torn
> down the tasks connection to this cpuset and current->cpuset is NULL.
> Don't call refresh_mems() in that case - it oops the kernel.

Is this one well-tested, since the fix diverges from upstream?  And one
minor nit, let's just do a real forward declaration of refresh_mems() 
instead of local to check_for_release().

thanks,
-chris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  3:01 ` Chris Wright
@ 2005-09-10  3:20   ` Paul Jackson
  2005-09-10  3:28     ` Chris Wright
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Jackson @ 2005-09-10  3:20 UTC (permalink / raw)
  To: Chris Wright; +Cc: chrisw, akpm, Simon.Derr, linux-kernel, torvalds, stable

Chris wrote:
> Another 'by inspection' patch, perhaps we'll need to update the stable
> rules, since these can be quite valid fixes, yet typically trigger
> review replies asking if it's necessary for -stable.

I'm scratching my head here, trying to figure out what is the
bottom line of this comment.

I'm guessing you're saying:

	"By inspection" patches, unless they have something further
	to recommend their inclusion, are not candidates for -stable.

But intent of your phrase "yet typically trigger review replies ..."
went right past me ...


> How unlikely?  So unlikely that it's more a theoreitical race, or did
> you find ways to trigger? 

I don't have a way to trigger it.  My guess is that someday, some
customer will find the right combination of calls, and be able to
trigger this once every few hours or days.  The odds are quite good
that 2.6.13.* will live out its life before that happens.  When it
happens, it will be a customer doing some serious cpuset manipulations
on serious big iron.


> Is this one well-tested, since the fix diverges from upstream?

Yes - I have a stress test, and some custom kernel tracing, that
could see the conditions needed to trigger this occurring, just not
all simultaneously in the necessary small window of vulnerability.


> And one minor nit, let's just do a real forward declaration of
> refresh_mems() instead of local to check_for_release().

Normally, yes.  In this case, I was coding to keep the patch as
localized as possible, and less to optimize the resulting organization
of the kernel/cpuset.c source file after the patch was applied.

Given that this patch is unlikely to ever have a life beyond a briefly
discussed patch today, I guessed right ;)  Coding for clarity of the
patch, not of the source, was arguably the right choice this time.

Thanks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  3:20   ` Paul Jackson
@ 2005-09-10  3:28     ` Chris Wright
  2005-09-10  3:40       ` Paul Jackson
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wright @ 2005-09-10  3:28 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Chris Wright, akpm, Simon.Derr, linux-kernel, torvalds, stable

* Paul Jackson (pj@sgi.com) wrote:
> Chris wrote:
> > Another 'by inspection' patch, perhaps we'll need to update the stable
> > rules, since these can be quite valid fixes, yet typically trigger
> > review replies asking if it's necessary for -stable.
> 
> I'm scratching my head here, trying to figure out what is the
> bottom line of this comment.
> 
> I'm guessing you're saying:
> 
> 	"By inspection" patches, unless they have something further
> 	to recommend their inclusion, are not candidates for -stable.

Yes.

> But intent of your phrase "yet typically trigger review replies ..."
> went right past me ...

Sorry, I was thinking outloud, it wasn't a direct comment on this patch.
During the -stable review period, patches like these usually get some
squawks.  And there are cases where 'found by inspection' are valid.

> > How unlikely?  So unlikely that it's more a theoreitical race, or did
> > you find ways to trigger? 
> 
> I don't have a way to trigger it.  My guess is that someday, some
> customer will find the right combination of calls, and be able to
> trigger this once every few hours or days.  The odds are quite good
> that 2.6.13.* will live out its life before that happens.  When it
> happens, it will be a customer doing some serious cpuset manipulations
> on serious big iron.

OK, we can hold until you find a good case for triggering ;-)

thanks,
-chris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix
  2005-09-10  3:28     ` Chris Wright
@ 2005-09-10  3:40       ` Paul Jackson
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Jackson @ 2005-09-10  3:40 UTC (permalink / raw)
  To: Chris Wright; +Cc: chrisw, akpm, Simon.Derr, linux-kernel, torvalds, stable

Chris wrote:
> OK, we can hold until you find a good case for triggering ;-)

Agreed.

By the way, you or Randy mentioned something about leaving a mention of
-stable in the MAINTAINERS file.

That wouldn't have helped me this week, but if you had put the string:

	Documentation/stable_kernel_rules.txt

somewhere in the opening text of the messages that announce stable
reviews, that would have saved me some time.  I have in mind the
lkml messages such as:

	From: Chris Wright <chrisw@osdl.org>
	To: linux-kernel@vger.kernel.org, stable@kernel.org
	Subject: [PATCH 0/9] -stable review
	
	This is the start of the stable review cycle for the 2.6.13.1 release.

Just a thought ...

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-09-10  3:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-10  0:44 [PATCH 2.6.13-stable] cpuset semaphore double trip fix Paul Jackson
2005-09-10  1:56 ` Randy.Dunlap
2005-09-10  2:31   ` Paul Jackson
2005-09-10  3:01 ` Chris Wright
2005-09-10  3:20   ` Paul Jackson
2005-09-10  3:28     ` Chris Wright
2005-09-10  3:40       ` Paul Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox