* [PATCH] make idr_remove_all() do removal -before- free_layer()
From: Paul E. McKenney @ 2009-03-07 22:25 UTC
To: linux-kernel
Cc: manfred, Nadia.Derbey, miltonm, mingo, akpm, peterz, lnxninja,
efault, riel
The following patch fixes a problem in the IDR system, where an
idr_remove_all() hands a data element to call_rcu() (via free_layer())
before making that data element inaccessible to new readers. This is
very bad, and results in readers still having a reference to this data
element at the end of the grace period. Tests on large machines that
concurrently map and unmap user-space memory within the same multithreaded
process result in crashes within about five minutes. Applying this
patch increases the kernel's longevity to the three-to-eight-hour range.
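The ordering problem described above can be illustrated outside the kernel.
What follows is only a minimal single-threaded userspace sketch, not the
lib/idr.c code: struct layer, struct tree, queue_for_free() and
new_reader_lookup() are hypothetical stand-ins for struct idr_layer,
struct idr, free_layer() and an RCU reader, and a C11 atomic store stands
in for rcu_assign_pointer(). Each function reports whether a reader that
starts after the layer has been queued for freeing can still reach it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct idr_layer (not the kernel definition). */
struct layer {
	int payload;
	bool queued_for_free;	/* set once handed to the deferred-free path */
};

/* Hypothetical stand-in for struct idr: only the published root pointer. */
struct tree {
	_Atomic(struct layer *) top;
};

/* Models free_layer(), which in the kernel hands the layer to call_rcu(). */
static void queue_for_free(struct layer *p)
{
	p->queued_for_free = true;
}

/* Models a reader that starts now and follows the published root. */
static struct layer *new_reader_lookup(struct tree *t)
{
	return atomic_load_explicit(&t->top, memory_order_acquire);
}

/* Old order: queue the layer first, unpublish it afterwards. */
static bool remove_all_buggy(struct tree *t)
{
	struct layer *p = atomic_load_explicit(&t->top, memory_order_relaxed);
	struct layer *seen;

	if (!p)
		return false;
	queue_for_free(p);
	seen = new_reader_lookup(t);	/* reader arriving inside the window */
	atomic_store_explicit(&t->top, (struct layer *)NULL,
			      memory_order_release);
	return seen && seen->queued_for_free;
}

/* Patched order: unpublish first, then queue the layer. */
static bool remove_all_fixed(struct tree *t)
{
	struct layer *p = atomic_load_explicit(&t->top, memory_order_relaxed);
	struct layer *seen;

	if (!p)
		return false;
	atomic_store_explicit(&t->top, (struct layer *)NULL,
			      memory_order_release);
	queue_for_free(p);
	seen = new_reader_lookup(t);	/* same reader, now finds nothing */
	return seen && seen->queued_for_free;
}
```

With the old order the late reader obtains a pointer to a layer whose
grace period may already be running, so the grace period is not obliged
to wait for that reader. With the patched order the late reader sees a
NULL root and never touches the queued layer.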
There appear to be other similar problems in idr_get_empty_slot() and
sub_remove(), but I fixed the easy one in idr_remove_all() first. It is
therefore no surprise that failures still occur.
(Yes, and I did look at the relevant patch last year without spotting
this one. Goes to show the value of testing as well as code review,
I guess...)
Nadia, Manfred, any thoughts?
Located-by: Milton Miller II <miltonm@austin.ibm.com>
Tested-by: Milton Miller II <miltonm@austin.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
diff --git a/lib/idr.c b/lib/idr.c
index c11c576..dab4bca 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -449,6 +449,7 @@ void idr_remove_all(struct idr *idp)
 
 	n = idp->layers * IDR_BITS;
 	p = idp->top;
+	rcu_assign_pointer(idp->top, NULL);
 	max = 1 << n;
 
 	id = 0;
@@ -467,7 +468,6 @@ void idr_remove_all(struct idr *idp)
 			p = *--paa;
 		}
 	}
-	rcu_assign_pointer(idp->top, NULL);
 	idp->layers = 0;
 }
 EXPORT_SYMBOL(idr_remove_all);
----- End forwarded message -----
* Re: [PATCH] make idr_remove_all() do removal -before- free_layer()
From: Ingo Molnar @ 2009-03-08 15:33 UTC
To: Paul E. McKenney
Cc: linux-kernel, manfred, Nadia.Derbey, miltonm, akpm, peterz,
lnxninja, efault, riel
* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> The following patch fixes a problem in the IDR system, where
> an idr_remove_all() hands a data element to call_rcu() (via
> free_layer()) before making that data element inaccessible to
> new readers. This is very bad, and results in readers still
> having a reference to this data element at the end of the
> grace period. Tests on large machines that concurrently map
> and unmap user-space memory within the same multithreaded
> process result in crashes within about five minutes. Applying
> this patch increases the kernel's longevity to the
> three-to-eight-hour range.
>
> There appear to be other similar problems in
> idr_get_empty_slot() and sub_remove(), but I fixed the easy
> one in idr_remove_all() first. It is therefore no surprise
> that failures still occur.
>
> (Yes, and I did look at the relevant patch last year without
> spotting this one. Goes to show the value of testing as well
> as code review, I guess...)
>
> Nadia, Manfred, any thoughts?
>
> Located-by: Milton Miller II <miltonm@austin.ibm.com>
> Tested-by: Milton Miller II <miltonm@austin.ibm.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Hm, looks like something we really want to see fixed in
2.6.29-final, right?
Ingo
* Re: [PATCH] make idr_remove_all() do removal -before- free_layer()
From: Paul E. McKenney @ 2009-03-08 19:20 UTC
To: Ingo Molnar
Cc: linux-kernel, manfred, Nadia.Derbey, miltonm, akpm, peterz,
lnxninja, efault, riel
On Sun, Mar 08, 2009 at 04:33:36PM +0100, Ingo Molnar wrote:
> Hm, looks like something we really want to see fixed in
> 2.6.29-final, right?
This was located in real testing, so I agree that it is pretty high
priority. So this patch should go into 2.6.29.
The priority of the remaining as-yet-unknown fixes depends on their
complexity and risk.
Thanx, Paul