* ipoib lockdep warning
@ 2006-07-11 20:10 Zach Brown
2006-07-11 21:16 ` Michael S. Tsirkin
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Zach Brown @ 2006-07-11 20:10 UTC (permalink / raw)
To: openib-general, Linux Kernel Mailing List, Arjan van de Ven,
Ingo Molnar
I get an awfully verbose lockdep warning when I bring up ipoib devices
and wanted to make sure it didn't go unreported. I've put it up at the
following URL:
http://oss.oracle.com/~zab/ipoib-multicast-lockdep-warning.txt
I looked into it a bit and it seems to be on to something. There might
be an AB, BC, CA ordering deadlock.
AB:

priv->lock is held while acquiring query_idr.lock

ipoib_mcast_send()
  spin_lock(&priv->lock);
  ipoib_mcast_sendonly_join()
    ib_sa_mcmember_rec_query()
      send_mad()
        idr_pre_get(&query_idr)
          spin_lock(&idp->lock);
BC:

query_idr.lock is taken with interrupts enabled and so is implicitly
ordered before dev->_xmit_lock which is taken in interrupt context.

ipoib_mcast_join_task()
  ipoib_mcast_join()
    ib_sa_mcmember_rec_query()
      send_mad()
        idr_pre_get(&query_idr)
          spin_lock(&idp->lock)
CA:

dev->_xmit_lock is held while acquiring priv->lock.  This triggers the
lockdep warning that adding the dep between dev->_xmit_lock and
priv->lock connects a soft-irq-safe lock to a soft-irq-unsafe one.

ipoib_mcast_restart_task()
  local_irq_save(flags);
  netif_tx_lock(dev)
    spin_lock(&dev->_xmit_lock);
  spin_lock(&priv->lock);
I can imagine all sorts of potential fixes (block ints when calling idr?
reorder acquiry in ipoib_mcast_restart_task()?) but I'm operating on a
partial view of the paths here so I wasn't comfortable suggesting a fix.
I wouldn't be surprised to hear that there are circumstances that both
lockdep and I don't know about that stop this from being a problem :).
In any case, it'd be fantastic if someone who knows this code could sit
down with lockdep and some ipoib regression suite to shake out lockdep's
complaints.
- z
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: ipoib lockdep warning
  2006-07-11 20:10 ipoib lockdep warning Zach Brown
@ 2006-07-11 21:16 ` Michael S. Tsirkin
  2006-07-11 21:40   ` Sean Hefty
  2006-07-11 22:54 ` Roland Dreier
  2006-07-12  2:33 ` [openib-general] ipoib lockdep warning Roland Dreier
  2 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2006-07-11 21:16 UTC (permalink / raw)
  To: Zach Brown, Sean Hefty, Hal Rosenstock, Roland Dreier
  Cc: openib-general, Linux Kernel Mailing List, Arjan van de Ven,
	Ingo Molnar

Quoting r. Zach Brown <zach.brown@oracle.com>:
> BC:
>
> query_idr.lock is taken with interrupts enabled and so is implicitly
> ordered before dev->_xmit_lock which is taken in interrupt context.
>
> ipoib_mcast_join_task()
>   ipoib_mcast_join()
>     ib_sa_mcmember_rec_query()
>       send_mad()
>         idr_pre_get(&query_idr)
>           spin_lock(&idp->lock)

Got to check, but if that's true we have a simple deadlock here:
ib_sa_mcmember_rec_query might get called from interrupt context as
well, deadlocking on idp->lock?

Sean?

> I can imagine all sorts of potential fixes (block ints when calling idr?
> reorder acquiry in ipoib_mcast_restart_task()?) but I'm operating on a
> partial view of the paths here so I wasn't comfortable suggesting a fix.
> I wouldn't be surprised to hear that there are circumstances that both
> lockdep and I don't know about that stop this from being a problem :).

Awesome, thanks for the analysis!  Your help is very much appreciated.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: ipoib lockdep warning
  2006-07-11 21:16 ` Michael S. Tsirkin
@ 2006-07-11 21:40   ` Sean Hefty
  2006-07-11 21:50     ` [openib-general] " Zach Brown
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Hefty @ 2006-07-11 21:40 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', Zach Brown, Hal Rosenstock, Roland Dreier
  Cc: openib-general, Linux Kernel Mailing List, Arjan van de Ven,
	Ingo Molnar

>Quoting r. Zach Brown <zach.brown@oracle.com>:
>> BC:
>>
>> query_idr.lock is taken with interrupts enabled and so is implicitly
>> ordered before dev->_xmit_lock which is taken in interrupt context.
>>
>> ipoib_mcast_join_task()
>>   ipoib_mcast_join()
>>     ib_sa_mcmember_rec_query()
>>       send_mad()
>>         idr_pre_get(&query_idr)
>>           spin_lock(&idp->lock)
>
>Got to check, but if that's true we have a simple deadlock here:
>ib_sa_mcmember_rec_query might get called from interrupt
>context as well, deadlocking on idp->lock?
>
>Sean?

As a side note, I believe that this is the upstream code and does not
include the latest multicast changes.

I'm not sure if anything calls into the sa_query interfaces from
interrupt context, but I doubt it.  From my brief look at the initially
reported issue, I can't determine if there's an actual problem without
studying the ipoib code more.

- Sean

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 21:40   ` Sean Hefty
@ 2006-07-11 21:50     ` Zach Brown
  0 siblings, 0 replies; 32+ messages in thread
From: Zach Brown @ 2006-07-11 21:50 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'Michael S. Tsirkin', Hal Rosenstock, Roland Dreier, Ingo Molnar,
	Linux Kernel Mailing List, openib-general, Arjan van de Ven

> As a side note, I believe that this is the upstream code and does not
> include the latest multicast changes.

Indeed, I should have mentioned in my report that I was running
2.6.17-mm6.

- z

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 20:10 ipoib lockdep warning Zach Brown
  2006-07-11 21:16 ` Michael S. Tsirkin
@ 2006-07-11 22:54 ` Roland Dreier
  2006-07-11 23:27   ` Zach Brown
  2006-07-12  2:33 ` [openib-general] ipoib lockdep warning Roland Dreier
  2 siblings, 1 reply; 32+ messages in thread
From: Roland Dreier @ 2006-07-11 22:54 UTC (permalink / raw)
  To: Zach Brown
  Cc: openib-general, Linux Kernel Mailing List, Arjan van de Ven,
	Ingo Molnar

No time to really look at this in detail, but I think the issue is a
slightly bogus conversion to netif_tx_lock().  Can you try this patch
and see if things are better?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index ab40488..ddd1946 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -820,9 +820,8 @@ void ipoib_mcast_restart_task(void *dev_

        ipoib_mcast_stop_thread(dev, 0);

-       local_irq_save(flags);
        netif_tx_lock(dev);
-       spin_lock(&priv->lock);
+       spin_lock_irqsave(&priv->lock, flags);

        /*
         * Unfortunately, the networking core only gives us a list of all of
@@ -893,9 +892,8 @@ void ipoib_mcast_restart_task(void *dev_
                }
        }

-       spin_unlock(&priv->lock);
+       spin_unlock_irqrestore(&priv->lock, flags);
        netif_tx_unlock(dev);
-       local_irq_restore(flags);

        /* We have to cancel outside of the spinlock */
        list_for_each_entry_safe(mcast, tmcast, &remove_list, list) {

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 22:54 ` Roland Dreier
@ 2006-07-11 23:27   ` Zach Brown
  2006-07-11 23:43     ` Roland Dreier
  0 siblings, 1 reply; 32+ messages in thread
From: Zach Brown @ 2006-07-11 23:27 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Ingo Molnar, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

Roland Dreier wrote:
> No time to really look at this in detail, but I think the issue is a
> slightly bogus conversion to netif_tx_lock().  Can you try this patch
> and see if things are better?

> -       local_irq_save(flags);
>         netif_tx_lock(dev);
> -       spin_lock(&priv->lock);
> +       spin_lock_irqsave(&priv->lock, flags);

Hmm, won't that hold dev->_xmit_lock with interrupts enabled?  That
seems like it'd deadlock with dev_watchdog grabbing it from a softirq?

But setting that aside, I don't think this will help as it doesn't
change the order that the locks are acquired in this path.  This can
still be racing with other cpus that are each in the other two paths
(query_idr.lock -> softirq -> dev->_xmit_lock, priv->lock ->
query_idr.lock).  The local state of interrupt masking on this cpu
doesn't stop the other cpus from each grabbing the first lock in their
path and then trying to grab the second.  Imagine that they all race to
grab the first (A, B, C) and succeed and then all get stuck spinning on
their second lock (B, C, A).

Maybe you could get the priv->lock here before dev->_xmit_lock.  Then
we'd have AB, BC, AC, and that's OK.  I'm going to guess that this won't
work because other paths have dev->_xmit_lock -> priv->lock ordering.

Another possibility is masking interrupts when getting query_idr.lock.
That drops the implicit dependency between query_idr and _xmit_lock and
gives us AB, CA -- which is fine.  That means blocking ints while in
idr_pre_get() which allocs which leads to GFP_ATOMIC and more likely
allocation failure.

That's my reading, anyway.

- z

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 23:27   ` Zach Brown
@ 2006-07-11 23:43     ` Roland Dreier
  2006-07-11 23:53       ` Zach Brown
  2006-07-12  9:38       ` Ingo Molnar
  0 siblings, 2 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-11 23:43 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ingo Molnar, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

Hmm, good point.

It sort of seems to me like the idr interfaces are broken by design.
Internally, lib/idr.c uses bare spin_lock(&idp->lock) with no interrupt
disabling or anything in both the idr_pre_get() and idr_get_new() code
paths.  But idr_pre_get() is supposed to be called in a context that can
sleep, while idr_get_new() is supposed to be called with locks held to
serialize things (at least according to http://lwn.net/Articles/103209/).

So, ugh... maybe the best thing to do is change lib/idr.c to use
spin_lock_irqsave() internally?

- R.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 23:43     ` Roland Dreier
@ 2006-07-11 23:53       ` Zach Brown
  2006-07-12  0:06         ` Roland Dreier
  0 siblings, 1 reply; 32+ messages in thread
From: Zach Brown @ 2006-07-11 23:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Ingo Molnar, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

Roland Dreier wrote:
> Hmm, good point.
>
> It sort of seems to me like the idr interfaces are broken by design.
> Internally, lib/idr.c uses bare spin_lock(&idp->lock) with no
> interrupt disabling or anything in both the idr_pre_get() and
> idr_get_new() code paths.

I wasn't thrilled to see that either.  We seem to have a fair precedent
(list.h, rbtree, etc) for leaving serialization to callers.

> So, ugh... maybe the best thing to do is change lib/idr.c to use
> spin_lock_irqsave() internally?

I dunno, it seems to have had _irq() locking in the past?  From the
comment at the top:

 * Modified by George Anzinger to reuse immediately and to use
 * find bit instructions.  Also removed _irq on spinlocks.

- z

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 23:53       ` Zach Brown
@ 2006-07-12  0:06         ` Roland Dreier
  0 siblings, 0 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 0:06 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ingo Molnar, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

> > So, ugh... maybe the best thing to do is change lib/idr.c to use
> > spin_lock_irqsave() internally?
>
> I dunno, it seems to have had _irq() locking in the past?  From the
> comment at the top:
>
>  * Modified by George Anzinger to reuse immediately and to use
>  * find bit instructions.  Also removed _irq on spinlocks.

Well, _irq would be no good, because we might want to call idr stuff
with interrupts disabled.  But making idr internally _irqsave seems like
the right fix to me.

I think the real issue here is that the sa_query.c stuff wants to use
the idr mechanism to assign "query ids", and other modules want to be
able to start queries from any context.  So if idr uses bare spin_lock
internally, then sa_query.c has no choice but to wrap all idr calls
inside spin_lock_irqsave and do all allocation with GFP_ATOMIC, which
doesn't seem very nice.

- R.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 23:43     ` Roland Dreier
  2006-07-11 23:53       ` Zach Brown
@ 2006-07-12  9:38       ` Ingo Molnar
  2006-07-12 11:09         ` Michael S. Tsirkin
                           ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Ingo Molnar @ 2006-07-12 9:38 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Zach Brown, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

* Roland Dreier <rdreier@cisco.com> wrote:

> Hmm, good point.
>
> It sort of seems to me like the idr interfaces are broken by design.
[...]
> So, ugh... maybe the best thing to do is change lib/idr.c to use
> spin_lock_irqsave() internally?

i agree that the IDR subsystem should be irq-safe if GFP_ATOMIC is
passed in.  So the _irqsave()/_irqrestore() fix should be done.

But i also think that you should avoid using GFP_ATOMIC for any sort of
reliable IO path and push as much work into process context as possible.
Is it acceptable for your infiniband IO model to fail with -ENOMEM if
GFP_ATOMIC happens to fail, and is the IO retried transparently?

	Ingo

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: ipoib lockdep warning
  2006-07-12  9:38       ` Ingo Molnar
@ 2006-07-12 11:09         ` Michael S. Tsirkin
  2006-07-12 16:31           ` [openib-general] " Sean Hefty
                             ` (2 more replies)
  2006-07-12 19:06         ` [openib-general] ipoib lockdep warning Roland Dreier
  2006-07-12 20:45         ` [PATCH] Convert idr's internal locking to _irqsave variant Roland Dreier
  2 siblings, 3 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2006-07-12 11:09 UTC (permalink / raw)
  To: Ingo Molnar, Sean Hefty
  Cc: Roland Dreier, Zach Brown, Linux Kernel Mailing List,
	openib-general, Arjan van de Ven

Quoting r. Ingo Molnar <mingo@elte.hu>:
> But i also think that you should avoid using GFP_ATOMIC for any sort of
> reliable IO path and push as much work into process context as possible.
> Is it acceptable for your infiniband IO model to fail with -ENOMEM if
> GFP_ATOMIC happens to fail, and is the IO retried transparently?

Yes, this is true for users that pass GFP_ATOMIC to sa_query, at least.
But might not be so for other users: send_mad in sa_query actually gets
gfp_flags parameter, but for some reason it does not pass it to
idr_pre_get, which means even sa query done with GFP_KERNEL flag is
likely to fail.

Sean, it seems we need something like the following - what do you think?

--

Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
supplied by the user to idr_pre_get.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index e911c99..aeda484 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -488,13 +488,13 @@ static void init_mad(struct ib_sa_mad *m
        spin_unlock_irqrestore(&tid_lock, flags);
 }

-static int send_mad(struct ib_sa_query *query, int timeout_ms)
+static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 {
        unsigned long flags;
        int ret, id;

retry:
-       if (!idr_pre_get(&query_idr, GFP_ATOMIC))
+       if (!idr_pre_get(&query_idr, gfp_mask))
                return -ENOMEM;
        spin_lock_irqsave(&idr_lock, flags);
        ret = idr_get_new(&query_idr, query, &id);
@@ -630,7 +630,7 @@ int ib_sa_path_rec_get(struct ib_device

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

@@ -752,7 +752,7 @@ int ib_sa_service_rec_query(struct ib_de

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

@@ -844,7 +844,7 @@ int ib_sa_mcmember_rec_query(struct ib_d

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

-- 
MST

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-12 11:09         ` Michael S. Tsirkin
@ 2006-07-12 16:31           ` Sean Hefty
  2006-07-12 18:56           ` Roland Dreier
  2006-07-13 15:55           ` [PATCH] IB/core: use correct gfp_mask in sa_query Michael S. Tsirkin
  2 siblings, 0 replies; 32+ messages in thread
From: Sean Hefty @ 2006-07-12 16:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sean Hefty, Roland Dreier, Arjan van de Ven,
	Zach Brown, openib-general, Linux Kernel Mailing List

Michael S. Tsirkin wrote:
> Yes, this is true for users that pass GFP_ATOMIC to sa_query, at least.
> But might not be so for other users: send_mad in sa_query actually gets
> gfp_flags parameter, but for some reason it does not pass it to
> idr_pre_get, which means even sa query done with GFP_KERNEL flag is
> likely to fail.
>
> Sean, it seems we need something like the following - what do you think?

I noticed this same thing looking at the code yesterday.  I can't think
of any reason why your patch wouldn't work.

- Sean

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: ipoib lockdep warning
  2006-07-12 11:09         ` Michael S. Tsirkin
  2006-07-12 16:31           ` [openib-general] " Sean Hefty
@ 2006-07-12 18:56           ` Roland Dreier
  2006-07-13 15:55           ` [PATCH] IB/core: use correct gfp_mask in sa_query Michael S. Tsirkin
  2 siblings, 0 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 18:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ingo Molnar, Sean Hefty, Zach Brown, Linux Kernel Mailing List,
	openib-general, Arjan van de Ven

> Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
> supplied by the user to idr_pre_get.

Yes, this looks right to me.

- R.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH] IB/core: use correct gfp_mask in sa_query
  2006-07-12 11:09         ` Michael S. Tsirkin
  2006-07-12 16:31           ` [openib-general] " Sean Hefty
  2006-07-12 18:56           ` Roland Dreier
@ 2006-07-13 15:55           ` Michael S. Tsirkin
  2 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2006-07-13 15:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, Sean Hefty
  Cc: Roland Dreier, Zach Brown, openib-general, Arjan van de Ven

Andrew, could you please drop the following in -mm and on to Linus?

--

Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
supplied by the user to idr_pre_get.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: "Sean Hefty" <mshefty@ichips.intel.com>
Acked-by: "Roland Dreier" <rdreier@cisco.com>

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index e911c99..aeda484 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -488,13 +488,13 @@ static void init_mad(struct ib_sa_mad *m
        spin_unlock_irqrestore(&tid_lock, flags);
 }

-static int send_mad(struct ib_sa_query *query, int timeout_ms)
+static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 {
        unsigned long flags;
        int ret, id;

retry:
-       if (!idr_pre_get(&query_idr, GFP_ATOMIC))
+       if (!idr_pre_get(&query_idr, gfp_mask))
                return -ENOMEM;
        spin_lock_irqsave(&idr_lock, flags);
        ret = idr_get_new(&query_idr, query, &id);
@@ -630,7 +630,7 @@ int ib_sa_path_rec_get(struct ib_device

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

@@ -752,7 +752,7 @@ int ib_sa_service_rec_query(struct ib_de

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

@@ -844,7 +844,7 @@ int ib_sa_mcmember_rec_query(struct ib_d

        *sa_query = &query->sa_query;

-       ret = send_mad(&query->sa_query, timeout_ms);
+       ret = send_mad(&query->sa_query, timeout_ms, gfp_mask);
        if (ret < 0)
                goto err2;

-- 
MST

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-12  9:38       ` Ingo Molnar
  2006-07-12 11:09         ` Michael S. Tsirkin
@ 2006-07-12 19:06         ` Roland Dreier
  2006-07-12 20:45         ` [PATCH] Convert idr's internal locking to _irqsave variant Roland Dreier
  2 siblings, 0 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zach Brown, Linux Kernel Mailing List, openib-general,
	Arjan van de Ven

> i agree that the IDR subsystem should be irq-safe if GFP_ATOMIC is
> passed in.  So the _irqsave()/_irqrestore() fix should be done.

OK, I will send the idr change to Andrew.

> But i also think that you should avoid using GFP_ATOMIC for any sort of
> reliable IO path and push as much work into process context as possible.
> Is it acceptable for your infiniband IO model to fail with -ENOMEM if
> GFP_ATOMIC happens to fail, and is the IO retried transparently?

Yes, I think it's OK.  This idr use is in an inherently unreliable path.
With that said, as Michael pointed out, we can change things to use
GFP_ATOMIC less.

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-12  9:38       ` Ingo Molnar
  2006-07-12 11:09         ` Michael S. Tsirkin
  2006-07-12 19:06         ` [openib-general] ipoib lockdep warning Roland Dreier
@ 2006-07-12 20:45         ` Roland Dreier
  2006-07-12 21:14           ` Ingo Molnar
  2006-07-13  1:30           ` Andrew Morton
  2 siblings, 2 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 20:45 UTC (permalink / raw)
  To: akpm
  Cc: Arjan van de Ven, Ingo Molnar, Zach Brown, openib-general,
	linux-kernel

Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to
do internal locking.  This is a nasty trap for code that might call
idr functions from different contexts; for example, it seems perfectly
reasonable to call idr_get_new() from process context and idr_remove()
from interrupt context -- but with the current locking this would lead
to a potential deadlock.

The simplest fix for this is to just convert the idr locking to use
spin_lock_irqsave().

In particular, this fixes a very complicated locking issue detected by
lockdep, involving the ib_ipoib driver's priv->lock and
dev->_xmit_lock, which get involved with the ib_sa module's
query_idr.lock.

Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

diff --git a/lib/idr.c b/lib/idr.c
index 4d09681..16d2143 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -38,14 +38,15 @@ static kmem_cache_t *idr_layer_cache;
 static struct idr_layer *alloc_layer(struct idr *idp)
 {
        struct idr_layer *p;
+       unsigned long flags;

-       spin_lock(&idp->lock);
+       spin_lock_irqsave(&idp->lock, flags);
        if ((p = idp->id_free)) {
                idp->id_free = p->ary[0];
                idp->id_free_cnt--;
                p->ary[0] = NULL;
        }
-       spin_unlock(&idp->lock);
+       spin_unlock_irqrestore(&idp->lock, flags);
        return(p);
 }

@@ -59,12 +60,14 @@ static void __free_layer(struct idr *idp

 static void free_layer(struct idr *idp, struct idr_layer *p)
 {
+       unsigned long flags;
+
        /*
         * Depends on the return element being zeroed.
         */
-       spin_lock(&idp->lock);
+       spin_lock_irqsave(&idp->lock, flags);
        __free_layer(idp, p);
-       spin_unlock(&idp->lock);
+       spin_unlock_irqrestore(&idp->lock, flags);
 }

 /**
@@ -168,6 +171,7 @@ static int idr_get_new_above_int(struct
 {
        struct idr_layer *p, *new;
        int layers, v, id;
+       unsigned long flags;

        id = starting_id;
build_up:
@@ -191,14 +195,14 @@ build_up:
                 * The allocation failed.  If we built part of
                 * the structure tear it down.
                 */
-               spin_lock(&idp->lock);
+               spin_lock_irqsave(&idp->lock, flags);
                for (new = p; p && p != idp->top; new = p) {
                        p = p->ary[0];
                        new->ary[0] = NULL;
                        new->bitmap = new->count = 0;
                        __free_layer(idp, new);
                }
-               spin_unlock(&idp->lock);
+               spin_unlock_irqrestore(&idp->lock, flags);
                return -1;
        }
        new->ary[0] = p;

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-12 20:45         ` [PATCH] Convert idr's internal locking to _irqsave variant Roland Dreier
@ 2006-07-12 21:14           ` Ingo Molnar
  2006-07-13  1:30           ` Andrew Morton
  1 sibling, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2006-07-12 21:14 UTC (permalink / raw)
  To: Roland Dreier
  Cc: akpm, Arjan van de Ven, Zach Brown, openib-general, linux-kernel

* Roland Dreier <rdreier@cisco.com> wrote:

> Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to
> do internal locking.  This is a nasty trap for code that might call
> idr functions from different contexts; for example, it seems perfectly
> reasonable to call idr_get_new() from process context and idr_remove()
> from interrupt context -- but with the current locking this would lead
> to a potential deadlock.
>
> The simplest fix for this is to just convert the idr locking to use
> spin_lock_irqsave().
>
> In particular, this fixes a very complicated locking issue detected by
> lockdep, involving the ib_ipoib driver's priv->lock and
> dev->_xmit_lock, which get involved with the ib_sa module's
> query_idr.lock.
>
> Cc: Arjan van de Ven <arjan@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-12 20:45 ` [PATCH] Convert idr's internal locking to _irqsave variant Roland Dreier 2006-07-12 21:14 ` Ingo Molnar @ 2006-07-13 1:30 ` Andrew Morton 2006-07-13 15:42 ` Roland Dreier 1 sibling, 1 reply; 32+ messages in thread From: Andrew Morton @ 2006-07-13 1:30 UTC (permalink / raw) To: Roland Dreier; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel On Wed, 12 Jul 2006 13:45:12 -0700 Roland Dreier <rdreier@cisco.com> wrote: > Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to > do internal locking. This is a nasty trap for code that might call > idr functions from different contexts; for example, it seems perfectly > reasonable to call idr_get_new() from process context and idr_remove() > from interrupt context -- but with the current locking this would lead > to a potential deadlock. > > The simplest fix for this is to just convert the idr locking to use > spin_lock_irqsave(). > > In particular, this fixes a very complicated locking issue detected by > lockdep, involving the ib_ipoib driver's priv->lock and dev->_xmit_lock, > which get involved with the ib_sa module's query_idr.lock. Sigh. It was always a mistake (of the kernel programming 101 type) to put any locking at all in the idr code. At some stage we need to weed it all out and move it to callers. Your fix is yet more fallout from that mistake. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-13 1:30 ` Andrew Morton @ 2006-07-13 15:42 ` Roland Dreier 2006-07-13 20:54 ` Andrew Morton 0 siblings, 1 reply; 32+ messages in thread From: Roland Dreier @ 2006-07-13 15:42 UTC (permalink / raw) To: Andrew Morton; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel > Sigh. It was always a mistake (of the kernel programming 101 type) to put > any locking at all in the idr code. At some stage we need to weed it all > out and move it to callers. > > Your fix is yet more fallout from that mistake. Agreed. Consider me on the hook to fix this up in a better way once my life is a little saner. Maybe I'll try to cook something up on the plane ride to Ottawa. Anyway you can punch me in the stomach if I don't have something in time for 2.6.19. - R. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-13 15:42 ` Roland Dreier @ 2006-07-13 20:54 ` Andrew Morton 2006-07-13 21:03 ` Roland Dreier 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2006-07-13 20:54 UTC (permalink / raw) To: Roland Dreier; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel On Thu, 13 Jul 2006 08:42:47 -0700 Roland Dreier <rdreier@cisco.com> wrote: > > Sigh. It was always a mistake (of the kernel programming 101 type) to put > > any locking at all in the idr code. At some stage we need to weed it all > > out and move it to callers. > > > > Your fix is yet more fallout from that mistake. > > Agreed. Consider me on the hook to fix this up in a better way once > my life is a little saner. Maybe I'll try to cook something up on the > plane ride to Ottawa. > I suspect it'll get really ugly. It's a container library which needs to allocate memory when items are added, like the radix-tree. Either it needs to assume GFP_ATOMIC, which is bad and can easily fail or it does weird things like radix_tree_preload(). The basic problem is: idr_pre_get(GFP_KERNEL); spin_lock(my_lock); idr_get_new(..); which is racy, because some other CPU could have got in there and consumed some of the pool which was precharged by idr_pre_get(). It's wildly improbable that it'll actually fail. It requires all of: a) that the race occur b) that the racing thread consume an appreciable amount of the pool c) that this thread also consume an appreciable amount (such that the total of both exceeds the pool size). d) that a (needs to be added) GFP_ATOMIC attempt to replenish the pool inside idr_get_new() fails. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-13 20:54 ` Andrew Morton @ 2006-07-13 21:03 ` Roland Dreier 2006-07-13 21:05 ` Arjan van de Ven 2006-07-13 21:43 ` Andrew Morton 0 siblings, 2 replies; 32+ messages in thread From: Roland Dreier @ 2006-07-13 21:03 UTC (permalink / raw) To: Andrew Morton; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel > I suspect it'll get really ugly. It's a container library which needs to > allocate memory when items are added, like the radix-tree. Either it needs > to assume GFP_ATOMIC, which is bad and can easily fail or it does weird > things like radix_tree_preload(). Actually I don't think it has to be too bad. We could tweak the interface a little bit so that consumers do something like: struct idr_layer *layer = NULL; /* opaque */ retry: spin_lock(&my_idr_lock); ret = idr_get_new(&my_idr, ptr, &id, layer); spin_unlock(&my_idr_lock); if (ret == -EAGAIN) { layer = idr_alloc_layer(&my_idr, GFP_KERNEL); if (!IS_ERR(layer)) goto retry; } in other words make the consumer responsible for passing in new memory that can be used for a new entry (or freed if other entries have become free in the meantime). - R. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-13 21:03 ` Roland Dreier @ 2006-07-13 21:05 ` Arjan van de Ven 2006-07-14 0:18 ` Roland Dreier 2006-07-13 21:43 ` Andrew Morton 1 sibling, 1 reply; 32+ messages in thread From: Arjan van de Ven @ 2006-07-13 21:05 UTC (permalink / raw) To: Roland Dreier Cc: Andrew Morton, mingo, zach.brown, openib-general, linux-kernel On Thu, 2006-07-13 at 14:03 -0700, Roland Dreier wrote: > > I suspect it'll get really ugly. It's a container library which needs to > > allocate memory when items are added, like the radix-tree. Either it needs > > to assume GFP_ATOMIC, which is bad and can easily fail or it does weird > > things like radix_tree_preload(). > > Actually I don't think it has to be too bad. We could tweak the > interface a little bit so that consumers do something like: > > it does get harder if this is needed for your IB device to do more work, so that your swap device on your IB can take more IO's to free up ram.. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant 2006-07-13 21:05 ` Arjan van de Ven @ 2006-07-14 0:18 ` Roland Dreier 2006-07-14 6:20 ` Arjan van de Ven 0 siblings, 1 reply; 32+ messages in thread From: Roland Dreier @ 2006-07-14 0:18 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, mingo, zach.brown, openib-general, linux-kernel Arjan> it does get harder if this is needed for your IB device to Arjan> do more work, so that your swap device on your IB can take Arjan> more IO's to free up ram.. That's the classic problem, but it's more a matter of the consumer using GFP_NOIO in the right places. - R. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-14  0:18 ` Roland Dreier
@ 2006-07-14  6:20 ` Arjan van de Ven
  0 siblings, 0 replies; 32+ messages in thread
From: Arjan van de Ven @ 2006-07-14  6:20 UTC (permalink / raw)
To: Roland Dreier
Cc: Andrew Morton, mingo, zach.brown, openib-general, linux-kernel

On Thu, 2006-07-13 at 17:18 -0700, Roland Dreier wrote:
>     Arjan> it does get harder if this is needed for your IB device to
>     Arjan> do more work, so that your swap device on your IB can take
>     Arjan> more IO's to free up ram..
>
> That's the classic problem, but it's more a matter of the consumer
> using GFP_NOIO in the right places.

GFP_NOIO isn't going to save you in the cases where the memory really
is running low and you need the memory to do more IO...

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-13 21:03 ` Roland Dreier
  2006-07-13 21:05 ` Arjan van de Ven
@ 2006-07-13 21:43 ` Andrew Morton
  2006-07-14  1:08 ` Roland Dreier
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2006-07-13 21:43 UTC (permalink / raw)
To: Roland Dreier; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel

On Thu, 13 Jul 2006 14:03:21 -0700
Roland Dreier <rdreier@cisco.com> wrote:

> > I suspect it'll get really ugly.  It's a container library which needs to
> > allocate memory when items are added, like the radix-tree.  Either it needs
> > to assume GFP_ATOMIC, which is bad and can easily fail or it does weird
> > things like radix_tree_preload().
>
> Actually I don't think it has to be too bad.  We could tweak the
> interface a little bit so that consumers do something like:
>
> 	struct idr_layer *layer = NULL; /* opaque */
>
> retry:
> 	spin_lock(&my_idr_lock);
> 	ret = idr_get_new(&my_idr, ptr, &id, layer);
> 	spin_unlock(&my_idr_lock);
>
> 	if (ret == -EAGAIN) {
> 		layer = idr_alloc_layer(&my_idr, GFP_KERNEL);
> 		if (!IS_ERR(layer))
> 			goto retry;
> 	}
>
> in other words make the consumer responsible for passing in new memory
> that can be used for a new entry (or freed if other entries have
> become free in the meantime).
>

Good point, a try-again loop would work.  Do we really need the caller to
maintain a cache?  I suspect something like

drat:
	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
		give_up();
	spin_lock();
	ret = idr_get_new();
	spin_unlock();
	if (ret == ENOMEM)
		goto drat;

would do it.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-13 21:43 ` Andrew Morton
@ 2006-07-14  1:08 ` Roland Dreier
  2006-07-14  1:18 ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Roland Dreier @ 2006-07-14  1:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel

 > Good point, a try-again loop would work.  Do we really need the caller to
 > maintain a cache?  I suspect something like
 >
 > drat:
 > 	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
 > 		give_up();
 > 	spin_lock();
 > 	ret = idr_get_new();
 > 	spin_unlock();
 > 	if (ret == ENOMEM)
 > 		goto drat;
 >
 > would do it.

The problem (for my tiny brain at least) is that I don't know where
idr_pre_get() can put the memory it allocates if there's no lock in
the idr structure -- how do you maintain internal consistency if no
locks are held when filling the cache?

Having the caller hold a chunk of memory in a stack variable was the
trick I came up with to get around that.

 - R.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-14  1:08 ` Roland Dreier
@ 2006-07-14  1:18 ` Andrew Morton
  2006-07-14  1:30 ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2006-07-14  1:18 UTC (permalink / raw)
To: Roland Dreier; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel

On Thu, 13 Jul 2006 18:08:17 -0700
Roland Dreier <rdreier@cisco.com> wrote:

> > Good point, a try-again loop would work.  Do we really need the caller to
> > maintain a cache?  I suspect something like
> >
> > drat:
> > 	if (idr_pre_get(GFP_KERNEL) == ENOMEM)
> > 		give_up();
> > 	spin_lock();
> > 	ret = idr_get_new();
> > 	spin_unlock();
> > 	if (ret == ENOMEM)
> > 		goto drat;
> >
> > would do it.
>
> The problem (for my tiny brain at least) is that I don't know where
> idr_pre_get() can put the memory it allocates if there's no lock in
> the idr structure -- how do you maintain internal consistency if no
> locks are held when filling the cache?

argh.  Aren't you supposed to be on vacation or something?

> Having the caller hold a chunk of memory in a stack variable was the
> trick I came up with to get around that.

Yes, that certainly works.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-14  1:18 ` Andrew Morton
@ 2006-07-14  1:30 ` Andrew Morton
  2006-07-17 15:57 ` Roland Dreier
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2006-07-14  1:30 UTC (permalink / raw)
To: rdreier, arjan, mingo, zach.brown, openib-general, linux-kernel

On Thu, 13 Jul 2006 18:18:35 -0700
Andrew Morton <akpm@osdl.org> wrote:

> > Having the caller hold a chunk of memory in a stack variable was the
> > trick I came up with to get around that.
>
> Yes, that certainly works.

Problem is, I think, you'll need to preallocate IDR_FREE_MAX items.  And
then free them all again when none of them were consumed (usual).

Yes, storing the preallocated nodes in the idr itself requires locking.
But that locking is 100% private to the IDR implementation.  It locks
only the preload list and not the user's stuff.

radix_tree_preload() effectively does this.  Except the preload list is
kernel-wide.  It's split across CPUs and uses
local_irq_disable/preempt_disable locking tricks as a performance
optimisation.  But conceptually it's the same.  Simply copying that
would give something which is known to work...

It seems like a large amount of fuss, but when you think about it the
problem isn't simple.

^ permalink raw reply	[flat|nested] 32+ messages in thread
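The radix_tree_preload() shape Andrew describes can be sketched in single-threaded userspace. This is an illustrative analogy, not the kernel code: the names are invented, `PRELOAD_MAX` stands in for the per-insert worst case, and the per-CPU lists plus preempt_disable() locking he mentions are reduced to comments since plain userspace C needs neither.

```c
#include <assert.h>
#include <stdlib.h>

#define PRELOAD_MAX 4	/* worst-case number of nodes one insert can need */

struct node { struct node *next; };

/*
 * One cache of preallocated nodes.  The real radix_tree_preload()
 * keeps one such list per CPU and pins the task to its CPU with
 * preempt_disable(); that locking is private to the implementation
 * and never touches the caller's own locks.
 */
static struct node *preload_list;
static int preload_cnt;

/* Top up the cache before the caller takes its own spinlocks. */
static int tree_preload(void)
{
	/* kernel: preempt_disable() here, sleeping allocation below */
	while (preload_cnt < PRELOAD_MAX) {
		struct node *n = calloc(1, sizeof(*n));	/* GFP_KERNEL-style */

		if (!n)
			return -1;			/* out of memory */
		n->next = preload_list;
		preload_list = n;
		preload_cnt++;
	}
	return 0;
}

/* Take a node from the cache; guaranteed to succeed right after preload. */
static struct node *tree_node_alloc(void)
{
	struct node *n = preload_list;

	if (n) {
		preload_list = n->next;
		preload_cnt--;
	}
	return n;
}
```

After a successful `tree_preload()`, up to `PRELOAD_MAX` calls to `tree_node_alloc()` cannot fail, which is exactly why the caller's subsequent locked insert never needs to allocate; unconsumed nodes simply stay cached for the next preload instead of being freed.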
* Re: [PATCH] Convert idr's internal locking to _irqsave variant
  2006-07-14  1:30 ` Andrew Morton
@ 2006-07-17 15:57 ` Roland Dreier
  0 siblings, 0 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-17 15:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: arjan, mingo, zach.brown, openib-general, linux-kernel

 > Problem is, I think, you'll need to preallocate IDR_FREE_MAX items.  And
 > then free them all again when none of them were consumed (usual).

Actually I think it's OK if we just pass in no more than one extra
layer for each try to add something with idr_get_new().  In the worst
case, this leads to a lot of extra calls to idr_get_new(), but in the
usual case it's fine.

I'm including a lightly tested big patch with all my idr changes for
comments -- I'll split it up into a form more suitable for merging.
(Some of the changes are unrelated and obviously good, eg using
kmem_cache_zalloc() instead of a slab cache with a constructor that
does memset(0)).

I'm not sure I'm thrilled with this approach, but it does seem to be a
net win.  With an allyesconfig with debugging options turned off (so
spinlocks shrink back down to 8 bytes), I get the following:

   text    data     bss     dec     hex filename
24347759 5971210 2322176 32641145 1f21079 vmlinux.old
24347370 5970474 2320704 32638548 1f20654 vmlinux.new

Most of the savings comes from ocfs2, which has a static array of 255
structures that each contain an idr -- so removing the lock from
struct idr saves 255 * 8 = 2040 bytes.
However, even without factoring that in, this does seem to be a net win: add/remove: 2/4 grow/shrink: 23/51 up/down: 719/-3215 (-2496) function old new delta idr_get_new_above 38 554 +516 dm_create 957 1000 +43 ipath_init_one 3294 3329 +35 get_layer - 32 +32 idr_alloc_layer - 16 +16 sd_probe 871 881 +10 rtc_device_register 493 503 +10 proc_register 277 286 +9 mmc_add_host_sysfs 126 135 +9 idr_add_uobj 80 85 +5 cma_alloc_port 224 229 +5 sys_timer_create 876 880 +4 set_anon_super 173 177 +4 sctp_process_init 1312 1316 +4 ib_ucm_ctx_alloc 197 201 +4 ib_create_cm_id 287 290 +3 proc_mkdir_mode 95 97 +2 vfs_kern_mount 279 280 +1 send_mad 325 326 +1 proc_symlink 141 142 +1 proc_file_write 40 41 +1 kill_block_super 56 57 +1 get_sb_single 175 176 +1 free_proc_entry 108 109 +1 deactivate_super 126 127 +1 proc_readdir 353 352 -1 proc_getattr 40 39 -1 get_sb_nodev 150 149 -1 o2net_send_message_vec 2032 2030 -2 hwmon_device_register 198 196 -2 create_proc_entry 170 168 -2 __put_super_and_need_restart 54 52 -2 v9fs_read_work 1424 1421 -3 v9fs_mux_init 1333 1330 -3 v9fs_mux_flush_cb 303 300 -3 v9fs_mux_destroy 369 366 -3 v9fs_mux_cancel 382 379 -3 o2net_init 441 438 -3 inotify_add_watch 285 280 -5 v9fs_session_init 1490 1484 -6 unnamed_dev_idr 32 24 -8 unit_table 32 24 -8 tcp_ps 32 24 -8 sdp_ps 32 24 -8 sd_index_idr 32 24 -8 sctp_assocs_id 32 24 -8 rtc_idr 32 24 -8 query_idr 32 24 -8 proc_inum_idr 32 24 -8 posix_timers_id 32 24 -8 mmc_host_idr 32 24 -8 lpfc_hba_index 32 24 -8 ib_uverbs_srq_idr 32 24 -8 ib_uverbs_qp_idr 32 24 -8 ib_uverbs_pd_idr 32 24 -8 ib_uverbs_mw_idr 32 24 -8 ib_uverbs_mr_idr 32 24 -8 ib_uverbs_cq_idr 32 24 -8 ib_uverbs_ah_idr 32 24 -8 i2c_adapter_idr 32 24 -8 hwmon_idr 32 24 -8 ctx_id_table 32 24 -8 cm 112 104 -8 allocated_ptys 32 24 -8 _minor_idr 32 24 -8 i2c_add_adapter 494 484 -10 idr_cache_ctor 12 - -12 v9fs_get_idpool 149 134 -15 ib_cm_init 237 220 -17 free_layer 55 34 -21 idr_init 88 62 -26 idr_get_new 40 13 -27 idr_remove 357 328 -29 infinipath_init 
227 197 -30 lpfc_pci_probe_one 2752 2710 -42 ptmx_open 427 379 -48 idr_pre_get 59 - -59 alloc_layer 66 - -66 idr_get_new_above_int 533 - -533 o2net_nodes 99960 97920 -2040 --- diff --git a/arch/powerpc/mm/mmu_context_64.c b/arch/powerpc/mm/mmu_context_64.c index 90a06ac..85d6a03 100644 --- a/arch/powerpc/mm/mmu_context_64.c +++ b/arch/powerpc/mm/mmu_context_64.c @@ -26,20 +26,21 @@ static DEFINE_IDR(mmu_context_idr); int init_new_context(struct task_struct *tsk, struct mm_struct *mm) { + struct idr_layer *layer = NULL; int index; int err; again: - if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL)) - return -ENOMEM; - spin_lock(&mmu_context_lock); - err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index); + err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index, layer); spin_unlock(&mmu_context_lock); - if (err == -EAGAIN) + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; goto again; - else if (err) + } else if (err) return err; if (index > MAX_CONTEXT) { diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c index bfdb902..0ae099e 100644 --- a/drivers/char/tty_io.c +++ b/drivers/char/tty_io.c @@ -2135,6 +2135,7 @@ #ifdef CONFIG_UNIX98_PTYS static int ptmx_open(struct inode * inode, struct file * filp) { struct tty_struct *tty; + struct idr_layer *layer = NULL; int retval; int index; int idr_ret; @@ -2143,24 +2144,27 @@ static int ptmx_open(struct inode * inod /* find a device that is not in use. 
*/ down(&allocated_ptys_lock); - if (!idr_pre_get(&allocated_ptys, GFP_KERNEL)) { - up(&allocated_ptys_lock); - return -ENOMEM; - } - idr_ret = idr_get_new(&allocated_ptys, NULL, &index); - if (idr_ret < 0) { - up(&allocated_ptys_lock); - if (idr_ret == -EAGAIN) - return -ENOMEM; - return -EIO; - } - if (index >= pty_limit) { - idr_remove(&allocated_ptys, index); - up(&allocated_ptys_lock); - return -EIO; - } + do { + idr_ret = idr_get_new(&allocated_ptys, NULL, &index, layer); + if (idr_ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + idr_ret = -ENOMEM; + break; + } + continue; + } + + if (index >= pty_limit) { + idr_remove(&allocated_ptys, index); + idr_ret = -EIO; + } + } while (idr_ret == -EAGAIN); up(&allocated_ptys_lock); + if (idr_ret) + return idr_ret == -EAGAIN ? -ENOMEM : -EIO; + mutex_lock(&tty_mutex); retval = init_dev(ptm_driver, index, &tty); mutex_unlock(&tty_mutex); diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c index 106fa01..82d6d04 100644 --- a/drivers/hwmon/hwmon.c +++ b/drivers/hwmon/hwmon.c @@ -38,20 +38,21 @@ static DEFINE_SPINLOCK(idr_lock); */ struct class_device *hwmon_device_register(struct device *dev) { + struct idr_layer *layer = NULL; struct class_device *cdev; int id, err; again: - if (unlikely(idr_pre_get(&hwmon_idr, GFP_KERNEL) == 0)) - return ERR_PTR(-ENOMEM); - spin_lock(&idr_lock); - err = idr_get_new(&hwmon_idr, NULL, &id); + err = idr_get_new(&hwmon_idr, NULL, &id, layer); spin_unlock(&idr_lock); - if (unlikely(err == -EAGAIN)) + if (unlikely(err == -EAGAIN)) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return ERR_PTR(-ENOMEM); goto again; - else if (unlikely(err)) + } else if (unlikely(err)) return ERR_PTR(err); id = id & MAX_ID_MASK; diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index 9cb277d..f8aa8ea 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -151,22 +151,24 @@ static struct device_attribute dev_attr_ int i2c_add_adapter(struct 
i2c_adapter *adap) { int id, res = 0; + struct idr_layer *layer = NULL; struct list_head *item; struct i2c_driver *driver; mutex_lock(&core_lists); - if (idr_pre_get(&i2c_adapter_idr, GFP_KERNEL) == 0) { - res = -ENOMEM; - goto out_unlock; - } - - res = idr_get_new(&i2c_adapter_idr, adap, &id); - if (res < 0) { - if (res == -EAGAIN) +again: + res = idr_get_new(&i2c_adapter_idr, adap, &id, layer); + if (res == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { res = -ENOMEM; - goto out_unlock; + goto out_unlock; + } + goto again; } + if (res < 0) + goto out_unlock; adap->nr = id & MAX_ID_MASK; mutex_init(&adap->bus_lock); diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index f85c97f..cf14d01 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -298,6 +298,7 @@ static int cm_init_av_by_path(struct ib_ static int cm_alloc_id(struct cm_id_private *cm_id_priv) { + struct idr_layer *layer = NULL; unsigned long flags; int ret; static int next_id; @@ -305,9 +306,15 @@ static int cm_alloc_id(struct cm_id_priv do { spin_lock_irqsave(&cm.lock, flags); ret = idr_get_new_above(&cm.local_id_table, cm_id_priv, next_id++, - (__force int *) &cm_id_priv->id.local_id); + (__force int *) &cm_id_priv->id.local_id, + layer); spin_unlock_irqrestore(&cm.lock, flags); - } while( (ret == -EAGAIN) && idr_pre_get(&cm.local_id_table, GFP_KERNEL) ); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); return ret; } @@ -3347,7 +3354,6 @@ static int __init ib_cm_init(void) cm.remote_qp_table = RB_ROOT; cm.remote_sidr_table = RB_ROOT; idr_init(&cm.local_id_table); - idr_pre_get(&cm.local_id_table, GFP_KERNEL); cm.wq = create_workqueue("ib_cm"); if (!cm.wq) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d6f99d5..314b150 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1433,6 +1433,7 @@ static 
void cma_bind_port(struct rdma_bi static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, unsigned short snum) { + struct idr_layer *layer = NULL; struct rdma_bind_list *bind_list; int port, start, ret; @@ -1443,8 +1444,13 @@ static int cma_alloc_port(struct idr *ps start = snum ? snum : sysctl_local_port_range[0]; do { - ret = idr_get_new_above(ps, bind_list, start, &port); - } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); + ret = idr_get_new_above(ps, bind_list, start, &port, layer); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); if (ret) goto err; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index aeda484..824b652 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -490,17 +490,20 @@ static void init_mad(struct ib_sa_mad *m static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) { + struct idr_layer *layer = NULL; unsigned long flags; int ret, id; retry: - if (!idr_pre_get(&query_idr, gfp_mask)) - return -ENOMEM; spin_lock_irqsave(&idr_lock, flags); - ret = idr_get_new(&query_idr, query, &id); + ret = idr_get_new(&query_idr, query, &id, layer); spin_unlock_irqrestore(&idr_lock, flags); - if (ret == -EAGAIN) - goto retry; + if (ret == -EAGAIN) { + layer = idr_alloc_layer(gfp_mask); + if (layer) + goto retry; + ret = -ENOMEM; + } if (ret) return ret; diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index c1c6fda..22d5e24 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -173,6 +173,7 @@ static void ib_ucm_cleanup_events(struct static struct ib_ucm_context *ib_ucm_ctx_alloc(struct ib_ucm_file *file) { + struct idr_layer *layer = NULL; struct ib_ucm_context *ctx; int result; @@ -186,13 +187,15 @@ static struct ib_ucm_context *ib_ucm_ctx INIT_LIST_HEAD(&ctx->events); do { - result = 
idr_pre_get(&ctx_id_table, GFP_KERNEL); - if (!result) - goto error; - mutex_lock(&ctx_id_mutex); - result = idr_get_new(&ctx_id_table, ctx, &ctx->id); + result = idr_get_new(&ctx_id_table, ctx, &ctx->id, layer); mutex_unlock(&ctx_id_mutex); + + if (result == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + result = -ENOMEM; + } } while (result == -EAGAIN); if (result) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index bdf5d50..71dea88 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -109,18 +109,20 @@ static void put_uobj_write(struct ib_uob static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj) { + struct idr_layer *layer = NULL; int ret; -retry: - if (!idr_pre_get(idr, GFP_KERNEL)) - return -ENOMEM; + do { + spin_lock(&ib_uverbs_idr_lock); + ret = idr_get_new(idr, uobj, &uobj->id, layer); + spin_unlock(&ib_uverbs_idr_lock); - spin_lock(&ib_uverbs_idr_lock); - ret = idr_get_new(idr, uobj, &uobj->id); - spin_unlock(&ib_uverbs_idr_lock); - - if (ret == -EAGAIN) - goto retry; + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOMEM; + } + } while (ret == -EAGAIN); return ret; } diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 823131d..6f2a711 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -168,15 +168,11 @@ static void ipath_free_devdata(struct pc static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) { + struct idr_layer *layer = NULL; unsigned long flags; struct ipath_devdata *dd; int ret; - if (!idr_pre_get(&unit_table, GFP_KERNEL)) { - dd = ERR_PTR(-ENOMEM); - goto bail; - } - dd = vmalloc(sizeof(*dd)); if (!dd) { dd = ERR_PTR(-ENOMEM); @@ -187,7 +183,19 @@ static struct ipath_devdata *ipath_alloc spin_lock_irqsave(&ipath_devs_lock, flags); - ret = idr_get_new(&unit_table, dd, 
&dd->ipath_unit); + do { + ret = idr_get_new(&unit_table, dd, &dd->ipath_unit, layer); + if (ret == -EAGAIN) { + spin_unlock_irqrestore(&ipath_devs_lock, flags); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + dd = ERR_PTR(-ENOMEM); + goto bail; + } + spin_lock_irqsave(&ipath_devs_lock, flags); + } + } while (ret == -EAGAIN); + if (ret < 0) { printk(KERN_ERR IPATH_DRV_NAME ": Could not allocate unit ID: error %d\n", -ret); @@ -1754,10 +1762,6 @@ static int __init infinipath_init(void) * the PCI subsystem. */ idr_init(&unit_table); - if (!idr_pre_get(&unit_table, GFP_KERNEL)) { - ret = -ENOMEM; - goto bail; - } ret = pci_register_driver(&ipath_driver); if (ret < 0) { @@ -1780,7 +1784,7 @@ static int __init infinipath_init(void) goto bail_group; } - goto bail; + return ret; bail_group: ipath_driver_remove_group(&ipath_driver.driver); @@ -1791,7 +1795,6 @@ bail_pci: bail_unit: idr_destroy(&unit_table); -bail: return ret; } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index c99bf9f..e2dd20a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -807,23 +807,30 @@ static void free_minor(int minor) */ static int specific_minor(struct mapped_device *md, int minor) { + struct idr_layer *layer = NULL; int r, m; if (minor >= (1 << MINORBITS)) return -EINVAL; - r = idr_pre_get(&_minor_idr, GFP_KERNEL); - if (!r) - return -ENOMEM; - spin_lock(&_minor_lock); +again: if (idr_find(&_minor_idr, minor)) { r = -EBUSY; goto out; } - r = idr_get_new_above(&_minor_idr, MINOR_ALLOCED, minor, &m); + r = idr_get_new_above(&_minor_idr, MINOR_ALLOCED, minor, &m, layer); + if (r == -EAGAIN) { + spin_unlock(&_minor_lock); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + + spin_lock(&_minor_lock); + goto again; + } if (r) goto out; @@ -840,18 +847,21 @@ out: static int next_free_minor(struct mapped_device *md, int *minor) { + struct idr_layer *layer = NULL; int r, m; - r = idr_pre_get(&_minor_idr, GFP_KERNEL); - if (!r) - return -ENOMEM; - +again: 
spin_lock(&_minor_lock); - - r = idr_get_new(&_minor_idr, MINOR_ALLOCED, &m); - if (r) { - goto out; + r = idr_get_new(&_minor_idr, MINOR_ALLOCED, &m, layer); + if (r == -EAGAIN) { + spin_unlock(&_minor_lock); + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + goto again; } + if (r) + goto out; if (m >= (1 << MINORBITS)) { idr_remove(&_minor_idr, m); diff --git a/drivers/mmc/mmc_sysfs.c b/drivers/mmc/mmc_sysfs.c index a2a35fd..bf61ff4 100644 --- a/drivers/mmc/mmc_sysfs.c +++ b/drivers/mmc/mmc_sysfs.c @@ -280,14 +280,19 @@ struct mmc_host *mmc_alloc_host_sysfs(in */ int mmc_add_host_sysfs(struct mmc_host *host) { + struct idr_layer *layer = NULL; int err; - if (!idr_pre_get(&mmc_host_idr, GFP_KERNEL)) - return -ENOMEM; - +again: spin_lock(&mmc_host_lock); - err = idr_get_new(&mmc_host_idr, host, &host->index); + err = idr_get_new(&mmc_host_idr, host, &host->index, layer); spin_unlock(&mmc_host_lock); + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -ENOMEM; + goto again; + } if (err) return err; diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c index 1cb61a7..3388059 100644 --- a/drivers/rtc/class.c +++ b/drivers/rtc/class.c @@ -42,19 +42,21 @@ struct rtc_device *rtc_device_register(c struct rtc_class_ops *ops, struct module *owner) { + struct idr_layer *layer = NULL; struct rtc_device *rtc; int id, err; - if (idr_pre_get(&rtc_idr, GFP_KERNEL) == 0) { - err = -ENOMEM; - goto exit; - } - - +again: mutex_lock(&idr_lock); - err = idr_get_new(&rtc_idr, NULL, &id); + err = idr_get_new(&rtc_idr, NULL, &id, layer); mutex_unlock(&idr_lock); + if (err == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return ERR_PTR(-ENOMEM); + goto again; + } if (err < 0) goto exit; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 81755a3..401608b 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -1421,6 +1421,7 @@ lpfc_pci_probe_one(struct 
pci_dev *pdev, struct lpfc_hba *phba; struct lpfc_sli *psli; struct lpfc_iocbq *iocbq_entry = NULL, *iocbq_next = NULL; + struct idr_layer *layer = NULL; unsigned long bar0map_len, bar2map_len; int error = -ENODEV, retval; int i; @@ -1443,10 +1444,16 @@ lpfc_pci_probe_one(struct pci_dev *pdev, phba->pcidev = pdev; /* Assign an unused board number */ - if (!idr_pre_get(&lpfc_hba_index, GFP_KERNEL)) - goto out_put_host; - - error = idr_get_new(&lpfc_hba_index, NULL, &phba->brd_no); +again: + error = idr_get_new(&lpfc_hba_index, NULL, &phba->brd_no, layer); + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + error = -ENOMEM; + goto out_put_host; + } + goto again; + } if (error) goto out_put_host; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 3225d31..fe504be 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1617,6 +1617,7 @@ static int sd_probe(struct device *dev) struct scsi_device *sdp = to_scsi_device(dev); struct scsi_disk *sdkp; struct gendisk *gd; + struct idr_layer *layer = NULL; u32 index; int error; @@ -1636,13 +1637,19 @@ static int sd_probe(struct device *dev) if (!gd) goto out_free; - if (!idr_pre_get(&sd_index_idr, GFP_KERNEL)) - goto out_put; - +again: spin_lock(&sd_index_lock); - error = idr_get_new(&sd_index_idr, NULL, &index); + error = idr_get_new(&sd_index_idr, NULL, &index, layer); spin_unlock(&sd_index_lock); + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) { + error = -ENOMEM; + goto out_put; + } + goto again; + } if (index >= SD_MAX_DISKS) error = -EBUSY; if (error) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 22f7ccd..d03c3ba 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -198,25 +198,26 @@ struct v9fs_session_info *v9fs_inode2v9s int v9fs_get_idpool(struct v9fs_idpool *p) { + struct idr_layer *layer = NULL; int i = 0; int error; retry: - if (idr_pre_get(&p->pool, GFP_KERNEL) == 0) - return 0; - if (down_interruptible(&p->lock) == -EINTR) { eprintk(KERN_WARNING, 
"Interrupted while locking\n"); return -1; } /* no need to store exactly p, we just need something non-null */ - error = idr_get_new(&p->pool, p, &i); + error = idr_get_new(&p->pool, p, &i, layer); up(&p->lock); - if (error == -EAGAIN) + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return -1; goto retry; - else if (error) + } else if (error) return -1; return i; diff --git a/fs/inotify.c b/fs/inotify.c index 723836a..7e12bed 100644 --- a/fs/inotify.c +++ b/fs/inotify.c @@ -131,12 +131,16 @@ EXPORT_SYMBOL_GPL(put_inotify_watch); static int inotify_handle_get_wd(struct inotify_handle *ih, struct inotify_watch *watch) { + struct idr_layer *layer = NULL; int ret; do { - if (unlikely(!idr_pre_get(&ih->idr, GFP_KERNEL))) - return -ENOSPC; - ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd); + ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd, layer); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + ret = -ENOSPC; + } } while (ret == -EAGAIN); if (likely(!ret)) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index b650efa..f9d1b04 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -175,19 +175,21 @@ static u8 o2net_num_from_nn(struct o2net static int o2net_prep_nsw(struct o2net_node *nn, struct o2net_status_wait *nsw) { + struct idr_layer *layer = NULL; int ret = 0; do { - if (!idr_pre_get(&nn->nn_status_idr, GFP_ATOMIC)) { - ret = -EAGAIN; - break; - } spin_lock(&nn->nn_lock); - ret = idr_get_new(&nn->nn_status_idr, nsw, &nsw->ns_id); + ret = idr_get_new(&nn->nn_status_idr, nsw, &nsw->ns_id, layer); if (ret == 0) list_add_tail(&nsw->ns_node_item, &nn->nn_status_list); spin_unlock(&nn->nn_lock); + if (ret == -EAGAIN) { + layer = idr_alloc_layer(GFP_ATOMIC); + if (!layer) + ret = -ENOMEM; + } } while (ret == -EAGAIN); if (ret == 0) { diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 4ba0300..60bdd42 100644 --- a/fs/proc/generic.c +++ 
b/fs/proc/generic.c @@ -318,19 +318,20 @@ #define PROC_DYNAMIC_FIRST 0xF0000000UL */ static unsigned int get_inode_number(void) { + struct idr_layer *layer = NULL; int i, inum = 0; int error; retry: - if (idr_pre_get(&proc_inum_idr, GFP_KERNEL) == 0) - return 0; - spin_lock(&proc_inum_lock); - error = idr_get_new(&proc_inum_idr, NULL, &i); + error = idr_get_new(&proc_inum_idr, NULL, &i, layer); spin_unlock(&proc_inum_lock); - if (error == -EAGAIN) + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_KERNEL); + if (!layer) + return 0; goto retry; - else if (error) + } else if (error) return 0; inum = (i & MAX_ID_MASK) + PROC_DYNAMIC_FIRST; diff --git a/fs/super.c b/fs/super.c index 6d4e817..67455f5 100644 --- a/fs/super.c +++ b/fs/super.c @@ -607,19 +607,20 @@ static DEFINE_SPINLOCK(unnamed_dev_lock) int set_anon_super(struct super_block *s, void *data) { + struct idr_layer *layer = NULL; int dev; int error; retry: - if (idr_pre_get(&unnamed_dev_idr, GFP_ATOMIC) == 0) - return -ENOMEM; spin_lock(&unnamed_dev_lock); - error = idr_get_new(&unnamed_dev_idr, NULL, &dev); + error = idr_get_new(&unnamed_dev_idr, NULL, &dev, layer); spin_unlock(&unnamed_dev_lock); - if (error == -EAGAIN) - /* We raced and lost with another CPU. 
*/ + if (error == -EAGAIN) { + layer = idr_alloc_layer(GFP_ATOMIC); + if (!layer) + return -ENOMEM; goto retry; - else if (error) + } else if (error) return -EAGAIN; if ((dev & MAX_ID_MASK) == (1 << MINORBITS)) { diff --git a/include/linux/idr.h b/include/linux/idr.h index 8268034..de34c4e 100644 --- a/include/linux/idr.h +++ b/include/linux/idr.h @@ -44,7 +44,7 @@ #define MAX_ID_MASK (MAX_ID_BIT - 1) #define MAX_LEVEL (MAX_ID_SHIFT + IDR_BITS - 1) / IDR_BITS /* Number of id_layer structs to leave in free list */ -#define IDR_FREE_MAX MAX_LEVEL + MAX_LEVEL +#define IDR_FREE_MAX (MAX_LEVEL + MAX_LEVEL) struct idr_layer { unsigned long bitmap; /* A zero bit means "space here" */ @@ -57,7 +57,6 @@ struct idr { struct idr_layer *id_free; int layers; int id_free_cnt; - spinlock_t lock; }; #define IDR_INIT(name) \ @@ -66,7 +65,6 @@ #define IDR_INIT(name) \ .id_free = NULL, \ .layers = 0, \ .id_free_cnt = 0, \ - .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ } #define DEFINE_IDR(name) struct idr name = IDR_INIT(name) @@ -75,9 +73,10 @@ #define DEFINE_IDR(name) struct idr name */ void *idr_find(struct idr *idp, int id); -int idr_pre_get(struct idr *idp, gfp_t gfp_mask); -int idr_get_new(struct idr *idp, void *ptr, int *id); -int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id); +struct idr_layer *idr_alloc_layer(gfp_t gfp_mask); +int idr_get_new(struct idr *idp, void *ptr, int *id, struct idr_layer *layer); +int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id, + struct idr_layer *layer); void *idr_replace(struct idr *idp, void *ptr, int id); void idr_remove(struct idr *idp, int id); void idr_destroy(struct idr *idp); diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c index ac6dc87..18891b2 100644 --- a/kernel/posix-timers.c +++ b/kernel/posix-timers.c @@ -434,6 +434,7 @@ sys_timer_create(const clockid_t which_c struct sigevent __user *timer_event_spec, timer_t __user * created_timer_id) { + struct idr_layer *layer = NULL; 
 	int error = 0;
 	struct k_itimer *new_timer = NULL;
 	int new_timer_id;
@@ -451,17 +452,16 @@ sys_timer_create(const clockid_t which_c
 	spin_lock_init(&new_timer->it_lock);
 retry:
-	if (unlikely(!idr_pre_get(&posix_timers_id, GFP_KERNEL))) {
-		error = -EAGAIN;
-		goto out;
-	}
 	spin_lock_irq(&idr_lock);
 	error = idr_get_new(&posix_timers_id, (void *) new_timer,
-			    &new_timer_id);
+			    &new_timer_id, layer);
 	spin_unlock_irq(&idr_lock);
-	if (error == -EAGAIN)
-		goto retry;
-	else if (error) {
+	if (error == -EAGAIN) {
+		layer = idr_alloc_layer(GFP_KERNEL);
+		if (layer)
+			goto retry;
+	}
+	if (error) {
 		/*
 		 * Wierd looking, but we return EAGAIN if the IDR is
 		 * full (proper POSIX return value for this)
diff --git a/lib/idr.c b/lib/idr.c
index 16d2143..187ce4d 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -35,65 +35,39 @@
 #include <linux/idr.h>

 static kmem_cache_t *idr_layer_cache;

-static struct idr_layer *alloc_layer(struct idr *idp)
+static struct idr_layer *get_layer(struct idr *idp)
 {
 	struct idr_layer *p;
-	unsigned long flags;

-	spin_lock_irqsave(&idp->lock, flags);
-	if ((p = idp->id_free)) {
+	p = idp->id_free;
+	if (p) {
 		idp->id_free = p->ary[0];
 		idp->id_free_cnt--;
 		p->ary[0] = NULL;
 	}
-	spin_unlock_irqrestore(&idp->lock, flags);
-	return(p);
-}

-/* only called when idp->lock is held */
-static void __free_layer(struct idr *idp, struct idr_layer *p)
-{
-	p->ary[0] = idp->id_free;
-	idp->id_free = p;
-	idp->id_free_cnt++;
+	return p;
 }

 static void free_layer(struct idr *idp, struct idr_layer *p)
 {
-	unsigned long flags;
-
-	/*
-	 * Depends on the return element being zeroed.
-	 */
-	spin_lock_irqsave(&idp->lock, flags);
-	__free_layer(idp, p);
-	spin_unlock_irqrestore(&idp->lock, flags);
+	if (idp->id_free_cnt < IDR_FREE_MAX) {
+		p->ary[0] = idp->id_free;
+		idp->id_free = p;
+		idp->id_free_cnt++;
+	} else
+		kmem_cache_free(idr_layer_cache, p);
 }

 /**
- * idr_pre_get - reserver resources for idr allocation
- * @idp:	idr handle
+ * idr_alloc_layer - reserve resources for idr allocation
  * @gfp_mask:	memory allocation flags
- *
- * This function should be called prior to locking and calling the
- * following function.  It preallocates enough memory to satisfy
- * the worst possible allocation.
- *
- * If the system is REALLY out of memory this function returns 0,
- * otherwise 1.
  */
-int idr_pre_get(struct idr *idp, gfp_t gfp_mask)
+struct idr_layer *idr_alloc_layer(gfp_t gfp_mask)
 {
-	while (idp->id_free_cnt < IDR_FREE_MAX) {
-		struct idr_layer *new;
-		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
-		if (new == NULL)
-			return (0);
-		free_layer(idp, new);
-	}
-	return 1;
+	return kmem_cache_zalloc(idr_layer_cache, gfp_mask);
 }
-EXPORT_SYMBOL(idr_pre_get);
+EXPORT_SYMBOL(idr_alloc_layer);

 static int sub_alloc(struct idr *idp, void *ptr, int *starting_id)
 {
@@ -136,7 +110,7 @@ static int sub_alloc(struct idr *idp, vo
 		 * Create the layer below if it is missing.
 		 */
 		if (!p->ary[m]) {
-			if (!(new = alloc_layer(idp)))
+			if (!(new = get_layer(idp)))
 				return -1;
 			p->ary[m] = new;
 			p->count++;
@@ -171,14 +145,13 @@ static int idr_get_new_above_int(struct
 {
 	struct idr_layer *p, *new;
 	int layers, v, id;
-	unsigned long flags;

 	id = starting_id;
 build_up:
 	p = idp->top;
 	layers = idp->layers;
 	if (unlikely(!p)) {
-		if (!(p = alloc_layer(idp)))
+		if (!(p = get_layer(idp)))
 			return -1;
 		layers = 1;
 	}
@@ -190,19 +163,17 @@ build_up:
 		layers++;
 		if (!p->count)
 			continue;
-		if (!(new = alloc_layer(idp))) {
+		if (!(new = get_layer(idp))) {
 			/*
 			 * The allocation failed.  If we built part of
 			 * the structure tear it down.
 			 */
-			spin_lock_irqsave(&idp->lock, flags);
 			for (new = p; p && p != idp->top; new = p) {
 				p = p->ary[0];
 				new->ary[0] = NULL;
 				new->bitmap = new->count = 0;
-				__free_layer(idp, new);
+				free_layer(idp, new);
 			}
-			spin_unlock_irqrestore(&idp->lock, flags);
 			return -1;
 		}
 		new->ary[0] = p;
@@ -216,7 +187,7 @@ build_up:

 	v = sub_alloc(idp, ptr, &id);
 	if (v == -2)
 		goto build_up;
-	return(v);
+	return v;
 }

 /**
@@ -225,20 +196,25 @@ build_up:
  * @ptr: pointer you want associated with the ide
  * @start_id: id to start search at
  * @id: pointer to the allocated handle
+ * @layer: pointer to extra storage
  *
  * This is the allocate id function.  It should be called with any
  * required locks.
  *
  * If memory is required, it will return -EAGAIN, you should unlock
- * and go back to the idr_pre_get() call.  If the idr is full, it will
- * return -ENOSPC.
+ * and call idr_alloc_layer() to get a new layer to pass in as the
+ * @layer parameter.  If the idr is full, it will return -ENOSPC.
  *
  * @id returns a value in the range 0 ... 0x7fffffff
  */
-int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
+int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id,
+		      struct idr_layer *layer)
 {
 	int rv;

+	if (layer)
+		free_layer(idp, layer);
+
 	rv = idr_get_new_above_int(idp, ptr, starting_id);
 	/*
 	 * This is a cheap hack until the IDR code can be fixed to
@@ -260,33 +236,20 @@ EXPORT_SYMBOL(idr_get_new_above);
  * @idp: idr handle
  * @ptr: pointer you want associated with the ide
  * @id: pointer to the allocated handle
+ * @layer: pointer to extra storage
  *
  * This is the allocate id function.  It should be called with any
  * required locks.
  *
  * If memory is required, it will return -EAGAIN, you should unlock
- * and go back to the idr_pre_get() call.  If the idr is full, it will
- * return -ENOSPC.
+ * and call idr_alloc_layer() to get a new layer to pass in as the
+ * @layer parameter.  If the idr is full, it will return -ENOSPC.
  *
  * @id returns a value in the range 0 ... 0x7fffffff
  */
-int idr_get_new(struct idr *idp, void *ptr, int *id)
+int idr_get_new(struct idr *idp, void *ptr, int *id, struct idr_layer *layer)
 {
-	int rv;
-
-	rv = idr_get_new_above_int(idp, ptr, 0);
-	/*
-	 * This is a cheap hack until the IDR code can be fixed to
-	 * return proper error values.
-	 */
-	if (rv < 0) {
-		if (rv == -1)
-			return -EAGAIN;
-		else /* Will be -3 */
-			return -ENOSPC;
-	}
-	*id = rv;
-	return 0;
+	return idr_get_new_above(idp, ptr, 0, id, layer);
 }
 EXPORT_SYMBOL(idr_get_new);
@@ -349,11 +312,6 @@ void idr_remove(struct idr *idp, int id)
 		idp->top = p;
 		--idp->layers;
 	}
-	while (idp->id_free_cnt >= IDR_FREE_MAX) {
-		p = alloc_layer(idp);
-		kmem_cache_free(idr_layer_cache, p);
-		return;
-	}
 }
 EXPORT_SYMBOL(idr_remove);
@@ -364,7 +322,7 @@ EXPORT_SYMBOL(idr_remove);
 void idr_destroy(struct idr *idp)
 {
 	while (idp->id_free_cnt) {
-		struct idr_layer *p = alloc_layer(idp);
+		struct idr_layer *p = get_layer(idp);
 		kmem_cache_free(idr_layer_cache, p);
 	}
 }
@@ -445,17 +403,11 @@ void *idr_replace(struct idr *idp, void
 }
 EXPORT_SYMBOL(idr_replace);

-static void idr_cache_ctor(void * idr_layer, kmem_cache_t *idr_layer_cache,
-		unsigned long flags)
-{
-	memset(idr_layer, 0, sizeof(struct idr_layer));
-}
-
 static int init_id_cache(void)
 {
 	if (!idr_layer_cache)
 		idr_layer_cache = kmem_cache_create("idr_layer_cache",
-			sizeof(struct idr_layer), 0, 0, idr_cache_ctor, NULL);
+			sizeof(struct idr_layer), 0, 0, NULL, NULL);
 	return 0;
 }
@@ -470,6 +422,5 @@ void idr_init(struct idr *idp)
 {
 	init_id_cache();
 	memset(idp, 0, sizeof(struct idr));
-	spin_lock_init(&idp->lock);
 }
 EXPORT_SYMBOL(idr_init);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 2a87736..fc712c4 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1928,6 +1928,7 @@ int sctp_process_init(struct sctp_associ
 	 * association.
 	 */
 	if (!asoc->temp) {
+		struct idr_layer *layer = NULL;
 		int assoc_id;
 		int error;
@@ -1937,15 +1938,16 @@ int sctp_process_init(struct sctp_associ
 			goto clean_up;
 retry:
-		if (unlikely(!idr_pre_get(&sctp_assocs_id, gfp)))
-			goto clean_up;
 		spin_lock_bh(&sctp_assocs_id_lock);
 		error = idr_get_new_above(&sctp_assocs_id, (void *)asoc, 1,
-					  &assoc_id);
+					  &assoc_id, layer);
 		spin_unlock_bh(&sctp_assocs_id_lock);
-		if (error == -EAGAIN)
-			goto retry;
-		else if (error)
+		if (error == -EAGAIN) {
+			layer = idr_alloc_layer(gfp);
+			if (layer)
+				goto retry;
+		}
+		if (error)
 			goto clean_up;
 		asoc->assoc_id = (sctp_assoc_t) assoc_id;

^ permalink raw reply related	[flat|nested] 32+ messages in thread
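The caller-side pattern the patch above converts everything to — try the insert with the lock held, and on -EAGAIN drop the lock, allocate a spare layer, and retry passing the spare in — can be sketched in plain userspace C. This is a toy model, not the kernel code: `my_idr`, `get_new_id()`, `demo_alloc()`, and the `fake_lock()` helpers are made-up names, `-11` stands in for `-EAGAIN`, and a flag stands in for the spinlock the real callers hold.

```c
#include <assert.h>
#include <stdlib.h>

struct layer { struct layer *next; };

struct my_idr {
    struct layer *free_list;  /* preallocated layers, like idp->id_free */
    int next_id;
};

/* Stand-in for the spinlock the caller holds (idr_lock in the hunk above). */
static int lock_held;
static void fake_lock(void)   { lock_held = 1; }
static void fake_unlock(void) { lock_held = 0; }

/* idr_alloc_layer() analogue: a plain allocation that touches no idr
 * state, so it can safely block without any lock held. */
static struct layer *alloc_layer_outside_lock(void)
{
    return calloc(1, sizeof(struct layer));
}

/* idr_get_new() analogue: must be called with the lock held.  A spare
 * layer handed in by the caller is stashed first; -11 plays -EAGAIN. */
static int get_new_id(struct my_idr *idr, struct layer *spare, int *id)
{
    assert(lock_held);
    if (spare) {                        /* stash caller's preallocation */
        spare->next = idr->free_list;
        idr->free_list = spare;
    }
    if (!idr->free_list)
        return -11;                     /* caller must preallocate */
    struct layer *l = idr->free_list;   /* consume one layer per id */
    idr->free_list = l->next;
    free(l);
    *id = idr->next_id++;
    return 0;
}

/* Caller-side retry loop, mirroring the sys_timer_create() hunk above. */
static int demo_alloc(struct my_idr *idr, int *id)
{
    struct layer *spare = NULL;
    int err;
retry:
    fake_lock();
    err = get_new_id(idr, spare, id);
    fake_unlock();
    if (err == -11) {
        spare = alloc_layer_outside_lock();  /* no locks held here */
        if (spare)
            goto retry;
    }
    return err;
}
```

The point of the shape is that the only allocation happens with no locks held, so the idr itself no longer needs an internal lock at all, which is what lets the patch delete `idp->lock` and the lockdep chain through it.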
* Re: [openib-general] ipoib lockdep warning
  2006-07-11 20:10 ipoib lockdep warning Zach Brown
  2006-07-11 21:16 ` Michael S. Tsirkin
  2006-07-11 22:54 ` Roland Dreier
@ 2006-07-12  2:33 ` Roland Dreier
  2006-07-12  6:49   ` Arjan van de Ven
  2 siblings, 1 reply; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 2:33 UTC (permalink / raw)
To: Zach Brown
Cc: openib-general, Linux Kernel Mailing List, Arjan van de Ven,
	Ingo Molnar

OK, the patch to lib/idr.c below is at least one way to fix this.
However, with that applied I get the lockdep warning below, which
seems to be a false positive -- I'm not sure what the right fix is,
but the blame really seems to fall on udp_ioctl for poking into the
sk_buff_head lock itself.

Ingo and/or Arjan, any thoughts on the idr.c change or the
sk_buff_head warning?

Thanks,
  Roland

Here's the idr.c change:

diff --git a/lib/idr.c b/lib/idr.c
index 4d09681..16d2143 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -38,14 +38,15 @@ static kmem_cache_t *idr_layer_cache;
 static struct idr_layer *alloc_layer(struct idr *idp)
 {
 	struct idr_layer *p;
+	unsigned long flags;

-	spin_lock(&idp->lock);
+	spin_lock_irqsave(&idp->lock, flags);
 	if ((p = idp->id_free)) {
 		idp->id_free = p->ary[0];
 		idp->id_free_cnt--;
 		p->ary[0] = NULL;
 	}
-	spin_unlock(&idp->lock);
+	spin_unlock_irqrestore(&idp->lock, flags);
 	return(p);
 }

@@ -59,12 +60,14 @@ static void __free_layer(struct idr *idp

 static void free_layer(struct idr *idp, struct idr_layer *p)
 {
+	unsigned long flags;
+
 	/*
 	 * Depends on the return element being zeroed.
 	 */
-	spin_lock(&idp->lock);
+	spin_lock_irqsave(&idp->lock, flags);
 	__free_layer(idp, p);
-	spin_unlock(&idp->lock);
+	spin_unlock_irqrestore(&idp->lock, flags);
 }

 /**
@@ -168,6 +171,7 @@ static int idr_get_new_above_int(struct
 {
 	struct idr_layer *p, *new;
 	int layers, v, id;
+	unsigned long flags;

 	id = starting_id;
 build_up:
@@ -191,14 +195,14 @@ build_up:
 			 * The allocation failed.  If we built part of
 			 * the structure tear it down.
 			 */
-			spin_lock(&idp->lock);
+			spin_lock_irqsave(&idp->lock, flags);
 			for (new = p; p && p != idp->top; new = p) {
 				p = p->ary[0];
 				new->ary[0] = NULL;
 				new->bitmap = new->count = 0;
 				__free_layer(idp, new);
 			}
-			spin_unlock(&idp->lock);
+			spin_unlock_irqrestore(&idp->lock, flags);
 			return -1;
 		}
 		new->ary[0] = p;

And here's the warning I get, which appears to be a false positive:

======================================================
[ INFO: hard-safe -> hard-unsafe lock order detected ]
------------------------------------------------------
swapper/0 [HC0[0]:SC1[2]:HE0:SE0] is trying to acquire:
 (&skb_queue_lock_key){-+..}, at: [<ffffffff8038683e>] skb_queue_tail+0x1d/0x47

and this task is already holding:
 (&priv->lock){.+..}, at: [<ffffffff881efd5b>] ipoib_mcast_send+0x29/0x413 [ib_ipoib]
which would create a new lock dependency:
 (&priv->lock){.+..} -> (&skb_queue_lock_key){-+..}

but this new dependency connects a hard-irq-safe lock:
 (&priv->tx_lock){+...}
... which became hard-irq-safe at:
  [<ffffffff8024172a>] lock_acquire+0x4a/0x69
  [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
  [<ffffffff881ee615>] ipoib_ib_completion+0x340/0x3ff [ib_ipoib]
  [<ffffffff881ca8a5>] mthca_cq_completion+0x65/0x6b [ib_mthca]
  [<ffffffff881ca1bc>] mthca_eq_int+0x70/0x3d2 [ib_mthca]
  [<ffffffff881ca644>] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca]
  [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
  [<ffffffff802538c6>] __do_IRQ+0xac/0x117
  [<ffffffff8020becf>] do_IRQ+0xf7/0x108
  [<ffffffff80209bd8>] common_interrupt+0x64/0x65

to a hard-irq-unsafe lock:
 (&skb_queue_lock_key){-+..}
... which became hard-irq-unsafe at:
...
  [<ffffffff8024172a>] lock_acquire+0x4a/0x69
  [<ffffffff803eb349>] _spin_lock_bh+0x26/0x33
  [<ffffffff803c6de8>] udp_ioctl+0x46/0x87
  [<ffffffff803cd1ab>] inet_ioctl+0x8c/0x8f
  [<ffffffff8038203c>] sock_ioctl+0x1c0/0x1ea
  [<ffffffff8028591e>] do_ioctl+0x26/0x74
  [<ffffffff80285bb6>] vfs_ioctl+0x24a/0x264
  [<ffffffff80285c11>] sys_ioctl+0x41/0x68
  [<ffffffff802096a1>] system_call+0x7d/0x83

other info that might help us debug this:

2 locks held by swapper/0:
 #0:  (&priv->tx_lock){+...}, at: [<ffffffff881ed6ef>] ipoib_start_xmit+0x42/0x66d [ib_ipoib]
 #1:  (&priv->lock){.+..}, at: [<ffffffff881efd5b>] ipoib_mcast_send+0x29/0x413 [ib_ipoib]

the hard-irq-safe lock's dependencies:
-> (&priv->tx_lock){+...} ops: 0 {
   initial-use at:
    [<ffffffff8024172a>] lock_acquire+0x4a/0x69
    [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
    [<ffffffff881ef705>] ipoib_mcast_join_finish+0x340/0x379 [ib_ipoib]
    [<ffffffff881ef7e0>] ipoib_mcast_join_complete+0xa2/0x294 [ib_ipoib]
    [<ffffffff881e6389>] ib_sa_mcmember_rec_callback+0x4b/0x57 [ib_sa]
    [<ffffffff881e648d>] recv_handler+0x3d/0x4a [ib_sa]
    [<ffffffff881b9c20>] ib_mad_completion_handler+0x3df/0x5e4 [ib_mad]
    [<ffffffff80238422>] run_workqueue+0xa0/0xf7
    [<ffffffff8023861d>] worker_thread+0xee/0x122
    [<ffffffff8023b661>] kthread+0xd0/0xfb
    [<ffffffff8020a5f1>] child_rip+0x7/0x12
   in-hardirq-W at:
    [<ffffffff8024172a>] lock_acquire+0x4a/0x69
    [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
    [<ffffffff881ee615>] ipoib_ib_completion+0x340/0x3ff [ib_ipoib]
    [<ffffffff881ca8a5>] mthca_cq_completion+0x65/0x6b [ib_mthca]
    [<ffffffff881ca1bc>] mthca_eq_int+0x70/0x3d2 [ib_mthca]
    [<ffffffff881ca644>] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca]
    [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
    [<ffffffff802538c6>] __do_IRQ+0xac/0x117
    [<ffffffff8020becf>] do_IRQ+0xf7/0x108
    [<ffffffff80209bd8>] common_interrupt+0x64/0x65
 }
 ... key at: [<ffffffff881f9a28>] __key.23652+0x0/0xffffffffffff804f [ib_ipoib]
 -> (&priv->lock){.+..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff881ef2c6>] ipoib_mcast_start_thread+0x6e/0x87 [ib_ipoib]
     [<ffffffff881eee7d>] ipoib_ib_dev_up+0x52/0x58 [ib_ipoib]
     [<ffffffff881ec23f>] ipoib_open+0x65/0x102 [ib_ipoib]
     [<ffffffff8038c74c>] dev_open+0x3a/0x80
     [<ffffffff8038ba90>] dev_change_flags+0x65/0x139
     [<ffffffff803ccb56>] devinet_ioctl+0x240/0x5e2
     [<ffffffff803cd18f>] inet_ioctl+0x70/0x8f
     [<ffffffff8038203c>] sock_ioctl+0x1c0/0x1ea
     [<ffffffff8028591e>] do_ioctl+0x26/0x74
     [<ffffffff80285bb6>] vfs_ioctl+0x24a/0x264
     [<ffffffff80285c11>] sys_ioctl+0x41/0x68
     [<ffffffff802096a1>] system_call+0x7d/0x83
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb316>] _spin_lock+0x21/0x2e
     [<ffffffff881efd5a>] ipoib_mcast_send+0x28/0x413 [ib_ipoib]
     [<ffffffff881eda0e>] ipoib_start_xmit+0x361/0x66d [ib_ipoib]
     [<ffffffff8038dfc8>] dev_hard_start_xmit+0x1ab/0x221
     [<ffffffff8039a024>] __qdisc_run+0xfa/0x1cd
     [<ffffffff8038e174>] dev_queue_xmit+0x136/0x263
     [<ffffffff80390209>] neigh_connected_output+0xae/0xc7
     [<ffffffff8813f4de>] ip6_output2+0x254/0x28c [ipv6]
     [<ffffffff8813fcf8>] ip6_output+0x7e2/0x7f8 [ipv6]
     [<ffffffff8814dbf7>] ndisc_send_ns+0x38f/0x4c1 [ipv6]
     [<ffffffff88145048>] addrconf_dad_timer+0xfb/0x11e [ipv6]
     [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
  }
  ... key at: [<ffffffff881f9a30>] __key.23651+0x0/0xffffffffffff8047 [ib_ipoib]
  -> (&idr_lock){....} ops: 0 {
     initial-use at:
      [<ffffffff8024172a>] lock_acquire+0x4a/0x69
      [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
      [<ffffffff881e6161>] send_mad+0x39/0x141 [ib_sa]
      [<ffffffff881e6a20>] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa]
      [<ffffffff881f02a4>] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib]
      [<ffffffff881f0582>] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib]
      [<ffffffff80238422>] run_workqueue+0xa0/0xf7
      [<ffffffff8023861d>] worker_thread+0xee/0x122
      [<ffffffff8023b661>] kthread+0xd0/0xfb
      [<ffffffff8020a5f1>] child_rip+0x7/0x12
   }
   ... key at: [<ffffffff881ea488>] __key.17990+0x0/0xffffffffffffc8c2 [ib_sa]
  -> (query_idr.lock){....} ops: 0 {
     initial-use at:
      [<ffffffff8024172a>] lock_acquire+0x4a/0x69
      [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
      [<ffffffff802e4581>] free_layer+0x1c/0x40
      [<ffffffff802e45d6>] idr_pre_get+0x31/0x42
      [<ffffffff881e614d>] send_mad+0x25/0x141 [ib_sa]
      [<ffffffff881e6a20>] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa]
      [<ffffffff881f02a4>] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib]
      [<ffffffff881f0582>] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib]
      [<ffffffff80238422>] run_workqueue+0xa0/0xf7
      [<ffffffff8023861d>] worker_thread+0xee/0x122
      [<ffffffff8023b661>] kthread+0xd0/0xfb
      [<ffffffff8020a5f1>] child_rip+0x7/0x12
   }
   ... key at: [<ffffffff881e9f30>] query_idr+0x30/0xffffffffffffce4a [ib_sa]
  ...
  acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff802e4530>] alloc_layer+0x18/0x4d
   [<ffffffff802e48de>] idr_get_new_above_int+0x37/0x225
   [<ffffffff802e4adb>] idr_get_new+0xf/0x2f
   [<ffffffff881e6177>] send_mad+0x4f/0x141 [ib_sa]
   [<ffffffff881e6a20>] ib_sa_mcmember_rec_query+0x147/0x17c [ib_sa]
   [<ffffffff881f02a4>] ipoib_mcast_join+0x15f/0x1f2 [ib_ipoib]
   [<ffffffff881f0582>] ipoib_mcast_join_task+0x24b/0x2e7 [ib_ipoib]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff881e6069>] ib_sa_cancel_query+0x1b/0x73 [ib_sa]
   [<ffffffff881eefb5>] wait_for_mcast_join+0x35/0xd2 [ib_ipoib]
   [<ffffffff881ef385>] ipoib_mcast_stop_thread+0xa6/0xe6 [ib_ipoib]
   [<ffffffff881f07da>] ipoib_mcast_restart_task+0x4b/0x3d9 [ib_ipoib]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

 -> (&mad_agent_priv->lock){....} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff881b936d>] ib_post_send_mad+0x3e0/0x512 [ib_mad]
     [<ffffffff881bb251>] agent_send_response+0x11b/0x155 [ib_mad]
     [<ffffffff881b9b2b>] ib_mad_completion_handler+0x2ea/0x5e4 [ib_mad]
     [<ffffffff80238422>] run_workqueue+0xa0/0xf7
     [<ffffffff8023861d>] worker_thread+0xee/0x122
     [<ffffffff8023b661>] kthread+0xd0/0xfb
     [<ffffffff8020a5f1>] child_rip+0x7/0x12
  }
  ... key at: [<ffffffff881c1424>] __key.17415+0x0/0xffffffffffffaf74 [ib_mad]
 -> (base_lock_keys + cpu){++..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff8020bed4>] do_IRQ+0xfc/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
    in-hardirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb316>] _spin_lock+0x21/0x2e
     [<ffffffff80231bad>] __mod_timer+0x93/0xc8
     [<ffffffff80231c0e>] mod_timer+0x2c/0x2f
     [<ffffffff80377fb2>] i8042_interrupt+0x2a/0x222
     [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
     [<ffffffff802538c6>] __do_IRQ+0xac/0x117
     [<ffffffff8020becf>] do_IRQ+0xf7/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
     [<7ffffffffffffffd>] 0x7ffffffffffffffd
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff8020bed4>] do_IRQ+0xfc/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
  }
  ... key at: [<ffffffff805a16e0>] base_lock_keys+0x0/0x20
  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff80231316>] lock_timer_base+0x21/0x47
   [<ffffffff80231c84>] try_to_del_timer_sync+0x14/0x62
   [<ffffffff80231ce3>] del_timer_sync+0x11/0x1a
   [<ffffffff881b8a7b>] ib_mad_complete_send_wr+0x10b/0x1f8 [ib_mad]
   [<ffffffff881b8c8d>] ib_mad_send_done_handler+0x125/0x17a [ib_mad]
   [<ffffffff881b9db2>] ib_mad_completion_handler+0x571/0x5e4 [ib_mad]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

 -> (base_lock_keys + cpu#4){++..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
    in-hardirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff80231316>] lock_timer_base+0x21/0x47
     [<ffffffff80231c2f>] del_timer+0x1e/0x5f
     [<ffffffff8034f281>] scsi_delete_timer+0x12/0x5d
     [<ffffffff8034d064>] scsi_done+0xd/0x1e
     [<ffffffff8035d444>] ata_scsi_qc_complete+0xb1/0xc3
     [<ffffffff803583e9>] __ata_qc_complete+0x218/0x225
     [<ffffffff803584c5>] ata_qc_complete+0xcf/0xd5
     [<ffffffff80359cf0>] ata_hsm_qc_complete+0x1c6/0x1d8
     [<ffffffff8035a31a>] ata_hsm_move+0x618/0x638
     [<ffffffff8035a8ba>] ata_interrupt+0x161/0x1ae
     [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
     [<ffffffff802538c6>] __do_IRQ+0xac/0x117
     [<ffffffff8020becf>] do_IRQ+0xf7/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>]
     __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
  }
  ... key at: [<ffffffff805a16f8>] base_lock_keys+0x18/0x20
  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb316>] _spin_lock+0x21/0x2e
   [<ffffffff80231bad>] __mod_timer+0x93/0xc8
   [<ffffffff802386c6>] queue_delayed_work+0x75/0x7d
   [<ffffffff881b776a>] wait_for_response+0xdf/0xe8 [ib_mad]
   [<ffffffff881b8a31>] ib_mad_complete_send_wr+0xc1/0x1f8 [ib_mad]
   [<ffffffff881b8c8d>] ib_mad_send_done_handler+0x125/0x17a [ib_mad]
   [<ffffffff881b9db2>] ib_mad_completion_handler+0x571/0x5e4 [ib_mad]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

 -> (base_lock_keys + cpu#2){++..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
    in-hardirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb316>] _spin_lock+0x21/0x2e
     [<ffffffff80231bad>] __mod_timer+0x93/0xc8
     [<ffffffff80231c0e>] mod_timer+0x2c/0x2f
     [<ffffffff80377fb2>] i8042_interrupt+0x2a/0x222
     [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
     [<ffffffff802538c6>] __do_IRQ+0xac/0x117
     [<ffffffff8020becf>] do_IRQ+0xf7/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff8023197f>] run_timer_softirq+0x41/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
  }
  ... key at: [<ffffffff805a16e8>] base_lock_keys+0x8/0x20
  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb316>] _spin_lock+0x21/0x2e
   [<ffffffff80231bad>] __mod_timer+0x93/0xc8
   [<ffffffff802386c6>] queue_delayed_work+0x75/0x7d
   [<ffffffff881b776a>] wait_for_response+0xdf/0xe8 [ib_mad]
   [<ffffffff881b8a31>] ib_mad_complete_send_wr+0xc1/0x1f8 [ib_mad]
   [<ffffffff881b8c8d>] ib_mad_send_done_handler+0x125/0x17a [ib_mad]
   [<ffffffff881b9db2>] ib_mad_completion_handler+0x571/0x5e4 [ib_mad]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff881b7a16>] ib_modify_mad+0x27/0x14c [ib_mad]
   [<ffffffff881b7b45>] ib_cancel_mad+0xa/0xd [ib_mad]
   [<ffffffff881e60b8>] ib_sa_cancel_query+0x6a/0x73 [ib_sa]
   [<ffffffff881eefb5>] wait_for_mcast_join+0x35/0xd2 [ib_ipoib]
   [<ffffffff881ef385>] ipoib_mcast_stop_thread+0xa6/0xe6 [ib_ipoib]
   [<ffffffff881f07da>] ipoib_mcast_restart_task+0x4b/0x3d9 [ib_ipoib]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

 -> (modlist_lock){.+..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff80247f0c>] module_text_address+0x15/0x3b
     [<ffffffff803ee75b>] __register_kprobe+0x2bc/0x2da
     [<ffffffff803ee8e3>] register_kprobe+0xc/0xf
     [<ffffffff80806064>] arch_init_kprobes+0xf/0x12
     [<ffffffff808094be>] init_kprobes+0x3f/0x52
     [<ffffffff8020718b>] init+0x143/0x308
     [<ffffffff8020a5f1>] child_rip+0x7/0x12
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff80247e1e>] is_module_address+0x14/0x80
     [<ffffffff8023ede1>] static_obj+0x6c/0x79
     [<ffffffff8023ee8b>] lockdep_init_map+0x7d/0xc3
     [<ffffffff802f64d2>] __spin_lock_init+0x2e/0x58
     [<ffffffff881ef221>] ipoib_mcast_alloc+0x97/0xce [ib_ipoib]
     [<ffffffff881efe3b>] ipoib_mcast_send+0x109/0x413 [ib_ipoib]
     [<ffffffff881eda0e>] ipoib_start_xmit+0x361/0x66d [ib_ipoib]
     [<ffffffff8038dfc8>] dev_hard_start_xmit+0x1ab/0x221
     [<ffffffff8039a024>] __qdisc_run+0xfa/0x1cd
     [<ffffffff8038e174>] dev_queue_xmit+0x136/0x263
     [<ffffffff80390209>] neigh_connected_output+0xae/0xc7
     [<ffffffff8813f4de>] ip6_output2+0x254/0x28c [ipv6]
     [<ffffffff8813fcf8>] ip6_output+0x7e2/0x7f8 [ipv6]
     [<ffffffff8814d736>] ndisc_send_rs+0x33d/0x46f [ipv6]
     [<ffffffff881427b9>] addrconf_dad_completed+0x90/0xe2 [ipv6]
     [<ffffffff88144fc1>] addrconf_dad_timer+0x74/0x11e [ipv6]
     [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
  }
  ... key at: [<ffffffff804cfd98>] modlist_lock+0x18/0x40
  ...
  acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff80247e1e>] is_module_address+0x14/0x80
   [<ffffffff8023ede1>] static_obj+0x6c/0x79
   [<ffffffff8023ee8b>] lockdep_init_map+0x7d/0xc3
   [<ffffffff802f64d2>] __spin_lock_init+0x2e/0x58
   [<ffffffff881ef221>] ipoib_mcast_alloc+0x97/0xce [ib_ipoib]
   [<ffffffff881f0903>] ipoib_mcast_restart_task+0x174/0x3d9 [ib_ipoib]
   [<ffffffff80238422>] run_workqueue+0xa0/0xf7
   [<ffffffff8023861d>] worker_thread+0xee/0x122
   [<ffffffff8023b661>] kthread+0xd0/0xfb
   [<ffffffff8020a5f1>] child_rip+0x7/0x12

 -> (&qp->sq.lock){.+..} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb644>] _spin_lock_irq+0x28/0x35
     [<ffffffff881cdef3>] mthca_modify_qp+0x67/0xd7b [ib_mthca]
     [<ffffffff881a68fa>] ib_modify_qp+0xc/0xf [ib_core]
     [<ffffffff881b8394>] ib_mad_init_device+0x2db/0x55c [ib_mad]
     [<ffffffff881a83fe>] ib_register_device+0x20d/0x300 [ib_core]
     [<ffffffff881d2f53>] mthca_register_device+0x3e0/0x430 [ib_mthca]
     [<ffffffff881c4f26>] mthca_init_one+0xbc2/0xcbc [ib_mthca]
     [<ffffffff802fcab2>] pci_device_probe+0x4b/0x72
     [<ffffffff803481b7>] driver_probe_device+0x59/0xaf
     [<ffffffff803482d3>] __driver_attach+0x58/0x90
     [<ffffffff80347665>] bus_for_each_dev+0x48/0x7a
     [<ffffffff80348047>] driver_attach+0x1b/0x1e
     [<ffffffff803479d2>] bus_add_driver+0x74/0x112
     [<ffffffff80348705>] driver_register+0x8c/0x91
     [<ffffffff802fc684>] __pci_register_driver+0x60/0x84
     [<ffffffff88022016>] 0xffffffff88022016
     [<ffffffff80247a93>] sys_init_module+0x174c/0x1884
     [<ffffffff802096a1>] system_call+0x7d/0x83
    in-softirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff881cff9d>] mthca_arbel_post_send+0x2f/0x5be [ib_mthca]
     [<ffffffff881ee043>] ipoib_send+0x117/0x1d6 [ib_ipoib]
     [<ffffffff881f012d>] ipoib_mcast_send+0x3fb/0x413 [ib_ipoib]
     [<ffffffff881eda0e>] ipoib_start_xmit+0x361/0x66d [ib_ipoib]
     [<ffffffff8038dfc8>] dev_hard_start_xmit+0x1ab/0x221
     [<ffffffff8039a024>] __qdisc_run+0xfa/0x1cd
     [<ffffffff8038e174>] dev_queue_xmit+0x136/0x263
     [<ffffffff80390209>] neigh_connected_output+0xae/0xc7
     [<ffffffff8813f4de>] ip6_output2+0x254/0x28c [ipv6]
     [<ffffffff8813fcf8>] ip6_output+0x7e2/0x7f8 [ipv6]
     [<ffffffff8814dbf7>] ndisc_send_ns+0x38f/0x4c1 [ipv6]
     [<ffffffff88145048>] addrconf_dad_timer+0xfb/0x11e [ipv6]
     [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
     [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
     [<ffffffff8020a94d>] call_softirq+0x1d/0x28
     [<ffffffff8022e0bc>] irq_exit+0x56/0x59
     [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
     [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
  }
  ... key at: [<ffffffff881e4060>] __key.20374+0x0/0xffffffffffff231a [ib_mthca]
 -> (&qp->rq.lock){+...} ops: 0 {
    initial-use at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb316>] _spin_lock+0x21/0x2e
     [<ffffffff881cdf02>] mthca_modify_qp+0x76/0xd7b [ib_mthca]
     [<ffffffff881a68fa>] ib_modify_qp+0xc/0xf [ib_core]
     [<ffffffff881b8394>] ib_mad_init_device+0x2db/0x55c [ib_mad]
     [<ffffffff881a83fe>] ib_register_device+0x20d/0x300 [ib_core]
     [<ffffffff881d2f53>] mthca_register_device+0x3e0/0x430 [ib_mthca]
     [<ffffffff881c4f26>] mthca_init_one+0xbc2/0xcbc [ib_mthca]
     [<ffffffff802fcab2>] pci_device_probe+0x4b/0x72
     [<ffffffff803481b7>] driver_probe_device+0x59/0xaf
     [<ffffffff803482d3>] __driver_attach+0x58/0x90
     [<ffffffff80347665>] bus_for_each_dev+0x48/0x7a
     [<ffffffff80348047>] driver_attach+0x1b/0x1e
     [<ffffffff803479d2>] bus_add_driver+0x74/0x112
     [<ffffffff80348705>] driver_register+0x8c/0x91
     [<ffffffff802fc684>] __pci_register_driver+0x60/0x84
     [<ffffffff88022016>] 0xffffffff88022016
     [<ffffffff80247a93>] sys_init_module+0x174c/0x1884
     [<ffffffff802096a1>] system_call+0x7d/0x83
    in-hardirq-W at:
     [<ffffffff8024172a>] lock_acquire+0x4a/0x69
     [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
     [<ffffffff881cd749>] mthca_arbel_post_receive+0x35/0x23a [ib_mthca]
     [<ffffffff881ee183>] ipoib_ib_post_receive+0x81/0xf9 [ib_ipoib]
     [<ffffffff881ee55b>] ipoib_ib_completion+0x286/0x3ff [ib_ipoib]
     [<ffffffff881ca8a5>] mthca_cq_completion+0x65/0x6b [ib_mthca]
     [<ffffffff881ca1bc>] mthca_eq_int+0x70/0x3d2 [ib_mthca]
     [<ffffffff881ca644>] mthca_arbel_interrupt+0x3b/0x98 [ib_mthca]
     [<ffffffff802537de>] handle_IRQ_event+0x28/0x64
     [<ffffffff802538c6>] __do_IRQ+0xac/0x117
     [<ffffffff8020becf>] do_IRQ+0xf7/0x108
     [<ffffffff80209bd8>] common_interrupt+0x64/0x65
  }
  ... key at: [<ffffffff881e4058>] __key.20375+0x0/0xffffffffffff2322 [ib_mthca]
  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb316>] _spin_lock+0x21/0x2e
   [<ffffffff881cdf02>] mthca_modify_qp+0x76/0xd7b [ib_mthca]
   [<ffffffff881a68fa>] ib_modify_qp+0xc/0xf [ib_core]
   [<ffffffff881b8394>] ib_mad_init_device+0x2db/0x55c [ib_mad]
   [<ffffffff881a83fe>] ib_register_device+0x20d/0x300 [ib_core]
   [<ffffffff881d2f53>] mthca_register_device+0x3e0/0x430 [ib_mthca]
   [<ffffffff881c4f26>] mthca_init_one+0xbc2/0xcbc [ib_mthca]
   [<ffffffff802fcab2>] pci_device_probe+0x4b/0x72
   [<ffffffff803481b7>] driver_probe_device+0x59/0xaf
   [<ffffffff803482d3>] __driver_attach+0x58/0x90
   [<ffffffff80347665>] bus_for_each_dev+0x48/0x7a
   [<ffffffff80348047>] driver_attach+0x1b/0x1e
   [<ffffffff803479d2>] bus_add_driver+0x74/0x112
   [<ffffffff80348705>] driver_register+0x8c/0x91
   [<ffffffff802fc684>] __pci_register_driver+0x60/0x84
   [<ffffffff88022016>] 0xffffffff88022016
   [<ffffffff80247a93>] sys_init_module+0x174c/0x1884
   [<ffffffff802096a1>] system_call+0x7d/0x83
  ...
  acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
   [<ffffffff881cff9d>] mthca_arbel_post_send+0x2f/0x5be [ib_mthca]
   [<ffffffff881ee043>] ipoib_send+0x117/0x1d6 [ib_ipoib]
   [<ffffffff881f012d>] ipoib_mcast_send+0x3fb/0x413 [ib_ipoib]
   [<ffffffff881eda0e>] ipoib_start_xmit+0x361/0x66d [ib_ipoib]
   [<ffffffff8038dfc8>] dev_hard_start_xmit+0x1ab/0x221
   [<ffffffff8039a024>] __qdisc_run+0xfa/0x1cd
   [<ffffffff8038e174>] dev_queue_xmit+0x136/0x263
   [<ffffffff80390209>] neigh_connected_output+0xae/0xc7
   [<ffffffff8813f4de>] ip6_output2+0x254/0x28c [ipv6]
   [<ffffffff8813fcf8>] ip6_output+0x7e2/0x7f8 [ipv6]
   [<ffffffff8814dbf7>] ndisc_send_ns+0x38f/0x4c1 [ipv6]
   [<ffffffff88145048>] addrconf_dad_timer+0xfb/0x11e [ipv6]
   [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
   [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
   [<ffffffff8020a94d>] call_softirq+0x1d/0x28
   [<ffffffff8022e0bc>] irq_exit+0x56/0x59
   [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
   [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70

  ... acquired at:
   [<ffffffff8024172a>] lock_acquire+0x4a/0x69
   [<ffffffff803eb316>] _spin_lock+0x21/0x2e
   [<ffffffff881efd5a>] ipoib_mcast_send+0x28/0x413 [ib_ipoib]
   [<ffffffff881eda0e>] ipoib_start_xmit+0x361/0x66d [ib_ipoib]
   [<ffffffff8038dfc8>] dev_hard_start_xmit+0x1ab/0x221
   [<ffffffff8039a024>] __qdisc_run+0xfa/0x1cd
   [<ffffffff8038e174>] dev_queue_xmit+0x136/0x263
   [<ffffffff80390209>] neigh_connected_output+0xae/0xc7
   [<ffffffff8813f4de>] ip6_output2+0x254/0x28c [ipv6]
   [<ffffffff8813fcf8>] ip6_output+0x7e2/0x7f8 [ipv6]
   [<ffffffff8814dbf7>] ndisc_send_ns+0x38f/0x4c1 [ipv6]
   [<ffffffff88145048>] addrconf_dad_timer+0xfb/0x11e [ipv6]
   [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
   [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
   [<ffffffff8020a94d>] call_softirq+0x1d/0x28
   [<ffffffff8022e0bc>] irq_exit+0x56/0x59
   [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
   [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70

the hard-irq-unsafe lock's dependencies:
-> (&skb_queue_lock_key){-+..} ops: 0 {
   initial-use at:
    [<ffffffff8024172a>] lock_acquire+0x4a/0x69
    [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
    [<ffffffff8038683d>] skb_queue_tail+0x1c/0x47
    [<ffffffff8039ff72>] netlink_broadcast+0x212/0x2ea
    [<ffffffff802e57c4>] kobject_uevent+0x3c9/0x43d
    [<ffffffff8034897f>] store_uevent+0x16/0x1e
    [<ffffffff803488ef>] class_device_attr_store+0x1b/0x1e
    [<ffffffff802b0c33>] sysfs_write_file+0xb7/0xe4
    [<ffffffff8027448e>] vfs_write+0xad/0x154
    [<ffffffff80274d8b>] sys_write+0x46/0x6f
    [<ffffffff802096a1>] system_call+0x7d/0x83
   in-softirq-W at:
    [<ffffffff8024172a>] lock_acquire+0x4a/0x69
    [<ffffffff803eb7da>] _spin_lock_irqsave+0x2b/0x3c
    [<ffffffff80386aca>] skb_dequeue+0x18/0x5c
    [<ffffffff80387015>] skb_queue_purge+0x18/0x26
    [<ffffffff80394261>] neigh_timer_handler+0x26f/0x346
    [<ffffffff80231a8e>] run_timer_softirq+0x150/0x1dc
    [<ffffffff8022e345>] __do_softirq+0x6b/0xf7
    [<ffffffff8020a94d>] call_softirq+0x1d/0x28
    [<ffffffff8022e0bc>] irq_exit+0x56/0x59
    [<ffffffff802146ee>] smp_apic_timer_interrupt+0x59/0x5f
    [<ffffffff8020a2c1>] apic_timer_interrupt+0x69/0x70
   hardirq-on-W at:
    [<ffffffff8024172a>] lock_acquire+0x4a/0x69
    [<ffffffff803eb349>] _spin_lock_bh+0x26/0x33
    [<ffffffff803c6de8>] udp_ioctl+0x46/0x87
    [<ffffffff803cd1ab>] inet_ioctl+0x8c/0x8f
    [<ffffffff8038203c>] sock_ioctl+0x1c0/0x1ea
    [<ffffffff8028591e>] do_ioctl+0x26/0x74
    [<ffffffff80285bb6>] vfs_ioctl+0x24a/0x264
    [<ffffffff80285c11>] sys_ioctl+0x41/0x68
    [<ffffffff802096a1>] system_call+0x7d/0x83
 }
 ... key at: [<ffffffff807ccac8>] skb_queue_lock_key+0x0/0x18

stack backtrace:

Call Trace:
 [<ffffffff8020ac99>] show_trace+0xaa/0x238
 [<ffffffff8020b037>] dump_stack+0x13/0x15
 [<ffffffff8023ff9e>] check_usage+0x282/0x293
 [<ffffffff802412b7>] __lock_acquire+0x85c/0xa29
 [<ffffffff8024172b>] lock_acquire+0x4b/0x69
 [<ffffffff803eb7db>] _spin_lock_irqsave+0x2c/0x3c
 [<ffffffff8038683e>] skb_queue_tail+0x1d/0x47
 [<ffffffff881efecc>] :ib_ipoib:ipoib_mcast_send+0x19a/0x413
 [<ffffffff881eda0f>] :ib_ipoib:ipoib_start_xmit+0x362/0x66d
 [<ffffffff8038dfc9>] dev_hard_start_xmit+0x1ac/0x221
 [<ffffffff8039a025>] __qdisc_run+0xfb/0x1cd
 [<ffffffff8038e175>] dev_queue_xmit+0x137/0x263
 [<ffffffff8039020a>] neigh_connected_output+0xaf/0xc7
 [<ffffffff8813f4df>] :ipv6:ip6_output2+0x255/0x28c
 [<ffffffff8813fcf9>] :ipv6:ip6_output+0x7e3/0x7f8
 [<ffffffff8814d737>] :ipv6:ndisc_send_rs+0x33e/0x46f
 [<ffffffff881427ba>] :ipv6:addrconf_dad_completed+0x91/0xe2
 [<ffffffff88144fc2>] :ipv6:addrconf_dad_timer+0x75/0x11e
 [<ffffffff80231a8f>] run_timer_softirq+0x151/0x1dc
 [<ffffffff8022e346>] __do_softirq+0x6c/0xf7
 [<ffffffff8020a94e>] call_softirq+0x1e/0x28
ib0: no IPv6 routers present

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-12  2:33 ` [openib-general] ipoib lockdep warning Roland Dreier
@ 2006-07-12  6:49 ` Arjan van de Ven
  2006-07-12 19:01 ` Roland Dreier
  0 siblings, 1 reply; 32+ messages in thread
From: Arjan van de Ven @ 2006-07-12 6:49 UTC (permalink / raw)
To: Roland Dreier
Cc: Zach Brown, openib-general, Linux Kernel Mailing List, Ingo Molnar

On Tue, 2006-07-11 at 19:33 -0700, Roland Dreier wrote:
> OK, the patch to lib/idr.c below is at least one way to fix this.
> However, with that applied I get the lockdep warning below, which
> seems to be a false positive -- I'm not sure what the right fix is,
> but the blame really seems to fall on udp_ioctl for poking into the
> sk_buff_head lock itself.

this does not have to be a false positive!
It is not legal to take ANY non-hardirq-safe lock after having taken a
lock that's used in hardirq context.
(having said that, the skb_queue_tail lock needs special treatment for
some real false positives; Linus merged that already)

> And here's the warning I get, which appears to be a false positive:
>
> ======================================================
> [ INFO: hard-safe -> hard-unsafe lock order detected ]
> ------------------------------------------------------
> swapper/0 [HC0[0]:SC1[2]:HE0:SE0] is trying to acquire:
>  (&skb_queue_lock_key){-+..}, at: [<ffffffff8038683e>] skb_queue_tail+0x1d/0x47
>
> and this task is already holding:
>  (&priv->lock){.+..}, at: [<ffffffff881efd5b>] ipoib_mcast_send+0x29/0x413 [ib_ipoib]
> which would create a new lock dependency:
>  (&priv->lock){.+..} -> (&skb_queue_lock_key){-+..}

Let's call priv->lock lock A, and the skb_queue lock lock B, to keep
this short:

CPU 0                              CPU 1
soft irq context                   user or softirq context

take lock B
 (anywhere in net stack)
                                   take lock A (ipoib_mcast_send)
interrupt happens
take lock A (spins)
 in ISR somewhere
                                   take lock B (spins)
                                   (in skb_queue_tail() called from
                                    ipoib_mcast_send)

Now this assumes your queue is shared with the networking stack, so
that the first "take lock B" on CPU 0 can happen without disabling
interrupts (as several of the skb functions used by the networking
stack do, but not the simple ones used by drivers for private queues).
If the queue used in ipoib_mcast_send() is a private queue, you
probably want to apply the patch below (already in/on its way to
mainline):

Index: linux-2.6.18-rc1/include/linux/skbuff.h
===================================================================
--- linux-2.6.18-rc1.orig/include/linux/skbuff.h
+++ linux-2.6.18-rc1/include/linux/skbuff.h
@@ -606,10 +606,17 @@ static inline __u32 skb_queue_len(const
 extern struct lock_class_key skb_queue_lock_key;
 
+/*
+ * This function creates a split out lock class for each invocation;
+ * this is needed for now since a whole lot of users of the skb-queue
+ * infrastructure in drivers have different locking usage (in hardirq)
+ * than the networking core (in softirq only). In the long run either the
+ * network layer or drivers should need annotation to consolidate the
+ * main types of usage into 3 classes.
+ */
 static inline void skb_queue_head_init(struct sk_buff_head *list)
 {
 	spin_lock_init(&list->lock);
-	lockdep_set_class(&list->lock, &skb_queue_lock_key);
 	list->prev = list->next = (struct sk_buff *)list;
 	list->qlen = 0;
 }

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [openib-general] ipoib lockdep warning
  2006-07-12  6:49 ` Arjan van de Ven
@ 2006-07-12 19:01 ` Roland Dreier
  0 siblings, 0 replies; 32+ messages in thread
From: Roland Dreier @ 2006-07-12 19:01 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Zach Brown, openib-general, Linux Kernel Mailing List, Ingo Molnar

 > this does not have to be a false positive!
 > It is not legal to take ANY non-hardirq safe lock after having taken a
 > lock that's used in hardirq context.
 > (having said that the skb_queue_tail lock needs a special treatment for
 > some real false positives; Linus merged that already)

...

 > Now this assumes your queue is shared with the networking stack,

Right, understood -- but ipoib has a private skb queue, which is why I
said it was a false positive.

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2006-07-17 15:57 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-07-11 20:10 ipoib lockdep warning Zach Brown
2006-07-11 21:16 ` Michael S. Tsirkin
2006-07-11 21:40 ` Sean Hefty
2006-07-11 21:50 ` [openib-general] " Zach Brown
2006-07-11 22:54 ` Roland Dreier
2006-07-11 23:27 ` Zach Brown
2006-07-11 23:43 ` Roland Dreier
2006-07-11 23:53 ` Zach Brown
2006-07-12  0:06 ` Roland Dreier
2006-07-12  9:38 ` Ingo Molnar
2006-07-12 11:09 ` Michael S. Tsirkin
2006-07-12 16:31 ` [openib-general] " Sean Hefty
2006-07-12 18:56 ` Roland Dreier
2006-07-13 15:55 ` [PATCH] IB/core: use correct gfp_mask in sa_query Michael S. Tsirkin
2006-07-12 19:06 ` [openib-general] ipoib lockdep warning Roland Dreier
2006-07-12 20:45 ` [PATCH] Convert idr's internal locking to _irqsave variant Roland Dreier
2006-07-12 21:14 ` Ingo Molnar
2006-07-13  1:30 ` Andrew Morton
2006-07-13 15:42 ` Roland Dreier
2006-07-13 20:54 ` Andrew Morton
2006-07-13 21:03 ` Roland Dreier
2006-07-13 21:05 ` Arjan van de Ven
2006-07-14  0:18 ` Roland Dreier
2006-07-14  6:20 ` Arjan van de Ven
2006-07-13 21:43 ` Andrew Morton
2006-07-14  1:08 ` Roland Dreier
2006-07-14  1:18 ` Andrew Morton
2006-07-14  1:30 ` Andrew Morton
2006-07-17 15:57 ` Roland Dreier
2006-07-12  2:33 ` [openib-general] ipoib lockdep warning Roland Dreier
2006-07-12  6:49 ` Arjan van de Ven
2006-07-12 19:01 ` Roland Dreier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox