From: Jens Axboe <jens.axboe@oracle.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
torvalds@osdl.org, Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
Date: Wed, 28 May 2008 14:44:24 +0200 [thread overview]
Message-ID: <20080528124423.GV25504@kernel.dk> (raw)
In-Reply-To: <20080528103026.GE8255@linux.vnet.ibm.com>
On Wed, May 28 2008, Paul E. McKenney wrote:
> On Wed, May 28, 2008 at 12:07:21PM +0200, Jens Axboe wrote:
> > On Tue, May 27 2008, Paul E. McKenney wrote:
> > > On Tue, May 27, 2008 at 03:35:10PM +0200, Jens Axboe wrote:
> > > > On Tue, May 27 2008, Alexey Dobriyan wrote:
> > > > > On Sat, May 10, 2008 at 02:37:19PM +0400, Alexey Dobriyan wrote:
> > > > > > > > > > @@ -41,8 +41,8 @@ int put_io_context(struct io_context *ioc)
> > > > > > > > > > rcu_read_lock();
> > > > > > > > > > if (ioc->aic && ioc->aic->dtor)
> > > > > > > > > > ioc->aic->dtor(ioc->aic);
> > > > > > > > > > - rcu_read_unlock();
> > > > > > > > > > cfq_dtor(ioc);
> > > > > > > > > > + rcu_read_unlock();
> > > > > > > > > >
> > > > > > > > > > kmem_cache_free(iocontext_cachep, ioc);
> > > > > > > > > > return 1;
> > > > > > > > >
> > > > > > > > > This helps in sense that 3 times bulk cross-compiles finish to the end.
> > > > > > > > > You'll hear me if another such oops will resurface.
> > > > > > > >
> > > > > > > > Still looking good?
> > > > > > >
> > > > > > > Yup!
> > > > > >
> > > > > > And this with patch in mainline, again with PREEMPT_RCU.
> > > > >
> > > > > Ping, this happened again with 2.6.26-rc4 and PREEMPT_RCU.
> > > >
> > > > Worrisome... Paul, would you mind taking a quick look at cfq
> > > > and see if you can detect why breaks with preempt rcu? It's
> > > > clearly a use-after-free symptom, but I don't see how it can
> > > > happen.
> > >
> > > Some quick and probably off-the-mark questions...
> >
> > Thanks!
>
> Glad it actually was of help! ;-)
Your reviews are ALWAYS greatly appreciated!
> > > o What is the purpose of __call_for_each_cic()? When called
> > > from call_for_each_cic(), it is under rcu_read_lock(), as
> > > required, but it is also called from cfq_free_io_context(),
> > > which is assigned to the ->dtor and ->exit members of the
> > > cfq_io_context struct. What protects calls through these
> > > members?
> > >
> > > (This is for the ->cic_list field of the cfq_io_context structure.
> > > One possibility is that the io_context's ->lock member is held,
> > > but I don't see this. Not that I looked all that hard...)
> > >
> > > My suggestion would be to simply change all invocations of
> > > __call_for_each_cic() to instead invoke call_for_each_cic().
> > > The rcu_read_lock()/rcu_read_unlock() pair is pretty
> > > lightweight, even in CONFIG_PREEMPT_RCU.
> >
> > __call_for_each_cic() is always called under rcu_read_lock(), it merely
> > exists to avoid a double rcu_read_lock(). Even if it is cheap. The
> > convention follows the usual __lock_is_already_held() double under
> > score, but I guess it could do with a comment! There are only two
> > callers of the function, call_for_each_cic() which does the
> > rcu_read_lock(), and cfq_free_io_context() which is called from ->dtor
> > (and holds the rcu_read_lock() and ->trim which actually does not. That
> > looks like it could be problematic, but it's only called when an io
> > scheduler module is removed so not really critical. I'll add it, though!
> > Actually, the task_lock() should be enough there. So no bug, but (again)
> > it could do with a comment.
>
> Sounds good!
>
> > > o When calling cfq_slab_kill(), for example from cfq_exit(),
> > > what ensures that all previous RCU callbacks have completed?
> > >
> > > I suspect that you need an rcu_barrier() at the beginning
> > > of cfq_slab_kill(), but I could be missing something.
> >
> > So we have two callers of that, one is from the error path at init time
> > and is obviously ok. The other does need rcu_barrier()! I'll add that.
>
> OK, that does make my brain hurt less. ;-)
So that one was also OK, as Fabio pointed out. If you follow the
ioc_gone and user tracking, the:
if (elv_ioc_count_read(ioc_count))
wait_for_completion(ioc_gone);
also has the side effect of waiting for RCU callbacks calling
kmem_cache_free() to have finished as well.
> > > o What protects the first rcu_dereference() in cfq_cic_lookup()?
> > > There needs to be either an enclose rcu_read_lock() on the
> > > one hand or the ->queue_lock needs to be held.
> > >
> > > (My guess is the latter, given the later rcu_assign_pointer()
> > > in this same function, but I don't see a lock acquisition
> > > in the immediate vicinity -- might be further up the function
> > > call stack, though.)
> >
> > There's no locking going into that function when coming from
> > cfq_get_io_context(), the other paths (happen) to hold the queue lock
> > already though.
>
> So the call from cfq_get_io_context() needs an rcu_read_lock()?
> Not seeing this in the patch below, but maybe you have it up a
> function-call level or two?
It's in there, it now does:
rcu_read_lock();
cic = rcu_dereference(ioc->ioc_data);
if (cic && cic->key == cfqd) {
rcu_read_unlock();
return cic;
}
...
OK? Which is basically what remains of the patch now, except for the
comment additions. Oh, and the ioc->lock protecting setting of
->ioc_data as well. New version below. Alexey, care to give this a
spin? Seems your box is very well suited for finding RCU preempt
problems :-)
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4df3f05..d01b411 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1142,6 +1142,9 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
kmem_cache_free(cfq_pool, cfqq);
}
+/*
+ * Must always be called with the rcu_read_lock() held
+ */
static void
__call_for_each_cic(struct io_context *ioc,
void (*func)(struct io_context *, struct cfq_io_context *))
@@ -1197,6 +1200,11 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
cfq_cic_free(cic);
}
+/*
+ * Must be called with rcu_read_lock() held or preemption otherwise disabled.
+ * Only two callers of this - ->dtor() which is called with the rcu_read_lock(),
+ * and ->trim() which is called with the task lock held
+ */
static void cfq_free_io_context(struct io_context *ioc)
{
/*
@@ -1502,20 +1510,24 @@ static struct cfq_io_context *
cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
{
struct cfq_io_context *cic;
+ unsigned long flags;
void *k;
if (unlikely(!ioc))
return NULL;
+ rcu_read_lock();
+
/*
* we maintain a last-hit cache, to avoid browsing over the tree
*/
cic = rcu_dereference(ioc->ioc_data);
- if (cic && cic->key == cfqd)
+ if (cic && cic->key == cfqd) {
+ rcu_read_unlock();
return cic;
+ }
do {
- rcu_read_lock();
cic = radix_tree_lookup(&ioc->radix_root, (unsigned long) cfqd);
rcu_read_unlock();
if (!cic)
@@ -1524,10 +1536,13 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
k = cic->key;
if (unlikely(!k)) {
cfq_drop_dead_cic(cfqd, ioc, cic);
+ rcu_read_lock();
continue;
}
+ spin_lock_irqsave(&ioc->lock, flags);
rcu_assign_pointer(ioc->ioc_data, cic);
+ spin_unlock_irqrestore(&ioc->lock, flags);
break;
} while (1);
@@ -2134,6 +2149,10 @@ static void *cfq_init_queue(struct request_queue *q)
static void cfq_slab_kill(void)
{
+ /*
+ * Caller already ensured that pending RCU callbacks are completed,
+ * so we should have no busy allocations at this point.
+ */
if (cfq_pool)
kmem_cache_destroy(cfq_pool);
if (cfq_ioc_pool)
@@ -2292,6 +2311,11 @@ static void __exit cfq_exit(void)
ioc_gone = &all_gone;
/* ioc_gone's update must be visible before reading ioc_count */
smp_wmb();
+
+ /*
+ * this also protects us from entering cfq_slab_kill() with
+ * pending RCU callbacks
+ */
if (elv_ioc_count_read(ioc_count))
wait_for_completion(ioc_gone);
cfq_slab_kill();
--
Jens Axboe
next prev parent reply other threads:[~2008-05-28 12:44 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-27 22:55 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50 Alexey Dobriyan
2008-04-28 12:01 ` Andrew Morton
2008-04-28 12:04 ` Jens Axboe
2008-04-28 19:55 ` Alexey Dobriyan
2008-04-29 6:21 ` Alexey Dobriyan
2008-04-29 9:06 ` Jens Axboe
2008-04-30 22:12 ` Alexey Dobriyan
2008-05-04 19:08 ` Jens Axboe
2008-05-04 20:15 ` Alexey Dobriyan
2008-05-04 19:25 ` Jens Axboe
2008-05-04 21:17 ` Alexey Dobriyan
2008-05-10 10:37 ` 2.6.25-$sha1: RIP __call_for_each_cic+0x20/0x50 Alexey Dobriyan
2008-05-27 5:27 ` 2.6.26-rc4: " Alexey Dobriyan
2008-05-27 13:35 ` Jens Axboe
2008-05-27 15:18 ` Paul E. McKenney
2008-05-28 10:07 ` Jens Axboe
2008-05-28 10:30 ` Paul E. McKenney
2008-05-28 12:44 ` Jens Axboe [this message]
2008-05-28 13:20 ` Paul E. McKenney
2008-05-29 4:38 ` Paul E. McKenney
2008-05-29 6:26 ` Jens Axboe
2008-05-29 6:42 ` Jens Axboe
2008-05-29 9:17 ` Paul E. McKenney
2008-05-29 10:13 ` Jens Axboe
2008-05-29 11:25 ` Paul E. McKenney
2008-05-29 11:44 ` Jens Axboe
2008-05-29 12:11 ` Paul E. McKenney
2008-05-29 12:13 ` Jens Axboe
2008-05-30 11:04 ` Paul E. McKenney
2008-05-30 13:16 ` Paul E. McKenney
2008-05-30 18:34 ` Alexey Dobriyan
2008-06-04 3:31 ` Paul E. McKenney
2008-06-04 18:32 ` Linus Torvalds
2008-06-05 4:23 ` Paul E. McKenney
2008-06-06 14:49 ` Paul E. McKenney
2008-05-28 11:52 ` Fabio Checconi
2008-05-28 11:58 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080528124423.GV25504@kernel.dk \
--to=jens.axboe@oracle.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox