From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
torvalds@osdl.org, Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.26-rc4: RIP __call_for_each_cic+0x20/0x50
Date: Wed, 28 May 2008 06:20:12 -0700 [thread overview]
Message-ID: <20080528132012.GF8255@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080528124423.GV25504@kernel.dk>
On Wed, May 28, 2008 at 02:44:24PM +0200, Jens Axboe wrote:
> On Wed, May 28 2008, Paul E. McKenney wrote:
> > On Wed, May 28, 2008 at 12:07:21PM +0200, Jens Axboe wrote:
> > > On Tue, May 27 2008, Paul E. McKenney wrote:
> > > > On Tue, May 27, 2008 at 03:35:10PM +0200, Jens Axboe wrote:
> > > > > On Tue, May 27 2008, Alexey Dobriyan wrote:
> > > > > > On Sat, May 10, 2008 at 02:37:19PM +0400, Alexey Dobriyan wrote:
> > > > > > > > > > > @@ -41,8 +41,8 @@ int put_io_context(struct io_context *ioc)
> > > > > > > > > > > rcu_read_lock();
> > > > > > > > > > > if (ioc->aic && ioc->aic->dtor)
> > > > > > > > > > > ioc->aic->dtor(ioc->aic);
> > > > > > > > > > > - rcu_read_unlock();
> > > > > > > > > > > cfq_dtor(ioc);
> > > > > > > > > > > + rcu_read_unlock();
> > > > > > > > > > >
> > > > > > > > > > > kmem_cache_free(iocontext_cachep, ioc);
> > > > > > > > > > > return 1;
> > > > > > > > > >
> > > > > > > > > > This helps in sense that 3 times bulk cross-compiles finish to the end.
> > > > > > > > > > You'll hear me if another such oops will resurface.
> > > > > > > > >
> > > > > > > > > Still looking good?
> > > > > > > >
> > > > > > > > Yup!
> > > > > > >
> > > > > > > And this with patch in mainline, again with PREEMPT_RCU.
> > > > > >
> > > > > > Ping, this happened again with 2.6.26-rc4 and PREEMPT_RCU.
> > > > >
> > > > > Worrisome... Paul, would you mind taking a quick look at cfq
> > > > > and see if you can detect why breaks with preempt rcu? It's
> > > > > clearly a use-after-free symptom, but I don't see how it can
> > > > > happen.
> > > >
> > > > Some quick and probably off-the-mark questions...
> > >
> > > Thanks!
> >
> > Glad it actually was of help! ;-)
>
> Your reviews are ALWAYS greatly appreciated!
:-)
> > > > o What is the purpose of __call_for_each_cic()? When called
> > > > from call_for_each_cic(), it is under rcu_read_lock(), as
> > > > required, but it is also called from cfq_free_io_context(),
> > > > which is assigned to the ->dtor and ->exit members of the
> > > > cfq_io_context struct. What protects calls through these
> > > > members?
> > > >
> > > > (This is for the ->cic_list field of the cfq_io_context structure.
> > > > One possibility is that the io_context's ->lock member is held,
> > > > but I don't see this. Not that I looked all that hard...)
> > > >
> > > > My suggestion would be to simply change all invocations of
> > > > __call_for_each_cic() to instead invoke call_for_each_cic().
> > > > The rcu_read_lock()/rcu_read_unlock() pair is pretty
> > > > lightweight, even in CONFIG_PREEMPT_RCU.
> > >
> > > __call_for_each_cic() is always called under rcu_read_lock(), it merely
> > > exists to avoid a double rcu_read_lock(). Even if it is cheap. The
> > > convention follows the usual __lock_is_already_held() double under
> > > score, but I guess it could do with a comment! There are only two
> > > callers of the function, call_for_each_cic() which does the
> > > rcu_read_lock(), and cfq_free_io_context() which is called from ->dtor
> > > (and holds the rcu_read_lock() and ->trim which actually does not. That
> > > looks like it could be problematic, but it's only called when an io
> > > scheduler module is removed so not really critical. I'll add it, though!
> > > Actually, the task_lock() should be enough there. So no bug, but (again)
> > > it could do with a comment.
> >
> > Sounds good!
> >
> > > > o When calling cfq_slab_kill(), for example from cfq_exit(),
> > > > what ensures that all previous RCU callbacks have completed?
> > > >
> > > > I suspect that you need an rcu_barrier() at the beginning
> > > > of cfq_slab_kill(), but I could be missing something.
> > >
> > > So we have two callers of that, one is from the error path at init time
> > > and is obviously ok. The other does need rcu_barrier()! I'll add that.
> >
> > OK, that does make my brain hurt less. ;-)
>
> So that one was also OK, as Fabio pointed out. If you follow the
> ioc_gone and user tracking, the:
>
> if (elv_ioc_count_read(ioc_count))
> wait_for_completion(ioc_gone);
>
> also has the side effect of waiting for RCU callbacks calling
> kmem_cache_free() to have finished as well.
I stand corrected.
> > > > o What protects the first rcu_dereference() in cfq_cic_lookup()?
> > > > There needs to be either an enclose rcu_read_lock() on the
> > > > one hand or the ->queue_lock needs to be held.
> > > >
> > > > (My guess is the latter, given the later rcu_assign_pointer()
> > > > in this same function, but I don't see a lock acquisition
> > > > in the immediate vicinity -- might be further up the function
> > > > call stack, though.)
> > >
> > > There's no locking going into that function when coming from
> > > cfq_get_io_context(), the other paths (happen) to hold the queue lock
> > > already though.
> >
> > So the call from cfq_get_io_context() needs an rcu_read_lock()?
> > Not seeing this in the patch below, but maybe you have it up a
> > function-call level or two?
>
> It's in there, it now does:
>
> rcu_read_lock();
> cic = rcu_dereference(ioc->ioc_data);
> if (cic && cic->key == cfqd) {
> rcu_read_unlock();
> return cic;
> }
> ...
>
> OK? Which is basically what remains of the patch now, except for the
> comment additions. Oh, and the ioc->lock protecting setting of
> ->ioc_data as well. New version below. Alexey, care to give this a
> spin? Seems your box is very well suited for finding RCU preempt
> problems :-)
OK, looks good.
Thanx, Paul
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 4df3f05..d01b411 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1142,6 +1142,9 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
> kmem_cache_free(cfq_pool, cfqq);
> }
>
> +/*
> + * Must always be called with the rcu_read_lock() held
> + */
> static void
> __call_for_each_cic(struct io_context *ioc,
> void (*func)(struct io_context *, struct cfq_io_context *))
> @@ -1197,6 +1200,11 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
> cfq_cic_free(cic);
> }
>
> +/*
> + * Must be called with rcu_read_lock() held or preemption otherwise disabled.
> + * Only two callers of this - ->dtor() which is called with the rcu_read_lock(),
> + * and ->trim() which is called with the task lock held
> + */
> static void cfq_free_io_context(struct io_context *ioc)
> {
> /*
> @@ -1502,20 +1510,24 @@ static struct cfq_io_context *
> cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
> {
> struct cfq_io_context *cic;
> + unsigned long flags;
> void *k;
>
> if (unlikely(!ioc))
> return NULL;
>
> + rcu_read_lock();
> +
> /*
> * we maintain a last-hit cache, to avoid browsing over the tree
> */
> cic = rcu_dereference(ioc->ioc_data);
> - if (cic && cic->key == cfqd)
> + if (cic && cic->key == cfqd) {
> + rcu_read_unlock();
> return cic;
> + }
>
> do {
> - rcu_read_lock();
> cic = radix_tree_lookup(&ioc->radix_root, (unsigned long) cfqd);
> rcu_read_unlock();
> if (!cic)
> @@ -1524,10 +1536,13 @@ cfq_cic_lookup(struct cfq_data *cfqd, struct io_context *ioc)
> k = cic->key;
> if (unlikely(!k)) {
> cfq_drop_dead_cic(cfqd, ioc, cic);
> + rcu_read_lock();
> continue;
> }
>
> + spin_lock_irqsave(&ioc->lock, flags);
> rcu_assign_pointer(ioc->ioc_data, cic);
> + spin_unlock_irqrestore(&ioc->lock, flags);
> break;
> } while (1);
>
> @@ -2134,6 +2149,10 @@ static void *cfq_init_queue(struct request_queue *q)
>
> static void cfq_slab_kill(void)
> {
> + /*
> + * Caller already ensured that pending RCU callbacks are completed,
> + * so we should have no busy allocations at this point.
> + */
> if (cfq_pool)
> kmem_cache_destroy(cfq_pool);
> if (cfq_ioc_pool)
> @@ -2292,6 +2311,11 @@ static void __exit cfq_exit(void)
> ioc_gone = &all_gone;
> /* ioc_gone's update must be visible before reading ioc_count */
> smp_wmb();
> +
> + /*
> + * this also protects us from entering cfq_slab_kill() with
> + * pending RCU callbacks
> + */
> if (elv_ioc_count_read(ioc_count))
> wait_for_completion(ioc_gone);
> cfq_slab_kill();
>
> --
> Jens Axboe
>
next prev parent reply other threads:[~2008-05-28 13:20 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-27 22:55 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50 Alexey Dobriyan
2008-04-28 12:01 ` Andrew Morton
2008-04-28 12:04 ` Jens Axboe
2008-04-28 19:55 ` Alexey Dobriyan
2008-04-29 6:21 ` Alexey Dobriyan
2008-04-29 9:06 ` Jens Axboe
2008-04-30 22:12 ` Alexey Dobriyan
2008-05-04 19:08 ` Jens Axboe
2008-05-04 20:15 ` Alexey Dobriyan
2008-05-04 19:25 ` Jens Axboe
2008-05-04 21:17 ` Alexey Dobriyan
2008-05-10 10:37 ` 2.6.25-$sha1: RIP __call_for_each_cic+0x20/0x50 Alexey Dobriyan
2008-05-27 5:27 ` 2.6.26-rc4: " Alexey Dobriyan
2008-05-27 13:35 ` Jens Axboe
2008-05-27 15:18 ` Paul E. McKenney
2008-05-28 10:07 ` Jens Axboe
2008-05-28 10:30 ` Paul E. McKenney
2008-05-28 12:44 ` Jens Axboe
2008-05-28 13:20 ` Paul E. McKenney [this message]
2008-05-29 4:38 ` Paul E. McKenney
2008-05-29 6:26 ` Jens Axboe
2008-05-29 6:42 ` Jens Axboe
2008-05-29 9:17 ` Paul E. McKenney
2008-05-29 10:13 ` Jens Axboe
2008-05-29 11:25 ` Paul E. McKenney
2008-05-29 11:44 ` Jens Axboe
2008-05-29 12:11 ` Paul E. McKenney
2008-05-29 12:13 ` Jens Axboe
2008-05-30 11:04 ` Paul E. McKenney
2008-05-30 13:16 ` Paul E. McKenney
2008-05-30 18:34 ` Alexey Dobriyan
2008-06-04 3:31 ` Paul E. McKenney
2008-06-04 18:32 ` Linus Torvalds
2008-06-05 4:23 ` Paul E. McKenney
2008-06-06 14:49 ` Paul E. McKenney
2008-05-28 11:52 ` Fabio Checconi
2008-05-28 11:58 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080528132012.GF8255@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.