public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: Paul Bolle <pebolle@tiscali.nl>,
	"paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: Mysterious CFQ crash and RCU
Date: Mon, 6 Jun 2011 10:28:04 -0400	[thread overview]
Message-ID: <20110606142804.GA31757@redhat.com> (raw)
In-Reply-To: <4DEB28A1.5090109@fusionio.com>

On Sun, Jun 05, 2011 at 08:56:33AM +0200, Jens Axboe wrote:
> On 2011-06-05 00:48, Paul Bolle wrote:
> > I think I finally found it!
> > 
> > The culprit seems to be io_context.ioc_data (not the most clear of
> > names!). It seems to be a single entry "last-hit cache" of an hlist
> > called cic_list. (There are three, subtly different, cic_lists in the
> > CFQ code!) It is not entirely clear, but that last-hit cache can get out
> > of sync with the hlist it is supposed to cache. My guess it that every
> > now and then a member of the hlist gets deleted while it's still in that
> > (single entry) cache. If it then gets retrieved from that cache it
> > already points to poisoned memory. For some strange reason this only
> > results in an Oops if one or more debugging options are set (as are set
> > in the Fedora Rawhide non-stable kernels that I ran into this). I have
> > no clue whatsoever, why that is ...
> > 
> > Anyhow, after ripping out ioc_data this bug seems to have disappeared!
> > Jens, Vivek, could you please have a look at this? In the mean time I
> > hope to pinpoint this issue and draft a small patch to really solve it
> > (ie, not by simply ripping out ioc_data).
> 
> Does this fix it? It will introduce a hierarchy that is queue -> ioc
> lock, but as far as I can remember (and tell from a quick look), we
> don't have any dependencies on that order of locking at this moment. So
> should be OK.

Excellent debugging PaulB. This bug was troubling us for a very long
time now.

While we are looking at races, I am wondering if we still have more races
in process exit vs queue exit path. That's a different thing that it is
not easy to hit the races.

So process exit path we do following.

cfq_exit_single_io_context {
	cfqd = cic_to_cfqd();
	q = cfqd->queue; <----- Racy
	spin_lock_irqsave(q->queue_lock, flags); <--- Racy
}

And in queue exit path we do following.

cfq_exit_queue()
{
	spin_lock_irq(q->queue_lock);
		__cfq_exit_single_io_context {
			cic->key = cfqd_dead_key(cfqd);	
		}
	spin_unlock_irq(q->queue_lock);
	kfree(cfqd);	
}

In process exit path, we retrieve cfqd without any locks (Except rcu read
lock), and it might race with queue exit path. Resulting in the fact that
by the time we fetch queue pointer from cfqd, cfqd might have already been
freed.

I think issue here seems to be that we fetch cfqd = cic->key under rcu
but free up cfqd without waiting for rcu grace period.

If we introduce an unconditional synchronize_rcu() in cfq_exit_queue()
then it takes a long time for drivers which create lots of queues
and tear them down during boot.

If we use call_rcu() to free cfqd(), then there is no gurantee that
request queue object is still around. (cfqd->queue).

Even if we take request queue reference to keep request queue object,
then there is no gurantee that queue->lock is around as driver might
have freed it up.

Not sure how to fix this cleanly. Just thought of raising the concern
though while we are at it.

Thanks
Vivek

  parent reply	other threads:[~2011-06-06 14:28 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-19 22:24 Mysterious CFQ crash and RCU Vivek Goyal
2011-05-21 21:00 ` Paul E. McKenney
2011-05-21 22:23   ` Paul Bolle
2011-05-21 23:54     ` Paul E. McKenney
2011-05-22 19:30       ` Paul Bolle
2011-05-22 20:13         ` Paul E. McKenney
2011-05-23 15:21   ` Vivek Goyal
2011-05-23 15:38     ` Paul E. McKenney
2011-05-23 22:20       ` Paul Bolle
2011-05-24  4:14         ` Paul E. McKenney
2011-05-24  9:41         ` Jens Axboe
2011-05-24 14:35           ` Paul E. McKenney
2011-05-24 14:51             ` Jens Axboe
2011-05-24 15:42               ` Paul E. McKenney
2011-05-24 15:51                 ` Paul E. McKenney
2011-05-25  8:28           ` Paul Bolle
2011-05-25  8:46             ` Jens Axboe
2011-05-25  9:13               ` Paul Bolle
2011-05-25  9:30                 ` Jens Axboe
2011-05-25  9:40                   ` Paul Bolle
2011-05-25 12:48               ` Paul Bolle
2011-05-25 12:51                 ` Jens Axboe
2011-05-25 17:28               ` Paul Bolle
2011-05-25 18:59                 ` Jens Axboe
2011-05-25 10:17       ` Paul Bolle
2011-05-25 15:33         ` Paul E. McKenney
2011-05-25 17:44           ` Paul Bolle
2011-05-25 20:40             ` Paul E. McKenney
2011-05-26  9:15       ` Paul Bolle
2011-06-03  5:07         ` Paul E. McKenney
2011-06-03 13:45           ` Vivek Goyal
2011-06-03 15:33             ` Paul E. McKenney
2011-06-03 16:54               ` Paul E. McKenney
2011-06-04 12:22             ` Paul Bolle
2011-06-04 12:50           ` Paul Bolle
2011-06-04 16:03             ` Paul E. McKenney
2011-06-04 22:48               ` Paul Bolle
2011-06-04 23:06                 ` Paul E. McKenney
2011-08-04 15:05                   ` Vivek Goyal
2011-08-04 19:43                     ` Jens Axboe
2011-08-04 19:51                       ` Vivek Goyal
2011-06-05  6:56                 ` Jens Axboe
2011-06-05  8:39                   ` Paul Bolle
2011-06-05 10:38                   ` Paul Bolle
2011-06-05 22:51                     ` Jens Axboe
2011-06-06 14:28                   ` Vivek Goyal [this message]
2011-05-23 15:36   ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110606142804.GA31757@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pebolle@tiscali.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox