From: Vivek Goyal <vgoyal@redhat.com>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: Jerome Marchand <jmarchan@redhat.com>,
	Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] block: fix accounting bug on cross partition merges
Date: Fri, 17 Dec 2010 17:32:03 -0500
Message-ID: <20101217223203.GN14502@redhat.com>
In-Reply-To: <4D0BB4A1.8080305@fusionio.com>

On Fri, Dec 17, 2010 at 08:06:09PM +0100, Jens Axboe wrote:
> On 2010-12-17 14:42, Jerome Marchand wrote:
> > 
> > /proc/diskstats would display a strange output as follows.
> 
> [snip]
> 
> This looks a lot better! One comment:
> 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 4ce953f..064921d 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -64,13 +64,16 @@ static void drive_stat_acct(struct request *rq, int new_io)
> >  		return;
> >  
> >  	cpu = part_stat_lock();
> > -	part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> >  
> > -	if (!new_io)
> > +	if (!new_io) {
> > +		part = rq->part;
> >  		part_stat_inc(cpu, part, merges[rw]);
> > -	else {
> > +	} else {
> > +		part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> >  		part_round_stats(cpu, part);
> >  		part_inc_in_flight(part, rw);
> > +		kref_get(&part->ref);
> > +		rq->part = part;
> >  	}
> >  
> >  	part_stat_unlock();
> 
> I don't think this is completely safe. The rcu lock is held due to the
> part_stat_lock(), but that only prevents the __delete_partition()
> callback from happening. Let's say you have this:
> 
> CPU0                                         CPU1
> part = disk_map_sector_rcu()
>                                              kref_put(part); <- now 0
> part_stat_unlock()
>                                              __delete_partition();
>                                              ...
>                                              delete_partition_rcu_cb();
> merge, or endio, boom
> 
> Now rq has ->part pointing to freed memory, later merges or end
> accounting will touch freed memory.
> 
> I think we can fix this by just having delete_partition_rcu_cb() check
> the reference count and return if non-zero. Since someone holds a
> reference to the table, they will drop it and we'll re-schedule the rcu
> callback.

This is interesting: using RCU together with a kref. So even if somebody
has done a kref_put() that dropped the last reference, as long as the RCU
grace period is not over, somebody else can still take a reference and bring
the count back to 1. The partition will then not be freed, because
delete_partition_rcu_cb() will find the count non-zero.
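
To make the interleaving concrete, here is a rough userspace sketch of the
idea, assuming C11 atomics as a stand-in for the kernel's atomic_t. The names
kref_model, part_model, delete_cb_model() and replay_scenario() are made up
for illustration and are not kernel APIs. It only replays the ordering
sequentially and does not model RCU itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref. */
struct kref_model {
	atomic_int refcount;
};

/* Hypothetical stand-in for the partition object. */
struct part_model {
	struct kref_model ref;
	bool freed;
};

static void kref_model_init(struct kref_model *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_model_get(struct kref_model *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns true when this put dropped the count to zero. */
static bool kref_model_put(struct kref_model *k)
{
	int old = atomic_fetch_sub(&k->refcount, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Models the proposed check in the RCU callback: back off if somebody
 * took a reference again before the grace period ended.
 */
static bool delete_cb_model(struct part_model *p)
{
	if (atomic_load(&p->ref.refcount) != 0)
		return false;	/* the later put schedules us again */
	p->freed = true;
	return true;
}

/* Replays the CPU0/CPU1 ordering from the quoted mail, one step at a time. */
static bool replay_scenario(void)
{
	struct part_model p = { .freed = false };
	bool cb_scheduled, freed_early, rescheduled, freed_late;

	kref_model_init(&p.ref);

	/* CPU1: last reference dropped, callback gets scheduled. */
	cb_scheduled = kref_model_put(&p.ref);
	/* CPU0: still in the read-side section, takes a reference for rq->part. */
	kref_model_get(&p.ref);
	/* Grace period ends: callback sees a non-zero count and returns. */
	freed_early = delete_cb_model(&p);
	/* CPU0: request completes, reference dropped, callback rescheduled. */
	rescheduled = kref_model_put(&p.ref);
	freed_late = rescheduled && delete_cb_model(&p);

	return cb_scheduled && !freed_early && freed_late && p.freed;
}
```

In this model replay_scenario() returns true: the first callback run backs
off, and the object is only freed after the second put.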

I guess the read will have to be an atomic_read(), and since struct kref is
opaque, one might have to introduce kref_read() or something like that, and
possibly update Documentation/kref.txt for this usage with RCU. I would
also recommend getting it reviewed by Paul McKenney to make sure this
usage of RCU is fine.
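
Something along these lines is what I have in mind for the accessor. This is
only a userspace sketch using C11 atomics; kref_model and kref_model_read()
are hypothetical names, and the relaxed load is my assumption of the closest
match for a plain atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct kref. */
struct kref_model {
	atomic_int refcount;
};

/*
 * Hypothetical accessor: callers read the count through this helper
 * instead of reaching into the structure. The load is relaxed, so it
 * gives a snapshot only and no ordering against other accesses.
 */
static int kref_model_read(struct kref_model *k)
{
	return atomic_load_explicit(&k->refcount, memory_order_relaxed);
}

/* Small self-check: read the count before and after dropping it. */
static int kref_model_read_demo(void)
{
	struct kref_model k;

	atomic_init(&k.refcount, 1);
	assert(kref_model_read(&k) == 1);

	atomic_fetch_sub(&k.refcount, 1);
	return kref_model_read(&k);
}
```

The callback would then test the value returned by the accessor against zero
rather than touching the counter directly.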

Thanks
Vivek
