From: Ming Lei <ming.lei@redhat.com>
To: Yufen Yu <yuyufen@huawei.com>
Cc: Hou Tao <houtao1@huawei.com>,
axboe@kernel.dk, linux-block@vger.kernel.org, hch@lst.de,
zhengchuan@huawei.com, yi.zhang@huawei.com, paulmck@kernel.org,
joel@joelfernandes.org, rcu@vger.kernel.org
Subject: Re: [PATCH] block: make sure last_lookup set as NULL after part deleted
Date: Mon, 6 Jan 2020 16:11:37 +0800 [thread overview]
Message-ID: <20200106081137.GA10487@ming.t460p> (raw)
In-Reply-To: <582f8e81-6127-47aa-f7fe-035251052238@huawei.com>
On Mon, Jan 06, 2020 at 03:39:07PM +0800, Yufen Yu wrote:
> Hi, Ming
>
> On 2020/1/3 23:16, Ming Lei wrote:
> > Hello Yufen,
> >
> > OK, we still can move clearing .last_lookup into __delete_partition(),
> > at that time all IO path can observe the partition percpu-refcount killed.
> >
> > Also the rcu work fn is run after one RCU grace period, at that time,
> > the NULL .last_lookup becomes visible in all IO path too.
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 089e890ab208..79599f5fd5b7 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1365,18 +1365,6 @@ void blk_account_io_start(struct request *rq, bool new_io)
> > part_stat_inc(part, merges[rw]);
> > } else {
> > part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
> > - if (!hd_struct_try_get(part)) {
> > - /*
> > - * The partition is already being removed,
> > - * the request will be accounted on the disk only
> > - *
> > - * We take a reference on disk->part0 although that
> > - * partition will never be deleted, so we can treat
> > - * it as any other partition.
> > - */
> > - part = &rq->rq_disk->part0;
> > - hd_struct_get(part);
> > - }
> > part_inc_in_flight(rq->q, part, rw);
> > rq->part = part;
> > }
> > diff --git a/block/genhd.c b/block/genhd.c
> > index ff6268970ddc..e3dec90b1f43 100644
> > --- a/block/genhd.c
> > +++ b/block/genhd.c
> > @@ -286,17 +286,21 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
> > ptbl = rcu_dereference(disk->part_tbl);
> > part = rcu_dereference(ptbl->last_lookup);
> > - if (part && sector_in_part(part, sector))
> > + if (part && sector_in_part(part, sector) && hd_struct_try_get(part))
> > return part;
> > for (i = 1; i < ptbl->len; i++) {
> > part = rcu_dereference(ptbl->part[i]);
> > if (part && sector_in_part(part, sector)) {
> > + if (!hd_struct_try_get(part))
> > + goto exit;
> > rcu_assign_pointer(ptbl->last_lookup, part);
> > return part;
> > }
> > }
> > + exit:
> > + hd_struct_get(&disk->part0);
> > return &disk->part0;
> > }
> > EXPORT_SYMBOL_GPL(disk_map_sector_rcu);
> > diff --git a/block/partition-generic.c b/block/partition-generic.c
> > index 1d20c9cf213f..1739f750dbf2 100644
> > --- a/block/partition-generic.c
> > +++ b/block/partition-generic.c
> > @@ -262,6 +262,12 @@ static void delete_partition_work_fn(struct work_struct *work)
> > void __delete_partition(struct percpu_ref *ref)
> > {
> > struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> > + struct disk_part_tbl *ptbl =
> > + rcu_dereference_protected(part->disk->part_tbl, 1);
> > +
> > + rcu_assign_pointer(ptbl->last_lookup, NULL);
> > + put_device(disk_to_dev(part->disk));
> > +
> > INIT_RCU_WORK(&part->rcu_work, delete_partition_work_fn);
> > queue_rcu_work(system_wq, &part->rcu_work);
> > }
> > @@ -283,8 +289,9 @@ void delete_partition(struct gendisk *disk, int partno)
> > if (!part)
> > return;
> > + get_device(disk_to_dev(disk));
> > rcu_assign_pointer(ptbl->part[partno], NULL);
> > - rcu_assign_pointer(ptbl->last_lookup, NULL);
> > +
> > kobject_put(part->holder_dir);
> > device_del(part_to_dev(part));
> > @@ -349,6 +356,7 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno,
> > p->nr_sects = len;
> > p->partno = partno;
> > p->policy = get_disk_ro(disk);
> > + p->disk = disk;
> > if (info) {
> > struct partition_meta_info *pinfo = alloc_part_info(disk);
> > diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> > index 8bb63027e4d6..66660ec5e8ee 100644
> > --- a/include/linux/genhd.h
> > +++ b/include/linux/genhd.h
> > @@ -129,6 +129,7 @@ struct hd_struct {
> > #else
> > struct disk_stats dkstats;
> > #endif
> > + struct gendisk *disk;
> > struct percpu_ref ref;
> > struct rcu_work rcu_work;
> > };
>
>
> IMO, this change can solve the problem. But, __delete_partition will
> depend on the implementation of disk_release(). If disk .release modify
> as blocked in the future, then __delete_partition will also be blocked,
> which is not expected in rcu callback function.
__delete_partition() won't be blocked because it just calls queue_rcu_work() to
release the partition instance in wq context.
>
> We may cache index of part[] instead of part[i] itself to fix the use-after-free bug.
> https://patchwork.kernel.org/patch/11318767/
That approach can fix the issue too, but extra overhead is added in the
fast path because partition retrieval is changed to the following way:
+ last_lookup = READ_ONCE(ptbl->last_lookup);
+ if (last_lookup > 0 && last_lookup < ptbl->len) {
+ part = rcu_dereference(ptbl->part[last_lookup]);
+ if (part && sector_in_part(part, sector))
+ return part;
+ }
from
part = rcu_dereference(ptbl->last_lookup);
So ptbl->part[] has to be fetched, it is fine if the ->part[] array
shares same cacheline with ptbl->last_lookup, but one disk may have
too many partitions, then your approach may introduce one extra cache
miss every time.
READ_ONCE() may imply one read barrier too.
Thanks,
Ming
next prev parent reply other threads:[~2020-01-06 8:11 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-31 11:09 [PATCH] block: make sure last_lookup set as NULL after part deleted Yufen Yu
2019-12-31 14:55 ` Hou Tao
2019-12-31 23:11 ` Paul E. McKenney
2020-01-01 2:33 ` htbegin
2020-01-01 3:39 ` htbegin
2020-01-03 23:45 ` Joel Fernandes
2020-01-04 9:16 ` Hou Tao
2020-01-02 1:23 ` Ming Lei
2020-01-03 3:06 ` Hou Tao
2020-01-03 4:18 ` Ming Lei
2020-01-03 7:35 ` Hou Tao
2020-01-03 8:17 ` Ming Lei
2020-01-03 12:03 ` Yufen Yu
2020-01-03 15:16 ` Ming Lei
2020-01-06 7:39 ` Yufen Yu
2020-01-06 8:11 ` Ming Lei [this message]
2020-01-06 9:41 ` Hou Tao
2020-01-06 10:05 ` Ming Lei
2020-01-07 11:40 ` Hou Tao
2020-01-08 3:19 ` Ming Lei
2020-01-03 12:43 ` Yufen Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200106081137.GA10487@ming.t460p \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=houtao1@huawei.com \
--cc=joel@joelfernandes.org \
--cc=linux-block@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=yi.zhang@huawei.com \
--cc=yuyufen@huawei.com \
--cc=zhengchuan@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.