* [PATCH 0/3] Avoid that blk_{un,}register_queue() trigger lock inversion
@ 2018-01-16 18:17 Bart Van Assche
2018-01-16 18:17 ` [PATCH 1/3] block: Unexport elv_register_queue() and elv_unregister_queue() Bart Van Assche
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Bart Van Assche @ 2018-01-16 18:17 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Mike Snitzer, Bart Van Assche
Hello Jens,
The three patches in this series are what I came up with after having
analyzed a lockdep complaint triggered by blk_unregister_queue. Please
consider these patches for kernel v4.16.
Thanks,
Bart.
Bart Van Assche (3):
block: Unexport elv_register_queue() and elv_unregister_queue()
block: Document scheduler change code locking requirements
block: Protect less code with sysfs_lock in blk_{un,}register_queue()
block/blk-sysfs.c | 13 ++++---------
block/blk.h | 3 +++
block/elevator.c | 10 ++++++++--
include/linux/elevator.h | 2 --
4 files changed, 15 insertions(+), 13 deletions(-)
--
2.15.1
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/3] block: Unexport elv_register_queue() and elv_unregister_queue() 2018-01-16 18:17 [PATCH 0/3] Avoid that blk_{un,}register_queue() trigger lock inversion Bart Van Assche @ 2018-01-16 18:17 ` Bart Van Assche 2018-01-16 18:17 ` [PATCH 2/3] block: Document scheduler change code locking requirements Bart Van Assche 2018-01-16 18:17 ` [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() Bart Van Assche 2 siblings, 0 replies; 10+ messages in thread From: Bart Van Assche @ 2018-01-16 18:17 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Mike Snitzer, Bart Van Assche These two functions are only called from inside the block layer so unexport them. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> --- block/blk.h | 3 +++ block/elevator.c | 2 -- include/linux/elevator.h | 2 -- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/block/blk.h b/block/blk.h index 662005e46560..46db5dc83dcb 100644 --- a/block/blk.h +++ b/block/blk.h @@ -162,6 +162,9 @@ static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq e->type->ops.sq.elevator_deactivate_req_fn(q, rq); } +int elv_register_queue(struct request_queue *q); +void elv_unregister_queue(struct request_queue *q); + struct hd_struct *__disk_get_part(struct gendisk *disk, int partno); #ifdef CONFIG_FAIL_IO_TIMEOUT diff --git a/block/elevator.c b/block/elevator.c index 138faeb08a7c..4f00b53cd5fd 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -886,7 +886,6 @@ int elv_register_queue(struct request_queue *q) } return error; } -EXPORT_SYMBOL(elv_register_queue); void elv_unregister_queue(struct request_queue *q) { @@ -900,7 +899,6 @@ void elv_unregister_queue(struct request_queue *q) wbt_enable_default(q); } } -EXPORT_SYMBOL(elv_unregister_queue); int elv_register(struct elevator_type *e) { diff --git a/include/linux/elevator.h b/include/linux/elevator.h index 3d794b3dc532..6d9e230dffd2 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -198,8 +198,6 @@ extern bool elv_attempt_insert_merge(struct request_queue *, struct request *); extern void elv_requeue_request(struct request_queue *, struct request *); extern struct request *elv_former_request(struct request_queue *, struct request *); extern struct request *elv_latter_request(struct request_queue *, struct request *); -extern int elv_register_queue(struct request_queue *q); -extern void elv_unregister_queue(struct request_queue *q); extern int elv_may_queue(struct request_queue *, unsigned int); extern void elv_completed_request(struct request_queue *, struct request *); extern int elv_set_request(struct request_queue *q, struct request *rq, -- 2.15.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] block: Document scheduler change code locking requirements 2018-01-16 18:17 [PATCH 0/3] Avoid that blk_{un,}register_queue() trigger lock inversion Bart Van Assche 2018-01-16 18:17 ` [PATCH 1/3] block: Unexport elv_register_queue() and elv_unregister_queue() Bart Van Assche @ 2018-01-16 18:17 ` Bart Van Assche 2018-01-16 18:17 ` [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() Bart Van Assche 2 siblings, 0 replies; 10+ messages in thread From: Bart Van Assche @ 2018-01-16 18:17 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Mike Snitzer, Bart Van Assche This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> --- block/elevator.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/block/elevator.c b/block/elevator.c index 4f00b53cd5fd..e87e9b43aba0 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -869,6 +869,8 @@ int elv_register_queue(struct request_queue *q) struct elevator_queue *e = q->elevator; int error; + lockdep_assert_held(&q->sysfs_lock); + error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched"); if (!error) { struct elv_fs_entry *attr = e->type->elevator_attrs; @@ -889,6 +891,8 @@ int elv_register_queue(struct request_queue *q) void elv_unregister_queue(struct request_queue *q) { + lockdep_assert_held(&q->sysfs_lock); + if (q) { struct elevator_queue *e = q->elevator; @@ -965,6 +969,8 @@ static int elevator_switch_mq(struct request_queue *q, { int ret; + lockdep_assert_held(&q->sysfs_lock); + blk_mq_freeze_queue(q); blk_mq_quiesce_queue(q); @@ -1010,6 +1016,8 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) bool old_registered = false; int err; + lockdep_assert_held(&q->sysfs_lock); + if (q->mq_ops) return elevator_switch_mq(q, new_e); -- 2.15.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-16 18:17 [PATCH 0/3] Avoid that blk_{un,}register_queue() trigger lock inversion Bart Van Assche 2018-01-16 18:17 ` [PATCH 1/3] block: Unexport elv_register_queue() and elv_unregister_queue() Bart Van Assche 2018-01-16 18:17 ` [PATCH 2/3] block: Document scheduler change code locking requirements Bart Van Assche @ 2018-01-16 18:17 ` Bart Van Assche 2018-01-16 22:32 ` Mike Snitzer 2 siblings, 1 reply; 10+ messages in thread From: Bart Van Assche @ 2018-01-16 18:17 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Mike Snitzer, Bart Van Assche The __blk_mq_register_dev(), blk_mq_unregister_dev(), elv_register_queue() and elv_unregister_queue() calls need to be protected with sysfs_lock but other code in these functions not. Hence protect only this code with sysfs_lock. This patch fixes a locking inversion issue in blk_unregister_queue() and also in an error path of blk_register_queue(): it is not allowed to hold sysfs_lock around the kobject_del(&q->kobj) call. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> --- block/blk-sysfs.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 4a6a40ffd78e..e9ce45ff0ef2 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -909,11 +909,12 @@ int blk_register_queue(struct gendisk *disk) if (q->request_fn || (q->mq_ops && q->elevator)) { ret = elv_register_queue(q); if (ret) { + mutex_unlock(&q->sysfs_lock); kobject_uevent(&q->kobj, KOBJ_REMOVE); kobject_del(&q->kobj); blk_trace_remove_sysfs(dev); kobject_put(&dev->kobj); - goto unlock; + return ret; } } ret = 0; @@ -934,28 +935,22 @@ void blk_unregister_queue(struct gendisk *disk) if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) return; - /* - * Protect against the 'queue' kobj being accessed - * while/after it is removed. - */ - mutex_lock(&q->sysfs_lock); - spin_lock_irq(q->queue_lock); queue_flag_clear(QUEUE_FLAG_REGISTERED, q); spin_unlock_irq(q->queue_lock); wbt_exit(q); + mutex_lock(&q->sysfs_lock); if (q->mq_ops) blk_mq_unregister_dev(disk_to_dev(disk), q); if (q->request_fn || (q->mq_ops && q->elevator)) elv_unregister_queue(q); + mutex_unlock(&q->sysfs_lock); kobject_uevent(&q->kobj, KOBJ_REMOVE); kobject_del(&q->kobj); blk_trace_remove_sysfs(disk_to_dev(disk)); kobject_put(&disk_to_dev(disk)->kobj); - - mutex_unlock(&q->sysfs_lock); } -- 2.15.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-16 18:17 ` [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() Bart Van Assche @ 2018-01-16 22:32 ` Mike Snitzer 2018-01-17 0:03 ` Bart Van Assche 0 siblings, 1 reply; 10+ messages in thread From: Mike Snitzer @ 2018-01-16 22:32 UTC (permalink / raw) To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig On Tue, Jan 16 2018 at 1:17pm -0500, Bart Van Assche <bart.vanassche@wdc.com> wrote: > The __blk_mq_register_dev(), blk_mq_unregister_dev(), > elv_register_queue() and elv_unregister_queue() calls need to be > protected with sysfs_lock but other code in these functions not. > Hence protect only this code with sysfs_lock. This patch fixes a > locking inversion issue in blk_unregister_queue() and also in an > error path of blk_register_queue(): it is not allowed to hold > sysfs_lock around the kobject_del(&q->kobj) call. > > Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> > --- > block/blk-sysfs.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 4a6a40ffd78e..e9ce45ff0ef2 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -909,11 +909,12 @@ int blk_register_queue(struct gendisk *disk) > if (q->request_fn || (q->mq_ops && q->elevator)) { > ret = elv_register_queue(q); > if (ret) { > + mutex_unlock(&q->sysfs_lock); > kobject_uevent(&q->kobj, KOBJ_REMOVE); > kobject_del(&q->kobj); > blk_trace_remove_sysfs(dev); > kobject_put(&dev->kobj); > - goto unlock; > + return ret; > } > } > ret = 0; > @@ -934,28 +935,22 @@ void blk_unregister_queue(struct gendisk *disk) > if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) > return; > > - /* > - * Protect against the 'queue' kobj being accessed > - * while/after it is removed. > - */ > - mutex_lock(&q->sysfs_lock); > - > spin_lock_irq(q->queue_lock); > queue_flag_clear(QUEUE_FLAG_REGISTERED, q); > spin_unlock_irq(q->queue_lock); > > wbt_exit(q); > > + mutex_lock(&q->sysfs_lock); > if (q->mq_ops) > blk_mq_unregister_dev(disk_to_dev(disk), q); > > if (q->request_fn || (q->mq_ops && q->elevator)) > elv_unregister_queue(q); My concern with this change is detailed in the following portion of the header for commit 667257e8b2988c0183ba23e2bcd6900e87961606: 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED in case elv_iosched_store() loses the race with blk_unregister_queue(), it needs a way to know the 'queue' kobj isn't there. I don't think moving mutex_lock(&q->sysfs_lock); after the clearing of QUEUE_FLAG_REGISTERED is a step in the right direction. Current code shows: blk_cleanup_queue() calls blk_set_queue_dying() while holding the sysfs_lock. queue_attr_{show,store} both test if blk_queue_dying(q) while holding the sysfs_lock. BUT drivers can/do call del_gendisk() _before_ blk_cleanup_queue(). (if your proposed change above were to go in all of the block drivers would first need to be audited for the need to call blk_cleanup_queue() before del_gendisk() -- seems awful). Therefore it seems to me that all queue_attr_{show,store} are racey vs blk_unregister_queue() removing the 'queue' kobject. And it was just that __elevator_change() was myopicly fixed to address the race whereas a more generic solution was/is needed. But short of that more generic fix your change will reintroduce the potential for hitting the issue that commit e9a823fb34a8b fixed. In that light, think it best to leave blk_unregister_queue()'s mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding sysfs_lock. Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from __elevator_change(). But it could be I'm wrong for some reason.. as you know that happens ;) Mike ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-16 22:32 ` Mike Snitzer @ 2018-01-17 0:03 ` Bart Van Assche 2018-01-17 1:23 ` Ming Lei 0 siblings, 1 reply; 10+ messages in thread From: Bart Van Assche @ 2018-01-17 0:03 UTC (permalink / raw) To: snitzer@redhat.com Cc: hch@lst.de, linux-block@vger.kernel.org, axboe@kernel.dk T24gVHVlLCAyMDE4LTAxLTE2IGF0IDE3OjMyIC0wNTAwLCBNaWtlIFNuaXR6ZXIgd3JvdGU6DQo+ IFRoZXJlZm9yZSBpdCBzZWVtcyB0byBtZSB0aGF0IGFsbCBxdWV1ZV9hdHRyX3tzaG93LHN0b3Jl fSBhcmUgcmFjZXkgdnMNCj4gYmxrX3VucmVnaXN0ZXJfcXVldWUoKSByZW1vdmluZyB0aGUgJ3F1 ZXVlJyBrb2JqZWN0Lg0KPiANCj4gQW5kIGl0IHdhcyBqdXN0IHRoYXQgX19lbGV2YXRvcl9jaGFu Z2UoKSB3YXMgbXlvcGljbHkgZml4ZWQgdG8gYWRkcmVzcw0KPiB0aGUgcmFjZSB3aGVyZWFzIGEg bW9yZSBnZW5lcmljIHNvbHV0aW9uIHdhcy9pcyBuZWVkZWQuICBCdXQgc2hvcnQgb2YNCj4gdGhh dCBtb3JlIGdlbmVyaWMgZml4IHlvdXIgY2hhbmdlIHdpbGwgcmVpbnRyb2R1Y2UgdGhlIHBvdGVu dGlhbCBmb3INCj4gaGl0dGluZyB0aGUgaXNzdWUgdGhhdCBjb21taXQgZTlhODIzZmIzNGE4YiBm aXhlZC4NCj4gDQo+IEluIHRoYXQgbGlnaHQsIHRoaW5rIGl0IGJlc3QgdG8gbGVhdmUgYmxrX3Vu cmVnaXN0ZXJfcXVldWUoKSdzDQo+IG11dGV4X2xvY2soKSBhYm92ZSB0aGUgUVVFVUVfRkxBR19S RUdJU1RFUkVEIGNsZWFyaW5nIF9hbmRfIHVwZGF0ZQ0KPiBxdWV1ZV9hdHRyX3tzaG93LHN0b3Jl fSB0byB0ZXN0IGZvciBRVUVVRV9GTEFHX1JFR0lTVEVSRUQgd2hpbGUgaG9sZGluZw0KPiBzeXNm c19sb2NrLg0KPiANCj4gVGhlbiByZW1vdmUgdGhlIHVuaWNvcm4gdGVzdF9iaXQgZm9yIFFVRVVF X0ZMQUdfUkVHSVNURVJFRCBmcm9tDQo+IF9fZWxldmF0b3JfY2hhbmdlKCkuDQoNClRoYW5rcyBN aWtlIGZvciB0aGUgZmVlZGJhY2suIEhvd2V2ZXIsIEkgdGhpbmsgYSBzaW1wbGVyIGFwcHJvYWNo IGV4aXN0cyB0aGFuDQp3aGF0IGhhcyBiZWVuIGRlc2NyaWJlZCBhYm92ZSwgbmFtZWx5IGJ5IHVu cmVnaXN0ZXJpbmcgdGhlIHN5c2ZzIGF0dHJpYnV0ZXMNCmVhcmxpZXIuIEhvdyBhYm91dCB0aGUg cGF0Y2ggYmVsb3c/DQoNCltQQVRDSF0gYmxvY2s6IFByb3RlY3QgbGVzcyBjb2RlIHdpdGggc3lz ZnNfbG9jayBpbiBibGtfe3VuLH1yZWdpc3Rlcl9xdWV1ZSgpDQotLS0NCiBibG9jay9ibGstc3lz ZnMuYyB8IDM5ICsrKysrKysrKysrKysrKysrKysrKysrKysrLS0tLS0tLS0tLS0tLQ0KIGJsb2Nr L2VsZXZhdG9yLmMgIHwgIDQgLS0tLQ0KIDIgZmlsZXMgY2hhbmdlZCwgMjYgaW5zZXJ0aW9ucygr KSwgMTcgZGVsZXRpb25zKC0pDQoNCmRpZmYgLS1naXQgYS9ibG9jay9ibGstc3lzZnMuYyBiL2Js b2NrL2Jsay1zeXNmcy5jDQppbmRleCA0YTZhNDBmZmQ3OGUuLmNlMzIzNjZjNmRiNyAxMDA2NDQN Ci0tLSBhL2Jsb2NrL2Jsay1zeXNmcy5jDQorKysgYi9ibG9jay9ibGstc3lzZnMuYw0KQEAgLTg1 Myw2ICs4NTMsMTAgQEAgc3RydWN0IGtvYmpfdHlwZSBibGtfcXVldWVfa3R5cGUgPSB7DQogCS5y ZWxlYXNlCT0gYmxrX3JlbGVhc2VfcXVldWUsDQogfTsNCiANCisvKioNCisgKiBibGtfcmVnaXN0 ZXJfcXVldWUgLSByZWdpc3RlciBhIGJsb2NrIGxheWVyIHF1ZXVlIHdpdGggc3lzZnMNCisgKiBA ZGlzazogRGlzayBvZiB3aGljaCB0aGUgcmVxdWVzdCBxdWV1ZSBzaG91bGQgYmUgcmVnaXN0ZXJl ZCB3aXRoIHN5c2ZzLg0KKyAqLw0KIGludCBibGtfcmVnaXN0ZXJfcXVldWUoc3RydWN0IGdlbmRp c2sgKmRpc2spDQogew0KIAlpbnQgcmV0Ow0KQEAgLTkwOSwxMSArOTEzLDEyIEBAIGludCBibGtf cmVnaXN0ZXJfcXVldWUoc3RydWN0IGdlbmRpc2sgKmRpc2spDQogCWlmIChxLT5yZXF1ZXN0X2Zu IHx8IChxLT5tcV9vcHMgJiYgcS0+ZWxldmF0b3IpKSB7DQogCQlyZXQgPSBlbHZfcmVnaXN0ZXJf cXVldWUocSk7DQogCQlpZiAocmV0KSB7DQorCQkJbXV0ZXhfdW5sb2NrKCZxLT5zeXNmc19sb2Nr KTsNCiAJCQlrb2JqZWN0X3VldmVudCgmcS0+a29iaiwgS09CSl9SRU1PVkUpOw0KIAkJCWtvYmpl Y3RfZGVsKCZxLT5rb2JqKTsNCiAJCQlibGtfdHJhY2VfcmVtb3ZlX3N5c2ZzKGRldik7DQogCQkJ a29iamVjdF9wdXQoJmRldi0+a29iaik7DQotCQkJZ290byB1bmxvY2s7DQorCQkJcmV0dXJuIHJl dDsNCiAJCX0NCiAJfQ0KIAlyZXQgPSAwOw0KQEAgLTkyMyw2ICs5MjgsMTMgQEAgaW50IGJsa19y ZWdpc3Rlcl9xdWV1ZShzdHJ1Y3QgZ2VuZGlzayAqZGlzaykNCiB9DQogRVhQT1JUX1NZTUJPTF9H UEwoYmxrX3JlZ2lzdGVyX3F1ZXVlKTsNCiANCisvKioNCisgKiBibGtfdW5yZWdpc3Rlcl9xdWV1 ZSAtIGNvdW50ZXJwYXJ0IG9mIGJsa19yZWdpc3Rlcl9xdWV1ZSgpDQorICogQGRpc2s6IERpc2sg b2Ygd2hpY2ggdGhlIHJlcXVlc3QgcXVldWUgc2hvdWxkIGJlIHVucmVnaXN0ZXJlZCBmcm9tIHN5 c2ZzLg0KKyAqDQorICogTm90ZTogdGhlIGNhbGxlciBpcyByZXNwb25zaWJsZSBmb3IgZ3VhcmFu dGVlaW5nIHRoYXQgdGhpcyBmdW5jdGlvbiBpcyBjYWxsZWQNCisgKiBhZnRlciBibGtfcmVnaXN0 ZXJfcXVldWUoKSBoYXMgZmluaXNoZWQuDQorICovDQogdm9pZCBibGtfdW5yZWdpc3Rlcl9xdWV1 ZShzdHJ1Y3QgZ2VuZGlzayAqZGlzaykNCiB7DQogCXN0cnVjdCByZXF1ZXN0X3F1ZXVlICpxID0g ZGlzay0+cXVldWU7DQpAQCAtOTM0LDI4ICs5NDYsMjkgQEAgdm9pZCBibGtfdW5yZWdpc3Rlcl9x dWV1ZShzdHJ1Y3QgZ2VuZGlzayAqZGlzaykNCiAJaWYgKCF0ZXN0X2JpdChRVUVVRV9GTEFHX1JF R0lTVEVSRUQsICZxLT5xdWV1ZV9mbGFncykpDQogCQlyZXR1cm47DQogDQotCS8qDQotCSAqIFBy b3RlY3QgYWdhaW5zdCB0aGUgJ3F1ZXVlJyBrb2JqIGJlaW5nIGFjY2Vzc2VkDQotCSAqIHdoaWxl L2FmdGVyIGl0IGlzIHJlbW92ZWQuDQotCSAqLw0KLQltdXRleF9sb2NrKCZxLT5zeXNmc19sb2Nr KTsNCi0NCiAJc3Bpbl9sb2NrX2lycShxLT5xdWV1ZV9sb2NrKTsNCiAJcXVldWVfZmxhZ19jbGVh cihRVUVVRV9GTEFHX1JFR0lTVEVSRUQsIHEpOw0KIAlzcGluX3VubG9ja19pcnEocS0+cXVldWVf bG9jayk7DQogDQotCXdidF9leGl0KHEpOw0KLQ0KKwkvKg0KKwkgKiBSZW1vdmUgdGhlIHN5c2Zz IGF0dHJpYnV0ZXMgYmVmb3JlIHVucmVnaXN0ZXJpbmcgdGhlIHF1ZXVlIGRhdGENCisJICogc3Ry dWN0dXJlcyB0aGF0IGNhbiBiZSBtb2RpZmllZCB0aHJvdWdoIHN5c2ZzLg0KKwkgKi8NCisJbXV0 ZXhfbG9jaygmcS0+c3lzZnNfbG9jayk7DQogCWlmIChxLT5tcV9vcHMpDQogCQlibGtfbXFfdW5y ZWdpc3Rlcl9kZXYoZGlza190b19kZXYoZGlzayksIHEpOw0KLQ0KLQlpZiAocS0+cmVxdWVzdF9m biB8fCAocS0+bXFfb3BzICYmIHEtPmVsZXZhdG9yKSkNCi0JCWVsdl91bnJlZ2lzdGVyX3F1ZXVl KHEpOw0KKwltdXRleF91bmxvY2soJnEtPnN5c2ZzX2xvY2spOw0KIA0KIAlrb2JqZWN0X3VldmVu dCgmcS0+a29iaiwgS09CSl9SRU1PVkUpOw0KIAlrb2JqZWN0X2RlbCgmcS0+a29iaik7DQogCWJs a190cmFjZV9yZW1vdmVfc3lzZnMoZGlza190b19kZXYoZGlzaykpOw0KLQlrb2JqZWN0X3B1dCgm ZGlza190b19kZXYoZGlzayktPmtvYmopOw0KIA0KKwl3YnRfZXhpdChxKTsNCisNCisJbXV0ZXhf bG9jaygmcS0+c3lzZnNfbG9jayk7DQorCWlmIChxLT5yZXF1ZXN0X2ZuIHx8IChxLT5tcV9vcHMg JiYgcS0+ZWxldmF0b3IpKQ0KKwkJZWx2X3VucmVnaXN0ZXJfcXVldWUocSk7DQogCW11dGV4X3Vu bG9jaygmcS0+c3lzZnNfbG9jayk7DQorDQorCWtvYmplY3RfcHV0KCZkaXNrX3RvX2RldihkaXNr KS0+a29iaik7DQogfQ0KZGlmZiAtLWdpdCBhL2Jsb2NrL2VsZXZhdG9yLmMgYi9ibG9jay9lbGV2 YXRvci5jDQppbmRleCBlODdlOWI0M2FiYTAuLjRiNzk1N2IyOGE5OSAxMDA2NDQNCi0tLSBhL2Js b2NrL2VsZXZhdG9yLmMNCisrKyBiL2Jsb2NrL2VsZXZhdG9yLmMNCkBAIC0xMDgwLDEwICsxMDgw LDYgQEAgc3RhdGljIGludCBfX2VsZXZhdG9yX2NoYW5nZShzdHJ1Y3QgcmVxdWVzdF9xdWV1ZSAq cSwgY29uc3QgY2hhciAqbmFtZSkNCiAJY2hhciBlbGV2YXRvcl9uYW1lW0VMVl9OQU1FX01BWF07 DQogCXN0cnVjdCBlbGV2YXRvcl90eXBlICplOw0KIA0KLQkvKiBNYWtlIHN1cmUgcXVldWUgaXMg bm90IGluIHRoZSBtaWRkbGUgb2YgYmVpbmcgcmVtb3ZlZCAqLw0KLQlpZiAoIXRlc3RfYml0KFFV RVVFX0ZMQUdfUkVHSVNURVJFRCwgJnEtPnF1ZXVlX2ZsYWdzKSkNCi0JCXJldHVybiAtRU5PRU5U Ow0KLQ0KIAkvKg0KIAkgKiBTcGVjaWFsIGNhc2UgZm9yIG1xLCB0dXJuIG9mZiBzY2hlZHVsaW5n DQogCSAqLw0KLS0gDQoyLjE1LjENCg== ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-17 0:03 ` Bart Van Assche @ 2018-01-17 1:23 ` Ming Lei 2018-01-17 2:17 ` Bart Van Assche 0 siblings, 1 reply; 10+ messages in thread From: Ming Lei @ 2018-01-17 1:23 UTC (permalink / raw) To: Bart Van Assche Cc: snitzer@redhat.com, hch@lst.de, linux-block@vger.kernel.org, axboe@kernel.dk On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote: >> Therefore it seems to me that all queue_attr_{show,store} are racey vs >> blk_unregister_queue() removing the 'queue' kobject. >> >> And it was just that __elevator_change() was myopicly fixed to address >> the race whereas a more generic solution was/is needed. But short of >> that more generic fix your change will reintroduce the potential for >> hitting the issue that commit e9a823fb34a8b fixed. >> >> In that light, think it best to leave blk_unregister_queue()'s >> mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update >> queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding >> sysfs_lock. >> >> Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from >> __elevator_change(). > > Thanks Mike for the feedback. However, I think a simpler approach exists than > what has been described above, namely by unregistering the sysfs attributes > earlier. How about the patch below? > > [PATCH] block: Protect less code with sysfs_lock in blk_{un,}register_queue() > --- > block/blk-sysfs.c | 39 ++++++++++++++++++++++++++------------- > block/elevator.c | 4 ---- > 2 files changed, 26 insertions(+), 17 deletions(-) > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 4a6a40ffd78e..ce32366c6db7 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = { > .release = blk_release_queue, > }; > > +/** > + * blk_register_queue - register a block layer queue with sysfs > + * @disk: Disk of which the request queue should be registered with sysfs. > + */ > int blk_register_queue(struct gendisk *disk) > { > int ret; > @@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk) > if (q->request_fn || (q->mq_ops && q->elevator)) { > ret = elv_register_queue(q); > if (ret) { > + mutex_unlock(&q->sysfs_lock); > kobject_uevent(&q->kobj, KOBJ_REMOVE); > kobject_del(&q->kobj); > blk_trace_remove_sysfs(dev); > kobject_put(&dev->kobj); > - goto unlock; > + return ret; > } > } > ret = 0; > @@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk) > } > EXPORT_SYMBOL_GPL(blk_register_queue); > > +/** > + * blk_unregister_queue - counterpart of blk_register_queue() > + * @disk: Disk of which the request queue should be unregistered from sysfs. > + * > + * Note: the caller is responsible for guaranteeing that this function is called > + * after blk_register_queue() has finished. > + */ > void blk_unregister_queue(struct gendisk *disk) > { > struct request_queue *q = disk->queue; > @@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk) > if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) > return; > > - /* > - * Protect against the 'queue' kobj being accessed > - * while/after it is removed. > - */ > - mutex_lock(&q->sysfs_lock); > - > spin_lock_irq(q->queue_lock); > queue_flag_clear(QUEUE_FLAG_REGISTERED, q); > spin_unlock_irq(q->queue_lock); > > - wbt_exit(q); > - > + /* > + * Remove the sysfs attributes before unregistering the queue data > + * structures that can be modified through sysfs. > + */ > + mutex_lock(&q->sysfs_lock); > if (q->mq_ops) > blk_mq_unregister_dev(disk_to_dev(disk), q); > - > - if (q->request_fn || (q->mq_ops && q->elevator)) > - elv_unregister_queue(q); > + mutex_unlock(&q->sysfs_lock); > > kobject_uevent(&q->kobj, KOBJ_REMOVE); > kobject_del(&q->kobj); elevator switch still can come just after the above line code is completed, so the previous warning addressed in e9a823fb34a8b can be triggered again. > blk_trace_remove_sysfs(disk_to_dev(disk)); > - kobject_put(&disk_to_dev(disk)->kobj); > > + wbt_exit(q); > + > + mutex_lock(&q->sysfs_lock); > + if (q->request_fn || (q->mq_ops && q->elevator)) > + elv_unregister_queue(q); > mutex_unlock(&q->sysfs_lock); > + > + kobject_put(&disk_to_dev(disk)->kobj); > } > diff --git a/block/elevator.c b/block/elevator.c > index e87e9b43aba0..4b7957b28a99 100644 > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -1080,10 +1080,6 @@ static int __elevator_change(struct request_queue *q, const char *name) > char elevator_name[ELV_NAME_MAX]; > struct elevator_type *e; > > - /* Make sure queue is not in the middle of being removed */ > - if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) > - return -ENOENT; > - The above check shouldn't be removed, as I explained above. -- Ming Lei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-17 1:23 ` Ming Lei @ 2018-01-17 2:17 ` Bart Van Assche 2018-01-17 13:04 ` Ming Lei 0 siblings, 1 reply; 10+ messages in thread From: Bart Van Assche @ 2018-01-17 2:17 UTC (permalink / raw) To: tom.leiming@gmail.com Cc: hch@lst.de, linux-block@vger.kernel.org, snitzer@redhat.com, axboe@kernel.dk T24gV2VkLCAyMDE4LTAxLTE3IGF0IDA5OjIzICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gT24g V2VkLCBKYW4gMTcsIDIwMTggYXQgODowMyBBTSwgQmFydCBWYW4gQXNzY2hlIDxCYXJ0LlZhbkFz c2NoZUB3ZGMuY29tPiB3cm90ZToNCj4gPiBPbiBUdWUsIDIwMTgtMDEtMTYgYXQgMTc6MzIgLTA1 MDAsIE1pa2UgU25pdHplciB3cm90ZToNCj4gPiA+IFRoZXJlZm9yZSBpdCBzZWVtcyB0byBtZSB0 aGF0IGFsbCBxdWV1ZV9hdHRyX3tzaG93LHN0b3JlfSBhcmUgcmFjZXkgdnMNCj4gPiA+IGJsa191 bnJlZ2lzdGVyX3F1ZXVlKCkgcmVtb3ZpbmcgdGhlICdxdWV1ZScga29iamVjdC4NCj4gPiA+IA0K PiA+ID4gQW5kIGl0IHdhcyBqdXN0IHRoYXQgX19lbGV2YXRvcl9jaGFuZ2UoKSB3YXMgbXlvcGlj bHkgZml4ZWQgdG8gYWRkcmVzcw0KPiA+ID4gdGhlIHJhY2Ugd2hlcmVhcyBhIG1vcmUgZ2VuZXJp YyBzb2x1dGlvbiB3YXMvaXMgbmVlZGVkLiAgQnV0IHNob3J0IG9mDQo+ID4gPiB0aGF0IG1vcmUg Z2VuZXJpYyBmaXggeW91ciBjaGFuZ2Ugd2lsbCByZWludHJvZHVjZSB0aGUgcG90ZW50aWFsIGZv cg0KPiA+ID4gaGl0dGluZyB0aGUgaXNzdWUgdGhhdCBjb21taXQgZTlhODIzZmIzNGE4YiBmaXhl ZC4NCj4gPiA+IA0KPiA+ID4gSW4gdGhhdCBsaWdodCwgdGhpbmsgaXQgYmVzdCB0byBsZWF2ZSBi bGtfdW5yZWdpc3Rlcl9xdWV1ZSgpJ3MNCj4gPiA+IG11dGV4X2xvY2soKSBhYm92ZSB0aGUgUVVF VUVfRkxBR19SRUdJU1RFUkVEIGNsZWFyaW5nIF9hbmRfIHVwZGF0ZQ0KPiA+ID4gcXVldWVfYXR0 cl97c2hvdyxzdG9yZX0gdG8gdGVzdCBmb3IgUVVFVUVfRkxBR19SRUdJU1RFUkVEIHdoaWxlIGhv bGRpbmcNCj4gPiA+IHN5c2ZzX2xvY2suDQo+ID4gPiANCj4gPiA+IFRoZW4gcmVtb3ZlIHRoZSB1 bmljb3JuIHRlc3RfYml0IGZvciBRVUVVRV9GTEFHX1JFR0lTVEVSRUQgZnJvbQ0KPiA+ID4gX19l bGV2YXRvcl9jaGFuZ2UoKS4NCj4gPiANCj4gPiBUaGFua3MgTWlrZSBmb3IgdGhlIGZlZWRiYWNr LiBIb3dldmVyLCBJIHRoaW5rIGEgc2ltcGxlciBhcHByb2FjaCBleGlzdHMgdGhhbg0KPiA+IHdo YXQgaGFzIGJlZW4gZGVzY3JpYmVkIGFib3ZlLCBuYW1lbHkgYnkgdW5yZWdpc3RlcmluZyB0aGUg c3lzZnMgYXR0cmlidXRlcw0KPiA+IGVhcmxpZXIuIEhvdyBhYm91dCB0aGUgcGF0Y2ggYmVsb3c/ DQo+ID4gDQo+ID4gW1BBVENIXSBibG9jazogUHJvdGVjdCBsZXNzIGNvZGUgd2l0aCBzeXNmc19s b2NrIGluIGJsa197dW4sfXJlZ2lzdGVyX3F1ZXVlKCkNCj4gPiAtLS0NCj4gPiAgYmxvY2svYmxr LXN5c2ZzLmMgfCAzOSArKysrKysrKysrKysrKysrKysrKysrKysrKy0tLS0tLS0tLS0tLS0NCj4g PiAgYmxvY2svZWxldmF0b3IuYyAgfCAgNCAtLS0tDQo+ID4gIDIgZmlsZXMgY2hhbmdlZCwgMjYg aW5zZXJ0aW9ucygrKSwgMTcgZGVsZXRpb25zKC0pDQo+ID4gDQo+ID4gZGlmZiAtLWdpdCBhL2Js b2NrL2Jsay1zeXNmcy5jIGIvYmxvY2svYmxrLXN5c2ZzLmMNCj4gPiBpbmRleCA0YTZhNDBmZmQ3 OGUuLmNlMzIzNjZjNmRiNyAxMDA2NDQNCj4gPiAtLS0gYS9ibG9jay9ibGstc3lzZnMuYw0KPiA+ ICsrKyBiL2Jsb2NrL2Jsay1zeXNmcy5jDQo+ID4gQEAgLTg1Myw2ICs4NTMsMTAgQEAgc3RydWN0 IGtvYmpfdHlwZSBibGtfcXVldWVfa3R5cGUgPSB7DQo+ID4gICAgICAgICAucmVsZWFzZSAgICAg ICAgPSBibGtfcmVsZWFzZV9xdWV1ZSwNCj4gPiAgfTsNCj4gPiANCj4gPiArLyoqDQo+ID4gKyAq IGJsa19yZWdpc3Rlcl9xdWV1ZSAtIHJlZ2lzdGVyIGEgYmxvY2sgbGF5ZXIgcXVldWUgd2l0aCBz eXNmcw0KPiA+ICsgKiBAZGlzazogRGlzayBvZiB3aGljaCB0aGUgcmVxdWVzdCBxdWV1ZSBzaG91 bGQgYmUgcmVnaXN0ZXJlZCB3aXRoIHN5c2ZzLg0KPiA+ICsgKi8NCj4gPiAgaW50IGJsa19yZWdp c3Rlcl9xdWV1ZShzdHJ1Y3QgZ2VuZGlzayAqZGlzaykNCj4gPiAgew0KPiA+ICAgICAgICAgaW50 IHJldDsNCj4gPiBAQCAtOTA5LDExICs5MTMsMTIgQEAgaW50IGJsa19yZWdpc3Rlcl9xdWV1ZShz dHJ1Y3QgZ2VuZGlzayAqZGlzaykNCj4gPiAgICAgICAgIGlmIChxLT5yZXF1ZXN0X2ZuIHx8IChx LT5tcV9vcHMgJiYgcS0+ZWxldmF0b3IpKSB7DQo+ID4gICAgICAgICAgICAgICAgIHJldCA9IGVs dl9yZWdpc3Rlcl9xdWV1ZShxKTsNCj4gPiAgICAgICAgICAgICAgICAgaWYgKHJldCkgew0KPiA+ ICsgICAgICAgICAgICAgICAgICAgICAgIG11dGV4X3VubG9jaygmcS0+c3lzZnNfbG9jayk7DQo+ ID4gICAgICAgICAgICAgICAgICAgICAgICAga29iamVjdF91ZXZlbnQoJnEtPmtvYmosIEtPQkpf UkVNT1ZFKTsNCj4gPiAgICAgICAgICAgICAgICAgICAgICAgICBrb2JqZWN0X2RlbCgmcS0+a29i aik7DQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgYmxrX3RyYWNlX3JlbW92ZV9zeXNmcyhk ZXYpOw0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAgIGtvYmplY3RfcHV0KCZkZXYtPmtvYmop Ow0KPiA+IC0gICAgICAgICAgICAgICAgICAgICAgIGdvdG8gdW5sb2NrOw0KPiA+ICsgICAgICAg ICAgICAgICAgICAgICAgIHJldHVybiByZXQ7DQo+ID4gICAgICAgICAgICAgICAgIH0NCj4gPiAg ICAgICAgIH0NCj4gPiAgICAgICAgIHJldCA9IDA7DQo+ID4gQEAgLTkyMyw2ICs5MjgsMTMgQEAg aW50IGJsa19yZWdpc3Rlcl9xdWV1ZShzdHJ1Y3QgZ2VuZGlzayAqZGlzaykNCj4gPiAgfQ0KPiA+ ICBFWFBPUlRfU1lNQk9MX0dQTChibGtfcmVnaXN0ZXJfcXVldWUpOw0KPiA+IA0KPiA+ICsvKioN Cj4gPiArICogYmxrX3VucmVnaXN0ZXJfcXVldWUgLSBjb3VudGVycGFydCBvZiBibGtfcmVnaXN0 ZXJfcXVldWUoKQ0KPiA+ICsgKiBAZGlzazogRGlzayBvZiB3aGljaCB0aGUgcmVxdWVzdCBxdWV1 ZSBzaG91bGQgYmUgdW5yZWdpc3RlcmVkIGZyb20gc3lzZnMuDQo+ID4gKyAqDQo+ID4gKyAqIE5v dGU6IHRoZSBjYWxsZXIgaXMgcmVzcG9uc2libGUgZm9yIGd1YXJhbnRlZWluZyB0aGF0IHRoaXMg ZnVuY3Rpb24gaXMgY2FsbGVkDQo+ID4gKyAqIGFmdGVyIGJsa19yZWdpc3Rlcl9xdWV1ZSgpIGhh cyBmaW5pc2hlZC4NCj4gPiArICovDQo+ID4gIHZvaWQgYmxrX3VucmVnaXN0ZXJfcXVldWUoc3Ry dWN0IGdlbmRpc2sgKmRpc2spDQo+ID4gIHsNCj4gPiAgICAgICAgIHN0cnVjdCByZXF1ZXN0X3F1 ZXVlICpxID0gZGlzay0+cXVldWU7DQo+ID4gQEAgLTkzNCwyOCArOTQ2LDI5IEBAIHZvaWQgYmxr X3VucmVnaXN0ZXJfcXVldWUoc3RydWN0IGdlbmRpc2sgKmRpc2spDQo+ID4gICAgICAgICBpZiAo IXRlc3RfYml0KFFVRVVFX0ZMQUdfUkVHSVNURVJFRCwgJnEtPnF1ZXVlX2ZsYWdzKSkNCj4gPiAg ICAgICAgICAgICAgICAgcmV0dXJuOw0KPiA+IA0KPiA+IC0gICAgICAgLyoNCj4gPiAtICAgICAg ICAqIFByb3RlY3QgYWdhaW5zdCB0aGUgJ3F1ZXVlJyBrb2JqIGJlaW5nIGFjY2Vzc2VkDQo+ID4g LSAgICAgICAgKiB3aGlsZS9hZnRlciBpdCBpcyByZW1vdmVkLg0KPiA+IC0gICAgICAgICovDQo+ ID4gLSAgICAgICBtdXRleF9sb2NrKCZxLT5zeXNmc19sb2NrKTsNCj4gPiAtDQo+ID4gICAgICAg ICBzcGluX2xvY2tfaXJxKHEtPnF1ZXVlX2xvY2spOw0KPiA+ICAgICAgICAgcXVldWVfZmxhZ19j bGVhcihRVUVVRV9GTEFHX1JFR0lTVEVSRUQsIHEpOw0KPiA+ICAgICAgICAgc3Bpbl91bmxvY2tf aXJxKHEtPnF1ZXVlX2xvY2spOw0KPiA+IA0KPiA+IC0gICAgICAgd2J0X2V4aXQocSk7DQo+ID4g LQ0KPiA+ICsgICAgICAgLyoNCj4gPiArICAgICAgICAqIFJlbW92ZSB0aGUgc3lzZnMgYXR0cmli dXRlcyBiZWZvcmUgdW5yZWdpc3RlcmluZyB0aGUgcXVldWUgZGF0YQ0KPiA+ICsgICAgICAgICog c3RydWN0dXJlcyB0aGF0IGNhbiBiZSBtb2RpZmllZCB0aHJvdWdoIHN5c2ZzLg0KPiA+ICsgICAg ICAgICovDQo+ID4gKyAgICAgICBtdXRleF9sb2NrKCZxLT5zeXNmc19sb2NrKTsNCj4gPiAgICAg ICAgIGlmIChxLT5tcV9vcHMpDQo+ID4gICAgICAgICAgICAgICAgIGJsa19tcV91bnJlZ2lzdGVy X2RldihkaXNrX3RvX2RldihkaXNrKSwgcSk7DQo+ID4gLQ0KPiA+IC0gICAgICAgaWYgKHEtPnJl cXVlc3RfZm4gfHwgKHEtPm1xX29wcyAmJiBxLT5lbGV2YXRvcikpDQo+ID4gLSAgICAgICAgICAg ICAgIGVsdl91bnJlZ2lzdGVyX3F1ZXVlKHEpOw0KPiA+ICsgICAgICAgbXV0ZXhfdW5sb2NrKCZx LT5zeXNmc19sb2NrKTsNCj4gPiANCj4gPiAgICAgICAgIGtvYmplY3RfdWV2ZW50KCZxLT5rb2Jq LCBLT0JKX1JFTU9WRSk7DQo+ID4gICAgICAgICBrb2JqZWN0X2RlbCgmcS0+a29iaik7DQo+IA0K PiBlbGV2YXRvciBzd2l0Y2ggc3RpbGwgY2FuIGNvbWUganVzdCBhZnRlciB0aGUgYWJvdmUgbGlu ZSBjb2RlIGlzIGNvbXBsZXRlZCwNCj4gc28gdGhlIHByZXZpb3VzIHdhcm5pbmcgYWRkcmVzc2Vk IGluIGU5YTgyM2ZiMzRhOGIgY2FuIGJlIHRyaWdnZXJlZA0KPiBhZ2Fpbi4NCj4gDQo+ID4gICAg ICAgICBibGtfdHJhY2VfcmVtb3ZlX3N5c2ZzKGRpc2tfdG9fZGV2KGRpc2spKTsNCj4gPiAtICAg ICAgIGtvYmplY3RfcHV0KCZkaXNrX3RvX2RldihkaXNrKS0+a29iaik7DQo+ID4gDQo+ID4gKyAg ICAgICB3YnRfZXhpdChxKTsNCj4gPiArDQo+ID4gKyAgICAgICBtdXRleF9sb2NrKCZxLT5zeXNm c19sb2NrKTsNCj4gPiArICAgICAgIGlmIChxLT5yZXF1ZXN0X2ZuIHx8IChxLT5tcV9vcHMgJiYg cS0+ZWxldmF0b3IpKQ0KPiA+ICsgICAgICAgICAgICAgICBlbHZfdW5yZWdpc3Rlcl9xdWV1ZShx KTsNCj4gPiAgICAgICAgIG11dGV4X3VubG9jaygmcS0+c3lzZnNfbG9jayk7DQo+ID4gKw0KPiA+ ICsgICAgICAga29iamVjdF9wdXQoJmRpc2tfdG9fZGV2KGRpc2spLT5rb2JqKTsNCj4gPiAgfQ0K PiA+IGRpZmYgLS1naXQgYS9ibG9jay9lbGV2YXRvci5jIGIvYmxvY2svZWxldmF0b3IuYw0KPiA+ IGluZGV4IGU4N2U5YjQzYWJhMC4uNGI3OTU3YjI4YTk5IDEwMDY0NA0KPiA+IC0tLSBhL2Jsb2Nr L2VsZXZhdG9yLmMNCj4gPiArKysgYi9ibG9jay9lbGV2YXRvci5jDQo+ID4gQEAgLTEwODAsMTAg KzEwODAsNiBAQCBzdGF0aWMgaW50IF9fZWxldmF0b3JfY2hhbmdlKHN0cnVjdCByZXF1ZXN0X3F1 ZXVlICpxLCBjb25zdCBjaGFyICpuYW1lKQ0KPiA+ICAgICAgICAgY2hhciBlbGV2YXRvcl9uYW1l W0VMVl9OQU1FX01BWF07DQo+ID4gICAgICAgICBzdHJ1Y3QgZWxldmF0b3JfdHlwZSAqZTsNCj4g PiANCj4gPiAtICAgICAgIC8qIE1ha2Ugc3VyZSBxdWV1ZSBpcyBub3QgaW4gdGhlIG1pZGRsZSBv ZiBiZWluZyByZW1vdmVkICovDQo+ID4gLSAgICAgICBpZiAoIXRlc3RfYml0KFFVRVVFX0ZMQUdf UkVHSVNURVJFRCwgJnEtPnF1ZXVlX2ZsYWdzKSkNCj4gPiAtICAgICAgICAgICAgICAgcmV0dXJu IC1FTk9FTlQ7DQo+ID4gLQ0KPiANCj4gVGhlIGFib3ZlIGNoZWNrIHNob3VsZG4ndCBiZSByZW1v dmVkLCBhcyBJIGV4cGxhaW5lZCBhYm92ZS4NCg0KSGVsbG8gTWluZywNCg0KU29ycnkgYnV0IEkg dGhpbmsgd2hhdCB5b3Ugd3JvdGUgaXMgd3JvbmcuIGtvYmplY3RfZGVsKCZxLT5rb2JqKSB3YWl0 cyB1bnRpbCBhbGwNCm9uZ29pbmcgc3lzZnMgY2FsbGJhY2sgbWV0aG9kcywgaW5jbHVkaW5nIGVs dl9pb3NjaGVkX3N0b3JlKCksIGhhdmUgZmluaXNoZWQgYW5kDQpwcmV2ZW50cyB0aGF0IGFueSBu ZXcgZWx2X2lvc2NoZWRfc3RvcmUoKSBjYWxscyBzdGFydC4gVGhhdCBpcyB3aHkgSSB0aGluayB0 aGUNCmFib3ZlIGNoYW5nZXMgZG8gbm90IHJlaW50cm9kdWNlIHRoZSByYWNlIGZpeGVkIGJ5IGNv bW1pdCBlOWE4MjNmYjM0YTggKCJibG9jazoNCmZpeCB3YXJuaW5nIHdoZW4gSS9PIGVsZXZhdG9y IGlzIGNoYW5nZWQgYXMgcmVxdWVzdF9xdWV1ZSBpcyBiZWluZyByZW1vdmVkIikuDQoNCkJhcnQu ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-17 2:17 ` Bart Van Assche @ 2018-01-17 13:04 ` Ming Lei 2018-01-17 17:19 ` Bart Van Assche 0 siblings, 1 reply; 10+ messages in thread From: Ming Lei @ 2018-01-17 13:04 UTC (permalink / raw) To: Bart Van Assche Cc: hch@lst.de, linux-block@vger.kernel.org, snitzer@redhat.com, axboe@kernel.dk On Wed, Jan 17, 2018 at 10:17 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Wed, 2018-01-17 at 09:23 +0800, Ming Lei wrote: >> On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: >> > On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote: >> > > Therefore it seems to me that all queue_attr_{show,store} are racey vs >> > > blk_unregister_queue() removing the 'queue' kobject. >> > > >> > > And it was just that __elevator_change() was myopicly fixed to address >> > > the race whereas a more generic solution was/is needed. But short of >> > > that more generic fix your change will reintroduce the potential for >> > > hitting the issue that commit e9a823fb34a8b fixed. >> > > >> > > In that light, think it best to leave blk_unregister_queue()'s >> > > mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update >> > > queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding >> > > sysfs_lock. >> > > >> > > Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from >> > > __elevator_change(). >> > >> > Thanks Mike for the feedback. However, I think a simpler approach exists than >> > what has been described above, namely by unregistering the sysfs attributes >> > earlier. How about the patch below? >> > >> > [PATCH] block: Protect less code with sysfs_lock in blk_{un,}register_queue() >> > --- >> > block/blk-sysfs.c | 39 ++++++++++++++++++++++++++------------- >> > block/elevator.c | 4 ---- >> > 2 files changed, 26 insertions(+), 17 deletions(-) >> > >> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >> > index 4a6a40ffd78e..ce32366c6db7 100644 >> > --- a/block/blk-sysfs.c >> > +++ b/block/blk-sysfs.c >> > @@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = { >> > .release = blk_release_queue, >> > }; >> > >> > +/** >> > + * blk_register_queue - register a block layer queue with sysfs >> > + * @disk: Disk of which the request queue should be registered with sysfs. >> > + */ >> > int blk_register_queue(struct gendisk *disk) >> > { >> > int ret; >> > @@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk) >> > if (q->request_fn || (q->mq_ops && q->elevator)) { >> > ret = elv_register_queue(q); >> > if (ret) { >> > + mutex_unlock(&q->sysfs_lock); >> > kobject_uevent(&q->kobj, KOBJ_REMOVE); >> > kobject_del(&q->kobj); >> > blk_trace_remove_sysfs(dev); >> > kobject_put(&dev->kobj); >> > - goto unlock; >> > + return ret; >> > } >> > } >> > ret = 0; >> > @@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk) >> > } >> > EXPORT_SYMBOL_GPL(blk_register_queue); >> > >> > +/** >> > + * blk_unregister_queue - counterpart of blk_register_queue() >> > + * @disk: Disk of which the request queue should be unregistered from sysfs. >> > + * >> > + * Note: the caller is responsible for guaranteeing that this function is called >> > + * after blk_register_queue() has finished. >> > + */ >> > void blk_unregister_queue(struct gendisk *disk) >> > { >> > struct request_queue *q = disk->queue; >> > @@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk) >> > if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) >> > return; >> > >> > - /* >> > - * Protect against the 'queue' kobj being accessed >> > - * while/after it is removed. >> > - */ >> > - mutex_lock(&q->sysfs_lock); >> > - >> > spin_lock_irq(q->queue_lock); >> > queue_flag_clear(QUEUE_FLAG_REGISTERED, q); >> > spin_unlock_irq(q->queue_lock); >> > >> > - wbt_exit(q); >> > - >> > + /* >> > + * Remove the sysfs attributes before unregistering the queue data >> > + * structures that can be modified through sysfs. >> > + */ >> > + mutex_lock(&q->sysfs_lock); >> > if (q->mq_ops) >> > blk_mq_unregister_dev(disk_to_dev(disk), q); >> > - >> > - if (q->request_fn || (q->mq_ops && q->elevator)) >> > - elv_unregister_queue(q); >> > + mutex_unlock(&q->sysfs_lock); >> > >> > kobject_uevent(&q->kobj, KOBJ_REMOVE); >> > kobject_del(&q->kobj); >> >> elevator switch still can come just after the above line code is completed, >> so the previous warning addressed in e9a823fb34a8b can be triggered >> again. >> >> > blk_trace_remove_sysfs(disk_to_dev(disk)); >> > - kobject_put(&disk_to_dev(disk)->kobj); >> > >> > + wbt_exit(q); >> > + >> > + mutex_lock(&q->sysfs_lock); >> > + if (q->request_fn || (q->mq_ops && q->elevator)) >> > + elv_unregister_queue(q); >> > mutex_unlock(&q->sysfs_lock); >> > + >> > + kobject_put(&disk_to_dev(disk)->kobj); >> > } >> > diff --git a/block/elevator.c b/block/elevator.c >> > index e87e9b43aba0..4b7957b28a99 100644 >> > --- a/block/elevator.c >> > +++ b/block/elevator.c >> > @@ -1080,10 +1080,6 @@ static int __elevator_change(struct request_queue *q, const char *name) >> > char elevator_name[ELV_NAME_MAX]; >> > struct elevator_type *e; >> > >> > - /* Make sure queue is not in the middle of being removed */ >> > - if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags)) >> > - return -ENOENT; >> > - >> >> The above check shouldn't be removed, as I explained above. > > Hello Ming, > > Sorry but I think what you wrote is wrong. kobject_del(&q->kobj) waits until all > ongoing sysfs callback methods, including elv_iosched_store(), have finished and > prevents that any new elv_iosched_store() calls start. That is why I think the > above changes do not reintroduce the race fixed by commit e9a823fb34a8 ("block: > fix warning when I/O elevator is changed as request_queue is being removed"). But your patch basically reverts commit e9a823fb34a8, and I just saw the warning again after applying your patch in my stress test of switching elelvato: [ 225.999503] ------------[ cut here ]------------ [ 225.999505] kobject_add_internal failed for iosched (error: -2 parent: queue) [ 225.999521] WARNING: CPU: 0 PID: 12707 at lib/kobject.c:244 kobject_add_internal+0x236/0x24c [ 225.999522] Modules linked in: scsi_debug ebtable_filter ebtables ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc fuse iptable_filter ip_tables sd_mod sg sdhci_pci sdhci bcache usb_storage crc32c_intel mmc_core ahci lpc_ich mfd_core virtio_scsi libahci libata nvme shpchp qemu_fw_cfg nvme_core binfmt_misc dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs [last unloaded: scsi_debug] [ 225.999551] CPU: 0 PID: 12707 Comm: elv-switch Tainted: G W 4.15.0-rc7+ #119 [ 225.999552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-1.fc25 04/01/2014 [ 225.999554] RIP: 0010:kobject_add_internal+0x236/0x24c [ 225.999555] RSP: 0018:ffffc900012bfca8 EFLAGS: 00010286 [ 225.999556] RAX: 0000000000000000 RBX: ffff88016ec23010 RCX: 0000000000000001 [ 225.999556] RDX: 0000000000000000 RSI: ffffffff81e545b2 RDI: 00000000ffffffff [ 225.999557] RBP: ffff880178f8e6a8 R08: 00000047c13e75ba R09: ffffffff8275daf0 [ 225.999560] R10: ffffc900012bfd60 R11: ffffffff826e3fb1 R12: 00000000fffffffe [ 225.999560] R13: ffff880178f8e6a8 R14: 0000000000000006 R15: ffff880178f8e4e0 [ 225.999561] FS: 00007fcd1445a700(0000) GS:ffff88017bc00000(0000) knlGS:0000000000000000 [ 225.999562] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 225.999563] CR2: 00007f623847a750 CR3: 0000000279974001 CR4: 00000000003606f0 [ 225.999565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 225.999565] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 225.999566] Call Trace: [ 225.999570] kobject_add+0x9e/0xc5 [ 225.999573] elv_register_queue+0x35/0xa2 [ 225.999575] elevator_switch+0x7a/0x1a4 [ 225.999577] elv_iosched_store+0xd2/0x103 [ 225.999579] queue_attr_store+0x66/0x82 [ 225.999581] kernfs_fop_write+0xf3/0x135 [ 225.999583] __vfs_write+0x31/0x142 [ 225.999585] ? _raw_spin_unlock+0x16/0x27 [ 225.999586] ? __alloc_fd+0x147/0x159 [ 225.999588] ? preempt_count_add+0x6d/0x8c [ 225.999589] ? preempt_count_add+0x7a/0x8c [ 225.999590] ? _raw_spin_lock+0x13/0x30 [ 225.999591] vfs_write+0xcb/0x16e [ 225.999593] SyS_write+0x5d/0xab [ 225.999595] entry_SYSCALL_64_fastpath+0x1a/0x7d [ 225.999597] RIP: 0033:0x7fcd13f76a10 [ 225.999597] RSP: 002b:00007fffd6b565c8 EFLAGS: 00000246 [ 225.999599] Code: eb 31 48 85 ed 49 c7 c0 2f 5a ea 81 74 04 4c 8b 45 00 48 8b 13 44 89 e1 48 c7 c6 10 e5 cc 81 48 c7 c7 23 5b ea 81 e8 83 6b a7 ff <0f> ff eb 04 80 4b 3c 02 5b 44 89 e0 5d 41 5c 41 5d 41 5e 41 5f [ 225.999619] ---[ end trace 83f7e40aaf041cea ]--- -- Ming Lei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() 2018-01-17 13:04 ` Ming Lei @ 2018-01-17 17:19 ` Bart Van Assche 0 siblings, 0 replies; 10+ messages in thread From: Bart Van Assche @ 2018-01-17 17:19 UTC (permalink / raw) To: tom.leiming@gmail.com Cc: hch@lst.de, linux-block@vger.kernel.org, snitzer@redhat.com, axboe@kernel.dk T24gV2VkLCAyMDE4LTAxLTE3IGF0IDIxOjA0ICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gT24g V2VkLCBKYW4gMTcsIDIwMTggYXQgMTA6MTcgQU0sIEJhcnQgVmFuIEFzc2NoZQ0KPiA8QmFydC5W YW5Bc3NjaGVAd2RjLmNvbT4gd3JvdGU6DQo+ID4gU29ycnkgYnV0IEkgdGhpbmsgd2hhdCB5b3Ug d3JvdGUgaXMgd3JvbmcuIGtvYmplY3RfZGVsKCZxLT5rb2JqKSB3YWl0cyB1bnRpbCBhbGwNCj4g PiBvbmdvaW5nIHN5c2ZzIGNhbGxiYWNrIG1ldGhvZHMsIGluY2x1ZGluZyBlbHZfaW9zY2hlZF9z dG9yZSgpLCBoYXZlIGZpbmlzaGVkIGFuZA0KPiA+IHByZXZlbnRzIHRoYXQgYW55IG5ldyBlbHZf aW9zY2hlZF9zdG9yZSgpIGNhbGxzIHN0YXJ0LiBUaGF0IGlzIHdoeSBJIHRoaW5rIHRoZQ0KPiA+ IGFib3ZlIGNoYW5nZXMgZG8gbm90IHJlaW50cm9kdWNlIHRoZSByYWNlIGZpeGVkIGJ5IGNvbW1p dCBlOWE4MjNmYjM0YTggKCJibG9jazoNCj4gPiBmaXggd2FybmluZyB3aGVuIEkvTyBlbGV2YXRv ciBpcyBjaGFuZ2VkIGFzIHJlcXVlc3RfcXVldWUgaXMgYmVpbmcgcmVtb3ZlZCIpLg0KPiANCj4g QnV0IHlvdXIgcGF0Y2ggYmFzaWNhbGx5IHJldmVydHMgY29tbWl0IGU5YTgyM2ZiMzRhOCwgYW5k IEkganVzdCBzYXcgdGhlIHdhcm5pbmcNCj4gYWdhaW4gYWZ0ZXIgYXBwbHlpbmcgeW91ciBwYXRj aCBpbiBteSBzdHJlc3MgdGVzdCBvZiBzd2l0Y2hpbmcgZWxlbHZhdG86DQo+IA0KPiBbICAyMjUu OTk5NTA1XSBrb2JqZWN0X2FkZF9pbnRlcm5hbCBmYWlsZWQgZm9yIGlvc2NoZWQgKGVycm9yOiAt MiBwYXJlbnQ6IHF1ZXVlKQ0KPiBbICAyMjUuOTk5NTIxXSBXQVJOSU5HOiBDUFU6IDAgUElEOiAx MjcwNyBhdCBsaWIva29iamVjdC5jOjI0NA0KPiBrb2JqZWN0X2FkZF9pbnRlcm5hbCsweDIzNi8w eDI0Yw0KPiBbICAyMjUuOTk5NTY2XSBDYWxsIFRyYWNlOg0KPiBbICAyMjUuOTk5NTcwXSAga29i amVjdF9hZGQrMHg5ZS8weGM1DQo+IFsgIDIyNS45OTk1NzNdICBlbHZfcmVnaXN0ZXJfcXVldWUr MHgzNS8weGEyDQo+IFsgIDIyNS45OTk1NzVdICBlbGV2YXRvcl9zd2l0Y2grMHg3YS8weDFhNA0K PiBbICAyMjUuOTk5NTc3XSAgZWx2X2lvc2NoZWRfc3RvcmUrMHhkMi8weDEwMw0KPiBbICAyMjUu OTk5NTc5XSAgcXVldWVfYXR0cl9zdG9yZSsweDY2LzB4ODINCj4gWyAgMjI1Ljk5OTU4MV0gIGtl cm5mc19mb3Bfd3JpdGUrMHhmMy8weDEzNQ0KPiBbICAyMjUuOTk5NTgzXSAgX192ZnNfd3JpdGUr MHgzMS8weDE0Mg0KPiBbICAyMjUuOTk5NTkxXSAgdmZzX3dyaXRlKzB4Y2IvMHgxNmUNCj4gWyAg MjI1Ljk5OTU5M10gIFN5U193cml0ZSsweDVkLzB4YWINCj4gWyAgMjI1Ljk5OTU5NV0gIGVudHJ5 X1NZU0NBTExfNjRfZmFzdHBhdGgrMHgxYS8weDdkDQoNClRoZSBhYm92ZSBjYWxsIHN0YWNrIG1l YW5zIHRoYXQgc3lzZnNfY3JlYXRlX2Rpcl9ucygpIHJldHVybmVkIC1FTk9FTlQgYmVjYXVzZQ0K a29iai0+cGFyZW50LT5zZCB3YXMgTlVMTCAod2hlcmUga29iaiBpcyB0aGUgZWxldmF0b3Iga2Vy bmVsIG9iamVjdCkuIFRoYXQncw0KcHJvYmFibHkgYmVjYXNlIHN5c2ZzX3JlbW92ZV9kaXIoKSBj bGVhcnMgdGhlIHNkIHBvaW50ZXIgYmVmb3JlIHBlcmZvcm1pbmcgdGhlDQphY3R1YWwgcmVtb3Zh bC4gRnJvbSBmcy9zeXNmcy9kaXIuYzoNCg0KCXNwaW5fbG9jaygmc3lzZnNfc3ltbGlua190YXJn ZXRfbG9jayk7DQoJa29iai0+c2QgPSBOVUxMOw0KCXNwaW5fdW5sb2NrKCZzeXNmc19zeW1saW5r X3RhcmdldF9sb2NrKTsNCg0KCWlmIChrbikgew0KCQlXQVJOX09OX09OQ0Uoa2VybmZzX3R5cGUo a24pICE9IEtFUk5GU19ESVIpOw0KCQlrZXJuZnNfcmVtb3ZlKGtuKTsNCgl9DQoNCkluIHRoZSBw YXN0IHRoZXNlIHR3byBhY3Rpb25zIHdlcmUgcGVyZm9ybWVkIGluIHRoZSBvcHBvc2l0ZSBvcmRl ci4gU2VlIGFsc28NCmNvbW1pdCBhZWNkY2VkYWFiNDkgKCJzeXNmczogaW1wbGVtZW50IGtvYmpf c3lzZnNfYXNzb2NfbG9jayIpLiBBbnl3YXksIEkgd2lsbA0KcmV3b3JrIHRoaXMgcGF0Y2guDQoN CkJhcnQu ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-01-17 17:19 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-16 18:17 [PATCH 0/3] Avoid that blk_{un,}register_queue() trigger lock inversion Bart Van Assche
2018-01-16 18:17 ` [PATCH 1/3] block: Unexport elv_register_queue() and elv_unregister_queue() Bart Van Assche
2018-01-16 18:17 ` [PATCH 2/3] block: Document scheduler change code locking requirements Bart Van Assche
2018-01-16 18:17 ` [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue() Bart Van Assche
2018-01-16 22:32 ` Mike Snitzer
2018-01-17 0:03 ` Bart Van Assche
2018-01-17 1:23 ` Ming Lei
2018-01-17 2:17 ` Bart Van Assche
2018-01-17 13:04 ` Ming Lei
2018-01-17 17:19 ` Bart Van Assche
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox