public inbox for linux-block@vger.kernel.org
* [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs
@ 2018-01-30 11:21 Joseph Qi
  2018-01-30 21:19 ` Bart Van Assche
  0 siblings, 1 reply; 4+ messages in thread
From: Joseph Qi @ 2018-01-30 11:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Gang Deng

Hi Jens and Folks,

Recently we hit a hard LOCKUP issue. After investigating it, we found a
race between blk_init_queue_node and blkcg_print_blkgs. The race is
described below.

blk_init_queue_node                 blkcg_print_blkgs
  blk_alloc_queue_node (1)
    q->queue_lock = &q->__queue_lock (2)
    blkcg_init_queue(q) (3)
                                    spin_lock_irq(blkg->q->queue_lock) (4)
  q->queue_lock = lock (5)
                                    spin_unlock_irq(blkg->q->queue_lock) (6)

(1) allocate an uninitialized queue;
(2) initialize queue_lock to point at the default internal lock;
(3) initialize the blkcg part of the request queue, which creates a blkg
and inserts it into blkg_list;
(4) traverse blkg_list, find the newly created blkg, and take its queue
lock, which at this point is still the default *internal lock*;
(5) *race window*: queue_lock is now overridden with the *driver-specified
lock*;
(6) unlock the *driver-specified lock* rather than the still-held
*internal lock*; the lock/unlock balance is broken.

For the issue above, I think blkcg_init_queue is called a bit too early.
We can't allow any further use of the request queue before it is fully
initialized. Since blk_init_queue_node is a really common path and it
allows drivers to override the default internal lock, I'm afraid several
other places may have the same issue.
Am I missing something here?

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs
  2018-01-30 11:21 [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs Joseph Qi
@ 2018-01-30 21:19 ` Bart Van Assche
  2018-01-31  1:53   ` Joseph Qi
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2018-01-30 21:19 UTC (permalink / raw)
  To: joseph.qi@linux.alibaba.com, axboe@kernel.dk
  Cc: gavin.dg@linux.alibaba.com, linux-block@vger.kernel.org

On Tue, 2018-01-30 at 19:21 +0800, Joseph Qi wrote:
> Hi Jens and Folks,
> 
> Recently we've gotten a hard LOCKUP issue. After investigating the issue
> we've found a race between blk_init_queue_node and blkcg_print_blkgs.
> The race is described below.
> 
> blk_init_queue_node                 blkcg_print_blkgs
>   blk_alloc_queue_node (1)
>     q->queue_lock = &q->__queue_lock (2)
>     blkcg_init_queue(q) (3)
>                                     spin_lock_irq(blkg->q->queue_lock) (4)
>   q->queue_lock = lock (5)
>                                     spin_unlock_irq(blkg->q->queue_lock) (6)
> 
> (1) allocate an uninitialized queue;
> (2) initialize queue_lock to its default internal lock;
> (3) initialize blkcg part of request queue, which will create blkg and
> then insert it to blkg_list;
> (4) traverse blkg_list and find the created blkg, and then take its
> queue lock, here it is the default *internal lock*;
> (5) *race window*, now queue_lock is overridden with *driver specified
> lock*;
> (6) now unlock *driver specified lock*, not the locked *internal lock*,
> unlock balance breaks.
> 
> For the issue above, I think blkcg_init_queue is a bit earlier. We
> can't allow a further use before request queue is fully initialized.
> Since blk_init_queue_node is a really common path and it allows driver
> to override the default internal lock, I'm afraid several other places
> may also have the same issue.
> Am I missing something here?

What code triggered the blkcg_print_blkgs() call? Was it perhaps a function
related to throttling? If so, I think there are two possible ways to fix this
race:
1. Moving the .queue_lock initialization from blk_init_queue_node() into
   blk_alloc_queue_node() such that it happens before the blkcg_init_queue()
   call. This however will require a tree-wide change of the blk_alloc_queue()
   and all blk_alloc_queue_node() calls in all block drivers.
2. Splitting the blk_throtl_init() function such that the blkcg_activate_policy()
   call can occur after the .queue_lock pointer has been initialized, e.g. from
   inside blk_init_allocated_queue() or from inside blk_register_queue() instead
   of from inside blkcg_init_queue().

I'm not sure what the best approach is. Maybe there is even a third approach
that is better than the two approaches mentioned above.

Bart.


* Re: [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs
  2018-01-30 21:19 ` Bart Van Assche
@ 2018-01-31  1:53   ` Joseph Qi
  2018-01-31 16:39     ` Bart Van Assche
  0 siblings, 1 reply; 4+ messages in thread
From: Joseph Qi @ 2018-01-31  1:53 UTC (permalink / raw)
  To: Bart Van Assche, axboe@kernel.dk
  Cc: gavin.dg@linux.alibaba.com, linux-block@vger.kernel.org

Hi Bart,
Thanks very much for the quick response.

On 18/1/31 05:19, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 19:21 +0800, Joseph Qi wrote:
>> Hi Jens and Folks,
>>
>> Recently we've gotten a hard LOCKUP issue. After investigating the issue
>> we've found a race between blk_init_queue_node and blkcg_print_blkgs.
>> The race is described below.
>>
>> blk_init_queue_node                 blkcg_print_blkgs
>>   blk_alloc_queue_node (1)
>>     q->queue_lock = &q->__queue_lock (2)
>>     blkcg_init_queue(q) (3)
>>                                     spin_lock_irq(blkg->q->queue_lock) (4)
>>   q->queue_lock = lock (5)
>>                                     spin_unlock_irq(blkg->q->queue_lock) (6)
>>
>> (1) allocate an uninitialized queue;
>> (2) initialize queue_lock to its default internal lock;
>> (3) initialize blkcg part of request queue, which will create blkg and
>> then insert it to blkg_list;
>> (4) traverse blkg_list and find the created blkg, and then take its
>> queue lock, here it is the default *internal lock*;
>> (5) *race window*, now queue_lock is overridden with *driver specified
>> lock*;
>> (6) now unlock *driver specified lock*, not the locked *internal lock*,
>> unlock balance breaks.
>>
>> For the issue above, I think blkcg_init_queue is a bit earlier. We
>> can't allow a further use before request queue is fully initialized.
>> Since blk_init_queue_node is a really common path and it allows driver
>> to override the default internal lock, I'm afraid several other places
>> may also have the same issue.
>> Am I missing something here?
> 
> What code triggered the blkcg_print_blkgs() call? Was it perhaps a function
> related to throttling? If so, I think there are two possible ways to fix this
> race:
Yes, it was triggered from the block throttling code.

> 1. Moving the .queue_lock initialization from blk_init_queue_node() into
>    blk_alloc_queue_node() such that it happens before the blkcg_init_queue()
>    call. This however will require a tree-wide change of the blk_alloc_queue()
>    and all blk_alloc_queue_node() calls in all block drivers.
> 2. Splitting the blk_throtl_init() function such that the blkcg_activate_policy()
>    call can occur after the .queue_lock pointer has been initialized, e.g. from
>    inside blk_init_allocated_queue() or from inside blk_register_queue() instead
>    of from inside blkcg_init_queue().
> 
At first glance, I was thinking of moving blkcg_init_queue to after
queue_lock is overridden, but I don't yet have an idea of how to avoid
the risk during cleanup.
Since the initialization of the request queue is so fundamental, I'm not
sure whether the same risk exists in several other places.

Thanks,
Joseph

> I'm not sure what the best approach is. Maybe there is even a third approach
> that is better than the two approaches mentioned above.
> 
> Bart.
> 


* Re: [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs
  2018-01-31  1:53   ` Joseph Qi
@ 2018-01-31 16:39     ` Bart Van Assche
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2018-01-31 16:39 UTC (permalink / raw)
  To: joseph.qi@linux.alibaba.com, axboe@kernel.dk
  Cc: gavin.dg@linux.alibaba.com, linux-block@vger.kernel.org

On Wed, 2018-01-31 at 09:53 +0800, Joseph Qi wrote:
> At the first glance, I'm thinking of moving the blkcg_init_queue after
> the queue_lock is overridden. But I don't have an idea yet to avoid the
> risk during cleanup.
> Since the initialization of request queue is so fundamental, I'm not
> sure if there is the same risk in several other places.

Hello Joseph,

Moving the blkcg_init_queue() call from blk_alloc_queue_node() into
blk_init_allocated_queue_node() would drop cgroup support from all block
drivers that do not call blk_init_allocated_queue_node(). I think these
drivers are brd, drbd, null_blk, pktcdvd, ps3vram, rsxx, umem, zram,
lightnvm, bcache, dm in bio mode, md, nvdimm, the NVMe multipath driver,
dcssblk and xpram. These drivers can be found by searching for
blk_queue_make_request() calls in the kernel source code. I'm not sure we
can drop cgroup support from all these block drivers.

Bart.


end of thread, other threads:[~2018-01-31 16:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-30 11:21 [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs Joseph Qi
2018-01-30 21:19 ` Bart Van Assche
2018-01-31  1:53   ` Joseph Qi
2018-01-31 16:39     ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox