linux-block.vger.kernel.org archive mirror
* possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
       [not found] <20170822183816.7925e0f8@canb.auug.org.au>
@ 2017-08-22 10:47 ` Sergey Senozhatsky
  2017-08-22 21:43   ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-22 10:47 UTC (permalink / raw)
  To: linux-block, linux-scsi
  Cc: Jens Axboe, Martin K. Petersen, Stephen Rothwell,
	Linux-Next Mailing List, Linux Kernel Mailing List

Hello,

======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
------------------------------------------------------
fsck.ext4/148 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190

 but now in release context of a crosslock acquired at the following:
 ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 ((complete)&wait#2){+.+.}:
       lock_acquire+0x176/0x19e
       __wait_for_common+0x50/0x1e3
       blk_execute_rq+0xbb/0xda
       scsi_execute+0xc3/0x17d [scsi_mod]
       sd_revalidate_disk+0x112/0x1549 [sd_mod]
       rescan_partitions+0x48/0x2c4
       __blkdev_get+0x14b/0x37c
       blkdev_get+0x191/0x2c0
       device_add_disk+0x2b4/0x3e5
       sd_probe_async+0xf8/0x17e [sd_mod]
       async_run_entry_fn+0x34/0xe0
       process_one_work+0x2af/0x4d1
       worker_thread+0x19a/0x24f
       kthread+0x133/0x13b
       ret_from_fork+0x27/0x40

 -> #0 (&bdev->bd_mutex){+.+.}:
       __blkdev_put+0x33/0x190
       blkdev_close+0x24/0x27
       __fput+0xee/0x18a
       task_work_run+0x79/0xa0
       prepare_exit_to_usermode+0x9b/0xb5

 other info that might help us debug this:
 Possible unsafe locking scenario by crosslock:
       CPU0                    CPU1
       ----                    ----
  lock(&bdev->bd_mutex);
  lock((complete)&wait#2);
                               lock(&bdev->bd_mutex);
                               unlock((complete)&wait#2);

  *** DEADLOCK ***
4 locks held by fsck.ext4/148:
 #0:  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
 #1:  (rcu_read_lock){....}, at: [<ffffffff81217f16>] rcu_lock_acquire+0x0/0x20
 #2:  (&(&host->lock)->rlock){-.-.}, at: [<ffffffffa00e7550>] ata_scsi_queuecmd+0x23/0x74 [libata]
 #3:  (&x->wait#14){-...}, at: [<ffffffff8106b593>] complete+0x18/0x50

 stack backtrace:
CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746
Call Trace:
 dump_stack+0x67/0x8e
 print_circular_bug+0x2a1/0x2af
 ? zap_class+0xc5/0xc5
 check_prev_add+0x76/0x20d
 ? __lock_acquire+0xc27/0xcc8
 lock_commit_crosslock+0x327/0x35e
 complete+0x24/0x50
 scsi_end_request+0x8d/0x176 [scsi_mod]
 scsi_io_completion+0x1be/0x423 [scsi_mod]
 __blk_mq_complete_request+0x112/0x131
 ata_scsi_simulate+0x212/0x218 [libata]
 __ata_scsi_queuecmd+0x1be/0x1de [libata]
 ata_scsi_queuecmd+0x41/0x74 [libata]
 scsi_dispatch_cmd+0x194/0x2af [scsi_mod]
 scsi_queue_rq+0x1e0/0x26f [scsi_mod]
 blk_mq_dispatch_rq_list+0x193/0x2a7
 ? _raw_spin_unlock+0x2e/0x40
 blk_mq_sched_dispatch_requests+0x132/0x176
 __blk_mq_run_hw_queue+0x59/0xc5
 __blk_mq_delay_run_hw_queue+0x5f/0xc1
 blk_mq_flush_plug_list+0xfc/0x10b
 blk_flush_plug_list+0xc6/0x1eb
 blk_finish_plug+0x25/0x32
 generic_writepages+0x56/0x63
 do_writepages+0x36/0x70
 __filemap_fdatawrite_range+0x59/0x5f
 filemap_write_and_wait+0x19/0x4f
 __blkdev_put+0x5f/0x190
 blkdev_close+0x24/0x27
 __fput+0xee/0x18a
 task_work_run+0x79/0xa0
 prepare_exit_to_usermode+0x9b/0xb5
 entry_SYSCALL_64_fastpath+0xab/0xad
RIP: 0033:0x7ff5755a2f74
RSP: 002b:00007ffe46fce038 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000555ddeddded0 RCX: 00007ff5755a2f74
RDX: 0000000000001000 RSI: 0000555ddede2580 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000555ddede2580 R09: 0000555ddedde080
R10: 0000000108000000 R11: 0000000000000246 R12: 0000555ddedddfa0
R13: 00007ff576523680 R14: 0000000000001000 R15: 0000555ddeddc2b0

	-ss


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-22 10:47 ` possible circular locking dependency detected [was: linux-next: Tree for Aug 22] Sergey Senozhatsky
@ 2017-08-22 21:43   ` Bart Van Assche
  2017-08-23  0:03     ` Byungchul Park
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2017-08-22 21:43 UTC (permalink / raw)
  To: byungchul.park@lge.com
  Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	peterz@infradead.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org

On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> ------------------------------------------------------
> fsck.ext4/148 is trying to acquire lock:
>  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> 
>  but now in release context of a crosslock acquired at the following:
>  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> 
>  which lock already depends on the new lock.
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #1 ((complete)&wait#2){+.+.}:
>        lock_acquire+0x176/0x19e
>        __wait_for_common+0x50/0x1e3
>        blk_execute_rq+0xbb/0xda
>        scsi_execute+0xc3/0x17d [scsi_mod]
>        sd_revalidate_disk+0x112/0x1549 [sd_mod]
>        rescan_partitions+0x48/0x2c4
>        __blkdev_get+0x14b/0x37c
>        blkdev_get+0x191/0x2c0
>        device_add_disk+0x2b4/0x3e5
>        sd_probe_async+0xf8/0x17e [sd_mod]
>        async_run_entry_fn+0x34/0xe0
>        process_one_work+0x2af/0x4d1
>        worker_thread+0x19a/0x24f
>        kthread+0x133/0x13b
>        ret_from_fork+0x27/0x40
> 
>  -> #0 (&bdev->bd_mutex){+.+.}:
>        __blkdev_put+0x33/0x190
>        blkdev_close+0x24/0x27
>        __fput+0xee/0x18a
>        task_work_run+0x79/0xa0
>        prepare_exit_to_usermode+0x9b/0xb5
> 
>  other info that might help us debug this:
>  Possible unsafe locking scenario by crosslock:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&bdev->bd_mutex);
>   lock((complete)&wait#2);
>                                lock(&bdev->bd_mutex);
>                                unlock((complete)&wait#2);
> 
>   *** DEADLOCK ***
> 4 locks held by fsck.ext4/148:
>  #0:  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
>  #1:  (rcu_read_lock){....}, at: [<ffffffff81217f16>] rcu_lock_acquire+0x0/0x20
>  #2:  (&(&host->lock)->rlock){-.-.}, at: [<ffffffffa00e7550>] ata_scsi_queuecmd+0x23/0x74 [libata]
>  #3:  (&x->wait#14){-...}, at: [<ffffffff8106b593>] complete+0x18/0x50
> 
>  stack backtrace:
> CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746
> Call Trace:
>  dump_stack+0x67/0x8e
>  print_circular_bug+0x2a1/0x2af
>  ? zap_class+0xc5/0xc5
>  check_prev_add+0x76/0x20d
>  ? __lock_acquire+0xc27/0xcc8
>  lock_commit_crosslock+0x327/0x35e
>  complete+0x24/0x50
>  scsi_end_request+0x8d/0x176 [scsi_mod]
>  scsi_io_completion+0x1be/0x423 [scsi_mod]
>  __blk_mq_complete_request+0x112/0x131
>  ata_scsi_simulate+0x212/0x218 [libata]
>  __ata_scsi_queuecmd+0x1be/0x1de [libata]
>  ata_scsi_queuecmd+0x41/0x74 [libata]
>  scsi_dispatch_cmd+0x194/0x2af [scsi_mod]
>  scsi_queue_rq+0x1e0/0x26f [scsi_mod]
>  blk_mq_dispatch_rq_list+0x193/0x2a7
>  ? _raw_spin_unlock+0x2e/0x40
>  blk_mq_sched_dispatch_requests+0x132/0x176
>  __blk_mq_run_hw_queue+0x59/0xc5
>  __blk_mq_delay_run_hw_queue+0x5f/0xc1
>  blk_mq_flush_plug_list+0xfc/0x10b
>  blk_flush_plug_list+0xc6/0x1eb
>  blk_finish_plug+0x25/0x32
>  generic_writepages+0x56/0x63
>  do_writepages+0x36/0x70
>  __filemap_fdatawrite_range+0x59/0x5f
>  filemap_write_and_wait+0x19/0x4f
>  __blkdev_put+0x5f/0x190
>  blkdev_close+0x24/0x27
>  __fput+0xee/0x18a
>  task_work_run+0x79/0xa0
>  prepare_exit_to_usermode+0x9b/0xb5
>  entry_SYSCALL_64_fastpath+0xab/0xad

Byungchul, did you add the crosslock checks to lockdep? Can you have a look at
the above report? That report namely doesn't make sense to me.

Bart.


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-22 21:43   ` Bart Van Assche
@ 2017-08-23  0:03     ` Byungchul Park
  2017-08-23  2:36       ` Sergey Senozhatsky
                         ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Byungchul Park @ 2017-08-23  0:03 UTC (permalink / raw)
  To: Bart Van Assche, peterz
  Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	peterz@infradead.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > ------------------------------------------------------
> > fsck.ext4/148 is trying to acquire lock:
> >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > 
> >  but now in release context of a crosslock acquired at the following:
> >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > 
> >  which lock already depends on the new lock.
> > 
> >  the existing dependency chain (in reverse order) is:
> > 
> >  -> #1 ((complete)&wait#2){+.+.}:
> >        lock_acquire+0x176/0x19e
> >        __wait_for_common+0x50/0x1e3
> >        blk_execute_rq+0xbb/0xda
> >        scsi_execute+0xc3/0x17d [scsi_mod]
> >        sd_revalidate_disk+0x112/0x1549 [sd_mod]
> >        rescan_partitions+0x48/0x2c4
> >        __blkdev_get+0x14b/0x37c
> >        blkdev_get+0x191/0x2c0
> >        device_add_disk+0x2b4/0x3e5
> >        sd_probe_async+0xf8/0x17e [sd_mod]
> >        async_run_entry_fn+0x34/0xe0
> >        process_one_work+0x2af/0x4d1
> >        worker_thread+0x19a/0x24f
> >        kthread+0x133/0x13b
> >        ret_from_fork+0x27/0x40
> > 
> >  -> #0 (&bdev->bd_mutex){+.+.}:
> >        __blkdev_put+0x33/0x190
> >        blkdev_close+0x24/0x27
> >        __fput+0xee/0x18a
> >        task_work_run+0x79/0xa0
> >        prepare_exit_to_usermode+0x9b/0xb5
> > 
> >  other info that might help us debug this:
> >  Possible unsafe locking scenario by crosslock:
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&bdev->bd_mutex);
> >   lock((complete)&wait#2);
> >                                lock(&bdev->bd_mutex);
> >                                unlock((complete)&wait#2);
> > 
> >   *** DEADLOCK ***
> > 4 locks held by fsck.ext4/148:
> >  #0:  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> >  #1:  (rcu_read_lock){....}, at: [<ffffffff81217f16>] rcu_lock_acquire+0x0/0x20
> >  #2:  (&(&host->lock)->rlock){-.-.}, at: [<ffffffffa00e7550>] ata_scsi_queuecmd+0x23/0x74 [libata]
> >  #3:  (&x->wait#14){-...}, at: [<ffffffff8106b593>] complete+0x18/0x50
> > 
> >  stack backtrace:
> > CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746
> > Call Trace:
> >  dump_stack+0x67/0x8e
> >  print_circular_bug+0x2a1/0x2af
> >  ? zap_class+0xc5/0xc5
> >  check_prev_add+0x76/0x20d
> >  ? __lock_acquire+0xc27/0xcc8
> >  lock_commit_crosslock+0x327/0x35e
> >  complete+0x24/0x50
> >  scsi_end_request+0x8d/0x176 [scsi_mod]
> >  scsi_io_completion+0x1be/0x423 [scsi_mod]
> >  __blk_mq_complete_request+0x112/0x131
> >  ata_scsi_simulate+0x212/0x218 [libata]
> >  __ata_scsi_queuecmd+0x1be/0x1de [libata]
> >  ata_scsi_queuecmd+0x41/0x74 [libata]
> >  scsi_dispatch_cmd+0x194/0x2af [scsi_mod]
> >  scsi_queue_rq+0x1e0/0x26f [scsi_mod]
> >  blk_mq_dispatch_rq_list+0x193/0x2a7
> >  ? _raw_spin_unlock+0x2e/0x40
> >  blk_mq_sched_dispatch_requests+0x132/0x176
> >  __blk_mq_run_hw_queue+0x59/0xc5
> >  __blk_mq_delay_run_hw_queue+0x5f/0xc1
> >  blk_mq_flush_plug_list+0xfc/0x10b
> >  blk_flush_plug_list+0xc6/0x1eb
> >  blk_finish_plug+0x25/0x32
> >  generic_writepages+0x56/0x63
> >  do_writepages+0x36/0x70
> >  __filemap_fdatawrite_range+0x59/0x5f
> >  filemap_write_and_wait+0x19/0x4f
> >  __blkdev_put+0x5f/0x190
> >  blkdev_close+0x24/0x27
> >  __fput+0xee/0x18a
> >  task_work_run+0x79/0xa0
> >  prepare_exit_to_usermode+0x9b/0xb5
> >  entry_SYSCALL_64_fastpath+0xab/0xad
> 
> Byungchul, did you add the crosslock checks to lockdep? Can you have a look at
> the above report? That report namely doesn't make sense to me.

The report is talking about the following lockup:

A work in a worker                     A task work on exit to user
------------------                     ---------------------------
mutex_lock(&bdev->bd_mutex)
                                       mutex_lock(&bdev->bd_mutex)
blk_execute_rq()
   wait_for_completion_io_timeout(&A)
                                       complete(&A)

Is this impossible?
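
To make the interleaving concrete, here is a minimal userspace analogue
(purely illustrative: pthreads stand in for the kernel primitives, all
names are made up, and this is not the actual block/SCSI code):

/* Illustrative only: bd_mutex models bdev->bd_mutex, the condvar-based
 * "completion" models the struct completion used by blk_execute_rq().
 * Thread A mirrors the probe worker, thread B the fsck exit-to-user path.
 * Build with: gcc -pthread -o lockup lockup.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bd_mutex  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wait_cond = PTHREAD_COND_INITIALIZER;
static int done;

static void *probe_worker(void *arg)	/* "a work in a worker" */
{
	pthread_mutex_lock(&bd_mutex);		/* __blkdev_get() */
	pthread_mutex_lock(&wait_lock);		/* blk_execute_rq(): wait for */
	while (!done)				/* the "completion" */
		pthread_cond_wait(&wait_cond, &wait_lock);
	pthread_mutex_unlock(&wait_lock);
	pthread_mutex_unlock(&bd_mutex);
	return NULL;
}

static void *exit_path(void *arg)	/* "a task work on exit to user" */
{
	pthread_mutex_lock(&bd_mutex);		/* __blkdev_put(): blocks behind A */
	pthread_mutex_lock(&wait_lock);		/* the "completion" can only be */
	done = 1;				/* delivered from here, i.e. */
	pthread_cond_signal(&wait_cond);	/* after bd_mutex is taken */
	pthread_mutex_unlock(&wait_lock);
	pthread_mutex_unlock(&bd_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, probe_worker, NULL);
	pthread_create(&b, NULL, exit_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("done (only reached if B wins the race for bd_mutex)");
	return 0;
}

If A takes bd_mutex first, it waits for a completion that B can only
deliver after taking bd_mutex itself. In the report the wait is the
timeout variant, so the worker would eventually give up rather than hang
forever, which is what the question below is about.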

To Peterz,

Anyway I wanted to avoid lockdep reports in the case using a timeout
interface. Do you think it's still worth reporting the kind of lockup?
I'm ok if you do.


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  0:03     ` Byungchul Park
@ 2017-08-23  2:36       ` Sergey Senozhatsky
  2017-08-23  2:59         ` Byungchul Park
  2017-08-23  3:49       ` Boqun Feng
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-23  2:36 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On (08/23/17 09:03), Byungchul Park wrote:
[..]

aha, ok

> The report is talking about the following lockup:
> 
> A work in a worker                     A task work on exit to user
> ------------------                     ---------------------------
> mutex_lock(&bdev->bd_mutex)
>                                        mutex_lock(&bdev->bd_mutex)
> blk_execute_rq()
>    wait_for_completion_io_timeout(&A)
>                                        complete(&A)
> 
> Is this impossible?

I was really confused how this "unlock" may lead to a deadlock

> > >  other info that might help us debug this:
> > >  Possible unsafe locking scenario by crosslock:
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&bdev->bd_mutex);
> > >   lock((complete)&wait#2);
> > >                                lock(&bdev->bd_mutex);
> > >                                unlock((complete)&wait#2);


any chance the report can be improved? mention timeout, etc?
// well, if this functionality will stay.


p.s.
Bart Van Assche, thanks for Cc-ing Park Byungchul, I was really
sure I didn't enable the cross-release, but apparently I was wrong:
 CONFIG_LOCKDEP_CROSSRELEASE=y
 CONFIG_LOCKDEP_COMPLETIONS=y

	-ss


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  2:36       ` Sergey Senozhatsky
@ 2017-08-23  2:59         ` Byungchul Park
  0 siblings, 0 replies; 24+ messages in thread
From: Byungchul Park @ 2017-08-23  2:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 23, 2017 at 11:36:49AM +0900, Sergey Senozhatsky wrote:
> On (08/23/17 09:03), Byungchul Park wrote:
> [..]
> 
> aha, ok
> 
> > The report is talking about the following lockup:
> > 
> > A work in a worker                     A task work on exit to user
> > ------------------                     ---------------------------
> > mutex_lock(&bdev->bd_mutex)
> >                                        mutex_lock(&bdev->bd_mutex)
> > blk_execute_rq()
> >    wait_for_completion_io_timeout(&A)
> >                                        complete(&A)
> > 
> > Is this impossible?
> 
> I was really confused how this "unlock" may lead to a deadlock

Hi Sergey,

Right. It should be enhanced.

> 
> > > >  other info that might help us debug this:
> > > >  Possible unsafe locking scenario by crosslock:
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&bdev->bd_mutex);
> > > >   lock((complete)&wait#2);
> > > >                                lock(&bdev->bd_mutex);
> > > >                                unlock((complete)&wait#2);
> 
> 
> any chance the report can be improved? mention timeout, etc?
> // well, if this functionality will stay.
> 
> 
> p.s.
> Bart Van Assche, thanks for Cc-ing Park Byungchul, I was really
> sure I didn't enable the cross-release, but apparently I was wrong:
>  CONFIG_LOCKDEP_CROSSRELEASE=y
>  CONFIG_LOCKDEP_COMPLETIONS=y
> 
> 	-ss


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  0:03     ` Byungchul Park
  2017-08-23  2:36       ` Sergey Senozhatsky
@ 2017-08-23  3:49       ` Boqun Feng
  2017-08-23  4:38         ` Boqun Feng
  2017-08-23  4:46         ` Byungchul Park
  2017-08-23  7:53       ` Peter Zijlstra
  2017-08-30  5:20       ` Sergey Senozhatsky
  3 siblings, 2 replies; 24+ messages in thread
From: Boqun Feng @ 2017-08-23  3:49 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

Hi Byungchul,

On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > > ------------------------------------------------------
> > > fsck.ext4/148 is trying to acquire lock:
> > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > > 
> > >  but now in release context of a crosslock acquired at the following:
> > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > > 
> > >  which lock already depends on the new lock.
> > > 

I felt this message really misleading, because the deadlock is detected
at the commit time of "((complete)&wait#2)" rather than the acquisition
time of "(&bdev->bd_mutex)", so I made the following improvement.

Thoughts?

Regards,
Boqun

----------------------->8
From: Boqun Feng <boqun.feng@gmail.com>
Date: Wed, 23 Aug 2017 10:18:30 +0800
Subject: [PATCH] lockdep: Improve the readability of crossrelease related
 splats

When a crossrelease related deadlock is detected in a commit, the
current implementation makes splats like:

> fsck.ext4/148 is trying to acquire lock:
>  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
>
>  but now in release context of a crosslock acquired at the following:
>  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
>
>  which lock already depends on the new lock.
> ...

However, it could be misleading because the current task has got the
lock already, and in fact the deadlock is detected when it is doing the
commit of the crossrelease lock. So make the splats more accurate to
describe the deadlock case.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/locking/lockdep.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 66011c9f5df3..642fb5362507 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
 	pr_warn("WARNING: possible circular locking dependency detected\n");
 	print_kernel_ident();
 	pr_warn("------------------------------------------------------\n");
-	pr_warn("%s/%d is trying to acquire lock:\n",
-		curr->comm, task_pid_nr(curr));
-	print_lock(check_src);
 
-	if (cross_lock(check_tgt->instance))
-		pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
-	else
+	if (cross_lock(check_tgt->instance)) {
+		pr_warn("%s/%d is committing a crossrelease lock:\n",
+			curr->comm, task_pid_nr(curr));
+		print_lock(check_tgt);
+		pr_warn("\n, with the following lock held:\n");
+		print_lock(check_src);
+		pr_warn("\non which lock the crossrelease lock already depends.\n\n");
+	} else {
+		pr_warn("%s/%d is trying to acquire lock:\n",
+			curr->comm, task_pid_nr(curr));
+		print_lock(check_src);
 		pr_warn("\nbut task is already holding lock:\n");
+		print_lock(check_tgt);
+		pr_warn("\nwhich lock already depends on the new lock.\n\n");
+	}
 
-	print_lock(check_tgt);
-	pr_warn("\nwhich lock already depends on the new lock.\n\n");
 	pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
 
 	print_circular_bug_entry(entry, depth);
-- 
2.14.1
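
Applied to the report at the top of this thread, the reworked header
would then read roughly as follows (hand-assembled here from the
pr_warn() strings above; the exact print_lock() output may differ):

fsck.ext4/148 is committing a crossrelease lock:
 ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda

, with the following lock held:
 (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190

on which lock the crossrelease lock already depends.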


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  3:49       ` Boqun Feng
@ 2017-08-23  4:38         ` Boqun Feng
  2017-08-23  4:46           ` Sergey Senozhatsky
  2017-08-23  5:44           ` Byungchul Park
  2017-08-23  4:46         ` Byungchul Park
  1 sibling, 2 replies; 24+ messages in thread
From: Boqun Feng @ 2017-08-23  4:38 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote:
> Hi Byungchul,
> 
> On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> > On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > > > ------------------------------------------------------
> > > > fsck.ext4/148 is trying to acquire lock:
> > > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > > > 
> > > >  but now in release context of a crosslock acquired at the following:
> > > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > > > 
> > > >  which lock already depends on the new lock.
> > > > 
> 
> I felt this message really misleading, because the deadlock is detected
> at the commit time of "((complete)&wait#2)" rather than the acquisition
> time of "(&bdev->bd_mutex)", so I made the following improvement.
> 
> Thoughts?
> 
> Regards,
> Boqun
> 

While I'm on this one, I think we should also handle the case in which
@check_src is a cross lock, i.e. we detect the cross deadlock at the
acquisition time of the cross lock. How about the following?

Regards,
Boqun

--------------------------------------->8
From: Boqun Feng <boqun.feng@gmail.com>
Date: Wed, 23 Aug 2017 12:12:16 +0800
Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at
 acquisition time

For a potential deadlock about CROSSRELEASE as follows:

	P1		P2
	===========	=============
	lock(A)
	lock(X)
			lock(A)
			commit(X)

	A: normal lock, X: cross lock

, we could detect it at two places:

1. commit time:

	We have run P1 first, and have dependency A --> X in graph, and
	then we run P2, and find the deadlock.

2. acquisition time:

	We have run P2 first, and have dependency A --> X, in
	graph(because another P3 may run previously and is acquiring for
	lock X), and then we run P1 and find the deadlock.

In current print_circular_lock_scenario(), for 1) we could print the
right scenario and note that's a deadlock related to CROSSRELEASE,
however for 2) we print the scenario as a normal lockdep deadlock.

It's better to print a proper scenario related to CROSSRELEASE to help
users find their bugs more easily, so improve this.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/locking/lockdep.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 642fb5362507..a3709e15f609 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
 		__print_lock_name(target);
 		printk(KERN_CONT ");\n");
 		printk("\n *** DEADLOCK ***\n\n");
+	} else if (cross_lock(src->instance)) {
+		printk(" Possible unsafe locking scenario by crosslock:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(parent == source ? target : parent);
+		printk(KERN_CONT ");\n");
+		printk("                               unlock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
 	} else {
 		printk(" Possible unsafe locking scenario:\n\n");
 		printk("       CPU0                    CPU1\n");
-- 
2.14.1
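
Mechanically, for the A/X example in the changelog (A a normal lock, X a
cross lock, and with parent == source), the added branch prints a box of
the same shape as the one in Sergey's report, roughly:

       CPU0                    CPU1
       ----                    ----
  lock(A);
  lock(X);
                               lock(A);
                               unlock(X);

 *** DEADLOCK ***

where lock(X) stands for waiting on the crosslock (e.g. a
wait_for_completion()) and unlock(X) for the corresponding complete().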


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  3:49       ` Boqun Feng
  2017-08-23  4:38         ` Boqun Feng
@ 2017-08-23  4:46         ` Byungchul Park
  2017-08-23  5:01           ` Boqun Feng
  1 sibling, 1 reply; 24+ messages in thread
From: Byungchul Park @ 2017-08-23  4:46 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote:
> Hi Byungchul,
> 
> On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> > On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > > > ------------------------------------------------------
> > > > fsck.ext4/148 is trying to acquire lock:
> > > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > > > 
> > > >  but now in release context of a crosslock acquired at the following:
> > > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > > > 
> > > >  which lock already depends on the new lock.
> > > > 
> 
> I felt this message really misleading, because the deadlock is detected
> at the commit time of "((complete)&wait#2)" rather than the acquisition
> time of "(&bdev->bd_mutex)", so I made the following improvement.
> 
> Thoughts?
> 
> Regards,
> Boqun
> 
> ----------------------->8
> From: Boqun Feng <boqun.feng@gmail.com>
> Date: Wed, 23 Aug 2017 10:18:30 +0800
> > Subject: [PATCH] lockdep: Improve the readability of crossrelease related
>  splats
> 
> When a crossrelease related deadlock is detected in a commit, the
> > current implementation makes splats like:
> 
> > fsck.ext4/148 is trying to acquire lock:
> >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> >
> >  but now in release context of a crosslock acquired at the following:
> >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> >
> >  which lock already depends on the new lock.
> > ...
> 
> However, it could be misleading because the current task has got the
> lock already, and in fact the deadlock is detected when it is doing the
> commit of the crossrelease lock. So make the splats more accurate to
> describe the deadlock case.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/locking/lockdep.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 66011c9f5df3..642fb5362507 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
>  	pr_warn("WARNING: possible circular locking dependency detected\n");
>  	print_kernel_ident();
>  	pr_warn("------------------------------------------------------\n");
> -	pr_warn("%s/%d is trying to acquire lock:\n",
> -		curr->comm, task_pid_nr(curr));
> -	print_lock(check_src);
>  
> -	if (cross_lock(check_tgt->instance))
> -		pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
> -	else
> +	if (cross_lock(check_tgt->instance)) {
> +		pr_warn("%s/%d is committing a crossrelease lock:\n",
> +			curr->comm, task_pid_nr(curr));

I think it would be better to print something in terms of acquisition,
since the following print_lock() will print information about the acquisition.

> +		print_lock(check_tgt);
> +		pr_warn("\n, with the following lock held:\n");

The lock does not have to be held at the commit.

> +		print_lock(check_src);
> +		pr_warn("\non which lock the crossrelease lock already depends.\n\n");
> +	} else {
> +		pr_warn("%s/%d is trying to acquire lock:\n",
> +			curr->comm, task_pid_nr(curr));
> +		print_lock(check_src);
>  		pr_warn("\nbut task is already holding lock:\n");
> +		print_lock(check_tgt);
> +		pr_warn("\nwhich lock already depends on the new lock.\n\n");
> +	}
>  
> -	print_lock(check_tgt);
> -	pr_warn("\nwhich lock already depends on the new lock.\n\n");
>  	pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
>  
>  	print_circular_bug_entry(entry, depth);
> -- 
> 2.14.1


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  4:38         ` Boqun Feng
@ 2017-08-23  4:46           ` Sergey Senozhatsky
  2017-08-23  5:35             ` Boqun Feng
  2017-08-23  5:44           ` Byungchul Park
  1 sibling, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-23  4:46 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	sergey.senozhatsky.work@gmail.com, martin.petersen@oracle.com,
	axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On (08/23/17 12:38), Boqun Feng wrote:
[..]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>  		__print_lock_name(target);
>  		printk(KERN_CONT ");\n");

KERN_CONT and "\n" should not be together. "\n" flushes the cont
buffer immediately.

	-ss

>  		printk("\n *** DEADLOCK ***\n\n");
> +	} else if (cross_lock(src->instance)) {
> +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> +		printk("       CPU0                    CPU1\n");
> +		printk("       ----                    ----\n");
> +		printk("  lock(");
> +		__print_lock_name(target);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("                               lock(");
> +		__print_lock_name(parent == source ? target : parent);
> +		printk(KERN_CONT ");\n");
> +		printk("                               unlock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("\n *** DEADLOCK ***\n\n");
>  	} else {
>  		printk(" Possible unsafe locking scenario:\n\n");
>  		printk("       CPU0                    CPU1\n");
> -- 
> 2.14.1
> 


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  4:46         ` Byungchul Park
@ 2017-08-23  5:01           ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2017-08-23  5:01 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team


On Wed, Aug 23, 2017 at 01:46:17PM +0900, Byungchul Park wrote:
> On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote:
> > Hi Byungchul,
> > 
> > On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> > > On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> > > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > > > > ======================================================
> > > > > WARNING: possible circular locking dependency detected
> > > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > > > > ------------------------------------------------------
> > > > > fsck.ext4/148 is trying to acquire lock:
> > > > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > > > > 
> > > > >  but now in release context of a crosslock acquired at the following:
> > > > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > > > > 
> > > > >  which lock already depends on the new lock.
> > > > > 
> > 
> > I felt this message really misleading, because the deadlock is detected
> > at the commit time of "((complete)&wait#2)" rather than the acquisition
> > time of "(&bdev->bd_mutex)", so I made the following improvement.
> > 
> > Thoughts?
> > 
> > Regards,
> > Boqun
> > 
> > ----------------------->8
> > From: Boqun Feng <boqun.feng@gmail.com>
> > Date: Wed, 23 Aug 2017 10:18:30 +0800
> > > Subject: [PATCH] lockdep: Improve the readability of crossrelease related
> >  splats
> > 
> > When a crossrelease related deadlock is detected in a commit, the
> > > current implementation makes splats like:
> > 
> > > fsck.ext4/148 is trying to acquire lock:
> > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > >
> > >  but now in release context of a crosslock acquired at the following:
> > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > >
> > >  which lock already depends on the new lock.
> > > ...
> > 
> > However, it could be misleading because the current task has got the
> > lock already, and in fact the deadlock is detected when it is doing the
> > commit of the crossrelease lock. So make the splats more accurate to
> > describe the deadlock case.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  kernel/locking/lockdep.c | 22 ++++++++++++++--------
> >  1 file changed, 14 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 66011c9f5df3..642fb5362507 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
> >  	pr_warn("WARNING: possible circular locking dependency detected\n");
> >  	print_kernel_ident();
> >  	pr_warn("------------------------------------------------------\n");
> > -	pr_warn("%s/%d is trying to acquire lock:\n",
> > -		curr->comm, task_pid_nr(curr));
> > -	print_lock(check_src);
> >  
> > -	if (cross_lock(check_tgt->instance))
> > -		pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
> > -	else
> > +	if (cross_lock(check_tgt->instance)) {
> > +		pr_warn("%s/%d is committing a crossrelease lock:\n",
> > +			curr->comm, task_pid_nr(curr));
> 
> I think it would be better to print something in terms of acquisition,
> since the following print_lock() will print information about the acquisition.
> 

Well, that print_lock() will print the cross lock acquisition
information from other contexts, but the current thread is doing the
commit. So I think the information would be a little misleading. I will
add "acquired at" to indicate the lock information is for acquisition.

> > +		print_lock(check_tgt);
> > +		pr_warn("\n, with the following lock held:\n");
> 
> The lock does not have to be held at the commit.
> 

Ah.. right.

How about this:

		pr_warn("%s/%d is committing a crossrelease lock acquired at:\n",
			curr->comm, task_pid_nr(curr));
		print_lock(check_tgt);
		pr_warn("\n, after having the following lock held at least once:\n");

Regards,
Boqun

> > +		print_lock(check_src);
> > +		pr_warn("\non which lock the crossrelease lock already depends.\n\n");
> > +	} else {
> > +		pr_warn("%s/%d is trying to acquire lock:\n",
> > +			curr->comm, task_pid_nr(curr));
> > +		print_lock(check_src);
> >  		pr_warn("\nbut task is already holding lock:\n");
> > +		print_lock(check_tgt);
> > +		pr_warn("\nwhich lock already depends on the new lock.\n\n");
> > +	}
> >  
> > -	print_lock(check_tgt);
> > -	pr_warn("\nwhich lock already depends on the new lock.\n\n");
> >  	pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
> >  
> >  	print_circular_bug_entry(entry, depth);
> > -- 
> > 2.14.1



* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  4:46           ` Sergey Senozhatsky
@ 2017-08-23  5:35             ` Boqun Feng
  2017-08-23  5:44               ` Sergey Senozhatsky
  2017-08-23  5:55               ` Sergey Senozhatsky
  0 siblings, 2 replies; 24+ messages in thread
From: Boqun Feng @ 2017-08-23  5:35 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team


On Wed, Aug 23, 2017 at 01:46:48PM +0900, Sergey Senozhatsky wrote:
> On (08/23/17 12:38), Boqun Feng wrote:
> [..]
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 642fb5362507..a3709e15f609 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
> >  		__print_lock_name(target);
> >  		printk(KERN_CONT ");\n");
> 
> KERN_CONT and "\n" should not be together. "\n" flushes the cont
> buffer immediately.
> 

Hmm.. Not quite familiar with printk() stuffs, but I could see several
usages of printk(KERN_CONT "...\n") in kernel.

Did a bit research myself, and I now think the inappropriate use is to
use a KERN_CONT printk *after* another printk ending with a "\n". Am I
missing some recent changes or rules of KERN_CONT?

Regards,
Boqun

> 	-ss
> 
> >  		printk("\n *** DEADLOCK ***\n\n");
> > +	} else if (cross_lock(src->instance)) {
> > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > +		printk("       CPU0                    CPU1\n");
> > +		printk("       ----                    ----\n");
> > +		printk("  lock(");
> > +		__print_lock_name(target);
> > +		printk(KERN_CONT ");\n");
> > +		printk("  lock(");
> > +		__print_lock_name(source);
> > +		printk(KERN_CONT ");\n");
> > +		printk("                               lock(");
> > +		__print_lock_name(parent == source ? target : parent);
> > +		printk(KERN_CONT ");\n");
> > +		printk("                               unlock(");
> > +		__print_lock_name(source);
> > +		printk(KERN_CONT ");\n");
> > +		printk("\n *** DEADLOCK ***\n\n");
> >  	} else {
> >  		printk(" Possible unsafe locking scenario:\n\n");
> >  		printk("       CPU0                    CPU1\n");
> > -- 
> > 2.14.1
> > 



* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  4:38         ` Boqun Feng
  2017-08-23  4:46           ` Sergey Senozhatsky
@ 2017-08-23  5:44           ` Byungchul Park
  1 sibling, 0 replies; 24+ messages in thread
From: Byungchul Park @ 2017-08-23  5:44 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 23, 2017 at 12:38:13PM +0800, Boqun Feng wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
> Date: Wed, 23 Aug 2017 12:12:16 +0800
> Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at
>  acquisition time
> 
> For a potential deadlock about CROSSRELEASE as follows:
> 
> 	P1		P2
> 	===========	=============
> 	lock(A)
> 	lock(X)
> 			lock(A)
> 			commit(X)
> 
> 	A: normal lock, X: cross lock
> 
> , we could detect it at two places:
> 
> 1. commit time:
> 
> 	We have run P1 first, and have dependency A --> X in graph, and
> 	then we run P2, and find the deadlock.
> 
> 2. acquisition time:
> 
> 	We have run P2 first, and have dependency A --> X, in

X -> A

> 	graph(because another P3 may run previously and is acquiring for

".. another P3 may have run previously and was holding .."
            ^
    Additionally, not only P3 but also P2 like:

	lock(A)
	lock(X)
			lock(X) // I mean it's at _P2_
			lock(A)
			commit(X)

> 	lock X), and then we run P1 and find the deadlock.
>
> In current print_circular_lock_scenario(), for 1) we could print the
> right scenario and note that's a deadlock related to CROSSRELEASE,
> however for 2) we print the scenario as a normal lockdep deadlock.
> 
> It's better to print a proper scenario related to CROSSRELEASE to help
> users find their bugs more easily, so improve this.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/locking/lockdep.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>  		__print_lock_name(target);
>  		printk(KERN_CONT ");\n");
>  		printk("\n *** DEADLOCK ***\n\n");
> +	} else if (cross_lock(src->instance)) {
> +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> +		printk("       CPU0                    CPU1\n");
> +		printk("       ----                    ----\n");
> +		printk("  lock(");
> +		__print_lock_name(target);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("                               lock(");
> +		__print_lock_name(parent == source ? target : parent);
> +		printk(KERN_CONT ");\n");
> +		printk("                               unlock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("\n *** DEADLOCK ***\n\n");
>  	} else {
>  		printk(" Possible unsafe locking scenario:\n\n");
>  		printk("       CPU0                    CPU1\n");

I need time to be sure if it's correct.


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  5:35             ` Boqun Feng
@ 2017-08-23  5:44               ` Sergey Senozhatsky
  2017-08-23  5:55               ` Sergey Senozhatsky
  1 sibling, 0 replies; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-23  5:44 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Sergey Senozhatsky, Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On (08/23/17 13:35), Boqun Feng wrote:
[..]
> > >  		printk(KERN_CONT ");\n");
> > 
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n". Am I
> missing some recent changes or rules of KERN_CONT?

has been this way for quite some time (if not always).

LOG_NEWLINE results in cont_flush(), which log_store()s the content
of the KERN_CONT buffer.

if we see that supplied message has no \n then we store it in a
dedicated buffer (cont buffer)

	if (!(lflags & LOG_NEWLINE))
		return cont_add();
	
	return log_store();

we flush that buffer (move its content to the kernel log buffer) when
we receive a message with a \n or when printk() from another task/context
interrupts the current cont line and, thus, forces us to flush.
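
For reference, a minimal kernel-style sketch of the usage pattern under
discussion (made-up helper, not code from lockdep.c): the opening
printk() has no trailing "\n", so its text sits in the cont buffer until
the final "\n" flushes the assembled line.

#include <linux/printk.h>

static void print_lock_line(const char *name)
{
	printk(KERN_INFO "  lock(");	/* no "\n": goes into the cont buffer */
	printk(KERN_CONT "%s", name);	/* appended to the same buffered line */
	printk(KERN_CONT ");\n");	/* the "\n" flushes the line to the log */
}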

	-ss


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  5:35             ` Boqun Feng
  2017-08-23  5:44               ` Sergey Senozhatsky
@ 2017-08-23  5:55               ` Sergey Senozhatsky
  2017-08-24  4:39                 ` Boqun Feng
  1 sibling, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-23  5:55 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Sergey Senozhatsky, Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On (08/23/17 13:35), Boqun Feng wrote:
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n".

ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
upon the return. sorry, your code is correct.

	-ss

> > >  		printk("\n *** DEADLOCK ***\n\n");
> > > +	} else if (cross_lock(src->instance)) {
> > > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > +		printk("       CPU0                    CPU1\n");
> > > +		printk("       ----                    ----\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(target);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("                               lock(");
> > > +		__print_lock_name(parent == source ? target : parent);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("                               unlock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("\n *** DEADLOCK ***\n\n");
> > >  	} else {
> > >  		printk(" Possible unsafe locking scenario:\n\n");
> > >  		printk("       CPU0                    CPU1\n");
> > > -- 
> > > 2.14.1
> > > 


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  0:03     ` Byungchul Park
  2017-08-23  2:36       ` Sergey Senozhatsky
  2017-08-23  3:49       ` Boqun Feng
@ 2017-08-23  7:53       ` Peter Zijlstra
  2017-08-30  5:20       ` Sergey Senozhatsky
  3 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2017-08-23  7:53 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:

> The report is talking about the following lockup:
> 
> A work in a worker                     A task work on exit to user
> ------------------                     ---------------------------
> mutex_lock(&bdev->bd_mutex)
>                                        mutex_lock(&bdev->bd_mutex)
> blk_execute_rq()
>    wait_for_completion_io_timeout(&A)
>                                        complete(&A)
> 
> Is this impossible?
> 
> To Peterz,
> 
> Anyway I wanted to avoid lockdep reports in the case using a timeout
> interface. Do you think it's still worth reporting the kind of lockup?

Yes, people might not have expected to hit the timeout on this. They
might think timeout means a dead device or something like that.

I'd like to hear from the block folks if this was constructed thus on
purpose though.


* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  5:55               ` Sergey Senozhatsky
@ 2017-08-24  4:39                 ` Boqun Feng
  2017-08-24  4:49                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2017-08-24  4:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team


On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> On (08/23/17 13:35), Boqun Feng wrote:
> > > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > > buffer immediately.
> > > 
> > 
> > Hmm.. Not quite familiar with printk() stuffs, but I could see several
> > usages of printk(KERN_CONT "...\n") in kernel.
> > 
> > Did a bit research myself, and I now think the inappropriate use is to
> > use a KERN_CONT printk *after* another printk ending with a "\n".
> 
> ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
> upon the return. sorry, your code is correct.
> 

So that means printk(KERN_CONT "..."); + printk(KERN_CONT "...\n") is a
correct usage, right? Thanks. Again, I'm not familiar with printk stuff,
glad you can help me go through this ;-)

Regards,
Boqun

> 	-ss
> 
> > > >  		printk("\n *** DEADLOCK ***\n\n");
> > > > +	} else if (cross_lock(src->instance)) {
> > > > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > > +		printk("       CPU0                    CPU1\n");
> > > > +		printk("       ----                    ----\n");
> > > > +		printk("  lock(");
> > > > +		__print_lock_name(target);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("  lock(");
> > > > +		__print_lock_name(source);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("                               lock(");
> > > > +		__print_lock_name(parent == source ? target : parent);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("                               unlock(");
> > > > +		__print_lock_name(source);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("\n *** DEADLOCK ***\n\n");
> > > >  	} else {
> > > >  		printk(" Possible unsafe locking scenario:\n\n");
> > > >  		printk("       CPU0                    CPU1\n");
> > > > -- 
> > > > 2.14.1
> > > > 
> 
> 



* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-24  4:39                 ` Boqun Feng
@ 2017-08-24  4:49                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-24  4:49 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Sergey Senozhatsky, Byungchul Park, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

Hi,

On (08/24/17 12:39), Boqun Feng wrote:
> On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> > On (08/23/17 13:35), Boqun Feng wrote:
> > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > > > buffer immediately.
> > > > 
> > > 
> > > Hmm.. Not quite familiar with printk() stuffs, but I could see several
> > > usages of printk(KERN_CONT "...\n") in kernel.
> > > 
> > > Did a bit research myself, and I now think the inappropriate use is to
> > > use a KERN_CONT printk *after* another printk ending with a "\n".
> > 
> > ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
> > upon the return. sorry, your code is correct.
> > 
> 
> So that means printk(KERN_CONT "..."); + printk(KERN_CONT "...\n") is a
> correct usage, right?

well, yes. with one precondition - there should be no printk-s from other
CPUs/tasks in between

	printk(KERN_CONT "...");  +  printk(KERN_CONT "...\n")
                                ^^^^^
			here we can have a preliminary flush and broken
			cont line. but it's been this way forever.
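
for illustration, a minimal hypothetical sketch of the pattern being
discussed (the function and format strings are made up, this is not
taken from the patch above):

	#include <linux/printk.h>

	/*
	 * The first printk() omits the "\n", so its text stays in the
	 * cont buffer; the KERN_CONT printk()s append to it and the
	 * trailing "\n" flushes the whole line. A printk() from another
	 * CPU/task in between can still flush early and split the line.
	 */
	static void print_lock_example(const char *name)
	{
		printk("  lock(");
		printk(KERN_CONT "%s", name);
		printk(KERN_CONT ");\n");
	}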

	-ss

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-23  0:03     ` Byungchul Park
                         ` (2 preceding siblings ...)
  2017-08-23  7:53       ` Peter Zijlstra
@ 2017-08-30  5:20       ` Sergey Senozhatsky
  2017-08-30  5:43         ` Byungchul Park
  3 siblings, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-30  5:20 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, sergey.senozhatsky.work@gmail.com,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On (08/23/17 09:03), Byungchul Park wrote:
[..]
> > Byungchul, did you add the crosslock checks to lockdep? Can you have a look at
> > the above report? That report doesn't make sense to me.
> 
> The report is talking about the following lockup:
> 
> A work in a worker                     A task work on exit to user
> ------------------                     ---------------------------
> mutex_lock(&bdev->bd_mutex)
>                                        mutex_lock(&bdev->bd_mutex)
> blk_execute_rq()
>    wait_for_completion_io_timeout(&A)
>                                        complete(&A)
> 
[..]
> To Peterz,
> 
> Anyway, I wanted to avoid lockdep reports in cases that use a timeout
> interface. Do you think this kind of lockup is still worth reporting?
> I'm ok if you do.
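
(for illustration only: a hypothetical sketch of the pattern in the
scenario quoted above, not the actual block/sd code. A completion is
waited on with a timeout while bdev->bd_mutex is held, and the same
completion is only completed from a context that first needs that
mutex - the dependency the crossrelease checks connect into a cycle.)

	#include <linux/completion.h>
	#include <linux/jiffies.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(bd_mutex_example);
	static DECLARE_COMPLETION(io_done_example);

	/* side A: e.g. a work item; holds the mutex across the wait */
	static void side_a(void)
	{
		mutex_lock(&bd_mutex_example);
		/* submit request ... */
		wait_for_completion_io_timeout(&io_done_example, HZ);
		mutex_unlock(&bd_mutex_example);
	}

	/*
	 * side B: e.g. task work on exit to user; needs the same mutex
	 * before it can complete the request. Without the timeout this
	 * would be a real deadlock; with it, side A eventually gives up
	 * the mutex, which is why reporting such cases is debatable.
	 */
	static void side_b(void)
	{
		mutex_lock(&bd_mutex_example);
		complete(&io_done_example);
		mutex_unlock(&bd_mutex_example);
	}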

Byungchul, a quick question.
have you measured the performance impact? somehow my linux-next is
notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
is irritatingly slow)


`time dmesg' shows some difference, but probably that's not a good
test.

	!LOCKDEP	LOCKDEP		LOCKDEP -CROSSRELEASE -COMPLETIONS
	real 0m0.661s	0m2.290s	0m1.920s
	user 0m0.010s	0m0.105s	0m0.000s
	sys  0m0.636s	0m2.224s	0m1.888s

anyone else "sees"/"can confirm" the slow down?


it gets back to "usual normal" when I disable CROSSRELEASE and COMPLETIONS.

---

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b19c491cbc4e..cdc30ef81c5e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1091,8 +1091,6 @@ config PROVE_LOCKING
        select DEBUG_MUTEXES
        select DEBUG_RT_MUTEXES if RT_MUTEXES
        select DEBUG_LOCK_ALLOC
-       select LOCKDEP_CROSSRELEASE
-       select LOCKDEP_COMPLETIONS
        select TRACE_IRQFLAGS
        default n
        help

---

	-ss

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  5:20       ` Sergey Senozhatsky
@ 2017-08-30  5:43         ` Byungchul Park
  2017-08-30  6:15           ` Sergey Senozhatsky
  0 siblings, 1 reply; 24+ messages in thread
From: Byungchul Park @ 2017-08-30  5:43 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Bart Van Assche, peterz, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 30, 2017 at 02:20:37PM +0900, Sergey Senozhatsky wrote:
> Byungchul, a quick question.

Hello Sergey,

> have you measured the performance impact? somehow my linux-next is

Yeah, it inevitably has some performance impact.

> notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
> is irritatingly slow)

To Ingo,

I cannot decide if we have to roll back the CONFIG_LOCKDEP_CROSSRELEASE
dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled,
lockdep detection becomes stronger but has a performance impact. But
it's a debug option anyway, so IMHO we don't have to take care of the
performance impact. Please let me know your decision.

> `time dmesg' shows some difference, but probably that's not a good
> test.
> 
> 	!LOCKDEP	LOCKDEP		LOCKDEP -CROSSRELEASE -COMPLETIONS
> 	real 0m0.661s	0m2.290s	0m1.920s
> 	user 0m0.010s	0m0.105s	0m0.000s
> 	sys  0m0.636s	0m2.224s	0m1.888s
> 
> anyone else "sees"/"can confirm" the slow down?
> 
> 
> it gets back to "usual normal" when I disable CROSSRELEASE and COMPLETIONS.
> 
> ---
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index b19c491cbc4e..cdc30ef81c5e 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1091,8 +1091,6 @@ config PROVE_LOCKING
>         select DEBUG_MUTEXES
>         select DEBUG_RT_MUTEXES if RT_MUTEXES
>         select DEBUG_LOCK_ALLOC
> -       select LOCKDEP_CROSSRELEASE
> -       select LOCKDEP_COMPLETIONS
>         select TRACE_IRQFLAGS
>         default n
>         help
> 
> ---
> 
> 	-ss

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  5:43         ` Byungchul Park
@ 2017-08-30  6:15           ` Sergey Senozhatsky
  2017-08-30  8:42             ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-30  6:15 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Sergey Senozhatsky, Bart Van Assche, peterz,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

Hi,

On (08/30/17 14:43), Byungchul Park wrote:
[..]
> > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
> > is irritatingly slow)
> 
> To Ingo,
> 
> I cannot decide if we have to roll back the CONFIG_LOCKDEP_CROSSRELEASE
> dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled,
> lockdep detection becomes stronger but has a performance impact. But
> it's a debug option anyway, so IMHO we don't have to take care of the
> performance impact. Please let me know your decision.

well, I expected it :)

I've been running lockdep enabled kernels for years, and was OK with
the performance. but now it's just too much and I'm looking at disabling
lockdep.

a more relevant test -- compilation of a relatively small project

  LOCKDEP -CROSSRELEASE -COMPLETIONS     LOCKDEP +CROSSRELEASE +COMPLETIONS

   real    1m23.722s                      real    2m9.969s
   user    4m11.300s                      user    4m15.458s
   sys     0m49.386s                      sys     2m3.594s


you don't want to know how much time it now takes to recompile the
kernel ;)

	-ss

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  6:15           ` Sergey Senozhatsky
@ 2017-08-30  8:42             ` Peter Zijlstra
  2017-08-30  8:47               ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2017-08-30  8:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Byungchul Park, Bart Van Assche, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 30, 2017 at 03:15:11PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> On (08/30/17 14:43), Byungchul Park wrote:
> [..]
> > > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
> > > is irritatingly slow)
> > 
> > To Ingo,
> > 
> > I cannot decide if we have to roll back the CONFIG_LOCKDEP_CROSSRELEASE
> > dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled,
> > lockdep detection becomes stronger but has a performance impact. But
> > it's a debug option anyway, so IMHO we don't have to take care of the
> > performance impact. Please let me know your decision.
> 
> well, I expected it :)
> 
> I've been running lockdep enabled kernels for years, and was OK with
> the performance. but now it's just too much and I'm looking at disabling
> lockdep.
> 
> a more relevant test -- compilation of a relatively small project
> 
>   LOCKDEP -CROSSRELEASE -COMPLETIONS     LOCKDEP +CROSSRELEASE +COMPLETIONS
> 
>    real    1m23.722s                      real    2m9.969s
>    user    4m11.300s                      user    4m15.458s
>    sys     0m49.386s                      sys     2m3.594s
> 
> 
> you don't want to know how much time it now takes to recompile the
> kernel ;)

Right,.. so when I look at perf annotate for __lock_acquire and
lock_release (the two most expensive lockdep functions in a kernel
profile) I don't actually see much cross-release stuff.

So the overhead looks to be spread out over all sorts, which makes it
harder to find and fix.
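
a rough, purely illustrative sketch of such a profiling run, for anyone
wanting to reproduce this (the workload and options are assumptions,
not necessarily what was used here):

	# system-wide profile with call graphs while a workload runs
	perf record -a -g -- make -j$(nproc)
	# overall picture, then drill down into the hot lockdep paths
	perf report
	perf annotate __lock_acquire
	perf annotate lock_release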

stack unwinding is done lots and is fairly expensive, I've not yet
checked if crossrelease does too much of that.

The below saved about 50% of my __lock_acquire() time; not sure it made
a significant difference overall though.

---
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 44c8d0d17170..f8db1ead1c48 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3386,7 +3386,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		if (!class)
 			return 0;
 	}
-	atomic_inc((atomic_t *)&class->ops);
+	/* atomic_inc((atomic_t *)&class->ops); */
 	if (very_verbose(class)) {
 		printk("\nacquire class [%p] %s", class->key, class->name);
 		if (class->name_version > 1)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  8:42             ` Peter Zijlstra
@ 2017-08-30  8:47               ` Peter Zijlstra
  2017-08-30  8:53                 ` Byungchul Park
  2017-08-30 12:30                 ` Sergey Senozhatsky
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2017-08-30  8:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Byungchul Park, Bart Van Assche, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
> 
> So the overhead looks to be spread out over all sorts, which makes it
> harder to find and fix.
> 
> stack unwinding is done lots and is fairly expensive, I've not yet
> checked if crossrelease does too much of that.

Aah, we do an unconditional stack unwind for every __lock_acquire() now.
It keeps a trace in the xhlocks[].

Does the below cure most of that overhead?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 44c8d0d17170..7b872036b72e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4872,7 +4872,7 @@ static void add_xhlock(struct held_lock *hlock)
 	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
 	xhlock->trace.entries = xhlock->trace_entries;
 	xhlock->trace.skip = 3;
-	save_stack_trace(&xhlock->trace);
+	/* save_stack_trace(&xhlock->trace); */
 }
 
 static inline int same_context_xhlock(struct hist_lock *xhlock)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  8:47               ` Peter Zijlstra
@ 2017-08-30  8:53                 ` Byungchul Park
  2017-08-30 12:30                 ` Sergey Senozhatsky
  1 sibling, 0 replies; 24+ messages in thread
From: Byungchul Park @ 2017-08-30  8:53 UTC (permalink / raw)
  To: 'Peter Zijlstra', 'Sergey Senozhatsky'
  Cc: 'Bart Van Assche', linux-kernel, linux-block,
	martin.petersen, axboe, linux-scsi, sfr, linux-next, kernel-team

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Wednesday, August 30, 2017 5:48 PM
> To: Sergey Senozhatsky
> Cc: Byungchul Park; Bart Van Assche; linux-kernel@vger.kernel.org; linux-
> block@vger.kernel.org; martin.petersen@oracle.com; axboe@kernel.dk; linux-
> scsi@vger.kernel.org; sfr@canb.auug.org.au; linux-next@vger.kernel.org;
> kernel-team@lge.com
> Subject: Re: possible circular locking dependency detected [was: linux-
> next: Tree for Aug 22]
> 
> On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
> >
> > So the overhead looks to be spread out over all sorts, which makes it
> > harder to find and fix.
> >
> > stack unwinding is done lots and is fairly expensive, I've not yet
> > checked if crossrelease does too much of that.
> 
> Aah, we do an unconditional stack unwind for every __lock_acquire() now.
> It keeps a trace in the xhlocks[].

Yeah.. I also think this is the most significant part of it..

> 
> Does the below cure most of that overhead?
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 44c8d0d17170..7b872036b72e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4872,7 +4872,7 @@ static void add_xhlock(struct held_lock *hlock)
>  	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
>  	xhlock->trace.entries = xhlock->trace_entries;
>  	xhlock->trace.skip = 3;
> -	save_stack_trace(&xhlock->trace);
> +	/* save_stack_trace(&xhlock->trace); */
>  }
> 
>  static inline int same_context_xhlock(struct hist_lock *xhlock)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
  2017-08-30  8:47               ` Peter Zijlstra
  2017-08-30  8:53                 ` Byungchul Park
@ 2017-08-30 12:30                 ` Sergey Senozhatsky
  1 sibling, 0 replies; 24+ messages in thread
From: Sergey Senozhatsky @ 2017-08-30 12:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Byungchul Park, Bart Van Assche,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org, kernel-team

Hello Peter,

On (08/30/17 10:47), Peter Zijlstra wrote:
[..]
> On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
> > 
> > So the overhead looks to be spread out over all sorts, which makes it
> > harder to find and fix.
> > 
> > stack unwinding is done lots and is fairly expensive, I've not yet
> > checked if crossrelease does too much of that.
> 
> Aah, we do an unconditional stack unwind for every __lock_acquire() now.
> It keeps a trace in the xhlocks[].
> 
> Does the below cure most of that overhead?

thanks.

mostly yes. the kernel is not so dramatically slower now. it still
seems to be a bit slower, which is expected I suppose, but overall
it's much better

	real	1m35.209s
	user	4m12.467s
	sys	0m49.457s

// approx 10 seconds slower.

	-ss

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-08-30 12:30 UTC | newest]

Thread overview: 24+ messages
     [not found] <20170822183816.7925e0f8@canb.auug.org.au>
2017-08-22 10:47 ` possible circular locking dependency detected [was: linux-next: Tree for Aug 22] Sergey Senozhatsky
2017-08-22 21:43   ` Bart Van Assche
2017-08-23  0:03     ` Byungchul Park
2017-08-23  2:36       ` Sergey Senozhatsky
2017-08-23  2:59         ` Byungchul Park
2017-08-23  3:49       ` Boqun Feng
2017-08-23  4:38         ` Boqun Feng
2017-08-23  4:46           ` Sergey Senozhatsky
2017-08-23  5:35             ` Boqun Feng
2017-08-23  5:44               ` Sergey Senozhatsky
2017-08-23  5:55               ` Sergey Senozhatsky
2017-08-24  4:39                 ` Boqun Feng
2017-08-24  4:49                   ` Sergey Senozhatsky
2017-08-23  5:44           ` Byungchul Park
2017-08-23  4:46         ` Byungchul Park
2017-08-23  5:01           ` Boqun Feng
2017-08-23  7:53       ` Peter Zijlstra
2017-08-30  5:20       ` Sergey Senozhatsky
2017-08-30  5:43         ` Byungchul Park
2017-08-30  6:15           ` Sergey Senozhatsky
2017-08-30  8:42             ` Peter Zijlstra
2017-08-30  8:47               ` Peter Zijlstra
2017-08-30  8:53                 ` Byungchul Park
2017-08-30 12:30                 ` Sergey Senozhatsky
