* Circular locking dependency with pblk
@ 2017-10-05 10:53 Javier González
2017-10-05 16:24 ` Jens Axboe
2017-10-05 23:36 ` Dave Chinner
0 siblings, 2 replies; 7+ messages in thread
From: Javier González @ 2017-10-05 10:53 UTC (permalink / raw)
To: linux-xfs, linux-block; +Cc: darrick.wong, Christoph Hellwig
[-- Attachment #1: Type: text/plain, Size: 11462 bytes --]
Hi,
lockdep is reporting a circular dependency when using XFS and pblk,
which I am a bit confused about.
This happens when XFS sends a number of nested reads and (at least) one
of them hits partially pblk's cache. In this case, pblk will retrieve
the cached lbas and form a new bio, which is submitted _synchronously_
to the media using struct completion. The original bio is then populated
with the read data.
What lockdep complains about, is that the unlocking operation in
complete() has a circular dependency with ionode->i_rwsem when they both
happen on the same core, which is different from the core that issued
wait_for_completion_io_timeout() and is waiting for the partial read.
However, AFAIU complete() happens in interrupt context, so this should
not be a problem.
The only thing I can think of is that XFS is making an assumption when
submitting direct I/O that there will be no synchronous I/O coming from
anybody underneath.
Any thoughts?
TRACE:
[ 8557.938336] ======================================================
[ 8557.944513] WARNING: possible circular locking dependency detected
[ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
[ 8557.955739] ------------------------------------------------------
[ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
[ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8557.975063]
but now in release context of a crosslock acquired at the following:
[ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
[ 8557.993318]
which lock already depends on the new lock.
[ 8558.001490]
the existing dependency chain (in reverse order) is:
[ 8558.008966]
-> #1 ((complete)&wait#5){+.+.}:
[ 8558.014715] __lock_acquire+0x1186/0x11d0
[ 8558.019245] lock_acquire+0xed/0x1e0
[ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
[ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
[ 8558.034375] pblk_submit_read+0x158/0x3a0
[ 8558.038900] pblk_make_rq+0xc1/0x110
[ 8558.042992] generic_make_request+0x11d/0x3c0
[ 8558.047862] submit_bio+0x73/0x140
[ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
[ 8558.056216] iomap_apply+0xa1/0x110
[ 8558.060220] iomap_dio_rw+0x26f/0x3c0
[ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.069269] xfs_file_read_iter+0xab/0xc0
[ 8558.073794] __vfs_read+0xef/0x160
[ 8558.077709] vfs_read+0xa8/0x150
[ 8558.081454] SyS_pread64+0x98/0xc0
[ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.090501]
-> #0 (&inode->i_rwsem){++++}:
[ 8558.096067] down_read_nested+0x49/0x70
[ 8558.100417] xfs_ilock+0x1a1/0x230
[ 8558.104335] xfs_file_dio_aio_read+0x74/0x1c0
[ 8558.109202] xfs_file_read_iter+0xab/0xc0
[ 8558.113729] __vfs_read+0xef/0x160
[ 8558.117646]
other info that might help us debug this:
[ 8558.125636] Possible unsafe locking scenario by crosslock:
[ 8558.132672] CPU0 CPU1
[ 8558.137196] ---- ----
[ 8558.141720] lock(&inode->i_rwsem);
[ 8558.145291] lock((complete)&wait#5);
[ 8558.149035] lock(&inode->i_rwsem);
[ 8558.155118] unlock((complete)&wait#5);
[ 8558.161550]
*** DEADLOCK ***
[ 8558.167461] 4 locks held by rocksdb:bg11/19129:
[ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
[ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
[ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
[ 8558.205948]
stack backtrace:
[ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
[ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
[ 8558.225675] Call Trace:
[ 8558.228123] dump_stack+0x85/0xc7
[ 8558.231438] print_circular_bug+0x226/0x330
[ 8558.235616] ? print_bfs_bug+0x40/0x40
[ 8558.239358] check_prev_add+0x432/0x840
[ 8558.243190] ? __lock_acquire+0x736/0x11d0
[ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
[ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
[ 8558.256328] complete+0x29/0x60
[ 8558.259469] pblk_end_io_sync+0x12/0x20
[ 8558.263297] nvm_end_io+0x2b/0x30
[ 8558.266607] nvme_nvm_end_io+0x2e/0x50
[ 8558.270351] blk_mq_end_request+0x3e/0x70
[ 8558.274360] nvme_complete_rq+0x1c/0xd0
[ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
[ 8558.282459] __blk_mq_complete_request+0xa3/0x160
[ 8558.287156] blk_mq_complete_request+0x16/0x20
[ 8558.291592] nvme_process_cq+0xf8/0x1e0
[ 8558.295424] nvme_queue_rq+0x16e/0x9a0
[ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
[ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
[ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
[ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
[ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
[ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
[ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
[ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
[ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
[ 8558.340151] nvm_submit_io+0x3c/0x70
[ 8558.343723] pblk_submit_io+0x1b/0x20
[ 8558.347379] pblk_submit_read+0x1ec/0x3a0
[ 8558.351385] pblk_make_rq+0xc1/0x110
[ 8558.354954] generic_make_request+0x11d/0x3c0
[ 8558.359308] ? iov_iter_get_pages+0xc0/0x350
[ 8558.363569] submit_bio+0x73/0x140
[ 8558.366966] ? submit_bio+0x73/0x140
[ 8558.370541] ? bio_iov_iter_get_pages+0xd7/0x120
[ 8558.375147] iomap_dio_actor+0x1f8/0x3a0
[ 8558.379066] ? iomap_dio_zero+0x110/0x110
[ 8558.383068] iomap_apply+0xa1/0x110
[ 8558.386553] iomap_dio_rw+0x26f/0x3c0
[ 8558.390210] ? iomap_dio_zero+0x110/0x110
[ 8558.394216] ? down_read_nested+0x49/0x70
[ 8558.398219] xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.402570] ? xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.407094] xfs_file_read_iter+0xab/0xc0
[ 8558.411098] __vfs_read+0xef/0x160
[ 8558.414494] vfs_read+0xa8/0x150
[ 8558.417718] SyS_pread64+0x98/0xc0
[ 8558.421116] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.425726] RIP: 0033:0x7f816caaed43
[ 8558.429296] RSP: 002b:00007f8150ff4d50 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 8558.436854] RAX: ffffffffffffffda RBX: 00007f8130219a10 RCX: 00007f816caaed43
[ 8558.443978] RDX: 0000000000200000 RSI: 00007f8130408000 RDI: 0000000000000114
[ 8558.451102] RBP: 00007f8150ff6110 R08: 00007f8150ff4f20 R09: 00007f8130408000
[ 8558.458226] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000114
[ 8558.465350] R13: 000000000143fb20 R14: 0000000001445fb8 R15: 00007f8150ff6550
[40409.278987] perf: interrupt took too long (6252 > 6245), lowering kernel.perf_event_max_sample_rate to 31750
[40458.049396] XFS (pblk0): Unmounting Filesystem
javigon@hero:~$
[ 8558.117646]
other info that might help us debug this:
[ 8558.125636] Possible unsafe locking scenario by crosslock:
[ 8558.132672] CPU0 CPU1
[ 8558.137196] ---- ----
[ 8558.141720] lock(&inode->i_rwsem);
[ 8558.145291] lock((complete)&wait#5);
[ 8558.149035] lock(&inode->i_rwsem);
[ 8558.155118] unlock((complete)&wait#5);
[ 8558.161550]
*** DEADLOCK ***
[ 8558.167461] 4 locks held by rocksdb:bg11/19129:
[ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
[ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
[ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
[ 8558.205948]
stack backtrace:
[ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
[ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
[ 8558.225675] Call Trace:
[ 8558.228123] dump_stack+0x85/0xc7
[ 8558.231438] print_circular_bug+0x226/0x330
[ 8558.235616] ? print_bfs_bug+0x40/0x40
[ 8558.239358] check_prev_add+0x432/0x840
[ 8558.243190] ? __lock_acquire+0x736/0x11d0
[ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
[ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
[ 8558.256328] complete+0x29/0x60
[ 8558.259469] pblk_end_io_sync+0x12/0x20
[ 8558.263297] nvm_end_io+0x2b/0x30
[ 8558.266607] nvme_nvm_end_io+0x2e/0x50
[ 8558.270351] blk_mq_end_request+0x3e/0x70
[ 8558.274360] nvme_complete_rq+0x1c/0xd0
[ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
[ 8558.282459] __blk_mq_complete_request+0xa3/0x160
[ 8558.287156] blk_mq_complete_request+0x16/0x20
[ 8558.291592] nvme_process_cq+0xf8/0x1e0
[ 8558.295424] nvme_queue_rq+0x16e/0x9a0
[ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
[ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
[ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
[ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
[ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
[ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
[ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
[ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
[ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
[ 8558.340151] nvm_submit_io+0x3c/0x70
[ 8558.343723] pblk_submit_io+0x1b/0x20
[ 8558.347379] pblk_submit_read+0x1ec/0x3a0
[ 8558.351385] pblk_make_rq+0xc1/0x110
[ 8558.354954] generic_make_request+0x11d/0x3c0
[ 8558.359308] ? iov_iter_get_pages+0xc0/0x350
[ 8558.363569] submit_bio+0x73/0x140
[ 8558.366966] ? submit_bio+0x73/0x140
[ 8558.370541] ? bio_iov_iter_get_pages+0xd7/0x120
[ 8558.375147] iomap_dio_actor+0x1f8/0x3a0
[ 8558.379066] ? iomap_dio_zero+0x110/0x110
[ 8558.383068] iomap_apply+0xa1/0x110
[ 8558.386553] iomap_dio_rw+0x26f/0x3c0
[ 8558.390210] ? iomap_dio_zero+0x110/0x110
[ 8558.394216] ? down_read_nested+0x49/0x70
[ 8558.398219] xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.402570] ? xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.407094] xfs_file_read_iter+0xab/0xc0
[ 8558.411098] __vfs_read+0xef/0x160
[ 8558.414494] vfs_read+0xa8/0x150
[ 8558.417718] SyS_pread64+0x98/0xc0
[ 8558.421116] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.425726] RIP: 0033:0x7f816caaed43
[ 8558.429296] RSP: 002b:00007f8150ff4d50 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 8558.436854] RAX: ffffffffffffffda RBX: 00007f8130219a10 RCX: 00007f816caaed43
[ 8558.443978] RDX: 0000000000200000 RSI: 00007f8130408000 RDI: 0000000000000114
[ 8558.451102] RBP: 00007f8150ff6110 R08: 00007f8150ff4f20 R09: 00007f8130408000
[ 8558.458226] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000114
[ 8558.465350] R13: 000000000143fb20 R14: 0000000001445fb8 R15: 00007f8150ff6550
[40409.278987] perf: interrupt took too long (6252 > 6245), lowering kernel.perf_event_max_sample_rate to 31750
[40458.049396] XFS (pblk0): Unmounting Filesystem
Thanks,
Javier
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: Circular locking dependency with pblk
2017-10-05 10:53 Circular locking dependency with pblk Javier González
@ 2017-10-05 16:24 ` Jens Axboe
2017-10-05 23:36 ` Dave Chinner
1 sibling, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2017-10-05 16:24 UTC (permalink / raw)
To: Javier González, linux-xfs, linux-block
Cc: darrick.wong, Christoph Hellwig
On 10/05/2017 04:53 AM, Javier Gonz�lez wrote:
> Hi,
>
> lockdep is reporting a circular dependency when using XFS and pblk,
> which I am a bit confused about.
>
> This happens when XFS sends a number of nested reads and (at least) one
> of them hits partially pblk's cache. In this case, pblk will retrieve
> the cached lbas and form a new bio, which is submitted _synchronously_
> to the media using struct completion. The original bio is then populated
> with the read data.
>
> What lockdep complains about, is that the unlocking operation in
> complete() has a circular dependency with ionode->i_rwsem when they both
> happen on the same core, which is different from the core that issued
> wait_for_completion_io_timeout() and is waiting for the partial read.
> However, AFAIU complete() happens in interrupt context, so this should
> not be a problem.
But the very trace you are posting shows the completion being down
inline, since we catch it at submission time:
> [ 8558.256328] complete+0x29/0x60
> [ 8558.259469] pblk_end_io_sync+0x12/0x20
> [ 8558.263297] nvm_end_io+0x2b/0x30
> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
> [ 8558.270351] blk_mq_end_request+0x3e/0x70
> [ 8558.274360] nvme_complete_rq+0x1c/0xd0
> [ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
> [ 8558.282459] __blk_mq_complete_request+0xa3/0x160
> [ 8558.287156] blk_mq_complete_request+0x16/0x20
> [ 8558.291592] nvme_process_cq+0xf8/0x1e0
> [ 8558.295424] nvme_queue_rq+0x16e/0x9a0
> [ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
> [ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
> [ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
> [ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
> [ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
> [ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
> [ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
> [ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
> [ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
> [ 8558.340151] nvm_submit_io+0x3c/0x70
> [ 8558.343723] pblk_submit_io+0x1b/0x20> [ 8558.347379] pblk_submit_read+0x1ec/0x3a0
[snip]
This happens since we call nvme_process_cq() after submitting IO,
just in case there's something we can complete.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Circular locking dependency with pblk
@ 2017-10-05 16:24 ` Jens Axboe
0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2017-10-05 16:24 UTC (permalink / raw)
To: Javier González, linux-xfs, linux-block
Cc: darrick.wong, Christoph Hellwig
On 10/05/2017 04:53 AM, Javier González wrote:
> Hi,
>
> lockdep is reporting a circular dependency when using XFS and pblk,
> which I am a bit confused about.
>
> This happens when XFS sends a number of nested reads and (at least) one
> of them hits partially pblk's cache. In this case, pblk will retrieve
> the cached lbas and form a new bio, which is submitted _synchronously_
> to the media using struct completion. The original bio is then populated
> with the read data.
>
> What lockdep complains about, is that the unlocking operation in
> complete() has a circular dependency with ionode->i_rwsem when they both
> happen on the same core, which is different from the core that issued
> wait_for_completion_io_timeout() and is waiting for the partial read.
> However, AFAIU complete() happens in interrupt context, so this should
> not be a problem.
But the very trace you are posting shows the completion being down
inline, since we catch it at submission time:
> [ 8558.256328] complete+0x29/0x60
> [ 8558.259469] pblk_end_io_sync+0x12/0x20
> [ 8558.263297] nvm_end_io+0x2b/0x30
> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
> [ 8558.270351] blk_mq_end_request+0x3e/0x70
> [ 8558.274360] nvme_complete_rq+0x1c/0xd0
> [ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
> [ 8558.282459] __blk_mq_complete_request+0xa3/0x160
> [ 8558.287156] blk_mq_complete_request+0x16/0x20
> [ 8558.291592] nvme_process_cq+0xf8/0x1e0
> [ 8558.295424] nvme_queue_rq+0x16e/0x9a0
> [ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
> [ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
> [ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
> [ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
> [ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
> [ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
> [ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
> [ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
> [ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
> [ 8558.340151] nvm_submit_io+0x3c/0x70
> [ 8558.343723] pblk_submit_io+0x1b/0x20> [ 8558.347379] pblk_submit_read+0x1ec/0x3a0
[snip]
This happens since we call nvme_process_cq() after submitting IO,
just in case there's something we can complete.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Circular locking dependency with pblk
2017-10-05 16:24 ` Jens Axboe
(?)
@ 2017-10-05 16:44 ` Javier González
-1 siblings, 0 replies; 7+ messages in thread
From: Javier González @ 2017-10-05 16:44 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-xfs, linux-block, darrick.wong, Christoph Hellwig
[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]
> On 5 Oct 2017, at 18.24, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 10/05/2017 04:53 AM, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a number of nested reads and (at least) one
>> of them hits partially pblk's cache. In this case, pblk will retrieve
>> the cached lbas and form a new bio, which is submitted _synchronously_
>> to the media using struct completion. The original bio is then populated
>> with the read data.
>>
>> What lockdep complains about, is that the unlocking operation in
>> complete() has a circular dependency with ionode->i_rwsem when they both
>> happen on the same core, which is different from the core that issued
>> wait_for_completion_io_timeout() and is waiting for the partial read.
>> However, AFAIU complete() happens in interrupt context, so this should
>> not be a problem.
>
> But the very trace you are posting shows the completion being down
> inline, since we catch it at submission time:
>
>> [ 8558.256328] complete+0x29/0x60
>> [ 8558.259469] pblk_end_io_sync+0x12/0x20
>> [ 8558.263297] nvm_end_io+0x2b/0x30
>> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
>> [ 8558.270351] blk_mq_end_request+0x3e/0x70
>> [ 8558.274360] nvme_complete_rq+0x1c/0xd0
>> [ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
>> [ 8558.282459] __blk_mq_complete_request+0xa3/0x160
>> [ 8558.287156] blk_mq_complete_request+0x16/0x20
>> [ 8558.291592] nvme_process_cq+0xf8/0x1e0
>> [ 8558.295424] nvme_queue_rq+0x16e/0x9a0
>> [ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
>> [ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
>> [ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
>> [ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
>> [ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
>> [ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
>> [ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
>> [ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
>> [ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
>> [ 8558.340151] nvm_submit_io+0x3c/0x70
>> [ 8558.343723] pblk_submit_io+0x1b/0x20> [ 8558.347379] pblk_submit_read+0x1ec/0x3a0
>
> [snip]
>
> This happens since we call nvme_process_cq() after submitting IO,
> just in case there's something we can complete.
>
Hmm. It's still interesting that the FS is allowed to take the
rw_semaphore before we get to fully complete the read bio in pblk. I'll
look into it tomorrow.
Also, is it normal that we switch core when calling nvme_process_cq()
on the submission path?
Javier
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Circular locking dependency with pblk
2017-10-05 10:53 Circular locking dependency with pblk Javier González
@ 2017-10-05 23:36 ` Dave Chinner
2017-10-05 23:36 ` Dave Chinner
1 sibling, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2017-10-05 23:36 UTC (permalink / raw)
To: Javier González
Cc: linux-xfs, linux-block, darrick.wong, Christoph Hellwig,
Peter Zijlstra
On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier Gonz�lez wrote:
> Hi,
>
> lockdep is reporting a circular dependency when using XFS and pblk,
> which I am a bit confused about.
>
> This happens when XFS sends a number of nested reads and (at least) one
> of them hits partially pblk's cache. In this case, pblk will retrieve
> the cached lbas and form a new bio, which is submitted _synchronously_
> to the media using struct completion. The original bio is then populated
> with the read data.
>
> What lockdep complains about, is that the unlocking operation in
> complete() has a circular dependency with ionode->i_rwsem when they both
> happen on the same core, which is different from the core that issued
> wait_for_completion_io_timeout() and is waiting for the partial read.
> However, AFAIU complete() happens in interrupt context, so this should
> not be a problem.
>
> The only thing I can think of is that XFS is making an assumption when
> submitting direct I/O that there will be no synchronous I/O coming from
> anybody underneath.
>
> Any thoughts?
The new lockdep completion tracking that was included in 4.14 can
throw false positives because it doesn't (yet) have the capability
to understand the multiply layered completion contexts we have in
the storage stack.
Peter, here's another for you to ponder....
> TRACE:
> [ 8557.938336] ======================================================
> [ 8557.944513] WARNING: possible circular locking dependency detected
> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
> [ 8557.955739] ------------------------------------------------------
> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
> [ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8557.975063]
> but now in release context of a crosslock acquired at the following:
> [ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
> [ 8557.993318]
> which lock already depends on the new lock.
>
> [ 8558.001490]
> the existing dependency chain (in reverse order) is:
> [ 8558.008966]
> -> #1 ((complete)&wait#5){+.+.}:
> [ 8558.014715] __lock_acquire+0x1186/0x11d0
> [ 8558.019245] lock_acquire+0xed/0x1e0
> [ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
> [ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
> [ 8558.034375] pblk_submit_read+0x158/0x3a0
> [ 8558.038900] pblk_make_rq+0xc1/0x110
> [ 8558.042992] generic_make_request+0x11d/0x3c0
> [ 8558.047862] submit_bio+0x73/0x140
> [ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
> [ 8558.056216] iomap_apply+0xa1/0x110
> [ 8558.060220] iomap_dio_rw+0x26f/0x3c0
> [ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
> [ 8558.069269] xfs_file_read_iter+0xab/0xc0
> [ 8558.073794] __vfs_read+0xef/0x160
> [ 8558.077709] vfs_read+0xa8/0x150
> [ 8558.081454] SyS_pread64+0x98/0xc0
> [ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
This is completely normal, blocked on a bio completion with a
filesystem lock held. More commonly we see this sort of thing
through through submit_bio_wait().
> [ 8558.125636] Possible unsafe locking scenario by crosslock:
>
> [ 8558.132672] CPU0 CPU1
> [ 8558.137196] ---- ----
> [ 8558.141720] lock(&inode->i_rwsem);
> [ 8558.145291] lock((complete)&wait#5);
> [ 8558.149035] lock(&inode->i_rwsem);
> [ 8558.155118] unlock((complete)&wait#5);
> [ 8558.161550]
> *** DEADLOCK ***
>
> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
> [ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
> [ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
> [ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
> [ 8558.205948]
> stack backtrace:
> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
> [ 8558.225675] Call Trace:
> [ 8558.228123] dump_stack+0x85/0xc7
> [ 8558.231438] print_circular_bug+0x226/0x330
> [ 8558.235616] ? print_bfs_bug+0x40/0x40
> [ 8558.239358] check_prev_add+0x432/0x840
> [ 8558.243190] ? __lock_acquire+0x736/0x11d0
> [ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.256328] complete+0x29/0x60
> [ 8558.259469] pblk_end_io_sync+0x12/0x20
> [ 8558.263297] nvm_end_io+0x2b/0x30
> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
And this is where lockdep gets confused. Up until this point, it's
only ever seen the bio completion come from /outside/ the XFS
i_rwsem context (i.e. async completion form interupts). With a
synchronous completion like this, the completion has now been done
from inside the i_rwsem context and that is what triggers lockdep.
This is safe to do because even if we have a bio blocked on a
completion queue, we've got all we way down to this complete() call
from a *different bio context* inside a read locked i_rwsem. Hence
there's no way this can be a deadlock even if the two processes are
holding the same i_rwsem.
Therefore: false positive.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: Circular locking dependency with pblk
@ 2017-10-05 23:36 ` Dave Chinner
0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2017-10-05 23:36 UTC (permalink / raw)
To: Javier González
Cc: linux-xfs, linux-block, darrick.wong, Christoph Hellwig,
Peter Zijlstra
On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
> Hi,
>
> lockdep is reporting a circular dependency when using XFS and pblk,
> which I am a bit confused about.
>
> This happens when XFS sends a number of nested reads and (at least) one
> of them hits partially pblk's cache. In this case, pblk will retrieve
> the cached lbas and form a new bio, which is submitted _synchronously_
> to the media using struct completion. The original bio is then populated
> with the read data.
>
> What lockdep complains about, is that the unlocking operation in
> complete() has a circular dependency with ionode->i_rwsem when they both
> happen on the same core, which is different from the core that issued
> wait_for_completion_io_timeout() and is waiting for the partial read.
> However, AFAIU complete() happens in interrupt context, so this should
> not be a problem.
>
> The only thing I can think of is that XFS is making an assumption when
> submitting direct I/O that there will be no synchronous I/O coming from
> anybody underneath.
>
> Any thoughts?
The new lockdep completion tracking that was included in 4.14 can
throw false positives because it doesn't (yet) have the capability
to understand the multiply layered completion contexts we have in
the storage stack.
Peter, here's another for you to ponder....
> TRACE:
> [ 8557.938336] ======================================================
> [ 8557.944513] WARNING: possible circular locking dependency detected
> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
> [ 8557.955739] ------------------------------------------------------
> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
> [ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8557.975063]
> but now in release context of a crosslock acquired at the following:
> [ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
> [ 8557.993318]
> which lock already depends on the new lock.
>
> [ 8558.001490]
> the existing dependency chain (in reverse order) is:
> [ 8558.008966]
> -> #1 ((complete)&wait#5){+.+.}:
> [ 8558.014715] __lock_acquire+0x1186/0x11d0
> [ 8558.019245] lock_acquire+0xed/0x1e0
> [ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
> [ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
> [ 8558.034375] pblk_submit_read+0x158/0x3a0
> [ 8558.038900] pblk_make_rq+0xc1/0x110
> [ 8558.042992] generic_make_request+0x11d/0x3c0
> [ 8558.047862] submit_bio+0x73/0x140
> [ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
> [ 8558.056216] iomap_apply+0xa1/0x110
> [ 8558.060220] iomap_dio_rw+0x26f/0x3c0
> [ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
> [ 8558.069269] xfs_file_read_iter+0xab/0xc0
> [ 8558.073794] __vfs_read+0xef/0x160
> [ 8558.077709] vfs_read+0xa8/0x150
> [ 8558.081454] SyS_pread64+0x98/0xc0
> [ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
This is completely normal, blocked on a bio completion with a
filesystem lock held. More commonly we see this sort of thing
through through submit_bio_wait().
> [ 8558.125636] Possible unsafe locking scenario by crosslock:
>
> [ 8558.132672] CPU0 CPU1
> [ 8558.137196] ---- ----
> [ 8558.141720] lock(&inode->i_rwsem);
> [ 8558.145291] lock((complete)&wait#5);
> [ 8558.149035] lock(&inode->i_rwsem);
> [ 8558.155118] unlock((complete)&wait#5);
> [ 8558.161550]
> *** DEADLOCK ***
>
> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
> [ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
> [ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
> [ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
> [ 8558.205948]
> stack backtrace:
> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
> [ 8558.225675] Call Trace:
> [ 8558.228123] dump_stack+0x85/0xc7
> [ 8558.231438] print_circular_bug+0x226/0x330
> [ 8558.235616] ? print_bfs_bug+0x40/0x40
> [ 8558.239358] check_prev_add+0x432/0x840
> [ 8558.243190] ? __lock_acquire+0x736/0x11d0
> [ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.256328] complete+0x29/0x60
> [ 8558.259469] pblk_end_io_sync+0x12/0x20
> [ 8558.263297] nvm_end_io+0x2b/0x30
> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
And this is where lockdep gets confused. Up until this point, it's
only ever seen the bio completion come from /outside/ the XFS
i_rwsem context (i.e. async completion form interupts). With a
synchronous completion like this, the completion has now been done
from inside the i_rwsem context and that is what triggers lockdep.
This is safe to do because even if we have a bio blocked on a
completion queue, we've got all we way down to this complete() call
from a *different bio context* inside a read locked i_rwsem. Hence
there's no way this can be a deadlock even if the two processes are
holding the same i_rwsem.
Therefore: false positive.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: Circular locking dependency with pblk
2017-10-05 23:36 ` Dave Chinner
(?)
@ 2017-10-06 6:52 ` Javier González
-1 siblings, 0 replies; 7+ messages in thread
From: Javier González @ 2017-10-06 6:52 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, linux-block, darrick.wong, Christoph Hellwig,
Peter Zijlstra
[-- Attachment #1: Type: text/plain, Size: 6167 bytes --]
> On 6 Oct 2017, at 01.36, Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a number of nested reads and (at least) one
>> of them hits partially pblk's cache. In this case, pblk will retrieve
>> the cached lbas and form a new bio, which is submitted _synchronously_
>> to the media using struct completion. The original bio is then populated
>> with the read data.
>>
>> What lockdep complains about, is that the unlocking operation in
>> complete() has a circular dependency with ionode->i_rwsem when they both
>> happen on the same core, which is different from the core that issued
>> wait_for_completion_io_timeout() and is waiting for the partial read.
>> However, AFAIU complete() happens in interrupt context, so this should
>> not be a problem.
>>
>> The only thing I can think of is that XFS is making an assumption when
>> submitting direct I/O that there will be no synchronous I/O coming from
>> anybody underneath.
>>
>> Any thoughts?
>
> The new lockdep completion tracking that was included in 4.14 can
> throw false positives because it doesn't (yet) have the capability
> to understand the multiply layered completion contexts we have in
> the storage stack.
>
> Peter, here's another for you to ponder....
>
>> TRACE:
>> [ 8557.938336] ======================================================
>> [ 8557.944513] WARNING: possible circular locking dependency detected
>> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
>> [ 8557.955739] ------------------------------------------------------
>> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
>> [ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
>> [ 8557.975063]
>> but now in release context of a crosslock acquired at the following:
>> [ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8557.993318]
>> which lock already depends on the new lock.
>>
>> [ 8558.001490]
>> the existing dependency chain (in reverse order) is:
>> [ 8558.008966]
>> -> #1 ((complete)&wait#5){+.+.}:
>> [ 8558.014715] __lock_acquire+0x1186/0x11d0
>> [ 8558.019245] lock_acquire+0xed/0x1e0
>> [ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
>> [ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8558.034375] pblk_submit_read+0x158/0x3a0
>> [ 8558.038900] pblk_make_rq+0xc1/0x110
>> [ 8558.042992] generic_make_request+0x11d/0x3c0
>> [ 8558.047862] submit_bio+0x73/0x140
>> [ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
>> [ 8558.056216] iomap_apply+0xa1/0x110
>> [ 8558.060220] iomap_dio_rw+0x26f/0x3c0
>> [ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
>> [ 8558.069269] xfs_file_read_iter+0xab/0xc0
>> [ 8558.073794] __vfs_read+0xef/0x160
>> [ 8558.077709] vfs_read+0xa8/0x150
>> [ 8558.081454] SyS_pread64+0x98/0xc0
>> [ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
>
> This is completely normal, blocked on a bio completion with a
> filesystem lock held. More commonly we see this sort of thing
> through through submit_bio_wait().
Yes. We do the same construct in pblk.
>
>> [ 8558.125636] Possible unsafe locking scenario by crosslock:
>>
>> [ 8558.132672] CPU0 CPU1
>> [ 8558.137196] ---- ----
>> [ 8558.141720] lock(&inode->i_rwsem);
>> [ 8558.145291] lock((complete)&wait#5);
>> [ 8558.149035] lock(&inode->i_rwsem);
>> [ 8558.155118] unlock((complete)&wait#5);
>> [ 8558.161550]
>> *** DEADLOCK ***
>>
>> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
>> [ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
>> [ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
>> [ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
>> [ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
>> [ 8558.205948]
>> stack backtrace:
>> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
>> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
>> [ 8558.225675] Call Trace:
>> [ 8558.228123] dump_stack+0x85/0xc7
>> [ 8558.231438] print_circular_bug+0x226/0x330
>> [ 8558.235616] ? print_bfs_bug+0x40/0x40
>> [ 8558.239358] check_prev_add+0x432/0x840
>> [ 8558.243190] ? __lock_acquire+0x736/0x11d0
>> [ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.256328] complete+0x29/0x60
>> [ 8558.259469] pblk_end_io_sync+0x12/0x20
>> [ 8558.263297] nvm_end_io+0x2b/0x30
>> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
>
> And this is where lockdep gets confused. Up until this point, it's
> only ever seen the bio completion come from /outside/ the XFS
> i_rwsem context (i.e. async completion form interupts). With a
> synchronous completion like this, the completion has now been done
> from inside the i_rwsem context and that is what triggers lockdep.
>
> This is safe to do because even if we have a bio blocked on a
> completion queue, we've got all we way down to this complete() call
> from a *different bio context* inside a read locked i_rwsem. Hence
> there's no way this can be a deadlock even if the two processes are
> holding the same i_rwsem.
>
> Therefore: false positive.
Thanks for the explanation Dave. I could not see how waiting for a
struct complete under generic_make_request could in anyway interleave
with a FS lock, but it is good to have it confirmed.
Javier.
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-10-06 6:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-05 10:53 Circular locking dependency with pblk Javier González
2017-10-05 16:24 ` Jens Axboe
2017-10-05 16:24 ` Jens Axboe
2017-10-05 16:44 ` Javier González
2017-10-05 23:36 ` Dave Chinner
2017-10-05 23:36 ` Dave Chinner
2017-10-06 6:52 ` Javier González
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.