* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
@ 2021-12-07 10:40 Halil Pasic
2021-12-07 11:14 ` Yi Zhang
0 siblings, 1 reply; 3+ messages in thread
From: Halil Pasic @ 2021-12-07 10:40 UTC (permalink / raw)
To: yi.zhang; +Cc: axboe, bgoncalv, czhong, linux-block, skt-results-master
Hi Jens!
What is the status of this?
I see just one fix for e0d78afeb8d1 ("block: fix too broad elevator check in
blk_mq_free_request()") which pre-dates this bug report.
We see something similar (i.e. the exact same warning) in our CI occasionally,
when the nbd module is unloaded. Unfortunately I can't trigger it reliably and
frequently enough to confirm that the problem is caused by the aforementioned
commit. All I know is that we occasionally do hit the same warning.
Thanks in advance!
Regards,
Halil
If your interested, here are the relevant kernel messages we observed:
[ 2697.795977] block nbd0: shutting down sockets
[ 2697.949807] ------------[ cut here ]------------
[ 2697.949816] statistics for priority 1: i 2736 m 0 d 2739 c 2735
[ 2697.949839] WARNING: CPU: 5 PID: 163229 at block/mq-deadline.c:597 dd_exit_sched+0x118/0x138
[ 2697.949849] Modules linked in: nbd(-) crc32_generic algif_hash dm_mirror dm_region_hash dm_log algif_skcipher af_alg paes_s390 dm_crypt encrypted_keys loop lcs ctcm fsm kvm binfmt_misc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfcp scsi_transport_fc mlx5_ib ib_uverbs ib_core s390_trng vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 mlx5_core sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: trace_printk]
[ 2697.949904] CPU: 5 PID: 163229 Comm: modprobe Not tainted 5.16.0-20211122.rc1.git0.b2753a24042f.300.fc34.s390x #1
[ 2697.949907] Hardware name: IBM 8561 T01 703 (LPAR)
[ 2697.949908] Krnl PSW : 0704c00180000000 00000002eac7f4f4 (dd_exit_sched+0x11c/0x138)
[ 2697.949912] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 2697.949915] Krnl GPRS: 00000000ffffffea 0000000200000027 0000000000000033 00000000fffeffff
[ 2697.949917] 00000002f6cfc000 0000038000000001 0000000000000ab3 0000000100000000
[ 2697.949919] 000000025d5caf48 000000025d5cae00 0000000000000001 000000025d5cae80
[ 2697.949920] 00000002b31f2100 000000025d5caf4a 00000002eac7f4f0 0000038007167af8
[ 2697.949927] Krnl Code: 00000002eac7f4e4: e310f0a00024 stg %r1,160(%r15)
00000002eac7f4ea: c0e50025381f brasl %r14,00000002eb126528
#00000002eac7f4f0: af000000 mc 0,0
>00000002eac7f4f4: a7f4ffca brc 15,00000002eac7f488
00000002eac7f4f8: b9040028 lgr %r2,%r8
00000002eac7f4fc: c0e50008acda brasl %r14,00000002ead94eb0
00000002eac7f502: a7f4ffb5 brc 15,00000002eac7f46c
00000002eac7f506: af000000 mc 0,0
[ 2697.949941] Call Trace:
[ 2697.949943] [<00000002eac7f4f4>] dd_exit_sched+0x11c/0x138
[ 2697.949946] ([<00000002eac7f4f0>] dd_exit_sched+0x118/0x138)
[ 2697.949948] [<00000002eac5fddc>] blk_mq_exit_sched+0xb4/0xd8
[ 2697.949951] [<00000002eac44e38>] __elevator_exit+0x40/0x60
[ 2697.949955] [<00000002eac4abda>] blk_release_queue+0xc2/0x168
[ 2697.949958] [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
[ 2697.949961] [<00000002eac62270>] disk_release+0x70/0x90
[ 2697.949963] [<00000002eae04b38>] device_release+0x48/0xb0
[ 2697.949968] [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
[ 2697.949970] [<000003ff806b724a>] nbd_dev_remove+0x3a/0x90 [nbd]
[ 2697.949976] [<000003ff806bc1c2>] nbd_cleanup+0xda/0x120 [nbd]
[ 2697.949980] [<00000002ea67838a>] __do_sys_delete_module+0x1a2/0x268
[ 2697.949984] [<00000002eb1369ec>] __do_syscall+0x1d4/0x200
[ 2697.949987] [<00000002eb143f32>] system_call+0x82/0xb0
[ 2697.949990] Last Breaking-Event-Address:
[ 2697.949991] [<00000002eb126588>] __warn_printk+0x60/0x68
[ 2697.949996] Kernel panic - not syncing: panic_on_warn set ...
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
2021-12-07 10:40 [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0' Halil Pasic
@ 2021-12-07 11:14 ` Yi Zhang
2021-12-07 13:40 ` Halil Pasic
0 siblings, 1 reply; 3+ messages in thread
From: Yi Zhang @ 2021-12-07 11:14 UTC (permalink / raw)
To: Halil Pasic
Cc: Jens Axboe, Bruno Goncalves, Changhui Zhong, linux-block,
skt-results-master
On Tue, Dec 7, 2021 at 6:40 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> Hi Jens!
>
> What is the status of this?
Hi Halil
This issue should be already fixed, here is the commit:
commit 2b504bd4841bccbf3eb83c1fec229b65956ad8ad (tag: block-5.16-2021-11-19)
Author: Ming Lei <ming.lei@redhat.com>
Date: Thu Nov 18 23:30:41 2021 +0800
blk-mq: don't insert FUA request with data into scheduler queue
We never insert flush request into scheduler queue before.
Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
blk_insert_flush") tries to handle FUA data request as normal request.
This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
case of kyber since RQF_ELVPRIV isn't set for flush request, then
->finish_request won't be called.
Fix the issue by inserting FUA data request with
blk_mq_request_bypass_insert()
when the device supports FUA, just like what we did before.
>
> I see just one fix for e0d78afeb8d1 ("block: fix too broad elevator check in
> blk_mq_free_request()") which pre-dates this bug report.
>
> We see something similar (i.e. the exact same warning) in our CI occasionally,
> when the nbd module is unloaded. Unfortunately I can't trigger it reliably and
> frequently enough to confirm that the problem is caused by the aforementioned
> commit. All I know is that we occasionally do hit the same warning.
>
> Thanks in advance!
>
> Regards,
> Halil
>
> If your interested, here are the relevant kernel messages we observed:
>
> [ 2697.795977] block nbd0: shutting down sockets
> [ 2697.949807] ------------[ cut here ]------------
> [ 2697.949816] statistics for priority 1: i 2736 m 0 d 2739 c 2735
> [ 2697.949839] WARNING: CPU: 5 PID: 163229 at block/mq-deadline.c:597 dd_exit_sched+0x118/0x138
> [ 2697.949849] Modules linked in: nbd(-) crc32_generic algif_hash dm_mirror dm_region_hash dm_log algif_skcipher af_alg paes_s390 dm_crypt encrypted_keys loop lcs ctcm fsm kvm binfmt_misc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfcp scsi_transport_fc mlx5_ib ib_uverbs ib_core s390_trng vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 mlx5_core sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: trace_printk]
> [ 2697.949904] CPU: 5 PID: 163229 Comm: modprobe Not tainted 5.16.0-20211122.rc1.git0.b2753a24042f.300.fc34.s390x #1
> [ 2697.949907] Hardware name: IBM 8561 T01 703 (LPAR)
> [ 2697.949908] Krnl PSW : 0704c00180000000 00000002eac7f4f4 (dd_exit_sched+0x11c/0x138)
> [ 2697.949912] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 2697.949915] Krnl GPRS: 00000000ffffffea 0000000200000027 0000000000000033 00000000fffeffff
> [ 2697.949917] 00000002f6cfc000 0000038000000001 0000000000000ab3 0000000100000000
> [ 2697.949919] 000000025d5caf48 000000025d5cae00 0000000000000001 000000025d5cae80
> [ 2697.949920] 00000002b31f2100 000000025d5caf4a 00000002eac7f4f0 0000038007167af8
> [ 2697.949927] Krnl Code: 00000002eac7f4e4: e310f0a00024 stg %r1,160(%r15)
> 00000002eac7f4ea: c0e50025381f brasl %r14,00000002eb126528
> #00000002eac7f4f0: af000000 mc 0,0
> >00000002eac7f4f4: a7f4ffca brc 15,00000002eac7f488
> 00000002eac7f4f8: b9040028 lgr %r2,%r8
> 00000002eac7f4fc: c0e50008acda brasl %r14,00000002ead94eb0
> 00000002eac7f502: a7f4ffb5 brc 15,00000002eac7f46c
> 00000002eac7f506: af000000 mc 0,0
> [ 2697.949941] Call Trace:
> [ 2697.949943] [<00000002eac7f4f4>] dd_exit_sched+0x11c/0x138
> [ 2697.949946] ([<00000002eac7f4f0>] dd_exit_sched+0x118/0x138)
> [ 2697.949948] [<00000002eac5fddc>] blk_mq_exit_sched+0xb4/0xd8
> [ 2697.949951] [<00000002eac44e38>] __elevator_exit+0x40/0x60
> [ 2697.949955] [<00000002eac4abda>] blk_release_queue+0xc2/0x168
> [ 2697.949958] [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949961] [<00000002eac62270>] disk_release+0x70/0x90
> [ 2697.949963] [<00000002eae04b38>] device_release+0x48/0xb0
> [ 2697.949968] [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949970] [<000003ff806b724a>] nbd_dev_remove+0x3a/0x90 [nbd]
> [ 2697.949976] [<000003ff806bc1c2>] nbd_cleanup+0xda/0x120 [nbd]
> [ 2697.949980] [<00000002ea67838a>] __do_sys_delete_module+0x1a2/0x268
> [ 2697.949984] [<00000002eb1369ec>] __do_syscall+0x1d4/0x200
> [ 2697.949987] [<00000002eb143f32>] system_call+0x82/0xb0
> [ 2697.949990] Last Breaking-Event-Address:
> [ 2697.949991] [<00000002eb126588>] __warn_printk+0x60/0x68
> [ 2697.949996] Kernel panic - not syncing: panic_on_warn set ...
>
--
Best Regards,
Yi Zhang
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
2021-12-07 11:14 ` Yi Zhang
@ 2021-12-07 13:40 ` Halil Pasic
0 siblings, 0 replies; 3+ messages in thread
From: Halil Pasic @ 2021-12-07 13:40 UTC (permalink / raw)
To: Yi Zhang
Cc: Jens Axboe, Bruno Goncalves, Changhui Zhong, linux-block,
skt-results-master, Halil Pasic
On Tue, 7 Dec 2021 19:14:23 +0800
Yi Zhang <yi.zhang@redhat.com> wrote:
> > What is the status of this?
>
> Hi Halil
> This issue should be already fixed, here is the commit:
>
> commit 2b504bd4841bccbf3eb83c1fec229b65956ad8ad (tag: block-5.16-2021-11-19)
Thank you very much! We haven't seen this in a while, and it is very
likely that this commit indeed fixes our symptoms as well! I didn't find
it because I looked for 'Fixes: e0d78afeb8d1', sorry :D
Have a great day!
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-12-07 13:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-07 10:40 [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0' Halil Pasic
2021-12-07 11:14 ` Yi Zhang
2021-12-07 13:40 ` Halil Pasic
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).