linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
@ 2021-12-07 10:40 Halil Pasic
  2021-12-07 11:14 ` Yi Zhang
  0 siblings, 1 reply; 3+ messages in thread
From: Halil Pasic @ 2021-12-07 10:40 UTC (permalink / raw)
  To: yi.zhang; +Cc: axboe, bgoncalv, czhong, linux-block, skt-results-master

Hi Jens!

What is the status of this?

I see just one fix for e0d78afeb8d1 ("block: fix too broad elevator check in
blk_mq_free_request()") which pre-dates this bug report.

We see something similar (i.e. the exact same warning) in our CI occasionally,
when the nbd module is unloaded. Unfortunately I can't trigger it reliably and
frequently enough to confirm that the problem is caused by the aforementioned
commit. All I know is that we occasionally do hit the same warning.

Thanks in advance!

Regards,
Halil 

If your interested, here are the  relevant kernel messages we observed:

[ 2697.795977] block nbd0: shutting down sockets
[ 2697.949807] ------------[ cut here ]------------
[ 2697.949816] statistics for priority 1: i 2736 m 0 d 2739 c 2735
[ 2697.949839] WARNING: CPU: 5 PID: 163229 at block/mq-deadline.c:597 dd_exit_sched+0x118/0x138
[ 2697.949849] Modules linked in: nbd(-) crc32_generic algif_hash dm_mirror dm_region_hash dm_log algif_skcipher af_alg paes_s390 dm_crypt encrypted_keys loop lcs ctcm fsm kvm binfmt_misc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfcp scsi_transport_fc mlx5_ib ib_uverbs ib_core s390_trng vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 mlx5_core sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: trace_printk]
[ 2697.949904] CPU: 5 PID: 163229 Comm: modprobe Not tainted 5.16.0-20211122.rc1.git0.b2753a24042f.300.fc34.s390x #1
[ 2697.949907] Hardware name: IBM 8561 T01 703 (LPAR)
[ 2697.949908] Krnl PSW : 0704c00180000000 00000002eac7f4f4 (dd_exit_sched+0x11c/0x138)
[ 2697.949912]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 2697.949915] Krnl GPRS: 00000000ffffffea 0000000200000027 0000000000000033 00000000fffeffff
[ 2697.949917]            00000002f6cfc000 0000038000000001 0000000000000ab3 0000000100000000
[ 2697.949919]            000000025d5caf48 000000025d5cae00 0000000000000001 000000025d5cae80
[ 2697.949920]            00000002b31f2100 000000025d5caf4a 00000002eac7f4f0 0000038007167af8
[ 2697.949927] Krnl Code: 00000002eac7f4e4: e310f0a00024        stg     %r1,160(%r15)
                          00000002eac7f4ea: c0e50025381f        brasl   %r14,00000002eb126528
                         #00000002eac7f4f0: af000000            mc      0,0
                         >00000002eac7f4f4: a7f4ffca            brc     15,00000002eac7f488
                          00000002eac7f4f8: b9040028            lgr     %r2,%r8
                          00000002eac7f4fc: c0e50008acda        brasl   %r14,00000002ead94eb0
                          00000002eac7f502: a7f4ffb5            brc     15,00000002eac7f46c
                          00000002eac7f506: af000000            mc      0,0
[ 2697.949941] Call Trace:
[ 2697.949943]  [<00000002eac7f4f4>] dd_exit_sched+0x11c/0x138
[ 2697.949946] ([<00000002eac7f4f0>] dd_exit_sched+0x118/0x138)
[ 2697.949948]  [<00000002eac5fddc>] blk_mq_exit_sched+0xb4/0xd8
[ 2697.949951]  [<00000002eac44e38>] __elevator_exit+0x40/0x60
[ 2697.949955]  [<00000002eac4abda>] blk_release_queue+0xc2/0x168
[ 2697.949958]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
[ 2697.949961]  [<00000002eac62270>] disk_release+0x70/0x90
[ 2697.949963]  [<00000002eae04b38>] device_release+0x48/0xb0
[ 2697.949968]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
[ 2697.949970]  [<000003ff806b724a>] nbd_dev_remove+0x3a/0x90 [nbd]
[ 2697.949976]  [<000003ff806bc1c2>] nbd_cleanup+0xda/0x120 [nbd]
[ 2697.949980]  [<00000002ea67838a>] __do_sys_delete_module+0x1a2/0x268
[ 2697.949984]  [<00000002eb1369ec>] __do_syscall+0x1d4/0x200
[ 2697.949987]  [<00000002eb143f32>] system_call+0x82/0xb0
[ 2697.949990] Last Breaking-Event-Address:
[ 2697.949991]  [<00000002eb126588>] __warn_printk+0x60/0x68
[ 2697.949996] Kernel panic - not syncing: panic_on_warn set ...


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
  2021-12-07 10:40 [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0' Halil Pasic
@ 2021-12-07 11:14 ` Yi Zhang
  2021-12-07 13:40   ` Halil Pasic
  0 siblings, 1 reply; 3+ messages in thread
From: Yi Zhang @ 2021-12-07 11:14 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Jens Axboe, Bruno Goncalves, Changhui Zhong, linux-block,
	skt-results-master

On Tue, Dec 7, 2021 at 6:40 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> Hi Jens!
>
> What is the status of this?

Hi Halil
This issue should be already fixed, here is the commit:

commit 2b504bd4841bccbf3eb83c1fec229b65956ad8ad (tag: block-5.16-2021-11-19)
Author: Ming Lei <ming.lei@redhat.com>
Date:   Thu Nov 18 23:30:41 2021 +0800

    blk-mq: don't insert FUA request with data into scheduler queue

    We never insert flush request into scheduler queue before.

    Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
    blk_insert_flush") tries to handle FUA data request as normal request.
    This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
    case of kyber since RQF_ELVPRIV isn't set for flush request, then
    ->finish_request won't be called.

    Fix the issue by inserting FUA data request with
blk_mq_request_bypass_insert()
    when the device supports FUA, just like what we did before.

>
> I see just one fix for e0d78afeb8d1 ("block: fix too broad elevator check in
> blk_mq_free_request()") which pre-dates this bug report.
>
> We see something similar (i.e. the exact same warning) in our CI occasionally,
> when the nbd module is unloaded. Unfortunately I can't trigger it reliably and
> frequently enough to confirm that the problem is caused by the aforementioned
> commit. All I know is that we occasionally do hit the same warning.
>
> Thanks in advance!
>
> Regards,
> Halil
>
> If your interested, here are the  relevant kernel messages we observed:
>
> [ 2697.795977] block nbd0: shutting down sockets
> [ 2697.949807] ------------[ cut here ]------------
> [ 2697.949816] statistics for priority 1: i 2736 m 0 d 2739 c 2735
> [ 2697.949839] WARNING: CPU: 5 PID: 163229 at block/mq-deadline.c:597 dd_exit_sched+0x118/0x138
> [ 2697.949849] Modules linked in: nbd(-) crc32_generic algif_hash dm_mirror dm_region_hash dm_log algif_skcipher af_alg paes_s390 dm_crypt encrypted_keys loop lcs ctcm fsm kvm binfmt_misc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfcp scsi_transport_fc mlx5_ib ib_uverbs ib_core s390_trng vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 mlx5_core sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: trace_printk]
> [ 2697.949904] CPU: 5 PID: 163229 Comm: modprobe Not tainted 5.16.0-20211122.rc1.git0.b2753a24042f.300.fc34.s390x #1
> [ 2697.949907] Hardware name: IBM 8561 T01 703 (LPAR)
> [ 2697.949908] Krnl PSW : 0704c00180000000 00000002eac7f4f4 (dd_exit_sched+0x11c/0x138)
> [ 2697.949912]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 2697.949915] Krnl GPRS: 00000000ffffffea 0000000200000027 0000000000000033 00000000fffeffff
> [ 2697.949917]            00000002f6cfc000 0000038000000001 0000000000000ab3 0000000100000000
> [ 2697.949919]            000000025d5caf48 000000025d5cae00 0000000000000001 000000025d5cae80
> [ 2697.949920]            00000002b31f2100 000000025d5caf4a 00000002eac7f4f0 0000038007167af8
> [ 2697.949927] Krnl Code: 00000002eac7f4e4: e310f0a00024        stg     %r1,160(%r15)
>                           00000002eac7f4ea: c0e50025381f        brasl   %r14,00000002eb126528
>                          #00000002eac7f4f0: af000000            mc      0,0
>                          >00000002eac7f4f4: a7f4ffca            brc     15,00000002eac7f488
>                           00000002eac7f4f8: b9040028            lgr     %r2,%r8
>                           00000002eac7f4fc: c0e50008acda        brasl   %r14,00000002ead94eb0
>                           00000002eac7f502: a7f4ffb5            brc     15,00000002eac7f46c
>                           00000002eac7f506: af000000            mc      0,0
> [ 2697.949941] Call Trace:
> [ 2697.949943]  [<00000002eac7f4f4>] dd_exit_sched+0x11c/0x138
> [ 2697.949946] ([<00000002eac7f4f0>] dd_exit_sched+0x118/0x138)
> [ 2697.949948]  [<00000002eac5fddc>] blk_mq_exit_sched+0xb4/0xd8
> [ 2697.949951]  [<00000002eac44e38>] __elevator_exit+0x40/0x60
> [ 2697.949955]  [<00000002eac4abda>] blk_release_queue+0xc2/0x168
> [ 2697.949958]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949961]  [<00000002eac62270>] disk_release+0x70/0x90
> [ 2697.949963]  [<00000002eae04b38>] device_release+0x48/0xb0
> [ 2697.949968]  [<00000002ead7f78a>] kobject_cleanup+0x5a/0x180
> [ 2697.949970]  [<000003ff806b724a>] nbd_dev_remove+0x3a/0x90 [nbd]
> [ 2697.949976]  [<000003ff806bc1c2>] nbd_cleanup+0xda/0x120 [nbd]
> [ 2697.949980]  [<00000002ea67838a>] __do_sys_delete_module+0x1a2/0x268
> [ 2697.949984]  [<00000002eb1369ec>] __do_syscall+0x1d4/0x200
> [ 2697.949987]  [<00000002eb143f32>] system_call+0x82/0xb0
> [ 2697.949990] Last Breaking-Event-Address:
> [ 2697.949991]  [<00000002eb126588>] __warn_printk+0x60/0x68
> [ 2697.949996] Kernel panic - not syncing: panic_on_warn set ...
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0'
  2021-12-07 11:14 ` Yi Zhang
@ 2021-12-07 13:40   ` Halil Pasic
  0 siblings, 0 replies; 3+ messages in thread
From: Halil Pasic @ 2021-12-07 13:40 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Jens Axboe, Bruno Goncalves, Changhui Zhong, linux-block,
	skt-results-master, Halil Pasic

On Tue, 7 Dec 2021 19:14:23 +0800
Yi Zhang <yi.zhang@redhat.com> wrote:

> > What is the status of this?  
> 
> Hi Halil
> This issue should be already fixed, here is the commit:
> 
> commit 2b504bd4841bccbf3eb83c1fec229b65956ad8ad (tag: block-5.16-2021-11-19)

Thank you very much! We haven't seen this in a while, and it is very
likely that this commit indeed fixes our symptoms as well! I didn't find
it because I looked for 'Fixes: e0d78afeb8d1', sorry :D

Have a great day!

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-12-07 13:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-07 10:40 [bug report][bisected] WARNING: CPU: 4 PID: 10482 at block/mq-deadline.c:597 dd_exit_sched+0x198/0x1d0' Halil Pasic
2021-12-07 11:14 ` Yi Zhang
2021-12-07 13:40   ` Halil Pasic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).