From: Bob Pearson <rpearsonhpe@gmail.com>
To: Yanjun Zhu <yanjun.zhu@linux.dev>,
Bart Van Assche <bvanassche@acm.org>,
Zhu Yanjun <zyjzyj2000@gmail.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Bernard Metzler <bmt@zurich.ibm.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: Apparent regression in blktests since 5.18-rc1+
Date: Sat, 7 May 2022 08:40:09 -0500
Message-ID: <8a05c359-8e2d-b88d-8741-2743be2eb779@gmail.com>
In-Reply-To: <4b0153c7-a8e9-98de-26ae-d421434a116d@linux.dev>
On 5/6/22 19:29, Yanjun Zhu wrote:
> On 2022/5/7 8:10, Bart Van Assche wrote:
>> On 5/6/22 11:11, Bob Pearson wrote:
>>> Before the most recent kernel update I had blktests running OK on rdma_rxe. Since we went on to 5.18.0-rc1+
>>> I have been experiencing hangs. All of this is with the 'revert scsi-debug' patch which addressed the
>>> 3 min timeout related to modprobe -r scsi-debug.
>>>
>>> You suggested checking with siw. I finally got around to it, and the behavior is exactly the same.
>>>
>>> Specifically here is a run and dmesgs from that run:
>>>
>>> root@u-22:/home/bob/src/blktests# use_siw=1 ./check srp
>>> srp/001 (Create and remove LUNs) [passed]
>>> runtime 3.388s ... 3.501s
>>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq))
>>> runtime 54.689s ...
>>> <HANGS HERE>
>>>
>>> I had to reboot to recover.
>>>
>>> The dmesg output is attached as a long file called "out".
>>> The output looks normal until line 1875, where it hangs at an "Already connected ..." message.
>>> This is the same as the other hangs I have been seeing.
>>> It is followed by a splat warning that a CPU has hung for 120 seconds.
>>>
>>> Since this behaves the same for rxe and siw, I am going to stop chasing this bug, since
>>> it is most likely outside of the rxe driver.
>>
>> Hi Bob,
>>
>> What I see on my test setup is that the SRP tests from the blktests suite pass with
>> the SoftiWARP driver (kernel v5.18-rc5 / commit 4b97bac0756a):
>>
>> # (cd blktests && use_siw=1 ./check -q srp)
>> srp/001 (Create and remove LUNs) [passed]
>> runtime 5.781s ... 5.464s
>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
>> runtime 40.772s ... 42.039s
>> srp/003 (File I/O on top of multipath concurrently with logout and login (sq)) [not run]
>> legacy device mapper support is missing
>> srp/004 (File I/O on top of multipath concurrently with logout and login (sq-on-mq)) [not run]
>> legacy device mapper support is missing
>> srp/005 (Direct I/O with large transfer sizes, cmd_sg_entries=255 and bs=4M) [passed]
>> runtime 17.870s ... 17.016s
>> srp/006 (Direct I/O with large transfer sizes, cmd_sg_entries=255 and bs=8M) [passed]
>> runtime 16.369s ... 17.315s
>> srp/007 (Direct I/O with large transfer sizes, cmd_sg_entries=1 and bs=4M) [passed]
>> runtime 16.729s ... 17.409s
>> srp/008 (Direct I/O with large transfer sizes, cmd_sg_entries=1 and bs=8M) [passed]
>> runtime 16.823s ... 16.453s
>> srp/009 (Buffered I/O with large transfer sizes, cmd_sg_entries=255 and bs=4M) [passed]
>> runtime 17.304s ... 17.838s
>> srp/010 (Buffered I/O with large transfer sizes, cmd_sg_entries=255 and bs=8M) [passed]
>> runtime 17.191s ... 17.117s
>> srp/011 (Block I/O on top of multipath concurrently with logout and login) [passed]
>> runtime 40.835s ... 38.728s
>> srp/012 (dm-mpath on top of multiple I/O schedulers) [passed]
>> runtime 23.703s ... 24.763s
>> srp/013 (Direct I/O using a discontiguous buffer) [passed]
>> runtime 11.279s ... 9.265s
>> srp/014 (Run sg_reset while I/O is ongoing) [passed]
>> runtime 39.110s ... 37.929s
>> srp/015 (File I/O on top of multipath concurrently with logout and login (mq) using the SoftiWARP (siw) driver) [passed]
>> runtime 40.027s ... 40.220s
>>
>> If I try to run the SRP test 002 with the soft-RoCE driver, the following appears:
>>
>> [ 749.901966] ================================
>> [ 749.903638] WARNING: inconsistent lock state
>> [ 749.905376] 5.18.0-rc5-dbg+ #1 Not tainted
>> [ 749.907039] --------------------------------
>> [ 749.908699] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>> [ 749.910646] ksoftirqd/5/40 [HC0[0]:SC1[1]:HE0:SE0] takes:
>> [ 749.912499] ffff88818244d350 (&xa->xa_lock#14){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
>> [ 749.914691] {SOFTIRQ-ON-W} state was registered at:
>> [ 749.916648] __lock_acquire+0x45b/0xce0
>> [ 749.918599] lock_acquire+0x18a/0x450
>> [ 749.920480] _raw_spin_lock+0x34/0x50
>> [ 749.922580] __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
>> [ 749.924583] rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
>> [ 749.926394] __ib_alloc_pd+0xa3/0x270 [ib_core]
>> [ 749.928579] ib_mad_port_open+0x44a/0x790 [ib_core]
>> [ 749.930640] ib_mad_init_device+0x8e/0x110 [ib_core]
>> [ 749.932495] add_client_context+0x26a/0x330 [ib_core]
>> [ 749.934302] enable_device_and_get+0x169/0x2b0 [ib_core]
>> [ 749.936217] ib_register_device+0x26f/0x330 [ib_core]
>> [ 749.938020] rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
>> [ 749.939794] rxe_add+0x8c/0xc0 [rdma_rxe]
>> [ 749.941552] rxe_net_add+0x5b/0x90 [rdma_rxe]
>> [ 749.943356] rxe_newlink+0x71/0x80 [rdma_rxe]
>> [ 749.945182] nldev_newlink+0x21e/0x370 [ib_core]
>> [ 749.946917] rdma_nl_rcv_msg+0x200/0x410 [ib_core]
>> [ 749.948657] rdma_nl_rcv+0x140/0x220 [ib_core]
>> [ 749.950373] netlink_unicast+0x307/0x460
>> [ 749.952063] netlink_sendmsg+0x422/0x750
>> [ 749.953672] __sys_sendto+0x1c2/0x250
>> [ 749.955281] __x64_sys_sendto+0x7f/0x90
>> [ 749.956849] do_syscall_64+0x35/0x80
>> [ 749.958353] entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 749.959942] irq event stamp: 1411849
>> [ 749.961517] hardirqs last enabled at (1411848): [<ffffffff810cdb28>] __local_bh_enable_ip+0x88/0xf0
>> [ 749.963338] hardirqs last disabled at (1411849): [<ffffffff81ebf24d>] _raw_spin_lock_irqsave+0x5d/0x60
>> [ 749.965214] softirqs last enabled at (1411838): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
>> [ 749.967027] softirqs last disabled at (1411843): [<ffffffff810cd947>] run_ksoftirqd+0x37/0x60
> To fix this, please use this patch series: news://nntp.lore.kernel.org:119/20220422194416.983549-1-yanjun.zhu@linux.dev
>
> Zhu Yanjun
>>
>> I think the above is strong evidence that there is something wrong with the
>> soft-RoCE driver.
>>
>> Thanks,
>>
>> Bart.
>
I was showing siw results, not rxe results. When I run srp on rxe, I use a patch similar to the
one Zhu suggested to fix the lockdep warnings.
Bob