From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Bob Pearson <rpearsonhpe@gmail.com>,
"Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@fujitsu.com>,
'Bart Van Assche' <bvanassche@acm.org>,
'Rain River' <rain.1986.08.12@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
"leon@kernel.org" <leon@kernel.org>,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
RDMA mailing list <linux-rdma@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [bug report] blktests srp/002 hang
Date: Wed, 18 Oct 2023 16:16:45 +0800
Message-ID: <2a5e1fb6-6c73-4d25-b29a-4ccdbf2c5678@linux.dev>
In-Reply-To: <a3be5e98-e783-4108-a690-acc8a5cc5981@gmail.com>
On 2023/10/18 1:09, Bob Pearson wrote:
> On 9/25/23 20:17, Daisuke Matsuda (Fujitsu) wrote:
>> On Tue, Sep 26, 2023 12:01 AM Bart Van Assche:
>>> On 9/24/23 21:47, Daisuke Matsuda (Fujitsu) wrote:
>>>> As Bob wrote above, nobody has found any logical failure in rxe
>>>> driver.
>>> That's wrong. In case you would not yet have noticed my latest email in
>>> this thread, please take a look at
>>> https://lore.kernel.org/linux-rdma/e8b76fae-780a-470e-8ec4-c6b650793d10@leemhuis.info/T/#m0fd8ea8a4cbc27b37b042ae4f8e9b024f1871a73.
>>> I think the report in that email is a 100% proof that there is a
>>> use-after-free issue in the rdma_rxe driver. Use-after-free issues have
>>> security implications and also can cause data corruption. I propose to
>>> revert the commit that introduced the rdma_rxe use-after-free unless
>>> someone comes up with a fix for the rdma_rxe driver.
>>>
>>> Bart.
>> Thank you for the clarification. I see your intention.
>> I hope the hang issue will be resolved by addressing this.
>>
>> Thanks,
>> Daisuke
>>
> I have made some progress in understanding the cause of the srp/002 etc. hang.
>
> The two attached files are traces of activity for two qp's, qp#151 and qp#167. In my runs of srp/002,
> all the qp's before 167 pass; qp#167 is the first to fail, and all the qp's after it fail as well.
>
> It turns out that all the passing qp's call srp_post_send() some number of times and also call
> srp_send_done() the same number of times. Starting at qp#167 the last call to srp_send_done() does
> not take place leaving the srp driver waiting for the final completion and causing the hang I believe.
Thanks, Bob.
I will delve into your findings and the source code to find the root cause.
BTW, which Linux distribution are you using to reproduce this: Ubuntu, Fedora
or Debian?
From the above, this problem is sometimes difficult to reproduce on
Ubuntu, but it can be reproduced on Ubuntu and Debian.
So can you let me know which Linux distribution you are using?
Thanks,
Zhu Yanjun
>
> There are four cq's involved in each pair of qp's in the srp test: two in ib_srp and two in ib_srpt,
> one pair for each of the two qp's. Three of them execute completion processing in a soft irq context, so the code in
> core/cq.c gathers the completions and calls back to the srp drivers. The send side cq in srp uses
> direct polling, which requires srp to call ib_process_cq_direct() in order to collect the completions. This
> happens in __srp_get_tx_iu(), which is called in several places in the srp driver, but only as a side effect,
> since the purpose of this routine is to get an iu to start a new command.
>
> In the attached files, for qp#151 the final call to srp_post_send() is followed by the rxe requester and
> completer work queues processing the send packet and the ack, before a final call to __srp_get_tx_iu()
> which gathers the final send side completion, and the test succeeds.
>
> For qp#167 the call to srp_post_send() is followed by the rxe driver processing the send operation and
> generating a work completion which is posted to the send cq, but there is never a following call to
> __srp_get_tx_iu(), so the cqe is never received by srp and the test fails.
>
> I don't yet understand the logic of the srp driver to fix this but the problem is not in the rxe driver
> as far as I can tell.
>
> Bob