linux-scsi.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@fujitsu.com>,
	'Bart Van Assche' <bvanassche@acm.org>,
	'Rain River' <rain.1986.08.12@gmail.com>,
	Zhu Yanjun <yanjun.zhu@linux.dev>,
	"leon@kernel.org" <leon@kernel.org>,
	Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [bug report] blktests srp/002 hang
Date: Tue, 17 Oct 2023 15:51:39 -0300
Message-ID: <20231017185139.GA691768@ziepe.ca>
In-Reply-To: <8801fc68-0e8e-4bb1-acaa-597bf72a567d@gmail.com>

On Tue, Oct 17, 2023 at 01:44:58PM -0500, Bob Pearson wrote:
> On 10/17/23 12:58, Jason Gunthorpe wrote:
> > On Tue, Oct 17, 2023 at 12:09:31PM -0500, Bob Pearson wrote:
> > 
> >  
> >> For qp#167 the call to srp_post_send() is followed by the rxe driver
> >> processing the send operation and generating a work completion which
> >> is posted to the send cq, but there is never a subsequent call to
> >> __srp_get_rx_iu(), so the cqe is never received by srp and the test fails.
> > 
> > ? I don't see this function in the kernel. Do you mean __srp_get_tx_iu()?
> >  
> >> I don't yet understand the srp driver's logic well enough to fix this,
> >> but as far as I can tell the problem is not in the rxe driver.
> > 
> > It looks to me like __srp_get_tx_iu() is following the design pattern
> > where the send queue is only polled when it needs to allocate a new
> > send buffer, i.e. the send buffers are pre-allocated and cycle through
> > the queue.
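
The following is a minimal user-space model of the pattern described above. It makes one consequence concrete. Completions are reaped only inside the allocator, so once the sender goes idle, finished sends keep holding their buffers.

The names `tx_model`, `get_tx_iu()`, `hw_complete_one()` and `POOL_SIZE` are hypothetical. This is a sketch under those assumptions, not the SRP driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool size; not taken from the SRP driver. */
#define POOL_SIZE 4

/* Counts-only model of a pre-allocated send-IU pool (illustrative only). */
struct tx_model {
	int free;      /* IUs on the free list */
	int in_flight; /* posted sends the provider has not completed yet */
	int unreaped;  /* send CQEs queued in the CQ but not yet polled */
};

static void model_init(struct tx_model *m)
{
	m->free = POOL_SIZE;
	m->in_flight = 0;
	m->unreaped = 0;
}

/* The send CQ is polled only here, i.e. only when a new IU is wanted. */
static bool get_tx_iu(struct tx_model *m)
{
	m->free += m->unreaped; /* reaping a CQE returns its IU to the pool */
	m->unreaped = 0;
	if (m->free == 0)
		return false;   /* out of send resources */
	m->free--;
	m->in_flight++;         /* the caller posts the send immediately */
	return true;
}

/* The provider finishes one send and queues a CQE; nobody is notified. */
static void hw_complete_one(struct tx_model *m)
{
	assert(m->in_flight > 0);
	m->in_flight--;
	m->unreaped++;
}
```

Consider two sends that complete with no further allocation. In this model `free` stays at `POOL_SIZE - 2`, with two CQEs still unreaped. The next `get_tx_iu()` call recovers both before taking one.
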
> > 
> > So it is not surprising that this isn't being called if things are
> > hung: the hang is probably something that prevents it from even
> > wanting to send, which is probably a receive-side issue.
> > 
> > Following the trace back from that point to isolate which missing
> > resource is keeping it from sending may bring some more clarity.
> > 
> > Alternatively, if __srp_get_tx_iu() is failing, then perhaps you've run
> > into a rare condition from which recovery does not work.
> > 
> > E.g. this kind of design pattern carries a subtle assumption that the rx
> > and send CQs are ordered together. Getting an rx completion before the
> > matching tx completion can trigger the unusual scenario where the send
> > side runs out of resources.
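
The following is a minimal sketch of that ordering assumption. It uses a hypothetical channel with a single send IU, where handling a response (an rx completion) immediately triggers another send. Suppose the rx completion is processed before the provider has queued the tx completion for the earlier send. The allocator then polls an empty send CQ and fails, even though the send itself has already finished.

All names here are hypothetical, and the model ignores credits and locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel model (not SRP code); send CQEs are reaped lazily. */
struct chan {
	int free_tx;    /* send IUs on the free list */
	int tx_pending; /* sends whose CQE the provider has not queued yet */
	int tx_cqes;    /* send CQEs queued but not yet polled */
	int rx_cqes;    /* responses received from the target */
};

/* Poll the send CQ, then try to take an IU for a new send. */
static bool get_tx_iu(struct chan *c)
{
	c->free_tx += c->tx_cqes;
	c->tx_cqes = 0;
	if (c->free_tx == 0)
		return false;
	c->free_tx--;
	c->tx_pending++;
	return true;
}

static void tx_cqe_arrives(struct chan *c)
{
	assert(c->tx_pending > 0);
	c->tx_pending--;
	c->tx_cqes++;
}

static void rx_cqe_arrives(struct chan *c)
{
	c->rx_cqes++;
}

/* One request/response round with a single IU. Returns whether the send
 * triggered by the response could obtain an IU. */
static bool round_trip(bool rx_before_tx)
{
	struct chan c = { .free_tx = 1 };
	bool ok = get_tx_iu(&c); /* send the request */

	assert(ok);
	if (rx_before_tx) {
		rx_cqe_arrives(&c);  /* response is handled first ... */
		ok = get_tx_iu(&c);  /* ... and wants to send again */
		tx_cqe_arrives(&c);  /* the tx CQE shows up too late */
	} else {
		tx_cqe_arrives(&c);
		rx_cqe_arrives(&c);
		ok = get_tx_iu(&c);
	}
	return ok;
}
```

In this model `round_trip(false)` returns true and `round_trip(true)` returns false. With a larger pool, the allocation only fails once every IU is waiting on such a late completion.
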
> > 
> > Jason
> 
> In all the traces I have looked at, the hang only occurs when the final
> send-side completions are not received. This happens when the srp
> driver doesn't poll (i.e. call ib_process_cq_direct()). The rest is
> my conjecture. Since several qp's have missing completions (the odd
> qp's from qp#167 through qp#211), there are 23 iu's tied up when srp
> hangs. Your suggestion makes sense as an explanation of why the hang
> occurs. When the test finishes, the qp's are destroyed and the driver
> calls ib_process_cq_direct() again, which cleans up the resources.
> 
> The problem is that there isn't any obvious way to find a thread,
> related to the missing cqe's, that would poll for them. I think the
> best way to fix this is to make the send-side cq handling interrupt
> driven (as the srpt driver already does). The provider drivers have to
> run in any case to convert cqe's to wc's, so there isn't much penalty
> in calling the cq completion handler from software that is already
> running, and in return you get reliable delivery of completions.
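
The following is a minimal sketch contrasting the two approaches under simplifying assumptions:

- In polled mode, the send CQ is drained only by the allocator.
- In the interrupt-driven mode proposed above, a completion handler runs as each CQE is queued.

`tied_up_after_idle()` counts the IUs still held after the sender goes idle. All names are hypothetical. In the kernel, the completion mode is chosen through the `ib_poll_context` argument to `ib_alloc_cq()` (e.g. `IB_POLL_DIRECT` vs. `IB_POLL_SOFTIRQ`), not through a flag like the one modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 4 /* hypothetical number of pre-allocated send IUs */

struct tx_model {
	bool irq_driven; /* true: a handler runs as each CQE is queued */
	int free;        /* IUs on the free list */
	int unreaped;    /* CQEs waiting in the send CQ */
};

/* Drain the send CQ, returning every completed IU to the free list. */
static void reap(struct tx_model *m)
{
	m->free += m->unreaped;
	m->unreaped = 0;
}

static bool get_tx_iu(struct tx_model *m)
{
	reap(m); /* in polled mode this is the only place the CQ is drained */
	if (m->free == 0)
		return false;
	m->free--;
	return true;
}

/* The provider queues a send CQE; in interrupt-driven mode the
 * completion handler returns the IU right away. */
static void post_send_cqe(struct tx_model *m)
{
	m->unreaped++;
	if (m->irq_driven)
		reap(m);
}

/* Post n sends (n <= POOL_SIZE), let all of them complete, then report
 * how many IUs are still tied up with no further allocation to reap them. */
static int tied_up_after_idle(bool irq_driven, int n)
{
	struct tx_model m = { .irq_driven = irq_driven, .free = POOL_SIZE };

	for (int i = 0; i < n; i++) {
		bool ok = get_tx_iu(&m);

		assert(ok);
		(void)ok;
	}
	for (int i = 0; i < n; i++)
		post_send_cqe(&m);
	return POOL_SIZE - m.free;
}
```

In polled mode the model leaves all `n` IUs tied up, mirroring the 23 tied-up iu's described above. In interrupt-driven mode it leaves none.
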

Can you add tracing to show that SRP is running out of SQ resources,
i.e. that __srp_get_tx_iu() fails, and that this is a precondition for the hang?
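
The following sketches one way to gather that evidence, as a user-space model. It counts allocation attempts and failures and logs each failure with the QP number, so a trace shows whether a failure preceded the hang. `tx_stats` and `traced_get_tx_iu()` are hypothetical. In the driver itself, one would more likely add a `pr_debug()` or a tracepoint at the failure paths of `__srp_get_tx_iu()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters; not part of any existing driver. */
struct tx_stats {
	unsigned long attempts;
	unsigned long failures;
};

/* Take one IU from *free_tx, recording every attempt and every failure. */
static bool traced_get_tx_iu(struct tx_stats *st, int *free_tx, int qp_num)
{
	assert(st != NULL && free_tx != NULL);
	st->attempts++;
	if (*free_tx > 0) {
		(*free_tx)--;
		return true;
	}
	st->failures++;
	fprintf(stderr, "qp#%d: tx IU allocation failed (%lu failures in %lu attempts)\n",
		qp_num, st->failures, st->attempts);
	return false;
}
```

Suppose equivalent instrumentation is in place in the driver. If no failure lines appear before the hang, that points away from the SQ.
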

I am fully willing to believe that path is never tested.

Otherwise, if srp thinks it has SQ resources, then the SQ is probably
not the cause of the hang.

Jason


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231017185139.GA691768@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=bvanassche@acm.org \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=matsuda-daisuke@fujitsu.com \
    --cc=rain.1986.08.12@gmail.com \
    --cc=rpearsonhpe@gmail.com \
    --cc=shinichiro.kawasaki@wdc.com \
    --cc=yanjun.zhu@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).