linux-rdma.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: "Liuyixian (Eason)" <liuyixian@huawei.com>
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, jgg@ziepe.ca,
	linuxarm@huawei.com
Subject: Re: [PATCH for-next] RDMA/hns: Bugfix for flush cqe in case softirq and multi-process
Date: Tue, 5 Nov 2019 16:37:24 +0200
Message-ID: <20191105143724.GD6763@unreal>
In-Reply-To: <2a0ae88d-908f-df4b-11ea-26e639b7b338@huawei.com>

On Tue, Nov 05, 2019 at 10:06:20AM +0800, Liuyixian (Eason) wrote:
>
>
> On 2019/10/28 17:34, Liuyixian (Eason) wrote:
> >
> >
> > On 2019/10/15 16:00, Leon Romanovsky wrote:
> >> On Sat, Oct 12, 2019 at 11:53:36AM +0800, Liuyixian (Eason) wrote:
> >>>
> >>>
> >>> On 2019/9/24 11:54, Liuyixian (Eason) wrote:
> >>>>
> >>>>
> >>>> On 2019/9/23 13:01, Leon Romanovsky wrote:
> >>>>> On Fri, Sep 20, 2019 at 11:55:56AM +0800, Liuyixian (Eason) wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2019/9/11 21:17, Liuyixian (Eason) wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2019/9/10 15:52, Leon Romanovsky wrote:
> >>>>>>>> On Tue, Sep 10, 2019 at 02:40:20PM +0800, Liuyixian (Eason) wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2019/9/8 16:03, Leon Romanovsky wrote:
> >>>>>>>>>> On Thu, Sep 05, 2019 at 08:31:11PM +0800, Weihang Li wrote:
> >>>>>>>>>>> From: Yixian Liu <liuyixian@huawei.com>
> >>>>>>>>>>>
> >>>>>>>>>>> Hip08 has the flush cqe feature, which helps to flush wqes in the work
> >>>>>>>>>>> queues (sq and rq) when an error happens by transmitting the producer
> >>>>>>>>>>> index to hardware through the mailbox. Flush cqe is implemented in the
> >>>>>>>>>>> post send and recv verbs. However, in NVMe cases these verbs are called
> >>>>>>>>>>> in softirq context, which leads to the following calltrace with the
> >>>>>>>>>>> current driver, as the mailbox used by flush cqe can sleep.
> >>>>>>>>>>>
> >>>>>>>>>>> This patch solves this problem by using workqueue to do flush cqe,
> >>>>>>>>>>
> >>>>>>>>>> Unbelievable, almost every bug in this driver is solved by introducing
> >>>>>>>>>> a workqueue. You should fix the "sleep in flush path" issue itself, not
> >>>>>>>>>> add a new workqueue.
> >>>>>>>>>>
> >>>>>>>>> Hi Leon,
> >>>>>>>>>
> >>>>>>>>> Thanks for the comment.
> >>>>>>>>> Up to now, for hip08, only one place in hns_roce_hw_v2.c uses a workqueue,
> >>>>>>>>> and that is for irq prints.
> >>>>>>>>
> >>>>>>>> Thanks to our lack of desire to add more workqueues and previous patches
> >>>>>>>> which removed extra workqueues from the driver.
> >>>>>>>>
> >>>>>>> Thanks, I see.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> The solution for flush cqe in this patch is as follows:
> >>>>>>>>> When flush cqe has to be performed, the driver modifies the qp to error
> >>>>>>>>> state through the mailbox with the newest producer index of sq and rq; the
> >>>>>>>>> hardware can then flush all outstanding wqes in sq and rq.
> >>>>>>>>>
> >>>>>>>>> That's the whole mechanism of flush cqe, and it is also the flush path. We
> >>>>>>>>> can change neither the fact that the mailbox sleeps nor the fact that flush
> >>>>>>>>> cqe occurs in post send/recv. To avoid the calltrace of flush cqe in post
> >>>>>>>>> verbs under NVMe softirq, using a workqueue for flush cqe seems reasonable.
> >>>>>>>>>
> >>>>>>>>> As far as I know, there is no alternative solution for this situation.
> >>>>>>>>> I would be very grateful if you could point me to more information.
> >>>>>>>>
> >>>>>>>> ib_drain_rq/ib_drain_sq/ib_drain_qp????
> >>>>>>>>
> >>>>>>> Hi Leon,
> >>>>>>>
> >>>>>>> I think these interfaces are designed for an application to check that all
> >>>>>>> wqes have been processed by hardware, the so-called drain or flush. However,
> >>>>>>> that is not the same as the flush in this patch. The solution in this patch
> >>>>>>> helps the hardware generate flush cqes for outstanding wqes on qp error.
> >>>>>>>
> >>>>>> Hi Leon,
> >>>>>>
> >>>>>> What's your opinion on the above? Do you have any further comments?
> >>>>>
> >>>>> My opinion didn't change, you need to read discussions about ib_drain_*()
> >>>>> functions, how and why they were introduced. It is a way to go.
> >>>>>
> >>>>> Thanks
> >>>>
> >>>> Hi Leon,
> >>>>
> >>>> Thanks a lot! I will dig into those functions for my problem.
> >>>>
> >>>
> >>> Hi Leon,
> >>>
> >>> I have analyzed the mechanism of ib_drain_(qp, sq, rq). It is okay to use
> >>> it instead of our flush cqe, as both of them call modify qp to error
> >>> state in the flush path.
> >>>
> >>> However, both ib_drain_* and flush cqe face the same problem described in
> >>> previous emails: in the NVMe case, post verbs are called under
> >>> **softirq**, which results in a calltrace because the mailbox used in
> >>> modify qp (flush path) can sleep, and sleeping is not allowed under softirq.
> >>>
> >>> Thus, to resolve the above calltrace (sleep in softirq), using a workqueue
> >>> as in this patch seems a reasonable solution, regardless of whether
> >>> ib_drain_qp or flush cqe is called in the workqueue.
> >>>
> >>> I do not think it is a good idea to fix the sleep in the flush path (that
> >>> is, in the mailbox used in modify qp), as the mailbox is such a mature mechanism.
> >>
> >> No, it is not a reasonable solution.
> >>
> >
> > Hi Leon,
> >
> >      I have explained this issue better in another patch set and pruned other logic.
> >      Thanks a lot for your review!
> >
> > Best regards.
> > Eason
> >
>
> Hi Doug and Leon,
>
> I just want to make sure that you know the above-mentioned patch set is at:
> https://patchwork.kernel.org/project/linux-rdma/list/?series=194423
>
> Sorry for replying to your last comment so late. I analyzed all possible solutions
> in light of your comment and found that I had not described our problem clearly and
> accurately enough, so I made this new patch set with simpler logic and a detailed
> commit message. I hope I have explained the problem clearly.

Hi,

I'm confident that Doug and/or Jason will review it very soon.

Thanks

>
> Thanks.
>
>
>
>
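
For readers following the thread, the deferral pattern under debate (a post
verb running in softirq records the newest producer index and hands the
sleeping mailbox call to a worker) can be sketched roughly as below. This is
a user-space model built on pthreads, not code from the patch or from the hns
driver. Every name in it (model_qp, model_mailbox_flush, model_post_send,
model_flush_worker, model_run) is hypothetical, and the thread-local
in_atomic_ctx flag is only an assumed stand-in for softirq context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model only; every name here is hypothetical, not hns driver code. */
struct model_qp {
	pthread_mutex_t lock;
	pthread_cond_t kick;
	unsigned int sq_pi;       /* newest send-queue producer index */
	unsigned int flushed_pi;  /* producer index last handed to "hardware" */
	bool flush_pending;       /* a flush was requested and not yet issued */
	bool stop;                /* tells the worker to exit once idle */
};

/* Stands in for "running in softirq"; per thread, so the worker sees false. */
static _Thread_local bool in_atomic_ctx;

/* Models the mailbox command, which may sleep: never call it in atomic context. */
static void model_mailbox_flush(void)
{
	assert(!in_atomic_ctx);
}

/* Models a post verb on an errored qp: record the index, defer the flush. */
static void model_post_send(struct model_qp *qp)
{
	in_atomic_ctx = true;
	pthread_mutex_lock(&qp->lock);  /* a real driver would use a spinlock */
	qp->sq_pi++;
	qp->flush_pending = true;
	pthread_cond_signal(&qp->kick);
	pthread_mutex_unlock(&qp->lock);
	in_atomic_ctx = false;
}

/* Worker in process context: the only place the sleeping call is made. */
static void *model_flush_worker(void *arg)
{
	struct model_qp *qp = arg;

	pthread_mutex_lock(&qp->lock);
	for (;;) {
		while (!qp->flush_pending && !qp->stop)
			pthread_cond_wait(&qp->kick, &qp->lock);
		if (!qp->flush_pending)
			break;  /* stop requested and nothing left to flush */
		qp->flush_pending = false;
		unsigned int pi = qp->sq_pi;  /* snapshot the newest index */
		pthread_mutex_unlock(&qp->lock);
		model_mailbox_flush();        /* may sleep: lock is dropped */
		pthread_mutex_lock(&qp->lock);
		qp->flushed_pi = pi;
	}
	pthread_mutex_unlock(&qp->lock);
	return NULL;
}

/* Posts `posts` work requests, then returns the last index that was flushed. */
static unsigned int model_run(unsigned int posts)
{
	struct model_qp qp = { .sq_pi = 0 };
	pthread_t worker;
	int rc;

	pthread_mutex_init(&qp.lock, NULL);
	pthread_cond_init(&qp.kick, NULL);
	rc = pthread_create(&worker, NULL, model_flush_worker, &qp);
	assert(rc == 0);
	for (unsigned int i = 0; i < posts; i++)
		model_post_send(&qp);
	pthread_mutex_lock(&qp.lock);
	qp.stop = true;
	pthread_cond_signal(&qp.kick);
	pthread_mutex_unlock(&qp.lock);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&qp.kick);
	pthread_mutex_destroy(&qp.lock);
	return qp.flushed_pi;
}
```

Calling model_run(5) posts five requests and returns the producer index the
worker last flushed. Several posts may be coalesced into one mailbox call,
because the worker always snapshots the newest index. Whether such a deferral
is the right fix for the driver is exactly what the thread disputes; the
sketch only illustrates the mechanics.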

Thread overview: 14+ messages
2019-09-05 12:31 [PATCH for-next] RDMA/hns: Bugfix for flush cqe in case softirq and multi-process Weihang Li
2019-09-08  8:03 ` Leon Romanovsky
2019-09-10  6:40   ` Liuyixian (Eason)
2019-09-10  7:52     ` Leon Romanovsky
2019-09-11 13:17       ` Liuyixian (Eason)
2019-09-20  3:55         ` Liuyixian (Eason)
2019-09-23  5:01           ` Leon Romanovsky
2019-09-24  3:54             ` Liuyixian (Eason)
2019-10-12  3:53               ` Liuyixian (Eason)
2019-10-15  8:00                 ` Leon Romanovsky
2019-10-28  9:34                   ` Liuyixian (Eason)
2019-11-05  2:06                     ` Liuyixian (Eason)
2019-11-05 14:37                       ` Leon Romanovsky [this message]
2019-11-06  2:16                         ` Liuyixian (Eason)
