From: Jason Gunthorpe <jgg@ziepe.ca>
To: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org,
lijun_nudt@163.com, oulijun@huawei.com, liudongdong3@huawei.com,
liuyixian@huawei.com, zhangxiping3@huawei.com,
linuxarm@huawei.com, linux-kernel@vger.kernel.org,
xavier_huwei@163.com
Subject: Re: [PATCH rdma-rc 1/3] RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs
Date: Tue, 15 Jan 2019 15:02:59 -0700
Message-ID: <20190115220259.GH22045@ziepe.ca>
In-Reply-To: <5C3D3BD1.4000508@huawei.com>

On Tue, Jan 15, 2019 at 09:48:01AM +0800, Wei Hu (Xavier) wrote:
>
>
> On 2019/1/15 6:06, Jason Gunthorpe wrote:
> > On Sat, Jan 12, 2019 at 03:55:31PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2019/1/12 5:34, Jason Gunthorpe wrote:
> >>> On Thu, Jan 10, 2019 at 09:57:41PM +0800, Wei Hu (Xavier) wrote:
> >>>> +	/* Check the status of the current software reset process, if in
> >>>> +	 * software reset process, wait until software reset process finished,
> >>>> +	 * in order to ensure that reset process and this function will not call
> >>>> +	 * __hns_roce_hw_v2_uninit_instance at the same time.
> >>>> +	 * If a timeout occurs, it indicates that the network subsystem has
> >>>> +	 * encountered a serious error and cannot be recovered from the reset
> >>>> +	 * processing.
> >>>> +	 */
> >>>> +	if (ops->ae_dev_resetting(handle)) {
> >>>> +		dev_warn(dev, "Device is busy in resetting state. waiting.\n");
> >>>> +		end = msecs_to_jiffies(HNS_ROCE_V2_RST_PRC_MAX_TIME) + jiffies;
> >>>> +		while (ops->ae_dev_resetting(handle) &&
> >>>> +		       time_before(jiffies, end))
> >>>> +			msleep(20);
> >>> Really? Does this have to be so ugly? Why isn't there just a simple
> >>> lock someplace that is held during reset?
> >>>
> >>> I'm skeptical that all this strange looking stuff is properly locked
> >>> and concurrency safe.
> >> Hi, Jason
> >>
> >> The hns3 NIC driver notifies the hns RoCE driver to perform
> >> reset related processing by calling the .reset_notify() interface
> >> registered by the RoCE driver.
> >>
> >> There is a constraint on the hip08 chip: the NIC driver needs to
> >> stop the flow before the hardware starts the reset, otherwise the
> >> chip may hang.
> >>
> >> We also thought about using locks, but found that locks can lead
> >> to more serious problems because of that chip restriction.
> >> If a lock is used here, reset processing may wait for uninstallation
> >> to complete. The NIC driver may then fail to stop the flow in time
> >> during the reset, causing the chip to hang.
> > If you are sleeping then I'm sure a lock can be used instead, how
> > would it be any different?
> Hi, Jason
> If a lock is used here, the reset process may wait for uninstallation
> to complete, which may trigger the chip constraint and cause the chip
> to hang.
> But if we sleep here, the reset process never waits for uninstallation
> to complete, so the chip constraint is not triggered.

But how is this even right? If ops->ae_dev_resetting can change at any
time, and you need to wait for it here, then without locks can't it just
change immediately after the if statement?

I think it shows the concurrency & locking is not done right when I
see loops reading shared data and spinning on them with msleep.
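
To make the concern concrete, here is a minimal userspace sketch (pthreads,
not kernel code) of the alternative shape: the "resetting" state is only
read and written under one lock, and teardown waits on a condition variable
with a timeout instead of polling. Every name in it (demo_dev,
demo_reset_begin, demo_reset_end, demo_teardown) is hypothetical and none
of it comes from the hns or hns3 drivers. It assumes the reset path only
needs the lock briefly to flip a flag, so it never waits for teardown to
finish; whether that is enough for the hip08 constraint is not something
the sketch can answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical device state; not taken from the hns driver. */
struct demo_dev {
	pthread_mutex_t lock;       /* protects resetting and torn_down */
	pthread_cond_t reset_done;  /* signalled when resetting goes false */
	bool resetting;
	bool torn_down;
};

#define DEMO_DEV_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false }

/* Reset path: start a reset unless the device is already torn down. */
static bool demo_reset_begin(struct demo_dev *d)
{
	bool ok;

	pthread_mutex_lock(&d->lock);
	ok = !d->torn_down;
	if (ok)
		d->resetting = true;
	pthread_mutex_unlock(&d->lock);
	return ok;
}

/* Reset path: finish the reset and wake any waiting teardown. */
static void demo_reset_end(struct demo_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->resetting = false;
	pthread_cond_broadcast(&d->reset_done);
	pthread_mutex_unlock(&d->lock);
}

/*
 * Teardown path: wait up to timeout_ms for a reset to finish, then mark
 * the device torn down while still holding the lock, so a new reset
 * cannot start between the check and the action.
 * Returns 0 on success or ETIMEDOUT if the reset never finished.
 */
static int demo_teardown(struct demo_dev *d, long timeout_ms)
{
	struct timespec end;
	int ret = 0;

	assert(timeout_ms >= 0);

	/* Absolute deadline, as pthread_cond_timedwait() expects. */
	clock_gettime(CLOCK_REALTIME, &end);
	end.tv_sec += timeout_ms / 1000;
	end.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (end.tv_nsec >= 1000000000L) {
		end.tv_sec++;
		end.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&d->lock);
	while (d->resetting && ret == 0)
		ret = pthread_cond_timedwait(&d->reset_done, &d->lock, &end);
	if (!d->resetting) {
		d->torn_down = true;
		ret = 0;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}
```

The difference from the msleep loop is that the check of the flag and the
decision to tear down happen in one critical section. In the polling
version the flag can flip right after the last read, and nothing prevents
both paths from running the uninit at the same time.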
Jason

Thread overview:
2019-01-10 13:57 [PATCH rdma-rc 0/3] RDMA/hns: Some fixes for 5.0 Wei Hu (Xavier)
2019-01-10 13:57 ` [PATCH rdma-rc 1/3] RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs Wei Hu (Xavier)
2019-01-11 21:34 ` Jason Gunthorpe
2019-01-12 7:55 ` Wei Hu (Xavier)
2019-01-14 22:06 ` Jason Gunthorpe
2019-01-15 1:48 ` Wei Hu (Xavier)
2019-01-15 22:02 ` Jason Gunthorpe [this message]
2019-01-18 12:58 ` Wei Hu (Xavier)
2019-01-10 13:57 ` [PATCH rdma-rc 2/3] RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset Wei Hu (Xavier)
2019-01-10 13:57 ` [PATCH rdma-rc 3/3] RDMA/hns: Fix the chip hanging caused by sending doorbell " Wei Hu (Xavier)