public inbox for linux-kernel@vger.kernel.org
From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: <dledford@redhat.com>, <linux-rdma@vger.kernel.org>,
	<lijun_nudt@163.com>, <oulijun@huawei.com>,
	<liudongdong3@huawei.com>, <liuyixian@huawei.com>,
	<zhangxiping3@huawei.com>, <linuxarm@huawei.com>,
	<linux-kernel@vger.kernel.org>, <xavier_huwei@163.com>
Subject: Re: [PATCH rdma-rc 1/3] RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs
Date: Fri, 18 Jan 2019 20:58:11 +0800	[thread overview]
Message-ID: <5C41CD63.60506@huawei.com> (raw)
In-Reply-To: <20190115220259.GH22045@ziepe.ca>



On 2019/1/16 6:02, Jason Gunthorpe wrote:
> On Tue, Jan 15, 2019 at 09:48:01AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2019/1/15 6:06, Jason Gunthorpe wrote:
>>> On Sat, Jan 12, 2019 at 03:55:31PM +0800, Wei Hu (Xavier) wrote:
>>>> On 2019/1/12 5:34, Jason Gunthorpe wrote:
>>>>> On Thu, Jan 10, 2019 at 09:57:41PM +0800, Wei Hu (Xavier) wrote:
>>>>>> +	/* Check the status of the current software reset process, if in
>>>>>> +	 * software reset process, wait until software reset process finished,
>>>>>> +	 * in order to ensure that reset process and this function will not call
>>>>>> +	 * __hns_roce_hw_v2_uninit_instance at the same time.
>>>>>> +	 * If a timeout occurs, it indicates that the network subsystem has
>>>>>> +	 * encountered a serious error and cannot be recovered from the reset
>>>>>> +	 * processing.
>>>>>> +	 */
>>>>>> +	if (ops->ae_dev_resetting(handle)) {
>>>>>> +		dev_warn(dev, "Device is busy in resetting state. waiting.\n");
>>>>>> +		end = msecs_to_jiffies(HNS_ROCE_V2_RST_PRC_MAX_TIME) + jiffies;
>>>>>> +		while (ops->ae_dev_resetting(handle) &&
>>>>>> +		       time_before(jiffies, end))
>>>>>> +			msleep(20);
>>>>> Really? Does this have to be so ugly? Why isn't there just a simple
>>>>> lock someplace that is held during reset?
>>>>>
>>>>> I'm skeptical that all this strange looking stuff is properly locked
>>>>> and concurrency safe.
>>>> Hi, Jason
>>>>
>>>> The hns3 NIC driver notifies the hns RoCE driver to perform
>>>> reset related processing by calling the .reset_notify() interface
>>>> registered by the RoCE driver.
>>>>
>>>> There is a constraint on the hip08 chip, the NIC driver needs to
>>>> stop the flow before hardware startup reset, otherwise the chip
>>>> may hang up.
>>>>
>>>> We've also thought about using locks, but found using locks can
>>>> lead to more serious problems because of that restriction of the
>>>> chip.
>>>> If using locks here, reset processing may wait for uninstallation
>>>> to complete, this may lead that NIC driver fails to stop the flow
>>>> in time in the reset process, thus causing the chip to hang up.
>>> If you are sleeping then I'm sure a lock can be used instead, how
>>> would it be any different?
>> Hi, Jason
>>     If using locks here, the reset process may wait for uninstallation
>>     to complete, which may trigger the chip constraint and cause the
>>     chip to hang up.
>>     But if sleeping here, the reset process never waits for
>>     uninstallation to complete, so the chip constraint is not triggered.
> But how is this even right? If ops->ae_dev_resetting can change at any
> time, and you need to wait for it here, without locks can't it just
> change instantly after the if statement?
>
> I think it shows the concurrency & locking is not done right when I
> see loops reading shared data and spinning on them with msleep.
Hi, Jason

    Thanks for your comments.
    We will modify the related process in the hns NIC driver and remove,
    from the hns_roce_hw_v2_uninit_instance function, the check for the
    reset state and the wait for the reset to complete. We will send
    patch V2 for the rdma-next branch. Thanks
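
    To make the concern about the unlocked check concrete, here is a
    minimal userspace sketch (pthreads, not kernel code) of a flag that
    is only read and written under one lock, so it cannot change between
    the check and the teardown. All names (demo_dev, demo_reset,
    demo_remove, demo_run) are hypothetical and are not hns or hns3
    symbols. It is only an illustration, not the planned V2 change, and
    it deliberately ignores the hip08 constraint that the reset path
    must not wait on uninstallation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-device state; not the hns driver's layout. */
struct demo_dev {
	pthread_mutex_t reset_lock;	/* held for the whole reset */
	bool resetting;			/* only touched under reset_lock */
	bool instance_alive;
	int uninit_calls;		/* how many times teardown really ran */
};

/* Teardown helper; the caller must hold reset_lock. */
static void demo_uninit_locked(struct demo_dev *d)
{
	if (d->instance_alive) {
		d->instance_alive = false;
		d->uninit_calls++;
	}
}

/* Reset path: the flag is set and cleared inside the critical section. */
static void *demo_reset(void *arg)
{
	struct demo_dev *d = arg;

	pthread_mutex_lock(&d->reset_lock);
	d->resetting = true;
	demo_uninit_locked(d);
	d->resetting = false;
	pthread_mutex_unlock(&d->reset_lock);
	return NULL;
}

/* Remove path: taking the same lock replaces the msleep() polling loop. */
static void *demo_remove(void *arg)
{
	struct demo_dev *d = arg;

	pthread_mutex_lock(&d->reset_lock);
	/* Under the lock, a reset cannot be in progress. */
	assert(!d->resetting);
	demo_uninit_locked(d);
	pthread_mutex_unlock(&d->reset_lock);
	return NULL;
}

/* Runs both paths concurrently; returns how many times teardown ran. */
static int demo_run(void)
{
	struct demo_dev d = { .resetting = false, .instance_alive = true,
			      .uninit_calls = 0 };
	pthread_t t1, t2;

	pthread_mutex_init(&d.reset_lock, NULL);
	pthread_create(&t1, NULL, demo_reset, &d);
	pthread_create(&t2, NULL, demo_remove, &d);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&d.reset_lock);
	return d.uninit_calls;
}
```

    Whichever thread wins the lock performs the teardown and the other
    one sees instance_alive already cleared, so the teardown runs
    exactly once. The cost is the one discussed earlier in this thread:
    the reset path can block behind the remove path.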

    Regards
Xavier
> Jason


