public inbox for linux-rdma@vger.kernel.org
From: Doug Ledford <dledford@redhat.com>
To: Lijun Ou <oulijun@huawei.com>, jgg@ziepe.ca
Cc: leon@kernel.org, linux-rdma@vger.kernel.org, linuxarm@huawei.com
Subject: Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
Date: Mon, 12 Aug 2019 11:29:14 -0400
Message-ID: <f49c56933205d90d82ffd3fa55a951843e22cda1.camel@redhat.com>
In-Reply-To: <1565343666-73193-4-git-send-email-oulijun@huawei.com>

On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
> From: Yangyang Li <liyangyang20@huawei.com>
> 
> Even if no response from hardware, make sure that qp related
> resources are completely released.
> 
> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 7a14f0b..0409851 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -4562,16 +4562,14 @@ static int
> hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
>  {
>  	struct hns_roce_cq *send_cq, *recv_cq;
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
> -	int ret;
> +	int ret = 0;
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state !=
> IB_QPS_RESET) {
>  		/* Modify qp to reset before destroying qp */
>  		ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
>  					    hr_qp->state, IB_QPS_RESET);
> -		if (ret) {
> +		if (ret)
>  			ibdev_err(ibdev, "modify QP to Reset
> failed.\n");
> -			return ret;
> -		}
>  	}
>  
>  	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
> @@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct
> hns_roce_dev *hr_dev,
>  		kfree(hr_qp->rq_inl_buf.wqe_list);
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata
> *udata)
> @@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp
> *ibqp, struct ib_udata *udata)
>  	int ret;
>  
>  	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
> -	if (ret) {
> +	if (ret)
>  		ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx
> failed(%d)\n",
>  			  hr_qp->qpn, ret);
> -		return ret;
> -	}
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
>  		kfree(hr_to_hr_sqp(hr_qp));

I don't know your hardware, but this patch sounds wrong/dangerous to me.
As long as the resources this card might access are allocated by the
kernel, you can't get random data corruption by the card writing to
memory used elsewhere in the kernel.  So if your card is not responding
to your requests to free the resources, it would seem safer to leak
those resources permanently than to free them and risk the card coming
back to life long enough to corrupt memory reallocated to some other
task.

I would consider this patch safe only if you can guarantee that the card
cannot fail one of your commands and then start working again later.
And if it's possible for the card to hang like this, shouldn't that
trigger a reset of the device?
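
To make the concern concrete, here is a minimal userspace sketch of the
alternative, not the hns driver's code. Every name in it (demo_dev,
demo_qp, demo_modify_qp_to_reset, demo_destroy_qp, demo_reset_done) is
hypothetical, and the "hardware" is just a boolean. The assumption is
that a buffer the device may still write to is held back (quarantined)
when the reset command fails, a device reset is requested, and the
buffer is freed only once that reset has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the hns driver. */
struct demo_dev {
	bool hw_responding;   /* simulated: does the device answer commands? */
	bool reset_requested; /* set when a full device reset has been asked for */
};

struct demo_qp {
	void *wqe_buf;        /* memory the device might still write to */
	bool quarantined;     /* true while the buffer must not be freed */
};

/* Simulated "modify QP to RESET" command: fails if the device is silent. */
static int demo_modify_qp_to_reset(const struct demo_dev *dev)
{
	return dev->hw_responding ? 0 : -1;
}

/*
 * Destroy path: free the buffer only if the device confirmed the QP is in
 * RESET. Otherwise keep the buffer allocated and ask for a device reset.
 */
static int demo_destroy_qp(struct demo_dev *dev, struct demo_qp *qp)
{
	int ret;

	assert(dev && qp);

	ret = demo_modify_qp_to_reset(dev);
	if (ret) {
		/* No confirmation from the device: hold the memory back. */
		qp->quarantined = true;
		dev->reset_requested = true;
		return ret;
	}

	free(qp->wqe_buf);
	qp->wqe_buf = NULL;
	return 0;
}

/* Called once the simulated device reset has finished. */
static void demo_reset_done(struct demo_dev *dev, struct demo_qp *qp)
{
	assert(dev && qp);

	dev->hw_responding = true;
	dev->reset_requested = false;

	/* After the reset the device can no longer touch the buffer. */
	if (qp->quarantined) {
		free(qp->wqe_buf);
		qp->wqe_buf = NULL;
		qp->quarantined = false;
	}
}
```

In this sketch the failing path costs some memory until the reset
finishes, but nothing is handed back to the allocator while the device
might still write to it. Whether the real hardware can guarantee that
after a reset is exactly the open question above.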

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
