public inbox for linux-kernel@vger.kernel.org
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Junxian Huang <huangjunxian6@hisilicon.com>,
	jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org, linuxarm@huawei.com,
	linux-kernel@vger.kernel.org, tangchengchang@huawei.com
Subject: Re: [PATCH v2 for-rc 2/5] RDMA/hns: Fix flush cqe error when racing with destroy qp
Date: Thu, 24 Oct 2024 19:48:41 +0200
Message-ID: <38b31782-6ab1-43b0-9e6e-6fc06b0060e2@linux.dev>
In-Reply-To: <20241024124000.2931869-3-huangjunxian6@hisilicon.com>

On 2024/10/24 14:39, Junxian Huang wrote:
> From: wenglianfa <wenglianfa@huawei.com>
> 
> QP needs to be modified to IB_QPS_ERROR to trigger HW flush cqe. But
> when this process races with destroy qp, the destroy-qp process may
> modify the QP to IB_QPS_RESET first. In this case flush cqe will fail
> since it is invalid to modify qp from IB_QPS_RESET to IB_QPS_ERROR.
> 
> Add a lock and a bit flag to make sure that any pending flush cqe work
> completes first and that no new work is queued afterwards.
> 
> Fixes: ffd541d45726 ("RDMA/hns: Add the workqueue framework for flush cqe handler")
> Signed-off-by: wenglianfa <wenglianfa@huawei.com>
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
>   drivers/infiniband/hw/hns/hns_roce_device.h |  2 ++
>   drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  7 +++++++
>   drivers/infiniband/hw/hns/hns_roce_qp.c     | 15 +++++++++++++--
>   3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 73c78005901e..9b51d5a1533f 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -593,6 +593,7 @@ struct hns_roce_dev;
>   
>   enum {
>   	HNS_ROCE_FLUSH_FLAG = 0,
> +	HNS_ROCE_STOP_FLUSH_FLAG = 1,
>   };
>   
>   struct hns_roce_work {
> @@ -656,6 +657,7 @@ struct hns_roce_qp {
>   	enum hns_roce_cong_type	cong_type;
>   	u8			tc_mode;
>   	u8			priority;
> +	spinlock_t flush_lock;
>   };
>   
>   struct hns_roce_ib_iboe {
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index e85c450e1809..aa42c5a9b254 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -5598,8 +5598,15 @@ int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
>   {
>   	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
>   	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
> +	unsigned long flags;
>   	int ret;
>   
> +	/* Make sure flush_cqe() is completed */
> +	spin_lock_irqsave(&hr_qp->flush_lock, flags);
> +	set_bit(HNS_ROCE_STOP_FLUSH_FLAG, &hr_qp->flush_flag);
> +	spin_unlock_irqrestore(&hr_qp->flush_lock, flags);
> +	flush_work(&hr_qp->flush_work.work);
> +
>   	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
>   	if (ret)
>   		ibdev_err(&hr_dev->ib_dev,
> diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
> index dcaa370d4a26..2ad03ecdbf8e 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_qp.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
> @@ -90,11 +90,18 @@ static void flush_work_handle(struct work_struct *work)
>   void init_flush_work(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp)
>   {
>   	struct hns_roce_work *flush_work = &hr_qp->flush_work;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&hr_qp->flush_lock, flags);
> +	/* Exit directly after destroy_qp() */
> +	if (test_bit(HNS_ROCE_STOP_FLUSH_FLAG, &hr_qp->flush_flag)) {
> +		spin_unlock_irqrestore(&hr_qp->flush_lock, flags);
> +		return;
> +	}
>   
> -	flush_work->hr_dev = hr_dev;
> -	INIT_WORK(&flush_work->work, flush_work_handle);
>   	refcount_inc(&hr_qp->refcount);
>   	queue_work(hr_dev->irq_workq, &flush_work->work);
> +	spin_unlock_irqrestore(&hr_qp->flush_lock, flags);
>   }
>   
>   void flush_cqe(struct hns_roce_dev *dev, struct hns_roce_qp *qp)
> @@ -1140,6 +1147,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
>   				     struct ib_udata *udata,
>   				     struct hns_roce_qp *hr_qp)
>   {
> +	struct hns_roce_work *flush_work = &hr_qp->flush_work;
>   	struct hns_roce_ib_create_qp_resp resp = {};
>   	struct ib_device *ibdev = &hr_dev->ib_dev;
>   	struct hns_roce_ib_create_qp ucmd = {};
> @@ -1148,9 +1156,12 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
>   	mutex_init(&hr_qp->mutex);
>   	spin_lock_init(&hr_qp->sq.lock);
>   	spin_lock_init(&hr_qp->rq.lock);
> +	spin_lock_init(&hr_qp->flush_lock);

Thanks a lot. I am fine with this spin_lock_init(&hr_qp->flush_lock);
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

>   
>   	hr_qp->state = IB_QPS_RESET;
>   	hr_qp->flush_flag = 0;
> +	flush_work->hr_dev = hr_dev;
> +	INIT_WORK(&flush_work->work, flush_work_handle);
>   
>   	if (init_attr->create_flags)
>   		return -EOPNOTSUPP;


Thread overview: 10+ messages
2024-10-24 12:39 [PATCH v2 for-rc 0/5] RDMA/hns: Bugfixes Junxian Huang
2024-10-24 12:39 ` [PATCH v2 for-rc 1/5] RDMA/hns: Fix an AEQE overflow error caused by untimely update of eq_db_ci Junxian Huang
2024-10-24 12:39 ` [PATCH v2 for-rc 2/5] RDMA/hns: Fix flush cqe error when racing with destroy qp Junxian Huang
2024-10-24 17:48   ` Zhu Yanjun [this message]
2024-10-24 12:39 ` [PATCH v2 for-rc 3/5] RDMA/hns: Modify debugfs name Junxian Huang
2024-10-30 12:12   ` Leon Romanovsky
2024-10-31  9:16     ` Junxian Huang
2024-10-24 12:39 ` [PATCH v2 for-rc 4/5] RDMA/hns: Use dev_* printings in hem code instead of ibdev_* Junxian Huang
2024-10-24 12:40 ` [PATCH v2 for-rc 5/5] RDMA/hns: Fix cpu stuck caused by printings during reset Junxian Huang
2024-10-30 12:14 ` [PATCH v2 for-rc 0/5] RDMA/hns: Bugfixes Leon Romanovsky
