From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
Leon Romanovsky <leon@kernel.org>
Cc: "haris.iqbal@ionos.com" <haris.iqbal@ionos.com>,
"jinpu.wang@ionos.com" <jinpu.wang@ionos.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH for-next 2/3] RDMA/rtrs: Fix rxe_dealloc_pd warning
Date: Fri, 14 Apr 2023 14:04:23 +0800 [thread overview]
Message-ID: <bfa3317b-656f-9a3f-9564-4dbb1bbef3e3@linux.dev> (raw)
In-Reply-To: <f71e67d8-119c-7ec5-fbc0-d37799ed82b6@fujitsu.com>
On 4/14/23 13:37, Zhijian Li (Fujitsu) wrote:
>
> On 14/04/2023 11:40, Guoqing Jiang wrote:
>>
>> On 4/13/23 16:12, Zhijian Li (Fujitsu) wrote:
>>> On 13/04/2023 15:35, Guoqing Jiang wrote:
>>>> Hi,
>>>>
>>>> I take a closer look today.
>>>>
>>>> On 4/12/23 09:15, Zhijian Li (Fujitsu) wrote:
>>>>> On 11/04/2023 20:26, Leon Romanovsky wrote:
>>>>>> On Tue, Apr 11, 2023 at 02:43:46AM +0000, Zhijian Li (Fujitsu) wrote:
>>>>>>> On 10/04/2023 21:10, Guoqing Jiang wrote:
>>>>>>>> On 4/10/23 20:08, Leon Romanovsky wrote:
>>>>>>>>> On Mon, Apr 10, 2023 at 06:43:03AM +0000, Li Zhijian wrote:
>>>>>>>>>> The warning occurs when destroying PD whose reference count is not zero.
>>>>>>>>>>
>>>>>>>>>> Precodition: clt_path->s.con_num is 2.
>>>>>>>>>> So 2 cm connection will be created as below:
>>>>>>>>>> CPU0 CPU1
>>>>>>>>>> init_conns { |
>>>>>>>>>> create_cm() // a. con[0] created |
>>>>>>>>>> | a'. rtrs_clt_rdma_cm_handler() {
>>>>>>>>>> | rtrs_rdma_addr_resolved()
>>>>>>>>>> | create_con_cq_qp(con); << con[0]
>>>>>>>>>> | }
>>>>>>>>>> | in this moment, refcnt of PD was increased to 2+
>>>> What do you mean "refcnt of PD"? usecnt in struct ib_pd or dev_ref.
>>> I mean usecnt in struct ib_pd
>>>
>>>
>>>
>>>>>>>>>> |
>>>>>>>>>> create_cm() // b. cid = 1, failed |
>>>>>>>>>> destroy_con_cq_qp() |
>>>>>>>>>> rtrs_ib_dev_put() |
>>>>>>>>>> dev_free() |
>>>>>>>>>> ib_dealloc_pd(dev->ib_pd) << PD |
>>>>>>>>>> is destroyed, but refcnt is |
>>>>>>>>>> still greater than 0 |
>>>> Assuming you mean "pd->usecnt". We only allocate pd in con[0] by rtrs_ib_dev_find_or_add,
>>>> if con[1] failed to create cm, then alloc_path_reqs -> ib_alloc_mr -> atomic_inc(&pd->usecnt)
>> The above can't be invoked, right?
>>
>>>> can't be triggered. Is there other places could increase the refcnt?
>>> Yes, when create a qp, it will also associate to this PD, that also mean refcnt of PD will be increased.
>>>
>>> When con[0](create_con_cq_qp) succeeded, refcnt of PD will be 2. and then when con[1] failed, since
>>> QP didn't create, refcnt of PD is still 2. con[1]'s cleanup will destroy the PD(ib_dealloc_pd) since dev_ref = 1, after that its
>>> refcnt is still 1.
>> I can see the path increase usecnt to 1.
>>
>> rtrs_cq_qp_create -> create_qp
>> -> rdma_create_qp
>> -> ib_create_qp
>> -> create_qp
>> -> ib_qp_usecnt_inc which increases pd->usecnt
>>
>> Where is another place to increase usecnt to 2?
> It should be
> ib_create_qp ...
> -> rxe_create_qp
> -> rxe_qp_from_init
> -> rxe_get(pd) <<< pd's refcnt will be increased.
Isn't rxe_get just increase elem->ref_cnt?
https://elixir.bootlin.com/linux/v6.3-rc6/source/drivers/infiniband/sw/rxe/rxe_pool.c#L240
>>>> Then what is the appropriate time to call destroy_con_cq_qp for this scenario?
>>>> Otherwise there could be memory leak.
>>> we must ensure QP in con[0] is closed before destroying the PD.
>>> Currently destroy_con_cq_qp() subroutine will close the opened QP first.
>> Let me try another way, with below change, rtrs_ib_dev_put can't be called
>> from destroy_con_cq_qp, right?
> Not really, con[0]->has_dev is true, so con[0]'s cleanup will call rtrs_ib_dev_put()
>
> Without this patch, when con[1] failed, con[1]'s cleanup will be called first. then call con[0]'s cleanup.
> After this change, con[1]'s cleanup will not call rtrs_ib_dev_put, but it will be called the later con[0]'s cleanup.
But rtrs_ib_dev_put relies on dev_ref, if con[1] returns earlier without
decrease dev_ref
(it is shared among connections), how rtrs_ib_dev_put can be called?
Thanks,
Guoqing
next prev parent reply other threads:[~2023-04-14 6:04 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-10 6:43 [PATCH for-next 0/3] rtrs bugfix and cleanups Li Zhijian
2023-04-10 6:43 ` [PATCH for-next 1/3] RDMA/rtrs: Remove duplicate cq_num assignment Li Zhijian
2023-04-10 13:09 ` Guoqing Jiang
2023-04-19 10:37 ` Jinpu Wang
2023-04-10 6:43 ` [PATCH for-next 2/3] RDMA/rtrs: Fix rxe_dealloc_pd warning Li Zhijian
2023-04-10 12:08 ` Leon Romanovsky
2023-04-10 13:10 ` Guoqing Jiang
2023-04-11 2:43 ` Zhijian Li (Fujitsu)
2023-04-11 12:26 ` Leon Romanovsky
2023-04-12 1:15 ` Zhijian Li (Fujitsu)
2023-04-13 7:35 ` Guoqing Jiang
2023-04-13 8:12 ` Zhijian Li (Fujitsu)
2023-04-13 13:24 ` Leon Romanovsky
2023-04-14 15:58 ` Zhu Yanjun
2023-04-17 2:18 ` Zhijian Li (Fujitsu)
2023-04-17 18:04 ` Leon Romanovsky
2023-04-18 7:04 ` Zhijian Li (Fujitsu)
2023-04-18 7:57 ` Leon Romanovsky
2023-04-19 9:53 ` Zhijian Li (Fujitsu)
2023-04-19 13:20 ` Jinpu Wang
2023-04-20 2:00 ` Zhijian Li (Fujitsu)
2023-04-21 1:38 ` Zhijian Li (Fujitsu)
2023-04-21 6:49 ` Zhijian Li (Fujitsu)
2023-04-21 7:05 ` Jinpu Wang
2023-04-14 3:40 ` Guoqing Jiang
2023-04-14 4:25 ` Bob Pearson
2023-04-14 5:37 ` Zhijian Li (Fujitsu)
2023-04-14 6:03 ` Jinpu Wang
2023-04-14 6:47 ` Zhijian Li (Fujitsu)
2023-04-14 6:04 ` Guoqing Jiang [this message]
2023-04-14 10:09 ` Zhijian Li (Fujitsu)
2023-04-17 3:08 ` Guoqing Jiang
2023-04-18 6:47 ` Zhijian Li (Fujitsu)
2023-04-10 6:43 ` [PATCH for-next 3/3] RDMA/rtrs: Avoid use-after-free in rtrs_clt_rdma_cm_handler Li Zhijian
2023-04-10 12:10 ` Leon Romanovsky
2023-04-10 13:13 ` Guoqing Jiang
2023-04-11 1:33 ` Zhijian Li (Fujitsu)
2023-04-12 1:15 ` Guoqing Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bfa3317b-656f-9a3f-9564-4dbb1bbef3e3@linux.dev \
--to=guoqing.jiang@linux.dev \
--cc=haris.iqbal@ionos.com \
--cc=jgg@ziepe.ca \
--cc=jinpu.wang@ionos.com \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lizhijian@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.