Linux RDMA and InfiniBand development
* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Jason Gunthorpe @ 2023-11-06 14:13 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Zhu Yanjun, zyjzyj2000@gmail.com, leon@kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	rpearsonhpe@gmail.com, Daisuke Matsuda (Fujitsu),
	bvanassche@acm.org, yi.zhang@redhat.com
In-Reply-To: <a256a01d-1572-427a-80df-46f2079af967@fujitsu.com>

On Mon, Nov 06, 2023 at 04:07:19AM +0000, Zhijian Li (Fujitsu) wrote:

> I'm sorry, I'm not familiar with the Linux MM subsystem. It seems it's safe/correct to access
> address/memory across pages starting from the address returned by
> kmap_local_page(page).

kmap_local_page() gives you a PAGE_SIZE window only

Jason
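
Editorial sketch, not part of the thread's patches: a userspace illustration of what a PAGE_SIZE-only mapping window implies for a copy that crosses page boundaries. The names sketch_map_page(), sketch_copy_from_pages() and SKETCH_PAGE_SIZE are hypothetical; sketch_map_page() only stands in for kmap_local_page(), and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel's PAGE_SIZE is arch-specific. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for kmap_local_page(): the returned pointer is
 * treated as valid only for the SKETCH_PAGE_SIZE bytes of page `idx`.
 */
static unsigned char *sketch_map_page(unsigned char *backing, size_t idx)
{
	return backing + idx * SKETCH_PAGE_SIZE;
}

/*
 * Copy `len` bytes starting at byte `offset` of a multi-page buffer,
 * never touching more than one page per mapping. Returns the number of
 * per-page mappings that were needed.
 */
static size_t sketch_copy_from_pages(unsigned char *dst, unsigned char *backing,
				     size_t offset, size_t len)
{
	size_t maps = 0;

	while (len > 0) {
		size_t idx = offset / SKETCH_PAGE_SIZE;
		size_t in_page = offset % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;
		unsigned char *va;

		if (chunk > len)
			chunk = len;

		va = sketch_map_page(backing, idx);
		/* Stay inside the single-page window of this mapping. */
		assert(in_page + chunk <= SKETCH_PAGE_SIZE);
		memcpy(dst, va + in_page, chunk);

		dst += chunk;
		offset += chunk;
		len -= chunk;
		maps++;
	}
	return maps;
}
```

In this sketch, a 5000-byte copy that starts 4000 bytes into the buffer needs three separate mappings, because it touches three pages.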


* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Zhu Yanjun @ 2023-11-06 13:58 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu), zyjzyj2000@gmail.com, jgg@ziepe.ca,
	leon@kernel.org, linux-rdma@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Daisuke Matsuda (Fujitsu), bvanassche@acm.org,
	yi.zhang@redhat.com
In-Reply-To: <a256a01d-1572-427a-80df-46f2079af967@fujitsu.com>

On 2023/11/6 12:07, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 03/11/2023 21:00, Zhu Yanjun wrote:
>> On 2023/11/3 17:55, Li Zhijian wrote:
>>> I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
>>> think we can make them better.
>>>
>>> Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
>>>             Almost nothing changed from V1.
>>> Patch3-5: cleanups # newly added
>>> Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
>>>
>>> My bad arm64 machine often hangs when running blktests, even though I use the
>>> default siw driver.
>>>
>>> - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
>>
>> Zhijian
>>
>> Please read the whole discussion about this problem carefully. You will find a lot of valuable suggestions, especially suggestions from Jason.
> 
> Okay, I will read it again. If you can tell me which thread, that would be better.
> 
> 
>>
>>   From the whole discussion, it seems that the root cause is very clear.
>> We need to fix this problem. Please do not send this kind of commit again.
>>
> 
> Let's think about what our goal is first.
> 
> - 1) Fix the panic[1] and only support PAGE_SIZE MR
> - 2) support PAGE_SIZE aligned MR
> - 3) support any page_size MR.
> 
> I'm sorry, I'm not familiar with the Linux MM subsystem. It seems it's safe/correct to access
> address/memory across pages starting from the address returned by kmap_local_page(page).
> In other words, 2) is already natively supported, right?

Yes. Please read the comments from Jason, Leon and Bart. They shared a 
lot of good advice. From them, we can know the root cause and how to fix 
this problem.

Good Luck.

Zhu Yanjun

> 
> I'm totally confused now.
> 
> 
> 
>> Zhu Yanjun
>>
>>>
>>> [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>>
>>> Li Zhijian (6):
>>>     RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
>>>     RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
>>>     RDMA/rxe: remove unused rxe_mr.page_shift
>>>     RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
>>>       page_list
>>>     RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
>>>     RDMA/rxe: Support PAGE_SIZE aligned MR
>>>
>>>    drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
>>>    drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>>>    drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
>>>    3 files changed, 48 insertions(+), 43 deletions(-)
>>>



* Re: [PATCH for-next 3/6] RDMA/rxe: Register IP mcast address
From: Zhu Yanjun @ 2023-11-06 13:26 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma
In-Reply-To: <a0b998f6-7c03-466e-b163-3317f7a5576c@gmail.com>


On 2023/11/6 4:19, Bob Pearson wrote:
>
>
> On 11/4/23 07:42, Zhu Yanjun wrote:
>
>>
>> Using reverse fir tree, a.k.a. reverse Christmas tree or reverse XMAS
>> tree, for variable declarations isn't strictly required, though it is
>> still preferred.
>>
>> Zhu Yanjun
>>
>>
> Yeah. I usually follow that style for new code (except if there are
> dependencies) but mostly add new variables at the end of the list
> together because it makes the patch simpler to read. At least it
> does for me. If you care, I am happy to fix this.

Yes. It is good to fix it.

Also, your commits add mcast address support, so I think you
should have a test case in rdma-core to verify these commits.

Can you share the test case on the rdma mailing list? ^_^

Zhu Yanjun

>
> Bob


* [recipe build #3629124] of ~linux-rdma rdma-core-daily in xenial: Dependency wait
From: noreply @ 2023-11-06 12:31 UTC (permalink / raw)
  To: Linux RDMA

 * State: Dependency wait
 * Recipe: linux-rdma/rdma-core-daily
 * Archive: ~linux-rdma/ubuntu/rdma-core-daily
 * Distroseries: xenial
 * Duration: 1 minute
 * Build Log: https://launchpad.net/~linux-rdma/+archive/ubuntu/rdma-core-daily/+recipebuild/3629124/+files/buildlog.txt.gz
 * Upload Log: 
 * Builder: https://launchpad.net/builders/lcy02-amd64-048

-- 
https://launchpad.net/~linux-rdma/+archive/ubuntu/rdma-core-daily/+recipebuild/3629124
Your team Linux RDMA is the requester of the build.



* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Zhijian Li (Fujitsu) @ 2023-11-06  9:55 UTC (permalink / raw)
  To: Greg Sword
  Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	rpearsonhpe@gmail.com, Daisuke Matsuda (Fujitsu),
	bvanassche@acm.org, yi.zhang@redhat.com
In-Reply-To: <CAEz=LcuKpkTfGZ44Kf3YamK=roa-OC=j47ZcHeLsuFe+FqOnaA@mail.gmail.com>



On 06/11/2023 17:35, Greg Sword wrote:
> On Mon, Nov 6, 2023 at 4:01 PM Zhijian Li (Fujitsu)
> <lizhijian@fujitsu.com> wrote:
>>
>>
>>
>> Many thanks for all your feedback.
>>
>> On 03/11/2023 17:55, Li Zhijian wrote:
>>> I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
>>> think we can make them better.
>>>
>>> Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
>>>             Almost nothing changed from V1.
>>
>> Quote from Jason:
>> "
>>> The concept was that the xarray could store anything larger than
>>> PAGE_SIZE and the entry would point at the first struct page of the
>>> contiguous chunk
>>>
>>> That looks like it is right, or at least close to right, so lets try
>>> to keep it
>> "
>>
>>
>> It seems it's okay to access address/memory across pages on RXE even though
>> we only map the first page.
> 
> Do you really run tests in your test environment? Do you have a test environment?



> Do you really reproduce this problem in your test environment?
I did the test; the kernel panic[1] is gone after patches 1-2.


Thanks
Zhijian


> Your patches do not actually work. Please do not send these rubbish patches out.
> 
>>
>> That also means PAGE_SIZE aligned MR is already supported, so only checking
>> `if (IS_ALIGNED(page_size, PAGE_SIZE))` is sufficient, right?
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index f54042e9aeb2..3755e530e6dc 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>           struct rxe_mr *mr = to_rmr(ibmr);
>>           unsigned int page_size = mr_page_size(mr);
>>
>> +       if (!IS_ALIGNED(page_size, PAGE_SIZE)) {
>> +               rxe_err_mr(mr, "FIXME...\n");
>> +               return -EINVAL;
>> +       }
>> +
>>           mr->nbuf = 0;
>>           mr->page_shift = ilog2(page_size);
>>           mr->page_mask = ~((u64)page_size - 1);
>> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
>> index d2f57ead78ad..b1cf1e1c0ce1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_param.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
>> @@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
>>    /* default/initial rxe device parameter settings */
>>    enum rxe_device_param {
>>           RXE_MAX_MR_SIZE                 = -1ull,
>> -       RXE_PAGE_SIZE_CAP               = 0xfffff000,
>> +       RXE_PAGE_SIZE_CAP               = 0xffffffff - (PAGE_SIZE - 1),
>>           RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
>>           RXE_DEVICE_CAP_FLAGS            = IB_DEVICE_BAD_PKEY_CNTR
>>                                           | IB_DEVICE_BAD_QKEY_CNTR
>>
>>
>> * minor cleanup will be done after this.
>>
>> Thanks
>> Zhijian
>>
>>> Patch3-5: cleanups # newly added
>>> Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
>>>
>>> My bad arm64 machine often hangs when running blktests, even though I use the
>>> default siw driver.
>>>
>>> - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
>>>
>>> [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>>
>>> Li Zhijian (6):
>>>     RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
>>>     RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
>>>     RDMA/rxe: remove unused rxe_mr.page_shift
>>>     RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
>>>       page_list
>>>     RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
>>>     RDMA/rxe: Support PAGE_SIZE aligned MR
>>>
>>>    drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
>>>    drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>>>    drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
>>>    3 files changed, 48 insertions(+), 43 deletions(-)
>>>


* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Greg Sword @ 2023-11-06  9:35 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	rpearsonhpe@gmail.com, Daisuke Matsuda (Fujitsu),
	bvanassche@acm.org, yi.zhang@redhat.com
In-Reply-To: <27a06d26-4443-4349-801e-c09da0d57884@fujitsu.com>

On Mon, Nov 6, 2023 at 4:01 PM Zhijian Li (Fujitsu)
<lizhijian@fujitsu.com> wrote:
>
>
>
> Many thanks for all your feedback.
>
> On 03/11/2023 17:55, Li Zhijian wrote:
> > I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
> > think we can make them better.
> >
> > Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
> >            Almost nothing changed from V1.
>
> Quote from Jason:
> "
> > The concept was that the xarray could store anything larger than
> > PAGE_SIZE and the entry would point at the first struct page of the
> > contiguous chunk
> >
> > That looks like it is right, or at least close to right, so lets try
> > to keep it
> "
>
>
> It seems it's okay to access address/memory across pages on RXE even though
> we only map the first page.

Do you really run tests in your test environment? Do you have a test environment?
Do you really reproduce this problem in your test environment?
Your patches do not actually work. Please do not send these rubbish patches out.

>
> That also means PAGE_SIZE aligned MR is already supported, so only checking
> `if (IS_ALIGNED(page_size, PAGE_SIZE))` is sufficient, right?
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..3755e530e6dc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>          struct rxe_mr *mr = to_rmr(ibmr);
>          unsigned int page_size = mr_page_size(mr);
>
> +       if (!IS_ALIGNED(page_size, PAGE_SIZE)) {
> +               rxe_err_mr(mr, "FIXME...\n");
> +               return -EINVAL;
> +       }
> +
>          mr->nbuf = 0;
>          mr->page_shift = ilog2(page_size);
>          mr->page_mask = ~((u64)page_size - 1);
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index d2f57ead78ad..b1cf1e1c0ce1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
>   /* default/initial rxe device parameter settings */
>   enum rxe_device_param {
>          RXE_MAX_MR_SIZE                 = -1ull,
> -       RXE_PAGE_SIZE_CAP               = 0xfffff000,
> +       RXE_PAGE_SIZE_CAP               = 0xffffffff - (PAGE_SIZE - 1),
>          RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
>          RXE_DEVICE_CAP_FLAGS            = IB_DEVICE_BAD_PKEY_CNTR
>                                          | IB_DEVICE_BAD_QKEY_CNTR
>
>
> * minor cleanup will be done after this.
>
> Thanks
> Zhijian
>
> > Patch3-5: cleanups # newly added
> > Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
> >
> > My bad arm64 machine often hangs when running blktests, even though I use the
> > default siw driver.
> >
> > - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
> >
> > [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> >
> > Li Zhijian (6):
> >    RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
> >    RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
> >    RDMA/rxe: remove unused rxe_mr.page_shift
> >    RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
> >      page_list
> >    RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
> >    RDMA/rxe: Support PAGE_SIZE aligned MR
> >
> >   drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
> >   drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
> >   drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
> >   3 files changed, 48 insertions(+), 43 deletions(-)
> >


* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Zhijian Li (Fujitsu) @ 2023-11-06  7:59 UTC (permalink / raw)
  To: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Daisuke Matsuda (Fujitsu), bvanassche@acm.org,
	yi.zhang@redhat.com
In-Reply-To: <20231103095549.490744-1-lizhijian@fujitsu.com>



Many thanks for all your feedback.

On 03/11/2023 17:55, Li Zhijian wrote:
> I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
> think we can make them better.
> 
> Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
>            Almost nothing changed from V1.

Quote from Jason:
"
> The concept was that the xarray could store anything larger than
> PAGE_SIZE and the entry would point at the first struct page of the
> contiguous chunk
> 
> That looks like it is right, or at least close to right, so lets try
> to keep it
"


It seems it's okay to access address/memory across pages on RXE even though
we only map the first page.

That also means PAGE_SIZE aligned MR is already supported, so only checking
`if (IS_ALIGNED(page_size, PAGE_SIZE))` is sufficient, right?

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..3755e530e6dc 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -234,6 +234,12 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
         struct rxe_mr *mr = to_rmr(ibmr);
         unsigned int page_size = mr_page_size(mr);
  
+       if (!IS_ALIGNED(page_size, PAGE_SIZE)) {
+               rxe_err_mr(mr, "FIXME...\n");
+               return -EINVAL;
+       }
+
         mr->nbuf = 0;
         mr->page_shift = ilog2(page_size);
         mr->page_mask = ~((u64)page_size - 1);
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index d2f57ead78ad..b1cf1e1c0ce1 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
  /* default/initial rxe device parameter settings */
  enum rxe_device_param {
         RXE_MAX_MR_SIZE                 = -1ull,
-       RXE_PAGE_SIZE_CAP               = 0xfffff000,
+       RXE_PAGE_SIZE_CAP               = 0xffffffff - (PAGE_SIZE - 1),
         RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
         RXE_DEVICE_CAP_FLAGS            = IB_DEVICE_BAD_PKEY_CNTR
                                         | IB_DEVICE_BAD_QKEY_CNTR


* minor cleanup will be done after this.

Thanks
Zhijian
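
Editorial sketch, not part of the patch: a small userspace check of the two expressions in the diff above. It assumes, as I understand the verbs API, that the page size capability is a bitmask of supported page sizes; sketch_is_aligned() and sketch_page_size_cap() are hypothetical names, and the page size is passed as a parameter instead of using the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's IS_ALIGNED() for a power-of-two alignment. */
static bool sketch_is_aligned(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x & (align - 1)) == 0;
}

/* The expression proposed in the diff, with the page size as a parameter. */
static uint32_t sketch_page_size_cap(uint32_t page_size)
{
	return 0xffffffffu - (page_size - 1);
}
```

With a 4096-byte page this yields the old constant 0xfffff000; with a 65536-byte page it yields 0xffff0000, so sizes below the page size drop out of the mask.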

> Patch3-5: cleanups # newly added
> Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
> 
> My bad arm64 machine often hangs when running blktests, even though I use the
> default siw driver.
> 
> - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
> 
> [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> 
> Li Zhijian (6):
>    RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
>    RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
>    RDMA/rxe: remove unused rxe_mr.page_shift
>    RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
>      page_list
>    RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
>    RDMA/rxe: Support PAGE_SIZE aligned MR
> 
>   drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
>   drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
>   3 files changed, 48 insertions(+), 43 deletions(-)
> 


* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Zhijian Li (Fujitsu) @ 2023-11-06  4:07 UTC (permalink / raw)
  To: Zhu Yanjun, zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Daisuke Matsuda (Fujitsu), bvanassche@acm.org,
	yi.zhang@redhat.com
In-Reply-To: <d838620b-51df-4216-864e-1c793dae7721@linux.dev>



On 03/11/2023 21:00, Zhu Yanjun wrote:
> On 2023/11/3 17:55, Li Zhijian wrote:
>> I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
>> think we can make them better.
>>
>> Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
>>            Almost nothing changed from V1.
>> Patch3-5: cleanups # newly added
>> Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
>>
>> My bad arm64 machine often hangs when running blktests, even though I use the
>> default siw driver.
>>
>> - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
> 
> Zhijian
> 
> Please read the whole discussion about this problem carefully. You will find a lot of valuable suggestions, especially suggestions from Jason.

Okay, I will read it again. If you can tell me which thread, that would be better.


> 
>  From the whole discussion, it seems that the root cause is very clear.
> We need to fix this problem. Please do not send this kind of commit again.
> 

Let's think about what our goal is first.

- 1) Fix the panic[1] and only support PAGE_SIZE MR
- 2) support PAGE_SIZE aligned MR
- 3) support any page_size MR.

I'm sorry, I'm not familiar with the Linux MM subsystem. It seems it's safe/correct to access
address/memory across pages starting from the address returned by kmap_local_page(page).
In other words, 2) is already natively supported, right?

I'm totally confused now.



> Zhu Yanjun
> 
>>
>> [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>
>> Li Zhijian (6):
>>    RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
>>    RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
>>    RDMA/rxe: remove unused rxe_mr.page_shift
>>    RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
>>      page_list
>>    RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
>>    RDMA/rxe: Support PAGE_SIZE aligned MR
>>
>>   drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
>>   drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
>>   3 files changed, 48 insertions(+), 43 deletions(-)
>>
> 


* Re: [PATCH RFC V2 0/6] rxe_map_mr_sg() fix cleanup and refactor
From: Zhijian Li (Fujitsu) @ 2023-11-06  3:46 UTC (permalink / raw)
  To: Greg Sword
  Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	rpearsonhpe@gmail.com, Daisuke Matsuda (Fujitsu),
	bvanassche@acm.org, yi.zhang@redhat.com
In-Reply-To: <CAEz=LcvrztPxSZj5uiaDe-mdC0qD4km07d8aFuVPOb5dgnHNug@mail.gmail.com>



On 03/11/2023 18:17, Greg Sword wrote:
> On Fri, Nov 3, 2023 at 5:58 PM Li Zhijian <lizhijian@fujitsu.com> wrote:
>>
>> I did not carry over the Reviewed-by tags for patches 1-2 this time, since I
>> think we can make them better.
>>
>> Patch1-2: Fix the kernel panic[1], which also helps make srp work again.
>>            Almost nothing changed from V1.
>> Patch3-5: cleanups # newly added
>> Patch6: make RXE support PAGE_SIZE aligned MRs # newly added, but not fully tested
> 
> Do some work. Do not use these rubbish patches to waste our time.

So sorry about this. Of course, any other proposals are welcome.




> 
>>
>> My bad arm64 machine often hangs when running blktests, even though I use the
>> default siw driver.
>>
>> - nvme and ULPs (rtrs, iser) that always register 4K MRs are still not supported.
>>
>> [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>
>> Li Zhijian (6):
>>    RDMA/rxe: RDMA/rxe: don't allow registering !PAGE_SIZE mr
>>    RDMA/rxe: set RXE_PAGE_SIZE_CAP to PAGE_SIZE
>>    RDMA/rxe: remove unused rxe_mr.page_shift
>>    RDMA/rxe: Use PAGE_SIZE and PAGE_SHIFT to extract address from
>>      page_list
>>    RDMA/rxe: cleanup rxe_mr.{page_size,page_shift}
>>    RDMA/rxe: Support PAGE_SIZE aligned MR
>>
>>   drivers/infiniband/sw/rxe/rxe_mr.c    | 80 ++++++++++++++++-----------
>>   drivers/infiniband/sw/rxe/rxe_param.h |  2 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.h |  9 ---
>>   3 files changed, 48 insertions(+), 43 deletions(-)
>>
>> --
>> 2.41.0
>>


* Re: [PATCH RFC V2 6/6] RDMA/rxe: Support PAGE_SIZE aligned MR
From: Zhijian Li (Fujitsu) @ 2023-11-06  3:07 UTC (permalink / raw)
  To: Bart Van Assche, zyjzyj2000@gmail.com, jgg@ziepe.ca,
	leon@kernel.org, linux-rdma@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Daisuke Matsuda (Fujitsu), yi.zhang@redhat.com
In-Reply-To: <d2ccef1e-2bea-4596-8787-8d2491ce0278@acm.org>



On 03/11/2023 23:04, Bart Van Assche wrote:
> 
> On 11/3/23 02:55, Li Zhijian wrote:
>> -    return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
>> +    for_each_sg(sgl, sg, sg_nents, i) {
>> +        u64 dma_addr = sg_dma_address(sg) + sg_offset;
>> +        unsigned int dma_len = sg_dma_len(sg) - sg_offset;
>> +        u64 end_dma_addr = dma_addr + dma_len;
>> +        u64 page_addr = dma_addr & PAGE_MASK;
>> +
>> +        if (sg_dma_len(sg) == 0) {
>> +            rxe_dbg_mr(mr, "empty SGE\n");
>> +            return -EINVAL;
>> +        }
>> +        do {
>> +            int ret = rxe_store_page(mr, page_addr);
>> +            if (ret)
>> +                return ret;
>> +
>> +            page_addr += PAGE_SIZE;
>> +        } while (page_addr < end_dma_addr);
>> +        sg_offset = 0;
>> +    }
>> +
>> +    return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset_p, rxe_set_page);
>>   }
> 
> Is this change necessary? There is already a loop in ib_sg_to_pages()
> that splits SG entries that are larger than mr->page_size into entries
> with size mr->page_size.

I see.

My thought was that we can only safely access a PAGE_SIZE window of memory, [page_va, page_va + PAGE_SIZE),
starting at the address returned by kmap_local_page(page).
However, when mr->page_size is larger than PAGE_SIZE, we may access the following pages without mapping them.

Thanks
Zhijian
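
Editorial sketch, not the code from the patch: a userspace version of the per-page walk that the quoted do/while loop performs over one SG entry. It records the page-aligned address of every page touched by a DMA range instead of calling rxe_store_page(). The names sketch_split_into_pages(), SKETCH_PAGE_SIZE and SKETCH_PAGE_MASK are hypothetical, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Write the page-aligned address of every page touched by
 * [dma_addr, dma_addr + dma_len) into `pages` (at most `max` entries).
 * Returns the number of pages touched, or 0 for an empty range.
 */
static size_t sketch_split_into_pages(uint64_t dma_addr, uint64_t dma_len,
				      uint64_t *pages, size_t max)
{
	uint64_t end = dma_addr + dma_len;
	uint64_t page_addr = dma_addr & SKETCH_PAGE_MASK;
	size_t n = 0;

	assert(pages != NULL || max == 0);

	if (dma_len == 0)
		return 0;

	do {
		if (n < max)
			pages[n] = page_addr;
		n++;
		page_addr += SKETCH_PAGE_SIZE;
	} while (page_addr < end);

	return n;
}
```

For example, a range starting at 0x10f00 with length 0x2200 touches four pages (0x10000 through 0x13000), so a consumer that maps one page at a time would need four mappings for it.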


* Re: [PATCH for-next 3/6] RDMA/rxe: Register IP mcast address
From: Bob Pearson @ 2023-11-05 20:19 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, linux-rdma
In-Reply-To: <30513a47-68c6-410f-bbfb-09211f07b082@linux.dev>



On 11/4/23 07:42, Zhu Yanjun wrote:

> 
> Using reverse fir tree, a.k.a. reverse Christmas tree or reverse XMAS
> tree, for variable declarations isn't strictly required, though it is
> still preferred.
> 
> Zhu Yanjun
> 
> 
Yeah. I usually follow that style for new code (except if there are
dependencies) but mostly add new variables at the end of the list
together because it makes the patch simpler to read. At least it
does for me. If you care, I am happy to fix this.

Bob
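
Editorial sketch, not code from the patch under review: a tiny example of the "reverse Christmas tree" declaration order discussed above, where local declarations are listed from the longest line to the shortest. The function sketch_checksum() is hypothetical and exists only to show the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the first `count` bytes of `buf`. */
static uint32_t sketch_checksum(const unsigned char *buf, size_t count)
{
	/* Declarations ordered longest line first, shortest last. */
	const unsigned char *end = buf + count;
	const unsigned char *cursor = buf;
	uint32_t sum = 0;

	assert(buf != NULL || count == 0);

	while (cursor < end)
		sum += *cursor++;

	return sum;
}
```

As noted in the thread, the ordering gives way when one initializer depends on an earlier declaration.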


* Re: [PATCH v2] IB: rework memlock limit handling code
From: Leon Romanovsky @ 2023-11-05 10:21 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Maxim Samoylov, Bernard Metzler, Guoqing Jiang,
	linux-rdma@vger.kernel.org, Jason Gunthorpe, Christian Benvenuti,
	Vadim Fedorenko
In-Reply-To: <daf453fa-c834-9cf1-0ddc-04abdfa37abb@cornelisnetworks.com>

On Thu, Nov 02, 2023 at 04:54:22PM -0400, Dennis Dalessandro wrote:
> On 11/2/23 8:32 AM, Leon Romanovsky wrote:
> >>
> >> So, as of 31.10.2023 I still see the siw_umem_get() call used in
> >> the linux-rdma repo in the "for-next" branch.
> > 
> > I hoped to hear some feedback from Bernard and Dennis.
> > 
> 
> Sorry about that. I thought I did respond about qib.

Dennis, qib probably needs to use ib_umem_get too.

> 
> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>


* Re: Re: [PATCH v2] IB: rework memlock limit handling code
From: Leon Romanovsky @ 2023-11-05 10:20 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Maxim Samoylov, Dennis Dalessandro, Guoqing Jiang,
	linux-rdma@vger.kernel.org, Jason Gunthorpe, Christian Benvenuti,
	Vadim Fedorenko
In-Reply-To: <SN7PR15MB575594FE0EB633F0A7878A0C99A5A@SN7PR15MB5755.namprd15.prod.outlook.com>

On Fri, Nov 03, 2023 at 10:18:51AM +0000, Bernard Metzler wrote:
> 
> 
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, November 2, 2023 1:32 PM
> > To: Maxim Samoylov <max7255@meta.com>; Bernard Metzler
> > <BMT@zurich.ibm.com>; Dennis Dalessandro
> > <dennis.dalessandro@cornelisnetworks.com>
> > Cc: Guoqing Jiang <guoqing.jiang@linux.dev>; linux-rdma@vger.kernel.org;
> > Jason Gunthorpe <jgg@ziepe.ca>; Christian Benvenuti <benve@cisco.com>;
> > Vadim Fedorenko <vadim.fedorenko@linux.dev>
> > Subject: [EXTERNAL] Re: [PATCH v2] IB: rework memlock limit handling code
> > 
> > On Tue, Oct 31, 2023 at 01:30:27PM +0000, Maxim Samoylov wrote:
> > > On 23/10/2023 07:52, Leon Romanovsky wrote:
> > > > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote:
> > > >>
> > > >>
> > > >> On 10/15/23 17:19, Leon Romanovsky wrote:
> > > >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote:
> > > >>>> This patch provides the uniform handling for RLIM_INFINITY value
> > > >>>> across the infiniband/rdma subsystem.
> > > >>>>
> > > >>>> Currently in some cases the infinity constant is treated
> > > >>>> as an actual limit value, which could be misleading.
> > > >>>>
> > > >>>> Let's also provide the single helper to check against process
> > > >>>> MEMLOCK limit while registering user memory region mappings.
> > > >>>>
> > > >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com>
> > > >>>> ---
> > > >>>>
> > > >>>> v1 -> v2: rewritten commit message, rebased on recent upstream
> > > >>>>
> > > >>>>    drivers/infiniband/core/umem.c             |  7 ++-----
> > > >>>>    drivers/infiniband/hw/qib/qib_user_pages.c |  7 +++----
> > > >>>>    drivers/infiniband/hw/usnic/usnic_uiom.c   |  6 ++----
> > > >>>>    drivers/infiniband/sw/siw/siw_mem.c        |  6 +++---
> > > >>>>    drivers/infiniband/sw/siw/siw_verbs.c      | 23 ++++++++++------------
> > > >>>>    include/rdma/ib_umem.h                     | 11 +++++++++++
> > > >>>>    6 files changed, 31 insertions(+), 29 deletions(-)
> > > >>> <...>
> > > >>>
> > > >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
> > > >>>>    	struct siw_umem *umem = NULL;
> > > >>>>    	struct siw_ureq_reg_mr ureq;
> > > >>>>    	struct siw_device *sdev = to_siw_dev(pd->device);
> > > >>>> -
> > > >>>> -	unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK);
> > > >>>> +	unsigned long num_pages =
> > > >>>> +		(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
> > > >>>>    	int rv;
> > > >>>>    	siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n",
> > > >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
> > > >>>>    		rv = -EINVAL;
> > > >>>>    		goto err_out;
> > > >>>>    	}
> > > >>>> -	if (mem_limit != RLIM_INFINITY) {
> > > >>>> -		unsigned long num_pages =
> > > >>>> -			(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
> > > >>>> -		mem_limit >>= PAGE_SHIFT;
> > > >>>> -		if (num_pages > mem_limit - current->mm->locked_vm) {
> > > >>>> -			siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n",
> > > >>>> -				   num_pages, mem_limit,
> > > >>>> -				   current->mm->locked_vm);
> > > >>>> -			rv = -ENOMEM;
> > > >>>> -			goto err_out;
> > > >>>> -		}
> > > >>>> +	if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) {
> > > >>>> +		siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n",
> > > >>>> +				num_pages, rlimit(RLIMIT_MEMLOCK),
> > > >>>> +				current->mm->locked_vm);
> > > >>>> +		rv = -ENOMEM;
> > > >>>> +		goto err_out;
> > > >>>>    	}
> > > >>> Sorry for the late response, but why does this hunk exist in the first place?
> 
> 
> If using ib_umem_get() for siw, as I sent as for-next
> patch yesterday, we can drop that logic completely, since we now
> have it in ib_umem_get(). It was only there because of not
> using ib_umem_get().
> 
> I can resend my pending for-next patch as a patch to current,
> also removing memlock check (I simply forgot to remove it).
> Not sure if it would obsolete this patch here completely.
> Leon, please advise.

We are in the middle of the merge window, so we won't take any patches except
bug fixes.

So please resend your patch after the merge window ends.

Thanks

> 
> Otherwise:
> 
> Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
> 
> 
> > > >>>
> > >
> > > Trailing newline, will definitely drop it.
> > >
> > > >>>> +
> > > >>>>    	umem = siw_umem_get(start, len, ib_access_writable(rights));
> > > >>> This should be ib_umem_get().
> > > >>
> > > >> IMO, it deserves a separate patch, and replacing siw_umem_get with ib_umem_get
> > > >> is not straightforward given siw_mem has two types of memory (pbl and umem).
> > > >
> > > > The thing is that once you convince yourself that SIW should use ib_umem_get(),
> > > > the same question will arise for other parts of this patch where
> > > > ib_umem_check_rlimit_memlock() is used.
> > > >
> > > > And if we eliminate them all, there won't be a need for this new API call at all.
> > > >
> > > > Thanks
> > > >
> > >
> > > Hi!
> > >
> > > > So, as of 31.10.2023 I still see the siw_umem_get() call used in
> > > > the linux-rdma repo in the "for-next" branch.
> > 
> > I hoped to hear some feedback from Bernard and Dennis.
> > 
> > >
> > > AFAIU this helper call is used only in a single place and could
> > > potentially be replaced with ib_umem_get() as Leon suggests.
> > >
> > > But should we perform it right inside this memlock helper patch?
> > >
> > > I can submit later another patch with siw_umem_get() replaced
> > > if necessary.
> > >
> > >
> > > >>
> > > >> Thanks,
> > > >> Guoqing
> > >
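
Editorial sketch, not the helper from the patch: a userspace illustration of the two pieces of arithmetic in the quoted siw hunk, namely the page count for a user region and a memlock check that treats an infinite limit as "no limit" instead of as a number. All names (sketch_num_pages(), sketch_check_memlock(), the SKETCH_* macros) are hypothetical, a 4096-byte page is assumed, and the real ib_umem_check_rlimit_memlock() may check more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1ull << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
/* Stand-in for RLIM_INFINITY; the real constant comes from platform headers. */
#define SKETCH_RLIM_INFINITY UINT64_MAX

/* Pages needed to pin `len` bytes starting at user address `start`. */
static uint64_t sketch_num_pages(uint64_t start, uint64_t len)
{
	/* Bytes from the start of the first page to the end of the region. */
	uint64_t span = len + (start & ~SKETCH_PAGE_MASK);

	assert(len > 0); /* callers are assumed to reject empty regions first */

	/* Equivalent of PAGE_ALIGN(span) >> PAGE_SHIFT. */
	return ((span + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK) >> SKETCH_PAGE_SHIFT;
}

/*
 * True if `total_pages` (already locked plus newly requested) fits under
 * the limit given in bytes. An infinite limit is never compared numerically.
 */
static bool sketch_check_memlock(uint64_t total_pages, uint64_t limit_bytes)
{
	if (limit_bytes == SKETCH_RLIM_INFINITY)
		return true;
	return total_pages <= (limit_bytes >> SKETCH_PAGE_SHIFT);
}
```

A 32-byte region that starts 16 bytes before a page boundary counts as two pages here, which is why the offset within the first page is added before rounding up.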


* [recipe build #3628316] of ~linux-rdma rdma-core-daily in xenial: Dependency wait
From: noreply @ 2023-11-04 18:32 UTC (permalink / raw)
  To: Linux RDMA

 * State: Dependency wait
 * Recipe: linux-rdma/rdma-core-daily
 * Archive: ~linux-rdma/ubuntu/rdma-core-daily
 * Distroseries: xenial
 * Duration: 2 minutes
 * Build Log: https://launchpad.net/~linux-rdma/+archive/ubuntu/rdma-core-daily/+recipebuild/3628316/+files/buildlog.txt.gz
 * Upload Log: 
 * Builder: https://launchpad.net/builders/lcy02-amd64-016

-- 
https://launchpad.net/~linux-rdma/+archive/ubuntu/rdma-core-daily/+recipebuild/3628316
Your team Linux RDMA is the requester of the build.


^ permalink raw reply

* Re: [PATCH for-next 3/6] RDMA/rxe: Register IP mcast address
From: Zhu Yanjun @ 2023-11-04 12:42 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma
In-Reply-To: <20231103204324.9606-4-rpearsonhpe@gmail.com>


On 2023/11/4 4:43, Bob Pearson wrote:
> Add code to rxe_mcast_add() and rxe_mcast_del() to register/deregister
> the IP multicast address. This is required for multicast traffic to
> reach the rxe driver.
>
> Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_mcast.c | 110 +++++++++++++++++++++-----
>   drivers/infiniband/sw/rxe/rxe_net.c   |   2 +-
>   drivers/infiniband/sw/rxe/rxe_net.h   |   1 +
>   drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
>   4 files changed, 93 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 86cc2e18a7fd..ec757b955979 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mcast.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -19,38 +19,107 @@
>    * mcast packets in the rxe receive path.
>    */
>   
> +#include <linux/igmp.h>
> +
>   #include "rxe.h"
>   
> -/**
> - * rxe_mcast_add - add multicast address to rxe device
> - * @rxe: rxe device object
> - * @mgid: multicast address as a gid
> - *
> - * Returns 0 on success else an error
> - */
> -static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
> +/* register mcast IP and MAC addresses with net stack */
> +static int rxe_mcast_add6(struct rxe_dev *rxe, union ib_gid *mgid)
>   {
>   	unsigned char ll_addr[ETH_ALEN];
> +	struct in6_addr *addr6 = (struct in6_addr *)mgid;
> +	int err;


Using reverse fir tree (a.k.a. reverse Christmas tree or reverse XMAS
tree) ordering for variable declarations isn't strictly required, though
it is still preferred.
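
To illustrate the ordering meant here, a minimal user-space sketch
follows. It is not the rxe code: union demo_gid, DEMO_ETH_ALEN and
demo_mcast_map6() are hypothetical stand-ins, and the only point is that
the local declarations are sorted longest line first.

```c
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical stand-in for the kernel's union ib_gid; illustration only. */
union demo_gid {
	unsigned char raw[16];
};

/*
 * Map an IPv6 multicast GID to its Ethernet multicast address
 * (33:33 followed by the low 32 bits of the address).
 * Local declarations are ordered longest line first.
 */
static int demo_mcast_map6(const union demo_gid *mgid, unsigned char *out)
{
	const unsigned char *low32 = mgid->raw + 12;
	unsigned char ll_addr[DEMO_ETH_ALEN];
	int err;

	/* IPv6 multicast addresses start with 0xff */
	err = (mgid->raw[0] == 0xff) ? 0 : -1;
	if (err)
		return err;

	ll_addr[0] = 0x33;
	ll_addr[1] = 0x33;
	memcpy(ll_addr + 2, low32, 4);
	memcpy(out, ll_addr, DEMO_ETH_ALEN);

	return 0;
}
```

Applied to the quoted hunk, this would put the addr6 declaration first,
then ll_addr, then err.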

Zhu Yanjun


> +
> +	rtnl_lock();
> +	err = ipv6_sock_mc_join(recv_sockets.sk6->sk, rxe->ndev->ifindex,
> +				addr6);
> +	rtnl_unlock();
> +	if (err && err != -EADDRINUSE)
> +		goto err_out;
>   
>   	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
> +	err = dev_mc_add(rxe->ndev, ll_addr);
> +	if (err)
> +		goto err_drop;
> +
> +	return 0;
>   
> -	return dev_mc_add(rxe->ndev, ll_addr);
> +err_drop:
> +	ipv6_sock_mc_drop(recv_sockets.sk6->sk, rxe->ndev->ifindex, addr6);
> +err_out:
> +	return err;
>   }
>   
> -/**
> - * rxe_mcast_del - delete multicast address from rxe device
> - * @rxe: rxe device object
> - * @mgid: multicast address as a gid
> - *
> - * Returns 0 on success else an error
> - */
> -static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
> +static int rxe_mcast_add(struct rxe_mcg *mcg)
>   {
> +	struct rxe_dev *rxe = mcg->rxe;
> +	union ib_gid *mgid = &mcg->mgid;
> +	struct ip_mreqn imr = {};
>   	unsigned char ll_addr[ETH_ALEN];
> +	int err;
> +
> +	if (mcg->is_ipv6)
> +		return rxe_mcast_add6(rxe, mgid);
> +
> +	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
> +	imr.imr_ifindex = rxe->ndev->ifindex;
> +	rtnl_lock();
> +	err = ip_mc_join_group(recv_sockets.sk4->sk, &imr);
> +	rtnl_unlock();
> +	if (err && err != -EADDRINUSE)
> +		goto err_out;
> +
> +	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
> +	err = dev_mc_add(rxe->ndev, ll_addr);
> +	if (err)
> +		goto err_leave;
> +
> +	return 0;
> +
> +err_leave:
> +	ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
> +err_out:
> +	return err;
> +}
> +
> +/* deregister mcast IP and MAC addresses with net stack */
> +static int rxe_mcast_del6(struct rxe_dev *rxe, union ib_gid *mgid)
> +{
> +	unsigned char ll_addr[ETH_ALEN];
> +	int err, err2;
>   
>   	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
> +	err = dev_mc_del(rxe->ndev, ll_addr);
> +
> +	rtnl_lock();
> +	err2 = ipv6_sock_mc_drop(recv_sockets.sk6->sk,
> +			rxe->ndev->ifindex, (struct in6_addr *)mgid);
> +	rtnl_unlock();
> +
> +	return err ?: err2;
> +}
> +
> +static int rxe_mcast_del(struct rxe_mcg *mcg)
> +{
> +	struct rxe_dev *rxe = mcg->rxe;
> +	union ib_gid *mgid = &mcg->mgid;
> +	struct ip_mreqn imr = {};
> +	unsigned char ll_addr[ETH_ALEN];
> +	int err, err2;
> +
> +	if (mcg->is_ipv6)
> +		return rxe_mcast_del6(rxe, mgid);
> +
> +	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
> +	imr.imr_ifindex = rxe->ndev->ifindex;
> +	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
> +	err = dev_mc_del(rxe->ndev, ll_addr);
> +
> +	rtnl_lock();
> +	err2 = ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
> +	rtnl_unlock();
>   
> -	return dev_mc_del(rxe->ndev, ll_addr);
> +	return err ?: err2;
>   }
>   
>   /**
> @@ -164,6 +233,7 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
>   {
>   	kref_init(&mcg->ref_cnt);
>   	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
> +	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
>   	INIT_LIST_HEAD(&mcg->qp_list);
>   	mcg->rxe = rxe;
>   
> @@ -225,7 +295,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
>   	spin_unlock_bh(&rxe->mcg_lock);
>   
>   	/* add mcast address outside of lock */
> -	err = rxe_mcast_add(rxe, mgid);
> +	err = rxe_mcast_add(mcg);
>   	if (!err)
>   		return mcg;
>   
> @@ -273,7 +343,7 @@ static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
>   static void rxe_destroy_mcg(struct rxe_mcg *mcg)
>   {
>   	/* delete mcast address outside of lock */
> -	rxe_mcast_del(mcg->rxe, &mcg->mgid);
> +	rxe_mcast_del(mcg);
>   
>   	spin_lock_bh(&mcg->rxe->mcg_lock);
>   	__rxe_destroy_mcg(mcg);
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 2fad56fc95e7..36617d07fddf 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -18,7 +18,7 @@
>   #include "rxe_net.h"
>   #include "rxe_loc.h"
>   
> -static struct rxe_recv_sockets recv_sockets;
> +struct rxe_recv_sockets recv_sockets;
>   
>   static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
>   					 struct net_device *ndev,
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 45d80d00f86b..89cee7d5340f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -15,6 +15,7 @@ struct rxe_recv_sockets {
>   	struct socket *sk4;
>   	struct socket *sk6;
>   };
> +extern struct rxe_recv_sockets recv_sockets;
>   
>   int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index ccb9d19ffe8a..7be9e6232dd9 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -352,6 +352,7 @@ struct rxe_mcg {
>   	atomic_t		qp_num;
>   	u32			qkey;
>   	u16			pkey;
> +	bool			is_ipv6;
>   };
>   
>   struct rxe_mca {
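
For readers following the IPv4 branch of the quoted rxe_mcast_add(): it
takes the group address from the last four bytes of the GID and derives
the Ethernet multicast address from it. Below is a minimal user-space
sketch of that mapping, assuming the usual IPv4-mapped layout
(::ffff:a.b.c.d) and the standard 01:00:5e plus low 23 bits rule. The
demo_* names are hypothetical and this is not the kernel implementation.

```c
#include <stdbool.h>
#include <string.h>

/* True if the 16-byte GID has the ::ffff:a.b.c.d (IPv4-mapped) layout. */
static bool demo_gid_is_v4mapped(const unsigned char gid[16])
{
	static const unsigned char prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/*
 * Build the Ethernet multicast address for the IPv4 group stored in
 * gid[12..15]: 01:00:5e followed by the low 23 bits of the address.
 */
static void demo_ip_eth_mc_map(const unsigned char gid[16],
			       unsigned char mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = gid[13] & 0x7f;	/* top bit of this byte is dropped */
	mac[4] = gid[14];
	mac[5] = gid[15];
}
```

With this sketch, the group 239.129.2.3 carried as ::ffff:239.129.2.3
maps to 01:00:5e:01:02:03.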

^ permalink raw reply

* Re: [PATCH for-next 2/6] RDMA/rxe: Handle loopback of mcast packets
From: Zhu Yanjun @ 2023-11-04 12:30 UTC (permalink / raw)
  To: Bob Pearson, jgg, linux-rdma
In-Reply-To: <20231103204324.9606-3-rpearsonhpe@gmail.com>

On 2023/11/4 4:43, Bob Pearson wrote:
> Add a mask bit to indicate that a multicast packet has been locally
> sent and use to set the correct qpn for multicast packets.
> 
> Add code to rxe_xmit_packet() to correctly handle multicast packets
> which must be sent on the wire and also duplicated to any local qps
> which may belong the multicast group, but not including the sender.
> 
> Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_av.c     |  7 +++++++
>   drivers/infiniband/sw/rxe/rxe_loc.h    |  1 +
>   drivers/infiniband/sw/rxe/rxe_net.c    | 25 ++++++++++++++++++++++++-
>   drivers/infiniband/sw/rxe/rxe_opcode.h |  2 +-
>   drivers/infiniband/sw/rxe/rxe_recv.c   |  4 ++++
>   drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++++++--
>   6 files changed, 46 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
> index 4ac17b8def28..022173eb5d75 100644
> --- a/drivers/infiniband/sw/rxe/rxe_av.c
> +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> @@ -7,6 +7,13 @@
>   #include "rxe.h"
>   #include "rxe_loc.h"
>   
> +bool rxe_is_mcast_av(struct rxe_av *av)
> +{
> +	struct in6_addr *daddr = (struct in6_addr *)av->grh.dgid.raw;
> +
> +	return rdma_is_multicast_addr(daddr);
> +}
> +
>   void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
>   {
>   	rxe_av_from_attr(rdma_ah_get_port_num(attr), av, attr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 3d2504a0ae56..62b2b25903fc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -8,6 +8,7 @@
>   #define RXE_LOC_H
>   
>   /* rxe_av.c */
> +bool rxe_is_mcast_av(struct rxe_av *av);
>   void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
>   int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
>   void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index cd59666158b1..2fad56fc95e7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -412,6 +412,27 @@ static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>   	return 0;
>   }
>   
> +/* for a multicast packet must send remotely and looback to any local qps
> + * that may belong to the mcast group
> + */

https://www.kernel.org/doc/html/v4.15/process/coding-style.html
Please follow the preferred style for long (multi-line) comments in the 
above link.
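
As an illustration, the general multi-line comment form described in that
document opens with a bare "/*" line and starts every following line with
" * ". The sketch below is a user-space model only: demo_loop_and_send()
and its parameters are hypothetical and merely mirror the error handling
of the quoted hunk.

```c
/*
 * For a multicast packet, loop a copy back to local QPs in the
 * group and also send the original on the wire. A local error
 * takes precedence over the wire result, as in the quoted hunk.
 */
static int demo_loop_and_send(int have_local_groups, int loop_err,
			      int send_err)
{
	int loc_err = 0;
	int err;

	/* Only attempt the local copy when some group is registered */
	if (have_local_groups)
		loc_err = loop_err;

	err = send_err;
	if (loc_err)
		err = loc_err;

	return err;
}
```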

Zhu Yanjun

> +static int rxe_loop_and_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
> +{
> +	struct sk_buff *cskb;
> +	int err, loc_err = 0;
> +
> +	if (atomic_read(&pkt->rxe->mcg_num)) {
> +		loc_err = -ENOMEM;
> +		cskb = skb_clone(skb, GFP_KERNEL);
> +		if (cskb)
> +			loc_err = rxe_loopback(cskb, pkt);
> +	}
> +
> +	err = rxe_send(skb, pkt);
> +	if (loc_err)
> +		err = loc_err;
> +	return err;
> +}
> +
>   int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>   		    struct sk_buff *skb)
>   {
> @@ -431,7 +452,9 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>   
>   	rxe_icrc_generate(skb, pkt);
>   
> -	if (pkt->mask & RXE_LOOPBACK_MASK)
> +	if (pkt->mask & RXE_MCAST_MASK)
> +		err = rxe_loop_and_send(skb, pkt);
> +	else if (pkt->mask & RXE_LOOPBACK_MASK)
>   		err = rxe_loopback(skb, pkt);
>   	else
>   		err = rxe_send(skb, pkt);
> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
> index 5686b691d6b8..c4cf672ea26d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_opcode.h
> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
> @@ -85,7 +85,7 @@ enum rxe_hdr_mask {
>   	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
>   
>   	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
> -
> +	RXE_MCAST_MASK		= BIT(NUM_HDR_TYPES + 13),
>   	RXE_ATOMIC_WRITE_MASK   = BIT(NUM_HDR_TYPES + 14),
>   
>   	RXE_READ_OR_ATOMIC_MASK	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 5861e4244049..7153de0799fc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -217,6 +217,10 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>   	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
>   		qp = mca->qp;
>   
> +		/* don't reply packet to sender if locally sent */
> +		if (pkt->mask & RXE_MCAST_MASK && qp_num(qp) == deth_sqp(pkt))
> +			continue;
> +
>   		/* validate qp for incoming packet */
>   		err = check_type_state(rxe, pkt, qp);
>   		if (err)
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index d8c41fd626a9..599bec88cb54 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -442,8 +442,12 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
>   			(pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
>   			(RXE_WRITE_MASK | RXE_IMMDT_MASK));
>   
> -	qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
> -					 qp->attr.dest_qp_num;
> +	if (pkt->mask & RXE_MCAST_MASK)
> +		qp_num = IB_MULTICAST_QPN;
> +	else if (pkt->mask & RXE_DETH_MASK)
> +		qp_num = ibwr->wr.ud.remote_qpn;
> +	else
> +		qp_num = qp->attr.dest_qp_num;
>   
>   	ack_req = ((pkt->mask & RXE_END_MASK) ||
>   		(qp->req.noack_pkts++ > RXE_MAX_PKT_PER_ACK));
> @@ -809,6 +813,9 @@ int rxe_requester(struct rxe_qp *qp)
>   		goto err;
>   	}
>   
> +	if (rxe_is_mcast_av(av))
> +		pkt.mask |= RXE_MCAST_MASK;
> +
>   	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>   	if (unlikely(!skb)) {
>   		rxe_dbg_qp(qp, "Failed allocating skb\n");


^ permalink raw reply

* [rdma:for-next] BUILD SUCCESS 2ef422f063b74adcc4a4a9004b0a87bb55e0a836
From: kernel test robot @ 2023-11-04 11:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
branch HEAD: 2ef422f063b74adcc4a4a9004b0a87bb55e0a836  IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF

Warning ids grouped by kconfigs:

gcc_recent_errors
|-- arm-randconfig-002-20231101
|   `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files
|-- arm64-defconfig
|   `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files
|-- i386-allmodconfig
|   `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files
|-- i386-allnoconfig
|   `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files
|-- i386-debian-10.3
|   `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files
`-- i386-defconfig
    `-- tools-testing-selftests-kvm-.gitignore:warning:ignored-by-one-of-the-.gitignore-files

elapsed time: 5047m

configs tested: 143
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha                             allnoconfig   gcc  
alpha                            allyesconfig   gcc  
alpha                               defconfig   gcc  
arc                              allmodconfig   gcc  
arc                               allnoconfig   gcc  
arc                              allyesconfig   gcc  
arc                                 defconfig   gcc  
arc                   randconfig-001-20231101   gcc  
arc                   randconfig-002-20231101   gcc  
arm                              allmodconfig   gcc  
arm                               allnoconfig   gcc  
arm                              allyesconfig   gcc  
arm                                 defconfig   gcc  
arm                   randconfig-001-20231101   gcc  
arm                   randconfig-002-20231101   gcc  
arm                   randconfig-003-20231101   gcc  
arm                   randconfig-004-20231101   gcc  
arm64                               defconfig   gcc  
arm64                 randconfig-001-20231101   gcc  
arm64                 randconfig-002-20231101   gcc  
arm64                 randconfig-003-20231101   gcc  
arm64                 randconfig-004-20231101   gcc  
csky                                defconfig   gcc  
csky                  randconfig-001-20231101   gcc  
csky                  randconfig-002-20231101   gcc  
i386                             allmodconfig   gcc  
i386                              allnoconfig   gcc  
i386                              debian-10.3   gcc  
i386                                defconfig   gcc  
i386                  randconfig-001-20231101   gcc  
i386                  randconfig-002-20231101   gcc  
i386                  randconfig-003-20231101   gcc  
i386                  randconfig-004-20231101   gcc  
i386                  randconfig-005-20231101   gcc  
i386                  randconfig-006-20231101   gcc  
i386                  randconfig-011-20231101   gcc  
i386                  randconfig-012-20231101   gcc  
i386                  randconfig-013-20231101   gcc  
i386                  randconfig-014-20231101   gcc  
i386                  randconfig-015-20231101   gcc  
i386                  randconfig-016-20231101   gcc  
loongarch                        allmodconfig   gcc  
loongarch                         allnoconfig   gcc  
loongarch                        allyesconfig   gcc  
loongarch                           defconfig   gcc  
loongarch             randconfig-001-20231101   gcc  
loongarch             randconfig-002-20231101   gcc  
m68k                             allmodconfig   gcc  
m68k                              allnoconfig   gcc  
m68k                             allyesconfig   gcc  
m68k                                defconfig   gcc  
microblaze                       allmodconfig   gcc  
microblaze                        allnoconfig   gcc  
microblaze                       allyesconfig   gcc  
microblaze                          defconfig   gcc  
mips                             allmodconfig   gcc  
mips                              allnoconfig   gcc  
mips                             allyesconfig   gcc  
nios2                             allnoconfig   gcc  
nios2                            allyesconfig   gcc  
nios2                               defconfig   gcc  
nios2                 randconfig-001-20231101   gcc  
nios2                 randconfig-002-20231101   gcc  
openrisc                         allmodconfig   gcc  
openrisc                          allnoconfig   gcc  
openrisc                         allyesconfig   gcc  
openrisc                            defconfig   gcc  
parisc                           allmodconfig   gcc  
parisc                            allnoconfig   gcc  
parisc                           allyesconfig   gcc  
parisc                              defconfig   gcc  
parisc                randconfig-001-20231101   gcc  
parisc                randconfig-002-20231101   gcc  
parisc64                            defconfig   gcc  
powerpc                          allmodconfig   gcc  
powerpc                           allnoconfig   gcc  
powerpc                          allyesconfig   gcc  
powerpc               randconfig-001-20231101   gcc  
powerpc               randconfig-002-20231101   gcc  
powerpc               randconfig-003-20231101   gcc  
powerpc64             randconfig-001-20231101   gcc  
powerpc64             randconfig-002-20231101   gcc  
powerpc64             randconfig-003-20231101   gcc  
riscv                            allmodconfig   gcc  
riscv                             allnoconfig   gcc  
riscv                            allyesconfig   gcc  
riscv                               defconfig   gcc  
riscv                 randconfig-001-20231101   gcc  
riscv                 randconfig-002-20231101   gcc  
riscv                          rv32_defconfig   gcc  
s390                             allmodconfig   gcc  
s390                              allnoconfig   gcc  
s390                             allyesconfig   gcc  
s390                                defconfig   gcc  
s390                  randconfig-001-20231101   gcc  
s390                  randconfig-002-20231101   gcc  
sh                               allmodconfig   gcc  
sh                                allnoconfig   gcc  
sh                               allyesconfig   gcc  
sh                                  defconfig   gcc  
sparc                            allmodconfig   gcc  
sparc                             allnoconfig   gcc  
sparc                            allyesconfig   gcc  
sparc                               defconfig   gcc  
sparc64                          allmodconfig   gcc  
sparc64                          allyesconfig   gcc  
sparc64                             defconfig   gcc  
um                               allmodconfig   clang
um                                allnoconfig   clang
um                               allyesconfig   clang
um                                  defconfig   gcc  
um                             i386_defconfig   gcc  
um                           x86_64_defconfig   gcc  
x86_64                            allnoconfig   gcc  
x86_64                           allyesconfig   gcc  
x86_64       buildonly-randconfig-001-20231101   gcc  
x86_64       buildonly-randconfig-002-20231101   gcc  
x86_64       buildonly-randconfig-003-20231101   gcc  
x86_64       buildonly-randconfig-004-20231101   gcc  
x86_64       buildonly-randconfig-005-20231101   gcc  
x86_64       buildonly-randconfig-006-20231101   gcc  
x86_64                              defconfig   gcc  
x86_64                randconfig-001-20231101   gcc  
x86_64                randconfig-002-20231101   gcc  
x86_64                randconfig-003-20231101   gcc  
x86_64                randconfig-004-20231101   gcc  
x86_64                randconfig-005-20231101   gcc  
x86_64                randconfig-006-20231101   gcc  
x86_64                randconfig-011-20231101   gcc  
x86_64                randconfig-012-20231101   gcc  
x86_64                randconfig-013-20231101   gcc  
x86_64                randconfig-014-20231101   gcc  
x86_64                randconfig-015-20231101   gcc  
x86_64                randconfig-016-20231101   gcc  
x86_64                randconfig-071-20231102   gcc  
x86_64                randconfig-072-20231102   gcc  
x86_64                randconfig-073-20231102   gcc  
x86_64                randconfig-074-20231102   gcc  
x86_64                randconfig-075-20231102   gcc  
x86_64                randconfig-076-20231102   gcc  
x86_64                          rhel-8.3-rust   clang
x86_64                               rhel-8.3   gcc  
xtensa                            allnoconfig   gcc  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [PATCH for-next v2] RDMA/siw: Use ib_umem_get() to pin user pages
From: Bernard Metzler @ 2023-11-04  7:56 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, max7255, dennis.dalessandro, guoqing.jiang, benve,
	vadim.fedorenko, Bernard Metzler

Abandon siw private code to pin user pages during user
memory registration, but use ib_umem_get() instead.
This will help maintaining the driver in case of changes
to the memory subsystem.

Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
v1 -> v2: remove RLIMIT memlock check logic, now done in ib_umem_get()
---
 drivers/infiniband/sw/siw/siw.h       |   3 +-
 drivers/infiniband/sw/siw/siw_mem.c   | 109 ++++++++++----------------
 drivers/infiniband/sw/siw/siw_mem.h   |   5 +-
 drivers/infiniband/sw/siw/siw_verbs.c |  19 +----
 4 files changed, 47 insertions(+), 89 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 58dddb143b9f..6930586109d4 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -121,11 +121,10 @@ struct siw_page_chunk {
 };
 
 struct siw_umem {
+	struct ib_umem *base_mem;
 	struct siw_page_chunk *page_chunk;
 	int num_pages;
-	bool writable;
 	u64 fp_addr; /* First page base address */
-	struct mm_struct *owning_mm;
 };
 
 struct siw_pble {
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index e6e25f15567d..6cc44df2ece5 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -5,6 +5,7 @@
 
 #include <linux/gfp.h>
 #include <rdma/ib_verbs.h>
+#include <rdma/ib_umem.h>
 #include <linux/dma-mapping.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
@@ -60,28 +61,17 @@ struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index)
 	return NULL;
 }
 
-static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages,
-			   bool dirty)
+void siw_umem_release(struct siw_umem *umem)
 {
-	unpin_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
-}
-
-void siw_umem_release(struct siw_umem *umem, bool dirty)
-{
-	struct mm_struct *mm_s = umem->owning_mm;
 	int i, num_pages = umem->num_pages;
 
-	for (i = 0; num_pages; i++) {
-		int to_free = min_t(int, PAGES_PER_CHUNK, num_pages);
+	if (umem->base_mem)
+		ib_umem_release(umem->base_mem);
 
-		siw_free_plist(&umem->page_chunk[i], to_free,
-			       umem->writable && dirty);
+	for (i = 0; num_pages > 0; i++) {
 		kfree(umem->page_chunk[i].plist);
-		num_pages -= to_free;
+		num_pages -= PAGES_PER_CHUNK;
 	}
-	atomic64_sub(umem->num_pages, &mm_s->pinned_vm);
-
-	mmdrop(mm_s);
 	kfree(umem->page_chunk);
 	kfree(umem);
 }
@@ -145,7 +135,7 @@ void siw_free_mem(struct kref *ref)
 
 	if (!mem->is_mw && mem->mem_obj) {
 		if (mem->is_pbl == 0)
-			siw_umem_release(mem->umem, true);
+			siw_umem_release(mem->umem);
 		else
 			kfree(mem->pbl);
 	}
@@ -362,18 +352,16 @@ struct siw_pbl *siw_pbl_alloc(u32 num_buf)
 	return pbl;
 }
 
-struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
+struct siw_umem *siw_umem_get(struct ib_device *base_dev, u64 start,
+			      u64 len, int rights)
 {
 	struct siw_umem *umem;
-	struct mm_struct *mm_s;
+	struct ib_umem *base_mem;
+	struct sg_page_iter sg_iter;
+	struct sg_table *sgt;
 	u64 first_page_va;
-	unsigned long mlock_limit;
-	unsigned int foll_flags = FOLL_LONGTERM;
 	int num_pages, num_chunks, i, rv = 0;
 
-	if (!can_do_mlock())
-		return ERR_PTR(-EPERM);
-
 	if (!len)
 		return ERR_PTR(-EINVAL);
 
@@ -385,65 +373,50 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
 
-	mm_s = current->mm;
-	umem->owning_mm = mm_s;
-	umem->writable = writable;
-
-	mmgrab(mm_s);
-
-	if (writable)
-		foll_flags |= FOLL_WRITE;
-
-	mmap_read_lock(mm_s);
-
-	mlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (atomic64_add_return(num_pages, &mm_s->pinned_vm) > mlock_limit) {
-		rv = -ENOMEM;
-		goto out_sem_up;
-	}
-	umem->fp_addr = first_page_va;
-
 	umem->page_chunk =
 		kcalloc(num_chunks, sizeof(struct siw_page_chunk), GFP_KERNEL);
 	if (!umem->page_chunk) {
 		rv = -ENOMEM;
-		goto out_sem_up;
+		goto err_out;
 	}
-	for (i = 0; num_pages; i++) {
+	base_mem = ib_umem_get(base_dev, start, len, rights);
+	if (IS_ERR(base_mem)) {
+		rv = PTR_ERR(base_mem);
+		siw_dbg(base_dev, "Cannot pin user memory: %d\n", rv);
+		goto err_out;
+	}
+	umem->fp_addr = first_page_va;
+	umem->base_mem = base_mem;
+
+	sgt = &base_mem->sgt_append.sgt;
+	__sg_page_iter_start(&sg_iter, sgt->sgl, sgt->orig_nents, 0);
+
+	if (!__sg_page_iter_next(&sg_iter)) {
+		rv = -EINVAL;
+		goto err_out;
+	}
+	for (i = 0; num_pages > 0; i++) {
 		int nents = min_t(int, num_pages, PAGES_PER_CHUNK);
 		struct page **plist =
 			kcalloc(nents, sizeof(struct page *), GFP_KERNEL);
 
 		if (!plist) {
 			rv = -ENOMEM;
-			goto out_sem_up;
+			goto err_out;
 		}
 		umem->page_chunk[i].plist = plist;
-		while (nents) {
-			rv = pin_user_pages(first_page_va, nents, foll_flags,
-					    plist);
-			if (rv < 0)
-				goto out_sem_up;
-
-			umem->num_pages += rv;
-			first_page_va += rv * PAGE_SIZE;
-			plist += rv;
-			nents -= rv;
-			num_pages -= rv;
+		while (nents--) {
+			*plist = sg_page_iter_page(&sg_iter);
+			umem->num_pages++;
+			num_pages--;
+			plist++;
+			if (!__sg_page_iter_next(&sg_iter))
+				break;
 		}
 	}
-out_sem_up:
-	mmap_read_unlock(mm_s);
-
-	if (rv > 0)
-		return umem;
-
-	/* Adjust accounting for pages not pinned */
-	if (num_pages)
-		atomic64_sub(num_pages, &mm_s->pinned_vm);
-
-	siw_umem_release(umem, false);
+	return umem;
+err_out:
+	siw_umem_release(umem);
 
 	return ERR_PTR(rv);
 }
diff --git a/drivers/infiniband/sw/siw/siw_mem.h b/drivers/infiniband/sw/siw/siw_mem.h
index f911287576d1..562a693f7662 100644
--- a/drivers/infiniband/sw/siw/siw_mem.h
+++ b/drivers/infiniband/sw/siw/siw_mem.h
@@ -6,8 +6,9 @@
 #ifndef _SIW_MEM_H
 #define _SIW_MEM_H
 
-struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable);
-void siw_umem_release(struct siw_umem *umem, bool dirty);
+struct siw_umem *siw_umem_get(struct ib_device *base_dave, u64 start,
+			      u64 len, int rights);
+void siw_umem_release(struct siw_umem *umem);
 struct siw_pbl *siw_pbl_alloc(u32 num_buf);
 dma_addr_t siw_pbl_get_buffer(struct siw_pbl *pbl, u64 off, int *len, int *idx);
 struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index);
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index fdbef3254e30..5910207f60b1 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1321,8 +1321,6 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 	struct siw_umem *umem = NULL;
 	struct siw_ureq_reg_mr ureq;
 	struct siw_device *sdev = to_siw_dev(pd->device);
-
-	unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK);
 	int rv;
 
 	siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n",
@@ -1338,20 +1336,7 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		rv = -EINVAL;
 		goto err_out;
 	}
-	if (mem_limit != RLIM_INFINITY) {
-		unsigned long num_pages =
-			(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
-		mem_limit >>= PAGE_SHIFT;
-
-		if (num_pages > mem_limit - current->mm->locked_vm) {
-			siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n",
-				   num_pages, mem_limit,
-				   current->mm->locked_vm);
-			rv = -ENOMEM;
-			goto err_out;
-		}
-	}
-	umem = siw_umem_get(start, len, ib_access_writable(rights));
+	umem = siw_umem_get(pd->device, start, len, rights);
 	if (IS_ERR(umem)) {
 		rv = PTR_ERR(umem);
 		siw_dbg_pd(pd, "getting user memory failed: %d\n", rv);
@@ -1404,7 +1389,7 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		kfree_rcu(mr, rcu);
 	} else {
 		if (umem)
-			siw_umem_release(umem, false);
+			siw_umem_release(umem);
 	}
 	return ERR_PTR(rv);
 }
-- 
2.38.1
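
Note on the page-list loop in siw_umem_get() above: after ib_umem_get()
has pinned the memory, the pages are copied from one flat iterator into
fixed-size per-chunk lists. A rough user-space model of that chunking is
sketched here. Everything named demo_* is hypothetical, ints stand in for
struct page pointers, and the chunk size of 4 is an arbitrary assumption
rather than the driver's PAGES_PER_CHUNK.

```c
#include <stdlib.h>

#define DEMO_PAGES_PER_CHUNK 4	/* assumed value, for illustration only */

struct demo_chunk {
	int *plist;	/* stands in for struct page ** */
};

/*
 * Copy num_pages entries from a flat source array into per-chunk
 * lists. Returns the number of chunks filled, or -1 if an allocation
 * fails (the caller then frees whatever chunks were already set up).
 */
static int demo_fill_chunks(struct demo_chunk *chunks, const int *src,
			    int num_pages)
{
	int i, next = 0;

	for (i = 0; num_pages > 0; i++) {
		int nents = num_pages < DEMO_PAGES_PER_CHUNK ?
				num_pages : DEMO_PAGES_PER_CHUNK;
		int *plist = calloc(nents, sizeof(*plist));

		if (!plist)
			return -1;
		chunks[i].plist = plist;

		/* Fill this chunk, consuming the flat source in order */
		while (nents--) {
			*plist++ = src[next++];
			num_pages--;
		}
	}
	return i;
}
```

With 10 source entries and 4 entries per chunk, the model fills three
chunks holding 4, 4 and 2 entries.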


^ permalink raw reply related

* Keywords for search engine positioning
From: Adam Charachuta @ 2023-10-31  8:30 UTC (permalink / raw)
  To: linux-rdma

Good morning,

I have reviewed your offer and am pleased to say that it attracts attention and encourages further discussion.

I thought I might be able to contribute to your growth and help this offer reach a wider audience. I do search engine positioning for websites, so that they generate excellent web traffic.

Could we talk in the near future?


Kind regards
Adam Charachuta

^ permalink raw reply

* [PATCH for-next] RDMA/siw: Use ib_umem_get() to pin user pages
From: Bernard Metzler @ 2023-11-03 20:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: jgg, leon, max7255, dennis.dalessandro, guoqing.jiang, benve,
	vadim.fedorenko, Bernard Metzler

Abandon siw private code to pin user pages during user
memory registration, but use ib_umem_get() instead.
This will help maintaining the driver in case of changes
to the memory subsystem.

Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
v1 -> v2: remove RLIMIT memlock check logic, now done in ib_umem_get()
---
 drivers/infiniband/sw/siw/siw.h       |   3 +-
 drivers/infiniband/sw/siw/siw_mem.c   | 109 ++++++++++----------------
 drivers/infiniband/sw/siw/siw_mem.h   |   5 +-
 drivers/infiniband/sw/siw/siw_verbs.c |  19 +----
 4 files changed, 47 insertions(+), 89 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 58dddb143b9f..6930586109d4 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -121,11 +121,10 @@ struct siw_page_chunk {
 };
 
 struct siw_umem {
+	struct ib_umem *base_mem;
 	struct siw_page_chunk *page_chunk;
 	int num_pages;
-	bool writable;
 	u64 fp_addr; /* First page base address */
-	struct mm_struct *owning_mm;
 };
 
 struct siw_pble {
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index e6e25f15567d..6cc44df2ece5 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -5,6 +5,7 @@
 
 #include <linux/gfp.h>
 #include <rdma/ib_verbs.h>
+#include <rdma/ib_umem.h>
 #include <linux/dma-mapping.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
@@ -60,28 +61,17 @@ struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index)
 	return NULL;
 }
 
-static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages,
-			   bool dirty)
+void siw_umem_release(struct siw_umem *umem)
 {
-	unpin_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
-}
-
-void siw_umem_release(struct siw_umem *umem, bool dirty)
-{
-	struct mm_struct *mm_s = umem->owning_mm;
 	int i, num_pages = umem->num_pages;
 
-	for (i = 0; num_pages; i++) {
-		int to_free = min_t(int, PAGES_PER_CHUNK, num_pages);
+	if (umem->base_mem)
+		ib_umem_release(umem->base_mem);
 
-		siw_free_plist(&umem->page_chunk[i], to_free,
-			       umem->writable && dirty);
+	for (i = 0; num_pages > 0; i++) {
 		kfree(umem->page_chunk[i].plist);
-		num_pages -= to_free;
+		num_pages -= PAGES_PER_CHUNK;
 	}
-	atomic64_sub(umem->num_pages, &mm_s->pinned_vm);
-
-	mmdrop(mm_s);
 	kfree(umem->page_chunk);
 	kfree(umem);
 }
@@ -145,7 +135,7 @@ void siw_free_mem(struct kref *ref)
 
 	if (!mem->is_mw && mem->mem_obj) {
 		if (mem->is_pbl == 0)
-			siw_umem_release(mem->umem, true);
+			siw_umem_release(mem->umem);
 		else
 			kfree(mem->pbl);
 	}
@@ -362,18 +352,16 @@ struct siw_pbl *siw_pbl_alloc(u32 num_buf)
 	return pbl;
 }
 
-struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
+struct siw_umem *siw_umem_get(struct ib_device *base_dev, u64 start,
+			      u64 len, int rights)
 {
 	struct siw_umem *umem;
-	struct mm_struct *mm_s;
+	struct ib_umem *base_mem;
+	struct sg_page_iter sg_iter;
+	struct sg_table *sgt;
 	u64 first_page_va;
-	unsigned long mlock_limit;
-	unsigned int foll_flags = FOLL_LONGTERM;
 	int num_pages, num_chunks, i, rv = 0;
 
-	if (!can_do_mlock())
-		return ERR_PTR(-EPERM);
-
 	if (!len)
 		return ERR_PTR(-EINVAL);
 
@@ -385,65 +373,50 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
 
-	mm_s = current->mm;
-	umem->owning_mm = mm_s;
-	umem->writable = writable;
-
-	mmgrab(mm_s);
-
-	if (writable)
-		foll_flags |= FOLL_WRITE;
-
-	mmap_read_lock(mm_s);
-
-	mlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (atomic64_add_return(num_pages, &mm_s->pinned_vm) > mlock_limit) {
-		rv = -ENOMEM;
-		goto out_sem_up;
-	}
-	umem->fp_addr = first_page_va;
-
 	umem->page_chunk =
 		kcalloc(num_chunks, sizeof(struct siw_page_chunk), GFP_KERNEL);
 	if (!umem->page_chunk) {
 		rv = -ENOMEM;
-		goto out_sem_up;
+		goto err_out;
 	}
-	for (i = 0; num_pages; i++) {
+	base_mem = ib_umem_get(base_dev, start, len, rights);
+	if (IS_ERR(base_mem)) {
+		rv = PTR_ERR(base_mem);
+		siw_dbg(base_dev, "Cannot pin user memory: %d\n", rv);
+		goto err_out;
+	}
+	umem->fp_addr = first_page_va;
+	umem->base_mem = base_mem;
+
+	sgt = &base_mem->sgt_append.sgt;
+	__sg_page_iter_start(&sg_iter, sgt->sgl, sgt->orig_nents, 0);
+
+	if (!__sg_page_iter_next(&sg_iter)) {
+		rv = -EINVAL;
+		goto err_out;
+	}
+	for (i = 0; num_pages > 0; i++) {
 		int nents = min_t(int, num_pages, PAGES_PER_CHUNK);
 		struct page **plist =
 			kcalloc(nents, sizeof(struct page *), GFP_KERNEL);
 
 		if (!plist) {
 			rv = -ENOMEM;
-			goto out_sem_up;
+			goto err_out;
 		}
 		umem->page_chunk[i].plist = plist;
-		while (nents) {
-			rv = pin_user_pages(first_page_va, nents, foll_flags,
-					    plist);
-			if (rv < 0)
-				goto out_sem_up;
-
-			umem->num_pages += rv;
-			first_page_va += rv * PAGE_SIZE;
-			plist += rv;
-			nents -= rv;
-			num_pages -= rv;
+		while (nents--) {
+			*plist = sg_page_iter_page(&sg_iter);
+			umem->num_pages++;
+			num_pages--;
+			plist++;
+			if (!__sg_page_iter_next(&sg_iter))
+				break;
 		}
 	}
-out_sem_up:
-	mmap_read_unlock(mm_s);
-
-	if (rv > 0)
-		return umem;
-
-	/* Adjust accounting for pages not pinned */
-	if (num_pages)
-		atomic64_sub(num_pages, &mm_s->pinned_vm);
-
-	siw_umem_release(umem, false);
+	return umem;
+err_out:
+	siw_umem_release(umem);
 
 	return ERR_PTR(rv);
 }
diff --git a/drivers/infiniband/sw/siw/siw_mem.h b/drivers/infiniband/sw/siw/siw_mem.h
index f911287576d1..562a693f7662 100644
--- a/drivers/infiniband/sw/siw/siw_mem.h
+++ b/drivers/infiniband/sw/siw/siw_mem.h
@@ -6,8 +6,9 @@
 #ifndef _SIW_MEM_H
 #define _SIW_MEM_H
 
-struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable);
-void siw_umem_release(struct siw_umem *umem, bool dirty);
+struct siw_umem *siw_umem_get(struct ib_device *base_dev, u64 start,
+			      u64 len, int rights);
+void siw_umem_release(struct siw_umem *umem);
 struct siw_pbl *siw_pbl_alloc(u32 num_buf);
 dma_addr_t siw_pbl_get_buffer(struct siw_pbl *pbl, u64 off, int *len, int *idx);
 struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int stag_index);
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index fdbef3254e30..5910207f60b1 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1321,8 +1321,6 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 	struct siw_umem *umem = NULL;
 	struct siw_ureq_reg_mr ureq;
 	struct siw_device *sdev = to_siw_dev(pd->device);
-
-	unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK);
 	int rv;
 
 	siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n",
@@ -1338,20 +1336,7 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		rv = -EINVAL;
 		goto err_out;
 	}
-	if (mem_limit != RLIM_INFINITY) {
-		unsigned long num_pages =
-			(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
-		mem_limit >>= PAGE_SHIFT;
-
-		if (num_pages > mem_limit - current->mm->locked_vm) {
-			siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n",
-				   num_pages, mem_limit,
-				   current->mm->locked_vm);
-			rv = -ENOMEM;
-			goto err_out;
-		}
-	}
-	umem = siw_umem_get(start, len, ib_access_writable(rights));
+	umem = siw_umem_get(pd->device, start, len, rights);
 	if (IS_ERR(umem)) {
 		rv = PTR_ERR(umem);
 		siw_dbg_pd(pd, "getting user memory failed: %d\n", rv);
@@ -1404,7 +1389,7 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		kfree_rcu(mr, rcu);
 	} else {
 		if (umem)
-			siw_umem_release(umem, false);
+			siw_umem_release(umem);
 	}
 	return ERR_PTR(rv);
 }
-- 
2.38.1


^ permalink raw reply related

* [PATCH for-next 6/6] RDMA/rxe: Cleanup mcg lifetime
From: Bob Pearson @ 2023-11-03 20:43 UTC (permalink / raw)
  To: jgg, yanjun.zhu, linux-rdma; +Cc: Bob Pearson
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Fix up mcg reference counting so the ref count will drop
to zero correctly and move code from rxe_destroy_mcg to
rxe_cleanup_mcg since rxe_destroy_mcg is no longer needed.

Also general code cleanup: drop comments on statics, etc.
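
The reference lifetime described above can be sketched in userspace C as
follows. This is a rough model under stated assumptions, not the rxe code:
the sketch_* names are hypothetical, C11 atomics stand in for the kernel's
kref, and the "freed" flag exists only so a caller can observe the cleanup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_mcg {
	atomic_int ref_cnt;
	int *freed;	/* hypothetical: set to 1 when cleanup runs */
};

/* Creation takes the first reference, like kref_init(). */
static struct sketch_mcg *sketch_mcg_alloc(int *freed)
{
	struct sketch_mcg *mcg = calloc(1, sizeof(*mcg));

	if (!mcg)
		return NULL;
	atomic_init(&mcg->ref_cnt, 1);
	mcg->freed = freed;
	return mcg;
}

/* Take a reference unless the count already reached zero. */
static bool sketch_mcg_get(struct sketch_mcg *mcg)
{
	int old = atomic_load(&mcg->ref_cnt);

	while (old != 0) {
		/* on failure, old is reloaded with the current count */
		if (atomic_compare_exchange_weak(&mcg->ref_cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put runs the cleanup and returns true. */
static bool sketch_mcg_put(struct sketch_mcg *mcg)
{
	if (atomic_fetch_sub(&mcg->ref_cnt, 1) != 1)
		return false;

	/* all teardown lives here, reached only when the count hits zero */
	*mcg->freed = 1;
	free(mcg);
	return true;
}
```

For a single attached qp the sequence is: alloc (count 1), get for the
attachment (2), put of the creation ref (1), get for the lookup in detach
(2), put of the attachment ref (1), put of the lookup ref (0, cleanup runs).
The point of the design is that teardown happens only in the release
callback, so no separate destroy routine is required.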

Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 190 ++++++++------------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   2 +-
 3 files changed, 58 insertions(+), 136 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 62b2b25903fc..0509ccdaa2f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -37,7 +37,7 @@ void rxe_cq_cleanup(struct rxe_pool_elem *elem);
 struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
-void rxe_cleanup_mcg(struct kref *kref);
+int rxe_put_mcg(struct rxe_mcg *mcg);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index bca5b022b797..65a420a540cd 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -23,7 +23,6 @@
 
 #include "rxe.h"
 
-/* register mcast IP and MAC addresses with net stack */
 static int rxe_mcast_add6(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -82,7 +81,6 @@ static int rxe_mcast_add(struct rxe_mcg *mcg)
 	return err;
 }
 
-/* deregister mcast IP and MAC addresses with net stack */
 static int rxe_mcast_del6(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -122,13 +120,31 @@ static int rxe_mcast_del(struct rxe_mcg *mcg)
 	return err ?: err2;
 }
 
-/**
- * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
- * @mcg: mcg object with an embedded red-black tree node
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_mutex and
- * is responsible to avoid adding the same mcg twice to the tree.
- */
+static void __rxe_remove_mcg(struct rxe_mcg *mcg)
+{
+	rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
+}
+
+static void rxe_cleanup_mcg(struct kref *kref)
+{
+	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+
+	__rxe_remove_mcg(mcg);
+	rxe_mcast_del(mcg);
+	atomic_dec(&mcg->rxe->mcg_num);
+	kfree_rcu(mcg, rcu);
+}
+
+static int rxe_get_mcg(struct rxe_mcg *mcg)
+{
+	return kref_get_unless_zero(&mcg->ref_cnt);
+}
+
+int rxe_put_mcg(struct rxe_mcg *mcg)
+{
+	return kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+}
+
 static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 {
 	struct rb_root *tree = &mcg->rxe->mcg_tree;
@@ -144,34 +160,17 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 		cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid));
 		if (cmp > 0)
 			link = &(*link)->rb_left;
-		else
+		else if (cmp < 0)
 			link = &(*link)->rb_right;
+		else
+			WARN_ON_ONCE(1);
 	}
 
 	rb_link_node_rcu(&mcg->node, node, link);
 	rb_insert_color(&mcg->node, tree);
 }
 
-/**
- * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
- * @mcg: mcast group object with an embedded red-black tree node
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_mutex
- */
-static void __rxe_remove_mcg(struct rxe_mcg *mcg)
-{
-	rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
-}
-
-/**
- * rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
- * @rxe: rxe device object
- * @mgid: multicast IP address
- *
- * Returns: mcg on success and takes a ref to mcg else NULL
- */
-struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
-					union ib_gid *mgid)
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	struct rb_root *tree = &rxe->mcg_tree;
 	struct rxe_mcg *mcg;
@@ -196,21 +195,16 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
 	rcu_read_unlock();
 
 	if (node) {
-		kref_get(&mcg->ref_cnt);
+		/* take a ref on mcg for each lookup */
+		rxe_get_mcg(mcg);
 		return mcg;
 	}
 
 	return NULL;
 }
 
-/**
- * rxe_get_mcg - lookup or allocate a mcg
- * @rxe: rxe device object
- * @mgid: multicast IP address as a gid
- *
- * Returns: mcg on success else ERR_PTR(error)
- */
-static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
+/* find an existing mcg or allocate a new one */
+static struct rxe_mcg *rxe_alloc_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	struct rxe_mcg *mcg;
 	int err;
@@ -234,22 +228,22 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
 	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
 	mcg->rxe = rxe;
+	/* take ref on mcg when created */
 	kref_init(&mcg->ref_cnt);
 	INIT_LIST_HEAD(&mcg->qp_list);
 	spin_lock_init(&mcg->lock);
-	kref_get(&mcg->ref_cnt);
-	__rxe_insert_mcg(mcg);
 
 	err = rxe_mcast_add(mcg);
 	if (err)
 		goto err_free;
 
+	/* can insert into tree now that mcg is finished */
+	__rxe_insert_mcg(mcg);
 out:
 	mutex_unlock(&rxe->mcg_mutex);
 	return mcg;
 
 err_free:
-	__rxe_remove_mcg(mcg);
 	kfree(mcg);
 err_dec:
 	atomic_dec(&rxe->mcg_num);
@@ -257,64 +251,12 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	return ERR_PTR(err);
 }
 
-/**
- * rxe_cleanup_mcg - cleanup mcg for kref_put
- * @kref: struct kref embnedded in mcg
- */
-void rxe_cleanup_mcg(struct kref *kref)
-{
-	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
-
-	kfree_rcu(mcg, rcu);
-}
-
-/**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex
- * @mcg: the mcg object
- *
- * Context: caller is holding rxe->mcg_mutex
- * no qp's are attached to mcg
- */
-static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	struct rxe_dev *rxe = mcg->rxe;
-
-	/* remove mcg from red-black tree then drop ref */
-	__rxe_remove_mcg(mcg);
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-
-	atomic_dec(&rxe->mcg_num);
-}
-
-/**
- * rxe_destroy_mcg - destroy mcg object
- * @mcg: the mcg object
- *
- * Context: no qp's are attached to mcg
- */
-static void rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	/* delete mcast address outside of lock */
-	rxe_mcast_del(mcg);
-
-	mutex_lock(&mcg->rxe->mcg_mutex);
-	__rxe_destroy_mcg(mcg);
-	mutex_unlock(&mcg->rxe->mcg_mutex);
-}
-
-/**
- * rxe_attach_mcg - attach qp to mcg if not already attached
- * @qp: qp object
- * @mcg: mcg object
- *
- * Returns: 0 on success else an error
- */
-static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_attach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg)
 {
 	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca;
 	unsigned long flags;
-	int err;
+	int err = 0;
 
 	mutex_lock(&rxe->mcg_mutex);
 	spin_lock_irqsave(&mcg->lock, flags);
@@ -348,29 +290,28 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 	rxe_get(qp);
 	mca->qp = qp;
 
+	/* hold a ref on mcg for each attached qp
+	 * protects the pointers in mca->qp_list
+	 */
+	rxe_get_mcg(mcg);
+
 	spin_lock_irqsave(&mcg->lock, flags);
 	list_add_tail(&mca->qp_list, &mcg->qp_list);
 	spin_unlock_irqrestore(&mcg->lock, flags);
-out:
-	mutex_unlock(&rxe->mcg_mutex);
-	return 0;
+	goto out;
 
 err_dec_qp_num:
 	atomic_dec(&mcg->qp_num);
 err_dec_attach:
 	atomic_dec(&rxe->mcg_attach);
+out:
+	/* drop the ref on mcg from rxe_alloc_mcg */
+	rxe_put_mcg(mcg);
 	mutex_unlock(&rxe->mcg_mutex);
 	return err;
 }
 
-/**
- * rxe_detach_mcg - detach qp from mcg
- * @mcg: mcg object
- * @qp: qp object
- *
- * Returns: 0 on success else an error if qp is not attached.
- */
-static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_detach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg)
 {
 	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca;
@@ -387,7 +328,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 	}
 	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	/* we didn't find the qp on the list */
 	err = -EINVAL;
 	goto err_out;
 
@@ -395,23 +335,18 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 	spin_lock_irqsave(&mcg->lock, flags);
 	list_del(&mca->qp_list);
 	spin_unlock_irqrestore(&mcg->lock, flags);
+	/* drop the ref on mcg from rxe_attach_mcg */
+	rxe_put_mcg(mcg);
 
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
+	/* drop the ref on qp that was protecting mca->qp */
 	rxe_put(mca->qp);
 	kfree(mca);
-
-	/* if the number of qp's attached to the
-	 * mcast group falls to zero go ahead and
-	 * tear it down. This will not free the
-	 * object since we are still holding a ref
-	 * from the caller
-	 */
-	if (atomic_read(&mcg->qp_num) <= 0)
-		__rxe_destroy_mcg(mcg);
-
 err_out:
+	/* drop the ref on mcg from rxe_lookup_mcg */
+	rxe_put_mcg(mcg);
 	mutex_unlock(&rxe->mcg_mutex);
 	return err;
 }
@@ -426,7 +361,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
  */
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
-	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *mcg;
@@ -435,19 +369,11 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 		return -EINVAL;
 
 	/* takes a ref on mcg if successful */
-	mcg = rxe_get_mcg(rxe, mgid);
+	mcg = rxe_alloc_mcg(rxe, mgid);
 	if (IS_ERR(mcg))
 		return PTR_ERR(mcg);
 
-	err = rxe_attach_mcg(mcg, qp);
-
-	/* if we failed to attach the first qp to mcg tear it down */
-	if (atomic_read(&mcg->qp_num) == 0)
-		rxe_destroy_mcg(mcg);
-
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-
-	return err;
+	return rxe_attach_mcg(qp, mcg);
 }
 
 /**
@@ -463,14 +389,10 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *mcg;
-	int err;
 
 	mcg = rxe_lookup_mcg(rxe, mgid);
 	if (!mcg)
 		return -EINVAL;
 
-	err = rxe_detach_mcg(mcg, qp);
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-
-	return err;
+	return rxe_detach_mcg(qp, mcg);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 6cf0da958864..e3ec3dfc57f4 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -262,7 +262,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 
 	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+	rxe_put_mcg(mcg);
 
 	if (likely(!skb))
 		return;
-- 
2.40.1


^ permalink raw reply related

* [PATCH for-next 5/6] RDMA/rxe: Split multicast lock
From: Bob Pearson @ 2023-11-03 20:43 UTC (permalink / raw)
  To: jgg, yanjun.zhu, linux-rdma; +Cc: Bob Pearson
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Split rxe->mcg_lock into two locks. One to protect mcg->qp_list
and one to protect rxe->mcg_tree (red-black tree) write side
operations and provide serialization between rxe_attach_mcast
and rxe_detach_mcast.

Make the qp_list lock a spin_lock_irqsave lock and move it to the
mcg struct. It protects the qp_list from simultaneous access
from rxe_mcast.c and rxe_recv.c when processing incoming multicast
packets. In theory some ethernet driver could bypass NAPI,
so an irq lock is better than a bh lock.

Make the mcg_tree lock a mutex since the attach/detach APIs are
not called in atomic context. This allows some significant cleanup
since we can call kzalloc while holding the mutex so some recheck
code can be eliminated.

Use rcu to protect mcg_tree read side operations as set up in
the previous patch. rxe_rcv_mcast_pkt, which does run in an
atomic context, now does not use the mcg_mutex lock.
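
A small userspace sketch of the two-lock arrangement, assuming pthread
mutexes as stand-ins for both the kernel mutex and the per-mcg spinlock.
The sketch_* names, the fixed-size qp array and the single group per device
are all hypothetical simplifications; the rb-tree and rcu side are not
modeled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define SKETCH_MAX_QP 8		/* hypothetical per-group attach limit */

struct sketch_group {
	pthread_mutex_t lock;	/* stand-in for mcg->lock: protects qp[], nqp */
	int qp[SKETCH_MAX_QP];
	int nqp;
};

struct sketch_dev {
	pthread_mutex_t mutex;	/* stand-in for rxe->mcg_mutex */
	struct sketch_group grp;
};

static void sketch_dev_init(struct sketch_dev *dev)
{
	pthread_mutex_init(&dev->mutex, NULL);
	pthread_mutex_init(&dev->grp.lock, NULL);
	dev->grp.nqp = 0;
}

/* Caller holds grp->lock. Returns the slot holding qp, or -1. */
static int sketch_find(const struct sketch_group *grp, int qp)
{
	int i;

	assert(grp->nqp <= SKETCH_MAX_QP);
	for (i = 0; i < grp->nqp; i++)
		if (grp->qp[i] == qp)
			return i;
	return -1;
}

/* Control path: outer mutex serializes attach against detach. */
static int sketch_attach(struct sketch_dev *dev, int qp)
{
	struct sketch_group *grp = &dev->grp;
	int err = 0;

	pthread_mutex_lock(&dev->mutex);
	pthread_mutex_lock(&grp->lock);
	if (sketch_find(grp, qp) < 0) {
		if (grp->nqp == SKETCH_MAX_QP)
			err = -EINVAL;
		else
			grp->qp[grp->nqp++] = qp;
	}
	pthread_mutex_unlock(&grp->lock);
	pthread_mutex_unlock(&dev->mutex);
	return err;
}

static int sketch_detach(struct sketch_dev *dev, int qp)
{
	struct sketch_group *grp = &dev->grp;
	int idx, err = 0;

	pthread_mutex_lock(&dev->mutex);
	pthread_mutex_lock(&grp->lock);
	idx = sketch_find(grp, qp);
	if (idx < 0) {
		err = -EINVAL;	/* qp was not attached */
	} else {
		grp->nqp--;
		grp->qp[idx] = grp->qp[grp->nqp];
	}
	pthread_mutex_unlock(&grp->lock);
	pthread_mutex_unlock(&dev->mutex);
	return err;
}

/* Receive path: takes only the list lock, never the outer mutex. */
static int sketch_deliver(struct sketch_dev *dev)
{
	int n;

	pthread_mutex_lock(&dev->grp.lock);
	n = dev->grp.nqp;	/* one "delivery" per attached qp */
	pthread_mutex_unlock(&dev->grp.lock);
	return n;
}
```

The sketch holds the list lock across the whole attach, whereas the patch
drops it around the allocation and relies on the outer mutex to keep the
list stable in between. What carries over is the split itself: the packet
path touches only the short list lock, so the control path is free to use a
sleeping lock.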

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 256 ++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   3 +-
 4 files changed, 106 insertions(+), 160 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..147cb16e937d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -142,7 +142,7 @@ static void rxe_init(struct rxe_dev *rxe)
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
 	/* init multicast support */
-	spin_lock_init(&rxe->mcg_lock);
+	mutex_init(&rxe->mcg_mutex);
 	rxe->mcg_tree = RB_ROOT;
 
 	mutex_init(&rxe->usdev_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index d7b8e31ab480..bca5b022b797 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -126,7 +126,7 @@ static int rxe_mcast_del(struct rxe_mcg *mcg)
  * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
  * @mcg: mcg object with an embedded red-black tree node
  *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex and
  * is responsible to avoid adding the same mcg twice to the tree.
  */
 static void __rxe_insert_mcg(struct rxe_mcg *mcg)
@@ -156,7 +156,7 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
  * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
  * @mcg: mcast group object with an embedded red-black tree node
  *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex
  */
 static void __rxe_remove_mcg(struct rxe_mcg *mcg)
 {
@@ -203,34 +203,6 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
 	return NULL;
 }
 
-/**
- * __rxe_init_mcg - initialize a new mcg
- * @rxe: rxe device
- * @mgid: multicast address as a gid
- * @mcg: new mcg object
- *
- * Context: caller should hold rxe->mcg lock
- */
-static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
-			   struct rxe_mcg *mcg)
-{
-	kref_init(&mcg->ref_cnt);
-	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
-	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
-	INIT_LIST_HEAD(&mcg->qp_list);
-	mcg->rxe = rxe;
-
-	/* caller holds a ref on mcg but that will be
-	 * dropped when mcg goes out of scope. We need to take a ref
-	 * on the pointer that will be saved in the red-black tree
-	 * by __rxe_insert_mcg and used to lookup mcg from mgid later.
-	 * Inserting mcg makes it visible to outside so this should
-	 * be done last after the object is ready.
-	 */
-	kref_get(&mcg->ref_cnt);
-	__rxe_insert_mcg(mcg);
-}
-
 /**
  * rxe_get_mcg - lookup or allocate a mcg
  * @rxe: rxe device object
@@ -240,51 +212,48 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
  */
 static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-	struct rxe_mcg *mcg, *tmp;
+	struct rxe_mcg *mcg;
 	int err;
 
-	if (rxe->attr.max_mcast_grp == 0)
-		return ERR_PTR(-EINVAL);
-
-	/* check to see if mcg already exists */
+	mutex_lock(&rxe->mcg_mutex);
 	mcg = rxe_lookup_mcg(rxe, mgid);
 	if (mcg)
-		return mcg;
+		goto out;	/* nothing to do */
 
-	/* check to see if we have reached limit */
 	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
-		err = -ENOMEM;
+		err = -EINVAL;
 		goto err_dec;
 	}
 
-	/* speculative alloc of new mcg */
 	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
 	if (!mcg) {
 		err = -ENOMEM;
 		goto err_dec;
 	}
 
-	spin_lock_bh(&rxe->mcg_lock);
-	/* re-check to see if someone else just added it */
-	tmp = __rxe_lookup_mcg(rxe, mgid);
-	if (tmp) {
-		spin_unlock_bh(&rxe->mcg_lock);
-		atomic_dec(&rxe->mcg_num);
-		kfree(mcg);
-		return tmp;
-	}
-
-	__rxe_init_mcg(rxe, mgid, mcg);
-	spin_unlock_bh(&rxe->mcg_lock);
+	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
+	mcg->rxe = rxe;
+	kref_init(&mcg->ref_cnt);
+	INIT_LIST_HEAD(&mcg->qp_list);
+	spin_lock_init(&mcg->lock);
+	kref_get(&mcg->ref_cnt);
+	__rxe_insert_mcg(mcg);
 
-	/* add mcast address outside of lock */
 	err = rxe_mcast_add(mcg);
-	if (!err)
-		return mcg;
+	if (err)
+		goto err_free;
 
+out:
+	mutex_unlock(&rxe->mcg_mutex);
+	return mcg;
+
+err_free:
+	__rxe_remove_mcg(mcg);
 	kfree(mcg);
 err_dec:
 	atomic_dec(&rxe->mcg_num);
+	mutex_unlock(&rxe->mcg_mutex);
 	return ERR_PTR(err);
 }
 
@@ -300,10 +269,10 @@ void rxe_cleanup_mcg(struct kref *kref)
 }
 
 /**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex
  * @mcg: the mcg object
  *
- * Context: caller is holding rxe->mcg_lock
+ * Context: caller is holding rxe->mcg_mutex
  * no qp's are attached to mcg
  */
 static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
@@ -328,151 +297,123 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 	/* delete mcast address outside of lock */
 	rxe_mcast_del(mcg);
 
-	spin_lock_bh(&mcg->rxe->mcg_lock);
+	mutex_lock(&mcg->rxe->mcg_mutex);
 	__rxe_destroy_mcg(mcg);
-	spin_unlock_bh(&mcg->rxe->mcg_lock);
+	mutex_unlock(&mcg->rxe->mcg_mutex);
 }
 
 /**
- * __rxe_init_mca - initialize a new mca holding lock
+ * rxe_attach_mcg - attach qp to mcg if not already attached
  * @qp: qp object
  * @mcg: mcg object
- * @mca: empty space for new mca
- *
- * Context: caller must hold references on qp and mcg, rxe->mcg_lock
- * and pass memory for new mca
  *
  * Returns: 0 on success else an error
  */
-static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
-			  struct rxe_mca *mca)
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
-	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
-	int n;
+	struct rxe_dev *rxe = mcg->rxe;
+	struct rxe_mca *mca;
+	unsigned long flags;
+	int err;
 
-	n = atomic_inc_return(&rxe->mcg_attach);
-	if (n > rxe->attr.max_total_mcast_qp_attach) {
-		atomic_dec(&rxe->mcg_attach);
-		return -ENOMEM;
+	mutex_lock(&rxe->mcg_mutex);
+	spin_lock_irqsave(&mcg->lock, flags);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			spin_unlock_irqrestore(&mcg->lock, flags);
+			goto out;	/* nothing to do */
+		}
 	}
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	n = atomic_inc_return(&mcg->qp_num);
-	if (n > rxe->attr.max_mcast_qp_attach) {
-		atomic_dec(&mcg->qp_num);
-		atomic_dec(&rxe->mcg_attach);
-		return -ENOMEM;
+	if (atomic_inc_return(&rxe->mcg_attach) >
+	    rxe->attr.max_total_mcast_qp_attach) {
+		err = -EINVAL;
+		goto err_dec_attach;
 	}
 
-	atomic_inc(&qp->mcg_num);
+	if (atomic_inc_return(&mcg->qp_num) >
+	    rxe->attr.max_mcast_qp_attach) {
+		err = -EINVAL;
+		goto err_dec_qp_num;
+	}
 
+	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!mca) {
+		err = -ENOMEM;
+		goto err_dec_qp_num;
+	}
+
+	atomic_inc(&qp->mcg_num);
 	rxe_get(qp);
 	mca->qp = qp;
 
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_add_tail(&mca->qp_list, &mcg->qp_list);
-
+	spin_unlock_irqrestore(&mcg->lock, flags);
+out:
+	mutex_unlock(&rxe->mcg_mutex);
 	return 0;
+
+err_dec_qp_num:
+	atomic_dec(&mcg->qp_num);
+err_dec_attach:
+	atomic_dec(&rxe->mcg_attach);
+	mutex_unlock(&rxe->mcg_mutex);
+	return err;
 }
 
 /**
- * rxe_attach_mcg - attach qp to mcg if not already attached
- * @qp: qp object
+ * rxe_detach_mcg - detach qp from mcg
  * @mcg: mcg object
+ * @qp: qp object
  *
- * Context: caller must hold reference on qp and mcg.
- * Returns: 0 on success else an error
+ * Returns: 0 on success else an error if qp is not attached.
  */
-static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
 	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
-	int err;
+	struct rxe_mca *mca;
+	unsigned long flags;
+	int err = 0;
 
-	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&rxe->mcg_lock);
+	mutex_lock(&rxe->mcg_mutex);
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&rxe->mcg_lock);
-			return 0;
+			spin_unlock_irqrestore(&mcg->lock, flags);
+			goto found;
 		}
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
-	/* speculative alloc new mca without using GFP_ATOMIC */
-	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
-	if (!mca)
-		return -ENOMEM;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
-		if (tmp->qp == qp) {
-			kfree(mca);
-			err = 0;
-			goto out;
-		}
-	}
-
-	err = __rxe_init_mca(qp, mcg, mca);
-	if (err)
-		kfree(mca);
-out:
-	spin_unlock_bh(&rxe->mcg_lock);
-	return err;
-}
+	/* we didn't find the qp on the list */
+	err = -EINVAL;
+	goto err_out;
 
-/**
- * __rxe_cleanup_mca - cleanup mca object holding lock
- * @mca: mca object
- * @mcg: mcg object
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
- */
-static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
-{
+found:
+	spin_lock_irqsave(&mcg->lock, flags);
 	list_del(&mca->qp_list);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
 	rxe_put(mca->qp);
-
 	kfree(mca);
-}
 
-/**
- * rxe_detach_mcg - detach qp from mcg
- * @mcg: mcg object
- * @qp: qp object
- *
- * Returns: 0 on success else an error if qp is not attached.
- */
-static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
-{
-	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
-		if (mca->qp == qp) {
-			__rxe_cleanup_mca(mca, mcg);
-
-			/* if the number of qp's attached to the
-			 * mcast group falls to zero go ahead and
-			 * tear it down. This will not free the
-			 * object since we are still holding a ref
-			 * from the caller
-			 */
-			if (atomic_read(&mcg->qp_num) <= 0)
-				__rxe_destroy_mcg(mcg);
-
-			spin_unlock_bh(&rxe->mcg_lock);
-			return 0;
-		}
-	}
+	/* if the number of qp's attached to the
+	 * mcast group falls to zero go ahead and
+	 * tear it down. This will not free the
+	 * object since we are still holding a ref
+	 * from the caller
+	 */
+	if (atomic_read(&mcg->qp_num) <= 0)
+		__rxe_destroy_mcg(mcg);
 
-	/* we didn't find the qp on the list */
-	spin_unlock_bh(&rxe->mcg_lock);
-	return -EINVAL;
+err_out:
+	mutex_unlock(&rxe->mcg_mutex);
+	return err;
 }
 
 /**
@@ -490,6 +431,9 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *mcg;
 
+	if (rxe->attr.max_mcast_grp == 0)
+		return -EINVAL;
+
 	/* takes a ref on mcg if successful */
 	mcg = rxe_get_mcg(rxe, mgid);
 	if (IS_ERR(mcg))
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7153de0799fc..6cf0da958864 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -194,6 +194,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
+	unsigned long flags;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -207,7 +208,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
-	spin_lock_bh(&rxe->mcg_lock);
+	spin_lock_irqsave(&mcg->lock, flags);
 
 	/* this is unreliable datagram service so we let
 	 * failures to deliver a multicast packet to a
@@ -259,7 +260,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		}
 	}
 
-	spin_unlock_bh(&rxe->mcg_lock);
+	spin_unlock_irqrestore(&mcg->lock, flags);
 
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 8058e5039322..f21963dcb2c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -351,6 +351,7 @@ struct rxe_mcg {
 	struct list_head	qp_list;
 	union ib_gid		mgid;
 	atomic_t		qp_num;
+	spinlock_t		lock;	/* protect qp_list */
 	u32			qkey;
 	u16			pkey;
 	bool			is_ipv6;
@@ -390,7 +391,7 @@ struct rxe_dev {
 	struct rxe_pool		mw_pool;
 
 	/* multicast support */
-	spinlock_t		mcg_lock;
+	struct mutex		mcg_mutex;
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
 	atomic_t		mcg_attach;
-- 
2.40.1


^ permalink raw reply related

* [PATCH for-next 1/6] RDMA/rxe: Cleanup rxe_ah/av_chk_attr
From: Bob Pearson @ 2023-11-03 20:43 UTC (permalink / raw)
  To: jgg, yanjun.zhu, linux-rdma; +Cc: Bob Pearson
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Replace rxe_ah_chk_attr() and rxe_av_chk_attr() by a single
routine rxe_chk_ah_attr().

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c    | 43 ++++-----------------------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  3 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  4 +--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 ++--
 4 files changed, 12 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 889d7adbd455..4ac17b8def28 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -14,45 +14,24 @@ void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
 	memcpy(av->dmac, attr->roce.dmac, ETH_ALEN);
 }
 
-static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr)
 {
 	const struct ib_global_route *grh = rdma_ah_read_grh(attr);
-	struct rxe_port *port;
-	struct rxe_dev *rxe;
-	struct rxe_qp *qp;
-	struct rxe_ah *ah;
+	struct rxe_port *port = &rxe->port;
 	int type;
 
-	if (obj_is_ah) {
-		ah = obj;
-		rxe = to_rdev(ah->ibah.device);
-	} else {
-		qp = obj;
-		rxe = to_rdev(qp->ibqp.device);
-	}
-
-	port = &rxe->port;
-
 	if (rdma_ah_get_ah_flags(attr) & IB_AH_GRH) {
 		if (grh->sgid_index > port->attr.gid_tbl_len) {
-			if (obj_is_ah)
-				rxe_dbg_ah(ah, "invalid sgid index = %d\n",
-						grh->sgid_index);
-			else
-				rxe_dbg_qp(qp, "invalid sgid index = %d\n",
-						grh->sgid_index);
+			rxe_dbg_dev(rxe, "invalid sgid index = %d\n",
+					grh->sgid_index);
 			return -EINVAL;
 		}
 
 		type = rdma_gid_attr_network_type(grh->sgid_attr);
 		if (type < RDMA_NETWORK_IPV4 ||
 		    type > RDMA_NETWORK_IPV6) {
-			if (obj_is_ah)
-				rxe_dbg_ah(ah, "invalid network type for rdma_rxe = %d\n",
-						type);
-			else
-				rxe_dbg_qp(qp, "invalid network type for rdma_rxe = %d\n",
-						type);
+			rxe_dbg_dev(rxe, "invalid network type for rdma_rxe = %d\n",
+					type);
 			return -EINVAL;
 		}
 	}
@@ -60,16 +39,6 @@ static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
 	return 0;
 }
 
-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr)
-{
-	return chk_attr(qp, attr, false);
-}
-
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr)
-{
-	return chk_attr(ah, attr, true);
-}
-
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
 		     struct rdma_ah_attr *attr)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..3d2504a0ae56 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -9,8 +9,7 @@
 
 /* rxe_av.c */
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr);
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr);
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
 		     struct rdma_ah_attr *attr);
 void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 28e379c108bc..c28005db032d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -456,11 +456,11 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
 			goto err1;
 	}
 
-	if (mask & IB_QP_AV && rxe_av_chk_attr(qp, &attr->ah_attr))
+	if (mask & IB_QP_AV && rxe_chk_ah_attr(rxe, &attr->ah_attr))
 		goto err1;
 
 	if (mask & IB_QP_ALT_PATH) {
-		if (rxe_av_chk_attr(qp, &attr->alt_ah_attr))
+		if (rxe_chk_ah_attr(rxe, &attr->alt_ah_attr))
 			goto err1;
 		if (!rdma_is_port_valid(&rxe->ib_dev, attr->alt_port_num))  {
 			rxe_dbg_qp(qp, "invalid alt port %d\n", attr->alt_port_num);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 48f86839d36a..6706d540f1f6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -286,7 +286,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	/* create index > 0 */
 	ah->ah_num = ah->elem.index;
 
-	err = rxe_ah_chk_attr(ah, init_attr->ah_attr);
+	err = rxe_chk_ah_attr(rxe, init_attr->ah_attr);
 	if (err) {
 		rxe_dbg_ah(ah, "bad attr");
 		goto err_cleanup;
@@ -322,10 +322,11 @@ static int rxe_create_ah(struct ib_ah *ibah,
 
 static int rxe_modify_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr)
 {
+	struct rxe_dev *rxe = to_rdev(ibah->device);
 	struct rxe_ah *ah = to_rah(ibah);
 	int err;
 
-	err = rxe_ah_chk_attr(ah, attr);
+	err = rxe_chk_ah_attr(rxe, attr);
 	if (err) {
 		rxe_dbg_ah(ah, "bad attr");
 		goto err_out;
-- 
2.40.1



* [PATCH for-next 3/6] RDMA/rxe: Register IP mcast address
From: Bob Pearson @ 2023-11-03 20:43 UTC (permalink / raw)
  To: jgg, yanjun.zhu, linux-rdma; +Cc: Bob Pearson
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Add code to rxe_mcast_add() and rxe_mcast_del() to register/deregister
the IP multicast address. This is required for multicast traffic to
reach the rxe driver.

Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 110 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_net.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_net.h   |   1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 4 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 86cc2e18a7fd..ec757b955979 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -19,38 +19,107 @@
  * mcast packets in the rxe receive path.
  */
 
+#include <linux/igmp.h>
+
 #include "rxe.h"
 
-/**
- * rxe_mcast_add - add multicast address to rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+/* register mcast IP and MAC addresses with net stack */
+static int rxe_mcast_add6(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
+	struct in6_addr *addr6 = (struct in6_addr *)mgid;
+	int err;
+
+	rtnl_lock();
+	err = ipv6_sock_mc_join(recv_sockets.sk6->sk, rxe->ndev->ifindex,
+				addr6);
+	rtnl_unlock();
+	if (err && err != -EADDRINUSE)
+		goto err_out;
 
 	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_add(rxe->ndev, ll_addr);
+	if (err)
+		goto err_drop;
+
+	return 0;
 
-	return dev_mc_add(rxe->ndev, ll_addr);
+err_drop:
+	ipv6_sock_mc_drop(recv_sockets.sk6->sk, rxe->ndev->ifindex, addr6);
+err_out:
+	return err;
 }
 
-/**
- * rxe_mcast_del - delete multicast address from rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
+static int rxe_mcast_add(struct rxe_mcg *mcg)
 {
+	struct rxe_dev *rxe = mcg->rxe;
+	union ib_gid *mgid = &mcg->mgid;
+	struct ip_mreqn imr = {};
 	unsigned char ll_addr[ETH_ALEN];
+	int err;
+
+	if (mcg->is_ipv6)
+		return rxe_mcast_add6(rxe, mgid);
+
+	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+	imr.imr_ifindex = rxe->ndev->ifindex;
+	rtnl_lock();
+	err = ip_mc_join_group(recv_sockets.sk4->sk, &imr);
+	rtnl_unlock();
+	if (err && err != -EADDRINUSE)
+		goto err_out;
+
+	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+	err = dev_mc_add(rxe->ndev, ll_addr);
+	if (err)
+		goto err_leave;
+
+	return 0;
+
+err_leave:
+	ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+err_out:
+	return err;
+}
+
+/* deregister mcast IP and MAC addresses with net stack */
+static int rxe_mcast_del6(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+	int err, err2;
 
 	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+	err = dev_mc_del(rxe->ndev, ll_addr);
+
+	rtnl_lock();
+	err2 = ipv6_sock_mc_drop(recv_sockets.sk6->sk,
+			rxe->ndev->ifindex, (struct in6_addr *)mgid);
+	rtnl_unlock();
+
+	return err ?: err2;
+}
+
+static int rxe_mcast_del(struct rxe_mcg *mcg)
+{
+	struct rxe_dev *rxe = mcg->rxe;
+	union ib_gid *mgid = &mcg->mgid;
+	struct ip_mreqn imr = {};
+	unsigned char ll_addr[ETH_ALEN];
+	int err, err2;
+
+	if (mcg->is_ipv6)
+		return rxe_mcast_del6(rxe, mgid);
+
+	imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+	imr.imr_ifindex = rxe->ndev->ifindex;
+	ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+	err = dev_mc_del(rxe->ndev, ll_addr);
+
+	rtnl_lock();
+	err2 = ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+	rtnl_unlock();
 
-	return dev_mc_del(rxe->ndev, ll_addr);
+	return err ?: err2;
 }
 
 /**
@@ -164,6 +233,7 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 {
 	kref_init(&mcg->ref_cnt);
 	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+	mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
 
@@ -225,7 +295,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	spin_unlock_bh(&rxe->mcg_lock);
 
 	/* add mcast address outside of lock */
-	err = rxe_mcast_add(rxe, mgid);
+	err = rxe_mcast_add(mcg);
 	if (!err)
 		return mcg;
 
@@ -273,7 +343,7 @@ static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
 	/* delete mcast address outside of lock */
-	rxe_mcast_del(mcg->rxe, &mcg->mgid);
+	rxe_mcast_del(mcg);
 
 	spin_lock_bh(&mcg->rxe->mcg_lock);
 	__rxe_destroy_mcg(mcg);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 2fad56fc95e7..36617d07fddf 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -18,7 +18,7 @@
 #include "rxe_net.h"
 #include "rxe_loc.h"
 
-static struct rxe_recv_sockets recv_sockets;
+struct rxe_recv_sockets recv_sockets;
 
 static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
 					 struct net_device *ndev,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 45d80d00f86b..89cee7d5340f 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -15,6 +15,7 @@ struct rxe_recv_sockets {
 	struct socket *sk4;
 	struct socket *sk6;
 };
+extern struct rxe_recv_sockets recv_sockets;
 
 int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..7be9e6232dd9 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,6 +352,7 @@ struct rxe_mcg {
 	atomic_t		qp_num;
 	u32			qkey;
 	u16			pkey;
+	bool			is_ipv6;
 };
 
 struct rxe_mca {
-- 
2.40.1
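
For readers following the address handling in this patch: the multicast GID is either an IPv4-mapped address (the group address sits in the last four bytes, which is why the patch reads mgid->raw + 12) or a native IPv6 address, and each form maps to a different Ethernet multicast MAC. The following is only a userspace sketch of that mapping, not the driver code. The names gid_is_v4mapped() and gid_to_mcast_mac() are hypothetical; the byte layouts follow the standard IPv4 (01:00:5e plus low 23 bits) and IPv6 (33:33 plus last four bytes) multicast MAC conventions that ip_eth_mc_map() and ipv6_eth_mc_map() implement in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16
#define MAC_LEN 6

/* Hypothetical helper: true if the GID has the ::ffff:a.b.c.d layout. */
bool gid_is_v4mapped(const uint8_t gid[GID_LEN])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* Hypothetical helper: derive the Ethernet multicast MAC for a GID. */
void gid_to_mcast_mac(const uint8_t gid[GID_LEN], uint8_t mac[MAC_LEN])
{
	assert(gid && mac);

	if (gid_is_v4mapped(gid)) {
		/* IPv4: 01:00:5e plus the low 23 bits of the group address */
		mac[0] = 0x01;
		mac[1] = 0x00;
		mac[2] = 0x5e;
		mac[3] = gid[13] & 0x7f;
		mac[4] = gid[14];
		mac[5] = gid[15];
	} else {
		/* IPv6: 33:33 plus the last four bytes of the address */
		mac[0] = 0x33;
		mac[1] = 0x33;
		memcpy(&mac[2], &gid[12], 4);
	}
}
```

In the patch itself the IP-level join (ip_mc_join_group() or ipv6_sock_mc_join()) happens first and the MAC derived this way is then passed to dev_mc_add(); the sketch covers only the address arithmetic.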



* [PATCH for-next 4/6] RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
From: Bob Pearson @ 2023-11-03 20:43 UTC (permalink / raw)
  To: jgg, yanjun.zhu, linux-rdma; +Cc: Bob Pearson
In-Reply-To: <20231103204324.9606-1-rpearsonhpe@gmail.com>

Change the read-side operations on the multicast group red-black
tree to use RCU read locking. This allows the mcast lock to be
changed to a mutex in the next patch without breaking rxe_recv.c.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 35 +++++++--------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index ec757b955979..d7b8e31ab480 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -148,7 +148,7 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 			link = &(*link)->rb_right;
 	}
 
-	rb_link_node(&mcg->node, node, link);
+	rb_link_node_rcu(&mcg->node, node, link);
 	rb_insert_color(&mcg->node, tree);
 }
 
@@ -164,14 +164,13 @@ static void __rxe_remove_mcg(struct rxe_mcg *mcg)
 }
 
 /**
- * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
+ * rxe_lookup_mcg - lookup mcg in rxe->mcg_tree under RCU read lock
  * @rxe: rxe device object
  * @mgid: multicast IP address
  *
- * Context: caller must hold rxe->mcg_lock
  * Returns: mcg on success and takes a ref to mcg else NULL
  */
-static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
 					union ib_gid *mgid)
 {
 	struct rb_root *tree = &rxe->mcg_tree;
@@ -179,7 +178,8 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	struct rb_node *node;
 	int cmp;
 
-	node = tree->rb_node;
+	rcu_read_lock();
+	node = rcu_dereference_raw(tree->rb_node);
 
 	while (node) {
 		mcg = rb_entry(node, struct rxe_mcg, node);
@@ -187,12 +187,13 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 		cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));
 
 		if (cmp > 0)
-			node = node->rb_left;
+			node = rcu_dereference_raw(node->rb_left);
 		else if (cmp < 0)
-			node = node->rb_right;
+			node = rcu_dereference_raw(node->rb_right);
 		else
 			break;
 	}
+	rcu_read_unlock();
 
 	if (node) {
 		kref_get(&mcg->ref_cnt);
@@ -202,24 +203,6 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	return NULL;
 }
 
-/**
- * rxe_lookup_mcg - lookup up mcg in red-back tree
- * @rxe: rxe device object
- * @mgid: multicast IP address
- *
- * Returns: mcg if found else NULL
- */
-struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	struct rxe_mcg *mcg;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	mcg = __rxe_lookup_mcg(rxe, mgid);
-	spin_unlock_bh(&rxe->mcg_lock);
-
-	return mcg;
-}
-
 /**
  * __rxe_init_mcg - initialize a new mcg
  * @rxe: rxe device
@@ -313,7 +296,7 @@ void rxe_cleanup_mcg(struct kref *kref)
 {
 	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
 
-	kfree(mcg);
+	kfree_rcu(mcg, rcu);
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 7be9e6232dd9..8058e5039322 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -345,6 +345,7 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rb_node		node;
+	struct rcu_head		rcu;
 	struct kref		ref_cnt;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
-- 
2.40.1
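
The lookup in this patch walks the tree by comparing 16-byte GIDs with memcmp(), going left when the stored key compares greater and right when it compares smaller. Below is a userspace sketch of just that ordering logic, assuming a plain unbalanced binary tree. RCU, reference counting and red-black rebalancing are deliberately left out, and struct node, tree_insert() and tree_lookup() are hypothetical names that do not appear in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical stand-in for struct rxe_mcg, keyed by the raw GID. */
struct node {
	uint8_t gid[GID_LEN];
	struct node *left;
	struct node *right;
};

/* Link n into the tree using the same comparison direction as the lookup. */
void tree_insert(struct node **root, struct node *n)
{
	struct node **link = root;

	assert(root && n);
	n->left = NULL;
	n->right = NULL;

	while (*link) {
		int cmp = memcmp((*link)->gid, n->gid, GID_LEN);

		if (cmp > 0)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	*link = n;
}

/* Return the node whose GID matches, or NULL if there is none. */
struct node *tree_lookup(struct node *root, const uint8_t gid[GID_LEN])
{
	struct node *node = root;

	while (node) {
		int cmp = memcmp(node->gid, gid, GID_LEN);

		if (cmp > 0)
			node = node->left;
		else if (cmp < 0)
			node = node->right;
		else
			return node;
	}
	return NULL;
}
```

In the real code each pointer load in the walk goes through rcu_dereference_raw() inside an rcu_read_lock() section, and the object is freed with kfree_rcu() so that a concurrent reader never follows a pointer into freed memory.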



