* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-18 17:59 ` [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects Bob Pearson
@ 2023-07-19 5:38 ` Zhu Yanjun
2023-07-19 14:34 ` Bob Pearson
2023-07-19 7:49 ` Leon Romanovsky
2023-08-09 19:17 ` Jason Gunthorpe
2 siblings, 1 reply; 10+ messages in thread
From: Zhu Yanjun @ 2023-07-19 5:38 UTC (permalink / raw)
To: Bob Pearson; +Cc: jgg, leon, linux-rdma, jhack
On Wed, Jul 19, 2023 at 2:00 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Protect rcu_read_lock() critical sections that use the indexed
> verbs objects from early freeing of those objects. The AH, QP,
> MR and MW objects are looked up from their indices contained in
> received packets or WQEs during I/O processing. Free these
> objects with kfree_rcu() to avoid races.
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> drivers/infiniband/sw/rxe/rxe_pool.h | 1 +
> drivers/infiniband/sw/rxe/rxe_verbs.c | 6 +++++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
> index b42e26427a70..ef2f6f88e254 100644
> --- a/drivers/infiniband/sw/rxe/rxe_pool.h
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
> @@ -25,6 +25,7 @@ struct rxe_pool_elem {
> struct kref ref_cnt;
> struct list_head list;
> struct completion complete;
> + struct rcu_head rcu;
> u32 index;
> };
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 903f0b71447e..b899924b81eb 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -1395,7 +1395,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> if (cleanup_err)
> rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
>
> - kfree_rcu_mightsleep(mr);
> + kfree_rcu(mr, elem.rcu);
> return 0;
>
> err_out:
> @@ -1494,6 +1494,10 @@ static const struct ib_device_ops rxe_dev_ops = {
> INIT_RDMA_OBJ_SIZE(ib_srq, rxe_srq, ibsrq),
> INIT_RDMA_OBJ_SIZE(ib_ucontext, rxe_ucontext, ibuc),
> INIT_RDMA_OBJ_SIZE(ib_mw, rxe_mw, ibmw),
> +
> + .rcu_offset_ah = offsetof(struct rxe_ah, elem.rcu),
> + .rcu_offset_qp = offsetof(struct rxe_qp, elem.rcu),
> + .rcu_offset_mw = offsetof(struct rxe_mw, elem.rcu),
In your commit message, you mention that the AH, QP, MW and MR objects
will be protected, but here only ah, qp and mw are set.
I am not sure whether mr is also protected and set somewhere else.
Apart from that, I am fine with it.
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Zhu Yanjun
> };
>
> int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
> --
> 2.39.2
>
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-19 5:38 ` Zhu Yanjun
@ 2023-07-19 14:34 ` Bob Pearson
0 siblings, 0 replies; 10+ messages in thread
From: Bob Pearson @ 2023-07-19 14:34 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: jgg, leon, linux-rdma, jhack
On 7/19/23 00:38, Zhu Yanjun wrote:
> On Wed, Jul 19, 2023 at 2:00 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> Protect rcu_read_lock() critical sections that use the indexed
>> verbs objects from early freeing of those objects. The AH, QP,
>> MR and MW objects are looked up from their indices contained in
>> received packets or WQEs during I/O processing. Free these
>> objects with kfree_rcu() to avoid races.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>> drivers/infiniband/sw/rxe/rxe_pool.h | 1 +
>> drivers/infiniband/sw/rxe/rxe_verbs.c | 6 +++++-
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
>> index b42e26427a70..ef2f6f88e254 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_pool.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
>> @@ -25,6 +25,7 @@ struct rxe_pool_elem {
>> struct kref ref_cnt;
>> struct list_head list;
>> struct completion complete;
>> + struct rcu_head rcu;
>> u32 index;
>> };
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> index 903f0b71447e..b899924b81eb 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> @@ -1395,7 +1395,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>> if (cleanup_err)
>> rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
>>
>> - kfree_rcu_mightsleep(mr);
>> + kfree_rcu(mr, elem.rcu);
>> return 0;
>>
>> err_out:
>> @@ -1494,6 +1494,10 @@ static const struct ib_device_ops rxe_dev_ops = {
>> INIT_RDMA_OBJ_SIZE(ib_srq, rxe_srq, ibsrq),
>> INIT_RDMA_OBJ_SIZE(ib_ucontext, rxe_ucontext, ibuc),
>> INIT_RDMA_OBJ_SIZE(ib_mw, rxe_mw, ibmw),
>> +
>> + .rcu_offset_ah = offsetof(struct rxe_ah, elem.rcu),
>> + .rcu_offset_qp = offsetof(struct rxe_qp, elem.rcu),
>> + .rcu_offset_mw = offsetof(struct rxe_mw, elem.rcu),
>
> In your commit message, you mention that the AH, QP, MW and MR objects
> will be protected, but here only ah, qp and mw are set.
> I am not sure whether mr is also protected and set somewhere else.
> Apart from that, I am fine with it.
>
> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
>
> Zhu Yanjun
>
>> };
>>
>> int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
>> --
>> 2.39.2
>>
All of the verbs objects except the MR are allocated and freed by rdma-core. The MR is allocated
and freed by the drivers, so for the MR the kfree_rcu() call is made in rxe_dereg_mr().
I changed it from kfree_rcu_mightsleep() to the kfree_rcu() variant to match the other three.
Bob
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-18 17:59 ` [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects Bob Pearson
2023-07-19 5:38 ` Zhu Yanjun
@ 2023-07-19 7:49 ` Leon Romanovsky
2023-07-19 16:43 ` Bob Pearson
2023-08-09 19:17 ` Jason Gunthorpe
2 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2023-07-19 7:49 UTC (permalink / raw)
To: Bob Pearson; +Cc: jgg, zyjzyj2000, linux-rdma, jhack
On Tue, Jul 18, 2023 at 12:59:44PM -0500, Bob Pearson wrote:
> Protect rcu_read_lock() critical sections that use the indexed
> verbs objects from early freeing of those objects. The AH, QP,
> MR and MW objects are looked up from their indices contained in
> received packets or WQEs during I/O processing. Free these
> objects with kfree_rcu() to avoid races.
Sorry, how does the use of RCU avoid races?
Thanks
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-19 7:49 ` Leon Romanovsky
@ 2023-07-19 16:43 ` Bob Pearson
2023-07-23 15:00 ` Leon Romanovsky
0 siblings, 1 reply; 10+ messages in thread
From: Bob Pearson @ 2023-07-19 16:43 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: jgg, zyjzyj2000, linux-rdma, jhack
On 7/19/23 02:49, Leon Romanovsky wrote:
> On Tue, Jul 18, 2023 at 12:59:44PM -0500, Bob Pearson wrote:
>> Protect rcu_read_lock() critical sections that use the indexed
>> verbs objects from early freeing of those objects. The AH, QP,
>> MR and MW objects are looked up from their indices contained in
>> received packets or WQEs during I/O processing. Free these
>> objects with kfree_rcu() to avoid races.
>
> Sorry, how does the use of RCU avoid races?
>
> Thanks
The races are between the destroy/deallocate/dereg verbs API calls and packets arriving, send completions,
or deferred processing of WQEs. Packets and WQEs contain indices/keys/numbers that refer to objects.
The rxe driver maintains an xarray for each type of object, which allows it to look up the address of an
object from its index and then take a reference to protect the pointer. The destroy verbs defer completion
until the reference count falls to zero and then remove the entry from the xarray. These operations
need to be atomic.

One alternative is to use spinlocks to protect them, but that hurts performance under heavy load,
which is typically dominated by the lookup function since objects tend to have long lifetimes.
RCU read locks are a better alternative, but they depend on deferred destruction of the objects
used in the RCU critical section. In certain benchmarks the use of RCU locking has a very large
performance advantage over spinlocks, for example with hundreds of QPs running over a 200 Gbit/s
network.
Bob
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-19 16:43 ` Bob Pearson
@ 2023-07-23 15:00 ` Leon Romanovsky
0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2023-07-23 15:00 UTC (permalink / raw)
To: Bob Pearson; +Cc: jgg, zyjzyj2000, linux-rdma, jhack
On Wed, Jul 19, 2023 at 11:43:30AM -0500, Bob Pearson wrote:
> On 7/19/23 02:49, Leon Romanovsky wrote:
> > On Tue, Jul 18, 2023 at 12:59:44PM -0500, Bob Pearson wrote:
> >> Protect rcu_read_lock() critical sections that use the indexed
> >> verbs objects from early freeing of those objects. The AH, QP,
> >> MR and MW objects are looked up from their indices contained in
> >> received packets or WQEs during I/O processing. Free these
> >> objects with kfree_rcu() to avoid races.
> >
> > Sorry, how does the use of RCU avoid races?
> >
> > Thanks
>
> The races are between the destroy/deallocate/dereg verbs API calls and packets arriving, send completions,
> or deferred processing of WQEs. Packets and WQEs contain indices/keys/numbers that refer to objects.
> The rxe driver maintains an xarray for each type of object, which allows it to look up the address of an
> object from its index and then take a reference to protect the pointer. The destroy verbs defer completion
> until the reference count falls to zero and then remove the entry from the xarray. These operations
> need to be atomic. One alternative is to use spinlocks to protect them, but that hurts performance
> under heavy load, which is typically dominated by the lookup function since objects tend to have
> long lifetimes. RCU read locks are a better alternative, but they depend on deferred destruction
> of the objects used in the RCU critical section.
You can rarely replace locks with RCU without careful design changes.
Thanks
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects
2023-07-18 17:59 ` [PATCH for-next 2/2] RDMA/rxe: Enable rcu locking of indexed objects Bob Pearson
2023-07-19 5:38 ` Zhu Yanjun
2023-07-19 7:49 ` Leon Romanovsky
@ 2023-08-09 19:17 ` Jason Gunthorpe
2 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2023-08-09 19:17 UTC (permalink / raw)
To: Bob Pearson; +Cc: leon, zyjzyj2000, linux-rdma, jhack
On Tue, Jul 18, 2023 at 12:59:44PM -0500, Bob Pearson wrote:
> Protect rcu_read_lock() critical sections that use the indexed
> verbs objects from early freeing of those objects. The AH, QP,
> MR and MW objects are looked up from their indices contained in
> received packets or WQEs during I/O processing. Free these
> objects with kfree_rcu() to avoid races.
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> drivers/infiniband/sw/rxe/rxe_pool.h | 1 +
> drivers/infiniband/sw/rxe/rxe_verbs.c | 6 +++++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
> index b42e26427a70..ef2f6f88e254 100644
> --- a/drivers/infiniband/sw/rxe/rxe_pool.h
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
> @@ -25,6 +25,7 @@ struct rxe_pool_elem {
> struct kref ref_cnt;
> struct list_head list;
> struct completion complete;
> + struct rcu_head rcu;
> u32 index;
> };
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 903f0b71447e..b899924b81eb 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -1395,7 +1395,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> if (cleanup_err)
> rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
>
> - kfree_rcu_mightsleep(mr);
> + kfree_rcu(mr, elem.rcu);
> return 0;
>
> err_out:
> @@ -1494,6 +1494,10 @@ static const struct ib_device_ops rxe_dev_ops = {
> INIT_RDMA_OBJ_SIZE(ib_srq, rxe_srq, ibsrq),
> INIT_RDMA_OBJ_SIZE(ib_ucontext, rxe_ucontext, ibuc),
> INIT_RDMA_OBJ_SIZE(ib_mw, rxe_mw, ibmw),
> +
> + .rcu_offset_ah = offsetof(struct rxe_ah, elem.rcu),
> + .rcu_offset_qp = offsetof(struct rxe_qp, elem.rcu),
> + .rcu_offset_mw = offsetof(struct rxe_mw, elem.rcu),
> };
This isn't quite the right calculation. It only works because the ib
struct happens to be at the top of these structs.
Do it like this:
/*
* When used the core will RCU free the object using the driver provided
* rcu member.
*/
#define INIT_RDMA_OBJ_SIZE_RCU(ib_struct, drv_struct, member, rcu_member) \
.rcu_offset_##ib_struct = \
(offsetof(struct drv_struct, rcu_member) - \
offsetof(struct drv_struct, member) + \
BUILD_BUG_ON_ZERO(offsetof(struct drv_struct, rcu_member) > \
offsetof(struct drv_struct, member))), \
INIT_RDMA_OBJ_SIZE(ib_struct, drv_struct, member),
(and maybe the prior patch could stick with the ssize_t and we could
remove the BUILD_BUG_ON_ZERO?)
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread