Linux RDMA and InfiniBand development
From: Bob Pearson <rpearsonhpe@gmail.com>
To: Haris Iqbal <haris.iqbal@ionos.com>
Cc: jgg@nvidia.com, zyjzyj2000@gmail.com, jhack@hpe.com,
	frank.zago@hpe.com, linux-rdma@vger.kernel.org,
	Aleksei Marov <aleksei.marov@ionos.com>,
	Jinpu Wang <jinpu.wang@ionos.com>
Subject: Re: [PATCH for-next] RDMA/rxe: Fix incorrect fencing
Date: Mon, 23 May 2022 13:22:18 -0500	[thread overview]
Message-ID: <aa01e627-04fe-b331-b367-07cbb8731b8d@gmail.com> (raw)
In-Reply-To: <CAJpMwyhsf_C6dosUH81_5aD4fd5XHNPD94B3NE=TT+fSBAKW1g@mail.gmail.com>

On 5/23/22 03:05, Haris Iqbal wrote:
> On Mon, May 23, 2022 at 5:51 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> On 5/22/22 18:59, Haris Iqbal wrote:
>>> On Mon, May 23, 2022 at 12:36 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>>>
>>>> Currently the rxe driver checks if any previous operation
>>>> is not complete to determine if a fence wait is required.
>>>> This is not correct. For a regular fence only previous
>>>> read or atomic operations must be complete while for a local
>>>> invalidate fence all previous operations must be complete.
>>>> This patch corrects this behavior.
>>>>
>>>> Fixes: 8700e3e7c4857 ("Soft RoCE (RXE) - The software RoCE driver")
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>  drivers/infiniband/sw/rxe/rxe_req.c | 42 ++++++++++++++++++++++++-----
>>>>  1 file changed, 36 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> index ae5fbc79dd5c..f36263855a45 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> @@ -163,16 +163,41 @@ static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
>>>>                      (wqe->state != wqe_state_processing)))
>>>>                 return NULL;
>>>>
>>>> -       if (unlikely((wqe->wr.send_flags & IB_SEND_FENCE) &&
>>>> -                                                    (index != cons))) {
>>>> -               qp->req.wait_fence = 1;
>>>> -               return NULL;
>>>> -       }
>>>> -
>>>>         wqe->mask = wr_opcode_mask(wqe->wr.opcode, qp);
>>>>         return wqe;
>>>>  }
>>>>
>>>> +/**
>>>> + * rxe_wqe_is_fenced - check if next wqe is fenced
>>>> + * @qp: the queue pair
>>>> + * @wqe: the next wqe
>>>> + *
>>>> + * Returns: 1 if wqe is fenced (needs to wait)
>>>> + *         0 if wqe is good to go
>>>> + */
>>>> +static int rxe_wqe_is_fenced(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>>>> +{
>>>> +       unsigned int cons;
>>>> +
>>>> +       if (!(wqe->wr.send_flags & IB_SEND_FENCE))
>>>> +               return 0;
>>>> +
>>>> +       cons = queue_get_consumer(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
>>>> +
>>>> +       /* Local invalidate fence (LIF) see IBA 10.6.5.1
>>>> +        * Requires ALL previous operations on the send queue
>>>> +        * are complete.
>>>> +        */
>>>> +       if (wqe->wr.opcode == IB_WR_LOCAL_INV)
>>>> +               return qp->req.wqe_index != cons;
>>>
>>>
>>> Do I understand correctly that according to this code a wr with opcode
>>> IB_WR_LOCAL_INV needs to have the IB_SEND_FENCE also set for this to
>>> work?
>>>
>>> If that is the desired behaviour, can you point out where in spec this
>>> is mentioned.
>>
>> According to the IBA, a "local invalidate fence" (LIF) and a regular fence
>> behave differently. (See the referenced sections in the IBA.) For a local
>> invalidate operation the fence bit fences all previous operations. That was
>> the old behavior, which made no distinction between local invalidate and
>> other operations. The change here is that other operations carrying a regular
>> fence now only require previous read and atomic operations to be complete.
>>
>> Not sure what you mean by 'also'. Per the IBA, if the LIF is set you get
>> strict invalidate ordering; if not, you get relaxed ordering. The kernel
>> verbs API only has one fence bit and no separate LIF bit, so I am
>> interpreting them to share the one bit.
> 
> I see. Now I understand. Thanks for the explanation.
> 
> I do have a follow-up question. For an IB_WR_LOCAL_INV wr, not setting the
> fence bit means relaxed ordering. This would mean that the completion
> for that wr must take place "before any subsequent WQE has begun
> execution". From what I understand, multiple rxe_requester instances
> can run in parallel and pick up wqes and execute them. How is the
> relaxed ordering criterion fulfilled?

The requester is a tasklet, and there is one tasklet instance per QP. A given
tasklet can only run on one cpu at a time, so the wqes of a single send queue
are never processed in parallel. Tasklets for different QPs may run on
different cpus, but each one executes its own send queue in order.
> 
>>
>> Bob
>>>
>>> Thanks.
>>>
>>>
>>>> +
>>>> +       /* Fence see IBA 10.8.3.3
>>>> +        * Requires that all previous read and atomic operations
>>>> +        * are complete.
>>>> +        */
>>>> +       return atomic_read(&qp->req.rd_atomic) != qp->attr.max_rd_atomic;
>>>> +}
>>>> +
>>>>  static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
>>>>  {
>>>>         switch (opcode) {
>>>> @@ -636,6 +661,11 @@ int rxe_requester(void *arg)
>>>>         if (unlikely(!wqe))
>>>>                 goto exit;
>>>>
>>>> +       if (rxe_wqe_is_fenced(qp, wqe)) {
>>>> +               qp->req.wait_fence = 1;
>>>> +               goto exit;
>>>> +       }
>>>> +
>>>>         if (wqe->mask & WR_LOCAL_OP_MASK) {
>>>>                 ret = rxe_do_local_ops(qp, wqe);
>>>>                 if (unlikely(ret))
>>>>
>>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>>> --
>>>> 2.34.1
>>>>
>>


Thread overview: 8+ messages
2022-05-22 22:33 [PATCH for-next] RDMA/rxe: Fix incorrect fencing Bob Pearson
2022-05-22 23:59 ` Haris Iqbal
2022-05-23  3:51   ` Bob Pearson
2022-05-23  8:05     ` Haris Iqbal
2022-05-23 18:22       ` Bob Pearson [this message]
2022-05-24 10:28         ` Haris Iqbal
2022-05-24 18:20           ` Bob Pearson
2022-05-27 10:18             ` Haris Iqbal
