Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Bob Pearson <rpearsonhpe@gmail.com>
To: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: jgg@nvidia.com, leonro@nvidia.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Handle zero length cases correctly
Date: Thu, 19 Jan 2023 22:27:02 -0600	[thread overview]
Message-ID: <20809b59-0d7f-b6b0-e51c-026a78f07a86@gmail.com> (raw)
In-Reply-To: <CAD=hENcdkWchRrvH+KXLXZoaQcZPpnCdV9V9T9mmzkJ13DJKUA@mail.gmail.com>

On 1/19/23 19:38, Zhu Yanjun wrote:
> On Fri, Jan 20, 2023 at 3:09 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> Currently the rxe driver, in rare situations, can respond incorrectly
>> to zero length operations which are retried. The client does not
>> have to provide an rkey for zero length RDMA operations so the rkey
>> may be invalid. The driver saves this rkey in the responder resources
>> to replay the rdma operation if a retry is required so the second pass
>> will use this (potentially) invalid rkey which may result in memory
>> faults.
>>
>> This patch corrects the driver to ignore the provided rkey if the
>> reth length is zero and make sure to set the mr to NULL. In read_reply()
>> if the length is zero the MR is set to NULL. Warnings are added in
>> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero.
>>
> 
> There is a patch in the following link:
> 
> https://patchwork.kernel.org/project/linux-rdma/patch/20230113023527.728725-1-baijiaju1990@gmail.com/
> 
> Not sure whether it is similar or not.
> 
> Zhu Yanjun
> 
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_mr.c   |  9 +++++++
>>  drivers/infiniband/sw/rxe/rxe_resp.c | 36 +++++++++++++++++++++-------
>>  2 files changed, 36 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index 072eac4b65d2..134a74f315c2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -267,6 +267,9 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
>>         int m, n;
>>         void *addr;
>>
>> +       if (WARN_ON(!mr))
>> +               return NULL;
>> +
>>         if (mr->state != RXE_MR_STATE_VALID) {
>>                 rxe_dbg_mr(mr, "Not in valid state\n");
>>                 addr = NULL;
>> @@ -305,6 +308,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
>>         if (length == 0)
>>                 return 0;
>>
>> +       if (WARN_ON(!mr))
>> +               return -EINVAL;
>> +
>>         if (mr->ibmr.type == IB_MR_TYPE_DMA)
>>                 return -EFAULT;
>>
>> @@ -349,6 +355,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
>>         if (length == 0)
>>                 return 0;
>>
>> +       if (WARN_ON(!mr))
>> +               return -EINVAL;
>> +
>>         if (mr->ibmr.type == IB_MR_TYPE_DMA) {
>>                 u8 *src, *dest;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index c74972244f08..a528dc25d389 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -457,13 +457,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp,
>>         return RESPST_CHK_RKEY;
>>  }
>>
>> +/* if the reth length field is zero we can assume nothing
>> + * about the rkey value and should not validate or use it.
>> + * Instead set qp->resp.rkey to 0 which is an invalid rkey
>> + * value since the minimum index part is 1.
>> + */
>>  static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>>  {
>> +       unsigned int length = reth_len(pkt);
>> +
>>         qp->resp.va = reth_va(pkt);
>>         qp->resp.offset = 0;
>> -       qp->resp.rkey = reth_rkey(pkt);
>> -       qp->resp.resid = reth_len(pkt);
>> -       qp->resp.length = reth_len(pkt);
>> +       qp->resp.resid = length;
>> +       qp->resp.length = length;
>> +       if (length)
>> +               qp->resp.rkey = reth_rkey(pkt);
>> +       else
>> +               qp->resp.rkey = 0;
>>  }
>>
>>  static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>> @@ -512,8 +522,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>
>>         /* A zero-byte op is not required to set an addr or rkey. See C9-88 */
>>         if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
>> -           (pkt->mask & RXE_RETH_MASK) &&
>> -           reth_len(pkt) == 0) {
>> +           (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) {
>> +               qp->resp.mr = NULL;
>>                 return RESPST_EXECUTE;
>>         }
>>
>> @@ -592,6 +602,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>         return RESPST_EXECUTE;
>>
>>  err:
>> +       qp->resp.mr = NULL;
>>         if (mr)
>>                 rxe_put(mr);
>>         if (mw)
>> @@ -966,7 +977,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>         }
>>
>>         if (res->state == rdatm_res_state_new) {
>> -               if (!res->replay) {
>> +               if (qp->resp.length == 0) {
>> +                       mr = NULL;
>> +                       qp->resp.mr = NULL;
>> +               } else if (!res->replay) {
>>                         mr = qp->resp.mr;
>>                         qp->resp.mr = NULL;
>>                 } else {
>> @@ -980,9 +994,13 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>                 else
>>                         opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
>>         } else {
>> -               mr = rxe_recheck_mr(qp, res->read.rkey);
>> -               if (!mr)
>> -                       return RESPST_ERR_RKEY_VIOLATION;
>> +               if (qp->resp.length == 0) {
>> +                       mr = NULL;
>> +               } else {
>> +                       mr = rxe_recheck_mr(qp, res->read.rkey);
>> +                       if (!mr)
>> +                               return RESPST_ERR_RKEY_VIOLATION;
>> +               }
>>
>>                 if (res->read.resid > mtu)
>>                         opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
>> --
>> 2.37.2
>>

Zhu,

It relates since he is checking for NULL MRs. But I don't think it addresses the root
causes. The patch I sent should eliminate NULL MRs together with length != 0 in
the copy routines. I added WARN_ON's in case someone changes things later and
we hit this again. (A warning is more useful than a fault which can be very hard
to diagnose.)

The two changes I made that attack the cause of problems are
(1) clearing qp->resp.mr in check_rkey() in the alternate paths. The primary
path demands that it get set with a valid mr. But on the alternate paths it isn't
set at all and can leave with a stale, invalid or wrong mr value.
(2) in read_reply() there is an error path where a zero length read fails to get
acked and the requester retries the operation and sends a second request. This
will end up in read_reply and as currently written attempt to lookup the rkey and
turn it into an MR but no valid rkey is required in a zero length operation so this
is likely to fail. The fixes treats length == 0 as a special case and force a NULL mr.
This should not trigger a fault in the mr copy/etc. routines since they always
check for length == 0 and return or require a non zero length.

Thanks,

Bob



  reply	other threads:[~2023-01-20  4:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-19 19:06 [PATCH for-next] RDMA/rxe: Handle zero length cases correctly Bob Pearson
2023-01-20  1:38 ` Zhu Yanjun
2023-01-20  4:27   ` Bob Pearson [this message]
2023-01-20  6:04     ` Zhu Yanjun
2023-01-23 10:20       ` Daisuke Matsuda (Fujitsu)
2023-01-24 22:03         ` Bob Pearson
2023-01-27 16:24           ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20809b59-0d7f-b6b0-e51c-026a78f07a86@gmail.com \
    --to=rpearsonhpe@gmail.com \
    --cc=jgg@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox