Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Tom Talpey <tom@talpey.com>
To: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@fujitsu.com>,
	'Bob Pearson' <rpearsonhpe@gmail.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"zyjzyj2000@gmail.com" <zyjzyj2000@gmail.com>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>,
	"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v3] Subject: RDMA/rxe: Handle zero length rdma
Date: Wed, 1 Feb 2023 10:38:05 -0500	[thread overview]
Message-ID: <24a30a4b-ab51-b24a-7976-eeefabb99619@talpey.com> (raw)
In-Reply-To: <TYCPR01MB845509B0DBC0BBCF1CC8DB73E5D19@TYCPR01MB8455.jpnprd01.prod.outlook.com>

On 2/1/2023 6:06 AM, Daisuke Matsuda (Fujitsu) wrote:
> On Sat, Jan 28, 2023 6:10 AM Bob Pearson wrote:
>>
>> Currently the rxe driver does not handle all cases of zero length
>> rdma operations correctly. The client does not have to provide an
>> rkey for zero length RDMA operations so the rkey provided may be
>> invalid and should not be used to lookup an mr.
>>
>> This patch corrects the driver to ignore the provided rkey if the
>> reth length is zero and make sure to set the mr to NULL. In read_reply()
>> if length is zero rxe_recheck_mr() is not called. Warnings are added in
>> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
> 
> When I applied this change, a testcase in rdma-core failed as shown below:
> ======================================================================
> ERROR: test_qp_ex_rc_flush (tests.test_qpex.QpExTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>    File "/root/rdma-core/tests/test_qpex.py", line 258, in test_qp_ex_rc_flush
>      raise PyverbsError(f'Unexpected {wc_status_to_str(wcs[0].status)}')
> pyverbs.pyverbs_error.PyverbsError: Unexpected Remote access error
> 
> ----------------------------------------------------------------------
> 
> In my opinion, your change makes sense within the range of traditional
> RDMA operations, but conflicts with the new RDMA FLUSH operation.
> Responder cannot access the target MR because of invalid rkey. The
> root cause is written in IBA Annex A19, especially 'oA19-2'.
> We thus cannot set qp->resp.rkey to 0 in qp_resp_from_reth().
> 
> Do you have anything to say about this? > Li Zhijian
> 
> Thanks,
> Daisuke Matsuda

I'm confused too, Bob can you point to the section of the spec
that allows the rkey to be zero? It's my understanding that
a zero-length RDMA Read must always check for access, even
though no data is actually fetched. That would not be possible
without an rkey.

Tom.

>> v3:
>>    Fixed my fat finger typing on v2. Mangled the patch.
>>
>> v2:
>>    Rebased to current for-next.
>>    Cleaned up description to be a little more accurate.
>> ---
>>   drivers/infiniband/sw/rxe/rxe_mr.c   |  6 +++
>>   drivers/infiniband/sw/rxe/rxe_resp.c | 55 +++++++++++++++++++++-------
>>   2 files changed, 47 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index c80458634962..5b7ede1d2b08 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -314,6 +314,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>>   	if (length == 0)
>>   		return 0;
>>
>> +	if (WARN_ON(!mr))
>> +		return -EINVAL;
>> +
>>   	if (mr->ibmr.type == IB_MR_TYPE_DMA) {
>>   		rxe_mr_copy_dma(mr, iova, addr, length, dir);
>>   		return 0;
>> @@ -435,6 +438,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length)
>>   	if (length == 0)
>>   		return 0;
>>
>> +	if (WARN_ON(!mr))
>> +		return -EINVAL;
>> +
>>   	if (mr->ibmr.type == IB_MR_TYPE_DMA)
>>   		return -EFAULT;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index cd2d88de287c..b13ae98400c1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -420,13 +420,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp,
>>   	return RESPST_CHK_RKEY;
>>   }
>>
>> +/* if the reth length field is zero we can assume nothing
>> + * about the rkey value and should not validate or use it.
>> + * Instead set qp->resp.rkey to 0 which is an invalid rkey
>> + * value since the minimum index part is 1.
>> + */
>>   static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>>   {
>> +	unsigned int length = reth_len(pkt);
>> +
>>   	qp->resp.va = reth_va(pkt);
>>   	qp->resp.offset = 0;
>> -	qp->resp.rkey = reth_rkey(pkt);
>> -	qp->resp.resid = reth_len(pkt);
>> -	qp->resp.length = reth_len(pkt);
>> +	qp->resp.resid = length;
>> +	qp->resp.length = length;
>> +	if (length)
>> +		qp->resp.rkey = reth_rkey(pkt);
>> +	else
>> +		qp->resp.rkey = 0;
>>   }
>>
>>   static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>> @@ -437,6 +447,10 @@ static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>>   	qp->resp.resid = sizeof(u64);
>>   }
>>
>> +/* resolve the packet rkey to qp->resp.mr or set qp->resp.mr to NULL
>> + * if an invalid rkey is received or the rdma length is zero. For middle
>> + * or last packets use the stored value of mr.
>> + */
>>   static enum resp_states check_rkey(struct rxe_qp *qp,
>>   				   struct rxe_pkt_info *pkt)
>>   {
>> @@ -475,8 +489,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>
>>   	/* A zero-byte op is not required to set an addr or rkey. See C9-88 */
>>   	if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
>> -	    (pkt->mask & RXE_RETH_MASK) &&
>> -	    reth_len(pkt) == 0) {
>> +	    (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) {
>> +		qp->resp.mr = NULL;
>>   		return RESPST_EXECUTE;
>>   	}
>>
>> @@ -555,6 +569,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>   	return RESPST_EXECUTE;
>>
>>   err:
>> +	qp->resp.mr = NULL;
>>   	if (mr)
>>   		rxe_put(mr);
>>   	if (mw)
>> @@ -885,7 +900,11 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>   	}
>>
>>   	if (res->state == rdatm_res_state_new) {
>> -		if (!res->replay) {
>> +		if (!res->replay || qp->resp.length == 0) {
>> +			/* if length == 0 mr will be NULL (is ok)
>> +			 * otherwise qp->resp.mr holds a ref on mr
>> +			 * which we transfer to mr and drop below.
>> +			 */
>>   			mr = qp->resp.mr;
>>   			qp->resp.mr = NULL;
>>   		} else {
>> @@ -899,6 +918,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>   		else
>>   			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
>>   	} else {
>> +		/* re-lookup mr from rkey on all later packets.
>> +		 * length will be non-zero. This can fail if someone
>> +		 * modifies or destroys the mr since the first packet.
>> +		 */
>>   		mr = rxe_recheck_mr(qp, res->read.rkey);
>>   		if (!mr)
>>   			return RESPST_ERR_RKEY_VIOLATION;
>> @@ -916,18 +939,16 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>   	skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload,
>>   				 res->cur_psn, AETH_ACK_UNLIMITED);
>>   	if (!skb) {
>> -		if (mr)
>> -			rxe_put(mr);
>> -		return RESPST_ERR_RNR;
>> +		state = RESPST_ERR_RNR;
>> +		goto err_out;
>>   	}
>>
>>   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>   			  payload, RXE_FROM_MR_OBJ);
>> -	if (mr)
>> -		rxe_put(mr);
>>   	if (err) {
>>   		kfree_skb(skb);
>> -		return RESPST_ERR_RKEY_VIOLATION;
>> +		state = RESPST_ERR_RKEY_VIOLATION;
>> +		goto err_out;
>>   	}
>>
>>   	if (bth_pad(&ack_pkt)) {
>> @@ -936,9 +957,12 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>   		memset(pad, 0, bth_pad(&ack_pkt));
>>   	}
>>
>> +	/* rxe_xmit_packet always consumes the skb */
>>   	err = rxe_xmit_packet(qp, &ack_pkt, skb);
>> -	if (err)
>> -		return RESPST_ERR_RNR;
>> +	if (err) {
>> +		state = RESPST_ERR_RNR;
>> +		goto err_out;
>> +	}
>>
>>   	res->read.va += payload;
>>   	res->read.resid -= payload;
>> @@ -955,6 +979,9 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>   		state = RESPST_CLEANUP;
>>   	}
>>
>> +err_out:
>> +	if (mr)
>> +		rxe_put(mr);
>>   	return state;
>>   }
>>
>> --
>> 2.37.2
> 

  reply	other threads:[~2023-02-01 15:38 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-27 21:09 [PATCH for-next v3] Subject: RDMA/rxe: Handle zero length rdma Bob Pearson
2023-02-01 11:06 ` Daisuke Matsuda (Fujitsu)
2023-02-01 15:38   ` Tom Talpey [this message]
2023-02-02  3:45     ` Bob Pearson
2023-02-02  5:49       ` lizhijian
2023-02-02 13:02       ` Tom Talpey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=24a30a4b-ab51-b24a-7976-eeefabb99619@talpey.com \
    --to=tom@talpey.com \
    --cc=jgg@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=lizhijian@fujitsu.com \
    --cc=matsuda-daisuke@fujitsu.com \
    --cc=rpearsonhpe@gmail.com \
    --cc=yangx.jy@fujitsu.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox