From: Bob Pearson <rpearsonhpe@gmail.com>
To: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: jgg@nvidia.com, leonro@nvidia.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Handle zero length cases correctly
Date: Thu, 19 Jan 2023 22:27:02 -0600 [thread overview]
Message-ID: <20809b59-0d7f-b6b0-e51c-026a78f07a86@gmail.com> (raw)
In-Reply-To: <CAD=hENcdkWchRrvH+KXLXZoaQcZPpnCdV9V9T9mmzkJ13DJKUA@mail.gmail.com>
On 1/19/23 19:38, Zhu Yanjun wrote:
> On Fri, Jan 20, 2023 at 3:09 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> Currently the rxe driver, in rare situations, can respond incorrectly
>> to zero length operations which are retried. The client does not
>> have to provide an rkey for zero length RDMA operations so the rkey
>> may be invalid. The driver saves this rkey in the responder resources
>> to replay the rdma operation if a retry is required so the second pass
>> will use this (potentially) invalid rkey which may result in memory
>> faults.
>>
>> This patch corrects the driver to ignore the provided rkey if the
>> reth length is zero and make sure to set the mr to NULL. In read_reply()
>> if the length is zero the MR is set to NULL. Warnings are added in
>> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero.
>>
>
> There is a patch in the following link:
>
> https://patchwork.kernel.org/project/linux-rdma/patch/20230113023527.728725-1-baijiaju1990@gmail.com/
>
> Not sure whether it is similar or not.
>
> Zhu Yanjun
>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>> drivers/infiniband/sw/rxe/rxe_mr.c | 9 +++++++
>> drivers/infiniband/sw/rxe/rxe_resp.c | 36 +++++++++++++++++++++-------
>> 2 files changed, 36 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index 072eac4b65d2..134a74f315c2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -267,6 +267,9 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
>> int m, n;
>> void *addr;
>>
>> + if (WARN_ON(!mr))
>> + return NULL;
>> +
>> if (mr->state != RXE_MR_STATE_VALID) {
>> rxe_dbg_mr(mr, "Not in valid state\n");
>> addr = NULL;
>> @@ -305,6 +308,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
>> if (length == 0)
>> return 0;
>>
>> + if (WARN_ON(!mr))
>> + return -EINVAL;
>> +
>> if (mr->ibmr.type == IB_MR_TYPE_DMA)
>> return -EFAULT;
>>
>> @@ -349,6 +355,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
>> if (length == 0)
>> return 0;
>>
>> + if (WARN_ON(!mr))
>> + return -EINVAL;
>> +
>> if (mr->ibmr.type == IB_MR_TYPE_DMA) {
>> u8 *src, *dest;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index c74972244f08..a528dc25d389 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -457,13 +457,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp,
>> return RESPST_CHK_RKEY;
>> }
>>
>> +/* if the reth length field is zero we can assume nothing
>> + * about the rkey value and should not validate or use it.
>> + * Instead set qp->resp.rkey to 0 which is an invalid rkey
>> + * value since the minimum index part is 1.
>> + */
>> static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>> {
>> + unsigned int length = reth_len(pkt);
>> +
>> qp->resp.va = reth_va(pkt);
>> qp->resp.offset = 0;
>> - qp->resp.rkey = reth_rkey(pkt);
>> - qp->resp.resid = reth_len(pkt);
>> - qp->resp.length = reth_len(pkt);
>> + qp->resp.resid = length;
>> + qp->resp.length = length;
>> + if (length)
>> + qp->resp.rkey = reth_rkey(pkt);
>> + else
>> + qp->resp.rkey = 0;
>> }
>>
>> static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>> @@ -512,8 +522,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>
>> /* A zero-byte op is not required to set an addr or rkey. See C9-88 */
>> if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
>> - (pkt->mask & RXE_RETH_MASK) &&
>> - reth_len(pkt) == 0) {
>> + (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) {
>> + qp->resp.mr = NULL;
>> return RESPST_EXECUTE;
>> }
>>
>> @@ -592,6 +602,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>> return RESPST_EXECUTE;
>>
>> err:
>> + qp->resp.mr = NULL;
>> if (mr)
>> rxe_put(mr);
>> if (mw)
>> @@ -966,7 +977,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>> }
>>
>> if (res->state == rdatm_res_state_new) {
>> - if (!res->replay) {
>> + if (qp->resp.length == 0) {
>> + mr = NULL;
>> + qp->resp.mr = NULL;
>> + } else if (!res->replay) {
>> mr = qp->resp.mr;
>> qp->resp.mr = NULL;
>> } else {
>> @@ -980,9 +994,13 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>> else
>> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
>> } else {
>> - mr = rxe_recheck_mr(qp, res->read.rkey);
>> - if (!mr)
>> - return RESPST_ERR_RKEY_VIOLATION;
>> + if (qp->resp.length == 0) {
>> + mr = NULL;
>> + } else {
>> + mr = rxe_recheck_mr(qp, res->read.rkey);
>> + if (!mr)
>> + return RESPST_ERR_RKEY_VIOLATION;
>> + }
>>
>> if (res->read.resid > mtu)
>> opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
>> --
>> 2.37.2
>>
Zhu,
It relates since he is checking for NULL MRs. But I don't think it addresses the root
causes. The patch I sent should eliminate NULL MRs together with length != 0 in
the copy routines. I added WARN_ON's in case someone changes things later and
we hit this again. (A warning is more useful than a fault which can be very hard
to diagnose.)
The two changes I made that attack the cause of problems are
(1) clearing qp->resp.mr in check_rkey() in the alternate paths. The primary
path demands that it get set with a valid mr. But on the alternate paths it isn't
set at all and can leave with a stale, invalid or wrong mr value.
(2) in read_reply() there is an error path where a zero length read fails to get
acked and the requester retries the operation and sends a second request. This
will end up in read_reply and as currently written attempt to lookup the rkey and
turn it into an MR but no valid rkey is required in a zero length operation so this
is likely to fail. The fixes treats length == 0 as a special case and force a NULL mr.
This should not trigger a fault in the mr copy/etc. routines since they always
check for length == 0 and return or require a non zero length.
Thanks,
Bob
next prev parent reply other threads:[~2023-01-20 4:27 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-19 19:06 [PATCH for-next] RDMA/rxe: Handle zero length cases correctly Bob Pearson
2023-01-20 1:38 ` Zhu Yanjun
2023-01-20 4:27 ` Bob Pearson [this message]
2023-01-20 6:04 ` Zhu Yanjun
2023-01-23 10:20 ` Daisuke Matsuda (Fujitsu)
2023-01-24 22:03 ` Bob Pearson
2023-01-27 16:24 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20809b59-0d7f-b6b0-e51c-026a78f07a86@gmail.com \
--to=rpearsonhpe@gmail.com \
--cc=jgg@nvidia.com \
--cc=leonro@nvidia.com \
--cc=linux-rdma@vger.kernel.org \
--cc=zyjzyj2000@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox