* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 5:44 ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
@ 2022-08-29 5:51 ` Cheng Xu
2022-08-29 6:31 ` Zhu Yanjun
2022-08-29 7:36 ` Li Zhijian
2 siblings, 0 replies; 13+ messages in thread
From: Cheng Xu @ 2022-08-29 5:51 UTC (permalink / raw)
To: Daisuke Matsuda, jgg; +Cc: linux-rdma, zyjzyj2000
On 8/29/22 1:44 PM, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> payload, RXE_FROM_MR_OBJ);
> - if (err)
> - pr_err("Failed copying memory\n");
The err is set but not used. Below is better:
- err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
- payload, RXE_FROM_MR_OBJ);
+ rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
+ payload, RXE_FROM_MR_OBJ);
Thanks,
Cheng Xu
> if (mr)
> rxe_put(mr);
>
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> }
>
> err = rxe_xmit_packet(qp, &ack_pkt, skb);
> - if (err) {
> - pr_err("Failed sending RDMA reply.\n");
> + if (err)
> return RESPST_ERR_RNR;
> - }
>
> res->read.va += payload;
> res->read.resid -= payload;
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 5:44 ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
2022-08-29 5:51 ` Cheng Xu
@ 2022-08-29 6:31 ` Zhu Yanjun
2022-08-29 7:02 ` Leon Romanovsky
2022-08-29 7:36 ` Li Zhijian
2 siblings, 1 reply; 13+ messages in thread
From: Zhu Yanjun @ 2022-08-29 6:31 UTC (permalink / raw)
To: Daisuke Matsuda; +Cc: Jason Gunthorpe, RDMA mailing list
On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
<matsuda-daisuke@fujitsu.com> wrote:
>
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> payload, RXE_FROM_MR_OBJ);
> - if (err)
> - pr_err("Failed copying memory\n");
pr_err_once is better?
> if (mr)
> rxe_put(mr);
>
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> }
>
> err = rxe_xmit_packet(qp, &ack_pkt, skb);
> - if (err) {
> - pr_err("Failed sending RDMA reply.\n");
The same with the above
Zhu Yanjun
> + if (err)
> return RESPST_ERR_RNR;
> - }
>
> res->read.va += payload;
> res->read.resid -= payload;
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 6:31 ` Zhu Yanjun
@ 2022-08-29 7:02 ` Leon Romanovsky
2022-08-29 7:13 ` Zhu Yanjun
0 siblings, 1 reply; 13+ messages in thread
From: Leon Romanovsky @ 2022-08-29 7:02 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: Daisuke Matsuda, Jason Gunthorpe, RDMA mailing list
On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> <matsuda-daisuke@fujitsu.com> wrote:
> >
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> > 1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > payload, RXE_FROM_MR_OBJ);
> > - if (err)
> > - pr_err("Failed copying memory\n");
>
> pr_err_once is better?
Like Jason said, there shouldn't any prints in network triggered flows.
Thanks
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 7:02 ` Leon Romanovsky
@ 2022-08-29 7:13 ` Zhu Yanjun
0 siblings, 0 replies; 13+ messages in thread
From: Zhu Yanjun @ 2022-08-29 7:13 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Daisuke Matsuda, Jason Gunthorpe, RDMA mailing list
On Mon, Aug 29, 2022 at 3:02 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> > On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> > <matsuda-daisuke@fujitsu.com> wrote:
> > >
> > > An incoming Read request causes multiple Read responses. If a user MR to
> > > copy data from is unavailable or responder cannot send a reply, then the
> > > error messages can be printed for each response attempt, resulting in
> > > message overflow.
> > >
> > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > > ---
> > > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> > > 1 file changed, 1 insertion(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> > >
> > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > > payload, RXE_FROM_MR_OBJ);
> > > - if (err)
> > > - pr_err("Failed copying memory\n");
> >
> > pr_err_once is better?
>
> Like Jason said, there shouldn't any prints in network triggered flows.
>
Ok, thanks.
Zhu Yanjun
> Thanks
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 5:44 ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
2022-08-29 5:51 ` Cheng Xu
2022-08-29 6:31 ` Zhu Yanjun
@ 2022-08-29 7:36 ` Li Zhijian
2022-08-29 10:21 ` matsuda-daisuke
2 siblings, 1 reply; 13+ messages in thread
From: Li Zhijian @ 2022-08-29 7:36 UTC (permalink / raw)
To: Daisuke Matsuda, jgg; +Cc: linux-rdma, zyjzyj2000
On 29/08/2022 13:44, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> payload, RXE_FROM_MR_OBJ);
> - if (err)
> - pr_err("Failed copying memory\n");
Not relate to this patch.
I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
IMO, when err happens, responder shall notify the request anyhow.
Thanks
Zhijian
> if (mr)
> rxe_put(mr);
>
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> }
>
> err = rxe_xmit_packet(qp, &ack_pkt, skb);
> - if (err) {
> - pr_err("Failed sending RDMA reply.\n");
> + if (err)
> return RESPST_ERR_RNR;
> - }
>
> res->read.va += payload;
> res->read.resid -= payload;
^ permalink raw reply [flat|nested] 13+ messages in thread* RE: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 7:36 ` Li Zhijian
@ 2022-08-29 10:21 ` matsuda-daisuke
2022-08-30 9:44 ` Li Zhijian
0 siblings, 1 reply; 13+ messages in thread
From: matsuda-daisuke @ 2022-08-29 10:21 UTC (permalink / raw)
To: lizhijian@fujitsu.com, jgg@nvidia.com
Cc: linux-rdma@vger.kernel.org, zyjzyj2000@gmail.com
On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
> On 29/08/2022 13:44, Daisuke Matsuda wrote:
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> > 1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > payload, RXE_FROM_MR_OBJ);
> > - if (err)
> > - pr_err("Failed copying memory\n");
> Not relate to this patch.
> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
> IMO, when err happens, responder shall notify the request anyhow.
Practically, I have never seen rxe_mr_copy() failed before,
but I agree the implementation may be incorrect as you mentioned.
As far as I tested, responder replied with the requested amount of payloads
even when rxe_mr_copy() is modified to fail. In this case,
requester may mistakenly believe that they get data correctly.
For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
Daisuke Matsuda
>
> Thanks
> Zhijian
>
> > if (mr)
> > rxe_put(mr);
> >
> > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> > }
> >
> > err = rxe_xmit_packet(qp, &ack_pkt, skb);
> > - if (err) {
> > - pr_err("Failed sending RDMA reply.\n");
> > + if (err)
> > return RESPST_ERR_RNR;
> > - }
> >
> > res->read.va += payload;
> > res->read.resid -= payload;
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-29 10:21 ` matsuda-daisuke
@ 2022-08-30 9:44 ` Li Zhijian
2022-08-30 9:54 ` Li Zhijian
0 siblings, 1 reply; 13+ messages in thread
From: Li Zhijian @ 2022-08-30 9:44 UTC (permalink / raw)
To: Matsuda, Daisuke/松田 大輔,
jgg@nvidia.com
Cc: linux-rdma@vger.kernel.org, zyjzyj2000@gmail.com
On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>> An incoming Read request causes multiple Read responses. If a user MR to
>>> copy data from is unavailable or responder cannot send a reply, then the
>>> error messages can be printed for each response attempt, resulting in
>>> message overflow.
>>>
>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>> ---
>>> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>
>>> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>> payload, RXE_FROM_MR_OBJ);
>>> - if (err)
>>> - pr_err("Failed copying memory\n");
>> Not relate to this patch.
>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>> IMO, when err happens, responder shall notify the request anyhow.
> Practically, I have never seen rxe_mr_copy() failed before,
> but I agree the implementation may be incorrect as you mentioned.
>
> As far as I tested, responder replied with the requested amount of payloads
> even when rxe_mr_copy() is modified to fail. In this case,
> requester may mistakenly believe that they get data correctly.
>
> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
by returning RESPST_ERR_RKEY_VIOLATION here.
see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"
>
> Daisuke Matsuda
>
>> Thanks
>> Zhijian
>>
>>> if (mr)
>>> rxe_put(mr);
>>>
>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>> }
>>>
>>> err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>> - if (err) {
>>> - pr_err("Failed sending RDMA reply.\n");
>>> + if (err)
>>> return RESPST_ERR_RNR;
>>> - }
>>>
>>> res->read.va += payload;
>>> res->read.resid -= payload;
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
2022-08-30 9:44 ` Li Zhijian
@ 2022-08-30 9:54 ` Li Zhijian
0 siblings, 0 replies; 13+ messages in thread
From: Li Zhijian @ 2022-08-30 9:54 UTC (permalink / raw)
To: Matsuda, Daisuke/松田 大輔,
jgg@nvidia.com
Cc: linux-rdma@vger.kernel.org, zyjzyj2000@gmail.com
On 30/08/2022 17:44, Li Zhijian wrote:
>
>
> On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
>> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>>> An incoming Read request causes multiple Read responses. If a user MR to
>>>> copy data from is unavailable or responder cannot send a reply, then the
>>>> error messages can be printed for each response attempt, resulting in
>>>> message overflow.
>>>>
>>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>>> ---
>>>> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>
>>>> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>> payload, RXE_FROM_MR_OBJ);
Looks i was missing the the faulting point inside rxe_mr_copy()
mr_check_range() is the only point to return an error inside rxe_mr_copy()
and mr_check_range() would never fail in this moment, since it is always tested by RESPST_CHK_RKEY
before calling read_reply()
so it's safe to remove this print, and add some comments ?
Thanks
Zhijian
>>>> - if (err)
>>>> - pr_err("Failed copying memory\n");
>>> Not relate to this patch.
>>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>>> IMO, when err happens, responder shall notify the request anyhow.
>> Practically, I have never seen rxe_mr_copy() failed before,
>> but I agree the implementation may be incorrect as you mentioned.
>>
>> As far as I tested, responder replied with the requested amount of payloads
>> even when rxe_mr_copy() is modified to fail. In this case,
>> requester may mistakenly believe that they get data correctly.
>>
>> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
>
> it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
> by returning RESPST_ERR_RKEY_VIOLATION here.
>
> see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"
>
>
>
>>
>> Daisuke Matsuda
>>
>>> Thanks
>>> Zhijian
>>>
>>>> if (mr)
>>>> rxe_put(mr);
>>>>
>>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>> }
>>>>
>>>> err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>>> - if (err) {
>>>> - pr_err("Failed sending RDMA reply.\n");
>>>> + if (err)
>>>> return RESPST_ERR_RNR;
>>>> - }
>>>>
>>>> res->read.va += payload;
>>>> res->read.resid -= payload;
>
^ permalink raw reply [flat|nested] 13+ messages in thread