* [PATCH net v1] net/smc: avoid data corruption caused by decline
@ 2023-11-08 9:48 D. Wythe
2023-11-08 13:00 ` Wenjia Zhang
` (2 more replies)
From: D. Wythe @ 2023-11-08 9:48 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
We found a data corruption issue during testing of SMC-R on Redis
applications.
The benchmark has a low probability of reporting a strange error as
shown below.
"Error: Protocol error, got "\xe2" as reply type byte"
Finally, we found that the retrieved error data was as follows:
0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
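
For reference, these bytes line up with the CLC definitions in
net/smc/smc_clc.h; a quick hand decode (field names per that header,
worth re-verifying against the actual struct layout):

0xE2 0xD4 0xC3 0xD9      eyecatcher, EBCDIC "SMCR" (SMC_EYECATCHER)
0x04                     hdr.type   = SMC_CLC_DECLINE
0x00 0x2C                hdr.length = 44
0x20                     version/flag byte
0xA6 0x56 ... 0xCB 0x04  sender peer id
0x02 0x01 0x00 0x00      peer_diagnosis = 0x02010000
                         (SMC_CLC_DECL_TIMEOUT_CL, "timeout w4 QP confirm link")
...                      remaining fields/padding; the trailing 0xE2 is
                         the first byte of the trailer eyecatcher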
It is quite obvious that this is an SMC DECLINE message, which means
that the application received an SMC protocol message.
We found that this was caused by the following situation:

client                                server
           proposal
       ------------->
           accept
       <-------------
           confirm
       ------------->
wait confirm

           failed llc confirm
               x------
(after 2s) timeout
                                      wait rsp

wait decline

(after 1s) timeout
                                      (after 2s) timeout
           decline
       -------------->
           decline
       <--------------

As a result, a decline message was sent by each side, and the server's
decline message was read from TCP by the connection that had already
fallen back.

This patch doubles the client timeout to 2x the server value. With this
simple change, the decline messages should never cross or collide
(during the Confirm Link timeout).

This issue requires an immediate fix; the protocol updates that would
solve it properly are a more long-term solution.

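For context, the two timeouts involved come from the SMC headers, which
at the time of this thread read roughly as follows (a sketch; re-check
the tree for the exact definitions):

/* net/smc/smc_llc.h */
#define SMC_LLC_WAIT_TIME	(2 * HZ)	/* how long to wait for resp */

/* net/smc/smc_clc.h */
#define CLC_WAIT_TIME_SHORT	HZ		/* short wait time on clcsock */

These correspond to the 2s and 1s timeouts in the diagram above.
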
Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/af_smc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index abd2667..5b91f55 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
 	int rc;

 	/* receive CONFIRM LINK request from server over RoCE fabric */
-	qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
+	qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
 			      SMC_LLC_CONFIRM_LINK);
 	if (!qentry) {
 		struct smc_clc_msg_decline dclc;
--
1.8.3.1
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-08 9:48 [PATCH net v1] net/smc: avoid data corruption caused by decline D. Wythe
@ 2023-11-08 13:00 ` Wenjia Zhang
[not found] ` <b3ce2dfe-ece9-919b-024d-051cd66609ed@linux.alibaba.com>
2023-11-08 14:58 ` Jakub Kicinski
2023-11-13 3:44 ` Dust Li
From: Wenjia Zhang @ 2023-11-08 13:00 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 08.11.23 10:48, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> We found a data corruption issue during testing of SMC-R on Redis
> applications.
>
> The benchmark has a low probability of reporting a strange error as
> shown below.
>
> "Error: Protocol error, got "\xe2" as reply type byte"
>
> Finally, we found that the retrieved error data was as follows:
>
> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>
> It is quite obvious that this is an SMC DECLINE message, which means
> that the application received an SMC protocol message.
> We found that this was caused by the following situation:
>
> client                                server
>            proposal
>        ------------->
>            accept
>        <-------------
>            confirm
>        ------------->
> wait confirm
>
>            failed llc confirm
>                x------
> (after 2s) timeout
>                                       wait rsp
>
> wait decline
>
> (after 1s) timeout
>                                       (after 2s) timeout
>            decline
>        -------------->
>            decline
>        <--------------
>
> As a result, a decline message was sent by each side, and the server's
> decline message was read from TCP by the connection that had already
> fallen back.
>
> This patch doubles the client timeout to 2x the server value. With this
> simple change, the decline messages should never cross or collide
> (during the Confirm Link timeout).
>
> This issue requires an immediate fix; the protocol updates that would
> solve it properly are a more long-term solution.
>
> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/af_smc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index abd2667..5b91f55 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
> int rc;
>
> /* receive CONFIRM LINK request from server over RoCE fabric */
> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
> SMC_LLC_CONFIRM_LINK);
> if (!qentry) {
> struct smc_clc_msg_decline dclc;
I'm wondering if the doubled timeout (if it is sufficient) could instead
be applied to waiting for the CLC_DECLINE on the client's side, i.e.:
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 35ddebae8894..9b1feef1013d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
 		struct smc_clc_msg_decline dclc;

 		rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
-				      SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT);
+				      SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT);
 		return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
 	}
 	smc_llc_save_peer_uid(qentry);
Because the purpose is to let the server keep control of the decline.
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-08 9:48 [PATCH net v1] net/smc: avoid data corruption caused by decline D. Wythe
2023-11-08 13:00 ` Wenjia Zhang
@ 2023-11-08 14:58 ` Jakub Kicinski
2023-11-13 3:44 ` Dust Li
From: Jakub Kicinski @ 2023-11-08 14:58 UTC (permalink / raw)
To: D. Wythe
Cc: kgraul, wenjia, jaka, wintera, davem, netdev, linux-s390,
linux-rdma
On Wed, 8 Nov 2023 17:48:29 +0800 D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> We found a data corruption issue during testing of SMC-R on Redis
> applications.
Please make sure you CC all relevant people pointed out
by get_maintainers. Make sure you run get_maintainers
on the generated patch, not just on file paths.
You seem to have missed the following people:
pabeni@redhat.com
guwen@linux.alibaba.com
tonylu@linux.alibaba.com
edumazet@google.com
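
For reference, the check is usually run against the generated patch
file itself, e.g. (patch filename illustrative):

$ ./scripts/get_maintainer.pl 0001-net-smc-avoid-data-corruption-caused-by-decline.patch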
--
pw-bot: cr
pv-bot: cc
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
[not found] ` <b3ce2dfe-ece9-919b-024d-051cd66609ed@linux.alibaba.com>
@ 2023-11-13 2:50 ` D. Wythe
2023-11-13 10:57 ` Wenjia Zhang
From: D. Wythe @ 2023-11-13 2:50 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11/10/23 10:51 AM, D. Wythe wrote:
>
>
> On 11/8/23 9:00 PM, Wenjia Zhang wrote:
>>
>>
>> On 08.11.23 10:48, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> We found a data corruption issue during testing of SMC-R on Redis
>>> applications.
>>>
>>> The benchmark has a low probability of reporting a strange error as
>>> shown below.
>>>
>>> "Error: Protocol error, got "\xe2" as reply type byte"
>>>
>>> Finally, we found that the retrieved error data was as follows:
>>>
>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>>
>>> It is quite obvious that this is an SMC DECLINE message, which means
>>> that the application received an SMC protocol message.
>>> We found that this was caused by the following situation:
>>>
>>> client                                server
>>>            proposal
>>>        ------------->
>>>            accept
>>>        <-------------
>>>            confirm
>>>        ------------->
>>> wait confirm
>>>
>>>            failed llc confirm
>>>                x------
>>> (after 2s) timeout
>>>                                       wait rsp
>>>
>>> wait decline
>>>
>>> (after 1s) timeout
>>>                                       (after 2s) timeout
>>>            decline
>>>        -------------->
>>>            decline
>>>        <--------------
>>>
>>> As a result, a decline message was sent by each side, and the server's
>>> decline message was read from TCP by the connection that had already
>>> fallen back.
>>>
>>> This patch doubles the client timeout to 2x the server value. With this
>>> simple change, the decline messages should never cross or collide
>>> (during the Confirm Link timeout).
>>>
>>> This issue requires an immediate fix; the protocol updates that would
>>> solve it properly are a more long-term solution.
>>>
>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC
>>> flow")
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>> net/smc/af_smc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index abd2667..5b91f55 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct
>>> smc_sock *smc)
>>> int rc;
>>> /* receive CONFIRM LINK request from server over RoCE fabric */
>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>>> SMC_LLC_CONFIRM_LINK);
>>> if (!qentry) {
>>> struct smc_clc_msg_decline dclc;
>> I'm wondering if the doubled timeout (if it is sufficient) could
>> instead be applied to waiting for the CLC_DECLINE on the client's
>> side, i.e.:
>>
>
> It depends. We can indeed introduce a sysctl to allow the server to
> manage its Confirm Link timeout, but if protocol updates land, that
> knob will no longer be necessary, and we would still have to keep
> maintaining it.
>
> I believe the core of the solution is to ensure that decline messages
> never cross or collide. Increasing the client's timeout to twice the
> server's can solve this problem for now.
> If Jerry's proposed protocol updates turn out to be too complex, or if
> there won't be any future protocol updates, it's still not too late to
> let the server manage its Confirm Link timeout then.
>
> Best wishes,
> D. Wythe
>
FYI:
It seems that my earlier email was not delivered successfully for some
reason. Sorry for that.
D. Wythe
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index 35ddebae8894..9b1feef1013d 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
>>  		struct smc_clc_msg_decline dclc;
>>
>>  		rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
>> -				      SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT);
>> +				      SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT);
>>  		return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
>>  	}
>>  	smc_llc_save_peer_uid(qentry);
>>
>> Because the purpose is to let the server keep control of the decline.
>
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-08 9:48 [PATCH net v1] net/smc: avoid data corruption caused by decline D. Wythe
2023-11-08 13:00 ` Wenjia Zhang
2023-11-08 14:58 ` Jakub Kicinski
@ 2023-11-13 3:44 ` Dust Li
2023-11-15 14:06 ` Wenjia Zhang
From: Dust Li @ 2023-11-13 3:44 UTC (permalink / raw)
To: D. Wythe, kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@linux.alibaba.com>
>
>We found a data corruption issue during testing of SMC-R on Redis
>applications.
>
>The benchmark has a low probability of reporting a strange error as
>shown below.
>
>"Error: Protocol error, got "\xe2" as reply type byte"
>
>Finally, we found that the retrieved error data was as follows:
>
>0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>
>It is quite obvious that this is an SMC DECLINE message, which means
>that the application received an SMC protocol message.
>We found that this was caused by the following situation:
>
>client                                server
>           proposal
>       ------------->
>           accept
>       <-------------
>           confirm
>       ------------->
>wait confirm
>
>           failed llc confirm
>               x------
>(after 2s) timeout
>                                      wait rsp
>
>wait decline
>
>(after 1s) timeout
>                                      (after 2s) timeout
>           decline
>       -------------->
>           decline
>       <--------------
>
>As a result, a decline message was sent by each side, and the server's
>decline message was read from TCP by the connection that had already
>fallen back.
>
>This patch doubles the client timeout to 2x the server value. With this
>simple change, the decline messages should never cross or collide
>(during the Confirm Link timeout).
>
>This issue requires an immediate fix; the protocol updates that would
>solve it properly are a more long-term solution.
>
>Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
>Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>---
> net/smc/af_smc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>index abd2667..5b91f55 100644
>--- a/net/smc/af_smc.c
>+++ b/net/smc/af_smc.c
>@@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
> int rc;
>
> /* receive CONFIRM LINK request from server over RoCE fabric */
>- qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>+ qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
> SMC_LLC_CONFIRM_LINK);
It may be difficult for people to understand why the LLC_WAIT_TIME used
here is different from elsewhere, especially without any comment
explaining its purpose. People would have to use git to find the reason,
which I believe is not conducive to easy maintenance.
Best regards,
Dust
> if (!qentry) {
> struct smc_clc_msg_decline dclc;
>--
>1.8.3.1
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-13 2:50 ` D. Wythe
@ 2023-11-13 10:57 ` Wenjia Zhang
2023-11-14 9:52 ` D. Wythe
From: Wenjia Zhang @ 2023-11-13 10:57 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 13.11.23 03:50, D. Wythe wrote:
>
>
> On 11/10/23 10:51 AM, D. Wythe wrote:
>>
>>
>> On 11/8/23 9:00 PM, Wenjia Zhang wrote:
>>>
>>>
>>> On 08.11.23 10:48, D. Wythe wrote:
>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>
>>>> We found a data corruption issue during testing of SMC-R on Redis
>>>> applications.
>>>>
>>>> The benchmark has a low probability of reporting a strange error as
>>>> shown below.
>>>>
>>>> "Error: Protocol error, got "\xe2" as reply type byte"
>>>>
>>>> Finally, we found that the retrieved error data was as follows:
>>>>
>>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>>>
>>>> It is quite obvious that this is an SMC DECLINE message, which means
>>>> that the application received an SMC protocol message.
>>>> We found that this was caused by the following situation:
>>>>
>>>> client                                server
>>>>            proposal
>>>>        ------------->
>>>>            accept
>>>>        <-------------
>>>>            confirm
>>>>        ------------->
>>>> wait confirm
>>>>
>>>>            failed llc confirm
>>>>                x------
>>>> (after 2s) timeout
>>>>                                       wait rsp
>>>>
>>>> wait decline
>>>>
>>>> (after 1s) timeout
>>>>                                       (after 2s) timeout
>>>>            decline
>>>>        -------------->
>>>>            decline
>>>>        <--------------
>>>>
>>>> As a result, a decline message was sent by each side, and the
>>>> server's decline message was read from TCP by the connection that
>>>> had already fallen back.
>>>>
>>>> This patch doubles the client timeout to 2x the server value. With
>>>> this simple change, the decline messages should never cross or
>>>> collide (during the Confirm Link timeout).
>>>>
>>>> This issue requires an immediate fix; the protocol updates that
>>>> would solve it properly are a more long-term solution.
>>>>
>>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC
>>>> flow")
>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>> ---
>>>> net/smc/af_smc.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>> index abd2667..5b91f55 100644
>>>> --- a/net/smc/af_smc.c
>>>> +++ b/net/smc/af_smc.c
>>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct
>>>> smc_sock *smc)
>>>> int rc;
>>>> /* receive CONFIRM LINK request from server over RoCE fabric */
>>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>>>> SMC_LLC_CONFIRM_LINK);
>>>> if (!qentry) {
>>>> struct smc_clc_msg_decline dclc;
>>> I'm wondering if the doubled timeout (if it is sufficient) could
>>> instead be applied to waiting for the CLC_DECLINE on the client's
>>> side, i.e.:
>>>
>>
>> It depends. We can indeed introduce a sysctl to allow the server to
>> manage its Confirm Link timeout, but if protocol updates land, that
>> knob will no longer be necessary, and we would still have to keep
>> maintaining it.
>>
No, I don't think either that we need a sysctl for that.
>> I believe the core of the solution is to ensure that decline messages
>> never cross or collide. Increasing the client's timeout to twice the
>> server's can solve this problem for now.
I have no objection to that, but my question is: why don't you increase
the timeout for waiting for the CLC_DECLINE instead of the one for the
LLC Confirm Link? Shouldn't they have the same effect?
>> If Jerry's proposed protocol updates turn out to be too complex, or
>> if there won't be any future protocol updates, it's still not too
>> late to let the server manage its Confirm Link timeout then.
>>
>> Best wishes,
>> D. Wythe
>>
>
> FYI:
>
> It seems that my earlier email was not delivered successfully for some
> reason. Sorry for that.
>
> D. Wythe
>
>
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index 35ddebae8894..9b1feef1013d 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
>>>  		struct smc_clc_msg_decline dclc;
>>>
>>>  		rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
>>> -				      SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT);
>>> +				      SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT);
>>>  		return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
>>>  	}
>>>  	smc_llc_save_peer_uid(qentry);
>>>
>>> Because the purpose is to let the server keep control of the decline.
>>
>
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-13 10:57 ` Wenjia Zhang
@ 2023-11-14 9:52 ` D. Wythe
2023-11-15 13:52 ` Wenjia Zhang
From: D. Wythe @ 2023-11-14 9:52 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11/13/23 6:57 PM, Wenjia Zhang wrote:
>
>
> On 13.11.23 03:50, D. Wythe wrote:
>>
>>
>> On 11/10/23 10:51 AM, D. Wythe wrote:
>>>
>>>
>>> On 11/8/23 9:00 PM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 08.11.23 10:48, D. Wythe wrote:
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> We found a data corruption issue during testing of SMC-R on Redis
>>>>> applications.
>>>>>
>>>>> The benchmark has a low probability of reporting a strange error as
>>>>> shown below.
>>>>>
>>>>> "Error: Protocol error, got "\xe2" as reply type byte"
>>>>>
>>>>> Finally, we found that the retrieved error data was as follows:
>>>>>
>>>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>>>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>>>>
>>>>> It is quite obvious that this is an SMC DECLINE message, which
>>>>> means that the application received an SMC protocol message.
>>>>> We found that this was caused by the following situation:
>>>>>
>>>>> client                                server
>>>>>            proposal
>>>>>        ------------->
>>>>>            accept
>>>>>        <-------------
>>>>>            confirm
>>>>>        ------------->
>>>>> wait confirm
>>>>>
>>>>>            failed llc confirm
>>>>>                x------
>>>>> (after 2s) timeout
>>>>>                                       wait rsp
>>>>>
>>>>> wait decline
>>>>>
>>>>> (after 1s) timeout
>>>>>                                       (after 2s) timeout
>>>>>            decline
>>>>>        -------------->
>>>>>            decline
>>>>>        <--------------
>>>>>
>>>>> As a result, a decline message was sent by each side, and the
>>>>> server's decline message was read from TCP by the connection that
>>>>> had already fallen back.
>>>>>
>>>>> This patch doubles the client timeout to 2x the server value. With
>>>>> this simple change, the decline messages should never cross or
>>>>> collide (during the Confirm Link timeout).
>>>>>
>>>>> This issue requires an immediate fix; the protocol updates that
>>>>> would solve it properly are a more long-term solution.
>>>>>
>>>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the
>>>>> LLC flow")
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> ---
>>>>> net/smc/af_smc.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>> index abd2667..5b91f55 100644
>>>>> --- a/net/smc/af_smc.c
>>>>> +++ b/net/smc/af_smc.c
>>>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct
>>>>> smc_sock *smc)
>>>>> int rc;
>>>>> /* receive CONFIRM LINK request from server over RoCE
>>>>> fabric */
>>>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>>>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>>>>> SMC_LLC_CONFIRM_LINK);
>>>>> if (!qentry) {
>>>>> struct smc_clc_msg_decline dclc;
>>>> I'm wondering if the doubled timeout (if it is sufficient) could
>>>> instead be applied to waiting for the CLC_DECLINE on the client's
>>>> side, i.e.:
>>>>
>>>
>>> It depends. We can indeed introduce a sysctl to allow the server to
>>> manage its Confirm Link timeout, but if protocol updates land, that
>>> knob will no longer be necessary, and we would still have to keep
>>> maintaining it.
>>>
> No, I don't think either that we need a sysctl for that.
I am okay with that.
>>> I believe the core of the solution is to ensure that decline
>>> messages never cross or collide. Increasing the client's timeout to
>>> twice the server's can solve this problem for now.
>
> I have no objection to that, but my question is: why don't you
> increase the timeout for waiting for the CLC_DECLINE instead of the
> one for the LLC Confirm Link? Shouldn't they have the same effect?
>
Logically speaking, they do have the same effect, but there are two
reasons why I chose to increase the LLC timeout here:

1. To keep the DECLINE messages from crossing or colliding, we need a
bigger time gap. Some simple math (in seconds; see also the timeline
sketch after this list):

   2 (LLC Confirm Link)     + 1 (CLC DECLINE)     = 3  (current)
   2 (LLC Confirm Link)     + 1 * 2 (CLC DECLINE) = 4
   2 * 2 (LLC Confirm Link) + 1 (CLC DECLINE)     = 5  (this patch)

Obviously, doubling the LLC Confirm Link timeout results in the bigger
time gap.

2. Increasing the LLC timeout allows as many RDMA links as possible to
succeed, rather than falling back.
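
To make the gap concrete, a rough timeline (t = 0 when the client sends
the CLC CONFIRM; assuming SMC_LLC_WAIT_TIME = 2s and
CLC_WAIT_TIME_SHORT = 1s, matching the 2s/1s timeouts in the diagram):

                          before this patch    with this patch
server sends DECLINE at   t = 2s               t = 2s
client declines at        t = 2 + 1 = 3s       t = 2*2 + 1 = 5s
margin between the two    1s                   3s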
D. Wythe
>>> If Jerry's proposed protocol updates turn out to be too complex, or
>>> if there won't be any future protocol updates, it's still not too
>>> late to let the server manage its Confirm Link timeout then.
>>>
>>> Best wishes,
>>> D. Wythe
>>>
>>
>> FYI:
>>
>> It seems that my earlier email was not delivered successfully for
>> some reason. Sorry for that.
>>
>> D. Wythe
>>
>>
>
>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>> index 35ddebae8894..9b1feef1013d 100644
>>>> --- a/net/smc/af_smc.c
>>>> +++ b/net/smc/af_smc.c
>>>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
>>>>  		struct smc_clc_msg_decline dclc;
>>>>
>>>>  		rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
>>>> -				      SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT);
>>>> +				      SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT);
>>>>  		return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
>>>>  	}
>>>>  	smc_llc_save_peer_uid(qentry);
>>>>
>>>> Because the purpose is to let the server keep control of the decline.
>>>
>>
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-14 9:52 ` D. Wythe
@ 2023-11-15 13:52 ` Wenjia Zhang
From: Wenjia Zhang @ 2023-11-15 13:52 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 14.11.23 10:52, D. Wythe wrote:
>
>
> On 11/13/23 6:57 PM, Wenjia Zhang wrote:
>>
>>
>> On 13.11.23 03:50, D. Wythe wrote:
>>>
>>>
>>> On 11/10/23 10:51 AM, D. Wythe wrote:
>>>>
>>>>
>>>> On 11/8/23 9:00 PM, Wenjia Zhang wrote:
>>>>>
>>>>>
>>>>> On 08.11.23 10:48, D. Wythe wrote:
>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>
>>>>>> We found a data corruption issue during testing of SMC-R on Redis
>>>>>> applications.
>>>>>>
>>>>>> The benchmark has a low probability of reporting a strange error as
>>>>>> shown below.
>>>>>>
>>>>>> "Error: Protocol error, got "\xe2" as reply type byte"
>>>>>>
>>>>>> Finally, we found that the retrieved error data was as follows:
>>>>>>
>>>>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>>>>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>>>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>>>>>
>>>>>> It is quite obvious that this is an SMC DECLINE message, which
>>>>>> means that the application received an SMC protocol message.
>>>>>> We found that this was caused by the following situation:
>>>>>>
>>>>>> client                                server
>>>>>>            proposal
>>>>>>        ------------->
>>>>>>            accept
>>>>>>        <-------------
>>>>>>            confirm
>>>>>>        ------------->
>>>>>> wait confirm
>>>>>>
>>>>>>            failed llc confirm
>>>>>>                x------
>>>>>> (after 2s) timeout
>>>>>>                                       wait rsp
>>>>>>
>>>>>> wait decline
>>>>>>
>>>>>> (after 1s) timeout
>>>>>>                                       (after 2s) timeout
>>>>>>            decline
>>>>>>        -------------->
>>>>>>            decline
>>>>>>        <--------------
>>>>>>
>>>>>> As a result, a decline message was sent by each side, and the
>>>>>> server's decline message was read from TCP by the connection that
>>>>>> had already fallen back.
>>>>>>
>>>>>> This patch doubles the client timeout to 2x the server value.
>>>>>> With this simple change, the decline messages should never cross
>>>>>> or collide (during the Confirm Link timeout).
>>>>>>
>>>>>> This issue requires an immediate fix; the protocol updates that
>>>>>> would solve it properly are a more long-term solution.
>>>>>>
>>>>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the
>>>>>> LLC flow")
>>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>>> ---
>>>>>> net/smc/af_smc.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>>> index abd2667..5b91f55 100644
>>>>>> --- a/net/smc/af_smc.c
>>>>>> +++ b/net/smc/af_smc.c
>>>>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct
>>>>>> smc_sock *smc)
>>>>>> int rc;
>>>>>> /* receive CONFIRM LINK request from server over RoCE
>>>>>> fabric */
>>>>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>>>>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>>>>>> SMC_LLC_CONFIRM_LINK);
>>>>>> if (!qentry) {
>>>>>> struct smc_clc_msg_decline dclc;
>>>>> I'm wondering if the doubled timeout (if it is sufficient) could
>>>>> instead be applied to waiting for the CLC_DECLINE on the client's
>>>>> side, i.e.:
>>>>>
>>>>
>>>> It depends. We can indeed introduce a sysctl to allow the server to
>>>> manage its Confirm Link timeout, but if protocol updates land, that
>>>> knob will no longer be necessary, and we would still have to keep
>>>> maintaining it.
>>>>
>> No, I don't think either that we need a sysctl for that.
>
> I am okay with that.
>
>>>> I believe the core of the solution is to ensure that decline
>>>> messages never cross or collide. Increasing the client's timeout to
>>>> twice the server's can solve this problem for now.
>>
>> I have no objection to that, but my question is: why don't you
>> increase the timeout for waiting for the CLC_DECLINE instead of the
>> one for the LLC Confirm Link? Shouldn't they have the same effect?
>>
>
> Logically speaking, they do have the same effect, but there are two
> reasons why I chose to increase the LLC timeout here:
>
> 1. To keep the DECLINE messages from crossing or colliding, we need a
> bigger time gap. Some simple math (in seconds):
>
>    2 (LLC Confirm Link)     + 1 (CLC DECLINE)     = 3  (current)
>    2 (LLC Confirm Link)     + 1 * 2 (CLC DECLINE) = 4
>    2 * 2 (LLC Confirm Link) + 1 (CLC DECLINE)     = 5  (this patch)
>
> Obviously, doubling the LLC Confirm Link timeout results in the bigger
> time gap.
>
That's already clear to me. That's why I stressed "(if sufficient)".
> 2. Increasing the LLC timeout allows as many RDMA links as possible to
> succeed, rather than falling back.
>
OK, that sounds reasonable, and I think that's the answer that persuaded
me. Thank you!
> D. Wythe
>
>>>> If Jerry's proposed protocol updates turn out to be too complex, or
>>>> if there won't be any future protocol updates, it's still not too
>>>> late to let the server manage its Confirm Link timeout then.
>>>>
>>>> Best wishes,
>>>> D. Wythe
>>>>
>>>
>>> FYI:
>>>
>>> It seems that my earlier email was not delivered successfully for
>>> some reason. Sorry for that.
>>>
>>> D. Wythe
>>>
>>>
>>
>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>> index 35ddebae8894..9b1feef1013d 100644
>>>>> --- a/net/smc/af_smc.c
>>>>> +++ b/net/smc/af_smc.c
>>>>> @@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
>>>>>  		struct smc_clc_msg_decline dclc;
>>>>>
>>>>>  		rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
>>>>> -				      SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT);
>>>>> +				      SMC_CLC_DECLINE, 2 * CLC_WAIT_TIME_SHORT);
>>>>>  		return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
>>>>>  	}
>>>>>  	smc_llc_save_peer_uid(qentry);
>>>>>
>>>>> Because the purpose is to let the server keep control of the decline.
>>>>
>>>
>
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-13 3:44 ` Dust Li
@ 2023-11-15 14:06 ` Wenjia Zhang
2023-11-16 11:50 ` D. Wythe
From: Wenjia Zhang @ 2023-11-15 14:06 UTC (permalink / raw)
To: dust.li, D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 13.11.23 04:44, Dust Li wrote:
> On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> We found a data corruption issue during testing of SMC-R on Redis
>> applications.
>>
>> The benchmark has a low probability of reporting a strange error as
>> shown below.
>>
>> "Error: Protocol error, got "\xe2" as reply type byte"
>>
>> Finally, we found that the retrieved error data was as follows:
>>
>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>
>> It is quite obvious that this is an SMC DECLINE message, which means
>> that the application received an SMC protocol message.
>> We found that this was caused by the following situation:
>>
>> client                                server
>>            proposal
>>        ------------->
>>            accept
>>        <-------------
>>            confirm
>>        ------------->
>> wait confirm
>>
>>            failed llc confirm
>>                x------
>> (after 2s) timeout
>>                                       wait rsp
>>
>> wait decline
>>
>> (after 1s) timeout
>>                                       (after 2s) timeout
>>            decline
>>        -------------->
>>            decline
>>        <--------------
>>
>> As a result, a decline message was sent by each side, and the server's
>> decline message was read from TCP by the connection that had already
>> fallen back.
>>
>> This patch doubles the client timeout to 2x the server value. With
>> this simple change, the decline messages should never cross or collide
>> (during the Confirm Link timeout).
>>
>> This issue requires an immediate fix; the protocol updates that would
>> solve it properly are a more long-term solution.
>>
>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> net/smc/af_smc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index abd2667..5b91f55 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc)
>> int rc;
>>
>> /* receive CONFIRM LINK request from server over RoCE fabric */
>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>> SMC_LLC_CONFIRM_LINK);
>
> It may be difficult for people to understand why the LLC_WAIT_TIME
> used here is different from elsewhere, especially without any comment
> explaining its purpose. People would have to use git to find the
> reason, which I believe is not conducive to easy maintenance.
>
> Best regards,
> Dust
>
>
Good point! @D.Wythe, could you please try to add a simple comment to
explain it?
Thanks,
Wenjia
>
>> if (!qentry) {
>> struct smc_clc_msg_decline dclc;
>> --
>> 1.8.3.1
* Re: [PATCH net v1] net/smc: avoid data corruption caused by decline
2023-11-15 14:06 ` Wenjia Zhang
@ 2023-11-16 11:50 ` D. Wythe
From: D. Wythe @ 2023-11-16 11:50 UTC (permalink / raw)
To: Wenjia Zhang, dust.li, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11/15/23 10:06 PM, Wenjia Zhang wrote:
>
>
> On 13.11.23 04:44, Dust Li wrote:
>> On Wed, Nov 08, 2023 at 05:48:29PM +0800, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> We found a data corruption issue during testing of SMC-R on Redis
>>> applications.
>>>
>>> The benchmark has a low probability of reporting a strange error as
>>> shown below.
>>>
>>> "Error: Protocol error, got "\xe2" as reply type byte"
>>>
>>> Finally, we found that the retrieved error data was as follows:
>>>
>>> 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
>>> 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
>>>
>>> It is quite obvious that this is an SMC DECLINE message, which means
>>> that the application received an SMC protocol message.
>>> We found that this was caused by the following situation:
>>>
>>> client                                server
>>>            proposal
>>>        ------------->
>>>            accept
>>>        <-------------
>>>            confirm
>>>        ------------->
>>> wait confirm
>>>
>>>            failed llc confirm
>>>                x------
>>> (after 2s) timeout
>>>                                       wait rsp
>>>
>>> wait decline
>>>
>>> (after 1s) timeout
>>>                                       (after 2s) timeout
>>>            decline
>>>        -------------->
>>>            decline
>>>        <--------------
>>>
>>> As a result, a decline message was sent by each side, and the server's
>>> decline message was read from TCP by the connection that had already
>>> fallen back.
>>>
>>> This patch doubles the client timeout to 2x the server value. With
>>> this simple change, the decline messages should never cross or collide
>>> (during the Confirm Link timeout).
>>>
>>> This issue requires an immediate fix; the protocol updates that would
>>> solve it properly are a more long-term solution.
>>>
>>> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC
>>> flow")
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>> net/smc/af_smc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index abd2667..5b91f55 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct
>>> smc_sock *smc)
>>> int rc;
>>>
>>> /* receive CONFIRM LINK request from server over RoCE fabric */
>>> - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
>>> + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
>>> SMC_LLC_CONFIRM_LINK);
>>
>> It may be difficult for people to understand why the LLC_WAIT_TIME
>> used here is different from elsewhere, especially without any comment
>> explaining its purpose. People would have to use git to find the
>> reason, which I believe is not conducive to easy maintenance.
>>
>> Best regards,
>> Dust
>>
>>
> Good point! @D.Wythe, could you please try to add a simple comment to
> explain it?
>
Also fine by me; I will add a comment to explain it.
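For instance, something along these lines (wording is only a sketch):

	/* receive CONFIRM LINK request from server over RoCE fabric;
	 * wait 2x the server-side SMC_LLC_WAIT_TIME so that the client
	 * keeps waiting for the server's CLC DECLINE long enough, and
	 * the two DECLINE messages cannot cross on the (possibly
	 * already fallen-back) TCP connection.
	 */
	qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
			      SMC_LLC_CONFIRM_LINK);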
D. Wythe
> Thanks,
> Wenjia
>>
>>> if (!qentry) {
>>> struct smc_clc_msg_decline dclc;
>>> --
>>> 1.8.3.1