netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] net/smc: fix application data exception
@ 2023-02-16  6:39 D. Wythe
  2023-02-16 13:49 ` Wenjia Zhang
  2023-02-28 18:37 ` Heiko Carstens
  0 siblings, 2 replies; 5+ messages in thread
From: D. Wythe @ 2023-02-16  6:39 UTC (permalink / raw)
  To: kgraul, wenjia, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma

From: "D. Wythe" <alibuda@linux.alibaba.com>

There is a certain probability that following
exceptions will occur in the wrk benchmark test:

Running 10s test @ http://11.213.45.6:80
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.72ms   13.94ms 245.33ms   94.17%
    Req/Sec     1.96k   713.67     5.41k    75.16%
  155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3

We will find that the error is HTTP 400 error, which is a serious
exception in our test, which means the application data was
corrupted.

Consider the following scenarios:

CPU0                            CPU1

buf_desc->used = 0;
                                cmpxchg(buf_desc->used, 0, 1)
                                deal_with(buf_desc)

memset(buf_desc->cpu_addr,0);

This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.

This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.

Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
v2: rebase it with latest net tree.

 net/smc/smc_core.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index c305d8d..c19d4b7 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 
 		smc_buf_free(lgr, is_rmb, buf_desc);
 	} else {
-		buf_desc->used = 0;
-		memset(buf_desc->cpu_addr, 0, buf_desc->len);
+		/* memzero_explicit provides potential memory barrier semantics */
+		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
+		WRITE_ONCE(buf_desc->used, 0);
 	}
 }
 
@@ -1132,19 +1133,17 @@ static void smc_buf_unuse(struct smc_connection *conn,
 		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
 			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
 		} else {
-			conn->sndbuf_desc->used = 0;
-			memset(conn->sndbuf_desc->cpu_addr, 0,
-			       conn->sndbuf_desc->len);
+			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
+			WRITE_ONCE(conn->sndbuf_desc->used, 0);
 		}
 	}
 	if (conn->rmb_desc) {
 		if (!lgr->is_smcd) {
 			smcr_buf_unuse(conn->rmb_desc, true, lgr);
 		} else {
-			conn->rmb_desc->used = 0;
-			memset(conn->rmb_desc->cpu_addr, 0,
-			       conn->rmb_desc->len +
-			       sizeof(struct smcd_cdc_msg));
+			memzero_explicit(conn->rmb_desc->cpu_addr,
+					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
+			WRITE_ONCE(conn->rmb_desc->used, 0);
 		}
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] net/smc: fix application data exception
  2023-02-16  6:39 [PATCH net v2] net/smc: fix application data exception D. Wythe
@ 2023-02-16 13:49 ` Wenjia Zhang
  2023-02-28 18:37 ` Heiko Carstens
  1 sibling, 0 replies; 5+ messages in thread
From: Wenjia Zhang @ 2023-02-16 13:49 UTC (permalink / raw)
  To: D. Wythe, kgraul, jaka; +Cc: kuba, davem, netdev, linux-s390, linux-rdma



On 16.02.23 07:39, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> There is a certain probability that following
> exceptions will occur in the wrk benchmark test:
> 
> Running 10s test @ http://11.213.45.6:80
>    8 threads and 64 connections
>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>      Latency     3.72ms   13.94ms 245.33ms   94.17%
>      Req/Sec     1.96k   713.67     5.41k    75.16%
>    155262 requests in 10.10s, 23.10MB read
> Non-2xx or 3xx responses: 3
> 
> We will find that the error is HTTP 400 error, which is a serious
> exception in our test, which means the application data was
> corrupted.
> 
> Consider the following scenarios:
> 
> CPU0                            CPU1
> 
> buf_desc->used = 0;
>                                  cmpxchg(buf_desc->used, 0, 1)
>                                  deal_with(buf_desc)
> 
> memset(buf_desc->cpu_addr,0);
> 
> This will cause the data received by a victim connection to be cleared,
> thus triggering an HTTP 400 error in the server.
> 
> This patch exchange the order between clear used and memset, add
> barrier to ensure memory consistency.
> 
> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> v2: rebase it with latest net tree.
> 

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

>   net/smc/smc_core.c | 17 ++++++++---------
>   1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index c305d8d..c19d4b7 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>   
>   		smc_buf_free(lgr, is_rmb, buf_desc);
>   	} else {
> -		buf_desc->used = 0;
> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
> +		/* memzero_explicit provides potential memory barrier semantics */
> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
> +		WRITE_ONCE(buf_desc->used, 0);
>   	}
>   }
>   
> @@ -1132,19 +1133,17 @@ static void smc_buf_unuse(struct smc_connection *conn,
>   		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
>   			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
>   		} else {
> -			conn->sndbuf_desc->used = 0;
> -			memset(conn->sndbuf_desc->cpu_addr, 0,
> -			       conn->sndbuf_desc->len);
> +			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
> +			WRITE_ONCE(conn->sndbuf_desc->used, 0);
>   		}
>   	}
>   	if (conn->rmb_desc) {
>   		if (!lgr->is_smcd) {
>   			smcr_buf_unuse(conn->rmb_desc, true, lgr);
>   		} else {
> -			conn->rmb_desc->used = 0;
> -			memset(conn->rmb_desc->cpu_addr, 0,
> -			       conn->rmb_desc->len +
> -			       sizeof(struct smcd_cdc_msg));
> +			memzero_explicit(conn->rmb_desc->cpu_addr,
> +					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
> +			WRITE_ONCE(conn->rmb_desc->used, 0);
>   		}
>   	}
>   }



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] net/smc: fix application data exception
  2023-02-16  6:39 [PATCH net v2] net/smc: fix application data exception D. Wythe
  2023-02-16 13:49 ` Wenjia Zhang
@ 2023-02-28 18:37 ` Heiko Carstens
  2023-03-01 13:41   ` D. Wythe
  2023-03-08 14:11   ` D. Wythe
  1 sibling, 2 replies; 5+ messages in thread
From: Heiko Carstens @ 2023-02-28 18:37 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, wenjia, jaka, kuba, davem, netdev, linux-s390, linux-rdma,
	David Howells, Paul E. McKenney, Will Deacon, Peter Zijlstra

On Thu, Feb 16, 2023 at 02:39:05PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> There is a certain probability that following
> exceptions will occur in the wrk benchmark test:
> 
> Running 10s test @ http://11.213.45.6:80
>   8 threads and 64 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     3.72ms   13.94ms 245.33ms   94.17%
>     Req/Sec     1.96k   713.67     5.41k    75.16%
>   155262 requests in 10.10s, 23.10MB read
> Non-2xx or 3xx responses: 3
> 
> We will find that the error is HTTP 400 error, which is a serious
> exception in our test, which means the application data was
> corrupted.
> 
> Consider the following scenarios:
> 
> CPU0                            CPU1
> 
> buf_desc->used = 0;
>                                 cmpxchg(buf_desc->used, 0, 1)
>                                 deal_with(buf_desc)
> 
> memset(buf_desc->cpu_addr,0);
> 
> This will cause the data received by a victim connection to be cleared,
> thus triggering an HTTP 400 error in the server.
> 
> This patch exchange the order between clear used and memset, add
> barrier to ensure memory consistency.
> 
> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> v2: rebase it with latest net tree.
> 
>  net/smc/smc_core.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index c305d8d..c19d4b7 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>  
>  		smc_buf_free(lgr, is_rmb, buf_desc);
>  	} else {
> -		buf_desc->used = 0;
> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
> +		/* memzero_explicit provides potential memory barrier semantics */
> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
> +		WRITE_ONCE(buf_desc->used, 0);

This looks odd to me. memzero_explicit() is only sort of a compiler
barrier, since it is a function call, but not a real memory barrier.

You may want to check Documentation/memory-barriers.txt and
Documentation/atomic_t.txt.

To me the proper solution looks like buf_desc->used should be converted to
an atomic_t, and then you could do:

	memset(buf_desc->cpu_addr, 0, buf_desc->len);
	smp_mb__before_atomic();
	atomic_set(&buf_desc->used, 0);

and in a similar way use atomic_cmpxchg() instead of the now used cmpxchg()
for the part that sets buf_desc->used to 1.

Adding experts to cc, since s390 has such strong memory ordering semantics
that you can basically do whatever you want without breaking anything. So I
don't consider myself an expert here at all. :)

But given that this is common code, let's make sure this is really correct.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] net/smc: fix application data exception
  2023-02-28 18:37 ` Heiko Carstens
@ 2023-03-01 13:41   ` D. Wythe
  2023-03-08 14:11   ` D. Wythe
  1 sibling, 0 replies; 5+ messages in thread
From: D. Wythe @ 2023-03-01 13:41 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: kgraul, wenjia, jaka, kuba, davem, netdev, linux-s390, linux-rdma,
	David Howells, Paul E. McKenney, Will Deacon, Peter Zijlstra



On 3/1/23 2:37 AM, Heiko Carstens wrote:
> On Thu, Feb 16, 2023 at 02:39:05PM +0800, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> There is a certain probability that following
>> exceptions will occur in the wrk benchmark test:
>>
>> Running 10s test @ http://11.213.45.6:80
>>    8 threads and 64 connections
>>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>>      Latency     3.72ms   13.94ms 245.33ms   94.17%
>>      Req/Sec     1.96k   713.67     5.41k    75.16%
>>    155262 requests in 10.10s, 23.10MB read
>> Non-2xx or 3xx responses: 3
>>
>> We will find that the error is HTTP 400 error, which is a serious
>> exception in our test, which means the application data was
>> corrupted.
>>
>> Consider the following scenarios:
>>
>> CPU0                            CPU1
>>
>> buf_desc->used = 0;
>>                                  cmpxchg(buf_desc->used, 0, 1)
>>                                  deal_with(buf_desc)
>>
>> memset(buf_desc->cpu_addr,0);
>>
>> This will cause the data received by a victim connection to be cleared,
>> thus triggering an HTTP 400 error in the server.
>>
>> This patch exchange the order between clear used and memset, add
>> barrier to ensure memory consistency.
>>
>> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> v2: rebase it with latest net tree.
>>
>>   net/smc/smc_core.c | 17 ++++++++---------
>>   1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
>> index c305d8d..c19d4b7 100644
>> --- a/net/smc/smc_core.c
>> +++ b/net/smc/smc_core.c
>> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>>   
>>   		smc_buf_free(lgr, is_rmb, buf_desc);
>>   	} else {
>> -		buf_desc->used = 0;
>> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
>> +		/* memzero_explicit provides potential memory barrier semantics */
>> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
>> +		WRITE_ONCE(buf_desc->used, 0);
> 
> This looks odd to me. memzero_explicit() is only sort of a compiler
> barrier, since it is a function call, but not a real memory barrier.

Hi Heiko,

Thanks for you point out, the semantics of memzero_explicit
is exactly what you said. But my original intention is
just wants to ensure the order relationship between memset and the assignment.
I'm not really sure whether a CPU memory barrier is needed here.

> You may want to check Documentation/memory-barriers.txt and
> Documentation/atomic_t.txt.
> 
> To me the proper solution looks like buf_desc->used should be converted to
> an atomic_t, and then you could do:
> 
> 	memset(buf_desc->cpu_addr, 0, buf_desc->len);
> 	smp_mb__before_atomic();
> 	atomic_set(&buf_desc->used, 0);

Anyhow, your solution is definitely correct, because that CPU memory barrier
(smp_mb__before_atomic) implies the compiler barrier.

> and in a similar way use atomic_cmpxchg() instead of the now used cmpxchg()
> for the part that sets buf_desc->used to 1.
> 
> Adding experts to cc, since s390 has such strong memory ordering semantics
> that you can basically do whatever you want without breaking anything. So I
> don't consider myself an expert here at all. :)
> 
> But given that this is common code, let's make sure this is really correct
Thank you for your comments again. :-), I am looking up some more information,
and I believe I can reply to you soon.

best wishes,
D. Wythe







^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] net/smc: fix application data exception
  2023-02-28 18:37 ` Heiko Carstens
  2023-03-01 13:41   ` D. Wythe
@ 2023-03-08 14:11   ` D. Wythe
  1 sibling, 0 replies; 5+ messages in thread
From: D. Wythe @ 2023-03-08 14:11 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: kgraul, wenjia, jaka, kuba, davem, netdev, linux-s390, linux-rdma,
	David Howells, Paul E. McKenney, Will Deacon, Peter Zijlstra


On 3/1/23 2:37 AM, Heiko Carstens wrote:
> On Thu, Feb 16, 2023 at 02:39:05PM +0800, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> There is a certain probability that following
>> exceptions will occur in the wrk benchmark test:
>>
>> Running 10s test @ http://11.213.45.6:80
>>    8 threads and 64 connections
>>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>>      Latency     3.72ms   13.94ms 245.33ms   94.17%
>>      Req/Sec     1.96k   713.67     5.41k    75.16%
>>    155262 requests in 10.10s, 23.10MB read
>> Non-2xx or 3xx responses: 3
>>
>> We will find that the error is HTTP 400 error, which is a serious
>> exception in our test, which means the application data was
>> corrupted.
>>
>> Consider the following scenarios:
>>
>> CPU0                            CPU1
>>
>> buf_desc->used = 0;
>>                                  cmpxchg(buf_desc->used, 0, 1)
>>                                  deal_with(buf_desc)
>>
>> memset(buf_desc->cpu_addr,0);
>>
>> This will cause the data received by a victim connection to be cleared,
>> thus triggering an HTTP 400 error in the server.
>>
>> This patch exchange the order between clear used and memset, add
>> barrier to ensure memory consistency.
>>
>> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> v2: rebase it with latest net tree.
>>
>>   net/smc/smc_core.c | 17 ++++++++---------
>>   1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
>> index c305d8d..c19d4b7 100644
>> --- a/net/smc/smc_core.c
>> +++ b/net/smc/smc_core.c
>> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>>   
>>   		smc_buf_free(lgr, is_rmb, buf_desc);
>>   	} else {
>> -		buf_desc->used = 0;
>> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
>> +		/* memzero_explicit provides potential memory barrier semantics */
>> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
>> +		WRITE_ONCE(buf_desc->used, 0);
> This looks odd to me. memzero_explicit() is only sort of a compiler
> barrier, since it is a function call, but not a real memory barrier.
>
> You may want to check Documentation/memory-barriers.txt and
> Documentation/atomic_t.txt.
>
> To me the proper solution looks like buf_desc->used should be converted to
> an atomic_t, and then you could do:
>
> 	memset(buf_desc->cpu_addr, 0, buf_desc->len);
> 	smp_mb__before_atomic();
> 	atomic_set(&buf_desc->used, 0);
>
> and in a similar way use atomic_cmpxchg() instead of the now used cmpxchg()
> for the part that sets buf_desc->used to 1.
>
> Adding experts to cc, since s390 has such strong memory ordering semantics
> that you can basically do whatever you want without breaking anything. So I
> don't consider myself an expert here at all. :)
>
> But given that this is common code, let's make sure this is really correct.

Hi  Heiko,

I realize that you are completely right, and I will repair this problem 
according to your ideas.

Thank you very much!!!

Best wishes.

D. Wythe


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-08 14:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-16  6:39 [PATCH net v2] net/smc: fix application data exception D. Wythe
2023-02-16 13:49 ` Wenjia Zhang
2023-02-28 18:37 ` Heiko Carstens
2023-03-01 13:41   ` D. Wythe
2023-03-08 14:11   ` D. Wythe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).