netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
@ 2024-05-28 13:51 Guangguan Wang
  2024-05-28 13:51 ` [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined Guangguan Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Guangguan Wang @ 2024-05-28 13:51 UTC
  To: wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum size of snd_buf and rcv_buf in bytes can be calculated as
2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
much larger than SMC-R's upper boundary.
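
As a quick check of the formula above, here is a minimal user-space sketch.
The helper name rmbe_size_bytes() is hypothetical (it is not a kernel
function); it only evaluates 2^n * 16KB for a compressed size value n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): buffer size in bytes for a
 * compressed size value n, following the formula 2^n * 16KB.
 */
static uint64_t rmbe_size_bytes(unsigned int n)
{
	assert(n <= 15);		/* values above 15 are not discussed here */
	return (uint64_t)1 << (n + 14);	/* 16KB == 1 << 14 */
}
```

With this sketch, rmbe_size_bytes(5) evaluates to 524288 (512KB) and
rmbe_size_bytes(15) evaluates to 536870912 (512MB).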

In some scenarios, such as recommendation systems, the communication
pattern is mainly large send/recv operations, where the size of snd_buf
and rcv_buf greatly affects performance. Because of its lower upper
boundary, SMC-R performs worse than TCP in those scenarios. So it is
time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
that they can be configured to a larger size for a performance gain in
such scenarios.

The SMC-R rcv_buf's size is transferred to the peer in the field
rmbe_size of the CLC accept and confirm messages. The field rmbe_size
is four bits long, which means the maximum value of SMCR_RMBE_SIZES
is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
the real memory usage is determined by the value of net.smc.w/rmem, not
by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
no side effects.
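
The clamping behaviour described above can be sketched in user space as
follows. This is an illustration, not the kernel code: compress_bufsize()
and ilog2_u32() are hypothetical names, BUF_MIN_SIZE is assumed to match
SMC_BUF_MIN_SIZE (16KB), and the "(size - 1) >> 14" step is recalled from
the kernel source rather than shown in this thread, so treat it as an
assumption.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_MIN_SIZE	16384	/* 16KB, assumed to match SMC_BUF_MIN_SIZE */
#define RMBE_FIELD_BITS	4	/* width of rmbe_size per the text above */

/* Floor of log2(v) for v > 0; user-space stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Hypothetical sketch: compress a requested size in bytes to the 4-bit
 * notation, clamped to max_compressed (5 before this series, 15 after).
 */
static uint8_t compress_bufsize(uint32_t size, uint8_t max_compressed)
{
	unsigned int c;

	assert(max_compressed < (1u << RMBE_FIELD_BITS));
	if (size <= BUF_MIN_SIZE)
		return 0;
	size = (size - 1) >> 14;	/* 16KB units, rounded up, minus one */
	c = ilog2_u32(size) + 1;
	return c < max_compressed ? (uint8_t)c : max_compressed;
}
```

In this sketch a 4MB request compresses to 5 (512KB) with a limit of 5 and
to 8 (4MB) with a limit of 15, so the higher limit only matters when a
larger size is actually requested.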

Guangguan Wang (2):
  net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when
    CONFIG_ARCH_NO_SG_CHAIN is defined
  net/smc: change SMCR_RMBE_SIZES from 5 to 15

 net/smc/smc_core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

-- 
2.24.3 (Apple Git-128)



* [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined
  2024-05-28 13:51 [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Guangguan Wang
@ 2024-05-28 13:51 ` Guangguan Wang
  2024-06-01  8:35   ` Simon Horman
  2024-05-28 13:51 ` [PATCH net-next 2/2] net/smc: change SMCR_RMBE_SIZES from 5 to 15 Guangguan Wang
  2024-05-29 16:28 ` [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Wenjia Zhang
  2 siblings, 1 reply; 9+ messages in thread
From: Guangguan Wang @ 2024-05-28 13:51 UTC
  To: wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel

SG_MAX_SINGLE_ALLOC limits the maximum number of entries that can be
allocated in one piece of a scatterlist. When the number of scatterlist
entries exceeds SG_MAX_SINGLE_ALLOC, sg chaining is used. From
commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
we know that the macro CONFIG_ARCH_NO_SG_CHAIN identifies whether sg
chaining is supported. So SMC-R's rmb buffer should be limited by
SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
defined.
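
To see what the single-allocation limit amounts to, here is a rough
user-space sketch of the expression ilog2((SG_MAX_SINGLE_ALLOC *
PAGE_SIZE) >> 14). The helper names are hypothetical. It assumes that
SG_MAX_SINGLE_ALLOC is PAGE_SIZE / sizeof(struct scatterlist) and that
the scatterlist entry size is passed in by the caller; 32 bytes is a
typical value on 64-bit builds, but that is an assumption, not something
stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Floor of log2(v) for v > 0; user-space stand-in for the kernel's ilog2(). */
static unsigned int ilog2_sz(size_t v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Hypothetical helper: compressed-size cap implied by one scatterlist
 * allocation, i.e. ilog2((max_entries * page_size) >> 14).
 */
static unsigned int sg_single_alloc_cap(size_t page_size, size_t sg_entry_size)
{
	/* assumed definition of SG_MAX_SINGLE_ALLOC */
	size_t max_entries = page_size / sg_entry_size;
	size_t max_scat = max_entries * page_size;

	assert(max_scat >= 16384);	/* at least one 16KB unit */
	return ilog2_sz(max_scat >> 14);
}
```

Under these assumptions, 4KB pages with 32-byte entries give a cap of 5
(512KB), which would keep RMBs at the old limit even if SMCR_RMBE_SIZES
were raised, unless the cap is applied only where sg chaining is
unavailable.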

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")
---
 net/smc/smc_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index fafdb97adfad..acca3b1a068f 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2015,7 +2015,6 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
  */
 static u8 smc_compress_bufsize(int size, bool is_smcd, bool is_rmb)
 {
-	const unsigned int max_scat = SG_MAX_SINGLE_ALLOC * PAGE_SIZE;
 	u8 compressed;
 
 	if (size <= SMC_BUF_MIN_SIZE)
@@ -2025,9 +2024,11 @@ static u8 smc_compress_bufsize(int size, bool is_smcd, bool is_rmb)
 	compressed = min_t(u8, ilog2(size) + 1,
 			   is_smcd ? SMCD_DMBE_SIZES : SMCR_RMBE_SIZES);
 
+#ifdef CONFIG_ARCH_NO_SG_CHAIN
 	if (!is_smcd && is_rmb)
 		/* RMBs are backed by & limited to max size of scatterlists */
-		compressed = min_t(u8, compressed, ilog2(max_scat >> 14));
+		compressed = min_t(u8, compressed, ilog2((SG_MAX_SINGLE_ALLOC * PAGE_SIZE) >> 14));
+#endif
 
 	return compressed;
 }
-- 
2.24.3 (Apple Git-128)



* [PATCH net-next 2/2] net/smc: change SMCR_RMBE_SIZES from 5 to 15
  2024-05-28 13:51 [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Guangguan Wang
  2024-05-28 13:51 ` [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined Guangguan Wang
@ 2024-05-28 13:51 ` Guangguan Wang
  2024-05-29 16:28 ` [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Wenjia Zhang
  2 siblings, 0 replies; 9+ messages in thread
From: Guangguan Wang @ 2024-05-28 13:51 UTC
  To: wenjia, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel

SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum size of snd_buf and rcv_buf in bytes can be calculated as
2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
much larger than SMC-R's upper boundary.

In some scenarios, such as recommendation systems, the communication
pattern is mainly large send/recv operations, where the size of snd_buf
and rcv_buf greatly affects performance. Because of its lower upper
boundary, SMC-R performs worse than TCP in those scenarios. So it is
time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
that they can be configured to a larger size for a performance gain in
such scenarios.

The SMC-R rcv_buf's size is transferred to the peer in the field
rmbe_size of the CLC accept and confirm messages. The field rmbe_size
is four bits long, which means the maximum value of SMCR_RMBE_SIZES
is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
the real memory usage is determined by the value of net.smc.w/rmem, not
by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
no side effects.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index acca3b1a068f..3b95828d9976 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2006,7 +2006,7 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 }
 
 #define SMCD_DMBE_SIZES		6 /* 0 -> 16KB, 1 -> 32KB, .. 6 -> 1MB */
-#define SMCR_RMBE_SIZES		5 /* 0 -> 16KB, 1 -> 32KB, .. 5 -> 512KB */
+#define SMCR_RMBE_SIZES		15 /* 0 -> 16KB, 1 -> 32KB, .. 15 -> 512MB */
 
 /* convert the RMB size into the compressed notation (minimum 16K, see
  * SMCD/R_DMBE_SIZES.
-- 
2.24.3 (Apple Git-128)



* Re: [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
  2024-05-28 13:51 [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Guangguan Wang
  2024-05-28 13:51 ` [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined Guangguan Wang
  2024-05-28 13:51 ` [PATCH net-next 2/2] net/smc: change SMCR_RMBE_SIZES from 5 to 15 Guangguan Wang
@ 2024-05-29 16:28 ` Wenjia Zhang
  2024-05-31  8:15   ` Guangguan Wang
  2 siblings, 1 reply; 9+ messages in thread
From: Wenjia Zhang @ 2024-05-29 16:28 UTC
  To: Guangguan Wang, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel



On 28.05.24 15:51, Guangguan Wang wrote:
> SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
> The maximum size of snd_buf and rcv_buf in bytes can be calculated as
> 2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
> is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
> net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
> much larger than SMC-R's upper boundary.
>
> In some scenarios, such as recommendation systems, the communication
> pattern is mainly large send/recv operations, where the size of snd_buf
> and rcv_buf greatly affects performance. Because of its lower upper
> boundary, SMC-R performs worse than TCP in those scenarios. So it is
> time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
> that they can be configured to a larger size for a performance gain in
> such scenarios.
>
> The SMC-R rcv_buf's size is transferred to the peer in the field
> rmbe_size of the CLC accept and confirm messages. The field rmbe_size
> is four bits long, which means the maximum value of SMCR_RMBE_SIZES
> is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
> different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
> means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
> the real memory usage is determined by the value of net.smc.w/rmem, not
> by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
> no side effects.
> 
Hi Guangguan,

It is correct that the maximum buffer (snd_buf and rcv_buf) size of
SMC-R is much smaller than TCP's. If I remember correctly, that was
because experiments at the time showed that 512KB was enough for the
traffic and did not waste memory. Sure, that was years ago, and it
could be very different nowadays. But I'm still curious whether you
have a concrete scenario with a performance benchmark that shows a
distinct disadvantage of the current maximum buffer size.

Thanks,
Wenjia

> Guangguan Wang (2):
>    net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when
>      CONFIG_ARCH_NO_SG_CHAIN is defined
>    net/smc: change SMCR_RMBE_SIZES from 5 to 15
> 
>   net/smc/smc_core.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 


* Re: [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
  2024-05-29 16:28 ` [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Wenjia Zhang
@ 2024-05-31  8:15   ` Guangguan Wang
  2024-05-31  9:03     ` Wenjia Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Guangguan Wang @ 2024-05-31  8:15 UTC
  To: Wenjia Zhang, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel



On 2024/5/30 00:28, Wenjia Zhang wrote:
> 
> 
> On 28.05.24 15:51, Guangguan Wang wrote:
>> SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
>> The maximum size of snd_buf and rcv_buf in bytes can be calculated as
>> 2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
>> is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
>> net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
>> much larger than SMC-R's upper boundary.
>>
>> In some scenarios, such as recommendation systems, the communication
>> pattern is mainly large send/recv operations, where the size of snd_buf
>> and rcv_buf greatly affects performance. Because of its lower upper
>> boundary, SMC-R performs worse than TCP in those scenarios. So it is
>> time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
>> that they can be configured to a larger size for a performance gain in
>> such scenarios.
>>
>> The SMC-R rcv_buf's size is transferred to the peer in the field
>> rmbe_size of the CLC accept and confirm messages. The field rmbe_size
>> is four bits long, which means the maximum value of SMCR_RMBE_SIZES
>> is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
>> different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
>> means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
>> the real memory usage is determined by the value of net.smc.w/rmem, not
>> by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
>> no side effects.
>>
> Hi Guangguan,
> 
> It is correct that the maximum buffer (snd_buf and rcv_buf) size of SMC-R is much smaller than TCP's. If I remember correctly, that was because experiments at the time showed that 512KB was enough for the traffic and did not waste memory. Sure, that was years ago, and it could be very different nowadays. But I'm still curious whether you have a concrete scenario with a performance benchmark that shows a distinct disadvantage of the current maximum buffer size.
> 

Hi Wenjia,

The performance benchmark can be "Wide & Deep Recommender Model Training in TensorFlow" (https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
The related paper here: https://arxiv.org/pdf/1606.07792.

The performance unit is steps/s, where a higher value indicates better performance.

1) using 512KB snd_buf/recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
 SMC-R performance vs TCP performance = 24.21 steps/s vs 24.85 steps/s

ps smcr stat:
RX Stats
  Data transmitted (Bytes)    37600503985 (37.60G)
  Total requests                   677841
  Buffer full                       40074 (5.91%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       4       0
  Reqs   178.2K  12.69K  8.125K  45.71K  23.51K  20.75K  60.16K       0
TX Stats
  Data transmitted (Bytes)   118471581684 (118.5G)
  Total requests                   874395
  Buffer full                      343080 (39.24%)
  Buffer full (remote)             468523 (53.58%)
  Buffer too small                 607914 (69.52%)
  Buffer too small (remote)        607914 (69.52%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       4       0
  Reqs   119.7K  3.169K  2.662K  5.583K  8.523K  21.55K  34.58K  318.0K

worker smcr stat:
RX Stats
  Data transmitted (Bytes)   118471581723 (118.5G)
  Total requests                   835959
  Buffer full                       99227 (11.87%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       4       0
  Reqs   125.4K  13.14K  17.49K  16.78K  34.27K  34.12K  223.8K       0
TX Stats
  Data transmitted (Bytes)    37600504139 (37.60G)
  Total requests                   606822
  Buffer full                       86597 (14.27%)
  Buffer full (remote)             156098 (25.72%)
  Buffer too small                 154218 (25.41%)
  Buffer too small (remote)        154218 (25.41%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       4       0
  Reqs   323.6K  13.26K  6.979K  50.84K  19.43K  14.46K  8.231K  81.80K

2) using 4MB snd_buf and 6MB recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
 SMC-R performance vs TCP performance = 29.35 steps/s vs 24.85 steps/s

ps smcr stat:
RX Stats
  Data transmitted (Bytes)   110853495554 (110.9G)
  Total requests                  1165230
  Buffer full                           0 (0.00%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       0       4
  Reqs   340.2K  29.65K  19.58K  76.32K  55.37K  39.15K  7.042K  43.88K
TX Stats
  Data transmitted (Bytes)   349072090590 (349.1G)
  Total requests                   922705
  Buffer full                      154765 (16.77%)
  Buffer full (remote)             309940 (33.59%)
  Buffer too small                  46896 (5.08%)
  Buffer too small (remote)         14304 (1.55%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       0       4
  Reqs   420.8K  11.15K  3.609K  12.28K  13.05K  26.08K  22.13K  240.3K

worker smcr stat:
RX Stats
  Data transmitted (Bytes)   349072090590 (349.1G)
  Total requests                   585165
  Buffer full                           0 (0.00%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       0       4
  Reqs   155.4K  13.42K  4.070K  4.462K  3.628K  9.720K  12.01K  165.0K
TX Stats
  Data transmitted (Bytes)   110854684711 (110.9G)
  Total requests                  1052628
  Buffer full                       34760 (3.30%)
  Buffer full (remote)              77630 (7.37%)
  Buffer too small                  22330 (2.12%)
  Buffer too small (remote)          7040 (0.67%)
            8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
  Bufs        0       0       0       0       0       0       0       4
  Reqs   666.3K  38.43K  20.65K  135.1K  54.19K  36.69K  3.948K  56.42K


From the above smcr stats, we can see a large number of send/recv requests larger than 512KB, and a large number of sends blocked due to
buffer full or buffer too small. When configured with larger send/recv buffers, we get fewer blocked sends and better performance.
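
The relative differences between the steps/s figures above can be worked
out with a small sketch. The helper name percent_change() is hypothetical;
the inputs are the throughput numbers reported in the two runs.

```c
#include <assert.h>

/* Relative change of `value` versus `baseline`, in percent. */
static double percent_change(double value, double baseline)
{
	assert(baseline > 0.0);
	return (value / baseline - 1.0) * 100.0;
}
```

Using the reported numbers, percent_change(29.35, 24.21) is roughly 21.2
(SMC-R with larger buffers vs. SMC-R with 512KB buffers),
percent_change(29.35, 24.85) is roughly 18.1 (SMC-R with larger buffers
vs. TCP), and percent_change(24.21, 24.85) is roughly -2.6 (SMC-R with
512KB buffers vs. TCP).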

> Thanks,
> Wenjia
> 
>> Guangguan Wang (2):
>>    net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when
>>      CONFIG_ARCH_NO_SG_CHAIN is defined
>>    net/smc: change SMCR_RMBE_SIZES from 5 to 15
>>
>>   net/smc/smc_core.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>


* Re: [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
  2024-05-31  8:15   ` Guangguan Wang
@ 2024-05-31  9:03     ` Wenjia Zhang
  2024-05-31  9:35       ` Guangguan Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Wenjia Zhang @ 2024-05-31  9:03 UTC
  To: Guangguan Wang, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel



On 31.05.24 10:15, Guangguan Wang wrote:
> 
> 
> On 2024/5/30 00:28, Wenjia Zhang wrote:
>>
>>
>> On 28.05.24 15:51, Guangguan Wang wrote:
>>> SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
>>> The maximum size of snd_buf and rcv_buf in bytes can be calculated as
>>> 2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
>>> is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
>>> net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
>>> much larger than SMC-R's upper boundary.
>>>
>>> In some scenarios, such as recommendation systems, the communication
>>> pattern is mainly large send/recv operations, where the size of snd_buf
>>> and rcv_buf greatly affects performance. Because of its lower upper
>>> boundary, SMC-R performs worse than TCP in those scenarios. So it is
>>> time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
>>> that they can be configured to a larger size for a performance gain in
>>> such scenarios.
>>>
>>> The SMC-R rcv_buf's size is transferred to the peer in the field
>>> rmbe_size of the CLC accept and confirm messages. The field rmbe_size
>>> is four bits long, which means the maximum value of SMCR_RMBE_SIZES
>>> is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
>>> different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
>>> means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
>>> the real memory usage is determined by the value of net.smc.w/rmem, not
>>> by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
>>> no side effects.
>>>
>> Hi Guangguan,
>>
>> It is correct that the maximum buffer (snd_buf and rcv_buf) size of SMC-R is much smaller than TCP's. If I remember correctly, that was because experiments at the time showed that 512KB was enough for the traffic and did not waste memory. Sure, that was years ago, and it could be very different nowadays. But I'm still curious whether you have a concrete scenario with a performance benchmark that shows a distinct disadvantage of the current maximum buffer size.
>>
> 
> Hi Wenjia,
> 
> The performance benchmark can be "Wide & Deep Recommender Model Training in TensorFlow" (https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
> The related paper here: https://arxiv.org/pdf/1606.07792.
> 
> The performance unit is steps/s, where a higher value indicates better performance.
> 
> 1) using 512KB snd_buf/recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
>   SMC-R performance vs TCP performance = 24.21 steps/s vs 24.85 steps/s
> 
> ps smcr stat:
> RX Stats
>    Data transmitted (Bytes)    37600503985 (37.60G)
>    Total requests                   677841
>    Buffer full                       40074 (5.91%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       4       0
>    Reqs   178.2K  12.69K  8.125K  45.71K  23.51K  20.75K  60.16K       0
> TX Stats
>    Data transmitted (Bytes)   118471581684 (118.5G)
>    Total requests                   874395
>    Buffer full                      343080 (39.24%)
>    Buffer full (remote)             468523 (53.58%)
>    Buffer too small                 607914 (69.52%)
>    Buffer too small (remote)        607914 (69.52%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       4       0
>    Reqs   119.7K  3.169K  2.662K  5.583K  8.523K  21.55K  34.58K  318.0K
> 
> worker smcr stat:
> RX Stats
>    Data transmitted (Bytes)   118471581723 (118.5G)
>    Total requests                   835959
>    Buffer full                       99227 (11.87%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       4       0
>    Reqs   125.4K  13.14K  17.49K  16.78K  34.27K  34.12K  223.8K       0
> TX Stats
>    Data transmitted (Bytes)    37600504139 (37.60G)
>    Total requests                   606822
>    Buffer full                       86597 (14.27%)
>    Buffer full (remote)             156098 (25.72%)
>    Buffer too small                 154218 (25.41%)
>    Buffer too small (remote)        154218 (25.41%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       4       0
>    Reqs   323.6K  13.26K  6.979K  50.84K  19.43K  14.46K  8.231K  81.80K
> 
> 2) using 4MB snd_buf and 6MB recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
>   SMC-R performance vs TCP performance = 29.35 steps/s vs 24.85 steps/s
> 
> ps smcr stat:
> RX Stats
>    Data transmitted (Bytes)   110853495554 (110.9G)
>    Total requests                  1165230
>    Buffer full                           0 (0.00%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       0       4
>    Reqs   340.2K  29.65K  19.58K  76.32K  55.37K  39.15K  7.042K  43.88K
> TX Stats
>    Data transmitted (Bytes)   349072090590 (349.1G)
>    Total requests                   922705
>    Buffer full                      154765 (16.77%)
>    Buffer full (remote)             309940 (33.59%)
>    Buffer too small                  46896 (5.08%)
>    Buffer too small (remote)         14304 (1.55%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       0       4
>    Reqs   420.8K  11.15K  3.609K  12.28K  13.05K  26.08K  22.13K  240.3K
> 
> worker smcr stat:
> RX Stats
>    Data transmitted (Bytes)   349072090590 (349.1G)
>    Total requests                   585165
>    Buffer full                           0 (0.00%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       0       4
>    Reqs   155.4K  13.42K  4.070K  4.462K  3.628K  9.720K  12.01K  165.0K
> TX Stats
>    Data transmitted (Bytes)   110854684711 (110.9G)
>    Total requests                  1052628
>    Buffer full                       34760 (3.30%)
>    Buffer full (remote)              77630 (7.37%)
>    Buffer too small                  22330 (2.12%)
>    Buffer too small (remote)          7040 (0.67%)
>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>    Bufs        0       0       0       0       0       0       0       4
>    Reqs   666.3K  38.43K  20.65K  135.1K  54.19K  36.69K  3.948K  56.42K
> 
> 
>  From the above smcr stats, we can see a large number of send/recv requests larger than 512KB, and a large number of sends blocked due to
> buffer full or buffer too small. When configured with larger send/recv buffers, we get fewer blocked sends and better performance.
> 
That is exactly what I asked for, thank you for the details! Please give
us a few days to try it ourselves. If the performance gain is as
significant as yours and there are no other side effects, why not?!


* Re: [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
  2024-05-31  9:03     ` Wenjia Zhang
@ 2024-05-31  9:35       ` Guangguan Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Guangguan Wang @ 2024-05-31  9:35 UTC
  To: Wenjia Zhang, jaka, davem, edumazet, kuba, pabeni
  Cc: kgraul, alibuda, tonylu, guwen, linux-s390, netdev, linux-kernel



On 2024/5/31 17:03, Wenjia Zhang wrote:
> 
> 
> On 31.05.24 10:15, Guangguan Wang wrote:
>>
>>
>> On 2024/5/30 00:28, Wenjia Zhang wrote:
>>>
>>>
>>> On 28.05.24 15:51, Guangguan Wang wrote:
>>>> SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
>>>> The maximum size of snd_buf and rcv_buf in bytes can be calculated as
>>>> 2^SMCR_RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary
>>>> is 512KB. TCP's maximum snd_buf and rcv_buf size is configured by
>>>> net.ipv4.tcp_w/rmem[2], whose default value is 4MB or 6MB, which is
>>>> much larger than SMC-R's upper boundary.
>>>>
>>>> In some scenarios, such as recommendation systems, the communication
>>>> pattern is mainly large send/recv operations, where the size of snd_buf
>>>> and rcv_buf greatly affects performance. Because of its lower upper
>>>> boundary, SMC-R performs worse than TCP in those scenarios. So it is
>>>> time to raise the upper boundary of SMC-R's snd_buf and rcv_buf, so
>>>> that they can be configured to a larger size for a performance gain in
>>>> such scenarios.
>>>>
>>>> The SMC-R rcv_buf's size is transferred to the peer in the field
>>>> rmbe_size of the CLC accept and confirm messages. The field rmbe_size
>>>> is four bits long, which means the maximum value of SMCR_RMBE_SIZES
>>>> is 15. To avoid frequently adjusting the value of SMCR_RMBE_SIZES for
>>>> different scenarios, set SMCR_RMBE_SIZES to the maximum value 15, which
>>>> means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As
>>>> the real memory usage is determined by the value of net.smc.w/rmem, not
>>>> by the upper boundary, setting SMCR_RMBE_SIZES to the maximum value has
>>>> no side effects.
>>>>
>>> Hi Guangguan,
>>>
>>> It is correct that the maximum buffer (snd_buf and rcv_buf) size of SMC-R is much smaller than TCP's. If I remember correctly, that was because experiments at the time showed that 512KB was enough for the traffic and did not waste memory. Sure, that was years ago, and it could be very different nowadays. But I'm still curious whether you have a concrete scenario with a performance benchmark that shows a distinct disadvantage of the current maximum buffer size.
>>>
>>
>> Hi Wenjia,
>>
>> The performance benchmark can be "Wide & Deep Recommender Model Training in TensorFlow" (https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
>> The related paper here: https://arxiv.org/pdf/1606.07792.
>>
>> The performance unit is steps/s, where a higher value indicates better performance.
>>
>> 1) using 512KB snd_buf/recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
>>   SMC-R performance vs TCP performance = 24.21 steps/s vs 24.85 steps/s
>>
>> ps smcr stat:
>> RX Stats
>>    Data transmitted (Bytes)    37600503985 (37.60G)
>>    Total requests                   677841
>>    Buffer full                       40074 (5.91%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       4       0
>>    Reqs   178.2K  12.69K  8.125K  45.71K  23.51K  20.75K  60.16K       0
>> TX Stats
>>    Data transmitted (Bytes)   118471581684 (118.5G)
>>    Total requests                   874395
>>    Buffer full                      343080 (39.24%)
>>    Buffer full (remote)             468523 (53.58%)
>>    Buffer too small                 607914 (69.52%)
>>    Buffer too small (remote)        607914 (69.52%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       4       0
>>    Reqs   119.7K  3.169K  2.662K  5.583K  8.523K  21.55K  34.58K  318.0K
>>
>> worker smcr stat:
>> RX Stats
>>    Data transmitted (Bytes)   118471581723 (118.5G)
>>    Total requests                   835959
>>    Buffer full                       99227 (11.87%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       4       0
>>    Reqs   125.4K  13.14K  17.49K  16.78K  34.27K  34.12K  223.8K       0
>> TX Stats
>>    Data transmitted (Bytes)    37600504139 (37.60G)
>>    Total requests                   606822
>>    Buffer full                       86597 (14.27%)
>>    Buffer full (remote)             156098 (25.72%)
>>    Buffer too small                 154218 (25.41%)
>>    Buffer too small (remote)        154218 (25.41%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       4       0
>>    Reqs   323.6K  13.26K  6.979K  50.84K  19.43K  14.46K  8.231K  81.80K
>>
>> 2) using 4MB snd_buf and 6MB recv_buf for SMC-R, default(4MB snd_buf/6MB recv_buf) for TCP:
>>   SMC-R performance vs TCP performance = 29.35 steps/s vs 24.85 steps/s
>>
>> ps smcr stat:
>> RX Stats
>>    Data transmitted (Bytes)   110853495554 (110.9G)
>>    Total requests                  1165230
>>    Buffer full                           0 (0.00%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       0       4
>>    Reqs   340.2K  29.65K  19.58K  76.32K  55.37K  39.15K  7.042K  43.88K
>> TX Stats
>>    Data transmitted (Bytes)   349072090590 (349.1G)
>>    Total requests                   922705
>>    Buffer full                      154765 (16.77%)
>>    Buffer full (remote)             309940 (33.59%)
>>    Buffer too small                  46896 (5.08%)
>>    Buffer too small (remote)         14304 (1.55%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       0       4
>>    Reqs   420.8K  11.15K  3.609K  12.28K  13.05K  26.08K  22.13K  240.3K
>>
>> worker smcr stat:
>> RX Stats
>>    Data transmitted (Bytes)   349072090590 (349.1G)
>>    Total requests                   585165
>>    Buffer full                           0 (0.00%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       0       4
>>    Reqs   155.4K  13.42K  4.070K  4.462K  3.628K  9.720K  12.01K  165.0K
>> TX Stats
>>    Data transmitted (Bytes)   110854684711 (110.9G)
>>    Total requests                  1052628
>>    Buffer full                       34760 (3.30%)
>>    Buffer full (remote)              77630 (7.37%)
>>    Buffer too small                  22330 (2.12%)
>>    Buffer too small (remote)          7040 (0.67%)
>>              8KB    16KB    32KB    64KB   128KB   256KB   512KB  >512KB
>>    Bufs        0       0       0       0       0       0       0       4
>>    Reqs   666.3K  38.43K  20.65K  135.1K  54.19K  36.69K  3.948K  56.42K
>>
>>
>>  From the above smcr stat, we can see many send/recv requests larger than 512KB, and many sends blocked due to
>> buffer full or buffer too small. With larger send/recv buffers configured, fewer sends block and performance improves.
>>
> That is exactly what I asked for, thank you for the details! Please give me some days to try it ourselves. If the performance gain is as significant as yours and there are no other side effects, why not?!

Hi Wenjia,

Happy to hear that.

More information about my test:
The test command is "nohup python3 -u -m trainer.task --benchmark_warmup_steps 5 --benchmark_steps 300000 --benchmark --is_ps --global_batch_size=8192 --job_name=$role &> ./${role}-${local_ip}.log &".
The test environment has one parameter server and two workers with A10 GPUs.

Thanks,
Guangguan Wang
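
The percentages in the quoted smcr stat output can be cross-checked with a few lines of Python. This is only a sketch: the counters are copied from the worker TX sections and the case 2 throughput line quoted above, while the helper names (pct, speedup_pct) and the dictionary layout are made up for illustration and are not part of smc-tools.

```python
# Worker TX counters copied from the quoted smcr stat output.
# Keys are illustrative labels, not smc-tools field names.
WORKER_TX = {
    "512KB bufs": {"total": 606822, "buffer_full": 86597},
    "4MB snd_buf / 6MB rcv_buf": {"total": 1052628, "buffer_full": 34760},
}


def pct(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def speedup_pct(new, old):
    """Return the relative gain of new over old, in percent."""
    return round(100.0 * (new - old) / old, 2)


if __name__ == "__main__":
    for label, c in WORKER_TX.items():
        print(f"{label}: buffer full on {pct(c['buffer_full'], c['total'])}% of TX requests")
    # Case 2 throughput from the quoted text: SMC-R 29.35 steps/s, TCP 24.85 steps/s.
    print(f"SMC-R vs TCP in case 2: +{speedup_pct(29.35, 24.85)}%")
```

With these inputs the worker's "Buffer full" share on TX drops from 14.27% to 3.30%, which matches the figures smcr stat prints.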


* Re: [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined
  2024-05-28 13:51 ` [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined Guangguan Wang
@ 2024-06-01  8:35   ` Simon Horman
  2024-06-03  2:21     ` Guangguan Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2024-06-01  8:35 UTC (permalink / raw)
  To: Guangguan Wang
  Cc: wenjia, jaka, davem, edumazet, kuba, pabeni, kgraul, alibuda,
	tonylu, guwen, linux-s390, netdev, linux-kernel

On Tue, May 28, 2024 at 09:51:37PM +0800, Guangguan Wang wrote:
> SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
> will be allocated in one piece of scatterlist. When the entries of
> scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
> commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
> we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
> whether sg chain is supported. So, SMC-R's rmb buffer should be limitted

Hi Guangguan Wang,

As it looks like there will be a v2:

In this patch: limitted -> limited
In patch 2/2:  defalut -> default

checkpatch.pl --codespell is your friend.

> by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
> defined.
> 
> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
> Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")

I think it is usual to put the Fixes tag above the Signed-off-by tags,
although I don't see anything about that in [1].

[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

...
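
The quoted commit message describes SG_MAX_SINGLE_ALLOC as the cap on entries in one scatterlist allocation. A rough Python sketch of why that cap matters for buffer size follows. It assumes 4KB pages, a 32-byte struct scatterlist (typical for a 64-bit build, but configuration dependent), and one page per scatterlist entry; the exact expression the SMC code uses is not shown in this thread, so treat the function names and the numbers as illustrative.

```python
PAGE_SIZE = 4096             # assumption: 4KB pages
SCATTERLIST_ENTRY_SIZE = 32  # assumption: sizeof(struct scatterlist) on a 64-bit build


def sg_max_single_alloc(page_size=PAGE_SIZE, entry_size=SCATTERLIST_ENTRY_SIZE):
    """Entries that fit in one page-sized scatterlist allocation."""
    return page_size // entry_size


def max_unchained_bytes(page_size=PAGE_SIZE, entry_size=SCATTERLIST_ENTRY_SIZE):
    """Largest buffer one unchained scatterlist can describe, one page per entry."""
    return sg_max_single_alloc(page_size, entry_size) * page_size


if __name__ == "__main__":
    print(sg_max_single_alloc(), "entries")
    print(max_unchained_bytes() // 1024, "KB without sg chaining")
```

Under these assumptions a single allocation covers 128 entries, or 512KB, so a larger RMB needs sg chaining, which is only unavailable when CONFIG_ARCH_NO_SG_CHAIN is defined.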


* Re: [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined
  2024-06-01  8:35   ` Simon Horman
@ 2024-06-03  2:21     ` Guangguan Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Guangguan Wang @ 2024-06-03  2:21 UTC (permalink / raw)
  To: Simon Horman
  Cc: wenjia, jaka, davem, edumazet, kuba, pabeni, kgraul, alibuda,
	tonylu, guwen, linux-s390, netdev, linux-kernel



On 2024/6/1 16:35, Simon Horman wrote:
> On Tue, May 28, 2024 at 09:51:37PM +0800, Guangguan Wang wrote:
>> SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
>> will be allocated in one piece of scatterlist. When the entries of
>> scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
>> commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
>> we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
>> whether sg chain is supported. So, SMC-R's rmb buffer should be limitted
> 
> Hi Guangguan Wang,
> 
> As it looks like there will be a v2:
> 
> In this patch: limitted -> limited
> In patch 2/2:  defalut -> default
> 
> checkpatch.pl --codespell is your friend.
> 
>> by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
>> defined.
>>
>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>> Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")
> 
> I think it is usual to put the Fixes tag above the Signed-off-by tags,
> although I don't see anything about that in [1].
> 
> [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
> 
> ...

I will fix it in the next version.

Thanks,
Guangguan Wang


Thread overview: 9+ messages
2024-05-28 13:51 [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Guangguan Wang
2024-05-28 13:51 ` [PATCH net-next 1/2] net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined Guangguan Wang
2024-06-01  8:35   ` Simon Horman
2024-06-03  2:21     ` Guangguan Wang
2024-05-28 13:51 ` [PATCH net-next 2/2] net/smc: change SMCR_RMBE_SIZES from 5 to 15 Guangguan Wang
2024-05-29 16:28 ` [PATCH net-next 0/2] Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB Wenjia Zhang
2024-05-31  8:15   ` Guangguan Wang
2024-05-31  9:03     ` Wenjia Zhang
2024-05-31  9:35       ` Guangguan Wang
