Re: [PATCH net-next 6/7] net: keep sk->sk_forward_alloc as small as possible


From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev <netdev@vger.kernel.org>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Wei Wang <weiwan@google.com>, Shakeel Butt <shakeelb@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Gautham Shenoy <gautham.shenoy@amd.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Chen Yu <yu.c.chen@intel.com>, Abel Wu <wuyun.abel@bytedance.com>,
	Yicong Yang <yangyicong@hisilicon.com>
Subject: Re: [PATCH net-next 6/7] net: keep sk->sk_forward_alloc as small as possible
Date: Mon, 17 Oct 2022 09:34:52 +0530
Message-ID: <abf9aae5-1497-5a68-26cd-e49d54bbe0fd@amd.com>
In-Reply-To: <CANn89iJF2sWcxEJQF8SN4+VuAfVGUmP-s7qFXZEGYJH28iQLWQ@mail.gmail.com>

Hello Eric,

On 10/16/2022 1:49 AM, Eric Dumazet wrote:
> On Fri, Oct 14, 2022 at 1:30 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>>
>> Hello Eric,
> ...
>>
>> Following are the results:
>>
>> Clients:      good                 good + series         good + series + larger wmem
>>     1    574.93 (0.00 pct)       554.42 (-3.56 pct)      552.92 (-3.82 pct)
>>     2    1135.60 (0.00 pct)      1034.76 (-8.87 pct)     1036.94 (-8.68 pct)
>>     4    2117.29 (0.00 pct)      1796.97 (-15.12 pct)    1539.21 (-27.30 pct)
>>     8    3799.57 (0.00 pct)      3020.87 (-20.49 pct)    2797.98 (-26.36 pct)
>>    16    6129.79 (0.00 pct)      4536.99 (-25.98 pct)    4301.20 (-29.83 pct)
>>    32    11630.67 (0.00 pct)     8674.74 (-25.41 pct)    8199.28 (-29.50 pct)
>>    64    20895.77 (0.00 pct)     14417.26 (-31.00 pct)   14473.34 (-30.73 pct)
>>   128    31989.55 (0.00 pct)     20611.47 (-35.56 pct)   19671.08 (-38.50 pct)
>>   256    56388.57 (0.00 pct)     48822.72 (-13.41 pct)   48455.77 (-14.06 pct)
>>   512    59326.33 (0.00 pct)     43960.03 (-25.90 pct)   43968.59 (-25.88 pct)
>>  1024    58281.10 (0.00 pct)     41256.18 (-29.21 pct)   40550.97 (-30.42 pct)
>>
>> Given the message size is small, I think wmem size does not
>> impact the benchmark results much.
> 
> Hmmm.
> 
> tldr; I can not really repro the issues (tested on AMD EPYC 7B12,
> NPS1) with CONFIG_PREEMPT_NONE=y
> 
> sendmsg(256 bytes)
>     grab 4096 bytes forward allocation from sk->sk_prot->per_cpu_fw_alloc
>    send skb, softirq handler immediately sends ACK back, and queues
> the packet into receiver socket (also grabbing bytes from
> sk->sk_prot->per_cpu_fw_alloc)
>      ACK releases the 4096 bytes to per-cpu
> sk->sk_prot->per_cpu_fw_alloc on sender TCP socket
> 
> per_cpu_fw_alloc have a 1MB cushion (per cpu), not sure why it is not
> enough in your case.
> Worst case would be one dirtying of tcp_memory_allocated every ~256 messages,
> but in more common cases we dirty this cache less often...
> 
> I wonder if NPS2/NPS4 could land per-cpu variables into the wrong NUMA
> node maybe ?
> (or on NPS1, incorrect NUMA information on your platform ?)
> Or maybe the small changes are enough for your system to hit a cliff.
> AMD systems are quite sensitive to mem-bw saturation.
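
As a rough illustration of the "~256 messages" estimate quoted above, here is a toy Python model of a per-cpu reserve that only touches the shared counter once the local balance reaches the 1MB cushion. It is a sketch under stated assumptions, not the kernel implementation: the class `PerCpuReserve`, the function `simulate`, and the flush rule are hypothetical names and simplifications, and only the 4096-byte page and 1MB cushion figures come from the quoted message.

```python
PAGE_SIZE = 4096                # bytes grabbed per 256-byte sendmsg() (from the quoted text)
CUSHION_BYTES = 1 << 20         # 1MB per-cpu cushion (from the quoted text)
RESERVE_PAGES = CUSHION_BYTES // PAGE_SIZE  # 256 pages


class PerCpuReserve:
    """Toy model (assumption): one CPU's local reserve plus a shared counter."""

    def __init__(self):
        self.local_pages = 0    # stands in for the per-cpu forward-alloc reserve
        self.shared_pages = 0   # stands in for tcp_memory_allocated
        self.shared_writes = 0  # how often the shared cache line is dirtied

    def add(self, pages):
        # Charge (positive) or release (negative) pages against the local reserve.
        self.local_pages += pages
        # Assumed flush rule: spill to the shared counter once the local
        # balance reaches the cushion in either direction.
        if abs(self.local_pages) >= RESERVE_PAGES:
            self.shared_pages += self.local_pages
            self.local_pages = 0
            self.shared_writes += 1


def simulate(messages, release_on_same_cpu):
    """Return how many times the shared counter is written for `messages` sends."""
    cpu = PerCpuReserve()
    for _ in range(messages):
        cpu.add(1)              # sendmsg() grabs one page
        if release_on_same_cpu:
            cpu.add(-1)         # the ACK returns the page to the same CPU
    return cpu.shared_writes
```

In this model, `simulate(1024, False)` (pages never returned on the same CPU) writes the shared counter once per 256 messages, while `simulate(1024, True)` never writes it at all, which matches the "worst case" versus "more common cases" distinction in the quoted text.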

We've observed some unintended side effects of introducing per-cpu
variables in the past that impacted tbench performance
(https://lore.kernel.org/lkml/e000b124-afd4-28e1-fde2-393b0e38ce19@amd.com/)

In those cases, merely introducing new per-cpu variables was enough
to cause a regression. With this series, however, I only see the
regression from Patch 6, which is why I believed the change in the
reclaim strategy was responsible.

> 
>  I ran the following on an AMD host (NPS1) with two physical CPUs (256 HT total)
> 
> for i in 1 2 4 8 16 32 64 128 192 256; do echo -n $i: ;
> ./super_netperf $i -H ::1 -l 10 -- -m 256 -M 256; done
> 
> Before patch series ( 5c281b4e529c )
> 1:   6956
> 2:  14169
> 4:  28311
> 8:  56519
> 16: 113621
> 32: 225317
> 64: 341658
> 128: 475131
> 192: 304515
> 256: 181754
> 
> After patch series, to me this looks very close or even much better at
> high number of threads.
> 1:   6963
> 2:  14166
> 4:  28095
> 8:  56878
> 16: 112723
> 32: 202417
> 64: 266744
> 128: 482031
> 192: 317876
> 256: 293169
> 
> And if we look at "ss -tm" while tests are running, it is clearly
> visible that the old kernels were pretty bad in terms of memory
> control.
> 
> Old kernel:
> ESTAB        0              55040
> [::1]:39474                                                [::1]:32891
> skmem:(r0,rb540000,t0,tb10243584,f1167104,w57600,o0,bl0,d0)
> ESTAB        36864          0
> [::1]:37733                                                [::1]:54752
> skmem:(r55040,rb8515000,t0,tb2626560,f1710336,w0,o0,bl0,d0)
> 
> These two sockets were holding 1167104+1710336 bytes of forward
> allocations, just to 'be efficient'
> Now think of servers with millions of TCP sockets :/
> 
> New kernel : No more extra forward allocations above 4096 bytes.
> sk_forward_alloc only holds the remainder of allocations,
> because memcg/tcp_memory_allocated granularity is in pages.
> 
> ESTAB   35328     0                             [::1]:36493
>                            [::1]:41394
> skmem:(r46848,rb7467000,t0,tb2626560,f2304,w0,o0,bl0,d0)
> ESTAB   0         54272                         [::1]:58680
>                            [::1]:47859
> skmem:(r0,rb540000,t0,tb6829056,f512,w56832,o0,bl0,d0)
> 
> Only when enabling CONFIG_PREEMPT=y I had some kind of spinlock contention
> in scheduler/rcu layers, making test results very flaky.
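
For anyone wanting to redo the forward-allocation sums from the quoted "ss -tm" output, the following Python sketch parses the `skmem:(...)` field and adds up the `f` (forward alloc) values. The helper names `parse_skmem` and `total_forward_alloc` are hypothetical, and the parser assumes every entry is a lowercase tag followed by a decimal number, as in the samples above.

```python
import re

# Assumption: each skmem entry is a lowercase tag followed by digits,
# e.g. "rb540000" or "f1167104", as in the quoted samples.
SKMEM_RE = re.compile(r"skmem:\(([^)]*)\)")
FIELD_RE = re.compile(r"([a-z]+)(\d+)")


def parse_skmem(line):
    """Return a dict such as {'r': 0, 'rb': 540000, ..., 'f': 1167104}."""
    match = SKMEM_RE.search(line)
    if match is None:
        return {}
    fields = {}
    for token in match.group(1).split(","):
        field = FIELD_RE.fullmatch(token.strip())
        if field is not None:
            fields[field.group(1)] = int(field.group(2))
    return fields


def total_forward_alloc(lines):
    """Sum the 'f' (forward alloc) value over all skmem lines."""
    return sum(parse_skmem(line).get("f", 0) for line in lines)


# skmem lines copied from the quoted output.
old_kernel = [
    "skmem:(r0,rb540000,t0,tb10243584,f1167104,w57600,o0,bl0,d0)",
    "skmem:(r55040,rb8515000,t0,tb2626560,f1710336,w0,o0,bl0,d0)",
]
new_kernel = [
    "skmem:(r46848,rb7467000,t0,tb2626560,f2304,w0,o0,bl0,d0)",
    "skmem:(r0,rb540000,t0,tb6829056,f512,w56832,o0,bl0,d0)",
]

old_total = total_forward_alloc(old_kernel)  # 1167104 + 1710336
new_total = total_forward_alloc(new_kernel)  # 2304 + 512
```

With the four sample lines, the old-kernel pair holds 2877440 bytes of forward allocation and the new-kernel pair holds 2816 bytes, both below 4096 per socket as described.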

Thank you for trying to reproduce the issue on your system.
The results you shared are indeed promising. I've probably
overlooked something during my testing.

Can you please share the kernel config you used during your
testing? I would like to rule out any obvious setup errors
from my side.

--
Thanks and Regards,
Prateek

