From: Feng Tang <feng.tang@intel.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Shakeel Butt <shakeelb@google.com>, Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Jakub Kicinski <kuba@kernel.org>, Xin Long <lucien.xin@gmail.com>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	kernel test robot <oliver.sang@intel.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	network dev <netdev@vger.kernel.org>,
	linux-s390@vger.kernel.org,
	MPTCP Upstream <mptcp@lists.linux.dev>,
	"linux-sctp @ vger . kernel . org" <linux-sctp@vger.kernel.org>,
	lkp@lists.01.org, kbuild test robot <lkp@intel.com>,
	Huang Ying <ying.huang@intel.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Yin Fengwei <fengwei.yin@intel.com>, Ying Xu <yinxu@redhat.com>
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
Date: Mon, 27 Jun 2022 22:48:22 +0800	[thread overview]
Message-ID: <20220627144822.GA20878@shbuild999.sh.intel.com> (raw)
In-Reply-To: <CANn89iJAoYCebNbXpNMXRoDUkFMhg9QagetVU9NZUq+GnLMgqQ@mail.gmail.com>

On Mon, Jun 27, 2022 at 04:07:55PM +0200, Eric Dumazet wrote:
> On Mon, Jun 27, 2022 at 2:34 PM Feng Tang <feng.tang@intel.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 10:46:21AM +0200, Eric Dumazet wrote:
> > > On Mon, Jun 27, 2022 at 4:38 AM Feng Tang <feng.tang@intel.com> wrote:
> > [snip]
> > > > > >
> > > > > > Thanks Feng. Can you check the value of memory.kmem.tcp.max_usage_in_bytes
> > > > > > in /sys/fs/cgroup/memory/system.slice/lkp-bootstrap.service after making
> > > > > > sure that the netperf test has already run?
> > > > >
> > > > > memory.kmem.tcp.max_usage_in_bytes:0
> > > >
> > > > Sorry, I made a mistake: in the original report from Oliver, it
> > > > was 'cgroup v2' with a 'debian-11.1' rootfs.
> > > >
> > > > When you asked about cgroup info, I tried the job on another tbox,
> > > > but the original 'job.yaml' didn't work, so I kept the 'netperf'
> > > > test parameters and started a new job, which somehow ran with a
> > > > 'debian-10.4' rootfs and actually ran with cgroup v1.
> > > >
> > > > And as you mentioned, the cgroup version does make a big
> > > > difference: with v1, the regression is reduced to 1% ~ 5% on
> > > > different generations of test platforms. Eric mentioned they also
> > > > got a regression report, but a much smaller one; maybe that is due
> > > > to the cgroup version?
> > >
> > > This was using the current net-next tree.
> > > Used recipe was something like:
> > >
> > > Make sure cgroup2 is mounted or mount it by mount -t cgroup2 none $MOUNT_POINT.
> > > Enable memory controller by echo +memory > $MOUNT_POINT/cgroup.subtree_control.
> > > Create a cgroup by mkdir $MOUNT_POINT/job.
> > > Jump into that cgroup by echo $$ > $MOUNT_POINT/job/cgroup.procs.
> > >
> > > <Launch tests>
> > >
> > > The regression was smaller than 1%, so it was considered noise
> > > compared to the benefits of the bug fix.
> >
> > Yes, 1% is just around noise level for a microbenchmark.
> >
> > I went to check the original test data in Oliver's report: the test
> > was run 6 times and the performance data is pretty stable (0Day's
> > report will show any standard deviation bigger than 2%).
> >
> > The test platform is a 4-socket 72C/144T machine. I ran the same job
> > (nr_tasks = 25% * nr_cpus) on one CascadeLake AP (4 nodes) and one
> > 2-socket Icelake platform, and saw 75% and 53% regressions on them.
> >
> > In the first email, there is a file named 'reproduce', which shows
> > the basic test process:
> >
> > "
> >   use the 'performance' cpufreq governor for all CPUs
> >
> >   netserver -4 -D
> >   modprobe sctp
> >   netperf -4 -H 127.0.0.1 -t SCTP_STREAM_MANY -c -C -l 300 -- -m 10K  &
> >   netperf -4 -H 127.0.0.1 -t SCTP_STREAM_MANY -c -C -l 300 -- -m 10K  &
> >   netperf -4 -H 127.0.0.1 -t SCTP_STREAM_MANY -c -C -l 300 -- -m 10K  &
> >   (repeat 36 times in total)
> >   ...
> >
> > "
> >
> > This starts 36 (25% of nr_cpus) netperf clients. The number of clients
> > also matters: when I increased it from 36 to 72 (50%), the regression
> > changed from 69.4% to 73.7%.
> >
> 
> This seems like a lot of opportunities for memcg folks :)
> 
> struct page_counter has poor field placement [1], and no per-cpu cache.
> 
> [1] "atomic_long_t usage" is sharing cache line with read mostly fields.
> 
> (struct mem_cgroup also has poor field placement, mainly because of
> struct page_counter)
> 
>     28.69%  [kernel]       [k] copy_user_enhanced_fast_string
>     16.13%  [kernel]       [k] intel_idle_irq
>      6.46%  [kernel]       [k] page_counter_try_charge
>      6.20%  [kernel]       [k] __sk_mem_reduce_allocated
>      5.68%  [kernel]       [k] try_charge_memcg
>      5.16%  [kernel]       [k] page_counter_cancel

Yes, I also analyzed the perf-profile data and made some layout changes,
which reduce the regression from 69% to 40%:

7c80b038d23e1f4c 4890b686f4088c90432149bd6de 332b589c49656a45881bca4ecc0
---------------- --------------------------- --------------------------- 
     15722           -69.5%       4792           -40.8%       9300        netperf.Throughput_Mbps
 

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1bfcfb1af352..aa37bd39116c 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -179,14 +179,13 @@ struct cgroup_subsys_state {
 	atomic_t online_cnt;
 
 	/* percpu_ref killing and RCU release */
-	struct work_struct destroy_work;
 	struct rcu_work destroy_rwork;
-
+	struct cgroup_subsys_state *parent;
+	struct work_struct destroy_work;
 	/*
 	 * PI: the parent css.	Placed here for cache proximity to following
 	 * fields of the containing structure.
 	 */
-	struct cgroup_subsys_state *parent;
 };
 
 /*
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9ecead1042b9..963b88ab9930 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -239,9 +239,6 @@ struct mem_cgroup {
 	/* Private memcg ID. Used to ID objects that outlive the cgroup */
 	struct mem_cgroup_id id;
 
-	/* Accounted resources */
-	struct page_counter memory;		/* Both v1 & v2 */
-
 	union {
 		struct page_counter swap;	/* v2 only */
 		struct page_counter memsw;	/* v1 only */
@@ -251,6 +248,9 @@ struct mem_cgroup {
 	struct page_counter kmem;		/* v1 only */
 	struct page_counter tcpmem;		/* v1 only */
 
+	/* Accounted resources */
+	struct page_counter memory;		/* Both v1 & v2 */
+
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
 
@@ -313,7 +313,6 @@ struct mem_cgroup {
 	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
 	atomic_long_t		memory_events_local[MEMCG_NR_MEMORY_EVENTS];
 
-	unsigned long		socket_pressure;
 
 	/* Legacy tcp memory accounting */
 	bool			tcpmem_active;
@@ -349,6 +348,7 @@ struct mem_cgroup {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct deferred_split deferred_split_queue;
 #endif
+	unsigned long		socket_pressure;
 
 	struct mem_cgroup_per_node *nodeinfo[];
 };

Some of these changes are specific to networking and may not be a
universal win, though I think 'cgroup_subsys_state' could at least keep
the read-mostly 'parent' pointer away from the written-mostly counters
that follow it.
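
Just to illustrate that last point, here is a rough sketch (not a tested
patch; apart from 'usage' and 'parent' the field names are made up) of
how a hot counter can be pushed onto its own cache line with
____cacheline_aligned_in_smp, so it stops bouncing the line that holds
the read-mostly pointers:

/* Illustrative only -- not the real struct page_counter layout. */
/* Needs <linux/atomic.h> and <linux/cache.h>. */
struct counter_sketch {
	/* read-mostly: walked on every charge, but rarely written */
	struct counter_sketch	*parent;
	unsigned long		low;
	unsigned long		high;

	/* written on every charge/uncharge; give it its own cache line */
	atomic_long_t		usage ____cacheline_aligned_in_smp;
};

The cost is the padding inserted before 'usage', so as said above it may
not be a win for every workload.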

Btw, I tried your debug patch; it failed to compile with 0Day's kbuild
system, but it compiled fine on my local machine.

Thanks,
Feng

> 
> > Thanks,
> > Feng
> >
> > > >
> > > > Thanks,
> > > > Feng



Thread overview: 35+ messages
2022-06-19 15:04 [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression kernel test robot
2022-06-23  0:28 ` Jakub Kicinski
2022-06-23  3:08   ` Xin Long
2022-06-23 22:50     ` Xin Long
2022-06-24  1:57       ` Jakub Kicinski
2022-06-24  4:13         ` Eric Dumazet
2022-06-24  4:22           ` Eric Dumazet
2022-06-24  5:13           ` Feng Tang
2022-06-24  5:45             ` Eric Dumazet
2022-06-24  6:00               ` Feng Tang
2022-06-24  6:07                 ` Eric Dumazet
2022-06-24  6:34           ` Shakeel Butt
2022-06-24  7:06             ` Feng Tang
2022-06-24 14:43               ` Shakeel Butt
2022-06-25  2:36                 ` Feng Tang
2022-06-27  2:38                   ` Feng Tang
2022-06-27  8:46                     ` Eric Dumazet
2022-06-27 12:34                       ` Feng Tang
2022-06-27 14:07                         ` Eric Dumazet
2022-06-27 14:48                           ` Feng Tang [this message]
2022-06-27 16:25                             ` Eric Dumazet
2022-06-27 16:48                               ` Shakeel Butt
2022-06-27 17:05                                 ` Eric Dumazet
2022-06-28  1:46                                 ` Roman Gushchin
2022-06-28  3:49                               ` Feng Tang
2022-07-01 15:47                                 ` Shakeel Butt
2022-07-03 10:43                                   ` Feng Tang
2022-07-03 22:55                                     ` Roman Gushchin
2022-07-05  5:03                                       ` Feng Tang
2022-08-16  5:52                                         ` Oliver Sang
2022-08-16 15:55                                           ` Shakeel Butt
2022-06-27 14:52                         ` Shakeel Butt
2022-06-27 14:56                           ` Eric Dumazet
2022-06-27 15:12                           ` Feng Tang
2022-06-27 16:25                             ` Shakeel Butt
