public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Aaron Lu <aaron.lu@intel.com>,
	kernel test robot <xiaolong.ye@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 34/52] mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline
Date: Mon,  5 Apr 2021 10:54:00 +0200	[thread overview]
Message-ID: <20210405085023.107144420@linuxfoundation.org> (raw)
In-Reply-To: <20210405085021.996963957@linuxfoundation.org>

From: Aaron Lu <aaron.lu@intel.com>

commit e81bf9793b1861d74953ef041b4f6c7faecc2dbd upstream.

The LKP robot found a 27% will-it-scale/page_fault3 performance
regression regarding commit e27be240df53("mm: memcg: make sure
memory.events is uptodate when waking pollers").

What the test does is:
 1 mkstemp() a 128M file on a tmpfs;
 2 start $nr_cpu processes, each to loop the following:
   2.1 mmap() this file in shared write mode;
   2.2 write 0 to this file in a PAGE_SIZE step till the end of the file;
   2.3 unmap() this file and repeat this process.
 3 After 5 minutes, check how many loops they managed to complete, the
   higher the better.

The commit itself looks innocent enough as it merely changed some event
counting mechanism and this test didn't trigger those events at all.
Perf shows increased cycles spent on accessing root_mem_cgroup->stat_cpu
in count_memcg_event_mm()(called by handle_mm_fault()) and in
__mod_memcg_state() called by page_add_file_rmap().  So it's likely due
to the changed layout of 'struct mem_cgroup' that either make stat_cpu
falling into a constantly modifying cacheline or some hot fields stop
being in the same cacheline.

I verified this by moving memory_events[] back to where it was:

: --- a/include/linux/memcontrol.h
: +++ b/include/linux/memcontrol.h
: @@ -205,7 +205,6 @@ struct mem_cgroup {
:  	int		oom_kill_disable;
:
:  	/* memory.events */
: -	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
:  	struct cgroup_file events_file;
:
:  	/* protect arrays of thresholds */
: @@ -238,6 +237,7 @@ struct mem_cgroup {
:  	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
:  	atomic_long_t		stat[MEMCG_NR_STAT];
:  	atomic_long_t		events[NR_VM_EVENT_ITEMS];
: +	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
:
:  	unsigned long		socket_pressure;

And performance restored.

Later investigation found that as long as the following 3 fields
moving_account, move_lock_task and stat_cpu are in the same cacheline,
performance will be good.  To avoid future performance surprise by other
commits changing the layout of 'struct mem_cgroup', this patch makes
sure the 3 fields stay in the same cacheline.

One concern of this approach is, moving_account and move_lock_task could
be modified when a process changes memory cgroup while stat_cpu is a
always read field, it might hurt to place them in the same cacheline.  I
assume it is rare for a process to change memory cgroup so this should
be OK.

Link: https://lkml.kernel.org/r/20180528114019.GF9904@yexl-desktop
Link: http://lkml.kernel.org/r/20180601071115.GA27302@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/memcontrol.h | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6503a9ca27c1..c7876eadd206 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -155,6 +155,15 @@ enum memcg_kmem_state {
 	KMEM_ONLINE,
 };
 
+#if defined(CONFIG_SMP)
+struct memcg_padding {
+	char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define MEMCG_PADDING(name)      struct memcg_padding name;
+#else
+#define MEMCG_PADDING(name)
+#endif
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -202,7 +211,6 @@ struct mem_cgroup {
 	int		oom_kill_disable;
 
 	/* memory.events */
-	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 	struct cgroup_file events_file;
 
 	/* protect arrays of thresholds */
@@ -222,19 +230,26 @@ struct mem_cgroup {
 	 * mem_cgroup ? And what type of charges should we move ?
 	 */
 	unsigned long move_charge_at_immigrate;
+	/* taken only while moving_account > 0 */
+	spinlock_t		move_lock;
+	unsigned long		move_lock_flags;
+
+	MEMCG_PADDING(_pad1_);
+
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
 	atomic_t		moving_account;
-	/* taken only while moving_account > 0 */
-	spinlock_t		move_lock;
 	struct task_struct	*move_lock_task;
-	unsigned long		move_lock_flags;
 
 	/* memory.stat */
 	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
+
+	MEMCG_PADDING(_pad2_);
+
 	atomic_long_t		stat[MEMCG_NR_STAT];
 	atomic_long_t		events[NR_VM_EVENT_ITEMS];
+	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 
 	unsigned long		socket_pressure;
 
-- 
2.30.2




  parent reply	other threads:[~2021-04-05  9:00 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05  8:53 [PATCH 4.14 00/52] 4.14.229-rc1 review Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 01/52] selinux: vsock: Set SID for socket returned by accept() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 02/52] ipv6: weaken the v4mapped source check Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 03/52] ext4: fix bh ref count on error paths Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 04/52] rpc: fix NULL dereference on kmalloc failure Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 05/52] ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10 Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 06/52] ASoC: rt5651: " Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 07/52] ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 08/52] ASoC: es8316: Simplify adc_pga_gain_tlv table Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 09/52] ASoC: cs42l42: Fix mixer volume control Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 10/52] ASoC: cs42l42: Always wait at least 3ms after reset Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 11/52] powerpc: Force inlining of cpu_has_feature() to avoid build failure Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 12/52] vhost: Fix vhost_vq_reset() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 13/52] scsi: st: Fix a use after free in st_open() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 14/52] scsi: qla2xxx: Fix broken #endif placement Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 15/52] staging: comedi: cb_pcidas: fix request_irq() warn Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 16/52] staging: comedi: cb_pcidas64: " Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 17/52] ASoC: rt5659: Update MCLK rate in set_sysclk() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 18/52] ext4: do not iput inode under running transaction in ext4_rename() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 19/52] brcmfmac: clear EAP/association status bits on linkdown events Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 20/52] net: ethernet: aquantia: Handle error cleanup of start on open Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 21/52] appletalk: Fix skb allocation size in loopback case Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 22/52] net: wan/lmc: unregister device when no matching device is found Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 23/52] bpf: Remove MTU check in __bpf_skb_max_len Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 24/52] ALSA: usb-audio: Apply sample rate quirk to Logitech Connect Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 25/52] ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 26/52] ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 27/52] tracing: Fix stack trace event size Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 28/52] mm: fix race by making init_zero_pfn() early_initcall Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 29/52] drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings() Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 30/52] drm/amdgpu: check alignment on CPU page for bo map Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 31/52] reiserfs: update reiserfs_xattrs_initialized() condition Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 32/52] mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats Greg Kroah-Hartman
2021-04-05  8:53 ` [PATCH 4.14 33/52] mm: memcg: make sure memory.events is uptodate when waking pollers Greg Kroah-Hartman
2021-04-05  8:54 ` Greg Kroah-Hartman [this message]
2021-04-05  8:54 ` [PATCH 4.14 35/52] mm: fix oom_kill event handling Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 36/52] mm: writeback: use exact memcg dirty counts Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 37/52] pinctrl: rockchip: fix restore error in resume Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 38/52] extcon: Add stubs for extcon_register_notifier_all() functions Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 39/52] extcon: Fix error handling in extcon_dev_register Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 40/52] firewire: nosy: Fix a use-after-free bug in nosy_ioctl() Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 41/52] usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control() Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 42/52] USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 43/52] usb: musb: Fix suspend with devices connected for a64 Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 44/52] usb: xhci-mtk: fix broken streams issue on 0.96 xHCI Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 45/52] cdc-acm: fix BREAK rx code path adding necessary calls Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 46/52] USB: cdc-acm: untangle a circular dependency between callback and softint Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 47/52] USB: cdc-acm: downgrade message to debug Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 48/52] USB: cdc-acm: fix use-after-free after probe failure Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 49/52] usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 50/52] staging: rtl8192e: Fix incorrect source in memcpy() Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 51/52] staging: rtl8192e: Change state information from u16 to u8 Greg Kroah-Hartman
2021-04-05  8:54 ` [PATCH 4.14 52/52] drivers: video: fbcon: fix NULL dereference in fbcon_cursor() Greg Kroah-Hartman
2021-04-05 17:57 ` [PATCH 4.14 00/52] 4.14.229-rc1 review Guenter Roeck
2021-04-06  7:14 ` Naresh Kamboju
2021-04-07  0:58 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210405085023.107144420@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=xiaolong.ye@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox