linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Chris Down <chris@chrisdown.name>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@kernel.org>, Tejun Heo <tj@kernel.org>,
	Roman Gushchin <guro@fb.com>, Dennis Zhou <dennis@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.1 48/85] mm, memcg: consider subtrees in memory.events
Date: Fri,  7 Jun 2019 17:39:33 +0200	[thread overview]
Message-ID: <20190607153854.908885718@linuxfoundation.org> (raw)
In-Reply-To: <20190607153849.101321647@linuxfoundation.org>

From: Chris Down <chris@chrisdown.name>

commit 9852ae3fe5293264f01c49f2571ef7688f7823ce upstream.

memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.

The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files.  For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour.  The same confusion applies to other counters in this file when
debugging issues.

Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.

After this patch, events are propagated up the hierarchy:

    [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
    low 0
    high 0
    max 0
    oom 0
    oom_kill 0
    [root@ktst ~]# systemd-run -p MemoryMax=1 true
    Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
    [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
    low 0
    high 0
    max 7
    oom 1
    oom_kill 1

As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set.  However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.

akpm: this is a behaviour change, so Cc:stable.  THis is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.

Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/admin-guide/cgroup-v2.rst |    9 +++++++++
 include/linux/cgroup-defs.h             |    5 +++++
 include/linux/memcontrol.h              |   10 ++++++++--
 kernel/cgroup/cgroup.c                  |   16 ++++++++++++++--
 4 files changed, 36 insertions(+), 4 deletions(-)

--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -177,6 +177,15 @@ cgroup v2 currently supports the followi
 	ignored on non-init namespace mounts.  Please refer to the
 	Delegation section for details.
 
+  memory_localevents
+
+        Only populate memory.events with data for the current cgroup,
+        and not any subtrees. This is legacy behaviour, the default
+        behaviour without this option is to include subtree counts.
+        This option is system wide and can only be set on mount or
+        modified through remount from the init namespace. The mount
+        option is ignored on non-init namespace mounts.
+
 
 Organizing Processes and Threads
 --------------------------------
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -83,6 +83,11 @@ enum {
 	 * Enable cpuset controller in v1 cgroup to use v2 behavior.
 	 */
 	CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
+
+	/*
+	 * Enable legacy local memory.events.
+	 */
+	CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5),
 };
 
 /* cftype->flags */
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -777,8 +777,14 @@ static inline void count_memcg_event_mm(
 static inline void memcg_memory_event(struct mem_cgroup *memcg,
 				      enum memcg_memory_event event)
 {
-	atomic_long_inc(&memcg->memory_events[event]);
-	cgroup_file_notify(&memcg->events_file);
+	do {
+		atomic_long_inc(&memcg->memory_events[event]);
+		cgroup_file_notify(&memcg->events_file);
+
+		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+			break;
+	} while ((memcg = parent_mem_cgroup(memcg)) &&
+		 !mem_cgroup_is_root(memcg));
 }
 
 static inline void memcg_memory_event_mm(struct mm_struct *mm,
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1775,11 +1775,13 @@ int cgroup_show_path(struct seq_file *sf
 
 enum cgroup2_param {
 	Opt_nsdelegate,
+	Opt_memory_localevents,
 	nr__cgroup2_params
 };
 
 static const struct fs_parameter_spec cgroup2_param_specs[] = {
-	fsparam_flag  ("nsdelegate",		Opt_nsdelegate),
+	fsparam_flag("nsdelegate",		Opt_nsdelegate),
+	fsparam_flag("memory_localevents",	Opt_memory_localevents),
 	{}
 };
 
@@ -1802,6 +1804,9 @@ static int cgroup2_parse_param(struct fs
 	case Opt_nsdelegate:
 		ctx->flags |= CGRP_ROOT_NS_DELEGATE;
 		return 0;
+	case Opt_memory_localevents:
+		ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -1813,6 +1818,11 @@ static void apply_cgroup_root_flags(unsi
 			cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
+
+		if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS;
 	}
 }
 
@@ -1820,6 +1830,8 @@ static int cgroup_show_options(struct se
 {
 	if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
 		seq_puts(seq, ",nsdelegate");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+		seq_puts(seq, ",memory_localevents");
 	return 0;
 }
 
@@ -6122,7 +6134,7 @@ static struct kobj_attribute cgroup_dele
 static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			     char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
+	return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n");
 }
 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
 



  parent reply	other threads:[~2019-06-07 15:49 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 15:38 [PATCH 5.1 00/85] 5.1.8-stable review Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 01/85] sparc64: Fix regression in non-hypervisor TLB flush xcall Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 02/85] include/linux/bitops.h: sanitize rotate primitives Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 03/85] xhci: update bounce buffer with correct sg num Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 04/85] xhci: Use %zu for printing size_t type Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 05/85] xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic() Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 06/85] usb: xhci: avoid null pointer deref when bos field is NULL Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 07/85] usbip: usbip_host: fix BUG: sleeping function called from invalid context Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 08/85] usbip: usbip_host: fix stub_dev lock context imbalance regression Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 09/85] USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 10/85] USB: sisusbvga: fix oops in error path of sisusb_probe Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 11/85] USB: Add LPM quirk for Surface Dock GigE adapter Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 12/85] USB: rio500: refuse more than one device at a time Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 13/85] USB: rio500: fix memory leak in close after disconnect Greg Kroah-Hartman
2019-06-07 15:38 ` [PATCH 5.1 14/85] media: usb: siano: Fix general protection fault in smsusb Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 15/85] media: usb: siano: Fix false-positive "uninitialized variable" warning Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 16/85] media: smsusb: better handle optional alignment Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 17/85] brcmfmac: fix NULL pointer derefence during USB disconnect Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 18/85] scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 19/85] scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs) Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 20/85] tracing: Avoid memory leak in predicate_parse() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 21/85] Btrfs: fix wrong ctime and mtime of a directory after log replay Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 22/85] Btrfs: fix race updating log root item during fsync Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 23/85] Btrfs: fix fsync not persisting changed attributes of a directory Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 24/85] btrfs: correct zstd workspace manager lock to use spin_lock_bh() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 25/85] btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 26/85] Btrfs: incremental send, fix file corruption when no-holes feature is enabled Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 27/85] btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 28/85] iio: dac: ds4422/ds4424 fix chip verification Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 29/85] iio: adc: ads124: avoid buffer overflow Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 30/85] iio: adc: modify NPCM ADC read reference voltage Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 31/85] iio: adc: ti-ads8688: fix timestamp is not updated in buffer Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 32/85] s390/crypto: fix gcm-aes-s390 selftest failures Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 33/85] s390/crypto: fix possible sleep during spinlock aquired Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 34/85] KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 35/85] KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9 Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 36/85] KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 37/85] powerpc/perf: Fix MMCRA corruption by bhrb_filter Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 38/85] powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 39/85] ALSA: line6: Assure canceling delayed work at disconnection Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 40/85] ALSA: hda/realtek - Set default power save node to 0 Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 41/85] ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 42/85] KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 43/85] drm/nouveau/i2c: Disable i2c bus access after ->fini() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 44/85] i2c: mlxcpld: Fix wrong initialization order in probe Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 45/85] i2c: synquacer: fix synquacer_i2c_doxfer() return value Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 46/85] tty: serial: msm_serial: Fix XON/XOFF Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 47/85] tty: max310x: Fix external crystal register setup Greg Kroah-Hartman
2019-06-07 15:39 ` Greg Kroah-Hartman [this message]
2019-06-07 15:39 ` [PATCH 5.1 49/85] memcg: make it work on sparse non-0-node systems Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 50/85] kasan: initialize tag to 0xff in __kasan_kmalloc Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 51/85] kernel/signal.c: trace_signal_deliver when signal_group_exit Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 52/85] signal/arm64: Use force_sig not force_sig_fault for SIGKILL Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 53/85] mm, compaction: make sure we isolate a valid PFN Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 54/85] arm64: Fix the arm64_personality() syscall wrapper redirection Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 55/85] docs: Fix conf.py for Sphinx 2.0 Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 56/85] doc: Cope with the deprecation of AutoReporter Greg Kroah-Hartman
2019-06-10  6:27   ` Jiri Slaby
2019-06-10  7:31     ` Greg Kroah-Hartman
2019-06-10  7:34       ` Jiri Slaby
2019-06-10  7:48         ` Greg Kroah-Hartman
2019-06-10  7:56           ` Jiri Slaby
2019-06-10 12:33           ` Jonathan Corbet
2019-06-10 14:05             ` Greg Kroah-Hartman
2019-06-10 14:27               ` Thomas Backlund
2019-06-10 14:39                 ` Greg Kroah-Hartman
2019-06-11  8:50                   ` Jiri Slaby
2019-06-07 15:39 ` [PATCH 5.1 57/85] doc: Cope with Sphinx logging deprecations Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 58/85] x86/ima: Check EFI_RUNTIME_SERVICES before using Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 59/85] ima: fix wrong signed policy requirement when not appraising Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 60/85] ima: show rules with IMA_INMASK correctly Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 61/85] evm: check hash algorithm passed to init_desc() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 62/85] clk: imx: imx8mm: fix int pll clk gate Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 63/85] vt/fbcon: deinitialize resources in visual_init() after failed memory allocation Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 64/85] serial: sh-sci: disable DMA for uart_console Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 65/85] staging: vc04_services: prevent integer overflow in create_pagelist() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 66/85] staging: wlan-ng: fix adapter initialization failure Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 67/85] cifs: fix memory leak of pneg_inbuf on -EOPNOTSUPP ioctl case Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 68/85] CIFS: cifs_read_allocate_pages: dont iterate through whole page array on ENOMEM Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 69/85] Revert "lockd: Show pid of lockd for remote locks" Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 70/85] gcc-plugins: Fix build failures under Darwin host Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 71/85] drm/tegra: gem: Fix CPU-cache maintenance for BOs allocated using get_pages() Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 72/85] drm/vmwgfx: Fix user space handle equal to zero Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 73/85] drm/vmwgfx: Fix compat mode shader operation Greg Kroah-Hartman
2019-06-07 15:39 ` [PATCH 5.1 74/85] drm/vmwgfx: Dont send drm sysfs hotplug events on initial master set Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 75/85] drm/sun4i: Fix sun8i HDMI PHY clock initialization Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 76/85] drm/sun4i: Fix sun8i HDMI PHY configuration for > 148.5 MHz Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 77/85] drm/imx: ipuv3-plane: fix atomic update status query for non-plus i.MX6Q Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 78/85] drm/fb-helper: generic: Call drm_client_add() after setup is done Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 79/85] drm/atomic: Wire file_priv through for property changes Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 80/85] drm: Expose "FB_DAMAGE_CLIPS" property to atomic aware user-space only Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 81/85] drm/rockchip: shutdown drm subsystem on shutdown Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 82/85] drm/lease: Make sure implicit planes are leased Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 83/85] drm/cma-helper: Fix drm_gem_cma_free_object() Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 84/85] Revert "x86/build: Move _etext to actual end of .text" Greg Kroah-Hartman
2019-06-07 15:40 ` [PATCH 5.1 85/85] x86/kprobes: Set instruction page as executable Greg Kroah-Hartman
2019-06-07 19:29 ` [PATCH 5.1 00/85] 5.1.8-stable review kernelci.org bot
2019-06-07 20:19 ` Jiunn Chang
2019-06-08  9:31   ` Greg Kroah-Hartman
2019-06-08  7:54 ` Naresh Kamboju
2019-06-08  9:34   ` Greg Kroah-Hartman
2019-06-08 18:50 ` Guenter Roeck
2019-06-09  7:16   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190607153854.908885718@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chrisdown.name \
    --cc=dennis@kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).