From: akpm@linux-foundation.org
To: hannes@cmpxchg.org, gthelen@google.com, mhocko@suse.cz,
vdavydov@parallels.com, mm-commits@vger.kernel.org
Subject: [to-be-updated] mm-memcontrol-default-hierarchy-interface-for-memory-fix.patch removed from -mm tree
Date: Tue, 20 Jan 2015 16:21:46 -0800 [thread overview]
Message-ID: <54bef11a.ATtk8f5aowDXtF89%akpm@linux-foundation.org> (raw)
The patch titled
Subject: mm: memcontrol: default hierarchy interface for memory fix
has been removed from the -mm tree. Its filename was
mm-memcontrol-default-hierarchy-interface-for-memory-fix.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: default hierarchy interface for memory fix
Document and rationalize where the default hierarchy interface differs
from the traditional memory cgroups interface.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/cgroups/unified-hierarchy.txt | 80 ++++++++++++++++++
1 file changed, 80 insertions(+)
diff -puN Documentation/cgroups/unified-hierarchy.txt~mm-memcontrol-default-hierarchy-interface-for-memory-fix Documentation/cgroups/unified-hierarchy.txt
--- a/Documentation/cgroups/unified-hierarchy.txt~mm-memcontrol-default-hierarchy-interface-for-memory-fix
+++ a/Documentation/cgroups/unified-hierarchy.txt
@@ -327,6 +327,86 @@ supported and the interface files "relea
- use_hierarchy is on by default and the cgroup file for the flag is
not created.
+- The original lower boundary, the soft limit, is defined as a limit
+ that is per default unset. As a result, the set of cgroups that
+ global reclaim prefers is opt-in, rather than opt-out. The costs
+ for optimizing these mostly negative lookups are so high that the
+ implementation, despite its enormous size, does not even provide the
+ basic desirable behavior. First off, the soft limit has no
+ hierarchical meaning. All configured groups are organized in a
+ global rbtree and treated like equal peers, regardless where they
+ are located in the hierarchy. This makes subtree delegation
+ impossible. Second, the soft limit reclaim pass is so aggressive
+ that it not just introduces high allocation latencies into the
+ system, but also impacts system performance due to overreclaim, to
+ the point where the feature becomes self-defeating.
+
+ The memory.low boundary on the other hand is a top-down allocated
+ reserve. A cgroup enjoys reclaim protection when it and all its
+ ancestors are below their low boundaries, which makes delegation of
+ subtrees possible. Secondly, new cgroups have no reserve per
+ default and in the common case most cgroups are eligible for the
+ preferred reclaim pass. This allows the new low boundary to be
+ efficiently implemented with just a minor addition to the generic
+ reclaim code, without the need for out-of-band data structures and
+ reclaim passes. Because the generic reclaim code considers all
+ cgroups except for the ones running low in the preferred first
+ reclaim pass, overreclaim of individual groups is eliminated as
+ well, resulting in much better overall workload performance.
+
+- The original high boundary, the hard limit, is defined as a strict
+ limit that can not budge, even if the OOM killer has to be called.
+ But this generally goes against the goal of making the most out of
+ the available memory. The memory consumption of workloads varies
+ during runtime, and that requires users to overcommit. But doing
+ that with a strict upper limit requires either a fairly accurate
+ prediction of the working set size or adding slack to the limit.
+ Since working set size estimation is hard and error prone, and
+ getting it wrong results in OOM kills, most users tend to err on the
+ side of a looser limit and end up wasting precious resources.
+
+ The memory.high boundary on the other hand can be set much more
+ conservatively. When hit, it throttles allocations by forcing them
+ into direct reclaim to work off the excess, but it never invokes the
+ OOM killer. As a result, a high boundary that is chosen too
+ aggressively will not terminate the processes, but instead it will
+ lead to gradual performance degradation. The user can monitor this
+ and make corrections until the minimal memory footprint that still
+ gives acceptable performance is found.
+
+ In extreme cases, with many concurrent allocations and a complete
+ breakdown of reclaim progress within the group, the high boundary
+ can be exceeded. But even then it's mostly better to satisfy the
+ allocation from the slack available in other groups or the rest of
+ the system than killing the group. Otherwise, memory.max is there
+ to limit this type of spillover and ultimately contain buggy or even
+ malicious applications.
+
+- The original control file names are unwieldy and inconsistent in
+ many different ways. For example, the upper boundary hit count is
+ exported in the memory.failcnt file, but an OOM event count has to
+ be manually counted by listening to memory.oom_control events, and
+ lower boundary / soft limit events have to be counted by first
+ setting a threshold for that value and then counting those events.
+ Also, usage and limit files encode their units in the filename.
+ That makes the filenames very long, even though this is not
+ information that a user needs to be reminded of every time they type
+ out those names.
+
+ To address these naming issues, as well as to signal clearly that
+ the new interface carries a new configuration model, the naming
+ conventions in it necessarily differ from the old interface.
+
+- The original limit files indicate the state of an unset limit with a
+ Very High Number, and a configured limit can be unset by echoing -1
+ into those files. But that very high number is implementation and
+ architecture dependent and not very descriptive. And while -1 can
+ be understood as an underflow into the highest possible value, -2 or
+ -10M etc. do not work, so it's not consistent.
+
+ memory.low and memory.high will indicate "none" if the boundary is
+ not configured, and a configured boundary can be unset by writing
+ "none" into these files as well.
5. Planned Changes
_
Patches currently in -mm which might be from hannes@cmpxchg.org are
mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch
memcg-remove-extra-newlines-from-memcg-oom-kill-log.patch
mm-vmscan-fix-highidx-argument-type.patch
mm-memory-remove-vm_file-check-on-shared-writable-vmas.patch
mm-memory-merge-shared-writable-dirtying-branches-in-do_wp_page.patch
mm-page_alloc-place-zone_id-check-before-vm_bug_on_page-check.patch
memcg-zap-__memcg_chargeuncharge_slab.patch
memcg-zap-memcg_name-argument-of-memcg_create_kmem_cache.patch
memcg-zap-memcg_slab_caches-and-memcg_slab_mutex.patch
mm-add-fields-for-compound-destructor-and-order-into-struct-page.patch
swap-remove-unused-mem_cgroup_uncharge_swapcache-declaration.patch
mm-memcontrol-track-move_lock-state-internally.patch
mm-memcontrol-track-move_lock-state-internally-fix.patch
mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask.patch
mm-vmscan-wake-up-all-pfmemalloc-throttled-processes-at-once.patch
mm-hugetlb-reduce-arch-dependent-code-around-follow_huge_.patch
mm-hugetlb-pmd_huge-returns-true-for-non-present-hugepage.patch
mm-hugetlb-take-page-table-lock-in-follow_huge_pmd.patch
mm-hugetlb-fix-getting-refcount-0-page-in-hugetlb_fault.patch
mm-hugetlb-add-migration-hwpoisoned-entry-check-in-hugetlb_change_protection.patch
mm-hugetlb-add-migration-entry-check-in-__unmap_hugepage_range.patch
mm-hugetlb-fix-suboptimal-migration-hwpoisoned-entry-check.patch
mm-hugetlb-cleanup-and-rename-is_hugetlb_entry_migrationhwpoisoned.patch
mm-set-page-pfmemalloc-in-prep_new_page.patch
mm-page_alloc-reduce-number-of-alloc_pages-functions-parameters.patch
mm-reduce-try_to_compact_pages-parameters.patch
mm-microoptimize-zonelist-operations.patch
list_lru-introduce-list_lru_shrink_countwalk.patch
fs-consolidate-nrfree_cached_objects-args-in-shrink_control.patch
vmscan-per-memory-cgroup-slab-shrinkers.patch
memcg-rename-some-cache-id-related-variables.patch
memcg-add-rwsem-to-synchronize-against-memcg_caches-arrays-relocation.patch
list_lru-get-rid-of-active_nodes.patch
list_lru-organize-all-list_lrus-to-list.patch
list_lru-introduce-per-memcg-lists.patch
fs-make-shrinker-memcg-aware.patch
vmscan-force-scan-offline-memory-cgroups.patch
vmscan-force-scan-offline-memory-cgroups-fix.patch
memcg-add-build_bug_on-for-string-tables.patch
mm-memcontrol-default-hierarchy-interface-for-memory-fix-high-reclaim.patch
mm-memcontrol-default-hierarchy-interface-for-memory-fix-none.patch
mm-memcontrol-fold-move_anon-and-move_file.patch
mm-memcontrol-fold-move_anon-and-move_file-fix.patch
oom-add-helpers-for-setting-and-clearing-tif_memdie.patch
oom-thaw-the-oom-victim-if-it-is-frozen.patch
pm-convert-printk-to-pr_-equivalent.patch
sysrq-convert-printk-to-pr_-equivalent.patch
oom-pm-make-oom-detection-in-the-freezer-path-raceless.patch
mm-memcontrol-remove-unnecessary-soft-limit-tree-node-test.patch
mm-memcontrol-consolidate-memory-controller-initialization.patch
mm-memcontrol-consolidate-swap-controller-code.patch
fs-shrinker-always-scan-at-least-one-object-of-each-type.patch
fs-shrinker-always-scan-at-least-one-object-of-each-type-fix.patch
mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated.patch
mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix.patch
documentation-proc-add-proc-pid-numa_maps-interface-explanation-snippet.patch
fs-proc-task_mmu-show-page-size-in-proc-pid-numa_maps.patch
fs-proc-task_mmu-show-page-size-in-proc-pid-numa_maps-fix.patch
reply other threads:[~2015-01-21 0:21 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54bef11a.ATtk8f5aowDXtF89%akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.cz \
--cc=mm-commits@vger.kernel.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.