From: Johannes Weiner <hannes@cmpxchg.org>
To: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] mm: workingset: eviction buckets for bigmem/lowbit machines
Date: Mon, 25 Jan 2016 11:41:14 -0500
Message-ID: <20160125164114.GB29291@cmpxchg.org>
In-Reply-To: <20160125163907.GA29291@cmpxchg.org>

Space in the shadow entry will get tight once we need to identify the
memcg. Add eviction buckets to stretch out the representable refault
distance by sacrificing granularity.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/workingset.c | 40 +++++++++++++++++++++++++++++++++++-----
1 file changed, 35 insertions(+), 5 deletions(-)
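
Not part of the patch: a minimal userspace sketch of the bucketing round
trip, for readers who want to see the granularity loss in isolation. The
shift values, the fixed bucket_order of 4, the uint64_t entry type and all
*_sketch names are assumptions made for illustration; the kernel derives
the shifts from its configuration and computes bucket_order at boot.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the kernel takes these from its config. */
#define NODES_SHIFT		10
#define ZONES_SHIFT		2
#define EXCEPTIONAL_SHIFT	2	/* stands in for RADIX_TREE_EXCEPTIONAL_SHIFT */
#define EVICTION_SHIFT		(NODES_SHIFT + ZONES_SHIFT + EXCEPTIONAL_SHIFT)
#define EVICTION_MASK		(~0ULL >> EVICTION_SHIFT)

/* Hypothetical fixed value: 2^4 = 16 evictions share one bucket. */
static unsigned int bucket_order = 4;

/* Mirrors pack_shadow(): drop the low timestamp bits, then append ids. */
static uint64_t pack_shadow_sketch(uint64_t eviction, unsigned int nid,
				   unsigned int zid)
{
	eviction >>= bucket_order;
	eviction = (eviction << NODES_SHIFT) | nid;
	eviction = (eviction << ZONES_SHIFT) | zid;
	return eviction << EXCEPTIONAL_SHIFT;
}

/* Mirrors the unpacking half of unpack_shadow(): peel off the ids,
 * then scale the bucket number back up to a timestamp. */
static uint64_t unpack_shadow_sketch(uint64_t entry, unsigned int *nid,
				     unsigned int *zid)
{
	entry >>= EXCEPTIONAL_SHIFT;
	*zid = entry & ((1ULL << ZONES_SHIFT) - 1);
	entry >>= ZONES_SHIFT;
	*nid = entry & ((1ULL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	return entry << bucket_order;
}

/* Masked unsigned subtraction, as in the *distance assignment. */
static uint64_t refault_distance(uint64_t refault, uint64_t eviction)
{
	return (refault - eviction) & EVICTION_MASK;
}

/* The timestamp comes back rounded down to its bucket boundary. */
static void shadow_selftest(void)
{
	unsigned int nid, zid;
	uint64_t entry = pack_shadow_sketch(0x12345, 3, 1);
	uint64_t eviction = unpack_shadow_sketch(entry, &nid, &zid);

	assert(eviction == 0x12340);	/* low 4 bits were shaved off */
	assert(nid == 3 && zid == 1);	/* identifiers survive intact */
}
```

Calling shadow_selftest() from a main() exercises the round trip. The
point of the sketch is that only the low bucket_order bits of the
timestamp are lost, so a refault distance computed from the unpacked
value is off by less than one bucket.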
diff --git a/mm/workingset.c b/mm/workingset.c
index 61ead9e5549d..6f3ba184ffb2 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -152,8 +152,23 @@
* refault distance will immediately activate the refaulting page.
*/
+#define EVICTION_SHIFT (NODES_SHIFT + ZONES_SHIFT + \
+ RADIX_TREE_EXCEPTIONAL_SHIFT)
+#define EVICTION_MASK (~0UL >> EVICTION_SHIFT)
+
+/*
+ * Eviction timestamps need to be able to cover the full range of
+ * actionable refaults. However, bits are tight in the radix tree
+ * entry, and after storing the identifier for the lruvec there might
+ * not be enough left to represent every single actionable refault. In
+ * that case, we have to sacrifice granularity for distance, and group
+ * evictions into coarser buckets by shaving off lower timestamp bits.
+ */
+static unsigned int bucket_order;
+
static void *pack_shadow(unsigned long eviction, struct zone *zone)
{
+ eviction >>= bucket_order;
eviction = (eviction << NODES_SHIFT) | zone_to_nid(zone);
eviction = (eviction << ZONES_SHIFT) | zone_idx(zone);
eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
@@ -168,7 +183,6 @@ static void unpack_shadow(void *shadow,
unsigned long entry = (unsigned long)shadow;
unsigned long eviction;
unsigned long refault;
- unsigned long mask;
int zid, nid;
entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
@@ -176,13 +190,12 @@ static void unpack_shadow(void *shadow,
entry >>= ZONES_SHIFT;
nid = entry & ((1UL << NODES_SHIFT) - 1);
entry >>= NODES_SHIFT;
- eviction = entry;
+ eviction = entry << bucket_order;
*zone = NODE_DATA(nid)->node_zones + zid;
refault = atomic_long_read(&(*zone)->inactive_age);
- mask = ~0UL >> (NODES_SHIFT + ZONES_SHIFT +
- RADIX_TREE_EXCEPTIONAL_SHIFT);
+
/*
* The unsigned subtraction here gives an accurate distance
* across inactive_age overflows in most cases.
@@ -199,7 +212,7 @@ static void unpack_shadow(void *shadow,
* inappropriate activation leading to pressure on the active
* list is not a problem.
*/
- *distance = (refault - eviction) & mask;
+ *distance = (refault - eviction) & EVICTION_MASK;
}
/**
@@ -398,8 +411,25 @@ static struct lock_class_key shadow_nodes_key;
static int __init workingset_init(void)
{
+ unsigned int timestamp_bits;
+ unsigned int max_order;
int ret;
+ BUILD_BUG_ON(BITS_PER_LONG < EVICTION_SHIFT);
+ /*
+ * Calculate the eviction bucket size to cover the longest
+ * actionable refault distance, which is currently half of
+ * memory (totalram_pages/2). However, memory hotplug may add
+ * some more pages at runtime, so keep working with up to
+ * double the initial memory by using totalram_pages as-is.
+ */
+ timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+ max_order = fls_long(totalram_pages - 1);
+ if (max_order > timestamp_bits)
+ bucket_order = max_order - timestamp_bits;
+ pr_info("workingset: timestamp_bits=%u max_order=%u bucket_order=%u\n",
+ timestamp_bits, max_order, bucket_order);
+
ret = list_lru_init_key(&workingset_shadow_nodes, &shadow_nodes_key);
if (ret)
goto err;
--
2.7.0
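
For reference, the bucket_order calculation in workingset_init() can be
tried out in userspace with a sketch like the one below. It is not kernel
code: fls_long_sketch() is a hypothetical stand-in for fls_long(),
calc_bucket_order() is a made-up helper name, and the word size, shift and
page counts are passed in as parameters so that different machines can be
compared.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls_long(): 1-based index of the
 * most significant set bit, or 0 when no bit is set.
 */
static unsigned int fls_long_sketch(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Same arithmetic as the hunk in workingset_init(): if the timestamp
 * field is too narrow to count every page of memory, shave off as many
 * low bits as are missing.
 */
static unsigned int calc_bucket_order(unsigned int bits_per_long,
				      unsigned int eviction_shift,
				      uint64_t totalram_pages)
{
	unsigned int timestamp_bits = bits_per_long - eviction_shift;
	unsigned int max_order = fls_long_sketch(totalram_pages - 1);

	if (max_order > timestamp_bits)
		return max_order - timestamp_bits;
	return 0;
}
```

With an assumed EVICTION_SHIFT of 14, a 32-bit machine has 18 timestamp
bits, so calc_bucket_order(32, 14, 1ULL << 20) (2^20 pages, i.e. 4G of
4k pages) yields a bucket_order of 2. A 64-bit machine with the same
shift has 50 timestamp bits and stays at bucket_order 0 for any
realistic memory size, e.g. calc_bucket_order(64, 14, 1ULL << 28).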