linux-mm.kvack.org archive mirror
* [PATCH] mm:workingset use real time to judge activity of the file page
@ 2019-04-04  2:01 Zhaoyang Huang
  2019-04-07  0:43 ` Suren Baghdasaryan
  0 siblings, 1 reply; 9+ messages in thread
From: Zhaoyang Huang @ 2019-04-04  2:01 UTC
  To: Andrew Morton, Vlastimil Babka, Pavel Tatashin, Joonsoo Kim,
	David Rientjes, Roman Gushchin, Jeff Layton, Matthew Wilcox,
	linux-mm, linux-kernel

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

In the previous implementation, a count of pages is used to judge the
refault period of each page, which is imprecise. We introduce a timestamp
into the workingset shadow entry to measure the file page's activity.
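
Storing a timestamp in the shadow entry can be modeled in user space. The following is a minimal sketch, not the patch's code: it assumes 64-bit words and hypothetical field widths (NODE_BITS, MEMCG_BITS, TIME_BITS, TIME_GRAIN), and pack(), unpack() and elapsed_since() are illustrative names. On a 64-bit kernel the pack_shadow() hunk below reserves EVICTION_JIFFIES = BITS_PER_LONG >> 3 = 8 bits for jiffies >> 8, so, as here, a stored stamp can only express elapsed time modulo 65536 ticks at 256-tick resolution. Unlike that hunk, this sketch masks the shifted clock before OR-ing it in, and it extracts the timestamp field before shifting it away when unpacking.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical field widths, chosen only for this illustration. The
 * kernel derives NODES_SHIFT and MEM_CGROUP_ID_SHIFT from its config.
 */
#define TAG_SHIFT   2                /* low bits reserved for the entry tag */
#define TAG_VALUE   UINT64_C(2)      /* marks the word as a shadow entry */
#define NODE_BITS   6
#define MEMCG_BITS  16
#define TIME_BITS   8                /* width of the coarse clock field */
#define TIME_GRAIN  8                /* clock is stored in 256-tick units */

#define FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Pack fields from most to least significant; the clock goes last. */
static uint64_t pack(uint64_t eviction, unsigned memcg, unsigned node,
                     uint64_t now)
{
    uint64_t e = eviction;

    assert(memcg <= FIELD_MASK(MEMCG_BITS) && node <= FIELD_MASK(NODE_BITS));
    e = (e << MEMCG_BITS) | memcg;
    e = (e << NODE_BITS) | node;
    /* Mask the scaled clock so it cannot spill into the node/memcg bits. */
    e = (e << TIME_BITS) | ((now >> TIME_GRAIN) & FIELD_MASK(TIME_BITS));
    return (e << TAG_SHIFT) | TAG_VALUE;
}

/* Unpack in reverse order: the clock was packed last, so read it first. */
static void unpack(uint64_t entry, uint64_t *eviction, unsigned *memcg,
                   unsigned *node, uint64_t *stamp)
{
    entry >>= TAG_SHIFT;
    *stamp = entry & FIELD_MASK(TIME_BITS);   /* extract before shifting */
    entry >>= TIME_BITS;
    *node = (unsigned)(entry & FIELD_MASK(NODE_BITS));
    entry >>= NODE_BITS;
    *memcg = (unsigned)(entry & FIELD_MASK(MEMCG_BITS));
    entry >>= MEMCG_BITS;
    *eviction = entry;
}

/*
 * Ticks elapsed since the stored stamp, at 256-tick resolution. Only the
 * low TIME_BITS of the scaled clock survive, so the result wraps after
 * 2^(TIME_BITS + TIME_GRAIN) = 65536 ticks.
 */
static uint64_t elapsed_since(uint64_t now, uint64_t stamp)
{
    uint64_t delta = ((now >> TIME_GRAIN) - stamp) & FIELD_MASK(TIME_BITS);

    return delta << TIME_GRAIN;
}
```

For example, packing eviction 12345, memcg 7, node 3 at clock 0x5A3F stores the stamp 0x5A; at clock 0x6000, elapsed_since() reports 1536 ticks (the exact difference is 1473), and one full 65536-tick window later it wraps back to a small value.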

The patch was tested on an Android system by comparing the launch time of
an application under heavy memory consumption. Launch time decreased by
50%, and page faults during the test decreased by 80%.

Signed-off-by: Zhaoyang Huang <huangzhaoyang@gmail.com>
---
 include/linux/mmzone.h |  2 ++
 mm/workingset.c        | 24 +++++++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..c38ba0a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -240,6 +240,8 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	atomic_long_t			refaults_ratio;
+	atomic_long_t			prev_fault;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
diff --git a/mm/workingset.c b/mm/workingset.c
index 40ee02c..6361853 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -159,7 +159,7 @@
 			 NODES_SHIFT +	\
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
-
+#define EVICTION_JIFFIES (BITS_PER_LONG >> 3)
 /*
  * Eviction timestamps need to be able to cover the full range of
  * actionable refaults. However, bits are tight in the radix tree
@@ -175,18 +175,22 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
+	eviction = (eviction << EVICTION_JIFFIES) | (jiffies >> EVICTION_JIFFIES);
 	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
 	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
-			  unsigned long *evictionp)
+			  unsigned long *evictionp, unsigned long *prev_jiffp)
 {
 	unsigned long entry = (unsigned long)shadow;
 	int memcgid, nid;
+	unsigned long prev_jiff;
 
 	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
+	entry >>= EVICTION_JIFFIES;
+	prev_jiff = (entry & ((1UL << EVICTION_JIFFIES) - 1)) << EVICTION_JIFFIES;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -195,6 +199,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry << bucket_order;
+	*prev_jiffp = prev_jiff;
 }
 
 /**
@@ -242,8 +247,12 @@ bool workingset_refault(void *shadow)
 	unsigned long refault;
 	struct pglist_data *pgdat;
 	int memcgid;
+	unsigned long refault_ratio;
+	unsigned long prev_jiff;
+	unsigned long avg_refault_time;
+	unsigned long refault_time;
 
-	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
+	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &prev_jiff);
 
 	rcu_read_lock();
 	/*
@@ -288,10 +297,11 @@ bool workingset_refault(void *shadow)
 	 * list is not a problem.
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
-
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
-
-	if (refault_distance <= active_file) {
+	lruvec->refaults_ratio = atomic_long_read(&lruvec->inactive_age) / jiffies;
+	refault_time = jiffies - prev_jiff;
+	avg_refault_time = refault_distance / lruvec->refaults_ratio;
+	if (refault_time <= avg_refault_time) {
 		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
 		rcu_read_unlock();
 		return true;
@@ -521,7 +531,7 @@ static int __init workingset_init(void)
 	 * some more pages at runtime, so keep working with up to
 	 * double the initial memory by using totalram_pages as-is.
 	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT - EVICTION_JIFFIES;
 	max_order = fls_long(totalram_pages - 1);
 	if (max_order > timestamp_bits)
 		bucket_order = max_order - timestamp_bits;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread
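
The workingset_init() hunk in the patch above subtracts EVICTION_JIFFIES from timestamp_bits, which can raise bucket_order, the number of low bits dropped from the eviction counter (pack_shadow() shifts the counter right by bucket_order and unpack_shadow() shifts it back). The sketch below reproduces only that arithmetic; fls_ul() and bucket_order_for() are hypothetical helpers, and the EVICTION_SHIFT and memory-size inputs in the example are assumptions rather than values from a particular kernel configuration.

```c
#include <assert.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
static unsigned fls_ul(unsigned long long x)
{
    unsigned n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Model of the bucket_order computation in workingset_init(). All inputs
 * are supplied by the caller; nothing is read from a kernel config.
 *   jiffies_bits: bits reserved for the timestamp (0 = before the patch)
 */
static unsigned bucket_order_for(unsigned bits_per_long,
                                 unsigned eviction_shift,
                                 unsigned jiffies_bits,
                                 unsigned long long totalram_pages)
{
    unsigned timestamp_bits, max_order;

    assert(eviction_shift + jiffies_bits < bits_per_long);
    timestamp_bits = bits_per_long - eviction_shift - jiffies_bits;
    max_order = fls_ul(totalram_pages - 1);
    return max_order > timestamp_bits ? max_order - timestamp_bits : 0;
}
```

Assuming a 32-bit kernel (EVICTION_JIFFIES = 32 >> 3 = 4) where EVICTION_SHIFT comes to 18 and totalram_pages is 2^18 (1 GiB of 4 KiB pages), bucket_order_for(32, 18, 0, 1ULL << 18) gives 4 and bucket_order_for(32, 18, 4, 1ULL << 18) gives 8, so each eviction-counter step would grow from 16 to 256 pages. With a 64-bit word, an assumed EVICTION_SHIFT of 24 and 2^21 pages, bucket_order stays 0 both before and after the change.
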
* [PATCH] mm:workingset use real time to judge activity of the file page
@ 2019-04-04  3:30 Zhaoyang Huang
  2019-04-04  7:15 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Zhaoyang Huang @ 2019-04-04  3:30 UTC
  To: Andrew Morton, Vlastimil Babka, Pavel Tatashin, Joonsoo Kim,
	David Rientjes, Zhaoyang Huang, Roman Gushchin, Jeff Layton,
	Matthew Wilcox, linux-mm, linux-kernel

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

In the previous implementation, a count of pages is used to judge the
refault period of each page, which is imprecise because eviction of other
files strongly affects the current cache.
We introduce a timestamp into the workingset shadow entry, together with a
refault ratio, to measure the file page's activity. This reduces the
influence of other files (the average refault ratio reflects the memory
state of the whole system).
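
The refault-ratio test in the workingset_refault() hunk below can be modeled as follows. This is a minimal sketch, not the patch's code: refault_activates() and its parameter names are hypothetical, and the two zero guards are assumptions added here because the hunk divides by lruvec->refaults_ratio without checking it (it also uses the current jiffies value as the elapsed time). Because refault_distance accumulated during refault_time, a constant reclaim rate would make the two sides roughly equal, so the comparison in effect activates a page when reclaim ran at or above its long-run average rate while the page was out of memory.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the activation test. Parameters:
 *   refault_distance: lruvec clock ticks between eviction and refault
 *   refault_time:     jiffies between eviction and refault
 *   inactive_age:     current value of the lruvec clock
 *   elapsed:          jiffies over which that clock accumulated
 */
static bool refault_activates(unsigned long refault_distance,
                              unsigned long refault_time,
                              unsigned long inactive_age,
                              unsigned long elapsed)
{
    unsigned long ratio, avg_refault_time;

    /* Assumed guards: avoid dividing by zero in either step. */
    if (elapsed == 0)
        return false;
    ratio = inactive_age / elapsed;        /* average clock ticks per jiffy */
    if (ratio == 0)
        ratio = 1;                         /* fewer than one tick per jiffy */

    /* Jiffies the distance would take at the average rate. */
    avg_refault_time = refault_distance / ratio;
    return refault_time <= avg_refault_time;
}
```

With inactive_age = 50000 over 100 jiffies (500 ticks per jiffy), a refault distance of 1000 corresponds to an average of 2 jiffies, so a page refaulting after 2 jiffies is activated and one refaulting after 3 is not.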

The patch was tested on an Android system by comparing the launch time of
an application under heavy memory consumption. Launch time decreased by
50%, and page faults during the test decreased by 80%.

Signed-off-by: Zhaoyang Huang <huangzhaoyang@gmail.com>
---
 include/linux/mmzone.h |  2 ++
 mm/workingset.c        | 24 +++++++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..c38ba0a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -240,6 +240,8 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	atomic_long_t			refaults_ratio;
+	atomic_long_t			prev_fault;
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
diff --git a/mm/workingset.c b/mm/workingset.c
index 40ee02c..6361853 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -159,7 +159,7 @@
 			 NODES_SHIFT +	\
 			 MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
-
+#define EVICTION_JIFFIES (BITS_PER_LONG >> 3)
 /*
  * Eviction timestamps need to be able to cover the full range of
  * actionable refaults. However, bits are tight in the radix tree
@@ -175,18 +175,22 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
+	eviction = (eviction << EVICTION_JIFFIES) | (jiffies >> EVICTION_JIFFIES);
 	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
 	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
-			  unsigned long *evictionp)
+			  unsigned long *evictionp, unsigned long *prev_jiffp)
 {
 	unsigned long entry = (unsigned long)shadow;
 	int memcgid, nid;
+	unsigned long prev_jiff;
 
 	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
+	entry >>= EVICTION_JIFFIES;
+	prev_jiff = (entry & ((1UL << EVICTION_JIFFIES) - 1)) << EVICTION_JIFFIES;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -195,6 +199,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry << bucket_order;
+	*prev_jiffp = prev_jiff;
 }
 
 /**
@@ -242,8 +247,12 @@ bool workingset_refault(void *shadow)
 	unsigned long refault;
 	struct pglist_data *pgdat;
 	int memcgid;
+	unsigned long refault_ratio;
+	unsigned long prev_jiff;
+	unsigned long avg_refault_time;
+	unsigned long refault_time;
 
-	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
+	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &prev_jiff);
 
 	rcu_read_lock();
 	/*
@@ -288,10 +297,11 @@ bool workingset_refault(void *shadow)
 	 * list is not a problem.
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
-
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
-
-	if (refault_distance <= active_file) {
+	lruvec->refaults_ratio = atomic_long_read(&lruvec->inactive_age) / jiffies;
+	refault_time = jiffies - prev_jiff;
+	avg_refault_time = refault_distance / lruvec->refaults_ratio;
+	if (refault_time <= avg_refault_time) {
 		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
 		rcu_read_unlock();
 		return true;
@@ -521,7 +531,7 @@ static int __init workingset_init(void)
 	 * some more pages at runtime, so keep working with up to
 	 * double the initial memory by using totalram_pages as-is.
 	 */
-	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT - EVICTION_JIFFIES;
 	max_order = fls_long(totalram_pages - 1);
 	if (max_order > timestamp_bits)
 		bucket_order = max_order - timestamp_bits;
-- 
1.9.1



end of thread, other threads:[~2019-04-07  0:43 UTC | newest]

Thread overview: 9+ messages
2019-04-04  2:01 [PATCH] mm:workingset use real time to judge activity of the file page Zhaoyang Huang
2019-04-07  0:43 ` Suren Baghdasaryan
  -- strict thread matches above, loose matches on Subject: below --
2019-04-04  3:30 Zhaoyang Huang
2019-04-04  7:15 ` Michal Hocko
2019-04-05  3:13   ` Zhaoyang Huang
2019-04-04 16:39 ` Johannes Weiner
2019-04-04 23:23   ` Zhaoyang Huang
2019-04-05 19:34     ` Johannes Weiner
2019-04-05  3:24 ` Matthew Wilcox
