* [RFC 0/5] idle page tracking / working set estimation
@ 2011-03-25 8:43 Michel Lespinasse
2011-03-25 8:43 ` [PATCH 1/5] page_referenced: replace vm_flags parameter with struct pr_info Michel Lespinasse
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
I would like to solicit comments on the following patches. In order to
optimize job placement across many machines, we collect memory utilization
statistics for each cgroup on each machine. The statistics are intended
to be compared across machines - we don't just want to know which
cgroup to reclaim from on an individual machine, we also need to know
which machine in a large cluster is the best target for a job.
We also try to have a low impact on the normal MM algorithms - we think
they already do a fine job balancing resources on individual machines, so
we are not trying to interfere with that here.
Patch 1 introduces no functional change; it modifies the page_referenced API
so that it can be more easily extended in patch 2.
Patch 2 introduces page_referenced_kstaled(), which is similar to
page_referenced() but is used for idle page tracking rather than
for memory reclaim. Since both functions clear the pte_young bits
and we don't want them to interfere with each other, two new page flags
are introduced that track when young pte references have been cleared by
each of the page_referenced variants. The page_referenced functions are also
extended to return the dirty status of any pte references encountered.
Patch 3 introduces the 'kstaled' thread that handles idle page tracking.
The thread starts disabled; one enables it by setting a scan interval
in /sys/kernel/mm/kstaled/scan_seconds. It then scans all physical memory
pages, looking for idle pages - pages that have not been touched since the
previous scan interval. These pages are further classified into idle_clean
(immediately reclaimable), idle_dirty_swap (reclaimable if swap is enabled
on the system), and idle_dirty_file (reclaimable after writeback occurs).
These statistics are published for each cgroup in a new
/dev/cgroup/*/memory.idle_page_stats file. We did not add them to the
existing memory.stat file because these stats are different in nature -
first, they are meaningless until one sets the scan_seconds value, and
then they are only updated once per scan interval, whereas the memory.stat
values are continually updated.
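For illustration, a session might look roughly like this (the values are
made up, and the cgroup hierarchy is assumed to be mounted at /dev/cgroup
as above):

  # echo 120 > /sys/kernel/mm/kstaled/scan_seconds
  # cat /dev/cgroup/somejob/memory.idle_page_stats
  idle_clean 234881024
  idle_dirty_file 37748736
  idle_dirty_swap 8388608
  scans 17

The idle_* values are reported in bytes (page counts multiplied by
PAGE_SIZE) and 'scans' counts the completed scan cycles.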
Patch 4 is a small optimization that skips over memory holes.
Patch 5 rate limits the idle page scanning so that it occurs in small
chunks over the length of the scan interval, rather than all at once.
Please note that there are known problems in these changes. In particular,
kstaled_scan_page gets page references in a way that is known to be unsafe
when THP is enabled. I'm still figuring out how to address this, but thought
it may be useful to send the current patches for discussion first.
Michel Lespinasse (5):
page_referenced: replace vm_flags parameter with struct pr_info
kstaled: page_referenced_kstaled() and supporting infrastructure.
kstaled: minimalistic implementation.
kstaled: skip non-RAM regions.
kstaled: rate limit pages scanned per second.
arch/x86/include/asm/page_types.h | 8 +
arch/x86/kernel/e820.c | 45 +++++
include/linux/ksm.h | 9 +-
include/linux/mmzone.h | 7 +
include/linux/page-flags.h | 25 +++
include/linux/rmap.h | 78 ++++++++-
mm/ksm.c | 15 +-
mm/memcontrol.c | 339 +++++++++++++++++++++++++++++++++++++
mm/memory.c | 14 ++
mm/rmap.c | 136 ++++++++-------
mm/swap.c | 1 +
mm/vmscan.c | 18 +-
12 files changed, 604 insertions(+), 91 deletions(-)
--
1.7.3.1
* [PATCH 1/5] page_referenced: replace vm_flags parameter with struct pr_info
2011-03-25 8:43 [RFC 0/5] idle page tracking / working set estimation Michel Lespinasse
@ 2011-03-25 8:43 ` Michel Lespinasse
2011-03-25 8:43 ` [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure Michel Lespinasse
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
Introduce struct pr_info, passed into the page_referenced() family of
functions, to represent information about the pte references that have been
found for that page. It currently contains the vm_flags information as well
as a PR_REFERENCED flag. The idea is to make it easy to extend the API
with new flags in the future.
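Callers switch from the old return-value convention to filling in the
struct; the call-site pattern (a sketch of what the vmscan.c hunk below
does) becomes:

	struct pr_info info;

	page_referenced(page, 1, sc->mem_cgroup, &info);
	if (info.vm_flags & VM_LOCKED)
		return PAGEREF_RECLAIM;
	if (info.pr_flags & PR_REFERENCED) {
		/* the page had young pte references since the last call */
		...
	}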
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/ksm.h | 9 ++---
include/linux/rmap.h | 28 ++++++++++-----
mm/ksm.c | 15 +++-----
mm/rmap.c | 92 +++++++++++++++++++++++---------------------------
mm/vmscan.c | 18 +++++----
5 files changed, 81 insertions(+), 81 deletions(-)
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 3319a69..432c49b 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -83,8 +83,8 @@ static inline int ksm_might_need_to_copy(struct page *page,
page->index != linear_page_index(vma, address));
}
-int page_referenced_ksm(struct page *page,
- struct mem_cgroup *memcg, unsigned long *vm_flags);
+void page_referenced_ksm(struct page *page,
+ struct mem_cgroup *memcg, struct pr_info *info);
int try_to_unmap_ksm(struct page *page, enum ttu_flags flags);
int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
struct vm_area_struct *, unsigned long, void *), void *arg);
@@ -119,10 +119,9 @@ static inline int ksm_might_need_to_copy(struct page *page,
return 0;
}
-static inline int page_referenced_ksm(struct page *page,
- struct mem_cgroup *memcg, unsigned long *vm_flags)
+static inline void page_referenced_ksm(struct page *page,
+ struct mem_cgroup *memcg, struct pr_info *info)
{
- return 0;
}
static inline int try_to_unmap_ksm(struct page *page, enum ttu_flags flags)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e9fd04c..61f51af 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -70,6 +70,15 @@ struct anon_vma_chain {
struct list_head same_anon_vma; /* locked by anon_vma->lock */
};
+/*
+ * Information to be filled by page_referenced() and friends.
+ */
+struct pr_info {
+ unsigned long vm_flags;
+ unsigned int pr_flags;
+#define PR_REFERENCED 1
+};
+
#ifdef CONFIG_MMU
#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
@@ -181,10 +190,11 @@ static inline void page_dup_rmap(struct page *page)
/*
* Called from mm/vmscan.c to handle paging out
*/
-int page_referenced(struct page *, int is_locked,
- struct mem_cgroup *cnt, unsigned long *vm_flags);
-int page_referenced_one(struct page *, struct vm_area_struct *,
- unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+void page_referenced(struct page *, int is_locked,
+ struct mem_cgroup *cnt, struct pr_info *info);
+void page_referenced_one(struct page *, struct vm_area_struct *,
+ unsigned long address, unsigned int *mapcount,
+ struct pr_info *info);
enum ttu_flags {
TTU_UNMAP = 0, /* unmap mode */
@@ -272,12 +282,12 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
#define anon_vma_prepare(vma) (0)
#define anon_vma_link(vma) do {} while (0)
-static inline int page_referenced(struct page *page, int is_locked,
- struct mem_cgroup *cnt,
- unsigned long *vm_flags)
+static inline void page_referenced(struct page *page, int is_locked,
+ struct mem_cgroup *cnt,
+ struct pr_info *info)
{
- *vm_flags = 0;
- return 0;
+ info->vm_flags = 0;
+ info->pr_flags = 0;
}
#define try_to_unmap(page, refs) SWAP_FAIL
diff --git a/mm/ksm.c b/mm/ksm.c
index c2b2a94..bbd80ad 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1593,14 +1593,13 @@ struct page *ksm_does_need_to_copy(struct page *page,
return new_page;
}
-int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
- unsigned long *vm_flags)
+void page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
+ struct pr_info *info)
{
struct stable_node *stable_node;
struct rmap_item *rmap_item;
struct hlist_node *hlist;
unsigned int mapcount = page_mapcount(page);
- int referenced = 0;
int search_new_forks = 0;
VM_BUG_ON(!PageKsm(page));
@@ -1608,7 +1607,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
stable_node = page_stable_node(page);
if (!stable_node)
- return 0;
+ return;
again:
hlist_for_each_entry(rmap_item, hlist, &stable_node->hlist, hlist) {
struct anon_vma *anon_vma = rmap_item->anon_vma;
@@ -1633,19 +1632,17 @@ again:
if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
continue;
- referenced += page_referenced_one(page, vma,
- rmap_item->address, &mapcount, vm_flags);
+ page_referenced_one(page, vma, rmap_item->address,
+ &mapcount, info);
if (!search_new_forks || !mapcount)
break;
}
anon_vma_unlock(anon_vma);
if (!mapcount)
- goto out;
+ return;
}
if (!search_new_forks++)
goto again;
-out:
- return referenced;
}
int try_to_unmap_ksm(struct page *page, enum ttu_flags flags)
diff --git a/mm/rmap.c b/mm/rmap.c
index 941bf82..ee2c413 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -490,12 +490,12 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
* Subfunctions of page_referenced: page_referenced_one called
* repeatedly from either page_referenced_anon or page_referenced_file.
*/
-int page_referenced_one(struct page *page, struct vm_area_struct *vma,
- unsigned long address, unsigned int *mapcount,
- unsigned long *vm_flags)
+void page_referenced_one(struct page *page, struct vm_area_struct *vma,
+ unsigned long address, unsigned int *mapcount,
+ struct pr_info *info)
{
struct mm_struct *mm = vma->vm_mm;
- int referenced = 0;
+ bool referenced = false;
if (unlikely(PageTransHuge(page))) {
pmd_t *pmd;
@@ -509,19 +509,19 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
PAGE_CHECK_ADDRESS_PMD_FLAG);
if (!pmd) {
spin_unlock(&mm->page_table_lock);
- goto out;
+ return;
}
if (vma->vm_flags & VM_LOCKED) {
spin_unlock(&mm->page_table_lock);
*mapcount = 0; /* break early from loop */
- *vm_flags |= VM_LOCKED;
- goto out;
+ info->vm_flags |= VM_LOCKED;
+ return;
}
/* go ahead even if the pmd is pmd_trans_splitting() */
if (pmdp_clear_flush_young_notify(vma, address, pmd))
- referenced++;
+ referenced = true;
spin_unlock(&mm->page_table_lock);
} else {
pte_t *pte;
@@ -533,13 +533,13 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
*/
pte = page_check_address(page, mm, address, &ptl, 0);
if (!pte)
- goto out;
+ return;
if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
*mapcount = 0; /* break early from loop */
- *vm_flags |= VM_LOCKED;
- goto out;
+ info->vm_flags |= VM_LOCKED;
+ return;
}
if (ptep_clear_flush_young_notify(vma, address, pte)) {
@@ -551,7 +551,7 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
* set PG_referenced or activated the page.
*/
if (likely(!VM_SequentialReadHint(vma)))
- referenced++;
+ referenced = true;
}
pte_unmap_unlock(pte, ptl);
}
@@ -560,28 +560,27 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
swap token and is in the middle of a page fault. */
if (mm != current->mm && has_swap_token(mm) &&
rwsem_is_locked(&mm->mmap_sem))
- referenced++;
+ referenced = true;
(*mapcount)--;
- if (referenced)
- *vm_flags |= vma->vm_flags;
-out:
- return referenced;
+ if (referenced) {
+ info->vm_flags |= vma->vm_flags;
+ info->pr_flags |= PR_REFERENCED;
+ }
}
-static int page_referenced_anon(struct page *page,
- struct mem_cgroup *mem_cont,
- unsigned long *vm_flags)
+static void page_referenced_anon(struct page *page,
+ struct mem_cgroup *mem_cont,
+ struct pr_info *info)
{
unsigned int mapcount;
struct anon_vma *anon_vma;
struct anon_vma_chain *avc;
- int referenced = 0;
anon_vma = page_lock_anon_vma(page);
if (!anon_vma)
- return referenced;
+ return;
mapcount = page_mapcount(page);
list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
@@ -596,21 +595,20 @@ static int page_referenced_anon(struct page *page,
*/
if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
continue;
- referenced += page_referenced_one(page, vma, address,
- &mapcount, vm_flags);
+ page_referenced_one(page, vma, address, &mapcount, info);
if (!mapcount)
break;
}
page_unlock_anon_vma(anon_vma);
- return referenced;
}
/**
* page_referenced_file - referenced check for object-based rmap
* @page: the page we're checking references on.
* @mem_cont: target memory controller
- * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @info: collect encountered vma->vm_flags who actually referenced the page
+ * as well as flags describing the page references encountered.
*
* For an object-based mapped page, find all the places it is mapped and
* check/clear the referenced flag. This is done by following the page->mapping
@@ -619,16 +617,15 @@ static int page_referenced_anon(struct page *page,
*
* This function is only called from page_referenced for object-based pages.
*/
-static int page_referenced_file(struct page *page,
- struct mem_cgroup *mem_cont,
- unsigned long *vm_flags)
+static void page_referenced_file(struct page *page,
+ struct mem_cgroup *mem_cont,
+ struct pr_info *info)
{
unsigned int mapcount;
struct address_space *mapping = page->mapping;
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
struct vm_area_struct *vma;
struct prio_tree_iter iter;
- int referenced = 0;
/*
* The caller's checks on page->mapping and !PageAnon have made
@@ -664,14 +661,12 @@ static int page_referenced_file(struct page *page,
*/
if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
continue;
- referenced += page_referenced_one(page, vma, address,
- &mapcount, vm_flags);
+ page_referenced_one(page, vma, address, &mapcount, info);
if (!mapcount)
break;
}
spin_unlock(&mapping->i_mmap_lock);
- return referenced;
}
/**
@@ -679,45 +674,42 @@ static int page_referenced_file(struct page *page,
* @page: the page to test
* @is_locked: caller holds lock on the page
* @mem_cont: target memory controller
- * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
+ * @info: collect encountered vma->vm_flags who actually referenced the page
+ * as well as flags describing the page references encountered.
*
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of ptes which referenced the page.
*/
-int page_referenced(struct page *page,
- int is_locked,
- struct mem_cgroup *mem_cont,
- unsigned long *vm_flags)
+void page_referenced(struct page *page,
+ int is_locked,
+ struct mem_cgroup *mem_cont,
+ struct pr_info *info)
{
- int referenced = 0;
int we_locked = 0;
- *vm_flags = 0;
+ info->vm_flags = 0;
+ info->pr_flags = 0;
+
if (page_mapped(page) && page_rmapping(page)) {
if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
we_locked = trylock_page(page);
if (!we_locked) {
- referenced++;
+ info->pr_flags |= PR_REFERENCED;
goto out;
}
}
if (unlikely(PageKsm(page)))
- referenced += page_referenced_ksm(page, mem_cont,
- vm_flags);
+ page_referenced_ksm(page, mem_cont, info);
else if (PageAnon(page))
- referenced += page_referenced_anon(page, mem_cont,
- vm_flags);
+ page_referenced_anon(page, mem_cont, info);
else if (page->mapping)
- referenced += page_referenced_file(page, mem_cont,
- vm_flags);
+ page_referenced_file(page, mem_cont, info);
if (we_locked)
unlock_page(page);
}
out:
if (page_test_and_clear_young(page))
- referenced++;
-
- return referenced;
+ info->pr_flags |= PR_REFERENCED;
}
static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6771ea7..7fa9385 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -631,10 +631,10 @@ enum page_references {
static enum page_references page_check_references(struct page *page,
struct scan_control *sc)
{
- int referenced_ptes, referenced_page;
- unsigned long vm_flags;
+ int referenced_page;
+ struct pr_info info;
- referenced_ptes = page_referenced(page, 1, sc->mem_cgroup, &vm_flags);
+ page_referenced(page, 1, sc->mem_cgroup, &info);
referenced_page = TestClearPageReferenced(page);
/* Lumpy reclaim - ignore references */
@@ -645,10 +645,10 @@ static enum page_references page_check_references(struct page *page,
* Mlock lost the isolation race with us. Let try_to_unmap()
* move the page to the unevictable list.
*/
- if (vm_flags & VM_LOCKED)
+ if (info.vm_flags & VM_LOCKED)
return PAGEREF_RECLAIM;
- if (referenced_ptes) {
+ if (info.pr_flags & PR_REFERENCED) {
if (PageAnon(page))
return PAGEREF_ACTIVATE;
/*
@@ -1504,7 +1504,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
{
unsigned long nr_taken;
unsigned long pgscanned;
- unsigned long vm_flags;
+ struct pr_info info;
LIST_HEAD(l_hold); /* The pages which were snipped off */
LIST_HEAD(l_active);
LIST_HEAD(l_inactive);
@@ -1551,7 +1551,8 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
continue;
}
- if (page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
+ page_referenced(page, 0, sc->mem_cgroup, &info);
+ if (info.pr_flags & PR_REFERENCED) {
nr_rotated += hpage_nr_pages(page);
/*
* Identify referenced, file-backed active pages and
@@ -1562,7 +1563,8 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
* IO, plus JVM can create lots of anon VM_EXEC pages,
* so we ignore them here.
*/
- if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
+ if ((info.vm_flags & VM_EXEC) &&
+ page_is_file_cache(page)) {
list_add(&page->lru, &l_active);
continue;
}
--
1.7.3.1
* [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure.
2011-03-25 8:43 [RFC 0/5] idle page tracking / working set estimation Michel Lespinasse
2011-03-25 8:43 ` [PATCH 1/5] page_referenced: replace vm_flags parameter with struct pr_info Michel Lespinasse
@ 2011-03-25 8:43 ` Michel Lespinasse
2011-04-06 23:22 ` Dave Hansen
2011-03-25 8:43 ` [PATCH 3/5] kstaled: minimalistic implementation Michel Lespinasse
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
Add a new page_referenced_kstaled() interface. The desired behavior
is that page_referenced() returns page references since the last
page_referenced() call, and page_referenced_kstaled() returns page
references since the last page_referenced_kstaled() call, with the two
being independent: neither call influences the other's results.
The following events are counted as kstaled page references:
- CPU data access to the page (as noticed through pte_young());
- mark_page_accessed() calls;
- page being freed / reallocated.
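A scanner built on top of this can then classify a page as idle when no
such reference was seen over its interval, roughly like this (a sketch of
the pattern; patch 3 contains the real user, and count_page_as_idle() here
just stands in for the per-cgroup accounting done there):

	struct pr_info info;

	page_referenced_kstaled(page, is_locked, &info);
	if (!(info.pr_flags & PR_REFERENCED) && !(info.vm_flags & VM_LOCKED))
		/* no reference seen since the previous kstaled scan */
		count_page_as_idle(page, info.pr_flags & PR_DIRTY);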
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/page-flags.h | 25 +++++++++++++++++
include/linux/rmap.h | 64 +++++++++++++++++++++++++++++++++++++++-----
mm/memory.c | 14 +++++++++
mm/rmap.c | 60 ++++++++++++++++++++++++++++------------
mm/swap.c | 1 +
5 files changed, 139 insertions(+), 25 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0db8037..6033b7c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -51,6 +51,13 @@
* PG_hwpoison indicates that a page got corrupted in hardware and contains
* data with incorrect ECC bits that triggered a machine check. Accessing is
* not safe since it may cause another machine check. Don't touch!
+ *
+ * PG_young indicates that kstaled cleared the young bit on some PTEs pointing
+ * to that page. In order to avoid interacting with the LRU algorithm, we want
+ * the next page_referenced() call to still consider the page young.
+ *
+ * PG_idle indicates that the page has not been referenced since the last time
+ * kstaled scanned it.
*/
/*
@@ -107,6 +114,8 @@ enum pageflags {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
PG_compound_lock,
#endif
+ PG_young, /* kstaled cleared pte_young */
+ PG_idle, /* idle since start of kstaled interval */
__NR_PAGEFLAGS,
/* Filesystems */
@@ -278,6 +287,22 @@ PAGEFLAG_FALSE(HWPoison)
#define __PG_HWPOISON 0
#endif
+PAGEFLAG(Young, young)
+
+PAGEFLAG(Idle, idle)
+
+static inline void set_page_young(struct page *page)
+{
+ if (!PageYoung(page))
+ SetPageYoung(page);
+}
+
+static inline void clear_page_idle(struct page *page)
+{
+ if (PageIdle(page))
+ ClearPageIdle(page);
+}
+
u64 stable_page_flags(struct page *page);
static inline int PageUptodate(struct page *page)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 61f51af..d6aab09 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -77,6 +77,8 @@ struct pr_info {
unsigned long vm_flags;
unsigned int pr_flags;
#define PR_REFERENCED 1
+#define PR_DIRTY 2
+#define PR_FOR_KSTALED 4
};
#ifdef CONFIG_MMU
@@ -190,8 +192,8 @@ static inline void page_dup_rmap(struct page *page)
/*
* Called from mm/vmscan.c to handle paging out
*/
-void page_referenced(struct page *, int is_locked,
- struct mem_cgroup *cnt, struct pr_info *info);
+void __page_referenced(struct page *, int is_locked,
+ struct mem_cgroup *cnt, struct pr_info *info);
void page_referenced_one(struct page *, struct vm_area_struct *,
unsigned long address, unsigned int *mapcount,
struct pr_info *info);
@@ -282,12 +284,10 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
#define anon_vma_prepare(vma) (0)
#define anon_vma_link(vma) do {} while (0)
-static inline void page_referenced(struct page *page, int is_locked,
- struct mem_cgroup *cnt,
- struct pr_info *info)
+static inline void __page_referenced(struct page *page, int is_locked,
+ struct mem_cgroup *cnt,
+ struct pr_info *info)
{
- info->vm_flags = 0;
- info->pr_flags = 0;
}
#define try_to_unmap(page, refs) SWAP_FAIL
@@ -300,6 +300,56 @@ static inline int page_mkclean(struct page *page)
#endif /* CONFIG_MMU */
+/**
+ * page_referenced - test if the page was referenced
+ * @page: the page to test
+ * @is_locked: caller holds lock on the page
+ * @mem_cont: target memory controller
+ * @info: collect vm_flags of referencing vmas and flags describing the references
+ *
+ * Quick test_and_clear_referenced for all mappings to a page,
+ * filling in @info with the references that were found.
+ */
+static inline void page_referenced(struct page *page,
+ int is_locked,
+ struct mem_cgroup *mem_cont,
+ struct pr_info *info)
+{
+ info->vm_flags = 0;
+ info->pr_flags = 0;
+
+ /*
+ * Always clear PageYoung at the start of a scanning interval. It will
+ * get set if kstaled clears a young bit in a pte reference,
+ * so that vmscan will still see the page as referenced.
+ */
+ if (PageYoung(page)) {
+ ClearPageYoung(page);
+ info->pr_flags |= PR_REFERENCED;
+ }
+
+ __page_referenced(page, is_locked, mem_cont, info);
+}
+
+static inline void page_referenced_kstaled(struct page *page, bool is_locked,
+ struct pr_info *info)
+{
+ info->vm_flags = 0;
+ info->pr_flags = PR_FOR_KSTALED;
+
+ /*
+ * Always set PageIdle at the start of a scanning interval. It will
+ * get cleared if a young page reference is encountered; otherwise
+ * the page will be counted as idle at the next kstaled scan cycle.
+ */
+ if (!PageIdle(page)) {
+ SetPageIdle(page);
+ info->pr_flags |= PR_REFERENCED;
+ }
+
+ __page_referenced(page, is_locked, NULL, info);
+}
+
/*
* Return values of try_to_unmap
*/
diff --git a/mm/memory.c b/mm/memory.c
index 5823698..d331e85 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -966,6 +966,20 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
else {
if (pte_dirty(ptent))
set_page_dirty(page);
+ /*
+ * Using pte_young() here means kstaled
+ * interferes with the LRU algorithm. We can't
+ * just use PageYoung() to handle the case
+ * where kstaled transferred the young bit from
+ * pte to page, because mark_page_accessed()
+ * is not idempotent: if the same page was
+ * referenced by several unmapped ptes we don't
+ * want to call mark_page_accessed() for every
+ * such mapping.
+ * We did not have this problem in 300 kernels
+ * because they were using SetPageReferenced()
+ * here instead.
+ */
if (pte_young(ptent) &&
likely(!VM_SequentialReadHint(vma)))
mark_page_accessed(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index ee2c413..c632bbb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -520,8 +520,17 @@ void page_referenced_one(struct page *page, struct vm_area_struct *vma,
}
/* go ahead even if the pmd is pmd_trans_splitting() */
- if (pmdp_clear_flush_young_notify(vma, address, pmd))
- referenced = true;
+ if (!(info->pr_flags & PR_FOR_KSTALED)) {
+ if (pmdp_clear_flush_young_notify(vma, address, pmd)) {
+ referenced = true;
+ clear_page_idle(page);
+ }
+ } else {
+ if (pmdp_test_and_clear_young(vma, address, pmd)) {
+ referenced = true;
+ set_page_young(page);
+ }
+ }
spin_unlock(&mm->page_table_lock);
} else {
pte_t *pte;
@@ -535,6 +544,9 @@ void page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (!pte)
return;
+ if (pte_dirty(*pte))
+ info->pr_flags |= PR_DIRTY;
+
if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
*mapcount = 0; /* break early from loop */
@@ -542,23 +554,38 @@ void page_referenced_one(struct page *page, struct vm_area_struct *vma,
return;
}
- if (ptep_clear_flush_young_notify(vma, address, pte)) {
+ if (!(info->pr_flags & PR_FOR_KSTALED)) {
+ if (ptep_clear_flush_young_notify(vma, address, pte)) {
+ /*
+ * Don't treat a reference through a
+ * sequentially read mapping as such.
+ * If the page has been used in another
+ * mapping, we will catch it; if this other
+ * mapping is already gone, the unmap path
+ * will have set PG_referenced or activated
+ * the page.
+ */
+ if (likely(!VM_SequentialReadHint(vma)))
+ referenced = true;
+ clear_page_idle(page);
+ }
+ } else {
/*
- * Don't treat a reference through a sequentially read
- * mapping as such. If the page has been used in
- * another mapping, we will catch it; if this other
- * mapping is already gone, the unmap path will have
- * set PG_referenced or activated the page.
+ * Within page_referenced_kstaled():
+ * skip TLB shootdown & VM_SequentialReadHint heuristic
*/
- if (likely(!VM_SequentialReadHint(vma)))
+ if (ptep_test_and_clear_young(vma, address, pte)) {
referenced = true;
+ set_page_young(page);
+ }
}
pte_unmap_unlock(pte, ptl);
}
/* Pretend the page is referenced if the task has the
swap token and is in the middle of a page fault. */
- if (mm != current->mm && has_swap_token(mm) &&
+ if (!(info->pr_flags & PR_FOR_KSTALED) &&
+ mm != current->mm && has_swap_token(mm) &&
rwsem_is_locked(&mm->mmap_sem))
referenced = true;
@@ -670,7 +697,7 @@ static void page_referenced_file(struct page *page,
}
/**
- * page_referenced - test if the page was referenced
+ * __page_referenced - test if the page was referenced
* @page: the page to test
* @is_locked: caller holds lock on the page
* @mem_cont: target memory controller
@@ -680,16 +707,13 @@ static void page_referenced_file(struct page *page,
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of ptes which referenced the page.
*/
-void page_referenced(struct page *page,
- int is_locked,
- struct mem_cgroup *mem_cont,
- struct pr_info *info)
+void __page_referenced(struct page *page,
+ int is_locked,
+ struct mem_cgroup *mem_cont,
+ struct pr_info *info)
{
int we_locked = 0;
- info->vm_flags = 0;
- info->pr_flags = 0;
-
if (page_mapped(page) && page_rmapping(page)) {
if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
we_locked = trylock_page(page);
diff --git a/mm/swap.c b/mm/swap.c
index c02f936..4829e53 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -291,6 +291,7 @@ void mark_page_accessed(struct page *page)
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
}
+ clear_page_idle(page);
}
EXPORT_SYMBOL(mark_page_accessed);
--
1.7.3.1
* [PATCH 3/5] kstaled: minimalistic implementation.
2011-03-25 8:43 [RFC 0/5] idle page tracking / working set estimation Michel Lespinasse
2011-03-25 8:43 ` [PATCH 1/5] page_referenced: replace vm_flags parameter with struct pr_info Michel Lespinasse
2011-03-25 8:43 ` [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure Michel Lespinasse
@ 2011-03-25 8:43 ` Michel Lespinasse
2011-03-25 8:43 ` [PATCH 4/5] kstaled: skip non-RAM regions Michel Lespinasse
2011-03-25 8:43 ` [PATCH 5/5] kstaled: rate limit pages scanned per second Michel Lespinasse
4 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
Introduce a minimal kstaled implementation. The scan rate is controlled by
/sys/kernel/mm/kstaled/scan_seconds and per-cgroup statistics are output
into /dev/cgroup/*/memory.idle_page_stats.
Signed-off-by: Michel Lespinasse <walken@google.com>
---
mm/memcontrol.c | 279 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 279 insertions(+), 0 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da53a25..042e266 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -48,6 +48,8 @@
#include <linux/page_cgroup.h>
#include <linux/cpu.h>
#include <linux/oom.h>
+#include <linux/kthread.h>
+#include <linux/rmap.h>
#include "internal.h"
#include <asm/uaccess.h>
@@ -270,6 +272,14 @@ struct mem_cgroup {
*/
struct mem_cgroup_stat_cpu nocpu_base;
spinlock_t pcp_counter_lock;
+
+ seqcount_t idle_page_stats_lock;
+ struct idle_page_stats {
+ unsigned long idle_clean;
+ unsigned long idle_dirty_file;
+ unsigned long idle_dirty_swap;
+ } idle_page_stats, idle_scan_stats;
+ unsigned long idle_page_scans;
};
/* Stuffs for move charges at task migration. */
@@ -4168,6 +4178,28 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
return 0;
}
+static int mem_cgroup_idle_page_stats_read(struct cgroup *cgrp,
+ struct cftype *cft, struct cgroup_map_cb *cb)
+{
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+ unsigned int seqcount;
+ struct idle_page_stats stats;
+ unsigned long scans;
+
+ do {
+ seqcount = read_seqcount_begin(&mem->idle_page_stats_lock);
+ stats = mem->idle_page_stats;
+ scans = mem->idle_page_scans;
+ } while (read_seqcount_retry(&mem->idle_page_stats_lock, seqcount));
+
+ cb->fill(cb, "idle_clean", stats.idle_clean * PAGE_SIZE);
+ cb->fill(cb, "idle_dirty_file", stats.idle_dirty_file * PAGE_SIZE);
+ cb->fill(cb, "idle_dirty_swap", stats.idle_dirty_swap * PAGE_SIZE);
+ cb->fill(cb, "scans", scans);
+
+ return 0;
+}
+
static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -4231,6 +4263,10 @@ static struct cftype mem_cgroup_files[] = {
.unregister_event = mem_cgroup_oom_unregister_event,
.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
},
+ {
+ .name = "idle_page_stats",
+ .read_map = mem_cgroup_idle_page_stats_read,
+ },
};
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -4494,6 +4530,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
atomic_set(&mem->refcnt, 1);
mem->move_charge_at_immigrate = 0;
mutex_init(&mem->thresholds_lock);
+ seqcount_init(&mem->idle_page_stats_lock);
return &mem->css;
free_out:
__mem_cgroup_free(mem);
@@ -5076,3 +5113,245 @@ static int __init disable_swap_account(char *s)
}
__setup("noswapaccount", disable_swap_account);
#endif
+
+static unsigned int kstaled_scan_seconds;
+static DECLARE_WAIT_QUEUE_HEAD(kstaled_wait);
+
+static inline void kstaled_scan_page(struct page *page)
+{
+ bool is_locked = false;
+ bool is_file;
+ struct pr_info info;
+ struct page_cgroup *pc;
+ struct idle_page_stats *stats;
+
+ /*
+ * Before taking the page reference, check if the page is
+ * a user page which is not obviously unreclaimable
+ * (we will do more complete checks later).
+ */
+ if (!PageLRU(page) || PageMlocked(page) ||
+ (page->mapping == NULL && !PageSwapCache(page)))
+ return;
+
+ if (!get_page_unless_zero(page))
+ return;
+
+ /* Recheck now that we have the page reference. */
+ if (unlikely(!PageLRU(page) || PageMlocked(page)))
+ goto out;
+
+ /*
+ * Anon and SwapCache pages can be identified without locking.
+ * For all other cases, we need the page locked in order to
+ * dereference page->mapping.
+ */
+ if (PageAnon(page) || PageSwapCache(page))
+ is_file = false;
+ else if (!trylock_page(page)) {
+ /*
+ * We need to lock the page to dereference the mapping.
+ * But don't risk sleeping by calling lock_page().
+ * We don't want to stall kstaled, so we conservatively
+ * count locked pages as unreclaimable.
+ */
+ goto out;
+ } else {
+ struct address_space *mapping = page->mapping;
+
+ is_locked = true;
+
+ /*
+ * The page should still not be anon - it cannot have become
+ * anon while we have held a reference since the prior check.
+ */
+ VM_BUG_ON(PageAnon(page) || mapping != page_rmapping(page));
+
+ /*
+ * Check the mapping under protection of the page lock.
+ * 1. If the page is not swap cache and has no mapping,
+ * shrink_page_list can't do anything with it.
+ * 2. If the mapping is unevictable (as in SHM_LOCK segments),
+ * shrink_page_list can't do anything with it.
+ * 3. If the page is swap cache or the mapping is swap backed
+ * (as in shmem), consider it a swappable page.
+ * 4. If the backing dev has indicated that it does not want
+ * its pages sync'd to disk (as in ramfs), take this as
+ * a hint that its pages are not reclaimable.
+ * 5. Otherwise, consider this as a file page reclaimable
+ * through standard pageout.
+ */
+ if (!mapping && !PageSwapCache(page))
+ goto out;
+ else if (mapping && mapping_unevictable(mapping))
+ goto out;
+ else if (PageSwapCache(page) ||
+ mapping_cap_swap_backed(mapping))
+ is_file = false;
+ else if (!mapping_cap_writeback_dirty(mapping))
+ goto out;
+ else
+ is_file = true;
+ }
+
+ /* Find out if the page is idle. Also test for pending mlock. */
+ page_referenced_kstaled(page, is_locked, &info);
+ if ((info.pr_flags & PR_REFERENCED) || (info.vm_flags & VM_LOCKED))
+ goto out;
+
+ /*
+ * Unlock early to avoid depending on lock ordering
+ * between page and page_cgroup
+ */
+ if (is_locked) {
+ unlock_page(page);
+ is_locked = false;
+ }
+
+ /* Locate kstaled stats for the page's cgroup. */
+ pc = lookup_page_cgroup(page);
+ if (!pc)
+ goto out;
+ lock_page_cgroup(pc);
+ if (!PageCgroupUsed(pc))
+ goto unlock_page_cgroup_out;
+ stats = &pc->mem_cgroup->idle_scan_stats;
+
+ /* Finally increment the correct statistic for this page. */
+ if (!(info.pr_flags & PR_DIRTY) &&
+ !PageDirty(page) && !PageWriteback(page))
+ stats->idle_clean++;
+ else if (is_file)
+ stats->idle_dirty_file++;
+ else
+ stats->idle_dirty_swap++;
+
+ unlock_page_cgroup_out:
+ unlock_page_cgroup(pc);
+
+ out:
+ if (is_locked)
+ unlock_page(page);
+ put_page(page);
+}
+
+static void kstaled_scan_node(pg_data_t *pgdat)
+{
+ unsigned long flags;
+ unsigned long start, end, pfn;
+
+ pgdat_resize_lock(pgdat, &flags);
+
+ start = pgdat->node_start_pfn;
+ end = start + pgdat->node_spanned_pages;
+
+ for (pfn = start; pfn < end; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+
+ kstaled_scan_page(pfn_to_page(pfn));
+ }
+
+ pgdat_resize_unlock(pgdat, &flags);
+}
+
+static int kstaled(void *dummy)
+{
+ while (1) {
+ int scan_seconds;
+ int nid;
+ struct mem_cgroup *mem;
+
+ wait_event_interruptible(kstaled_wait,
+ (scan_seconds = kstaled_scan_seconds) > 0);
+ /*
+ * We use interruptible wait_event so as not to contribute
+ * to the machine load average while we're sleeping.
+ * However, we don't actually expect to receive a signal
+ * since we run as a kernel thread, so the condition we were
+ * waiting for should be true once we get here.
+ */
+ BUG_ON(scan_seconds <= 0);
+
+ for_each_mem_cgroup_all(mem)
+ memset(&mem->idle_scan_stats, 0,
+ sizeof(mem->idle_scan_stats));
+
+ for_each_node_state(nid, N_HIGH_MEMORY) {
+ const struct cpumask *cpumask = cpumask_of_node(nid);
+
+ if (!cpumask_empty(cpumask))
+ set_cpus_allowed_ptr(current, cpumask);
+
+ kstaled_scan_node(NODE_DATA(nid));
+ }
+
+ for_each_mem_cgroup_all(mem) {
+ write_seqcount_begin(&mem->idle_page_stats_lock);
+ mem->idle_page_stats = mem->idle_scan_stats;
+ mem->idle_page_scans++;
+ write_seqcount_end(&mem->idle_page_stats_lock);
+ }
+
+ schedule_timeout_interruptible(scan_seconds * HZ);
+ }
+
+ BUG();
+ return 0; /* NOT REACHED */
+}
+
+static ssize_t kstaled_scan_seconds_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", kstaled_scan_seconds);
+}
+
+static ssize_t kstaled_scan_seconds_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ int err;
+ unsigned long input;
+
+ err = strict_strtoul(buf, 10, &input);
+ if (err)
+ return -EINVAL;
+ kstaled_scan_seconds = input;
+ wake_up_interruptible(&kstaled_wait);
+ return count;
+}
+
+static struct kobj_attribute kstaled_scan_seconds_attr = __ATTR(
+ scan_seconds, 0644,
+ kstaled_scan_seconds_show, kstaled_scan_seconds_store);
+
+static struct attribute *kstaled_attrs[] = {
+ &kstaled_scan_seconds_attr.attr,
+ NULL
+};
+static struct attribute_group kstaled_attr_group = {
+ .name = "kstaled",
+ .attrs = kstaled_attrs,
+};
+
+static int __init kstaled_init(void)
+{
+ int error;
+ struct task_struct *thread;
+
+ error = sysfs_create_group(mm_kobj, &kstaled_attr_group);
+ if (error) {
+ pr_err("Failed to create kstaled sysfs node\n");
+ return error;
+ }
+
+ thread = kthread_run(kstaled, NULL, "kstaled");
+ if (IS_ERR(thread)) {
+ pr_err("Failed to start kstaled\n");
+ return PTR_ERR(thread);
+ }
+
+ return 0;
+}
+module_init(kstaled_init);
--
1.7.3.1
* [PATCH 4/5] kstaled: skip non-RAM regions.
2011-03-25 8:43 [RFC 0/5] idle page tracking / working set estimation Michel Lespinasse
` (2 preceding siblings ...)
2011-03-25 8:43 ` [PATCH 3/5] kstaled: minimalistic implementation Michel Lespinasse
@ 2011-03-25 8:43 ` Michel Lespinasse
2011-03-25 8:43 ` [PATCH 5/5] kstaled: rate limit pages scanned per second Michel Lespinasse
4 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
Signed-off-by: Michel Lespinasse <walken@google.com>
---
arch/x86/include/asm/page_types.h | 8 ++++++
arch/x86/kernel/e820.c | 45 +++++++++++++++++++++++++++++++++++++
include/linux/mmzone.h | 6 +++++
mm/memcontrol.c | 21 +++++++++++-----
4 files changed, 73 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 1df6621..7ae791f 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -52,6 +52,14 @@ extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn,
int acpi, int k8);
extern void free_initmem(void);
+extern void e820_skip_hole(unsigned long *start_pfn, unsigned long *end_pfn);
+
+#define ARCH_HAVE_PFN_SKIP_HOLE 1
+static inline void pfn_skip_hole(unsigned long *start, unsigned long *end)
+{
+ e820_skip_hole(start, end);
+}
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_PAGE_DEFS_H */
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 294f26d..b816706 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1122,3 +1122,48 @@ void __init memblock_find_dma_reserve(void)
set_dma_reserve(mem_size_pfn - free_size_pfn);
#endif
}
+
+/*
+ * The caller wants to skip pfns that are guaranteed to not be valid
+ * memory. Find a stretch of ram between [start_pfn, end_pfn) and
+ * return its pfn range back through start_pfn and end_pfn.
+ */
+
+void e820_skip_hole(unsigned long *start_pfn, unsigned long *end_pfn)
+{
+ unsigned long start = *start_pfn << PAGE_SHIFT;
+ unsigned long end = *end_pfn << PAGE_SHIFT;
+ int i;
+
+ if (start >= end)
+ goto fail; /* short-circuit e820 checks */
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ unsigned long last, addr;
+
+ addr = round_up(ei->addr, PAGE_SIZE);
+ last = round_down(ei->addr + ei->size, PAGE_SIZE);
+
+ if (addr >= end)
+ goto fail; /* We're done, not found */
+ if (last <= start)
+ continue; /* Not at start yet, move on */
+ if (ei->type != E820_RAM)
+ continue; /* Not RAM, move on */
+
+ /*
+ * We've found RAM. If start is in this e820 range, return
+ * it, otherwise return the start of this e820 range.
+ */
+
+ if (addr > start)
+ *start_pfn = addr >> PAGE_SHIFT;
+ if (last < end)
+ *end_pfn = last >> PAGE_SHIFT;
+ return;
+ }
+fail:
+ *start_pfn = *end_pfn;
+ return; /* No luck, return failure */
+}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02ecb01..955fd02 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -931,6 +931,12 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
#define pfn_to_section_nr(pfn) ((pfn) >> PFN_SECTION_SHIFT)
#define section_nr_to_pfn(sec) ((sec) << PFN_SECTION_SHIFT)
+#ifndef ARCH_HAVE_PFN_SKIP_HOLE
+static inline void pfn_skip_hole(unsigned long *start, unsigned long *end)
+{
+}
+#endif
+
#ifdef CONFIG_SPARSEMEM
/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 042e266..5bdaa23 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5238,18 +5238,25 @@ static inline void kstaled_scan_page(struct page *page)
static void kstaled_scan_node(pg_data_t *pgdat)
{
unsigned long flags;
- unsigned long start, end, pfn;
+ unsigned long pfn, end;
pgdat_resize_lock(pgdat, &flags);
- start = pgdat->node_start_pfn;
- end = start + pgdat->node_spanned_pages;
+ pfn = pgdat->node_start_pfn;
+ end = pfn + pgdat->node_spanned_pages;
- for (pfn = start; pfn < end; pfn++) {
- if (!pfn_valid(pfn))
- continue;
+ while (pfn < end) {
+ unsigned long contiguous = end;
+
+ /* restrict pfn..contiguous to be a RAM backed range */
+ pfn_skip_hole(&pfn, &contiguous);
- kstaled_scan_page(pfn_to_page(pfn));
+ for (; pfn < contiguous; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+
+ kstaled_scan_page(pfn_to_page(pfn));
+ }
}
pgdat_resize_unlock(pgdat, &flags);
--
1.7.3.1
* [PATCH 5/5] kstaled: rate limit pages scanned per second.
2011-03-25 8:43 [RFC 0/5] idle page tracking / working set estimation Michel Lespinasse
` (3 preceding siblings ...)
2011-03-25 8:43 ` [PATCH 4/5] kstaled: skip non-RAM regions Michel Lespinasse
@ 2011-03-25 8:43 ` Michel Lespinasse
4 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2011-03-25 8:43 UTC (permalink / raw)
To: linux-mm; +Cc: KOSAKI Motohiro
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/mmzone.h | 1 +
mm/memcontrol.c | 81 +++++++++++++++++++++++++++++++++++++++--------
2 files changed, 68 insertions(+), 14 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 955fd02..f98fc64 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -636,6 +636,7 @@ typedef struct pglist_data {
unsigned long node_present_pages; /* total number of physical pages */
unsigned long node_spanned_pages; /* total size of physical page
range, including holes */
+ unsigned long node_idle_scan_pfn;
int node_id;
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5bdaa23..64b157b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5115,6 +5115,7 @@ __setup("noswapaccount", disable_swap_account);
#endif
static unsigned int kstaled_scan_seconds;
+static DEFINE_SPINLOCK(kstaled_scan_seconds_lock);
static DECLARE_WAIT_QUEUE_HEAD(kstaled_wait);
static inline void kstaled_scan_page(struct page *page)
@@ -5235,15 +5236,19 @@ static inline void kstaled_scan_page(struct page *page)
put_page(page);
}
-static void kstaled_scan_node(pg_data_t *pgdat)
+static bool kstaled_scan_node(pg_data_t *pgdat, int scan_seconds, bool reset)
{
unsigned long flags;
- unsigned long pfn, end;
+ unsigned long pfn, end, node_end;
pgdat_resize_lock(pgdat, &flags);
pfn = pgdat->node_start_pfn;
- end = pfn + pgdat->node_spanned_pages;
+ node_end = pfn + pgdat->node_spanned_pages;
+ if (!reset && pfn < pgdat->node_idle_scan_pfn)
+ pfn = pgdat->node_idle_scan_pfn;
+ end = min(pfn + DIV_ROUND_UP(pgdat->node_spanned_pages, scan_seconds),
+ node_end);
while (pfn < end) {
unsigned long contiguous = end;
@@ -5260,14 +5265,21 @@ static void kstaled_scan_node(pg_data_t *pgdat)
}
pgdat_resize_unlock(pgdat, &flags);
+
+ pgdat->node_idle_scan_pfn = end;
+ return end == node_end;
}
static int kstaled(void *dummy)
{
+ int delayed = 0;
+ bool reset = true;
+
while (1) {
int scan_seconds;
int nid;
- struct mem_cgroup *mem;
+ long earlier, delta;
+ bool scan_done;
wait_event_interruptible(kstaled_wait,
(scan_seconds = kstaled_scan_seconds) > 0);
@@ -5280,27 +5292,66 @@ static int kstaled(void *dummy)
*/
BUG_ON(scan_seconds <= 0);
- for_each_mem_cgroup_all(mem)
- memset(&mem->idle_scan_stats, 0,
- sizeof(mem->idle_scan_stats));
+ earlier = jiffies;
+ scan_done = true;
for_each_node_state(nid, N_HIGH_MEMORY) {
const struct cpumask *cpumask = cpumask_of_node(nid);
if (!cpumask_empty(cpumask))
set_cpus_allowed_ptr(current, cpumask);
- kstaled_scan_node(NODE_DATA(nid));
+ scan_done &= kstaled_scan_node(NODE_DATA(nid),
+ scan_seconds, reset);
}
- for_each_mem_cgroup_all(mem) {
- write_seqcount_begin(&mem->idle_page_stats_lock);
- mem->idle_page_stats = mem->idle_scan_stats;
- mem->idle_page_scans++;
- write_seqcount_end(&mem->idle_page_stats_lock);
+ if (scan_done) {
+ struct mem_cgroup *mem;
+
+ for_each_mem_cgroup_all(mem) {
+ write_seqcount_begin(&mem->idle_page_stats_lock);
+ mem->idle_page_stats = mem->idle_scan_stats;
+ mem->idle_page_scans++;
+ write_seqcount_end(&mem->idle_page_stats_lock);
+ memset(&mem->idle_scan_stats, 0,
+ sizeof(mem->idle_scan_stats));
+ }
+ }
+
+ delta = jiffies - earlier;
+ if (delta < HZ / 2) {
+ delayed = 0;
+ schedule_timeout_interruptible(HZ - delta);
+ } else {
+ /*
+ * Emergency throttle if we're taking too long.
+ * We are supposed to scan an entire slice in 1 second.
+ * If we keep taking more than half a second for
+ * 10 consecutive times, scale back our scan_seconds.
+ *
+ * If someone changed kstaled_scan_seconds while we were
+ * running, hope they know what they're doing and
+ * assume they've eliminated any delays.
+ */
+ bool updated = false;
+ spin_lock(&kstaled_scan_seconds_lock);
+ if (scan_seconds != kstaled_scan_seconds)
+ delayed = 0;
+ else if (++delayed == 10) {
+ delayed = 0;
+ scan_seconds *= 2;
+ kstaled_scan_seconds = scan_seconds;
+ updated = true;
+ }
+ spin_unlock(&kstaled_scan_seconds_lock);
+ if (updated)
+ pr_warning("kstaled taking too long, "
+ "scan_seconds now %d\n",
+ scan_seconds);
+ schedule_timeout_interruptible(HZ / 2);
}
- schedule_timeout_interruptible(scan_seconds * HZ);
+ reset = scan_done;
}
BUG();
@@ -5324,7 +5375,9 @@ static ssize_t kstaled_scan_seconds_store(struct kobject *kobj,
err = strict_strtoul(buf, 10, &input);
if (err)
return -EINVAL;
+ spin_lock(&kstaled_scan_seconds_lock);
kstaled_scan_seconds = input;
+ spin_unlock(&kstaled_scan_seconds_lock);
wake_up_interruptible(&kstaled_wait);
return count;
}
--
1.7.3.1
* Re: [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure.
2011-03-25 8:43 ` [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure Michel Lespinasse
@ 2011-04-06 23:22 ` Dave Hansen
2011-04-07 7:15 ` KOSAKI Motohiro
0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2011-04-06 23:22 UTC (permalink / raw)
To: Michel Lespinasse; +Cc: linux-mm, KOSAKI Motohiro
On Fri, 2011-03-25 at 01:43 -0700, Michel Lespinasse wrote:
> +PAGEFLAG(Young, young)
> +
> +PAGEFLAG(Idle, idle)
> +
> +static inline void set_page_young(struct page *page)
> +{
> + if (!PageYoung(page))
> + SetPageYoung(page);
> +}
> +
> +static inline void clear_page_idle(struct page *page)
> +{
> + if (PageIdle(page))
> + ClearPageIdle(page);
> +}
Is it time for a CONFIG_X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG config
option? If folks want these kinds of features, then they need to suck
it up and make their 'struct page' 36 bytes. Any of these new page
flags features could:
config EXTENDED_PAGE_FLAGS
depends on 64BIT || X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG
config KSTALED
depends on EXTENDED_PAGE_FLAGS
And then we can wrap the "enum pageflags" entries for them in #ifdefs,
along with making page->flags a u64 when
X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG is set.
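For example (just a sketch), the new enum pageflags entries from patch 2
would then become:

#ifdef CONFIG_EXTENDED_PAGE_FLAGS
	PG_young,	/* kstaled cleared pte_young */
	PG_idle,	/* idle since start of kstaled interval */
#endif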
-- Dave
* Re: [PATCH 2/5] kstaled: page_referenced_kstaled() and supporting infrastructure.
2011-04-06 23:22 ` Dave Hansen
@ 2011-04-07 7:15 ` KOSAKI Motohiro
0 siblings, 0 replies; 8+ messages in thread
From: KOSAKI Motohiro @ 2011-04-07 7:15 UTC (permalink / raw)
To: Dave Hansen; +Cc: kosaki.motohiro, Michel Lespinasse, linux-mm
> On Fri, 2011-03-25 at 01:43 -0700, Michel Lespinasse wrote:
> > +PAGEFLAG(Young, young)
> > +
> > +PAGEFLAG(Idle, idle)
> > +
> > +static inline void set_page_young(struct page *page)
> > +{
> > + if (!PageYoung(page))
> > + SetPageYoung(page);
> > +}
> > +
> > +static inline void clear_page_idle(struct page *page)
> > +{
> > + if (PageIdle(page))
> > + ClearPageIdle(page);
> > +}
>
> Is it time for a CONFIG_X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG config
> option? If folks want these kinds of features, then they need to suck
> it up and make their 'struct page' 36 bytes. Any of these new page
> flags features could:
>
> config EXTENDED_PAGE_FLAGS
> depends on 64BIT || X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG
>
> config KSTALED
> depends on EXTENDED_PAGE_FLAGS
>
> And then we can wrap the "enum pageflags" entries for them in #ifdefs,
> along with making page->flags a u64 when
> X86_32_STRUCT_PAGE_IS_NOW_A_BLOATED_BIG is set.
Right.
x86_32 has no space left for new flags, and a 36-byte struct page is unacceptable.
Hmm...