linux-mm.kvack.org archive mirror
* [RFC 0/4] free reclaimed pages by paging out instantly
@ 2013-05-13  2:10 Minchan Kim
  2013-05-13  2:10 ` [RFC 1/4] mm: Don't hide spin_lock in swap_info_get Minchan Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Minchan Kim @ 2013-05-13  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Mel Gorman, Johannes Weiner,
	Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins, Minchan Kim

Normally, pages whose reclaim I/O has completed are rotated to the
tail of the inactive LRU. IMHO, the reason we do this is that we
cannot remove the page from the page cache (and from the swap
cache/swap slot) at I/O completion time because of locking problems.

So reclaiming those I/O-completed pages needs one more reclaim
iteration, which adds unnecessary CPU overhead (e.g.,
active->inactive deactivation, isolation, shrink_page_list).
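
For reference, this is the current completion path (simplified; it
matches the pre-patch code visible in [4/4]):

void end_page_writeback(struct page *page)
{
	/*
	 * Rotate to the inactive LRU tail; the page is not freed
	 * here, so a later reclaim pass has to find it again.
	 */
	if (TestClearPageReclaim(page))
		rotate_reclaimable_page(page);

	if (!test_clear_page_writeback(page))
		BUG();

	smp_mb__after_clear_bit();
	wake_up_page(page, PG_writeback);
}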

Another concern is per-process reclaim, with which a smart platform
can forcibly reclaim some of a process's pages, without an OOM kill,
before the VM runs into latency trouble. Assume the platform does
per-process reclaim on some processes before kswapd even runs (i.e.,
free pages > high watermark).

It then expects nr_free_pages in vmstat to increase, but that does
not happen: pages reclaimed by paging out (e.g., swap pages, dirty
pages) are not freed until kswapd runs at some point in the future.
So the platform gets confused and discards more of the working set
than necessary, or reclaims more processes, until nr_free_pages
reaches the goal.

This patch set makes the swap cache free logic aware of irq context,
so we can free reclaimed pages immediately instead of rotating them
back to the LRU tail. That reduces unnecessary CPU overhead and LRU
churn, and makes the VM more intuitive.
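
A condensed sketch of the new completion-time behaviour (the actual
change is in [4/4]'s pagevec_move_tail_fn; this only illustrates the
intended flow):

/*
 * Try to free the written-back page right away; fall back to the
 * old tail rotation if the page lock or the swap_info lock cannot
 * be taken from this (irq) context.
 */
if (trylock_page(page)) {
	if (remove_mapping(page_mapping(page), page, true)) {
		unlock_page(page);
		return;		/* freed instantly */
	}
	unlock_page(page);
}
/* could not free: rotate to the LRU tail as before */
list_move_tail(&page->lru, &lruvec->lists[page_lru_base_type(page)]);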

The big open problem with this series is how to handle memcg.
I'd ask the memcg folks to look at the description of [3/4] and give
feedback.

Minchan Kim (4):
  [1] mm: Don't hide spin_lock in swap_info_get
  [2] mm: introduce __swapcache_free
  [3] mm: support remove_mapping in irqcontext
  [4] mm: free reclaimed pages instantly without depending next reclaim

 fs/splice.c          |  2 +-
 include/linux/swap.h | 12 ++++++++++-
 mm/filemap.c         |  6 +++---
 mm/swap.c            | 14 ++++++++++++-
 mm/swapfile.c        | 22 +++++++++++++++------
 mm/truncate.c        |  2 +-
 mm/vmscan.c          | 56 +++++++++++++++++++++++++++++++++++++++++++---------
 7 files changed, 92 insertions(+), 22 deletions(-)

-- 
1.8.2.1

* [RFC 1/4] mm: Don't hide spin_lock in swap_info_get
  2013-05-13  2:10 [RFC 0/4] free reclaimed pages by paging out instantly Minchan Kim
@ 2013-05-13  2:10 ` Minchan Kim
  2013-05-13  2:10 ` [RFC 2/4] mm: introduce __swapcache_free Minchan Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2013-05-13  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Mel Gorman, Johannes Weiner,
	Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins, Minchan Kim

Currently, swap_info_get takes the lock internally, hiding the
acquisition, while releasing the lock is the caller's duty. That is
not a seriously bad pattern, but it isn't good for readability
either. A bigger concern is that if we use swap_info_get in irq
context, the lock must be taken with irqs disabled, so it is better
to let the caller take it: only the caller can judge whether the
function is being used in irq context.

This change will be used by a later patch in this series.
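
The resulting calling convention, as in the callers converted below
(sketch):

struct swap_info_struct *p = swap_info_get(entry);
if (p) {
	spin_lock(&p->lock);	/* or spin_trylock() from irq context */
	swap_entry_free(p, entry, 1);
	spin_unlock(&p->lock);
}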

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/swapfile.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c340d9..2966978 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -523,7 +523,6 @@ static struct swap_info_struct *swap_info_get(swp_entry_t entry)
 		goto bad_offset;
 	if (!p->swap_map[offset])
 		goto bad_free;
-	spin_lock(&p->lock);
 	return p;
 
 bad_free:
@@ -629,6 +628,7 @@ void swap_free(swp_entry_t entry)
 
 	p = swap_info_get(entry);
 	if (p) {
+		spin_lock(&p->lock);
 		swap_entry_free(p, entry, 1);
 		spin_unlock(&p->lock);
 	}
@@ -644,6 +644,7 @@ void swapcache_free(swp_entry_t entry, struct page *page)
 
 	p = swap_info_get(entry);
 	if (p) {
+		spin_lock(&p->lock);
 		count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
 		if (page)
 			mem_cgroup_uncharge_swapcache(page, entry, count != 0);
@@ -665,6 +666,7 @@ int page_swapcount(struct page *page)
 	entry.val = page_private(page);
 	p = swap_info_get(entry);
 	if (p) {
+		spin_lock(&p->lock);
 		count = swap_count(p->swap_map[swp_offset(entry)]);
 		spin_unlock(&p->lock);
 	}
@@ -747,6 +749,7 @@ int free_swap_and_cache(swp_entry_t entry)
 
 	p = swap_info_get(entry);
 	if (p) {
+		spin_lock(&p->lock);
 		if (swap_entry_free(p, entry, 1) == SWAP_HAS_CACHE) {
 			page = find_get_page(swap_address_space(entry),
 						entry.val);
@@ -2373,6 +2376,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 		goto outer;
 	}
 
+	spin_lock(&si->lock);
 	offset = swp_offset(entry);
 	count = si->swap_map[offset] & ~SWAP_HAS_CACHE;
 
-- 
1.8.2.1

* [RFC 2/4] mm: introduce __swapcache_free
  2013-05-13  2:10 [RFC 0/4] free reclaimed pages by paging out instantly Minchan Kim
  2013-05-13  2:10 ` [RFC 1/4] mm: Don't hide spin_lock in swap_info_get Minchan Kim
@ 2013-05-13  2:10 ` Minchan Kim
  2013-05-13  2:10 ` [RFC 3/4] mm: support remove_mapping in irqcontext Minchan Kim
  2013-05-13  2:10 ` [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim Minchan Kim
  3 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2013-05-13  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Mel Gorman, Johannes Weiner,
	Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins, Minchan Kim

__swapcache_free is almost the same as swapcache_free; the only
difference is that the caller must pass a stable swap_info_struct,
with its lock already held.

This function will be used by a later patch in this series.
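
Usage sketch; the trylock form is what the irq-context caller added
in [3/4] relies on:

struct swap_info_struct *p = swap_info_get(entry);
if (p && spin_trylock(&p->lock)) {	/* must not spin in irq context */
	__swapcache_free(p, entry, page);
	spin_unlock(&p->lock);
}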

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/swapfile.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2966978..33ebdd5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -634,20 +634,26 @@ void swap_free(swp_entry_t entry)
 	}
 }
 
+/*
+ * Like swapcache_free() below, but the caller must pass a stable
+ * swap_info_struct and already hold p->lock.
+ */
+void __swapcache_free(struct swap_info_struct *p,
+			swp_entry_t entry, struct page *page)
+{
+	unsigned char count;
+
+	count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
+	if (page)
+		mem_cgroup_uncharge_swapcache(page, entry, count != 0);
+}
+
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
 void swapcache_free(swp_entry_t entry, struct page *page)
 {
 	struct swap_info_struct *p;
-	unsigned char count;
 
 	p = swap_info_get(entry);
 	if (p) {
 		spin_lock(&p->lock);
-		count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
-		if (page)
-			mem_cgroup_uncharge_swapcache(page, entry, count != 0);
+		__swapcache_free(p, entry, page);
 		spin_unlock(&p->lock);
 	}
 }
-- 
1.8.2.1

* [RFC 3/4] mm: support remove_mapping in irqcontext
  2013-05-13  2:10 [RFC 0/4] free reclaimed pages by paging out instantly Minchan Kim
  2013-05-13  2:10 ` [RFC 1/4] mm: Don't hide spin_lock in swap_info_get Minchan Kim
  2013-05-13  2:10 ` [RFC 2/4] mm: introduce __swapcache_free Minchan Kim
@ 2013-05-13  2:10 ` Minchan Kim
  2013-05-13 14:58   ` Michal Hocko
  2013-05-13  2:10 ` [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim Minchan Kim
  3 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2013-05-13  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Mel Gorman, Johannes Weiner,
	Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins, Minchan Kim

This patch makes remove_mapping usable in irq context. To that end,
it adds an irqcontext argument to a few functions (e.g.,
remove_mapping and __remove_mapping); these are not hot paths, so I
believe that is not a problem.

It also exports swap_info_get and, in irq context, checks that we
can take the swap_info_struct->lock before touching the swap cache,
because __swapcache_free must succeed: if __swapcache_free could
fail, we would have to roll back what __delete_from_swap_cache did,
and that could require a radix tree node allocation in irq context.

A further concern is handling mem_cgroup_uncharge_swapcache in irq
context: it isn't irq-aware at the moment, and it must succeed for
the same reason as above.

Having reviewed that code, I don't think it's a big challenge,
unless I missed something.

My rough plan is as follows:

1. Make mctz->lock irq-aware by replacing spin_lock with
   spin_lock_irqsave.
2. Introduce a new argument "locked" in __mem_cgroup_uncharge_common
   so that __mem_cgroup_uncharge_common can skip lock_page_cgroup in
   irq context to avoid deadlock; an irq-context caller must take it
   in advance (next patch).
3. Introduce try_lock_page_cgroup, to be used by __swapcache_free
   (sketched below).
4. __remove_mapping can take the page_cgroup lock in advance before
   calling __swapcache_free.

I'd like to hear the memcg people's opinions before diving into coding.
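
For illustration, step 3 might look roughly like the following. This
is only a sketch against the existing page_cgroup bit-spinlock
scheme, not part of this series:

/*
 * Hypothetical trylock variant of lock_page_cgroup(), mirroring
 * its bit_spin_lock(PCG_LOCK, ...) usage.
 */
static inline int try_lock_page_cgroup(struct page_cgroup *pc)
{
	return bit_spin_trylock(PCG_LOCK, &pc->flags);
}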

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 fs/splice.c          |  2 +-
 include/linux/swap.h | 12 ++++++++++-
 mm/swapfile.c        |  2 +-
 mm/truncate.c        |  2 +-
 mm/vmscan.c          | 56 +++++++++++++++++++++++++++++++++++++++++++---------
 5 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index e6b2559..db77694 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -70,7 +70,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 		 * If we succeeded in removing the mapping, set LRU flag
 		 * and return good.
 		 */
-		if (remove_mapping(mapping, page)) {
+		if (remove_mapping(mapping, page, false)) {
 			buf->flags |= PIPE_BUF_FLAG_LRU;
 			return 0;
 		}
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ca031f7..eb126d2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -274,7 +274,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 						unsigned long *nr_scanned);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
-extern int remove_mapping(struct address_space *mapping, struct page *page);
+extern int remove_mapping(struct address_space *mapping, struct page *page,
+				bool irqcontext);
 extern unsigned long vm_total_pages;
 
 #ifdef CONFIG_NUMA
@@ -407,6 +408,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
 }
 #endif
 
+
+extern struct swap_info_struct *swap_info_get(swp_entry_t entry);
+
 #else /* CONFIG_SWAP */
 
 #define get_nr_swap_pages()			0L
@@ -430,6 +434,12 @@ static inline void show_swap_cache_info(void)
 #define free_swap_and_cache(swp)	is_migration_entry(swp)
 #define swapcache_prepare(swp)		is_migration_entry(swp)
 
+
+static inline struct swap_info_struct *swap_info_get(swp_entry_t entry)
+{
+	return NULL;
+}
+
 static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
 {
 	return 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 33ebdd5..8a425d4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -505,7 +505,7 @@ swp_entry_t get_swap_page_of_type(int type)
 	return (swp_entry_t) {0};
 }
 
-static struct swap_info_struct *swap_info_get(swp_entry_t entry)
+struct swap_info_struct *swap_info_get(swp_entry_t entry)
 {
 	struct swap_info_struct *p;
 	unsigned long offset, type;
diff --git a/mm/truncate.c b/mm/truncate.c
index c75b736..fa1dc60 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -131,7 +131,7 @@ invalidate_complete_page(struct address_space *mapping, struct page *page)
 	if (page_has_private(page) && !try_to_release_page(page, 0))
 		return 0;
 
-	ret = remove_mapping(mapping, page);
+	ret = remove_mapping(mapping, page, false);
 
 	return ret;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..d14c9be 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -450,12 +450,18 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
  * Same as remove_mapping, but if the page is removed from the mapping, it
  * gets returned with a refcount of 0.
  */
-static int __remove_mapping(struct address_space *mapping, struct page *page)
+static int __remove_mapping(struct address_space *mapping, struct page *page,
+				bool irqcontext)
 {
+	unsigned long flags;
+
 	BUG_ON(!PageLocked(page));
 	BUG_ON(mapping != page_mapping(page));
 
-	spin_lock_irq(&mapping->tree_lock);
+	if (irqcontext)
+		spin_lock_irqsave(&mapping->tree_lock, flags);
+	else
+		spin_lock_irq(&mapping->tree_lock);
 	/*
 	 * The non racy check for a busy page.
 	 *
@@ -490,17 +496,45 @@ static int __remove_mapping(struct address_space *mapping, struct page *page)
 	}
 
 	if (PageSwapCache(page)) {
+		struct swap_info_struct *p;
 		swp_entry_t swap = { .val = page_private(page) };
+		p = swap_info_get(swap);
+		/*
+		 * If we are in irq context, check that we can take the
+		 * swap_info_struct->lock before removing the page from
+		 * the swap cache, because __swapcache_free must succeed.
+		 * If __swapcache_free could fail, we would have to roll
+		 * back what __delete_from_swap_cache did, and that could
+		 * require a radix tree node allocation in irq context,
+		 * which is the very thing we want to avoid.
+		 * TODO: handle mem_cgroup_uncharge_swapcache (memcg) in
+		 * irq context.
+		 */
+		if (irqcontext && p && !spin_trylock(&p->lock)) {
+			page_unfreeze_refs(page, 2);
+			goto cannot_free;
+		}
+
 		__delete_from_swap_cache(page);
-		spin_unlock_irq(&mapping->tree_lock);
-		swapcache_free(swap, page);
+		if (irqcontext) {
+			spin_unlock_irqrestore(&mapping->tree_lock, flags);
+			if (p) {
+				__swapcache_free(p, swap, page);
+				spin_unlock(&p->lock);
+			}
+		} else {
+			spin_unlock_irq(&mapping->tree_lock);
+			swapcache_free(swap, page);
+		}
 	} else {
 		void (*freepage)(struct page *);
 
 		freepage = mapping->a_ops->freepage;
 
 		__delete_from_page_cache(page);
-		spin_unlock_irq(&mapping->tree_lock);
+		if (irqcontext)
+			spin_unlock_irqrestore(&mapping->tree_lock, flags);
+		else
+			spin_unlock_irq(&mapping->tree_lock);
 		mem_cgroup_uncharge_cache_page(page);
 
 		if (freepage != NULL)
@@ -510,7 +544,10 @@ static int __remove_mapping(struct address_space *mapping, struct page *page)
 	return 1;
 
 cannot_free:
-	spin_unlock_irq(&mapping->tree_lock);
+	if (irqcontext)
+		spin_unlock_irqrestore(&mapping->tree_lock, flags);
+	else
+		spin_unlock_irq(&mapping->tree_lock);
 	return 0;
 }
 
@@ -520,9 +557,10 @@ cannot_free:
  * successfully detached, return 1.  Assumes the caller has a single ref on
  * this page.
  */
-int remove_mapping(struct address_space *mapping, struct page *page)
+int remove_mapping(struct address_space *mapping, struct page *page,
+			bool irqcontext)
 {
-	if (__remove_mapping(mapping, page)) {
+	if (__remove_mapping(mapping, page, irqcontext)) {
 		/*
 		 * Unfreezing the refcount with 1 rather than 2 effectively
 		 * drops the pagecache ref for us without requiring another
@@ -904,7 +942,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		if (!mapping || !__remove_mapping(mapping, page))
+		if (!mapping || !__remove_mapping(mapping, page, false))
 			goto keep_locked;
 
 		/*
-- 
1.8.2.1

* [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim
  2013-05-13  2:10 [RFC 0/4] free reclaimed pages by paging out instantly Minchan Kim
                   ` (2 preceding siblings ...)
  2013-05-13  2:10 ` [RFC 3/4] mm: support remove_mapping in irqcontext Minchan Kim
@ 2013-05-13  2:10 ` Minchan Kim
  2013-05-14 17:32   ` Rik van Riel
  3 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2013-05-13  2:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Rik van Riel, Mel Gorman, Johannes Weiner,
	Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins, Minchan Kim

Normally, file I/O for reclaim is asynchronous, so when page
writeback completes, the reclaimed page is rotated to the LRU tail
for fast reclaim on the next pass. But this adds unnecessary CPU
overhead, and the extra iterations at higher reclaim priority can
reclaim far more pages than needed.

This patch frees paged-out pages instantly when their writeback
completes, without rotating them back to the LRU tail, so that we
get out of the reclaim loop as soon as possible and avoid the
unnecessary CPU overhead of moving them.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/filemap.c |  6 +++---
 mm/swap.c    | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7905fe7..8e2017b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -618,12 +618,12 @@ EXPORT_SYMBOL(unlock_page);
  */
 void end_page_writeback(struct page *page)
 {
-	if (TestClearPageReclaim(page))
-		rotate_reclaimable_page(page);
-
 	if (!test_clear_page_writeback(page))
 		BUG();
 
+	if (TestClearPageReclaim(page))
+		rotate_reclaimable_page(page);
+
 	smp_mb__after_clear_bit();
 	wake_up_page(page, PG_writeback);
 }
diff --git a/mm/swap.c b/mm/swap.c
index dfd7d71..87f21632 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -324,7 +324,19 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
 	int *pgmoved = arg;
 
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		enum lru_list lru = page_lru_base_type(page);
+		enum lru_list lru;
+
+		if (!trylock_page(page))
+			goto move_tail;
+
+		if (!remove_mapping(page_mapping(page), page, true)) {
+			unlock_page(page);
+			goto move_tail;
+		}
+		unlock_page(page);
+		return;
+move_tail:
+		lru = page_lru_base_type(page);
 		list_move_tail(&page->lru, &lruvec->lists[lru]);
 		(*pgmoved)++;
 	}
-- 
1.8.2.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 3/4] mm: support remove_mapping in irqcontext
  2013-05-13  2:10 ` [RFC 3/4] mm: support remove_mapping in irqcontext Minchan Kim
@ 2013-05-13 14:58   ` Michal Hocko
  2013-05-14  7:17     ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2013-05-13 14:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Mel Gorman,
	Johannes Weiner, Kamezawa Hiroyuki, Hugh Dickins

On Mon 13-05-13 11:10:47, Minchan Kim wrote:
[...]
> My rough plan is as follows:
> 
> 1. Make mctz->lock irq-aware by replacing spin_lock with
>    spin_lock_irqsave.

I wouldn't be worried about this one as it is on its way out with the
soft limit rework (the core uncontroversial part ;))

> 2. Introduce a new argument "locked" in __mem_cgroup_uncharge_common
>    so that __mem_cgroup_uncharge_common can skip lock_page_cgroup in
>    irq context to avoid deadlock; an irq-context caller must take it
>    in advance (next patch).
> 3. Introduce try_lock_page_cgroup, to be used by __swapcache_free
>    (sketched below).
> 4. __remove_mapping can take the page_cgroup lock in advance before
>    calling __swapcache_free.
> 
> I'd like to hear the memcg people's opinions before diving into coding.

It should work. It will require some code moving, though.

> [...]

-- 
Michal Hocko
SUSE Labs

* Re: [RFC 3/4] mm: support remove_mapping in irqcontext
  2013-05-13 14:58   ` Michal Hocko
@ 2013-05-14  7:17     ` Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2013-05-14  7:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Mel Gorman,
	Johannes Weiner, Kamezawa Hiroyuki, Hugh Dickins

Hey Michal,

On Mon, May 13, 2013 at 04:58:57PM +0200, Michal Hocko wrote:
> On Mon 13-05-13 11:10:47, Minchan Kim wrote:
> [...]
> > My rough plan is as follows:
> > 
> > 1. Make mctz->lock irq-aware by replacing spin_lock with
> >    spin_lock_irqsave.
> 
> I wouldn't be worried about this one as it is on its way out with the
> soft limit rework (the core uncontroversial part ;))

Good to hear!

> 
> > 2. Introduce a new argument "locked" in __mem_cgroup_uncharge_common
> >    so that __mem_cgroup_uncharge_common can skip lock_page_cgroup in
> >    irq context to avoid deadlock; an irq-context caller must take it
> >    in advance (next patch).
> > 3. Introduce try_lock_page_cgroup, to be used by __swapcache_free
> >    (sketched below).
> > 4. __remove_mapping can take the page_cgroup lock in advance before
> >    calling __swapcache_free.
> > 
> > I'd like to hear the memcg people's opinions before diving into coding.
> 
> It should work. It will require some code moving, though.

Yep. I will give it a shot!

Thanks for the review!

-- 
Kind regards,
Minchan Kim

* Re: [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim
  2013-05-13  2:10 ` [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim Minchan Kim
@ 2013-05-14 17:32   ` Rik van Riel
  2013-05-15  7:12     ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2013-05-14 17:32 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Johannes Weiner, Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins

On 05/12/2013 10:10 PM, Minchan Kim wrote:
> Normally, file I/O for reclaim is asynchronous, so when page
> writeback completes, the reclaimed page is rotated to the LRU tail
> for fast reclaim on the next pass. But this adds unnecessary CPU
> overhead, and the extra iterations at higher reclaim priority can
> reclaim far more pages than needed.
>
> This patch frees paged-out pages instantly when their writeback
> completes, without rotating them back to the LRU tail, so that we
> get out of the reclaim loop as soon as possible and avoid the
> unnecessary CPU overhead of moving them.
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

I like this approach and am looking forward to your v2 series,
with the reworked patch 3/4.

* Re: [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim
  2013-05-14 17:32   ` Rik van Riel
@ 2013-05-15  7:12     ` Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2013-05-15  7:12 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Johannes Weiner, Kamezawa Hiroyuki, Michal Hocko, Hugh Dickins

Hey Rik,

On Tue, May 14, 2013 at 01:32:33PM -0400, Rik van Riel wrote:
> On 05/12/2013 10:10 PM, Minchan Kim wrote:
> >Normally, file I/O for reclaim is asynchronous, so when page
> >writeback completes, the reclaimed page is rotated to the LRU tail
> >for fast reclaim on the next pass. But this adds unnecessary CPU
> >overhead, and the extra iterations at higher reclaim priority can
> >reclaim far more pages than needed.
> >
> >This patch frees paged-out pages instantly when their writeback
> >completes, without rotating them back to the LRU tail, so that we
> >get out of the reclaim loop as soon as possible and avoid the
> >unnecessary CPU overhead of moving them.
> >
> >Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> I like this approach and am looking forward to your v2 series,
> with the reworked patch 3/4.

I will do it after I finish some more urgent work. :)
I am looking forward to seeing your review, then.

Thanks for the interest.

-- 
Kind regards,
Minchan Kim

Thread overview: 9+ messages (newest: 2013-05-15  7:12 UTC)
2013-05-13  2:10 [RFC 0/4] free reclaimed pages by paging out instantly Minchan Kim
2013-05-13  2:10 ` [RFC 1/4] mm: Don't hide spin_lock in swap_info_get Minchan Kim
2013-05-13  2:10 ` [RFC 2/4] mm: introduce __swapcache_free Minchan Kim
2013-05-13  2:10 ` [RFC 3/4] mm: support remove_mapping in irqcontext Minchan Kim
2013-05-13 14:58   ` Michal Hocko
2013-05-14  7:17     ` Minchan Kim
2013-05-13  2:10 ` [RFC 4/4] mm: free reclaimed pages instantly without depending next reclaim Minchan Kim
2013-05-14 17:32   ` Rik van Riel
2013-05-15  7:12     ` Minchan Kim
