From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx175.postini.com [74.125.245.175])
	by kanga.kvack.org (Postfix) with SMTP id 90ABA6B0031
	for ; Tue, 6 Aug 2013 02:42:56 -0400 (EDT)
Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244])
	by mailout3.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0MR300EBZJZ5FG30@mailout3.w1.samsung.com>
	for linux-mm@kvack.org; Tue, 06 Aug 2013 07:42:54 +0100 (BST)
From: Krzysztof Kozlowski
Subject: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
Date: Tue, 06 Aug 2013 08:42:37 +0200
Message-id: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski ,
	Kyungmin Park , Krzysztof Kozlowski

Hi,

Currently zbud pages are not movable and they cannot be allocated from
the CMA region. These patches try to address the problem by:
1. Adding a new form of reclaim of zbud pages.
2. Reclaiming zbud pages during migration and compaction.
3. Allocating zbud pages with the __GFP_RECLAIMABLE flag.

This reclaim process is different from zbud_reclaim_page(). It acts more
like swapoff() by trying to unuse the pages stored in a zbud page and
bring them back to memory. The standard zbud_reclaim_page(), on the other
hand, tries to write them back.

One of the patches introduces a new flag: PageZbud. This flag is used in
isolate_migratepages_range() to grab zbud pages and pass them later
for reclaim. Probably this could be replaced with something smarter than
a flag used only in one case. Any ideas for a better solution are welcome.

This patch set is based on Linux 3.11-rc4.

TODOs:
1. Replace PageZbud flag with other solution.
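
As an illustration only, and not code from this series, the flow described
above can be modelled in a few lines of userspace C: a scanner pins every
page that carries a zbud-style flag, and a later reclaim pass "unuses" each
stored entry so that the page's reference count can drop to zero. Every
name below (toy_page, toy_isolate, toy_unuse_entry, toy_reclaim_pages) is
hypothetical; the patches themselves use PageZbud,
isolate_migratepages_range() and zbud_reclaim_pages(). The sketch assumes
one reference per stored entry plus one per pin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_page {
	bool zbud;		/* stands in for the PageZbud flag */
	int refcount;		/* one reference per stored entry, plus pins */
	int entries;		/* compressed entries still stored (0..2) */
	struct toy_page *next;	/* link on the isolated list */
};

/* Scan pages[0..n) and pin every flagged page onto a private list. */
static struct toy_page *toy_isolate(struct toy_page *pages, size_t n)
{
	struct toy_page *head = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!pages[i].zbud)
			continue;
		pages[i].refcount++;	/* pin while it sits on our list */
		pages[i].next = head;
		head = &pages[i];
	}
	return head;
}

/* "Unuse" one stored entry: its data goes back to memory, its handle dies. */
static void toy_unuse_entry(struct toy_page *page)
{
	assert(page->entries > 0);
	page->entries--;
	page->refcount--;
}

/* Unuse everything on the list; returns how many pages became free. */
static int toy_reclaim_pages(struct toy_page *head)
{
	int freed = 0;

	while (head) {
		struct toy_page *page = head;

		head = page->next;
		page->next = NULL;
		while (page->entries > 0)
			toy_unuse_entry(page);
		if (--page->refcount == 0) {	/* drop the isolation pin */
			page->zbud = false;
			freed++;
		}
	}
	return freed;
}
```

In this model, pages without the flag are never touched by the scanner, and
a flagged page is released only after both its entries and the isolation
pin are gone.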

Best regards,
Krzysztof Kozlowski


Krzysztof Kozlowski (4):
  zbud: use page ref counter for zbud pages
  mm: split code for unusing swap entries from try_to_unuse
  mm: add zbud flag to page flags
  mm: reclaim zbud pages on migration and compaction

 include/linux/page-flags.h |   12 ++
 include/linux/swapfile.h   |    2 +
 include/linux/zbud.h       |   11 +-
 mm/compaction.c            |   20 ++-
 mm/internal.h              |    1 +
 mm/page_alloc.c            |    9 ++
 mm/swapfile.c              |  354 +++++++++++++++++++++++---------------------
 mm/zbud.c                  |  301 +++++++++++++++++++++++++------------
 mm/zswap.c                 |   57 ++++++-
 9 files changed, 499 insertions(+), 268 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187])
	by kanga.kvack.org (Postfix) with SMTP id B126A6B0033
	for ; Tue, 6 Aug 2013 02:43:03 -0400 (EDT)
Received: from eucpsbgm2.samsung.com (unknown [203.254.199.245])
	by mailout1.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0MR300247JZC8N30@mailout1.w1.samsung.com>
	for linux-mm@kvack.org; Tue, 06 Aug 2013 07:43:01 +0100 (BST)
From: Krzysztof Kozlowski
Subject: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
Date: Tue, 06 Aug 2013 08:42:38 +0200
Message-id: <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com>
In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski ,
	Kyungmin Park , Krzysztof Kozlowski , Tomasz Stanislawski

Use page reference counter for zbud pages.
The ref counter replaces zbud_header.under_reclaim flag and ensures
that zbud page won't be freed when zbud_free() is called during reclaim.
It allows implementation of additional reclaim paths.

The page count is incremented when:
 - a handle is created and passed to zswap (in zbud_alloc()),
 - user-supplied eviction callback is called (in zbud_reclaim_page()).

Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Tomasz Stanislawski
---
 mm/zbud.c |  150 +++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 86 insertions(+), 64 deletions(-)

diff --git a/mm/zbud.c b/mm/zbud.c
index ad1e781..a8e986f 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -109,7 +109,6 @@ struct zbud_header {
 	struct list_head lru;
 	unsigned int first_chunks;
 	unsigned int last_chunks;
-	bool under_reclaim;
 };
 
 /*****************
@@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page)
 	zhdr->last_chunks = 0;
 	INIT_LIST_HEAD(&zhdr->buddy);
 	INIT_LIST_HEAD(&zhdr->lru);
-	zhdr->under_reclaim = 0;
 	return zhdr;
 }
 
-/* Resets the struct page fields and frees the page */
-static void free_zbud_page(struct zbud_header *zhdr)
-{
-	__free_page(virt_to_page(zhdr));
-}
-
 /*
  * Encodes the handle of a particular buddy within a zbud page
  * Pool lock should be held as this function accesses first|last_chunks
@@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr)
 	return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1;
 }
 
+/*
+ * Called after zbud_free() or zbud_alloc().
+ * Checks whether given zbud page has to be:
+ *  - removed from buddied/unbuddied/LRU lists completely (zbud_free).
+ *  - moved from buddied to unbuddied list
+ *    and to beginning of LRU (zbud_alloc, zbud_free),
+ *  - added to buddied list and LRU (zbud_alloc),
+ *
+ * The page must be already removed from buddied/unbuddied lists.
+ * Must be called under pool->lock.
+ */
+static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr)
+{
+	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
+		/* zbud_free() */
+		list_del(&zhdr->lru);
+		return;
+	} else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
+		/* zbud_free() or zbud_alloc() */
+		int freechunks = num_free_chunks(zhdr);
+		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
+	} else {
+		/* zbud_alloc() */
+		list_add(&zhdr->buddy, &pool->buddied);
+	}
+	/* Add/move zbud page to beginning of LRU */
+	if (!list_empty(&zhdr->lru))
+		list_del(&zhdr->lru);
+	list_add(&zhdr->lru, &pool->lru);
+}
+
+/*
+ * Increases ref count for zbud page.
+ */
+static void get_zbud_page(struct zbud_header *zhdr)
+{
+	get_page(virt_to_page(zhdr));
+}
+
+/*
+ * Decreases ref count for zbud page and frees the page if it reaches 0
+ * (no external references, e.g. handles).
+ *
+ * Must be called under pool->lock.
+ *
+ * Returns 1 if page was freed and 0 otherwise.
+ */
+static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
+{
+	struct page *page = virt_to_page(zhdr);
+	if (put_page_testzero(page)) {
+		free_hot_cold_page(page, 0);
+		pool->pages_nr--;
+		return 1;
+	}
+	return 0;
+}
+
+
 /*****************
  * API Functions
 *****************/
@@ -250,7 +301,7 @@ void zbud_destroy_pool(struct zbud_pool *pool)
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 			unsigned long *handle)
 {
-	int chunks, i, freechunks;
+	int chunks, i;
 	struct zbud_header *zhdr = NULL;
 	enum buddy bud;
 	struct page *page;
@@ -273,6 +324,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 				bud = FIRST;
 			else
 				bud = LAST;
+			get_zbud_page(zhdr);
 			goto found;
 		}
 	}
@@ -284,6 +336,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 		return -ENOMEM;
 	spin_lock(&pool->lock);
 	pool->pages_nr++;
+	/*
+	 * We will be using zhdr instead of page, so
+	 * don't increase the page count.
+	 */
 	zhdr = init_zbud_page(page);
 	bud = FIRST;
 
@@ -293,19 +349,7 @@ found:
 	else
 		zhdr->last_chunks = chunks;
 
-	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) {
-		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
-		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-	} else {
-		/* Add to buddied list */
-		list_add(&zhdr->buddy, &pool->buddied);
-	}
-
-	/* Add/move zbud page to beginning of LRU */
-	if (!list_empty(&zhdr->lru))
-		list_del(&zhdr->lru);
-	list_add(&zhdr->lru, &pool->lru);
+	rebalance_lists(pool, zhdr);
 
 	*handle = encode_handle(zhdr, bud);
 	spin_unlock(&pool->lock);
@@ -326,10 +370,10 @@ found:
 void zbud_free(struct zbud_pool *pool, unsigned long handle)
 {
 	struct zbud_header *zhdr;
-	int freechunks;
 
 	spin_lock(&pool->lock);
 	zhdr = handle_to_zbud_header(handle);
+	BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);
 
 	/* If first buddy, handle will be page aligned */
 	if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK)
@@ -337,26 +381,9 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
 	else
 		zhdr->first_chunks = 0;
 
-	if (zhdr->under_reclaim) {
-		/* zbud page is under reclaim, reclaim will free */
-		spin_unlock(&pool->lock);
-		return;
-	}
-
-	/* Remove from existing buddy list */
 	list_del(&zhdr->buddy);
-
-	if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-		/* zbud page is empty, free */
-		list_del(&zhdr->lru);
-		free_zbud_page(zhdr);
-		pool->pages_nr--;
-	} else {
-		/* Add to unbuddied list */
-		freechunks = num_free_chunks(zhdr);
-		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-	}
-
+	rebalance_lists(pool, zhdr);
+	put_zbud_page(pool, zhdr);
 	spin_unlock(&pool->lock);
 }
 
@@ -400,7 +427,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle)
  */
 int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 {
-	int i, ret, freechunks;
+	int i, ret;
 	struct zbud_header *zhdr;
 	unsigned long first_handle = 0, last_handle = 0;
 
@@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 		return -EINVAL;
 	}
 	for (i = 0; i < retries; i++) {
+		if (list_empty(&pool->lru)) {
+			/*
+			 * LRU was emptied during evict calls in previous
+			 * iteration but put_zbud_page() returned 0 meaning
+			 * that someone still holds the page. This may
+			 * happen when some other mm mechanism increased
+			 * the page count.
+			 * In such case we succeeded with reclaim.
+			 */
+			return 0;
+		}
 		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
+		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
+		/* Move this last element to beginning of LRU */
 		list_del(&zhdr->lru);
-		list_del(&zhdr->buddy);
+		list_add(&zhdr->lru, &pool->lru);
 		/* Protect zbud page against free */
-		zhdr->under_reclaim = true;
+		get_zbud_page(zhdr);
 		/*
 		 * We need encode the handles before unlocking, since we can
 		 * race with free that will set (first|last)_chunks to 0
@@ -441,28 +481,10 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
 		}
 next:
 		spin_lock(&pool->lock);
-		zhdr->under_reclaim = false;
-		if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-			/*
-			 * Both buddies are now free, free the zbud page and
-			 * return success.
-			 */
-			free_zbud_page(zhdr);
-			pool->pages_nr--;
+		if (put_zbud_page(pool, zhdr)) {
 			spin_unlock(&pool->lock);
 			return 0;
-		} else if (zhdr->first_chunks == 0 ||
-				zhdr->last_chunks == 0) {
-			/* add to unbuddied list */
-			freechunks = num_free_chunks(zhdr);
-			list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-		} else {
-			/* add to buddied list */
-			list_add(&zhdr->buddy, &pool->buddied);
 		}
-
-		/* add to beginning of LRU */
-		list_add(&zhdr->lru, &pool->lru);
 	}
 	spin_unlock(&pool->lock);
 	return -EAGAIN;
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx188.postini.com [74.125.245.188])
	by kanga.kvack.org (Postfix) with SMTP id A1BCC6B0034
	for ; Tue, 6 Aug 2013 02:43:10 -0400 (EDT)
Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244])
	by mailout3.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0MR300EBZJZ5FG30@mailout3.w1.samsung.com>
	for linux-mm@kvack.org; Tue, 06 Aug 2013 07:43:09 +0100 (BST)
From: Krzysztof Kozlowski
Subject: [RFC PATCH 2/4] mm: split code for unusing swap entries from try_to_unuse
Date: Tue, 06 Aug 2013 08:42:39 +0200
Message-id: <1375771361-8388-3-git-send-email-k.kozlowski@samsung.com>
In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski ,
	Kyungmin Park , Krzysztof Kozlowski

Move the code for unusing swap entries out of the loop in try_to_unuse()
into a separate function: try_to_unuse_swp_entry(). Export this new
function in swapfile.h, just as try_to_unuse() is exported.

Signed-off-by: Krzysztof Kozlowski
---
 include/linux/swapfile.h |    2 +
 mm/swapfile.c            |  354 ++++++++++++++++++++++++----------------------
 2 files changed, 187 insertions(+), 169 deletions(-)

diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
index e282624..68c24a7 100644
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -9,5 +9,7 @@ extern spinlock_t swap_lock;
 extern struct swap_list_t swap_list;
 extern struct swap_info_struct *swap_info[];
 extern int try_to_unuse(unsigned int, bool, unsigned long);
+extern int try_to_unuse_swp_entry(struct mm_struct **start_mm,
+		struct swap_info_struct *si, swp_entry_t entry);
 
 #endif /* _LINUX_SWAPFILE_H */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..331d0b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1100,6 +1100,189 @@ static unsigned int find_next_to_unuse(struct swap_info_struct *si,
 }
 
 /*
+ * Returns:
+ *  - negative on error,
+ *  - 0 on success (entry unused)
+ */
+int try_to_unuse_swp_entry(struct mm_struct **start_mm,
+		struct swap_info_struct *si, swp_entry_t entry)
+{
+	pgoff_t offset = swp_offset(entry);
+	unsigned char *swap_map;
+	unsigned char swcount;
+	struct page *page;
+	int retval = 0;
+
+	if (signal_pending(current)) {
+		retval = -EINTR;
+		goto out;
+	}
+
+	/*
+	 * Get a page for the entry, using the existing swap
+	 * cache page if there is one. Otherwise, get a clean
+	 * page and read the swap into it.
+	 */
+	swap_map = &si->swap_map[offset];
+	page = read_swap_cache_async(entry,
+				GFP_HIGHUSER_MOVABLE, NULL, 0);
+	if (!page) {
+		/*
+		 * Either swap_duplicate() failed because entry
+		 * has been freed independently, and will not be
+		 * reused since sys_swapoff() already disabled
+		 * allocation from here, or alloc_page() failed.
+		 */
+		if (!*swap_map)
+			retval = 0;
+		else
+			retval = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * Don't hold on to start_mm if it looks like exiting.
+	 */
+	if (atomic_read(&(*start_mm)->mm_users) == 1) {
+		mmput(*start_mm);
+		*start_mm = &init_mm;
+		atomic_inc(&init_mm.mm_users);
+	}
+
+	/*
+	 * Wait for and lock page. When do_swap_page races with
+	 * try_to_unuse, do_swap_page can handle the fault much
+	 * faster than try_to_unuse can locate the entry. This
+	 * apparently redundant "wait_on_page_locked" lets try_to_unuse
+	 * defer to do_swap_page in such a case - in some tests,
+	 * do_swap_page and try_to_unuse repeatedly compete.
+	 */
+	wait_on_page_locked(page);
+	wait_on_page_writeback(page);
+	lock_page(page);
+	wait_on_page_writeback(page);
+
+	/*
+	 * Remove all references to entry.
+	 */
+	swcount = *swap_map;
+	if (swap_count(swcount) == SWAP_MAP_SHMEM) {
+		retval = shmem_unuse(entry, page);
+		VM_BUG_ON(retval > 0);
+		/* page has already been unlocked and released */
+		goto out;
+	}
+	if (swap_count(swcount) && *start_mm != &init_mm)
+		retval = unuse_mm(*start_mm, entry, page);
+
+	if (swap_count(*swap_map)) {
+		int set_start_mm = (*swap_map >= swcount);
+		struct list_head *p = &(*start_mm)->mmlist;
+		struct mm_struct *new_start_mm = *start_mm;
+		struct mm_struct *prev_mm = *start_mm;
+		struct mm_struct *mm;
+
+		atomic_inc(&new_start_mm->mm_users);
+		atomic_inc(&prev_mm->mm_users);
+		spin_lock(&mmlist_lock);
+		while (swap_count(*swap_map) && !retval &&
+				(p = p->next) != &(*start_mm)->mmlist) {
+			mm = list_entry(p, struct mm_struct, mmlist);
+			if (!atomic_inc_not_zero(&mm->mm_users))
+				continue;
+			spin_unlock(&mmlist_lock);
+			mmput(prev_mm);
+			prev_mm = mm;
+
+			cond_resched();
+
+			swcount = *swap_map;
+			if (!swap_count(swcount)) /* any usage ? */
+				;
+			else if (mm == &init_mm)
+				set_start_mm = 1;
+			else
+				retval = unuse_mm(mm, entry, page);
+
+			if (set_start_mm && *swap_map < swcount) {
+				mmput(new_start_mm);
+				atomic_inc(&mm->mm_users);
+				new_start_mm = mm;
+				set_start_mm = 0;
+			}
+			spin_lock(&mmlist_lock);
+		}
+		spin_unlock(&mmlist_lock);
+		mmput(prev_mm);
+		mmput(*start_mm);
+		*start_mm = new_start_mm;
+	}
+	if (retval) {
+		unlock_page(page);
+		page_cache_release(page);
+		goto out;
+	}
+
+	/*
+	 * If a reference remains (rare), we would like to leave
+	 * the page in the swap cache; but try_to_unmap could
+	 * then re-duplicate the entry once we drop page lock,
+	 * so we might loop indefinitely; also, that page could
+	 * not be swapped out to other storage meanwhile. So:
+	 * delete from cache even if there's another reference,
+	 * after ensuring that the data has been saved to disk -
+	 * since if the reference remains (rarer), it will be
+	 * read from disk into another page. Splitting into two
+	 * pages would be incorrect if swap supported "shared
+	 * private" pages, but they are handled by tmpfs files.
+	 *
+	 * Given how unuse_vma() targets one particular offset
+	 * in an anon_vma, once the anon_vma has been determined,
+	 * this splitting happens to be just what is needed to
+	 * handle where KSM pages have been swapped out: re-reading
+	 * is unnecessarily slow, but we can fix that later on.
+	 */
+	if (swap_count(*swap_map) &&
+	     PageDirty(page) && PageSwapCache(page)) {
+		struct writeback_control wbc = {
+			.sync_mode = WB_SYNC_NONE,
+		};
+
+		swap_writepage(page, &wbc);
+		lock_page(page);
+		wait_on_page_writeback(page);
+	}
+
+	/*
+	 * It is conceivable that a racing task removed this page from
+	 * swap cache just before we acquired the page lock at the top,
+	 * or while we dropped it in unuse_mm(). The page might even
+	 * be back in swap cache on another swap area: that we must not
+	 * delete, since it may not have been written out to swap yet.
+	 */
+	if (PageSwapCache(page) &&
+	    likely(page_private(page) == entry.val))
+		delete_from_swap_cache(page);
+
+	/*
+	 * So we could skip searching mms once swap count went
+	 * to 1, we did not mark any present ptes as dirty: must
+	 * mark page dirty so shrink_page_list will preserve it.
+	 */
+	SetPageDirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+
+	/*
+	 * Make sure that we aren't completely killing
+	 * interactive performance.
+	 */
+	cond_resched();
+out:
+	return retval;
+}
+
+/*
  * We completely avoid races by reading each swap page in advance,
  * and then search for the process using it. All the necessary
  * page table adjustments can then be made atomically.
@@ -1112,10 +1295,6 @@ int try_to_unuse(unsigned int type, bool frontswap,
 {
 	struct swap_info_struct *si = swap_info[type];
 	struct mm_struct *start_mm;
-	unsigned char *swap_map;
-	unsigned char swcount;
-	struct page *page;
-	swp_entry_t entry;
 	unsigned int i = 0;
 	int retval = 0;
 
@@ -1142,172 +1321,9 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
-		if (signal_pending(current)) {
-			retval = -EINTR;
-			break;
-		}
-
-		/*
-		 * Get a page for the entry, using the existing swap
-		 * cache page if there is one. Otherwise, get a clean
-		 * page and read the swap into it.
-		 */
-		swap_map = &si->swap_map[i];
-		entry = swp_entry(type, i);
-		page = read_swap_cache_async(entry,
-					GFP_HIGHUSER_MOVABLE, NULL, 0);
-		if (!page) {
-			/*
-			 * Either swap_duplicate() failed because entry
-			 * has been freed independently, and will not be
-			 * reused since sys_swapoff() already disabled
-			 * allocation from here, or alloc_page() failed.
-			 */
-			if (!*swap_map)
-				continue;
-			retval = -ENOMEM;
-			break;
-		}
-
-		/*
-		 * Don't hold on to start_mm if it looks like exiting.
-		 */
-		if (atomic_read(&start_mm->mm_users) == 1) {
-			mmput(start_mm);
-			start_mm = &init_mm;
-			atomic_inc(&init_mm.mm_users);
-		}
-
-		/*
-		 * Wait for and lock page. When do_swap_page races with
-		 * try_to_unuse, do_swap_page can handle the fault much
-		 * faster than try_to_unuse can locate the entry. This
-		 * apparently redundant "wait_on_page_locked" lets try_to_unuse
-		 * defer to do_swap_page in such a case - in some tests,
-		 * do_swap_page and try_to_unuse repeatedly compete.
-		 */
-		wait_on_page_locked(page);
-		wait_on_page_writeback(page);
-		lock_page(page);
-		wait_on_page_writeback(page);
-
-		/*
-		 * Remove all references to entry.
-		 */
-		swcount = *swap_map;
-		if (swap_count(swcount) == SWAP_MAP_SHMEM) {
-			retval = shmem_unuse(entry, page);
-			/* page has already been unlocked and released */
-			if (retval < 0)
-				break;
-			continue;
-		}
-		if (swap_count(swcount) && start_mm != &init_mm)
-			retval = unuse_mm(start_mm, entry, page);
-
-		if (swap_count(*swap_map)) {
-			int set_start_mm = (*swap_map >= swcount);
-			struct list_head *p = &start_mm->mmlist;
-			struct mm_struct *new_start_mm = start_mm;
-			struct mm_struct *prev_mm = start_mm;
-			struct mm_struct *mm;
-
-			atomic_inc(&new_start_mm->mm_users);
-			atomic_inc(&prev_mm->mm_users);
-			spin_lock(&mmlist_lock);
-			while (swap_count(*swap_map) && !retval &&
-					(p = p->next) != &start_mm->mmlist) {
-				mm = list_entry(p, struct mm_struct, mmlist);
-				if (!atomic_inc_not_zero(&mm->mm_users))
-					continue;
-				spin_unlock(&mmlist_lock);
-				mmput(prev_mm);
-				prev_mm = mm;
-
-				cond_resched();
-
-				swcount = *swap_map;
-				if (!swap_count(swcount)) /* any usage ? */
-					;
-				else if (mm == &init_mm)
-					set_start_mm = 1;
-				else
-					retval = unuse_mm(mm, entry, page);
-
-				if (set_start_mm && *swap_map < swcount) {
-					mmput(new_start_mm);
-					atomic_inc(&mm->mm_users);
-					new_start_mm = mm;
-					set_start_mm = 0;
-				}
-				spin_lock(&mmlist_lock);
-			}
-			spin_unlock(&mmlist_lock);
-			mmput(prev_mm);
-			mmput(start_mm);
-			start_mm = new_start_mm;
-		}
-		if (retval) {
-			unlock_page(page);
-			page_cache_release(page);
+		retval = try_to_unuse_swp_entry(&start_mm, si, swp_entry(type, i));
+		if (retval != 0)
 			break;
-		}
-
-		/*
-		 * If a reference remains (rare), we would like to leave
-		 * the page in the swap cache; but try_to_unmap could
-		 * then re-duplicate the entry once we drop page lock,
-		 * so we might loop indefinitely; also, that page could
-		 * not be swapped out to other storage meanwhile. So:
-		 * delete from cache even if there's another reference,
-		 * after ensuring that the data has been saved to disk -
-		 * since if the reference remains (rarer), it will be
-		 * read from disk into another page. Splitting into two
-		 * pages would be incorrect if swap supported "shared
-		 * private" pages, but they are handled by tmpfs files.
-		 *
-		 * Given how unuse_vma() targets one particular offset
-		 * in an anon_vma, once the anon_vma has been determined,
-		 * this splitting happens to be just what is needed to
-		 * handle where KSM pages have been swapped out: re-reading
-		 * is unnecessarily slow, but we can fix that later on.
-		 */
-		if (swap_count(*swap_map) &&
-		     PageDirty(page) && PageSwapCache(page)) {
-			struct writeback_control wbc = {
-				.sync_mode = WB_SYNC_NONE,
-			};
-
-			swap_writepage(page, &wbc);
-			lock_page(page);
-			wait_on_page_writeback(page);
-		}
-
-		/*
-		 * It is conceivable that a racing task removed this page from
-		 * swap cache just before we acquired the page lock at the top,
-		 * or while we dropped it in unuse_mm(). The page might even
-		 * be back in swap cache on another swap area: that we must not
-		 * delete, since it may not have been written out to swap yet.
-		 */
-		if (PageSwapCache(page) &&
-		    likely(page_private(page) == entry.val))
-			delete_from_swap_cache(page);
-
-		/*
-		 * So we could skip searching mms once swap count went
-		 * to 1, we did not mark any present ptes as dirty: must
-		 * mark page dirty so shrink_page_list will preserve it.
-		 */
-		SetPageDirty(page);
-		unlock_page(page);
-		page_cache_release(page);
-
-		/*
-		 * Make sure that we aren't completely killing
-		 * interactive performance.
-		 */
-		cond_resched();
 		if (frontswap && pages_to_unuse > 0) {
 			if (!--pages_to_unuse)
 				break;
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx188.postini.com [74.125.245.188])
	by kanga.kvack.org (Postfix) with SMTP id 6EE6F6B0036
	for ; Tue, 6 Aug 2013 02:43:12 -0400 (EDT)
Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244])
	by mailout3.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0MR300EBZJZ5FG30@mailout3.w1.samsung.com>
	for linux-mm@kvack.org; Tue, 06 Aug 2013 07:43:10 +0100 (BST)
From: Krzysztof Kozlowski
Subject: [RFC PATCH 3/4] mm: add zbud flag to page flags
Date: Tue, 06 Aug 2013 08:42:40 +0200
Message-id: <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com>
In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski ,
	Kyungmin Park , Krzysztof Kozlowski

Add PageZbud flag to page flags
to distinguish pages allocated in zbud. Currently these pages do not
have any flags set.

Signed-off-by: Krzysztof Kozlowski
---
 include/linux/page-flags.h |   12 ++++++++++++
 mm/page_alloc.c            |    3 +++
 mm/zbud.c                  |    4 ++++
 3 files changed, 19 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..5b8b61a6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -109,6 +109,12 @@ enum pageflags {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	PG_compound_lock,
 #endif
+#ifdef CONFIG_ZBUD
+	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
+	 * during migration/compaction.
+	 */
+	PG_zbud,
+#endif
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -275,6 +281,12 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
+#ifdef CONFIG_ZBUD
+PAGEFLAG(Zbud, zbud)
+#else
+PAGEFLAG_FALSE(Zbud)
+#endif
+
 u64 stable_page_flags(struct page *page);
 
 static inline int PageUptodate(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b100255..1a120fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6345,6 +6345,9 @@ static const struct trace_print_flags pageflag_names[] = {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	{1UL << PG_compound_lock,	"compound_lock"	},
 #endif
+#ifdef CONFIG_ZBUD
+	{1UL << PG_zbud,		"zbud"	},
+#endif
 };
 
 static void dump_page_flags(unsigned long flags)
diff --git a/mm/zbud.c b/mm/zbud.c
index a8e986f..a452949 100644
--- a/mm/zbud.c
+++ b/mm/zbud.c
@@ -230,7 +230,10 @@ static void get_zbud_page(struct zbud_header *zhdr)
 static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr)
 {
 	struct page *page = virt_to_page(zhdr);
+	BUG_ON(!PageZbud(page));
+
 	if (put_page_testzero(page)) {
+		ClearPageZbud(page);
 		free_hot_cold_page(page, 0);
 		pool->pages_nr--;
 		return 1;
@@ -341,6 +344,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 	 * don't increase the page count.
 	 */
 	zhdr = init_zbud_page(page);
+	SetPageZbud(page);
 	bud = FIRST;
 
 found:
-- 
1.7.9.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx188.postini.com [74.125.245.188])
	by kanga.kvack.org (Postfix) with SMTP id D0FE56B0038
	for ; Tue, 6 Aug 2013 02:43:13 -0400 (EDT)
Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244])
	by mailout3.w1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTP id <0MR300EBZJZ5FG30@mailout3.w1.samsung.com>
	for linux-mm@kvack.org; Tue, 06 Aug 2013 07:43:11 +0100 (BST)
From: Krzysztof Kozlowski
Subject: [RFC PATCH 4/4] mm: reclaim zbud pages on migration and compaction
Date: Tue, 06 Aug 2013 08:42:41 +0200
Message-id: <1375771361-8388-5-git-send-email-k.kozlowski@samsung.com>
In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski ,
	Kyungmin Park , Krzysztof Kozlowski

Reclaim zbud pages during migration and compaction by unusing stored
data. This allows adding the __GFP_RECLAIMABLE flag when allocating zbud
pages, so that the CMA pool can effectively be used for zswap.

zbud pages are not movable and are not stored under any LRU (except
zbud's LRU). PageZbud flag is used in isolate_migratepages_range() to
grab zbud pages and pass them later for reclaim.

This reclaim process is different from zbud_reclaim_page(). It acts more
like swapoff() by trying to unuse pages stored in zbud page and bring
them back to memory. The standard zbud_reclaim_page(), on the other hand,
tries to write them back.

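To illustrate the distinction drawn above (a sketch under stated
assumptions, not the code of this patch): both reclaim flavours can share
one helper that pins the page, runs a callback for each stored buddy, and
treats the page as free only when the last reference is dropped. The
callback alone decides whether an entry is written back or brought back to
memory. All identifiers below (toy_zpage, toy_do_reclaim, toy_cb_writeback,
toy_cb_unuse, toy_cb_busy) are hypothetical, and the counters exist only to
make the two flavours observable.

```c
#include <assert.h>

/* Hypothetical model of a zbud page holding up to two buddies. */
struct toy_zpage {
	int refcount;	/* one per stored buddy, plus temporary pins */
	int first;	/* non-zero while the first buddy is in use */
	int last;	/* non-zero while the last buddy is in use */
};

/* Shared callback type; returns 0 on success, non-zero on failure. */
typedef int (*toy_evict_t)(struct toy_zpage *page, int *buddy);

static int toy_written_back;	/* entries pushed out to the swap device */
static int toy_brought_back;	/* entries returned to ordinary memory */

/* Drop one reference; returns 1 if that was the last one. */
static int toy_put(struct toy_zpage *page)
{
	assert(page->refcount > 0);
	return --page->refcount == 0;
}

/* Release one buddy together with the reference its handle held. */
static void toy_free_buddy(struct toy_zpage *page, int *buddy)
{
	*buddy = 0;
	toy_put(page);	/* never the last reference while the page is pinned */
}

/* Models the "evict" flavour: the entry is written back. */
static int toy_cb_writeback(struct toy_zpage *page, int *buddy)
{
	toy_written_back++;
	toy_free_buddy(page, buddy);
	return 0;
}

/* Models the "unuse" flavour: the entry is brought back to memory. */
static int toy_cb_unuse(struct toy_zpage *page, int *buddy)
{
	toy_brought_back++;
	toy_free_buddy(page, buddy);
	return 0;
}

/* A callback that cannot make progress, e.g. because the entry is busy. */
static int toy_cb_busy(struct toy_zpage *page, int *buddy)
{
	(void)page;
	(void)buddy;
	return -1;
}

/* Returns 1 if the page ended up free, 0 if it is still in use. */
static int toy_do_reclaim(struct toy_zpage *page, toy_evict_t cb)
{
	int ret = 0;

	page->refcount++;	/* pin so the callbacks cannot free the page */
	if (page->first)
		ret = cb(page, &page->first);
	if (!ret && page->last)
		ret = cb(page, &page->last);
	return toy_put(page);	/* unpin; last reference only if no buddy is left */
}
```

The pin taken before the callbacks is what lets a callback release both
buddies without the page disappearing underneath the helper.
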
Signed-off-by: Krzysztof Kozlowski
---
 include/linux/zbud.h |   11 +++-
 mm/compaction.c      |   20 ++++++-
 mm/internal.h        |    1 +
 mm/page_alloc.c      |    6 ++
 mm/zbud.c            |  163 +++++++++++++++++++++++++++++++++++++++-----------
 mm/zswap.c           |   57 ++++++++++++++++--
 6 files changed, 215 insertions(+), 43 deletions(-)

diff --git a/include/linux/zbud.h b/include/linux/zbud.h
index 2571a5c..57ee85d 100644
--- a/include/linux/zbud.h
+++ b/include/linux/zbud.h
@@ -5,8 +5,14 @@
 
 struct zbud_pool;
 
+/**
+ * Template for functions called during reclaim.
+ */
+typedef int (*evict_page_t)(struct zbud_pool *pool, unsigned long handle);
+
 struct zbud_ops {
-	int (*evict)(struct zbud_pool *pool, unsigned long handle);
+	evict_page_t evict;	/* callback for zbud_reclaim_lru_page() */
+	evict_page_t unuse;	/* callback for zbud_reclaim_pages() */
 };
 
 struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops);
@@ -14,7 +20,8 @@ void zbud_destroy_pool(struct zbud_pool *pool);
 int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp,
 	unsigned long *handle);
 void zbud_free(struct zbud_pool *pool, unsigned long handle);
-int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
+int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries);
+void zbud_reclaim_pages(struct list_head *zbud_pages);
 void *zbud_map(struct zbud_pool *pool, unsigned long handle);
 void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
 u64 zbud_get_pool_size(struct zbud_pool *pool);
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..9bbf412 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -534,6 +535,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			goto next_pageblock;
 		}
 
+		if (PageZbud(page)) {
+			/*
+			 * Zbud pages do not exist in LRU so we must
+			 * check for Zbud flag before PageLRU() below.
+			 */
+			BUG_ON(PageLRU(page));
+			get_page(page);
+			list_add(&page->lru, &cc->zbudpages);
+			continue;
+		}
+
 		/*
 		 * Check may be lockless but that's ok as we recheck later.
 		 * It's possible to migrate LRU pages and balloon pages
@@ -810,7 +822,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn, false);
 	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;
-
+#ifdef CONFIG_ZBUD
+	if (!list_empty(&cc->zbudpages))
+		zbud_reclaim_pages(&cc->zbudpages);
+#endif
 	cc->migrate_pfn = low_pfn;
 
 	return ISOLATE_SUCCESS;
@@ -1023,11 +1038,13 @@ static unsigned long compact_zone_order(struct zone *zone,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
+	INIT_LIST_HEAD(&cc.zbudpages);
 
 	ret = compact_zone(zone, &cc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
+	VM_BUG_ON(!list_empty(&cc.zbudpages));
 
 	*contended = cc.contended;
 	return ret;
@@ -1105,6 +1122,7 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		cc->zone = zone;
 		INIT_LIST_HEAD(&cc->freepages);
 		INIT_LIST_HEAD(&cc->migratepages);
+		INIT_LIST_HEAD(&cc->zbudpages);
 
 		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
 			compact_zone(zone, cc);
diff --git a/mm/internal.h b/mm/internal.h
index 4390ac6..eaf5c884 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -119,6 +119,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	struct list_head zbudpages;	/* List of pages belonging to zbud */
 	bool sync;			/* Synchronous migration */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool finished_update_free;	/* True when the zone cached pfns are
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1a120fb..e482876 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -60,6 +60,7 @@
 #include
#include #include +#include #include #include @@ -6031,6 +6032,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, ret = -EINTR; break; } +#ifdef CONFIG_ZBUD + if (!list_empty(&cc.zbudpages)) + zbud_reclaim_pages(&cc.zbudpages); +#endif tries = 0; } else if (++tries == 5) { ret = ret < 0 ? ret : -EBUSY; @@ -6085,6 +6090,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, .ignore_skip_hint = true, }; INIT_LIST_HEAD(&cc.migratepages); + INIT_LIST_HEAD(&cc.zbudpages); /* * What we do here is we mark all pageblocks in range as diff --git a/mm/zbud.c b/mm/zbud.c index a452949..98a04c8 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -103,12 +103,14 @@ struct zbud_pool { * @lru: links the zbud page into the lru list in the pool * @first_chunks: the size of the first buddy in chunks, 0 if free * @last_chunks: the size of the last buddy in chunks, 0 if free + * @pool: pool to which this zbud page belongs to */ struct zbud_header { struct list_head buddy; struct list_head lru; unsigned int first_chunks; unsigned int last_chunks; + struct zbud_pool *pool; }; /***************** @@ -137,6 +139,7 @@ static struct zbud_header *init_zbud_page(struct page *page) zhdr->last_chunks = 0; INIT_LIST_HEAD(&zhdr->buddy); INIT_LIST_HEAD(&zhdr->lru); + zhdr->pool = NULL; return zhdr; } @@ -241,7 +244,6 @@ static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) return 0; } - /***************** * API Functions *****************/ @@ -345,6 +347,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, */ zhdr = init_zbud_page(page); SetPageZbud(page); + zhdr->pool = pool; bud = FIRST; found: @@ -394,8 +397,57 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) #define list_tail_entry(ptr, type, member) \ list_entry((ptr)->prev, type, member) +/* + * Pool lock must be held when calling this function and at least + * one handle must not free. 
+ * On return the pool lock will be still held however during the + * execution it will be unlocked and locked for the time of calling + * the evict callback. + * + * Returns 1 if page was freed here, 0 otherwise (still in use) + */ +static int do_reclaim(struct zbud_pool *pool, struct zbud_header *zhdr, + evict_page_t evict_cb) +{ + int ret; + unsigned long first_handle = 0, last_handle = 0; + + BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); + /* Move this last element to beginning of LRU */ + list_del(&zhdr->lru); + list_add(&zhdr->lru, &pool->lru); + /* Protect zbud page against free */ + get_zbud_page(zhdr); + /* + * We need encode the handles before unlocking, since we can + * race with free that will set (first|last)_chunks to 0 + */ + first_handle = 0; + last_handle = 0; + if (zhdr->first_chunks) + first_handle = encode_handle(zhdr, FIRST); + if (zhdr->last_chunks) + last_handle = encode_handle(zhdr, LAST); + spin_unlock(&pool->lock); + + /* Issue the eviction callback(s) */ + if (first_handle) { + ret = evict_cb(pool, first_handle); + if (ret) + goto next; + } + if (last_handle) { + ret = evict_cb(pool, last_handle); + if (ret) + goto next; + } +next: + spin_lock(&pool->lock); + return put_zbud_page(pool, zhdr); +} + /** - * zbud_reclaim_page() - evicts allocations from a pool page and frees it + * zbud_reclaim_lru_page() - evicts allocations from a pool page and frees it * @pool: pool from which a page will attempt to be evicted * @retires: number of pages on the LRU list for which eviction will * be attempted before failing @@ -429,11 +481,10 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) * no pages to evict or an eviction handler is not registered, -EAGAIN if * the retry limit was hit. 
*/ -int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) +int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries) { - int i, ret; + int i; struct zbud_header *zhdr; - unsigned long first_handle = 0, last_handle = 0; spin_lock(&pool->lock); if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) || @@ -454,44 +505,84 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) return 0; } zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); - BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); - /* Move this last element to beginning of LRU */ - list_del(&zhdr->lru); - list_add(&zhdr->lru, &pool->lru); - /* Protect zbud page against free */ - get_zbud_page(zhdr); - /* - * We need encode the handles before unlocking, since we can - * race with free that will set (first|last)_chunks to 0 - */ - first_handle = 0; - last_handle = 0; - if (zhdr->first_chunks) - first_handle = encode_handle(zhdr, FIRST); - if (zhdr->last_chunks) - last_handle = encode_handle(zhdr, LAST); - spin_unlock(&pool->lock); - - /* Issue the eviction callback(s) */ - if (first_handle) { - ret = pool->ops->evict(pool, first_handle); - if (ret) - goto next; + if (do_reclaim(pool, zhdr, pool->ops->evict)) { + spin_unlock(&pool->lock); + return 0; } - if (last_handle) { - ret = pool->ops->evict(pool, last_handle); - if (ret) - goto next; + } + spin_unlock(&pool->lock); + return -EAGAIN; +} + + +/** + * zbud_reclaim_pages() - reclaims zbud pages by unusing stored pages + * @zbud_pages list of zbud pages to reclaim + * + * zbud reclaim is different from normal system reclaim in that the reclaim is + * done from the bottom, up. This is because only the bottom layer, zbud, has + * information on how the allocations are organized within each zbud page. This + * has the potential to create interesting locking situations between zbud and + * the user, however. 
+ * + * To avoid these, this is how zbud_reclaim_pages() should be called: + + * The user detects some pages should be reclaimed and calls + * zbud_reclaim_pages(). The zbud_reclaim_pages() will remove zbud + * pages from the pool LRU list and call the user-defined unuse handler with + * the pool and handle as arguments. + * + * If the handle can not be unused, the unuse handler should return + * non-zero. zbud_reclaim_pages() will add the zbud page back to the + * appropriate list and try the next zbud page on the list. + * + * If the handle is successfully unused, the unuse handler should + * return 0. + * The zbud page will be freed later by unuse code + * (e.g. frontswap_invalidate_page()). + * + * If all buddies in the zbud page are successfully unused, then the + * zbud page can be freed. + */ +void zbud_reclaim_pages(struct list_head *zbud_pages) +{ + struct page *page; + struct page *page2; + + list_for_each_entry_safe(page, page2, zbud_pages, lru) { + struct zbud_header *zhdr; + struct zbud_pool *pool; + + list_del(&page->lru); + if (!PageZbud(page)) { + /* + * Drop page count from isolate_migratepages_range() + */ + put_page(page); + continue; } -next: + zhdr = page_address(page); + BUG_ON(!zhdr->pool); + pool = zhdr->pool; + spin_lock(&pool->lock); + /* Drop page count from isolate_migratepages_range() */ if (put_zbud_page(pool, zhdr)) { + /* + * zbud_free() could free the handles before acquiring + * pool lock above. No need to reclaim. 
+ */ spin_unlock(&pool->lock); - return 0; + continue; + } + if (!pool->ops || !pool->ops->unuse || list_empty(&pool->lru)) { + spin_unlock(&pool->lock); + continue; } + BUG_ON(!PageZbud(page)); + do_reclaim(pool, zhdr, pool->ops->unuse); + spin_unlock(&pool->lock); } - spin_unlock(&pool->lock); - return -EAGAIN; } /** diff --git a/mm/zswap.c b/mm/zswap.c index deda2b6..846649b 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -35,6 +35,9 @@ #include #include #include +#include +#include +#include #include #include @@ -61,6 +64,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0); static u64 zswap_pool_limit_hit; /* Pages written back when pool limit was reached */ static u64 zswap_written_back_pages; +/* Pages unused due to reclaim */ +static u64 zswap_unused_pages; /* Store failed due to a reclaim failure after pool limit was reached */ static u64 zswap_reject_reclaim_fail; /* Compressed page was too big for the allocator to (optimally) store */ @@ -596,6 +601,47 @@ fail: return ret; } +/** + * Tries to unuse swap entries by uncompressing them. + * Function is a stripped swapfile.c::try_to_unuse(). + * + * Returns 0 on success or negative on error. + */ +static int zswap_unuse_entry(struct zbud_pool *pool, unsigned long handle) +{ + struct zswap_header *zhdr; + swp_entry_t swpentry; + struct zswap_tree *tree; + pgoff_t offset; + struct mm_struct *start_mm; + struct swap_info_struct *si; + int ret; + + /* extract swpentry from data */ + zhdr = zbud_map(pool, handle); + swpentry = zhdr->swpentry; /* here */ + zbud_unmap(pool, handle); + tree = zswap_trees[swp_type(swpentry)]; + offset = swp_offset(swpentry); + BUG_ON(pool != tree->pool); + + /* + * We cannot hold swap_lock here but swap_info may + * change (e.g. by swapoff). In case of swapoff + * check for SWP_WRITEOK. 
+ */ + si = swap_info[swp_type(swpentry)]; + if (!(si->flags & SWP_WRITEOK)) + return -ECANCELED; + + start_mm = &init_mm; + atomic_inc(&init_mm.mm_users); + ret = try_to_unuse_swp_entry(&start_mm, si, swpentry); + mmput(start_mm); + zswap_unused_pages++; + return ret; +} + /********************************* * frontswap hooks **********************************/ @@ -620,7 +666,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, /* reclaim space if needed */ if (zswap_is_full()) { zswap_pool_limit_hit++; - if (zbud_reclaim_page(tree->pool, 8)) { + if (zbud_reclaim_lru_page(tree->pool, 8)) { zswap_reject_reclaim_fail++; ret = -ENOMEM; goto reject; @@ -647,8 +693,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, /* store */ len = dlen + sizeof(struct zswap_header); - ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN, - &handle); + ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN | + __GFP_RECLAIMABLE, &handle); if (ret == -ENOSPC) { zswap_reject_compress_poor++; goto freepage; @@ -819,7 +865,8 @@ static void zswap_frontswap_invalidate_area(unsigned type) } static struct zbud_ops zswap_zbud_ops = { - .evict = zswap_writeback_entry + .evict = zswap_writeback_entry, + .unuse = zswap_unuse_entry }; static void zswap_frontswap_init(unsigned type) @@ -880,6 +927,8 @@ static int __init zswap_debugfs_init(void) zswap_debugfs_root, &zswap_reject_compress_poor); debugfs_create_u64("written_back_pages", S_IRUGO, zswap_debugfs_root, &zswap_written_back_pages); + debugfs_create_u64("unused_pages", S_IRUGO, + zswap_debugfs_root, &zswap_unused_pages); debugfs_create_u64("duplicate_entry", S_IRUGO, zswap_debugfs_root, &zswap_duplicate_entry); debugfs_create_u64("pool_pages", S_IRUGO, -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx184.postini.com [74.125.245.184]) by kanga.kvack.org (Postfix) with SMTP id 8178C6B0031 for ; Tue, 6 Aug 2013 05:00:20 -0400 (EDT) Message-ID: <5200BB18.9010105@oracle.com> Date: Tue, 06 Aug 2013 17:00:08 +0800 From: Bob Liu MIME-Version: 1.0 Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> In-Reply-To: <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Krzysztof Kozlowski Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Tomasz Stanislawski Hi Krzysztof, On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote: > Use page reference counter for zbud pages. The ref counter replaces > zbud_header.under_reclaim flag and ensures that zbud page won't be freed > when zbud_free() is called during reclaim. It allows implementation of > additional reclaim paths. > > The page count is incremented when: > - a handle is created and passed to zswap (in zbud_alloc()), > - user-supplied eviction callback is called (in zbud_reclaim_page()). > > Signed-off-by: Krzysztof Kozlowski > Signed-off-by: Tomasz Stanislawski Looks good to me. 
Reviewed-by: Bob Liu > --- > mm/zbud.c | 150 +++++++++++++++++++++++++++++++++++-------------------------- > 1 file changed, 86 insertions(+), 64 deletions(-) > > diff --git a/mm/zbud.c b/mm/zbud.c > index ad1e781..a8e986f 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -109,7 +109,6 @@ struct zbud_header { > struct list_head lru; > unsigned int first_chunks; > unsigned int last_chunks; > - bool under_reclaim; > }; > > /***************** > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page) > zhdr->last_chunks = 0; > INIT_LIST_HEAD(&zhdr->buddy); > INIT_LIST_HEAD(&zhdr->lru); > - zhdr->under_reclaim = 0; > return zhdr; > } > > -/* Resets the struct page fields and frees the page */ > -static void free_zbud_page(struct zbud_header *zhdr) > -{ > - __free_page(virt_to_page(zhdr)); > -} > - > /* > * Encodes the handle of a particular buddy within a zbud page > * Pool lock should be held as this function accesses first|last_chunks > @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr) > return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1; > } > > +/* > + * Called after zbud_free() or zbud_alloc(). > + * Checks whether given zbud page has to be: > + * - removed from buddied/unbuddied/LRU lists completetely (zbud_free). > + * - moved from buddied to unbuddied list > + * and to beginning of LRU (zbud_alloc, zbud_free), > + * - added to buddied list and LRU (zbud_alloc), > + * > + * The page must be already removed from buddied/unbuddied lists. > + * Must be called under pool->lock. > + */ > +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr) > +{ Nit picker, how about change the name to adjust_lists() or something like this because we don't do any rebalancing. -- Regards, -Bob -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id 083986B0034 for ; Tue, 6 Aug 2013 05:16:40 -0400 (EDT)
Message-ID: <5200BEEF.7060904@oracle.com>
Date: Tue, 06 Aug 2013 17:16:31 +0800
From: Bob Liu
MIME-Version: 1.0
Subject: Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
In-Reply-To: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Krzysztof Kozlowski
Cc: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> Hi,
>
> Currently zbud pages are not movable and they cannot be allocated from CMA
> region. These patches try to address the problem by:
> 1. Adding a new form of reclaim of zbud pages.
> 2. Reclaiming zbud pages during migration and compaction.
> 3. Allocating zbud pages with __GFP_RECLAIMABLE flag.
>
> This reclaim process is different than zbud_reclaim_page(). It acts more
> like swapoff() by trying to unuse pages stored in zbud page and bring
> them back to memory. The standard zbud_reclaim_page() on the other hand
> tries to write them back.

I would prefer to migrate zbud pages directly, if possible, rather than reclaim them during compaction.

>
> One of patches introduces a new flag: PageZbud. This flag is used in
> isolate_migratepages_range() to grab zbud pages and pass them later
> for reclaim. Probably this could be replaced with something
> smarter than a flag used only in one case.
> Any ideas for a better solution are welcome.
>
> This patch set is based on Linux 3.11-rc4.
>
> TODOs:
> 1. Replace PageZbud flag with other solution.
> > Best regards, > Krzysztof Kozlowski > > > Krzysztof Kozlowski (4): > zbud: use page ref counter for zbud pages > mm: split code for unusing swap entries from try_to_unuse > mm: add zbud flag to page flags > mm: reclaim zbud pages on migration and compaction > > include/linux/page-flags.h | 12 ++ > include/linux/swapfile.h | 2 + > include/linux/zbud.h | 11 +- > mm/compaction.c | 20 ++- > mm/internal.h | 1 + > mm/page_alloc.c | 9 ++ > mm/swapfile.c | 354 +++++++++++++++++++++++--------------------- > mm/zbud.c | 301 +++++++++++++++++++++++++------------ > mm/zswap.c | 57 ++++++- > 9 files changed, 499 insertions(+), 268 deletions(-) > -- Regards, -Bob -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx129.postini.com [74.125.245.129]) by kanga.kvack.org (Postfix) with SMTP id 3427E6B0034 for ; Tue, 6 Aug 2013 05:25:36 -0400 (EDT) Received: from eucpsbgm2.samsung.com (unknown [203.254.199.245]) by mailout1.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MR3002MIRIM8N50@mailout1.w1.samsung.com> for linux-mm@kvack.org; Tue, 06 Aug 2013 10:25:34 +0100 (BST) Message-id: <1375781132.2003.4.camel@AMDC1943> Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages From: Krzysztof Kozlowski Date: Tue, 06 Aug 2013 11:25:32 +0200 In-reply-to: <5200BB18.9010105@oracle.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> <5200BB18.9010105@oracle.com> Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Bob Liu Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , 
Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park, Tomasz Stanislawski

Hi Bob,

Thank you for the review.

On Tue, 2013-08-06 at 17:00 +0800, Bob Liu wrote:
> Nit picker, how about change the name to adjust_lists() or something
> like this because we don't do any rebalancing.

OK, I'll change it.

Best regards,
Krzysztof

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx163.postini.com [74.125.245.163]) by kanga.kvack.org (Postfix) with SMTP id 5D95D6B0031 for ; Tue, 6 Aug 2013 09:05:18 -0400 (EDT)
Received: from eucpsbgm2.samsung.com (unknown [203.254.199.245]) by mailout3.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MR4002G31NX1190@mailout3.w1.samsung.com> for linux-mm@kvack.org; Tue, 06 Aug 2013 14:05:16 +0100 (BST)
Message-id: <1375794314.13955.6.camel@AMDC1943>
Subject: Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction
From: Krzysztof Kozlowski
Date: Tue, 06 Aug 2013 15:05:14 +0200
In-reply-to: <5200BEEF.7060904@oracle.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <5200BEEF.7060904@oracle.com>
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 7bit
MIME-version: 1.0
Sender: owner-linux-mm@kvack.org
List-ID:
To: Bob Liu
Cc: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On Tue, 2013-08-06 at 17:16 +0800, Bob Liu wrote:
> On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> > This reclaim process is different than zbud_reclaim_page(). It acts more
> > like swapoff() by trying to unuse pages stored in zbud page and bring
> > them back to memory.
The standard zbud_reclaim_page() on the other hand
> > tries to write them back.
>
> I would prefer to migrate zbud pages directly, if possible, rather than
> reclaim them during compaction.

I think it is possible; however, it would definitely be more complex. In the case of migration, the zswap handles would have to be updated, since they are just virtual addresses. Am I right?

Best regards,
Krzysztof

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx114.postini.com [74.125.245.114]) by kanga.kvack.org (Postfix) with SMTP id B40556B0031 for ; Tue, 6 Aug 2013 12:59:12 -0400 (EDT)
Message-ID: <52012B35.90801@intel.com>
Date: Tue, 06 Aug 2013 09:58:29 -0700
From: Dave Hansen
MIME-Version: 1.0
Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com>
In-Reply-To: <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Krzysztof Kozlowski
Cc: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park

On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> +#ifdef CONFIG_ZBUD
> + /* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> + * during migration/compaction.
> + */
> + PG_zbud,
> +#endif

Do you _really_ need an absolutely new, unshared page flag? The zbud code doesn't really look like it uses any of the space in 'struct page'.
I think you could pretty easily alias PG_zbud=PG_slab, then use the page->{private,slab_cache} (or some other unused field) in 'struct page' to store a cookie to differentiate slab and zbud pages.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id 9FCA86B0031 for ; Tue, 6 Aug 2013 14:51:06 -0400 (EDT)
Date: Tue, 6 Aug 2013 13:51:04 -0500
From: Seth Jennings
Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
Message-ID: <20130806185104.GD5765@medulla.variantweb.net>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Krzysztof Kozlowski
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park, Tomasz Stanislawski

On Tue, Aug 06, 2013 at 08:42:38AM +0200, Krzysztof Kozlowski wrote:
> Use page reference counter for zbud pages. The ref counter replaces
> zbud_header.under_reclaim flag and ensures that zbud page won't be freed
> when zbud_free() is called during reclaim. It allows implementation of
> additional reclaim paths.
>
> The page count is incremented when:
> - a handle is created and passed to zswap (in zbud_alloc()),
> - user-supplied eviction callback is called (in zbud_reclaim_page()).

I like the idea. A few things below. I also agree with Bob on the s/rebalance/adjust/ for rebalance_lists().
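
The reference-counting scheme described in the quoted changelog can be modelled outside the kernel. What follows is a minimal userspace sketch, not code from the patch: struct model_zbud_page, model_get(), model_put() and model_reclaim_race() are hypothetical names, and a plain int stands in for the struct page reference count. It only illustrates why pinning the page before the eviction callback keeps a zbud_free() issued during reclaim from freeing the page underneath the reclaimer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zbud page; not the kernel's struct page. */
struct model_zbud_page {
	int refcount;	/* one per live handle, plus one per active reclaimer */
	bool freed;	/* set once the last reference is dropped */
};

/* Take a reference (models get_zbud_page()). */
static void model_get(struct model_zbud_page *p)
{
	p->refcount++;
}

/*
 * Drop a reference (models put_zbud_page()).
 * Returns true only for the caller that dropped the last reference.
 */
static bool model_put(struct model_zbud_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->freed = true;
		return true;
	}
	return false;
}

/*
 * Replays the sequence "two handles allocated, reclaim pins the page,
 * both handles are freed by the eviction callbacks, reclaim unpins".
 * Returns how many model_put() calls reported freeing the page, or -1
 * if the page disappeared while reclaim still held its pin.
 */
static int model_reclaim_race(void)
{
	/* A freshly allocated page starts at 1: that count is the first handle. */
	struct model_zbud_page page = { .refcount = 1, .freed = false };
	int frees = 0;

	model_get(&page);	/* second buddy allocated: second handle */
	model_get(&page);	/* reclaim pins the page before dropping the lock */

	/* The eviction callbacks end up freeing both handles. */
	frees += model_put(&page);
	frees += model_put(&page);
	if (page.freed)
		return -1;	/* would be a use-after-free for the reclaimer */

	/* Reclaim drops its pin; only now may the page go away. */
	frees += model_put(&page);
	return page.freed ? frees : -1;
}
```

Calling model_reclaim_race() from a test harness exercises the whole sequence; the single owner that sees model_put() return true is the one responsible for the pool accounting (pages_nr in the patch).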
> > Signed-off-by: Krzysztof Kozlowski > Signed-off-by: Tomasz Stanislawski > --- > mm/zbud.c | 150 +++++++++++++++++++++++++++++++++++-------------------------- > 1 file changed, 86 insertions(+), 64 deletions(-) > > diff --git a/mm/zbud.c b/mm/zbud.c > index ad1e781..a8e986f 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -109,7 +109,6 @@ struct zbud_header { > struct list_head lru; > unsigned int first_chunks; > unsigned int last_chunks; > - bool under_reclaim; > }; > > /***************** > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page) > zhdr->last_chunks = 0; > INIT_LIST_HEAD(&zhdr->buddy); > INIT_LIST_HEAD(&zhdr->lru); > - zhdr->under_reclaim = 0; > return zhdr; > } > > -/* Resets the struct page fields and frees the page */ > -static void free_zbud_page(struct zbud_header *zhdr) > -{ > - __free_page(virt_to_page(zhdr)); > -} > - > /* > * Encodes the handle of a particular buddy within a zbud page > * Pool lock should be held as this function accesses first|last_chunks > @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr) > return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1; > } > > +/* > + * Called after zbud_free() or zbud_alloc(). > + * Checks whether given zbud page has to be: > + * - removed from buddied/unbuddied/LRU lists completetely (zbud_free). > + * - moved from buddied to unbuddied list > + * and to beginning of LRU (zbud_alloc, zbud_free), > + * - added to buddied list and LRU (zbud_alloc), > + * > + * The page must be already removed from buddied/unbuddied lists. > + * Must be called under pool->lock. > + */ > +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr) > +{ > + if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > + /* zbud_free() */ > + list_del(&zhdr->lru); > + return; > + } else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) { s/else if/if/ since the if above returns if true. 
> + /* zbud_free() or zbud_alloc() */ > + int freechunks = num_free_chunks(zhdr); > + list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); > + } else { > + /* zbud_alloc() */ > + list_add(&zhdr->buddy, &pool->buddied); > + } > + /* Add/move zbud page to beginning of LRU */ > + if (!list_empty(&zhdr->lru)) > + list_del(&zhdr->lru); We don't want to reinsert to the LRU list if we have called zbud_free() on a zbud page that previously had two buddies. This code causes the zbud page to move to the front of the LRU list which is not what we want. > + list_add(&zhdr->lru, &pool->lru); > +} > + > +/* > + * Increases ref count for zbud page. > + */ > +static void get_zbud_page(struct zbud_header *zhdr) > +{ > + get_page(virt_to_page(zhdr)); > +} > + > +/* > + * Decreases ref count for zbud page and frees the page if it reaches 0 > + * (no external references, e.g. handles). > + * > + * Must be called under pool->lock. > + * > + * Returns 1 if page was freed and 0 otherwise. > + */ > +static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) > +{ > + struct page *page = virt_to_page(zhdr); > + if (put_page_testzero(page)) { > + free_hot_cold_page(page, 0); > + pool->pages_nr--; > + return 1; > + } > + return 0; > +} > + > + > /***************** > * API Functions > *****************/ > @@ -250,7 +301,7 @@ void zbud_destroy_pool(struct zbud_pool *pool) > int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, > unsigned long *handle) > { > - int chunks, i, freechunks; > + int chunks, i; > struct zbud_header *zhdr = NULL; > enum buddy bud; > struct page *page; > @@ -273,6 +324,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, > bud = FIRST; > else > bud = LAST; > + get_zbud_page(zhdr); > goto found; > } > } > @@ -284,6 +336,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, > return -ENOMEM; > spin_lock(&pool->lock); > pool->pages_nr++; > + /* > + * We will be using zhdr instead of page, so > + * don't increase the page 
count. > + */ > zhdr = init_zbud_page(page); > bud = FIRST; > > @@ -293,19 +349,7 @@ found: > else > zhdr->last_chunks = chunks; > > - if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) { > - /* Add to unbuddied list */ > - freechunks = num_free_chunks(zhdr); > - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); > - } else { > - /* Add to buddied list */ > - list_add(&zhdr->buddy, &pool->buddied); > - } > - > - /* Add/move zbud page to beginning of LRU */ > - if (!list_empty(&zhdr->lru)) > - list_del(&zhdr->lru); > - list_add(&zhdr->lru, &pool->lru); > + rebalance_lists(pool, zhdr); > > *handle = encode_handle(zhdr, bud); > spin_unlock(&pool->lock); > @@ -326,10 +370,10 @@ found: > void zbud_free(struct zbud_pool *pool, unsigned long handle) > { > struct zbud_header *zhdr; > - int freechunks; > > spin_lock(&pool->lock); > zhdr = handle_to_zbud_header(handle); > + BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0); Not sure we need this. Maybe, at most, VM_BUG_ON()? > > /* If first buddy, handle will be page aligned */ > if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) > @@ -337,26 +381,9 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > else > zhdr->first_chunks = 0; > > - if (zhdr->under_reclaim) { > - /* zbud page is under reclaim, reclaim will free */ > - spin_unlock(&pool->lock); > - return; > - } > - > - /* Remove from existing buddy list */ > list_del(&zhdr->buddy); > - > - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > - /* zbud page is empty, free */ > - list_del(&zhdr->lru); > - free_zbud_page(zhdr); > - pool->pages_nr--; > - } else { > - /* Add to unbuddied list */ > - freechunks = num_free_chunks(zhdr); > - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); > - } > - > + rebalance_lists(pool, zhdr); > + put_zbud_page(pool, zhdr); > spin_unlock(&pool->lock); > } > > @@ -400,7 +427,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) > */ > int zbud_reclaim_page(struct zbud_pool *pool, unsigned int 
retries) > { > - int i, ret, freechunks; > + int i, ret; > struct zbud_header *zhdr; > unsigned long first_handle = 0, last_handle = 0; > > @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > return -EINVAL; > } > for (i = 0; i < retries; i++) { > + if (list_empty(&pool->lru)) { > + /* > + * LRU was emptied during evict calls in previous > + * iteration but put_zbud_page() returned 0 meaning > + * that someone still holds the page. This may > + * happen when some other mm mechanism increased > + * the page count. > + * In such case we succedded with reclaim. > + */ > + return 0; > + } > zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); > + BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); Again here. Thanks, Seth > + /* Move this last element to beginning of LRU */ > list_del(&zhdr->lru); > - list_del(&zhdr->buddy); > + list_add(&zhdr->lru, &pool->lru); > /* Protect zbud page against free */ > - zhdr->under_reclaim = true; > + get_zbud_page(zhdr); > /* > * We need encode the handles before unlocking, since we can > * race with free that will set (first|last)_chunks to 0 > @@ -441,28 +481,10 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > } > next: > spin_lock(&pool->lock); > - zhdr->under_reclaim = false; > - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { > - /* > - * Both buddies are now free, free the zbud page and > - * return success. 
> - */ > - free_zbud_page(zhdr); > - pool->pages_nr--; > + if (put_zbud_page(pool, zhdr)) { > spin_unlock(&pool->lock); > return 0; > - } else if (zhdr->first_chunks == 0 || > - zhdr->last_chunks == 0) { > - /* add to unbuddied list */ > - freechunks = num_free_chunks(zhdr); > - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); > - } else { > - /* add to buddied list */ > - list_add(&zhdr->buddy, &pool->buddied); > } > - > - /* add to beginning of LRU */ > - list_add(&zhdr->lru, &pool->lru); > } > spin_unlock(&pool->lock); > return -EAGAIN; > -- > 1.7.9.5 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id 95AF06B0031 for ; Tue, 6 Aug 2013 14:58:00 -0400 (EDT) Date: Tue, 6 Aug 2013 13:57:59 -0500 From: Seth Jennings Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags Message-ID: <20130806185759.GE5765@medulla.variantweb.net> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> Sender: owner-linux-mm@kvack.org List-ID: To: Krzysztof Kozlowski Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park On Tue, Aug 06, 2013 at 08:42:40AM +0200, Krzysztof Kozlowski wrote: > Add PageZbud flag to page flags to distinguish pages allocated in zbud. > Currently these pages do not have any flags set. Yeah, using a page flags for zbud is probably not going to be acceptable. We'll have to find some other way to identify zbud pages. 
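
One possible way to identify zbud pages without a dedicated flag is the aliasing approach Dave Hansen raises elsewhere in this thread: share an existing flag bit and keep a cookie in an otherwise unused 'struct page' field. The following is a rough userspace sketch under that assumption only. struct model_page, MODEL_PG_SLAB, MODEL_ZBUD_COOKIE and the helper names are hypothetical and do not correspond to kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PG_SLAB		(1u << 0)	/* stand-in for the shared flag bit */
#define MODEL_ZBUD_COOKIE	((uintptr_t)0x7a627564u)	/* arbitrary: "zbud" in ASCII */

/* Hypothetical, heavily reduced stand-in for struct page. */
struct model_page {
	unsigned int flags;
	uintptr_t private_data;	/* models a field the page owner does not otherwise use */
};

/* Mark a page as belonging to zbud: shared flag bit plus cookie. */
static void model_set_page_zbud(struct model_page *page)
{
	page->flags |= MODEL_PG_SLAB;
	page->private_data = MODEL_ZBUD_COOKIE;
}

/* Undo the marking before the page is handed back to the allocator. */
static void model_clear_page_zbud(struct model_page *page)
{
	page->flags &= ~MODEL_PG_SLAB;
	page->private_data = 0;
}

/* A page counts as zbud only if both the flag bit and the cookie match. */
static bool model_page_is_zbud(const struct model_page *page)
{
	return (page->flags & MODEL_PG_SLAB) &&
	       page->private_data == MODEL_ZBUD_COOKIE;
}
```

The check is only as reliable as the cookie: a real implementation would need a value that the other user of the shared flag can never store in the same field.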
Seth From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx127.postini.com [74.125.245.127]) by kanga.kvack.org (Postfix) with SMTP id 1614C6B006C for ; Wed, 7 Aug 2013 03:04:06 -0400 (EDT) Received: from eucpsbgm2.samsung.com (unknown [203.254.199.245]) by mailout1.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MR500IZ3FMB2D30@mailout1.w1.samsung.com> for linux-mm@kvack.org; Wed, 07 Aug 2013 08:04:04 +0100 (BST) Message-id: <1375859042.17079.1.camel@AMDC1943> Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags From: Krzysztof Kozlowski Date: Wed, 07 Aug 2013 09:04:02 +0200 In-reply-to: <52012B35.90801@intel.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> <52012B35.90801@intel.com> Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park On Tue, 2013-08-06 at 09:58 -0700, Dave Hansen wrote: > On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote: > > +#ifdef CONFIG_ZBUD > > + /* Allocated by zbud. Flag is necessary to find zbud pages to unuse > > + * during migration/compaction. > > + */ > > + PG_zbud, > > +#endif > > Do you _really_ need an absolutely new, unshared page flag? > The zbud code doesn't really look like it uses any of the space in > 'struct page'.
> > I think you could pretty easily alias PG_zbud=PG_slab, then use the > page->{private,slab_cache} (or some other unused field) in 'struct page' > to store a cookie to differentiate slab and zbud pages. Thanks for the idea, I will try that. Best regards, Krzysztof From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx179.postini.com [74.125.245.179]) by kanga.kvack.org (Postfix) with SMTP id A5ADB6B0075 for ; Wed, 7 Aug 2013 03:31:54 -0400 (EDT) Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244]) by mailout4.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MR500ER6GWMOZ40@mailout4.w1.samsung.com> for linux-mm@kvack.org; Wed, 07 Aug 2013 08:31:52 +0100 (BST) Message-id: <1375860711.17079.16.camel@AMDC1943> Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages From: Krzysztof Kozlowski Date: Wed, 07 Aug 2013 09:31:51 +0200 In-reply-to: <20130806185104.GD5765@medulla.variantweb.net> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> <20130806185104.GD5765@medulla.variantweb.net> Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Seth Jennings Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Tomasz Stanislawski , Bob Liu Hi Seth, On Tue, 2013-08-06 at 13:51 -0500, Seth Jennings wrote: > I like the idea. A few things below. Also agree with Bob on the > s/rebalance/adjust/ for rebalance_lists(). OK. > s/else if/if/ since the if above returns if true. Sure.
> > + /* zbud_free() or zbud_alloc() */ > > + int freechunks = num_free_chunks(zhdr); > > + list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); > > + } else { > > + /* zbud_alloc() */ > > + list_add(&zhdr->buddy, &pool->buddied); > > + } > > + /* Add/move zbud page to beginning of LRU */ > > + if (!list_empty(&zhdr->lru)) > > + list_del(&zhdr->lru); > > We don't want to reinsert to the LRU list if we have called zbud_free() > on a zbud page that previously had two buddies. This code causes the > zbud page to move to the front of the LRU list which is not what we want. Right, I'll fix it. > > @@ -326,10 +370,10 @@ found: > > void zbud_free(struct zbud_pool *pool, unsigned long handle) > > { > > struct zbud_header *zhdr; > > - int freechunks; > > > > spin_lock(&pool->lock); > > zhdr = handle_to_zbud_header(handle); > > + BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0); > > Not sure we need this. Maybe, at most, VM_BUG_ON()? Actually, it is a leftover from debugging, so I don't mind removing it completely. > > @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > > return -EINVAL; > > } > > for (i = 0; i < retries; i++) { > > + if (list_empty(&pool->lru)) { > > + /* > > + * LRU was emptied during evict calls in previous > > + * iteration but put_zbud_page() returned 0 meaning > > + * that someone still holds the page. This may > > + * happen when some other mm mechanism increased > > + * the page count. > > + * In such case we succedded with reclaim. > > + */ > > + return 0; > > + } > > zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); > > + BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); > > Again here. I agree. Thanks for the comments, Krzysztof
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id 40F546B0032 for ; Thu, 8 Aug 2013 03:26:38 -0400 (EDT) Received: from eucpsbgm1.samsung.com (unknown [203.254.199.244]) by mailout4.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MR700LK5BBWDH50@mailout4.w1.samsung.com> for linux-mm@kvack.org; Thu, 08 Aug 2013 08:26:36 +0100 (BST) Message-id: <1375946794.25843.1.camel@AMDC1943> Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags From: Krzysztof Kozlowski Date: Thu, 08 Aug 2013 09:26:34 +0200 In-reply-to: <52012B35.90801@intel.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> <52012B35.90801@intel.com> Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park Hi, On Tue, 2013-08-06 at 09:58 -0700, Dave Hansen wrote: > On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote: > > +#ifdef CONFIG_ZBUD > > + /* Allocated by zbud. Flag is necessary to find zbud pages to unuse > > + * during migration/compaction. > > + */ > > + PG_zbud, > > +#endif > > Do you _really_ need an absolutely new, unshared page flag? > The zbud code doesn't really look like it uses any of the space in > 'struct page'. > > I think you could pretty easily alias PG_zbud=PG_slab, then use the > page->{private,slab_cache} (or some other unused field) in 'struct page' > to store a cookie to differentiate slab and zbud pages. How about using page->_mapcount with a negative value (-129), just like PageBuddy() does?
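The _mapcount idea above can be sketched outside the kernel. What follows is a minimal userspace model, not kernel code: struct page_model, the MODEL_* constants and the model_* helpers are hypothetical names made up for illustration. It assumes that an unmapped page carries a map count of -1 and that free buddy pages are marked with -128 (an assumption about kernels of that era, not something stated in this thread); only the -129 value comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the single struct page field used here. */
struct page_model {
	int mapcount;	/* models page->_mapcount */
};

/* Illustrative values; only -129 is taken from the proposal above. */
#define MODEL_MAPCOUNT_UNMAPPED	(-1)	/* assumed: page with no mappings */
#define MODEL_MAPCOUNT_BUDDY	(-128)	/* assumed: marker for free buddy pages */
#define MODEL_MAPCOUNT_ZBUD	(-129)	/* proposed marker for zbud pages */

/* True if the page carries the (assumed) buddy marker. */
bool model_page_buddy(const struct page_model *page)
{
	return page->mapcount == MODEL_MAPCOUNT_BUDDY;
}

/* True if the page carries the proposed zbud marker. */
bool model_page_zbud(const struct page_model *page)
{
	return page->mapcount == MODEL_MAPCOUNT_ZBUD;
}

/* Mark a page as owned by zbud; refuses pages that are mapped or already marked. */
bool model_set_page_zbud(struct page_model *page)
{
	if (page->mapcount != MODEL_MAPCOUNT_UNMAPPED)
		return false;
	page->mapcount = MODEL_MAPCOUNT_ZBUD;
	return true;
}

/* Drop the marker before the page would go back to the page allocator. */
void model_clear_page_zbud(struct page_model *page)
{
	assert(model_page_zbud(page));
	page->mapcount = MODEL_MAPCOUNT_UNMAPPED;
}
```

The appeal of this scheme is that it needs no new page flag: the test is a single integer comparison, and in this model the marker cannot be confused with a mapped page, since mapped pages have a map count of 0 or more.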
Best regards, Krzysztof
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755320Ab3HFGnJ (ORCPT ); Tue, 6 Aug 2013 02:43:09 -0400 Received: from mailout1.w1.samsung.com ([210.118.77.11]:33889 "EHLO mailout1.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755253Ab3HFGnF (ORCPT ); Tue, 6 Aug 2013 02:43:05 -0400 X-AuditID: cbfec7f5-b7f5f6d00000105f-57-52009af54ce2 From: Krzysztof Kozlowski To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Cc:
Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Krzysztof Kozlowski , Tomasz Stanislawski Subject: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages Date: Tue, 06 Aug 2013 08:42:38 +0200 Message-id: <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> X-Mailer: git-send-email 1.7.9.5 In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Use the page reference counter for zbud pages. The ref counter replaces the zbud_header.under_reclaim flag and ensures that a zbud page won't be freed when zbud_free() is called during reclaim. It also allows implementing additional reclaim paths. The page count is incremented when: - a handle is created and passed to zswap (in zbud_alloc()), - the user-supplied eviction callback is called (in zbud_reclaim_page()).
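The counting rules in the commit message above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: struct zbud_page_model and the model_* functions are hypothetical stand-ins for struct page, get_zbud_page() and put_zbud_page(), and the initial count of 1 is assumed to stand for the first handle, because a freshly allocated page already holds one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a zbud page; only the reference count is tracked. */
struct zbud_page_model {
	int refcount;	/* models the struct page reference count */
	bool freed;	/* set once the last reference is dropped */
};

/* A new page starts with one reference, standing for its first handle. */
void model_init_page(struct zbud_page_model *p)
{
	p->refcount = 1;
	p->freed = false;
}

/* Models get_zbud_page(): taken for a second handle or by reclaim. */
void model_get_page(struct zbud_page_model *p)
{
	assert(!p->freed);
	p->refcount++;
}

/*
 * Models put_zbud_page(): returns 1 if this call dropped the last
 * reference (the page is "freed"), 0 if someone still holds it.
 */
int model_put_page(struct zbud_page_model *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->freed = true;
		return 1;
	}
	return 0;
}
```

With two handles and a reclaim in progress the count is 3. Two frees issued from the eviction callback bring it down to 1, and only the final put at the end of the reclaim iteration releases the page, which is the case the under_reclaim flag used to cover.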
Signed-off-by: Krzysztof Kozlowski Signed-off-by: Tomasz Stanislawski --- mm/zbud.c | 150 +++++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 86 insertions(+), 64 deletions(-) diff --git a/mm/zbud.c b/mm/zbud.c index ad1e781..a8e986f 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -109,7 +109,6 @@ struct zbud_header { struct list_head lru; unsigned int first_chunks; unsigned int last_chunks; - bool under_reclaim; }; /***************** @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page) zhdr->last_chunks = 0; INIT_LIST_HEAD(&zhdr->buddy); INIT_LIST_HEAD(&zhdr->lru); - zhdr->under_reclaim = 0; return zhdr; } -/* Resets the struct page fields and frees the page */ -static void free_zbud_page(struct zbud_header *zhdr) -{ - __free_page(virt_to_page(zhdr)); -} - /* * Encodes the handle of a particular buddy within a zbud page * Pool lock should be held as this function accesses first|last_chunks @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr) return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1; } +/* + * Called after zbud_free() or zbud_alloc(). + * Checks whether given zbud page has to be: + * - removed from buddied/unbuddied/LRU lists completetely (zbud_free). + * - moved from buddied to unbuddied list + * and to beginning of LRU (zbud_alloc, zbud_free), + * - added to buddied list and LRU (zbud_alloc), + * + * The page must be already removed from buddied/unbuddied lists. + * Must be called under pool->lock. 
+ */ +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr) +{ + if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { + /* zbud_free() */ + list_del(&zhdr->lru); + return; + } else if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) { + /* zbud_free() or zbud_alloc() */ + int freechunks = num_free_chunks(zhdr); + list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); + } else { + /* zbud_alloc() */ + list_add(&zhdr->buddy, &pool->buddied); + } + /* Add/move zbud page to beginning of LRU */ + if (!list_empty(&zhdr->lru)) + list_del(&zhdr->lru); + list_add(&zhdr->lru, &pool->lru); +} + +/* + * Increases ref count for zbud page. + */ +static void get_zbud_page(struct zbud_header *zhdr) +{ + get_page(virt_to_page(zhdr)); +} + +/* + * Decreases ref count for zbud page and frees the page if it reaches 0 + * (no external references, e.g. handles). + * + * Must be called under pool->lock. + * + * Returns 1 if page was freed and 0 otherwise. + */ +static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) +{ + struct page *page = virt_to_page(zhdr); + if (put_page_testzero(page)) { + free_hot_cold_page(page, 0); + pool->pages_nr--; + return 1; + } + return 0; +} + + /***************** * API Functions *****************/ @@ -250,7 +301,7 @@ void zbud_destroy_pool(struct zbud_pool *pool) int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, unsigned long *handle) { - int chunks, i, freechunks; + int chunks, i; struct zbud_header *zhdr = NULL; enum buddy bud; struct page *page; @@ -273,6 +324,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, bud = FIRST; else bud = LAST; + get_zbud_page(zhdr); goto found; } } @@ -284,6 +336,10 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, return -ENOMEM; spin_lock(&pool->lock); pool->pages_nr++; + /* + * We will be using zhdr instead of page, so + * don't increase the page count. 
+ */ zhdr = init_zbud_page(page); bud = FIRST; @@ -293,19 +349,7 @@ found: else zhdr->last_chunks = chunks; - if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0) { - /* Add to unbuddied list */ - freechunks = num_free_chunks(zhdr); - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); - } else { - /* Add to buddied list */ - list_add(&zhdr->buddy, &pool->buddied); - } - - /* Add/move zbud page to beginning of LRU */ - if (!list_empty(&zhdr->lru)) - list_del(&zhdr->lru); - list_add(&zhdr->lru, &pool->lru); + rebalance_lists(pool, zhdr); *handle = encode_handle(zhdr, bud); spin_unlock(&pool->lock); @@ -326,10 +370,10 @@ found: void zbud_free(struct zbud_pool *pool, unsigned long handle) { struct zbud_header *zhdr; - int freechunks; spin_lock(&pool->lock); zhdr = handle_to_zbud_header(handle); + BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0); /* If first buddy, handle will be page aligned */ if ((handle - ZHDR_SIZE_ALIGNED) & ~PAGE_MASK) @@ -337,26 +381,9 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) else zhdr->first_chunks = 0; - if (zhdr->under_reclaim) { - /* zbud page is under reclaim, reclaim will free */ - spin_unlock(&pool->lock); - return; - } - - /* Remove from existing buddy list */ list_del(&zhdr->buddy); - - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { - /* zbud page is empty, free */ - list_del(&zhdr->lru); - free_zbud_page(zhdr); - pool->pages_nr--; - } else { - /* Add to unbuddied list */ - freechunks = num_free_chunks(zhdr); - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); - } - + rebalance_lists(pool, zhdr); + put_zbud_page(pool, zhdr); spin_unlock(&pool->lock); } @@ -400,7 +427,7 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) */ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) { - int i, ret, freechunks; + int i, ret; struct zbud_header *zhdr; unsigned long first_handle = 0, last_handle = 0; @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, 
unsigned int retries) return -EINVAL; } for (i = 0; i < retries; i++) { + if (list_empty(&pool->lru)) { + /* + * LRU was emptied during evict calls in previous + * iteration but put_zbud_page() returned 0 meaning + * that someone still holds the page. This may + * happen when some other mm mechanism increased + * the page count. + * In such case we succedded with reclaim. + */ + return 0; + } zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); + BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); + /* Move this last element to beginning of LRU */ list_del(&zhdr->lru); - list_del(&zhdr->buddy); + list_add(&zhdr->lru, &pool->lru); /* Protect zbud page against free */ - zhdr->under_reclaim = true; + get_zbud_page(zhdr); /* * We need encode the handles before unlocking, since we can * race with free that will set (first|last)_chunks to 0 @@ -441,28 +481,10 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) } next: spin_lock(&pool->lock); - zhdr->under_reclaim = false; - if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) { - /* - * Both buddies are now free, free the zbud page and - * return success. 
- */ - free_zbud_page(zhdr); - pool->pages_nr--; + if (put_zbud_page(pool, zhdr)) { spin_unlock(&pool->lock); return 0; - } else if (zhdr->first_chunks == 0 || - zhdr->last_chunks == 0) { - /* add to unbuddied list */ - freechunks = num_free_chunks(zhdr); - list_add(&zhdr->buddy, &pool->unbuddied[freechunks]); - } else { - /* add to buddied list */ - list_add(&zhdr->buddy, &pool->buddied); } - - /* add to beginning of LRU */ - list_add(&zhdr->lru, &pool->lru); } spin_unlock(&pool->lock); return -EAGAIN; -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755465Ab3HFGnQ (ORCPT ); Tue, 6 Aug 2013 02:43:16 -0400 Received: from mailout3.w1.samsung.com ([210.118.77.13]:34479 "EHLO mailout3.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755253Ab3HFGnK (ORCPT ); Tue, 6 Aug 2013 02:43:10 -0400 X-AuditID: cbfec7f4-b7f5f6d000000ff6-d1-52009afd8a76 From: Krzysztof Kozlowski To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Krzysztof Kozlowski Subject: [RFC PATCH 2/4] mm: split code for unusing swap entries from try_to_unuse Date: Tue, 06 Aug 2013 08:42:39 +0200 Message-id: <1375771361-8388-3-git-send-email-k.kozlowski@samsung.com> X-Mailer: git-send-email 1.7.9.5 In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com>
Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Move the code for unusing swap entries out of the loop in try_to_unuse() into a separate function: try_to_unuse_swp_entry(). Export this new function in swapfile.h, just as try_to_unuse() is exported. Signed-off-by: Krzysztof Kozlowski --- include/linux/swapfile.h | 2 + mm/swapfile.c | 354 ++++++++++++++++++++++++---------------------- 2 files changed, 187 insertions(+), 169 deletions(-) diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h index e282624..68c24a7 100644 --- a/include/linux/swapfile.h +++ b/include/linux/swapfile.h @@ -9,5 +9,7 @@ extern spinlock_t swap_lock; extern struct swap_list_t swap_list; extern struct swap_info_struct *swap_info[]; extern int try_to_unuse(unsigned int, bool, unsigned long); +extern int try_to_unuse_swp_entry(struct mm_struct **start_mm, + struct swap_info_struct *si, swp_entry_t entry); #endif /* _LINUX_SWAPFILE_H */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 36af6ee..331d0b8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1100,6 +1100,189 @@ static unsigned int find_next_to_unuse(struct swap_info_struct *si, } /* + * Returns: + * - negative on error, + * - 0 on success (entry unused) + */ +int try_to_unuse_swp_entry(struct mm_struct **start_mm, + struct swap_info_struct *si, swp_entry_t entry) +{ + pgoff_t offset = swp_offset(entry); + unsigned char *swap_map; + unsigned char swcount; + struct page *page; + int retval = 0; + + if (signal_pending(current)) { + retval = -EINTR; + goto out; + } + + /* + * Get a page for the entry, using the existing swap + *
cache page if there is one. Otherwise, get a clean + * page and read the swap into it. + */ + swap_map = &si->swap_map[offset]; + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, NULL, 0); + if (!page) { + /* + * Either swap_duplicate() failed because entry + * has been freed independently, and will not be + * reused since sys_swapoff() already disabled + * allocation from here, or alloc_page() failed. + */ + if (!*swap_map) + retval = 0; + else + retval = -ENOMEM; + goto out; + } + + /* + * Don't hold on to start_mm if it looks like exiting. + */ + if (atomic_read(&(*start_mm)->mm_users) == 1) { + mmput(*start_mm); + *start_mm = &init_mm; + atomic_inc(&init_mm.mm_users); + } + + /* + * Wait for and lock page. When do_swap_page races with + * try_to_unuse, do_swap_page can handle the fault much + * faster than try_to_unuse can locate the entry. This + * apparently redundant "wait_on_page_locked" lets try_to_unuse + * defer to do_swap_page in such a case - in some tests, + * do_swap_page and try_to_unuse repeatedly compete. + */ + wait_on_page_locked(page); + wait_on_page_writeback(page); + lock_page(page); + wait_on_page_writeback(page); + + /* + * Remove all references to entry. 
+ */ + swcount = *swap_map; + if (swap_count(swcount) == SWAP_MAP_SHMEM) { + retval = shmem_unuse(entry, page); + VM_BUG_ON(retval > 0); + /* page has already been unlocked and released */ + goto out; + } + if (swap_count(swcount) && *start_mm != &init_mm) + retval = unuse_mm(*start_mm, entry, page); + + if (swap_count(*swap_map)) { + int set_start_mm = (*swap_map >= swcount); + struct list_head *p = &(*start_mm)->mmlist; + struct mm_struct *new_start_mm = *start_mm; + struct mm_struct *prev_mm = *start_mm; + struct mm_struct *mm; + + atomic_inc(&new_start_mm->mm_users); + atomic_inc(&prev_mm->mm_users); + spin_lock(&mmlist_lock); + while (swap_count(*swap_map) && !retval && + (p = p->next) != &(*start_mm)->mmlist) { + mm = list_entry(p, struct mm_struct, mmlist); + if (!atomic_inc_not_zero(&mm->mm_users)) + continue; + spin_unlock(&mmlist_lock); + mmput(prev_mm); + prev_mm = mm; + + cond_resched(); + + swcount = *swap_map; + if (!swap_count(swcount)) /* any usage ? */ + ; + else if (mm == &init_mm) + set_start_mm = 1; + else + retval = unuse_mm(mm, entry, page); + + if (set_start_mm && *swap_map < swcount) { + mmput(new_start_mm); + atomic_inc(&mm->mm_users); + new_start_mm = mm; + set_start_mm = 0; + } + spin_lock(&mmlist_lock); + } + spin_unlock(&mmlist_lock); + mmput(prev_mm); + mmput(*start_mm); + *start_mm = new_start_mm; + } + if (retval) { + unlock_page(page); + page_cache_release(page); + goto out; + } + + /* + * If a reference remains (rare), we would like to leave + * the page in the swap cache; but try_to_unmap could + * then re-duplicate the entry once we drop page lock, + * so we might loop indefinitely; also, that page could + * not be swapped out to other storage meanwhile. So: + * delete from cache even if there's another reference, + * after ensuring that the data has been saved to disk - + * since if the reference remains (rarer), it will be + * read from disk into another page. 
Splitting into two + * pages would be incorrect if swap supported "shared + * private" pages, but they are handled by tmpfs files. + * + * Given how unuse_vma() targets one particular offset + * in an anon_vma, once the anon_vma has been determined, + * this splitting happens to be just what is needed to + * handle where KSM pages have been swapped out: re-reading + * is unnecessarily slow, but we can fix that later on. + */ + if (swap_count(*swap_map) && + PageDirty(page) && PageSwapCache(page)) { + struct writeback_control wbc = { + .sync_mode = WB_SYNC_NONE, + }; + + swap_writepage(page, &wbc); + lock_page(page); + wait_on_page_writeback(page); + } + + /* + * It is conceivable that a racing task removed this page from + * swap cache just before we acquired the page lock at the top, + * or while we dropped it in unuse_mm(). The page might even + * be back in swap cache on another swap area: that we must not + * delete, since it may not have been written out to swap yet. + */ + if (PageSwapCache(page) && + likely(page_private(page) == entry.val)) + delete_from_swap_cache(page); + + /* + * So we could skip searching mms once swap count went + * to 1, we did not mark any present ptes as dirty: must + * mark page dirty so shrink_page_list will preserve it. + */ + SetPageDirty(page); + unlock_page(page); + page_cache_release(page); + + /* + * Make sure that we aren't completely killing + * interactive performance. + */ + cond_resched(); +out: + return retval; +} + +/* * We completely avoid races by reading each swap page in advance, * and then search for the process using it. All the necessary * page table adjustments can then be made atomically. 
@@ -1112,10 +1295,6 @@ int try_to_unuse(unsigned int type, bool frontswap, { struct swap_info_struct *si = swap_info[type]; struct mm_struct *start_mm; - unsigned char *swap_map; - unsigned char swcount; - struct page *page; - swp_entry_t entry; unsigned int i = 0; int retval = 0; @@ -1142,172 +1321,9 @@ int try_to_unuse(unsigned int type, bool frontswap, * there are races when an instance of an entry might be missed. */ while ((i = find_next_to_unuse(si, i, frontswap)) != 0) { - if (signal_pending(current)) { - retval = -EINTR; - break; - } - - /* - * Get a page for the entry, using the existing swap - * cache page if there is one. Otherwise, get a clean - * page and read the swap into it. - */ - swap_map = &si->swap_map[i]; - entry = swp_entry(type, i); - page = read_swap_cache_async(entry, - GFP_HIGHUSER_MOVABLE, NULL, 0); - if (!page) { - /* - * Either swap_duplicate() failed because entry - * has been freed independently, and will not be - * reused since sys_swapoff() already disabled - * allocation from here, or alloc_page() failed. - */ - if (!*swap_map) - continue; - retval = -ENOMEM; - break; - } - - /* - * Don't hold on to start_mm if it looks like exiting. - */ - if (atomic_read(&start_mm->mm_users) == 1) { - mmput(start_mm); - start_mm = &init_mm; - atomic_inc(&init_mm.mm_users); - } - - /* - * Wait for and lock page. When do_swap_page races with - * try_to_unuse, do_swap_page can handle the fault much - * faster than try_to_unuse can locate the entry. This - * apparently redundant "wait_on_page_locked" lets try_to_unuse - * defer to do_swap_page in such a case - in some tests, - * do_swap_page and try_to_unuse repeatedly compete. - */ - wait_on_page_locked(page); - wait_on_page_writeback(page); - lock_page(page); - wait_on_page_writeback(page); - - /* - * Remove all references to entry. 
- */ - swcount = *swap_map; - if (swap_count(swcount) == SWAP_MAP_SHMEM) { - retval = shmem_unuse(entry, page); - /* page has already been unlocked and released */ - if (retval < 0) - break; - continue; - } - if (swap_count(swcount) && start_mm != &init_mm) - retval = unuse_mm(start_mm, entry, page); - - if (swap_count(*swap_map)) { - int set_start_mm = (*swap_map >= swcount); - struct list_head *p = &start_mm->mmlist; - struct mm_struct *new_start_mm = start_mm; - struct mm_struct *prev_mm = start_mm; - struct mm_struct *mm; - - atomic_inc(&new_start_mm->mm_users); - atomic_inc(&prev_mm->mm_users); - spin_lock(&mmlist_lock); - while (swap_count(*swap_map) && !retval && - (p = p->next) != &start_mm->mmlist) { - mm = list_entry(p, struct mm_struct, mmlist); - if (!atomic_inc_not_zero(&mm->mm_users)) - continue; - spin_unlock(&mmlist_lock); - mmput(prev_mm); - prev_mm = mm; - - cond_resched(); - - swcount = *swap_map; - if (!swap_count(swcount)) /* any usage ? */ - ; - else if (mm == &init_mm) - set_start_mm = 1; - else - retval = unuse_mm(mm, entry, page); - - if (set_start_mm && *swap_map < swcount) { - mmput(new_start_mm); - atomic_inc(&mm->mm_users); - new_start_mm = mm; - set_start_mm = 0; - } - spin_lock(&mmlist_lock); - } - spin_unlock(&mmlist_lock); - mmput(prev_mm); - mmput(start_mm); - start_mm = new_start_mm; - } - if (retval) { - unlock_page(page); - page_cache_release(page); + if (try_to_unuse_swp_entry(&start_mm, si, + swp_entry(type, i)) != 0) break; - } - - /* - * If a reference remains (rare), we would like to leave - * the page in the swap cache; but try_to_unmap could - * then re-duplicate the entry once we drop page lock, - * so we might loop indefinitely; also, that page could - * not be swapped out to other storage meanwhile. So: - * delete from cache even if there's another reference, - * after ensuring that the data has been saved to disk - - * since if the reference remains (rarer), it will be - * read from disk into another page. 
Splitting into two - * pages would be incorrect if swap supported "shared - * private" pages, but they are handled by tmpfs files. - * - * Given how unuse_vma() targets one particular offset - * in an anon_vma, once the anon_vma has been determined, - * this splitting happens to be just what is needed to - * handle where KSM pages have been swapped out: re-reading - * is unnecessarily slow, but we can fix that later on. - */ - if (swap_count(*swap_map) && - PageDirty(page) && PageSwapCache(page)) { - struct writeback_control wbc = { - .sync_mode = WB_SYNC_NONE, - }; - - swap_writepage(page, &wbc); - lock_page(page); - wait_on_page_writeback(page); - } - - /* - * It is conceivable that a racing task removed this page from - * swap cache just before we acquired the page lock at the top, - * or while we dropped it in unuse_mm(). The page might even - * be back in swap cache on another swap area: that we must not - * delete, since it may not have been written out to swap yet. - */ - if (PageSwapCache(page) && - likely(page_private(page) == entry.val)) - delete_from_swap_cache(page); - - /* - * So we could skip searching mms once swap count went - * to 1, we did not mark any present ptes as dirty: must - * mark page dirty so shrink_page_list will preserve it. - */ - SetPageDirty(page); - unlock_page(page); - page_cache_release(page); - - /* - * Make sure that we aren't completely killing - * interactive performance. 
- */ - cond_resched(); if (frontswap && pages_to_unuse > 0) { if (!--pages_to_unuse) break; -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755412Ab3HFGnN (ORCPT ); Tue, 6 Aug 2013 02:43:13 -0400 Received: from mailout3.w1.samsung.com ([210.118.77.13]:34479 "EHLO mailout3.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755333Ab3HFGnL (ORCPT ); Tue, 6 Aug 2013 02:43:11 -0400 X-AuditID: cbfec7f4-b7f5f6d000000ff6-da-52009afe30e7 From: Krzysztof Kozlowski To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Krzysztof Kozlowski Subject: [RFC PATCH 3/4] mm: add zbud flag to page flags Date: Tue, 06 Aug 2013 08:42:40 +0200 Message-id: <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> X-Mailer: git-send-email 1.7.9.5 In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add PageZbud flag to page flags to distinguish
pages allocated in zbud. Currently these pages do not have any flags set. Signed-off-by: Krzysztof Kozlowski --- include/linux/page-flags.h | 12 ++++++++++++ mm/page_alloc.c | 3 +++ mm/zbud.c | 4 ++++ 3 files changed, 19 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 6d53675..5b8b61a6 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -109,6 +109,12 @@ enum pageflags { #ifdef CONFIG_TRANSPARENT_HUGEPAGE PG_compound_lock, #endif +#ifdef CONFIG_ZBUD + /* Allocated by zbud. Flag is necessary to find zbud pages to unuse + * during migration/compaction. + */ + PG_zbud, +#endif __NR_PAGEFLAGS, /* Filesystems */ @@ -275,6 +281,12 @@ PAGEFLAG_FALSE(HWPoison) #define __PG_HWPOISON 0 #endif +#ifdef CONFIG_ZBUD +PAGEFLAG(Zbud, zbud) +#else +PAGEFLAG_FALSE(Zbud) +#endif + u64 stable_page_flags(struct page *page); static inline int PageUptodate(struct page *page) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b100255..1a120fb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6345,6 +6345,9 @@ static const struct trace_print_flags pageflag_names[] = { #ifdef CONFIG_TRANSPARENT_HUGEPAGE {1UL << PG_compound_lock, "compound_lock" }, #endif +#ifdef CONFIG_ZBUD + {1UL << PG_zbud, "zbud" }, +#endif }; static void dump_page_flags(unsigned long flags) diff --git a/mm/zbud.c b/mm/zbud.c index a8e986f..a452949 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -230,7 +230,10 @@ static void get_zbud_page(struct zbud_header *zhdr) static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) { struct page *page = virt_to_page(zhdr); + BUG_ON(!PageZbud(page)); + if (put_page_testzero(page)) { + ClearPageZbud(page); free_hot_cold_page(page, 0); pool->pages_nr--; return 1; @@ -341,6 +344,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, * don't increase the page count. 
*/ zhdr = init_zbud_page(page); + SetPageZbud(page); bud = FIRST; found: -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755433Ab3HFGnf (ORCPT ); Tue, 6 Aug 2013 02:43:35 -0400 Received: from mailout3.w1.samsung.com ([210.118.77.13]:34486 "EHLO mailout3.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755350Ab3HFGnN (ORCPT ); Tue, 6 Aug 2013 02:43:13 -0400 X-AuditID: cbfec7f4-b7f5f6d000000ff6-e4-52009aff6181 From: Krzysztof Kozlowski To: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Cc: Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Krzysztof Kozlowski Subject: [RFC PATCH 4/4] mm: reclaim zbud pages on migration and compaction Date: Tue, 06 Aug 2013 08:42:41 +0200 Message-id: <1375771361-8388-5-git-send-email-k.kozlowski@samsung.com> X-Mailer: git-send-email 1.7.9.5 In-reply-to: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Reclaim zbud pages during migration and
compaction by unusing stored data. This allows adding the __GFP_RECLAIMABLE flag when allocating zbud pages, so the CMA pool can effectively be used for zswap. zbud pages are not movable and are not stored under any LRU (except zbud's LRU). The PageZbud flag is used in isolate_migratepages_range() to grab zbud pages and pass them later for reclaim. This reclaim process is different from zbud_reclaim_page(). It acts more like swapoff() by trying to unuse pages stored in a zbud page and bring them back to memory. The standard zbud_reclaim_page(), on the other hand, tries to write them back. Signed-off-by: Krzysztof Kozlowski --- include/linux/zbud.h | 11 +++- mm/compaction.c | 20 ++++++- mm/internal.h | 1 + mm/page_alloc.c | 6 ++ mm/zbud.c | 163 +++++++++++++++++++++++++++++++++++++++----------- mm/zswap.c | 57 ++++++++++++++++-- 6 files changed, 215 insertions(+), 43 deletions(-) diff --git a/include/linux/zbud.h b/include/linux/zbud.h index 2571a5c..57ee85d 100644 --- a/include/linux/zbud.h +++ b/include/linux/zbud.h @@ -5,8 +5,14 @@ struct zbud_pool; +/** + * Template for functions called during reclaim.
+ */ +typedef int (*evict_page_t)(struct zbud_pool *pool, unsigned long handle); + struct zbud_ops { - int (*evict)(struct zbud_pool *pool, unsigned long handle); + evict_page_t evict; /* callback for zbud_reclaim_lru_page() */ + evict_page_t unuse; /* callback for zbud_reclaim_pages() */ }; struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops); @@ -14,7 +20,8 @@ void zbud_destroy_pool(struct zbud_pool *pool); int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, unsigned long *handle); void zbud_free(struct zbud_pool *pool, unsigned long handle); -int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); +int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries); +void zbud_reclaim_pages(struct list_head *zbud_pages); void *zbud_map(struct zbud_pool *pool, unsigned long handle); void zbud_unmap(struct zbud_pool *pool, unsigned long handle); u64 zbud_get_pool_size(struct zbud_pool *pool); diff --git a/mm/compaction.c b/mm/compaction.c index 05ccb4c..9bbf412 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "internal.h" #ifdef CONFIG_COMPACTION @@ -534,6 +535,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, goto next_pageblock; } + if (PageZbud(page)) { + /* + * Zbud pages do not exist in LRU so we must + * check for Zbud flag before PageLRU() below. + */ + BUG_ON(PageLRU(page)); + get_page(page); + list_add(&page->lru, &cc->zbudpages); + continue; + } + /* * Check may be lockless but that's ok as we recheck later. 
* It's possible to migrate LRU pages and balloon pages @@ -810,7 +822,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn, false); if (!low_pfn || cc->contended) return ISOLATE_ABORT; - +#ifdef CONFIG_ZBUD + if (!list_empty(&cc->zbudpages)) + zbud_reclaim_pages(&cc->zbudpages); +#endif cc->migrate_pfn = low_pfn; return ISOLATE_SUCCESS; @@ -1023,11 +1038,13 @@ static unsigned long compact_zone_order(struct zone *zone, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); + INIT_LIST_HEAD(&cc.zbudpages); ret = compact_zone(zone, &cc); VM_BUG_ON(!list_empty(&cc.freepages)); VM_BUG_ON(!list_empty(&cc.migratepages)); + VM_BUG_ON(!list_empty(&cc.zbudpages)); *contended = cc.contended; return ret; @@ -1105,6 +1122,7 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc) cc->zone = zone; INIT_LIST_HEAD(&cc->freepages); INIT_LIST_HEAD(&cc->migratepages); + INIT_LIST_HEAD(&cc->zbudpages); if (cc->order == -1 || !compaction_deferred(zone, cc->order)) compact_zone(zone, cc); diff --git a/mm/internal.h b/mm/internal.h index 4390ac6..eaf5c884 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -119,6 +119,7 @@ struct compact_control { unsigned long nr_migratepages; /* Number of pages to migrate */ unsigned long free_pfn; /* isolate_freepages search base */ unsigned long migrate_pfn; /* isolate_migratepages search base */ + struct list_head zbudpages; /* List of pages belonging to zbud */ bool sync; /* Synchronous migration */ bool ignore_skip_hint; /* Scan blocks even if marked skip */ bool finished_update_free; /* True when the zone cached pfns are diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1a120fb..e482876 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include @@ -6031,6 +6032,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, ret = -EINTR; break; } +#ifdef 
CONFIG_ZBUD + if (!list_empty(&cc->zbudpages)) + zbud_reclaim_pages(&cc->zbudpages); +#endif tries = 0; } else if (++tries == 5) { ret = ret < 0 ? ret : -EBUSY; @@ -6085,6 +6090,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, .ignore_skip_hint = true, }; INIT_LIST_HEAD(&cc.migratepages); + INIT_LIST_HEAD(&cc.zbudpages); /* * What we do here is we mark all pageblocks in range as diff --git a/mm/zbud.c b/mm/zbud.c index a452949..98a04c8 100644 --- a/mm/zbud.c +++ b/mm/zbud.c @@ -103,12 +103,14 @@ struct zbud_pool { * @lru: links the zbud page into the lru list in the pool * @first_chunks: the size of the first buddy in chunks, 0 if free * @last_chunks: the size of the last buddy in chunks, 0 if free + * @pool: pool to which this zbud page belongs */ struct zbud_header { struct list_head buddy; struct list_head lru; unsigned int first_chunks; unsigned int last_chunks; + struct zbud_pool *pool; }; /***************** @@ -137,6 +139,7 @@ static struct zbud_header *init_zbud_page(struct page *page) zhdr->last_chunks = 0; INIT_LIST_HEAD(&zhdr->buddy); INIT_LIST_HEAD(&zhdr->lru); + zhdr->pool = NULL; return zhdr; } @@ -241,7 +244,6 @@ static int put_zbud_page(struct zbud_pool *pool, struct zbud_header *zhdr) return 0; } - /***************** * API Functions *****************/ @@ -345,6 +347,7 @@ int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, */ zhdr = init_zbud_page(page); SetPageZbud(page); + zhdr->pool = pool; bud = FIRST; found: @@ -394,8 +397,57 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) #define list_tail_entry(ptr, type, member) \ list_entry((ptr)->prev, type, member) +/* + * Pool lock must be held when calling this function and at least + * one handle must not be free. + * On return the pool lock will still be held; however, during + * execution it will be unlocked and re-locked around the call to + * the evict callback.
+ * + * Returns 1 if page was freed here, 0 otherwise (still in use) + */ +static int do_reclaim(struct zbud_pool *pool, struct zbud_header *zhdr, + evict_page_t evict_cb) +{ + int ret; + unsigned long first_handle = 0, last_handle = 0; + + BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); + /* Move this last element to beginning of LRU */ + list_del(&zhdr->lru); + list_add(&zhdr->lru, &pool->lru); + /* Protect zbud page against free */ + get_zbud_page(zhdr); + /* + * We need encode the handles before unlocking, since we can + * race with free that will set (first|last)_chunks to 0 + */ + first_handle = 0; + last_handle = 0; + if (zhdr->first_chunks) + first_handle = encode_handle(zhdr, FIRST); + if (zhdr->last_chunks) + last_handle = encode_handle(zhdr, LAST); + spin_unlock(&pool->lock); + + /* Issue the eviction callback(s) */ + if (first_handle) { + ret = evict_cb(pool, first_handle); + if (ret) + goto next; + } + if (last_handle) { + ret = evict_cb(pool, last_handle); + if (ret) + goto next; + } +next: + spin_lock(&pool->lock); + return put_zbud_page(pool, zhdr); +} + /** - * zbud_reclaim_page() - evicts allocations from a pool page and frees it + * zbud_reclaim_lru_page() - evicts allocations from a pool page and frees it * @pool: pool from which a page will attempt to be evicted * @retires: number of pages on the LRU list for which eviction will * be attempted before failing @@ -429,11 +481,10 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle) * no pages to evict or an eviction handler is not registered, -EAGAIN if * the retry limit was hit. 
*/ -int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) +int zbud_reclaim_lru_page(struct zbud_pool *pool, unsigned int retries) { - int i, ret; + int i; struct zbud_header *zhdr; - unsigned long first_handle = 0, last_handle = 0; spin_lock(&pool->lock); if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) || @@ -454,44 +505,84 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) return 0; } zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru); - BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0); - /* Move this last element to beginning of LRU */ - list_del(&zhdr->lru); - list_add(&zhdr->lru, &pool->lru); - /* Protect zbud page against free */ - get_zbud_page(zhdr); - /* - * We need encode the handles before unlocking, since we can - * race with free that will set (first|last)_chunks to 0 - */ - first_handle = 0; - last_handle = 0; - if (zhdr->first_chunks) - first_handle = encode_handle(zhdr, FIRST); - if (zhdr->last_chunks) - last_handle = encode_handle(zhdr, LAST); - spin_unlock(&pool->lock); - - /* Issue the eviction callback(s) */ - if (first_handle) { - ret = pool->ops->evict(pool, first_handle); - if (ret) - goto next; + if (do_reclaim(pool, zhdr, pool->ops->evict)) { + spin_unlock(&pool->lock); + return 0; } - if (last_handle) { - ret = pool->ops->evict(pool, last_handle); - if (ret) - goto next; + } + spin_unlock(&pool->lock); + return -EAGAIN; +} + + +/** + * zbud_reclaim_pages() - reclaims zbud pages by unusing stored pages + * @zbud_pages list of zbud pages to reclaim + * + * zbud reclaim is different from normal system reclaim in that the reclaim is + * done from the bottom, up. This is because only the bottom layer, zbud, has + * information on how the allocations are organized within each zbud page. This + * has the potential to create interesting locking situations between zbud and + * the user, however. 
+ * + * To avoid these, this is how zbud_reclaim_pages() should be called: + + * The user detects some pages should be reclaimed and calls + * zbud_reclaim_pages(). The zbud_reclaim_pages() will remove zbud + * pages from the pool LRU list and call the user-defined unuse handler with + * the pool and handle as arguments. + * + * If the handle can not be unused, the unuse handler should return + * non-zero. zbud_reclaim_pages() will add the zbud page back to the + * appropriate list and try the next zbud page on the list. + * + * If the handle is successfully unused, the unuse handler should + * return 0. + * The zbud page will be freed later by unuse code + * (e.g. frontswap_invalidate_page()). + * + * If all buddies in the zbud page are successfully unused, then the + * zbud page can be freed. + */ +void zbud_reclaim_pages(struct list_head *zbud_pages) +{ + struct page *page; + struct page *page2; + + list_for_each_entry_safe(page, page2, zbud_pages, lru) { + struct zbud_header *zhdr; + struct zbud_pool *pool; + + list_del(&page->lru); + if (!PageZbud(page)) { + /* + * Drop page count from isolate_migratepages_range() + */ + put_page(page); + continue; } -next: + zhdr = page_address(page); + BUG_ON(!zhdr->pool); + pool = zhdr->pool; + spin_lock(&pool->lock); + /* Drop page count from isolate_migratepages_range() */ if (put_zbud_page(pool, zhdr)) { + /* + * zbud_free() could free the handles before acquiring + * pool lock above. No need to reclaim. 
+ */ spin_unlock(&pool->lock); - return 0; + continue; + } + if (!pool->ops || !pool->ops->unuse || list_empty(&pool->lru)) { + spin_unlock(&pool->lock); + continue; } + BUG_ON(!PageZbud(page)); + do_reclaim(pool, zhdr, pool->ops->unuse); + spin_unlock(&pool->lock); } - spin_unlock(&pool->lock); - return -EAGAIN; } /** diff --git a/mm/zswap.c b/mm/zswap.c index deda2b6..846649b 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -35,6 +35,9 @@ #include #include #include +#include +#include +#include #include #include @@ -61,6 +64,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0); static u64 zswap_pool_limit_hit; /* Pages written back when pool limit was reached */ static u64 zswap_written_back_pages; +/* Pages unused due to reclaim */ +static u64 zswap_unused_pages; /* Store failed due to a reclaim failure after pool limit was reached */ static u64 zswap_reject_reclaim_fail; /* Compressed page was too big for the allocator to (optimally) store */ @@ -596,6 +601,47 @@ fail: return ret; } +/** + * Tries to unuse swap entries by uncompressing them. + * Function is a stripped swapfile.c::try_to_unuse(). + * + * Returns 0 on success or negative on error. + */ +static int zswap_unuse_entry(struct zbud_pool *pool, unsigned long handle) +{ + struct zswap_header *zhdr; + swp_entry_t swpentry; + struct zswap_tree *tree; + pgoff_t offset; + struct mm_struct *start_mm; + struct swap_info_struct *si; + int ret; + + /* extract swpentry from data */ + zhdr = zbud_map(pool, handle); + swpentry = zhdr->swpentry; /* here */ + zbud_unmap(pool, handle); + tree = zswap_trees[swp_type(swpentry)]; + offset = swp_offset(swpentry); + BUG_ON(pool != tree->pool); + + /* + * We cannot hold swap_lock here but swap_info may + * change (e.g. by swapoff). In case of swapoff + * check for SWP_WRITEOK. 
+ */ + si = swap_info[swp_type(swpentry)]; + if (!(si->flags & SWP_WRITEOK)) + return -ECANCELED; + + start_mm = &init_mm; + atomic_inc(&init_mm.mm_users); + ret = try_to_unuse_swp_entry(&start_mm, si, swpentry); + mmput(start_mm); + zswap_unused_pages++; + return ret; +} + /********************************* * frontswap hooks **********************************/ @@ -620,7 +666,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, /* reclaim space if needed */ if (zswap_is_full()) { zswap_pool_limit_hit++; - if (zbud_reclaim_page(tree->pool, 8)) { + if (zbud_reclaim_lru_page(tree->pool, 8)) { zswap_reject_reclaim_fail++; ret = -ENOMEM; goto reject; @@ -647,8 +693,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, /* store */ len = dlen + sizeof(struct zswap_header); - ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN, - &handle); + ret = zbud_alloc(tree->pool, len, __GFP_NORETRY | __GFP_NOWARN | + __GFP_RECLAIMABLE, &handle); if (ret == -ENOSPC) { zswap_reject_compress_poor++; goto freepage; @@ -819,7 +865,8 @@ static void zswap_frontswap_invalidate_area(unsigned type) } static struct zbud_ops zswap_zbud_ops = { - .evict = zswap_writeback_entry + .evict = zswap_writeback_entry, + .unuse = zswap_unuse_entry }; static void zswap_frontswap_init(unsigned type) @@ -880,6 +927,8 @@ static int __init zswap_debugfs_init(void) zswap_debugfs_root, &zswap_reject_compress_poor); debugfs_create_u64("written_back_pages", S_IRUGO, zswap_debugfs_root, &zswap_written_back_pages); + debugfs_create_u64("unused_pages", S_IRUGO, + zswap_debugfs_root, &zswap_unused_pages); debugfs_create_u64("duplicate_entry", S_IRUGO, zswap_debugfs_root, &zswap_duplicate_entry); debugfs_create_u64("pool_pages", S_IRUGO, -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755188Ab3HFJAd (ORCPT ); Tue, 6 Aug 2013 05:00:33 -0400 Received: from userp1040.oracle.com 
([156.151.31.81]:19865 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754416Ab3HFJAb (ORCPT ); Tue, 6 Aug 2013 05:00:31 -0400 Message-ID: <5200BB18.9010105@oracle.com> Date: Tue, 06 Aug 2013 17:00:08 +0800 From: Bob Liu User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130221 Thunderbird/17.0.3 MIME-Version: 1.0 To: Krzysztof Kozlowski CC: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Tomasz Stanislawski Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> In-Reply-To: <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Krzysztof, On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote: > Use page reference counter for zbud pages. The ref counter replaces > zbud_header.under_reclaim flag and ensures that zbud page won't be freed > when zbud_free() is called during reclaim. It allows implementation of > additional reclaim paths. > > The page count is incremented when: > - a handle is created and passed to zswap (in zbud_alloc()), > - user-supplied eviction callback is called (in zbud_reclaim_page()). > > Signed-off-by: Krzysztof Kozlowski > Signed-off-by: Tomasz Stanislawski Looks good to me. 
Reviewed-by: Bob Liu > --- > mm/zbud.c | 150 +++++++++++++++++++++++++++++++++++-------------------------- > 1 file changed, 86 insertions(+), 64 deletions(-) > > diff --git a/mm/zbud.c b/mm/zbud.c > index ad1e781..a8e986f 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -109,7 +109,6 @@ struct zbud_header { > struct list_head lru; > unsigned int first_chunks; > unsigned int last_chunks; > - bool under_reclaim; > }; > > /***************** > @@ -138,16 +137,9 @@ static struct zbud_header *init_zbud_page(struct page *page) > zhdr->last_chunks = 0; > INIT_LIST_HEAD(&zhdr->buddy); > INIT_LIST_HEAD(&zhdr->lru); > - zhdr->under_reclaim = 0; > return zhdr; > } > > -/* Resets the struct page fields and frees the page */ > -static void free_zbud_page(struct zbud_header *zhdr) > -{ > - __free_page(virt_to_page(zhdr)); > -} > - > /* > * Encodes the handle of a particular buddy within a zbud page > * Pool lock should be held as this function accesses first|last_chunks > @@ -188,6 +180,65 @@ static int num_free_chunks(struct zbud_header *zhdr) > return NCHUNKS - zhdr->first_chunks - zhdr->last_chunks - 1; > } > > +/* > + * Called after zbud_free() or zbud_alloc(). > + * Checks whether given zbud page has to be: > + * - removed from buddied/unbuddied/LRU lists completetely (zbud_free). > + * - moved from buddied to unbuddied list > + * and to beginning of LRU (zbud_alloc, zbud_free), > + * - added to buddied list and LRU (zbud_alloc), > + * > + * The page must be already removed from buddied/unbuddied lists. > + * Must be called under pool->lock. > + */ > +static void rebalance_lists(struct zbud_pool *pool, struct zbud_header *zhdr) > +{ Nit picker, how about change the name to adjust_lists() or something like this because we don't do any rebalancing. 
-- Regards, -Bob From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755408Ab3HFJQx (ORCPT ); Tue, 6 Aug 2013 05:16:53 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:37448 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755241Ab3HFJQv (ORCPT ); Tue, 6 Aug 2013 05:16:51 -0400 Message-ID: <5200BEEF.7060904@oracle.com> Date: Tue, 06 Aug 2013 17:16:31 +0800 From: Bob Liu User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130221 Thunderbird/17.0.3 MIME-Version: 1.0 To: Krzysztof Kozlowski CC: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park Subject: Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> In-Reply-To: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote: > Hi, > > Currently zbud pages are not movable and they cannot be allocated from CMA > region. These patches try to address the problem by: > 1. Adding a new form of reclaim of zbud pages. > 2. Reclaiming zbud pages during migration and compaction. > 3. Allocating zbud pages with __GFP_RECLAIMABLE flag. > > This reclaim process is different than zbud_reclaim_page(). It acts more > like swapoff() by trying to unuse pages stored in zbud page and bring > them back to memory. The standard zbud_reclaim_page() on the other hand > tries to write them back. I prefer to migrate zbud pages directly if it's possible than reclaiming them during compaction. > > One of patches introduces a new flag: PageZbud. 
This flag is used in > isolate_migratepages_range() to grab zbud pages and pass them later > for reclaim. Probably this could be replaced with something > smarter than a flag used only in one case. > Any ideas for a better solution are welcome. > > This patch set is based on Linux 3.11-rc4. > > TODOs: > 1. Replace PageZbud flag with other solution. > > Best regards, > Krzysztof Kozlowski > > > Krzysztof Kozlowski (4): > zbud: use page ref counter for zbud pages > mm: split code for unusing swap entries from try_to_unuse > mm: add zbud flag to page flags > mm: reclaim zbud pages on migration and compaction > > include/linux/page-flags.h | 12 ++ > include/linux/swapfile.h | 2 + > include/linux/zbud.h | 11 +- > mm/compaction.c | 20 ++- > mm/internal.h | 1 + > mm/page_alloc.c | 9 ++ > mm/swapfile.c | 354 +++++++++++++++++++++++--------------------- > mm/zbud.c | 301 +++++++++++++++++++++++++------------ > mm/zswap.c | 57 ++++++- > 9 files changed, 499 insertions(+), 268 deletions(-) > -- Regards, -Bob From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755449Ab3HFJZh (ORCPT ); Tue, 6 Aug 2013 05:25:37 -0400 Received: from mailout1.w1.samsung.com ([210.118.77.11]:41264 "EHLO mailout1.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755343Ab3HFJZg (ORCPT ); Tue, 6 Aug 2013 05:25:36 -0400 X-AuditID: cbfec7f5-b7f5f6d00000105f-5b-5200c10dda25 Message-id: <1375781132.2003.4.camel@AMDC1943> Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages From: Krzysztof Kozlowski To: Bob Liu Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park , Tomasz Stanislawski Date: Tue, 06 Aug 2013 11:25:32 +0200 In-reply-to: <5200BB18.9010105@oracle.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> 
<1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> <5200BB18.9010105@oracle.com> Content-type: text/plain; charset=UTF-8 X-Mailer: Evolution 3.2.3-0ubuntu6 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Bob, Thank you for the review. On Tue, 2013-08-06 at 17:00 +0800, Bob Liu wrote: > Nit picker, how about change the name to adjust_lists() or something > like this because we don't do any rebalancing. OK, I'll change it.
Best regards, Krzysztof From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755880Ab3HFNFU (ORCPT ); Tue, 6 Aug 2013 09:05:20 -0400 Received: from mailout3.w1.samsung.com ([210.118.77.13]:56482 "EHLO mailout3.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754845Ab3HFNFS (ORCPT ); Tue, 6 Aug 2013 09:05:18 -0400 X-AuditID: cbfec7f5-b7f5f6d00000105f-45-5200f48c0710 Message-id: <1375794314.13955.6.camel@AMDC1943> Subject: Re: [RFC PATCH 0/4] mm: reclaim zbud pages on migration and compaction From: Krzysztof Kozlowski To: Bob Liu Cc: Seth Jennings , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mel Gorman , Bartlomiej Zolnierkiewicz , Marek Szyprowski , Kyungmin Park Date: Tue, 06 Aug 2013 15:05:14 +0200 In-reply-to: <5200BEEF.7060904@oracle.com> References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <5200BEEF.7060904@oracle.com> Content-type: text/plain; charset=UTF-8 X-Mailer: Evolution 3.2.3-0ubuntu6 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List:
 linux-kernel@vger.kernel.org

On Tue, 2013-08-06 at 17:16 +0800, Bob Liu wrote:
> On 08/06/2013 02:42 PM, Krzysztof Kozlowski wrote:
> > This reclaim process is different than zbud_reclaim_page(). It acts more
> > like swapoff() by trying to unuse pages stored in zbud page and bring
> > them back to memory. The standard zbud_reclaim_page() on the other hand
> > tries to write them back.
>
> I prefer to migrate zbud pages directly if it's possible than reclaiming
> them during compaction.

I think it is possible; however, it would definitely be more complex. In
case of migration the zswap handles would need to be updated, as they are
just virtual addresses. Am I right?

Best regards,
Krzysztof

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756084Ab3HFQ6u (ORCPT ); Tue, 6 Aug 2013 12:58:50 -0400
Received: from mga02.intel.com ([134.134.136.20]:43055 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755686Ab3HFQ6t (ORCPT ); Tue, 6 Aug 2013 12:58:49 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.89,827,1367996400"; d="scan'208";a="358475849"
Message-ID: <52012B35.90801@intel.com>
Date: Tue, 06 Aug 2013 09:58:29 -0700
From: Dave Hansen
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7
MIME-Version: 1.0
To: Krzysztof Kozlowski
CC: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park
Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com>
In-Reply-To: <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/05/2013 11:42
PM, Krzysztof Kozlowski wrote:
> +#ifdef CONFIG_ZBUD
> +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> +	 * during migration/compaction.
> +	 */
> +	PG_zbud,
> +#endif

Do you _really_ need an absolutely new, unshared page flag?
The zbud code doesn't really look like it uses any of the space in
'struct page'.

I think you could pretty easily alias PG_zbud=PG_slab, then use the
page->{private,slab_cache} (or some other unused field) in 'struct page'
to store a cookie to differentiate slab and zbud pages.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757032Ab3HGHEJ (ORCPT ); Wed, 7 Aug 2013 03:04:09 -0400
Received: from mailout1.w1.samsung.com ([210.118.77.11]:16517 "EHLO mailout1.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751493Ab3HGHEH (ORCPT ); Wed, 7 Aug 2013 03:04:07 -0400
X-AuditID: cbfec7f5-b7f5f6d00000105f-88-5201f16316ed
Message-id: <1375859042.17079.1.camel@AMDC1943>
Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
From: Krzysztof Kozlowski
To: Dave Hansen
Cc: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park
Date: Wed, 07 Aug 2013 09:04:02 +0200
In-reply-to: <52012B35.90801@intel.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> <52012B35.90801@intel.com>
Content-type: text/plain; charset=UTF-8
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-transfer-encoding: 7bit
MIME-version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2013-08-06 at 09:58 -0700, Dave Hansen wrote:
> On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> > +#ifdef CONFIG_ZBUD
> > +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> > +	 * during migration/compaction.
> > +	 */
> > +	PG_zbud,
> > +#endif
>
> Do you _really_ need an absolutely new, unshared page flag?
> The zbud code doesn't really look like it uses any of the space in
> 'struct page'.
>
> I think you could pretty easily alias PG_zbud=PG_slab, then use the
> page->{private,slab_cache} (or some other unused field) in 'struct page'
> to store a cookie to differentiate slab and zbud pages.

Thanks for the idea, I will try that.

Best regards,
Krzysztof

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932177Ab3HGHbz (ORCPT ); Wed, 7 Aug 2013 03:31:55 -0400
Received: from mailout4.w1.samsung.com ([210.118.77.14]:17367 "EHLO mailout4.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756354Ab3HGHby (ORCPT ); Wed, 7 Aug 2013 03:31:54 -0400
X-AuditID: cbfec7f4-b7f5f6d000000ff6-2f-5201f7e8f082
Message-id: <1375860711.17079.16.camel@AMDC1943>
Subject: Re: [RFC PATCH 1/4] zbud: use page ref counter for zbud pages
From: Krzysztof Kozlowski
To: Seth Jennings
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park, Tomasz Stanislawski, Bob Liu
Date: Wed, 07 Aug 2013 09:31:51 +0200
In-reply-to: <20130806185104.GD5765@medulla.variantweb.net>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-2-git-send-email-k.kozlowski@samsung.com> <20130806185104.GD5765@medulla.variantweb.net>
Content-type: text/plain; charset=UTF-8
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-transfer-encoding: 7bit
MIME-version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Seth,

On Tue, 2013-08-06 at 13:51 -0500, Seth Jennings wrote:
> I like the idea. A few things below. Also agree with Bob on the
> s/rebalance/adjust/ for rebalance_lists().

OK.

> s/else if/if/ since the if above returns if true.

Sure.

> > +		/* zbud_free() or zbud_alloc() */
> > +		int freechunks = num_free_chunks(zhdr);
> > +		list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
> > +	} else {
> > +		/* zbud_alloc() */
> > +		list_add(&zhdr->buddy, &pool->buddied);
> > +	}
> > +	/* Add/move zbud page to beginning of LRU */
> > +	if (!list_empty(&zhdr->lru))
> > +		list_del(&zhdr->lru);
>
> We don't want to reinsert to the LRU list if we have called zbud_free()
> on a zbud page that previously had two buddies. This code causes the
> zbud page to move to the front of the LRU list which is not what we want.

Right, I'll fix it.

> > @@ -326,10 +370,10 @@ found:
> >  void zbud_free(struct zbud_pool *pool, unsigned long handle)
> >  {
> >  	struct zbud_header *zhdr;
> > -	int freechunks;
> >
> >  	spin_lock(&pool->lock);
> >  	zhdr = handle_to_zbud_header(handle);
> > +	BUG_ON(zhdr->last_chunks == 0 && zhdr->first_chunks == 0);
>
> Not sure we need this. Maybe, at most, VM_BUG_ON()?

Actually it is a leftover from debugging, so I don't mind
removing it completely.

> > @@ -411,11 +438,24 @@ int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries)
> >  		return -EINVAL;
> >  	}
> >  	for (i = 0; i < retries; i++) {
> > +		if (list_empty(&pool->lru)) {
> > +			/*
> > +			 * LRU was emptied during evict calls in previous
> > +			 * iteration but put_zbud_page() returned 0 meaning
> > +			 * that someone still holds the page. This may
> > +			 * happen when some other mm mechanism increased
> > +			 * the page count.
> > +			 * In such case we succeeded with reclaim.
> > +			 */
> > +			return 0;
> > +		}
> >  		zhdr = list_tail_entry(&pool->lru, struct zbud_header, lru);
> > +		BUG_ON(zhdr->first_chunks == 0 && zhdr->last_chunks == 0);
>
> Again here.

I agree.

Thanks for the comments,
Krzysztof

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933953Ab3HHH0l (ORCPT ); Thu, 8 Aug 2013 03:26:41 -0400
Received: from mailout4.w1.samsung.com ([210.118.77.14]:54515 "EHLO mailout4.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933730Ab3HHH0i (ORCPT ); Thu, 8 Aug 2013 03:26:38 -0400
X-AuditID: cbfec7f4-b7f5f6d000000ff6-f5-5203482c2ecc
Message-id: <1375946794.25843.1.camel@AMDC1943>
Subject: Re: [RFC PATCH 3/4] mm: add zbud flag to page flags
From: Krzysztof Kozlowski
To: Dave Hansen
Cc: Seth Jennings, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Mel Gorman, Bartlomiej Zolnierkiewicz, Marek Szyprowski, Kyungmin Park
Date: Thu, 08 Aug 2013 09:26:34 +0200
In-reply-to: <52012B35.90801@intel.com>
References: <1375771361-8388-1-git-send-email-k.kozlowski@samsung.com> <1375771361-8388-4-git-send-email-k.kozlowski@samsung.com> <52012B35.90801@intel.com>
Content-type: text/plain; charset=UTF-8
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-transfer-encoding: 7bit
MIME-version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Tue, 2013-08-06 at 09:58 -0700, Dave Hansen wrote:
> On 08/05/2013 11:42 PM, Krzysztof Kozlowski wrote:
> > +#ifdef CONFIG_ZBUD
> > +	/* Allocated by zbud. Flag is necessary to find zbud pages to unuse
> > +	 * during migration/compaction.
> > +	 */
> > +	PG_zbud,
> > +#endif
>
> Do you _really_ need an absolutely new, unshared page flag?
> The zbud code doesn't really look like it uses any of the space in
> 'struct page'.
>
> I think you could pretty easily alias PG_zbud=PG_slab, then use the
> page->{private,slab_cache} (or some other unused field) in 'struct page'
> to store a cookie to differentiate slab and zbud pages.

How about using page->_mapcount with a negative value (-129)? Just like
PageBuddy()?

Best regards,
Krzysztof
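
For illustration only, the _mapcount idea floated above could look roughly
like the userspace sketch below. It is not code from this patch set. It
assumes a simplified stand-in for 'struct page' with a plain int _mapcount,
and every mock_* / MOCK_* name is hypothetical. The -128 value mirrors
PAGE_BUDDY_MAPCOUNT_VALUE, which PageBuddy() compared against in kernels of
that era; -129 is the value proposed in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct page. */
struct mock_page {
	int _mapcount;	/* -1: page is unmapped and carries no marker */
};

#define MOCK_MAPCOUNT_UNUSED	(-1)
#define MOCK_BUDDY_MAPCOUNT	(-128)	/* mirrors PAGE_BUDDY_MAPCOUNT_VALUE */
#define MOCK_ZBUD_MAPCOUNT	(-129)	/* value proposed in the thread */

static inline void mock_page_init(struct mock_page *page)
{
	page->_mapcount = MOCK_MAPCOUNT_UNUSED;
}

/* Analogue of PageBuddy(): true if the page carries the buddy marker. */
static inline bool mock_page_buddy(const struct mock_page *page)
{
	return page->_mapcount == MOCK_BUDDY_MAPCOUNT;
}

/* Hypothetical PageZbud(): true if the page carries the zbud marker. */
static inline bool mock_page_zbud(const struct mock_page *page)
{
	return page->_mapcount == MOCK_ZBUD_MAPCOUNT;
}

/* Mark a page as owned by zbud; it must not be mapped or marked already. */
static inline void mock_set_page_zbud(struct mock_page *page)
{
	assert(page->_mapcount == MOCK_MAPCOUNT_UNUSED);
	page->_mapcount = MOCK_ZBUD_MAPCOUNT;
}

/* Drop the marker before the page goes back to the page allocator. */
static inline void mock_clear_page_zbud(struct mock_page *page)
{
	assert(mock_page_zbud(page));
	page->_mapcount = MOCK_MAPCOUNT_UNUSED;
}
```

Because the two sentinel values differ, a scanner such as
isolate_migratepages_range() could tell a zbud page from a free buddy page
with a single comparison, without consuming a new page flag bit. This only
works as long as zbud pages are never mapped into user page tables, since
_mapcount is otherwise used for mapping counts.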