From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@gentwo.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Jerome Marchand <jmarchan@redhat.com>,
Yang Shi <yang.shi@linaro.org>,
Sasha Levin <sasha.levin@oracle.com>, Ning Qu <quning@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHv4 00/25] THP-enabled tmpfs/shmem
Date: Thu, 24 Mar 2016 12:17:28 +0300
Message-ID: <20160324091727.GA26796@node.shutemov.name>
In-Reply-To: <alpine.LSU.2.11.1603231305560.4946@eggly.anvils>
On Wed, Mar 23, 2016 at 01:09:05PM -0700, Hugh Dickins wrote:
> The small files thing formed my first impression. My second
> impression was similar, when I tried mmap(NULL, size_of_RAM,
> PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0) and
> cycled around the arena touching all the pages (which of
> course has to push a little into swap): that soon OOMed.
>
> But there I think you probably just have some minor bug to be fixed:
> I spent a little while trying to debug it, but then decided I'd
> better get back to writing to you. I didn't really understand what
> I was seeing, but when I hacked some stats into shrink_page_list(),
> converting !is_page_cache_freeable(page) to page_cache_references(page)
> to return the difference instead of the bool, a large proportion of
> huge tmpfs pages seemed to have count 1 too high to be freeable at
> that point (and one huge tmpfs page had a count of 3477).
I'll reply to your other points later, but first I wanted to address this
obvious bug.

I cannot really explain page_count() == 3477, but otherwise:

The root cause is that try_to_unmap() doesn't handle PMD-mapped huge
pages, so we hit 'case SWAP_AGAIN' all the time.

The patch below effectively rewrites 17/25: now we split the huge page
before trying to unmap it.

split_huge_page() has its own check similar to is_page_cache_freeable(),
so we wouldn't split pages we cannot free later on.

And split_huge_page() for file pages would unmap the page, so we wouldn't
need to go to try_to_unmap() after that.
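
To make the reference accounting concrete, here is a minimal userspace
sketch of the two versions of is_page_cache_freeable() that appear in the
diff. It is only a model: struct page_model, its fields, and
MODEL_HPAGE_PMD_NR (512, assuming 4K base pages and 2M huge pages) are
made up for illustration and are not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few struct page properties the check reads. */
struct page_model {
	int refcount;      /* models page_count(page) */
	bool has_private;  /* models page_has_private(page) */
	bool trans_huge;   /* models PageTransHuge(page) */
};

/* Assumption: 2M huge page built from 4K base pages. */
#define MODEL_HPAGE_PMD_NR 512

/*
 * Check as it stood before the patch: a huge page was expected to hold
 * HPAGE_PMD_NR radix tree pins, a small page exactly one.
 */
static bool freeable_before(const struct page_model *p)
{
	int radix_tree_pins = p->trans_huge ? MODEL_HPAGE_PMD_NR : 1;

	return p->refcount - (int)p->has_private == 1 + radix_tree_pins;
}

/*
 * Check after the patch: huge pages are split earlier, so only small
 * pages get here. Expected references: the isolating caller plus the
 * radix tree, with buffer heads discounted.
 */
static bool freeable_after(const struct page_model *p)
{
	return p->refcount - (int)p->has_private == 2;
}
```

With this model, a small page with refcount 2 passes both checks, while
a page carrying one extra reference (the "count 1 too high" case from the
report) fails both.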
The patch looks rather simple, but I haven't done a full validation cycle for
it. Regressions are unlikely, but possible.

At some point we would need to teach try_to_unmap() to handle huge pages.
It would be required for filesystems with backing storage. But I don't see
a need for it to get huge tmpfs/shmem working.
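
For anyone who wants to retest, the workload from the quoted report can
be approximated with the sketch below. It is an assumption about what the
test looked like, not the exact program used: touch_shared_anon() and its
parameters are hypothetical names, and the caller has to supply the size.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of shared anonymous (shmem-backed) memory and write one
 * byte into every page, `passes` times over the whole arena.
 * Returns 0 on success, -1 if the page size is unavailable or mmap() fails.
 */
static int touch_shared_anon(size_t size, int passes)
{
	long page_size = sysconf(_SC_PAGESIZE);
	volatile unsigned char *arena;
	void *map;

	if (page_size <= 0)
		return -1;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	arena = map;
	for (int pass = 0; pass < passes; pass++) {
		/* Dirty each page so reclaim has to push it out to swap. */
		for (size_t off = 0; off < size; off += (size_t)page_size)
			arena[off] = (unsigned char)pass;
	}

	munmap(map, size);
	return 0;
}
```

To get close to the reported case, size would be roughly the amount of
RAM in the machine, with swap enabled and huge pages enabled for shmem.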
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9fa9e15594e9..86008f8f1f9b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -473,14 +473,12 @@ void drop_slab(void)
 
 static inline int is_page_cache_freeable(struct page *page)
 {
-	int radix_tree_pins = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
-
 	/*
 	 * A freeable page cache page is referenced only by the caller
 	 * that isolated the page, the page cache radix tree and
 	 * optional buffer heads at page->private.
 	 */
-	return page_count(page) - page_has_private(page) == 1 + radix_tree_pins;
+	return page_count(page) - page_has_private(page) == 2;
 }
 
 static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
@@ -550,6 +548,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 	 * swap_backing_dev_info is bust: it doesn't reflect the
 	 * congestion state of the swapdevs.  Easy to fix, if needed.
 	 */
+	if (!is_page_cache_freeable(page))
+		return PAGE_KEEP;
 	if (!mapping) {
 		/*
 		 * Some data journaling orphaned pages can have
@@ -1055,8 +1055,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/* Adding to swap updated mapping */
 			mapping = page_mapping(page);
+		} else if (unlikely(PageTransHuge(page))) {
+			/* Split file THP */
+			if (split_huge_page_to_list(page, page_list))
+				goto keep_locked;
 		}
 
+		VM_BUG_ON_PAGE(PageTransHuge(page), page);
+
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
@@ -1112,15 +1118,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-
-			if (!is_page_cache_freeable(page))
-				goto keep_locked;
-
-			if (unlikely(PageTransHuge(page))) {
-				if (split_huge_page_to_list(page, page_list))
-					goto keep_locked;
-			}
-
 			switch (pageout(page, mapping, sc)) {
 			case PAGE_KEEP:
 				goto keep_locked;
--
Kirill A. Shutemov