From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <willy@linux.intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
Ning Qu <quning@google.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv6 18/22] truncate: support huge pages
Date: Mon, 23 Sep 2013 15:05:46 +0300
Message-ID: <1379937950-8411-19-git-send-email-kirill.shutemov@linux.intel.com>
In-Reply-To: <1379937950-8411-1-git-send-email-kirill.shutemov@linux.intel.com>
truncate_inode_pages_range() drops a whole huge page at once if it is
fully inside the range.
If a huge page is only partly in the range, we zero out that part,
exactly as we do for partial small pages.
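For illustration only (this snippet is not part of the patch): when the
whole range [lstart, lend] falls within a single cached page, the byte
segment to zero comes straight from the low bits of the offsets, with
the mask picked by the page size. A minimal sketch using the helper
introduced below:

	/* sketch: zero [lstart, lend] when both ends fall in 'page' */
	loff_t mask = PageTransHugeCache(page) ? HPAGE_PMD_MASK :
						 PAGE_CACHE_MASK;
	unsigned pstart = lstart & ~mask;	/* first byte to zero */
	unsigned pend = (lend + 1) & ~mask;	/* one past the last */

	zero_pagecache_segment(page, pstart, pend);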
In some cases it would be worth splitting the huge page instead, when
we only need to truncate part of it and could free some memory. But
split_huge_page() currently truncates the file, so we need to break
the truncate<->split interdependency at some point first.
invalidate_mapping_pages() simply skips huge pages that are not fully
inside the range.
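Put differently, a huge page is invalidated only if it neither starts
before the range nor may extend past its end. A sketch of that check as
a hypothetical helper (not in the patch; 'end' is inclusive here, as in
invalidate_mapping_pages()):

	/* sketch: does the huge page starting at 'index' lie entirely
	 * within [start, end]? */
	static bool thp_fully_in_range(pgoff_t index, pgoff_t start,
				       pgoff_t end)
	{
		/* the huge page starts before the range */
		if (index < start)
			return false;
		/* it contains 'end' and may extend past it; the patch
		 * conservatively treats this as straddling the boundary */
		if (index == (end & ~HPAGE_CACHE_INDEX_MASK))
			return false;
		return true;
	}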
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/pagemap.h |   9 ++++
 mm/truncate.c           | 125 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 109 insertions(+), 25 deletions(-)
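A worked example with hypothetical numbers (4k small pages, 2M huge
pages): truncate a file to lstart = 3M while a huge page is cached at
offset 2M. The first pass finds the huge page at index 512 and sees that
the range starts in its middle (512 < start = 768), so it sets
partial_start and advances start to 1024. The partial_start block then
looks up index start - 1 = 1023 (assuming the lookup returns the cached
huge page for that subpage index), computes pstart = 3M & ~HPAGE_PMD_MASK
= 1M and pend = 2M, and zeroes the last megabyte of the huge page
instead of dropping it.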
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 967aadbc5e..8ce130fe56 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -580,4 +580,13 @@ static inline void clear_pagecache_page(struct page *page)
 		clear_highpage(page);
 }
 
+static inline void zero_pagecache_segment(struct page *page,
+		unsigned start, unsigned len)
+{
+	if (PageTransHugeCache(page))
+		zero_huge_user_segment(page, start, len);
+	else
+		zero_user_segment(page, start, len);
+}
+
 #endif /* _LINUX_PAGEMAP_H */
diff --git a/mm/truncate.c b/mm/truncate.c
index 353b683afd..ba62ab2168 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -203,10 +203,10 @@ int invalidate_inode_page(struct page *page)
 void truncate_inode_pages_range(struct address_space *mapping,
 				loff_t lstart, loff_t lend)
 {
+	struct inode *inode = mapping->host;
 	pgoff_t		start;		/* inclusive */
 	pgoff_t		end;		/* exclusive */
-	unsigned int	partial_start;	/* inclusive */
-	unsigned int	partial_end;	/* exclusive */
+	bool		partial_start, partial_end;
 	struct pagevec	pvec;
 	pgoff_t		index;
 	int		i;
@@ -215,15 +215,13 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	if (mapping->nrpages == 0)
 		return;
 
-	/* Offsets within partial pages */
+	/* Whether we have to do a partial truncate */
 	partial_start = lstart & (PAGE_CACHE_SIZE - 1);
 	partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
 
 	/*
 	 * 'start' and 'end' always covers the range of pages to be fully
-	 * truncated. Partial pages are covered with 'partial_start' at the
-	 * start of the range and 'partial_end' at the end of the range.
-	 * Note that 'end' is exclusive while 'lend' is inclusive.
+	 * truncated. Note that 'end' is exclusive while 'lend' is inclusive.
 	 */
 	start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (lend == -1)
@@ -236,10 +234,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	else
 		end = (lend + 1) >> PAGE_CACHE_SHIFT;
 
+	i_split_down_read(inode);
 	pagevec_init(&pvec, 0);
 	index = start;
 	while (index < end && pagevec_lookup(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
+		bool thp = false;
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -249,6 +249,23 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			if (index >= end)
 				break;
 
+			thp = PageTransHugeCache(page);
+			if (thp) {
+				/* the range starts in the middle of a huge page */
+				if (index < start) {
+					partial_start = true;
+					start = index + HPAGE_CACHE_NR;
+					break;
+				}
+
+				/* the range ends within a huge page */
+				if (index == (end & ~HPAGE_CACHE_INDEX_MASK)) {
+					partial_end = true;
+					end = index;
+					break;
+				}
+			}
+
 			if (!trylock_page(page))
 				continue;
 			WARN_ON(page->index != index);
@@ -258,54 +275,88 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			}
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
+			if (thp)
+				break;
 		}
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
-		index++;
+		if (thp)
+			index += HPAGE_CACHE_NR;
+		else
+			index++;
 	}
 
 	if (partial_start) {
-		struct page *page = find_lock_page(mapping, start - 1);
+		struct page *page;
+
+		page = find_get_page(mapping, start - 1);
 		if (page) {
-			unsigned int top = PAGE_CACHE_SIZE;
-			if (start > end) {
-				/* Truncation within a single page */
-				top = partial_end;
-				partial_end = 0;
+			pgoff_t index_mask;
+			loff_t page_cache_mask;
+			unsigned pstart, pend;
+
+			if (PageTransHugeCache(page)) {
+				index_mask = HPAGE_CACHE_INDEX_MASK;
+				page_cache_mask = HPAGE_PMD_MASK;
+			} else {
+				index_mask = 0UL;
+				page_cache_mask = PAGE_CACHE_MASK;
 			}
+
+			pstart = lstart & ~page_cache_mask;
+			if ((end & ~index_mask) == page->index) {
+				pend = (lend + 1) & ~page_cache_mask;
+				end = page->index;
+				partial_end = false; /* handled here */
+			} else
+				pend = PAGE_CACHE_SIZE << compound_order(page);
+
+			lock_page(page);
 			wait_on_page_writeback(page);
-			zero_user_segment(page, partial_start, top);
+			zero_pagecache_segment(page, pstart, pend);
 			cleancache_invalidate_page(mapping, page);
 			if (page_has_private(page))
-				do_invalidatepage(page, partial_start,
-						  top - partial_start);
+				do_invalidatepage(page, pstart,
						  pend - pstart);
 			unlock_page(page);
 			page_cache_release(page);
 		}
 	}
 	if (partial_end) {
-		struct page *page = find_lock_page(mapping, end);
+		struct page *page;
+
+		page = find_lock_page(mapping, end);
 		if (page) {
+			loff_t page_cache_mask;
+			unsigned pend;
+
+			if (PageTransHugeCache(page))
+				page_cache_mask = HPAGE_PMD_MASK;
+			else
+				page_cache_mask = PAGE_CACHE_MASK;
+			pend = (lend + 1) & ~page_cache_mask;
+			end = page->index;
 			wait_on_page_writeback(page);
-			zero_user_segment(page, 0, partial_end);
+			zero_pagecache_segment(page, 0, pend);
 			cleancache_invalidate_page(mapping, page);
 			if (page_has_private(page))
-				do_invalidatepage(page, 0,
-						  partial_end);
+				do_invalidatepage(page, 0, pend);
 			unlock_page(page);
 			page_cache_release(page);
 		}
 	}
 	/*
-	 * If the truncation happened within a single page no pages
-	 * will be released, just zeroed, so we can bail out now.
+	 * If the truncation happened within a single page no
+	 * pages will be released, just zeroed, so we can bail
+	 * out now.
 	 */
 	if (start >= end)
-		return;
+		goto out;
 
 	index = start;
 	for ( ; ; ) {
+		bool thp = false;
 		cond_resched();
 		if (!pagevec_lookup(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
@@ -327,16 +378,24 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			if (index >= end)
 				break;
 
+			thp = PageTransHugeCache(page);
 			lock_page(page);
 			WARN_ON(page->index != index);
 			wait_on_page_writeback(page);
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
+			if (thp)
+				break;
 		}
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
-		index++;
+		if (thp)
+			index += HPAGE_CACHE_NR;
+		else
+			index++;
 	}
+out:
+	i_split_up_read(inode);
 	cleancache_invalidate_inode(mapping);
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
@@ -375,6 +434,7 @@ EXPORT_SYMBOL(truncate_inode_pages);
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t end)
 {
+	struct inode *inode = mapping->host;
 	struct pagevec pvec;
 	pgoff_t index = start;
 	unsigned long ret;
@@ -389,9 +449,11 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	 * (most pages are dirty), and already skips over any difficulties.
 	 */
 
+	i_split_down_read(inode);
 	pagevec_init(&pvec, 0);
 	while (index <= end && pagevec_lookup(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+		bool thp = false;
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -401,6 +463,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (index > end)
 				break;
 
+			/* skip the huge page if it's not fully in the range */
+			thp = PageTransHugeCache(page);
+			if (thp) {
+				if (index < start)
+					break;
+				if (index == (end & ~HPAGE_CACHE_INDEX_MASK))
+					break;
+			}
+
 			if (!trylock_page(page))
 				continue;
 			WARN_ON(page->index != index);
@@ -417,8 +488,12 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
-		index++;
+		if (thp)
+			index += HPAGE_CACHE_NR;
+		else
+			index++;
 	}
+	i_split_up_read(inode);
 	return count;
 }
 EXPORT_SYMBOL(invalidate_mapping_pages);
--
1.8.4.rc3