From: Matthew Wilcox <willy@infradead.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Jan Kara <jack@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Dave Hansen <dave.hansen@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org
Subject: Re: [PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed
Date: Thu, 9 Feb 2017 13:18:35 -0800 [thread overview]
Message-ID: <20170209211835.GV2267@bombadil.infradead.org> (raw)
In-Reply-To: <20170126115819.58875-8-kirill.shutemov@linux.intel.com>
On Thu, Jan 26, 2017 at 02:57:49PM +0300, Kirill A. Shutemov wrote:
> Later we can add logic to accumulate information from shadow entires to
> return to caller (average eviction time?).
I would say minimum rather than average. That will become the refault
time of the entire page, so minimum would probably have us making better
decisions?
> + /* Wipe shadow entires */
> + radix_tree_for_each_slot(slot, &mapping->page_tree, &iter,
> + page->index) {
> + if (iter.index >= page->index + hpage_nr_pages(page))
> + break;
>
> p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
> - if (!radix_tree_exceptional_entry(p))
> + if (!p)
> + continue;
Just FYI, this can't happen. You're holding the tree lock so nobody
else gets to remove things from the tree. radix_tree_for_each_slot()
only gives you the full slots; it skips the empty ones for you. I'm
OK if you want to leave it in out of an abundance of caution.
> + __radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
> + workingset_update_node, mapping);
I may add an update_node argument to radix_tree_join at some point,
so you can use it here. Or maybe we don't need to do that, and what
you have here works just fine.
> mapping->nrexceptional--;
... because adjusting the exceptional count is going to be a pain.
> + error = __radix_tree_insert(&mapping->page_tree,
> + page->index, compound_order(page), page);
> + /* This shouldn't happen */
> + if (WARN_ON_ONCE(error))
> + return error;
A lesser man would have just ignored the return value from
__radix_tree_insert. I salute you.
> @@ -2078,18 +2155,34 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
> {
> struct address_space *mapping = file->f_mapping;
> struct page *page;
> + pgoff_t hoffset;
> int ret;
>
> do {
> - page = __page_cache_alloc(gfp_mask|__GFP_COLD);
> + page = page_cache_alloc_huge(mapping, offset, gfp_mask);
> +no_huge:
> + if (!page)
> + page = __page_cache_alloc(gfp_mask|__GFP_COLD);
> if (!page)
> return -ENOMEM;
>
> - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
> - if (ret == 0)
> + if (PageTransHuge(page))
> + hoffset = round_down(offset, HPAGE_PMD_NR);
> + else
> + hoffset = offset;
> +
> + ret = add_to_page_cache_lru(page, mapping, hoffset,
> + gfp_mask & GFP_KERNEL);
> +
> + if (ret == -EEXIST && PageTransHuge(page)) {
> + put_page(page);
> + page = NULL;
> + goto no_huge;
> + } else if (ret == 0) {
> ret = mapping->a_ops->readpage(file, page);
> - else if (ret == -EEXIST)
> + } else if (ret == -EEXIST) {
> ret = 0; /* losing race to add is OK */
> + }
>
> put_page(page);
If the filesystem returns AOP_TRUNCATED_PAGE, you'll go round this loop
again trying the huge page again, even if the huge page didn't work
the first time. I would tend to think that if the huge page failed the
first time, we shouldn't try it again, so I propose this:
struct address_space *mapping = file->f_mapping;
struct page *page;
pgoff_t index;
int ret;
bool try_huge = true;
do {
if (try_huge) {
page = page_cache_alloc_huge(gfp_mask|__GFP_COLD);
if (page)
index = round_down(offset, HPAGE_PMD_NR);
else
try_huge = false;
}
if (!try_huge) {
page = __page_cache_alloc(gfp_mask|__GFP_COLD);
index = offset;
}
if (!page)
return -ENOMEM;
ret = add_to_page_cache_lru(page, mapping, index,
gfp_mask & GFP_KERNEL);
if (ret < 0) {
if (try_huge) {
try_huge = false;
ret = AOP_TRUNCATED_PAGE;
} else if (ret == -EEXIST)
ret = 0; /* losing race to add is OK */
} else {
ret = mapping->a_ops->readpage(file, page);
}
put_page(page);
} while (ret == AOP_TRUNCATED_PAGE);
But ... maybe it's OK to retry the huge page. I mean, not many
filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely.
Anyway, I'm fine with the patch going in as-is. I just wanted to type out
my review notes.
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
next prev parent reply other threads:[~2017-02-09 21:18 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-26 11:57 [PATCHv6 00/37] ext4: support of huge pages Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries Kirill A. Shutemov
2017-02-09 3:57 ` Matthew Wilcox
2017-02-09 16:58 ` Kirill A. Shutemov
2017-02-13 13:43 ` Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 02/37] Revert "radix-tree: implement radix_tree_maybe_preload_order()" Kirill A. Shutemov
2017-01-26 15:38 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 03/37] page-flags: relax page flag policy for few flags Kirill A. Shutemov
2017-02-09 4:01 ` Matthew Wilcox
2017-02-13 13:59 ` Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 04/37] mm, rmap: account file thp pages Kirill A. Shutemov
2017-02-09 20:17 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 05/37] thp: try to free page's buffers before attempt split Kirill A. Shutemov
2017-02-09 20:14 ` Matthew Wilcox
2017-02-13 14:32 ` Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 06/37] thp: handle write-protection faults for file THP Kirill A. Shutemov
2017-01-26 15:44 ` Matthew Wilcox
2017-01-26 15:57 ` Kirill A. Shutemov
2017-02-09 20:19 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed Kirill A. Shutemov
2017-02-09 21:18 ` Matthew Wilcox [this message]
2017-02-13 15:17 ` Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read() Kirill A. Shutemov
2017-02-09 21:55 ` Matthew Wilcox
2017-02-13 15:33 ` Kirill A. Shutemov
2017-02-13 16:01 ` Matthew Wilcox
2017-02-13 16:09 ` Matthew Wilcox
2017-02-13 16:28 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 09/37] filemap: allocate huge page in pagecache_get_page(), if allowed Kirill A. Shutemov
2017-02-09 21:59 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 10/37] filemap: handle huge pages in filemap_fdatawait_range() Kirill A. Shutemov
2017-02-09 23:03 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed Kirill A. Shutemov
2017-02-09 23:34 ` Matthew Wilcox
2017-02-10 0:23 ` Andreas Dilger
2017-02-10 14:51 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 12/37] brd: make it handle huge pages Kirill A. Shutemov
2017-02-10 17:24 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 13/37] mm: make write_cache_pages() work on " Kirill A. Shutemov
2017-02-10 17:46 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 14/37] thp: introduce hpage_size() and hpage_mask() Kirill A. Shutemov
2017-01-26 11:57 ` [PATCHv6 15/37] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask} Kirill A. Shutemov
2017-02-10 22:13 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 16/37] thp: make thp_get_unmapped_area() respect S_HUGE_MODE Kirill A. Shutemov
2017-02-10 17:50 ` Matthew Wilcox
2017-01-26 11:57 ` [PATCHv6 17/37] fs: make block_read_full_page() be able to read huge page Kirill A. Shutemov
2017-02-10 17:58 ` Matthew Wilcox
2017-01-26 11:58 ` [PATCHv6 18/37] fs: make block_write_{begin,end}() be able to handle huge pages Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 19/37] fs: make block_page_mkwrite() aware about " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 20/37] truncate: make truncate_inode_pages_range() " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 21/37] truncate: make invalidate_inode_pages2_range() " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 22/37] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 23/37] mm: account huge pages to dirty, writaback, reclaimable, etc Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 24/37] ext4: make ext4_mpage_readpages() hugepage-aware Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 25/37] ext4: make ext4_writepage() work on huge pages Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 26/37] ext4: handle huge pages in ext4_page_mkwrite() Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 27/37] ext4: handle huge pages in __ext4_block_zero_page_range() Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 28/37] ext4: make ext4_block_write_begin() aware about huge pages Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 29/37] ext4: handle huge pages in ext4_da_write_end() Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 30/37] ext4: make ext4_da_page_release_reservation() aware about huge pages Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 31/37] ext4: handle writeback with " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 32/37] ext4: make EXT4_IOC_MOVE_EXT work " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 33/37] ext4: fix SEEK_DATA/SEEK_HOLE for " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 34/37] ext4: make fallocate() operations work with " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 35/37] ext4: reserve larger jounral transaction for " Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 36/37] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff() Kirill A. Shutemov
2017-01-26 11:58 ` [PATCHv6 37/37] ext4, vfs: add huge= mount option Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170209211835.GV2267@bombadil.infradead.org \
--to=willy@infradead.org \
--cc=aarcange@redhat.com \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@intel.com \
--cc=hughd@google.com \
--cc=jack@suse.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ross.zwisler@linux.intel.com \
--cc=tytso@mit.edu \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).