From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Dave Hansen <dave@sr71.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Date: Fri, 22 Mar 2013 12:34:38 +0200 (EET)
Message-ID: <20130322103438.46C5FE0085@blue.fi.intel.com>
In-Reply-To: <514B3F24.3070006@sr71.net>
Dave Hansen wrote:
> On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
> > page for the specified index and HPAGE_CACHE_NR-1 tail pages for
> > following indexes.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> > mm/filemap.c | 76 ++++++++++++++++++++++++++++++++++++++++------------------
> > 1 file changed, 53 insertions(+), 23 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 2d99191..6bac9e2 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> > pgoff_t offset, gfp_t gfp_mask)
> > {
> > int error;
> > + int nr = 1;
> >
> > VM_BUG_ON(!PageLocked(page));
> > VM_BUG_ON(PageSwapBacked(page));
> > @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> > error = mem_cgroup_cache_charge(page, current->mm,
> > gfp_mask & GFP_RECLAIM_MASK);
> > if (error)
> > - goto out;
> > + return error;
> >
> > - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> > - if (error == 0) {
> > - page_cache_get(page);
> > - page->mapping = mapping;
> > - page->index = offset;
> > + if (PageTransHuge(page)) {
> > + BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> > + nr = HPAGE_CACHE_NR;
> > + }
>
> That seems like a slightly odd place to put a BUILD_BUG_ON(). I guess
> it doesn't matter to some degree, but does putting it inside the if()
> imply anything?
It actually matters.
HPAGE_CACHE_NR is defined as BUILD_BUG() if !CONFIG_TRANSPARENT_HUGEPAGE, so we
need to hide it inside 'if (PageTransHuge(page))'. PageTransHuge(page) is 0 at
compile time if !CONFIG_TRANSPARENT_HUGEPAGE, so the compiler can be smart and
optimize out the whole block, including the check.
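A rough user-space sketch of that pattern, not kernel code. THP_SKETCH_ENABLED,
page_trans_huge(), hpage_cache_nr_misuse() and the value 512 (a 2 MiB huge page
over 4 KiB base pages) are assumptions made up for illustration. The only point
is that the huge-page constant is touched solely inside a branch whose condition
is a literal 0 when the feature is off, so the compiler can drop the branch:

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_TRANSPARENT_HUGEPAGE. */
#define THP_SKETCH_ENABLED 1

#if THP_SKETCH_ENABLED
#define HPAGE_CACHE_NR_SKETCH 512		/* assumed value */
#define page_trans_huge(is_huge) (is_huge)
#else
/*
 * Stand-in for BUILD_BUG(): the function is intentionally never defined,
 * so any use that survives dead-code elimination fails at link time.
 */
extern int hpage_cache_nr_misuse(void);
#define HPAGE_CACHE_NR_SKETCH hpage_cache_nr_misuse()
#define page_trans_huge(is_huge) 0
#endif

/* Number of tree slots one insertion needs: 1, or a whole huge page. */
static int nr_pages_to_insert(int is_huge)
{
	int nr = 1;

	/* With the feature off this condition is a literal 0. */
	if (page_trans_huge(is_huge))
		nr = HPAGE_CACHE_NR_SKETCH;

	assert(nr >= 1);
	return nr;
}
```

With the switch set to 1 as above, nr_pages_to_insert(0) gives 1 and
nr_pages_to_insert(1) gives the assumed huge-page count.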
> > + error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> > + if (error) {
> > + mem_cgroup_uncharge_cache_page(page);
> > + return error;
> > + }
> >
> > - spin_lock_irq(&mapping->tree_lock);
> > - error = radix_tree_insert(&mapping->page_tree, offset, page);
> > - if (likely(!error)) {
> > - mapping->nrpages++;
> > - __inc_zone_page_state(page, NR_FILE_PAGES);
> > - spin_unlock_irq(&mapping->tree_lock);
> > - trace_mm_filemap_add_to_page_cache(page);
> > - } else {
> > - page->mapping = NULL;
> > - /* Leave page->index set: truncation relies upon it */
> > - spin_unlock_irq(&mapping->tree_lock);
> > - mem_cgroup_uncharge_cache_page(page);
> > - page_cache_release(page);
>
> I do really like how this rewrite de-indents this code. :)
:)
> > + page_cache_get(page);
> > + spin_lock_irq(&mapping->tree_lock);
> > + page->mapping = mapping;
> > + page->index = offset;
> > + error = radix_tree_insert(&mapping->page_tree, offset, page);
> > + if (unlikely(error))
> > + goto err;
> > + if (PageTransHuge(page)) {
> > + int i;
> > + for (i = 1; i < HPAGE_CACHE_NR; i++) {
> > + page_cache_get(page + i);
> > + page[i].index = offset + i;
>
> Is it OK to leave page->mapping unset for these?
Good catch, thanks.
It seems nobody really uses it, since I haven't hit any oops, but we need to
set it anyway.
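A minimal user-space sketch of what setting it for every subpage could look
like. struct page_sketch, struct mapping_sketch and set_subpage_fields_sketch()
are hypothetical names for illustration, not what the next revision of the
patch necessarily does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct mapping_sketch { int unused; };

struct page_sketch {
	struct mapping_sketch *mapping;
	unsigned long index;
};

/* Fill in mapping and index for the head page and every tail page. */
static void set_subpage_fields_sketch(struct page_sketch *page,
				      struct mapping_sketch *mapping,
				      unsigned long offset, int nr)
{
	int i;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		page[i].mapping = mapping;
		page[i].index = offset + i;
	}
}
```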
> > + error = radix_tree_insert(&mapping->page_tree,
> > + offset + i, page + i);
> > + if (error) {
> > + page_cache_release(page + i);
> > + break;
> > + }
> > }
>
> Throughout all this new code, I'd really challenge you to try as much as
> possible to minimize the code stuck under "if (PageTransHuge(page))".
I put the thp-related code under the 'if' intentionally, so that it can be
optimized out if !CONFIG_TRANSPARENT_HUGEPAGE. The config option is disabled
by default.
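For reference, a user-space model of the insert-then-unwind pattern this part
of the patch deals with, written with a single loop over nr. Everything here is
hypothetical: a fixed array stands in for the radix tree, a plain counter for
the page reference count, and the names are invented for the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SLOTS_SKETCH 16

struct page_sketch {
	int refcount;
	unsigned long index;
};

/* A fixed array models the tree; a NULL slot is empty. */
static struct page_sketch *tree_sketch[SLOTS_SKETCH];

static int tree_insert_sketch(unsigned long index, struct page_sketch *page)
{
	if (index >= SLOTS_SKETCH)
		return -ENOMEM;
	if (tree_sketch[index])
		return -EEXIST;
	tree_sketch[index] = page;
	return 0;
}

/* Insert nr consecutive pages, or none of them. */
static int add_pages_sketch(struct page_sketch *page, unsigned long offset,
			    int nr)
{
	int i, error = 0;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		page[i].refcount++;
		page[i].index = offset + i;
		error = tree_insert_sketch(offset + i, page + i);
		if (error) {
			/* Drop the reference taken for the failed slot. */
			page[i].refcount--;
			break;
		}
	}
	if (!error)
		return 0;

	/* Unwind the i entries that did go in, newest first. */
	while (i-- > 0) {
		tree_sketch[offset + i] = NULL;
		page[i].refcount--;
	}
	return error;
}
```

With nr == 1 the loop runs once and the unwind loop never executes, so the
small-page case pays for nothing beyond the loop condition.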
> For instance, could you change the for() loop a bit and have it shared
> between both cases, like:
>
> > + for (i = 0; i < nr; i++) {
> > + page_cache_get(page + i);
> > + page[i].index = offset + i;
> > + error = radix_tree_insert(&mapping->page_tree,
> > + offset + i, page + i);
> > + if (error) {
> > + page_cache_release(page + i);
> > + break;
> > + }
> > }
>
> > - radix_tree_preload_end();
> > - } else
> > - mem_cgroup_uncharge_cache_page(page);
> > -out:
> > + if (error) {
> > + error = ENOSPC; /* no space for a huge page */
> > + for (i--; i > 0; i--) {
> > + radix_tree_delete(&mapping->page_tree,
> > + offset + i);
> > + page_cache_release(page + i);
> > + }
> > + radix_tree_delete(&mapping->page_tree, offset);
>
> I wonder if this would look any nicer if you just did all the
> page_cache_get()s for the entire huge page along with the head page, and
> then released them all in one place. I think it might shrink the error
> handling paths here.
>
> > + goto err;
> > + }
> > + }
> > + __mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
> > + mapping->nrpages += nr;
> > + spin_unlock_irq(&mapping->tree_lock);
> > + trace_mm_filemap_add_to_page_cache(page);
>
> Do we need to change the tracing to make sure it notes that these were
> or weren't huge pages?
Hm. I guess we just need to add the page order to the trace.
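As an illustration of the value such a trace field would carry, here is a
user-space sketch that derives the order from the number of pages inserted.
page_order_from_nr_sketch() is a hypothetical helper; in the kernel the order
would presumably come from compound_order(page) rather than be recomputed:

```c
#include <assert.h>

/* Order such that (1 << order) == nr; nr must be a power of two. */
static unsigned int page_order_from_nr_sketch(unsigned int nr)
{
	unsigned int order = 0;

	assert(nr != 0 && (nr & (nr - 1)) == 0);
	while (nr > 1) {
		nr >>= 1;
		order++;
	}
	return order;
}
```

A small page (nr == 1) gives order 0; a huge page of 512 small pages, the
assumed x86-64 case, gives order 9.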
> > + radix_tree_preload_end();
> > + return 0;
> > +err:
> > + page->mapping = NULL;
> > + /* Leave page->index set: truncation relies upon it */
> > + spin_unlock_irq(&mapping->tree_lock);
> > + radix_tree_preload_end();
> > + mem_cgroup_uncharge_cache_page(page);
> > + page_cache_release(page);
> > return error;
> > }
> > EXPORT_SYMBOL(add_to_page_cache_locked);
>
> Does the cgroup code know how to handle these large pages internally
> somehow? It looks like the charge/uncharge is only being done for the
> head page.
It can. We only need to remove the PageCompound() check there. The patch is in
git.
--
Kirill A. Shutemov