From: Dave Hansen <dave@sr71.net>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv3, RFC 31/34] thp: initial implementation of do_huge_linear_fault()
Date: Wed, 17 Apr 2013 15:07:56 -0700 [thread overview]
Message-ID: <516F1D3C.1060804@sr71.net> (raw)
In-Reply-To: <20130417143842.1A76CE0085@blue.fi.intel.com>
[-- Attachment #1: Type: text/plain, Size: 1648 bytes --]
On 04/17/2013 07:38 AM, Kirill A. Shutemov wrote:
>> > Ugh. This is essentially a copy-n-paste of code in __do_fault(),
>> > including the comments. Is there no way to consolidate the code so that
>> > there's less duplication here?
> I've looked into it once again and it seems there's not much space for
> consolidation. Code structure looks very similar, but there are many
> special cases for thp: fallback path, pte vs. pmd, etc. I don't see how we
> can consolidate them in them in sane way.
> I think copy is more maintainable :(
I took the two copies, put them each in a file, changed some of the
_very_ trivial stuff to match (foo=1 vs foo=true) and diffed them.
They're very similar lengths (in lines):
185 __do_fault
197 do_huge_linear_fault
If you diff them:
1 file changed, 68 insertions(+), 56 deletions(-)
That means that of 185 lines in __do_fault(), 129 (70%) of them were
copied *VERBATIM*. Not similar in structure or appearance. Bit-for-bit
the same.
I took a stab at consolidating them. I think we could add a
VM_FAULT_FALLBACK flag to explicitly indicate that we need to do a
huge->small fallback, as well as a FAULT_FLAG_TRANSHUGE to indicate that
a given fault has not attempted to be handled by a huge page. If we
call __do_fault() with FAULT_FLAG_TRANSHUGE and we get back
VM_FAULT_FALLBACK or VM_FAULT_OOM, then we clear FAULT_FLAG_TRANSHUGE
and retry in handle_mm_fault().
I only went about 1/4 of the way in to __do_fault(). If went and spent
another hour or two, I'm pretty convinced I could push this even further.
Are you still sure you can't do _any_ better than a verbatim copy of 129
lines?
[-- Attachment #2: extend-__do_fault.patch --]
[-- Type: text/x-patch, Size: 3942 bytes --]
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..d408b5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -879,6 +879,7 @@ static inline int page_mapped(struct page *page)
#define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */
#define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */
#define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */
+#define VM_FAULT_FALLBACK 0x0800 /* large page fault failed, fall back to small */
#define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
diff --git a/mm/memory.c b/mm/memory.c
index 494526a..9aced3a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3229,6 +3229,40 @@ oom:
return VM_FAULT_OOM;
}
+static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, unsigned long addr)
+{
+ unsigned long haddr = address & HPAGE_PMD_MASK;
+
+ if (((vma->vm_start >> PAGE_SHIFT) & HPAGE_CACHE_INDEX_MASK) !=
+ (vma->vm_pgoff & HPAGE_CACHE_INDEX_MASK)) {
+ return false;
+ }
+ if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end) {
+ return false;
+ }
+ return true;
+}
+
+static struct page *alloc_fault_page_vma(gfp_t flags, vm_area_struct *vma,
+ unsigned long address, unsigned int flags)
+{
+ int try_huge_pages = flags & FAULT_FLAG_TRANSHUGE;
+ unsigned long haddr = address & HPAGE_PMD_MASK;
+
+ if (try_huge_pages) {
+ return alloc_hugepage_vma(transparent_hugepage_defrag(vma),
+ vma, haddr, numa_node_id(), 0);
+ }
+ return alloc_page_vma(flags, vma, address);
+}
+
+static inline void __user *align_fault_address(unsigned long address, unsigned int flags)
+{
+ if (flags & FAULT_FLAG_TRANSHUGE)
+ return (void __user *)address & HPAGE_PMD_MASK;
+ return (void __user *)(address & PAGE_MASK);
+}
+
/*
* __do_fault() tries to create a new page mapping. It aggressively
* tries to share with existing pages, but makes a separate copy if
@@ -3256,17 +3290,21 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
struct vm_fault vmf;
int ret;
int page_mkwrite = 0;
+ int try_huge_pages = !!(flags & FAULT_FLAG_TRANSHUGE);
+
+ if (try_huge_pages && !transhuge_vma_suitable(vma)) {
+ return VM_FAULT_FALLBACK;
+ }
/*
* If we do COW later, allocate page befor taking lock_page()
* on the file cache page. This will reduce lock holding time.
*/
if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
-
if (unlikely(anon_vma_prepare(vma)))
return VM_FAULT_OOM;
- cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+ cow_page = alloc_fault_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
if (!cow_page)
return VM_FAULT_OOM;
@@ -3277,7 +3315,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
} else
cow_page = NULL;
- vmf.virtual_address = (void __user *)(address & PAGE_MASK);
+ vmf.virtual_address = align_fault_address(address, flags);
vmf.pgoff = pgoff;
vmf.flags = flags;
vmf.page = NULL;
@@ -3714,7 +3752,6 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-
__set_current_state(TASK_RUNNING);
count_vm_event(PGFAULT);
@@ -3726,6 +3763,9 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(is_vm_hugetlb_page(vma)))
return hugetlb_fault(mm, vma, address, flags);
+ /* We will try a single shot (only if enabled an possible)
+ * to do a transparent huge page */
+ flags |= FAULT_FLAG_TRANSHUGE;
retry:
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
@@ -3738,6 +3778,11 @@ retry:
if (!vma->vm_ops)
return do_huge_pmd_anonymous_page(mm, vma, address,
pmd, flags);
+ ret = __do_fault(mm, vma, address, pmd, ...)
+ if (ret & (VM_FAULT_OOM | VM_FAULT_FALLBACK)) {
+ flags &= ~FAULT_FLAG_TRANSHUGE;
+ goto retry;
+ }
} else {
pmd_t orig_pmd = *pmd;
int ret;
next prev parent reply other threads:[~2013-04-17 22:07 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-05 11:59 [PATCHv3, RFC 00/34] Transparent huge page cache Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 01/34] mm: drop actor argument of do_generic_file_read() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 02/34] block: implement add_bdi_stat() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 03/34] mm: implement zero_huge_user_segment and friends Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 04/34] radix-tree: implement preload for multiple contiguous elements Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 05/34] memcg, thp: charge huge cache pages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 06/34] thp, mm: avoid PageUnevictable on active/inactive lru lists Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 07/34] thp, mm: basic defines for transparent huge page cache Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 08/34] thp, mm: introduce mapping_can_have_hugepages() predicate Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 09/34] thp: represent file thp pages in meminfo and friends Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-08 19:38 ` Dave Hansen
2013-04-08 19:38 ` Dave Hansen
2013-04-16 14:49 ` Kirill A. Shutemov
2013-04-16 14:49 ` Kirill A. Shutemov
2013-04-16 15:11 ` Dave Hansen
2013-04-16 15:11 ` Dave Hansen
2013-04-05 11:59 ` [PATCHv3, RFC 10/34] thp, mm: rewrite add_to_page_cache_locked() to support huge pages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 11/34] mm: trace filemap: dump page order Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 12/34] thp, mm: rewrite delete_from_page_cache() to support huge pages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 13/34] thp, mm: trigger bug in replace_page_cache_page() on THP Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 14/34] thp, mm: locking tail page is a bug Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 15/34] thp, mm: handle tail pages in page_cache_get_speculative() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 16/34] thp, mm: add event counters for huge page alloc on write to a file Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 17/34] thp, mm: implement grab_thp_write_begin() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 18/34] thp, mm: naive support of thp in generic read/write routines Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 19/34] thp, libfs: initial support of thp in simple_read/write_begin/write_end Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 20/34] thp: handle file pages in split_huge_page() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 21/34] thp: wait_split_huge_page(): serialize over i_mmap_mutex too Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 22/34] thp, mm: truncate support for transparent huge page cache Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 23/34] thp, mm: split huge page on mmap file page Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 24/34] ramfs: enable transparent huge page cache Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 25/34] x86-64, mm: proper alignment mappings with hugepages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 26/34] mm: add huge_fault() callback to vm_operations_struct Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 27/34] thp: prepare zap_huge_pmd() to uncharge file pages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 28/34] thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 29/34] thp, mm: basic huge_fault implementation for generic_file_vm_ops Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 30/34] thp: extract fallback path from do_huge_pmd_anonymous_page() to a function Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 31/34] thp: initial implementation of do_huge_linear_fault() Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-08 18:46 ` Dave Hansen
2013-04-08 18:46 ` Dave Hansen
2013-04-08 18:52 ` Dave Hansen
2013-04-08 18:52 ` Dave Hansen
2013-04-17 14:38 ` Kirill A. Shutemov
2013-04-17 14:38 ` Kirill A. Shutemov
2013-04-17 22:07 ` Dave Hansen [this message]
2013-04-18 16:09 ` Kirill A. Shutemov
2013-04-18 16:09 ` Kirill A. Shutemov
2013-04-18 16:19 ` Kirill A. Shutemov
2013-04-18 16:19 ` Kirill A. Shutemov
2013-04-18 16:19 ` Kirill A. Shutemov
2013-04-18 16:20 ` Dave Hansen
2013-04-18 16:20 ` Dave Hansen
2013-04-18 16:38 ` Kirill A. Shutemov
2013-04-18 16:38 ` Kirill A. Shutemov
2013-04-18 16:42 ` Dave Hansen
2013-04-18 16:42 ` Dave Hansen
2013-04-05 11:59 ` [PATCHv3, RFC 32/34] thp: handle write-protect exception to file-backed huge pages Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-08 19:07 ` Dave Hansen
2013-04-08 19:07 ` Dave Hansen
2013-04-26 15:31 ` Kirill A. Shutemov
2013-04-26 15:31 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 33/34] thp: call __vma_adjust_trans_huge() for file-backed VMA Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-05 11:59 ` [PATCHv3, RFC 34/34] thp: map file-backed huge pages on fault Kirill A. Shutemov
2013-04-05 11:59 ` Kirill A. Shutemov
2013-04-07 0:40 ` [PATCHv3, RFC 00/34] Transparent huge page cache Ric Mason
2013-04-07 0:40 ` Ric Mason
2013-04-15 16:02 ` IOZone with transparent " Kirill A. Shutemov
2013-04-15 16:02 ` Kirill A. Shutemov
2013-04-15 18:17 ` [RESEND] " Kirill A. Shutemov
2013-04-15 18:17 ` Kirill A. Shutemov
2013-04-15 23:19 ` Dave Hansen
2013-04-15 23:19 ` Dave Hansen
2013-04-16 5:57 ` Kirill A. Shutemov
2013-04-16 5:57 ` Kirill A. Shutemov
2013-04-16 6:11 ` Dave Hansen
2013-04-16 6:11 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=516F1D3C.1060804@sr71.net \
--to=dave@sr71.net \
--cc=aarcange@redhat.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=dhillf@gmail.com \
--cc=fengguang.wu@intel.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matthew.r.wilcox@intel.com \
--cc=mgorman@suse.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.