From: Vlastimil Babka <vbabka@suse.cz>
To: Ebru Akagunduz <ebru.akagunduz@gmail.com>, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
n-horiguchi@ah.jp.nec.com, aarcange@redhat.com, riel@redhat.com,
iamjoonsoo.kim@lge.com, xiexiuqi@huawei.com, gorcunov@openvz.org,
linux-kernel@vger.kernel.org, mgorman@suse.de,
rientjes@google.com, aneesh.kumar@linux.vnet.ibm.com,
hughd@google.com, hannes@cmpxchg.org, mhocko@suse.cz,
boaz@plexistor.com, raindel@mellanox.com
Subject: Re: [RFC v3 1/3] mm: add tracepoint for scanning pages
Date: Tue, 28 Jul 2015 10:39:29 +0200 [thread overview]
Message-ID: <55B73FC1.2070708@suse.cz> (raw)
In-Reply-To: <1436819284-3964-2-git-send-email-ebru.akagunduz@gmail.com>
On 07/13/2015 10:28 PM, Ebru Akagunduz wrote:
> Using static tracepoints, data of functions is recorded.
> It is good to automatize debugging without doing a lot
> of changes in the source code.
>
> This patch adds tracepoint for khugepaged_scan_pmd,
> collapse_huge_page and __collapse_huge_page_isolate.
>
> Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> ---
> Changes in v2:
> - Nothing changed
>
> Changes in v3:
> - Print page address instead of vm_start (Vlastimil Babka)
> - Define constants to specify exact tracepoint result (Vlastimil Babka)
Hi, and thanks for improving the tracepoints!
>
> include/linux/mm.h | 18 ++++++
> include/trace/events/huge_memory.h | 100 ++++++++++++++++++++++++++++++++
> mm/huge_memory.c | 114 +++++++++++++++++++++++++++----------
> 3 files changed, 203 insertions(+), 29 deletions(-)
> create mode 100644 include/trace/events/huge_memory.h
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7f47178..bf341c0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -21,6 +21,24 @@
> #include <linux/resource.h>
> #include <linux/page_ext.h>
>
> +#define MM_PMD_NULL 0
> +#define MM_EXCEED_NONE_PTE 3
> +#define MM_PTE_NON_PRESENT 4
> +#define MM_PAGE_NULL 5
> +#define MM_SCAN_ABORT 6
> +#define MM_PAGE_COUNT 7
> +#define MM_PAGE_LRU 8
> +#define MM_ANY_PROCESS 0
> +#define MM_VMA_NULL 2
> +#define MM_VMA_CHECK 3
> +#define MM_ADDRESS_RANGE 4
> +#define MM_PAGE_LOCK 2
> +#define MM_SWAP_CACHE_PAGE 6
> +#define MM_ISOLATE_LRU_PAGE 7
> +#define MM_ALLOC_HUGE_PAGE_FAIL 6
> +#define MM_CGROUP_CHARGE_FAIL 7
> +#define MM_COLLAPSE_ISOLATE_FAIL 5
this would better go to mm/huge_memory.c since it's used nowhere else,
so we shouldn't pollute a global header. Also I'd suggest changing the
MM_ prefix to e.g. SCAN_ ?
Reusing the numbers depending on whether they can appear in a single
function is unnecessarily complicated, we don't have to fit in some
small limit here. You could also use an enum to avoid defining each
constant's value manually.
> struct mempolicy;
> struct anon_vma;
> struct anon_vma_chain;
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> new file mode 100644
> index 0000000..cbc56fc
> --- /dev/null
> +++ b/include/trace/events/huge_memory.h
> @@ -0,0 +1,100 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM huge_memory
> +
> +#if !defined(__HUGE_MEMORY_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define __HUGE_MEMORY_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(mm_khugepaged_scan_pmd,
> +
> + TP_PROTO(struct mm_struct *mm, struct page *page, bool writable,
> + bool referenced, int none_or_zero, int ret),
> +
> + TP_ARGS(mm, page, writable, referenced, none_or_zero, ret),
> +
> + TP_STRUCT__entry(
> + __field(struct mm_struct *, mm)
> + __field(struct page *, page)
> + __field(bool, writable)
> + __field(bool, referenced)
> + __field(int, none_or_zero)
> + __field(int, ret)
> + ),
> +
> + TP_fast_assign(
> + __entry->mm = mm;
> + __entry->page = page;
> + __entry->writable = writable;
> + __entry->referenced = referenced;
> + __entry->none_or_zero = none_or_zero;
> + __entry->ret = ret;
> + ),
> +
> + TP_printk("mm=%p, page=%p, writable=%d, referenced=%d, none_or_zero=%d, ret=%d",
> + __entry->mm,
> + __entry->page,
Sorry, when I suggested "the address of the page itself" instead of
vm_start, I was thinking physical address (pfn).
Compaction tracepoints recently standardized on this print format so I'd
recommend it here too:
scan_pfn=0x%lx
> + __entry->writable,
> + __entry->referenced,
> + __entry->none_or_zero,
> + __entry->ret)
Instead of printing a number that has to be translated manually, I'd
recommend converting to string. Look at how compaction_status_string is
defined in mm/compaction.c and used from the tracepoints.
[ ... ]
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/huge_memory.h>
> +
> /*
> * By default transparent hugepage support is disabled in order that avoid
> * to risk increase the memory footprint of applications without a guaranteed
> @@ -2190,25 +2193,32 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> unsigned long address,
> pte_t *pte)
> {
> - struct page *page;
> + struct page *page = NULL;
> pte_t *_pte;
> - int none_or_zero = 0;
> + int none_or_zero = 0, ret = 0;
> bool referenced = false, writable = false;
> for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> _pte++, address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> if (!userfaultfd_armed(vma) &&
> - ++none_or_zero <= khugepaged_max_ptes_none)
> + ++none_or_zero <= khugepaged_max_ptes_none) {
> continue;
> - else
> + } else {
> + ret = MM_EXCEED_NONE_PTE;
> goto out;
> + }
> }
> - if (!pte_present(pteval))
> + if (!pte_present(pteval)) {
> + ret = MM_PTE_NON_PRESENT;
> goto out;
> + }
> +
> page = vm_normal_page(vma, address, pteval);
> - if (unlikely(!page))
> + if (unlikely(!page)) {
> + ret = MM_PAGE_NULL;
> goto out;
> + }
>
> VM_BUG_ON_PAGE(PageCompound(page), page);
> VM_BUG_ON_PAGE(!PageAnon(page), page);
> @@ -2220,8 +2230,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> * is needed to serialize against split_huge_page
> * when invoked from the VM.
> */
> - if (!trylock_page(page))
> + if (!trylock_page(page)) {
> + ret = MM_PAGE_LOCK;
> goto out;
> + }
>
> /*
> * cannot use mapcount: can't collapse if there's a gup pin.
> @@ -2230,6 +2242,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> */
> if (page_count(page) != 1 + !!PageSwapCache(page)) {
> unlock_page(page);
> + ret = MM_PAGE_COUNT;
> goto out;
> }
> if (pte_write(pteval)) {
> @@ -2237,6 +2250,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> } else {
> if (PageSwapCache(page) && !reuse_swap_page(page)) {
> unlock_page(page);
> + ret = MM_SWAP_CACHE_PAGE;
> goto out;
> }
> /*
> @@ -2251,6 +2265,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> */
> if (isolate_lru_page(page)) {
> unlock_page(page);
> + ret = MM_ISOLATE_LRU_PAGE;
> goto out;
> }
> /* 0 stands for page_is_file_cache(page) == false */
> @@ -2263,11 +2278,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> mmu_notifier_test_young(vma->vm_mm, address))
> referenced = true;
> }
> - if (likely(referenced && writable))
> + if (likely(referenced && writable)) {
> + trace_mm_collapse_huge_page_isolate(page, none_or_zero,
> + referenced, writable, ret);
> return 1;
> + }
> out:
> release_pte_pages(pte, _pte);
> - return 0;
> + trace_mm_collapse_huge_page_isolate(page, none_or_zero,
> + referenced, writable, ret);
> + return ret;
> }
Having success returned as "1" and failures of either 0 or other
positive values is uncommon and may lead to mistakes. Per
Documentation/CodingStyle Chapter 16, it should be either 0 = success
and any other = error, or 0 = failure, non-zero = success. Here the
first variant would be applicable. Following the chapter strictly, the
function should have been using this variant even before your patch,
since here the "name of a function is an action or an imperative
command" :) but yeah...
Anyway I don't think you need to return the exact error, since the
caller doesn't use it. It's there only for the tracepoint, so you can
simply keep returning just 1 or 0. Then with tracepoints disabled in
.config, the compiler should be also able to eliminate all the
assignments that would be unused in the end.
Same suggestions apply to khugepaged_scan_pmd()
>
> static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> @@ -2501,7 +2521,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pgtable_t pgtable;
> struct page *new_page;
> spinlock_t *pmd_ptl, *pte_ptl;
> - int isolated;
> + int isolated = 0, ret = 1;
> unsigned long hstart, hend;
> struct mem_cgroup *memcg;
> unsigned long mmun_start; /* For mmu_notifiers */
> @@ -2516,12 +2536,18 @@ static void collapse_huge_page(struct mm_struct *mm,
>
> /* release the mmap_sem read lock. */
> new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
> - if (!new_page)
> + if (!new_page) {
> + ret = MM_ALLOC_HUGE_PAGE_FAIL;
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
> + }
>
> if (unlikely(mem_cgroup_try_charge(new_page, mm,
> - gfp, &memcg)))
> + gfp, &memcg))) {
> + ret = MM_CGROUP_CHARGE_FAIL;
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
> + }
You could add a label called e.g. "out_nolock" right after
"up_write(&mm->mmap_sem);" below, and goto there to avoid the multiple
tracepoints.
>
> /*
> * Prevent all access to pagetables with the exception of
> @@ -2529,21 +2555,31 @@ static void collapse_huge_page(struct mm_struct *mm,
> * handled by the anon_vma lock + PG_lock.
> */
> down_write(&mm->mmap_sem);
> - if (unlikely(khugepaged_test_exit(mm)))
> + if (unlikely(khugepaged_test_exit(mm))) {
> + ret = MM_ANY_PROCESS;
> goto out;
> + }
>
> vma = find_vma(mm, address);
> - if (!vma)
> + if (!vma) {
> + ret = MM_VMA_NULL;
> goto out;
> + }
> hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> hend = vma->vm_end & HPAGE_PMD_MASK;
> - if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> + if (address < hstart || address + HPAGE_PMD_SIZE > hend) {
> + ret = MM_ADDRESS_RANGE;
> goto out;
> - if (!hugepage_vma_check(vma))
> + }
> + if (!hugepage_vma_check(vma)) {
> + ret = MM_VMA_CHECK;
> goto out;
> + }
> pmd = mm_find_pmd(mm, address);
> - if (!pmd)
> + if (!pmd) {
> + ret = MM_PMD_NULL;
> goto out;
> + }
>
> anon_vma_lock_write(vma->anon_vma);
>
> @@ -2568,7 +2604,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> isolated = __collapse_huge_page_isolate(vma, address, pte);
> spin_unlock(pte_ptl);
>
> - if (unlikely(!isolated)) {
> + if (unlikely(isolated != 1)) {
> pte_unmap(pte);
> spin_lock(pmd_ptl);
> BUG_ON(!pmd_none(*pmd));
> @@ -2580,6 +2616,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> spin_unlock(pmd_ptl);
> anon_vma_unlock_write(vma->anon_vma);
> + ret = MM_COLLAPSE_ISOLATE_FAIL;
> goto out;
> }
>
> @@ -2619,6 +2656,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> khugepaged_pages_collapsed++;
> out_up_write:
> up_write(&mm->mmap_sem);
out_nolock:
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
>
> out:
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Ebru Akagunduz <ebru.akagunduz@gmail.com>, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
n-horiguchi@ah.jp.nec.com, aarcange@redhat.com, riel@redhat.com,
iamjoonsoo.kim@lge.com, xiexiuqi@huawei.com, gorcunov@openvz.org,
linux-kernel@vger.kernel.org, mgorman@suse.de,
rientjes@google.com, aneesh.kumar@linux.vnet.ibm.com,
hughd@google.com, hannes@cmpxchg.org, mhocko@suse.cz,
boaz@plexistor.com, raindel@mellanox.com
Subject: Re: [RFC v3 1/3] mm: add tracepoint for scanning pages
Date: Tue, 28 Jul 2015 10:39:29 +0200 [thread overview]
Message-ID: <55B73FC1.2070708@suse.cz> (raw)
In-Reply-To: <1436819284-3964-2-git-send-email-ebru.akagunduz@gmail.com>
On 07/13/2015 10:28 PM, Ebru Akagunduz wrote:
> Using static tracepoints, data of functions is recorded.
> It is good to automatize debugging without doing a lot
> of changes in the source code.
>
> This patch adds tracepoint for khugepaged_scan_pmd,
> collapse_huge_page and __collapse_huge_page_isolate.
>
> Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> ---
> Changes in v2:
> - Nothing changed
>
> Changes in v3:
> - Print page address instead of vm_start (Vlastimil Babka)
> - Define constants to specify exact tracepoint result (Vlastimil Babka)
Hi, and thanks for improving the tracepoints!
>
> include/linux/mm.h | 18 ++++++
> include/trace/events/huge_memory.h | 100 ++++++++++++++++++++++++++++++++
> mm/huge_memory.c | 114 +++++++++++++++++++++++++++----------
> 3 files changed, 203 insertions(+), 29 deletions(-)
> create mode 100644 include/trace/events/huge_memory.h
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7f47178..bf341c0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -21,6 +21,24 @@
> #include <linux/resource.h>
> #include <linux/page_ext.h>
>
> +#define MM_PMD_NULL 0
> +#define MM_EXCEED_NONE_PTE 3
> +#define MM_PTE_NON_PRESENT 4
> +#define MM_PAGE_NULL 5
> +#define MM_SCAN_ABORT 6
> +#define MM_PAGE_COUNT 7
> +#define MM_PAGE_LRU 8
> +#define MM_ANY_PROCESS 0
> +#define MM_VMA_NULL 2
> +#define MM_VMA_CHECK 3
> +#define MM_ADDRESS_RANGE 4
> +#define MM_PAGE_LOCK 2
> +#define MM_SWAP_CACHE_PAGE 6
> +#define MM_ISOLATE_LRU_PAGE 7
> +#define MM_ALLOC_HUGE_PAGE_FAIL 6
> +#define MM_CGROUP_CHARGE_FAIL 7
> +#define MM_COLLAPSE_ISOLATE_FAIL 5
this would better go to mm/huge_memory.c since it's used nowhere else,
so we shouldn't pollute a global header. Also I'd suggest changing the
MM_ prefix to e.g. SCAN_ ?
Reusing the numbers depending on whether they can appear in a single
function is unnecessarily complicated, we don't have to fit in some
small limit here. You could also use an enum to avoid defining each
constant's value manually.
> struct mempolicy;
> struct anon_vma;
> struct anon_vma_chain;
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> new file mode 100644
> index 0000000..cbc56fc
> --- /dev/null
> +++ b/include/trace/events/huge_memory.h
> @@ -0,0 +1,100 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM huge_memory
> +
> +#if !defined(__HUGE_MEMORY_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define __HUGE_MEMORY_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(mm_khugepaged_scan_pmd,
> +
> + TP_PROTO(struct mm_struct *mm, struct page *page, bool writable,
> + bool referenced, int none_or_zero, int ret),
> +
> + TP_ARGS(mm, page, writable, referenced, none_or_zero, ret),
> +
> + TP_STRUCT__entry(
> + __field(struct mm_struct *, mm)
> + __field(struct page *, page)
> + __field(bool, writable)
> + __field(bool, referenced)
> + __field(int, none_or_zero)
> + __field(int, ret)
> + ),
> +
> + TP_fast_assign(
> + __entry->mm = mm;
> + __entry->page = page;
> + __entry->writable = writable;
> + __entry->referenced = referenced;
> + __entry->none_or_zero = none_or_zero;
> + __entry->ret = ret;
> + ),
> +
> + TP_printk("mm=%p, page=%p, writable=%d, referenced=%d, none_or_zero=%d, ret=%d",
> + __entry->mm,
> + __entry->page,
Sorry, when I suggested "the address of the page itself" instead of
vm_start, I was thinking physical address (pfn).
Compaction tracepoints recently standardized on this print format so I'd
recommend it here too:
scan_pfn=0x%lx
> + __entry->writable,
> + __entry->referenced,
> + __entry->none_or_zero,
> + __entry->ret)
Instead of printing a number that has to be translated manually, I'd
recommend converting to string. Look at how compaction_status_string is
defined in mm/compaction.c and used from the tracepoints.
[ ... ]
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/huge_memory.h>
> +
> /*
> * By default transparent hugepage support is disabled in order that avoid
> * to risk increase the memory footprint of applications without a guaranteed
> @@ -2190,25 +2193,32 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> unsigned long address,
> pte_t *pte)
> {
> - struct page *page;
> + struct page *page = NULL;
> pte_t *_pte;
> - int none_or_zero = 0;
> + int none_or_zero = 0, ret = 0;
> bool referenced = false, writable = false;
> for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> _pte++, address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> if (!userfaultfd_armed(vma) &&
> - ++none_or_zero <= khugepaged_max_ptes_none)
> + ++none_or_zero <= khugepaged_max_ptes_none) {
> continue;
> - else
> + } else {
> + ret = MM_EXCEED_NONE_PTE;
> goto out;
> + }
> }
> - if (!pte_present(pteval))
> + if (!pte_present(pteval)) {
> + ret = MM_PTE_NON_PRESENT;
> goto out;
> + }
> +
> page = vm_normal_page(vma, address, pteval);
> - if (unlikely(!page))
> + if (unlikely(!page)) {
> + ret = MM_PAGE_NULL;
> goto out;
> + }
>
> VM_BUG_ON_PAGE(PageCompound(page), page);
> VM_BUG_ON_PAGE(!PageAnon(page), page);
> @@ -2220,8 +2230,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> * is needed to serialize against split_huge_page
> * when invoked from the VM.
> */
> - if (!trylock_page(page))
> + if (!trylock_page(page)) {
> + ret = MM_PAGE_LOCK;
> goto out;
> + }
>
> /*
> * cannot use mapcount: can't collapse if there's a gup pin.
> @@ -2230,6 +2242,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> */
> if (page_count(page) != 1 + !!PageSwapCache(page)) {
> unlock_page(page);
> + ret = MM_PAGE_COUNT;
> goto out;
> }
> if (pte_write(pteval)) {
> @@ -2237,6 +2250,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> } else {
> if (PageSwapCache(page) && !reuse_swap_page(page)) {
> unlock_page(page);
> + ret = MM_SWAP_CACHE_PAGE;
> goto out;
> }
> /*
> @@ -2251,6 +2265,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> */
> if (isolate_lru_page(page)) {
> unlock_page(page);
> + ret = MM_ISOLATE_LRU_PAGE;
> goto out;
> }
> /* 0 stands for page_is_file_cache(page) == false */
> @@ -2263,11 +2278,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> mmu_notifier_test_young(vma->vm_mm, address))
> referenced = true;
> }
> - if (likely(referenced && writable))
> + if (likely(referenced && writable)) {
> + trace_mm_collapse_huge_page_isolate(page, none_or_zero,
> + referenced, writable, ret);
> return 1;
> + }
> out:
> release_pte_pages(pte, _pte);
> - return 0;
> + trace_mm_collapse_huge_page_isolate(page, none_or_zero,
> + referenced, writable, ret);
> + return ret;
> }
Having success returned as "1" and failures of either 0 or other
positive values is uncommon and may lead to mistakes. Per
Documentation/CodingStyle Chapter 16, it should be either 0 = success
and any other = error, or 0 = failure, non-zero = success. Here the
first variant would be applicable. Following the chapter strictly, the
function should have been using this variant even before your patch,
since here the "name of a function is an action or an imperative
command" :) but yeah...
Anyway I don't think you need to return the exact error, since the
caller doesn't use it. It's there only for the tracepoint, so you can
simply keep returning just 1 or 0. Then with tracepoints disabled in
.config, the compiler should be also able to eliminate all the
assignments that would be unused in the end.
Same suggestions apply to khugepaged_scan_pmd()
>
> static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> @@ -2501,7 +2521,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pgtable_t pgtable;
> struct page *new_page;
> spinlock_t *pmd_ptl, *pte_ptl;
> - int isolated;
> + int isolated = 0, ret = 1;
> unsigned long hstart, hend;
> struct mem_cgroup *memcg;
> unsigned long mmun_start; /* For mmu_notifiers */
> @@ -2516,12 +2536,18 @@ static void collapse_huge_page(struct mm_struct *mm,
>
> /* release the mmap_sem read lock. */
> new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
> - if (!new_page)
> + if (!new_page) {
> + ret = MM_ALLOC_HUGE_PAGE_FAIL;
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
> + }
>
> if (unlikely(mem_cgroup_try_charge(new_page, mm,
> - gfp, &memcg)))
> + gfp, &memcg))) {
> + ret = MM_CGROUP_CHARGE_FAIL;
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
> + }
You could add a label called e.g. "out_nolock" right after
"up_write(&mm->mmap_sem);" below, and goto there to avoid the multiple
tracepoints.
>
> /*
> * Prevent all access to pagetables with the exception of
> @@ -2529,21 +2555,31 @@ static void collapse_huge_page(struct mm_struct *mm,
> * handled by the anon_vma lock + PG_lock.
> */
> down_write(&mm->mmap_sem);
> - if (unlikely(khugepaged_test_exit(mm)))
> + if (unlikely(khugepaged_test_exit(mm))) {
> + ret = MM_ANY_PROCESS;
> goto out;
> + }
>
> vma = find_vma(mm, address);
> - if (!vma)
> + if (!vma) {
> + ret = MM_VMA_NULL;
> goto out;
> + }
> hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> hend = vma->vm_end & HPAGE_PMD_MASK;
> - if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> + if (address < hstart || address + HPAGE_PMD_SIZE > hend) {
> + ret = MM_ADDRESS_RANGE;
> goto out;
> - if (!hugepage_vma_check(vma))
> + }
> + if (!hugepage_vma_check(vma)) {
> + ret = MM_VMA_CHECK;
> goto out;
> + }
> pmd = mm_find_pmd(mm, address);
> - if (!pmd)
> + if (!pmd) {
> + ret = MM_PMD_NULL;
> goto out;
> + }
>
> anon_vma_lock_write(vma->anon_vma);
>
> @@ -2568,7 +2604,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> isolated = __collapse_huge_page_isolate(vma, address, pte);
> spin_unlock(pte_ptl);
>
> - if (unlikely(!isolated)) {
> + if (unlikely(isolated != 1)) {
> pte_unmap(pte);
> spin_lock(pmd_ptl);
> BUG_ON(!pmd_none(*pmd));
> @@ -2580,6 +2616,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> spin_unlock(pmd_ptl);
> anon_vma_unlock_write(vma->anon_vma);
> + ret = MM_COLLAPSE_ISOLATE_FAIL;
> goto out;
> }
>
> @@ -2619,6 +2656,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> khugepaged_pages_collapsed++;
> out_up_write:
> up_write(&mm->mmap_sem);
out_nolock:
> + trace_mm_collapse_huge_page(mm, isolated, ret);
> return;
>
> out:
next prev parent reply other threads:[~2015-07-28 8:39 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-13 20:28 [RFC v3 0/3] mm: make swapin readahead to gain more thp performance Ebru Akagunduz
2015-07-13 20:28 ` Ebru Akagunduz
2015-07-13 20:28 ` [RFC v3 1/3] mm: add tracepoint for scanning pages Ebru Akagunduz
2015-07-13 20:28 ` Ebru Akagunduz
2015-07-13 21:06 ` Kirill A. Shutemov
2015-07-13 21:06 ` Kirill A. Shutemov
2015-07-14 19:16 ` Ebru Akagunduz
2015-07-14 19:16 ` Ebru Akagunduz
2015-07-28 8:39 ` Vlastimil Babka [this message]
2015-07-28 8:39 ` Vlastimil Babka
2015-07-13 20:28 ` [RFC v3 2/3] mm: make optimistic check for swapin readahead Ebru Akagunduz
2015-07-13 20:28 ` Ebru Akagunduz
2015-07-13 21:07 ` Kirill A. Shutemov
2015-07-13 21:07 ` Kirill A. Shutemov
2015-07-14 18:51 ` Ebru Akagunduz
2015-07-14 18:51 ` Ebru Akagunduz
2015-07-13 20:28 ` [RFC v3 3/3] mm: make swapin readahead to improve thp collapse rate Ebru Akagunduz
2015-07-13 20:28 ` Ebru Akagunduz
2015-07-14 9:02 ` Kirill A. Shutemov
2015-07-14 9:02 ` Kirill A. Shutemov
2015-07-28 8:59 ` Vlastimil Babka
2015-07-28 8:59 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55B73FC1.2070708@suse.cz \
--to=vbabka@suse.cz \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=boaz@plexistor.com \
--cc=ebru.akagunduz@gmail.com \
--cc=gorcunov@openvz.org \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=raindel@mellanox.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=xiexiuqi@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.