From: "Mika Penttilä" <mika.penttila@kolumbus.fi>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tony Battersby <tonyb@cybernetics.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] more ZERO_PAGE handling ( was 2.6.24 regression: deadlock on coredump of big process)
Date: Wed, 30 Apr 2008 08:03:33 +0300
Message-ID: <4817FDA5.1040702@kolumbus.fi>
In-Reply-To: <20080430132516.28f1ee0c.kamezawa.hiroyu@jp.fujitsu.com>
KAMEZAWA Hiroyuki wrote:
> On Tue, 29 Apr 2008 10:10:58 -0400
> Tony Battersby <tonyb@cybernetics.com> wrote:
>
>> If I leave more memory free by changing the argument to
>> malloc_all_but_x_mb(), then I have to increase the number of threads
>> required to trigger the deadlock. Changing the thread stack size via
>> setrlimit(RLIMIT_STACK) also changes the number of threads that are
>> required to trigger the deadlock. For example, with
>> malloc_all_but_x_mb(16) and the default stack size of 8 MB, <= 5 threads
>> will coredump successfully, and >= 6 threads will deadlock. With
>> malloc_all_but_x_mb(16) and a reduced stack size of 4096 bytes, <= 8
>> threads will coredump successfully, and >= 9 threads will deadlock.
>>
>> Also note that the "free" command reports 10 MB free memory while the
>> program is running before the segfault is triggered.
>>
>>
> Hmm, my idea is below.
>
> Nick's remove-ZERO_PAGE patch includes the following change:
>
> ==
> @@ -2252,39 +2158,24 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
> spinlock_t *ptl;
> {
> <snip>
> - page_add_new_anon_rmap(page, vma, address);
> - } else {
> - /* Map the ZERO_PAGE - vm_page_prot is readonly */
> - page = ZERO_PAGE(address);
> - page_cache_get(page);
> - entry = mk_pte(page, vma->vm_page_prot);
> + if (unlikely(anon_vma_prepare(vma)))
> + goto oom;
> + page = alloc_zeroed_user_highpage_movable(vma, address);
> ==
>
> The above change avoids using ZERO_PAGE on a read page fault in an anonymous
> vma. This is reasonable, I think. But at coredump, tons of read-but-never-written
> pages can be allocated.
> ==
> coredump
> -> get_user_pages()
> -> follow_page() returns NULL
> -> handle_mm_fault()
> -> do_anonymous_page()
> ==
> follow_page() returns ZERO_PAGE only when the page table is not available.
>
> So, making follow_page() return ZERO_PAGE could fix the extra memory
> consumption at core dump. (Maybe someone can think of another fix.)
>
> How about this patch? Could you try it?
>
> (I'm sorry but I'll not be active for a week because my servers are powered off.)
>
> -Kame
>
>
But surely we still have to handle the fault, for instance for swapped-out
pages, for other uses of get_user_pages().
--Mika
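
To make the concern concrete, here is a small user-space model, not kernel
code. It assumes the zero-page path hands back ZERO_PAGE for a read-only
lookup, and it reduces a PTE to three hypothetical states. All names in it
(pte_state, policy, dump_stats, simulate_read_dump) are made up for
illustration. It counts what a read-only walk over a range would do under
three policies: fault everything in, take the zero-page path for any
non-present PTE, or take it only for an empty PTE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified PTE states for illustration; not kernel types. */
enum pte_state { PTE_NONE, PTE_SWAPPED, PTE_PRESENT };

/* Which non-present PTEs a read-only lookup may answer with the zero page. */
enum policy {
	POLICY_ALWAYS_FAULT,	/* fault in every non-present PTE */
	POLICY_ANY_NOT_PRESENT,	/* zero page whenever the PTE is not present */
	POLICY_NONE_ONLY	/* zero page only when the PTE is empty */
};

struct dump_stats {
	size_t faults;		/* lookups that went through the fault path */
	size_t zero_pages;	/* lookups answered with the shared zero page */
	size_t lost_pages;	/* swapped-out pages that would read as zeros */
};

/* Decide whether a read-only lookup is answered with the zero page. */
static int use_zero_page(enum policy p, enum pte_state pte)
{
	if (pte == PTE_PRESENT)
		return 0;
	if (p == POLICY_ANY_NOT_PRESENT)
		return 1;
	if (p == POLICY_NONE_ONLY)
		return pte == PTE_NONE;
	return 0;
}

/* Walk n modelled PTEs as a read-only dump would and tally the outcomes. */
static struct dump_stats simulate_read_dump(const enum pte_state *ptes,
					    size_t n, enum policy p)
{
	struct dump_stats st = { 0, 0, 0 };
	size_t i;

	assert(ptes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ptes[i] == PTE_PRESENT)
			continue;	/* the real page is already mapped */
		if (use_zero_page(p, ptes[i])) {
			st.zero_pages++;
			if (ptes[i] == PTE_SWAPPED)
				st.lost_pages++;	/* its data sits in swap */
		} else {
			st.faults++;	/* allocate a new page or swap one in */
		}
	}
	return st;
}
```

For the four-entry range {PTE_NONE, PTE_NONE, PTE_SWAPPED, PTE_PRESENT},
POLICY_ANY_NOT_PRESENT gives zero_pages = 3 and lost_pages = 1, while
POLICY_NONE_ONLY gives zero_pages = 2, faults = 1 and lost_pages = 0. In this
model, only the empty-PTE case is safe to shortcut.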
> ==
> follow_page() returns ZERO_PAGE if the page table is not available,
> but returns NULL if the pte is not present.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Index: linux-2.6.25/mm/memory.c
> ===================================================================
> --- linux-2.6.25.orig/mm/memory.c
> +++ linux-2.6.25/mm/memory.c
> @@ -926,15 +926,15 @@ struct page *follow_page(struct vm_area_
> page = NULL;
> pgd = pgd_offset(mm, address);
> if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> - goto no_page_table;
> + goto null_or_zeropage;
>
> pud = pud_offset(pgd, address);
> if (pud_none(*pud) || unlikely(pud_bad(*pud)))
> - goto no_page_table;
> + goto null_or_zeropage;
>
> pmd = pmd_offset(pud, address);
> if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
> - goto no_page_table;
> + goto null_or_zeropage;
>
> if (pmd_huge(*pmd)) {
> BUG_ON(flags & FOLL_GET);
> @@ -947,8 +947,10 @@ struct page *follow_page(struct vm_area_
> goto out;
>
> pte = *ptep;
> - if (!pte_present(pte))
> - goto unlock;
> + if (!(flags & FOLL_WRITE) && !pte_present(pte)) {
> + pte_unmap_unlock(ptep, ptl);
> + goto null_or_zeropage;
> + }
> if ((flags & FOLL_WRITE) && !pte_write(pte))
> goto unlock;
> page = vm_normal_page(vma, address, pte);
> @@ -968,7 +970,7 @@ unlock:
> out:
> return page;
>
> -no_page_table:
> +null_or_zeropage:
> /*
> * When core dumping an enormous anonymous area that nobody
> * has touched so far, we don't want to allocate page tables.
>