public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] coredump_filter: add hugepage core dumping
@ 2008-08-28  5:24 KOSAKI Motohiro
  2008-08-28 14:48 ` Adam Litke
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-08-28  5:24 UTC (permalink / raw)
  To: Kawai Hidehiro, Hugh Dickins, William Irwin, Adam Litke, LKML,
	Andrew Morton
  Cc: kosaki.motohiro

Currently, a hugepage vma has the VM_RESERVED flag set because hugepages cannot be swapped.

VM_RESERVED vmas aren't core dumped because this flag is often used for
kernel-internal vmas (e.g. vmalloc, sound-related).

So hugepages are never dumped, which means programs using hugepages can't be debugged easily.

These days, demand for hugepages is increasing.
IMO, native support for core dumping of hugepages is useful.


I think the default dumping behavior for VM_RESERVED is good,
so I'd like to add a coredump_filter mask.

This patch doesn't change the default behavior.


I tested it with the following method.

# ulimit -c unlimited
# echo 0x23 > /proc/self/coredump_filter
# ./hugepage_dump
# gdb ./hugepage_dump core


hugepage_dump.c
------------------------------------------------
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define HUGEPAGE_SIZE (256*1024*1024)

int main(int argc, char** argv)
{
	int err;
	int shmid;
	int *pval;
	int shm_flags = 0666;

	/* Pass -h to back the segment with hugepages. */
	if ((argc >= 2) && (strcmp(argv[1], "-h") == 0))
		shm_flags |= SHM_HUGETLB;

	err = shmid = shmget(IPC_PRIVATE, HUGEPAGE_SIZE, shm_flags);
	if (err < 0) {
		perror("shmget");
		exit(1);
	}

	pval = shmat(shmid, 0, 0);
	if (pval == (void*)-1) {
		perror("shmat");
		exit(1);
	}

	/* Touch the segment so at least one page is instantiated. */
	*pval = 1;

	/* NULL dereference: trigger SIGSEGV to produce a core dump. */
	*(int*)0 = 1;

	exit(0);
}
-----------------------------------------------------


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com>
CC: Hugh Dickins <hugh@veritas.com>
CC: William Irwin <wli@holomorphy.com>
CC: Adam Litke <agl@us.ibm.com>

---
 Documentation/filesystems/proc.txt |    3 ++-
 fs/binfmt_elf.c                    |    7 ++++++-
 include/linux/sched.h              |    3 ++-
 3 files changed, 10 insertions(+), 3 deletions(-)

Index: b/Documentation/filesystems/proc.txt
===================================================================
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
 of memory types. If a bit of the bitmask is set, memory segments of the
 corresponding memory type are dumped, otherwise they are not dumped.
 
-The following 4 memory types are supported:
+The following 5 memory types are supported:
   - (bit 0) anonymous private memory
   - (bit 1) anonymous shared memory
   - (bit 2) file-backed private memory
   - (bit 3) file-backed shared memory
+  - (bit 5) hugetlb memory
 
   Note that MMIO pages such as frame buffer are never dumped and vDSO pages
   are always dumped regardless of the bitmask status.
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
 #define MMF_DUMP_MAPPED_PRIVATE	4
 #define MMF_DUMP_MAPPED_SHARED	5
 #define MMF_DUMP_ELF_HEADERS	6
+#define MMF_DUMP_HUGETLB	7
 #define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
-#define MMF_DUMP_FILTER_BITS	5
+#define MMF_DUMP_FILTER_BITS	(MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)
 #define MMF_DUMP_FILTER_MASK \
 	(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
 #define MMF_DUMP_FILTER_DEFAULT \
Index: b/fs/binfmt_elf.c
===================================================================
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1160,11 +1160,16 @@ static unsigned long vma_dump_size(struc
 	if (vma->vm_flags & VM_ALWAYSDUMP)
 		goto whole;
 
+#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
+
+	if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
+		goto whole;
+
 	/* Do not dump I/O mapped devices or special mappings */
 	if (vma->vm_flags & (VM_IO | VM_RESERVED))
 		return 0;
 
-#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
 
 	/* By default, dump shared memory if mapped from an anonymous file. */
 	if (vma->vm_flags & VM_SHARED) {



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28  5:24 [PATCH] coredump_filter: add hugepage core dumping KOSAKI Motohiro
@ 2008-08-28 14:48 ` Adam Litke
  2008-08-28 14:59   ` KOSAKI Motohiro
  2008-08-28 16:38 ` Hugh Dickins
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Adam Litke @ 2008-08-28 14:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Kawai Hidehiro, Hugh Dickins, William Irwin, LKML, Andrew Morton

On Thu, 2008-08-28 at 14:24 +0900, KOSAKI Motohiro wrote:
> Currently, a hugepage vma has the VM_RESERVED flag set because hugepages
> cannot be swapped.
> 
> VM_RESERVED vmas aren't core dumped because this flag is often used for
> kernel-internal vmas (e.g. vmalloc, sound-related).
> 
> So hugepages are never dumped, which means programs using hugepages can't be debugged easily.
> 
> These days, demand for hugepages is increasing.
> IMO, native support for core dumping of hugepages is useful.
> 
> 
> I think the default dumping behavior for VM_RESERVED is good,
> so I'd like to add a coredump_filter mask.
> 
> This patch doesn't change the default behavior.

I have tested this using both the provided C program and with a program
that used libhugetlbfs to back its text and data segments with huge
pages.  This seems to work as expected.  From a hugetlbfs perspective,
this seems like a reasonable patch to me.

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com>
> CC: Hugh Dickins <hugh@veritas.com>
> CC: William Irwin <wli@holomorphy.com>
> CC: Adam Litke <agl@us.ibm.com>

Tested-by: Adam Litke <agl@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>

> ---
>  Documentation/filesystems/proc.txt |    3 ++-
>  fs/binfmt_elf.c                    |    7 ++++++-
>  include/linux/sched.h              |    3 ++-
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> Index: b/Documentation/filesystems/proc.txt
> ===================================================================
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
>  of memory types. If a bit of the bitmask is set, memory segments of the
>  corresponding memory type are dumped, otherwise they are not dumped.
> 
> -The following 4 memory types are supported:
> +The following 5 memory types are supported:
>    - (bit 0) anonymous private memory
>    - (bit 1) anonymous shared memory
>    - (bit 2) file-backed private memory
>    - (bit 3) file-backed shared memory
> +  - (bit 5) hugetlb memory
> 
>    Note that MMIO pages such as frame buffer are never dumped and vDSO pages
>    are always dumped regardless of the bitmask status.
> Index: b/include/linux/sched.h
> ===================================================================
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
>  #define MMF_DUMP_MAPPED_PRIVATE	4
>  #define MMF_DUMP_MAPPED_SHARED	5
>  #define MMF_DUMP_ELF_HEADERS	6
> +#define MMF_DUMP_HUGETLB	7
>  #define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
> -#define MMF_DUMP_FILTER_BITS	5
> +#define MMF_DUMP_FILTER_BITS	(MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)
>  #define MMF_DUMP_FILTER_MASK \
>  	(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
>  #define MMF_DUMP_FILTER_DEFAULT \
> Index: b/fs/binfmt_elf.c
> ===================================================================
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1160,11 +1160,16 @@ static unsigned long vma_dump_size(struc
>  	if (vma->vm_flags & VM_ALWAYSDUMP)
>  		goto whole;
> 
> +#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
> +
> +	if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
> +		goto whole;
> +
>  	/* Do not dump I/O mapped devices or special mappings */
>  	if (vma->vm_flags & (VM_IO | VM_RESERVED))
>  		return 0;
> 
> -#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
> 
>  	/* By default, dump shared memory if mapped from an anonymous file. */
>  	if (vma->vm_flags & VM_SHARED) {
> 
> 
-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28 14:48 ` Adam Litke
@ 2008-08-28 14:59   ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-08-28 14:59 UTC (permalink / raw)
  To: Adam Litke
  Cc: Kawai Hidehiro, Hugh Dickins, William Irwin, LKML, Andrew Morton

> I have tested this using both the provided C program and with a program
> that used libhugetlbfs to back its text and data segments with huge
> pages.  This seems to work as expected.  From a hugetlbfs perspective,
> this seems like a reasonable patch to me.

Thank you, Adam!

Yes, the number of libhugetlbfs users has increased these days.
So I believe debuggability of programs linked with libhugetlbfs is important.

>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com>
>> CC: Hugh Dickins <hugh@veritas.com>
>> CC: William Irwin <wli@holomorphy.com>
>> CC: Adam Litke <agl@us.ibm.com>
>
> Tested-by: Adam Litke <agl@us.ibm.com>
> Acked-by: Adam Litke <agl@us.ibm.com>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28  5:24 [PATCH] coredump_filter: add hugepage core dumping KOSAKI Motohiro
  2008-08-28 14:48 ` Adam Litke
@ 2008-08-28 16:38 ` Hugh Dickins
  2008-08-28 23:35   ` KOSAKI Motohiro
  2008-09-01  6:00 ` [PATCH] coredump_filter: add hugepage core dumping Hidehiro Kawai
  2008-09-02 13:48 ` Mel Gorman
  3 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2008-08-28 16:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Kawai Hidehiro, William Irwin, Adam Litke, LKML, Andrew Morton

On Thu, 28 Aug 2008, KOSAKI Motohiro wrote:
> Currently, a hugepage vma has the VM_RESERVED flag set because hugepages cannot be swapped.

(Ooh, don't get me started on what VM_RESERVED means or does not mean.)

> 
> VM_RESERVED vmas aren't core dumped because this flag is often used for
> kernel-internal vmas (e.g. vmalloc, sound-related).
> 
> So hugepages are never dumped, which means programs using hugepages
> can't be debugged easily.
> 
> These days, demand for hugepages is increasing.
> IMO, native support for core dumping of hugepages is useful.
> 
> 
> I think the default dumping behavior for VM_RESERVED is good,
> so I'd like to add a coredump_filter mask.
> 
> This patch doesn't change the default behavior.

This seems very reasonable to me
(though I've little use for coredumps or hugepages myself).

One caution though: how well does it behave when coredumping a large
area of hugepages which have not actually been instantiated prior to
the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
follow_page() to avoid wasting memory on large uninstantiated anon
areas, but hugepages won't go that way.  If the dump hangs waiting for
memory to be freed, or OOMkills other processes, that wouldn't be good;
whereas if hugepage reservations (I've not followed what happens with
them) or whatever just make it skip when no more, that should be okay.

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28 16:38 ` Hugh Dickins
@ 2008-08-28 23:35   ` KOSAKI Motohiro
  2008-08-29 15:28     ` Adam Litke
  0 siblings, 1 reply; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-08-28 23:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Kawai Hidehiro, William Irwin, Adam Litke, LKML,
	Andrew Morton

Hi

> > I think the default dumping behavior for VM_RESERVED is good,
> > so I'd like to add a coredump_filter mask.
> > 
> > This patch doesn't change the default behavior.
> 
> This seems very reasonable to me
> (though I've little use for coredumps or hugepages myself).

Thanks a lot.

> One caution though: how well does it behave when coredumping a large
> area of hugepages which have not actually been instantiated prior to
> the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
> follow_page() to avoid wasting memory on large uninstantiated anon
> areas, but hugepages won't go that way.  If the dump hangs waiting for
> memory to be freed, or OOMkills other processes, that wouldn't be good;
> whereas if hugepage reservations (I've not followed what happens with
> them) or whatever just make it skip when no more, that should be okay.

I think hugepage reservation pages always exist when hugepage COW happens,
so hugepage access never causes try_to_free_pages() nor OOM.

Adam, if I'm wrong, please correct me.





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28 23:35   ` KOSAKI Motohiro
@ 2008-08-29 15:28     ` Adam Litke
  2008-09-02  1:21       ` [PATCH] hugepage: support ZERO_PAGE() KOSAKI Motohiro
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Litke @ 2008-08-29 15:28 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Kawai Hidehiro, William Irwin, LKML, Andrew Morton

On Fri, 2008-08-29 at 08:35 +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > > I think the default dumping behavior for VM_RESERVED is good,
> > > so I'd like to add a coredump_filter mask.
> > > 
> > > This patch doesn't change the default behavior.
> > 
> > This seems very reasonable to me
> > (though I've little use for coredumps or hugepages myself).
> 
> Thanks a lot.
> 
> > One caution though: how well does it behave when coredumping a large
> > area of hugepages which have not actually been instantiated prior to
> > the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
> > follow_page() to avoid wasting memory on large uninstantiated anon
> > areas, but hugepages won't go that way.  If the dump hangs waiting for
> > memory to be freed, or OOMkills other processes, that wouldn't be good;
> > whereas if hugepage reservations (I've not followed what happens with
> > them) or whatever just make it skip when no more, that should be okay.
> 
> I think hugepage reservation pages always exist when hugepage COW happens,
> so hugepage access never causes try_to_free_pages() nor OOM.

(Mel, since you wrote the private reservation hugetlb code, would you
care to verify the following:)

Well, reserved huge pages _almost_ always exist.  The notable exception
happens when a process creates a MAP_PRIVATE hugetlb mapping and then
forks.  No guarantees are made to the children for access to that
hugetlb mapping.  So if such a child were to core dump an unavailable
huge page, follow_hugetlb_page() would fail.  I think that case is
harmless since it looks like elf_coredump() will replace it with a
zeroed page?

The part of Hugh's email that does deserve more attention is the part
about FOLL_ANON and the ZERO_PAGE.  It seems like an awful waste to zero
out and instantiate huge pages just for a core dump.  I think it would
be worth adding a flag to follow_hugetlb_page() so that it can be
instructed to not fault in un-instantiated huge pages.  This would take
some investigating as to whether it is even valid for
follow_hugetlb_page() to return the ZERO_PAGE().

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28  5:24 [PATCH] coredump_filter: add hugepage core dumping KOSAKI Motohiro
  2008-08-28 14:48 ` Adam Litke
  2008-08-28 16:38 ` Hugh Dickins
@ 2008-09-01  6:00 ` Hidehiro Kawai
  2008-09-02  2:18   ` KOSAKI Motohiro
  2008-09-02 13:48 ` Mel Gorman
  3 siblings, 1 reply; 19+ messages in thread
From: Hidehiro Kawai @ 2008-09-01  6:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, William Irwin, Adam Litke, LKML, Andrew Morton,
	sugita, Satoshi OSHIMA, agl

Hi Kosaki-san,

KOSAKI Motohiro wrote:

> Currently, a hugepage vma has the VM_RESERVED flag set because hugepages cannot be swapped.
> 
> VM_RESERVED vmas aren't core dumped because this flag is often used for
> kernel-internal vmas (e.g. vmalloc, sound-related).
> 
> So hugepages are never dumped, which means programs using hugepages can't be debugged easily.
> 
> These days, demand for hugepages is increasing.
> IMO, native support for core dumping of hugepages is useful.
> 
> 
> I think the default dumping behavior for VM_RESERVED is good,
> so I'd like to add a coredump_filter mask.

> Index: b/Documentation/filesystems/proc.txt
> ===================================================================
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
>  of memory types. If a bit of the bitmask is set, memory segments of the
>  corresponding memory type are dumped, otherwise they are not dumped.
>  
> -The following 4 memory types are supported:
> +The following 5 memory types are supported:
>    - (bit 0) anonymous private memory
>    - (bit 1) anonymous shared memory
>    - (bit 2) file-backed private memory
>    - (bit 3) file-backed shared memory
> +  - (bit 5) hugetlb memory

Hugetlb VMAs also fall into one of the cases covered by the lowest 4 bits.
If you introduce the hugetlb bit (bit 5), you'd better describe that
the hugetlb bit takes precedence over the lowest 4 bits.

Thanks,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] hugepage: support ZERO_PAGE()
  2008-08-29 15:28     ` Adam Litke
@ 2008-09-02  1:21       ` KOSAKI Motohiro
  2008-09-02 14:22         ` Mel Gorman
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-09-02  1:21 UTC (permalink / raw)
  To: Adam Litke
  Cc: kosaki.motohiro, Hugh Dickins, Kawai Hidehiro, William Irwin,
	LKML, Andrew Morton, Mel Gorman

CCed Mel Gorman

> > > One caution though: how well does it behave when coredumping a large
> > > area of hugepages which have not actually been instantiated prior to
> > > the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
> > > follow_page() to avoid wasting memory on large uninstantiated anon
> > > areas, but hugepages won't go that way.  If the dump hangs waiting for
> > > memory to be freed, or OOMkills other processes, that wouldn't be good;
> > > whereas if hugepage reservations (I've not followed what happens with
> > > them) or whatever just make it skip when no more, that should be okay.
> > 
> > I think hugepage reservation pages always exist when hugepage COW happens,
> > so hugepage access never causes try_to_free_pages() nor OOM.
> 
> (Mel, since you wrote the private reservation hugetlb code, would you
> care to verify the following:)
> 
> Well, reserved huge pages _almost_ always exist.  The notable exception
> happens when a process creates a MAP_PRIVATE hugetlb mapping and then
> forks.  No guarantees are made to the children for access to that
> hugetlb mapping.  So if such a child were to core dump an unavailable
> huge page, follow_hugetlb_page() would fail.  I think that case is
> harmless since it looks like elf_coredump() will replace it with a
> zeroed page?
> 
> The part of Hugh's email that does deserve more attention is the part
> about FOLL_ANON and the ZERO_PAGE.  It seems like an awful waste to zero
> out and instantiate huge pages just for a core dump.  I think it would
> be worth adding a flag to follow_hugetlb_page() so that it can be
> instructed to not fault in un-instantiated huge pages.  This would take
> some investigating as to whether it is even valid for
> follow_hugetlb_page() to return the ZERO_PAGE().

Adam, thank you for the precious explanation.

Honestly, I can't imagine the lack of zero page support causing anything terrible.
Can you explain when the terrible things would happen?
I don't know whether this problem is a big issue or not.


Anyway, I made a zero page patch for hugepages.
Could you please review it?



=======================================================================================
Subject: hugepage: support ZERO_PAGE()

Currently, hugepages don't use the zero page at all, because the zero page
is used almost exclusively for core dumping and hugepage core dumping
wasn't supported until now.

Now that hugepage core dumping is implemented, we should also implement
zero page support for hugepages. This patch does that.


Implementation note:
-------------------------------------------------------------
o Why do we only check VM_SHARED for the zero page?
  Normal pages are checked as follows:

	static inline int use_zero_page(struct vm_area_struct *vma)
	{
	        if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
	                return 0;
	
	        return !vma->vm_ops || !vma->vm_ops->fault;
	}

First, hugepages are never mlock()ed, so we don't need to be concerned
with VM_LOCKED.

Second, hugetlbfs is a pseudo filesystem, not a real filesystem, and it
doesn't have any file backing. Therefore the ops->fault check is meaningless.


o Why don't we use the zero page if !pte?

!pte indicates that the {pud, pmd} doesn't exist or that some error
happened, so we shouldn't return the zero page in that case.



test method
-------------------------------------------------------
console 1:

	# su
	# echo 100 >/proc/sys/vm/nr_hugepages
	# mount -t hugetlbfs none /hugetlbfs/
	# watch -n1 cat /proc/meminfo

console 2:
	% gcc -g -Wall crash_hugepage.c -o crash_hugepage -lhugetlbfs
	% ulimit -c unlimited
	% echo 0x23 >/proc/self/coredump_filter
	% HUGETLB_MORECORE=yes ./crash_hugepage 50
		-> segmentation fault
	% gdb ./crash_hugepage core

crash_hugepage.c
----------------------
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define HUGEPAGE_SIZE (2*1024*1024)

int main(int argc, char** argv)
{
	char* p;

	/* With HUGETLB_MORECORE=yes, libhugetlbfs backs malloc() with hugepages. */
	p = malloc(atoi(argv[1]) * HUGEPAGE_SIZE);
	sleep(2);

	/* Instantiate only one hugepage; the others stay untouched. */
	*(p + HUGEPAGE_SIZE) = 1;
	sleep(2);

	/* NULL dereference: trigger SIGSEGV to produce a core dump. */
	*(int*)0 = 1;

	return 0;
}
--------------------------------


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Adam Litke <agl@us.ibm.com>
CC: Hugh Dickins <hugh@veritas.com>
CC: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
CC: William Irwin <wli@holomorphy.com>
CC: Mel Gorman <mel@skynet.ie>

---
 include/linux/hugetlb.h |    6 ++++--
 mm/hugetlb.c            |   29 +++++++++++++++++++++++++----
 mm/memory.c             |    3 ++-
 3 files changed, 31 insertions(+), 7 deletions(-)

Index: b/mm/hugetlb.c
===================================================================
--- a/mm/hugetlb.c	2008-08-31 01:57:36.000000000 +0900
+++ b/mm/hugetlb.c	2008-09-02 08:39:31.000000000 +0900
@@ -2022,15 +2022,30 @@ follow_huge_pud(struct mm_struct *mm, un
 	return NULL;
 }
 
+static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
+{
+	if (!ptep)
+		return 0;
+
+	if (write)
+		return 0;
+
+	if (shared)
+		return 0;
+
+	return huge_pte_none(huge_ptep_get(ptep));
+}
+
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			struct page **pages, struct vm_area_struct **vmas,
 			unsigned long *position, int *length, int i,
-			int write)
+			int write, int shared)
 {
 	unsigned long pfn_offset;
 	unsigned long vaddr = *position;
 	int remainder = *length;
 	struct hstate *h = hstate_vma(vma);
+	int zeropage_ok = 0;
 
 	spin_lock(&mm->page_table_lock);
 	while (vaddr < vma->vm_end && remainder) {
@@ -2043,8 +2058,11 @@ int follow_hugetlb_page(struct mm_struct
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+		if (huge_zeropage_ok(pte, write, shared))
+			zeropage_ok = 1;
 
-		if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
+		if (!pte ||
+		    (huge_pte_none(huge_ptep_get(pte)) && !zeropage_ok) ||
 		    (write && !pte_write(huge_ptep_get(pte)))) {
 			int ret;
 
@@ -2061,11 +2079,14 @@ int follow_hugetlb_page(struct mm_struct
 		}
 
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		if (zeropage_ok)
+			page = ZERO_PAGE(0);
+		else
+			page = pte_page(huge_ptep_get(pte));
 same_page:
 		if (pages) {
 			get_page(page);
-			pages[i] = page + pfn_offset;
+			pages[i] = page + (zeropage_ok ? 0 : pfn_offset);
 		}
 
 		if (vmas)
Index: b/include/linux/hugetlb.h
===================================================================
--- a/include/linux/hugetlb.h	2008-09-02 08:05:46.000000000 +0900
+++ b/include/linux/hugetlb.h	2008-09-02 08:40:46.000000000 +0900
@@ -21,7 +21,9 @@ int hugetlb_sysctl_handler(struct ctl_ta
 int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
-int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
+int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
+			struct page **, struct vm_area_struct **,
+			unsigned long *, int *, int, int, int);
 void unmap_hugepage_range(struct vm_area_struct *,
 			unsigned long, unsigned long, struct page *);
 void __unmap_hugepage_range(struct vm_area_struct *,
@@ -74,7 +76,7 @@ static inline unsigned long hugetlb_tota
 	return 0;
 }
 
-#define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
+#define follow_hugetlb_page(m, v, p, vs, a, b, i, w, s)	({ BUG(); 0; })
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
Index: b/mm/memory.c
===================================================================
--- a/mm/memory.c	2008-08-30 11:31:53.000000000 +0900
+++ b/mm/memory.c	2008-09-02 08:41:12.000000000 +0900
@@ -1208,7 +1208,8 @@ int __get_user_pages(struct task_struct 
 
 		if (is_vm_hugetlb_page(vma)) {
 			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i, write);
+						&start, &len, i, write,
+						vma->vm_flags & VM_SHARED);
 			continue;
 		}
 




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-01  6:00 ` [PATCH] coredump_filter: add hugepage core dumping Hidehiro Kawai
@ 2008-09-02  2:18   ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-09-02  2:18 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: kosaki.motohiro, Hugh Dickins, William Irwin, Adam Litke, LKML,
	Andrew Morton, sugita, Satoshi OSHIMA

> > Index: b/Documentation/filesystems/proc.txt
> > ===================================================================
> > --- a/Documentation/filesystems/proc.txt
> > +++ b/Documentation/filesystems/proc.txt
> > @@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
> >  of memory types. If a bit of the bitmask is set, memory segments of the
> >  corresponding memory type are dumped, otherwise they are not dumped.
> >  
> > -The following 4 memory types are supported:
> > +The following 5 memory types are supported:
> >    - (bit 0) anonymous private memory
> >    - (bit 1) anonymous shared memory
> >    - (bit 2) file-backed private memory
> >    - (bit 3) file-backed shared memory
> > +  - (bit 5) hugetlb memory
> 
> Hugetlb VMAs also fall into one of the cases covered by the lowest 4 bits.
> If you introduce the hugetlb bit (bit 5), you'd better describe that
> the hugetlb bit takes precedence over the lowest 4 bits.

Thank you for the good review.
I updated the documentation.


=====================================================================
Subject: [PATCH v2] coredump_filter: add hugepage core dumping

Currently, a hugepage vma has the VM_RESERVED flag set so that it is never swapped.
But VM_RESERVED vmas aren't core dumped, because this flag is often used for
kernel-internal vmas (e.g. vmalloc, sound-related).

So hugepages are never dumped, and programs using them can't be debugged easily.
Many developers want hugepages to be included in core dumps.

I think the current default dumping behavior for VM_RESERVED is good,
so I'd like to add a coredump_filter mask.


I tested it with the following method.

# ulimit -c unlimited
# echo 0x23 > /proc/self/coredump_filter
# ./hugepage_dump
# gdb ./hugepage_dump core


hugepage_dump.c
------------------------------------------------
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define HUGEPAGE_SIZE (256*1024*1024)

int main(int argc, char** argv)
{
	int err;
	int shmid;
	int *pval;
	int shm_flags = 0666;

	if ((argc >= 2) && (strcmp(argv[1], "-h")==0))
		shm_flags |= SHM_HUGETLB;

	err = shmid = shmget(IPC_PRIVATE, HUGEPAGE_SIZE, shm_flags);
	if (err < 0) {
		perror("shmget");
		exit(1);
	}

	pval = shmat(shmid, 0, 0);
	if (pval == (void*)-1) {
		perror("shmat");
		exit(1);
	}

	*pval = 1;

	*(int*)0 = 1;

	exit(0);
}
-----------------------------------------------------


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com>
CC: Hugh Dickins <hugh@veritas.com>
CC: William Irwin <wli@holomorphy.com>
Tested-by: Adam Litke <agl@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>

---
 Documentation/filesystems/proc.txt |    6 +++++-
 fs/binfmt_elf.c                    |    6 +++++-
 include/linux/sched.h              |    3 ++-
 3 files changed, 12 insertions(+), 3 deletions(-)

Index: b/Documentation/filesystems/proc.txt
===================================================================
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -2389,15 +2389,19 @@ will be dumped when the <pid> process is
 of memory types. If a bit of the bitmask is set, memory segments of the
 corresponding memory type are dumped, otherwise they are not dumped.
 
-The following 4 memory types are supported:
+The following 5 memory types are supported:
   - (bit 0) anonymous private memory
   - (bit 1) anonymous shared memory
   - (bit 2) file-backed private memory
   - (bit 3) file-backed shared memory
+  - (bit 5) hugetlb memory
 
   Note that MMIO pages such as frame buffer are never dumped and vDSO pages
   are always dumped regardless of the bitmask status.
 
+  Note that the hugetlb memory flag (bit 5) takes precedence over the other
+  bits; thus 0x20 means that hugetlb memory is dumped but no other memory is.
+
 Default value of coredump_filter is 0x3; this means all anonymous memory
 segments are dumped.
 
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
 #define MMF_DUMP_MAPPED_PRIVATE	4
 #define MMF_DUMP_MAPPED_SHARED	5
 #define MMF_DUMP_ELF_HEADERS	6
+#define MMF_DUMP_HUGETLB	7
 #define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
-#define MMF_DUMP_FILTER_BITS	5
+#define MMF_DUMP_FILTER_BITS	(MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)
 #define MMF_DUMP_FILTER_MASK \
 	(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
 #define MMF_DUMP_FILTER_DEFAULT \
Index: b/fs/binfmt_elf.c
===================================================================
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1160,11 +1160,15 @@ static unsigned long vma_dump_size(struc
 	if (vma->vm_flags & VM_ALWAYSDUMP)
 		goto whole;
 
+#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
+
+	if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
+		goto whole;
+
 	/* Do not dump I/O mapped devices or special mappings */
 	if (vma->vm_flags & (VM_IO | VM_RESERVED))
 		return 0;
 
-#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
 
 	/* By default, dump shared memory if mapped from an anonymous file. */
 	if (vma->vm_flags & VM_SHARED) {




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-08-28  5:24 [PATCH] coredump_filter: add hugepage core dumping KOSAKI Motohiro
                   ` (2 preceding siblings ...)
  2008-09-01  6:00 ` [PATCH] coredump_filter: add hugepage core dumping Hidehiro Kawai
@ 2008-09-02 13:48 ` Mel Gorman
  2008-09-05  8:06   ` KOSAKI Motohiro
  2008-09-08  1:51   ` Hidehiro Kawai
  3 siblings, 2 replies; 19+ messages in thread
From: Mel Gorman @ 2008-09-02 13:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Kawai Hidehiro, Hugh Dickins, William Irwin, Adam Litke, LKML,
	Andrew Morton

On (28/08/08 14:24), KOSAKI Motohiro didst pronounce:
> Currently, a hugepage vma has the VM_RESERVED flag because it cannot be
> swapped, and a VM_RESERVED vma isn't core dumped because that flag is often
> used for kernel-internal vmas (e.g. vmalloc, sound related).
> 

The meaning of VM_RESERVED is a bit more complicated than that, but
you're right in that it prevents hugetlbfs files being core-dumped.

> So, hugepages are never dumped, which means programs using hugepages can't
> be debugged easily.
> 
> These days, demand for hugepages is increasing.
> IMO, native support for core dumping hugepages would be useful.
> 

Fair point.

> I think the default dumping behavior for VM_RESERVED is good,
> so I'd like to add a coredump_filter mask instead.
> 

That aside, it is not always safe to read a VM_RESERVED area without
side-effects.

> This patch doesn't change the default behavior.
> 

I wonder about this and if that is the right thing to do. By default,
core dumps include anonymous private and shared memory. Strictly speaking,
hugetlbfs is a file-backed mapping but it is always a ram-based file and a
private mapping is likely to contain information that would normally be in
a private anonymous mapping. It feels like that information should be core
dumped by default. Would it be better to

1. Distinguish between private and shared hugetlbfs mappings
2. Default dump MAP_PRIVATE hugetlbfs mappings

?

> 
> I tested with the following method.
> 
> # ulimit -c unlimited
> # echo 0x23 > /proc/self/coredump_filter
> # ./hugepage_dump
> # gdb ./hugepage_dump core
> 
> 
> hugepage_dump.c
> ------------------------------------------------
> #include <sys/ipc.h>
> #include <sys/shm.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> 
> #define HUGEPAGE_SIZE (256*1024*1024)
> 
> int main(int argc, char** argv)
> {
> 	int err;
> 	int shmid;
> 	int *pval;
> 	int shm_flags = 0666;
> 
> 	if ((argc >= 2) && (strcmp(argv[1], "-h")==0))
> 		shm_flags |= SHM_HUGETLB;
> 
> 	err = shmid = shmget(IPC_PRIVATE, HUGEPAGE_SIZE, shm_flags);
> 	if (err < 0) {
> 		perror("shmget");
> 		exit(1);
> 	}
> 
> 	pval = shmat(shmid, 0, 0);
> 	if (pval == (void*)-1) {
> 		perror("shmat");
> 		exit(1);
> 	}
> 
> 	*pval = 1;
> 
> 	*(int*)0 = 1;
> 
> 	exit(0);
> }
> -----------------------------------------------------
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com>
> CC: Hugh Dickins <hugh@veritas.com>
> CC: William Irwin <wli@holomorphy.com>
> CC: Adam Litke <agl@us.ibm.com>
> 
> ---
>  Documentation/filesystems/proc.txt |    3 ++-
>  fs/binfmt_elf.c                    |    7 ++++++-
>  include/linux/sched.h              |    3 ++-
>  3 files changed, 10 insertions(+), 3 deletions(-)
> 
> Index: b/Documentation/filesystems/proc.txt
> ===================================================================
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
>  of memory types. If a bit of the bitmask is set, memory segments of the
>  corresponding memory type are dumped, otherwise they are not dumped.
>  
> -The following 4 memory types are supported:
> +The following 5 memory types are supported:
>    - (bit 0) anonymous private memory
>    - (bit 1) anonymous shared memory
>    - (bit 2) file-backed private memory
>    - (bit 3) file-backed shared memory
> +  - (bit 5) hugetlb memory
>  

It's not your fault, but the meaning of bit 4 appears to be
undocumented. Offhand, does anyone know if this is intentional?

>    Note that MMIO pages such as frame buffer are never dumped and vDSO pages
>    are always dumped regardless of the bitmask status.
> Index: b/include/linux/sched.h
> ===================================================================
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
>  #define MMF_DUMP_MAPPED_PRIVATE	4
>  #define MMF_DUMP_MAPPED_SHARED	5
>  #define MMF_DUMP_ELF_HEADERS	6
> +#define MMF_DUMP_HUGETLB	7
>  #define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
> -#define MMF_DUMP_FILTER_BITS	5
> +#define MMF_DUMP_FILTER_BITS	(MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)

Why is this not just changing to 6?

>  #define MMF_DUMP_FILTER_MASK \
>  	(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
>  #define MMF_DUMP_FILTER_DEFAULT \
> Index: b/fs/binfmt_elf.c
> ===================================================================
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1160,11 +1160,16 @@ static unsigned long vma_dump_size(struc
>  	if (vma->vm_flags & VM_ALWAYSDUMP)
>  		goto whole;
>  
> +#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
> +
> +	if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
> +		goto whole;
> +
>  	/* Do not dump I/O mapped devices or special mappings */
>  	if (vma->vm_flags & (VM_IO | VM_RESERVED))
>  		return 0;
>  
> -#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
>  
>  	/* By default, dump shared memory if mapped from an anonymous file. */
>  	if (vma->vm_flags & VM_SHARED) {
> 

Otherwise, it appears ok.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] hugepage: support ZERO_PAGE()
  2008-09-02  1:21       ` [PATCH] hugepage: support ZERO_PAGE() KOSAKI Motohiro
@ 2008-09-02 14:22         ` Mel Gorman
  2008-09-02 15:13           ` Mel Gorman
  2008-09-02 16:27         ` Mel Gorman
  2008-09-02 17:27         ` Adam Litke
  2 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2008-09-02 14:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Adam Litke, Hugh Dickins, Kawai Hidehiro, William Irwin, LKML,
	Andrew Morton

On (02/09/08 10:21), KOSAKI Motohiro didst pronounce:
> CCed Mel Gorman
> 
> > > > One caution though: how well does it behave when coredumping a large
> > > > area of hugepages which have not actually been instantiated prior to
> > > > the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
> > > > follow_page() to avoid wasting memory on large uninstantiated anon
> > > > areas, but hugepages won't go that way.  If the dump hangs waiting for
> > > > memory to be freed, or OOMkills other processes, that wouldn't be good;
> > > > whereas if hugepage reservations (I've not followed what happens with
> > > > them) or whatever just make it skip when no more, that should be okay.
> > > 
> > > I think hugepage reservation pages always exist when hugepage COW happend.
> > > Then, hugepage access never cause try_to_free_pages() nor OOM.
> > 
> > (Mel, since you wrote the private reservation hugetlb code, would you
> > care to verify the following:)
> > 
> > Well, reserved huge pages _almost_ always exist.  The notable exception
> > happens when a process creates a MAP_PRIVATE hugetlb mapping and then
> > forks. 

Also, when MAP_NORESERVE is specified, there is no guarantee the huge
pages exist.

> > No guarantees are made to the children for access to that
> > hugetlb mapping.  So if such a child were to core dump an unavailable
> > huge page, follow_hugetlb_page() would fail.  I think that case is
> > harmless since it looks like elf_coredump() will replace it with a
> > zeroed page?
> > 
> > The part of Hugh's email that does deserve more attention is the part
> > about FOLL_ANON and the ZERO_PAGE.  It seems like an awful waste to zero
> > out and instantiate huge pages just for a core dump. 
> > I think it would
> > be worth adding a flag to follow_hugetlb_page() so that it can be
> > instructed to not fault in un-instantiated huge pages.  This would take
> > some investigating as to whether it is even valid for
> > follow_hugetlb_page() to return the ZERO_PAGE().
> 
> Adam, thank you for the detailed explanation.
> 
> Honestly, I can't imagine what terrible things the lack of zero page
> support would cause. Can you explain when the terrible things would
> happen? I don't know whether this problem is a big issue or not.

I believe the impact is that core dumps could take longer and be of a larger
size than necessary if uninstantiated pages are not skipped.  However, if the
zero page was used for anything other than core dumps, you have to be sure
that only the base pages worth of data is being read. I'm not convinced your
patch is doing that. For example, what happens if you ptrace an application
and read the memory area?

> Anyway, I made a zero page patch for hugepages.
> Could you please review it?
> 
> 
> 
> =======================================================================================
> Subject: hugepage: supoort ZERO_PAGE()
> 

s/supoort/support/

> Currently, hugepages don't use the zero page at all, because the zero page
> is used almost exclusively for core dumping, which wasn't supported until now.
> 
> But now that hugepage core dumping is implemented, we should also implement
> the zero page for hugepages. This patch does that.
> 
> 
> Implementation note:
> -------------------------------------------------------------
> o Why do we only check VM_SHARED for zero page?
>   normal page checked as ..
> 

Offhand, I'm not 100% certain but I think it's because a shared mapping
should always fault the page in case another process sharing the mapping
has put real data there.

> 	static inline int use_zero_page(struct vm_area_struct *vma)
> 	{
> 	        if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
> 	                return 0;
> 	
> 	        return !vma->vm_ops || !vma->vm_ops->fault;
> 	}
> 
> First, hugepages are never mlock()ed, so we don't need to worry about
> VM_LOCKED.
> 
> Second, hugetlbfs is a pseudo filesystem, not a real filesystem, and it
> doesn't have any file backing, so the ops->fault check is meaningless.
> 
> 
> o Why don't we use the zero page if !pte?
> 
> !pte indicates that the {pud, pmd} doesn't exist or that an error
> happened, so we shouldn't return the zero page in that case.
> 
> 
> 
> test method
> -------------------------------------------------------
> console 1:
> 
> 	# su
> 	# echo 100 >/proc/sys/vm/nr_hugepages
> 	# mount -t hugetlbfs none /hugetlbfs/
> 	# watch -n1 cat /proc/meminfo
> 
> console 2:
> 	% gcc -g -Wall crash_hugepage.c -o crash_hugepage -lhugetlbfs
> 	% ulimit -c unlimited
> 	% echo 0x23 >/proc/self/coredump_filter
> 	% HUGETLB_MORECORE=yes ./crash_hugepage 50
> 		-> segmentation fault
> 	% gdb
> 
> crash_hugepage.c
> ----------------------
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> #define HUGEPAGE_SIZE (2*1024*1024)
> 

It's not a big deal, but as you link against libhugetlbfs, you could have
also included the header and called gethugepagesize().

> int main(int argc, char** argv){
> 	char* p;
> 
> 	p = malloc( atoi(argv[1]) * HUGEPAGE_SIZE);
> 	sleep(2);
> 
> 	*(p + HUGEPAGE_SIZE) = 1;
> 	sleep(2);
> 
> 	*(int*)0 = 1;
> 
> 	return 0;
> }
> --------------------------------
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Adam Litke <agl@us.ibm.com>
> CC: Hugh Dickins <hugh@veritas.com>
> CC: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
> CC: William Irwin <wli@holomorphy.com>
> CC: Mel Gorman <mel@skynet.ie>
> 
> ---
>  include/linux/hugetlb.h |    6 ++++--
>  mm/hugetlb.c            |   29 +++++++++++++++++++++++++----
>  mm/memory.c             |    3 ++-
>  3 files changed, 31 insertions(+), 7 deletions(-)
> 
> Index: b/mm/hugetlb.c
> ===================================================================
> --- a/mm/hugetlb.c	2008-08-31 01:57:36.000000000 +0900
> +++ b/mm/hugetlb.c	2008-09-02 08:39:31.000000000 +0900
> @@ -2022,15 +2022,30 @@ follow_huge_pud(struct mm_struct *mm, un
>  	return NULL;
>  }
>  
> +static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
> +{
> +	if (!ptep)
> +		return 0;
> +
> +	if (write)
> +		return 0;
> +
> +	if (shared)
> +		return 0;
> +
> +	return huge_pte_none(huge_ptep_get(ptep));
> +}
> +
>  int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			struct page **pages, struct vm_area_struct **vmas,
>  			unsigned long *position, int *length, int i,
> -			int write)
> +			int write, int shared)
>  {
>  	unsigned long pfn_offset;
>  	unsigned long vaddr = *position;
>  	int remainder = *length;
>  	struct hstate *h = hstate_vma(vma);
> +	int zeropage_ok = 0;
>  
>  	spin_lock(&mm->page_table_lock);
>  	while (vaddr < vma->vm_end && remainder) {
> @@ -2043,8 +2058,11 @@ int follow_hugetlb_page(struct mm_struct
>  		 * first, for the page indexing below to work.
>  		 */
>  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> +		if (huge_zeropage_ok(pte, write, shared))
> +			zeropage_ok = 1;
>  
> -		if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
> +		if (!pte ||
> +		    (huge_pte_none(huge_ptep_get(pte)) && !zeropage_ok) ||
>  		    (write && !pte_write(huge_ptep_get(pte)))) {
>  			int ret;
>  
> @@ -2061,11 +2079,14 @@ int follow_hugetlb_page(struct mm_struct
>  		}
>  
>  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
> -		page = pte_page(huge_ptep_get(pte));
> +		if (zeropage_ok)
> +			page = ZERO_PAGE(0);
> +		else
> +			page = pte_page(huge_ptep_get(pte));

This does not look safe in the ptrace case at all. If I ptrace the app
to read a hugetlbfs-backed region, get_user_pages() gets called and then
this. In that case, it would appear that a 4K page would be put in place
where a hugepage is expected. What am I missing?

>  same_page:
>  		if (pages) {
>  			get_page(page);
> -			pages[i] = page + pfn_offset;
> +			pages[i] = page + (zeropage_ok ? 0 : pfn_offset);
>  		}
>  
>  		if (vmas)
> Index: b/include/linux/hugetlb.h
> ===================================================================
> --- a/include/linux/hugetlb.h	2008-09-02 08:05:46.000000000 +0900
> +++ b/include/linux/hugetlb.h	2008-09-02 08:40:46.000000000 +0900
> @@ -21,7 +21,9 @@ int hugetlb_sysctl_handler(struct ctl_ta
>  int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
>  int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
>  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> -int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
> +int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> +			struct page **, struct vm_area_struct **,
> +			unsigned long *, int *, int, int, int);
>  void unmap_hugepage_range(struct vm_area_struct *,
>  			unsigned long, unsigned long, struct page *);
>  void __unmap_hugepage_range(struct vm_area_struct *,
> @@ -74,7 +76,7 @@ static inline unsigned long hugetlb_tota
>  	return 0;
>  }
>  
> -#define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
> +#define follow_hugetlb_page(m, v, p, vs, a, b, i, w, s)	({ BUG(); 0; })
>  #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
>  #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
>  #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
> Index: b/mm/memory.c
> ===================================================================
> --- a/mm/memory.c	2008-08-30 11:31:53.000000000 +0900
> +++ b/mm/memory.c	2008-09-02 08:41:12.000000000 +0900
> @@ -1208,7 +1208,8 @@ int __get_user_pages(struct task_struct 
>  
>  		if (is_vm_hugetlb_page(vma)) {
>  			i = follow_hugetlb_page(mm, vma, pages, vmas,
> -						&start, &len, i, write);
> +						&start, &len, i, write,
> +						vma->vm_flags & VM_SHARED);
>  			continue;
>  		}
>  

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] hugepage: support ZERO_PAGE()
  2008-09-02 14:22         ` Mel Gorman
@ 2008-09-02 15:13           ` Mel Gorman
  0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-09-02 15:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Adam Litke, Hugh Dickins, Kawai Hidehiro, William Irwin, LKML,
	Andrew Morton

On (02/09/08 15:22), Mel Gorman didst pronounce:
> > <SNIP>
> > @@ -2061,11 +2079,14 @@ int follow_hugetlb_page(struct mm_struct
> >  		}
> >  
> >  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
> > -		page = pte_page(huge_ptep_get(pte));
> > +		if (zeropage_ok)
> > +			page = ZERO_PAGE(0);
> > +		else
> > +			page = pte_page(huge_ptep_get(pte));
> 
> This does not look safe in the ptrace case at all. If I ptrace the app
> to read a hugetlbfs-backed region, get_user_pages() gets called and then
> this. In that case, it would appear that a 4K page would be put in place
> where a hugepage is expected. What am I missing?
> 

D'oh, what was I thinking. It's only read in PAGE_SIZE portions so it will
not be reading beyond the boundary. It should be safe in the ptrace and
direct-io cases.

> >  same_page:
> >  		if (pages) {
> >  			get_page(page);
> > -			pages[i] = page + pfn_offset;
> > +			pages[i] = page + (zeropage_ok ? 0 : pfn_offset);
> >  		}
> >  
> >  		if (vmas)
> > Index: b/include/linux/hugetlb.h
> > ===================================================================
> > --- a/include/linux/hugetlb.h	2008-09-02 08:05:46.000000000 +0900
> > +++ b/include/linux/hugetlb.h	2008-09-02 08:40:46.000000000 +0900
> > @@ -21,7 +21,9 @@ int hugetlb_sysctl_handler(struct ctl_ta
> >  int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
> >  int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
> >  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> > -int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
> > +int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> > +			struct page **, struct vm_area_struct **,
> > +			unsigned long *, int *, int, int, int);
> >  void unmap_hugepage_range(struct vm_area_struct *,
> >  			unsigned long, unsigned long, struct page *);
> >  void __unmap_hugepage_range(struct vm_area_struct *,
> > @@ -74,7 +76,7 @@ static inline unsigned long hugetlb_tota
> >  	return 0;
> >  }
> >  
> > -#define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
> > +#define follow_hugetlb_page(m, v, p, vs, a, b, i, w, s)	({ BUG(); 0; })
> >  #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
> >  #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
> >  #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
> > Index: b/mm/memory.c
> > ===================================================================
> > --- a/mm/memory.c	2008-08-30 11:31:53.000000000 +0900
> > +++ b/mm/memory.c	2008-09-02 08:41:12.000000000 +0900
> > @@ -1208,7 +1208,8 @@ int __get_user_pages(struct task_struct 
> >  
> >  		if (is_vm_hugetlb_page(vma)) {
> >  			i = follow_hugetlb_page(mm, vma, pages, vmas,
> > -						&start, &len, i, write);
> > +						&start, &len, i, write,
> > +						vma->vm_flags & VM_SHARED);
> >  			continue;
> >  		}
> >  
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] hugepage: support ZERO_PAGE()
  2008-09-02  1:21       ` [PATCH] hugepage: support ZERO_PAGE() KOSAKI Motohiro
  2008-09-02 14:22         ` Mel Gorman
@ 2008-09-02 16:27         ` Mel Gorman
  2008-09-02 17:27         ` Adam Litke
  2 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-09-02 16:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Adam Litke, Hugh Dickins, Kawai Hidehiro, William Irwin, LKML,
	Andrew Morton

On (02/09/08 10:21), KOSAKI Motohiro didst pronounce:
> CCed Mel Gorman
> 

Here is a second go at reviewing this after spending a bit more time on
it.

> > > > One caution though: how well does it behave when coredumping a large
> > > > area of hugepages which have not actually been instantiated prior to
> > > > the coredump?  We have that funny FOLL_ANON ZERO_PAGE code in
> > > > follow_page() to avoid wasting memory on large uninstantiated anon
> > > > areas, but hugepages won't go that way.  If the dump hangs waiting for
> > > > memory to be freed, or OOMkills other processes, that wouldn't be good;
> > > > whereas if hugepage reservations (I've not followed what happens with
> > > > them) or whatever just make it skip when no more, that should be okay.
> > > 
> > > I think hugepage reservation pages always exist when hugepage COW happend.
> > > Then, hugepage access never cause try_to_free_pages() nor OOM.
> > 
> > (Mel, since you wrote the private reservation hugetlb code, would you
> > care to verify the following:)
> > 
> > Well, reserved huge pages _almost_ always exist.  The notable exception
> > happens when a process creates a MAP_PRIVATE hugetlb mapping and then
> > forks.  No guarantees are made to the children for access to that
> > hugetlb mapping.  So if such a child were to core dump an unavailable
> > huge page, follow_hugetlb_page() would fail.  I think that case is
> > harmless since it looks like elf_coredump() will replace it with a
> > zeroed page?
> > 
> > The part of Hugh's email that does deserve more attention is the part
> > about FOLL_ANON and the ZERO_PAGE.  It seems like an awful waste to zero
> > out and instantiate huge pages just for a core dump.  I think it would
> > be worth adding a flag to follow_hugetlb_page() so that it can be
> > instructed to not fault in un-instantiated huge pages.  This would take
> > some investigating as to whether it is even valid for
> > follow_hugetlb_page() to return the ZERO_PAGE().
> 
> Adam, thank you for the detailed explanation.
> 
> Honestly, I can't imagine what terrible things the lack of zero page
> support would cause. Can you explain when the terrible things would
> happen? I don't know whether this problem is a big issue or not.
> 
> 
> Anyway, I made a zero page patch for hugepages.
> Could you please review it?
> 
> 
> 
> =======================================================================================
> Subject: hugepage: supoort ZERO_PAGE()
> 
> Currently, hugepages don't use the zero page at all, because the zero page
> is used almost exclusively for core dumping, which wasn't supported until now.
> 
> But now that hugepage core dumping is implemented, we should also implement
> the zero page for hugepages. This patch does that.
> 
> 
> Implementation note:
> -------------------------------------------------------------
> o Why do we only check VM_SHARED for zero page?
>   normal page checked as ..
> 
> 	static inline int use_zero_page(struct vm_area_struct *vma)
> 	{
> 	        if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
> 	                return 0;
> 	
> 	        return !vma->vm_ops || !vma->vm_ops->fault;
> 	}
> 
> First, hugepages are never mlock()ed, so we don't need to worry about
> VM_LOCKED.
> 
> Second, hugetlbfs is a pseudo filesystem, not a real filesystem, and it
> doesn't have any file backing, so the ops->fault check is meaningless.
> 
> 
> o Why don't we use the zero page if !pte?
> 
> !pte indicates that the {pud, pmd} doesn't exist or that an error
> happened, so we shouldn't return the zero page in that case.
> 
> 
> 
> test method
> -------------------------------------------------------
> console 1:
> 
> 	# su
> 	# echo 100 >/proc/sys/vm/nr_hugepages
> 	# mount -t hugetlbfs none /hugetlbfs/
> 	# watch -n1 cat /proc/meminfo
> 
> console 2:
> 	% gcc -g -Wall crash_hugepage.c -o crash_hugepage -lhugetlbfs
> 	% ulimit -c unlimited
> 	% echo 0x23 >/proc/self/coredump_filter
> 	% HUGETLB_MORECORE=yes ./crash_hugepage 50
> 		-> segmentation fault
> 	% gdb
> 
> crash_hugepage.c
> ----------------------
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> #define HUGEPAGE_SIZE (2*1024*1024)
> 
> int main(int argc, char** argv){
> 	char* p;
> 
> 	p = malloc( atoi(argv[1]) * HUGEPAGE_SIZE);
> 	sleep(2);
> 
> 	*(p + HUGEPAGE_SIZE) = 1;
> 	sleep(2);
> 
> 	*(int*)0 = 1;
> 
> 	return 0;
> }
> --------------------------------
> 

I checked and this appears to be ok for both gdb and direct-io at least.
More comments are below. They are mostly about style except for one.

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Adam Litke <agl@us.ibm.com>
> CC: Hugh Dickins <hugh@veritas.com>
> CC: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
> CC: William Irwin <wli@holomorphy.com>
> CC: Mel Gorman <mel@skynet.ie>
> 
> ---
>  include/linux/hugetlb.h |    6 ++++--
>  mm/hugetlb.c            |   29 +++++++++++++++++++++++++----
>  mm/memory.c             |    3 ++-
>  3 files changed, 31 insertions(+), 7 deletions(-)
> 
> Index: b/mm/hugetlb.c
> ===================================================================
> --- a/mm/hugetlb.c	2008-08-31 01:57:36.000000000 +0900
> +++ b/mm/hugetlb.c	2008-09-02 08:39:31.000000000 +0900
> @@ -2022,15 +2022,30 @@ follow_huge_pud(struct mm_struct *mm, un
>  	return NULL;
>  }
>  
> +static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
> +{
> +	if (!ptep)
> +		return 0;
> +
> +	if (write)
> +		return 0;
> +
> +	if (shared)
> +		return 0;
> +
> +	return huge_pte_none(huge_ptep_get(ptep));
> +}
> +
>  int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			struct page **pages, struct vm_area_struct **vmas,
>  			unsigned long *position, int *length, int i,
> -			int write)
> +			int write, int shared)

Why does the signature need to change? You have the VMA and could check
the vma->flags within the function.

>  {
>  	unsigned long pfn_offset;
>  	unsigned long vaddr = *position;
>  	int remainder = *length;
>  	struct hstate *h = hstate_vma(vma);
> +	int zeropage_ok = 0;
>  
>  	spin_lock(&mm->page_table_lock);
>  	while (vaddr < vma->vm_end && remainder) {
> @@ -2043,8 +2058,11 @@ int follow_hugetlb_page(struct mm_struct
>  		 * first, for the page indexing below to work.
>  		 */
>  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> +		if (huge_zeropage_ok(pte, write, shared))
> +			zeropage_ok = 1;
>  
> -		if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
> +		if (!pte ||
> +		    (huge_pte_none(huge_ptep_get(pte)) && !zeropage_ok) ||
>  		    (write && !pte_write(huge_ptep_get(pte)))) {
>  			int ret;
>  
> @@ -2061,11 +2079,14 @@ int follow_hugetlb_page(struct mm_struct
>  		}
>  
>  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
> -		page = pte_page(huge_ptep_get(pte));
> +		if (zeropage_ok)
> +			page = ZERO_PAGE(0);
> +		else
> +			page = pte_page(huge_ptep_get(pte));

Calculate pfn_offset within the if statement here so that the change
below is unnecessary. The zeropage_ok ? 0 : pfn_offset is trickier to
read than it needs to be.

>  same_page:
>  		if (pages) {
>  			get_page(page);
> -			pages[i] = page + pfn_offset;
> +			pages[i] = page + (zeropage_ok ? 0 : pfn_offset);
>  		}

For direct-IO on NUMA, do we care that we are now calling get_page() and
bumping the reference count on the zero page instead of a struct page
that could be local? I suspect the answer is "no" because the same
problem would apply for base pages but someone might disagree.

>  
>  		if (vmas)
> Index: b/include/linux/hugetlb.h
> ===================================================================
> --- a/include/linux/hugetlb.h	2008-09-02 08:05:46.000000000 +0900
> +++ b/include/linux/hugetlb.h	2008-09-02 08:40:46.000000000 +0900
> @@ -21,7 +21,9 @@ int hugetlb_sysctl_handler(struct ctl_ta
>  int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
>  int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
>  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> -int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
> +int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> +			struct page **, struct vm_area_struct **,
> +			unsigned long *, int *, int, int, int);
>  void unmap_hugepage_range(struct vm_area_struct *,
>  			unsigned long, unsigned long, struct page *);
>  void __unmap_hugepage_range(struct vm_area_struct *,
> @@ -74,7 +76,7 @@ static inline unsigned long hugetlb_tota
>  	return 0;
>  }
>  
> -#define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
> +#define follow_hugetlb_page(m, v, p, vs, a, b, i, w, s)	({ BUG(); 0; })
>  #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
>  #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
>  #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
> Index: b/mm/memory.c
> ===================================================================
> --- a/mm/memory.c	2008-08-30 11:31:53.000000000 +0900
> +++ b/mm/memory.c	2008-09-02 08:41:12.000000000 +0900
> @@ -1208,7 +1208,8 @@ int __get_user_pages(struct task_struct 
>  
>  		if (is_vm_hugetlb_page(vma)) {
>  			i = follow_hugetlb_page(mm, vma, pages, vmas,
> -						&start, &len, i, write);
> +						&start, &len, i, write,
> +						vma->vm_flags & VM_SHARED);
>  			continue;
>  		}
>  
> 
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] hugepage: support ZERO_PAGE()
  2008-09-02  1:21       ` [PATCH] hugepage: support ZERO_PAGE() KOSAKI Motohiro
  2008-09-02 14:22         ` Mel Gorman
  2008-09-02 16:27         ` Mel Gorman
@ 2008-09-02 17:27         ` Adam Litke
  2 siblings, 0 replies; 19+ messages in thread
From: Adam Litke @ 2008-09-02 17:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Kawai Hidehiro, William Irwin, LKML, Andrew Morton,
	Mel Gorman

On Tue, 2008-09-02 at 10:21 +0900, KOSAKI Motohiro wrote:
> +static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
> +{
> +	if (!ptep)
> +		return 0;
> +
> +	if (write)
> +		return 0;
> +
> +	if (shared)
> +		return 0;
> +
> +	return huge_pte_none(huge_ptep_get(ptep));
> +}

In addition to the comments from Mel, I'd like to see this function
collapsed a bit...

if (!ptep || write || shared)
	return 0;
else
	return huge_pte_none(huge_ptep_get(ptep));

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center



* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-02 13:48 ` Mel Gorman
@ 2008-09-05  8:06   ` KOSAKI Motohiro
  2008-09-08  1:51   ` Hidehiro Kawai
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-09-05  8:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Kawai Hidehiro, Hugh Dickins, William Irwin,
	Adam Litke, LKML, Andrew Morton

Hi

> That aside, it is not always safe to read a VM_RESERVED area without
> side-effects.
> 
> > This patch doesn't change default behavior.
> > 
> 
> I wonder about this and if that is the right thing to do. By default,
> core dumps include anonymous private and shared memory. Strictly speaking,
> hugetlbfs is a file-backed mapping but it is always a ram-based file and a
> private mapping is likely to contain information that would normally be in
> a private anonymous mapping. It feels like that information should be core
> dumped by default. Would it be better to
> 
> 1. Distinguish between private and shared hugetlbfs mappings
> 2. Default dump MAP_PRIVATE hugetlbfs mappings
> 
> ?

Makes sense.
OK, I'll rework the patch along the lines of your design.

Thank you very much.




* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-02 13:48 ` Mel Gorman
  2008-09-05  8:06   ` KOSAKI Motohiro
@ 2008-09-08  1:51   ` Hidehiro Kawai
  2008-09-09 11:20     ` KOSAKI Motohiro
  2008-09-10  6:04     ` Roland McGrath
  1 sibling, 2 replies; 19+ messages in thread
From: Hidehiro Kawai @ 2008-09-08  1:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Hugh Dickins, William Irwin, Adam Litke, LKML,
	Andrew Morton, roland, sugita, Satoshi OSHIMA

[Added CC to Roland McGrath]

Mel Gorman wrote:

>>--- a/Documentation/filesystems/proc.txt
>>+++ b/Documentation/filesystems/proc.txt
>>@@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
>> of memory types. If a bit of the bitmask is set, memory segments of the
>> corresponding memory type are dumped, otherwise they are not dumped.
>> 
>>-The following 4 memory types are supported:
>>+The following 5 memory types are supported:
>>   - (bit 0) anonymous private memory
>>   - (bit 1) anonymous shared memory
>>   - (bit 2) file-backed private memory
>>   - (bit 3) file-backed shared memory
>>+  - (bit 5) hugetlb memory
> 
> It's not your fault, but the meaning of bit 4 appears to be
> undocumented. Offhand, does anyone know if this is intentional?

I think the documentation was simply never updated.  Bit 4 was
introduced by Roland McGrath; it means that ELF header pages in
file-backed private VMAs are dumped even if bit 2 is cleared.

Thanks,


Subject: [PATCH] coredump_filter: add description of bit 4

There is no description of bit 4 of coredump_filter in the
documentation.  This patch adds it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
CC: Roland McGrath <roland@redhat.com>
---
 Documentation/filesystems/proc.txt |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.27-rc5/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.27-rc5.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.27-rc5/Documentation/filesystems/proc.txt
@@ -2394,6 +2394,8 @@ The following 4 memory types are support
   - (bit 1) anonymous shared memory
   - (bit 2) file-backed private memory
   - (bit 3) file-backed shared memory
+  - (bit 4) ELF header pages in file-backed private memory areas (it is
+            effective only if bit 2 is cleared)
 
   Note that MMIO pages such as frame buffer are never dumped and vDSO pages
   are always dumped regardless of the bitmask status.





* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-08  1:51   ` Hidehiro Kawai
@ 2008-09-09 11:20     ` KOSAKI Motohiro
  2008-09-10  6:04     ` Roland McGrath
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-09-09 11:20 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Mel Gorman, Hugh Dickins, William Irwin, Adam Litke, LKML,
	Andrew Morton, roland, sugita, Satoshi OSHIMA

> I think it was just forgotten to be updated.  Bit 4 was introduced
> by Roland McGrath, and it means elf header pages in file-backed
> private VMAs are dumped even if bit 2 is cleared.
>
> Thanks,
>
>
> Subject: [PATCH] coredump_filter: add description of bit 4
>
> There is no description of bit 4 of coredump_filter in the
> documentation.  This patch adds it.
>
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> CC: Roland McGrath <roland@redhat.com>
> ---
>  Documentation/filesystems/proc.txt |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6.27-rc5/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-2.6.27-rc5.orig/Documentation/filesystems/proc.txt
> +++ linux-2.6.27-rc5/Documentation/filesystems/proc.txt
> @@ -2394,6 +2394,8 @@ The following 4 memory types are support
>   - (bit 1) anonymous shared memory
>   - (bit 2) file-backed private memory
>   - (bit 3) file-backed shared memory
> +  - (bit 4) ELF header pages in file-backed private memory areas (it is
> +            effective only if the bit 2 is cleared)
>
>   Note that MMIO pages such as frame buffer are never dumped and vDSO pages
>   are always dumped regardless of the bitmask status.

Thank you very much, Kawai-san.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>


* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-08  1:51   ` Hidehiro Kawai
  2008-09-09 11:20     ` KOSAKI Motohiro
@ 2008-09-10  6:04     ` Roland McGrath
  2008-09-10  6:53       ` KOSAKI Motohiro
  1 sibling, 1 reply; 19+ messages in thread
From: Roland McGrath @ 2008-09-10  6:04 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Mel Gorman, KOSAKI Motohiro, Hugh Dickins, William Irwin,
	Adam Litke, LKML, Andrew Morton, sugita, Satoshi OSHIMA

It was certainly not meant to be excluded from any documentation.  

To make the description clear, MMF_DUMP_ELF_HEADERS applies when a
file-backed private mapping would otherwise be elided (based on the other
bits and whether it's been modified).  If the vma maps offset 0 of the
file, and that page is readable and starts with ELFMAG, then just that
first page will be dumped.  (So p_filesz will be PAGE_SIZE rather than 0
for an elided mapping, or p_memsz for a mapping dumped whole.)


Thanks,
Roland


* Re: [PATCH] coredump_filter: add hugepage core dumping
  2008-09-10  6:04     ` Roland McGrath
@ 2008-09-10  6:53       ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-09-10  6:53 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Hidehiro Kawai, Mel Gorman, Hugh Dickins, William Irwin,
	Adam Litke, LKML, Andrew Morton, sugita, Satoshi OSHIMA

> It was certainly not meant to be excluded from any documentation.
>
> To make the description clear, MMF_DUMP_ELF_HEADERS applies when a
> file-backed private mapping would otherwise be elided (based on the other
> bits and whether it's been modified).  If the vma maps offset 0 of the
> file, and that page is readable and starts with ELFMAG, then just that
> first page will be dumped.  (So p_filesz will be PAGE_SIZE rather than 0
> for an elided mapping, or p_memsz for a mapping dumped whole.)

Could you please turn that into a patch?
Then I'll ack it with pleasure.

Thanks.


end of thread, other threads:[~2008-09-10  6:53 UTC | newest]

Thread overview: 19+ messages
2008-08-28  5:24 [PATCH] coredump_filter: add hugepage core dumping KOSAKI Motohiro
2008-08-28 14:48 ` Adam Litke
2008-08-28 14:59   ` KOSAKI Motohiro
2008-08-28 16:38 ` Hugh Dickins
2008-08-28 23:35   ` KOSAKI Motohiro
2008-08-29 15:28     ` Adam Litke
2008-09-02  1:21       ` [PATCH] hugepage: support ZERO_PAGE() KOSAKI Motohiro
2008-09-02 14:22         ` Mel Gorman
2008-09-02 15:13           ` Mel Gorman
2008-09-02 16:27         ` Mel Gorman
2008-09-02 17:27         ` Adam Litke
2008-09-01  6:00 ` [PATCH] coredump_filter: add hugepage core dumping Hidehiro Kawai
2008-09-02  2:18   ` KOSAKI Motohiro
2008-09-02 13:48 ` Mel Gorman
2008-09-05  8:06   ` KOSAKI Motohiro
2008-09-08  1:51   ` Hidehiro Kawai
2008-09-09 11:20     ` KOSAKI Motohiro
2008-09-10  6:04     ` Roland McGrath
2008-09-10  6:53       ` KOSAKI Motohiro
