huge zero page vs FOLL_DUMP

linux-mm.kvack.org archive mirror

* huge zero page vs FOLL_DUMP
@ 2013-01-11 23:53 Michel Lespinasse
  2013-01-12  3:36 ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Michel Lespinasse @ 2013-01-11 23:53 UTC (permalink / raw)
  To: Kirill A. Shutemov, Hugh Dickins, linux-mm

Hi,

follow_page() has code to return ERR_PTR(-EFAULT) when it encounters
the zero page and FOLL_DUMP flag is passed - this is used to avoid
dumping the zero page to disk when doing core dumps, and also by
munlock to avoid having a potentially large number of threads trying to
munlock the zero page at once, which we can't reclaim anyway.

We don't have the corresponding logic when follow_page() encounters a
huge zero page. I think we should, preferably before 3.8. However, I
am slightly confused as to what to do for the munlock case, as the
huge zero page actually does seem to be reclaimable. My guess is that
we could still skip the munlocks, until the zero page is actually
reclaimed at which point we should check if we can munlock it.
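
A minimal userspace sketch of the behaviour described above, not kernel
code. The struct, the two static pages, model_follow_page() and its
huge_check parameter are all hypothetical stand-ins, and the FOLL_DUMP
value is only illustrative. It shows a lookup that returns
ERR_PTR(-EFAULT) for the small zero page under FOLL_DUMP, and what
changes once the same check is applied to the huge zero page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: names mirror the kernel's, definitions do not. */
#define FOLL_DUMP 0x08u              /* illustrative flag value */

struct page { int refcount; };

static struct page zero_page;        /* stand-in for the 4K zero page */
static struct page huge_zero_page;   /* stand-in for the huge zero page */

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	/* Error pointers occupy the top 4095 values of the address space. */
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/*
 * Hypothetical lookup: 'mapped' is the page found at the address.
 * huge_check == false models the behaviour reported in this message;
 * huge_check == true models the requested behaviour.
 */
static struct page *model_follow_page(struct page *mapped,
				      unsigned int flags, bool huge_check)
{
	assert(mapped != NULL);

	/* Existing logic: never hand the zero page to a FOLL_DUMP caller. */
	if ((flags & FOLL_DUMP) && mapped == &zero_page)
		return ERR_PTR(-EFAULT);

	/* Requested logic: treat the huge zero page the same way. */
	if (huge_check && (flags & FOLL_DUMP) && mapped == &huge_zero_page)
		return ERR_PTR(-EFAULT);

	return mapped;
}
```

With huge_check false, a FOLL_DUMP lookup of the huge zero page still
returns the page itself, which is the gap described here.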

Kirill, is this something you would have time to look into?

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: huge zero page vs FOLL_DUMP
  2013-01-11 23:53 huge zero page vs FOLL_DUMP Michel Lespinasse
@ 2013-01-12  3:36 ` Kirill A. Shutemov
  2013-01-12  4:27   ` Michel Lespinasse
  2013-01-13  1:43   ` Simon Jeons
  0 siblings, 2 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2013-01-12  3:36 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: Hugh Dickins, linux-mm

On Fri, Jan 11, 2013 at 03:53:34PM -0800, Michel Lespinasse wrote:
> Hi,
> 
> follow_page() has code to return ERR_PTR(-EFAULT) when it encounters
> the zero page and FOLL_DUMP flag is passed - this is used to avoid
> dumping the zero page to disk when doing core dumps, and also by
> munlock to avoid having potentially large number of threads trying to
> munlock the zero page at once, which we can't reclaim anyway.
> 
> We don't have the corresponding logic when follow_page() encounters a
> huge zero page. I think we should, preferably before 3.8. However, I
> am slightly confused as to what to do for the munlock case, as the
> huge zero page actually does seem to be reclaimable. My guess is that
> we could still skip the munlocks, until the zero page is actually
> reclaimed at which point we should check if we can munlock it.
> 
> Kirill, is this something you would have time to look into ?

Nice catch! Thank you.

I don't think we should do anything about mlock(). The huge zero page
cannot be mlocked -- it will not pass the page->mapping check in
follow_trans_huge_pmd(). And it's not reclaimable while it's mapped
anywhere.
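
A rough userspace sketch of that argument, assuming page->mapping is
NULL for the huge zero page. The types and model_try_mlock() are
hypothetical and only model the shape of the check. They are not the
code in follow_trans_huge_pmd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures of the same name. */
struct address_space { int unused; };

struct page {
	struct address_space *mapping;	/* NULL for the huge zero page (assumed) */
	bool mlocked;
};

/*
 * Hypothetical helper: mark the page mlocked only when it has a mapping.
 * A page with mapping == NULL is skipped and never becomes mlocked.
 */
static bool model_try_mlock(struct page *page)
{
	assert(page != NULL);

	if (page->mapping == NULL)
		return false;

	page->mlocked = true;
	return true;
}
```

Since the mlock path never marks such a page, there is nothing for
munlock to undo on it.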

Could you test the patch?

From 062a9b670ede9fe5fca1d1947b42990b6b0642a4 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Sat, 12 Jan 2013 05:18:58 +0200
Subject: [PATCH] thp: Avoid dumping huge zero page

No reason to preserve huge zero page in core dump.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Michel Lespinasse <walken@google.com>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6001ee6..b5783d8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1257,6 +1257,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if (flags & FOLL_WRITE && !pmd_write(*pmd))
 		goto out;
 
+	/* Avoid dumping huge zero page */
+	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
+		return ERR_PTR(-EFAULT);
+
 	page = pmd_page(*pmd);
 	VM_BUG_ON(!PageHead(page));
 	if (flags & FOLL_TOUCH) {
-- 
1.8.1

-- 
 Kirill A. Shutemov

* Re: huge zero page vs FOLL_DUMP
  2013-01-12  3:36 ` Kirill A. Shutemov
@ 2013-01-12  4:27   ` Michel Lespinasse
  2013-01-14 15:18     ` Kirill A. Shutemov
  2013-01-13  1:43   ` Simon Jeons
  1 sibling, 1 reply; 6+ messages in thread
From: Michel Lespinasse @ 2013-01-12  4:27 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Hugh Dickins, linux-mm, Andrew Morton

On Fri, Jan 11, 2013 at 7:36 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> On Fri, Jan 11, 2013 at 03:53:34PM -0800, Michel Lespinasse wrote:
>> Hi,
>>
>> follow_page() has code to return ERR_PTR(-EFAULT) when it encounters
>> the zero page and FOLL_DUMP flag is passed - this is used to avoid
>> dumping the zero page to disk when doing core dumps, and also by
>> munlock to avoid having potentially large number of threads trying to
>> munlock the zero page at once, which we can't reclaim anyway.
>>
>> We don't have the corresponding logic when follow_page() encounters a
>> huge zero page. I think we should, preferably before 3.8. However, I
>> am slightly confused as to what to do for the munlock case, as the
>> huge zero page actually does seem to be reclaimable. My guess is that
>> we could still skip the munlocks, until the zero page is actually
>> reclaimed at which point we should check if we can munlock it.
>>
>> Kirill, is this something you would have time to look into ?
>
> Nice catch! Thank you.
>
> I don't think we should do anything about mlock(). Huge zero page cannot
> be mlocked -- it will not pass page->mapping check in
> follow_trans_huge_pmd(). And it's not reclaimable if it's mapped to
> anywhere.

Ah, thanks for the explanation about mlock.

> Could you test the patch?
>
> From 062a9b670ede9fe5fca1d1947b42990b6b0642a4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Sat, 12 Jan 2013 05:18:58 +0200
> Subject: [PATCH] thp: Avoid dumping huge zero page
>
> No reason to preserve huge zero page in core dump.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Michel Lespinasse <walken@google.com>
> ---
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 6001ee6..b5783d8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1257,6 +1257,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>         if (flags & FOLL_WRITE && !pmd_write(*pmd))
>                 goto out;
>
> +       /* Avoid dumping huge zero page */
> +       if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
> +               return ERR_PTR(-EFAULT);
> +
>         page = pmd_page(*pmd);
>         VM_BUG_ON(!PageHead(page));
>         if (flags & FOLL_TOUCH) {

Looks sane to me, and it also helps my munlock test (we were getting
and dropping references on the zero page which made it noticeably
slower). Thanks!

Reviewed-by: Michel Lespinasse <walken@google.com>

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

* Re: huge zero page vs FOLL_DUMP
  2013-01-12  3:36 ` Kirill A. Shutemov
  2013-01-12  4:27   ` Michel Lespinasse
@ 2013-01-13  1:43   ` Simon Jeons
  2013-01-13 16:10     ` Kirill A. Shutemov
  1 sibling, 1 reply; 6+ messages in thread
From: Simon Jeons @ 2013-01-13  1:43 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Michel Lespinasse, Hugh Dickins, linux-mm

On Sat, 2013-01-12 at 05:36 +0200, Kirill A. Shutemov wrote:
> On Fri, Jan 11, 2013 at 03:53:34PM -0800, Michel Lespinasse wrote:
> > Hi,
> > 
> > follow_page() has code to return ERR_PTR(-EFAULT) when it encounters
> > the zero page and FOLL_DUMP flag is passed - this is used to avoid
> > dumping the zero page to disk when doing core dumps, and also by
> > munlock to avoid having potentially large number of threads trying to
> > munlock the zero page at once, which we can't reclaim anyway.
> > 
> > We don't have the corresponding logic when follow_page() encounters a
> > huge zero page. I think we should, preferably before 3.8. However, I
> > am slightly confused as to what to do for the munlock case, as the
> > huge zero page actually does seem to be reclaimable. My guess is that
> > we could still skip the munlocks, until the zero page is actually
> > reclaimed at which point we should check if we can munlock it.
> > 
> > Kirill, is this something you would have time to look into ?
> 
> Nice catch! Thank you.
> 
> I don't think we should do anything about mlock(). Huge zero page cannot
> be mlocked -- it will not pass page->mapping check in

Hi Kirill,

What's stored in page->mapping of the huge zero page?

> follow_trans_huge_pmd(). And it's not reclaimable if it's mapped to
> anywhere.
> 
> Could you test the patch?
> 
> From 062a9b670ede9fe5fca1d1947b42990b6b0642a4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Sat, 12 Jan 2013 05:18:58 +0200
> Subject: [PATCH] thp: Avoid dumping huge zero page
> 
> No reason to preserve huge zero page in core dump.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Michel Lespinasse <walken@google.com>
> ---
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 6001ee6..b5783d8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1257,6 +1257,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  	if (flags & FOLL_WRITE && !pmd_write(*pmd))
>  		goto out;
>  
> +	/* Avoid dumping huge zero page */
> +	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
> +		return ERR_PTR(-EFAULT);
> +
>  	page = pmd_page(*pmd);
>  	VM_BUG_ON(!PageHead(page));
>  	if (flags & FOLL_TOUCH) {
> -- 
> 1.8.1
> 

-- 
Simon Jeons <simon.jeons@gmail.com>

* Re: huge zero page vs FOLL_DUMP
  2013-01-13  1:43   ` Simon Jeons
@ 2013-01-13 16:10     ` Kirill A. Shutemov
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2013-01-13 16:10 UTC (permalink / raw)
  To: Simon Jeons; +Cc: Michel Lespinasse, Hugh Dickins, linux-mm

On Sat, Jan 12, 2013 at 07:43:08PM -0600, Simon Jeons wrote:
> On Sat, 2013-01-12 at 05:36 +0200, Kirill A. Shutemov wrote:
> > On Fri, Jan 11, 2013 at 03:53:34PM -0800, Michel Lespinasse wrote:
> > > Hi,
> > > 
> > > follow_page() has code to return ERR_PTR(-EFAULT) when it encounters
> > > the zero page and FOLL_DUMP flag is passed - this is used to avoid
> > > dumping the zero page to disk when doing core dumps, and also by
> > > munlock to avoid having potentially large number of threads trying to
> > > munlock the zero page at once, which we can't reclaim anyway.
> > > 
> > > We don't have the corresponding logic when follow_page() encounters a
> > > huge zero page. I think we should, preferably before 3.8. However, I
> > > am slightly confused as to what to do for the munlock case, as the
> > > huge zero page actually does seem to be reclaimable. My guess is that
> > > we could still skip the munlocks, until the zero page is actually
> > > reclaimed at which point we should check if we can munlock it.
> > > 
> > > Kirill, is this something you would have time to look into ?
> > 
> > Nice catch! Thank you.
> > 
> > I don't think we should do anything about mlock(). Huge zero page cannot
> > be mlocked -- it will not pass page->mapping check in
> 
> Hi Kirill,
> 
> What's stored in page->mapping of the huge zero page?

NULL.

-- 
 Kirill A. Shutemov

* Re: huge zero page vs FOLL_DUMP
  2013-01-12  4:27   ` Michel Lespinasse
@ 2013-01-14 15:18     ` Kirill A. Shutemov
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2013-01-14 15:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, Michel Lespinasse

On Fri, Jan 11, 2013 at 08:27:31PM -0800, Michel Lespinasse wrote:
> On Fri, Jan 11, 2013 at 7:36 PM, Kirill A. Shutemov
> > Could you test the patch?
> >
> > From 062a9b670ede9fe5fca1d1947b42990b6b0642a4 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Sat, 12 Jan 2013 05:18:58 +0200
> > Subject: [PATCH] thp: Avoid dumping huge zero page
> >
> > No reason to preserve huge zero page in core dump.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reported-by: Michel Lespinasse <walken@google.com>
> > ---
> >  mm/huge_memory.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 6001ee6..b5783d8 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1257,6 +1257,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> >         if (flags & FOLL_WRITE && !pmd_write(*pmd))
> >                 goto out;
> >
> > +       /* Avoid dumping huge zero page */
> > +       if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
> > +               return ERR_PTR(-EFAULT);
> > +
> >         page = pmd_page(*pmd);
> >         VM_BUG_ON(!PageHead(page));
> >         if (flags & FOLL_TOUCH) {
> 
> Looks sane to me, and it also helps my munlock test (we were getting
> and dropping references on the zero page which made it noticeably
> slower). Thanks!
> 
> Reviewed-by: Michel Lespinasse <walken@google.com>

Andrew, please take the patch.

-- 
 Kirill A. Shutemov
