linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages
@ 2018-09-30  5:46 Daniel Black
  2018-10-01 22:11 ` Mike Kravetz
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Black @ 2018-09-30  5:46 UTC (permalink / raw)
  To: linux-mm, mike.kravetz, khlebnikov; +Cc: Daniel Black

Reproducer, assuming 2M of hugetlbfs available:

Hugetlbfs mounted, size=2M and option user=testuser

  # mount | grep ^hugetlbfs
  hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
  # sysctl vm.nr_hugepages=1
  vm.nr_hugepages = 1
  # grep Huge /proc/meminfo
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:       1
  HugePages_Free:        1
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  Hugetlb:            2048 kB

Code:

  #include <sys/mman.h>
  #include <stddef.h>
  #define SIZE 2*1024*1024
  int main()
  {
    void *ptr;
    ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
    madvise(ptr, SIZE, MADV_DONTDUMP);
    madvise(ptr, SIZE, MADV_DODUMP);
  }

Compile and strace:

  mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
  madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
  madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].

The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination
was a consequence of the large useage of VM_DONTEXPAND in device
drivers.

A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.

A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the
same memory. We correct this omission by allowing madvice(MADV_DODUMP)
on hugetlbfs pages.

[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")

Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Buglink: https://lists.launchpad.net/maria-discuss/msg05245.html
Cc: linux-mm@kvack.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 972a9eaa898b..71d21df2a3f3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -96,7 +96,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
 		new_flags |= VM_DONTDUMP;
 		break;
 	case MADV_DODUMP:
-		if (new_flags & VM_SPECIAL) {
+		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
 			error = -EINVAL;
 			goto out;
 		}
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages
  2018-09-30  5:46 [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages Daniel Black
@ 2018-10-01 22:11 ` Mike Kravetz
  2018-10-03  5:47   ` Daniel Black
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Kravetz @ 2018-10-01 22:11 UTC (permalink / raw)
  To: Daniel Black, linux-mm, khlebnikov

On 9/29/18 10:46 PM, Daniel Black wrote:
<snip>
> hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
> author testing with analysis from Florian Weimer[1].
> 
> The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination
> was a consequence of the large useage of VM_DONTEXPAND in device
> drivers.
> 
> A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
> marked DODUMP.
> 
> A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
> memory for a while and later request that madvise(MADV_DODUMP) on the
> same memory. We correct this omission by allowing madvice(MADV_DODUMP)
> on hugetlbfs pages.
> 
> [1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
> [2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
> 
> Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
> Reported-by: Kenneth Penza <kpenza@gmail.com>
> Signed-off-by: Daniel Black <daniel@linux.ibm.com>
> Buglink: https://lists.launchpad.net/maria-discuss/msg05245.html
> Cc: linux-mm@kvack.org
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 972a9eaa898b..71d21df2a3f3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -96,7 +96,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		new_flags |= VM_DONTDUMP;
>  		break;
>  	case MADV_DODUMP:
> -		if (new_flags & VM_SPECIAL) {
> +		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {

Thanks Daniel,

This is certainly a regression.  My only question is whether this condition
should be more specific and test the default hugetlb vma flags
(VM_DONTEXPAND | VM_HUGETLB).  Or, whether simply checking VM_HUGETLB as you
have done above is sufficient.  Only reason for concern is that I am not
100% certain other VM_SPECIAL flags could not be set in VM_HUGETLB vma.

Perhaps Konstantin has an opinion as he did a bunch of the vm_flag reorg.

-- 
Mike Kravetz


>  			error = -EINVAL;
>  			goto out;
>  		}
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages
  2018-10-01 22:11 ` Mike Kravetz
@ 2018-10-03  5:47   ` Daniel Black
  2018-10-04 21:47     ` Mike Kravetz
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Black @ 2018-10-03  5:47 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: linux-mm, khlebnikov

On Mon, 1 Oct 2018 15:11:32 -0700
Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 9/29/18 10:46 PM, Daniel Black wrote:
> <snip>
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 972a9eaa898b..71d21df2a3f3 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -96,7 +96,7 @@ static long madvise_behavior(struct
> > vm_area_struct *vma, new_flags |= VM_DONTDUMP;
> >  		break;
> >  	case MADV_DODUMP:
> > -		if (new_flags & VM_SPECIAL) {
> > +		if (!is_vm_hugetlb_page(vma) && new_flags &
> > VM_SPECIAL) {
> 
> Thanks Daniel,
> 
> This is certainly a regression.  My only question is whether this
> condition should be more specific and test the default hugetlb vma
> flags (VM_DONTEXPAND | VM_HUGETLB).

>  Or, whether simply checking
> VM_HUGETLB as you have done above is sufficient.

The is_vm_hugetlb_page() function seems widely used elsewhere for that
single purpose.

> Only reason for
> concern is that I am not 100% certain other VM_SPECIAL flags could
> not be set in VM_HUGETLB vma.

They might be, but being a VM_HUGETLB flag is the main criteria for
being able to madvise(DODUMP) on the memory. It highlight its user
memory for the user to do as they wish.

When 314e51b9851b was added, it seemed the direction was to kill of the
VM_RESERVED, now a few years later VM_SPECIAL seems to be replacing
this. I think it would be better to preserve the original goal and
keep flags having a single meaning.

The purpose in 0103bd16fb90 as I surmise it, is that VM_IO | VM_PFNMAP
| VM_MIXEDMAP are the true things that want to be prevented from having
madvise(DO_DUMP) on them, based on frequent use of DONT_DUMP with those
memory pages. Was VM_DONTEXPAND an intentional inclusion there it did it
just get included with VM_SPECIAL?

Either way, I've tried to keep to the principles of the
is_vm_hugetlb_page function being the authoritative source of a HUGETLB
page.

> Perhaps Konstantin has an opinion as he did a bunch of the vm_flag
> reorg.
> 

Thanks for the review.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages
  2018-10-03  5:47   ` Daniel Black
@ 2018-10-04 21:47     ` Mike Kravetz
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Kravetz @ 2018-10-04 21:47 UTC (permalink / raw)
  To: Daniel Black; +Cc: linux-mm, khlebnikov, Andrew Morton

On 10/2/18 10:47 PM, Daniel Black wrote:
> On Mon, 1 Oct 2018 15:11:32 -0700
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> On 9/29/18 10:46 PM, Daniel Black wrote:
>> <snip>
>> > diff --git a/mm/madvise.c b/mm/madvise.c
>> > index 972a9eaa898b..71d21df2a3f3 100644
>> > --- a/mm/madvise.c
>> > +++ b/mm/madvise.c
>> > @@ -96,7 +96,7 @@ static long madvise_behavior(struct
>> > vm_area_struct *vma, new_flags |= VM_DONTDUMP;
>> >          break;
>> >      case MADV_DODUMP:
>> > -        if (new_flags & VM_SPECIAL) {
>> > +        if (!is_vm_hugetlb_page(vma) && new_flags &
>> > VM_SPECIAL) {
>>
>> Thanks Daniel,
>>
>> This is certainly a regression.  My only question is whether this
>> condition should be more specific and test the default hugetlb vma
>> flags (VM_DONTEXPAND | VM_HUGETLB).
> 
>>  Or, whether simply checking
>> VM_HUGETLB as you have done above is sufficient.
> 
> The is_vm_hugetlb_page() function seems widely used elsewhere for that
> single purpose.
> 
>> Only reason for
>> concern is that I am not 100% certain other VM_SPECIAL flags could
>> not be set in VM_HUGETLB vma.
> 
> They might be, but being a VM_HUGETLB flag is the main criteria for
> being able to madvise(DODUMP) on the memory. It highlight its user
> memory for the user to do as they wish.
> 
> When 314e51b9851b was added, it seemed the direction was to kill of the
> VM_RESERVED, now a few years later VM_SPECIAL seems to be replacing
> this. I think it would be better to preserve the original goal and
> keep flags having a single meaning.
> 
> The purpose in 0103bd16fb90 as I surmise it, is that VM_IO | VM_PFNMAP
> | VM_MIXEDMAP are the true things that want to be prevented from having
> madvise(DO_DUMP) on them, based on frequent use of DONT_DUMP with those
> memory pages. Was VM_DONTEXPAND an intentional inclusion there it did it
> just get included with VM_SPECIAL?
> 
> Either way, I've tried to keep to the principles of the
> is_vm_hugetlb_page function being the authoritative source of a HUGETLB
> page.
> 
>> Perhaps Konstantin has an opinion as he did a bunch of the vm_flag
>> reorg.
>>
> 
> Thanks for the review.
> 

Thanks for the explanation.  I was thinking of the case where hugetlb pages
could be used for RDMA, and wanted to make sure those drivers were not doing
anything to vm_flags.  A quick look reveals nothing special happening.

You can add,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Adding Andrew on Cc:
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-10-04 21:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-30  5:46 [PATCH] mm: madvise(MADV_DODUMP) allow hugetlbfs pages Daniel Black
2018-10-01 22:11 ` Mike Kravetz
2018-10-03  5:47   ` Daniel Black
2018-10-04 21:47     ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).