linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
       [not found] <1528484212-7199-1-git-send-email-jbaron@akamai.com>
@ 2018-06-11  7:20 ` Michal Hocko
  2018-06-11 14:51   ` Jason Baron
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-06-11  7:20 UTC (permalink / raw)
  To: Jason Baron
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api

[CCing linux-api - please make sure to CC this mailing list anytime you
 are touching user visible apis]

On Fri 08-06-18 14:56:52, Jason Baron wrote:
> In order to free memory that is marked MLOCK_ONFAULT, the memory region
> needs to be first unlocked, before calling MADV_DONTNEED. And if the region
> is to be reused as MLOCK_ONFAULT, we require another call to mlock2() with
> the MLOCK_ONFAULT flag.
> 
> Let's simplify freeing memory that is set MLOCK_ONFAULT, by allowing
> MADV_DONTNEED to work directly for memory that is set MLOCK_ONFAULT.

I do not understand the point here. How is MLOCK_ONFAULT any different
from the regular mlock here? If you want to free mlocked memory then
fine but the behavior should be consistent. MLOCK_ONFAULT is just a way
to say that we do not want to pre-populate the mlocked area and do that
lazily on the page fault time. madvise should make any difference here.

That being said we do not allow MADV_DONTNEED on VM_LOCKED since ever. I
do not really see why but this would be a user visible change. Can we do
that? What was the original motivation for exclusion?

[keeping the rest of email for linux-api]

> The
> locked memory limits, tracked by mm->locked_vm do not need to be adjusted
> in this case, since they were charged to the entire region when
> MLOCK_ONFAULT was initially set.
> 
> Further, I don't think allowing MADV_FREE for MLOCK_ONFAULT regions makes
> sense, since the point of MLOCK_ONFAULT is for userspace to know when pages
> are locked in memory and thus to know when page faults will occur.
> 
> Signed-off-by: Jason Baron <jbaron@akamai.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/internal.h | 18 ++++++++++++++++++
>  mm/madvise.c  |  4 ++--
>  mm/oom_kill.c |  2 +-
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 9e3654d..16c0041 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -15,6 +15,7 @@
>  #include <linux/mm.h>
>  #include <linux/pagemap.h>
>  #include <linux/tracepoint-defs.h>
> +#include <uapi/asm-generic/mman-common.h>
>  
>  /*
>   * The set of flags that only affect watermark checking and reclaim
> @@ -45,9 +46,26 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>  
>  static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
>  {
> +	return !(((vma->vm_flags & (VM_LOCKED|VM_LOCKONFAULT)) == VM_LOCKED) ||
> +		 (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP)));
> +}
> +
> +static inline bool can_madv_free_vma(struct vm_area_struct *vma)
> +{
>  	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
>  }
>  
> +static inline bool can_madv_dontneed_or_free_vma(struct vm_area_struct *vma,
> +						 int behavior)
> +{
> +	if (behavior == MADV_DONTNEED)
> +		return can_madv_dontneed_vma(vma);
> +	else if (behavior == MADV_FREE)
> +		return can_madv_free_vma(vma);
> +	else
> +		return 0;
> +}
> +
>  void unmap_page_range(struct mmu_gather *tlb,
>  			     struct vm_area_struct *vma,
>  			     unsigned long addr, unsigned long end,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922..61ff306 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -517,7 +517,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  				  int behavior)
>  {
>  	*prev = vma;
> -	if (!can_madv_dontneed_vma(vma))
> +	if (!can_madv_dontneed_or_free_vma(vma, behavior))
>  		return -EINVAL;
>  
>  	if (!userfaultfd_remove(vma, start, end)) {
> @@ -539,7 +539,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  			 */
>  			return -ENOMEM;
>  		}
> -		if (!can_madv_dontneed_vma(vma))
> +		if (!can_madv_dontneed_or_free_vma(vma, behavior))
>  			return -EINVAL;
>  		if (end > vma->vm_end) {
>  			/*
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb8..9817d15 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -492,7 +492,7 @@ void __oom_reap_task_mm(struct mm_struct *mm)
>  	set_bit(MMF_UNSTABLE, &mm->flags);
>  
>  	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
> -		if (!can_madv_dontneed_vma(vma))
> +		if (!can_madv_free_vma(vma))
>  			continue;
>  
>  		/*
> -- 
> 2.7.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-11  7:20 ` [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT Michal Hocko
@ 2018-06-11 14:51   ` Jason Baron
  2018-06-11 15:03     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Baron @ 2018-06-11 14:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On 06/11/2018 03:20 AM, Michal Hocko wrote:
> [CCing linux-api - please make sure to CC this mailing list anytime you
>  are touching user visible apis]
> 
> On Fri 08-06-18 14:56:52, Jason Baron wrote:
>> In order to free memory that is marked MLOCK_ONFAULT, the memory region
>> needs to be first unlocked, before calling MADV_DONTNEED. And if the region
>> is to be reused as MLOCK_ONFAULT, we require another call to mlock2() with
>> the MLOCK_ONFAULT flag.
>>
>> Let's simplify freeing memory that is set MLOCK_ONFAULT, by allowing
>> MADV_DONTNEED to work directly for memory that is set MLOCK_ONFAULT.
> 
> I do not understand the point here. How is MLOCK_ONFAULT any different
> from the regular mlock here? If you want to free mlocked memory then
> fine but the behavior should be consistent. MLOCK_ONFAULT is just a way
> to say that we do not want to pre-populate the mlocked area and do that
> lazily on the page fault time. madvise should make any difference here.
>

The difference for me is after the page has been freed, MLOCK_ONFAULT
will re-populate the range if its accessed again. Whereas with regular
mlock I don't think it will because its normally done at mlock() or
mmap() time. In any case, the state of a region being locked with
regular mlock and pages not present does not currently exist, whereas it
does for MLOCK_ONFAULT, so it seems more natural to do it only for
MLOCK_ONFAULT. Finally, the use-case we had for this, didn't need
regular mlock().

> That being said we do not allow MADV_DONTNEED on VM_LOCKED since ever. I
> do not really see why but this would be a user visible change. Can we do
> that? What was the original motivation for exclusion?
> 

I'm not sure precisely for regular mlock. But for MLOCK_ONFAULT I did
ask the original author, Eric Munson (added to the 'cc) about allowing
MADV_DONTNEED, and iirc, he thought it made sense for MLOCK_ONFAULT.

Thanks,

-Jason


> [keeping the rest of email for linux-api]
> 
>> The
>> locked memory limits, tracked by mm->locked_vm do not need to be adjusted
>> in this case, since they were charged to the entire region when
>> MLOCK_ONFAULT was initially set.
>>
>> Further, I don't think allowing MADV_FREE for MLOCK_ONFAULT regions makes
>> sense, since the point of MLOCK_ONFAULT is for userspace to know when pages
>> are locked in memory and thus to know when page faults will occur.
>>
>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> ---
>>  mm/internal.h | 18 ++++++++++++++++++
>>  mm/madvise.c  |  4 ++--
>>  mm/oom_kill.c |  2 +-
>>  3 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 9e3654d..16c0041 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -15,6 +15,7 @@
>>  #include <linux/mm.h>
>>  #include <linux/pagemap.h>
>>  #include <linux/tracepoint-defs.h>
>> +#include <uapi/asm-generic/mman-common.h>
>>  
>>  /*
>>   * The set of flags that only affect watermark checking and reclaim
>> @@ -45,9 +46,26 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>>  
>>  static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
>>  {
>> +	return !(((vma->vm_flags & (VM_LOCKED|VM_LOCKONFAULT)) == VM_LOCKED) ||
>> +		 (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP)));
>> +}
>> +
>> +static inline bool can_madv_free_vma(struct vm_area_struct *vma)
>> +{
>>  	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
>>  }
>>  
>> +static inline bool can_madv_dontneed_or_free_vma(struct vm_area_struct *vma,
>> +						 int behavior)
>> +{
>> +	if (behavior == MADV_DONTNEED)
>> +		return can_madv_dontneed_vma(vma);
>> +	else if (behavior == MADV_FREE)
>> +		return can_madv_free_vma(vma);
>> +	else
>> +		return 0;
>> +}
>> +
>>  void unmap_page_range(struct mmu_gather *tlb,
>>  			     struct vm_area_struct *vma,
>>  			     unsigned long addr, unsigned long end,
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d3c922..61ff306 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -517,7 +517,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>>  				  int behavior)
>>  {
>>  	*prev = vma;
>> -	if (!can_madv_dontneed_vma(vma))
>> +	if (!can_madv_dontneed_or_free_vma(vma, behavior))
>>  		return -EINVAL;
>>  
>>  	if (!userfaultfd_remove(vma, start, end)) {
>> @@ -539,7 +539,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>>  			 */
>>  			return -ENOMEM;
>>  		}
>> -		if (!can_madv_dontneed_vma(vma))
>> +		if (!can_madv_dontneed_or_free_vma(vma, behavior))
>>  			return -EINVAL;
>>  		if (end > vma->vm_end) {
>>  			/*
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 8ba6cb8..9817d15 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -492,7 +492,7 @@ void __oom_reap_task_mm(struct mm_struct *mm)
>>  	set_bit(MMF_UNSTABLE, &mm->flags);
>>  
>>  	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
>> -		if (!can_madv_dontneed_vma(vma))
>> +		if (!can_madv_free_vma(vma))
>>  			continue;
>>  
>>  		/*
>> -- 
>> 2.7.4
>>
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-11 14:51   ` Jason Baron
@ 2018-06-11 15:03     ` Michal Hocko
  2018-06-11 16:23       ` Jason Baron
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-06-11 15:03 UTC (permalink / raw)
  To: Jason Baron
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Mon 11-06-18 10:51:44, Jason Baron wrote:
> On 06/11/2018 03:20 AM, Michal Hocko wrote:
> > [CCing linux-api - please make sure to CC this mailing list anytime you
> >  are touching user visible apis]
> > 
> > On Fri 08-06-18 14:56:52, Jason Baron wrote:
> >> In order to free memory that is marked MLOCK_ONFAULT, the memory region
> >> needs to be first unlocked, before calling MADV_DONTNEED. And if the region
> >> is to be reused as MLOCK_ONFAULT, we require another call to mlock2() with
> >> the MLOCK_ONFAULT flag.
> >>
> >> Let's simplify freeing memory that is set MLOCK_ONFAULT, by allowing
> >> MADV_DONTNEED to work directly for memory that is set MLOCK_ONFAULT.
> > 
> > I do not understand the point here. How is MLOCK_ONFAULT any different
> > from the regular mlock here? If you want to free mlocked memory then
> > fine but the behavior should be consistent. MLOCK_ONFAULT is just a way
> > to say that we do not want to pre-populate the mlocked area and do that
> > lazily on the page fault time. madvise should make any difference here.
> >
> 
> The difference for me is after the page has been freed, MLOCK_ONFAULT
> will re-populate the range if its accessed again. Whereas with regular
> mlock I don't think it will because its normally done at mlock() or
> mmap() time.

The vma would still be locked so we would effectively turn it into
ONFAULT IIRC.

> In any case, the state of a region being locked with
> regular mlock and pages not present does not currently exist, whereas it
> does for MLOCK_ONFAULT, so it seems more natural to do it only for
> MLOCK_ONFAULT. Finally, the use-case we had for this, didn't need
> regular mlock().

So can we start discussing whether we want to allow MADV_DONTNEED on
mlocked areas and what downsides it might have? Sure it would turn the
strong mlock guarantee to have the whole vma resident but is this
acceptable for something that is an explicit request from the owner of
the memory?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-11 15:03     ` Michal Hocko
@ 2018-06-11 16:23       ` Jason Baron
  2018-06-12  7:46         ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Baron @ 2018-06-11 16:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson



On 06/11/2018 11:03 AM, Michal Hocko wrote:
> On Mon 11-06-18 10:51:44, Jason Baron wrote:
>> On 06/11/2018 03:20 AM, Michal Hocko wrote:
>>> [CCing linux-api - please make sure to CC this mailing list anytime you
>>>  are touching user visible apis]
>>>
>>> On Fri 08-06-18 14:56:52, Jason Baron wrote:
>>>> In order to free memory that is marked MLOCK_ONFAULT, the memory region
>>>> needs to be first unlocked, before calling MADV_DONTNEED. And if the region
>>>> is to be reused as MLOCK_ONFAULT, we require another call to mlock2() with
>>>> the MLOCK_ONFAULT flag.
>>>>
>>>> Let's simplify freeing memory that is set MLOCK_ONFAULT, by allowing
>>>> MADV_DONTNEED to work directly for memory that is set MLOCK_ONFAULT.
>>>
>>> I do not understand the point here. How is MLOCK_ONFAULT any different
>>> from the regular mlock here? If you want to free mlocked memory then
>>> fine but the behavior should be consistent. MLOCK_ONFAULT is just a way
>>> to say that we do not want to pre-populate the mlocked area and do that
>>> lazily on the page fault time. madvise should make any difference here.
>>>
>>
>> The difference for me is after the page has been freed, MLOCK_ONFAULT
>> will re-populate the range if its accessed again. Whereas with regular
>> mlock I don't think it will because its normally done at mlock() or
>> mmap() time.
> 
> The vma would still be locked so we would effectively turn it into
> ONFAULT IIRC.
>

Indeed. I just tried allowing MADV_DONTNEED against regular mlock() and
in my brief testing it seemed to work as expected against both anonymous
and file back pages. I am certainly not against allowing it for regular
mlock() as well, if you think that makes it more consistent.


>> In any case, the state of a region being locked with
>> regular mlock and pages not present does not currently exist, whereas it
>> does for MLOCK_ONFAULT, so it seems more natural to do it only for
>> MLOCK_ONFAULT. Finally, the use-case we had for this, didn't need
>> regular mlock().
> 
> So can we start discussing whether we want to allow MADV_DONTNEED on
> mlocked areas and what downsides it might have? Sure it would turn the
> strong mlock guarantee to have the whole vma resident but is this
> acceptable for something that is an explicit request from the owner of
> the memory?
> 

If its being explicity requested by the owner it makes sense to me. I
guess there could be a concern about this breaking some userspace that
relied on MADV_DONTNEED not freeing locked memory?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-11 16:23       ` Jason Baron
@ 2018-06-12  7:46         ` Michal Hocko
  2018-06-12 14:11           ` Jason Baron
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-06-12  7:46 UTC (permalink / raw)
  To: Jason Baron
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Mon 11-06-18 12:23:58, Jason Baron wrote:
> On 06/11/2018 11:03 AM, Michal Hocko wrote:
> > So can we start discussing whether we want to allow MADV_DONTNEED on
> > mlocked areas and what downsides it might have? Sure it would turn the
> > strong mlock guarantee to have the whole vma resident but is this
> > acceptable for something that is an explicit request from the owner of
> > the memory?
> > 
> 
> If its being explicity requested by the owner it makes sense to me. I
> guess there could be a concern about this breaking some userspace that
> relied on MADV_DONTNEED not freeing locked memory?

Yes, this is always the fear when changing user visible behavior.  I can
imagine that a userspace allocator calling MADV_DONTNEED on free could
break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
have the new flag much shorter so the probability is smaller but the
problem is very same. So I _think_ we should treat both the same because
semantically they are indistinguishable from the MADV_DONTNEED POV. Both
remove faulted and mlocked pages. Mlock, once applied, should guarantee
no later major fault and MADV_DONTNEED breaks that obviously.

So the more I think about it the more I am worried about this but I am
more and more convinced that making ONFAULT special is just a wrong way
around this.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-12  7:46         ` Michal Hocko
@ 2018-06-12 14:11           ` Jason Baron
  2018-06-13  6:32             ` Vlastimil Babka
  2018-06-13  9:13             ` Michal Hocko
  0 siblings, 2 replies; 15+ messages in thread
From: Jason Baron @ 2018-06-12 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson



On 06/12/2018 03:46 AM, Michal Hocko wrote:
> On Mon 11-06-18 12:23:58, Jason Baron wrote:
>> On 06/11/2018 11:03 AM, Michal Hocko wrote:
>>> So can we start discussing whether we want to allow MADV_DONTNEED on
>>> mlocked areas and what downsides it might have? Sure it would turn the
>>> strong mlock guarantee to have the whole vma resident but is this
>>> acceptable for something that is an explicit request from the owner of
>>> the memory?
>>>
>>
>> If its being explicity requested by the owner it makes sense to me. I
>> guess there could be a concern about this breaking some userspace that
>> relied on MADV_DONTNEED not freeing locked memory?
> 
> Yes, this is always the fear when changing user visible behavior.  I can
> imagine that a userspace allocator calling MADV_DONTNEED on free could
> break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
> have the new flag much shorter so the probability is smaller but the
> problem is very same. So I _think_ we should treat both the same because
> semantically they are indistinguishable from the MADV_DONTNEED POV. Both
> remove faulted and mlocked pages. Mlock, once applied, should guarantee
> no later major fault and MADV_DONTNEED breaks that obviously.
> 
> So the more I think about it the more I am worried about this but I am
> more and more convinced that making ONFAULT special is just a wrong way
> around this.
> 

Ok, I share the concern that there is a chance that userspace is relying
on MADV_DONTNEED not free'ing locked memory. In that case, what if we
introduce a MADV_DONTNEED_FORCE, which does everything that
MADV_DONTNEED currently does but in addition will also free mlock areas.
That way there is no concern about breaking something.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-12 14:11           ` Jason Baron
@ 2018-06-13  6:32             ` Vlastimil Babka
  2018-06-13  7:15               ` Michal Hocko
  2018-06-13  9:13             ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2018-06-13  6:32 UTC (permalink / raw)
  To: Jason Baron, Michal Hocko
  Cc: akpm, linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman,
	Kirill A. Shutemov, linux-api, emunson

On 06/12/2018 04:11 PM, Jason Baron wrote:
> 
> 
> On 06/12/2018 03:46 AM, Michal Hocko wrote:
>> On Mon 11-06-18 12:23:58, Jason Baron wrote:
>>> On 06/11/2018 11:03 AM, Michal Hocko wrote:
>>>> So can we start discussing whether we want to allow MADV_DONTNEED on
>>>> mlocked areas and what downsides it might have? Sure it would turn the
>>>> strong mlock guarantee to have the whole vma resident but is this
>>>> acceptable for something that is an explicit request from the owner of
>>>> the memory?
>>>>
>>>
>>> If its being explicity requested by the owner it makes sense to me. I
>>> guess there could be a concern about this breaking some userspace that
>>> relied on MADV_DONTNEED not freeing locked memory?
>>
>> Yes, this is always the fear when changing user visible behavior.  I can
>> imagine that a userspace allocator calling MADV_DONTNEED on free could
>> break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
>> have the new flag much shorter so the probability is smaller but the
>> problem is very same. So I _think_ we should treat both the same because
>> semantically they are indistinguishable from the MADV_DONTNEED POV. Both
>> remove faulted and mlocked pages. Mlock, once applied, should guarantee
>> no later major fault and MADV_DONTNEED breaks that obviously.

I think more concerning than guaranteeing no later major fault is
possible data loss, e.g. replacing data with zero-filled pages.

The madvise manpage is also quite specific about not allowing
MADV_DONTNEED and MADV_FREE for locked pages.

So I don't think we should risk changing that for all mlocked pages.
Maybe we can risk MCL_ONFAULT, since it's relatively new and has few users?

>> So the more I think about it the more I am worried about this but I am
>> more and more convinced that making ONFAULT special is just a wrong way
>> around this.
>>
> 
> Ok, I share the concern that there is a chance that userspace is relying
> on MADV_DONTNEED not free'ing locked memory. In that case, what if we
> introduce a MADV_DONTNEED_FORCE, which does everything that
> MADV_DONTNEED currently does but in addition will also free mlock areas.
> That way there is no concern about breaking something.

A new niche case flag? Sad :(

BTW I didn't get why we should allow this for MADV_DONTNEED but not
MADV_FREE. Can you expand on that?

> Thanks,
> 
> -Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-13  6:32             ` Vlastimil Babka
@ 2018-06-13  7:15               ` Michal Hocko
  2018-06-13  7:51                 ` Vlastimil Babka
  2018-06-15 19:36                 ` Jason Baron
  0 siblings, 2 replies; 15+ messages in thread
From: Michal Hocko @ 2018-06-13  7:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Jason Baron, akpm, linux-kernel, linux-mm, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
> On 06/12/2018 04:11 PM, Jason Baron wrote:
> > 
> > 
> > On 06/12/2018 03:46 AM, Michal Hocko wrote:
> >> On Mon 11-06-18 12:23:58, Jason Baron wrote:
> >>> On 06/11/2018 11:03 AM, Michal Hocko wrote:
> >>>> So can we start discussing whether we want to allow MADV_DONTNEED on
> >>>> mlocked areas and what downsides it might have? Sure it would turn the
> >>>> strong mlock guarantee to have the whole vma resident but is this
> >>>> acceptable for something that is an explicit request from the owner of
> >>>> the memory?
> >>>>
> >>>
> >>> If its being explicity requested by the owner it makes sense to me. I
> >>> guess there could be a concern about this breaking some userspace that
> >>> relied on MADV_DONTNEED not freeing locked memory?
> >>
> >> Yes, this is always the fear when changing user visible behavior.  I can
> >> imagine that a userspace allocator calling MADV_DONTNEED on free could
> >> break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
> >> have the new flag much shorter so the probability is smaller but the
> >> problem is very same. So I _think_ we should treat both the same because
> >> semantically they are indistinguishable from the MADV_DONTNEED POV. Both
> >> remove faulted and mlocked pages. Mlock, once applied, should guarantee
> >> no later major fault and MADV_DONTNEED breaks that obviously.
> 
> I think more concerning than guaranteeing no later major fault is
> possible data loss, e.g. replacing data with zero-filled pages.

But MADV_DONTNEED is an explicit call for data loss. Or do I miss your
point?

> The madvise manpage is also quite specific about not allowing
> MADV_DONTNEED and MADV_FREE for locked pages.

Yeah, but that seems to describe the state of the art rather than
explain why.

> So I don't think we should risk changing that for all mlocked pages.
> Maybe we can risk MCL_ONFAULT, since it's relatively new and has few users?

That is what Jason wanted but I argued that the two are the same from
MADV_DONTNEED point of view. I do not see how treating them differently
would be less confusing or error prone. It's new so we can make it
behave differently is certainly not an argument.

> >> So the more I think about it the more I am worried about this but I am
> >> more and more convinced that making ONFAULT special is just a wrong way
> >> around this.
> >>
> > 
> > Ok, I share the concern that there is a chance that userspace is relying
> > on MADV_DONTNEED not free'ing locked memory. In that case, what if we
> > introduce a MADV_DONTNEED_FORCE, which does everything that
> > MADV_DONTNEED currently does but in addition will also free mlock areas.
> > That way there is no concern about breaking something.
> 
> A new niche case flag? Sad :(
> 
> BTW I didn't get why we should allow this for MADV_DONTNEED but not
> MADV_FREE. Can you expand on that?

Well, I wanted to bring this up as well. I guess this would require some
more hacks to handle the reclaim path correctly because we do rely on
VM_LOCK at many places for the lazy mlock pages culling.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-13  7:15               ` Michal Hocko
@ 2018-06-13  7:51                 ` Vlastimil Babka
  2018-06-13  8:37                   ` Michal Hocko
  2018-06-15 19:36                 ` Jason Baron
  1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2018-06-13  7:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jason Baron, akpm, linux-kernel, linux-mm, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On 06/13/2018 09:15 AM, Michal Hocko wrote:
> On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
>> On 06/12/2018 04:11 PM, Jason Baron wrote:
>>>
>>>
>>> On 06/12/2018 03:46 AM, Michal Hocko wrote:
>>>> On Mon 11-06-18 12:23:58, Jason Baron wrote:
>>>>> On 06/11/2018 11:03 AM, Michal Hocko wrote:
>>>>>> So can we start discussing whether we want to allow MADV_DONTNEED on
>>>>>> mlocked areas and what downsides it might have? Sure it would turn the
>>>>>> strong mlock guarantee to have the whole vma resident but is this
>>>>>> acceptable for something that is an explicit request from the owner of
>>>>>> the memory?
>>>>>>
>>>>>
>>>>> If its being explicity requested by the owner it makes sense to me. I
>>>>> guess there could be a concern about this breaking some userspace that
>>>>> relied on MADV_DONTNEED not freeing locked memory?
>>>>
>>>> Yes, this is always the fear when changing user visible behavior.  I can
>>>> imagine that a userspace allocator calling MADV_DONTNEED on free could
>>>> break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
>>>> have the new flag much shorter so the probability is smaller but the
>>>> problem is very same. So I _think_ we should treat both the same because
>>>> semantically they are indistinguishable from the MADV_DONTNEED POV. Both
>>>> remove faulted and mlocked pages. Mlock, once applied, should guarantee
>>>> no later major fault and MADV_DONTNEED breaks that obviously.
>>
>> I think more concerning than guaranteeing no later major fault is
>> possible data loss, e.g. replacing data with zero-filled pages.
> 
> But MADV_DONTNEED is an explicit call for data loss. Or do I miss your
> point?

My point is that if somebody is relying on MADV_DONTNEED not affecting
mlocked pages, the consequences will be unexpected data loss, not just
extra page faults.

>> The madvise manpage is also quite specific about not allowing
>> MADV_DONTNEED and MADV_FREE for locked pages.
> 
> Yeah, but that seems to describe the state of the art rather than
> explain why.

Right, but as it's explicitly described there, it makes it more likely
that somebody is relying on it.

>> So I don't think we should risk changing that for all mlocked pages.
>> Maybe we can risk MCL_ONFAULT, since it's relatively new and has few users?
> 
> That is what Jason wanted but I argued that the two are the same from
> MADV_DONTNEED point of view. I do not see how treating them differently
> would be less confusing or error prone. It's new so we can make it
> behave differently is certainly not an argument.

Right. Might be either this inconsistency, or a new flag.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-13  7:51                 ` Vlastimil Babka
@ 2018-06-13  8:37                   ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2018-06-13  8:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Jason Baron, akpm, linux-kernel, linux-mm, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Wed 13-06-18 09:51:23, Vlastimil Babka wrote:
> On 06/13/2018 09:15 AM, Michal Hocko wrote:
> > On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
[...]
> >> I think more concerning than guaranteeing no later major fault is
> >> possible data loss, e.g. replacing data with zero-filled pages.
> > 
> > But MADV_DONTNEED is an explicit call for data loss. Or do I miss your
> > point?
> 
> My point is that if somebody is relying on MADV_DONTNEED not affecting
> mlocked pages, the consequences will be unexpected data loss, not just
> extra page faults.

OK, I see your point now. I would consider this an application bug
though. Calling MADV_DONTNEED and wondering that the content is gone is,
ehm, questionable at best. Why would anybody do that in the first place?

Anyway, I think that we cannot change the behavior because of mlockall
semantic as mentioned earlier.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-12 14:11           ` Jason Baron
  2018-06-13  6:32             ` Vlastimil Babka
@ 2018-06-13  9:13             ` Michal Hocko
  2018-06-15 19:28               ` Jason Baron
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-06-13  9:13 UTC (permalink / raw)
  To: Jason Baron
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Tue 12-06-18 10:11:33, Jason Baron wrote:
[...]
> Ok, I share the concern that there is a chance that userspace is relying
> on MADV_DONTNEED not free'ing locked memory. In that case, what if we
> introduce a MADV_DONTNEED_FORCE, which does everything that
> MADV_DONTNEED currently does but in addition will also free mlock areas.

What about other types of vmas that are not allowed to MADV_DONTNEED?
_FORCE suggests that the user of the API know what he is doing so why
shouldn't we allow unmapping hugetlb pages or special PFNMAPS? Or do we
want to add MADV_DONTNEED_FORCE_FOR_REAL when somebody comes with
another usecase?

I agree with Vlastimil that adding new madvise mode for niche case
sounds like a bad idea so we should better be sure that a new flag has
a reasonable semantic. Just allow mlocked pages is more of a tweak than
a proper semantic. So making it force for real requires to analyze what
that would mean for other vmas which are excluded now.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-13  9:13             ` Michal Hocko
@ 2018-06-15 19:28               ` Jason Baron
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Baron @ 2018-06-15 19:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, linux-kernel, linux-mm, Vlastimil Babka, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson



On 06/13/2018 05:13 AM, Michal Hocko wrote:
> On Tue 12-06-18 10:11:33, Jason Baron wrote:
> [...]
>> Ok, I share the concern that there is a chance that userspace is relying
>> on MADV_DONTNEED not free'ing locked memory. In that case, what if we
>> introduce a MADV_DONTNEED_FORCE, which does everything that
>> MADV_DONTNEED currently does but in addition will also free mlock areas.
> 
> What about other types of vmas that are not allowed to MADV_DONTNEED?
> _FORCE suggests that the user of the API know what he is doing so why
> shouldn't we allow unmapping hugetlb pages or special PFNMAPS? Or do we
> want to add MADV_DONTNEED_FORCE_FOR_REAL when somebody comes with
> another usecase?
> 
> I agree with Vlastimil that adding new madvise mode for niche case
> sounds like a bad idea so we should better be sure that a new flag has
> a reasonable semantic. Just allow mlocked pages is more of a tweak than
> a proper semantic. So making it force for real requires to analyze what
> that would mean for other vmas which are excluded now.
> 

If its a new flag, I agree it makes sense to look at hugetlb and
pfnmaps. pfnmaps might be more work, since it may require callbacks to
do driver specific actions...

I was able to do something very close to the original requirement of
free'ing mlock'd pages, via using a tmpfs mmap that is mlock'd. And then
using fallocate(FALLOC_FL_PUNCH_HOLE) to free the locked pages. I think
the tmpfs is sufficient for my needs, I wonder if there is any other
interest in this feature?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-13  7:15               ` Michal Hocko
  2018-06-13  7:51                 ` Vlastimil Babka
@ 2018-06-15 19:36                 ` Jason Baron
  2018-06-20 11:00                   ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Baron @ 2018-06-15 19:36 UTC (permalink / raw)
  To: Michal Hocko, Vlastimil Babka
  Cc: akpm, linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman,
	Kirill A. Shutemov, linux-api, emunson



On 06/13/2018 03:15 AM, Michal Hocko wrote:
> On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
>> On 06/12/2018 04:11 PM, Jason Baron wrote:
>>>
>>>
>>> On 06/12/2018 03:46 AM, Michal Hocko wrote:
>>>> On Mon 11-06-18 12:23:58, Jason Baron wrote:
>>>>> On 06/11/2018 11:03 AM, Michal Hocko wrote:
>>>>>> So can we start discussing whether we want to allow MADV_DONTNEED on
>>>>>> mlocked areas and what downsides it might have? Sure it would turn the
>>>>>> strong mlock guarantee to have the whole vma resident but is this
>>>>>> acceptable for something that is an explicit request from the owner of
>>>>>> the memory?
>>>>>>
>>>>>
>>>>> If its being explicity requested by the owner it makes sense to me. I
>>>>> guess there could be a concern about this breaking some userspace that
>>>>> relied on MADV_DONTNEED not freeing locked memory?
>>>>
>>>> Yes, this is always the fear when changing user visible behavior.  I can
>>>> imagine that a userspace allocator calling MADV_DONTNEED on free could
>>>> break. The same would apply to MLOCK_ONFAULT/MCL_ONFAULT though. We
>>>> have the new flag much shorter so the probability is smaller but the
>>>> problem is very same. So I _think_ we should treat both the same because
>>>> semantically they are indistinguishable from the MADV_DONTNEED POV. Both
>>>> remove faulted and mlocked pages. Mlock, once applied, should guarantee
>>>> no later major fault and MADV_DONTNEED breaks that obviously.
>>
>> I think more concerning than guaranteeing no later major fault is
>> possible data loss, e.g. replacing data with zero-filled pages.
> 
> But MADV_DONTNEED is an explicit call for data loss. Or do I miss your
> point?
> 
>> The madvise manpage is also quite specific about not allowing
>> MADV_DONTNEED and MADV_FREE for locked pages.
> 
> Yeah, but that seems to describe the state of the art rather than
> explain why.
> 
>> So I don't think we should risk changing that for all mlocked pages.
>> Maybe we can risk MCL_ONFAULT, since it's relatively new and has few users?
> 
> That is what Jason wanted but I argued that the two are the same from
> MADV_DONTNEED point of view. I do not see how treating them differently
> would be less confusing or error prone. It's new so we can make it
> behave differently is certainly not an argument.
> 
>>>> So the more I think about it the more I am worried about this but I am
>>>> more and more convinced that making ONFAULT special is just a wrong way
>>>> around this.
>>>>
>>>
>>> Ok, I share the concern that there is a chance that userspace is relying
>>> on MADV_DONTNEED not free'ing locked memory. In that case, what if we
>>> introduce a MADV_DONTNEED_FORCE, which does everything that
>>> MADV_DONTNEED currently does but in addition will also free mlock areas.
>>> That way there is no concern about breaking something.
>>
>> A new niche case flag? Sad :(
>>
>> BTW I didn't get why we should allow this for MADV_DONTNEED but not
>> MADV_FREE. Can you expand on that?
> 
> Well, I wanted to bring this up as well. I guess this would require some
> more hacks to handle the reclaim path correctly because we do rely on
> VM_LOCK at many places for the lazy mlock pages culling.
> 

The point of not allowing MADV_FREE on mlock'd pages for me was that
with mlock and even MLOCK_ON_FAULT, one can always can always determine
if a page is present or not (and thus avoid the major fault). Allowing
MADV_FREE on lock'd pages breaks that assumption.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-15 19:36                 ` Jason Baron
@ 2018-06-20 11:00                   ` Michal Hocko
  2018-06-28 20:20                     ` Jason Baron
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-06-20 11:00 UTC (permalink / raw)
  To: Jason Baron
  Cc: Vlastimil Babka, akpm, linux-kernel, linux-mm, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson

On Fri 15-06-18 15:36:07, Jason Baron wrote:
> 
> 
> On 06/13/2018 03:15 AM, Michal Hocko wrote:
> > On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
[...]
> >> BTW I didn't get why we should allow this for MADV_DONTNEED but not
> >> MADV_FREE. Can you expand on that?
> > 
> > Well, I wanted to bring this up as well. I guess this would require some
> > more hacks to handle the reclaim path correctly because we do rely on
> > VM_LOCK at many places for the lazy mlock pages culling.
> > 
> 
> The point of not allowing MADV_FREE on mlock'd pages for me was that
> with mlock and even MLOCK_ON_FAULT, one can always can always determine
> if a page is present or not (and thus avoid the major fault). Allowing
> MADV_FREE on lock'd pages breaks that assumption.

But once you have called MADV_FREE you cannot assume anything about the
content until you touch the memory again. So you can safely assume a
major fault for the worst case. Btw. why knowing whether you major fault
is important in the first place? What is an application going to do
about that information?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
  2018-06-20 11:00                   ` Michal Hocko
@ 2018-06-28 20:20                     ` Jason Baron
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Baron @ 2018-06-28 20:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, akpm, linux-kernel, linux-mm, Joonsoo Kim,
	Mel Gorman, Kirill A. Shutemov, linux-api, emunson



On 06/20/2018 07:00 AM, Michal Hocko wrote:
> On Fri 15-06-18 15:36:07, Jason Baron wrote:
>>
>>
>> On 06/13/2018 03:15 AM, Michal Hocko wrote:
>>> On Wed 13-06-18 08:32:19, Vlastimil Babka wrote:
> [...]
>>>> BTW I didn't get why we should allow this for MADV_DONTNEED but not
>>>> MADV_FREE. Can you expand on that?
>>>
>>> Well, I wanted to bring this up as well. I guess this would require some
>>> more hacks to handle the reclaim path correctly because we do rely on
>>> VM_LOCK at many places for the lazy mlock pages culling.
>>>
>>
>> The point of not allowing MADV_FREE on mlock'd pages for me was that
>> with mlock and even MLOCK_ON_FAULT, one can always can always determine
>> if a page is present or not (and thus avoid the major fault). Allowing
>> MADV_FREE on lock'd pages breaks that assumption.
> 
> But once you have called MADV_FREE you cannot assume anything about the
> content until you touch the memory again. So you can safely assume a
> major fault for the worst case. Btw. why knowing whether you major fault
> is important in the first place? What is an application going to do
> about that information?
> 

Fair enough, I think that means you end up with a MADV_FREE_FORCE to
support that case? As I said I worked around this by using tmpfs and
fallocate(FALLOC_FL_PUNCH_HOLE). However, I still think there is a
use-case for doing this for anonymous memory, to avoid the unlock() calls.

The use-case I had in mind was simply an application that has a fast
path for when it knows that the requested item is locked in memory.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-06-28 20:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1528484212-7199-1-git-send-email-jbaron@akamai.com>
2018-06-11  7:20 ` [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT Michal Hocko
2018-06-11 14:51   ` Jason Baron
2018-06-11 15:03     ` Michal Hocko
2018-06-11 16:23       ` Jason Baron
2018-06-12  7:46         ` Michal Hocko
2018-06-12 14:11           ` Jason Baron
2018-06-13  6:32             ` Vlastimil Babka
2018-06-13  7:15               ` Michal Hocko
2018-06-13  7:51                 ` Vlastimil Babka
2018-06-13  8:37                   ` Michal Hocko
2018-06-15 19:36                 ` Jason Baron
2018-06-20 11:00                   ` Michal Hocko
2018-06-28 20:20                     ` Jason Baron
2018-06-13  9:13             ` Michal Hocko
2018-06-15 19:28               ` Jason Baron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).