Re: [PATCH RFC 1/4] mm: throttle MADV_FREE

From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Shaohua Li <shli@kernel.org>,
	Yalin.Wang@sonymobile.com
Subject: Re: [PATCH RFC 1/4] mm: throttle MADV_FREE
Date: Wed, 25 Feb 2015 09:08:09 +0900
Message-ID: <20150225000809.GA6468@blaptop>
In-Reply-To: <20150224154318.GA14939@dhcp22.suse.cz>

Hi Michal,

On Tue, Feb 24, 2015 at 04:43:18PM +0100, Michal Hocko wrote:
> On Tue 24-02-15 17:18:14, Minchan Kim wrote:
> > Recently, Shaohua reported that MADV_FREE is much slower than
> > MADV_DONTNEED in his MADV_FREE bomb test. The reason is that many
> > applications stall in direct reclaim, since kswapd's reclaim speed
> > isn't faster than the applications' allocation speed, which causes
> > lots of stalls and lock contention.
> 
> I am not sure I understand this correctly. So the issue is that there is
> a huge number of MADV_FREE pages on the LRU and they are not close to the
> tail of the list, so reclaim has to do a lot of work before it starts
> dropping them?

No. Shaohua already tested deactivating the hinted pages to the head/tail
of the inactive anon LRU, and he said it didn't solve his problem.
I thought the main culprit was the scanning/rotating/throttling in the
direct reclaim path.

> 
> > This patch throttles MADV_FREE so it works only if there are
> > enough free pages in the system that it will not trigger background/
> > direct reclaim. Otherwise, MADV_FREE falls back to MADV_DONTNEED
> > because there is no point in delaying the free if we know the system
> > is under memory pressure.
> 
> Hmm, this still conforms to the documentation because the kernel is
> free to free pages at its convenience. I am not sure this is a good
> idea, though. Why should some MADV_FREE calls be treated differently?

It's a hint for the VM to free pages, so I think it's okay to free them
instantly sometimes if that avoids something worse, like a system stall.
IOW, madvise is just a hint, not a strict rule.

> Wouldn't that lead to hard-to-predict behavior? E.g. LIFO reused blocks
> would work without long stalls most of the time - except when there is
> memory pressure.

True.

> 
> Comparison to MADV_DONTNEED is not very fair IMHO because the scope of the
> two calls is different.

I agree it's not an apples-to-apples comparison.

Actually, MADV_FREE moves the cost from the hot path (i.e., the system call
path) to the slow path (i.e., reclaim context), so it will be slower under
continuous memory pressure because of the overhead of freeing pages in
reclaim context. So it would be good if the kernel could detect that and
prevent the situation. This patch aims for that.

> 
> > When I test the patch on my 3G machine + 12 CPU + 8G swap,
> > test: 12 processes
> > 
> > loop = 5;
> > mmap(512M);
> 
> Who is eating the rest of the memory?

As I wrote, there are 12 processes running the test below.
IOW, 512M * 12 = 6G, but system RAM is just 3G.

> 
> > while (loop--) {
> > 	memset(512M);
> > 	madvise(MADV_FREE or MADV_DONTNEED);
> > }
> > 
> > 1) dontneed: 6.78user 234.09system 0:48.89elapsed
> > 2) madvfree: 6.03user 401.17system 1:30.67elapsed
> > 3) madvfree + this patch: 5.68user 113.42system 0:36.52elapsed
> > 
> > It's clearly a win.
> > 
> > Reported-by: Shaohua Li <shli@kernel.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> I don't know. This looks like a hack with hard-to-predict consequences
> which might trigger pathological corner cases.

Yes, it might be. That's why I tagged it RFC; I hope others can suggest
a better idea.

> 
> > ---
> >  mm/madvise.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6d0fcb8921c2..81bb26ecf064 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -523,8 +523,17 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
> >  		 * XXX: In this implementation, MADV_FREE works like
> >  		 * MADV_DONTNEED on swapless system or full swap.
> >  		 */
> > -		if (get_nr_swap_pages() > 0)
> > -			return madvise_free(vma, prev, start, end);
> > +		if (get_nr_swap_pages() > 0) {
> > +			unsigned long threshold;
> > +			/*
> > +			 * If we have trouble with memory pressure (i.e.,
> > +			 * under high watermark), free pages instantly.
> > +			 */
> > +			threshold = min_free_kbytes >> (PAGE_SHIFT - 10);
> > +			threshold = threshold + (threshold >> 1);
> 
> Why threshold += threshold >> 1 ?

I wanted to trigger this logic when free pages are under the high watermark.
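
To make the arithmetic concrete, here is a small userspace model of the
check in the quoted hunk (a sketch, not kernel code). Both function names
are made up for illustration. It assumes the high watermark sits at about
min + min/2, which is how the per-zone watermarks were derived from
min_free_kbytes at the time, so 1.5 * min is only an approximation of the
real per-zone values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: min_free_kbytes is in KiB, so shifting right by
 * (page_shift - 10) converts it to pages; adding half of that again
 * gives 1.5 * min, the approximate high watermark.
 */
unsigned long madv_free_threshold_pages(unsigned long min_free_kbytes,
					unsigned int page_shift)
{
	unsigned long threshold;

	assert(page_shift >= 10);	/* pages are at least 1 KiB */

	threshold = min_free_kbytes >> (page_shift - 10);	/* KiB -> pages */
	return threshold + (threshold >> 1);			/* 1.5 * min */
}

/*
 * Hypothetical model of the patched branch: true means MADV_FREE stays
 * lazy, false means it falls through to the MADV_DONTNEED path.
 */
bool madv_free_stays_lazy(long nr_swap_pages, unsigned long nr_free_pages,
			  unsigned long min_free_kbytes,
			  unsigned int page_shift)
{
	if (nr_swap_pages <= 0)
		return false;	/* swapless or full swap: free instantly */

	return nr_free_pages >
	       madv_free_threshold_pages(min_free_kbytes, page_shift);
}
```

With an example min_free_kbytes of 8192 and 4K pages (PAGE_SHIFT = 12),
the threshold is 2048 + 1024 = 3072 pages, i.e. 12M of free memory.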

> 
> > +			if (nr_free_pages() > threshold)
> > +				return madvise_free(vma, prev, start, end);
> > +		}
> >  		/* passthrough */
> >  	case MADV_DONTNEED:
> >  		return madvise_dontneed(vma, prev, start, end);
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Kind regards,
Minchan Kim


